LLM Components
- Document selection
- Tokenizing
- Training input
- Training elements
- Architecture: transformer with attention
- Forward prediction
- Loss reporting
- Weight updates (backpropagation)
- Training loop
- Generative completion (base model)
- Fine-tuning for applications
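The "transformer with attention" item above centers on scaled dot-product attention. A pure-Python, single-head sketch (a real transformer adds learned query/key/value projections, multiple heads, and stacked layers; the toy vectors below are illustrative):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention.
    Q, K, V: lists of d-dimensional vectors, one per token."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)  # attention weights for this token; sum to 1
        # Context vector: weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Toy example: 3 tokens, dimension d = 2 (made-up values)
Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = attention(Q, K, V)
```

Each output row is a convex combination of the value vectors, so every component stays within the range spanned by V.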
Compare to Figures 1.8 and 1.9 in Raschka
Compare to microGPT
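The pipeline above, from tokenizing through generative completion, can be sketched end-to-end with a toy bigram model standing in for the transformer (the corpus and all names here are illustrative, not from Raschka or microGPT):

```python
import math

# Document selection: a toy corpus
text = "hello world hello world"

# Tokenizing: character-level vocabulary and ids
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
itos = {i: ch for ch, i in stoi.items()}
ids = [stoi[ch] for ch in text]

V = len(vocab)
# Training elements: a V x V logit table (stand-in for transformer weights)
W = [[0.0] * V for _ in range(V)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

lr = 0.5
for epoch in range(100):                   # training loop
    total_loss = 0.0
    for x, y in zip(ids, ids[1:]):         # training input: (context, next-token) pairs
        probs = softmax(W[x])              # forward prediction
        total_loss += -math.log(probs[y])  # cross-entropy loss
        for j in range(V):                 # gradient of the loss w.r.t. the logits
            grad = probs[j] - (1.0 if j == y else 0.0)
            W[x][j] -= lr * grad           # weight update (SGD step)
avg_loss = total_loss / (len(ids) - 1)     # loss reporting

# Generative completion (base model): greedy next-token decoding
out = "h"
for _ in range(10):
    probs = softmax(W[stoi[out[-1]]])
    out += itos[max(range(V), key=lambda j: probs[j])]
print(avg_loss, out)
```

The bigram table collapses the transformer's attention and feed-forward layers into a single lookup, but the remaining steps (tokenize, predict, compute loss, update weights, loop, then generate) follow the component list above.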