Cookbook: Supervised learning
This section takes you through examples from the Logits Cookbook that relate to supervised learning.
In general, supervised learning (SL) means learning an input-output mapping from labeled data. In the context of language model fine-tuning, this means minimizing a weighted cross-entropy loss on token sequences or, equivalently, maximizing the weighted log-probability of the specified target tokens.
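To make the loss concrete, here is a minimal sketch in plain Python. It is an illustration only, not the Cookbook's implementation: the function name `weighted_cross_entropy`, the list-based inputs, and the choice to return an unnormalized weighted sum are all assumptions. Setting a weight to zero masks a position (for example, prompt tokens), so only the target tokens contribute.

```python
import math

def weighted_cross_entropy(logits, targets, weights):
    """Return the weighted sum of -log p(target_t) over positions t.

    logits:  list of per-position score lists (one score per vocabulary entry)
    targets: list of target token ids, one per position
    weights: list of non-negative floats, one per position
    (Hypothetical helper for illustration; not part of any library.)
    """
    total = 0.0
    for scores, target, weight in zip(logits, targets, weights):
        # Numerically stable log-softmax: subtract the max before exponentiating.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        log_prob = scores[target] - log_z
        total += -weight * log_prob
    return total

# Toy example: 3 positions, vocabulary of size 4.
logits = [[2.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 3.0]]
targets = [0, 1, 3]
weights = [0.0, 1.0, 1.0]  # position 0 is treated as prompt and masked out
loss = weighted_cross_entropy(logits, targets, weights)
```

Minimizing `loss` is the same as maximizing the weighted sum of target log-probabilities, since the two differ only by sign.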
There are a few ways that SL is commonly used in LLM fine-tuning pipelines:
- Instruction tuning: Typically the first step in a post-training pipeline, applied to the base (raw, pretrained) model. It is usually done on a high-quality dataset that demonstrates the correct format and style, while also improving reasoning and instruction following.
- Context distillation / prompt distillation: When a model with a long system prompt starts ignoring instructions, you can distill those behaviors into the weights by creating a supervised dataset on a narrow prompt distribution with shorter, targeted instructions.
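The dataset-construction step for context distillation can be sketched as follows. This is a hedged illustration under stated assumptions, not the Cookbook's code: `build_distillation_dataset`, the `generate` callable, and the record field names are all hypothetical. The idea is to sample completions from the model while it sees the long system prompt, then store each completion alongside a shorter, targeted instruction so that supervised training moves the behavior into the weights.

```python
from typing import Callable, Dict, List

def build_distillation_dataset(
    prompts: List[str],
    teacher_system_prompt: str,
    student_system_prompt: str,
    generate: Callable[[str, str], str],
) -> List[Dict[str, str]]:
    """Build supervised examples for context distillation.

    `generate(system_prompt, user_prompt)` is a hypothetical sampling function
    supplied by the caller; it stands in for whatever model API is in use.
    """
    examples = []
    for prompt in prompts:
        # The teacher sees the long system prompt when producing the completion.
        completion = generate(teacher_system_prompt, prompt)
        # The student example keeps only the shorter, targeted instruction.
        examples.append(
            {
                "system": student_system_prompt,
                "prompt": prompt,
                "completion": completion,
            }
        )
    return examples

# Stub generator so the sketch runs without a model; replace with a real sampler.
def fake_generate(system_prompt: str, user_prompt: str) -> str:
    return f"[answer to: {user_prompt}]"

dataset = build_distillation_dataset(
    prompts=["Summarize this ticket.", "Classify this email."],
    teacher_system_prompt="(long system prompt with many detailed rules)",
    student_system_prompt="Be concise.",
    generate=fake_generate,
)
```

Each record can then be tokenized and trained on with the supervised loss, typically with zero weight on the system and prompt tokens and nonzero weight on the completion tokens.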
The library code implementing supervised learning can be found in the supervised directory.