Cookbook: Supervised learning

This section takes you through examples from the Logits Cookbook that relate to supervised learning.

In general, supervised learning (SL) means learning an input-output mapping from labeled data. In the context of language model fine-tuning, this means minimizing a weighted cross-entropy loss on token sequences or, equivalently, maximizing the weighted log-probability of the specified target tokens.
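To make the objective concrete, here is a minimal pure-Python sketch of a weighted cross-entropy over one token sequence. It is an illustration only, not the Logits Cookbook implementation: the function names `log_softmax` and `weighted_cross_entropy` are hypothetical, and the choice to return an unnormalized sum (rather than dividing by the total weight) is an assumption.

```python
import math


def log_softmax(logits):
    """Convert one position's logits into log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_cross_entropy(logprobs, targets, weights):
    """Sum of -weight * log p(target) over positions.

    logprobs: one list of vocabulary log-probabilities per position
    targets:  the target token id at each position
    weights:  the loss weight at each position (0.0 excludes that position)
    """
    if not (len(logprobs) == len(targets) == len(weights)):
        raise ValueError("logprobs, targets, and weights must have equal length")
    total = 0.0
    for position_logprobs, target, weight in zip(logprobs, targets, weights):
        total += -weight * position_logprobs[target]
    return total


# Toy example: vocabulary of 3 tokens, sequence of 3 positions.
# The first position is treated as a prompt token and masked with weight 0.0,
# so only the last two positions contribute to the loss.
toy_logits = [[2.0, 0.5, 0.1], [0.3, 1.7, 0.2], [0.1, 0.4, 2.2]]
toy_logprobs = [log_softmax(row) for row in toy_logits]
loss = weighted_cross_entropy(toy_logprobs, targets=[0, 1, 2], weights=[0.0, 1.0, 1.0])
```

Minimizing `loss` is the same as maximizing the weighted sum of target log-probabilities, since the two differ only in sign.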

There are a few ways that SL is commonly used in LLM fine-tuning pipelines:

Instruction tuning: The first step in post-training pipelines, applied to the base (raw, pretrained) model. It is typically done on a high-quality dataset that demonstrates the correct format and style while also improving reasoning and instruction following.
Context distillation / prompt distillation: When a model with a long system prompt starts ignoring instructions, you can distill those behaviors into the weights by creating a supervised dataset on a narrow prompt distribution with shorter, targeted instructions.
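As a rough sketch of how a distillation dataset of this kind might be assembled: completions are sampled with short, targeted instructions in context, then stored as supervised examples whose training-time instructions are shorter or empty. This is one common reading of context distillation, not the cookbook's code. The function `build_distillation_examples`, the `generate` callable, and the example dictionary keys are all hypothetical; `generate` stands in for whatever sampling call you use.

```python
from typing import Callable, Dict, List


def build_distillation_examples(
    prompts: List[str],
    teacher_instructions: str,
    generate: Callable[[str, str], str],
    student_instructions: str = "",
) -> List[Dict[str, str]]:
    """Build supervised examples for context distillation.

    generate(instructions, prompt) is a hypothetical sampling function that
    returns a completion for `prompt` with `instructions` in context.
    """
    examples = []
    for prompt in prompts:
        # Teacher pass: sample with the targeted instructions in context.
        completion = generate(teacher_instructions, prompt)
        # Student example: same completion, but the instructions seen at
        # training time are the (possibly empty) student instructions.
        examples.append(
            {
                "instructions": student_instructions,
                "prompt": prompt,
                "completion": completion,
            }
        )
    return examples
```

The resulting examples can then be trained on with an ordinary supervised loss, typically with the loss weights set to zero on the instruction and prompt tokens so that only the completion tokens are learned.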

The library code implementing supervised learning can be found in the supervised directory.