⚡ Fine-tuning takes a large pretrained model and continues training it on a smaller domain-specific dataset to specialise its behaviour. Instead of training from scratch, you start with all the knowledge the base model already has and adapt it for your use case — a fraction of the cost, a fraction of the data, dramatically better results on your specific task.
Category: Machine Learning · Difficulty: Intermediate · Last updated: 15 May 2026 · 5 min read
Fine-Tuning — What It Is, How It Specialises AI & When to Use It Instead of Prompting
What is Fine-Tuning?
A medical school does not teach doctors everything from scratch. Students arrive with 16 years of general education — language, biology, chemistry, logic. Medical school takes that foundation and specialises it for medicine over 4-6 years. The general education was not wasted — it is the foundation the specialisation builds on.
Fine-tuning works the same way. A large language model trained on trillions of words already knows language, reasoning, facts, and patterns. Fine-tuning takes that foundation and continues training on thousands of domain-specific examples, such as your company’s customer support conversations, clinical notes, legal contracts, or code from your codebase. The model adjusts its weights slightly to reflect the patterns in that domain, producing a specialist rather than a generalist.
How does Fine-Tuning work?
- Start with a pretrained base model — GPT-4, Llama 3, Mistral, or a task-specific foundation model.
- Prepare a dataset of input-output examples representing your target task — ideally 500 to 50,000 high-quality examples.
- Continue training the model on this dataset using a much lower learning rate than pretraining — you want to specialise, not overwrite what the model already knows.
- Monitor validation loss to avoid overfitting — fine-tuned models can memorise small datasets easily.
- Evaluate on held-out examples using task-specific metrics.
- Deploy the fine-tuned model — it now behaves consistently on your task without requiring long system prompts.
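The steps above can be sketched in miniature. The example below is only an illustration under loud assumptions: the "pretrained model" is a two-parameter linear function rather than a language model, the data is synthetic, and the names `fine_tune`, `PRETRAINED_W`, and `PRETRAINED_B` are hypothetical. It shows the same shape of loop, though: start from existing weights, train with a small learning rate, watch validation loss, stop early, and evaluate on held-out data.

```python
import random

def mse(w, b, data):
    """Mean squared error of the linear model y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def fine_tune(w, b, train, val, lr=0.01, max_epochs=1000, patience=10):
    """Continue gradient descent from pretrained (w, b) with early stopping."""
    best_val, best_w, best_b = mse(w, b, val), w, b
    stale_epochs = 0
    for _ in range(max_epochs):
        # Full-batch gradients of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / len(train)
        w, b = w - lr * grad_w, b - lr * grad_b
        val_loss = mse(w, b, val)
        if val_loss < best_val - 1e-9:
            best_val, best_w, best_b = val_loss, w, b
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # validation loss stopped improving
    return best_w, best_b, best_val

def make_domain_data(n, rng):
    """Synthetic 'domain' data (assumption): y = 2.5x + 1 plus small noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2.5 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data

# Stand-in for pretrained weights (assumption): a generalist fit y = 2x.
PRETRAINED_W, PRETRAINED_B = 2.0, 0.0

rng = random.Random(0)
train = make_domain_data(200, rng)
val = make_domain_data(50, rng)
held_out = make_domain_data(50, rng)

base_loss = mse(PRETRAINED_W, PRETRAINED_B, held_out)
w, b, best_val = fine_tune(PRETRAINED_W, PRETRAINED_B, train, val)
tuned_loss = mse(w, b, held_out)
print(f"held-out loss before: {base_loss:.4f}, after: {tuned_loss:.4f}")
```

The function returns the weights from the best validation epoch rather than the last one, which is the usual way early stopping guards against memorising a small dataset.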
Real-world examples
Not theory — what real teams actually shipped using this technique.
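- OpenAI fine-tuned GPT-3 using RLHF (Reinforcement Learning from Human Feedback) to create InstructGPT. Human raters ranked model outputs, a reward model was trained on those rankings, and the language model was then optimised to produce responses the reward model scores highly. ChatGPT’s helpful, safe tone comes from the same method applied to a GPT-3.5 series model.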
- Salesforce fine-tunes CodeT5 on their proprietary Apex code repositories so their AI coding assistant understands Salesforce-specific APIs and patterns that are not in public training data.
- A radiology company fine-tunes a vision-language model on 50,000 annotated chest X-rays to produce radiology reports in the exact format used by their hospital system — consistent structure, correct terminology, appropriate hedging language.
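The ranking step in the RLHF example can be made concrete. In published RLHF pipelines, rankings are commonly turned into pairs (a preferred response and a rejected one) and a reward model is trained with a pairwise loss, -log(sigmoid(r_preferred - r_rejected)). The sketch below computes only that loss; the function name and the reward scores are made up for illustration, and it is not a description of any team's actual code.

```python
import math

def pairwise_preference_loss(reward_preferred, reward_rejected):
    """Return -log(sigmoid(reward_preferred - reward_rejected)).

    The loss is small when the preferred response is scored well above
    the rejected one, and large when the ordering is wrong.
    """
    margin = reward_preferred - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical reward-model scores for three response pairs.
loss_correct_order = pairwise_preference_loss(2.0, 0.0)
loss_tie = pairwise_preference_loss(0.0, 0.0)
loss_wrong_order = pairwise_preference_loss(0.0, 2.0)
print(loss_correct_order, loss_tie, loss_wrong_order)
```

Minimising this loss pushes the reward model to score human-preferred responses higher, and that score is what the language model is later optimised against.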
Common pitfalls
- Catastrophic forgetting: if the fine-tuning learning rate is too high or the dataset is too narrow, the model overwrites general knowledge and loses capabilities it had before. Use low learning rates and regularisation.
- Overfitting to fine-tuning data: with small datasets the model memorises examples rather than learning general patterns. Use validation sets and early stopping.
- Fine-tuning before prompting: many tasks can be solved with good prompt engineering alone, and fine-tuning is expensive and time-consuming by comparison. Exhaust prompting options first.
- Data quality is critical: 500 excellent fine-tuning examples often outperform 5,000 poor ones. Noisy, inconsistent, or mislabelled fine-tuning data teaches wrong patterns that are hard to reverse.
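Two of these pitfalls, noisy data and missing validation sets, can be addressed before any training starts. The sketch below assumes examples are dictionaries with hypothetical `prompt` and `completion` fields. It drops empty and duplicate pairs, drops prompts that appear with conflicting completions, and holds out a validation split. Discarding conflicts is only one possible policy; sending them for human review may be better in practice.

```python
import random

def clean_examples(examples):
    """Drop empty pairs, exact duplicates, and prompts with conflicting answers."""
    completions_by_prompt = {}
    for example in examples:
        prompt = (example.get("prompt") or "").strip()
        completion = (example.get("completion") or "").strip()
        if not prompt or not completion:
            continue  # incomplete example
        completions_by_prompt.setdefault(prompt, set()).add(completion)
    # Keep a prompt only if every occurrence agreed on one completion.
    return [
        {"prompt": prompt, "completion": next(iter(completions))}
        for prompt, completions in completions_by_prompt.items()
        if len(completions) == 1
    ]

def train_val_split(examples, val_fraction=0.2, seed=0):
    """Shuffle deterministically and hold out a validation set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Made-up support examples for illustration only.
raw = [
    {"prompt": "Reset password?", "completion": "Use the account settings page."},
    {"prompt": "Reset password?", "completion": "Use the account settings page."},
    {"prompt": "Refund policy?", "completion": "Refunds within 30 days."},
    {"prompt": "Refund policy?", "completion": "No refunds."},
    {"prompt": "Opening hours?", "completion": ""},
    {"prompt": "Shipping time?", "completion": "Three to five business days."},
    {"prompt": "Change email?", "completion": "Contact support."},
]

cleaned = clean_examples(raw)
train_set, val_set = train_val_split(cleaned)
print(len(raw), "raw ->", len(cleaned), "clean;", len(train_set), "train,", len(val_set), "val")
```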
Frequently asked questions
QUESTION 1 What is fine-tuning in simple terms?
ANSWER 1 Taking a brilliant generalist model and training it specifically for your job — showing it thousands of examples of your task so it specialises without losing its general knowledge.
QUESTION 2 When should you fine-tune instead of prompt engineering?
ANSWER 2 When consistent style is critical and prompting is inconsistent, when specialised vocabulary is needed, or when you want to reduce prompt length and cost in production. Prompt first, fine-tune when prompting hits its limits.
QUESTION 3 What is LoRA and why does it matter?
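ANSWER 3 LoRA (Low-Rank Adaptation) freezes the pretrained weights and trains small low-rank matrices added alongside them, so only a tiny fraction of parameters, often well under 1%, is updated. This sharply reduces the memory needed for gradients and optimiser state, making large-model fine-tuning practical on modest hardware.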
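A small sketch of the idea, written in plain Python so the arithmetic is visible. It assumes the standard LoRA formulation, y = Wx + (alpha / r) · B(Ax), where W is the frozen weight matrix and A and B are the small trainable matrices. The helper names and the layer sizes are illustrative assumptions, not a real library's API.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x).

    W is d_out x d_in and stays frozen; A is r x d_in and B is d_out x r,
    and only A and B would be trained.
    """
    r = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]

def parameter_counts(d_out, d_in, r):
    """Return (full fine-tuning params, LoRA trainable params) for one layer."""
    return d_out * d_in, r * (d_in + d_out)

W = [[1.0, 2.0], [3.0, 4.0]]   # frozen pretrained weights
A = [[0.5, -0.5]]              # rank r = 1
B_init = [[0.0], [0.0]]        # B starts at zero, so the model is unchanged
B_trained = [[1.0], [2.0]]     # pretend values after some training
x = [1.0, 0.0]

y_start = lora_forward(W, A, B_init, x, alpha=2.0)
y_tuned = lora_forward(W, A, B_trained, x, alpha=2.0)

# Hypothetical 4096 x 4096 projection layer with rank 8.
full, lora = parameter_counts(4096, 4096, 8)
print(f"{lora} trainable of {full} ({100 * lora / full:.2f}%)")
```

Because B starts at zero, the adapted model initially produces exactly the base model's output, and training moves it away from that starting point only as far as the data requires.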
QUESTION 4 What is the difference between fine-tuning and RAG?
ANSWER 4 Fine-tuning bakes knowledge into weights permanently. RAG retrieves knowledge at inference time. Fine-tuning is better for style and format. RAG is better for frequently changing facts.
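The contrast is easiest to see on the retrieval side. The toy sketch below assumes a tiny in-memory document list and a crude word-overlap retriever, both hypothetical stand-ins for a real search index. The point is that changing a fact means editing a document, with no retraining involved.

```python
def retrieve(query, documents):
    """Return the document sharing the most lowercase words with the query."""
    query_words = set(query.lower().split())
    return max(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
    )

def build_prompt(query, documents):
    """Put the retrieved document in front of the question at inference time."""
    context = retrieve(query, documents)
    return f"Context: {context}\nQuestion: {query}"

# Made-up knowledge base for illustration.
documents = [
    "The return window is 30 days.",
    "Support is open on weekdays.",
]
question = "How long is the return window?"

prompt_before = build_prompt(question, documents)

# The policy changes: update the stored document, leave the model untouched.
documents[0] = "The return window is 60 days."
prompt_after = build_prompt(question, documents)

print(prompt_before)
print(prompt_after)
```

A fine-tuned model that had learned the 30-day figure would need another training run to pick up the change, which is why frequently changing facts are usually a better fit for retrieval.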