Fine-tuning takes a large pretrained model and continues training it on a smaller domain-specific dataset to specialise its behaviour. Instead of training from scratch, you start from the knowledge the base model already has and adapt it to your use case, typically with a fraction of the cost and data that pretraining needs, and with better results on your specific task than the unmodified base model.

Category: Machine Learning · Difficulty: Intermediate · Last updated: 15 May 2026 · 5 min read


Fine-Tuning — What It Is, How It Specialises AI & When to Use It Instead of Prompting

What is Fine-Tuning?

A medical school does not teach doctors everything from scratch. Students arrive with 16 years of general education — language, biology, chemistry, logic. Medical school takes that foundation and specialises it for medicine over 4-6 years. The general education was not wasted — it is the foundation the specialisation builds on.

Fine-tuning works the same way. A large language model trained on trillions of words already knows language, reasoning, facts, and patterns. Fine-tuning takes that foundation and continues training on thousands of domain-specific examples: your company’s customer support conversations, clinical notes, legal contracts, or code in your codebase. The model adjusts its weights slightly to reflect the patterns in that specific domain, producing a specialist rather than a generalist.

How does Fine-Tuning work?

  1. Start with a pretrained base model: an open-weight model such as Llama 3 or Mistral, a hosted model that its provider offers for fine-tuning, or a task-specific foundation model.
  2. Prepare a dataset of input-output examples representing your target task — ideally 500 to 50,000 high-quality examples.
  3. Continue training the model on this dataset using a much lower learning rate than pretraining — you want to specialise, not overwrite what the model already knows.
  4. Monitor validation loss to avoid overfitting — fine-tuned models can memorise small datasets easily.
  5. Evaluate on held-out examples using task-specific metrics.
  6. Deploy the fine-tuned model — it now behaves consistently on your task without requiring long system prompts.
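
The steps above can be sketched in a few lines. This is a toy illustration, not a real training recipe: a two-parameter linear model stands in for the pretrained network, and every name and number here (fine_tune, make_domain_data, the learning rate, the patience value, the synthetic data) is an assumption made for the example. It shows the shape of the process: start from existing weights, train with a small learning rate, watch validation loss, stop early, then evaluate on held-out data.

```python
import random

def mse(params, data):
    """Mean squared error of a linear model y = w*x + b on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def fine_tune(pretrained, train, val, lr=0.01, max_epochs=500, patience=10):
    """Continue training from pretrained weights; stop when validation loss stalls."""
    w, b = pretrained
    best_params, best_val = (w, b), mse((w, b), val)
    stale = 0
    for _ in range(max_epochs):
        # Full-batch gradient of the MSE loss with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / len(train)
        # Small learning rate: nudge the existing weights rather than replace them.
        w, b = w - lr * grad_w, b - lr * grad_b
        val_loss = mse((w, b), val)
        if val_loss < best_val - 1e-9:
            best_params, best_val, stale = (w, b), val_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stopping: validation loss has stopped improving
    return best_params, best_val

rng = random.Random(0)

def make_domain_data(n):
    """Synthetic 'domain' data (hypothetical): close to the pretrained model, but shifted."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2.2 * x + 0.5 + rng.gauss(0, 0.05)) for x in xs]

train_set = make_domain_data(80)
val_set = make_domain_data(20)
test_set = make_domain_data(20)   # held out, used only for the final evaluation

pretrained = (2.0, 0.0)           # stand-in for the base model's weights
tuned, best_val = fine_tune(pretrained, train_set, val_set)

base_test_loss = mse(pretrained, test_set)
tuned_test_loss = mse(tuned, test_set)
print(f"held-out loss before: {base_test_loss:.4f}, after: {tuned_test_loss:.4f}")
```

In a real project the model, optimiser, and metrics come from a training framework, but the control flow is usually the same: keep the best checkpoint by validation loss and report results on data the model never saw during training.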

Real-world examples

Not theory — what real teams actually shipped using this technique.

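  • OpenAI fine-tuned GPT-3 using RLHF (Reinforcement Learning from Human Feedback) to produce InstructGPT: human raters ranked model outputs, a reward model was trained on those rankings, and the language model was then optimised to produce responses the reward model scores highly. ChatGPT was trained with the same approach, starting from a GPT-3.5 series model, which is where its helpful, cautious tone comes from.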
  • Salesforce fine-tunes CodeT5 on their proprietary Apex code repositories so their AI coding assistant understands Salesforce-specific APIs and patterns that are not in public training data.
  • A radiology company fine-tunes a vision-language model on 50,000 annotated chest X-rays to produce radiology reports in the exact format used by their hospital system — consistent structure, correct terminology, appropriate hedging language.

Common pitfalls

  • Catastrophic forgetting: if the fine-tuning learning rate is too high, training runs for too long, or the dataset is too narrow, the model overwrites general knowledge and loses capabilities it had before. Use low learning rates and regularisation.
  • Overfitting to fine-tuning data — with small datasets the model memorises examples rather than learning general patterns. Use validation sets and early stopping.
  • Fine-tuning before prompting — most tasks can be solved with good prompt engineering. Fine-tuning is expensive and time-consuming. Always exhaust prompting options first.
  • Data quality is critical: 500 excellent fine-tuning examples will often outperform 5,000 poor ones. Noisy, inconsistent, or mislabelled fine-tuning data teaches wrong patterns that are hard to reverse.
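
Two of the pitfalls above, poor data quality and overfitting, are usually tackled before training starts. The sketch below is one possible way to do that, assuming examples are stored as dictionaries with "prompt" and "completion" keys (a hypothetical schema, as are the function names and sample data): it drops empty and duplicate examples, then sets aside a validation split for early stopping.

```python
import random

def clean_examples(examples):
    """Drop examples with an empty field and case-insensitive duplicates (first one wins)."""
    seen = set()
    cleaned = []
    for ex in examples:
        prompt = ex.get("prompt", "").strip()
        completion = ex.get("completion", "").strip()
        if not prompt or not completion:
            continue  # incomplete example: nothing useful to learn from
        key = (prompt.lower(), completion.lower())
        if key in seen:
            continue  # duplicates over-weight one pattern and encourage memorisation
        seen.add(key)
        cleaned.append({"prompt": prompt, "completion": completion})
    return cleaned

def train_val_split(examples, val_fraction=0.2, seed=0):
    """Shuffle deterministically and hold out at least one example for validation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical customer-support examples, including one duplicate and one empty answer.
raw = [
    {"prompt": "Reset my password", "completion": "Go to Settings > Security and choose Reset."},
    {"prompt": "reset my password ", "completion": "Go to Settings > Security and choose Reset."},
    {"prompt": "Where is my invoice?", "completion": ""},
    {"prompt": "Cancel my plan", "completion": "Open Billing and select Cancel plan."},
    {"prompt": "Change my email", "completion": "Open Profile and edit the Email field."},
    {"prompt": "Export my data", "completion": "Open Settings > Privacy and choose Export."},
]

cleaned = clean_examples(raw)
train_examples, val_examples = train_val_split(cleaned)
print(len(raw), "raw ->", len(cleaned), "cleaned ->",
      len(train_examples), "train /", len(val_examples), "validation")
```

Exact-match deduplication is only a first pass. Mislabelled or inconsistent answers still need human review, which is where most of the quality gain tends to come from.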

Frequently asked questions

QUESTION 1 What is fine-tuning in simple terms?

ANSWER 1 Taking a brilliant generalist model and training it specifically for your job: showing it thousands of examples of your task so it specialises while keeping most of its general knowledge.

QUESTION 2 When should you fine-tune instead of prompt engineering?

ANSWER 2 When consistent style is critical and prompting is inconsistent, when specialised vocabulary is needed, or when you want to reduce prompt length and cost in production. Prompt first, fine-tune when prompting hits its limits.

QUESTION 3 What is LoRA and why does it matter?

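ANSWER 3 LoRA (Low-Rank Adaptation) freezes the pretrained weights and trains small low-rank matrices added alongside them, so only a tiny fraction of parameters is updated. That cuts the number of trainable parameters by orders of magnitude and substantially reduces the memory needed for gradients and optimiser state, making large-model fine-tuning practical on modest hardware.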
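
To make the saving concrete, here is a small worked calculation. LoRA leaves a weight matrix W of shape d_out x d_in frozen and learns an update B·A, where B is d_out x r and A is r x d_in for a small rank r. The layer size and rank below are illustrative assumptions, not figures for any particular model.

```python
def full_params(d_out, d_in):
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_out * d_in

def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)

# Hypothetical layer size and rank, chosen only to illustrate the arithmetic.
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)                   # 16,777,216
lora = lora_trainable_params(d_out, d_in, rank)   # 65,536
ratio = full / lora                               # 256.0

print(f"full: {full:,}  LoRA: {lora:,}  reduction: {ratio:.0f}x")
```

For this one layer the trainable parameter count drops by a factor of 256. The overall saving for a model depends on which layers receive adapters and on the rank chosen.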

QUESTION 4 What is the difference between fine-tuning and RAG?

ANSWER 4 Fine-tuning bakes knowledge into weights permanently. RAG retrieves knowledge at inference time. Fine-tuning is better for style and format. RAG is better for frequently changing facts.
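
The "retrieves knowledge at inference time" part can be illustrated with a toy sketch. Word overlap stands in for the embedding search a real RAG system would use, and the function names and knowledge-base contents are assumptions made for the example. The point is that updating a document changes what the model sees on the next request, with no retraining.

```python
def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query (toy stand-in for vector search)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, documents):
    """Put the retrieved text in front of the question before it is sent to a model."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge base.
knowledge_base = [
    "The refund window is 30 days from delivery.",
    "Support is open Monday to Friday.",
]
question = "How long is the refund window?"
prompt_before = build_prompt(question, knowledge_base)

# The policy changes: edit the document, and the next prompt reflects it immediately.
knowledge_base[0] = "The refund window is 14 days from delivery."
prompt_after = build_prompt(question, knowledge_base)

print(prompt_after)
```

A fine-tuned model that had learned the 30-day policy from its training data would keep repeating it until it was retrained, which is why frequently changing facts are usually a better fit for retrieval.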

