A hyperparameter is a configuration setting fixed before training begins — not learned from data. Learning rate, batch size, number of layers, number of trees: all hyperparameters. The model learns its weights automatically from data; you choose the hyperparameters. The same algorithm with poor hyperparameters can perform terribly; with good ones, excellently.

Category: Machine Learning · Difficulty: Beginner · Last updated: 15 May 2026 · 4 min read


Hyperparameter — The Settings You Choose Before Training That Determine Everything

What is a hyperparameter?

Think of training a neural network like baking bread. The recipe (algorithm) is fixed. The ingredients (data) are fixed. But you still have decisions to make: how hot should the oven be? How long should you knead? How long should it rise? These settings — the oven temperature, the kneading time, the rise time — are the hyperparameters. They are not the bread itself. They control the process that produces the bread.

Set the oven too hot and the bread burns. Too cool and it does not rise properly. The right temperature produces a perfect loaf from the same dough and recipe. Hyperparameters work the same way: the same model and data produce dramatically different results depending on how you configure the training process.

COMMON HYPERPARAMETERS AND THEIR EFFECTS

Learning rate — how large each weight update is. The most impactful single hyperparameter. Too high: training diverges, loss explodes. Too low: training takes forever or gets stuck. Typical range: 0.1 to 0.00001. Default starting point for Adam: 0.001.
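To make the "too high / too low" behaviour concrete, here is a minimal sketch that runs plain gradient descent on the toy function f(w) = w². The function, the starting point, and the step count are illustrative assumptions, not part of any real training setup.

```python
def gradient_descent(learning_rate, steps=50, start=5.0):
    """Minimise f(w) = w**2 with plain gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w              # derivative of w**2
        w -= learning_rate * gradient   # one weight update
    return w


# Same function, same starting point, three different learning rates.
for lr in (1.1, 0.1, 0.0001):
    print(f"learning_rate={lr}: final w = {gradient_descent(lr):.4g}")
```

With the largest rate each step overshoots the minimum and w grows without bound; with the smallest, w has barely moved from its starting value after 50 steps.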

Batch size — how many training examples are processed before each weight update. Larger batches: more stable gradient estimates, faster wall-clock time, but can converge to sharper minima. Smaller batches: noisier updates, often better generalisation. Typical range: 16 to 512.
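As a rough illustration, batch size directly sets how many weight updates happen in one pass over the data. The helper names below (`iter_batches`, `updates_per_epoch`) and the dataset size of 50,000 are hypothetical, chosen only for this sketch.

```python
import math


def iter_batches(examples, batch_size):
    """Yield consecutive slices of `examples`, each at most `batch_size` long."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


def updates_per_epoch(num_examples, batch_size):
    """One weight update per batch, so this is the number of batches."""
    return math.ceil(num_examples / batch_size)


for batch_size in (16, 128, 512):
    print(batch_size, updates_per_epoch(50_000, batch_size))
```

Smaller batches mean many more (and noisier) updates per epoch; larger batches mean fewer, smoother ones.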

Number of hidden layers and neurons — controls model capacity. Too few: underfitting, cannot capture complex patterns. Too many: overfitting if data is insufficient.
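One way to see how layer and neuron counts control capacity is to count the learnable parameters they imply. This sketch assumes a plain fully connected network with one bias per neuron; the layer sizes are arbitrary examples.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    `layer_sizes` lists the width of every layer, input first, output last.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix plus one bias per neuron
    return total


small = count_parameters([784, 16, 10])
large = count_parameters([784, 512, 512, 10])
print(small, large)
```

The architecture choices are hyperparameters; the resulting count is how many parameters the training process then has to learn.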

Dropout rate — fraction of neurons randomly disabled during each training step. Acts as regularisation — prevents co-adaptation and reduces overfitting. Typical range: 0.1 to 0.5.
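A minimal sketch of what the dropout rate controls, written in plain Python. It assumes the common "inverted dropout" variant, where the surviving activations are rescaled so their expected value is unchanged; the text above does not specify a variant, and real frameworks implement this on tensors and switch it off at inference time.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
print(dropout([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng))
```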

Number of epochs — how many full passes through the training data. Too few: underfitting. Too many: overfitting. Use early stopping to automate this.
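Early stopping can be sketched as a simple rule over recorded validation losses: stop once the loss has not improved for a set number of epochs. The `patience` parameter and the loss values below are assumptions for illustration (and `patience` is itself a hyperparameter).

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) as 1-based epoch numbers.

    Training stops once `patience` epochs pass without a new best loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch        # give up: no recent improvement
    return best_epoch, len(val_losses)      # ran out of epochs without stopping


# Made-up validation losses: improve, then start to overfit.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.9]
print(early_stopping_epoch(losses))
```

In practice you would keep the model weights saved at the best epoch rather than the ones from the epoch where training stopped.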

Real-world examples

Not theory — what real teams actually shipped using this technique.

  • OpenAI’s GPT models required extensive hyperparameter search — learning rate schedules, batch sizes, and warmup steps were carefully tuned before long training runs. A wrong learning rate could cause weeks of expensive GPU compute to produce a worse model than a shorter run with better settings.
  • In Kaggle competitions, the winning margin between top submissions often comes from hyperparameter tuning — same XGBoost algorithm, same features, different n_estimators, max_depth, and learning_rate producing meaningfully better validation scores.
  • Google’s AutoML uses Bayesian optimisation to search hyperparameter space automatically — outperforming manual tuning by expert practitioners on many standard benchmarks by finding configurations humans would not intuitively try.

Common pitfalls

  • Tuning on the test set: choosing hyperparameters by their test-set score leaks test information into model selection, so the final test score is optimistically biased. Always tune on a validation set and touch the test set only once at the end.
  • Too many hyperparameters: with many hyperparameters and limited compute, the search space is too large for exhaustive search. Prioritise the learning rate first, since it usually has the largest impact.
  • Overfitting to the validation set: aggressively tuning against one fixed validation set yields settings that fit its quirks and do not generalise. Use cross-validation to reduce this risk.
  • Ignoring defaults: for many tasks and algorithms, default hyperparameters work surprisingly well. Try the defaults before investing in expensive tuning.
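The first pitfall is easiest to avoid by splitting the data once, up front. A minimal sketch, assuming a simple shuffled split; the 70/15/15 proportions and the function name `split_indices` are illustrative choices, not a recommendation from the text above.

```python
import random


def split_indices(num_examples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle example indices and split them into train, validation, and test."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    n_test = round(num_examples * test_fraction)
    n_val = round(num_examples * val_fraction)
    test = indices[:n_test]                    # used once, at the very end
    val = indices[n_test:n_test + n_val]       # used to compare hyperparameters
    train = indices[n_test + n_val:]           # used to learn the weights
    return train, val, test


train, val, test = split_indices(100)
print(len(train), len(val), len(test))
```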

Frequently asked questions

QUESTION 1 What is a hyperparameter in simple terms?

ANSWER 1 A dial you set before training — controls how the model learns, not what it learns. Learning rate, batch size, number of layers: hyperparameters set by you. Weights and biases: parameters learned from data.

QUESTION 2 What is the difference between a parameter and a hyperparameter?

ANSWER 2 Parameters: learned from data during training (weights, biases). Hyperparameters: set before training by the practitioner — they control how learning happens, not what is learned.
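The distinction shows up directly in code. In this sketch of fitting a straight line by gradient descent, `learning_rate` and `epochs` are hyperparameters passed in by the caller, while `w` and `b` are parameters the loop learns. The data and default values are made up for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=500):
    """Fit y ~ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0                       # parameters: learned from the data
    n = len(xs)
    for _ in range(epochs):               # epochs: hyperparameter, chosen by you
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w       # learning_rate: hyperparameter
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]                 # generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```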

QUESTION 3 What are the most important hyperparameters?

ANSWER 3 Learning rate (most impactful), batch size, number of layers and neurons, dropout rate, and number of epochs. For tree models: number of trees, max depth, learning rate, subsample rate.

QUESTION 4 How do you find good hyperparameters?

ANSWER 4 Grid search, random search, or Bayesian optimisation. Tools like Optuna and Ray Tune automate the process. Always start with defaults before investing in tuning.
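Grid search and random search are simple enough to sketch in plain Python. Here `validation_score` is a hypothetical stand-in for "train a model with these settings and return its validation score"; its formula and the search ranges are invented for the example. Tools such as Optuna and Ray Tune offer more sophisticated strategies that are not shown here.

```python
import itertools
import math
import random


def validation_score(learning_rate, max_depth):
    """Toy objective (assumption): best at learning_rate=0.01, max_depth=6."""
    return -(math.log10(learning_rate) + 2.0) ** 2 - 0.1 * (max_depth - 6) ** 2


def grid_search(learning_rates, max_depths):
    """Try every combination and return the best (learning_rate, max_depth)."""
    candidates = itertools.product(learning_rates, max_depths)
    return max(candidates, key=lambda c: validation_score(*c))


def random_search(num_trials, seed=0):
    """Sample settings at random; learning rate is drawn on a log scale."""
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        candidate = (10 ** rng.uniform(-5, -1), rng.randint(2, 12))
        if best is None or validation_score(*candidate) > validation_score(*best):
            best = candidate
    return best


print(grid_search([0.1, 0.01, 0.001], [3, 6, 9]))
print(random_search(num_trials=200))
```

Sampling the learning rate on a log scale is a design choice in this sketch: it spreads trials evenly across orders of magnitude, which matches the wide typical range of learning rates.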

