⚡ KNN (K-Nearest Neighbours) classifies a new data point by finding the K most similar points in the training set and taking a majority vote of their labels. No training phase — just memorise all examples and compare new ones to what you have seen. One of the simplest ML algorithms and the conceptual foundation of modern vector search and recommendation systems.
Category: Machine Learning · Difficulty: Beginner · Last updated: 15 May 2026 · 4 min read
KNN — How the Simplest ML Algorithm Makes Predictions by Asking Its Neighbours
What is KNN?
Tell me who your friends are and I will tell you who you are. KNN is this principle made into an algorithm.
To classify a new data point, KNN asks: who are this point’s nearest neighbours in the training data? Find the K most similar training examples (the “nearest” in terms of distance in feature space). Look at their labels. Take a majority vote. That majority label is the prediction.
There is no training phase. No model is built. No weights are learned. KNN simply stores all training examples and uses them directly at prediction time. It is called a lazy learner because it does all its work at prediction time rather than at training time.
How KNN works
- Store all training examples — each with its features and label.
- A new unlabelled point arrives.
- Calculate the distance from this new point to every training example (typically Euclidean distance).
- Find the K training examples with the smallest distances — the K nearest neighbours.
- For classification: take the majority vote of the K neighbours’ labels. That is the prediction.
- For regression: take the mean of the K neighbours’ values. That is the prediction.
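The steps above can be sketched in plain Python. This is a minimal brute-force illustration rather than a production implementation: the helper names (`k_nearest`, `knn_classify`, `knn_regress`) and the toy data are made up for this example, and Euclidean distance is assumed.

```python
import math
from collections import Counter


def k_nearest(train_X, query, k):
    """Return indices of the k training points closest to `query` (Euclidean)."""
    distances = [(math.dist(x, query), i) for i, x in enumerate(train_X)]
    distances.sort()  # smallest distance first; equal distances fall back to index order
    return [i for _, i in distances[:k]]


def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote over the labels of the k nearest neighbours."""
    neighbours = k_nearest(train_X, query, k)
    votes = Counter(train_y[i] for i in neighbours)
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Regression: mean of the target values of the k nearest neighbours."""
    neighbours = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in neighbours) / len(neighbours)


# Toy data (hypothetical): two features per point.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
labels = ["blue", "blue", "blue", "red", "red", "red"]
prices = [10.0, 12.0, 11.0, 30.0, 34.0, 32.0]

print(knn_classify(X, labels, (2.0, 2.0), k=3))  # blue
print(knn_regress(X, prices, (6.5, 7.0), k=3))   # 32.0
```

Note that "training" here is nothing more than keeping `X` and the labels around; all the work happens inside the prediction functions.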
Real-world examples
Not theory — what real teams actually shipped using this technique.
- Early recommendation systems of the kind Netflix popularised follow the KNN idea: find the users most similar to you (your nearest neighbours in the space of viewing history), see what they watched and liked, and recommend those titles. Neighbourhood-based collaborative filtering is still conceptually KNN at scale.
- Medical diagnosis support: given a patient’s symptoms and test results, find the K most similar past patients in the records, look at their diagnoses — a simple KNN provides a baseline for diagnostic suggestion that is fully interpretable.
- FAISS (Facebook AI Similarity Search) is a library, and HNSW (Hierarchical Navigable Small World graphs) is a graph-based index, for approximate nearest neighbour search. Both are widely used in modern vector databases, where they make semantic search fast enough for production by finding approximate nearest neighbours without comparing the query against every stored vector.
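To make the collaborative-filtering example concrete, here is a toy sketch of user-based nearest-neighbour recommendation. The ratings, user names, and the `cosine` and `recommend` helpers are invented for illustration and do not reflect any real system's code or data. Cosine similarity is one common choice of similarity measure, assumed here.

```python
import math

# Hypothetical ratings (1-5), for illustration only.
ratings = {
    "ana":   {"Alien": 5, "Heat": 4, "Up": 1},
    "ben":   {"Alien": 5, "Heat": 5, "Blade Runner": 5},
    "chloe": {"Up": 5, "Coco": 5, "Alien": 1},
    "dev":   {"Alien": 4, "Heat": 4, "Blade Runner": 4, "Coco": 2},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (missing items count as 0)."""
    dot = sum(a[item] * b[item] for item in a if item in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user, k=2):
    """Rank items the user has not rated, using the k most similar users."""
    others = [(cosine(ratings[user], ratings[u]), u) for u in ratings if u != user]
    neighbours = sorted(others, reverse=True)[:k]
    scores = {}
    for sim, u in neighbours:
        for item, rating in ratings[u].items():
            if item not in ratings[user]:
                # Weight each neighbour's rating by how similar they are to the user.
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("ana"))  # ['Blade Runner', 'Coco']
```

This compares the user against every other user, which is exactly the brute-force cost that approximate indexes are designed to avoid on large catalogues.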
Common pitfalls
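- Slow inference at scale: brute-force KNN computes the distance to every training point for every prediction, which is O(n) distance computations per query (O(n·d) with d features). With millions of training points this becomes impractical. Approximate nearest neighbour methods reduce the cost by trading a little accuracy for speed.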
- Curse of dimensionality: in high dimensions, distances between points tend to concentrate, so the nearest and the farthest neighbours end up almost equally far away. The concept of “nearest neighbour” loses much of its meaning when you have hundreds of features. Apply dimensionality reduction or feature selection before using KNN in high-dimensional settings.
- Feature scaling is critical: KNN is purely distance-based, so a feature on a 0-100,000 scale dominates a feature on a 0-1 scale. Normalise or standardise features before applying KNN, with the scaling statistics computed on the training data only.
- Memory scales with training data: all training examples must be stored and searched. Unlike a trained neural network, which compresses knowledge into a fixed set of weights, KNN’s memory footprint grows linearly with dataset size.
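The feature-scaling pitfall is easy to reproduce. The sketch below uses made-up data with one large-scale feature (income) and one 0-1 feature. `nearest` and `min_max_scale` are hypothetical helper names, and min-max scaling is just one reasonable option (standardisation is another).

```python
import math


def nearest(train_X, query):
    """Index of the single closest training point (Euclidean distance)."""
    return min(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))


def min_max_scale(rows, reference):
    """Rescale each column to [0, 1] using the min and max of the `reference` rows."""
    columns = list(zip(*reference))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


# Made-up data: (annual income, engagement score between 0 and 1).
train = [(50_000, 0.10), (51_000, 0.90), (90_000, 0.85)]
query = (50_400, 0.88)

# Unscaled: income dominates, so the score barely affects the distance.
print(nearest(train, query))  # 0

# Scaled with statistics from the training data only, then compared again.
scaled_train = min_max_scale(train, train)
scaled_query = min_max_scale([query], train)[0]
print(nearest(scaled_train, scaled_query))  # 1
```

Before scaling, a 400-unit income gap outweighs a large difference in score. After scaling, the query's nearly identical score makes the second training point the nearest neighbour.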
Frequently asked questions
QUESTION 1 What is KNN in simple terms?
ANSWER 1 Find the K most similar examples you have seen and take a majority vote of their labels. There is no training: KNN just memorises all examples and compares new points against them at prediction time.
QUESTION 2 How do you choose K?
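ANSWER 2 Test multiple values on a validation set (or with cross-validation) and keep the one that scores best. Small K is sensitive to noise and tends to overfit; large K over-smooths and drifts towards the majority class. Values between 3 and 15 are a common starting range. Use an odd K for binary classification to avoid tied votes.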
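One hedged way to run that comparison is leave-one-out validation: predict each point from all the others and count how often the prediction is right. The data and the `predict` and `loo_accuracy` helpers below are invented for illustration; on real data you would typically hold out a separate validation set or use k-fold cross-validation.

```python
import math
from collections import Counter


def predict(train, query, k):
    """Majority vote among the k nearest (point, label) pairs in `train`."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += predict(rest, point, k) == label
    return hits / len(data)


# Made-up 2-D data with one noisy point: (1.2, 1.1) sits in the "a" cluster but is tagged "b".
data = [
    ((1.0, 1.0), "a"), ((1.5, 1.2), "a"), ((0.8, 1.6), "a"), ((1.3, 0.7), "a"),
    ((1.2, 1.1), "b"),
    ((4.0, 4.2), "b"), ((4.5, 3.8), "b"), ((3.9, 4.6), "b"), ((4.4, 4.4), "b"),
]

for k in (1, 3, 5, 7):
    print(k, round(loo_accuracy(data, k), 2))
```

On this toy set, K=1 is misled by the single noisy point and K=7 is swamped by the larger class, while K=3 and K=5 score highest. That is the small-K versus large-K trade-off in miniature.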
QUESTION 3 What are the limitations of KNN?
ANSWER 3 Slow at prediction time (scales with training set size), degrades in high dimensions (curse of dimensionality), and requires feature scaling.
QUESTION 4 Is KNN still used in production?
ANSWER 4 Yes. Approximate nearest neighbour search (for example the FAISS library or HNSW indexes) is essentially fast KNN, and it powers vector databases, semantic search, and recommendation systems.