SVM (Support Vector Machine)

Q: What is an SVM in simple terms?

An SVM finds the widest possible street between two groups of data points. The street is the margin — the gap between classes. The middle of the street is the decision boundary. Points on the edges of the street — closest to the boundary — are the support vectors. The SVM maximises the width of this margin, producing the most confident possible separator between classes.

Q: What is the kernel trick?

The kernel trick allows SVMs to find non-linear decision boundaries without explicitly computing coordinates in high-dimensional space. Instead of transforming data to a higher dimension (expensive), the kernel function computes similarities between data points as if they were in that higher dimension. RBF (radial basis function) and polynomial kernels enable SVMs to separate classes that are not linearly separable in their original space.
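The claim that a kernel computes similarities "as if" the data were in a higher dimension can be checked by hand for a small case. The sketch below is illustrative only: it assumes 2-D inputs and a homogeneous degree-2 polynomial kernel, and the function names (`explicit_map`, `poly2_kernel`, `rbf_kernel`) are made up for this example rather than taken from any library.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def explicit_map(x):
    """Explicit degree-2 feature map for a 2-D point: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: (x . y)^2, computed in the original 2-D space."""
    return dot(x, y) ** 2

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - y||^2); gamma=0.5 is an arbitrary illustrative value."""
    return math.exp(-gamma * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

x, y = (1.0, 2.0), (3.0, -1.0)
via_map = dot(explicit_map(x), explicit_map(y))   # dot product in the 3-D mapped space
via_kernel = poly2_kernel(x, y)                   # same quantity without ever mapping
print(via_map, via_kernel)
```

Both routes give the same number (up to floating-point rounding), but the kernel route never builds the higher-dimensional vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so the kernel route is the only practical one.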

Q: What are support vectors?

Support vectors are the training examples closest to the decision boundary: the points on the edge of the margin and, in a soft-margin SVM, any points that fall inside the margin or on the wrong side of the boundary. They are the only examples that matter for defining the boundary. If you remove any non-support vector from the training set and retrain, the SVM produces the same boundary. This makes SVMs elegant: the decision function depends on a (typically small) subset of training examples, not all of them.
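One way to see why only the support vectors matter is the standard dual form of the decision function. The notation here is an assumption, not something the text above defines: x_i are the training examples, y_i ∈ {−1, +1} their labels, α_i the learned weights, K the kernel function, and b the bias term.

```latex
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b \right),
\qquad \alpha_i \ge 0, \quad \alpha_i = 0 \ \text{for every non-support vector}
```

Every training example with α_i = 0 drops out of the sum, so prediction only needs the support vectors, their labels and weights, and b.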

Q: When should you use an SVM today?

SVMs remain competitive for high-dimensional, small-sample problems (text classification with TF-IDF features, bioinformatics), where there is too little data to train a neural network well. With fewer than a few thousand training examples, they often outperform neural networks. A linear SVM is also useful as a fast, interpretable baseline before investing in more complex models.

⚡ A Support Vector Machine (SVM) finds the optimal decision boundary between classes — specifically the hyperplane that maximises the gap (margin) between class regions. It dominated classification before deep learning and remains competitive for high-dimensional small-dataset problems. The kernel trick extends it to non-linear boundaries without expensive transformations.

Category: Machine Learning · Difficulty: Intermediate · Last updated: 15 May 2026 · 4 min read

SVM — What It Is, How Maximum Margin Classification Works & Where It Still Beats Neural Networks

What is SVM?

Imagine two groups of data points plotted on a graph — blue dots and red dots. Many straight lines could separate them. Which one should you choose? SVMs answer this with mathematical elegance: choose the line that maximises the margin — the gap between the line and the nearest points of each class.

A wider margin means the boundary is more confident. Points far from the boundary are easy to classify. Points near a narrow boundary are more likely to be misclassified by noise or small changes. Maximising the margin produces the most robust possible separator — the boundary least likely to be wrong on new data.
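The "which line should you choose?" question can be made concrete by measuring the margin of two candidate lines. This is a minimal sketch on a made-up four-point dataset; the two lines and the helper name `geometric_margin` are illustrative assumptions.

```python
import math

# Toy 2-D data (made up for illustration): each entry is (point, label), label is +1 or -1.
points = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

def geometric_margin(w, b, data):
    """Smallest signed distance from any point to the line w.x + b = 0.
    The result is positive only if every point lies on its correct side."""
    norm = math.hypot(w[0], w[1])
    return min(label * (w[0] * x[0] + w[1] * x[1] + b) / norm for x, label in data)

# Two candidate lines that both separate the classes.
margin_a = geometric_margin((1.0, 1.0), -2.0, points)   # the line x1 + x2 = 2
margin_b = geometric_margin((1.0, 0.0), -1.5, points)   # the line x1 = 1.5
print(margin_a, margin_b)
```

Both lines classify every point correctly, but the first keeps the nearest point about 1.41 units away while the second leaves only 0.5. An SVM searches over all separating lines for the one where this smallest distance is largest.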

Introduced in their modern soft-margin form by Cortes and Vapnik in 1995, SVMs were the dominant classification method for roughly a decade, used in text classification, image recognition, bioinformatics, and financial prediction. Deep learning eventually outperformed SVMs on large datasets, but SVMs remain the right tool when data is scarce, dimensions are high, or interpretability matters.

How SVM works

Represent each training example as a vector in feature space.
Find the hyperplane (line in 2D, plane in 3D, hyperplane in higher dimensions) that separates the two classes.
Specifically, find the hyperplane that maximises the margin — the distance between the hyperplane and the nearest data point of each class.
The points on the margin edge are support vectors — only these define the hyperplane.
For non-linearly separable data: apply a kernel function that implicitly maps data to a higher-dimensional space where linear separation is possible.
Soft margin SVM allows some misclassifications (controlled by regularisation parameter C) — trading perfect training accuracy for better generalisation.
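The steps above can be sketched for the linear, soft-margin case in a few lines of plain Python. This is a teaching sketch under stated assumptions, not how production solvers work: it minimises the hinge-loss objective with batch subgradient descent, and the toy data, learning rate, epoch count, and function names are all made up for illustration. In practice you would use a library implementation such as the scikit-learn SVM module listed in the sources.

```python
# Toy 2-D training set (made up for illustration); labels are +1 / -1.
data = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((3.0, 2.0), 1),
        ((0.0, 0.0), -1), ((-1.0, 0.0), -1), ((0.0, -1.0), -1)]

def score(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return w[0] * x[0] + w[1] * x[1] + b

def train_linear_svm(data, C=1.0, lr=0.01, epochs=1000):
    """Minimise 0.5*||w||^2 + C * sum(max(0, 1 - y*(w.x + b)))
    by batch subgradient descent with a fixed step size."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w = [w[0], w[1]]   # gradient of the 0.5*||w||^2 (margin-widening) term
        grad_b = 0.0
        for x, y in data:
            if y * score(w, b, x) < 1:   # point is inside the margin or misclassified
                grad_w[0] -= C * y * x[0]
                grad_w[1] -= C * y * x[1]
                grad_b -= C * y
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b

w, b = train_linear_svm(data)
predictions = [1 if score(w, b, x) >= 0 else -1 for x, _ in data]
accuracy = sum(p == y for p, (_, y) in zip(predictions, data)) / len(data)
print("w =", w, "b =", b, "training accuracy =", accuracy)
```

Only points with `y * score < 1` contribute to the update, which mirrors the idea that points at or inside the margin are the ones that shape the boundary. A smaller C tolerates more margin violations in exchange for a wider margin; a larger C penalises violations more heavily.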

Real-world examples

Not theory — what real teams actually shipped using this technique.

Text classification with TF-IDF — SVMs on TF-IDF features were the state of the art for spam filtering, news categorisation, and sentiment analysis for much of the 2000s. The high dimensionality of text (vocabulary size) is exactly where SVMs excel.
Bioinformatics — gene expression classification (which of these cancer types does this gene expression profile indicate?) typically involves thousands of features and hundreds of samples — the high-dimension, small-sample regime where SVMs often outperform neural networks.
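Object and face detection — before CNNs became practical, SVMs were a common final classifier in detection pipelines, for example linear SVMs on HOG (histogram of oriented gradients) features for pedestrian detection. Note that the well-known real-time Haar-feature face detector (Viola-Jones) used boosted cascades rather than SVMs.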

Common pitfalls

Kernel choice — performance is sensitive to which kernel and kernel parameters you choose. Poor choices produce poor boundaries. Grid search over kernel parameters is standard but computationally expensive.
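Scaling — SVMs are sensitive to feature scale, so normalise or standardise features before training. Features on very different scales distort the distance and dot-product calculations inside the kernel, letting large-valued features dominate and degrading margin quality significantly.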
Slow on large datasets — SVM training scales roughly O(n²) to O(n³) with training set size. For millions of examples, training is impractically slow. Linear SVMs (no kernel) are faster and often sufficient for text.
Probability estimates — SVMs do not natively produce probability outputs, only a signed decision score and the class label derived from it. Platt scaling fits a sigmoid to those scores as a calibration step to produce probabilities, but it is a post-hoc approximation.
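The scaling pitfall is easy to reproduce with plain distances, which are what RBF-style kernels are built on. The sketch below uses three hypothetical rows (age in years, income in dollars) and a hand-written `standardise` helper; both are assumptions for illustration.

```python
import math
import statistics

# Hypothetical rows: (age in years, income in dollars).
rows = [(25.0, 40000.0), (60.0, 41000.0), (26.0, 90000.0)]

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def standardise(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

# Ratio of distance(row 0, row 1) to distance(row 0, row 2), before and after scaling.
raw_ratio = euclidean(rows[0], rows[1]) / euclidean(rows[0], rows[2])
scaled = standardise(rows)
scaled_ratio = euclidean(scaled[0], scaled[1]) / euclidean(scaled[0], scaled[2])
print(raw_ratio, scaled_ratio)
```

On the raw values, income swamps age: row 0 looks about fifty times closer to row 1 than to row 2, even though rows 0 and 1 differ by 35 years of age. After standardising, the two distances are nearly equal, so both features influence the boundary. When evaluating a model, compute the means and standard deviations on the training set only and reuse them for new data.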

Frequently asked questions

QUESTION 1 What is an SVM in simple terms?

ANSWER 1 An algorithm that finds the widest possible gap (margin) between two groups — the decision boundary that is furthest from the nearest points of each class.

QUESTION 2 What is the kernel trick?

ANSWER 2 Computing similarities as if data were in a higher dimension — enabling non-linear boundaries without expensive explicit transformation.

QUESTION 3 What are support vectors?

ANSWER 3 The training examples closest to the decision boundary — the only ones that define it. Remove any other point and the boundary stays the same.

QUESTION 4 When should you use an SVM today?

ANSWER 4 High-dimensional, small-sample problems — text classification, bioinformatics — where SVMs often outperform neural networks and train far faster.

Sources & further reading

Cortes & Vapnik (1995). Support-Vector Networks. Machine Learning — the original SVM paper.
Schölkopf & Smola (2002). Learning with Kernels. MIT Press — comprehensive kernel methods reference.
Hastie, Tibshirani & Friedman (2009). The Elements of Statistical Learning. Chapter 12: Support Vector Machines. Free at web.stanford.edu/~hastie/ElemStatLearn/
Scikit-learn SVM documentation: scikit-learn.org/stable/modules/svm.html — practical guide with examples.
