Dimensionality reduction is the process of reducing the number of features in a dataset while preserving as much useful information as possible. It compresses complex, high-dimensional data into a simpler form that is faster to process, easier to visualise, and less likely to make models overfit. PCA, t-SNE, and UMAP are the most widely used techniques.

Category: Machine Learning · Difficulty: Intermediate · Last updated: 15 May 2026 · 5 min read


Dimensionality Reduction — How AI Compresses Complex Data Without Losing What Matters

What is Dimensionality Reduction?

Imagine you are trying to understand what makes a wine good. You measure 200 chemical properties of 10,000 wines. But most of those properties are correlated — acidity and pH move together, alcohol content and sugar are related. You do not actually have 200 independent pieces of information. You might have 10 underlying factors that explain most of the variation.

Dimensionality reduction finds those underlying factors. It takes your 200 measurements and compresses them into 10 (or 2, or 50) new variables that capture most of the meaningful variation, discarding what is redundant or just noise. The result is data that trains faster, often generalises better, and, when reduced to two or three dimensions, can be plotted and inspected by a human.

How Does Dimensionality Reduction Work?

PCA (Principal Component Analysis — most common):

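  1. Centres the data by subtracting the mean of each feature (and often also scales each feature to unit variance when features are measured in different units).
  2. Finds the orthogonal directions (principal components) along which the data varies the most. These are the eigenvectors of the features' covariance matrix.
  3. Ranks the components by the variance they capture: the first captures the most, the second the next most, and so on.
  4. Projects each data point onto the top K components, reducing from N dimensions to K.
  5. You choose K by looking at how much cumulative variance the top K components explain. A common rule of thumb is to keep about 95% of the variance; on highly correlated data that can take only a small fraction of the original features.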
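The steps above can be sketched in plain Python. This is a minimal illustration assuming small, in-memory data given as a list of rows; `pca` and `_orthonormalise` are hypothetical names, not a library API. It finds components by power iteration with deflation, one of several ways to get the top eigenvectors (libraries such as scikit-learn's `PCA` use SVD-based solvers instead), and it skips the optional scaling to unit variance.

```python
import math
import random


def _orthonormalise(v, basis):
    """Remove v's projection onto each unit vector in `basis`, then rescale v to unit length."""
    for b in basis:
        dot = sum(x * y for x, y in zip(v, b))
        v = [x - dot * y for x, y in zip(v, b)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def pca(data, k, iters=1000, seed=0):
    """Minimal PCA sketch using power iteration with deflation.

    data: list of samples, each a list of N numeric features (assumes 1 <= k <= N).
    Returns (projected, components, explained_ratio).
    """
    n, d = len(data), len(data[0])

    # Centre each feature by subtracting its mean.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]

    # Sample covariance matrix (N x N); its eigenvectors are the principal components.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    total_variance = sum(cov[i][i] for i in range(d))

    # Find the highest-variance direction, record it, deflate, repeat (largest first).
    rng = random.Random(seed)
    residual = [row[:] for row in cov]
    components, variances = [], []
    for _ in range(k):
        v = _orthonormalise([rng.gauss(0.0, 1.0) for _ in range(d)], components)
        for _ in range(iters):
            w = [sum(residual[i][j] * v[j] for j in range(d)) for i in range(d)]
            if math.sqrt(sum(x * x for x in w)) < 1e-12:
                break  # no variance left outside the directions already found
            v = _orthonormalise(w, components)
        # Variance of the data along v (the eigenvalue), computed as v^T C v.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        variances.append(lam)
        # Deflation: remove this direction's variance from the residual matrix.
        for i in range(d):
            for j in range(d):
                residual[i][j] -= lam * v[i] * v[j]

    # Project each centred sample onto the top-k components.
    projected = [[sum(x * y for x, y in zip(r, c)) for c in components]
                 for r in centred]
    explained_ratio = [lam / total_variance if total_variance > 0 else 0.0
                       for lam in variances]
    return projected, components, explained_ratio


if __name__ == "__main__":
    # Illustrative data: the second feature is roughly twice the first,
    # the third is independent noise.
    rng = random.Random(42)
    rows = []
    for _ in range(200):
        a = rng.gauss(0, 1)
        rows.append([a, 2 * a + rng.gauss(0, 0.1), rng.gauss(0, 1)])
    _, comps, ratios = pca(rows, k=2)
    print("explained variance ratios:", [round(r, 3) for r in ratios])
```

On the illustrative data, most of the variance should land on the first component because two of the three features are nearly redundant, much like the correlated wine measurements.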

t-SNE and UMAP (for visualisation):
These non-linear methods focus on preserving the local neighbourhood structure of high-dimensional data, which makes clusters visible when the data is plotted in 2D. They are used mainly for exploring embeddings and discovering structure rather than for preprocessing before model training.
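t-SNE and UMAP themselves are too involved for a short example, but the property they aim to keep can be measured directly. The sketch below is a hypothetical helper, `neighbourhood_preservation`, that compares each point's k nearest neighbours before and after reduction and reports the average overlap. It is a simplified stand-in for established quality scores such as scikit-learn's `sklearn.manifold.trustworthiness`, assuming Euclidean distance and small inputs.

```python
import math


def _knn(points, idx, k):
    """Indices of the k points nearest to points[idx] (Euclidean), ties broken by index."""
    dists = sorted(
        (math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx
    )
    return {j for _, j in dists[:k]}


def neighbourhood_preservation(high, low, k=5):
    """Average fraction of each point's k nearest neighbours in `high`
    that are also among its k nearest neighbours in `low` (1.0 = all kept).

    high, low: the same samples before and after reduction, in the same order.
    """
    if len(high) != len(low) or not 0 < k < len(high):
        raise ValueError("need matching inputs and 0 < k < number of points")
    overlaps = (len(_knn(high, i, k) & _knn(low, i, k)) / k for i in range(len(high)))
    return sum(overlaps) / len(high)


if __name__ == "__main__":
    # Two well-separated pairs in 2D, reduced to 1D in two different ways.
    high = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    faithful = [[0.0], [0.1], [5.0], [5.1]]   # pairs stay together
    scrambled = [[0.0], [5.0], [0.1], [5.1]]  # pairs split apart
    print(neighbourhood_preservation(high, faithful, k=1))   # 1.0
    print(neighbourhood_preservation(high, scrambled, k=1))  # 0.0
```

A score near 1.0 means local neighbourhoods survived the reduction; a score near 0 means the 2D picture rearranged which points sit next to each other. In practice you would pass the original high-dimensional vectors as `high` and the 2D coordinates from a t-SNE or UMAP run as `low`.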

Real-world examples

Not theory — what real teams actually shipped using this technique.

  • Genomics researchers use PCA to reduce gene expression data from 20,000 genes to a handful of principal components — visualising how cancer subtypes cluster separately and identifying which genes drive the separation.
  • Spotify describes tracks with audio features such as tempo, key, loudness, energy, and valence, and uses lower-dimensional embeddings that place similar-sounding tracks close together to help drive its recommendations.
  • Computer vision teams use UMAP to visualise the high-dimensional embeddings (512 dimensions, for example) learned by image classifiers, checking whether the model has learned meaningful separations between classes before deployment.

Common pitfalls

  • Information loss: reducing dimensions usually discards some information. The question is whether what is discarded is noise or signal, so validate that downstream model performance does not degrade.
  • Interpretability loss: PCA components are linear combinations of the original features. They capture structure but no longer correspond to individual original variables. You can inspect a component's weights (loadings) to see which features dominate it, but “Principal Component 3” is not a meaningful label on its own.
  • t-SNE is not for preprocessing: t-SNE plots reveal cluster structure visually, but the resulting coordinates are not meaningful features for downstream ML training. Distances between clusters in a t-SNE plot are not reliable, and standard implementations such as scikit-learn's TSNE cannot map new points into an existing embedding. Use PCA or autoencoders for preprocessing instead.
  • Choosing the wrong number of dimensions: too few and you lose important signal; too many and you keep the noise you were trying to remove. Use explained variance (PCA) or reconstruction error (autoencoders) to guide selection.
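Picking K from explained variance comes down to a cumulative sum. `choose_k` below is a hypothetical helper that assumes you already have per-component variance fractions sorted from largest to smallest, as PCA produces them; the 0.95 default reflects the common 95% rule of thumb, not a universal setting.

```python
from itertools import accumulate


def choose_k(explained_ratio, threshold=0.95):
    """Smallest K whose top-K components explain at least `threshold` of the variance.

    explained_ratio: per-component variance fractions, largest first.
    """
    for k, cumulative in enumerate(accumulate(explained_ratio), start=1):
        # Small tolerance so floating-point sums just under the threshold still count.
        if cumulative >= threshold - 1e-9:
            return k
    return len(explained_ratio)  # threshold never reached: keep every component


if __name__ == "__main__":
    ratios = [0.6, 0.25, 0.1, 0.05]  # made-up values for illustration
    print(choose_k(ratios))                 # 3
    print(choose_k(ratios, threshold=0.8))  # 2
```

With the made-up ratios, the first three components together reach 95%, so `choose_k` returns 3. For autoencoders the same idea applies, with reconstruction error on held-out data in place of explained variance.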

Frequently asked questions

What is dimensionality reduction in simple terms?

Summarising a lot of information into fewer numbers without losing the important parts, like describing a person with 10 key measurements instead of 500, most of which were redundant.

What is the curse of dimensionality?

As the number of features grows, the volume of the feature space grows exponentially, so a fixed amount of data becomes sparse and learning gets harder. Sampling each axis at just 10 points takes 100 points in 2D but 10^100 points in 100 dimensions.
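One concrete symptom of this sparsity is distance concentration: as dimensions grow, a point's nearest and farthest neighbours end up at almost the same distance, which weakens distance-based methods such as k-nearest neighbours. The hypothetical `relative_contrast` helper below measures this on uniform random points (an assumption; real data usually has more structure), reporting (farthest - nearest) / nearest.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """(farthest - nearest) / nearest distance from one random query point
    to n_points drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    # Contrast should shrink as dimensionality grows.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

Running the loop should show the contrast shrinking sharply as `dim` goes from 2 to 1000: in high dimensions "near" and "far" stop meaning much.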

What is the difference between PCA and t-SNE?

PCA is linear, preserves global variance, and is typically used for preprocessing. t-SNE is non-linear, preserves local clusters, and is used for 2D visualisation and exploration.

When should you use dimensionality reduction?

Before training on high-dimensional data, for visualisation of embeddings, for compression, and as a denoising step that removes dimensions capturing only noise.

