⚡ Feature engineering is the use of domain knowledge to transform raw data into features that help ML models learn better. A raw timestamp tells a model little on its own, but “day of week,” “is holiday,” and “hours since last purchase” can tell it a lot. It is often the single most impactful step in a machine learning project and frequently affects model performance more than the choice of algorithm.
Category: Machine Learning · Difficulty: Intermediate · Last updated: 15 May 2026 · 5 min read
Feature Engineering — What It Is and Why It Matters More Than Algorithm Choice
What is Feature Engineering?
Imagine you are predicting whether a customer will buy something this week. You have their purchase history: a list of dates and amounts. On its own, that raw list tells the model very little. Now engineer features from it: days since last purchase, total spend in the last 30 days, number of purchases in the last quarter, whether they bought on a weekend, and average transaction value. Suddenly the model has rich, meaningful signals to work with.
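As a minimal sketch, the customer example can be turned into code. The purchase data and the helper name `customer_features` are hypothetical, and "last quarter" is approximated as 90 days. Only purchases before the `as_of` date are used, so the features describe what was known at prediction time.

```python
from datetime import date
from statistics import mean

# Hypothetical purchase history for one customer: (purchase_date, amount)
purchases = [
    (date(2026, 1, 20), 42.50),
    (date(2026, 4, 3), 18.00),
    (date(2026, 4, 25), 63.20),
    (date(2026, 5, 9), 27.80),
]


def customer_features(history, as_of):
    """Turn raw (date, amount) pairs into model-ready features as of a given date.

    Only purchases strictly before `as_of` are used, so nothing from the
    prediction window leaks into the features.
    """
    past = [(d, amount) for d, amount in history if d < as_of]
    if not past:
        return None  # assumption: brand-new customers are handled elsewhere
    last_purchase = max(d for d, _ in past)
    return {
        "days_since_last_purchase": (as_of - last_purchase).days,
        "spend_last_30_days": sum(a for d, a in past if (as_of - d).days <= 30),
        "purchases_last_90_days": sum(1 for d, _ in past if (as_of - d).days <= 90),
        "bought_on_weekend": any(d.weekday() >= 5 for d, _ in past),  # Sat=5, Sun=6
        "avg_transaction_value": mean(a for _, a in past),
    }


print(customer_features(purchases, as_of=date(2026, 5, 15)))
```

Each row of a training set would be built this way, once per customer and prediction date.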
Feature engineering is the craft of creating those signals. It requires three things:
- domain knowledge, to understand what matters in the problem;
- creativity, to spot non-obvious transformations;
- practical ML knowledge, to know which input representations different algorithms can exploit.

In competitions and in production, well-designed features built with domain expertise often beat a more sophisticated algorithm applied to raw data.
COMMON TECHNIQUES
Date and time decomposition — extract day of week, month, hour, is_weekend, is_holiday, days_since_event. A timestamp stored as a single number gives most models little to work with; its components are often highly predictive.
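Here is a small sketch of date and time decomposition using only Python's `datetime`. The holiday calendar is a hypothetical placeholder. Real projects would load a country-specific calendar, and `last_event` stands for whatever event the problem cares about.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; in practice this comes from a country-specific source.
HOLIDAYS = {date(2026, 1, 1), date(2026, 12, 25)}


def decompose_timestamp(ts, last_event):
    """Split one timestamp into separate, individually useful components."""
    return {
        "day_of_week": ts.weekday(),  # 0 = Monday ... 6 = Sunday
        "month": ts.month,
        "hour": ts.hour,
        "is_weekend": ts.weekday() >= 5,
        "is_holiday": ts.date() in HOLIDAYS,
        "days_since_event": (ts - last_event).days,  # whole days only
    }


features = decompose_timestamp(
    datetime(2026, 12, 25, 18, 30), last_event=datetime(2026, 12, 1, 9, 0)
)
print(features)
```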
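Normalisation and scaling — scikit-learn's StandardScaler (mean 0, standard deviation 1) and MinMaxScaler (range 0 to 1) bring features to comparable ranges. Several kinds of model are sensitive to feature scale: distance-based algorithms (KNN, SVMs with RBF kernels), regularised linear models, and models trained with gradient descent. Tree-based models largely are not.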
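The two scalers compute simple formulas, sketched below in plain Python for illustration rather than as scikit-learn's own code. Standardisation uses the population standard deviation, which is also what StandardScaler uses. The constant-column guards mirror the scikit-learn behaviour of leaving such columns at zero after scaling. The income and age values are made up.

```python
from statistics import mean, pstdev


def standard_scale(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


incomes = [20_000, 35_000, 50_000, 95_000]  # hypothetical, in currency units
ages = [22, 35, 41, 60]                     # hypothetical, in years

print(min_max_scale(incomes))  # [0.0, 0.2, 0.4, 1.0]
print([round(z, 2) for z in standard_scale(ages)])
```

After scaling, income no longer dominates a distance calculation simply because its raw numbers are thousands of times larger than ages.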
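Categorical encoding — the three common approaches are:
- one-hot encoding: one binary column per category;
- target encoding: replace each category with the mean target value for that category, computed on training data only to avoid leakage;
- ordinal encoding: map ordered categories such as small < medium < large to integers that preserve the order.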
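The three encodings can be sketched in a few lines of plain Python. The function names and example data are hypothetical. The target encoder here is the simplest unsmoothed version: it is fitted on training rows and falls back to the global training mean for unseen categories. Production variants often add smoothing or out-of-fold fitting.

```python
from collections import defaultdict


def one_hot(values):
    """One binary column per distinct category (columns sorted alphabetically)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def ordinal(values, order):
    """Map ordered categories to integers that preserve their order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


def fit_target_encoding(categories, targets):
    """Learn category -> mean target from TRAINING rows only."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    global_mean = sum(targets) / len(targets)
    return {c: sums[c] / counts[c] for c in counts}, global_mean


def apply_target_encoding(categories, mapping, fallback):
    """Unseen categories fall back to the global training mean."""
    return [mapping.get(c, fallback) for c in categories]


rows, cols = one_hot(["red", "green", "blue", "green"])
print(cols, rows)  # ['blue', 'green', 'red'] [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
print(ordinal(["small", "large", "medium"], order=["small", "medium", "large"]))  # [0, 2, 1]

mapping, prior = fit_target_encoding(["london", "paris", "london", "rome"], [1, 0, 0, 1])
print(apply_target_encoding(["paris", "berlin"], mapping, prior))  # [0.0, 0.5]
```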
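Interaction features — multiply or divide two features to capture the relationship between them. Price × quantity = revenue. Distance ÷ time = speed. Linear models cannot represent such multiplicative relationships unless you add them explicitly, and tree-based models can only approximate them with many splits.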
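Here is a minimal sketch of adding interaction features to a single record. The column names are hypothetical. Returning `None` for a zero duration is an assumption; a real pipeline might instead use a sentinel value or drop the row.

```python
def add_interactions(row):
    """Derive product and ratio features from raw columns (column names are hypothetical)."""
    out = dict(row)
    out["revenue"] = row["price"] * row["quantity"]  # product of two raw features
    # Ratio of two raw features; guard against division by zero.
    out["speed_km_per_h"] = (
        row["distance_km"] / row["duration_h"] if row["duration_h"] > 0 else None
    )
    return out


trip = {"price": 12.5, "quantity": 4, "distance_km": 30.0, "duration_h": 0.5}
print(add_interactions(trip))
```

A linear model given `revenue` directly can use it with a single weight. Given only `price` and `quantity`, it cannot represent their product at all.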
Log transformation — right-skewed distributions (house prices, income, transaction amounts) become closer to Gaussian after a log transformation, which often improves linear model performance. Use log(1 + x) when the data contains zeros.
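To see the effect of a log transformation, the sketch below measures skewness before and after applying `math.log1p` (log of 1 + x) to a hypothetical set of transaction amounts. The skewness helper is a plain population (Fisher-Pearson) estimate written for illustration.

```python
import math
from statistics import mean, pstdev

# Hypothetical right-skewed transaction amounts: many small, a few very large
amounts = [5, 8, 12, 15, 20, 25, 40, 60, 150, 2_000]


def skewness(xs):
    """Population (Fisher-Pearson) skewness: 0 for a symmetric distribution."""
    mu, sigma = mean(xs), pstdev(xs)
    return mean((x - mu) ** 3 for x in xs) / sigma ** 3


log_amounts = [math.log1p(a) for a in amounts]  # log(1 + x) stays defined at x = 0

print(round(skewness(amounts), 2), round(skewness(log_amounts), 2))
```

If the target itself is log-transformed, `math.expm1` converts predictions back to the original scale.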
Lag and rolling features — for time series: value_7_days_ago, rolling_7_day_mean, rolling_30_day_std. The recent past is often the best predictor of the near future.
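The lag and rolling features named above can be sketched for a single prediction day `t`. The helper name and the sales series are hypothetical. The key design choice is slicing `series[:t]`, so every feature uses only days before the one being predicted.

```python
from statistics import mean, stdev


def lag_rolling_features(series, t):
    """Features for predicting series[t] from values strictly before day t."""
    past = series[:t]  # excludes series[t] itself, so the target never leaks in
    return {
        "value_7_days_ago": past[-7] if len(past) >= 7 else None,
        "rolling_7_day_mean": mean(past[-7:]) if len(past) >= 7 else None,
        "rolling_30_day_std": stdev(past[-30:]) if len(past) >= 30 else None,  # sample std
    }


daily_sales = list(range(1, 41))  # hypothetical: sales rise by 1 unit per day
print(lag_rolling_features(daily_sales, t=35))
```

In pandas, the same trailing windows are usually built by shifting before rolling, for example `s.shift(1).rolling(7).mean()` for the 7-day mean and `s.shift(7)` for the 7-day lag.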
Real-world examples
Not theory — what real teams actually shipped using this technique.
- Kaggle winners frequently credit feature engineering as their main advantage. In the Instacart Market Basket Analysis competition, two user-product features were more predictive than any model improvement: “days since this user last ordered this product” and “ratio of the user's orders containing this product to their total orders.”
- Credit scoring models rely on heavily engineered features:
  - debt-to-income ratio: one raw feature divided by another;
  - number of missed payments in the last 12 months: aggregated from payment history;
  - credit utilisation rate: total balance divided by total limit across all cards.
- Ride-sharing demand prediction combines time-derived features (hour of day, day of week, is_holiday) with contextual ones (weather conditions, major events in the city). These are derived from raw timestamps and external data sources.
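The two Instacart-style features from the first example can be computed from an order log, as in the sketch below. The log is loosely shaped like that competition's data, with an order number and days since the prior order per user. The tiny dataset and the function name are hypothetical, and this is not any competitor's actual code. Both features are measured as of each user's latest order.

```python
from collections import defaultdict

# Hypothetical order log: (user_id, order_number, days_since_prior_order, products)
orders = [
    ("u1", 1, None, {"milk", "bread"}),
    ("u1", 2, 7, {"milk"}),
    ("u1", 3, 3, {"eggs"}),
    ("u1", 4, 5, {"eggs"}),
]


def user_product_features(order_log):
    """Per (user, product) features measured as of each user's latest order."""
    current_day = {}                 # user -> days elapsed since their first order
    last_ordered = {}                # (user, product) -> day the product was last ordered
    times_bought = defaultdict(int)  # (user, product) -> number of orders containing it
    total_orders = defaultdict(int)  # user -> number of orders
    for user, _, gap, products in sorted(order_log, key=lambda o: (o[0], o[1])):
        current_day[user] = current_day.get(user, 0) + (gap or 0)  # first order has no gap
        total_orders[user] += 1
        for product in products:
            last_ordered[(user, product)] = current_day[user]
            times_bought[(user, product)] += 1
    return {
        (user, product): {
            "days_since_user_last_ordered_product": current_day[user] - day,
            "user_product_order_ratio": times_bought[(user, product)] / total_orders[user],
        }
        for (user, product), day in last_ordered.items()
    }


feats = user_product_features(orders)
for key in sorted(feats):
    print(key, feats[key])
```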
Common pitfalls
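- Data leakage — features that include information unavailable at prediction time produce excellent offline metrics that collapse in production. Examples include using tomorrow’s sales to predict today’s, or fitting scalers and target encoders on the full dataset including test rows.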
- Over-engineering — creating hundreds of features can add noise, increase the risk of overfitting, and slow training. Feature selection after engineering is as important as the engineering itself.
- Domain knowledge required — effective feature engineering depends on understanding the problem domain. A data scientist without medical knowledge is likely to miss the most predictive clinical features.
- Deep learning on tabular data is not magic — despite the appeal of “let the model learn everything,” feature engineering still often improves deep learning on structured data. Do not skip it.
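One common guard against leakage in time-ordered data is sketched below. The data are hypothetical daily rows. The sketch splits by date instead of shuffling, and it computes preprocessing statistics (here, a scaling mean and standard deviation) from the training period only. The contrasting `leaky_mu` shows how a statistic computed over all rows quietly absorbs information from the test period.

```python
from datetime import date
from statistics import mean, pstdev

# Hypothetical daily rows: (day, feature_value); values drift upward over time
rows = [(date(2026, 5, d), float(d)) for d in range(1, 15)]


def time_split(rows, cutoff):
    """Train on everything before the cutoff, evaluate on everything after; never shuffle."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test


train, test = time_split(rows, cutoff=date(2026, 5, 11))

# Correct: scaling statistics come from the training period only ...
train_values = [v for _, v in train]
mu, sigma = mean(train_values), pstdev(train_values)
test_scaled = [(v - mu) / sigma for _, v in test]

# ... whereas this leaks: the mean silently absorbs the test period's values.
leaky_mu = mean(v for _, v in rows)

print(mu, leaky_mu)  # 5.5 7.5
```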
Frequently asked questions
QUESTION 1 What is feature engineering in simple terms?
ANSWER 1 Deciding what to tell the model and how: transforming raw data into features that capture the information the model needs. A raw timestamp tells the model little; day of week and is_holiday tell it much more.
QUESTION 2 What are common feature engineering techniques?
ANSWER 2 Date decomposition, normalisation, categorical encoding, interaction features, log transformation, and lag/rolling features for time series.
QUESTION 3 Does deep learning eliminate feature engineering?
ANSWER 3 Partially. For images and text, deep networks learn useful representations directly from raw pixels and tokens. For structured tabular data, thoughtful feature engineering usually still outperforms feeding raw columns to any model.
QUESTION 4 What is the difference between feature engineering and feature selection?
ANSWER 4 Engineering creates new features from existing data. Selection chooses which features to include, removing redundant and noisy variables.