Normalization (Machine Learning)
Normalization in machine learning is a data preprocessing technique used to scale input features so they fall within a similar range, typically between 0 and 1. This helps improve model performance, especially for algorithms sensitive to the scale of data.
Why Normalize Data?
Some machine learning algorithms (e.g., K-Nearest Neighbors, Gradient Descent-based models, Neural Networks) perform better when input features are on a similar scale. Without normalization, features with larger numeric ranges may dominate others, leading to biased results.
Common Normalization Techniques
1. Min-Max Normalization
Scales features to a fixed range, usually [0, 1]:
x' = (x - min(x)) / (max(x) - min(x))
- Best for bounded data.
- Sensitive to outliers.
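The min-max formula above can be sketched in a few lines of pure Python (the function name is illustrative, not from any library):

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # Guard against a constant feature, which would divide by zero.
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

scaled = min_max_normalize([1, 5, 10])  # [0.0, 0.444..., 1.0]
```

Note that a single extreme outlier in `values` stretches the denominator and squeezes all other points toward one end, which is why this method is sensitive to outliers.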
2. Z-score Normalization (Standardization)
Centers the data around the mean with unit variance:
z = (x - μ) / σ
Where:
- μ = mean of the feature
- σ = standard deviation of the feature
- Useful for algorithms that assume a Gaussian distribution.
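A minimal standardization sketch using only the standard library (the function name is illustrative; it uses the population standard deviation, which some libraries replace with the sample standard deviation):

```python
import statistics

def z_score_normalize(values):
    """Shift a list of numbers to mean 0 and scale to unit variance."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    # A constant feature has zero spread; map everything to 0.
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

standardized = z_score_normalize([2, 4, 4, 4, 5, 5, 7, 9])
```

After the transform, the output has mean 0 and standard deviation 1, but unlike min-max scaling it is not confined to a fixed range.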
3. Max Abs Scaling
Scales each value by the feature's maximum absolute value:
x' = x / max(|x|)
- Preserves zero entries in sparse data.
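A short sketch of max-abs scaling (illustrative function name); note that zeros stay exactly zero, which is what makes this method attractive for sparse data:

```python
def max_abs_scale(values):
    """Scale a list of numbers into [-1, 1] by the largest absolute value."""
    m = max(abs(v) for v in values)
    if m == 0:
        return [0.0 for _ in values]
    # Zero entries map to zero, so sparsity is preserved.
    return [v / m for v in values]

scaled = max_abs_scale([0, -2, 4])  # [0.0, -0.5, 1.0]
```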
4. Robust Scaling
Uses the median and interquartile range (IQR) to scale:
x' = (x - median(x)) / IQR
- Less sensitive to outliers.
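A robust-scaling sketch using the standard library's quantile helper (illustrative function name; library implementations such as scikit-learn's RobustScaler may compute quartiles with a different interpolation method, so exact values can differ slightly):

```python
import statistics

def robust_scale(values):
    """Center on the median and scale by the interquartile range."""
    med = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    if iqr == 0:
        return [0.0 for _ in values]
    return [(v - med) / iqr for v in values]

# The outlier 100 barely affects the median or the IQR,
# so the other points keep a sensible spread.
scaled = robust_scale([1, 2, 3, 4, 100])
```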
When to Use Normalization
Use normalization when:
- Input features are measured in different units or ranges.
- You use distance-based algorithms (e.g., k-NN, SVM).
- You're training neural networks using gradient descent.
When Not to Normalize
- When using tree-based algorithms like Decision Trees or Random Forests (these are insensitive to feature scale).
- When your features are already on the same scale or naturally bounded.
Example
If feature A ranges from 1 to 1000 and feature B from 0 to 1:
- With normalization, both features contribute on a comparable scale during training.
- Without normalization, feature A may dominate distance and gradient computations due to its larger range.
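The dominance effect can be made concrete with a Euclidean distance between two hypothetical points (the values 900/100 and 0.1/0.9 are made up for illustration):

```python
import math

# Feature A ranges 1..1000, feature B ranges 0..1.
p = (900.0, 0.1)
q = (100.0, 0.9)

# Raw distance: feature A's 800-unit gap swamps feature B's 0.8 gap.
raw_dist = math.dist(p, q)

# After min-max scaling feature A into [0, 1], both gaps are comparable.
p_scaled = ((900 - 1) / 999, 0.1)
q_scaled = ((100 - 1) / 999, 0.9)
scaled_dist = math.dist(p_scaled, q_scaled)
```

Before scaling, feature B is numerically invisible to a distance-based model such as k-NN; after scaling, both features influence the distance roughly equally.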
Related Pages
- Feature Scaling
- Standardization
- Preprocessing (Machine Learning)
- K-Nearest Neighbors
- Gradient Descent