Normalization (Machine Learning)
Normalization in machine learning is a data preprocessing technique used to scale input features so they fall within a similar range, typically between 0 and 1. This helps improve model performance, especially for algorithms sensitive to the scale of data.
Why Normalize Data?
Some machine learning algorithms (e.g., K-Nearest Neighbors, Gradient Descent-based models, Neural Networks) perform better when input features are on a similar scale. Without normalization, features with larger numeric ranges may dominate others, leading to biased results.
Common Normalization Techniques
1. Min-Max Normalization
Scales features to a fixed range, usually [0, 1]:
x' = (x - min(x)) / (max(x) - min(x))
- Best for bounded data.
- Sensitive to outliers.
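The min-max formula above can be sketched in a few lines of pure Python (the function name is illustrative, not from any library):

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # Guard against a constant feature, which would divide by zero.
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

scaled = min_max_normalize([1, 5, 10])  # [0.0, 0.444..., 1.0]
```

Note that a single extreme outlier in `values` stretches the denominator and squeezes all other points toward one end, which is why this method is sensitive to outliers.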
2. Z-score Normalization (Standardization)
Centers the data around the mean with unit variance:
z = (x - μ) / σ
Where:
- μ = mean of the feature
- σ = standard deviation of the feature
- Useful for algorithms that assume a Gaussian distribution.
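A minimal standardization sketch using only the standard library (the function name is illustrative; it uses the population standard deviation, which some libraries replace with the sample standard deviation):

```python
import statistics

def z_score_normalize(values):
    """Shift a list of numbers to mean 0 and scale to unit variance."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    # A constant feature has zero spread; map everything to 0.
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

standardized = z_score_normalize([2, 4, 4, 4, 5, 5, 7, 9])
```

After the transform, the output has mean 0 and standard deviation 1, but unlike min-max scaling it is not confined to a fixed range.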
3. Max Abs Scaling
Scales each value by the feature's maximum absolute value:
x' = x / max(|x|)
- Preserves zero entries in sparse data.
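A short sketch of max-abs scaling (illustrative function name); note that zeros stay exactly zero, which is what makes this method attractive for sparse data:

```python
def max_abs_scale(values):
    """Scale a list of numbers into [-1, 1] by the largest absolute value."""
    m = max(abs(v) for v in values)
    if m == 0:
        return [0.0 for _ in values]
    # Zero entries map to zero, so sparsity is preserved.
    return [v / m for v in values]

scaled = max_abs_scale([0, -2, 4])  # [0.0, -0.5, 1.0]
```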
4. Robust Scaling
Uses the median and interquartile range (IQR) to scale:
x' = (x - median(x)) / IQR
- Less sensitive to outliers.
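A robust-scaling sketch using the standard library's quantile helper (illustrative function name; library implementations such as scikit-learn's RobustScaler may compute quartiles with a different interpolation method, so exact values can differ slightly):

```python
import statistics

def robust_scale(values):
    """Center on the median and scale by the interquartile range."""
    med = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    if iqr == 0:
        return [0.0 for _ in values]
    return [(v - med) / iqr for v in values]

# The outlier 100 barely affects the median or the IQR,
# so the other points keep a sensible spread.
scaled = robust_scale([1, 2, 3, 4, 100])
```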
When to Use Normalization
Use normalization when:
- Input features are measured in different units or ranges.
- You use distance-based algorithms (e.g., k-NN, SVM).
- You're training neural networks using gradient descent.
When Not to Normalize
- When using tree-based algorithms like Decision Trees or Random Forests (these are insensitive to feature scale).
- When your features are already on the same scale or naturally bounded.
Example
If feature A ranges from 1 to 1000 and feature B from 0 to 1:
- With normalization, both features contribute on a comparable scale during training.
- Without normalization, feature A may dominate distance and gradient computations due to its larger range.
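The dominance effect can be made concrete with a Euclidean distance between two hypothetical points (the values 900/100 and 0.1/0.9 are made up for illustration):

```python
import math

# Feature A ranges 1..1000, feature B ranges 0..1.
p = (900.0, 0.1)
q = (100.0, 0.9)

# Raw distance: feature A's 800-unit gap swamps feature B's 0.8 gap.
raw_dist = math.dist(p, q)

# After min-max scaling feature A into [0, 1], both gaps are comparable.
p_scaled = ((900 - 1) / 999, 0.1)
q_scaled = ((100 - 1) / 999, 0.9)
scaled_dist = math.dist(p_scaled, q_scaled)
```

Before scaling, feature B is numerically invisible to a distance-based model such as k-NN; after scaling, both features influence the distance roughly equally.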
Related Pages
- Feature Scaling
- Standardization
- Preprocessing (Machine Learning)
- K-Nearest Neighbors
- Gradient Descent