Latest revision as of 10:09, 11 June 2025

Exploding Gradient Problem

The Exploding Gradient Problem is a common issue in training deep neural networks where the gradients grow too large during backpropagation. This leads to very large weight updates, making the model unstable or completely unusable.

πŸ“ˆ What Are Gradients?

Gradients are computed during the backpropagation step of training. They help the model understand how to change its weights to reduce error.

Gradient = βˆ‚Loss / βˆ‚Weight

If gradients become very large, the weight updates become huge, which can cause the model to diverge (never reach a good solution).
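As an illustrative sketch (the depth, width, and weight values below are all made up), backpropagation multiplies the incoming gradient by each layer's weight matrix; when those matrices scale vectors up, the gradient norm grows exponentially with depth:

```python
import numpy as np

# Illustrative only: a 4-unit "network" 50 layers deep whose every layer
# scales signals by 1.5. Backpropagation multiplies the gradient by each
# layer's (transposed) weight matrix, so its norm grows like 1.5**depth.
depth = 50
grad = np.ones(4)            # gradient arriving from the loss
W = 1.5 * np.eye(4)          # one layer's weight matrix (scales norms up)

for _ in range(depth):
    grad = W.T @ grad        # one chain-rule step per layer

print(np.linalg.norm(grad))  # about 2 * 1.5**50, i.e. over a billion
```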

⚠️ When Does It Happen?

It usually happens in:

  • Very deep networks with many layers
  • Recurrent Neural Networks (RNNs), especially for long sequences
  • When using poor weight initialization

πŸ§ͺ Example

Suppose a layer's weight matrix receives a large gradient. The weight update is computed as:

ΔW = η × Gradient

If the gradient is large (e.g., 10,000), even a small learning rate η leads to massive weight updates.
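Plugging those numbers in (a sketch, with both values taken from the sentence above):

```python
# Hypothetical values: a small learning rate and the large gradient above.
learning_rate = 0.01                 # eta
gradient = 10_000.0                  # unusually large gradient
delta_w = learning_rate * gradient   # size of the resulting weight update
print(delta_w)                       # 100.0, a huge jump for a single step
```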

This can result in:

  • Loss becoming NaN (Not a Number) πŸ’₯
  • Weights exploding to infinity ➑️ ∞
  • Model failing to train 😒

πŸ” Symptoms of Exploding Gradients

  • ❌ Loss value jumps or becomes NaN
  • πŸ“ˆ Weights become excessively large
  • πŸ” Training fails to converge
  • πŸ’₯ Network outputs explode to very high values

πŸ”§ Solutions

Several techniques are commonly used to fix or prevent this issue:

1. Gradient Clipping

Limit (or "clip") the gradients to a maximum value during backpropagation:

If β€–gβ€– > threshold, then g := (threshold / β€–gβ€–) Γ— g

This keeps gradients from becoming too large.
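A minimal NumPy sketch of norm-based clipping (the function name is illustrative; frameworks ship built-in equivalents, e.g. PyTorch's torch.nn.utils.clip_grad_norm_):

```python
import numpy as np

def clip_by_norm(grad, threshold):
    """Rescale grad so its L2 norm never exceeds threshold."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = (threshold / norm) * grad   # shrink, keeping the direction
    return grad

g = np.array([3000.0, 4000.0])             # norm = 5000, far too large
clipped = clip_by_norm(g, threshold=1.0)
print(clipped)                             # [0.6 0.8], norm exactly 1
```

Gradients already below the threshold pass through unchanged, so training is unaffected until a spike occurs.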

2. Better Weight Initialization

Use techniques like:

  • Xavier initialization for Tanh/Sigmoid
  • He initialization for ReLU

These help control the scale of activations and gradients.
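A sketch of both schemes using normally distributed weights (the helper names are illustrative; deep-learning frameworks provide their own initializers):

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    # Xavier/Glorot: variance 2 / (fan_in + fan_out), suited to tanh/sigmoid
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    # He: variance 2 / fan_in, suited to ReLU
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = he_init(512, 256)
print(W.std())   # close to sqrt(2 / 512) = 0.0625
```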

3. Use Normalization Layers

  • Batch Normalization helps to keep the network outputs within a stable range.
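A minimal batch-normalization sketch (training-time statistics only; running averages and learned updates to gamma/beta are omitted):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta

x = np.array([[1.0, 200.0],
              [3.0, 400.0],
              [5.0, 600.0]])    # two features on very different scales
y = batch_norm(x)
print(y.mean(axis=0))           # approximately [0, 0]
print(y.std(axis=0))            # approximately [1, 1]
```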

4. Choose Better Activation Functions

ReLU and its variants (Leaky ReLU, ELU) tend to work better in deep networks.
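The activations named above, written out elementwise as a sketch (frameworks provide optimized built-ins):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)              # zero out negatives

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope for negatives

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))  # smooth negatives

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))        # [0. 0. 3.]
print(leaky_relu(x))  # [-0.02  0.    3.  ]
```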

πŸ“š Summary Table

Problem            | Cause                              | Effect                               | Solution
Exploding Gradient | Deep networks, poor initialization | Huge weight updates, loss divergence | Gradient clipping, normalization, better activation functions

🧠 Difference from Vanishing Gradient

Problem            | Gradient Size   | Effect
Vanishing Gradient | Near zero       | Training stops (no learning)
Exploding Gradient | Extremely large | Training blows up (unstable learning)

πŸ“Ž See Also

  • Vanishing Gradient Problem
  • Backpropagation
  • Gradient Clipping
  • Weight Initialization
  • ReLU