<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://qbase.texpertssolutions.com/index.php?action=history&amp;feed=atom&amp;title=Exploding_Gradient_Problem</id>
	<title>Exploding Gradient Problem - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://qbase.texpertssolutions.com/index.php?action=history&amp;feed=atom&amp;title=Exploding_Gradient_Problem"/>
	<link rel="alternate" type="text/html" href="https://qbase.texpertssolutions.com/index.php?title=Exploding_Gradient_Problem&amp;action=history"/>
	<updated>2026-05-14T13:41:04Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.43.1</generator>
	<entry>
		<id>https://qbase.texpertssolutions.com/index.php?title=Exploding_Gradient_Problem&amp;diff=258&amp;oldid=prev</id>
		<title>Thakshashila: /* 📎 See Also */</title>
		<link rel="alternate" type="text/html" href="https://qbase.texpertssolutions.com/index.php?title=Exploding_Gradient_Problem&amp;diff=258&amp;oldid=prev"/>
		<updated>2025-06-11T10:09:50Z</updated>

		<summary type="html">&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;📎 See Also&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 10:09, 11 June 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l97&quot;&gt;Line 97:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 97:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=== 📎 See Also ===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=== 📎 See Also ===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Vanishing &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Gradient Problem&lt;/del&gt;]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Vanishing &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;gradient problem&lt;/ins&gt;]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Backpropagation]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Backpropagation]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Gradient Clipping]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Gradient Clipping]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Weight Initialization]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Weight Initialization]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[ReLU]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[ReLU]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Thakshashila</name></author>
	</entry>
	<entry>
		<id>https://qbase.texpertssolutions.com/index.php?title=Exploding_Gradient_Problem&amp;diff=257&amp;oldid=prev</id>
		<title>Thakshashila: Created page with &quot;== Exploding Gradient Problem ==  The &#039;&#039;&#039;Exploding Gradient Problem&#039;&#039;&#039; is a common issue in training deep neural networks where the gradients grow too large during backpropagation. This leads to very large weight updates, making the model unstable or completely unusable.  === 📈 What Are Gradients? ===  Gradients are computed during the backpropagation step of training. They help the model understand how to change its weights to reduce error.  :&lt;math&gt; \text{Gradient} =...&quot;</title>
		<link rel="alternate" type="text/html" href="https://qbase.texpertssolutions.com/index.php?title=Exploding_Gradient_Problem&amp;diff=257&amp;oldid=prev"/>
		<updated>2025-06-11T10:09:11Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;== Exploding Gradient Problem ==  The &amp;#039;&amp;#039;&amp;#039;Exploding Gradient Problem&amp;#039;&amp;#039;&amp;#039; is a common issue in training deep neural networks where the gradients grow too large during backpropagation. This leads to very large weight updates, making the model unstable or completely unusable.  === 📈 What Are Gradients? ===  Gradients are computed during the backpropagation step of training. They help the model understand how to change its weights to reduce error.  :&amp;lt;math&amp;gt; \text{Gradient} =...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== Exploding Gradient Problem ==&lt;br /&gt;
&lt;br /&gt;
The &amp;#039;&amp;#039;&amp;#039;Exploding Gradient Problem&amp;#039;&amp;#039;&amp;#039; is a common issue in training deep neural networks where the gradients grow too large during backpropagation. This leads to very large weight updates, making the model unstable or completely unusable.&lt;br /&gt;
&lt;br /&gt;
=== 📈 What Are Gradients? ===&lt;br /&gt;
&lt;br /&gt;
Gradients are computed during the backpropagation step of training. They help the model understand how to change its weights to reduce error.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \text{Gradient} = \frac{\partial \text{Loss}}{\partial \text{Weight}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If gradients become very large, the weight updates become huge, which can cause the model to diverge (never reach a good solution).&lt;br /&gt;
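&lt;br /&gt;
In a deep network the backward pass multiplies many per-layer gradient factors together, so the overall gradient can grow exponentially with depth. A minimal Python sketch of this compounding effect (the depth and per-layer factor below are assumed values, purely for illustration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
depth = 50              # assumed number of layers the gradient flows through&lt;br /&gt;
per_layer_factor = 1.5  # assumed magnitude of each per-layer gradient factor&lt;br /&gt;
&lt;br /&gt;
gradient = 1.0&lt;br /&gt;
for _ in range(depth):&lt;br /&gt;
    gradient *= per_layer_factor   # backpropagation multiplies factor after factor&lt;br /&gt;
&lt;br /&gt;
print(gradient)   # roughly 6.4e8 -- the gradient has exploded&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;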
&lt;br /&gt;
=== ⚠️ When Does It Happen? ===&lt;br /&gt;
&lt;br /&gt;
It usually happens in:&lt;br /&gt;
* Very &amp;#039;&amp;#039;&amp;#039;deep networks&amp;#039;&amp;#039;&amp;#039; with many layers&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Recurrent Neural Networks (RNNs)&amp;#039;&amp;#039;&amp;#039;, especially for long sequences&lt;br /&gt;
* When using poor &amp;#039;&amp;#039;&amp;#039;weight initialization&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
=== 🧪 Example ===&lt;br /&gt;
&lt;br /&gt;
Let’s assume a layer’s weight matrix receives a large gradient. The weight update is computed as:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \Delta W = - \eta \cdot \text{Gradient} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If the gradient is large (e.g., 10,000), even a small learning rate &amp;lt;math&amp;gt;\eta&amp;lt;/math&amp;gt; leads to massive weight updates.&lt;br /&gt;
&lt;br /&gt;
This can result in:&lt;br /&gt;
* Loss becoming NaN (Not a Number) 💥&lt;br /&gt;
* Weights exploding to infinity ➡️ ∞&lt;br /&gt;
* Model failing to train 😢&lt;br /&gt;
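&lt;br /&gt;
Plugging the numbers from this example into the update rule shows the scale involved (a minimal sketch; the learning rate value is an assumed one):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
eta = 0.01          # assumed small learning rate&lt;br /&gt;
gradient = 10_000   # the large gradient from the example above&lt;br /&gt;
&lt;br /&gt;
delta_w = -eta * gradient&lt;br /&gt;
print(delta_w)      # -100.0 -- a single step shifts the weight by 100 units&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;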
&lt;br /&gt;
=== 🔍 Symptoms of Exploding Gradients ===&lt;br /&gt;
&lt;br /&gt;
* ❌ Loss value jumps or becomes NaN&lt;br /&gt;
* 📈 Weights become excessively large&lt;br /&gt;
* 🔁 Training fails to converge&lt;br /&gt;
* 💥 Network outputs explode to very high values&lt;br /&gt;
&lt;br /&gt;
=== 🔧 Solutions ===&lt;br /&gt;
&lt;br /&gt;
Several techniques are commonly used to fix or prevent this issue:&lt;br /&gt;
&lt;br /&gt;
==== 1. Gradient Clipping ====&lt;br /&gt;
&lt;br /&gt;
Limit (or &amp;quot;clip&amp;quot;) the gradient norm to a maximum threshold during backpropagation:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \text{If } \|g\| &amp;gt; \text{threshold, then } g := \frac{\text{threshold}}{\|g\|} \cdot g &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This keeps gradients from becoming too large.&lt;br /&gt;
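&lt;br /&gt;
A minimal NumPy sketch of this clipping rule (an illustration only; deep learning frameworks ship their own built-in clipping utilities):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def clip_by_norm(g, threshold):&lt;br /&gt;
    # rescale g so that its norm never exceeds the threshold&lt;br /&gt;
    norm = np.linalg.norm(g)&lt;br /&gt;
    if norm &amp;gt; threshold:&lt;br /&gt;
        g = (threshold / norm) * g&lt;br /&gt;
    return g&lt;br /&gt;
&lt;br /&gt;
g = np.array([300.0, -400.0])          # example gradient with norm 500&lt;br /&gt;
print(clip_by_norm(g, threshold=5.0))  # [ 3. -4.] -- same direction, norm 5&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;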
&lt;br /&gt;
==== 2. Better Weight Initialization ====&lt;br /&gt;
&lt;br /&gt;
Use techniques like:&lt;br /&gt;
* Xavier initialization for Tanh/Sigmoid&lt;br /&gt;
* He initialization for ReLU&lt;br /&gt;
&lt;br /&gt;
These help control the scale of activations and gradients.&lt;br /&gt;
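&lt;br /&gt;
A minimal NumPy sketch of these two schemes (one common form of each is shown; real frameworks provide their own initializers):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def xavier_init(fan_in, fan_out):&lt;br /&gt;
    # Xavier/Glorot: variance 2 / (fan_in + fan_out), suited to tanh/sigmoid&lt;br /&gt;
    return np.random.randn(fan_out, fan_in) * np.sqrt(2.0 / (fan_in + fan_out))&lt;br /&gt;
&lt;br /&gt;
def he_init(fan_in, fan_out):&lt;br /&gt;
    # He: variance 2 / fan_in, compensates for ReLU zeroing half of its inputs&lt;br /&gt;
    return np.random.randn(fan_out, fan_in) * np.sqrt(2.0 / fan_in)&lt;br /&gt;
&lt;br /&gt;
W = he_init(fan_in=512, fan_out=256)&lt;br /&gt;
print(W.std())   # close to sqrt(2 / 512), i.e. about 0.0625&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;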
&lt;br /&gt;
==== 3. Use Normalization Layers ====&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Batch Normalization&amp;#039;&amp;#039;&amp;#039; helps to keep the network outputs within a stable range.&lt;br /&gt;
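&lt;br /&gt;
A minimal sketch of the normalization step (training-mode only, with the learned scale and shift kept as fixed scalars for simplicity):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):&lt;br /&gt;
    # normalize each feature over the batch, then rescale and shift&lt;br /&gt;
    mean = x.mean(axis=0)&lt;br /&gt;
    var = x.var(axis=0)&lt;br /&gt;
    return gamma * (x - mean) / np.sqrt(var + eps) + beta&lt;br /&gt;
&lt;br /&gt;
x = np.random.randn(32, 4) * 100.0 + 50.0   # activations with a large, shifted scale&lt;br /&gt;
y = batch_norm(x)&lt;br /&gt;
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))   # about 0 mean and 1 std per feature&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;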
&lt;br /&gt;
==== 4. Choose Better Activation Functions ====&lt;br /&gt;
&lt;br /&gt;
ReLU and its variants (Leaky ReLU, ELU) tend to work better in deep networks.&lt;br /&gt;
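&lt;br /&gt;
For reference, the two most common variants look like this (a minimal sketch of the function definitions; ELU is omitted):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def relu(x):&lt;br /&gt;
    return np.maximum(0.0, x)   # zero for negative inputs, identity otherwise&lt;br /&gt;
&lt;br /&gt;
def leaky_relu(x, alpha=0.01):&lt;br /&gt;
    return np.where(x &amp;gt; 0, x, alpha * x)   # small slope instead of a hard zero&lt;br /&gt;
&lt;br /&gt;
x = np.array([-2.0, -0.5, 0.0, 3.0])&lt;br /&gt;
print(relu(x))         # negatives become 0, positives pass through&lt;br /&gt;
print(leaky_relu(x))   # negatives are scaled by alpha instead of zeroed&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;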
&lt;br /&gt;
=== 📚 Summary Table ===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Problem&lt;br /&gt;
! Cause&lt;br /&gt;
! Effect&lt;br /&gt;
! Solution&lt;br /&gt;
|-&lt;br /&gt;
| Exploding Gradient&lt;br /&gt;
| Deep networks, poor initialization&lt;br /&gt;
| Huge weight updates, loss divergence&lt;br /&gt;
| Gradient clipping, normalization, better activation functions&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== 🧠 Difference from Vanishing Gradient ===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Problem&lt;br /&gt;
! Gradient Size&lt;br /&gt;
! Effect&lt;br /&gt;
|-&lt;br /&gt;
| Vanishing Gradient&lt;br /&gt;
| Near zero&lt;br /&gt;
| Training stops (no learning)&lt;br /&gt;
|-&lt;br /&gt;
| Exploding Gradient&lt;br /&gt;
| Extremely large&lt;br /&gt;
| Training blows up (unstable learning)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== 📎 See Also ===&lt;br /&gt;
* [[Vanishing Gradient Problem]]&lt;br /&gt;
* [[Backpropagation]]&lt;br /&gt;
* [[Gradient Clipping]]&lt;br /&gt;
* [[Weight Initialization]]&lt;br /&gt;
* [[ReLU]]&lt;/div&gt;</summary>
		<author><name>Thakshashila</name></author>
	</entry>
</feed>