<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://qbase.texpertssolutions.com/index.php?action=history&amp;feed=atom&amp;title=Exploding_Gradient_Problem</id>
	<title>Exploding Gradient Problem - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://qbase.texpertssolutions.com/index.php?action=history&amp;feed=atom&amp;title=Exploding_Gradient_Problem"/>
	<link rel="alternate" type="text/html" href="https://qbase.texpertssolutions.com/index.php?title=Exploding_Gradient_Problem&amp;action=history"/>
	<updated>2026-05-14T13:41:04Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.43.1</generator>
	<entry>
		<id>https://qbase.texpertssolutions.com/index.php?title=Exploding_Gradient_Problem&amp;diff=258&amp;oldid=prev</id>
		<title>Thakshashila: /* 📎 See Also */</title>
		<link rel="alternate" type="text/html" href="https://qbase.texpertssolutions.com/index.php?title=Exploding_Gradient_Problem&amp;diff=258&amp;oldid=prev"/>
		<updated>2025-06-11T10:09:50Z</updated>

		<summary type="html">&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;📎 See Also&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 10:09, 11 June 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l97&quot;&gt;Line 97:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 97:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=== 📎 See Also ===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=== 📎 See Also ===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Vanishing &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Gradient Problem&lt;/del&gt;]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Vanishing &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;gradient problem&lt;/ins&gt;]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Backpropagation]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Backpropagation]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Gradient Clipping]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Gradient Clipping]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Weight Initialization]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Weight Initialization]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[ReLU]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[ReLU]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Thakshashila</name></author>
	</entry>
	<entry>
		<id>https://qbase.texpertssolutions.com/index.php?title=Exploding_Gradient_Problem&amp;diff=257&amp;oldid=prev</id>
		<title>Thakshashila: Created page with &quot;== Exploding Gradient Problem ==  The &#039;&#039;&#039;Exploding Gradient Problem&#039;&#039;&#039; is a common issue in training deep neural networks where the gradients grow too large during backpropagation. This leads to very large weight updates, making the model unstable or completely unusable.  === 📈 What Are Gradients? ===  Gradients are computed during the backpropagation step of training. They help the model understand how to change its weights to reduce error.  :&lt;math&gt; \text{Gradient} =...&quot;</title>
		<link rel="alternate" type="text/html" href="https://qbase.texpertssolutions.com/index.php?title=Exploding_Gradient_Problem&amp;diff=257&amp;oldid=prev"/>
		<updated>2025-06-11T10:09:11Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;== Exploding Gradient Problem ==  The &amp;#039;&amp;#039;&amp;#039;Exploding Gradient Problem&amp;#039;&amp;#039;&amp;#039; is a common issue in training deep neural networks where the gradients grow too large during backpropagation. This leads to very large weight updates, making the model unstable or completely unusable.  === 📈 What Are Gradients? ===  Gradients are computed during the backpropagation step of training. They help the model understand how to change its weights to reduce error.  :&amp;lt;math&amp;gt; \text{Gradient} =...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== Exploding Gradient Problem ==&lt;br /&gt;
&lt;br /&gt;
The &amp;#039;&amp;#039;&amp;#039;Exploding Gradient Problem&amp;#039;&amp;#039;&amp;#039; is a common issue in training deep neural networks where the gradients grow too large during backpropagation. This leads to very large weight updates, making the model unstable or completely unusable.&lt;br /&gt;
&lt;br /&gt;
=== 📈 What Are Gradients? ===&lt;br /&gt;
&lt;br /&gt;
Gradients are computed during the backpropagation step of training. They help the model understand how to change its weights to reduce error.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \text{Gradient} = \frac{\partial \text{Loss}}{\partial \text{Weight}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If gradients become very large, the weight updates become huge, which can cause the model to diverge (never reach a good solution).&lt;br /&gt;
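&lt;br /&gt;
In a deep network the backward pass multiplies many per-layer gradient factors together, so the overall gradient can grow exponentially with depth. A minimal Python sketch of this compounding effect (the depth and per-layer factor below are assumed values, purely for illustration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
depth = 50              # assumed number of layers the gradient flows through&lt;br /&gt;
per_layer_factor = 1.5  # assumed magnitude of each per-layer gradient factor&lt;br /&gt;
&lt;br /&gt;
gradient = 1.0&lt;br /&gt;
for _ in range(depth):&lt;br /&gt;
    gradient *= per_layer_factor   # backpropagation multiplies factor after factor&lt;br /&gt;
&lt;br /&gt;
print(gradient)   # roughly 6.4e8 -- the gradient has exploded&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;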
&lt;br /&gt;
=== ⚠️ When Does It Happen? ===&lt;br /&gt;
&lt;br /&gt;
It usually happens in:&lt;br /&gt;
* Very &amp;#039;&amp;#039;&amp;#039;deep networks&amp;#039;&amp;#039;&amp;#039; with many layers&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Recurrent Neural Networks (RNNs)&amp;#039;&amp;#039;&amp;#039;, especially for long sequences&lt;br /&gt;
* When using poor &amp;#039;&amp;#039;&amp;#039;weight initialization&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
=== 🧪 Example ===&lt;br /&gt;
&lt;br /&gt;
Let’s assume a layer’s weight matrix receives a large gradient. The weight update is computed as:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \Delta W = - \eta \cdot \text{Gradient} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If the gradient is large (e.g., 10,000), even a small learning rate &amp;lt;math&amp;gt;\eta&amp;lt;/math&amp;gt; leads to massive weight updates.&lt;br /&gt;
&lt;br /&gt;
This can result in:&lt;br /&gt;
* Loss becoming NaN (Not a Number) 💥&lt;br /&gt;
* Weights exploding to infinity ➡️ ∞&lt;br /&gt;
* Model failing to train 😢&lt;br /&gt;
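&lt;br /&gt;
Plugging the numbers from this example into the update rule shows the scale involved (a minimal sketch; the learning rate value is an assumed one):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
eta = 0.01          # assumed small learning rate&lt;br /&gt;
gradient = 10_000   # the large gradient from the example above&lt;br /&gt;
&lt;br /&gt;
delta_w = -eta * gradient&lt;br /&gt;
print(delta_w)      # -100.0 -- a single step shifts the weight by 100 units&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;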
&lt;br /&gt;
=== 🔍 Symptoms of Exploding Gradients ===&lt;br /&gt;
&lt;br /&gt;
* ❌ Loss value jumps or becomes NaN&lt;br /&gt;
* 📈 Weights become excessively large&lt;br /&gt;
* 🔁 Training fails to converge&lt;br /&gt;
* 💥 Network outputs explode to very high values&lt;br /&gt;
&lt;br /&gt;
=== 🔧 Solutions ===&lt;br /&gt;
&lt;br /&gt;
Several techniques are commonly used to fix or prevent this issue:&lt;br /&gt;
&lt;br /&gt;
==== 1. Gradient Clipping ====&lt;br /&gt;
&lt;br /&gt;
Limit (or &amp;quot;clip&amp;quot;) the gradient norm to a maximum threshold during backpropagation:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \text{If } \|g\| &amp;gt; \text{threshold, then } g := \frac{\text{threshold}}{\|g\|} \cdot g &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This keeps gradients from becoming too large.&lt;br /&gt;
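&lt;br /&gt;
A minimal NumPy sketch of this clipping rule (an illustration only; deep learning frameworks ship their own built-in clipping utilities):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def clip_by_norm(g, threshold):&lt;br /&gt;
    # rescale g so that its norm never exceeds the threshold&lt;br /&gt;
    norm = np.linalg.norm(g)&lt;br /&gt;
    if norm &amp;gt; threshold:&lt;br /&gt;
        g = (threshold / norm) * g&lt;br /&gt;
    return g&lt;br /&gt;
&lt;br /&gt;
g = np.array([300.0, -400.0])          # example gradient with norm 500&lt;br /&gt;
print(clip_by_norm(g, threshold=5.0))  # [ 3. -4.] -- same direction, norm 5&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;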
&lt;br /&gt;
==== 2. Better Weight Initialization ====&lt;br /&gt;
&lt;br /&gt;
Use techniques like:&lt;br /&gt;
* Xavier initialization for Tanh/Sigmoid&lt;br /&gt;
* He initialization for ReLU&lt;br /&gt;
&lt;br /&gt;
These help control the scale of activations and gradients.&lt;br /&gt;
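&lt;br /&gt;
A minimal NumPy sketch of these two schemes (one common form of each is shown; real frameworks provide their own initializers):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def xavier_init(fan_in, fan_out):&lt;br /&gt;
    # Xavier/Glorot: variance 2 / (fan_in + fan_out), suited to tanh/sigmoid&lt;br /&gt;
    return np.random.randn(fan_out, fan_in) * np.sqrt(2.0 / (fan_in + fan_out))&lt;br /&gt;
&lt;br /&gt;
def he_init(fan_in, fan_out):&lt;br /&gt;
    # He: variance 2 / fan_in, compensates for ReLU zeroing half of its inputs&lt;br /&gt;
    return np.random.randn(fan_out, fan_in) * np.sqrt(2.0 / fan_in)&lt;br /&gt;
&lt;br /&gt;
W = he_init(fan_in=512, fan_out=256)&lt;br /&gt;
print(W.std())   # close to sqrt(2 / 512), i.e. about 0.0625&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;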
&lt;br /&gt;
==== 3. Use Normalization Layers ====&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Batch Normalization&amp;#039;&amp;#039;&amp;#039; helps to keep the network outputs within a stable range.&lt;br /&gt;
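&lt;br /&gt;
A minimal sketch of the normalization step (training-mode only, with the learned scale and shift kept as fixed scalars for simplicity):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):&lt;br /&gt;
    # normalize each feature over the batch, then rescale and shift&lt;br /&gt;
    mean = x.mean(axis=0)&lt;br /&gt;
    var = x.var(axis=0)&lt;br /&gt;
    return gamma * (x - mean) / np.sqrt(var + eps) + beta&lt;br /&gt;
&lt;br /&gt;
x = np.random.randn(32, 4) * 100.0 + 50.0   # activations with a large, shifted scale&lt;br /&gt;
y = batch_norm(x)&lt;br /&gt;
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))   # about 0 mean and 1 std per feature&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;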
&lt;br /&gt;
==== 4. Choose Better Activation Functions ====&lt;br /&gt;
&lt;br /&gt;
ReLU and its variants (Leaky ReLU, ELU) tend to work better in deep networks.&lt;br /&gt;
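&lt;br /&gt;
For reference, the two most common variants look like this (a minimal sketch of the function definitions; ELU is omitted):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def relu(x):&lt;br /&gt;
    return np.maximum(0.0, x)   # zero for negative inputs, identity otherwise&lt;br /&gt;
&lt;br /&gt;
def leaky_relu(x, alpha=0.01):&lt;br /&gt;
    return np.where(x &amp;gt; 0, x, alpha * x)   # small slope instead of a hard zero&lt;br /&gt;
&lt;br /&gt;
x = np.array([-2.0, -0.5, 0.0, 3.0])&lt;br /&gt;
print(relu(x))         # negatives become 0, positives pass through&lt;br /&gt;
print(leaky_relu(x))   # negatives are scaled by alpha instead of zeroed&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;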
&lt;br /&gt;
=== 📚 Summary Table ===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Problem&lt;br /&gt;
! Cause&lt;br /&gt;
! Effect&lt;br /&gt;
! Solution&lt;br /&gt;
|-&lt;br /&gt;
| Exploding Gradient&lt;br /&gt;
| Deep networks, poor initialization&lt;br /&gt;
| Huge weight updates, loss divergence&lt;br /&gt;
| Gradient clipping, normalization, better activation functions&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== 🧠 Difference from Vanishing Gradient ===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Problem&lt;br /&gt;
! Gradient Size&lt;br /&gt;
! Effect&lt;br /&gt;
|-&lt;br /&gt;
| Vanishing Gradient&lt;br /&gt;
| Near zero&lt;br /&gt;
| Training stops (no learning)&lt;br /&gt;
|-&lt;br /&gt;
| Exploding Gradient&lt;br /&gt;
| Extremely large&lt;br /&gt;
| Training blows up (unstable learning)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== 📎 See Also ===&lt;br /&gt;
* [[Vanishing Gradient Problem]]&lt;br /&gt;
* [[Backpropagation]]&lt;br /&gt;
* [[Gradient Clipping]]&lt;br /&gt;
* [[Weight Initialization]]&lt;br /&gt;
* [[ReLU]]&lt;/div&gt;</summary>
		<author><name>Thakshashila</name></author>
	</entry>
</feed>