== Backpropagation ==

'''Backpropagation''' (short for "backward propagation of errors") is a fundamental algorithm used to train neural networks. It calculates how much each weight in the network contributed to the total error and updates the weights to reduce that error.

=== 🧠 Purpose ===

The main goals of backpropagation are to:
* Minimize the '''loss function''' (error) 📉
* Improve model accuracy over time by adjusting the weights 🔧

=== 🔁 How It Works (Step-by-Step) ===

Neural network training has two main steps:

# '''Forward pass''': Inputs flow through the network to produce a prediction.
# '''Backward pass (backpropagation)''':
## Calculate the error (loss)
## Compute the gradient (how much each weight affects the loss)
## Update the weights using gradient descent (a minimal sketch follows this list)
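Here is a minimal, runnable sketch of that loop for a single linear neuron <math>\hat{y} = w \cdot x</math>. The training example, initial weight, learning rate, and step count are illustrative values chosen for this sketch, not anything prescribed by the algorithm itself.

<syntaxhighlight lang="python">
# Minimal single-weight illustration of the two training steps.
# Model: y_hat = w * x, loss L = 0.5 * (y - y_hat) ** 2.
x, y = 2.0, 1.0   # one training example (illustrative values)
w = 0.3           # initial weight
eta = 0.1         # learning rate

for step in range(50):
    y_hat = w * x                  # forward pass: make a prediction
    loss = 0.5 * (y - y_hat) ** 2  # backward pass 1: calculate the error
    grad = (y_hat - y) * x         # backward pass 2: gradient dL/dw
    w -= eta * grad                # backward pass 3: gradient-descent update

print(w, loss)  # w approaches y / x = 0.5 and the loss approaches 0
</syntaxhighlight>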

=== 🧮 Mathematical Explanation ===

Let:
* <math>L</math> = loss function
* <math>y</math> = actual output
* <math>\hat{y}</math> = predicted output
* <math>w</math> = weights
* <math>x</math> = inputs

Loss:
:<math>L = \frac{1}{2}(y - \hat{y})^2</math>

Gradient of the loss with respect to the weights:
:<math>\frac{\partial L}{\partial w}</math>

The weights are updated as:
:<math>w \leftarrow w - \eta \cdot \frac{\partial L}{\partial w}</math>

where:
* <math>\eta</math> = learning rate 🔧

This update rule is applied to each layer, using the chain rule from calculus to obtain the gradients.
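To make the update concrete, take the same illustrative numbers as the sketch above: a single linear neuron <math>\hat{y} = w \cdot x</math> with <math>x = 2</math>, <math>w = 0.3</math>, <math>y = 1</math>, and <math>\eta = 0.1</math>. The chain rule gives

:<math>\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w} = (\hat{y} - y) \cdot x = (0.6 - 1) \cdot 2 = -0.8</math>

so one update step yields <math>w \leftarrow 0.3 - 0.1 \cdot (-0.8) = 0.38</math>, and the loss drops from <math>0.08</math> to <math>0.0288</math> on the next forward pass.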

=== 📊 Example Workflow ===

Let's say we have:
* A network with one hidden layer
* Sigmoid activation
* Mean squared error loss

{| class="wikitable"
! Step
! Description
|-
| 1
| Do a forward pass to get the predicted output <math>\hat{y}</math>
|-
| 2
| Calculate the error <math>L = \frac{1}{2}(y - \hat{y})^2</math>
|-
| 3
| Compute the derivative of the loss with respect to each weight
|-
| 4
| Update the weights: <math>w \leftarrow w - \eta \cdot \frac{\partial L}{\partial w}</math>
|-
| 5
| Repeat this process for many epochs (passes over the data)
|}
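The sketch below follows this exact workflow (one hidden layer, sigmoid activations, mean squared error) in plain NumPy. The XOR training set, layer sizes, random seed, learning rate, and epoch count are all illustrative choices made for this sketch, and convergence on XOR depends on the initialization; this is one possible implementation, not a canonical one.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: XOR (an illustrative choice that requires the hidden layer).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=0.5, size=(2, 4))  # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))  # hidden -> output weights
b2 = np.zeros((1, 1))
eta = 0.5                                # learning rate
n = X.shape[0]

for epoch in range(20000):
    # Step 1: forward pass.
    h = sigmoid(X @ W1 + b1)      # hidden activations
    y_hat = sigmoid(h @ W2 + b2)  # predicted outputs

    # Step 2: mean squared error, L = mean of 0.5 * (y - y_hat)^2.
    loss = 0.5 * np.mean((y - y_hat) ** 2)

    # Step 3: derivatives via the chain rule; sigmoid'(z) = s * (1 - s).
    delta2 = (y_hat - y) * y_hat * (1 - y_hat)  # error signal at the output
    delta1 = (delta2 @ W2.T) * h * (1 - h)      # error pushed back to hidden layer

    # Step 4: gradient-descent weight updates (averaged over the batch).
    W2 -= eta * (h.T @ delta2) / n
    b2 -= eta * delta2.mean(axis=0, keepdims=True)
    W1 -= eta * (X.T @ delta1) / n
    b1 -= eta * delta1.mean(axis=0, keepdims=True)

# Step 5 was the loop itself: many epochs over the data.
print(f"loss after training: {loss:.4f}")
print("predictions:", y_hat.round(2).ravel())
</syntaxhighlight>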

=== 🔧 Backpropagation Uses ===

* Deep learning (CNNs, RNNs, Transformers)
* Supervised learning tasks (image classification, NLP, etc.)
* Any task where you need to minimize a differentiable loss function

=== 💡 Key Concepts ===

* Chain rule: used to propagate the gradient from the output layer back toward the input layer (see the factorization just below)
* Gradient descent: the optimizer that uses the gradients to minimize the loss
* Learning rate: controls how large the weight updates are
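For a two-layer network, writing <math>h</math> for the hidden activation (a symbol introduced here just for this illustration), the chain rule factors the gradient of a first-layer weight <math>w_1</math> into one local derivative per layer:

:<math>\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial h} \cdot \frac{\partial h}{\partial w_1}</math>

Because each factor involves only one layer, the gradient can be computed in a single backward sweep, reusing the factors already computed for the later layers.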

=== 🚫 Challenges ===

* Can suffer from the [[Vanishing Gradient Problem]] (illustrated just below)
* Can also face the [[Exploding Gradient Problem]]
* Requires good weight initialization and a careful choice of activation functions
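The vanishing-gradient issue follows directly from the chain rule: the sigmoid derivative <math>s(1 - s)</math> is at most <math>0.25</math>, and the backward pass multiplies one such factor per sigmoid layer. The snippet below (depths chosen arbitrarily for illustration) shows this best-case product collapsing toward zero as depth grows.

<syntaxhighlight lang="python">
# Best case for sigmoid: each layer contributes a factor of at most 0.25
# (the maximum of s * (1 - s), reached at s = 0.5), so the gradient that
# reaches the earliest layer shrinks at least as fast as 0.25 ** depth.
for depth in (2, 5, 10, 20):
    print(depth, 0.25 ** depth)
# 2  0.0625
# 5  0.0009765625
# 10 9.5367431640625e-07
# 20 9.094947017729282e-13
</syntaxhighlight>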

=== 📚 Summary Table ===

{| class="wikitable"
! Concept
! Meaning
|-
| Backpropagation
| Algorithm for updating the weights based on the error
|-
| Gradient
| Direction and size of the weight adjustment
|-
| Chain Rule
| Calculus rule used to compute gradients in multi-layer networks
|-
| Loss Function
| Measures how wrong the prediction is
|}

=== 📎 See Also ===

* [[Gradient Descent]]
* [[Loss Function]]
* [[Activation Functions]]
* [[Vanishing Gradient Problem]]
* [[Exploding Gradient Problem]]
* [[Neural Networks]]