Backpropagation
Backpropagation (short for "backward propagation of errors") is a fundamental algorithm used to train neural networks. It calculates how much each weight in the network contributed to the total error, then updates the weights to reduce that error.
🧠 Purpose
The main goals of backpropagation are to:
- Minimize the loss function (error) 📉
- Improve model accuracy over time by adjusting weights 🔧
🔁 How It Works (Step-by-Step)
Neural network training has two main phases (a minimal code sketch follows this list):
- Forward pass: inputs go through the network to make a prediction.
- Backward pass (backpropagation):
  - Calculate the error (loss)
  - Compute the gradient (how much each weight affects the loss)
  - Update weights using gradient descent
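The loop below is a minimal sketch of these two phases for a single linear neuron with squared-error loss; the input, target, starting weight, and learning rate are all made-up values for illustration.

```python
# One-weight model y_pred = w * x, trained by repeated
# forward pass -> loss -> gradient -> update cycles.
x, y_true = 1.5, 3.0   # one training example (input, target)
w = 0.2                # arbitrary starting weight
lr = 0.1               # learning rate

for epoch in range(20):
    # Forward pass: make a prediction
    y_pred = w * x
    # Loss: squared error, L = (y_pred - y_true)^2
    loss = (y_pred - y_true) ** 2
    # Backward pass: dL/dw = 2 * (y_pred - y_true) * x (chain rule)
    grad = 2 * (y_pred - y_true) * x
    # Update: gradient descent step
    w -= lr * grad

print(w)  # converges toward y_true / x = 2.0
```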
🧮 Mathematical Explanation
Let:
- L = loss function
- y = actual output
- ŷ = predicted output
- w = weights
- x = inputs
Loss (mean squared error, as used in the example below): L = (y − ŷ)²
Gradient of loss w.r.t. weights: ∂L/∂w
The weights are updated as: w ← w − η · ∂L/∂w
Where:
- η = learning rate 🔧
This update rule is applied to each layer using the chain rule from calculus.
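To make the chain rule concrete, here is a worked single-weight example with made-up numbers. For one input x, pre-activation z = w · x, prediction ŷ = σ(z) (sigmoid), and loss L = (y − ŷ)², the chain rule factors the gradient as:

∂L/∂w = ∂L/∂ŷ · ∂ŷ/∂z · ∂z/∂w = 2(ŷ − y) · σ(z)(1 − σ(z)) · x

With x = 1, w = 0, y = 1, and η = 0.1: z = 0, ŷ = σ(0) = 0.5, so ∂L/∂w = 2(0.5 − 1) · 0.25 · 1 = −0.25, and the update is w ← 0 − 0.1 · (−0.25) = 0.025, which nudges the prediction toward the target.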
📊 Example Workflow
Let’s say we have:
- A network with one hidden layer
- Sigmoid activation
- Mean squared error loss
| Step | Description |
|---|---|
| 1 | Do a forward pass to get the predicted output ŷ |
| 2 | Calculate the error L = (y − ŷ)² |
| 3 | Compute the derivative of the loss with respect to each weight, ∂L/∂w |
| 4 | Update weights: w ← w − η · ∂L/∂w |
| 5 | Repeat this process for many epochs (passes over the data) |
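The sketch below runs this workflow end to end, assuming a tiny XOR dataset, one hidden layer of 4 sigmoid units, and illustrative hyperparameters (the seed, learning rate, and epoch count are arbitrary choices, not part of the recipe above).

```python
# A minimal NumPy sketch of the workflow above: one hidden layer,
# sigmoid activations, mean squared error loss, trained by
# backpropagation. Dataset, layer sizes, and hyperparameters are
# illustrative.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy dataset: XOR (4 examples, 2 inputs, 1 target each)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0.0, 1.0, (2, 4))   # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(0.0, 1.0, (4, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))
lr = 2.0                            # learning rate (eta)

for epoch in range(10000):
    # Step 1: forward pass to get the predicted output
    h = sigmoid(X @ W1 + b1)          # hidden activations
    y_pred = sigmoid(h @ W2 + b2)     # network prediction

    # Step 2: calculate the error (mean squared error)
    loss = np.mean((y_pred - y) ** 2)

    # Step 3: backward pass -- chain rule, layer by layer
    d_out = 2.0 * (y_pred - y) / len(X) * y_pred * (1 - y_pred)
    d_hid = (d_out @ W2.T) * h * (1 - h)

    # Step 4: gradient descent updates, w <- w - lr * dL/dw
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_hid)
    b1 -= lr * d_hid.sum(axis=0, keepdims=True)

# Step 5 was the loop itself: repeat over many epochs
print(np.round(y_pred.ravel(), 2))  # should end up close to [0, 1, 1, 0]
```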
🔧 Backpropagation Uses
- Deep learning (CNNs, RNNs, Transformers)
- Supervised learning tasks (image classification, NLP, etc.)
- Any task where you need to minimize a loss function
💡 Key Concepts
- Chain Rule: Used to pass the gradient from the output layer back to the input layer
- Gradient Descent: Optimizer that uses gradients to minimize loss
- Learning Rate: Controls how big the weight updates are
🚫 Challenges
- Can suffer from the Vanishing Gradient Problem
- Can also face the Exploding Gradient Problem
- Requires good weight initialization and a careful choice of activation functions
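As a small illustration of the vanishing gradient problem: backpropagating through a stack of sigmoid layers multiplies in one sigmoid-derivative factor per layer, and σ′(z) never exceeds 0.25, so the product shrinks geometrically with depth. The depth of 20 below is an arbitrary example.

```python
import math

def sigmoid_derivative(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

# Best case for sigmoid: the derivative peaks at z = 0, where it is 0.25.
grad = 1.0
for layer in range(20):              # 20 stacked sigmoid layers
    grad *= sigmoid_derivative(0.0)

print(grad)  # 0.25**20, about 9.1e-13 -- the early layers barely learn
```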
📚 Summary Table
| Concept | Meaning |
|---|---|
| Backpropagation | Algorithm for updating weights based on error |
| Gradient | Direction and size of the weight adjustment |
| Chain Rule | Math rule used to calculate gradients in multi-layer networks |
| Loss Function | Measures how wrong the prediction is |