Backpropagation

Backpropagation (short for "backward propagation of errors") is a fundamental algorithm used to train neural networks. It calculates how much each weight in the network contributed to the total error and updates the weights to reduce that error.

🧠 Purpose

The main goals of backpropagation are to:

  • Minimize the loss function (error) 📉
  • Improve model accuracy over time by adjusting weights 🔧

🔁 How It Works (Step-by-Step)

Neural network training has two main steps:

  1. Forward pass: Inputs go through the network to make a prediction.
  2. Backward pass (Backpropagation):
    1. Calculate the error (loss)
    2. Compute the gradient (how much each weight affects the loss)
    3. Update weights using gradient descent
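
To make the two passes concrete, here is a minimal Python sketch for a single sigmoid neuron; the input, target, weight, and learning rate values are made up for illustration.

  import math

  x, y = 1.5, 0.0    # input and target (arbitrary values)
  w, eta = 0.8, 0.1  # weight and learning rate

  # 1. Forward pass: the input flows through the neuron to make a prediction
  y_hat = 1 / (1 + math.exp(-w * x))             # sigmoid(w * x)

  # 2. Backward pass (backpropagation):
  loss = 0.5 * (y - y_hat) ** 2                  # 2a. calculate the error (loss)
  dL_dw = (y_hat - y) * y_hat * (1 - y_hat) * x  # 2b. gradient via the chain rule
  w = w - eta * dL_dw                            # 2c. update the weight (gradient descent)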

🧮 Mathematical Explanation

Let:

  • L = Loss function
  • y = Actual output
  • ŷ = Predicted output
  • w = Weights
  • x = Inputs

Loss:

L = ½(y − ŷ)²

Gradient of loss w.r.t. weights:

∂L/∂w

The weights are updated as:

w = w − η·∂L/∂w

Where:

  • η = learning rate 🔧

This update rule is applied to each layer using the chain rule from calculus.
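
A standard way to check that a chain-rule gradient is correct is to compare it against a numerical (finite-difference) estimate of ∂L/∂w. This sketch reuses the single-neuron setup from above; all values are arbitrary.

  import math

  def loss(w, x=1.5, y=0.0):
      y_hat = 1 / (1 + math.exp(-w * x))  # sigmoid(w * x)
      return 0.5 * (y - y_hat) ** 2

  w, x = 0.8, 1.5
  y_hat = 1 / (1 + math.exp(-w * x))
  analytic = (y_hat - 0.0) * y_hat * (1 - y_hat) * x  # chain rule: dL/dw

  eps = 1e-6
  numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
  print(abs(analytic - numeric) < 1e-8)  # True: the two estimates agree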

📊 Example Workflow

Let’s say we have:

  • A network with one hidden layer
  • Sigmoid activation
  • Mean squared error loss

Step  Description
----  -----------
1     Do a forward pass to get the predicted output ŷ
2     Calculate the error L = ½(y − ŷ)²
3     Compute the derivative of the loss with respect to each weight (∂L/∂w)
4     Update the weights: w = w − η·∂L/∂w
5     Repeat this process for many epochs (passes over the data)
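
Putting the five steps together, here is a small NumPy sketch of this exact workflow (one hidden layer, sigmoid activations, mean squared error loss); the data, layer sizes, epoch count, and learning rate are arbitrary choices for illustration.

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(size=(8, 2))                  # 8 samples, 2 features (toy data)
  y = (X[:, :1] + X[:, 1:] > 0).astype(float)  # made-up binary target, shape (8, 1)

  W1 = rng.normal(scale=0.5, size=(2, 3))      # input  -> hidden weights
  W2 = rng.normal(scale=0.5, size=(3, 1))      # hidden -> output weights
  eta = 0.5                                    # learning rate

  def sigmoid(z):
      return 1 / (1 + np.exp(-z))

  for epoch in range(2000):                    # step 5: repeat for many epochs
      # Step 1: forward pass to get the predicted output y_hat
      h = sigmoid(X @ W1)
      y_hat = sigmoid(h @ W2)

      # Step 2: calculate the error, L = mean((y - y_hat)^2)
      loss = np.mean((y - y_hat) ** 2)

      # Step 3: derivative of the loss with respect to each weight (chain rule)
      d_out = 2 * (y_hat - y) / len(X) * y_hat * (1 - y_hat)  # gradient at the output
      dW2 = h.T @ d_out
      d_hid = (d_out @ W2.T) * h * (1 - h)     # gradient passed back to the hidden layer
      dW1 = X.T @ d_hid

      # Step 4: update the weights, w = w - eta * dL/dw
      W2 -= eta * dW2
      W1 -= eta * dW1

Each d_* line is one application of the chain rule, carrying the gradient one layer closer to the input; deep learning frameworks automate exactly this bookkeeping.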

🔧 Backpropagation Uses

  • Deep learning (CNNs, RNNs, Transformers)
  • Supervised learning tasks (image classification, NLP, etc.)
  • Any task where you need to minimize a loss function

💡 Key Concepts

  • Chain Rule: Used to pass the gradient from the output layer back to the input layer
  • Gradient Descent: Optimizer that uses gradients to minimize loss
  • Learning Rate: Controls how big the weight updates are
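
Gradient descent and the learning rate can be seen in isolation by minimizing a simple one-dimensional loss; the function L(w) = (w − 3)² and all numbers below are arbitrary.

  w = 0.0    # starting weight
  eta = 0.1  # learning rate: controls how big each update is
  for step in range(50):
      grad = 2 * (w - 3)  # dL/dw for L(w) = (w - 3)^2
      w -= eta * grad     # the same update rule backpropagation performs
  print(round(w, 3))      # ~3.0: gradient descent reached the minimum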

🚫 Challenges

  • Vanishing gradients: with saturating activations such as sigmoid, repeated chain-rule factors can shrink the gradient toward zero in early layers
  • Exploding gradients: the opposite problem, where gradients grow very large and destabilize training
  • Learning rate sensitivity: too large a rate diverges, too small a rate trains very slowly
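
A quick illustration of the vanishing-gradient problem: the sigmoid derivative never exceeds 0.25, so the chain rule multiplies in a factor of at most 0.25 per layer, and the upper bound on the gradient shrinks geometrically with depth (the depths below are arbitrary).

  MAX_SIGMOID_DERIV = 0.25   # sigmoid'(z) peaks at 0.25 (at z = 0)
  for depth in (2, 5, 10, 20):
      bound = MAX_SIGMOID_DERIV ** depth
      print(depth, bound)    # 0.0625, ~9.8e-4, ~9.5e-7, ~9.1e-13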

📚 Summary Table

Concept          Meaning
-------          -------
Backpropagation  Algorithm for updating weights based on error
Gradient         Direction and size of the weight adjustment
Chain Rule       Math rule used to calculate gradients in multi-layer networks
Loss Function    Measures how wrong the prediction is

📎 See Also

  • Gradient descent
  • Chain rule
  • Loss function
  • Neural network