Gradient Descent
Gradient Descent is an optimization algorithm used in machine learning and deep learning to minimize the cost (loss) function by iteratively updating model parameters in the direction of steepest descent, i.e., the negative gradient.
What is Gradient Descent?
Gradient Descent helps find the best-fit parameters (like weights in a neural network or coefficients in regression) that minimize the error between predicted and actual values. It does this by adjusting the parameters gradually to reduce the loss.
The Basic Formula
θ := θ − α ∇J(θ)

Where:
- θ = model parameters (weights)
- α = learning rate (step size)
- J(θ) = cost/loss function
- ∇J(θ) = gradient (slope) of the loss with respect to the parameters
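As a concrete illustration of this update rule, here is a minimal sketch in Python, assuming a toy one-dimensional loss f(θ) = θ²; the starting point, learning rate, and step count are arbitrary values chosen for the example.

```python
# Gradient descent on f(theta) = theta**2, whose gradient is 2*theta.
# The minimum is at theta = 0; the values below are illustrative only.
theta = 5.0   # arbitrary starting point
alpha = 0.1   # learning rate (step size)

for step in range(50):
    grad = 2 * theta              # gradient of the loss at the current theta
    theta = theta - alpha * grad  # the update rule from above

print(theta)  # very close to 0 (about 5 * 0.8**50)
```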
Types of Gradient Descent
1. Batch Gradient Descent
- Uses the entire training dataset to compute the gradient.
- Stable but slow on large datasets.
2. Stochastic Gradient Descent (SGD)
- Updates weights for each training example.
- Faster but can be noisy and less stable.
3. Mini-Batch Gradient Descent
- Uses a subset (mini-batch) of training data to compute each update.
- Combines advantages of both batch and SGD.
- Commonly used in deep learning; the sketch after this list contrasts all three variants.
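The three variants differ only in how many training examples feed each gradient computation. The following sketch performs one update of each on toy data; the arrays X and y, the mse_gradient helper, and the batch size of 16 are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # toy feature matrix (assumed for illustration)
y = X @ np.array([1.0, -2.0, 0.5])   # toy targets from known weights
w = np.zeros(3)                      # model parameters
alpha = 0.01                         # learning rate

def mse_gradient(X_batch, y_batch, w):
    """Gradient of mean squared error for linear predictions X_batch @ w."""
    error = X_batch @ w - y_batch
    return 2 * X_batch.T @ error / len(y_batch)

# 1. Batch gradient descent: the entire dataset per update.
w -= alpha * mse_gradient(X, y, w)

# 2. Stochastic gradient descent: a single example per update.
i = rng.integers(len(y))
w -= alpha * mse_gradient(X[i:i+1], y[i:i+1], w)

# 3. Mini-batch gradient descent: a small random subset per update.
idx = rng.choice(len(y), size=16, replace=False)
w -= alpha * mse_gradient(X[idx], y[idx], w)
```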
Learning Rate (α)
The learning rate α controls the size of each update step.
- If α is too small: convergence is slow.
- If α is too large: updates may overshoot the minimum or diverge.
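The sketch below shows both failure modes on the same toy loss f(θ) = θ² used earlier; the three learning rates and the step count are arbitrary values chosen to make the contrast visible.

```python
# Effect of the learning rate on gradient descent for f(theta) = theta**2.
for alpha in (0.001, 0.1, 1.1):  # too small, reasonable, too large
    theta = 5.0
    for _ in range(30):
        theta -= alpha * 2 * theta   # gradient of theta**2 is 2*theta
    print(f"alpha={alpha}: theta after 30 steps = {theta:.4f}")

# alpha=0.001 barely moves, alpha=0.1 converges near 0, and
# alpha=1.1 overshoots: |theta| grows by a factor of 1.2 each step.
```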
Example
Suppose we are minimizing the Mean Squared Error (MSE) in linear regression. Gradient descent updates the weights so that the predicted line fits the data points better over time.
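A minimal end-to-end sketch of this example follows; the synthetic data (a noisy line y = 3x + 2), the learning rate, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic 1-D data: y = 3x + 2 plus noise (values assumed for illustration).
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=200)
y = 3 * x + 2 + 0.1 * rng.normal(size=200)

w, b = 0.0, 0.0   # slope and intercept to learn
alpha = 0.1       # learning rate

for _ in range(500):
    error = (w * x + b) - y
    # Gradients of MSE = mean(error**2) with respect to w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    w -= alpha * grad_w
    b -= alpha * grad_b

print(w, b)  # approaches roughly 3 and 2 as the line fits the data
```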
Visualization
Imagine a ball rolling down a curved surface toward the lowest point (the minimum). Gradient descent works the same way: at each step it calculates the local slope and moves the parameters downhill.
Applications of Gradient Descent
- Training machine learning models (e.g., linear/logistic regression)
- Optimizing deep learning models (e.g., neural networks)
- Powering applications in NLP, computer vision, recommendation systems, etc.
Related Concepts
- Learning Rate
- Loss Function
- Optimization Algorithms
- Backpropagation
- Stochastic Gradient Descent
- Neural Networks