🧭 Gradient Descent Visualizer

See how different optimizers move across a loss landscape. Adjust learning rate, momentum, and other hyperparameters and observe their effect on convergence.


Interactive demo (summary): a clickable loss-landscape contour plot (click to set a new starting point w0, w1), a loss-over-steps chart, and live readouts for the current step, loss, and gradient norm ||Grad||. Control panels: 🎯 Objective Function (domain: w0, w1 in [-3, 3]), ⚙️ Optimizer (learning rate, default 0.05; epsilon ε, default 1e-8; plus method-specific settings), 🎛️ Initialization, and 🌪️ Stochasticity & Steps (gradient noise, default 0.00).

What is Gradient Descent, and Why Is It Used?
What is it?

Gradient Descent is an iterative method to minimize a loss function by moving parameters in the direction of steepest descent (negative gradient). Each update nudges the parameters toward lower loss.
Update (conceptual): θ ← θ − η · ∇L(θ)

  • Loss landscape: Map of loss values across parameters
  • Learning rate (η): Controls how big each step is
  • Path: The sequence of parameter positions over steps
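
To make the update rule concrete, here is a minimal sketch of plain gradient descent on a convex bowl L(w0, w1) = w0² + w1², using the default learning rate of 0.05 and an arbitrary start inside the [-3, 3] domain. It is an illustrative example, not the visualizer's own code.

    import numpy as np

    # Illustrative sketch (not the visualizer's code): plain gradient descent
    # on the convex bowl L(w0, w1) = w0^2 + w1^2.
    def loss(w):
        return w[0] ** 2 + w[1] ** 2

    def grad(w):
        return np.array([2.0 * w[0], 2.0 * w[1]])  # analytic gradient: dL/dw = 2w

    eta = 0.05                    # learning rate (the slider's default)
    w = np.array([2.5, -1.5])     # arbitrary start inside the [-3, 3] domain

    for step in range(100):
        w = w - eta * grad(w)     # theta <- theta - eta * grad L(theta)

    print(w, loss(w))             # w heads toward (0, 0) and the loss toward 0
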
Why it's used
  • Versatility: Trains many models (regression, neural networks, etc.)
  • Scalability: Stochastic variants handle large datasets efficiently
  • Practicality: Extensions (Momentum, RMSProp, Adam) improve stability and speed
  • Non-convex optimization: Navigates valleys, saddles, and plateaus in real problems

With a suitable learning rate, gradient descent converges to the global minimum on convex problems; on non-convex problems it often finds good, if not globally optimal, solutions. Choosing the learning rate and optimizer well is therefore crucial.

About this Visualizer

This playground shows how optimization algorithms (SGD, Momentum, RMSProp, Adam) follow the gradient to minimize a loss function. The contour plot is a map of the loss: darker/bluer regions are lower values, brighter/yellow regions are higher values.

What you can control
  • Objective: Choose different surfaces (convex, saddle, tricky valleys)
  • Optimizer: Pick SGD, Momentum, RMSProp, or Adam
  • Hyperparameters: Learning rate and method-specific parameters
  • Noise: Add randomness to emulate stochastic gradients
  • Start point: Click on the plot or type w0/w1
Reading the visuals
  • Path: The polyline shows how parameters move step-by-step
  • Loss curve: Should trend down if optimization is working
  • Gradient norm: Indicates how steep the surface is locally
Optimizer insights
  • SGD: Simple and fast; can zig-zag in narrow valleys
  • Momentum: Smooths updates; helps escape shallow regions
  • RMSProp: Adapts step size per-parameter; good on noisy gradients
  • Adam: Combines Momentum and RMSProp; often converges faster (single-step update rules for all four are sketched below)
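
For reference, here is a compact sketch of the standard single-step update rules behind these four optimizers. Hyperparameter names (beta, rho, b1, b2) follow common conventions and are not tied to this tool's exact implementation.

    import numpy as np

    # One update step per optimizer; w = parameters, g = gradient at w.
    def sgd(w, g, lr=0.05):
        return w - lr * g

    def momentum(w, g, v, lr=0.05, beta=0.9):
        v = beta * v + g                    # accumulate a velocity vector
        return w - lr * v, v

    def rmsprop(w, g, s, lr=0.05, rho=0.9, eps=1e-8):
        s = rho * s + (1 - rho) * g**2      # running average of squared gradients
        return w - lr * g / (np.sqrt(s) + eps), s

    def adam(w, g, m, v, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
        m = b1 * m + (1 - b1) * g           # first moment (momentum term)
        v = b2 * v + (1 - b2) * g**2        # second moment (RMSProp term)
        m_hat = m / (1 - b1**t)             # bias correction, t = step count >= 1
        v_hat = v / (1 - b2**t)
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

In each case the carried state (velocity v, running average s, or moments m and v) is what lets the method smooth or rescale the raw gradient between steps.
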
Troubleshooting
  • Diverging: Lower learning rate
  • Slow progress: Increase learning rate slightly or try Adam
  • Stuck at a saddle: Add gradient noise or use Momentum/Adam

Support This Free Tool

Every coffee helps keep the servers running. Every book sale funds the next tool I'm dreaming up. You're not just supporting a site — you're helping me build what developers actually need.


About This Tool & Methodology

This visualizer simulates gradient descent (batch and stochastic variants) on configurable cost surfaces. It computes gradients analytically or numerically and updates parameters with a user‑selected learning rate and momentum. Plots are rendered in your browser.
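
For the numerical case, a central finite-difference approximation is the usual approach. The sketch below is an assumed illustration of that idea, not this site's actual source code.

    import numpy as np

    # Central finite-difference gradient (assumed sketch, not the site's code).
    def numerical_grad(loss, w, h=1e-5):
        g = np.zeros_like(w, dtype=float)
        for i in range(len(w)):
            w_plus, w_minus = w.copy(), w.copy()
            w_plus[i] += h
            w_minus[i] -= h
            g[i] = (loss(w_plus) - loss(w_minus)) / (2 * h)
        return g

    # Example: gradient of L(w) = w0^2 + 3*w1^2 at (1, 1) is approximately (2, 6).
    print(numerical_grad(lambda w: w[0]**2 + 3 * w[1]**2, np.array([1.0, 1.0])))
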

Learning Outcomes

  • See how learning rate and momentum affect convergence and oscillations.
  • Understand local minima, saddle points, and plateau behavior.
  • Contrast batch vs stochastic updates and their noise/variance (a small example is sketched below).
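
To make the batch-vs-stochastic contrast concrete, here is a small hypothetical least-squares example (synthetic data, not anything produced by the tool): the batch gradient averages over every sample, while the stochastic variant estimates it from one random sample per step, which is cheaper per step but noisier.

    import numpy as np

    # Hypothetical least-squares problem with known true parameters (1.5, -2.0).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))
    y = X @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=1000)

    def batch_grad(w):
        # Exact mean-squared-error gradient, averaged over all 1000 samples.
        return 2 * X.T @ (X @ w - y) / len(y)

    def stochastic_grad(w):
        # Noisy estimate from a single random sample: cheap but high variance.
        i = rng.integers(len(y))
        return 2 * X[i] * (X[i] @ w - y[i])

    w = np.zeros(2)
    for step in range(2000):
        w -= 0.01 * stochastic_grad(w)      # swap in batch_grad(w) for smooth steps
    print(w)                                # roughly approaches (1.5, -2.0), with jitter
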

Authorship & Review

  • Author: 8gwifi.org engineering team
  • Reviewed by: Anish Nath
  • Last updated: 2025-11-19

Trust & Privacy

  • All simulations run locally; no data is uploaded.