Type a sentence and explore how attention focuses on context. Toggle heads/layers, hover tokens to see attention weights, and switch between self- and cross-attention.
Hover a token to highlight where it sends attention (rows) or receives attention (columns).
Use Temperature: lower → sharper attention, higher → flatter weights (standard convention, where scores are divided by the temperature before the softmax).
Use Auto-play to cycle through heads/layers and notice complementary patterns.
What’s happening under the hood — and why it matters
What’s happening
Query/Key similarity: Each head projects tokens into query and key vectors and measures their similarity with a scaled dot product. Each heatmap cell is the softmax-normalized similarity between the row token's query and the column token's key.
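Softmax weighting: Each row sums to 1. Under the standard convention (scores divided by the temperature before the softmax), a low temperature sharpens these weights and a high temperature smooths them.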
Heads focus differently: One head may track word identity (“the→the”), another syntax (“cat→sat”), another positions (“on→the”).
Layers compose: Earlier layers tend to capture local links; later layers tend to aggregate longer-range dependencies.
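The query/key and softmax steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind this tool: the function names, the toy vectors, and the choice to divide scores by the temperature (so that lower values sharpen) are all assumptions.

```python
import math

def softmax(scores, temperature=1.0):
    """Softmax over one row of scores, dividing by the temperature first (assumed convention)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys, temperature=1.0):
    """Return one row of weights per query, using scaled dot-product similarity."""
    d = len(keys[0])
    rows = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        rows.append(softmax(scores, temperature))
    return rows

# Toy 2-d query/key vectors for three tokens (made-up numbers).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

sharp = attention_weights(Q, K, temperature=0.5)  # more peaked rows
flat = attention_weights(Q, K, temperature=2.0)   # more uniform rows
```

Each row of `sharp` and `flat` sums to 1, which is what one row of the heatmap shows; only how concentrated the row is changes with the temperature.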
Interacting with the view
Switch Self vs Cross to see intra‑sentence vs encoder→decoder patterns.
Move across heads/layers to observe complementary attention behaviors.
Hover tokens to highlight the row/column and read off weight distribution.
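Reading a row versus a column of the heatmap can be sketched as follows. The matrix below is hypothetical (hand-picked numbers, not output from any model), and `sent_by` and `received_by` are illustrative helper names. For cross-attention the same idea applies, except that rows would be decoder tokens and columns encoder tokens, so the matrix need not be square.

```python
# Hypothetical self-attention weights: rows = attending token, columns = attended token.
tokens = ["the", "cat", "sat"]
weights = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
]

def sent_by(token):
    """Row for `token`: how it distributes its attention over all tokens (sums to 1)."""
    return weights[tokens.index(token)]

def received_by(token):
    """Column for `token`: how much attention every token sends to it (need not sum to 1)."""
    col = tokens.index(token)
    return [row[col] for row in weights]

row = sent_by("cat")         # what "cat" attends to
column = received_by("cat")  # which tokens attend to "cat"
```

Rows are normalized by the softmax, while columns are not, so a column can add up to more or less than 1.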
Why this is important
Interpretability: Attention offers an intuitive window into what a model “looks at” when forming context.
Debugging: Mismatches (e.g., attention stuck on punctuation) can reveal tokenization or context issues.
Quality signals: Healthy heads often show consistent, meaningful patterns (e.g., determiners attending to nouns, verbs attending to subjects/objects).
Education: Seeing heads and layers evolve turns the math of attention into actionable intuition.
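As a rough illustration of the debugging and quality-signal points above, two simple per-row statistics can be computed from a row of attention weights: the share of attention landing on punctuation, and the entropy of the row. Both helpers are hypothetical, the example row is made up, and no particular threshold is implied; they only show one way such checks might be expressed.

```python
import math

PUNCTUATION = {".", ",", ";", ":", "!", "?"}

def punctuation_mass(tokens, row):
    """Fraction of one token's attention that lands on punctuation tokens."""
    return sum(w for t, w in zip(tokens, row) if t in PUNCTUATION)

def row_entropy(row):
    """Shannon entropy (in nats) of one attention row: 0 = fully peaked, log(n) = uniform."""
    return -sum(w * math.log(w) for w in row if w > 0)

tokens = ["the", "cat", "sat", "."]
stuck_row = [0.05, 0.05, 0.10, 0.80]    # made-up row concentrated on the final "."
uniform_row = [0.25, 0.25, 0.25, 0.25]  # made-up row spread evenly

mass = punctuation_mass(tokens, stuck_row)
peaked = row_entropy(stuck_row)
spread = row_entropy(uniform_row)
```

A high punctuation share combined with low entropy across many rows would be consistent with attention being "stuck", though interpreting it still depends on the model and the head.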
Where you’ll see this
Language models (GPT/BERT family) attending to entities, coreferences, and syntax.
Translation: decoder cross‑attention aligning target words with source phrases.
Vision and audio transformers: attention over patches or time frames.