⚖️ Imbalanced Learning Workshop

Handle rare events with resampling (undersample, oversample, SMOTE), class weighting, focal loss, and threshold tuning. Inspect PR/ROC curves and the confusion matrix.


Beginner Quick Start
  1. Generate data: Set Total samples, Pos % (minority prevalence), and Noise, then click Generate.
  2. Choose resampling (optional): Undersample/Oversample/SMOTE to balance the training set, then click Apply Resampling.
  3. Train: Use Train Step or Train 100. Increase Class weight (pos) or Focal γ if the model ignores positives.
  4. Decide: Move the Threshold slider to trade precision vs recall. Watch the confusion matrix and PR/ROC curves update.
Tips: PR curve (left) is more informative for imbalance. The colored background shows predicted probability; the dashed line is the decision boundary.
Dataset & Model
Classes: Positive (1), Negative (0)
Tip: Click the canvas to add a point to the selected class. Apply resampling to adjust training distribution.
Live metrics: Precision, Recall, F1, PR AUC
Confusion matrix (rows: Actual 0 / Actual 1; columns: Pred 0 / Pred 1)
Threshold: 0.50
Adjust the decision threshold to trade precision vs recall.
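A minimal sketch, in plain Python with made-up labels and probabilities, of how the threshold slider turns predicted probabilities into confusion-matrix counts:

```python
# Sketch: convert predicted probabilities into TP/FP/FN/TN counts
# at a chosen threshold. The data below is illustrative only.

def confusion_at_threshold(y_true, probs, threshold=0.5):
    """Count confusion-matrix cells for binary labels and probabilities."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}

y_true = [1, 0, 0, 1, 0, 0, 0, 1]
probs  = [0.9, 0.4, 0.2, 0.6, 0.7, 0.1, 0.3, 0.45]

# Lowering the threshold converts a false negative into a true positive
# (recall up) but also lets in another false positive (precision down).
strict = confusion_at_threshold(y_true, probs, 0.5)  # {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 4}
loose  = confusion_at_threshold(y_true, probs, 0.4)  # {'tp': 3, 'fp': 2, 'fn': 0, 'tn': 3}
```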
Precision–Recall & ROC
The PR curve is more informative for imbalanced data.
On the ROC plot, the diagonal line represents a random classifier.
📦 Dataset
Pos % controls the rarity of the positive class. Higher Noise makes classes overlap and the task harder.
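A sketch of this generation step: two Gaussian clusters, with Pos % setting minority prevalence and Noise setting the spread. The cluster centers below are arbitrary choices for illustration, not the tool's actual values.

```python
# Draw n points from two 2-D Gaussian clusters; "pos_pct" controls how
# rare the positive class is, "noise" controls overlap between classes.
import random

def make_dataset(n=200, pos_pct=0.10, noise=0.6, seed=0):
    rng = random.Random(seed)
    n_pos = max(1, round(n * pos_pct))
    data = []
    for i in range(n):
        label = 1 if i < n_pos else 0
        cx, cy = (1.5, 1.5) if label == 1 else (-1.0, -1.0)  # assumed centers
        data.append((cx + rng.gauss(0, noise), cy + rng.gauss(0, noise), label))
    rng.shuffle(data)
    return data

data = make_dataset(n=200, pos_pct=0.10, noise=0.6)
n_pos = sum(1 for _, _, lab in data if lab == 1)  # 20 positives out of 200
```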
🔁 Resampling
Undersample = faster but may drop information. Oversample = simple duplication. SMOTE = creates synthetic minority points between neighbors.
Target Pos %: 40
Resampling affects training distribution only; evaluation uses all points.
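The core SMOTE step can be sketched as follows: each synthetic minority point is placed at a random position on the segment between a minority sample and one of its minority-class nearest neighbors. This is an illustrative fragment, not the full SMOTE algorithm.

```python
# SMOTE-style interpolation: new point = x + lam * (neighbor - x),
# with lam drawn uniformly from [0, 1). Points are 2-D tuples.
import random

def smote_points(minority, k=2, n_new=4, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority-class neighbors of x (excluding x itself)
        nbrs = sorted((p for p in minority if p != x),
                      key=lambda p: (p[0]-x[0])**2 + (p[1]-x[1])**2)[:k]
        nn = rng.choice(nbrs)
        lam = rng.random()
        out.append((x[0] + lam*(nn[0]-x[0]), x[1] + lam*(nn[1]-x[1])))
    return out

minority = [(1.0, 1.2), (1.4, 0.9), (0.8, 1.5), (1.6, 1.4)]
synthetic = smote_points(minority, k=2, n_new=4)
```

Because each synthetic point lies on a segment between two real minority points, it always falls inside the minority class's bounding box.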
🤖 Model (Logistic)
LR: 0.10
Epochs/Step: 10
Class weight (pos): 3.0
Focal γ: 0.0
Class weight (pos) increases penalty for missing positives (↑ recall). Focal γ focuses learning on hard examples (try 1.5–2.0 for heavy imbalance).
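Conceptually, these two controls modify the per-example loss to −w · (1 − p_t)^γ · log(p_t), where w is the positive-class weight and p_t is the probability assigned to the true class. A sketch under that assumption (the tool's exact internals are not shown):

```python
# Weighted focal loss for one example: the class weight upweights
# positives, while (1 - p_t)^gamma down-weights easy examples.
import math

def weighted_focal_loss(p, y, pos_weight=3.0, gamma=2.0):
    p = min(max(p, 1e-7), 1 - 1e-7)       # clamp for numerical safety
    p_t = p if y == 1 else 1 - p          # probability of the true class
    w = pos_weight if y == 1 else 1.0     # penalize missed positives more
    return -w * (1 - p_t) ** gamma * math.log(p_t)

# A confidently correct positive contributes almost nothing to the loss,
# while a missed positive dominates it.
easy = weighted_focal_loss(0.95, 1)
hard = weighted_focal_loss(0.10, 1)
```

With pos_weight = 1 and γ = 0 this reduces to plain binary cross-entropy.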

About Imbalanced Learning
Why it’s hard
  • Minority class is rare → models bias to predicting majority
  • Accuracy can be misleading → prefer PR curves and recall
  • Threshold choice matters as much as model choice
Techniques
  • Resampling: Undersample majority, oversample minority, or use SMOTE to synthesize new minority samples
  • Class weights: Penalize mistakes on minority more heavily
  • Focal loss: Down-weight easy examples to focus on hard ones
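The two resampling techniques above can be sketched as index operations on the training split; SMOTE (interpolation) is covered separately. This is an illustrative sketch, not the tool's implementation.

```python
# Random under/oversampling on index lists. Only the training indices
# are resampled; evaluation stays on the full dataset.
import random

def resample_indices(labels, mode="oversample", seed=0):
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if mode == "undersample":            # shrink majority to minority size
        neg = rng.sample(neg, len(pos))
    elif mode == "oversample":           # duplicate minority with replacement
        pos = pos + [rng.choice(pos) for _ in range(len(neg) - len(pos))]
    idx = pos + neg
    rng.shuffle(idx)
    return idx

labels = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]   # 20% positive
idx = resample_indices(labels, "oversample")
balance = sum(1 for i in idx if labels[i] == 1) / len(idx)  # 0.5 after balancing
```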
How to evaluate
  • PR curve & AP: Better for imbalance than ROC
  • Confusion matrix at threshold: Inspect precision/recall trade-offs
  • Calibration: If making probabilistic decisions, calibrate then choose thresholds
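A hand-rolled version of the PR curve and average precision (AP), computed by sweeping the threshold down the sorted scores; ties between scores are not handled here, so treat it as a sketch of what the PR panel computes.

```python
# PR curve points and step-wise average precision (AP):
# AP = sum over samples of (delta recall) * precision.

def pr_curve(y_true, scores):
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(y_true)
    tp = fp = 0
    points, ap, prev_recall = [], 0.0, 0.0
    for i in order:
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        points.append((recall, precision))
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return points, ap

y_true = [1, 0, 1, 0, 0, 0, 0, 1]                 # 3 positives in 8
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.15, 0.1]
points, ap = pr_curve(y_true, scores)
```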
Workflow tips
  • Try resampling + class weights, then tune threshold by business costs
  • Use focal loss if minority still ignored
  • Validate with PR AUC and recall at target precision
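"Recall at target precision" from the last tip can be sketched as a threshold sweep: among all thresholds whose precision meets a floor, report the best achievable recall. The data below is made up for illustration.

```python
# Best recall subject to a minimum-precision constraint.

def recall_at_precision(y_true, scores, min_precision=0.8):
    thresholds = sorted(set(scores), reverse=True)
    total_pos = sum(y_true)
    best = 0.0
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        if tp + fp == 0:
            continue
        precision = tp / (tp + fp)
        if precision >= min_precision:
            best = max(best, tp / total_pos)
    return best

y_true = [1, 0, 1, 0, 0, 1]
scores = [0.95, 0.2, 0.85, 0.7, 0.3, 0.6]
r = recall_at_precision(y_true, scores, min_precision=0.8)  # 2/3
```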
How it works and how to visualize effects
How it works (pipeline)
  1. Generate data: Set total size, positive percentage, and noise; points are drawn from two clusters.
  2. Choose resampling: Applies only to the training indices (display shows all points). Options:
    • Undersample: reduce majority
    • Oversample: replicate minority
    • SMOTE: synthesize minority near neighbors
  3. Train model: Logistic regression updates weights with:
    • Class weight (pos): upweights minority errors
    • Focal γ: emphasizes hard examples (down-weights easy ones)
    • LR / Epochs: controls update magnitude and iterations
  4. Decide: Threshold converts probabilities to classes → confusion matrix and metrics update live.
  5. Evaluate: PR/ROC curves recomputed from model probabilities over all points (not just resampled subset).
Tip: Resampling shifts training distribution; evaluation remains on the full set to avoid inflated metrics.
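Step 3 of the pipeline can be sketched as one gradient-descent update on class-weighted log loss; the focal term is omitted for brevity, and the tool's actual optimizer is assumed, not known.

```python
# One weighted logistic-regression step: errors on positives are scaled
# by pos_weight, which tilts the boundary toward catching the minority.
import math

def train_step(w, b, data, lr=0.1, pos_weight=3.0):
    gw = [0.0, 0.0]
    gb = 0.0
    for x1, x2, y in data:
        z = w[0]*x1 + w[1]*x2 + b
        p = 1.0 / (1.0 + math.exp(-z))
        err = (pos_weight if y == 1 else 1.0) * (p - y)  # weighted gradient
        gw[0] += err * x1
        gw[1] += err * x2
        gb += err
    n = len(data)
    return [w[0] - lr*gw[0]/n, w[1] - lr*gw[1]/n], b - lr*gb/n

# Tiny separable dataset: (x1, x2, label)
data = [(1.5, 1.4, 1), (-1.0, -0.8, 0), (-1.2, -1.1, 0), (1.2, 1.6, 1)]
w, b = [0.0, 0.0], 0.0
for _ in range(200):
    w, b = train_step(w, b, data, lr=0.5, pos_weight=3.0)
# After training, positives score above negatives.
```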
How to see and visualize improvements
  • Baseline vs Resampling: Generate with low Pos% (e.g., 10–15%). Train baseline (no resampling). Apply SMOTE or Oversample and retrain. Watch recall and PR AUC rise.
  • Class weights: Increase “Class weight (pos)” and observe the decision boundary tilt toward the minority and recall improve (precision may drop).
  • Focal loss: Set γ = 2.0 for highly imbalanced or noisy data, then train; hard minority points carry more weight in the loss, so recall improves.
  • Threshold tuning: Move the threshold slider:
    • Lower threshold → higher recall, more FP
    • Higher threshold → higher precision, more FN
  • PR vs ROC: Use PR to compare scenarios; it’s more sensitive to improvements in minority detection than ROC on imbalanced sets.
  • Stress test: Increase Noise (σ) to 0.9–1.2; compare “None” vs “SMOTE + class weights” for robustness.
  • Visual boundary: The colored background shows predicted probability; the dashed line is the current decision boundary.
Suggested experiments
  • Pos% = 5–10%, SMOTE to 40% target → train → compare PR AUC and recall
  • Keep data fixed; sweep class weight from 1.0 → 5.0 and plot F1 changes
  • Combine Oversample + γ=1.5 and compare to SMOTE-only


About This Tool & Methodology

This workshop explores class imbalance strategies: resampling (undersample, oversample, SMOTE), class weighting, focal loss, and threshold tuning. It visualizes confusion matrices, ROC/PR curves, and metrics sensitive to imbalance (F1, MCC).

Learning Outcomes

  • Understand why accuracy can be misleading under imbalance.
  • Compare class weighting vs sampling; observe PR curve behavior.
  • Use MCC/F1/recall@precision as robust alternatives.

Authorship & Review

  • Author: 8gwifi.org engineering team
  • Reviewed by: Anish Nath
  • Last updated: 2025-11-19

Trust & Privacy

  • Runs locally with synthetic or provided data; nothing is uploaded.