🧪 Decision Tree Model Selection Lab

Choose the best decision tree using cross-validation, validation curves, pruning, and a simple grid search. Watch decision regions and splits in real time.


Beginner Quick Start

Generate a dataset (Balanced, Overlap, XOR, Moons, or Circles), then click Run CV & Recommend.
Use the Validation Curve to sweep max_depth and compare train and CV scores to spot underfitting or overfitting.
Run the Grid Search over max_depth × min_samples_leaf and pick a simpler model from a stable region.

Tip: Prefer models with a high CV mean and a low CV std; break ties in favor of smaller depth.

Dataset & Decision Regions


Show split lines

Decision regions are colored softly by the predicted class; dashed lines show tree thresholds.
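To make the link between split thresholds and shaded regions concrete, here is a minimal sketch of how a tree turns axis-aligned thresholds into a region map. The tree is hand-built with made-up thresholds rather than fitted, and the names TREE, predict, and region_grid are hypothetical, not the tool's code. As in scikit-learn, a point goes left when its feature value is less than or equal to the threshold.

```python
# Hypothetical hand-built depth-2 tree (thresholds are made up, not fitted).
# Internal nodes test one feature against one threshold; leaves hold a class label.
TREE = {
    "feature": 0, "threshold": 0.5,            # vertical split line at x = 0.5
    "left": {"feature": 1, "threshold": 0.3,   # horizontal split at y = 0.3, left half only
             "left": {"label": 0}, "right": {"label": 1}},
    "right": {"label": 1},
}


def predict(node, point):
    """Walk from the root to a leaf, going left when point[feature] <= threshold."""
    while "label" not in node:
        side = "left" if point[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return node["label"]


def region_grid(node, steps=4):
    """Predict at cell centers of a coarse grid over [0, 1]^2, top row first (like a canvas)."""
    rows = []
    for j in range(steps - 1, -1, -1):
        y = (j + 0.5) / steps
        rows.append([predict(node, ((i + 0.5) / steps, y)) for i in range(steps)])
    return rows


for row in region_grid(TREE):
    print(" ".join(str(c) for c in row))
# 1 1 1 1
# 1 1 1 1
# 1 1 1 1
# 0 0 1 1
```

Because every node compares a single feature to a threshold, each region is a rectangle bounded by horizontal and vertical lines, which is why the split lines on the canvas are always axis-aligned.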

Validation Curve

Sweep

Plots train and CV scores across values of the selected hyperparameter. Overfitting shows as a high train score with a low CV score; underfitting shows as both scores being low.

Grid Search (max_depth × min_samples_leaf)

Cells colored by CV score (darker = better). Click a cell to apply those hyperparameters.
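A heatmap like this comes from scoring every (max_depth, min_samples_leaf) pair. The sketch below assumes a hypothetical callback cv_score(depth, leaf) that would run K-fold CV in practice; toy_cv_score is a made-up surface used only so the example runs, and grid_search is likewise a hypothetical name.

```python
from itertools import product


def grid_search(cv_score, depths, leaves):
    """Evaluate cv_score(max_depth, min_samples_leaf) for every pair.

    Returns {(depth, leaf): score}, i.e. the values a heatmap would color.
    cv_score is a hypothetical callback; in practice it would run K-fold CV.
    """
    return {(d, leaf): cv_score(d, leaf) for d, leaf in product(depths, leaves)}


def toy_cv_score(depth, leaf):
    # Made-up score surface for illustration only: peaks at depth 4, leaf 5.
    return round(0.9 - 0.02 * abs(depth - 4) - 0.01 * abs(leaf - 5), 3)


scores = grid_search(toy_cv_score, depths=range(1, 9), leaves=(1, 2, 5, 10, 20))
best = max(scores, key=scores.get)
print(best, scores[best])  # (4, 5) 0.9
```

With 8 depths and 5 leaf sizes this makes 40 calls, each of which would be a full K-fold run, which is why grids are usually kept coarse.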

📦 Dataset

Click the canvas to add points for the selected class.

⚙️ Model Selection

Criterion

max_depth: 5

min_samples_split: 4

min_samples_leaf: 2

max_features: 2

ccp_alpha: 0.00

K-folds: 5

Score metric

Validation strategy

Random K‑Fold assumes i.i.d. data; Forward‑Chaining respects time order and is preferred when data is temporal or drifts.
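Forward-chaining can be sketched as an expanding window: cut the time-ordered rows into k contiguous blocks, train on all earlier blocks, and validate on the next one. This is one common variant (scikit-learn's TimeSeriesSplit uses a similar expanding-window scheme); the function name and the choice of k − 1 splits are assumptions, not necessarily what this tool does.

```python
def forward_chaining_splits(n_samples, k):
    """Yield (train_idx, val_idx) for forward-chaining CV on time-ordered rows.

    Rows are assumed sorted by time. They are cut into k contiguous blocks;
    split i trains on blocks 0..i-1 and validates on block i, so validation
    rows are always later than training rows. Block 0 is never validated,
    giving k - 1 splits.
    """
    bounds = [round(i * n_samples / k) for i in range(k + 1)]
    for i in range(1, k):
        train_idx = list(range(0, bounds[i]))
        val_idx = list(range(bounds[i], bounds[i + 1]))
        yield train_idx, val_idx


for train_idx, val_idx in forward_chaining_splits(10, 5):
    print(len(train_idx), val_idx)
# 2 [2, 3]
# 4 [4, 5]
# 6 [6, 7]
# 8 [8, 9]
```

Unlike random K-Fold, no split ever trains on rows that come after its validation rows, which is what prevents temporal leakage.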

Cost FP: 1.0

Cost FN: 5.0

Cost-based CV chooses, on each validation fold, the decision threshold that minimizes total cost (Cost FP × false positives + Cost FN × false negatives) and reports a normalized score (higher is better).
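The per-fold threshold search can be sketched as follows, assuming the model outputs a probability-like score and predicts class 1 when the score is at or above the threshold. The page does not say how the score is normalized, so normalized_score uses a hypothetical choice: 1 − best_cost / baseline_cost, where the baseline is the cheaper of "always predict 0" and "always predict 1". All function names are illustrative.

```python
def fold_cost(y_true, y_prob, threshold, cost_fp, cost_fn):
    """Total cost on one fold when predicting 1 for y_prob >= threshold."""
    cost = 0.0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 0:
            cost += cost_fp       # false positive
        elif pred == 0 and y == 1:
            cost += cost_fn       # false negative
    return cost


def best_threshold(y_true, y_prob, cost_fp=1.0, cost_fn=5.0):
    """Try every distinct score (plus 'never predict 1') and keep the cheapest threshold."""
    candidates = sorted(set(y_prob)) + [float("inf")]
    return min(candidates, key=lambda t: fold_cost(y_true, y_prob, t, cost_fp, cost_fn))


def normalized_score(y_true, y_prob, cost_fp=1.0, cost_fn=5.0):
    """Hypothetical normalization: 1 - best_cost / baseline_cost (higher is better).

    baseline_cost is the cheaper of always predicting 0 and always predicting 1,
    so the result lies in [0, 1].
    """
    t = best_threshold(y_true, y_prob, cost_fp, cost_fn)
    best = fold_cost(y_true, y_prob, t, cost_fp, cost_fn)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    baseline = min(n_pos * cost_fn, n_neg * cost_fp)
    return 1.0 - best / baseline if baseline > 0 else 1.0


y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.3, 0.35, 0.4, 0.8, 0.9]
print(best_threshold(y_true, y_prob))              # 0.35
print(round(normalized_score(y_true, y_prob), 3))  # 0.667
```

With Cost FN five times Cost FP, the cheapest threshold sits low enough to catch every positive at the price of one false positive. In full CV, this search would run on each validation fold and the fold scores would then be averaged.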

📊 CV Metrics

CV Mean: —

CV Std: —

Recommended: —

🧪 Final Test (Hold-out)

Set aside a final test set that is never used during model selection, and evaluate on it only once.
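A hold-out split is just a one-time partition of row indices, taken before any tuning. In this minimal sketch, holdout_split is a hypothetical name and test_pct mirrors the page's Test % control. It shuffles with a fixed seed; for time-ordered data, taking the last rows instead of shuffling would be the analogue of Forward-Chaining.

```python
import random


def holdout_split(n_samples, test_pct, seed=0):
    """Shuffle row indices once and carve off test_pct percent as a final test set.

    Model selection (CV, validation curves, grid search) should only ever see
    dev_idx; test_idx is scored once, after the model has been chosen.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_pct / 100)
    test_idx = sorted(idx[:n_test])
    dev_idx = sorted(idx[n_test:])
    return dev_idx, test_idx


dev_idx, test_idx = holdout_split(100, 20)
print(len(dev_idx), len(test_idx))  # 80 20
```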

Test %

Final Test Score: —

What to learn on this page

Validation curves

max_depth: too small → underfit (train and CV scores both low); too large → overfit (train high, CV low)
min_samples_leaf: higher values reduce variance, often improving CV stability
ccp_alpha: pruning threshold; larger values prune more (less variance, more bias)
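For reference, below is the standard minimal cost-complexity pruning criterion as documented for scikit-learn's ccp_alpha. Whether this in-browser tool implements exactly this formulation is an assumption. Here R(T) is the total sample-weighted impurity of the leaves of tree T, |T̃| is its number of leaves, T_t is the subtree rooted at node t, and R(t) is the impurity of t if it were collapsed into a single leaf.

```latex
R_\alpha(T) = R(T) + \alpha\,\lvert \widetilde{T} \rvert,
\qquad
\alpha_{\mathrm{eff}}(t) = \frac{R(t) - R(T_t)}{\lvert \widetilde{T}_t \rvert - 1}
```

Nodes with the smallest effective alpha are collapsed first, and pruning stops once the smallest remaining α_eff exceeds ccp_alpha. For example, a subtree with 3 leaves, R(T_t) = 0.12, and R(t) = 0.20 has α_eff = (0.20 − 0.12) / (3 − 1) = 0.04, so it becomes a pruning candidate once ccp_alpha reaches 0.04.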

Grid search

Use the heatmap to find stable regions of good CV performance
Prefer simple models in a plateau over spiky maxima
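One simple way to favor plateaus over spikes is to average each grid cell with its immediate neighbors, so a setting surrounded by good settings outranks an isolated maximum. This is one possible heuristic, not necessarily what the tool does. The function neighborhood_mean and the score grid below are hypothetical and made up for illustration.

```python
def neighborhood_mean(scores, depths, leaves):
    """Average each (depth, leaf) cell with its existing up/down/left/right neighbors.

    scores maps (depth, leaf) -> CV score, as a grid search would produce.
    A cell on a broad plateau keeps a high average; an isolated spike drops.
    """
    depths, leaves = list(depths), list(leaves)
    smoothed = {}
    for i, d in enumerate(depths):
        for j, leaf in enumerate(leaves):
            cells = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
            vals = [scores[(depths[a], leaves[b])] for a, b in cells
                    if 0 <= a < len(depths) and 0 <= b < len(leaves)]
            smoothed[(d, leaf)] = sum(vals) / len(vals)
    return smoothed


# Made-up CV scores: a spike at (3, 1) surrounded by weak neighbors.
depths, leaves = (1, 2, 3), (1, 5, 10)
scores = {
    (1, 1): 0.86, (1, 5): 0.89, (1, 10): 0.88,
    (2, 1): 0.78, (2, 5): 0.90, (2, 10): 0.89,
    (3, 1): 0.95, (3, 5): 0.75, (3, 10): 0.86,
}
smoothed = neighborhood_mean(scores, depths, leaves)
print(max(scores, key=scores.get))      # (3, 1)
print(max(smoothed, key=smoothed.get))  # (1, 10)
```

The raw maximum is the deep, small-leaf spike; after smoothing, the shallow tree with large leaves wins because its neighbors score well too.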

Practical tips

Pick the smallest depth with near‑best CV score to reduce overfitting
Increase min_samples_leaf if small leaves appear noisy
Use balanced accuracy or F1 when classes are imbalanced
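The reason for the last tip shows up with a trivial model on imbalanced data. The sketch below computes accuracy, balanced accuracy (the mean of per-class recall), and F1 for the positive class by hand. The function names are illustrative, and the 95/5 class split is a made-up example.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def f1(y_true, y_pred, positive=1):
    """F1 for the positive class: 2TP / (2TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# 95 negatives, 5 positives; a "model" that always predicts class 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
print(f1(y_true, y_pred))                 # 0.0
```

Plain accuracy rewards ignoring the minority class; balanced accuracy drops to chance level and F1 to zero, which exposes the problem.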

How the visuals help

Decision regions show model complexity; split lines show where the tree places its thresholds
Validation curve reveals bias–variance behavior
Heatmap shows robust hyperparameter zones

At a glance: what’s going on

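Cross‑validation (CV): The data is split into K folds; the tree trains on K−1 folds and validates on the remaining one, repeating K times so that each fold is used once for validation. CV Mean summarizes performance; CV Std summarizes how stable it is across folds.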
Validation curve: We sweep a hyperparameter (e.g., max_depth) and plot Train vs CV scores to reveal under/overfitting.
Pruning (ccp_alpha): Acts like a penalty on complexity; higher values prune more, often reducing variance at the cost of bias.
Grid search heatmap: Tests pairs (max_depth × min_samples_leaf); darker cells mean better CV score. Click a cell to apply those settings.
Decision regions: The canvas shows a tree fitted on all points; color indicates the predicted class, and dashed lines show split thresholds.
Recommend button: Runs CV around your current settings and suggests a robust, simpler model when scores are similar.
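The CV Mean and CV Std readouts can be sketched as a plain K-fold loop. Here score_fold(train_idx, val_idx) is a hypothetical callback standing in for "fit the tree on the training rows and score it on the validation rows". The std is the population std (ddof = 0, NumPy's default for np.std), which is an assumption about how the tool reports it.

```python
import random
import statistics


def kfold_indices(n_samples, k, seed=0):
    """Shuffle once (random K-Fold assumes i.i.d. rows), then deal indices into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def cross_validate(score_fold, n_samples, k=5, seed=0):
    """Run K-fold CV: every fold serves as the validation set exactly once.

    score_fold(train_idx, val_idx) is a hypothetical callback that fits on
    train_idx and returns a score on val_idx. Returns (mean, std) over folds.
    """
    folds = kfold_indices(n_samples, k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(score_fold(train_idx, val_idx))
    return statistics.mean(scores), statistics.pstdev(scores)


# Dummy scorer: fraction of rows in the validation fold (0.2 for 10 rows, k = 5).
mean, std = cross_validate(lambda tr, va: len(va) / (len(tr) + len(va)), 10, k=5)
print(mean, std)  # 0.2 0.0
```

A real score_fold would build a tree with the current hyperparameters; a low std across folds is what the page means by a stable model.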

Goal: Pick the simplest tree that achieves near‑best CV performance and shows stable behavior across nearby settings.

Why validation strategy matters

Mirror production: If your model scores future data, validation should reflect that. Forward‑Chaining trains on earlier points and validates on later ones.
Avoid optimistic bias: Random K‑Fold can leak temporal information when data drifts; forward splits reduce this risk.
Match the objective: Pick metrics aligned with the use case (e.g., PR AUC or cost‑based scoring for imbalanced data) and, if using cost, choose the threshold on each fold to minimize cost.
Keep a final test: Hold out a final set untouched during tuning; report that score separately after selecting the model.
Prefer stability and simplicity: Favor settings with high CV mean and low CV std; when scores are tied, pick the smaller max_depth or the larger min_samples_leaf, since both give a simpler tree.
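The "near-best, then simplest" rule can be written as a small selection function. In this sketch, "near-best" means within tol of the best CV mean, with tol defaulting to the best setting's CV std. That tolerance is an assumption, since the page does not specify one. Among candidates, the function prefers the smallest max_depth, then the largest min_samples_leaf (larger leaves mean a simpler tree), then the lowest std. The function name and the results are hypothetical.

```python
def pick_simplest_near_best(results, tol=None):
    """Choose a (max_depth, min_samples_leaf) setting from CV results.

    results: {(depth, leaf): (cv_mean, cv_std)}.
    Candidates are settings whose mean is within `tol` of the best mean
    (tol defaults to the best setting's std, an assumption). Among them,
    prefer smaller depth, then larger min_samples_leaf, then lower std.
    """
    best_cfg = max(results, key=lambda cfg: results[cfg][0])
    best_mean, best_std = results[best_cfg]
    if tol is None:
        tol = best_std
    candidates = [cfg for cfg, (m, _s) in results.items() if m >= best_mean - tol]
    return min(candidates, key=lambda cfg: (cfg[0], -cfg[1], results[cfg][1]))


# Made-up CV results: (max_depth, min_samples_leaf) -> (mean, std)
results = {
    (3, 5): (0.880, 0.020),
    (4, 5): (0.890, 0.015),
    (6, 1): (0.895, 0.040),
    (8, 1): (0.870, 0.060),
}
print(pick_simplest_near_best(results))            # (3, 5)
print(pick_simplest_near_best(results, tol=0.01))  # (4, 5)
```

The top raw score (6, 1) is noisy (std 0.040), so the depth-3 tree within that noise band is chosen; tightening tol shifts the choice toward the top scorer.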

Rule of thumb: Use Forward‑Chaining for time‑ordered data; use Random K‑Fold for i.i.d. data. Always validate with the metric that reflects your real deployment goals.


About This Tool & Methodology

This tool builds decision trees with tunable hyperparameters (max_depth, min_samples_split, min_samples_leaf, and the impurity criterion) and compares models via validation scores. Visualizations show splits and, where supported, feature importance.

Learning Outcomes

Understand overfitting and underfitting through the max_depth and min_samples_split controls.
Compare Gini impurity and entropy and their effects on splits.
Use validation curves to choose hyperparameters.
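To compare the two criteria concretely, the sketch below computes Gini impurity (1 − Σ p_k²) and entropy in bits (−Σ p_k log₂ p_k) from class counts, along with the impurity decrease of one candidate split. The counts are a made-up example and the function names are illustrative.

```python
import math


def gini(counts):
    """Gini impurity: 1 - sum(p_k^2)."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy in bits: -sum(p_k * log2(p_k)), skipping empty classes."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def split_gain(parent, left, right, impurity):
    """Impurity decrease: parent impurity minus size-weighted child impurity."""
    n = sum(parent)
    weighted = (sum(left) / n) * impurity(left) + (sum(right) / n) * impurity(right)
    return impurity(parent) - weighted


parent = [10, 10]             # 10 samples of class 0, 10 of class 1
left, right = [8, 2], [2, 8]  # one candidate split
print(round(gini(parent), 3), round(entropy(parent), 3))   # 0.5 1.0
print(round(split_gain(parent, left, right, gini), 3))     # 0.18
print(round(split_gain(parent, left, right, entropy), 3))  # 0.278
```

The two criteria use different scales (Gini peaks at 0.5 for two balanced classes, entropy at 1 bit), but both reward splits that make children purer, so they often choose similar splits.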

Authorship & Review

Author: 8gwifi.org engineering team
Reviewed by: Anish Nath
Last updated: 2025-11-19

Trust & Privacy

Runs fully in your browser using demo or user datasets.
