
PCA, visualized

Reduction

Principal Component Analysis finds the directions along which your data varies the most — then lets you project onto a smaller subset of them, keeping signal and discarding redundancy. Watch the PC arrows appear on a 3D cloud on the left, and the projection (with the discarded dimensions collapsed to zero) on the right. Compare across six datasets to see where PCA shines and where it fails.

Original data · PCs in red/green/blue

Projected data · in PC coordinates

Explained variance per principal component · dashed = cumulative

Dataset: points along a diagonal + noise. PC1 captures most variance, the canonical PCA case.

Target dimensions: 2

Standardize: z-score features first (recommended)

Tip: drag to rotate either 3D panel — compare the variance directions to the data shape.

The algorithm (edit and re-run)

pca.py

import numpy as np

# DATA_X is an (N, 3) numpy array of input points, injected by the runner.
# Goal: find the directions of maximum variance, then project onto them.

# ── Step 1 · Center (and optionally scale) ──────────────
# Always subtract the mean — PCA is defined on centered data.
# If standardize=True, also divide by std so no feature dominates by scale.
def standardize(X):
    std = X.std(axis=0)
    std[std == 0] = 1.0   # constant feature: avoid dividing by zero
    return (X - X.mean(axis=0)) / std

def center_only(X):
    return X - X.mean(axis=0)

# ── Step 2 · Covariance matrix ──────────────────────────
# d × d matrix where C[i, j] = how features i and j co-vary.
def covariance(X):
    return np.cov(X.T)

# ── Step 3 · Eigendecomposition ─────────────────────────
# Covariance matrices are symmetric and positive semi-definite, so we
# use eigh (not eig) — eigh guarantees real eigenvalues and is more
# numerically stable. eigh returns ascending order; reverse for desc.
def eigendecompose(C):
    eig_vals, eig_vecs = np.linalg.eigh(C)
    idx = np.argsort(eig_vals)[::-1]
    return eig_vals[idx], eig_vecs[:, idx]

# ── Sign convention (sklearn-style) ─────────────────────
# Eigenvectors are defined up to sign — flip so the largest-magnitude
# component of each PC is positive.  Keeps results stable across runs.
def apply_sign_convention(eig_vecs, projected):
    for j in range(eig_vecs.shape[1]):
        i = int(np.argmax(np.abs(eig_vecs[:, j])))
        if eig_vecs[i, j] < 0:
            eig_vecs[:, j] *= -1
            if j < projected.shape[1]:
                projected[:, j] *= -1
    return eig_vecs, projected

# ── Step 4 · Project onto the full PC basis ─────────────
# Multiply pre-processed data by ALL eigenvectors.  Dimensionality
# reduction is then a one-liner: keep the first k columns.
def project(X_pre, eig_vecs):
    return X_pre @ eig_vecs            # shape (N, d)

# ── Full PCA pipeline ───────────────────────────────────
def pca(X, standardize_first=True):
    X_pre = standardize(X) if standardize_first else center_only(X)
    C = covariance(X_pre)
    eig_vals, eig_vecs = eigendecompose(C)
    projected = project(X_pre, eig_vecs)
    eig_vecs, projected = apply_sign_convention(eig_vecs, projected)
    var_pct = eig_vals / eig_vals.sum()
    # When we skip standardization, PCs are already in original-data
    # scale — return std=1 so the JS arrow renderer doesn't re-scale.
    std_out = X.std(axis=0).tolist() if standardize_first else [1.0] * X.shape[1]
    return {
        "eig_vals":     eig_vals.tolist(),
        "eig_vecs":     eig_vecs.tolist(),
        "projected":    projected.tolist(),    # (N, d) — full basis
        "variance_pct": var_pct.tolist(),
        "mean":         X.mean(axis=0).tolist(),
        "std":          std_out,
    }

# Dimensionality reduction = take the first k columns:
#     X_reduced = projected[:, :k]
# That's what the "Target dimensions" slider in the UI does on the display side.

# Try this:
#   · Replace eigh() with SVD:  U, S, Vt = np.linalg.svd(X_pre, full_matrices=False)
#                              eig_vecs = Vt.T   (SVD avoids forming the covariance)
#                              eig_vals = S**2 / (len(X_pre) - 1)
#   · Toggle Standardize off on the anisotropic dataset — PCs flip to follow raw scale
#   · Compute reconstruction error for k components:
#         X_recon = projected[:, :k] @ eig_vecs[:, :k].T
#         err = np.linalg.norm(X_pre - X_recon)

The math, derived

1. The goal — max-variance directions.

Given centered data $X \in \mathbb{R}^{N \times d}$ (each column zero-mean — this is why preprocessing always subtracts the mean), find a unit vector $u \in \mathbb{R}^d$ that maximizes the variance of the projections $X u$:

$$ \mathrm{Var}(Xu) \;=\; \frac{1}{N - 1}\, u^{\top} X^{\top} X\, u \;=\; u^{\top} S\, u $$

where $S = \frac{1}{N-1} X^{\top} X$ is the covariance matrix. (If you also divide by std before this step, $S$ becomes the correlation matrix — what we compute when Standardize is on.)
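The identity above can be checked numerically. The following is a minimal sketch on synthetic data (the mixing matrix, the seed, and the direction `u` are arbitrary choices for illustration, not part of the demo): it compares the sample variance of the projections with the quadratic form $u^{\top} S u$.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the check is repeatable
# Synthetic correlated cloud: Gaussian noise pushed through an arbitrary mixing matrix.
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [0.5, 1.0, 0.0],
                                          [0.0, 0.3, 0.2]])
X = X - X.mean(axis=0)                  # center: each column now has zero mean

S = X.T @ X / (X.shape[0] - 1)          # covariance matrix with 1/(N-1) normalization

u = np.array([1.0, 2.0, -1.0])
u = u / np.linalg.norm(u)               # arbitrary unit direction

var_direct = np.var(X @ u, ddof=1)      # sample variance of the 1-D projections
var_quadratic = u @ S @ u               # the quadratic form u^T S u

print(var_direct, var_quadratic)        # the two numbers should agree
```

The same `S` is what `np.cov(X.T)` returns, which is why the code above can call `np.cov` directly.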

2. The constraint — unit length.

Without a length constraint, you can make $u^{\top} S u$ arbitrarily large by scaling. We only care about direction, so we constrain $u^{\top} u = 1$:

$$ \max_{u} \; u^{\top} S\, u \quad \text{subject to} \quad u^{\top} u = 1 $$

3. Lagrangian.

Use a Lagrange multiplier $\lambda$ to fold the constraint into the objective:

$$ \mathcal{L}(u, \lambda) \;=\; u^{\top} S\, u \,-\, \lambda \,(\, u^{\top} u - 1 \,) $$

4. Take the gradient, set to zero.

Differentiate w.r.t. $u$ and equate to zero:

$$ \nabla_{u} \mathcal{L} \;=\; 2 S u \,-\, 2 \lambda u \;=\; 0 $$

$$ S\, u \;=\; \lambda\, u $$

That is exactly the eigenvalue equation for $S$. Every critical point of the constrained problem is an eigenvector of the covariance matrix.

5. Which eigenvector?

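Substitute $Su = \lambda u$ back into the objective: $u^{\top} S u = \lambda \, u^{\top} u = \lambda$. So the variance along eigenvector $u$ equals its eigenvalue $\lambda$, and the constrained maximum is attained at the eigenvector with the largest eigenvalue. Repeating the argument with each new direction constrained to be orthogonal to the previous ones yields the remaining eigenvectors in turn. The principal components are therefore the eigenvectors of $S$ sorted by descending eigenvalue, and each eigenvalue tells you how much variance its direction carries.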
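A small numerical sanity check of this result, as a sketch on synthetic data (the axis scales and seed are arbitrary): the variance of the data projected onto each eigenvector should match the corresponding eigenvalue, and no random unit direction should carry more variance than the top eigenvector or less than the bottom one.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.3])   # anisotropic cloud
X = X - X.mean(axis=0)
S = np.cov(X.T)

eig_vals, eig_vecs = np.linalg.eigh(S)
order = np.argsort(eig_vals)[::-1]                 # descending eigenvalue order
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

# Variance of the projections onto each eigenvector (one column per PC).
var_along_pcs = np.var(X @ eig_vecs, axis=0, ddof=1)

# Variance u^T S u along 1000 random unit directions.
U = rng.normal(size=(1000, 3))
U /= np.linalg.norm(U, axis=1, keepdims=True)
random_vars = np.einsum("ij,jk,ik->i", U, S, U)

print(var_along_pcs, eig_vals)
print(random_vars.max(), "<=", eig_vals[0])
```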

Try this

The 95% threshold

On the plane dataset, look at the scree plot. PC1 + PC2 should hit ~95% explained variance — the third dimension is mostly noise. This is the visual intuition behind n_components=0.95.

When PCA gives up

Switch to the sphere dataset. All three eigenvalues should be roughly equal — there's no preferred direction. PCA produces something, but it's meaningless. This is what isotropic variance looks like.
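To see the flat spectrum outside the demo, here is a hedged sketch that samples points uniformly on a unit sphere. This is an assumption about what an isotropic dataset looks like; the demo's own sphere generator may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.normal(size=(5000, 3))
P /= np.linalg.norm(P, axis=1, keepdims=True)    # normalized Gaussians are uniform on the sphere
P -= P.mean(axis=0)

# eigvalsh returns eigenvalues in ascending order; reverse for descending.
eig_vals = np.linalg.eigvalsh(np.cov(P.T))[::-1]
ratios = eig_vals / eig_vals.sum()

print(np.round(ratios, 3))                       # each ratio sits close to 1/3
```

With every component explaining about a third of the variance, the ordering of the PCs is decided by sampling noise rather than by any structure in the data.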

The nonlinearity wall

The swiss roll is a 2D manifold embedded in 3D. Linear PCA flattens it — the projection loses the curved structure. This motivates kernel PCA, t-SNE, or UMAP for nonlinear data.
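One way to see the problem in numbers, sketched under an assumed parametrization of the roll (a commonly used one; the demo's generator may differ): although only two parameters, `t` and `h`, generate the data, none of the three principal components is negligible, so dropping one discards a sizable share of the variance.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, size=n)   # position along the spiral
h = rng.uniform(0.0, 21.0, size=n)                  # position across the sheet
roll = np.column_stack([t * np.cos(t), h, t * np.sin(t)])
roll -= roll.mean(axis=0)                           # center only, no scaling

eig_vals = np.linalg.eigvalsh(np.cov(roll.T))[::-1] # descending eigenvalues
ratios = eig_vals / eig_vals.sum()

print(np.round(ratios, 3))                          # no component is close to zero
```

The intrinsic dimension is 2, but the linear spectrum alone would suggest keeping all three components.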

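SVD instead of eigh

Replace np.linalg.eigh(C) with np.linalg.svd(X_pre, full_matrices=False), then take eig_vecs = Vt.T and eig_vals = S**2 / (N - 1). SVD works on the data matrix directly, so it avoids squaring the condition number by forming the covariance, which helps when the covariance is nearly singular. sklearn.decomposition.PCA is likewise documented as SVD-based.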
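The two routes can be compared side by side. This is a minimal sketch on synthetic centered data (the scales and seed are arbitrary); eigenvectors are only defined up to sign, so the comparison uses absolute values.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 3)) * np.array([2.0, 1.0, 0.5])
X_pre = X - X.mean(axis=0)
n = X_pre.shape[0]

# Route 1: eigendecomposition of the covariance matrix.
vals, vecs = np.linalg.eigh(np.cov(X_pre.T))
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Route 2: SVD of the centered data (singular values come back in descending order).
U, S, Vt = np.linalg.svd(X_pre, full_matrices=False)
svd_vals = S**2 / (n - 1)          # singular values -> covariance eigenvalues
svd_vecs = Vt.T                    # right singular vectors -> principal directions

# |cosine| between matching directions; 1.0 means same axis, possibly flipped.
same_direction = np.abs(np.sum(vecs * svd_vecs, axis=0))

print(vals, svd_vals)
print(same_direction)
```

As a side effect, `U * S` already holds the projected coordinates, so the SVD route does not need a separate `X_pre @ eig_vecs` step.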

Scale matters

In standardize(), drop the division by the standard deviation so the function only subtracts the mean (or call pca(X, standardize_first=False)). On the anisotropic dataset, watch how the PCs change: without scaling, PCA is at the mercy of feature units.
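The effect is easy to reproduce on made-up data. In this sketch (the dataset and the helper `first_pc` are hypothetical, written only for this illustration) two features carry the same signal, but one is expressed in units 1000 times larger.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
signal = rng.normal(size=n)
X = np.column_stack([
    1000.0 * (signal + 0.1 * rng.normal(size=n)),   # same signal, large units
    signal + 0.1 * rng.normal(size=n),              # same signal, small units
    rng.normal(size=n),                             # unrelated noise feature
])

def first_pc(X_pre):
    """Top eigenvector of the covariance, flipped so its largest entry is positive."""
    vals, vecs = np.linalg.eigh(np.cov(X_pre.T))
    pc1 = vecs[:, -1]              # eigh sorts ascending, so the last column is PC1
    return pc1 if pc1[np.argmax(np.abs(pc1))] > 0 else -pc1

centered = X - X.mean(axis=0)
pc1_raw = first_pc(centered)                        # centered only
pc1_std = first_pc(centered / X.std(axis=0))        # centered and z-scored

print(np.round(pc1_raw, 3))   # almost entirely the large-unit feature
print(np.round(pc1_std, 3))   # weights the two signal features about equally
```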

Reconstruction error

Add reconstructed = projected[:, :dims] @ eig_vecs[:, :dims].T to your code and print np.linalg.norm(X_pre - reconstructed). That norm measures the information you discarded. Compare at dims = 1, 2, 3 across datasets.
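A self-contained sketch of that experiment on synthetic data (scales and seed are arbitrary). It also checks a useful identity: the squared reconstruction error equals $(N-1)$ times the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(400, 3)) * np.array([3.0, 1.0, 0.2])
X_pre = X - X.mean(axis=0)
n = X_pre.shape[0]

eig_vals, eig_vecs = np.linalg.eigh(np.cov(X_pre.T))
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]
projected = X_pre @ eig_vecs                      # full PC coordinates, shape (N, 3)

errors = []
for dims in (1, 2, 3):
    # Keep the first `dims` coordinates, then map back to the original axes.
    reconstructed = projected[:, :dims] @ eig_vecs[:, :dims].T
    errors.append(np.linalg.norm(X_pre - reconstructed))

# Error predicted from the eigenvalues of the dropped components.
predicted = [np.sqrt((n - 1) * eig_vals[d:].sum()) for d in (1, 2, 3)]

print(errors)
print(predicted)
```

The error shrinks as `dims` grows and vanishes (up to rounding) when all three components are kept.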

In one glance

⚠️ Watch out: Skip standardization · Isotropic variance · Nonlinear structure · Use eigvecs for non-PCA model · No scree plot → arbitrary k

🔧 In practice: sklearn.decomposition.PCA · StandardScaler · np.linalg.svd · explained_variance_ratio_ · n_components=0.95

Frequently asked

PCA is used for three things: (1) dimensionality reduction: compress $d$-feature data to $k$ features that retain most of the variance; (2) visualization: project high-dimensional data to 2D or 3D for plotting; (3) decorrelation: the projected features are mutually uncorrelated (which is weaker than independent), useful as inputs to downstream models that are sensitive to correlated features.
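The decorrelation property can be verified directly. In this sketch on synthetic correlated data (the mixing matrix `A` is an arbitrary choice), the covariance of the projected data comes out diagonal, with the eigenvalues on the diagonal.

```python
import numpy as np

rng = np.random.default_rng(7)
A = np.array([[2.0, 0.0, 0.0],
              [1.5, 1.0, 0.0],
              [0.5, 0.5, 0.3]])
X = rng.normal(size=(500, 3)) @ A            # mixing introduces correlations
X_pre = X - X.mean(axis=0)

eig_vals, eig_vecs = np.linalg.eigh(np.cov(X_pre.T))
projected = X_pre @ eig_vecs                 # coordinates in the PC basis

cov_before = np.cov(X_pre.T)                 # has sizable off-diagonal entries
cov_after = np.cov(projected.T)              # off-diagonals vanish up to rounding

print(np.round(cov_before, 3))
print(np.round(cov_after, 3))
```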

PCA fails when variance is isotropic (a sphere has no preferred direction) or when the structure is nonlinear (a swiss roll’s underlying 2D manifold gets flattened by linear projection). Try the sphere and swissRoll datasets: the explained-variance plot is flat on the first, and the projection loses the structure on the second.

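Standardizing before PCA is almost always the right call. PCA is sensitive to feature scale: a column ranging 0–10000 will dominate PC1 over a column ranging 0–1 even if both carry similar information. Subtract the mean and divide by the std (z-score) before fitting. The main exception is when all features share the same units and their relative scale is itself meaningful.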

Compared with t-SNE and UMAP, PCA is linear, fast, interpretable, and good for global structure and decorrelation. t-SNE and UMAP are nonlinear, slower, and better at preserving local neighborhoods, but they distort global distances and are usually a poor basis for downstream modeling. Use PCA when you want a meaningful feature space; use t-SNE/UMAP mainly for visualization.

To choose $k$, use one of two heuristics: (1) plot the cumulative explained variance and pick the smallest $k$ that gets you above your threshold (e.g., 95%); (2) look for an elbow in the scree plot, a sharp drop in eigenvalues followed by a plateau. In sklearn, PCA(n_components=0.95) picks the smallest $k$ retaining 95% variance for you.
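The first heuristic takes only a few lines of NumPy. The helper below, `smallest_k`, is a hypothetical name for this sketch and is not an sklearn function; at exact ties with the threshold its result may differ from sklearn's rule.

```python
import numpy as np

def smallest_k(eig_vals, threshold=0.95):
    """Smallest k whose top-k eigenvalues explain at least `threshold` of the variance."""
    ratios = np.sort(eig_vals)[::-1] / np.sum(eig_vals)   # descending variance ratios
    cumulative = np.cumsum(ratios)
    # searchsorted returns the first index where cumulative >= threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratios))     # guard against rounding when threshold is near 1.0

eig_vals = np.array([6.0, 3.7, 0.3])      # made-up spectrum: ratios 0.60, 0.37, 0.03
print(smallest_k(eig_vals, 0.50))         # one component is enough
print(smallest_k(eig_vals, 0.95))         # two components reach 0.97
print(smallest_k(eig_vals, 0.99))         # all three are needed
```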
