Cyclical features, on the circle

Feature Engineering

Raw integers treat hour=23 and hour=0 as 23 apart — even though on a clock they’re neighbours. Map the value to (cos θ, sin θ) with θ = 2π·x/period and the wraparound is preserved. Drag the value below and watch raw-numeric and sin/cos encodings race on the same task. Linear and distance-based models almost always win with sin/cos; trees mostly don’t care.

[Interactive demo. Unit circle panel: the point (cos θ, sin θ) with θ = 2π·value/period, showing the current value and its trail; 23 ↔ 0 are neighbours on the circle but 23 apart as integers. Curve panel: raw numeric vs sin/cos predicted curves across one period, with R², MAE, accuracy and F1 readouts. Importance panel: sin and cos importance (regression weights, Linear model only). Chart: model size vs performance. Controls: feature, value, period (defines the wrap), noise σ (target noise), task, model.]
Theory & exercises · the unit-circle map, when to skip it, harmonics

The math, in three moves

1. Map the scalar to an angle.

$$ \theta = 2\pi \cdot \frac{x}{P} \qquad (x \in [0, P)) $$

$P$ is the period: 24 for hour, 12 for month, 7 for day-of-week, 360 for an angle in degrees. The map is one-to-one on a single period and sends $x = P$ to the same point on the circle as $x = 0$.

2. Encode as a point on the unit circle.

$$ \phi(x) = \big(\cos\theta,\; \sin\theta\big) $$

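Two features replace one. The Euclidean distance between $\phi(x_1)$ and $\phi(x_2)$ is the chord length $2\sin(\Delta/2)$, where $\Delta \in [0, \pi]$ is the shorter arc angle between the two points, so it increases monotonically with arc distance: close on the cycle ⇒ close in feature space.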
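Moves 1 and 2 can be sketched in a few lines, assuming NumPy. The helper name `encode_cyclic` is illustrative, not a library function. The sketch also compares the feature-space distance across the seam with the chord length 2·sin(Δ/2) for a one-hour arc.

```python
import numpy as np

def encode_cyclic(x, period):
    """Map values on a cycle of length `period` to (cos θ, sin θ)."""
    theta = 2.0 * np.pi * np.asarray(x, dtype=float) / period
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)

hours = np.array([23, 0, 12])
pts = encode_cyclic(hours, period=24)

# Distances in feature space.
d_23_0 = np.linalg.norm(pts[0] - pts[1])  # neighbours across the seam
d_0_12 = np.linalg.norm(pts[1] - pts[2])  # opposite sides of the clock

# Chord length for a one-hour arc: 2 * sin(Δ/2) with Δ = 2π/24.
chord_one_hour = 2.0 * np.sin(np.pi / 24)
```

Here `d_23_0` matches `chord_one_hour`, while `d_0_12` is the full diameter of the unit circle.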

3. (Optional) Add harmonics.

$$ \phi_k(x) = \big(\cos(k\theta),\; \sin(k\theta)\big),\quad k=1,2,\dots,K $$

Two features per harmonic. With enough harmonics you get a truncated Fourier basis, so a linear model on these features (plus an intercept) can approximate any reasonably smooth periodic target.
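As a rough illustration of why harmonics matter, the sketch below fits a least-squares linear model on K = 1 and K = 2 harmonics. It assumes NumPy; `fourier_features` and `fit_rmse` are hypothetical helper names, and the target is synthetic, chosen to contain a second-harmonic component.

```python
import numpy as np

def fourier_features(x, period, n_harmonics):
    """Columns cos(kθ) then sin(kθ) for k = 1..n_harmonics."""
    theta = 2.0 * np.pi * np.asarray(x, dtype=float) / period
    k = np.arange(1, n_harmonics + 1)
    angles = theta[:, None] * k[None, :]
    return np.concatenate([np.cos(angles), np.sin(angles)], axis=1)

def fit_rmse(x, y, period, n_harmonics):
    """Least-squares linear fit with an intercept; returns training RMSE."""
    X = np.column_stack([np.ones(len(x)), fourier_features(x, period, n_harmonics)])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sqrt(np.mean((X @ w - y) ** 2)))

# Synthetic target: base frequency plus a second-harmonic term.
hours = np.arange(24)
theta = 2.0 * np.pi * hours / 24
y = np.sin(theta) + 0.5 * np.cos(2.0 * theta)

rmse_k1 = fit_rmse(hours, y, period=24, n_harmonics=1)  # misses the 2θ term
rmse_k2 = fit_rmse(hours, y, period=24, n_harmonics=2)  # captures it
```

With one harmonic the second-harmonic term is left entirely in the residual; with two, the fit is exact up to floating-point error.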

Try this

The seam test

Pick hour, model = Linear, task = Regression. Look at the prediction curves at the seam (x near 23 → 0). Raw breaks; sin/cos sweeps through smoothly. That single fact is the whole pitch for cyclical encoding.
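The seam break can be reproduced offline with a small sketch. It assumes NumPy, a noise-free sin(θ) target, and plain least squares standing in for the demo's Linear model; `linear_predict` is an illustrative helper.

```python
import numpy as np

period = 24
hours = np.arange(period)
theta = 2.0 * np.pi * hours / period
y = np.sin(theta)  # noise-free periodic target (assumed for this sketch)

def linear_predict(X, y):
    """Ordinary least squares with an intercept; returns in-sample predictions."""
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ w

pred_raw = linear_predict(hours.astype(float), y)
pred_cyc = linear_predict(np.column_stack([np.cos(theta), np.sin(theta)]), y)

# Jump in the prediction between hour 23 and hour 0, neighbours on the clock.
seam_jump_raw = abs(pred_raw[23] - pred_raw[0])
seam_jump_cyc = abs(pred_cyc[23] - pred_cyc[0])
true_jump = abs(y[23] - y[0])
```

The raw model is a single straight line over 0..23, so its prediction at 23 sits far from its prediction at 0; the sin/cos model's jump matches the true one-hour change in the target.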

Trees don’t need it

Switch to Tree (binning). The raw and sin/cos curves match exactly. Trees split on thresholds; they don’t care if your feature is cyclic. Stick with raw integers for tree-only pipelines.

kNN benefits from cyclic distance

kNN here uses an arc distance, not Euclidean. That alone fixes the wraparound problem. If your kNN library only does Euclidean (the usual default), encode with sin/cos first.
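A small sketch of the two routes, assuming NumPy: an explicit arc distance on the raw values, and plain Euclidean distance on the sin/cos encoding. The helper names and the toy training hours are made up for illustration.

```python
import numpy as np

def arc_distance(a, b, period):
    """Shortest distance around a cycle of length `period`."""
    d = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) % period
    return np.minimum(d, period - d)

def encode_cyclic(x, period):
    theta = 2.0 * np.pi * np.asarray(x, dtype=float) / period
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)

period = 24
train_hours = np.array([1, 6, 12, 18, 22])
query = 23

# Neighbour ranking under three distances.
order_raw = np.argsort(np.abs(train_hours - query), kind="stable")
order_arc = np.argsort(arc_distance(train_hours, query, period), kind="stable")
euclid = np.linalg.norm(
    encode_cyclic(train_hours, period) - encode_cyclic(query, period), axis=1
)
order_euclid = np.argsort(euclid, kind="stable")

nearest_raw = train_hours[order_raw[:2]]        # misses hour 1 across the seam
nearest_arc = train_hours[order_arc[:2]]
nearest_euclid = train_hours[order_euclid[:2]]
```

Because chord length grows with arc length, the Euclidean ranking on the encoded points agrees with the arc-distance ranking, while the raw-integer ranking pushes hour 1 to the back.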

One-hot vs sin/cos size

For angle (period=360), one-hot would mean 360 features; sin/cos needs 2. That keeps the cyclic structure with over 99% fewer input features. The size chart on the right shows the same lesson at a glance.

Noise drowns small periods

Push noise σ to 0.9 on day-of-week (period=7). The R² for both encodings collapses. Encoding can’t rescue a target that’s mostly noise.

Importance bars hint at phase

The target here is sin(θ). The sin importance should dominate. Now imagine a target $\cos(\theta)$ — the cos bar would lead. Skewed bars hint at the dominant phase in your data.
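For a linear model the link between the two weights and the phase can be made explicit, since A·sin(θ + φ) = A·cos(φ)·sin θ + A·sin(φ)·cos θ. The sketch below assumes NumPy and a hypothetical target with a known amplitude and phase; whether the demo computes its importance bars this way is not stated, so treat it as an illustration of the idea only.

```python
import numpy as np

period = 24
hours = np.arange(period)
theta = 2.0 * np.pi * hours / period

# Hypothetical target with a known amplitude and phase shift.
true_amp, true_phase = 1.5, np.pi / 3
y = true_amp * np.sin(theta + true_phase)

# Least-squares fit on the columns [1, sin θ, cos θ].
A = np.column_stack([np.ones(period), np.sin(theta), np.cos(theta)])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, w_sin, w_cos = w

# w_sin = A·cos(φ) and w_cos = A·sin(φ), so:
amp = np.hypot(w_sin, w_cos)
phase = np.arctan2(w_cos, w_sin)
```

A pure sin(θ) target gives phase 0 (all weight on sin), and a pure cos(θ) target gives phase π/2 (all weight on cos), which is the pattern the bars hint at.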

In one glance

⚠️ Watch out: Linear + raw integer seam break · One-hot for high period · Multiple harmonics? Maybe drop hand-crafted features · Distance learners with raw cyclic ints
🔧 In practice: np.sin(2*np.pi*x/P) · np.cos(2*np.pi*x/P) · ColumnTransformer · FunctionTransformer · sklearn.preprocessing · Fourier features
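One way these pieces might fit together in scikit-learn is sketched below. The column layout, the `sin_cos` helper and the sample values are assumptions made for illustration; only `ColumnTransformer` and `FunctionTransformer` are real library classes.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

def sin_cos(X, period):
    """Encode each column of X as a (sin, cos) pair for the given period."""
    theta = 2.0 * np.pi * np.asarray(X, dtype=float) / period
    return np.hstack([np.sin(theta), np.cos(theta)])

# Assumed layout: column 0 = hour, column 1 = month, column 2 = plain numeric.
preprocess = ColumnTransformer(
    transformers=[
        ("hour", FunctionTransformer(sin_cos, kw_args={"period": 24}), [0]),
        ("month", FunctionTransformer(sin_cos, kw_args={"period": 12}), [1]),
    ],
    remainder="passthrough",  # leave the non-cyclic column untouched
)

X = np.array([[23, 12, 0.7],
              [0, 1, 1.3]])
# Output columns: hour sin, hour cos, month sin, month cos, numeric.
X_enc = preprocess.fit_transform(X)
```

Keeping the encoding inside the preprocessing step means the same transform is applied at fit and predict time, and each cyclic column carries its own period.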

Frequently asked

A raw integer treats 23 and 0 as 23 apart, but on the hour clock they’re neighbours. Mapping the value to (cos θ, sin θ) with θ = 2π·x/period preserves cyclic geometry — close points on the cycle stay close in the encoded space. Linear models and distance-based learners (kNN, k-means, SVMs with RBF) benefit immediately.
Tree-based models (decision trees, random forests, gradient boosting) handle splits on raw integers fine — they don’t care about smoothness. If you only ever feed the feature into trees, sin/cos doesn’t help much. For everything else (linear regression, logistic regression, neural nets, kNN, k-means), encode it.
One-hot works but explodes the feature count for high-period values (one-hot of hour = 24 features, day-of-year = 365). It also throws away ordering — the model can’t tell 0 and 1 are adjacent. Sin/cos compresses the same information into 2 features while keeping the cyclic structure. Use one-hot only when the period is very small (e.g., 7 weekdays) AND ordering really doesn’t matter.
For most use cases, sin and cos at the base frequency are enough. For richer periodic targets (multiple peaks per cycle), add second-harmonic features: sin(2θ), cos(2θ). This is exactly what Fourier features do. In practice though, if your target needs many harmonics, you probably want a smoother model (GAM, GP, NN) rather than more hand-crafted features.