Project Report · Quantitative Research

Regime-Aware Model Risk Visualization

A diagnostic study showing that a model's reliability is state-dependent, and that disciplined deployment rules — filtering by confidence and regime — can improve risk outcomes even when daily prediction is weak.

Asset: SPY, daily (2006–2025)
Sample: N = 5,007 (70 / 30 split)
Regimes: GMM on (ret5, vol10, vol20)
Predictor: Logistic regression (ret5, vol20)
Confidence: |p − 0.5|
Goal: Evaluation + deployment discipline

Key Numbers (Out-of-Sample)

Brier Score: 0.247. Close to uninformative baseline, consistent with probability compression.
Mean Confidence |p − 0.5|: 0.0498. Std 0.0121, max 0.1008 (test N = 1,503).
Best Regime Sharpe: 1.38. Strategy performance concentrates in certain regimes.
High-Vol Regime: unstable. Regime 3 shows poor/undefined Sharpe and degraded reliability.

Metrics reported are gross of trading frictions and used primarily for diagnostic comparison, not as live performance claims.
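
For reference, the Brier score is the mean squared error between a predicted probability and the 0/1 outcome, so a constant forecast of 0.5 scores exactly 0.25 whatever the outcomes are. The sketch below illustrates that baseline and the confidence measure |p − 0.5| on made-up numbers. The function names and sample arrays are illustrative assumptions, not the project's code or data.

```python
import numpy as np

def brier_score(p, y):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))

def confidence(p):
    """Distance of each predicted probability from the coin-flip level 0.5."""
    return np.abs(np.asarray(p, dtype=float) - 0.5)

# Illustrative values only (not the study's data).
y = np.array([1, 0, 1, 1, 0, 1])
p_flat = np.full(y.shape, 0.5)  # uninformative forecast
p_compressed = np.array([0.55, 0.48, 0.53, 0.56, 0.47, 0.52])  # hugs 0.5

baseline = brier_score(p_flat, y)            # 0.25 for any outcome vector
compressed = brier_score(p_compressed, y)    # below baseline, but not by much
mean_conf = float(confidence(p_compressed).mean())
```

When probabilities stay this close to 0.5, the Brier score cannot move far from 0.25 even if the direction calls are right, which is why the report treats the score as a compression diagnostic rather than a skill measure.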

Executive Summary

What This Demonstrates

Methods (Pipeline)

1. Data: SPY daily closes (2006–2025). Chronological split: 70% train, 30% test.
2. Features: ret5, vol10, vol20 (standardized on train).
3. Regimes (Unsupervised): GMM on (ret5, vol10, vol20) fit on train, then applied forward.
4. Prediction (Supervised): Logistic regression predicts next-day direction using (ret5, vol20).
5. Confidence: Use |p − 0.5| to stratify reliability and filter trades.
6. Deployment Rule: Trade only if |p−0.5| > 0.02 and deactivate execution in the high-risk regime.
7. Evaluation: OOS accuracy by confidence/regime + strategy Sharpe & max drawdown (gross of costs).
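
The steps above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the project's implementation. It makes the following assumptions:

- The price series is synthetic.
- The feature definitions are assumed: ret5 as the trailing 5-day return, vol10 and vol20 as rolling standard deviations of daily returns.
- Four mixture components are used, matching regimes 0 to 3 in the results.
- The long/short position rule is assumed.
- The high-risk regime is taken to be the component with the highest mean vol20.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for SPY daily closes (assumption: real data loaded elsewhere).
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, 2000))))

ret1 = close / close.shift(1) - 1
data = pd.DataFrame({
    "ret5": close / close.shift(5) - 1,   # trailing 5-day return (assumed definition)
    "vol10": ret1.rolling(10).std(),      # 10-day rolling volatility (assumed)
    "vol20": ret1.rolling(20).std(),      # 20-day rolling volatility (assumed)
    "next_ret": ret1.shift(-1),           # next-day return, used for target and P&L
}).dropna()

X = data[["ret5", "vol10", "vol20"]].to_numpy()
y = (data["next_ret"] > 0).astype(int).to_numpy()   # next-day direction, 1 = up

split = int(len(data) * 0.7)                 # chronological 70/30 split
scaler = StandardScaler().fit(X[:split])     # standardize using train rows only
Xs = scaler.transform(X)

# Unsupervised regimes: fit on train, then label every day (including test).
gmm = GaussianMixture(n_components=4, random_state=0).fit(Xs[:split])
regime = gmm.predict(Xs)

# Supervised direction model on (ret5, vol20), i.e. columns 0 and 2.
clf = LogisticRegression().fit(Xs[:split][:, [0, 2]], y[:split])
p_up = clf.predict_proba(Xs[:, [0, 2]])[:, 1]
conf = np.abs(p_up - 0.5)

# Assumption: the high-risk regime is the component with the highest mean vol20.
high_risk = int(np.argmax(gmm.means_[:, 2]))

# Deployment rule: trade only above the confidence threshold and outside high risk.
trade = (conf > 0.02) & (regime != high_risk)
position = np.where(trade, np.sign(p_up - 0.5), 0.0)   # assumed long/short rule
strat_ret = position[split:] * data["next_ret"].to_numpy()[split:]  # OOS, gross
```

Fitting the scaler, the mixture, and the classifier on the first 70% only keeps the test span free of look-ahead. GMM component labels are arbitrary, so any mapping from a label to "high-volatility" has to be derived from the fitted means rather than assumed from the index.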

Results

Findings

  • Regimes persist and cluster in stress periods. The inferred high-volatility state concentrates around extended market stress, consistent with volatility clustering.
  • Reliability is regime-dependent. Accuracy patterns differ by regime; the high-volatility regime shows more erratic confidence-stratified behavior (small samples amplify variability).
  • Feature-space separation explains it. Regimes occupy distinct regions in (ret5, vol20), so a single global decision rule effectively faces different operating conditions.
  • Probabilities are compressed. Outputs stay near the unconditional mean; probabilities work better as low-amplitude signals than as literal calibrated forecasts.
  • Deployment rules change the risk profile. Confidence filtering plus deactivating the high-risk regime primarily reduces drawdown depth and return volatility rather than increasing raw accuracy.

Regime-Level Strategy Sharpe (OOS)

Regime | Sharpe | Interpretation
0 | 1.38 | Strongest contribution in this regime.
1 | 0.41 | Positive but modest.
2 | 0.22 | Low but positive.
3 | NaN | Unstable / adverse high-volatility state.

Confidence filter: trade only when |p−0.5| > 0.02, and deactivate execution during the high-risk regime.
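
A regime-level Sharpe table like the one above can be produced by grouping daily strategy returns by regime label. The sketch below is a hedged illustration: the 252-day annualization and the helper names are assumptions, and the sample returns are made up. It also shows one plausible way a NaN can arise: if execution is deactivated in a regime, every strategy return there is zero, the standard deviation is zero, and the ratio is undefined. The report does not state that this is the cause of the Regime 3 value.

```python
import numpy as np

def annualized_sharpe(returns, periods_per_year=252):
    """Annualized mean/std ratio of daily returns; NaN where it is undefined."""
    r = np.asarray(returns, dtype=float)
    if r.size < 2:
        return float("nan")
    sd = r.std(ddof=1)
    if sd == 0:
        return float("nan")  # e.g. the strategy was flat on every day
    return float(np.sqrt(periods_per_year) * r.mean() / sd)

def sharpe_by_regime(strategy_returns, regimes):
    """Map each regime label to the Sharpe ratio of the returns earned in it."""
    r = np.asarray(strategy_returns, dtype=float)
    g = np.asarray(regimes)
    return {int(k): annualized_sharpe(r[g == k]) for k in np.unique(g)}

# Illustrative inputs only: regime 3 is never traded, so its returns are all zero.
example_returns = np.array([0.010, -0.005, 0.004, 0.006, 0.0, 0.0, 0.0])
example_regimes = np.array([0, 0, 0, 0, 3, 3, 3])
table = sharpe_by_regime(example_returns, example_regimes)
```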

Figure 1: Regime Timeline
[Image: Inferred market regimes over time]

High-vol regime clusters around prolonged stress episodes, supporting interpretation as persistent market states.

Figure 2: Accuracy vs Confidence, by Regime
[Image: Out-of-sample accuracy versus confidence by regime]

Confidence is not universally monotonic. High-vol regime shows degraded stability (and smaller bin counts).
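
A confidence-stratified accuracy table of the kind behind this figure could be computed as follows. This is a sketch only: the bin edges (0.02 and 0.04) and the function name are assumptions, and the inputs are made up. Returning the count alongside each hit rate makes thin bins visible, which matters for the high-volatility regime.

```python
import numpy as np

def accuracy_by_confidence(p_up, y, regimes, edges=(0.02, 0.04)):
    """Return {(regime, bin): (hit_rate, count)} for every non-empty bin.

    Bin 0 holds |p - 0.5| below edges[0], bin 1 the next interval, and so on.
    """
    p_up = np.asarray(p_up, dtype=float)
    y = np.asarray(y, dtype=int)
    regimes = np.asarray(regimes)
    conf = np.abs(p_up - 0.5)
    hit = (p_up > 0.5).astype(int) == y     # predicted direction matches outcome
    bins = np.digitize(conf, edges)         # integer bin index per observation
    out = {}
    for k in np.unique(regimes):
        for b in np.unique(bins[regimes == k]):
            mask = (regimes == k) & (bins == b)
            out[(int(k), int(b))] = (float(hit[mask].mean()), int(mask.sum()))
    return out

# Illustrative inputs only.
p_demo = np.array([0.51, 0.53, 0.56, 0.49, 0.45, 0.525])
y_demo = np.array([1, 0, 1, 0, 0, 1])
regime_demo = np.array([0, 0, 0, 3, 3, 3])
acc_table = accuracy_by_confidence(p_demo, y_demo, regime_demo)
```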

Figure 3: Equity Curve (OOS)
[Image: Out-of-sample strategy equity curve]

Selective deployment changes drawdowns/volatility more than it changes raw predictive metrics. Gross of costs.
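
Maximum drawdown, the risk metric referred to here, is the largest peak-to-trough decline of the compounded equity curve. The sketch below computes it for an ungated and a gated return series. The numbers and the gate are invented purely to show the mechanics, and they do not demonstrate that gating helps in general.

```python
import numpy as np

def max_drawdown(returns):
    """Largest peak-to-trough decline of the equity curve, as a negative fraction."""
    r = np.asarray(returns, dtype=float)
    equity = np.cumprod(1.0 + r)                          # equity starts from 1.0
    running_max = np.maximum.accumulate(equity)
    peak = np.maximum(running_max, 1.0)                   # include the starting value
    return float((equity / peak - 1.0).min())

# Illustrative daily returns and a hypothetical gate that skips the second day.
raw = np.array([0.01, -0.03, -0.02, 0.015, 0.01])
gate = np.array([True, False, True, True, True])
gated = np.where(gate, raw, 0.0)                          # flat when the gate is off

dd_raw = max_drawdown(raw)
dd_gated = max_drawdown(gated)
```

In this toy case the hit rate on traded days is unchanged by the gate, yet the drawdown shrinks because one loss day is skipped, which is the sense in which deployment rules act on the risk profile rather than on predictive metrics.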

Figure 4: Probability Field + Regimes
[Image: Logistic probability field with inferred regimes]

Regimes occupy distinct areas of (ret5, vol20), providing a geometric explanation for state-dependent behavior.

Figure 5: Probability Compression
[Image: Distribution of predicted probabilities]

Predicted probabilities concentrate near the unconditional mean, motivating thresholded deployment.

Interpretation & Limitations