Hannah Attar
← Back to Projects
Project Report

Regime-Aware Model Risk Visualization

A diagnostic study showing that a model’s reliability is state-dependent, and that disciplined deployment rules (confidence + regime) can improve risk outcomes even when daily prediction is weak.

Asset SPY, daily (2006–2025)
Sample N = 5,007 (70% train / 30% test)
Regimes GMM on (ret5, vol10, vol20)
Predictor Logistic regression on (ret5, vol20)
Confidence |p − 0.5|
Goal Evaluation + deployment discipline

Executive Summary

  • Daily equity direction prediction is noisy, and predicted probabilities are compressed near 0.5.
  • Regimes inferred from returns + realized volatility form persistent market states, not random “clusters.”
  • Model reliability varies materially by regime, so “average” metrics hide failure modes.
  • Filtering trades by confidence and disabling execution in a high-risk regime improves the strategy’s risk profile.

Key numbers (out-of-sample)

Brier score
0.247
Close to an uninformative baseline, consistent with probability compression.
Confidence (|p−0.5|)
mean 0.0498
Std 0.0121, max 0.1008 (test N=1,503).
Regime Sharpe (best)
1.38
Strategy performance concentrates in certain regimes.
High-vol regime
unstable
Regime 3 shows poor/undefined Sharpe and degraded reliability.

Metrics reported are gross of trading frictions and used primarily for diagnostic comparison.

What this demonstrates

  • Model risk is conditional: a “single” classifier behaves differently depending on the state distribution it operates under.
  • Confidence is not enough: the confidence-accuracy relationship changes by regime, especially in high volatility.
  • Prediction ≠ deployment: economic outcomes are strongly influenced by when you choose to trade, not just what the model predicts.

Methods (Pipeline)

  1. 1
    Data
    SPY daily closes (2006–2025). Chronological split: 70% train, 30% test.
  2. 2
    Features
    ret5, vol10, vol20 (standardized on train).
  3. 3
    Regimes (Unsupervised)
    GMM on (ret5, vol10, vol20) fit on train, then applied forward.
  4. 4
    Prediction (Supervised)
    Logistic regression predicts next-day direction using (ret5, vol20).
  5. 5
    Confidence
    Use |p − 0.5| to stratify reliability and filter trades.
  6. 6
    Deployment Rule
    Trade only if |p−0.5| > 0.02 and deactivate execution in the high-risk regime.
  7. 7
    Evaluation
    OOS accuracy by confidence/regime + strategy Sharpe & max drawdown (gross of costs).

Results

Findings

  • Regimes persist and cluster in stress periods. The inferred high-volatility state concentrates around extended market stress, consistent with volatility clustering.
  • Reliability is regime-dependent. Accuracy patterns differ by regime; the high-volatility regime shows more erratic confidence-stratified behavior (small samples amplify variability).
  • Feature-space separation explains it. Regimes occupy distinct regions in (ret5, vol20), so a single global decision rule effectively faces different operating conditions.
  • Probabilities are compressed. Outputs stay near the unconditional mean; probabilities work better as low-amplitude signals than as literal calibrated forecasts.
  • Deployment rules change the risk profile. Confidence filtering plus deactivating the high-risk regime primarily reduces drawdown depth and return volatility rather than magically increasing raw accuracy.

Regime-level strategy Sharpe (OOS)

RegimeSharpeInterpretation
01.38Strongest contribution in this regime.
10.41Positive but modest.
20.22Low but positive.
3NaNUnstable / adverse high-volatility state.

Confidence filter used in the strategy: trade only when |p−0.5| > 0.02, and deactivate execution during the high-risk regime.

Regime timeline
Figure 1
Inferred market regimes over time
High-vol regime clusters around prolonged stress episodes, supporting interpretation as persistent market states.
Accuracy vs confidence, by regime
Figure 2
Out-of-sample accuracy versus confidence by regime
Confidence is not universally monotonic. High-vol regime shows degraded stability (and smaller bin counts).
Equity curve (OOS)
Figure 3
Out-of-sample strategy equity curve
Selective deployment changes drawdowns/volatility more than it changes raw predictive metrics. Gross of costs.
Probability field + regimes
Figure 4
Logistic probability field with inferred regimes
Regimes occupy distinct areas of (ret5, vol20), providing a geometric explanation for state-dependent behavior.
Probability compression
Figure 5
Distribution of predicted probabilities
Predicted probabilities concentrate near the unconditional mean, motivating thresholded deployment.

Interpretation and limitations

  • Diagnostic, not “alpha claims.” The point is to map failure modes and decide when not to trade.
  • Costs not modeled. Results are gross of transaction costs, slippage, and financing/short constraints.
  • Small samples in extreme regimes. High-volatility regime bins can be thin; interpret bin-level spikes cautiously.
  • Next steps. Richer features, alternative regime definitions, and explicit execution costs would make the deployment test more realistic.