Hannah Attar
Project Report

Spatial Pricing Dynamics in Coastal Real Estate

A spatial econometrics build that separates structural effects, micro-location premia, and spillover effects. It combines clustering, IV, and spatial autoregression with a logit classifier to stress-test pricing drivers beyond standard hedonic controls.

Universe: 620 residential sales (single-family + attached)
Location: Coastal submarket (zip range 92624–92629)
Core vars: ln(price), ln(sqft), beds, baths, stories, type dummies
Spatial: dist to PCH + lat/long + cluster segmentation
Models: IV / GS2SLS (spatial), Logit classification
Goal: Quantify structural + spatial dependence effects

Executive Summary

  • Prices are explained by standard hedonic structure controls, but residuals remain spatially structured.
  • Distance-to-coast proxy (lndist_pch) behaves like a location premium and becomes stronger under spatial correction.
  • Spatial dependence matters: ignoring it can distort magnitudes and confidence on key location terms.
  • Cluster segmentation provides a clean way to encode micro-markets and stress-test heterogeneity.

Key numbers

  • N: 620. Property transactions.
  • Price range: $380k–$32M. Heavy right tail.
  • GS2SLS pseudo R²: 0.753. Spatial autoregressive model fit.
  • Logit accuracy: 90.48%. Classification at Pr(D) ≥ 0.5 threshold.

Metrics are reported for model comparison and diagnostic clarity (not a “forecast product” claim).

Data Description (Descriptive Statistics)

Three control blocks are used: main structure controls, time controls, and spatial controls. The tables below reproduce the original statistical output.

Main Controls

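| Variable | Obs | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| house id | 620 | 310.5 | 179.123 | 1 | 620 |
| price | 620 | 2,875,487.4 | 3,175,359 | 380,000 | 32,000,000 |
| sqft | 620 | 2,554.19 | 1,391.416 | 799 | 13,777 |
| lot sqft | 620 | 8,877.553 | 21,439.874 | 1,000 | 304,920 |
| beds | 620 | 3.556 | 0.916 | 2 | 9 |
| baths | 620 | 2.585 | 1.151 | 1 | 10 |
| stories | 620 | 1.731 | 0.591 | 1 | 3 |
| parking | 620 | 2.25 | 0.689 | 1 | 8 |
| single family | 620 | 0.905 | 0.294 | 0 | 1 |
| condo | 620 | 0.044 | 0.204 | 0 | 1 |
| townhomes | 620 | 0.035 | 0.185 | 0 | 1 |
| duplex triplex | 620 | 0.016 | 0.126 | 0 | 1 |
| zipcode | 620 | 92627.742 | 2.171 | 92624 | 92629 |
| year built | 620 | 1980.776 | 16.465 | 1928 | 2023 |
| house by year | 620 | 627,925.91 | 362,230.01 | 2,022 | 1,254,260 |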

Time Controls

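| Variable | Obs | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| time | 620 | 753.721 | 10.194 | 739 | 772 |
| year=2021 | 620 | 0.232 | 0.423 | 0 | 1 |
| year=2022 | 620 | 0.318 | 0.466 | 0 | 1 |
| year=2023 | 620 | 0.335 | 0.473 | 0 | 1 |
| year=2024 | 620 | 0.115 | 0.319 | 0 | 1 |

Month dummies (1–12) are included as indicator controls (means range ~0.053–0.113).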

Spatial Controls

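| Variable | Obs | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| dist_pch | 620 | 0.025 | 0.011 | 0.004 | 0.055 |
| latitude | 620 | 33.473 | 0.012 | 33.445 | 33.495 |
| longitude | 620 | -117.693 | 0.022 | -117.732 | -117.648 |
| kmeans cluster | 620 | 2.087 | 0.764 | 1 | 3 |
| std kmeans cluster | 620 | 1.979 | 0.724 | 1 | 3 |
| int complete cluster | 619 | 63.99 | 69.144 | 1 | 244 |
| int ward cluster | 619 | 135.008 | 123.454 | 1 | 393 |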

Spatial Structure (Selected Visuals)

Figure: Transaction map (spatial density). Map of transaction locations, used to validate spatial clustering and motivate spatial dependence correction.
Figure: Geographic clustering (overview). Housing observations mapped across the study area; the spatial distribution of observations motivates the spatial controls and segmentation.
Figure: K-means segmentation (micro-markets). K-means clusters by latitude and longitude. Cluster IDs are used as segmentation controls and for robustness checks.
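
To make the segmentation step concrete, here is a minimal sketch of k-means on standardized latitude and longitude, using only the Python standard library. It is an illustration under stated assumptions, not the code behind the report: the coordinates are synthetic, the blob centers are hypothetical values chosen inside the lat/long range of the Spatial Controls table, and the helper names (`standardize`, `nearest`, `kmeans`) are invented here. The choice of k = 3 follows the cluster range (1 to 3) reported for the k-means variables.

```python
import random
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: (point[0] - centroids[c][0]) ** 2
        + (point[1] - centroids[c][1]) ** 2,
    )

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct observations
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    (mean(m[0] for m in members), mean(m[1] for m in members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Synthetic coordinates (NOT the report's data): three hypothetical blobs.
rng = random.Random(42)
centers = [(33.455, -117.72), (33.475, -117.69), (33.490, -117.66)]
lats, lons = [], []
for lat0, lon0 in centers:
    for _ in range(30):
        lats.append(rng.gauss(lat0, 0.003))
        lons.append(rng.gauss(lon0, 0.004))

# Standardizing first keeps either coordinate from dominating the distance.
points = list(zip(standardize(lats), standardize(lons)))
labels, centroids = kmeans(points, k=3)
print(sorted(labels.count(c) for c in range(3)))  # cluster sizes
```

The resulting labels would enter a regression as dummies with one cluster omitted as the base category, which is how the cluster terms appear in the results tables.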

Model 1: Spatial Autoregressive (GS2SLS / IV)

The goal is to estimate structural and location effects while correcting for spatial dependence and endogeneity. This specification is treated as the primary “pricing” model.

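Structural + Spatial (GS2SLS):

$$
\text{lnprice}_i = \beta_0 + \beta_1\,\text{lnsqft}_i + \beta_2\,\text{beds}_i + \beta_3\,\text{baths}_i + \beta_4\,\text{lndist\_pch}_i + \beta_5\,\text{stories}_i + \delta \cdot \text{Type}_i + \text{i.month} + \text{i.year} + \varepsilon_i
$$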

Results (GS2SLS estimates)

| Term | Coef | Std. Err | z | P>z | 95% CI |
|---|---|---|---|---|---|
| lnsqft | 1.052 | 0.174 | 6.060 | 0.000 | [0.712, 1.392] |
| beds | -0.043 | 0.029 | -1.490 | 0.136 | [-0.101, 0.014] |
| baths | 0.091 | 0.033 | 2.800 | 0.005 | [0.027, 0.155] |
| lndist_pch | 0.424 | 0.039 | 10.780 | 0.000 | [0.347, 0.501] |
| stories | -0.134 | 0.036 | -3.680 | 0.000 | [-0.205, -0.062] |
| single_family | 0.112 | 0.050 | 2.220 | 0.026 | [0.013, 0.211] |
| year=2022 | 0.154 | 0.042 | 3.630 | 0.000 | [0.071, 0.237] |
| year=2023 | 0.199 | 0.040 | 4.920 | 0.000 | [0.120, 0.278] |
| year=2024 | 0.235 | 0.059 | 3.960 | 0.000 | [0.119, 0.352] |
| std_kmeans_cluster=2 | -0.337 | 0.036 | -9.440 | 0.000 | [-0.407, -0.267] |
| std_kmeans_cluster=3 | -0.277 | 0.036 | -7.660 | 0.000 | [-0.348, -0.206] |
| Constant | 8.079 | 1.225 | 6.590 | 0.000 | [5.678, 10.480] |

Interpretation (tight):
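(1) Size dominates: the lnsqft elasticity is about 1.05, so price scales roughly in proportion to interior area. (2) The location term is strong: lndist_pch is positive (0.424) and highly significant, meaning a 1% increase in distance to PCH is associated with roughly a 0.42% higher price, holding the other controls fixed, and the estimate is especially clean under spatial correction. (3) Cluster effects are economically large (about -0.34 and -0.28 log points relative to the omitted cluster), consistent with micro-market segmentation that basic covariates do not capture.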
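
Because the dependent variable is ln(price), the coefficients are easier to read once converted to percent effects. The sketch below does that conversion for a few terms copied from the results table. The function names are illustrative, and the calculation treats each coefficient as a direct effect; it ignores any spatial feedback that a spatial-lag term would add, since the report does not give a spatial multiplier.

```python
import math

# Coefficients copied from the GS2SLS results table.
coef = {
    "lnsqft": 1.052,
    "lndist_pch": 0.424,
    "baths": 0.091,
    "stories": -0.134,
    "std_kmeans_cluster=2": -0.337,
    "std_kmeans_cluster=3": -0.277,
}

def pct_effect_of_unit_change(beta):
    """Exact % change in price for a one-unit change in a level or dummy regressor."""
    return (math.exp(beta) - 1.0) * 100.0

def pct_effect_of_pct_change(beta, pct_change):
    """% change in price when a logged regressor rises by pct_change percent."""
    return ((1.0 + pct_change / 100.0) ** beta - 1.0) * 100.0

# Logged regressors: read as elasticities.
for name in ("lnsqft", "lndist_pch"):
    effect = pct_effect_of_pct_change(coef[name], 10.0)
    print(f"{name}: +10% in the regressor -> {effect:+.1f}% price")

# Level and dummy regressors: exponentiate rather than reading beta as a percent.
for name in ("baths", "stories", "std_kmeans_cluster=2", "std_kmeans_cluster=3"):
    effect = pct_effect_of_unit_change(coef[name])
    print(f"{name}: +1 unit -> {effect:+.1f}% price")
```

For the cluster dummies the exact conversion matters: a coefficient of -0.337 corresponds to a price roughly 29% below the omitted cluster, not 34%.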

Model 2: Logit Classifier (Price Regime / Tail Flag)

This model treats a price-state indicator as the target and tests whether structure + location + clusters reliably separate regimes. The purpose is diagnostic: “does the feature set actually separate outcomes cleanly?”

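Logit:

$$
\Pr(D_i = 1) = \sigma\big(\beta_0 + \beta_1\,\text{sqft}_i + \beta_2\,\text{beds}_i + \beta_3\,\text{baths}_i + \beta_4\,\text{stories}_i + \beta_5\,\text{single\_family}_i + \beta_6\,\text{dist\_pch}_i + \gamma \cdot \text{ClusterID}_i + \text{i.month} + \text{i.year}\big)
$$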

Results (Logit)

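| Term | Coef | Std. Err | t | p | 95% CI | Sig |
|---|---|---|---|---|---|---|
| sqft | 0.002 | 0.000 | 7.61 | 0.000 | [0.001, 0.002] | *** |
| beds | -0.288 | 0.251 | -1.14 | 0.253 | [-0.780, 0.205] | |
| baths | 0.927 | 0.282 | 3.29 | 0.001 | [0.375, 1.479] | *** |
| stories | -1.295 | 0.361 | -3.58 | 0.000 | [-2.003, -0.587] | *** |
| single_family | 1.612 | 1.071 | 1.51 | 0.132 | [-0.487, 3.711] | |
| dist_pch | 61.788 | 20.044 | 3.08 | 0.002 | [22.502, 101.073] | *** |
| ClusterID=2 | 1.030 | 0.432 | 2.38 | 0.017 | [0.183, 1.878] | ** |
| ClusterID=3 | -0.611 | 0.500 | -1.22 | 0.222 | [-1.590, 0.369] | |
| Constant | -10.773 | 1.817 | -5.93 | 0.000 | [-14.334, -7.212] | *** |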

Interpretation (tight):
(1) Size and bathrooms drive regime separation strongly. (2) dist_pch is economically meaningful in classification, not just in continuous pricing. (3) Clusters matter, consistent with submarket structure that is not fully reducible to raw coordinates.
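
Logit coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to compare. The sketch below converts a few coefficients from the table. The `odds_ratio` helper is illustrative, and the step sizes are choices made here rather than taken from the report: dist_pch is scaled by 0.011, which is its standard deviation in the Spatial Controls table, because a full one-unit change lies far outside its observed range (0.004 to 0.055).

```python
import math

# Coefficients copied from the logit results table.
logit_coef = {
    "baths": 0.927,
    "stories": -1.295,
    "ClusterID=2": 1.030,
    "dist_pch": 61.788,
}

def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds of D=1 when the regressor moves by delta."""
    return math.exp(beta * delta)

print(f"baths, +1 bath:        {odds_ratio(logit_coef['baths']):.2f}x odds")
print(f"stories, +1 story:     {odds_ratio(logit_coef['stories']):.2f}x odds")
print(f"ClusterID=2 vs base:   {odds_ratio(logit_coef['ClusterID=2']):.2f}x odds")
# One standard deviation of dist_pch (0.011) instead of a full unit.
print(f"dist_pch, +0.011 (1 SD): {odds_ratio(logit_coef['dist_pch'], 0.011):.2f}x odds")
```

Read this way, the very large raw dist_pch coefficient is a units artifact: a one-standard-deviation move roughly doubles the odds, which is comparable in size to the bathroom effect.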

Classification diagnostics

| Metric | Value | Notes |
|---|---|---|
| Correctly classified | 90.48% | Threshold: Pr(D) ≥ 0.5 |
| Sensitivity | 74.84% | Pr(+ \| D) |
| Specificity | 95.70% | Pr(- \| ~D) |
| PPV | 85.29% | Pr(D \| +) |
| NPV | 91.94% | Pr(~D \| -) |
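
The five diagnostics all derive from a single 2×2 confusion matrix. The report does not print the underlying counts, so the values below are an assumption: they are back-solved from the reported percentages with N = 620, and they happen to reproduce all five figures to two decimals. Treat them as a consistency check on the table rather than as reported output.

```python
# Hypothetical counts, back-solved from the reported rates assuming N = 620.
# They are NOT printed in the report.
tp, fn, fp, tn = 116, 39, 20, 445

def classification_metrics(tp, fn, fp, tn):
    """Standard confusion-matrix rates, returned as fractions in [0, 1]."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # correctly classified
        "sensitivity": tp / (tp + fn),   # Pr(+ | D)
        "specificity": tn / (tn + fp),   # Pr(- | ~D)
        "ppv": tp / (tp + fp),           # Pr(D | +)
        "npv": tn / (tn + fn),           # Pr(~D | -)
    }

metrics = classification_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these counts only 155 of 620 observations are positives, so a classifier that always predicts the negative class would already be 75% accurate. The 74.84% sensitivity is therefore the more informative figure for judging how well the feature set separates the flagged regime.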

Diagnostics (Residual Structure)

Residual diagnostics are used to sanity-check fit, tail behavior, and whether structure-only modeling leaves spatial patterns behind.

Figure: Residual scatter and histogram. Used to check heteroskedasticity patterns and distributional shape (tails/skew).
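
One common way to put a number on leftover spatial pattern in residuals is Moran's I. The report does not say which statistic, if any, was used, so the following is a generic sketch and not the author's diagnostic. The weights scheme (row-standardized inverse distance with a cutoff), the function names, and the toy 4×4 grid are all assumptions made for illustration. With real latitude/longitude data, plain Euclidean distance in degrees is only a rough approximation.

```python
import math

def inverse_distance_weights(coords, cutoff):
    """Row-standardized inverse-distance weights; zero on the diagonal and beyond cutoff."""
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(coords[i], coords[j])
                if 0 < d <= cutoff:
                    weights[i][j] = 1.0 / d
        row_sum = sum(weights[i])
        if row_sum > 0:
            weights[i] = [w / row_sum for w in weights[i]]
    return weights

def morans_i(values, weights):
    """Moran's I: positive when neighbors have similar values, negative when they alternate."""
    n = len(values)
    mu = sum(values) / n
    z = [v - mu for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)

# Toy 4x4 grid of locations (hypothetical, unit spacing).
coords = [(x, y) for x in range(4) for y in range(4)]
W = inverse_distance_weights(coords, cutoff=1.0)  # rook neighbors only

# "Residuals" that cluster in space: left half positive, right half negative.
clustered = [1.0 if x < 2 else -1.0 for x, y in coords]
# "Residuals" that alternate like a checkerboard.
checkerboard = [1.0 if (x + y) % 2 == 0 else -1.0 for x, y in coords]

print("clustered:   ", round(morans_i(clustered, W), 3))
print("checkerboard:", round(morans_i(checkerboard, W), 3))
```

A clearly positive value on structure-only residuals would indicate that nearby homes share unexplained price variation, which is the pattern that motivates a spatial correction.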

Limitations and next steps

  • Tail heaviness: The price distribution is extremely right-skewed, so robustness checks on the upper tail matter.
  • Micro-market stability: Cluster definitions can drift if the sample window expands or boundaries shift.
  • Production upgrade path: Add richer geospatial features, nonlinear structure terms, and explicit spatial weight sensitivity tests.