Project Report · Spatial Econometrics

Spatial Pricing Dynamics in Coastal Real Estate

Separating structure, micro-location premia, and spillover effects using clustering, IV, spatial autoregression, and a logit classifier to stress-test drivers of pricing beyond standard hedonic controls.

Universe: 620 residential sales (single-family + attached)
Location: Coastal submarket (zip 92624–92629)
Core Variables: ln(price), ln(sqft), beds, baths, stories, type dummies
Spatial: dist to PCH + lat/long + cluster segmentation
Models: IV / GS2SLS (spatial), Logit classification
Goal: Quantify structural + spatial dependence effects

Key Numbers

Observations: 620. Property transactions in study area.
Price Range: $380K–$32M. Heavy right tail in distribution.
GS2SLS Pseudo R²: 0.753. Spatial autoregressive model fit.
Logit Accuracy: 90.48%. Classification at Pr(D) ≥ 0.5 threshold.

Metrics are reported for model comparison and diagnostic clarity, not as a "forecast product" claim.

Executive Summary

Spatial Structure (Selected Visuals)

Figure 1. Transaction Map (Spatial Density)
[Image: map of transaction locations]

Used to validate spatial clustering and motivate spatial dependence correction.

Figure 2. Geographic Clustering (Overview)
[Image: housing observations mapped across the study area]

Spatial distribution of observations used to motivate spatial controls and segmentation.

Figure 3. K-Means Segmentation (Micro-Markets)
[Image: K-means clusters by latitude and longitude]

Cluster IDs are used as segmentation controls and for robustness checks.
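
As a rough illustration of the segmentation step, the sketch below runs plain Lloyd's k-means with k = 3 on standardized latitude/longitude pairs. The coordinates are synthetic, and the function names, seeds, and initialization are assumptions made for illustration. Standardizing first is also an assumption, suggested only by the std_kmeans_cluster variable name; the report does not state which implementation or settings produced its cluster IDs.

```python
import random
from statistics import mean, pstdev


def standardize(points):
    """Z-score each coordinate so latitude and longitude carry equal weight."""
    cols = list(zip(*points))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(p, mus, sds)) for p in points]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k=3, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns 1-based cluster labels."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers (assumption)
    labels = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(mean(col) for col in zip(*members))
    return [lab + 1 for lab in labels]


# Synthetic (lat, lon) pairs inside the study area's bounding box.
# These are NOT the report's data; the blob centers are hypothetical.
rng = random.Random(42)
blob_centers = [(33.455, -117.720), (33.472, -117.690), (33.488, -117.660)]
coords = [
    (rng.gauss(lat, 0.003), rng.gauss(lon, 0.004))
    for lat, lon in blob_centers
    for _ in range(40)
]

z = standardize(coords)
labels = kmeans(z, k=3)
print({c: labels.count(c) for c in sorted(set(labels))})
```

The resulting labels can enter a regression as indicator variables with one cluster omitted as the base category, which is how the cluster terms appear in the estimate tables.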

Data Description (Descriptive Statistics)

Three control blocks are used: main structure controls, time controls, and spatial controls.

Main Controls

Variable         Obs   Mean         Std. Dev.    Min       Max
house id         620   310.5        179.123      1         620
price            620   2,875,487    3,175,359    380,000   32,000,000
sqft             620   2,554.19     1,391.42     799       13,777
lot sqft         620   8,877.55     21,439.87    1,000     304,920
beds             620   3.556        0.916        2         9
baths            620   2.585        1.151        1         10
stories          620   1.731        0.591        1         3
parking          620   2.25         0.689        1         8
single family    620   0.905        0.294        0         1
condo            620   0.044        0.204        0         1
townhomes        620   0.035        0.185        0         1
duplex triplex   620   0.016        0.126        0         1
zipcode          620   92627.74     2.171        92624     92629
year built       620   1980.78      16.465       1928      2023
house by year    620   627,925.91   362,230.01   2,022     1,254,260

Time Controls

Variable    Obs   Mean      Std. Dev.   Min   Max
time        620   753.721   10.194      739   772
year=2021   620   0.232     0.423       0     1
year=2022   620   0.318     0.466       0     1
year=2023   620   0.335     0.473       0     1
year=2024   620   0.115     0.319       0     1
month dummies (1–12): included as indicator controls (means range ~0.053–0.113).

Spatial Controls

Variable               Obs   Mean       Std. Dev.   Min        Max
dist_pch               620   0.025      0.011       0.004      0.055
latitude               620   33.473     0.012       33.445     33.495
longitude              620   -117.693   0.022       -117.732   -117.648
kmeans cluster         620   2.087      0.764       1          3
std kmeans cluster     620   1.979      0.724       1          3
int complete cluster   619   63.99      69.144      1          244
int ward cluster       619   135.008    123.454     1          393

Model 1: Spatial Autoregressive (GS2SLS / IV)

The goal is to estimate structural and location effects while correcting for spatial dependence and endogeneity. This specification is treated as the primary "pricing" model.

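Structural + Spatial (GS2SLS):
\[
\ln(\text{price}_i) = \beta_0 + \beta_1 \ln(\text{sqft}_i) + \beta_2\,\text{beds}_i + \beta_3\,\text{baths}_i + \beta_4 \ln(\text{dist\_pch}_i) + \beta_5\,\text{stories}_i + \delta \cdot \text{Type}_i + \text{i.month} + \text{i.year} + \varepsilon_i
\]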
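
Spatial autoregressive models estimated by GS2SLS rely on a spatial weights matrix W to form spatial lags of the dependent variable and/or the errors. The report does not state how W was constructed, so the sketch below shows only one common choice, assumed purely for illustration: row-standardized inverse-distance weights within a distance cutoff, followed by the spatial lag W·y. The coordinates, prices, cutoff, and function names are all hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def inverse_distance_weights(coords, cutoff_km=2.0):
    """Row-standardized inverse-distance weights; zero diagonal.

    Observations with no neighbor inside the cutoff keep an all-zero row.
    """
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = haversine_km(coords[i], coords[j])
                if 0 < d <= cutoff_km:
                    W[i][j] = 1.0 / d
        row_sum = sum(W[i])
        if row_sum > 0:
            W[i] = [w / row_sum for w in W[i]]
    return W


def spatial_lag(W, y):
    """Weighted average of neighbors' values: (W y)_i = sum_j w_ij * y_j."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]


# Hypothetical sales: three nearby homes and one isolated home.
coords = [
    (33.450, -117.700),
    (33.455, -117.695),
    (33.460, -117.690),
    (33.490, -117.650),
]
ln_price = [math.log(p) for p in (1_200_000, 1_500_000, 2_100_000, 4_000_000)]

W = inverse_distance_weights(coords, cutoff_km=2.0)
lag = spatial_lag(W, ln_price)
for y_i, wy_i in zip(ln_price, lag):
    print(f"ln(price) = {y_i:.3f}   spatial lag = {wy_i:.3f}")
```

With row standardization, each spatial lag is a distance-weighted average of neighboring log prices. Because that lag is determined jointly with the outcome, it has to be instrumented rather than used as an ordinary regressor, which is the role of the IV step.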

GS2SLS Estimates

Term                   Coef     Std. Err   z        P>|z|   95% CI
lnsqft                 1.052    0.174      6.060    0.000   [0.712, 1.392]
beds                   -0.043   0.029      -1.490   0.136   [-0.101, 0.014]
baths                  0.091    0.033      2.800    0.005   [0.027, 0.155]
lndist_pch             0.424    0.039      10.780   0.000   [0.347, 0.501]
stories                -0.134   0.036      -3.680   0.000   [-0.205, -0.062]
single_family          0.112    0.050      2.220    0.026   [0.013, 0.211]
year=2022              0.154    0.042      3.630    0.000   [0.071, 0.237]
year=2023              0.199    0.040      4.920    0.000   [0.120, 0.278]
year=2024              0.235    0.059      3.960    0.000   [0.119, 0.352]
std_kmeans_cluster=2   -0.337   0.036      -9.440   0.000   [-0.407, -0.267]
std_kmeans_cluster=3   -0.277   0.036      -7.660   0.000   [-0.348, -0.206]
Constant               8.079    1.225      6.590    0.000   [5.678, 10.480]

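(1) Size dominates: the lnsqft elasticity is about 1.05, so a 1% increase in interior area is associated with roughly a 1.05% higher price, consistent with multiplicative scaling of price with interior area. (2) The location effect is strong: lndist_pch is positive (0.424) and highly significant (z = 10.78), meaning a 1% increase in distance from PCH is associated with about a 0.42% higher price, and the estimate stays precise under the spatial correction. (3) Cluster effects are economically large: clusters 2 and 3 price about 0.34 and 0.28 log points below the omitted cluster 1, consistent with micro-market segmentation not captured by basic covariates.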
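
To make these magnitudes concrete, the sketch below converts a few of the tabulated coefficients into approximate percentage effects on price. It reads the coefficients directly as log-linear effects and ignores any feedback through a spatial lag term, so the numbers are best treated as rough direct effects. The helper names are illustrative, not part of the original analysis.

```python
import math

# Coefficients copied from the GS2SLS estimates table.
coef = {
    "lnsqft": 1.052,
    "lndist_pch": 0.424,
    "year=2024": 0.235,
    "std_kmeans_cluster=2": -0.337,
    "std_kmeans_cluster=3": -0.277,
}


def elasticity_effect(beta, pct_change):
    """Percent change in price for a pct_change percent change in a logged regressor."""
    return ((1 + pct_change / 100) ** beta - 1) * 100


def dummy_effect(beta):
    """Percent change in price when a 0/1 regressor switches on (log-linear model)."""
    return (math.exp(beta) - 1) * 100


sqft_10 = elasticity_effect(coef["lnsqft"], 10)
dist_10 = elasticity_effect(coef["lndist_pch"], 10)
cluster2 = dummy_effect(coef["std_kmeans_cluster=2"])
cluster3 = dummy_effect(coef["std_kmeans_cluster=3"])
year2024 = dummy_effect(coef["year=2024"])

print(f"+10% sqft        -> {sqft_10:+.1f}% price")
print(f"+10% dist_pch    -> {dist_10:+.1f}% price")
print(f"cluster 2 vs 1   -> {cluster2:+.1f}% price")
print(f"cluster 3 vs 1   -> {cluster3:+.1f}% price")
print(f"2024 vs base yr  -> {year2024:+.1f}% price")
```

On this reading, a 10% larger home is priced roughly 10.5% higher, while the cluster 2 and cluster 3 coefficients correspond to discounts of roughly 29% and 24% relative to the omitted cluster. Because exp(β) − 1 is not symmetric around zero, large dummy coefficients should be converted this way rather than read directly as percentages.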

Model 2: Logit Classifier (Price Regime / Tail Flag)

This model treats a price-state indicator as the target and tests whether structure + location + clusters reliably separate regimes. The purpose is diagnostic: "does the feature set actually separate outcomes cleanly?"

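Logit:
\[
\Pr(D_i = 1) = \sigma\left( \beta_0 + \beta_1\,\text{sqft}_i + \beta_2\,\text{beds}_i + \beta_3\,\text{baths}_i + \beta_4\,\text{stories}_i + \beta_5\,\text{single\_family}_i + \beta_6\,\text{dist\_pch}_i + \gamma \cdot \text{ClusterID}_i + \text{i.month} + \text{i.year} \right)
\]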

Logit Estimates

Term            Coef      Std. Err   t       p       95% CI               Sig
sqft            0.002     0.000      7.61    0.000   [0.001, 0.002]       ***
beds            -0.288    0.251      -1.14   0.253   [-0.780, 0.205]
baths           0.927     0.282      3.29    0.001   [0.375, 1.479]       ***
stories         -1.295    0.361      -3.58   0.000   [-2.003, -0.587]     ***
single_family   1.612     1.071      1.51    0.132   [-0.487, 3.711]
dist_pch        61.788    20.044     3.08    0.002   [22.502, 101.073]    ***
ClusterID=2     1.030     0.432      2.38    0.017   [0.183, 1.878]       **
ClusterID=3     -0.611    0.500      -1.22   0.222   [-1.590, 0.369]
Constant        -10.773   1.817      -5.93   0.000   [-14.334, -7.212]    ***

(1) Size and bathrooms drive regime separation strongly. (2) dist_pch is economically meaningful in classification, not just in continuous pricing. (3) Clusters matter, consistent with submarket structure not fully reducible to raw coordinates.
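
Logit coefficients are on the log-odds scale, so the sketch below converts a few of them to odds ratios and shows how the Pr(D) ≥ 0.5 rule maps onto the linear index. Coefficients are copied from the table as rounded. The 0.01 step for dist_pch is an arbitrary illustration, since the report does not state the variable's units (its sample range is 0.004 to 0.055). Helper names are hypothetical.

```python
import math

# Coefficients copied (as rounded) from the logit estimates table.
logit_coef = {
    "baths": 0.927,
    "stories": -1.295,
    "dist_pch": 61.788,
    "ClusterID=2": 1.030,
}


def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds of D = 1 for a `delta` change in x."""
    return math.exp(beta * delta)


def sigmoid(index):
    """Logistic function mapping a linear index to Pr(D = 1)."""
    return 1.0 / (1.0 + math.exp(-index))


def classify(index, threshold=0.5):
    """Apply the Pr(D) >= threshold rule to a linear index."""
    return int(sigmoid(index) >= threshold)


or_bath = odds_ratio(logit_coef["baths"])              # one extra bathroom
or_story = odds_ratio(logit_coef["stories"])           # one extra story
or_dist = odds_ratio(logit_coef["dist_pch"], 0.01)     # +0.01 dist_pch units
or_cluster2 = odds_ratio(logit_coef["ClusterID=2"])    # cluster 2 vs cluster 1

print(f"baths: {or_bath:.2f}  stories: {or_story:.2f}  "
      f"dist_pch(+0.01): {or_dist:.2f}  cluster 2: {or_cluster2:.2f}")
```

At the 0.5 threshold, the classifier predicts D = 1 exactly when the linear index is non-negative, since σ(0) = 0.5. The sqft coefficient is omitted here because its three-decimal rounding (0.002) is too coarse to scale reliably over thousands of square feet.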

Classification Diagnostics

Metric                 Value    Notes
Correctly classified   90.48%   Threshold: Pr(D) ≥ 0.5
Sensitivity            74.84%   Pr(+ | D)
Specificity            95.70%   Pr(- | ~D)
PPV                    85.29%   Pr(D | +)
NPV                    91.94%   Pr(~D | -)
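
All five diagnostics derive from a single 2×2 confusion matrix. The counts in the sketch below are not reported in the source: they are an inferred assumption, back-solved as the integer counts that reproduce the published percentages at N = 620.

```python
# Inferred counts (assumption): back-solved from the reported percentages
# at N = 620. They are not published in the report itself.
tp, fn, fp, tn = 116, 39, 20, 445


def diagnostics(tp, fn, fp, tn):
    """Standard classification diagnostics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # Pr(+ | D)
        "specificity": tn / (tn + fp),   # Pr(- | ~D)
        "ppv": tp / (tp + fp),           # Pr(D | +)
        "npv": tn / (tn + fn),           # Pr(~D | -)
    }


results = diagnostics(tp, fn, fp, tn)
for name, value in results.items():
    print(f"{name}: {value:.2%}")
# accuracy: 90.48%
# sensitivity: 74.84%
# specificity: 95.70%
# ppv: 85.29%
# npv: 91.94%
```

If these inferred counts are right, the positive class is 155 of 620 observations (25%), so a rule that always predicts the negative class would already score 75% accuracy. That makes sensitivity and PPV more informative than the headline accuracy figure.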

Diagnostics (Residual Structure)

Residual diagnostics are used to sanity-check fit, tail behavior, and whether structure-only modeling leaves spatial patterns behind.

Figure 4. Residual Scatter + Histogram
[Image: residual scatter and histogram]

Used to check heteroskedasticity patterns and distributional shape (tails/skew).
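
One standard way to test whether residuals still carry spatial pattern is Moran's I. The report does not say which statistic was used, so the sketch below is an assumed illustration on a toy four-point chain with binary neighbor weights, not a reproduction of the report's diagnostics.

```python
def morans_i(values, W):
    """Moran's I: (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in W)  # total weight
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * (num / den)


# Toy neighbor structure: four points on a line, each linked to adjacent points.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1.0, 1.0, -1.0, -1.0]     # similar residuals sit next to each other
alternating = [1.0, -1.0, 1.0, -1.0]   # neighbors always have opposite signs

i_clustered = morans_i(clustered, W)
i_alternating = morans_i(alternating, W)
print(f"clustered: {i_clustered:.3f}, alternating: {i_alternating:.3f}")
# clustered: 0.333, alternating: -1.000
```

Positive values indicate that nearby residuals resemble each other, which would suggest remaining spatial dependence. Values near the null expectation of −1/(n − 1) suggest the spatial terms have absorbed most of it.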

Limitations & Next Steps