Projects

Computational research projects for rigorous evaluation.

The through line across these projects is experimental discipline: controlled perturbations, leakage-aware validation, state-space simulation, benchmark design, artifact lineage, model comparison, and inspectable failure analysis.

Generative audio robustness evaluation

RE-AMP: Generative Audio Robustness Evaluation

A public benchmarking system for controlled robustness evaluation of generated audio.

  • 18,000+ controlled runs
  • 82 perturbation operators
  • 9 evaluator families
  • 91% setup-time reduction
Problem
Generative audio robustness is difficult to compare when prompts, seeds, model checkpoints, transforms, metric configs, and execution environments drift between runs.
Method
Ran controlled perturbation studies across compression artifacts, prompt variation, acoustic transformations, stochastic decoding differences, and model-version changes.
System / stack
Python, PyTorch, torchaudio, FastAPI, React, TypeScript, Redis, PostgreSQL, Docker, async workers, statistical dashboards, and deterministic experiment manifests.
Validation methodology
Deterministic manifests covered prompts, seeds, checkpoints, transforms, metric configs, and environment hashes; results were compared through signal-level, perceptual, and model-based metric families with bootstrap uncertainty and failure-slice analysis.
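The bootstrap-uncertainty step above can be sketched as a percentile bootstrap over per-run metric values; the function name and the example scores are illustrative stand-ins for any robustness metric:

```python
import numpy as np

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-run metric values."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample runs with replacement and recompute the mean each time.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return values.mean(), lo, hi

# Hypothetical perceptual-metric scores from a batch of perturbed runs.
scores = [0.81, 0.79, 0.84, 0.77, 0.80, 0.83, 0.78, 0.82]
mean, lo, hi = bootstrap_ci(scores)
```

Fixing the generator seed keeps the interval itself reproducible, consistent with the deterministic-manifest discipline.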
Results
Supported 18,000+ controlled robustness runs across 82 perturbation operators and 9 evaluator families while reducing experiment setup time by 91%.
Failure modes / reliability checks
Tracked ranking instability, loudness drift, clipping artifacts, and compression sensitivity via spectral-convergence, embedding-similarity, pitch/chroma-stability, and FAD-style distribution-distance metrics.
Why it matters for research
The platform turns subjective generative-audio comparison into reproducible empirical measurement with explicit assumptions and inspectable artifacts.
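The deterministic manifests mentioned above reduce, in the simplest case, to hashing a canonical serialization of every degree of freedom, so identical configurations map to identical run IDs; a minimal stdlib sketch with hypothetical field names:

```python
import hashlib
import json

def manifest_hash(manifest: dict) -> str:
    """Canonical hash of an experiment manifest: any drift in prompts,
    seeds, checkpoints, transforms, metric configs, or environment
    produces a different run ID and is immediately visible."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Illustrative manifest; field names and values are made up.
run = {
    "prompt": "warm analog synth pad",
    "seed": 1234,
    "checkpoint": "model-v2.1",
    "transforms": ["mp3_128k", "pitch_shift_+1st"],
    "metrics": {"fad": {"embedding": "vggish"}},
    "env": {"torch": "2.3.0", "cuda": "12.1"},
}
run_id = manifest_hash(run)
```

Because keys are sorted before hashing, two manifests that differ only in dictionary insertion order still produce the same ID.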

Graph learning and spatial-temporal risk

Graph-Based Traffic Collision Risk Modeling

A leakage-aware graph learning study over traffic collision records and road-network structure.

  • 1.1M+ collision records
  • road-network topology
  • +23.4 AUROC points
  • +31% top-decile recall
Problem
Collision risk is spatially structured, temporally shifting, and easy to overstate if validation leaks future or nearby geographic information.
Method
Formulated risk as heterogeneous spatiotemporal graph prediction over intersections, road segments, and neighborhoods with spatial, temporal, traffic, structural, weather, and historical-risk covariates.
System / stack
Python, PyTorch Geometric, NetworkX, GeoPandas, road-network topology, SQL, XGBoost, spatial statistics, calibration tooling, and graph-learning pipelines.
Validation methodology
Compared GCN, GraphSAGE, GATv2, temporal aggregation, XGBoost, kernel, tabular, and geospatial baselines under temporal cutoffs, geographic buffer zones, held-out corridors, future-information audits, and spatial autocorrelation diagnostics.
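The geographic buffer zones above guard against spatial autocorrelation leaking held-out information into training; a minimal sketch, assuming projected coordinates in meters and made-up node positions:

```python
import numpy as np

def buffer_split(coords, test_mask, buffer_m=500.0):
    """Drop training nodes within `buffer_m` of any held-out test node,
    so spatially autocorrelated neighbors cannot leak test information
    into the training set."""
    coords = np.asarray(coords, dtype=float)
    test_mask = np.asarray(test_mask, dtype=bool)
    test_pts = coords[test_mask]
    # Pairwise distances from every node to every test node.
    d = np.linalg.norm(coords[:, None, :] - test_pts[None, :, :], axis=-1)
    near_test = (d < buffer_m).any(axis=1)
    train_mask = ~test_mask & ~near_test
    return train_mask

# One test node at the origin; neighbors at 100 m and 400 m fall inside
# the buffer, so only the node 2000 m away remains in the training set.
coords = [[0, 0], [100, 0], [400, 0], [2000, 0]]
test_mask = [True, False, False, False]
train_mask = buffer_split(coords, test_mask, buffer_m=500.0)
```

For city-scale graphs this brute-force distance matrix would be replaced by a spatial index, but the leakage logic is the same.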
Results
Modeled 1.1M+ collision records and road-network topology, improving hotspot-ranking AUROC by 23.4 points and top-decile recall by 31% over tabular baselines.
Failure modes / reliability checks
Added calibrated risk maps, ablations, counterfactual edge removal, conformal-style risk sets, reliability curves, ranking-stability checks, and leakage audits.
Why it matters for research
The project tests how graph structure changes prediction quality under realistic spatial and temporal validation constraints.

Robust state-space estimation

Robust Kalman Filtering for Vehicle Localization

A simulation and filtering project for robust recursive vehicle localization.

  • state-space simulation
  • 100 Hz C++ loop
  • 49% median-error reduction
  • 44% tail-error reduction
Problem
Standard Kalman filters can become brittle under sensor dropout, delayed observations, outlier regimes, correlated process noise, and measurement misspecification.
Method
Compared standard Kalman filtering with robust variants using innovation chi-square gating, covariance inflation, adaptive noise estimation, Huberized updates, Student-t observation models, residual clipping, and outlier rejection.
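The innovation chi-square gating listed above can be sketched as a single guarded measurement update: the normalized innovation is tested against a chi-square threshold, and measurements that fail the gate are rejected. This is a generic sketch, not the project's exact implementation:

```python
import numpy as np
from scipy.stats import chi2

def gated_update(x, P, z, H, R, alpha=0.01):
    """Kalman measurement update with innovation chi-square gating:
    if the squared Mahalanobis distance of the innovation exceeds the
    chi-square threshold, the measurement is rejected and the prior kept."""
    y = z - H @ x                              # innovation
    S = H @ P @ H.T + R                        # innovation covariance
    d2 = float(y.T @ np.linalg.solve(S, y))    # normalized innovation
    if d2 > chi2.ppf(1 - alpha, df=len(z)):
        return x, P, False                     # gate: reject as outlier
    K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new, True

# 1-D position example: a nearby measurement is fused, a wild one is gated.
x0, P0 = np.array([0.0]), np.array([[1.0]])
H, R = np.eye(1), np.array([[1.0]])
x1, P1, ok1 = gated_update(x0, P0, np.array([0.5]), H, R)
x2, P2, ok2 = gated_update(x0, P0, np.array([10.0]), H, R)
```

Rejecting rather than Huberizing is the bluntest of the robust variants compared; it trades some efficiency for simplicity in heavy-outlier regimes.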
System / stack
Python, NumPy, SciPy, MATLAB, C++, stochastic simulation, state-space models, recursive Bayesian estimation, robust statistics, and a 100 Hz real-time C++ loop.
Validation methodology
Simulated configurable dynamics, observation models, sensor dropout, delayed observations, outliers, correlated process noise, misspecified measurements, and asynchronous update schedules.
Results
Reduced median localization error by 49% and 95th-percentile error by 44% relative to a standard Kalman filter under mixed dropout/outlier regimes.
Failure modes / reliability checks
Analyzed breakdown under covariance misspecification, high-leverage measurements, biased sensors, delayed observations, loss of observability, and ill-conditioned covariance updates.
Why it matters for research
The project makes state-space reliability concrete by connecting derivation, simulation, numerical conditioning, and real-time execution constraints.

Quantitative research systems

Quant Research Lab

A reproducible backtesting and factor-analysis environment for empirical finance research.

  • point-in-time ingestion
  • factor library
  • transaction-cost modeling
  • deflated Sharpe diagnostics
Problem
Factor research can look convincing while hiding lookahead variables, survivorship effects, future price references, unstable universes, and transaction-cost fragility.
Method
Built a public-facing factor research and backtesting demo covering point-in-time ingestion, signal construction, portfolio formation, evaluation, transaction-cost modeling, benchmark comparison, and attribution.
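Point-in-time ingestion reduces to as-of lookups that never return data stamped after the query date; a minimal stdlib sketch with hypothetical field names and figures:

```python
import bisect
from datetime import date

class PointInTimeSeries:
    """Stores (as_of_date, value) observations and answers queries with
    the latest value whose as-of date is <= the query date, so a backtest
    can never see a restatement published in its future."""
    def __init__(self):
        self.dates, self.values = [], []

    def record(self, as_of, value):
        i = bisect.bisect_right(self.dates, as_of)
        self.dates.insert(i, as_of)
        self.values.insert(i, value)

    def as_of(self, query_date):
        i = bisect.bisect_right(self.dates, query_date)
        return self.values[i - 1] if i else None

eps = PointInTimeSeries()
eps.record(date(2024, 2, 15), 1.10)   # initial earnings release
eps.record(date(2024, 5, 10), 1.25)   # later restatement
# A backtest evaluated on 2024-03-01 sees only the original figure.
```

Keeping the restated value alongside the original, rather than overwriting it, is what blocks the lookahead and survivorship effects named in the problem statement.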
System / stack
Python, FastAPI, pandas, NumPy, SciPy, cvxpy, SQL, Docker, GitHub Actions, Streamlit, experiment manifests, and reproducible backtesting workflows.
Validation methodology
Generated tearsheets covering rank ICs, factor returns, turnover, drawdown, exposure decomposition, covariance sensitivity, deflated Sharpe diagnostics, transaction-cost stress tests, and regime breakdowns.
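The rank ICs in the tearsheets are cross-sectional Spearman correlations between factor scores and next-period returns; a tie-free sketch (input values are illustrative):

```python
import numpy as np

def rank_ic(factor, fwd_returns):
    """Cross-sectional rank IC: Spearman correlation between factor
    scores and forward returns, assuming no tied values."""
    f = np.argsort(np.argsort(factor)).astype(float)   # ranks 0..n-1
    r = np.argsort(np.argsort(fwd_returns)).astype(float)
    f -= f.mean()
    r -= r.mean()
    return float((f * r).sum() / np.sqrt((f ** 2).sum() * (r ** 2).sum()))

# Perfectly aligned rankings give IC = 1; inverted rankings give -1.
ic_pos = rank_ic([0.1, 0.5, 0.3, 0.9], [0.01, 0.04, 0.02, 0.08])
ic_neg = rank_ic([0.1, 0.5, 0.3, 0.9], [0.08, 0.02, 0.04, 0.01])
```

In a full pipeline the same statistic would be computed per rebalance date and summarized as a time series with its own uncertainty.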
Results
Implemented a factor library for momentum, value, quality, volatility, liquidity, seasonality, residualization, and cross-sectional normalization with explicit assumptions and availability windows.
Failure modes / reliability checks
Each chart links to assumptions, data windows, transformations, validation constraints, code commits, and falsification checks.
Why it matters for research
The demo makes empirical finance research auditable by tying results to point-in-time assumptions, reproducible artifacts, and model-risk checks.

Related Research Systems

These pages preserve deeper writeups for work that, because of workplace, data, or research constraints, can only be summarized here at a public-safe level of detail.

LLM evaluation

LLM Evaluation and Model Analysis Infrastructure

Experiment-control layer for 52,000+ controlled evaluation jobs/month with typed run lineage, paired statistical comparisons, judge reliability diagnostics, and contamination detection.

Open case study

Analytics observability

Analytics Observability Systems

Validation infrastructure for 6.4B+ weekly product events with statistical drift checks, replay validation, lineage-aware attribution, and 97% alert precision.

Open case study

Quant research infrastructure

Quant Research Infrastructure

Point-in-time research platform for a 30+ member organization, increasing reproducible throughput by 8.4x with leakage-aware evaluation and model-risk checks.

Open case study

Biomedical ML

Biomedical Multimodal ML Research

Biomedical ML pipelines supporting 1,200+ controlled runs, improving held-out AUROC from 0.69 to 0.86 and ECE from 0.18 to 0.045 after leakage removal and ablations.

Open case study