Applied Mathematics / Machine Learning PhD applicant. Incoming
Columbia M.S. student in Applied Mathematics, Fall 2026; Emory B.S. in
Computer Science and Mathematics expected May 2026. Research areas:
mathematical ML, scientific computing, optimization, statistical
learning, AI for science, and reliable AI systems.
The PDF remains the authoritative source document. This page presents
the same CV-aligned material as a readable web profile for research
conversations, graduate applications, and technical review.
Education
Columbia University
M.S. in Applied Mathematics, New York, NY. Incoming Fall 2026.
Emory University
B.S. in Computer Science and Mathematics, Atlanta, GA. Expected May 2026.
Honors and activities: QuestBridge National Match Scholar, Dean's List,
Undergraduate Research Distinction, HackDuke Finalist, Selective
Technical Leadership Fellow, Varsity Rowing.
Research Interests
Mathematical machine learning: statistical learning theory, calibration, conformal prediction, robustness, model selection, representation learning, graph learning, distribution shift, and benchmark validity.
Optimization and numerical computation: stochastic approximation, adaptive first-order methods, mirror descent, momentum, preconditioning, constrained optimization, inverse problems, and PDE-constrained learning.
Scientific computing and AI for science: numerical linear algebra, Krylov methods, state-space inference, differentiable simulation, surrogate modeling, uncertainty quantification, active experimentation, and structure-aware neural systems.
Reliable AI systems: LLM evaluation, tool-use and retrieval benchmarks, agent reliability, artifact lineage, dataset/version control, inference orchestration, auditability, observability, and failure-mode analysis.
Research and Engineering Experience
Arthur AI
Machine Learning Research Engineering Intern | New York, NY | May 2025 - Aug. 2025
Built an experiment-control layer for LLM evaluation, representing each run as a typed object over dataset snapshot, prompt graph, model artifact, tokenizer revision, sampling policy, inference backend, judge rubric, scorer, postprocessor, aggregation rule, and environment hash.
Scaled benchmark execution to 52,000+ controlled evaluation jobs per month across reasoning, retrieval, tool use, long-context, safety, robustness, multilingual, code, and domain-specific suites while preserving artifact-level reproducibility.
Raised exact rerun reproducibility from 71.4% to 99.98% by introducing deterministic manifests, immutable dataset hashes, container-pinned evaluator images, seeded decoding controls, schema-versioned prompts, and provenance-preserving result tables.
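The deterministic-manifest idea can be sketched minimally: hash a canonical JSON encoding of every pinned input, so any change to data, prompts, or environment changes the run identity. All field names and values below are illustrative, not the production schema.

```python
import hashlib
import json

def manifest_hash(manifest: dict) -> str:
    """Deterministic run identity: canonical JSON (sorted keys, fixed
    separators) hashed with SHA-256, so key order cannot matter."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Illustrative manifest fields; a real run would pin every input listed above.
run = {
    "dataset_snapshot": "sha256:ab12",
    "model_artifact": "model-v3.2",
    "tokenizer_revision": "tok-7",
    "sampling_policy": {"temperature": 0.0, "seed": 1234},
    "environment_hash": "env-9f",
}

h1 = manifest_hash(run)
h2 = manifest_hash(dict(reversed(list(run.items()))))  # same fields, new order
h3 = manifest_hash({**run, "tokenizer_revision": "tok-8"})  # one pinned input changed
```

Insertion order does not affect the hash, but changing any pinned input does, which is what makes exact reruns checkable.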
Designed asynchronous inference scheduling over heterogeneous model endpoints with adaptive batching, token-budget admission control, retry semantics, circuit breakers, partial-failure isolation, and cost-aware queue priorities; reduced median experiment turnaround from 9.6 hours to 54 minutes.
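The retry semantics can be illustrated with a minimal backoff sketch; the endpoint, attempt budget, and delays below are hypothetical, and the production scheduler also handled batching, admission control, and circuit breaking.

```python
import random
import time

def with_retries(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry a flaky endpoint call with exponential backoff and jitter;
    re-raise the last error once the attempt budget is exhausted."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff (0.5s, 1s, 2s, ...) with random jitter.
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# Hypothetical endpoint that fails twice with a transient error, then succeeds.
attempts = {"n": 0}
def flaky_endpoint():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient 429")
    return "ok"

result = with_retries(flaky_endpoint, sleep=lambda _: None)  # no real sleeping in the demo
```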
Built judge reliability diagnostics for LLM-as-judge workflows using gold preference sets, rubric-level variance decomposition, inter-judge agreement, prompt sensitivity, disagreement clustering, and calibration curves; improved agreement on audited tasks from kappa = 0.43 to 0.76.
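The agreement statistic reported (kappa) can be computed as in this minimal Cohen's-kappa sketch over two hypothetical judges scoring the same ten responses.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for the agreement
    two independent judges would reach by chance."""
    assert len(a) == len(b) and a
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = Counter(a), Counter(b)
    expected = sum(pa[k] * pb.get(k, 0) for k in pa) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail verdicts from two judges.
judge_a = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
judge_b = ["pass", "fail", "fail", "pass", "fail", "pass", "fail", "pass", "pass", "pass"]
kappa = cohens_kappa(judge_a, judge_b)  # 0.8 observed vs 0.52 chance agreement
```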
Identified benchmark contamination from prompt-template leakage and near-duplicate examples that inflated a reasoning suite by 5.8 absolute points; added semantic deduplication checks, leakage alarms, and red-team fixtures.
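Near-duplicate detection of this kind can be sketched with character-shingle Jaccard similarity; this is a simpler stand-in for the semantic checks described, and the threshold and examples are illustrative.

```python
def ngrams(text: str, n: int = 3) -> set:
    """Character n-gram shingles of a whitespace- and case-normalized string."""
    t = " ".join(text.lower().split())
    return {t[i:i + n] for i in range(max(len(t) - n + 1, 1))}

def jaccard(a: str, b: str) -> float:
    sa, sb = ngrams(a), ngrams(b)
    return len(sa & sb) / len(sa | sb)

def near_duplicates(examples, threshold=0.85):
    """Flag index pairs whose shingle overlap exceeds the threshold."""
    flagged = []
    for i in range(len(examples)):
        for j in range(i + 1, len(examples)):
            if jaccard(examples[i], examples[j]) >= threshold:
                flagged.append((i, j))
    return flagged

# Items 0 and 1 differ only in case and spacing, so they should be flagged.
items = [
    "What is the capital of France?",
    "what is the capital of   France?",
    "Sum the first ten primes.",
]
pairs = near_duplicates(items)
```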
Fullstory
Software Engineering Intern | Atlanta, GA | May 2024 - Aug. 2024
Improved alert precision from 61% to 97%, reduced false-positive pages by 82%, and lowered median time-to-detection from 3.7 hours to 6 minutes through adaptive thresholds and ownership-aware routing.
Designed replay validation that re-executed historical transformations under pinned inputs and compared outputs against current logic to catch non-idempotent code paths, hidden dependencies, and accidental semantic changes.
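A replay harness of this shape can be sketched as: re-run the current transform on pinned historical inputs and diff against the outputs recorded at the time. The transform and rows below are hypothetical.

```python
def replay_check(transform, pinned_inputs, recorded_outputs):
    """Re-execute a transformation on pinned inputs and diff against
    recorded outputs; mismatches flag non-idempotent logic, hidden
    dependencies, or accidental semantic changes."""
    mismatches = []
    for key, x in pinned_inputs.items():
        got = transform(x)
        want = recorded_outputs[key]
        if got != want:
            mismatches.append((key, want, got))
    return mismatches

# Hypothetical: the old code rounded 2.25 half-up to 2.3; Python's round()
# rounds half-to-even, so the current path yields 2.2 and the diff surfaces it.
def current_transform(x):
    return round(x, 1)

pinned = {"row-1": 2.25, "row-2": 3.14}
recorded = {"row-1": 2.3, "row-2": 3.1}  # outputs captured from the old code path
diffs = replay_check(current_transform, pinned, recorded)
```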
Built anomaly attribution tools linking metric shifts to deployment diffs, schema changes, query rewrites, upstream delays, column distributions, ingestion partitions, and ownership metadata; reduced median debugging time from 2.8 hours to 19 minutes.
Algory Capital, Emory University
Head of Research | Atlanta, GA | Sept. 2023 - Present
Led research infrastructure and analyst development for a 30+ member quantitative research organization, establishing a research operating system for hypothesis formation, falsification, peer review, replication, and model-risk documentation.
Architected a point-in-time research platform spanning ingestion, universe construction, feature engineering, cross-sectional factor testing, walk-forward modeling, portfolio construction, transaction-cost simulation, exposure control, and attribution.
Increased reproducible research throughput by 8.4x by replacing notebook-only workflows with versioned experiment templates, canonical data loaders, sealed train/test windows, automated tearsheets, invariant checks, and standardized review rubrics.
Implemented leakage-aware evaluation with purged and embargoed time splits, universe-retention accounting, survivorship-bias checks, delayed feature availability, benchmark alignment, turnover constraints, slippage assumptions, and factor-decay analysis.
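The embargoed time-split idea can be sketched as below; the window sizes are illustrative, and a full purge would additionally drop training samples whose label horizons overlap the test window.

```python
def embargoed_walk_forward(n, train_size, test_size, embargo):
    """Yield (train, test) index lists where an embargo gap separates the
    end of each training window from the start of its test window, so
    features with lookahead near the boundary cannot leak."""
    splits = []
    start = 0
    while start + train_size + embargo + test_size <= n:
        train = list(range(start, start + train_size))
        test_start = start + train_size + embargo
        test = list(range(test_start, test_start + test_size))
        splits.append((train, test))
        start += test_size  # roll the window forward by one test period
    return splits

splits = embargoed_walk_forward(n=20, train_size=8, test_size=4, embargo=2)
```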
Built portfolio-optimization modules for mean-variance, risk parity, rank-weighted, volatility-targeted, and drawdown-aware portfolios with sector/industry exposure constraints, covariance shrinkage, and transaction-cost-aware rebalancing.
Added statistical validation layers for multiple testing, White-style reality checks, deflated Sharpe diagnostics, false discovery control, unstable correlation warnings, regime-sensitivity analysis, and backtest-overfitting alarms; produced 18 reviewed research memos and 11 reproducible project repositories.
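False discovery control of the kind listed can be illustrated with a minimal Benjamini-Hochberg sketch over hypothetical signal p-values.

```python
def benjamini_hochberg(pvals, alpha=0.10):
    """Return indices of hypotheses rejected under Benjamini-Hochberg
    false-discovery-rate control: accept up to the largest rank k with
    p_(k) <= alpha * k / m."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    m = len(pvals)
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

# Hypothetical p-values from ten candidate signals; only the first two
# survive at a 10% FDR, guarding against backtest overfitting.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.34, 0.52, 0.61, 0.74, 0.9]
kept = benjamini_hochberg(pvals, alpha=0.10)
```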
Georgia Tech / Emory University
Research Assistant | Atlanta, GA | Sept. 2022 - Jan. 2024
Developed biomedical ML pipelines supporting 1,200+ controlled experimental runs across cohort definitions, preprocessing variants, imaging features, clinical covariates, fusion strategies, calibration procedures, subgroup analyses, and decision thresholds.
Rebuilt cohort construction around patient-level independence, temporal separation, site-aware validation, feature availability, and label-proxy audits, eliminating patient overlap, temporal contamination, and hidden leakage in retrospective splits.
Improved clinically meaningful held-out AUROC from 0.69 to 0.86 and expected calibration error from 0.18 to 0.045 after leakage removal, standardized preprocessing, model-family sweeps, and systematic ablations across imaging, clinical, and fused representations.
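Expected calibration error as reported here can be computed with the standard binned estimator; the toy inputs below are illustrative.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin predictions by confidence, then average |accuracy - mean
    confidence| per bin, weighted by bin occupancy."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the top bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        conf = sum(p for p, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)
        ece += len(b) / n * abs(acc - conf)
    return ece

# A perfectly calibrated toy case: 0.8-confidence predictions correct 80% of the time.
probs = [0.8] * 10
labels = [1] * 8 + [0] * 2
ece = expected_calibration_error(probs, labels)
```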
Implemented evaluation reports with bootstrap confidence intervals, DeLong-style AUROC comparisons, AUPRC, sensitivity/specificity at policy thresholds, decision-curve analysis, subgroup calibration, missingness analysis, and error taxonomy.
Designed ablation studies to distinguish genuine predictive signal from scanner/site artifacts, missingness shortcuts, preprocessing artifacts, distributional confounding, and label-derived proxies; built reproducibility infrastructure for exact reruns of prior model claims.
Selected Projects
RE-AMP: Generative Audio Robustness Evaluation: full-stack benchmark platform for generative audio models; supported 18,000+ controlled robustness runs across 82 perturbation operators and 9 evaluator families with deterministic manifests over prompts, seeds, checkpoints, transforms, metric configs, and environment hashes; reduced experiment setup time by 91%.
Graph-Based Traffic Collision Risk Modeling: graph ML pipeline over 1.1M+ collision records and road-network topology; compared GCN, GraphSAGE, GATv2, temporal aggregation, XGBoost, kernel, tabular, and geospatial baselines; improved hotspot-ranking AUROC by 23.4 points and top-decile recall by 31%.
Robust Kalman Filtering for Vehicle Localization: modular state-space simulation environment comparing standard Kalman filtering with robust variants under sensor dropout, delayed observations, outlier regimes, correlated process noise, and measurement misspecification; reduced median localization error by 49% and 95th-percentile error by 44% while preserving a 100 Hz C++ loop.
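The baseline filter at the core of this comparison can be illustrated in one dimension, with None modeling sensor dropout; the noise parameters and measurements below are illustrative, not from the project.

```python
def kalman_1d(zs, q=1e-3, r=0.5, x0=0.0, p0=10.0):
    """Scalar Kalman filter with a random-walk state model: predict by
    inflating the variance with process noise q, update by blending the
    measurement (noise r) with gain k. None entries skip the update,
    modeling sensor dropout."""
    x, p = x0, p0
    estimates = []
    for z in zs:
        p = p + q                      # predict: uncertainty grows
        if z is not None:
            k = p / (p + r)            # Kalman gain
            x = x + k * (z - x)        # update toward the measurement
            p = (1 - k) * p            # uncertainty shrinks after the update
        estimates.append(x)
    return estimates

# Noisy measurements of a constant true position 5.0, with one dropout.
zs = [4.8, 5.3, None, 5.1, 4.9, 5.05]
est = kalman_1d(zs)
```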
Quant Research Lab: public-facing factor research and backtesting demo with point-in-time ingestion, signal construction, portfolio formation, evaluation, transaction-cost modeling, benchmark comparison, and attribution; generated reproducible tearsheets for rank ICs, factor returns, turnover, drawdown, exposure decomposition, covariance sensitivity, deflated Sharpe, transaction-cost stress tests, and regime breakdowns.
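Rank IC as reported in the tearsheets can be sketched as a Spearman correlation between signal ranks and forward-return ranks; tie handling is omitted in this sketch and the inputs are illustrative.

```python
def rank_ic(signal, forward_returns):
    """Spearman rank information coefficient: Pearson correlation of the
    rank transforms of a signal and next-period returns (ties ignored)."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0.0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r

    a, b = ranks(signal), ranks(forward_returns)
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

# A signal that orders assets exactly as their realized returns: IC = 1.
ic = rank_ic([0.1, 0.5, 0.3, 0.9], [0.01, 0.04, 0.02, 0.08])
```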
Technical Notes and Presentations
Technical notes: stochastic optimization; numerical linear algebra; graph learning; mirror descent and momentum; diffusion models and ML foundations; Gaussian processes with derivative matching; robust Kalman filtering; foundations of stochastic control.
Selected presentations: diffusion models; derivative-matching Gaussian processes; robust Kalman filtering; stochastic control; SGD, momentum, and Adam; mirror descent; graph neural networks for risk modeling; benchmark design for LLM evaluation; calibration and uncertainty for biomedical ML.
Research practice: writes LaTeX notes emphasizing derivation, geometric intuition, computational experiments, and failure cases; standardizes dataset snapshots, preprocessing manifests, seed control, environment capture, artifact hashes, statistical uncertainty, and falsification logs.