Academic CV

Fatim Majumder

Applied Mathematics / Machine Learning PhD applicant. Incoming Columbia M.S. student in Applied Mathematics, Fall 2026; Emory B.S. in Computer Science and Mathematics expected May 2026. Research areas: mathematical ML, scientific computing, optimization, statistical learning, AI for science, and reliable AI systems.

Web CV

A concise academic view of the full CV.

The PDF remains the complete source document. This page presents the same material in a readable web format, suitable for research conversations, graduate preparation, and technical review.

Education

Columbia University

M.S. in Applied Mathematics, New York, NY. Incoming Fall 2026.

Planned focus: numerical analysis, scientific computing, stochastic processes, optimization, inverse problems, probabilistic modeling, statistical inference, and mathematical machine learning.

Emory University

B.S. in Computer Science and Mathematics, Atlanta, GA. Expected May 2026.

GPA: 3.98/4.00 cumulative; 4.00 mathematics; 4.00 computer science.

Selected coursework, grouped by area:

Analysis

  • Real Analysis I-II
  • Multivariable Calculus
  • Differential Equations
  • Partial Differential Equations

Algebra

  • Abstract Algebra I-II
  • Linear Algebra
  • Foundations of Mathematics
  • Combinatorics

Numerical / Scientific Computing

  • Numerical Analysis
  • Numerical Linear Algebra
  • Iterative Methods for Linear Systems

Probability / Statistics

  • Probability
  • Stochastic Processes
  • Mathematical Statistics I-II
  • Statistical Learning

Optimization

  • Numerical Optimization
  • Convex Optimization

CS / ML / AI

  • Algorithms
  • Theory of Computing
  • Machine Learning
  • Artificial Intelligence
  • Deep Learning on Graphs
  • Data Mining
  • AI for Science Reading Seminar

Honors and activities: QuestBridge National Match Scholar, Dean's List, Undergraduate Research Distinction, HackDuke Finalist, Selective Technical Leadership Fellow, Varsity Rowing.

Research Interests

Mathematical machine learning, scientific computing, optimization, statistical learning, AI for science, and reliable AI systems.

Research and Engineering Experience

Arthur AI

Machine Learning Research Engineering Intern | New York, NY | May 2025 - Aug. 2025

Technical stack: Python, PyTorch, JAX, Ray, FastAPI, PostgreSQL, Redis Streams, Docker, Kubernetes, AWS, OpenTelemetry, NumPy, pandas, scikit-learn.

  • Built an experiment-control layer for LLM evaluation, representing each run as a typed object over dataset snapshot, prompt graph, model artifact, tokenizer revision, sampling policy, inference backend, judge rubric, scorer, postprocessor, aggregation rule, and environment hash.
  • Scaled benchmark execution to 52,000+ controlled evaluation jobs per month across reasoning, retrieval, tool use, long-context, safety, robustness, multilingual, code, and domain-specific suites while preserving artifact-level reproducibility.
  • Raised exact rerun reproducibility from 71.4% to 99.98% by introducing deterministic manifests, immutable dataset hashes, container-pinned evaluator images, seeded decoding controls, schema-versioned prompts, and provenance-preserving result tables.
  • Designed asynchronous inference scheduling over heterogeneous model endpoints with adaptive batching, token-budget admission control, retry semantics, circuit breakers, partial-failure isolation, and cost-aware queue priorities; reduced median experiment turnaround from 9.6 hours to 54 minutes.
  • Implemented statistically defensible model-comparison tooling: paired bootstrap intervals, stratified randomization tests, multiple-comparison correction, per-slice uncertainty bands, effect-size reporting, regression severity tiers, and run-diff reports.
  • Built judge reliability diagnostics for LLM-as-judge workflows using gold preference sets, rubric-level variance decomposition, inter-judge agreement, prompt sensitivity, disagreement clustering, and calibration curves; improved agreement on audited tasks from kappa = 0.43 to 0.76.
  • Identified benchmark contamination from prompt-template leakage and near-duplicate examples that inflated a reasoning suite by 5.8 absolute points; added semantic deduplication checks, leakage alarms, and red-team fixtures.
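
The paired-bootstrap model comparison described above can be sketched as follows. This is a minimal illustration, not the production tooling; the function name and toy data are hypothetical.

```python
import numpy as np

def paired_bootstrap_diff(scores_a, scores_b, n_boot=10_000, alpha=0.05, seed=0):
    """Bootstrap CI for the mean per-example score difference between two models.

    Resampling whole examples (rather than each model's scores independently)
    preserves the pairing, which is what makes the comparison 'paired'.
    """
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, float) - np.asarray(scores_b, float)
    n = len(diffs)
    boot = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)      # resample example indices with replacement
        boot[i] = diffs[idx].mean()
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return diffs.mean(), (lo, hi)

# Illustrative use: per-example correctness (0/1) for two models on one eval set.
a = np.array([1, 1, 0, 1, 1, 0, 1, 1, 1, 0])
b = np.array([1, 0, 0, 1, 0, 0, 1, 1, 0, 0])
mean_diff, ci = paired_bootstrap_diff(a, b)   # mean_diff = 0.3
```

An interval that excludes zero is evidence of a real gap between the models; the same machinery extends to stratified and multiple-comparison-corrected variants.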

Fullstory

Software Engineering Intern | Atlanta, GA | May 2024 - Aug. 2024

Technical stack: Python, SQL, Kafka, Airflow, dbt, PostgreSQL, Docker, GitHub Actions, Grafana, OpenTelemetry.

  • Built validation infrastructure for analytics pipelines processing 6.4B+ weekly product events, covering schema evolution, event freshness, transformation replay, metric discontinuities, cardinality explosions, join drift, null spikes, and deploy-correlated anomalies.
  • Reframed data-quality monitoring as online statistical inference over a changing data-generating process, separating seasonality, instrumentation changes, product launches, upstream lag, bot traffic, and genuine pipeline failures.
  • Implemented layered detectors combining robust z-scores, seasonal baselines, Kolmogorov-Smirnov drift tests, population-stability indices, column-level entropy checks, foreign-key integrity tests, and lineage-neighbor correlation checks.
  • Improved alert precision from 61% to 97%, reduced false-positive pages by 82%, and lowered median time-to-detection from 3.7 hours to 6 minutes through adaptive thresholds and ownership-aware routing.
  • Designed replay validation that re-executed historical transformations under pinned inputs and compared outputs against current logic to catch non-idempotent code paths, hidden dependencies, and accidental semantic changes.
  • Built anomaly attribution tools linking metric shifts to deployment diffs, schema changes, query rewrites, upstream delays, column distributions, ingestion partitions, and ownership metadata; reduced median debugging time from 2.8 hours to 19 minutes.
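
The detector layering above combines level checks with distribution checks. A minimal sketch, assuming a robust z-score (median/MAD) plus a two-sample KS test; function names and thresholds are illustrative, not the production logic.

```python
import numpy as np
from scipy import stats

def robust_z(baseline, value):
    """Robust z-score using median and MAD, so a few outliers in the
    baseline window don't inflate the scale estimate."""
    med = np.median(baseline)
    mad = np.median(np.abs(baseline - med))
    scale = 1.4826 * mad  # consistency factor: MAD -> std under normality
    return (value - med) / scale if scale > 0 else 0.0

def drifted(baseline, current, z_thresh=4.0, ks_alpha=0.01):
    """Flag a metric window as anomalous if its level is a robust-z outlier
    OR the value distribution shifts per a two-sample KS test."""
    z = robust_z(baseline, np.mean(current))
    ks_p = stats.ks_2samp(baseline, current).pvalue
    return bool(abs(z) > z_thresh or ks_p < ks_alpha)

# Illustrative use: a shifted window should fire; an unchanged one should not.
rng = np.random.default_rng(7)
base = rng.normal(0.0, 1.0, 500)
flag_drift = drifted(base, rng.normal(5.0, 1.0, 500))
flag_same = drifted(base, base)
```

Layering a level check and a distribution check catches both slow drifts and shape changes that leave the mean untouched.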

Algory Capital, Emory University

Head of Research | Atlanta, GA | Sept. 2023 - Present

Technical stack: Python, pandas, NumPy, SciPy, scikit-learn, XGBoost, statsmodels, cvxpy, SQL, PostgreSQL, DuckDB, Streamlit, FastAPI, Docker.

  • Led research infrastructure and analyst development for a 30+ member quantitative research organization, establishing a research operating system for hypothesis formation, falsification, peer review, replication, and model-risk documentation.
  • Architected a point-in-time research platform spanning ingestion, universe construction, feature engineering, cross-sectional factor testing, walk-forward modeling, portfolio construction, transaction-cost simulation, exposure control, and attribution.
  • Increased reproducible research throughput by 8.4x by replacing notebook-only workflows with versioned experiment templates, canonical data loaders, sealed train/test windows, automated tearsheets, invariant checks, and standardized review rubrics.
  • Implemented leakage-aware evaluation with purged and embargoed time splits, universe-retention accounting, survivorship-bias checks, delayed feature availability, benchmark alignment, turnover constraints, slippage assumptions, and factor-decay analysis.
  • Built portfolio-optimization modules for mean-variance, risk parity, rank-weighted, volatility-targeted, and drawdown-aware portfolios with sector/industry exposure constraints, covariance shrinkage, and transaction-cost-aware rebalancing.
  • Added statistical validation layers for multiple testing, White-style reality checks, deflated Sharpe diagnostics, false discovery control, unstable correlation warnings, regime-sensitivity analysis, and backtest-overfitting alarms; produced 18 reviewed research memos and 11 reproducible project repositories.
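
The purged and embargoed splits mentioned above can be sketched in a few lines. This is a simplified illustration under the assumption of a fixed-width embargo in observation counts; the function name is hypothetical.

```python
import numpy as np

def purged_walk_forward(n, n_splits=4, embargo=5):
    """Yield (train_idx, test_idx) pairs for ordered data. Training always
    precedes the test window, and the `embargo` observations immediately
    before each test window are purged so labels that overlap the split
    boundary can't leak forward into training."""
    fold = n // (n_splits + 1)
    for k in range(1, n_splits + 1):
        test_start = k * fold
        test_end = min(test_start + fold, n)
        train_end = max(test_start - embargo, 0)  # purge the embargo gap
        yield np.arange(0, train_end), np.arange(test_start, test_end)

# Illustrative use: 100 ordered observations, 4 folds, 5-observation embargo.
splits = list(purged_walk_forward(100, n_splits=4, embargo=5))
```

Each fold trains strictly in the past, and the purged gap prevents label overlap from inflating out-of-sample performance.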

Georgia Tech / Emory University

Research Assistant | Atlanta, GA | Sept. 2022 - Jan. 2024

Technical stack: Python, PyTorch, scikit-learn, NumPy, pandas, SQL, Streamlit, MATLAB, medical-imaging preprocessing, calibration analysis, nested cross-validation.

  • Developed biomedical ML pipelines supporting 1,200+ controlled experimental runs across cohort definitions, preprocessing variants, imaging features, clinical covariates, fusion strategies, calibration procedures, subgroup analyses, and decision thresholds.
  • Rebuilt cohort construction around patient-level independence, temporal separation, site-aware validation, feature availability, and label-proxy audits, eliminating patient overlap, temporal contamination, and hidden leakage in retrospective splits.
  • Improved clinically meaningful held-out AUROC from 0.69 to 0.86 and expected calibration error from 0.18 to 0.045 after leakage removal, standardized preprocessing, model-family sweeps, and systematic ablations across imaging, clinical, and fused representations.
  • Implemented evaluation reports with bootstrap confidence intervals, DeLong-style AUROC comparisons, AUPRC, sensitivity/specificity at policy thresholds, decision-curve analysis, subgroup calibration, missingness analysis, and error taxonomy.
  • Designed ablation studies to distinguish genuine predictive signal from scanner/site artifacts, missingness shortcuts, preprocessing artifacts, distributional confounding, and label-derived proxies; built reproducibility infrastructure for exact reruns of prior model claims.
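
The expected calibration error cited above follows the standard binned definition; a minimal sketch with illustrative data (not the clinical cohorts):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: bucket predictions into equal-width confidence bins and
    take the sample-weighted average of |empirical accuracy - mean confidence|."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(labels[mask].mean() - probs[mask].mean())
    return ece

# Illustrative use: 0.8 confidence with 8/10 correct is well calibrated;
# 0.9 confidence with 5/10 correct is badly overconfident.
well_calibrated = expected_calibration_error([0.8] * 10, [1] * 8 + [0] * 2)
overconfident = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
```

Reporting ECE alongside AUROC matters clinically because a discriminative but overconfident model still gives misleading risk estimates at decision thresholds.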

Selected Projects

Technical Notes and Presentations

Honors, Awards, and Leadership

Full listings for these sections appear in the PDF CV.

Technical Skills

Languages

  • Python
  • C++
  • C
  • SQL
  • TypeScript
  • JavaScript
  • Java
  • Bash
  • MATLAB
  • LaTeX

Machine Learning and AI

  • PyTorch
  • JAX
  • PyTorch Geometric
  • scikit-learn
  • XGBoost
  • statsmodels
  • Ray
  • model evaluation
  • LLM-as-judge calibration
  • graph ML
  • multimodal ML
  • transformer inference
  • retrieval evaluation
  • calibration
  • robustness testing
  • ablation studies
  • uncertainty estimation

Applied Mathematics and Scientific Computing

  • numerical linear algebra
  • Krylov and iterative methods
  • preconditioning
  • convex and numerical optimization
  • stochastic approximation
  • state-space models
  • Kalman filtering
  • Monte Carlo simulation
  • Bayesian inference
  • statistical learning
  • spectral methods
  • graph algorithms

Optimization and Inference

  • gradient methods
  • mirror descent
  • momentum/adaptive methods
  • constrained optimization
  • covariance shrinkage
  • hierarchical comparisons
  • bootstrap inference
  • randomization tests
  • calibration
  • conformal-style uncertainty
  • experimental design
  • model selection

Data and Research Infrastructure

  • PostgreSQL
  • MySQL
  • DuckDB
  • Redis
  • Kafka
  • Airflow
  • dbt
  • ETL/ELT
  • feature validation
  • dataset versioning
  • artifact lineage
  • experiment tracking
  • reproducible replay
  • structured logging
  • lineage graphs

Backend and Systems

  • FastAPI
  • Docker
  • Kubernetes
  • AWS
  • Linux
  • Git
  • GitHub Actions
  • CI/CD
  • distributed queues
  • asynchronous orchestration
  • caching
  • batching
  • observability
  • profiling
  • reliability engineering

Evaluation and Reliability

  • benchmark design
  • metric validation
  • regression detection
  • prompt/dataset lineage
  • statistical comparison
  • confidence intervals
  • run-diff tooling
  • audit logs
  • failure analysis
  • incident triage
  • data-quality testing
  • model-risk documentation

Contact

Email: fatim.majumder@emory.edu. GitHub: github.com/fatimmajumder. LinkedIn: linkedin.com/in/fatim-majumder. Website: fatimmajumder.github.io.