Staff ML Engineer · Meta
Building agentic AI systems, LLM eval infrastructure, and XAI pipelines at billion-user scale. Current focus: making AI systems that can explain themselves, fail safely, and be trusted in production.
- Agentic AI — multi-step agent architectures, tool-use, planning, and safety guardrails at scale
- LLM Eval Infrastructure — consistency, hallucination detection, factual grounding, response drift
- LLM Inference Infrastructure — high-throughput model serving, torch.compile optimisation, KV cache efficiency, production latency SLAs
- MLOps & Observability — drift detection, model monitoring, evaluation pipelines, contributor to evidentlyai/evidently
- Explainable AI (XAI) — decision explainability hooks, counterfactual reasoning, causal attribution
- Security ML — real-time risk scoring, access intelligence, anomaly detection at billion-event scale
Research: identifies seven failure modes in production agentic AI systems and introduces PAEF (Production Agentic Evaluation Framework), validated in four controlled experiments. Reference implementation: llm-eval-toolkit.
My repos
| Repo | Description |
|---|---|
| llm-eval-toolkit | Production-grade framework for evaluating LLM agent outputs — consistency, grounding, hallucination, drift |
| agentic-safety-patterns | Pattern library for safe agentic systems — circuit breakers, explainability hooks, rollback, audit logging |
| retrieval-ranking-eval | Dense retrieval + cross-encoder reranking pipeline benchmarked on BEIR datasets — NDCG@K, Recall@K, MRR |
| QuantumAI-IntradayRiskDemo | Intraday risk pipeline: LSTM volatility forecasting + quantum-inspired QUBO/D-Wave portfolio optimisation |
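
For readers unfamiliar with the ranking metrics named in the retrieval-ranking-eval row, here is a minimal sketch of Recall@K, MRR, and NDCG@K. It assumes binary relevance labels, and the function names are illustrative rather than taken from the repo.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """NDCG@K with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy query: two relevant documents, retrieved at ranks 2 and 4.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(ranked, relevant, 3))
print(reciprocal_rank(ranked, relevant))
print(round(ndcg_at_k(ranked, relevant, 4), 4))
```

Graded relevance (as some BEIR datasets provide) would replace the constant gain of 1.0 with a per-document gain. The binary form is used here only to keep the sketch short.
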
Upstream contributions
| Repo | What |
|---|---|
| TransformerLensOrg/TransformerLens | PR #1434 open — NemotronH hybrid Mamba2-Transformer adapter (nvidia/Nemotron-H-8B/47B); SSMBlockBridge + optional Mamba submodules handle 4 heterogeneous layer types; 52 unit tests |
| TransformerLensOrg/TransformerLens | PR #1408 merged ✅ — DeepSeek-V2 / V2-Lite / Coder-V2 architecture adapter; handles complex-exponential RoPE, optional Q LoRA path (V2-Lite), and unhookable gate; 17 integration tests |
| TransformerLensOrg/TransformerLens | PR #1396 merged ✅ — direct path patching implementation (get_act_patch_direct_path), closes issue #111 opened by Neel Nanda in 2022; supports HookedTransformer and TransformerBridge, full test suite |
| TransformerLensOrg/TransformerLens | PR #1399 merged ✅ — adapter unit tests for Phi-3 and Granite/GraniteMoe (55 tests), part of issue #1302 |
| evidentlyai/evidently | PR #1863 open — ROUGE score descriptor (rouge1/2/L, F/P/R variants, NaN-safe; 737 lines, 31 tests); closes issue #1318 |
| evidentlyai/evidently | PR #1862 open — fixes silent data corruption in KL-divergence drift scoring; replaces hardcoded fill value with data-relative min_nonzero/10 |
| huggingface/trl | PR #6120 open — adds save_value_model flag to PPOConfig; persists critic checkpoint alongside policy, making PPO runs resumable |
| huggingface/trl | PR #6121 open — fixes mathematically impossible negative entropy in PPO trainer; masks INVALID_LOGPROB padding tokens |
| huggingface/trl | PR #6122 open — fixes OnlineDPOTrainer crash on eval_strategy=steps; adds prediction_step override |
| huggingface/trl | PR #6123 open — implements Adaptive Beta-DPO (arXiv:2407.08639); per-batch β scaling via preference margin EMA |
| huggingface/swift-transformers | PR #370 open — fixes offline-mode crash; path canonicalization bug with .. components in downloadBase |
| huggingface/swift-transformers | PR #371 open — fixes fatal crash on Task cancellation; replaces try! with async throws across CoreML generation call chain |
| huggingface/swift-transformers | PR #372 open — adds encodeWithOffsets() returning character-span offsets per token, matching Python return_offsets_mapping=True |
| vllm-project/vllm | PR #46068 open — rejects invalid negative values for max_logprobs and long_prefill_token_threshold via Pydantic field validators; closes issue #43985 |
| vllm-project/vllm | PR #41381 open — torch.compile config hash typing cleanups + cache_key_factors debug expansion |
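
To illustrate the idea behind the KL-divergence fix listed above (evidently PR #1862), here is a minimal sketch of a data-relative fill for empty histogram bins. It is not the code from the PR. The function names, the per-distribution choice of minimum, and the renormalisation step are all assumptions made for illustration.

```python
import math


def fill_zero_bins(probs):
    """Replace empty bins with min_nonzero / 10, then renormalise to sum to 1.

    Assumption: the minimum is taken per distribution; the actual PR may
    compute it differently.
    """
    nonzero = [p for p in probs if p > 0]
    if not nonzero:
        raise ValueError("distribution has no probability mass")
    fill = min(nonzero) / 10.0
    filled = [p if p > 0 else fill for p in probs]
    total = sum(filled)
    return [p / total for p in filled]


def kl_divergence(reference, current):
    """KL(reference || current) over aligned bins, with empty bins filled."""
    if len(reference) != len(current):
        raise ValueError("distributions must share the same bins")
    p = fill_zero_bins(reference)
    q = fill_zero_bins(current)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# The third bin is empty in the reference data but populated in current data.
reference_bins = [0.5, 0.5, 0.0]
current_bins = [0.4, 0.3, 0.3]
print(kl_divergence(reference_bins, current_bins))
```

The motivation for a data-relative fill is that a fixed constant can end up larger than genuinely small bin probabilities, which distorts the score. Scaling the fill from the smallest observed nonzero bin keeps it below every real value.
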

