Mukund Pandey mukund1985

Mukund Pandey

Staff ML Engineer · Meta

Building agentic AI systems, LLM eval infrastructure, and XAI pipelines at billion-user scale. Current focus: making AI systems that can explain themselves, fail safely, and be trusted in production.

What I work on

Agentic AI — multi-step agent architectures, tool-use, planning, and safety guardrails at scale
LLM Eval Infrastructure — consistency, hallucination detection, factual grounding, response drift
LLM Inference Infrastructure — high-throughput model serving, torch.compile optimisation, KV cache efficiency, production latency SLAs
MLOps & Observability — drift detection, model monitoring, evaluation pipelines, contributor to evidentlyai/evidently
Explainable AI (XAI) — decision explainability hooks, counterfactual reasoning, causal attribution
Security ML — real-time risk scoring, access intelligence, anomaly detection at billion-event scale

Research

Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework

Identifies 7 failure modes in production agentic AI systems and introduces PAEF (Production Agentic Evaluation Framework), validated in four controlled experiments. Reference implementation: llm-eval-toolkit.

Open source

My repos

Repo	Description
llm-eval-toolkit	Production-grade framework for evaluating LLM agent outputs — consistency, grounding, hallucination, drift
agentic-safety-patterns	Pattern library for safe agentic systems — circuit breakers, explainability hooks, rollback, audit logging
retrieval-ranking-eval	Dense retrieval + cross-encoder reranking pipeline benchmarked on BEIR datasets — NDCG@K, Recall@K, MRR
QuantumAI-IntradayRiskDemo	Intraday risk pipeline: LSTM volatility forecasting + quantum-inspired QUBO/D-Wave portfolio optimisation
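
For readers unfamiliar with the ranking metrics named for retrieval-ranking-eval, here is a minimal sketch of Recall@K, reciprocal rank (the per-query term of MRR), and NDCG@K. It is not taken from the repository: the function names, the document IDs, and the use of linear gain in NDCG are assumptions made for illustration.

```python
import math

# Illustrative helpers only; names and signatures are hypothetical,
# not the retrieval-ranking-eval API.

def recall_at_k(ranked, relevant, k):
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, gains, k):
    """NDCG@K with linear gain; `gains` maps doc id -> graded relevance."""
    dcg = sum(
        gains.get(doc, 0.0) / math.log2(i + 2)
        for i, doc in enumerate(ranked[:k])
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Usage: one query whose only relevant document is ranked second.
ranked = ["d3", "d1", "d2"]
print(recall_at_k(ranked, {"d1"}, k=2))
print(reciprocal_rank(ranked, {"d1"}))
print(ndcg_at_k(ranked, {"d1": 1.0}, k=3))
```

MRR is the mean of `reciprocal_rank` over all queries in a benchmark; BEIR-style evaluations typically report the per-query metrics averaged the same way.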

Upstream contributions

Repo	What
TransformerLensOrg/TransformerLens	PR #1434 open — NemotronH hybrid Mamba2-Transformer adapter (nvidia/Nemotron-H-8B/47B); SSMBlockBridge + optional Mamba submodules handle 4 heterogeneous layer types; 52 unit tests
TransformerLensOrg/TransformerLens	PR #1408 merged ✅ — DeepSeek-V2 / V2-Lite / Coder-V2 architecture adapter; handles complex-exponential RoPE, optional Q LoRA path (V2-Lite), and unhookable gate; 17 integration tests
TransformerLensOrg/TransformerLens	PR #1396 merged ✅ — direct path patching implementation (`get_act_patch_direct_path`), closes issue #111 opened by Neel Nanda in 2022; supports HookedTransformer and TransformerBridge, full test suite
TransformerLensOrg/TransformerLens	PR #1399 merged ✅ — adapter unit tests for Phi-3 and Granite/GraniteMoe (55 tests), part of issue #1302
evidentlyai/evidently	PR #1863 open — ROUGE score descriptor (rouge1/2/L, F/P/R variants, NaN-safe; 737 lines, 31 tests); closes issue #1318
evidentlyai/evidently	PR #1862 open — fixes silent data corruption in KL-divergence drift scoring; replaces hardcoded fill value with data-relative min_nonzero/10
huggingface/trl	PR #6120 open — adds `save_value_model` flag to PPOConfig; persists critic checkpoint alongside policy, making PPO runs resumable
huggingface/trl	PR #6121 open — fixes mathematically impossible negative entropy in PPO trainer; masks INVALID_LOGPROB padding tokens
huggingface/trl	PR #6122 open — fixes OnlineDPOTrainer crash on eval_strategy=steps; adds prediction_step override
huggingface/trl	PR #6123 open — implements Adaptive Beta-DPO (arXiv:2407.08639); per-batch β scaling via preference margin EMA
huggingface/swift-transformers	PR #370 open — fixes offline-mode crash; path canonicalization bug with `..` components in downloadBase
huggingface/swift-transformers	PR #371 open — fixes fatal crash on Task cancellation; replaces `try!` with `async throws` across CoreML generation call chain
huggingface/swift-transformers	PR #372 open — adds `encodeWithOffsets()` returning character-span offsets per token, matching Python `return_offsets_mapping=True`
vllm-project/vllm	PR #46068 open — reject invalid negative values for `max_logprobs` and `long_prefill_token_threshold` via Pydantic field validators; closes issue #43985
vllm-project/vllm	PR #41381 open — torch.compile config hash typing cleanups + cache_key_factors debug expansion
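
The KL-divergence entry above (evidently PR #1862) mentions replacing a hardcoded fill value with a data-relative `min_nonzero/10`. The sketch below illustrates that general idea in plain Python. It is not the evidently implementation: the function names are hypothetical, and renormalising after filling is an assumption made here so that the result stays a valid, non-negative divergence.

```python
import math

# Hypothetical helpers illustrating a data-relative fill for empty bins.
# This is a sketch of the idea, not code from evidentlyai/evidently.

def fill_zeros(probs):
    """Replace zero bins with min_nonzero / 10, then renormalise to sum to 1."""
    nonzero = [p for p in probs if p > 0]
    if not nonzero:
        raise ValueError("distribution has no probability mass")
    fill = min(nonzero) / 10  # scales with the data instead of a fixed constant
    filled = [p if p > 0 else fill for p in probs]
    total = sum(filled)
    return [p / total for p in filled]

def kl_divergence(p, q):
    """KL(p || q) over matching histogram bins, with zero bins filled first."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    p, q = fill_zeros(p), fill_zeros(q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Usage: the reference histogram has an empty bin that the current one fills.
reference = [0.5, 0.5, 0.0]
current = [0.4, 0.4, 0.2]
print(kl_divergence(reference, current))
```

A fixed fill constant can be larger than the real bin probabilities when bins are very small, which distorts the score; tying the fill to the smallest observed non-zero bin keeps it below every real value.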
