PhD candidate in the Department of Computing at The Hong Kong Polytechnic University, advised by Prof. Wenjie Li (Maggie) and Prof. Wei Zhang.
I want to make the environment trainable, the way models are, and with it to lift the ceiling of what AI can become. Today the environment is not even a single thing: a reward model here, a verifier there, a curriculum somewhere else, each built and judged on its own. My work begins with measurement: what does each piece actually contribute to the model it trains?
More at battam1111.github.io.
-
Exact Is Easier: Credit Assignment for Cooperative LLM Agents (in submission, arXiv:2603.06859). One shared outcome hides each decision's share. The transcript makes every decision replayable, so per-decision credit is measured exactly instead of estimated. The result is a learning algorithm that outperforms every approximate multi-agent RL alternative, plus a method-agnostic audit of credit quality.
-
The Accuracy Paradox in RLHF (EMNLP 2024). A reward model's benchmark accuracy fails to predict the performance of the policy it trains: varying only accuracy yields an interior optimum, with the real signal in the training dynamics.
-
battam1111.github.io: Source for my homepage. al-folio + custom SCSS, trilingual EN/中/日.
- Email: yan-jun.chen@connect.polyu.hk
- Google Scholar · ORCID 0009-0001-9065-9137
- Hong Kong

