battam Battam1111

Yanjun Chen

PhD candidate in the Department of Computing at The Hong Kong Polytechnic University, advised by Prof. Wenjie Li (Maggie) and Prof. Wei Zhang.

I want to make the environment trainable, the way models are, and in doing so lift the ceiling of what AI can become. Today the environment is not even a single thing: a reward model here, a verifier there, a curriculum somewhere else, each built and judged on its own. My work begins with measurement: what does each piece actually contribute to the model it trains?

More at battam1111.github.io.

Selected work

Exact Is Easier: Credit Assignment for Cooperative LLM Agents (in submission, arXiv:2603.06859). One shared outcome hides each decision's share. The transcript makes every decision replayable, so per-decision credit is measured exactly instead of estimated: a learning algorithm that outperforms every approximate multi-agent RL alternative, plus a method-agnostic audit of credit quality.
The Accuracy Paradox in RLHF (EMNLP 2024). A reward model's benchmark accuracy fails to predict the policy it trains: varying only accuracy yields an interior optimum, with the real signal in the training dynamics.
battam1111.github.io. Source for my homepage. al-folio + custom SCSS, trilingual EN/中/日.

Find me

Email: yan-jun.chen@connect.polyu.hk
Google Scholar · ORCID 0009-0001-9065-9137
Hong Kong
