Pull requests: THUDM/slime

Pull requests list

Support partial rollout resume in Search-R1 example
#2128 opened Jun 23, 2026 by OLIVER-XYP
Reduce entropy logging memory when entropy coef is zero
#2127 opened Jun 23, 2026 by none0663 Contributor
Add test for megatron server run-ci-changed
#2123 opened Jun 23, 2026 by zhuzilin Contributor
fix(partial-rollout): cap max_new_tokens by prior response length
#2122 opened Jun 23, 2026 by none0663 Contributor
fix(ppo): stop corrupting the logged rollout/kl metric
#2114 opened Jun 21, 2026 by EazyReal Contributor
Fix(rollout): Fail closed on unknown SGLang model names
#2112 opened Jun 21, 2026 by Baiyu-Su Contributor
fix(train): support eval-only mode (--num-rollout 0)
#2109 opened Jun 20, 2026 by EazyReal Contributor
feat(examples/strands_sglang): update to strands-sglang 0.4.2
#2106 opened Jun 20, 2026 by Lawhy Contributor
fix(dist): preserve new_group options across reloadable group reload
#2095 opened Jun 17, 2026 by EazyReal Contributor
fix(scripts): correct model config source path in FP8 low_precision scripts
#2094 opened Jun 17, 2026 by aoshen02 Contributor
2 tasks done
feat(loss): add --loss-aggregation for the four ScaleRL pg_loss modes
#2090 opened Jun 16, 2026 by EazyReal Contributor
Disk-level delta weight sync
#2089 opened Jun 16, 2026 by nanjiangwill Collaborator
fix(opd): score teacher logprobs at rollout temperature, not 0
#2085 opened Jun 15, 2026 by EazyReal Contributor
feat(rl): add off-policy IS correction hook (current policy vs rollout)
#2084 opened Jun 15, 2026 by EazyReal Contributor
feat(rl): add REINFORCE advantage estimator
#2083 opened Jun 15, 2026 by EazyReal Contributor
feat(coding_agent_rl): add SWE-bench harness evaluation path
#2079 opened Jun 15, 2026 by aoshen02 Contributor Draft
3 tasks
fix(rollout): isolate per-trajectory exceptions in generate_and_rm_group
#2078 opened Jun 15, 2026 by aoshen02 Contributor