[Doc] Memory-efficient RL training tutorial + cross-refs by vmoens · Pull Request #3745 · pytorch/rl

vmoens · 2026-05-12T16:10:41Z

Summary

Ties together the three recently-merged memory-efficiency PRs into a
single story:

- compact_obs collector flag ([Performance] Add compact_obs flag to DataCollector, #3742)
- NextStateReconstructor RB transform ([Feature] NextStateReconstructor RB transform, #3743)
- NaN-safe value-estimator forward ([BugFix] Sanitize NaN ('next', obs) in value-estimator forward, #3744)

Two parts:

1. Runnable Sphinx-gallery tutorial at
tutorials/sphinx-tutorials/memory_efficient_rl.py. Sections:

- Where the observation memory goes (concrete td.bytes() numbers)
- Why ("next", obs) is kept by default: bootstrap target at trajectory ends, MultiStepTransform n-step fallback
- Knob 1: SyncDataCollector(compact_obs=True)
- Knob 2: NextStateReconstructor with the traj_id + done contract
- Knob 2.5: value-estimator NaN safety (_sanitize_next_obs_nan), GAE finite everywhere
- When not to take this path: MultiStepTransform incompatibility, the V(obs[t]) ≈ V(real_next_obs) approximation at truncated steps, and how shifted=True interacts
- Knob 3: LazyMemmapStorage for buffers ≥ VRAM
- Knob 4: SliceSampler + the new "scan" / "triton" recurrent backends for padding-free sequence training
- End-to-end pipeline snippet
- Conclusion + Further reading
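The Knob 2 contract (rebuild ("next", obs) from the following row of the same trajectory, NaN where no successor is stored) can be pictured with a minimal pure-Python sketch. This is not TorchRL's NextStateReconstructor: the helper name `reconstruct_next_obs` and the exact rule used here (a row has a successor only if the next row shares its traj_id and the current step is not done) are assumptions made for illustration.

```python
import math

NAN = float("nan")


def reconstruct_next_obs(obs, traj_ids, dones):
    """Hypothetical sketch: rebuild next observations from a compact segment.

    obs[t] is the observation at step t, traj_ids[t] identifies its trajectory,
    and dones[t] is True if step t ended the trajectory. next_obs[t] is
    obs[t + 1] when that row continues the same trajectory, else NaN.
    """
    n = len(obs)
    next_obs = []
    for t in range(n):
        continues = t + 1 < n and traj_ids[t + 1] == traj_ids[t] and not dones[t]
        if continues:
            next_obs.append(list(obs[t + 1]))
        else:
            # No successor is stored: the real final observation was dropped.
            next_obs.append([NAN] * len(obs[t]))
    return next_obs


# Two trajectories (lengths 3 and 2) with 2-dimensional observations.
obs = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [10.0, 10.1], [11.0, 11.1]]
traj_ids = [0, 0, 0, 1, 1]
dones = [False, False, True, False, True]

next_obs = reconstruct_next_obs(obs, traj_ids, dones)
nan_rows = [t for t, row in enumerate(next_obs) if any(math.isnan(x) for x in row)]
print(nan_rows)  # [2, 4]
```

In this toy segment the NaN rows land exactly on the done steps, which is the same property the test plan checks on sampled slices.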

Runs end-to-end on CPU (CartPole-v1, 200 frames; <2s wall-clock) and reports
concrete byte-level savings from td.bytes().

2. Docstring cross-references so a reader landing on any of the
three new APIs finds the other two:

- Collector(compact_obs=…) (and the multi-process collectors): pointers to NextStateReconstructor, the value-estimator sanitizer, the MultiStepTransform incompatibility note, and the new tutorial.
- NextStateReconstructor: .. seealso:: block covering compact_obs, the sanitizer, MultiStepTransform, and the tutorial.
- ValueEstimatorBase._sanitize_next_obs_nan: .. seealso:: block to compact_obs, NextStateReconstructor, and the tutorial.

docs/source/index.rst registers the new tutorial under "Basics".

Test plan

- Tutorial runs end-to-end with concrete output (memory savings reported, NaN at slice boundaries confirmed to coincide with trajectory boundaries, GAE advantage finite everywhere, memmap roundtrip works).
- All cross-references resolve to existing public symbols (verified by reading the rendered class docstrings via Collector.__doc__ etc.).
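For the "memmap roundtrip" item, the idea behind Knob 3 can be pictured with a standard-library stand-in: write a CartPole-sized observation buffer to disk once, then memory-map the file and decode only the frame you need. This sketch uses Python's `mmap`, `array` and `struct` modules, not LazyMemmapStorage, whose API is not reproduced here; the file name and flat row-major layout are hypothetical.

```python
import array
import mmap
import os
import struct
import tempfile

# Toy buffer: 200 frames of 4 float32 observations (CartPole-sized).
n_frames, obs_dim = 200, 4
values = array.array("f", (float(i) for i in range(n_frames * obs_dim)))
row_bytes = obs_dim * values.itemsize

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "observation.bin")
    # Write phase: dump the whole buffer to disk once.
    with open(path, "wb") as f:
        values.tofile(f)
    # Read phase: map the file and decode a single frame; the OS pages
    # data in on demand instead of loading the full file into memory.
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        frame = 5
        row = struct.unpack_from(f"{obs_dim}f", mm, offset=frame * row_bytes)

print(row)  # (20.0, 21.0, 22.0, 23.0)
```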

🤖 Generated with Claude Code

New tutorial under tutorials/sphinx-tutorials/memory_efficient_rl.py that ties together the three recent memory-efficiency PRs:

- compact_obs flag on the collector (pytorch#3742)
- NextStateReconstructor RB transform (pytorch#3743)
- NaN-safe value-estimator forward (pytorch#3744)

The tutorial walks through:

- Where the observation memory goes and why TorchRL keeps both obs and ("next", obs) by default (bootstrap targets, MultiStep n-step fallback)
- Knob 1: SyncDataCollector(compact_obs=True), which halves the obs footprint on the producer side
- Knob 2: NextStateReconstructor, which rebuilds ("next", obs) at sampling time, with NaN at trajectory ends
- Knob 2.5: ValueEstimatorBase._sanitize_next_obs_nan, which keeps GAE/TD targets numerically defined
- When NOT to take this path: MultiStepTransform, and truncated transitions where the V(obs[t]) ≈ V(real_next_obs) approximation is unacceptable
- Knob 3: LazyMemmapStorage for buffers larger than VRAM
- Knob 4: SliceSampler + scan/Triton recurrent backends for padding-free sequence training
- End-to-end pipeline snippet

The tutorial runs end-to-end on CPU (CartPole-v1, 200 frames) and reports concrete byte-level savings from `td.bytes()`.

Cross-references added to:

- SyncDataCollector / MultiSyncCollector / MultiAsyncCollector (`compact_obs` docstring): pointers to NextStateReconstructor, the value-estimator sanitizer, the MultiStep incompatibility note, and the new tutorial.
- NextStateReconstructor: `.. seealso::` block to compact_obs, the sanitizer, the MultiStep incompatibility note, and the tutorial.
- ValueEstimatorBase._sanitize_next_obs_nan: `.. seealso::` to compact_obs, NextStateReconstructor, and the tutorial.
- docs/source/index.rst: register the new tutorial under "Basics".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
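Knob 2.5 can be illustrated with a small sketch showing why a single NaN next observation breaks GAE and how substituting obs[t] (the V(obs[t]) ≈ V(real_next_obs) approximation at truncated steps) restores finite advantages. The helpers `sanitize_next_obs`, `gae` and `value` are hypothetical stand-ins written for this example, not ValueEstimatorBase._sanitize_next_obs_nan or a TorchRL value network.

```python
import math


def sanitize_next_obs(obs, next_obs):
    """Hypothetical sketch: where next obs is NaN, fall back to obs[t]."""
    return [
        list(o) if any(math.isnan(x) for x in n) else list(n)
        for o, n in zip(obs, next_obs)
    ]


def gae(rewards, values, next_values, terminated, dones, gamma=0.99, lmbda=0.95):
    """Plain GAE over one flat segment, iterating backwards in time."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        not_terminated = 0.0 if terminated[t] else 1.0
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * not_terminated * next_values[t] - values[t]
        running = delta + gamma * lmbda * not_done * running
        advantages[t] = running
    return advantages


def value(o):
    # Toy value function, an assumption for illustration only.
    return 0.5 * sum(o)


obs = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
# The last step is truncated: its real next obs was not stored, so it is NaN.
next_obs = [[1.0, 1.0], [2.0, 1.0], [math.nan, math.nan]]
rewards = [1.0, 1.0, 1.0]
terminated = [False, False, False]
dones = [False, False, True]

values = [value(o) for o in obs]
raw = gae(rewards, values, [value(n) for n in next_obs], terminated, dones)
safe = gae(rewards, values, [value(n) for n in sanitize_next_obs(obs, next_obs)], terminated, dones)

print(any(math.isnan(a) for a in raw))      # True: one NaN poisons the backward pass
print(all(math.isfinite(a) for a in safe))  # True
```

Masking the bootstrap term with a done or terminated flag is not enough on its own, because `0.0 * math.nan` is still NaN under IEEE 754 arithmetic; the NaN has to be replaced before it enters the target.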

pytorch-bot · 2026-05-12T16:10:46Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3745

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Run pull request jobs on OSDC runners in shadow mode

❌ 1 New Failure

As of commit 01502ba with merge base 6e46542 ():

NEW FAILURE - The following job has failed:

Unit-tests on Windows / unittests-cpu (3.10, windows.4xlarge, cpu) / windows-job (gh)
test/transforms/test_observation_transforms.py::TestNextObservationDelta::test_trans_parallel_env_check

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Add a new section between Knob 2 and Knob 2.5 that describes the lossy-delta variant of memory-efficient observation storage shipped in pytorch#3777:

- env-side: write `(next_obs - obs).to(delta_dtype)` under `("next", "delta", obs)` and drop the full-precision next obs.
- RB-side: the same transform reconstructs `("next", obs)` as `obs + delta` at sample time.

The new knob trades a smaller memory saving (~25% vs ~50%) for boundary-preserving reconstruction: there is no NaN at trajectory ends, so losses that bootstrap on truncated transitions get the real next obs instead of the `V(obs[t])` fallback used by the value-estimator sanitizer. MultiStep is still incompatible.

Cross-references:

- "When not to rehydrate" now points at NextObservationDelta as the alternative for truncated-bootstrap-heavy losses.
- Conclusion bullets include the delta knob alongside the compact + reconstructor pair.

The runnable code path is unchanged; the new section uses a non-executed `.. code-block:: python` snippet, so the tutorial does not depend on pytorch#3777 being merged first.
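The env-side/RB-side split described above can be sketched in pure Python, with float16 emulated through the `struct` module's "e" (half-precision) format. The helpers `write_deltas` and `rebuild_next_obs` are hypothetical and only mirror the described `obs + delta` contract; they are not NextObservationDelta's implementation.

```python
import math
import struct


def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def write_deltas(obs, next_obs):
    """Env side (sketch): keep only float16 deltas; the full next obs is dropped."""
    return [
        [to_float16(n - o) for o, n in zip(ob, nx)]
        for ob, nx in zip(obs, next_obs)
    ]


def rebuild_next_obs(obs, deltas):
    """Buffer side (sketch): approximate next obs as obs + delta at sample time."""
    return [[o + d for o, d in zip(ob, dl)] for ob, dl in zip(obs, deltas)]


# One short trajectory; the last row's next obs is the final (boundary) observation.
obs = [[0.010, 0.20], [0.014, 0.39], [0.0218, 0.58]]
next_obs = [[0.014, 0.39], [0.0218, 0.58], [0.0334, 0.77]]

deltas = write_deltas(obs, next_obs)
rebuilt = rebuild_next_obs(obs, deltas)

max_err = max(abs(r - n) for rr, nn in zip(rebuilt, next_obs) for r, n in zip(rr, nn))
all_finite = all(math.isfinite(x) for row in rebuilt for x in row)
print(all_finite, max_err < 1e-3)  # True True
```

Because the delta for the final step is written while the real next obs is still available, the boundary row comes back approximately rather than as NaN; the price is float16 rounding on every delta.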

…NextObservationDelta

The conflict in torchrl/collectors/_single.py was between two extensions of the compact_obs docstring: HEAD added the tutorial / NaN-sanitizer / MultiStep cross-references, while main added the new shifted='compact' GAE pairing. Resolved by keeping both.

Now that NextObservationDelta (pytorch#3777) is in main, point at it from the places that already cross-reference the memory-efficient knobs:

- torchrl/collectors/_single.py compact_obs docstring: 'lossy-precision alternative that *does* preserve boundary transitions'.
- torchrl/collectors/_multi_base.py compact_obs docstring: same line.
- torchrl/envs/transforms/rb_transforms.py NextStateReconstructor seealso: mention the delta variant for the NaN-at-boundary case.
- torchrl/objectives/value/advantages.py _sanitize_next_obs_nan seealso: mention the delta variant as an alternative that avoids NaN.

No code changes; docs only.

…ether'

Now that NextObservationDelta is in main, promote the Knob 2b `.. code-block:: python` snippet to a runnable section. The section:

- builds a TransformedEnv(CartPole, NextObservationDelta())
- runs a 200-step CartPole rollout
- prints the bytes for default vs compact_obs vs delta side by side
- confirms ('next', 'delta', 'observation') is float16 and ('next', 'observation') is absent from the rollout
- attaches the same class to a ReplayBuffer (with explicit in_keys on the RB side, since there is no env parent for auto-inference)
- samples and verifies that ('next', 'observation') is reconstructed finite at every position, including trajectory boundaries, which is exactly where the compact-obs path produces NaN.

Also extend 'Putting it together' with a parallel Recipe B that uses NextObservationDelta on both sides, alongside the existing compact_obs + NextStateReconstructor recipe.
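The side-by-side byte comparison can be approximated with back-of-the-envelope arithmetic. This sketch counts only the observation entries and assumes a 4-dimensional float32 CartPole-v1 observation, that compact_obs keeps exactly one float32 copy, and that the delta variant keeps one float32 obs plus one float16 delta. Real `td.bytes()` figures also include actions, rewards, done flags and other keys, so measured savings will be smaller in relative terms.

```python
FLOAT32_BYTES = 4
FLOAT16_BYTES = 2

n_frames = 200
obs_dim = 4  # CartPole-v1 observation size

obs_bytes = n_frames * obs_dim * FLOAT32_BYTES

layouts = {
    "default": 2 * obs_bytes,  # obs + ("next", obs), both float32
    "compact_obs": obs_bytes,  # obs only; ("next", obs) is rebuilt later
    "delta": obs_bytes + n_frames * obs_dim * FLOAT16_BYTES,  # obs + float16 delta
}

for name, nbytes in layouts.items():
    saving = 1 - nbytes / layouts["default"]
    print(f"{name:12s} {nbytes:6d} bytes  saving {saving:.0%}")
# default 6400 bytes (0%), compact_obs 3200 (50%), delta 4800 (25%)
```

Under these assumptions the observation-only savings line up with the ~50% (compact_obs) and ~25% (delta) figures quoted for the two knobs.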

meta-cla bot added the CLA Signed label on May 12, 2026.

github-actions bot added the Documentation, Objectives, Collectors, Transforms, tutorials/, Integrations/torch_geometric, and Integrations labels on May 12, 2026.

vmoens added 4 commits May 21, 2026 15:42

Tutorial: mention NextObservationDelta in the 'What you will learn' card

8ea7ae4

vmoens wants to merge 5 commits into pytorch:main from vmoens:feature/memory-efficient-docs
