
[Doc] Memory-efficient RL training tutorial + cross-refs #3745

Open
vmoens wants to merge 5 commits into pytorch:main from vmoens:feature/memory-efficient-docs


vmoens (Collaborator) commented May 12, 2026

Summary

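Ties together the three recently merged memory-efficiency PRs into a
single story: the compact_obs collector flag (pytorch#3742), the
NextStateReconstructor RB transform (pytorch#3743), and the NaN-safe
value-estimator forward (pytorch#3744).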

Two parts:

1. Runnable Sphinx-gallery tutorial at
tutorials/sphinx-tutorials/memory_efficient_rl.py. Sections:

  • Where the observation memory goes (concrete td.bytes() numbers)
  • Why ("next", obs) is kept by default — bootstrap target at
    trajectory ends, MultiStepTransform n-step fallback
  • Knob 1 — SyncDataCollector(compact_obs=True)
  • Knob 2 — NextStateReconstructor with the traj_id + done contract
  • Knob 2.5 — value-estimator NaN safety
    (_sanitize_next_obs_nan), GAE finite everywhere
  • When not to take this path — MultiStepTransform incompatibility,
    the V(obs[t]) ≈ V(real_next_obs) approximation at truncated steps,
    and how shifted=True interacts
  • Knob 3 — LazyMemmapStorage for buffers ≥ VRAM
  • Knob 4 — SliceSampler + the new "scan" / "triton"
    recurrent backends for padding-free sequence training
  • End-to-end pipeline snippet
  • Conclusion + Further reading

Runs end-to-end on CPU (CartPole-v1, 200 frames; <2s wall) and reports
the byte-level savings concretely from td.bytes().
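
The idea behind Knobs 1 and 2 can be illustrated without TorchRL. The sketch below is pure Python and is not the NextStateReconstructor implementation: it assumes observations are stored flat, with a per-step trajectory id and done flag, and the helper name `reconstruct_next_obs` is hypothetical. It rebuilds the next observation from the following stored step and emits NaN wherever no valid successor exists.

```python
import math


def reconstruct_next_obs(obs, traj_id, done):
    """Rebuild next_obs[t] from obs[t + 1] (hypothetical helper).

    A successor is valid only if it belongs to the same trajectory and
    step t is not done; otherwise the row is filled with NaN.
    """
    next_obs = []
    for t in range(len(obs)):
        has_successor = (
            t + 1 < len(obs)
            and traj_id[t + 1] == traj_id[t]
            and not done[t]
        )
        if has_successor:
            next_obs.append(list(obs[t + 1]))
        else:
            next_obs.append([math.nan] * len(obs[t]))
    return next_obs


# Two trajectories stored back to back: ids 0, 0, 0 then 1, 1.
obs = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [5.0, 5.1], [6.0, 6.1]]
traj_id = [0, 0, 0, 1, 1]
done = [False, False, True, False, False]

next_obs = reconstruct_next_obs(obs, traj_id, done)
# Indices whose reconstructed next observation is NaN.
nan_rows = [t for t, row in enumerate(next_obs) if any(math.isnan(x) for x in row)]
```

In this toy layout the NaN rows fall at the end of the first trajectory and at the last stored step, which is the "NaN at trajectory ends" behaviour described above.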

2. Docstring cross-references so a reader landing on any of the
three new APIs finds the other two:

  • Collector(compact_obs=…) (and the multi-process collectors):
    pointers to NextStateReconstructor, the value-estimator
    sanitizer, the MultiStepTransform incompatibility note, and the
    new tutorial.
  • NextStateReconstructor: .. seealso:: block covering
    compact_obs, the sanitizer, MultiStepTransform, and the
    tutorial.
  • ValueEstimatorBase._sanitize_next_obs_nan: .. seealso::
    block to compact_obs, NextStateReconstructor, and the
    tutorial.

docs/source/index.rst registers the new tutorial under "Basics".

Test plan

  • Tutorial runs end-to-end with concrete output (memory savings
    reported, NaN at slice boundaries confirmed to coincide with
    trajectory boundaries, GAE advantage finite everywhere, memmap
    roundtrip works).
  • All cross-references resolve to existing public symbols
    (verified by reading the rendered class docstrings via
    Collector.__doc__ etc.).

🤖 Generated with Claude Code

New tutorial under tutorials/sphinx-tutorials/memory_efficient_rl.py
that ties together the three recent memory-efficiency PRs:
  - compact_obs flag on the collector (pytorch#3742)
  - NextStateReconstructor RB transform (pytorch#3743)
  - NaN-safe value-estimator forward (pytorch#3744)

The tutorial walks through:
  - Where the observation memory goes and why TorchRL keeps both
    obs and ("next", obs) by default (bootstrap targets, MultiStep
    n-step fallback)
  - Knob 1: SyncDataCollector(compact_obs=True) — halves the obs
    footprint at the producer side
  - Knob 2: NextStateReconstructor — rebuilds ("next", obs) at
    sampling time, NaN at trajectory ends
  - Knob 2.5: ValueEstimatorBase._sanitize_next_obs_nan keeps GAE/TD
    targets numerically defined
  - When NOT to take this path: MultiStepTransform, truncated
    transitions where the V(obs[t]) ≈ V(real_next_obs) approximation
    is unacceptable
  - Knob 3: LazyMemmapStorage for buffers larger than VRAM
  - Knob 4: SliceSampler + scan/Triton recurrent backends for
    padding-free sequence training
  - End-to-end pipeline snippet

The tutorial runs end-to-end on CPU (CartPole-v1, 200 frames) and
reports concrete byte-level savings from `td.bytes()`.
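
The role of Knob 2.5 can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the body of `_sanitize_next_obs_nan`: the helper names `sanitize_next_obs` and `td0_targets` and the toy value function are hypothetical. NaN rows in the reconstructed next observation are replaced by the current observation, so a truncated step bootstraps from V(obs[t]) instead of V(real_next_obs).

```python
import math


def sanitize_next_obs(obs, next_obs):
    """Replace NaN rows of next_obs with the matching obs row (hypothetical helper)."""
    return [
        list(o) if any(math.isnan(x) for x in n) else list(n)
        for o, n in zip(obs, next_obs)
    ]


def td0_targets(reward, terminated, next_value, gamma=0.99):
    """One-step TD target: r + gamma * (1 - terminated) * V(next_obs)."""
    return [
        r + gamma * (0.0 if term else 1.0) * v
        for r, term, v in zip(reward, terminated, next_value)
    ]


def value(row):
    """Toy value function, assumed for illustration only: sum of the features."""
    return sum(row)


obs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
next_obs = [[3.0, 4.0], [math.nan, math.nan], [math.nan, math.nan]]
reward = [1.0, 1.0, 1.0]
terminated = [False, True, False]  # the last step is truncated, not terminated

clean = sanitize_next_obs(obs, next_obs)
targets = td0_targets(reward, terminated, [value(row) for row in clean])
```

Masking by `(1 - terminated)` alone would not be enough, because `0.0 * nan` is still `nan` in floating-point arithmetic; the substitution has to happen before the value function is evaluated.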

Cross-references added to:
  - SyncDataCollector / MultiSyncCollector / MultiAsyncCollector
    (`compact_obs` docstring) — pointers to NextStateReconstructor,
    the value-estimator sanitizer, MultiStep incompatibility note,
    and the new tutorial.
  - NextStateReconstructor — `.. seealso::` block to compact_obs,
    the sanitizer, MultiStep incompatibility, and the tutorial.
  - ValueEstimatorBase._sanitize_next_obs_nan — `.. seealso::` to
    compact_obs, NextStateReconstructor, and the tutorial.

docs/source/index.rst — register the new tutorial under "Basics".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

pytorch-bot Bot commented May 12, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3745

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 1 New Failure

As of commit 01502ba with merge base 6e46542 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot added the CLA Signed label May 12, 2026
vmoens added 4 commits May 21, 2026 15:42
Add a new section between Knob 2 and Knob 2.5 that describes the
lossy-delta variant of memory-efficient observation storage shipped
in pytorch#3777:

- env-side: write `(next_obs - obs).to(delta_dtype)` under
  `("next", "delta", obs)` and drop the full-precision next obs.
- RB-side: the same transform reconstructs `("next", obs)` as
  `obs + delta` at sample time.

The new knob trades a smaller memory saving (~25% vs ~50%) for
boundary-preserving reconstruction: no NaN at trajectory ends, so
losses that bootstrap on truncated transitions get the real next
obs instead of the `V(obs[t])` fallback used by the value-estimator
sanitizer. MultiStep is still incompatible.
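
The delta trade-off can be made concrete with a small stdlib-only sketch. It is not the NextObservationDelta implementation: the helper names are hypothetical, and it assumes float32 observations with a float16 delta (emulated here through the `struct` module's half-precision format).

```python
import struct


def to_float16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def encode_delta(obs, next_obs):
    """Store next_obs - obs in half precision (stand-in for delta_dtype)."""
    return [to_float16(n - o) for o, n in zip(obs, next_obs)]


def decode_next_obs(obs, delta):
    """Rebuild the next observation as obs + delta."""
    return [o + d for o, d in zip(obs, delta)]


obs = [0.50, -1.25, 3.00]
next_obs = [0.51, -1.20, 3.00]

delta = encode_delta(obs, next_obs)
rebuilt = decode_next_obs(obs, delta)
# Reconstruction is lossy: the error is bounded by float16 rounding of the delta.
max_err = max(abs(r - n) for r, n in zip(rebuilt, next_obs))

# Bytes per observation element, assuming float32 obs and float16 deltas.
default_bytes = 4 + 4  # obs + ("next", obs)
compact_bytes = 4      # obs only
delta_bytes = 4 + 2    # obs + half-precision delta
compact_saving = 1 - compact_bytes / default_bytes
delta_saving = 1 - delta_bytes / default_bytes
```

Under these assumptions the compact path saves half of the observation bytes and the delta path a quarter, in line with the ~50% and ~25% figures quoted above. Because the delta is stored for every step, the rebuilt next observation is finite at trajectory boundaries as well.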

Cross-references:
- "When not to rehydrate" now points at NextObservationDelta as the
  alternative for truncated-bootstrap-heavy losses.
- Conclusion bullets include the delta knob alongside the compact +
  reconstructor pair.

The runnable code path is unchanged; the new section uses a
`.. code-block:: python` (non-executed) snippet, so the tutorial does
not depend on pytorch#3777 being merged first.
…NextObservationDelta

The conflict in torchrl/collectors/_single.py was between two extensions
of the compact_obs docstring -- HEAD added the tutorial / NaN-sanitizer /
MultiStep cross-references, main added the new shifted='compact' GAE
pairing. Resolved by keeping both.

Now that NextObservationDelta (pytorch#3777) is in main, point at it from the
four places that already cross-reference the memory-efficient knobs:

- torchrl/collectors/_single.py compact_obs docstring -- 'lossy-precision
  alternative that *does* preserve boundary transitions'.
- torchrl/collectors/_multi_base.py compact_obs docstring -- same line.
- torchrl/envs/transforms/rb_transforms.py NextStateReconstructor seealso
  -- mention the delta variant for the NaN-at-boundary case.
- torchrl/objectives/value/advantages.py _sanitize_next_obs_nan seealso
  -- mention the delta variant as an alternative that avoids NaN.

No code changes; docs only.
…ether'

Now that NextObservationDelta is in main, promote the Knob 2b
.. code-block:: python snippet to a runnable section. The section:

- builds a TransformedEnv(CartPole, NextObservationDelta())
- runs a 200-step CartPole rollout
- prints the bytes for default vs compact_obs vs delta side by side
- confirms ('next', 'delta', 'observation') is float16 and
  ('next', 'observation') is absent from the rollout
- attaches the same class to a ReplayBuffer (with explicit in_keys for
  the RB side, since there is no env parent for auto-inference)
- samples and verifies ('next', 'observation') is reconstructed
  finite at every position -- including trajectory boundaries, which
  is exactly the case where the compact-obs path produces NaN.

Also extend 'Putting it together' with a parallel Recipe B that uses
NextObservationDelta on both sides, alongside the existing
compact_obs + NextStateReconstructor recipe.

Labels

CLA Signed, Collectors, Documentation, Integrations/torch_geometric, Integrations, Objectives, Transforms, tutorials/
