
docs: convert reStructuredText sources to MyST markdown#1579

Draft
timsaucer wants to merge 2 commits into
apache:mainfrom
timsaucer:doc/phase2-rst-to-md

@timsaucer

Which issue does this PR close?

Closes #.

Rationale for this change

Phase 2 of the documentation-site refresh started in #1578. With the
modern pydata-sphinx-theme + navigation in place, this PR moves the
content format off .rst and onto MyST .md. The motivation:

  • Markdown is the lingua franca of agent-tuned tooling. LLMs trained
    on GitHub and modern docs parse Markdown reliably; reStructuredText
    is a minority dialect that frequently confuses both humans editing
    via PR review and agents reading the source. The Apache
    datafusion-comet sibling project completed the same migration
    recently and reported smoother contributor onboarding.
  • MyST is a strict superset of CommonMark with directives for the
    Sphinx features we actually use (toctrees, cross-references,
    code-blocks, admonitions, eval-rst escape hatch).
  • The myst-parser extension is already in the docs dependency
    group and was loaded by conf.py even before this PR — switching
    the on-disk format is a low-risk, mechanical change.

This PR stacks on #1578 (theme + navbar refresh). It should land
after #1578.

What changes are included in this PR?

Format conversion (mechanical, via rst-to-myst):

  • 33 human-authored .rst files under docs/source/ become 33
    .md files — the user guide, contributor guide, IO subsection,
    common-operations subsection, dataframe subsection, top-level
    index, and links.
  • Toctrees, cross-references, code blocks, hyperlinks, admonitions,
    and license headers all round-trip cleanly.

Manual fixes layered on top of the converter output:

  • Cross-reference anchors. The converter kebab-cased every
    (label)= anchor (e.g. (io-csv)=), but every {ref} in the
    corpus, including the Python docstrings that sphinx-autoapi
    pulls into the API reference, still uses the underscore form
    ({ref}`CSV <io_csv>`). Rewrite the anchors back to the underscore
    form ((io_csv)=, (window_functions)=, (user_guide_concepts)=,
    (execution_metrics)=, etc.) so existing references resolve
    without churning every callsite.
  • MyST extensions. Enable colon_fence and deflist in
    myst_enable_extensions (the converter emits these on a few
    files, notably dataframe/execution-metrics.md).
  • source_suffix. Keep .rst registered even though no
    human-authored RST remains: sphinx-autoapi generates .rst
    under autoapi/ at build time and Sphinx needs the suffix to
    parse it. The comment in conf.py flags this so a future cleanup
    pass doesn't strip it again.
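The PR does not say how the anchor rewrite was performed. One way to reproduce it is a regex pass over the converted files. The sketch below is hypothetical (`underscore_anchors` and `rewrite_tree` are illustrative names, not part of the repo) and assumes no label in the original RST was intentionally hyphenated, since it converts every hyphen on a `(label)=` line. It would also rewrite `(label)=`-shaped lines inside code blocks, so review the diff before committing.

```python
import re
from pathlib import Path

# Matches a MyST target that sits alone on its line, e.g. "(io-csv)=".
# Assumption: labels only use letters, digits, hyphens and underscores.
ANCHOR_RE = re.compile(r"^\((?P<label>[A-Za-z0-9_-]+)\)=[ \t]*$", re.MULTILINE)


def underscore_anchors(text: str) -> str:
    """Rewrite kebab-case MyST targets back to the underscore form."""
    return ANCHOR_RE.sub(
        lambda m: "(" + m.group("label").replace("-", "_") + ")=", text
    )


def rewrite_tree(root: Path) -> list[Path]:
    """Apply underscore_anchors to every .md file under root.

    Returns the files that were actually modified.
    """
    changed = []
    for path in sorted(root.rglob("*.md")):
        original = path.read_text(encoding="utf-8")
        updated = underscore_anchors(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `rewrite_tree(Path("docs/source"))` returns the list of touched files, which keeps the change easy to audit.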

86 {eval-rst} blocks remain in the converted output. Every one of
them wraps a .. ipython:: directive, which has no first-class MyST
equivalent in our extensions setup. The blocks render identically
and don't block the build. Migrating these to a native MyST exec
syntax is a follow-up that requires either myst-nb or a custom
parser registration — out of scope here.
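A small audit script can re-check the "86 blocks, all wrapping ipython" claim as the docs evolve. This is a hypothetical sketch (`eval_rst_blocks`, `wraps_ipython` and `audit` are illustrative names, not part of the repo), and it assumes the converter emitted backtick-fenced {eval-rst} directives rather than the ::: colon-fence form.

```python
import re
from pathlib import Path

# Opening fence of a MyST eval-rst directive: three or more backticks.
FENCE_OPEN = re.compile(r"^(?P<fence>`{3,})\{eval-rst\}\s*$")


def eval_rst_blocks(text: str) -> list[str]:
    """Return the body of every backtick-fenced {eval-rst} block."""
    blocks = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        m = FENCE_OPEN.match(lines[i])
        if not m:
            i += 1
            continue
        fence = m.group("fence")
        body = []
        i += 1
        # Collect lines until a closing fence of at least the same length.
        while i < len(lines) and not (
            lines[i].strip().startswith(fence) and set(lines[i].strip()) == {"`"}
        ):
            body.append(lines[i])
            i += 1
        blocks.append("\n".join(body))
        i += 1  # skip the closing fence
    return blocks


def wraps_ipython(body: str) -> bool:
    """True if the first non-blank line of the block is an ipython directive."""
    for line in body.splitlines():
        if line.strip():
            return line.lstrip().startswith(".. ipython::")
    return False


def audit(root: Path) -> tuple[int, list[Path]]:
    """Count eval-rst blocks under root and list files with non-ipython ones."""
    total, offenders = 0, []
    for path in sorted(root.rglob("*.md")):
        blocks = eval_rst_blocks(path.read_text(encoding="utf-8"))
        total += len(blocks)
        if not all(wraps_ipython(b) for b in blocks):
            offenders.append(path)
    return total, offenders
```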

AGENTS.md is updated so the two .rst paths called out under
"Aggregate and Window Function Documentation" point at the new .md
equivalents.

Are there any user-facing changes?

No behavioral change to the datafusion package; only the source
format of the published documentation changes. Readers of the rendered
site will not notice the migration; the HTML output is unchanged.
Internal cross-references resolve, and the pokemon.csv ipython example
on the landing page and the yellow_tripdata_2021-01.parquet example on
the basics page both still execute.

No api change label: the public APIs are untouched.

Follow-ups (out of scope for this PR)

  • Migrate the 86 {eval-rst} .. ipython:: blocks to a
    MyST-native exec syntax. Requires either pulling in myst-nb or
    configuring a per-language parser.
  • Phase 3: multi-version doc publishing (the comet pattern).
  • Phase 4: asf-site publishing workflow.

timsaucer force-pushed the doc/phase2-rst-to-md branch from a400ec1 to 67c2761 (June 7, 2026 13:20)
timsaucer marked this pull request as draft (June 7, 2026 13:29)
timsaucer and others added 2 commits (June 7, 2026 15:31)
Phase 2 of the documentation-site refresh. Run `rst2myst convert` over
every human-authored .rst file under docs/source/ and remove the
originals. The result:

- 33 .rst files become 33 .md files (user guide, contributor guide,
  index, links).
- Headings, paragraphs, hyperlinks, code blocks, admonitions, and
  toctree directives all map cleanly to MyST syntax.
- Cross-reference anchors round-trip through MyST as `(label)=`
  blocks. The converter kebab-cased the labels (e.g. `(io-csv)=`),
  but every `{ref}` target in the corpus still uses the underscore
  form from the original RST (``{ref}`CSV <io_csv>` ``), and so do the
  Python docstrings that AutoAPI pulls in. Rewrite the anchors back
  to the underscore form so the existing references resolve.
- 86 `{eval-rst}` blocks remain — they all wrap `.. ipython::`
  directives, which have no first-class MyST equivalent. They render
  identically and don't block the build.

conf.py changes:

- Enable `colon_fence` and `deflist` MyST extensions (rst-to-myst
  emits these on a few files, particularly execution-metrics.md).
- Keep `.rst` in `source_suffix` even though no human-authored RST
  remains: sphinx-autoapi generates RST under autoapi/ at build time
  and Sphinx needs the suffix registered to parse it.
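For reference, the relevant conf.py settings look roughly like the following. This is a sketch, not the repo's actual file: the full `extensions` list is not shown in this PR, so the entries here are assumptions, and `myst_enable_extensions` may already hold other entries alongside the two added here.

```python
# docs/source/conf.py (excerpt; sketch only, the rest of the file is omitted)

extensions = [
    "myst_parser",        # already loaded before this PR, per the description
    "autoapi.extension",  # sphinx-autoapi; assumed entry name
    # ... other extensions already configured in this repo ...
]

# Syntax that rst-to-myst emits on a few pages (e.g. execution-metrics.md).
myst_enable_extensions = [
    "colon_fence",
    "deflist",
]

# Keep .rst registered: sphinx-autoapi writes RST under autoapi/ at build
# time, even though no hand-written RST remains in docs/source/.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```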

AGENTS.md: update the two .rst paths called out under "Aggregate and
Window Function Documentation" to point at the .md equivalents.

Verified by building locally: `build succeeded` with no warnings, all
internal cross-references resolve, and the ipython examples on the
landing page and basics page still execute.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RST-to-MD conversion emitted MyST `%` comment syntax with a blank line
between each header line, which renders as visible text. Replace it with
canonical `<!--- ... -->` HTML comment block matching upstream
apache/datafusion and this repo's existing markdown files.
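The commit does not say how the header replacement was done. A minimal sketch of the transformation, assuming the license header is the leading run of `%` lines (possibly separated by blank lines) and that the target style is the two-space-indented `<!--- ... -->` block; `percent_header_to_html` is a hypothetical helper, not part of the repo:

```python
def percent_header_to_html(text: str) -> str:
    """Replace a leading block of MyST '%' comment lines with one HTML comment.

    Blank lines between the '%' lines are dropped; everything after the
    last '%' line of the header is kept unchanged.
    """
    lines = text.splitlines(keepends=True)
    header: list[str] = []
    end = 0  # index just past the last '%' line of the leading header
    for i, line in enumerate(lines):
        if line.startswith("%"):
            content = line[1:].rstrip("\r\n")
            # Drop the single space that usually follows the '%' marker,
            # but keep any deeper indentation (e.g. the license URL line).
            if content.startswith(" "):
                content = content[1:]
            header.append(content)
            end = i + 1
        elif line.strip():
            break  # first real content line ends the header
    if not header:
        return text
    body = "\n".join(f"  {h}" if h else "" for h in header)
    return "<!---\n" + body + "\n-->\n" + "".join(lines[end:])
```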

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
timsaucer force-pushed the doc/phase2-rst-to-md branch from 026b9e5 to 30efd76 (June 7, 2026 13:37)