Revert "Expert Parallelism: common C API + NCCL EP backend" #3126

Merged: timmoon10 merged 1 commit into main from revert-3034-phuong/ep-2-commwindow on Jun 13, 2026

@phu0ngng phu0ngng commented Jun 13, 2026

Collaborator

Reverts #3034 since we need to resolve dependencies in our CI infrastructure first.

@phu0ngng phu0ngng requested a review from ptrendx as a code owner June 13, 2026 01:43
@phu0ngng phu0ngng requested a review from timmoon10 June 13, 2026 01:44
This reverts commit c3396ee.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10 timmoon10 force-pushed the revert-3034-phuong/ep-2-commwindow branch from f7f6a07 to ee354cd June 13, 2026 01:46
@timmoon10 timmoon10 merged commit 547d284 into main Jun 13, 2026
9 of 13 checks passed
greptile-apps Bot commented Jun 13, 2026

Contributor

Greptile Summary

This PR reverts #3034 ("Expert Parallelism: common C API + NCCL EP backend"), removing the entire NCCL EP feature until the CI infrastructure dependencies it needs are resolved. It is a clean, mechanical revert that deletes ~2,450 lines of EP implementation, tests, and build scaffolding.

  • Removes all EP source files (ep_api.cpp, ep_backend.cpp, ep_backend.h, ep.h, comm_window.h) and the 3rdparty/nccl git submodule, plus the corresponding CMake build blocks in both transformer_engine/common/CMakeLists.txt and tests/cpp_distributed/CMakeLists.txt.
  • Strips the NCCL EP build path from setup.py (including build_nccl_ep_submodule and _discover_nccl_home) and the public is_nccl_ep_available/require_nccl_ep Python helpers from transformer_engine/__init__.py.
  • Reverts qa/L1_cpp_distributed/test.sh to the simpler pre-EP form (single cmake --build build + mpirun test_comm_gemm) and restores _load_core_library() from RTLD_GLOBAL | RTLD_LAZY back to plain RTLD_GLOBAL.
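
For downstream code that had started calling the helpers named above, one way to stay compatible with builds on either side of this revert is to probe for the helper instead of importing it unconditionally. The following is a minimal sketch, not part of the PR: `nccl_ep_supported` is a hypothetical wrapper, and it assumes only that `is_nccl_ep_available`, when present, can be called with no arguments and returns a truthy value (its real signature is not shown in this conversation).

```python
from types import SimpleNamespace


def nccl_ep_supported(te_module) -> bool:
    """Return True only if the module exposes an EP availability helper that says yes.

    `te_module` would normally be the imported transformer_engine package;
    it is passed in so the sketch runs without Transformer Engine installed.
    """
    # After the revert the attribute no longer exists, so fall back to None.
    helper = getattr(te_module, "is_nccl_ep_available", None)
    if helper is None:
        return False
    # ASSUMPTION: the helper takes no arguments and returns a truthy value.
    return bool(helper())


# Stand-ins for the package before and after the revert (illustrative only).
post_revert = SimpleNamespace()
pre_revert = SimpleNamespace(is_nccl_ep_available=lambda: True)

print(nccl_ep_supported(post_revert))  # False
print(nccl_ep_supported(pre_revert))   # True
```

Probing with `getattr` keeps the caller working when the feature is absent, which fits a revert that is expected to be re-landed later.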

Confidence Score: 5/5

The revert is clean and complete — no EP symbols, headers, or build references remain in the tree, confirmed by a full codebase grep.

Every file touched by the original EP PR is either deleted or mechanically reverted. The only point worth noting is that _load_core_library() again omits os.RTLD_LAZY, which matches the pre-EP code and works correctly under glibc. No functional regressions are introduced.

transformer_engine/common/__init__.py: it is worth verifying that the _load_core_library() binding-mode flag is intentionally left as-is.
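
The "full codebase grep" mentioned above can be approximated with a short script. This is a sketch under stated assumptions, not the check Greptile actually ran: the marker list is copied from identifiers named in this summary, and `find_ep_references` plus the suffix filter are hypothetical choices made for illustration.

```python
from pathlib import Path

# Identifiers taken from the summary in this conversation; extend as needed.
EP_MARKERS = (
    "ep_api.cpp",
    "ep_backend",
    "comm_window.h",
    "NVTE_WITH_NCCL_EP",
    "libnccl_ep",
    "build_nccl_ep_submodule",
    "is_nccl_ep_available",
    "require_nccl_ep",
)

# ASSUMPTION: these suffixes cover the sources, CMake files, and scripts of interest.
SCANNED_SUFFIXES = (".py", ".txt", ".cmake", ".sh", ".h", ".cpp", ".cu")


def find_ep_references(root):
    """Return (path, line_number, line) for every line that mentions an EP marker."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SCANNED_SUFFIXES:
            continue
        if ".git" in path.parts:  # skip repository metadata
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(marker in line for marker in EP_MARKERS):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_ep_references(".")` from the repository root and getting an empty list would be consistent with the claim that no EP references remain, within the limits of the marker list and suffix filter.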

Important Files Changed

| Filename | Overview |
| --- | --- |
| .gitmodules | Removes the 3rdparty/nccl submodule entry; the correct counterpart to deleting the 3rdparty/nccl subproject commit. |
| setup.py | Removes 157 lines of NCCL EP build logic (build_nccl_ep_submodule, _discover_nccl_home, arch detection); matches the removed CMake blocks and deleted submodule. |
| transformer_engine/common/__init__.py | Reverts _load_core_library() from RTLD_GLOBAL |
| transformer_engine/__init__.py | Removes public is_nccl_ep_available() / require_nccl_ep() helpers and the _nccl_runtime_version() cache; correct since the EP backend is no longer present. |
| transformer_engine/common/CMakeLists.txt | Removes the entire NCCL EP CMake block (libnccl_ep.a static link, include paths, ep_api.cpp / ep_backend.cpp sources, NVTE_WITH_NCCL_EP option); consistent with deleted source files. |
| qa/L1_cpp_distributed/test.sh | Reverts to simple set -e + cmake --build build + mpirun test_comm_gemm; removes XML artifact collection and per-target failure tracking added alongside the EP test. |
| tests/cpp_distributed/CMakeLists.txt | Removes the test_ep executable target, NCCL header discovery (nccl.h prefix walk), and EP-specific linker flags; only test_comm_gemm remains. |
| tests/cpp_distributed/test_ep.cu | Deleted: 843-line EP distributed test suite (dispatch, combine, fwd/bwd, zero-copy) removed as part of the revert. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["PR #3126: Revert EP PR #3034"] --> B["Remove 3rdparty/nccl submodule"]
    A --> C["Delete EP source files\nep_api.cpp / ep_backend.cpp / ep_backend.h\nep.h / comm_window.h"]
    A --> D["Revert build system\nsetup.py + CMakeLists.txt"]
    A --> E["Revert Python public API\ntransformer_engine/__init__.py"]
    A --> F["Revert test infrastructure\ntest.sh / CMakeLists.txt\nDelete test_ep.cu / run_test_ep.sh"]
    A --> G["Revert common/__init__.py\nRTLD_GLOBAL|RTLD_LAZY → RTLD_GLOBAL"]
    D --> D1["setup.py removes:\nbuild_nccl_ep_submodule()\n_discover_nccl_home()"]
    D --> D2["common/CMakeLists.txt removes:\nNVTE_WITH_NCCL_EP option\nlibnccl_ep.a static link\nep_api.cpp / ep_backend.cpp sources"]
    D --> D3["tests/CMakeLists.txt removes:\ntest_ep target\nNCCL header discovery"]
    E --> E1["Removes:\nis_nccl_ep_available()\nrequire_nccl_ep()\n_nccl_runtime_version()"]
    F --> F1["qa/test.sh reverts to:\nset -e\ncmake --build build\nmpirun test_comm_gemm"]

Reviews (1): Last reviewed commit: "Revert "Expert Parallelism: common C API..."

 def _load_core_library():
     """Load shared library with Transformer Engine C extensions"""
-    return ctypes.CDLL(_get_shared_object_file("core"), mode=ctypes.RTLD_GLOBAL | os.RTLD_LAZY)
+    return ctypes.CDLL(_get_shared_object_file("core"), mode=ctypes.RTLD_GLOBAL)

P2 The EP PR added os.RTLD_LAZY alongside RTLD_GLOBAL so that lazy binding would prevent unresolved NCCL EP symbols from causing a dlopen failure at load time. The revert drops that flag, restoring the pre-EP state. POSIX requires the mode to include one of RTLD_LAZY or RTLD_NOW, and this call names neither, so the binding mode depends on what ctypes and the platform loader fill in (the pre-EP code loaded this way, so it works in practice). Consider restoring os.RTLD_LAZY to make the binding mode explicit, regardless of whether EP is present.

Suggested change
-    return ctypes.CDLL(_get_shared_object_file("core"), mode=ctypes.RTLD_GLOBAL)
+    return ctypes.CDLL(_get_shared_object_file("core"), mode=ctypes.RTLD_GLOBAL | os.RTLD_LAZY)
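
To illustrate what "making the binding mode explicit" could look like in isolation, here is a small sketch that builds the dlopen mode from named flags. The helper names `dlopen_mode` and `load_library` are hypothetical and do not appear in Transformer Engine; the sketch assumes a POSIX platform where `os.RTLD_LAZY` and `os.RTLD_NOW` are defined. Whether the loader ultimately binds lazily may also depend on the Python runtime and on environment settings such as `LD_BIND_NOW`, so treat this as a statement of intent rather than a guarantee.

```python
import ctypes
import os


def dlopen_mode(lazy: bool = True) -> int:
    """Build a dlopen mode that always names a binding policy."""
    binding = os.RTLD_LAZY if lazy else os.RTLD_NOW
    # RTLD_GLOBAL exposes the library's symbols to libraries loaded later.
    return ctypes.RTLD_GLOBAL | binding


def load_library(path, lazy: bool = True) -> ctypes.CDLL:
    """Load a shared object with global visibility and an explicit binding flag."""
    return ctypes.CDLL(path, mode=dlopen_mode(lazy))


# Passing None requests a handle to the already-loaded main program (POSIX only),
# which keeps this example independent of any particular .so file.
handle = load_library(None)
```

In the function under review, the equivalent change would be passing `ctypes.RTLD_GLOBAL | os.RTLD_LAZY` as the mode, as the suggested change shows.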

@timmoon10 timmoon10 deleted the revert-3034-phuong/ep-2-commwindow branch June 14, 2026 05:39

2 participants