[fix] allow exporting custom Metal kernels without a Metal backend by xthomaswang · Pull Request #3650 · ml-explore/mlx

xthomaswang · 2026-06-10T03:17:51Z

Proposed changes

mx.fast.metal_kernel throws at construction on non-Metal builds, so functions
using custom Metal kernels can't be exported from Linux.

- Move metal_kernel graph construction to backend/common so it builds on all platforms.
- During export tracing without Metal, record the kernel on a placeholder GPU stream that the importing process remaps to a real stream.
- Outside of export tracing, calling the kernel without Metal still raises an error.
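The three changes above can be illustrated with a rough sketch of the stream selection on the export side and the remap on the import side. Everything in it is hypothetical: `DeviceType`, `Stream`, `kPlaceholderIndex`, `export_tracing_depth`, `metal_available`, `kernel_stream` and `remap_stream` are stand-ins invented for this example, not MLX's actual types or functions. The PR does not say how the placeholder stream is encoded, so using a sentinel index is only an assumption.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for MLX's device/stream types (not the real API).
enum class DeviceType { cpu, gpu };

struct Stream {
  int index;          // stream id; kPlaceholderIndex marks a placeholder
  DeviceType device;  // device the stream runs on
};

// Hypothetical sentinel meaning "some GPU stream, chosen at import time".
constexpr int kPlaceholderIndex = -1;

// Hypothetical per-thread export-tracing depth and backend availability flag.
thread_local int export_tracing_depth = 0;
bool metal_available = false;

// Export side: pick the stream a custom Metal kernel is recorded on.
Stream kernel_stream(const Stream& requested) {
  if (metal_available) {
    return requested;  // normal path: use the caller's stream
  }
  if (export_tracing_depth > 0) {
    // No Metal backend, but we are only tracing: record a placeholder.
    return Stream{kPlaceholderIndex, DeviceType::gpu};
  }
  // Not tracing and no Metal backend: the kernel cannot run here.
  throw std::runtime_error("[metal_kernel] No Metal backend available.");
}

// Import side: replace a placeholder with a real GPU stream on this machine.
Stream remap_stream(const Stream& recorded, const Stream& local_gpu) {
  if (recorded.device == DeviceType::gpu &&
      recorded.index == kPlaceholderIndex) {
    return local_gpu;
  }
  return recorded;  // ordinary streams pass through unchanged
}
```

In this sketch the error for the non-tracing case is raised at the same point where the placeholder would otherwise be chosen, so a build without Metal can still trace and export, but never silently executes.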

The same mechanism could support exporting mx.fast.cuda_kernel from non-CUDA
platforms; happy to do that as a follow-up.

Checklist

Put an x in the boxes that apply.

I have read the CONTRIBUTING document
I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
I have added tests that prove my fix is effective or that my feature works
I have updated the necessary documentation (if needed)

xthomaswang · 2026-06-12T22:47:04Z

This now conflicts with #3638, which made the tracing state thread-local. My plan is to rebase on main and make the new InExportTracing counter thread-local as well, to match the rest of the tracing state. Let me know if that sounds good, or if you'd suggest a different approach.
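A thread-local counter of the kind proposed here could look roughly like the following. The names (`export_tracing_depth`, `ExportTracingScope`, `in_export_tracing`, `observed_on_other_thread`) are hypothetical; the thread only says the counter is called InExportTracing, not how it is defined. A counter rather than a boolean is assumed so that nested export calls on one thread unwind correctly, and `thread_local` keeps one thread's tracing from affecting kernels built on another.

```cpp
#include <thread>

// Hypothetical per-thread count of active export-tracing scopes.
thread_local int export_tracing_depth = 0;

// RAII guard: increments on entry and decrements on exit, so nested
// export calls on the same thread compose correctly.
class ExportTracingScope {
 public:
  ExportTracingScope() { ++export_tracing_depth; }
  ~ExportTracingScope() { --export_tracing_depth; }
  ExportTracingScope(const ExportTracingScope&) = delete;
  ExportTracingScope& operator=(const ExportTracingScope&) = delete;
};

bool in_export_tracing() { return export_tracing_depth > 0; }

// Reports what a freshly started thread observes, regardless of
// whether the calling thread is currently tracing.
bool observed_on_other_thread() {
  bool seen = true;
  std::thread t([&seen] { seen = in_export_tracing(); });
  t.join();
  return seen;
}
```

Each new thread starts with its own zero-initialized counter, so a tracing scope opened on one thread is invisible to the others, which is the per-thread behavior described for the rest of the tracing state.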

zcbenz · 2026-06-14T00:35:10Z

This basically looks good to me; a rebase and changing the counter to thread-local would be enough.

- Move metal_kernel graph construction to backend/common so it builds on all platforms
- During export tracing without Metal, record the kernel on a placeholder GPU stream which the importing process remaps to a real stream
- Outside of export tracing, calling the kernel without Metal still raises an error

Fixes ml-explore#3240

xthomaswang · 2026-06-14T02:03:48Z

This basically looks good to me; a rebase and changing the counter to thread-local would be enough.

Done: rebased on main and changed the counter to thread-local.

zcbenz · 2026-06-14T04:07:19Z

+  if ((device && *device != Device::gpu) ||
+      (stream && stream->device != Device::gpu) ||
+      (tl_stream && tl_stream->device != Device::gpu)) {
+    throw std::invalid_argument("[metal_kernel] Only supports the GPU.");


Is it necessary to require a GPU stream here? Can we just return to_stream(s) when exporting?
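One possible reading of this suggestion, sketched under assumptions: while export tracing is active, skip the GPU check and return whatever stream resolves, keeping the existing check otherwise. `resolve_stream` is a simplified, hypothetical stand-in for `to_stream(s)`, and `DeviceType`, `Stream`, `default_stream`, `export_tracing_depth` and `metal_kernel_stream` are likewise illustrative, not MLX's real signatures.

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical stand-ins for MLX's device/stream types (not the real API).
enum class DeviceType { cpu, gpu };
struct Stream {
  int index;
  DeviceType device;
};

// Hypothetical default stream and per-thread tracing depth.
Stream default_stream{0, DeviceType::cpu};
thread_local int export_tracing_depth = 0;

// Simplified stand-in for to_stream(s): an explicit stream wins,
// otherwise fall back to the default stream.
Stream resolve_stream(std::optional<Stream> s) {
  return s ? *s : default_stream;
}

Stream metal_kernel_stream(std::optional<Stream> s) {
  Stream out = resolve_stream(s);
  if (export_tracing_depth > 0) {
    // Suggested pass-through: while exporting, record whatever stream
    // resolves and leave the choice of device to the importer.
    return out;
  }
  // Outside export tracing, keep the GPU-only requirement.
  if (out.device != DeviceType::gpu) {
    throw std::invalid_argument("[metal_kernel] Only supports the GPU.");
  }
  return out;
}
```

A pass-through like this lets the exported graph record a non-GPU stream for a kernel that only has a Metal implementation, so whether it is acceptable depends on how the importing side remaps streams.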

zcbenz mentioned this pull request on Jun 10, 2026 in [Feature Request] Fallback for custom kernels #3240 (Open)

xthomaswang force-pushed the fix/no-metal-custom-kernel-export branch from a431ede to 7cbaeac on June 14, 2026, 02:00

zcbenz reviewed Jun 14, 2026

xthomaswang wants to merge 1 commit into ml-explore:main from xthomaswang:fix/no-metal-custom-kernel-export