[BUG] multi_tensor_apply: int32 overflow in TensorListMetadata::sizes causes illegal memory access for tensors with numel > INT_MAX #2918

@yezhengmao1

multi_tensor_apply silently truncates per-tensor sizes from 64-bit to 32-bit, causing illegal memory access when any input tensor has numel() > INT_MAX (2,147,483,647).

In transformer_engine/common/multi_tensor/multi_tensor_apply.cuh, TensorListMetadataBase::sizes is declared as int sizes[...] (int32), but it is populated from Tensor::numel() (size_t / int64):

  // multi_tensor_apply.cuh:24
  int sizes[depth_to_max_tensors[n - 1]];
  ...
  // multi_tensor_apply.cuh:68
  tl.sizes[loc_tensor_info] = tensor_lists[0][t]->numel();   // int64 -> int32, silent truncation

For a tensor with numel = 2,476,250,368 (e.g. an embedding of shape [19345706, 128]), the stored value wraps modulo 2^32 and the field becomes 2,476,250,368 - 2^32 = -1,818,716,928. This negative size is then consumed by downstream kernels (e.g. multi_tensor_l2norm_kernel), which compute element offsets from it and issue out-of-bounds global-memory accesses. The failure surfaces at the next CUDA sync as:
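The wraparound arithmetic above can be reproduced in isolation. The following is a minimal sketch, not code from Transformer Engine; `truncate_numel` and `kEmbeddingNumel` are hypothetical names chosen for illustration, and the helper only mirrors the 64-bit to 32-bit narrowing described in this report.

```cpp
#include <climits>
#include <cstdint>

// Element count of the embedding shape quoted in this issue: [19345706, 128].
constexpr int64_t kEmbeddingNumel = int64_t{19345706} * int64_t{128};
static_assert(kEmbeddingNumel == 2476250368LL, "numel quoted in the issue");
static_assert(kEmbeddingNumel > INT_MAX, "does not fit in a 32-bit int");

// Hypothetical helper (not a TE function): stores a 64-bit element count
// into a 32-bit int, as the assignment to tl.sizes[...] does.
// The out-of-range conversion wraps modulo 2^32 on mainstream compilers
// (guaranteed since C++20, implementation-defined before that).
inline int32_t truncate_numel(int64_t numel) {
  return static_cast<int32_t>(numel);
}
```

Calling `truncate_numel(kEmbeddingNumel)` yields the negative value quoted above on compilers that wrap, while any count at or below INT_MAX passes through unchanged.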

RuntimeError: .../multi_tensor_apply.cuh:92 in function multi_tensor_apply:
CUDA Error: an illegal memory access was encountered

Any workload that passes a tensor with numel > INT_MAX (2^31 - 1) to TE's multi_tensor utilities hits this. In particular, megatron.training.utils.calc_params_l2_norm → multi_tensor_applier(multi_tensor_l2norm, ...) crashes for any model containing a single parameter with more than ~2.147B elements (common for large-vocab embeddings, tied output layers, over-encoding tables, etc.).
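One possible stopgap, short of widening the field, is to reject oversized tensors on the host before the narrowing assignment, so the failure is a clear error rather than an illegal memory access. The sketch below is only an illustration under that assumption: `checked_size_to_int` is a hypothetical helper that does not exist in Transformer Engine, and whether the maintainers would prefer this or a 64-bit `sizes` array is not stated in this report.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical guard (not part of TE): converts an element count to int,
// throwing instead of silently truncating when it exceeds INT_MAX.
inline int checked_size_to_int(std::size_t numel) {
  constexpr std::size_t kMax =
      static_cast<std::size_t>(std::numeric_limits<int>::max());
  if (numel > kMax) {
    throw std::overflow_error("numel " + std::to_string(numel) +
                              " exceeds INT_MAX and cannot be stored in int");
  }
  return static_cast<int>(numel);
}
```

A guard of this kind only turns the crash into a diagnosable error; supporting such tensors would still require 64-bit sizes (or chunking the tensor) in the metadata and in the kernels that read it.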

Labels: bug
