Skip to content

clear residual op in vpto emit#855

Open
Likai-19 wants to merge 1 commit into
hw-native-sys:mainfrom
Likai-19:vpto_clear_residual_op
Open

clear residual op in vpto emit#855
Likai-19 wants to merge 1 commit into
hw-native-sys:mainfrom
Likai-19:vpto_clear_residual_op

Conversation

@Likai-19

Copy link
Copy Markdown

问题总结

VPTO lowering pipeline 完成后,module 中残留的 tile 元数据 op 阻塞了 LLVM 导出。这些 op 分为两类:

  1. 语义已完成但未清理的死 oppto.set_valid_shape(始终无 user)、无 user 的 pto.get_valid_shape、无 user 的 pto.treshape、无 user 的 pto.alloc_tile、无 user 的 UnrealizedConversionCastOp

  2. 可被折叠的 cast 链pto.get_valid_shapeUnrealizedConversionCastOp → LLVM i64 的 cast 链,其等价的 LLVM i64 值已存在于原始 pto.alloc_tile 的 valid row/col 属性中,但未被复用。

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces folding for pto.get_validshape intrinsics into materialized tile handles, along with erasing unused pto.set_validshape operations and performing dead code elimination (DCE) on leftover tile-handle view and allocation operations. The review feedback suggests optimizing the iterative DCE implementation to a worklist-based approach to improve complexity from $O(N \times M)$ to $O(N + M)$, and adding a safety check to ensure UnrealizedConversionCastOp has exactly one result before accessing it.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +680 to +699
bool tileDceChanged = true;
while (tileDceChanged) {
tileDceChanged = false;
SmallVector<Operation *, 8> deadTileOps;
func.walk([&](Operation *op) {
if (!op->use_empty())
return;
if (isa<pto::TReshapeOp, pto::MaterializeTileOp, pto::AllocTileOp>(op))
deadTileOps.push_back(op);
else if (auto castOp = dyn_cast<UnrealizedConversionCastOp>(op)) {
if (castOp.getNumOperands() == 1 &&
isa<pto::TileBufType>(castOp.getResult(0).getType()))
deadTileOps.push_back(op);
}
});
for (auto *op : llvm::reverse(deadTileOps)) {
op->erase();
tileDceChanged = true;
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current iterative DCE implementation performs a full function walk (func.walk) inside a while loop for every level of dead tile operations. For a chain of $N$ dead operations, this results in $O(N \times M)$ complexity (where $M$ is the number of operations in the function).

We can optimize this to $O(N + M)$ by using a worklist-based approach. When an operation is erased, we check if its operands have become dead and add them to the worklist if they match the target types. Additionally, we should verify that UnrealizedConversionCastOp has exactly 1 result before accessing getResult(0) to prevent potential out-of-bounds assertions.

    SmallVector<Operation *, 8> worklist;
    func.walk([&](Operation *op) {
      if (!op->use_empty())
        return;
      if (isa<pto::TReshapeOp, pto::MaterializeTileOp, pto::AllocTileOp>(op)) {
        worklist.push_back(op);
      } else if (auto castOp = dyn_cast<UnrealizedConversionCastOp>(op)) {
        if (castOp.getNumOperands() == 1 && castOp.getNumResults() == 1 &&
            isa<pto::TileBufType>(castOp.getResult(0).getType()))
          worklist.push_back(op);
      }
    });

    while (!worklist.empty()) {
      Operation *op = worklist.pop_back_val();
      SmallVector<Value, 4> operands(op->getOperands());
      op->erase();
      for (Value operand : operands) {
        if (auto *defOp = operand.getDefiningOp()) {
          if (!defOp->use_empty())
            continue;
          if (isa<pto::TReshapeOp, pto::MaterializeTileOp, pto::AllocTileOp>(defOp)) {
            worklist.push_back(defOp);
          } else if (auto castOp = dyn_cast<UnrealizedConversionCastOp>(defOp)) {
            if (castOp.getNumOperands() == 1 && castOp.getNumResults() == 1 &&
                isa<pto::TileBufType>(castOp.getResult(0).getType()))
              worklist.push_back(defOp);
          }
        }
      }
    }

@Likai-19 Likai-19 force-pushed the vpto_clear_residual_op branch from 6c34f51 to eb6e8d3 Compare June 23, 2026 06:40
@reedhecre

reedhecre commented Jun 23, 2026

Copy link
Copy Markdown

Codex Review

该评论由 review 机器人自动更新。

  • PR: clear residual op in vpto emit #855 clear residual op in vpto emit
  • Author: Likai-19
  • Base/Head: main / vpto_clear_residual_op
  • Head SHA: eb6e8d3849d1
  • Trigger: 检测到新的 open PR
  • Generated At: 2026-06-23T07:12:01Z
  • Status: completed

Summary

PR 在两阶段 FoldTileBufIntrinsics 流水线里过早删除 set_validshape,会导致后续 valid-shape 读取/折叠看到错误的元数据。

Findings

  1. P2 `shape-only` now drops `set_validshape` before later valid-shape readers are folded lib/PTO/Transforms/FoldTileBufIntrinsics.cpp:729

VPTO pipeline runs this pass twice: shape-only first and addr-only later (tools/ptoas.cpp:1494-1509). The new cleanup here erases every pto.set_validshape in the first pass, but the newly added pto.get_validshape folding only happens in the addr-family block above. That means any later get_validshape/treshape-derived reader in the second pass no longer sees the override and folds back to the original alloc/materialize valid dims. This breaks the dynamic-valid treshape + set_validshape patterns produced by PTOA5NormalizeTMovPass, and it also makes --fold-mode=shape-only itself non-semantics-preserving for IR that still contains get_validshape.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants