Delay AMO rd write until store succeeds by shengwen-tw · Pull Request #141 · sysprog21/semu

shengwen-tw · 2026-06-23T12:24:06Z

Overview

Do not commit the AMO destination register before the store succeeds.

AMO_OP wrote rd before confirming that mmu_store() succeeded. If the store raised a fault, such as a COW/page fault, the destination register could be committed even though the guest kernel would handle the fault and retry the same instruction.

This patch delays the rd write until after mmu_store() returns and vm->error is known to be clear.

Problem Statement

Guest GDB can hang for:

gdb -q -batch -ex "set debuginfod enabled off" -ex run --args /bin/true

AI-agent investigation reduced the hang to GDB's default worker-thread path and identified that glibc's malloc at fork unlock path was failing to release its lock:

amoswap.w.rl a5,a5,(a4)

At that point, a4 points at glibc malloc's list_lock, a5 is the value to store (0, meaning unlocked), and the current lock value is 1 (locked). The instruction swaps the value in a5 into the lock and returns the old lock value back in a5.

before: list_lock = 1, a5 = 0

On the broken path, single-stepping showed that a5 still receives the old lock value, but the store did not clear the lock:

broken after: list_lock = 1, a5 = 1
(normal after: list_lock = 0, a5 = 1)

The underlying issue is the AMO implementation:

#define AMO_OP(STORED_EXPR)                                   \
...
set_dest_idx(vm, decoded_rd(decoded), value);
mmu_store(vm, addr, RV_MEM_SW, (STORED_EXPR), false);
if (vm->error)
    return;
...

mmu_store() can fault, for example on a COW/page fault after fork(). Therefore, the old implementation can corrupt that register before the fault is handled

Fix

Compute the store value first, perform the store, check for an error, and only then commit rd:

uint32_t stored = (STORED_EXPR);
mmu_store(vm, addr, RV_MEM_SW, stored, false);
if (vm->error)
    return;
set_dest_idx(vm, decoded_rd(decoded), value);

Testing

scripts/buildroot-enable-gdb.sh is a testing helper that appends the target GDB options needed to Buildroot config.

#!/usr/bin/env bash
set -euo pipefail

CONFIG_FILE="${1:-configs/buildroot.config}"

cat >>"$CONFIG_FILE" <<'EOF'

# Target-side GDB reproduction support.
BR2_TOOLCHAIN_BUILDROOT_CXX=y
BR2_INSTALL_LIBSTDCPP=y
BR2_ENABLE_DEBUG=y
# BR2_STRIP_strip is not set
BR2_PACKAGE_GDB=y
BR2_PACKAGE_GDB_DEBUGGER=y
BR2_PACKAGE_GDB_SERVER=y
BR2_PACKAGE_HOST_GDB=y
EOF

echo "Appended target-side GDB reproduction options to $CONFIG_FILE"

Build a GDB-enabled guest:

chmod +x ./scripts/buildroot-enable-gdb.sh
./scripts/buildroot-enable-gdb.sh
./scripts/build-image.sh --all --no-ext4
./scripts/rootfs_ext4.sh rootfs.cpio ext4.img 256
make semu -j$(nproc)

Boot:

./semu -k Image -c 1 -b minimal.dtb -H -d ext4.img

Inside the guest:

gdb -q -batch -ex "set debuginfod enabled off" -ex run --args /bin/true
echo GDB_TRUE_EXIT:$?

Expected behavior:

Before this patch: hangs
After this patch: exits normally with GDB_TRUE_EXIT:0

Summary by cubic

Delay the AMO destination register write until after mmu_store() succeeds to prevent committing rd on a fault. Fixes a guest GDB hang triggered by amoswap.w.rl in glibc’s malloc lock path.

Bug Fixes
- In AMO_OP, compute the store value, call mmu_store, check vm->error, then write rd.
- Prevents rd corruption on COW/page faults, restoring correct swap semantics and normal GDB exit.

^{Written for commit deadf03. Summary will update on new commits.}

shengwen-tw · 2026-06-23T12:25:27Z

@cubic-dev-ai

cubic-dev-ai · 2026-06-23T12:25:32Z

@cubic-dev-ai

@shengwen-tw I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

No issues found across 1 file

_{Re-trigger cubic}

Do not commit the AMO destination register before the store succeeds. `AMO_OP` wrote `rd` before confirming that `mmu_store()` succeeded. If the store raised a fault, such as a COW/page fault, the destination register could be committed even though the guest kernel would handle the fault and retry the same instruction. This patch delays the `rd` write until after `mmu_store()` returns and `vm->error` is known to be clear.

Cuda-Chen · 2026-06-24T15:32:48Z

Hi @shengwen-tw ,

I got a kernel panic when executing ./semu -k Image -c 1 -b minimal.dtb -H -d ext4.img:

$ ./semu -k Image -c 1 -b minimal.dtb -H -d ext4.img
...
[    0.023338] /dev/root: Can't open blockdev
[    0.023341] VFS: Cannot open root device "" or unknown-block(0,0): error -6
[    0.023344] Please append a correct "root=" boot option; here are the available partitions:
[    0.023347] fe00          262144 vda 
[    0.023348]  driver: virtio_blk
[    0.023350] List of all bdev filesystems:
[    0.023352]  ext3
[    0.023353]  ext4
[    0.023354]  fuseblk
[    0.023356] 
[    0.023358] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    0.023361] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.94 #1
[    0.023364] Hardware name: semu (DT)
[    0.023366] Call Trace:
[    0.023368] [<c0004aa4>] dump_backtrace+0x2c/0x3c
[    0.023371] [<c04794ac>] show_stack+0x40/0x54
[    0.023374] [<c047efdc>] dump_stack_lvl+0x74/0xa8
[    0.023377] [<c047f02c>] dump_stack+0x1c/0x2c
[    0.023380] [<c047985c>] panic+0x128/0x368
[    0.023383] [<c0488cdc>] mount_root_generic+0x214/0x2a8
[    0.023386] [<c0488ddc>] mount_root+0x6c/0x1d4
[    0.023390] [<c0489148>] prepare_namespace+0x204/0x258
[    0.023393] [<c04885a0>] kernel_init_freeable+0x1dc/0x220
[    0.023397] [<c04803ac>] kernel_init+0x24/0x120
[    0.023400] [<c0487294>] ret_from_exception_end+0x14/0x20
[    0.023404] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---

Mes0903 · 2026-06-26T22:05:21Z

Hi @Cuda-Chen ,

I checked your reply locally. I think the panic is caused by a DTB/runtime boot-mode mismatch, not by the AMO change in this PR.

The key line is:

VFS: Cannot open root device "" or unknown-block(0,0)

The same log also lists vda from virtio_blk, so the disk is visible to the kernel. The missing part is a root= argument in /chosen/bootargs.

I reproduced the same class of failure by taking a known-good minimal.dtb, removing root=/dev/vda from /chosen/bootargs, and running:

timeout 30 ./semu -k ../blob/Image -c 1 -b /tmp/semu-minimal-external-no-root-quiet.dtb -H -d ../blob/test-tools.img

That reaches the same VFS panic. With the normal external-root DTB:

fdtget -t s minimal.dtb /chosen bootargs

I get:

earlycon console=ttyS0 root=/dev/vda rw rootwait quiet loglevel=3 lpj=260000

and the same launch path boots to the Buildroot login prompt.

So the likely local condition is that minimal.dtb was generated for the legacy/initramfs path, for example with ENABLE_EXTERNAL_ROOT=0 or after the fakeroot fallback, or that it is stale from an older build. In that mode the DTB needs -i rootfs.cpio; running it with only -d ext4.img leaves the kernel with root device "".

Maybe we can try to improve the behavior in #140.

PR sysprog21#141 discussed a kernel panic where Linux saw 'vda' from 'virtio-blk' but still reported root device "". The manual run used a DTB generated for the legacy initramfs path, so its bootargs did not provide 'root=/dev/vda'. The command only attached '-d ext4.img' and did not pass '-i rootfs.cpio'. The default 'make check' paths already avoid this. External-root builds generate bootargs with 'root=/dev/vda' and pass '-d ext4.img'; the fakeroot fallback and explicit 'ENABLE_EXTERNAL_ROOT=0' path pass '-i rootfs.cpio' together with the initramfs DTB. Keep custom DTBs outside these default 'minimal.dtb' checks. For the built-in default-DTB cases, validate the option combination before boot so manual invocations get an actionable 'semu' error instead of reaching the guest VFS panic.

Cuda-Chen · 2026-06-29T13:13:58Z

Hi @Mes0903 ,

Thanks for your assistance.
After building semu with ENABLE_EXTERNAL_ROOT=0 I can launch semu.

For @shengwen-tw ,
I confirm gdb will exit with GDB_TRUE_EXIT:0.

Cuda-Chen

Code LGTM.

PR sysprog21#141 discussed a kernel panic where Linux saw 'vda' from 'virtio-blk' but still reported root device "". The manual run used a DTB generated for the legacy initramfs path, so its bootargs did not provide 'root=/dev/vda'. The command only attached '-d ext4.img' and did not pass '-i rootfs.cpio'. The default 'make check' paths already avoid this. External-root builds generate bootargs with 'root=/dev/vda' and pass '-d ext4.img'; the fakeroot fallback and explicit 'ENABLE_EXTERNAL_ROOT=0' path pass '-i rootfs.cpio' together with the initramfs DTB. Keep custom DTBs outside these default 'minimal.dtb' checks. For the built-in default-DTB cases, validate the option combination before boot so manual invocations get an actionable 'semu' error instead of reaching the guest VFS panic.

shengwen-tw marked this pull request as ready for review June 23, 2026 12:24

shengwen-tw requested a review from jserv June 23, 2026 12:26

cubic-dev-ai Bot reviewed Jun 23, 2026

View reviewed changes

shengwen-tw requested review from ChinYikMing, Cuda-Chen, Mes0903 and visitorckw June 23, 2026 12:26

shengwen-tw force-pushed the fix-amo branch from afb15bc to 915d5b8 Compare June 23, 2026 12:38

shengwen-tw force-pushed the fix-amo branch from 915d5b8 to deadf03 Compare June 23, 2026 14:55

Mes0903 approved these changes Jun 26, 2026

View reviewed changes

Cuda-Chen approved these changes Jun 29, 2026

View reviewed changes

visitorckw approved these changes Jul 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Delay AMO rd write until store succeeds#141

Delay AMO rd write until store succeeds#141
shengwen-tw wants to merge 1 commit into
sysprog21:masterfrom
shengwen-tw:fix-amo

shengwen-tw commented Jun 23, 2026 •

edited by cubic-dev-ai Bot

Loading

Uh oh!

shengwen-tw commented Jun 23, 2026

Uh oh!

cubic-dev-ai Bot commented Jun 23, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

Cuda-Chen commented Jun 24, 2026

Uh oh!

Mes0903 commented Jun 26, 2026 •

edited

Loading

Uh oh!

Cuda-Chen commented Jun 29, 2026

Uh oh!

Cuda-Chen left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

shengwen-tw commented Jun 23, 2026 • edited by cubic-dev-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Problem Statement

Fix

Testing

Summary by cubic

Uh oh!

shengwen-tw commented Jun 23, 2026

Uh oh!

cubic-dev-ai Bot commented Jun 23, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Cuda-Chen commented Jun 24, 2026

Uh oh!

Mes0903 commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Cuda-Chen commented Jun 29, 2026

Uh oh!

Cuda-Chen left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

shengwen-tw commented Jun 23, 2026 •

edited by cubic-dev-ai Bot

Loading

Mes0903 commented Jun 26, 2026 •

edited

Loading