fix(zkernel): mark ISR-reachable list helpers K_ISR_SAFE (IRAM) — closes #53 #54

Open
swoisz wants to merge 3 commits into main from fix/isr-safe-flash-resident-helpers

swoisz (Collaborator) commented Jun 25, 2026

Summary

Fixes #53. k_sem_give() is K_ISR_SAFE (IRAM_ATTR) so it can run from an ESP-IDF IRAM ISR while the flash cache is disabled, but on its hot path it called z_sem_pop_waiter(), a flash-resident static helper. When the give comes from an IRAM ISR during a concurrent flash op (e.g. an IRAM GPIO ISR submitting work → k_work_submit → k_sem_give), fetching the flash-mapped helper faults with a cache-access error and the chip panics. Found and fixed locally by a downstream project (verified on esp32c5).
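The failure mode and its fix can be sketched in a few lines of host-compilable C. This is an illustration only: the `waiter` and `sem` types and the `pop_waiter`/`sem_give` names are hypothetical stand-ins, not the project's real definitions, and `K_ISR_SAFE` is assumed to expand to `IRAM_ATTR` as the summary states. On a host build the attribute expands to nothing, so placement is not observable there.

```c
#include <stddef.h>

/* On ESP-IDF, IRAM_ATTR (esp_attr.h) places a function in IRAM.
 * On a host build it is undefined, so it expands to nothing here. */
#ifndef IRAM_ATTR
#define IRAM_ATTR
#endif
#define K_ISR_SAFE IRAM_ATTR /* assumption: mirrors the project's macro */

/* Hypothetical waiter node and semaphore, for illustration only. */
struct waiter {
    struct waiter *next;
    int id;
};

struct sem {
    unsigned count;
    struct waiter *waiters; /* singly linked; head is the oldest waiter */
};

/* Without K_ISR_SAFE this helper would be linked into flash, and an
 * out-of-line call to it from an IRAM ISR with the cache disabled
 * would fault. With the attribute it is placed in IRAM with its caller. */
static K_ISR_SAFE struct waiter *pop_waiter(struct sem *s)
{
    struct waiter *w = s->waiters;
    if (w != NULL) {
        s->waiters = w->next;
        w->next = NULL;
    }
    return w;
}

/* ISR-safe entry point: every out-of-line callee must also be in IRAM. */
K_ISR_SAFE struct waiter *sem_give(struct sem *s)
{
    struct waiter *w = pop_waiter(s);
    if (w == NULL) {
        s->count++; /* nobody waiting: bank the token */
    }
    return w; /* the caller would wake this waiter */
}
```

The helper touches only memory owned by the caller and makes no further out-of-line calls, which is the property that makes it safe to move into IRAM.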

Changes

  • z_sem_pop_waiter → K_ISR_SAFE (k_sem.c). It is a pure list walk over the caller-owned waiter list with no FreeRTOS calls, so it is safe in IRAM. Its other caller, k_sem_reset(), is flash-resident, but flash → IRAM calls are fine.
  • z_event_match → K_ISR_SAFE (k_event.c). A sweep of the whole K_ISR_SAFE call graph for the same bug class (an IRAM function calling a flash-resident static helper) turned up exactly one sibling: z_event_match, called on both ISR-safe paths: the post family (z_event_post_internal) and the wait fast path (z_event_wait_internal). It is pure arithmetic with no FreeRTOS calls.
  • CI guard (tools/check_iram_symbols.sh + a step in the existing build-esp32s3 job). Derives the K_ISR_SAFE symbol set from source and asserts each is in .iram0.text on the target ELF — the regression the host (linux) suite structurally cannot observe. Inlined/gc'd symbols report skip; only an out-of-line flash-resident symbol fails.
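To illustrate what "pure arithmetic" means for an event-matching helper, here is a minimal sketch. It is not the project's z_event_match: the `event_match` name, the `wait_mode` enum, and the any/all semantics are assumptions modeled on the usual event-flag pattern. The point is only that such a function reads its arguments, does bitwise work, and calls nothing else, so it has no dependency that could pull it back to flash.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef IRAM_ATTR
#define IRAM_ATTR /* expands to nothing on a host build */
#endif
#define K_ISR_SAFE IRAM_ATTR /* assumption: mirrors the project's macro */

/* Hypothetical wait modes: wake on any requested bit, or on all of them. */
enum wait_mode { WAIT_ANY, WAIT_ALL };

/* Returns true when the currently posted bits satisfy the waiter.
 * No memory access beyond the arguments and no out-of-line calls. */
K_ISR_SAFE bool event_match(uint32_t current, uint32_t desired,
                            enum wait_mode mode)
{
    uint32_t hit = current & desired;
    return (mode == WAIT_ALL) ? (hit == desired) : (hit != 0u);
}
```

For example, with posted bits 0x5 and requested bits 0x6, a wait-any waiter matches (bit 2 is set) while a wait-all waiter does not (bit 1 is missing).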

Call-graph sweep

Audited every K_ISR_SAFE/IRAM_ATTR entry point (k_sem, k_event, k_work, k_msgq, k_thread, k_timer, ring_buf, watchdog, gpio_dt). The rest were already clean: shared helpers (z_kernel_lock/unlock, all sys_dlist_*, ring_buf_*_claim/finish, k_timeout_*) are header static inline/ALWAYS_INLINE (inline into the IRAM caller, no symbol); k_panic() is #define … __builtin_trap() (no symbol); FreeRTOS *FromISR paths are IDF's IRAM responsibility.

Two items outside this bug class, flagged for follow-up (not changed here):

  • esp_timer_start_periodic reached from k_timer_esp_callback under CONFIG_K_TIMER_DISPATCH_ISR — an IDF API on the ISR-dispatch path.
  • memcpy in ring_buf.c — documented ROM-resident on ESP32-S3; confirm the same on RISC-V parts (C5/C6).

Verification

  • clang-format clean; host (linux) suite 224/224 pass (unaffected — expected, the fault is target-only).
  • Built for esp32s3 and ran the guard: z_sem_pop_waiter and z_event_match resolve into .iram0.text (0x4037xxxx); non-ISR helpers like k_sem_reset/k_timer_start correctly stay in .flash.text (0x4201xxxx), confirming the check discriminates rather than rubber-stamps. 27 K_ISR_SAFE symbols derived, 25 confirmed IRAM, 2 skipped (inlined/not linked), exit 0.
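The guard's pass/skip/fail accounting can be modeled as below. This is a sketch of the decision rule only, written in C for consistency with the rest of the code base; the real guard is the shell script tools/check_iram_symbols.sh, whose internals are not shown in this PR description. The `tally` struct and function names are hypothetical, and treating every section other than .iram0.text as a failure is an assumption of the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result counters for one guard run. */
struct tally {
    unsigned ok;      /* symbol found in .iram0.text */
    unsigned skipped; /* symbol absent from the ELF (inlined or gc'd) */
    unsigned failed;  /* symbol found elsewhere, e.g. .flash.text */
};

/* sections[i] is the ELF section name of the i-th K_ISR_SAFE symbol,
 * or NULL when the symbol does not appear in the ELF at all. */
struct tally check_sections(const char *const sections[], size_t n)
{
    struct tally t = { 0, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        if (sections[i] == NULL) {
            t.skipped++;
        } else if (strcmp(sections[i], ".iram0.text") == 0) {
            t.ok++;
        } else {
            t.failed++; /* assumption: any other section is a failure */
        }
    }
    return t;
}

/* Process exit status: nonzero as soon as one symbol failed. */
int guard_exit_code(const struct tally *t)
{
    return (t->failed > 0) ? 1 : 0;
}
```

Under this rule, the run reported above (27 symbols, 25 in .iram0.text, 2 absent) yields exit 0, and a single symbol reported in .flash.text would flip the result to a nonzero exit.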

Follow-up offered (not in this PR)

A RISC-V (esp32c5) CI job would extend the guard to the exact ISA the downstream hit, and would cover the memcpy-residency concern above. Held back because esp32c5 is preview in IDF v5.4 and could destabilize the matrix — happy to add it if wanted.

swoisz and others added 3 commits June 24, 2026 10:48
k_sem_give() is K_ISR_SAFE (IRAM_ATTR) so it can run from an ESP-IDF
IRAM ISR while the flash cache is disabled, but on its hot path it
called z_sem_pop_waiter(), a flash-resident static helper. When the
give came from an IRAM ISR during a concurrent flash op (e.g. an
IRAM GPIO ISR submitting work -> k_work_submit -> k_sem_give), the
fetch of z_sem_pop_waiter faulted with a cache-access error and the
chip panicked (issue #53, found and fixed locally by a downstream).

Mark z_sem_pop_waiter K_ISR_SAFE so it relocates into IRAM. It is
pure list-walking over the caller-owned waiter list with no FreeRTOS
calls, so it is safe in IRAM. Its other caller, k_sem_reset(), is
flash-resident, but flash code calling an IRAM helper is fine.

Sweeping the rest of the K_ISR_SAFE call graph for the same class
(an IRAM function calling a flash-resident static helper) turned up
one sibling: z_event_match() in k_event.c, called on both ISR-safe
paths -- the post family (z_event_post_internal) and the wait family
fast path (z_event_wait_internal). Mark it K_ISR_SAFE too; it is pure
arithmetic with no FreeRTOS calls.

No behavioral change; host (linux target) suite unaffected (224/224),
clang-format clean. The fault is a target-only memory-placement issue
the host build cannot observe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The issue-#53 fault is invisible to the host (linux) suite: it is a
target-only memory-placement property. This guard runs against a real
target ELF and fails if any K_ISR_SAFE helper landed in a flash-mapped
text section instead of .iram0.text -- catching the exact regression
where an IRAM_ATTR caller reaches a flash-resident static helper.

Section-name based (.iram0.text vs .flash.text), so it is ISA-agnostic
across Xtensa and RISC-V targets. Validated on esp32s3: z_sem_pop_waiter
and z_event_match resolve into .iram0.text, while non-ISR helpers
(k_sem_reset, k_timer_start) remain in .flash.text.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Wire tools/check_iram_symbols.sh into the existing build-esp32s3 job so
every PR verifies that K_ISR_SAFE helpers are IRAM-resident on a real
target ELF -- the issue-#53 regression the host (linux) suite cannot
observe. Reuses the target build already produced by that job, so no
extra CI cost.

Also make the script derive its symbol set from source (every K_ISR_SAFE
definition in components/*/src) instead of a hand-maintained list, so a
newly-added ISR-safe helper is covered automatically. Inlined/gc'd
symbols report "skip" (safe); only an out-of-line, flash-resident symbol
fails. Validated on esp32s3: 27 symbols derived, 25 confirmed in
.iram0.text (incl. z_sem_pop_waiter and z_event_match), 2 skipped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Successfully merging this pull request may close these issues.

k_sem_give faults (cache access error) from IRAM ISR: z_sem_pop_waiter is flash-resident
