bound recovery scan to input end in xs_textdecoder_decode by rootvector2 · Pull Request #1637 · Moddable-OpenSource/moddable

rootvector2 · 2026-06-13T19:43:06Z

the illegal-sequence recovery loop in xs_textdecoder_decode scans continuation bytes without checking src < srcEnd, so a streaming decode() that finishes a buffered partial lead with a short final chunk runs src past the input and passes a negative length to the trailing c_memcpy; caught with an asan build of xst.

phoddie · 2026-06-14T23:06:35Z

Interesting find.

I believe the proposed change will avoid the out of bounds read. But I think there may be a bug lurking in that code path that is the real source of the problem. I need to investigate a bit further.

rootvector2 · 2026-06-15T09:35:32Z

agreed, the bounds check only stops the over-read. the root cause is that the recovery loop scans continuation bytes straight from src and ignores bufferLength. in the streaming completion path the lead and its first continuation come from the saved buffer, so the loop walks src past bytes it never owned, and with no src < srcEnd guard it runs off the end of the view when the backing store holds more continuation-range bytes. that negative srcEnd - src is what poisons the trailing c_memcpy. happy to rework the loop to consume from buffer first the way the legal decode path does if you'd rather fix it at the source. your call on direction.

phoddie · 2026-06-15T20:37:08Z

I think I'd do something like this:

			const uint8_t *s = &utf8[1];
			while (clen-- > 0) {
				uint8_t c = *s++;
				if ((lower <= c) && (c <= upper)) {
					if (bufferLength) {
						bufferLength--;
						buffer++;
					}
					else
						src++;
				}
				else
					break;
			}

It reuses utf8 to avoid having to decide whether to read from buffer or src. Something similar would be in the second loop as well.

…_decode the illegal-sequence recovery loop read continuation bytes straight from src and advanced src for each one, ignoring the bytes carried over in the streaming buffer. completing a buffered partial lead with a short final chunk walked src past the view end and passed a negative length to the trailing c_memcpy. read the already-assembled utf8 bytes and advance buffer or src the same way the legal decode path does, in both passes.

rootvector2 · 2026-06-16T09:12:33Z

makes sense, that's the real fix. reworked both loops to read the continuation bytes from utf8 and advance buffer before src, so recovery counts the buffered bytes the same way the legal path does and never runs src past the view. side effect is that F0 80 90 90 now decodes to four U+FFFD per spec instead of the two my old guard produced. asan build is clean and decode.js / decodestream.js / surrogates.js pass; bumped the decodestream.js regression to the four-replacement result.

phoddie · 2026-06-16T16:47:28Z

Thanks for reviewing, testing, and integrating. We'll get this merged.

mkellner · 2026-06-16T17:14:57Z

This has been merged and will be available in the next release.

rootvector2 force-pushed the textdecoder-recovery-bounds branch from f690635 to 8e8b58c Compare June 16, 2026 09:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bound recovery scan to input end in xs_textdecoder_decode#1637

bound recovery scan to input end in xs_textdecoder_decode#1637
rootvector2 wants to merge 1 commit into
Moddable-OpenSource:publicfrom
rootvector2:textdecoder-recovery-bounds

rootvector2 commented Jun 13, 2026

Uh oh!

phoddie commented Jun 14, 2026

Uh oh!

rootvector2 commented Jun 15, 2026

Uh oh!

phoddie commented Jun 15, 2026

Uh oh!

rootvector2 commented Jun 16, 2026

Uh oh!

phoddie commented Jun 16, 2026

Uh oh!

mkellner commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

rootvector2 commented Jun 13, 2026

Uh oh!

phoddie commented Jun 14, 2026

Uh oh!

rootvector2 commented Jun 15, 2026

Uh oh!

phoddie commented Jun 15, 2026

Uh oh!

rootvector2 commented Jun 16, 2026

Uh oh!

phoddie commented Jun 16, 2026

Uh oh!

mkellner commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants