
Parse SVG arc paths with packed large-arc/sweep flags (fixes #129) #245

Open: gaoflow wants to merge 1 commit into mathandy:master from gaoflow:parse-packed-arc-flags

gaoflow commented Jun 30, 2026

Fixes #129.

The bug

parse_path() raises IndexError on valid SVG arc commands whose large-arc-flag/sweep-flag are written without a separator before the following number:

```python
>>> from svgpathtools import parse_path
>>> parse_path("M 0 0 A 5 5 0 0110 0")
IndexError: pop from empty list
>>> parse_path("M 0 0 A 5 5 0 0 1 10 0")     # the spaced-out form works
Path(Arc(start=0j, radius=(5+5j), rotation=0.0, large_arc=False, sweep=True, end=(10+0j)))
```

Per the SVG path grammar, each flag is a single "0"/"1" token that may be followed immediately by the next number with no separator, so in "A 5 5 0 0110 0" the run 0110 0 means flag 0, flag 1, number 10, number 0. Every browser and librsvg accept this, and SVGO emits it to save bytes. That is exactly the real-world path in #129 (the simple-icons "emlakjet" icon, …a3.543 3.543 0 00-1.267…), where 00-1.267 reads as flag 0, flag 0, number -1.267.
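To make the grammar concrete, here is a minimal sketch that transcribes the arc-argument production (rx ry rotation, a required separator, two one-character flags, then x y) into a single Python regex. It is an illustration only, not svgpathtools code: NUMBER, COMMA_WSP, ARC_ARGS_RE and split_arc_args are hypothetical names, and the number pattern is simplified (for example, it does not enforce that rx and ry are nonnegative).

```python
import re

# Simplified number token (assumption): optional sign, digits with an
# optional fraction or a bare fraction, then an optional exponent.
NUMBER = r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?"
# comma-wsp as in the SVG grammar: whitespace with an optional comma, or a comma.
COMMA_WSP = r"(?:\s+,?\s*|,\s*)"
OPT = COMMA_WSP + "?"
FLAG = r"([01])"  # a flag is exactly one character, never a longer number

# Hypothetical transcription: rx ry rotation <sep> flag flag x y,
# with separators optional wherever the grammar makes them optional.
ARC_ARGS_RE = re.compile(
    rf"({NUMBER}){OPT}({NUMBER}){OPT}({NUMBER}){COMMA_WSP}"
    rf"{FLAG}{OPT}{FLAG}{OPT}({NUMBER}){OPT}({NUMBER})"
)


def split_arc_args(args: str) -> tuple:
    """Split one arc argument group into (rx, ry, rotation, large_arc, sweep, x, y)."""
    m = ARC_ARGS_RE.fullmatch(args.strip())
    if m is None:
        raise ValueError(f"not a single arc argument group: {args!r}")
    return m.groups()


print(split_arc_args("5 5 0 0110 0"))   # ('5', '5', '0', '0', '1', '10', '0')
print(split_arc_args("5 5 0 01-10 0"))  # ('5', '5', '0', '0', '1', '-10', '0')
```

Because each flag is a one-character class, a flag can never absorb the digits that follow it, which is the property the tokenizer needs to respect.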

Root cause

_tokenize_path tokenizes every command's arguments with FLOAT_RE.findall:

```python
# inside _tokenize_path, applied to each command's argument string x
for token in FLOAT_RE.findall(x):
    yield token
```

FLOAT_RE is greedy and command-agnostic, so for "A 5 5 0 0110 0" it reads the packed run 0110 (both flags plus the x coordinate) as the single number 110. The arc branch then pops seven values (rx, ry, rotation, large_arc, sweep, x, y), but only five tokens exist, so the sixth elements.pop() finds the list empty and raises IndexError at path.py:3361.
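The failure can be reproduced without svgpathtools. This sketch assumes a FLOAT_RE of the usual shape (sign, digits, optional fraction, optional exponent); the exact pattern should be checked in the library source. pop_arc_fields is a hypothetical stand-in for the arc branch's seven pops, not the library's code.

```python
import re

# Assumption: a float pattern of the shape svgpathtools uses; see the
# library source for its exact FLOAT_RE.
FLOAT_RE = re.compile(r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?")


def pop_arc_fields(tokens):
    """Hypothetical stand-in for the arc branch: pop seven fields in order."""
    elements = list(reversed(tokens))  # so pop() yields tokens front to back
    return [elements.pop() for _ in range(7)]


tokens = FLOAT_RE.findall("5 5 0 0110 0")
print(tokens)            # ['5', '5', '0', '0110', '0']
print(float(tokens[3]))  # 110.0

try:
    pop_arc_fields(tokens)
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: pop from empty list
```

The spaced-out argument string yields seven tokens, so the same seven pops succeed there.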

The fix

Tokenize arc arguments flag-aware: the two flag fields are read one "0"/"1" character at a time, while every other field stays a normal (greedy) number. The resulting token stream for a packed arc is identical, token for token, to that of its spaced-out form, so the existing pop-based parser is unchanged; only the tokenizer learned that arc flags are single characters. Non-arc commands keep using FLOAT_RE.findall exactly as before.

This mirrors the upstream svg.path library (from which this parser was originally copied), which already tokenizes arc flags this way.
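A minimal sketch of the flag-aware idea, assuming the arguments of a single A/a command have already been split off from the path string. tokenize_arc_args, NUMBER_RE and FLAG_FIELDS are hypothetical names; this is not the code in the PR, only an illustration of reading flags as single characters.

```python
import re

# Assumed number pattern (same shape as the usual FLOAT_RE).
NUMBER_RE = re.compile(r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?")
SEPARATORS = " \t\r\n,"
FLAG_FIELDS = {3, 4}  # large_arc and sweep positions within a 7-field group


def tokenize_arc_args(args: str) -> list[str]:
    """Hypothetical flag-aware tokenizer for the arguments of one A/a command.

    Flag fields are read as a single '0'/'1' character; every other field is
    read greedily as a number. Arc groups repeat every seven fields.
    """
    tokens: list[str] = []
    pos = 0
    while True:
        # Skip whitespace and commas between fields.
        while pos < len(args) and args[pos] in SEPARATORS:
            pos += 1
        if pos >= len(args):
            return tokens
        if len(tokens) % 7 in FLAG_FIELDS:
            flag = args[pos]
            if flag not in "01":
                raise ValueError(f"invalid arc flag {flag!r} at offset {pos}")
            tokens.append(flag)
            pos += 1  # consume exactly one character
        else:
            m = NUMBER_RE.match(args, pos)
            if m is None:
                raise ValueError(f"expected a number at offset {pos}")
            tokens.append(m.group())
            pos = m.end()


print(tokenize_arc_args("5 5 0 0110 0"))  # ['5', '5', '0', '0', '1', '10', '0']
```

Deriving the field position from len(tokens) % 7 keeps implicit repeated arcs (several argument groups after one A) working, which is the case the two-arc test in the Tests section exercises.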

Tests

test_arc_flag_packing asserts that six packed forms (absolute and relative, a negative coordinate directly after a flag, and a two-arc group with the second packed) parse identically to their spaced-out equivalents, that the flags carry through to the resulting Arc, and that the real #129 simple-icons path parses instead of raising. The new test fails on master with the original IndexError; the full suite (102 tests) passes with the fix.

parse_path() raised IndexError on valid arc commands whose large-arc-flag
and sweep-flag are written without a separator before the next number
(e.g. 'A 5 5 0 0110 0', as emitted by SVGO and accepted by every browser).
The tokenizer used FLOAT_RE.findall for every command, so it greedily read
the packed run '0110' (both flags plus the x coordinate) as the single
number 110, leaving the arc branch's seven pops to run the token list dry.

Tokenize arc arguments flag-aware: read the two flag fields one '0'/'1'
character at a time. The token stream for a packed arc is then identical to
its spaced-out form, which the existing parser already handles. Fixes mathandy#129.

Development

Successfully merging this pull request may close these issues: Exceptions parsing paths
