test(waterdata): rerun flaky transient 5xx/429 from the chunked fan-out #325
Merged
thodson-usgs merged 1 commit on Jun 15, 2026
The suite already retries transient HTTP failures (flaky's `only_rerun`), but the patterns missed two kinds, so a transient upstream 502 failed CI instead of retrying:

- a direct 5xx is raised as `ServiceUnavailable`, and
- a chunked request wraps a transient 429/5xx as `QuotaExhausted` / `ServiceInterrupted`.

Add patterns for both. Verified they retry these transient errors but not deterministic ones (e.g. a 404 or an assertion failure).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
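The verification described in the commit message can be approximated offline. The sketch below assumes the rerun plugin applies each `only_rerun` entry as a regular-expression search against the crash message; the pattern strings, the helper name `would_rerun`, and most sample messages are illustrative assumptions, not the PR's actual code.

```python
import re

# Assumed shapes of the two pre-existing patterns (the real list is not shown on this page).
EXISTING_PATTERNS = [r"RateLimited:", r"RuntimeError: 5\d\d:"]

# Hypothetical new pattern covering the three typed transient exceptions.
NEW_PATTERN = r"\b(?:ServiceUnavailable|QuotaExhausted|ServiceInterrupted):"


def would_rerun(crash_message: str, patterns: list[str]) -> bool:
    """Return True if any pattern is found anywhere in the crash message."""
    return any(re.search(p, crash_message) for p in patterns)


# Illustrative crash messages mapped to whether a rerun is wanted.
# Only the first is quoted from the PR (its "..." is kept verbatim).
SAMPLES = {
    "ServiceInterrupted: ... Cause: ServiceUnavailable: 502: Bad Gateway": True,
    "ServiceUnavailable: 502: Bad Gateway": True,
    "QuotaExhausted: 429": True,
    "RateLimited: 429": True,
    "HTTPError: 404: Not Found": False,
    "AssertionError: assert 1 == 2": False,
}

if __name__ == "__main__":
    patterns = EXISTING_PATTERNS + [NEW_PATTERN]
    for message, expected in SAMPLES.items():
        assert would_rerun(message, patterns) is expected, message
    print("all samples behave as expected")
```

Anchoring each name to the trailing colon keeps the pattern from firing on an unrelated message that merely mentions one of the class names in passing, though that choice is a guess about how the exceptions are rendered.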
Problem
The live `tests/waterdata_test.py` suite reruns transient HTTP failures via `pytest.mark.flaky(only_rerun=[...])`, but the patterns only match the direct-path shapes `RateLimited:` / `RuntimeError: 5xx:`. Two transient sources are no longer covered:

- A `ChunkInterrupted` subclass, `ServiceInterrupted` (5xx) or `QuotaExhausted` (429), raised by chunked requests since the #322 chunker work.
- Since #313/#319, a direct 5xx raises `ServiceUnavailable` (a `TransientError`), not `RuntimeError`.

None of these match `only_rerun`, so a transient upstream 502 during a multi-value (chunked) `get_monitoring_locations` call fails CI outright instead of being retried. This surfaced on PR #324 (`test (ubuntu-latest, 3.13)` red with `ServiceInterrupted: ... Cause: ServiceUnavailable: 502: Bad Gateway`, while the other five matrix cells passed), but the gap is pre-existing on `main`; PR #324 was only unlucky enough to hit it.

Fix
Add one `only_rerun` pattern covering `ServiceUnavailable` / `QuotaExhausted` / `ServiceInterrupted`.

Verified by simulation that it matches those transient exceptions (including a `ServiceInterrupted` wrapping a 502) but not deterministic failures (`HTTPError` 404, `AssertionError`); `RateLimited` (429) stays covered by the existing pattern.

Notes
`tests/ngwmn_test.py` (added by "feat: add NGWMN getters as an `ogc` sibling; extract a shared OGC engine" #324) carries the same `only_rerun` block and will want the identical pattern once that PR lands. That is worth a follow-up, or a shared constant to prevent drift.

🤖 Generated with Claude Code
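One possible shape for the shared constant mentioned in the notes is sketched below. It assumes the `flaky` marker comes from pytest-rerunfailures (which accepts `reruns`, `reruns_delay`, and `only_rerun`); the module location, the helper name `flaky_kwargs`, the default values, and the pattern strings are all hypothetical.

```python
# Hypothetical shared helper module that both live test files could import.

# Assumed pattern strings; substitute the list currently duplicated in the test files.
TRANSIENT_RERUN_PATTERNS = (
    r"RateLimited:",
    r"RuntimeError: 5\d\d:",
    r"\b(?:ServiceUnavailable|QuotaExhausted|ServiceInterrupted):",
)


def flaky_kwargs(reruns: int = 2, reruns_delay: float = 5.0) -> dict:
    """Build keyword arguments for pytest.mark.flaky; the defaults are placeholders."""
    return {
        "reruns": reruns,
        "reruns_delay": reruns_delay,
        # Return a fresh list so callers cannot mutate the shared tuple.
        "only_rerun": list(TRANSIENT_RERUN_PATTERNS),
    }


# In each live test module (sketch only, not executed here):
#   import pytest
#   pytestmark = pytest.mark.flaky(**flaky_kwargs())
```

With a single source for the patterns, a later change to the transient exception types would need to be made in one place rather than in every test file that carries the block.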