
api: bound unbounded trace endpoints + temp traceparent debug log #49

Merged
thejefflarson merged 1 commit into main from traces-guards-and-tp-debug
Jun 8, 2026

@thejefflarson

Owner

Two things, from the service-map investigation.

Bound the unbounded trace endpoints (fixes the 500s)

list_traces and service_red ran a full-table GROUP BY / percentile aggregate when called with no time window. The UI always passes one, but a bare /api/traces or /api/services request scanned the whole spans table and hit statement_timeout, returning a 500; the heavy scan also starved the pool, slowing ingest. The lower bound now defaults to now() - 24h when absent, so these queries stay bounded. Explicit ranges are unchanged. /api/logs already uses an indexed ORDER BY time DESC LIMIT, so it did not need the guard. Two tests that queried unbounded over 1970-era timestamps now use recent data and expect the recent-window default.
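As a rough illustration of the guard described above, here is a minimal Python sketch of defaulting the lower bound to the last 24 hours when the caller gives none. The function name `resolve_window`, its parameters, and the constant `DEFAULT_WINDOW` are hypothetical and are not taken from this repository; the actual change may well live in the SQL or in a different language.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Hypothetical constant: the 24h default window mentioned in the description.
DEFAULT_WINDOW = timedelta(hours=24)


def resolve_window(
    start: Optional[datetime],
    end: Optional[datetime],
    now: Optional[datetime] = None,
) -> Tuple[datetime, Optional[datetime]]:
    """Return a (start, end) pair whose lower bound is never missing.

    An explicit start is passed through untouched; a missing start falls
    back to now - 24h so the query cannot scan the whole table.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if start is None:
        start = now - DEFAULT_WINDOW
    return start, end
```

Passing `now` explicitly keeps the helper deterministic in tests; an explicit 1970-era `start` would still be honored, which matches the note that explicit ranges are unchanged.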

Temporary traceparent debug log

A one-line info-level log in otel_request_span records whether an incoming traceparent reached the app, to settle whether the missing traefik → watcher edge is caused by traefik not injecting the header or by linkerd stripping it. It will be removed in a follow-up once diagnosed.
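For illustration only, a check of this kind might look like the Python sketch below. It classifies the header as absent, present, or malformed against the W3C Trace Context shape (`version-traceid-parentid-flags` in lowercase hex) and logs the result at info level. The names `describe_traceparent` and `log_traceparent` and the logger name are hypothetical; the real log line in otel_request_span is not shown on this page.

```python
import logging
import re
from typing import Mapping

log = logging.getLogger("traceparent_debug")  # hypothetical logger name

# W3C traceparent shape: 2-hex version, 32-hex trace id, 16-hex parent id, 2-hex flags.
_TRACEPARENT = re.compile(r"[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}")


def describe_traceparent(headers: Mapping[str, str]) -> str:
    """Classify the incoming traceparent header as absent, present, or malformed."""
    # Header names are case-insensitive, so compare them lowercased.
    value = next((v for k, v in headers.items() if k.lower() == "traceparent"), None)
    if value is None:
        return "absent"
    return "present" if _TRACEPARENT.fullmatch(value.strip()) else "malformed"


def log_traceparent(headers: Mapping[str, str]) -> str:
    """Emit the one-line debug record and return the classification."""
    status = describe_traceparent(headers)
    log.info("incoming traceparent: %s", status)
    return status
```

Logging only the classification, not the header value, keeps the temporary line cheap to add and easy to delete once the question is answered.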

34 tests pass.

🤖 Generated with Claude Code

list_traces and service_red did a full-table GROUP BY when called with no time
window (the UI always passes one, but a bare API call would scan the whole spans
table and hit statement_timeout → 500). Default the lower bound to now()-24h when
absent so they can't blow up; explicit ranges are unchanged. Two tests that
queried unbounded over ancient timestamps now use recent data / expect the
recent-window default.

Also add a temporary debug log in otel_request_span recording whether an incoming
`traceparent` reached the app — to settle whether the missing traefik→watcher
edge is traefik not injecting it vs the linkerd sidecar stripping it. Removed in a
follow-up once diagnosed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@thejefflarson thejefflarson merged commit 096ab22 into main Jun 8, 2026
5 checks passed
@thejefflarson thejefflarson deleted the traces-guards-and-tp-debug branch June 8, 2026 07:38
thejefflarson added a commit that referenced this pull request Jun 9, 2026
The debug log from #49 served its purpose. The missing "traefik -> watcher" edge
turned out to be a non-issue: watcher is fronted by its own cloudflared tunnel,
not traefik, so there's no traced upstream to parent to — root spans are expected.
The trace-context extraction (otel_request_span) stays for when a tracing-aware
proxy does front it; only the debug log is removed.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>