Goal
Dogfood the GitHub-backed trace dashboard against the Phoenix web UI using the same trace/session data, so AgentV's local Dashboard reaches deliberate feature parity where parity matters and has comparable performance for core review workflows.
Scenario
Use one realistic agent run that produces OpenInference/OTLP trace data and can be viewed in both places:
- AgentV Dashboard reads AgentV-owned run artifacts from local/GitHub results and, if available, from the B2-backed hybrid path.
- Phoenix reads the independently emitted matching session/traces through its normal UI and GraphQL/API.
- The same run/session/trace IDs are correlated through external_trace metadata, not by exporting AgentV artifacts into Phoenix.
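The ID-based correlation described above could be checked with something like the following minimal sketch. All type and field names here (`ExternalTraceRef`, `AgentVRunRecord`, `PhoenixTraceSummary`, `correlateRuns`) are hypothetical and do not reflect the actual AgentV or Phoenix schemas; the only idea taken from the scenario is that runs carry external_trace metadata pointing at independently emitted Phoenix traces.

```typescript
// Hypothetical shapes for illustration only; not the real AgentV or Phoenix schema.
interface ExternalTraceRef {
  provider: string; // assumed value such as "phoenix"
  traceId: string;
  sessionId?: string;
}

interface AgentVRunRecord {
  runId: string;
  externalTrace?: ExternalTraceRef; // assumed location of external_trace metadata
}

interface PhoenixTraceSummary {
  traceId: string;
  sessionId?: string;
}

interface CorrelationResult {
  matched: Array<{ runId: string; traceId: string }>;
  unresolved: string[]; // runs whose external trace link has no Phoenix match
  unlinked: string[]; // runs with no external_trace metadata at all
}

// Correlate AgentV runs with Phoenix traces purely by ID, without copying
// any AgentV artifact into Phoenix.
function correlateRuns(
  runs: AgentVRunRecord[],
  phoenixTraces: PhoenixTraceSummary[],
): CorrelationResult {
  const knownTraceIds = new Set(phoenixTraces.map((t) => t.traceId));
  const result: CorrelationResult = { matched: [], unresolved: [], unlinked: [] };

  for (const run of runs) {
    const ref = run.externalTrace;
    if (!ref) {
      result.unlinked.push(run.runId);
    } else if (knownTraceIds.has(ref.traceId)) {
      result.matched.push({ runId: run.runId, traceId: ref.traceId });
    } else {
      result.unresolved.push(run.runId);
    }
  }
  return result;
}
```

The `unresolved` and `unlinked` buckets map onto the unresolved-link and missing-metadata cases, so the same check can feed the empty/error state comparison.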
UX parity checklist
Compare Phoenix and AgentV Dashboard for:
- session summary: trace count, latency, token usage, costs, annotations when present;
- turns view: root-span input/output, timing, token/cost metadata, selected turn state;
- trace list/tree: nesting, multiple roots, status, span kind, duration, selected span state;
- span detail: attributes, events, tool-call metadata, errors, input/output, raw artifact links;
- AgentV-specific affordances: links back to transcript/files/checks/source and run/eval context;
- empty/error states for missing trace artifacts, unresolved Phoenix links, and unavailable auth/endpoint.
Performance checklist
Capture comparable measurements for both UIs on the same data:
- initial run/session detail load;
- trace tab load;
- trace tree render;
- selected-span switch latency;
- route/API payload counts and sizes;
- any full-bundle materialization.
AgentV list/aggregate routes must stay manifest-only; trace/transcript/body reads should happen only on detail/drilldown.
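One way to turn the payload and manifest-only checks into something repeatable is to summarize a recorded request log. The sketch below is a hedged illustration, assuming each captured request has already been labeled by hand (or by a script over a HAR export) with a route kind and an artifact kind; `FetchRecord`, `FetchSummary`, and `summarizeFetches` are hypothetical names, not part of AgentV.

```typescript
// Hypothetical labels for illustration; the real routes and artifact types may differ.
type RouteKind = "list" | "aggregate" | "detail";
type ArtifactKind = "manifest" | "trace" | "transcript" | "body";

interface FetchRecord {
  route: string; // the UI route that triggered the request
  routeKind: RouteKind;
  artifact: ArtifactKind; // what the request actually read
  bytes: number; // response payload size
}

interface FetchSummary {
  requestCount: number;
  totalBytes: number;
  violations: FetchRecord[]; // non-manifest reads on list/aggregate routes
}

// Summarize payload counts and sizes, and flag any list/aggregate route
// that read something other than a manifest.
function summarizeFetches(records: FetchRecord[]): FetchSummary {
  let totalBytes = 0;
  const violations: FetchRecord[] = [];

  for (const rec of records) {
    totalBytes += rec.bytes;
    const isDrilldown = rec.routeKind === "detail";
    if (!isDrilldown && rec.artifact !== "manifest") {
      violations.push(rec);
    }
  }
  return { requestCount: records.length, totalBytes, violations };
}
```

Running the same summary over the Phoenix and AgentV captures keeps the count and size comparison on equal footing; the `violations` list applies only to the AgentV side.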
Evidence
Save screenshots, short screen recordings, timing notes, and any HAR/perf summaries to EntityProcess/agentv-private. Do not commit evidence artifacts, secrets, BWS-derived values, or Phoenix auth material to the public repo.