Track LLM token usage in reports by HexSleeves · Pull Request #244 · NVIDIA/SkillSpector

HexSleeves · 2026-07-03T04:02:22Z

Capture per-call token telemetry so report metadata can surface LLM token consumption and detect cost/usage trends.
Preserve existing structured-output parsing while retaining the raw provider metadata (the LangChain raw payload) needed to read usage.

Add token counters to LLMCallRecord and llm_call_record() defaults: input_tokens, output_tokens, and total_tokens.
Use with_structured_output(..., include_raw=True) and unwrap {"raw","parsed","parsing_error"} responses. Usage is extracted from raw.usage_metadata, accepting both the input_tokens/output_tokens and the prompt_tokens/completion_tokens variants, and parsing-error behavior is preserved.
Record token usage in both sync (run_batches) and async (arun_batches) paths (including non-structured raw responses), and propagate analyzer.llm_usage into node-level llm_call_log entries (success and failure).
Aggregate llm_call_log token counters into report JSON metadata under metadata.llm_usage while keeping existing metadata intact.
Add support in the CLI structured-output adapter for include_raw=True so local/CLI providers expose raw usage without changing parser expectations.
Add unit tests covering the new fields, sync/async usage capture, missing usage metadata handling, structured-output include_raw=True parsing, and metadata aggregation.
Closes #242 (feat: expose LLM token usage in JSON report output).
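
For reviewers who want the shape of the change at a glance, the unwrap, extract, and aggregate steps described above could be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the helper names (extract_usage, unwrap_structured, aggregate_usage) are hypothetical, and defaulting missing counters to 0 is an assumption. The only parts taken from the description are the {"raw","parsed","parsing_error"} response shape, the usage_metadata key variants, and the metadata.llm_usage destination.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Mapping

TOKEN_KEYS = ("input_tokens", "output_tokens", "total_tokens")


def extract_usage(raw: Any) -> dict:
    """Read token counts from raw.usage_metadata; missing data yields zeros (assumed default)."""
    meta = getattr(raw, "usage_metadata", None)
    if not isinstance(meta, Mapping):
        return {key: 0 for key in TOKEN_KEYS}
    # Accept both naming variants mentioned in the PR description.
    inp = int(meta.get("input_tokens") or meta.get("prompt_tokens") or 0)
    out = int(meta.get("output_tokens") or meta.get("completion_tokens") or 0)
    total = int(meta.get("total_tokens") or (inp + out))
    return {"input_tokens": inp, "output_tokens": out, "total_tokens": total}


def unwrap_structured(response: Any) -> tuple:
    """Split an include_raw=True response into (parsed, usage, parsing_error)."""
    if isinstance(response, Mapping) and "raw" in response and "parsed" in response:
        usage = extract_usage(response["raw"])
        return response["parsed"], usage, response.get("parsing_error")
    # Non-structured path: the response itself is the raw message.
    return response, extract_usage(response), None


def aggregate_usage(call_log: Iterable[Mapping]) -> dict:
    """Sum per-call counters into one totals dict for metadata.llm_usage."""
    totals = {key: 0 for key in TOKEN_KEYS}
    for entry in call_log:
        for key in TOKEN_KEYS:
            totals[key] += int(entry.get(key) or 0)
    return totals


# Usage with stand-in objects (no provider call is made here).
raw_msg = SimpleNamespace(usage_metadata={"prompt_tokens": 7, "completion_tokens": 3})
parsed, usage, error = unwrap_structured(
    {"raw": raw_msg, "parsed": {"ok": True}, "parsing_error": None}
)
call_log = [usage, {"input_tokens": 5, "output_tokens": 1, "total_tokens": 6}]
# Merge without disturbing existing metadata keys.
metadata = {"schema": "example", "llm_usage": aggregate_usage(call_log)}
```

Falling back to input plus output when total_tokens is absent keeps the aggregate consistent across providers that omit the total; whether the PR does the same is not stated in the description.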

Ran ruff format . and ruff check . with no issues.
Ran the full test suite with pytest -q: all tests passed (1261 passed, 12 skipped, 34 deselected, 6 xfailed).
Ran mypy src: it reported only pre-existing typing issues in unrelated modules; this PR introduces no new type regressions.

Track LLM token usage in reports

3bb9d24
