Skip to content

feat(firewall): payload sizing, byte budgets, tiktoken counter, honest summaries#237

Open
dgenio wants to merge 2 commits into
mainfrom
claude/github-issue-triage-nfi1ep
Open

feat(firewall): payload sizing, byte budgets, tiktoken counter, honest summaries#237
dgenio wants to merge 2 commits into
mainfrom
claude/github-issue-triage-nfi1ep

Conversation

@dgenio

@dgenio dgenio commented Jun 22, 2026

Copy link
Copy Markdown
Owner

Closes #207, closes #211, closes #218, closes #174.

This is the grouped firewall PR identified during triage: four issues that share one implementation path — how the context firewall measures, bounds, and faithfully represents tool-output payloads before they reach the LLM. They live in firewall/ + handles.py and are cleaner together (a single shared size estimator) than apart.

What changed

  • firewall/size_estimate.py (new) — Avoid serializing entire payloads to measure size in the firewall hot path #207. Allocation-free estimated_size(value, *, limit=None) that walks a structure to approximate its serialised length instead of json.dumps-ing it. transform.py now uses it for the raw-mode budget check (and drops import json), so a multi-MB raw payload is no longer fully serialised purely to measure it. Deterministic; exact for scalar containers, bounded error otherwise (only threshold comparisons depend on it).
  • handles.pyAdd byte-size-aware budgeting and eviction to HandleStore #211. HandleStore gains opt-in max_total_bytes (evict oldest-first, never the just-stored entry) and max_entry_bytes (reject an over-cap payload with the new HandleTooLarge error rather than truncating it — keeps expansion faithful). Both default to None → behaviour unchanged. current_bytes exposes resident usage; eviction/expiry paths maintain the accounting.
  • handle_query.py (new) — refactor. The HandleStore.expand query pipeline (validation → filter → paginate → project → redact → Frame) moves verbatim into a free expand_handle(); the store keeps a thin delegator that injects its TTL-checked fetch via a lambda. This was required to stay within the shrink-only module-size ratchet after adding the byte-budget code, and it brings handles.py from 371 → 264 lines (also chips at Bring oversized modules back within the documented 300-line budget #188). handles.py is removed from _SIZE_RATCHET per the documented "shrank below 300 → drop it" rule.
  • firewall/token_counting_tiktoken.py (new) — Implement the reserved tiktoken extra for accurate token counting #218. make_tiktoken_counter(encoding="cl100k_base") implements the existing TokenCounter seam with a real tokenizer: lazy import tiktoken (base install stays light), cached encoder, eager resolution so a missing extra (ImportError) / unknown encoding (FirewallError) fails at construction. Exported from weaver_kernel.firewall; wired via BudgetManager(token_counter=...).
  • firewall/summarize.pySurface truncation explicitly in summaries and fix boolean handling in summarize() #174. Boolean columns are reported as True/False counts instead of being averaged (a bool is an int subclass — "mean of is_active = 0.7" is nonsense); truncated fact lists end in an explicit … (N more facts omitted; full data via handle) marker, never silently.
  • Docs/metadata: docs/context_firewall.md (budget estimation, byte budgets, tiktoken usage, summary semantics), CHANGELOG.md, HandleTooLarge exported, tiktoken.* mypy override.

Why

Grouped per the triage: same module, same bug category (size/budget accounting), one shared estimator. Decisions and assumptions (Mode B):

  • tiktoken default encoding cl100k_base (broadest compatibility; o200k_base documented for GPT-4o/o-series) — the encoding is explicit, not guessed per model.
  • Oversized handle entry → reject (not silent truncate), because a truncated raw dataset would corrupt expansion and the goal is bounding resident raw data.
  • The issue's reference to Firewall(token_counter=…) is imprecise — the real seam is BudgetManager(token_counter=…); wired there.
  • tiktoken is not added to [dev]: it downloads BPE vocabularies over the network on first use, which must not be a test dependency. The counter's logic is unit-tested against a fake encoder injected into sys.modules, fully offline.

How verified

  • make ciEXIT 0: ruff format --check (117 files), ruff check (all passed), mypy --strict, pytest 762 passed, 1 skipped (branch-coverage fail_under gate met), all 13 examples run.
  • New tests: tests/test_size_estimate.py (scalars, exactness on scalar containers, ±25% bounded error corpus, early-exit, non-serialisable fallback, deep-nesting no-recursion), tests/test_token_counting_tiktoken.py (fake-encoder: value handling, unknown-encoding FirewallError, missing-extra ImportError, BudgetManager wiring). Extended tests/test_firewall.py (bool-not-averaged, numeric still averaged, truncation marker present/absent, exactly-max_facts no false marker) and tests/test_handles.py (per-entry reject, total-bytes eviction order, evicted-handle expand → HandleNotFound, accounting release, invalid-config). Added size_estimate to the doctest gate.
  • Behaviour-preserving: the existing test_handles.py expand suite (now exercising the delegator → handle_query) passes unchanged.

Risks / caveats

  • estimated_size is approximate; it preserves the only size-driven decision (raw-mode warning) and is bounded by tests, but it is not byte-identical to json.dumps.
  • max_entry_bytes rejection raises from store(); only callers that opt in are affected (kernel default is None, so the invoke path is untouched). Kernel-level wiring of byte budgets is intentionally out of scope (HandleStore-only, per Add byte-size-aware budgeting and eviction to HandleStore #211).
  • One PR spans four issues by request; the handle_query move is verbatim but does enlarge the diff.

🤖 Generated with Claude Code

https://claude.ai/code/session_01HjetuNJX1EUJ2eVGfoLyHt


Generated by Claude Code

…t summaries

Group the four context-firewall issues that share a single implementation path
(payload measurement and the bounded Frame) into one change:

- #207: add allocation-free firewall.estimated_size and use it for the raw-mode
  budget check instead of len(json.dumps(...)), so large payloads are no longer
  fully serialised just to be measured.
- #211: add opt-in byte budgets to HandleStore (max_total_bytes evicts
  oldest-first; max_entry_bytes rejects an over-cap payload with the new
  HandleTooLarge error). Defaults stay None so behaviour is unchanged.
- #218: ship firewall.make_tiktoken_counter implementing the TokenCounter seam
  with a real tokenizer (lazy import, cached encoder, configurable encoding),
  wired via BudgetManager(token_counter=...).
- #174: stop averaging boolean columns in summaries (report true/false counts)
  and surface truncation with an explicit "N more facts omitted" marker.

To stay within the shrink-only module-size ratchet, the HandleStore.expand
query pipeline moves verbatim into a new handle_query module (also reduces the
handles.py ceiling debt tracked by #188). Docs and CHANGELOG updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HjetuNJX1EUJ2eVGfoLyHt
Copilot AI review requested due to automatic review settings June 22, 2026 07:48

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR upgrades the context firewall and handle storage to better measure and bound tool-output payloads before they reach the LLM, adding an allocation-free size estimator, optional byte budgeting for HandleStore, a tiktoken-backed token counter, and more faithful summaries.

Changes:

  • Replace json.dumps-based sizing in the firewall with firewall.estimated_size() and export it as part of the firewall API.
  • Add optional HandleStore budgeting (max_total_bytes, max_entry_bytes) plus HandleTooLarge, and refactor handle expansion into expand_handle() in a new module.
  • Add make_tiktoken_counter() (lazy optional dependency) and improve summary fidelity (bool handling + explicit truncation marker), with docs/tests updates.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/test_token_counting_tiktoken.py New tests for the tiktoken counter factory and BudgetManager wiring using an offline fake module.
tests/test_size_estimate.py New tests for allocation-free size estimation, bounded error, and early-exit behavior.
tests/test_handles.py Extends handle-store tests for byte budgeting, eviction, and config validation.
tests/test_firewall.py Adds tests for boolean summarization and explicit truncation markers.
tests/test_doctests.py Adds firewall.size_estimate to doctest coverage.
tests/test_architecture.py Removes handles.py from the shrink-only size ratchet after refactor.
src/weaver_kernel/handles.py Adds byte budgeting/accounting and delegates expansion to handle_query.expand_handle().
src/weaver_kernel/handle_query.py New module containing the handle expansion pipeline (validation → filter → paginate → project → redact → Frame).
src/weaver_kernel/firewall/transform.py Uses estimated_size() instead of json.dumps() for raw-mode sizing checks.
src/weaver_kernel/firewall/token_counting_tiktoken.py New tiktoken-backed token counter factory with lazy import and cached encoding.
src/weaver_kernel/firewall/summarize.py Fixes bool-as-int averaging and adds explicit omission markers for truncated facts.
src/weaver_kernel/firewall/size_estimate.py New iterative allocation-free payload size estimator with optional early exit.
src/weaver_kernel/firewall/init.py Exports estimated_size and make_tiktoken_counter.
src/weaver_kernel/errors.py Expands BudgetConfigError docstring and adds HandleTooLarge.
src/weaver_kernel/init.py Re-exports HandleTooLarge.
pyproject.toml Adds mypy ignore override for tiktoken.*.
docs/context_firewall.md Documents size estimation, handle byte budgets, summary semantics, and tiktoken usage.
CHANGELOG.md Adds an unreleased entry describing the grouped firewall improvements.

Comment on lines 118 to 121
def _summarize_dict(data: dict[str, Any], *, max_facts: int) -> list[str]:
facts: list[str] = [f"Keys: {', '.join(sorted(data.keys())[:20])}"]
for k, v in list(data.items())[: max_facts - 1]:
for k, v in data.items():
if isinstance(v, (int, float)):
Comment thread src/weaver_kernel/firewall/summarize.py Outdated
Comment on lines +134 to +137
def _summarize_plain_list(data: list[Any], *, max_facts: int) -> list[str]:
facts = [f"List of {len(data)} items"]
for item in data[: max_facts - 1]:
facts.append(repr(item)[:100])
return facts[:max_facts]
facts.extend(repr(item)[:100] for item in data)
return _truncate_facts(facts, max_facts)
Comment thread src/weaver_kernel/handles.py Outdated
Comment on lines +94 to +100
size = estimated_size(data)
if self._max_entry_bytes is not None and size > self._max_entry_bytes:
raise HandleTooLarge(
f"Handle data for '{capability_id}' is ~{size} bytes, exceeding the "
f"per-entry cap of {self._max_entry_bytes} bytes; not stored."
)

Comment on lines +81 to +90
elif isinstance(cur, dict):
n = len(cur)
total += _CONTAINER_DELIMS + max(0, n - 1) * _ITEM_SEP
for key, val in cur.items():
total += len(str(key)) + _QUOTES + _KV_SEP
stack.append(val)
elif isinstance(cur, (list, tuple)):
n = len(cur)
total += _CONTAINER_DELIMS + max(0, n - 1) * _ITEM_SEP
stack.extend(cur)
Comment on lines +65 to +67
total = 0
stack: list[Any] = [value]
while stack:
…stimator, total-budget invariant

- summarize: cap _summarize_dict/_summarize_plain_list iteration with islice and
  derive the omitted count from known length, so a huge dict/list no longer
  triggers an O(n) walk on the firewall hot path (defeating max_facts).
- size_estimate: count each container at most once (cycle detection) so
  self-referential inputs terminate instead of hanging; correct the docstring's
  traversal-order claim (the total is order-independent).
- handles: reject a single entry larger than the binding cap (max_entry_bytes or
  max_total_bytes) so current_bytes can never exceed max_total_bytes; use
  estimated_size(limit=...) to avoid fully walking an over-cap payload.
- Docs/CHANGELOG and tests updated (cyclic-input, total-budget rejection).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HjetuNJX1EUJ2eVGfoLyHt

dgenio commented Jun 22, 2026

Copy link
Copy Markdown
Owner Author

Thanks — all five points were valid and are fixed in 25b26dd:

  • summarize.py (dict & plain-list O(n) on the hot path): both now iterate at most max_facts items via islice and derive the omitted count from the known length, so a huge payload no longer materialises one fact per item.
  • handles.py (store could stay over max_total_bytes): a single entry larger than the binding cap (max_entry_bytes or max_total_bytes) is now rejected with HandleTooLarge, so current_bytes can never exceed the budget. The size check uses estimated_size(..., limit=...) so an over-cap payload isn't fully walked. Added test_entry_larger_than_total_budget_is_rejected / test_total_budget_never_exceeded_after_stores.
  • size_estimate.py (cyclic input): the walk now records each container's id and visits it once, so self-referential structures terminate instead of hanging (covered by new cyclic-input tests).
  • size_estimate.py (traversal-order docstring): the total is order-independent, so I corrected the docstring rather than reverse the push order (the early-exit value is documented as "only guaranteed > limit").

make ci is green locally (766 passed).


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants