feat(firewall): payload sizing, byte budgets, tiktoken counter, honest summaries#237
Open
dgenio wants to merge 2 commits into
Open
feat(firewall): payload sizing, byte budgets, tiktoken counter, honest summaries#237dgenio wants to merge 2 commits into
dgenio wants to merge 2 commits into
Conversation
…t summaries Group the four context-firewall issues that share a single implementation path (payload measurement and the bounded Frame) into one change: - #207: add allocation-free firewall.estimated_size and use it for the raw-mode budget check instead of len(json.dumps(...)), so large payloads are no longer fully serialised just to be measured. - #211: add opt-in byte budgets to HandleStore (max_total_bytes evicts oldest-first; max_entry_bytes rejects an over-cap payload with the new HandleTooLarge error). Defaults stay None so behaviour is unchanged. - #218: ship firewall.make_tiktoken_counter implementing the TokenCounter seam with a real tokenizer (lazy import, cached encoder, configurable encoding), wired via BudgetManager(token_counter=...). - #174: stop averaging boolean columns in summaries (report true/false counts) and surface truncation with an explicit "N more facts omitted" marker. To stay within the shrink-only module-size ratchet, the HandleStore.expand query pipeline moves verbatim into a new handle_query module (also reduces the handles.py ceiling debt tracked by #188). Docs and CHANGELOG updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HjetuNJX1EUJ2eVGfoLyHt
There was a problem hiding this comment.
Pull request overview
This PR upgrades the context firewall and handle storage to better measure and bound tool-output payloads before they reach the LLM, adding an allocation-free size estimator, optional byte budgeting for HandleStore, a tiktoken-backed token counter, and more faithful summaries.
Changes:
- Replace
json.dumps-based sizing in the firewall withfirewall.estimated_size()and export it as part of the firewall API. - Add optional
HandleStorebudgeting (max_total_bytes,max_entry_bytes) plusHandleTooLarge, and refactor handle expansion intoexpand_handle()in a new module. - Add
make_tiktoken_counter()(lazy optional dependency) and improve summary fidelity (bool handling + explicit truncation marker), with docs/tests updates.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_token_counting_tiktoken.py | New tests for the tiktoken counter factory and BudgetManager wiring using an offline fake module. |
| tests/test_size_estimate.py | New tests for allocation-free size estimation, bounded error, and early-exit behavior. |
| tests/test_handles.py | Extends handle-store tests for byte budgeting, eviction, and config validation. |
| tests/test_firewall.py | Adds tests for boolean summarization and explicit truncation markers. |
| tests/test_doctests.py | Adds firewall.size_estimate to doctest coverage. |
| tests/test_architecture.py | Removes handles.py from the shrink-only size ratchet after refactor. |
| src/weaver_kernel/handles.py | Adds byte budgeting/accounting and delegates expansion to handle_query.expand_handle(). |
| src/weaver_kernel/handle_query.py | New module containing the handle expansion pipeline (validation → filter → paginate → project → redact → Frame). |
| src/weaver_kernel/firewall/transform.py | Uses estimated_size() instead of json.dumps() for raw-mode sizing checks. |
| src/weaver_kernel/firewall/token_counting_tiktoken.py | New tiktoken-backed token counter factory with lazy import and cached encoding. |
| src/weaver_kernel/firewall/summarize.py | Fixes bool-as-int averaging and adds explicit omission markers for truncated facts. |
| src/weaver_kernel/firewall/size_estimate.py | New iterative allocation-free payload size estimator with optional early exit. |
| src/weaver_kernel/firewall/init.py | Exports estimated_size and make_tiktoken_counter. |
| src/weaver_kernel/errors.py | Expands BudgetConfigError docstring and adds HandleTooLarge. |
| src/weaver_kernel/init.py | Re-exports HandleTooLarge. |
| pyproject.toml | Adds mypy ignore override for tiktoken.*. |
| docs/context_firewall.md | Documents size estimation, handle byte budgets, summary semantics, and tiktoken usage. |
| CHANGELOG.md | Adds an unreleased entry describing the grouped firewall improvements. |
Comment on lines
118
to
121
| def _summarize_dict(data: dict[str, Any], *, max_facts: int) -> list[str]: | ||
| facts: list[str] = [f"Keys: {', '.join(sorted(data.keys())[:20])}"] | ||
| for k, v in list(data.items())[: max_facts - 1]: | ||
| for k, v in data.items(): | ||
| if isinstance(v, (int, float)): |
Comment on lines
+134
to
+137
| def _summarize_plain_list(data: list[Any], *, max_facts: int) -> list[str]: | ||
| facts = [f"List of {len(data)} items"] | ||
| for item in data[: max_facts - 1]: | ||
| facts.append(repr(item)[:100]) | ||
| return facts[:max_facts] | ||
| facts.extend(repr(item)[:100] for item in data) | ||
| return _truncate_facts(facts, max_facts) |
Comment on lines
+94
to
+100
| size = estimated_size(data) | ||
| if self._max_entry_bytes is not None and size > self._max_entry_bytes: | ||
| raise HandleTooLarge( | ||
| f"Handle data for '{capability_id}' is ~{size} bytes, exceeding the " | ||
| f"per-entry cap of {self._max_entry_bytes} bytes; not stored." | ||
| ) | ||
|
|
Comment on lines
+81
to
+90
| elif isinstance(cur, dict): | ||
| n = len(cur) | ||
| total += _CONTAINER_DELIMS + max(0, n - 1) * _ITEM_SEP | ||
| for key, val in cur.items(): | ||
| total += len(str(key)) + _QUOTES + _KV_SEP | ||
| stack.append(val) | ||
| elif isinstance(cur, (list, tuple)): | ||
| n = len(cur) | ||
| total += _CONTAINER_DELIMS + max(0, n - 1) * _ITEM_SEP | ||
| stack.extend(cur) |
Comment on lines
+65
to
+67
| total = 0 | ||
| stack: list[Any] = [value] | ||
| while stack: |
…stimator, total-budget invariant - summarize: cap _summarize_dict/_summarize_plain_list iteration with islice and derive the omitted count from known length, so a huge dict/list no longer triggers an O(n) walk on the firewall hot path (defeating max_facts). - size_estimate: count each container at most once (cycle detection) so self-referential inputs terminate instead of hanging; correct the docstring's traversal-order claim (the total is order-independent). - handles: reject a single entry larger than the binding cap (max_entry_bytes or max_total_bytes) so current_bytes can never exceed max_total_bytes; use estimated_size(limit=...) to avoid fully walking an over-cap payload. - Docs/CHANGELOG and tests updated (cyclic-input, total-budget rejection). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HjetuNJX1EUJ2eVGfoLyHt
Owner
Author
|
Thanks — all five points were valid and are fixed in
Generated by Claude Code |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #207, closes #211, closes #218, closes #174.
This is the grouped firewall PR identified during triage: four issues that share one implementation path — how the context firewall measures, bounds, and faithfully represents tool-output payloads before they reach the LLM. They live in
firewall/+handles.pyand are cleaner together (a single shared size estimator) than apart.What changed
firewall/size_estimate.py(new) — Avoid serializing entire payloads to measure size in the firewall hot path #207. Allocation-freeestimated_size(value, *, limit=None)that walks a structure to approximate its serialised length instead ofjson.dumps-ing it.transform.pynow uses it for the raw-mode budget check (and dropsimport json), so a multi-MB raw payload is no longer fully serialised purely to measure it. Deterministic; exact for scalar containers, bounded error otherwise (only threshold comparisons depend on it).handles.py— Add byte-size-aware budgeting and eviction to HandleStore #211.HandleStoregains opt-inmax_total_bytes(evict oldest-first, never the just-stored entry) andmax_entry_bytes(reject an over-cap payload with the newHandleTooLargeerror rather than truncating it — keeps expansion faithful). Both default toNone→ behaviour unchanged.current_bytesexposes resident usage; eviction/expiry paths maintain the accounting.handle_query.py(new) — refactor. TheHandleStore.expandquery pipeline (validation → filter → paginate → project → redact → Frame) moves verbatim into a freeexpand_handle(); the store keeps a thin delegator that injects its TTL-checked fetch via alambda. This was required to stay within the shrink-only module-size ratchet after adding the byte-budget code, and it bringshandles.pyfrom 371 → 264 lines (also chips at Bring oversized modules back within the documented 300-line budget #188).handles.pyis removed from_SIZE_RATCHETper the documented "shrank below 300 → drop it" rule.firewall/token_counting_tiktoken.py(new) — Implement the reservedtiktokenextra for accurate token counting #218.make_tiktoken_counter(encoding="cl100k_base")implements the existingTokenCounterseam with a real tokenizer: lazyimport tiktoken(base install stays light), cached encoder, eager resolution so a missing extra (ImportError) / unknown encoding (FirewallError) fails at construction. Exported fromweaver_kernel.firewall; wired viaBudgetManager(token_counter=...).firewall/summarize.py— Surface truncation explicitly in summaries and fix boolean handling insummarize()#174. Boolean columns are reported asTrue/Falsecounts instead of being averaged (aboolis anintsubclass — "mean ofis_active= 0.7" is nonsense); truncated fact lists end in an explicit… (N more facts omitted; full data via handle)marker, never silently.docs/context_firewall.md(budget estimation, byte budgets, tiktoken usage, summary semantics),CHANGELOG.md,HandleTooLargeexported,tiktoken.*mypy override.Why
Grouped per the triage: same module, same bug category (size/budget accounting), one shared estimator. Decisions and assumptions (Mode B):
cl100k_base(broadest compatibility;o200k_basedocumented for GPT-4o/o-series) — the encoding is explicit, not guessed per model.Firewall(token_counter=…)is imprecise — the real seam isBudgetManager(token_counter=…); wired there.[dev]: it downloads BPE vocabularies over the network on first use, which must not be a test dependency. The counter's logic is unit-tested against a fake encoder injected intosys.modules, fully offline.How verified
make ci→ EXIT 0:ruff format --check(117 files),ruff check(all passed),mypy --strict,pytest762 passed, 1 skipped (branch-coveragefail_undergate met), all 13 examples run.tests/test_size_estimate.py(scalars, exactness on scalar containers, ±25% bounded error corpus, early-exit, non-serialisable fallback, deep-nesting no-recursion),tests/test_token_counting_tiktoken.py(fake-encoder: value handling, unknown-encodingFirewallError, missing-extraImportError,BudgetManagerwiring). Extendedtests/test_firewall.py(bool-not-averaged, numeric still averaged, truncation marker present/absent, exactly-max_factsno false marker) andtests/test_handles.py(per-entry reject, total-bytes eviction order, evicted-handle expand →HandleNotFound, accounting release, invalid-config). Addedsize_estimateto the doctest gate.test_handles.pyexpand suite (now exercising the delegator →handle_query) passes unchanged.Risks / caveats
estimated_sizeis approximate; it preserves the only size-driven decision (raw-mode warning) and is bounded by tests, but it is not byte-identical tojson.dumps.max_entry_bytesrejection raises fromstore(); only callers that opt in are affected (kernel default isNone, so the invoke path is untouched). Kernel-level wiring of byte budgets is intentionally out of scope (HandleStore-only, per Add byte-size-aware budgeting and eviction to HandleStore #211).handle_querymove is verbatim but does enlarge the diff.🤖 Generated with Claude Code
https://claude.ai/code/session_01HjetuNJX1EUJ2eVGfoLyHt
Generated by Claude Code