fix(export): surface bulk partial failures via X-Export-Warnings header#73
fix(export): surface bulk partial failures via X-Export-Warnings header#73clean6378-max-it wants to merge 3 commits into
Conversation
Accumulate structured ExportFailure entries in BulkExportResult, return 422 EXPORT_ALL_FAILED when every candidate fails, and attach X-Export-Warnings on partial 200 ZIP responses without changing the stable ZIP body contract.
📝 WalkthroughWalkthroughThis PR captures per-session export failures as structured ExportFailure entries, exposes them via a bounded X-Export-Warnings header and export-warnings.json inside ZIPs for partial success, and returns ErrorCode.EXPORT_ALL_FAILED (HTTP 422) with failures when every attempted session fails; tests and docs updated accordingly. ChangesStructured bulk-export failure reporting
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/export_api.py`:
- Around line 180-182: The current code writes the entire failure_payload into
resp.headers["X-Export-Warnings"] which can exceed header size limits; change
this to a bounded header strategy: when result.failures is non-empty, compute a
compact summary (e.g., total_failures count, first N failure summaries or IDs)
and set resp.headers["X-Export-Warnings"] to a JSON object containing the count,
truncated=true flag, and the sampled entries instead of the full
failure_payload; put the full failure_payload into the response body/manifest
(or a downloadable URL) for clients that need details. Ensure you update the
logic that populates failure_payload and references to result.failures so the
header uses only the capped sample and includes truncation metadata.
In `@utils/export_engine.py`:
- Around line 289-294: The current ExportFailure uses raw exception text
(str(exc)) which can leak sensitive info; instead, map the exception to a
stable, sanitized message using the failure code / ErrorCode mapping and include
that in ExportFailure.message. Update the code around result.failures.append
(the ExportFailure constructor call where session_id=sid and
code=failure_code_for_exception(exc, phase=phase)) to call an existing or new
helper (e.g., failure_code_for_exception -> use its returned code with a lookup
like message_for_error_code(code) or a small function
sanitized_message_for_exception(exc, phase)) and pass that sanitized, stable
text as message rather than str(exc).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 266ab9dd-dd9b-4997-a7dd-a0c62cf82b3b
📒 Files selected for processing (5)
api/error_codes.pyapi/export_api.pydocs/api-reference.mdtests/test_export_api_bulk.pyutils/export_engine.py
|
@CodeRabbit , Can you reeviewe this PR fully? |
|
@CodeRabbit. check this PR? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== utils/export_engine.py ==="
cat utils/export_engine.py
echo ""
echo "=== api/export_api.py ==="
cat api/export_api.py
echo ""
echo "=== api/error_codes.py ==="
cat api/error_codes.py
echo ""
echo "=== docs/api-reference.md ==="
cat docs/api-reference.md
echo ""
echo "=== tests/test_export_api_bulk.py ==="
cat tests/test_export_api_bulk.pyLength of output: 44546 🏁 Script executed: #!/bin/bash
echo "=== conftest.py (root) ==="
fd -i "conftest.py" --exec cat {}
echo ""
echo "=== Check failure_count usage ==="
rg -n "failure_count" --type py
echo ""
echo "=== Check ExportFailure import usage ==="
rg -n "ExportFailure" --type py
echo ""
echo "=== Check _EXPORT_WARNINGS_HEADER_MAX constants ==="
rg -n "_EXPORT_WARNINGS_HEADER_MAX" --type py
echo ""
echo "=== Check str(item.code) / StrEnum behaviour ==="
rg -n "StrEnum" --type pyLength of output: 4396
PR
|
| # | Severity | File | Item |
|---|---|---|---|
| 1 | Medium | utils/export_engine.py |
Redundant isinstance(exc, json.JSONDecodeError) branch |
| 2 | Minor | utils/export_engine.py |
phase should use Literal["parse","export"] |
| 3 | Minor | utils/export_engine.py |
failure_count vs len(failures) dual bookkeeping |
| 4 | Minor | api/export_api.py |
Single-entry overflow not guarded in _export_warnings_header_payload |
| 5 | Minor | api/export_api.py |
_serialize_export_failures return type |
| 6 | Minor | tests/test_export_api_bulk.py |
No test for byte-level header truncation |
| 7 | Minor | tests/test_export_api_bulk.py |
No test for since="incremental" with partial failures |
| 8 | Minor | docs/api-reference.md |
EXPORT_ALL_FAILED failures shape wording ambiguous |
The core logic (failure accumulation, response routing, ZIP embedding, state bookmark exclusion) is correct and the acceptance criteria from issue #69 are all met. Items 1 and 4 are the most worth fixing before merge; the rest can reasonably be follow-up work.
Remove redundant JSONDecodeError branch, tighten phase typing, add last-resort header byte cap, and add truncation/incremental bookmark tests.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_export_api_bulk.py (1)
129-141: ⚡ Quick winTighten the assertion for entry-limit truncation.
The test name indicates it's specifically testing entry-limit truncation, and there's a separate test for byte-size overflow (lines 144-156). The assertion on line 141 should use
== 20instead of<= 20to precisely verify that the entry limit is applied correctly. Using<=weakens the test and could miss regressions in the truncation logic.✏️ Proposed fix
- assert len(payload["failures"]) <= 20 + assert len(payload["failures"]) == 20🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_export_api_bulk.py` around lines 129 - 141, The test test_export_warnings_header_payload_truncates_at_entry_limit should assert the exact entry-limit truncation; update the assertion that currently checks len(payload["failures"]) <= 20 to require len(payload["failures"]) == 20 so the test precisely verifies _export_warnings_header_payload enforces the 20-entry cap; modify the assertion in the test function test_export_warnings_header_payload_truncates_at_entry_limit accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_export_api_bulk.py`:
- Around line 129-141: The test
test_export_warnings_header_payload_truncates_at_entry_limit should assert the
exact entry-limit truncation; update the assertion that currently checks
len(payload["failures"]) <= 20 to require len(payload["failures"]) == 20 so the
test precisely verifies _export_warnings_header_payload enforces the 20-entry
cap; modify the assertion in the test function
test_export_warnings_header_payload_truncates_at_entry_limit accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ec63f992-f2f4-478b-b00c-442aac2c9585
📒 Files selected for processing (4)
api/export_api.pydocs/api-reference.mdtests/test_export_api_bulk.pyutils/export_engine.py
✅ Files skipped from review due to trivial changes (1)
- docs/api-reference.md
🚧 Files skipped from review as they are similar to previous changes (2)
- utils/export_engine.py
- api/export_api.py
| from datetime import date, datetime, timezone | ||
| from typing import Any, Callable, Literal, Protocol | ||
|
|
||
| from api.error_codes import ErrorCode |
There was a problem hiding this comment.
This makes export_engine.py the only file in utils/ that import from api/ and since api/error_codes.py imports flask(Response, jsonify), a pure utility module now transitively pulls in the web framework.
| code: ErrorCode | ||
|
|
||
|
|
||
| def failure_code_for_exception( |
There was a problem hiding this comment.
The phase="export" branch returns INTERNAL_ERROR unconditionally and ignores exc.
That's fine for now, but the exc param then does nothing on that path
closes #69
Summary
Surfaces per-session bulk export failures to HTTP clients instead of silently omitting failed sessions from the ZIP. Keeps the stable 200 + ZIP body contract from Week 1; partial failures are signaled via X-Export-Warnings.
Problem
POST /api/export caught per-session parse/export errors, logged them, and returned HTTP 200 with a ZIP missing failed sessions. BulkExportResult.failure_count was populated but never exposed to the client.
Changes
X-Export-Warnings shape:
[{"session_id": "session_def456", "code": "PARSE_ERROR", "message": "human-readable detail"}]
Export state bookmark (_write_state) only includes successfully exported session IDs — failed sessions are not advanced in incremental mode.
Acceptance criteria
Test plan
pytest tests/test_export_api_bulk.py -v
pytest -q
mypy -p api -p utils -p models
ruff check .
ruff format --check .
Manual smoke: POST /api/export against a corpus with one corrupt .jsonl → inspect X-Export-Warnings header.
Out of scope
Dependencies
Sprint item #2 · 5 pt · branch fix/export-partial-failure-warnings
Summary by CodeRabbit
New Features
Documentation
Tests