Skip to content

fix(export): surface bulk partial failures via X-Export-Warnings header#73

Open
clean6378-max-it wants to merge 3 commits into
masterfrom
fix/export-partial-failure-warnings
Open

fix(export): surface bulk partial failures via X-Export-Warnings header#73
clean6378-max-it wants to merge 3 commits into
masterfrom
fix/export-partial-failure-warnings

Conversation

@clean6378-max-it

@clean6378-max-it clean6378-max-it commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

closes #69

Summary

Surfaces per-session bulk export failures to HTTP clients instead of silently omitting failed sessions from the ZIP. Keeps the stable 200 + ZIP body contract from Week 1; partial failures are signaled via X-Export-Warnings.

Problem

POST /api/export caught per-session parse/export errors, logged them, and returned HTTP 200 with a ZIP missing failed sessions. BulkExportResult.failure_count was populated but never exposed to the client.

Changes

  • utils/export_engine.py — ExportFailure dataclass; BulkExportResult.failures; failure_code_for_exception() maps parse vs post-parse errors to PARSE_ERROR / INTERNAL_ERROR
  • api/export_api.py — response routing:
    • All succeed → 200 ZIP (no header)
    • Partial fail → 200 ZIP + X-Export-Warnings JSON header
    • All candidates fail → 422 + EXPORT_ALL_FAILED + failures in JSON body
    • Nothing to export → 422 + EXPORT_NOTHING_TO_EXPORT (unchanged)
  • api/error_codes.py — EXPORT_ALL_FAILED
  • docs/api-reference.md — documents header shape and new error code
  • tests/test_export_api_bulk.py — all-succeed, partial-fail, all-fail, export-state bookmark tests

X-Export-Warnings shape:

[{"session_id": "session_def456", "code": "PARSE_ERROR", "message": "human-readable detail"}]

Export state bookmark (_write_state) only includes successfully exported session IDs — failed sessions are not advanced in incremental mode.

Acceptance criteria

  • Machine-readable list of skipped sessions on partial export
  • All-fail returns error status, not 200
  • Partial export: 200 ZIP + X-Export-Warnings
  • Structured ErrorCode values used
  • docs/api-reference.md updated
  • Tests: all-succeed, partial-fail, all-fail, state excludes failures
  • pytest, mypy, ruff check ., ruff format --check . pass locally

Test plan

pytest tests/test_export_api_bulk.py -v
pytest -q
mypy -p api -p utils -p models
ruff check .
ruff format --check .

Manual smoke: POST /api/export against a corpus with one corrupt .jsonl → inspect X-Export-Warnings header.

Out of scope

  • CLI export warning surfacing (HTTP only)
  • TypedDict parse boundary (Tuesday PR 1)
  • ZIP layout changes beyond existing manifest.jsonl

Dependencies

Sprint item #2 · 5 pt · branch fix/export-partial-failure-warnings

Summary by CodeRabbit

  • New Features

    • Export now reports structured failures: partial exports return HTTP 200 with an X-Export-Warnings header (bounded/truncated) and an export-warnings.json inside the ZIP; if every session fails, the endpoint returns HTTP 422 with error code EXPORT_ALL_FAILED and a failures payload.
  • Documentation

    • API reference updated to document EXPORT_ALL_FAILED, partial-export behavior, header payload shape, and the in‑ZIP warnings file.
  • Tests

    • Added tests for full success, partial failures, all-fail scenarios, and header truncation/size fallback.

Accumulate structured ExportFailure entries in BulkExportResult, return
422 EXPORT_ALL_FAILED when every candidate fails, and attach X-Export-Warnings
on partial 200 ZIP responses without changing the stable ZIP body contract.
@clean6378-max-it clean6378-max-it self-assigned this Jun 9, 2026
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR captures per-session export failures as structured ExportFailure entries, exposes them via a bounded X-Export-Warnings header and export-warnings.json inside ZIPs for partial success, and returns ErrorCode.EXPORT_ALL_FAILED (HTTP 422) with failures when every attempted session fails; tests and docs updated accordingly.

Changes

Structured bulk-export failure reporting

Layer / File(s) Summary
Error codes and failure data model
api/error_codes.py, utils/export_engine.py
Adds ErrorCode.EXPORT_ALL_FAILED; imports ErrorCode; ExportFailure dataclass; failure_code_for_exception() and failure_message_for_code() helpers; BulkExportResult.failures list.
Bulk-export failure recording
utils/export_engine.py
_record_failure() now accepts phase and appends ExportFailure with computed ErrorCode; per-session export exceptions recorded with phase="export".
API failure serialization and response mapping
api/export_api.py
Adds JSON serialization helpers, header/ZIP filename and size constants, bounded header payload builder, writes export-warnings.json into ZIP on partial success, sets X-Export-Warnings header when failures exist, and returns EXPORT_ALL_FAILED (422) with failures when no sessions were exported.
Response documentation
docs/api-reference.md
Documents EXPORT_ALL_FAILED (HTTP 422), clarifies partial-export behavior (200 + ZIP of successes, X-Export-Warnings header, and export-warnings.json in ZIP), and clarifies EXPORT_NOTHING_TO_EXPORT.
Comprehensive test coverage
tests/test_export_api_bulk.py
Adds tests: all succeed (200, no header), partial parse failure (200, header + ZIP warnings), all parse fail (422 with EXPORT_ALL_FAILED and failures list), header truncation/byte-size fallback tests, and persistence/state tests ensuring failed sessions are excluded from saved state.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • timon0305
  • wpak-ai

Poem

🐰 When exports skip a session or two,
I log the why and bundle what grew —
Warnings tucked into the ZIP we send,
Headers whisper which ones didn't mend.
Hooray for clarity, hop to the end!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.83% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(export): surface bulk partial failures via X-Export-Warnings header' directly and concisely describes the main change in the PR—surfacing export failures via a new response header.
Linked Issues check ✅ Passed The PR addresses all key requirements from issue #69: structured failure tracking in BulkExportResult, X-Export-Warnings header with session/code/message format, EXPORT_ALL_FAILED error code for all-fail scenarios, export state bookmark respecting only successful sessions, and comprehensive test coverage.
Out of Scope Changes check ✅ Passed All changes are directly scoped to issue #69: error code addition, export engine failure tracking, API response routing, documentation updates, and test expansion—no unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/export-partial-failure-warnings

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/export_api.py`:
- Around line 180-182: The current code writes the entire failure_payload into
resp.headers["X-Export-Warnings"] which can exceed header size limits; change
this to a bounded header strategy: when result.failures is non-empty, compute a
compact summary (e.g., total_failures count, first N failure summaries or IDs)
and set resp.headers["X-Export-Warnings"] to a JSON object containing the count,
truncated=true flag, and the sampled entries instead of the full
failure_payload; put the full failure_payload into the response body/manifest
(or a downloadable URL) for clients that need details. Ensure you update the
logic that populates failure_payload and references to result.failures so the
header uses only the capped sample and includes truncation metadata.

In `@utils/export_engine.py`:
- Around line 289-294: The current ExportFailure uses raw exception text
(str(exc)) which can leak sensitive info; instead, map the exception to a
stable, sanitized message using the failure code / ErrorCode mapping and include
that in ExportFailure.message. Update the code around result.failures.append
(the ExportFailure constructor call where session_id=sid and
code=failure_code_for_exception(exc, phase=phase)) to call an existing or new
helper (e.g., failure_code_for_exception -> use its returned code with a lookup
like message_for_error_code(code) or a small function
sanitized_message_for_exception(exc, phase)) and pass that sanitized, stable
text as message rather than str(exc).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 266ab9dd-dd9b-4997-a7dd-a0c62cf82b3b

📥 Commits

Reviewing files that changed from the base of the PR and between 32e60f4 and 367ded9.

📒 Files selected for processing (5)
  • api/error_codes.py
  • api/export_api.py
  • docs/api-reference.md
  • tests/test_export_api_bulk.py
  • utils/export_engine.py

Comment thread api/export_api.py
Comment thread utils/export_engine.py
@clean6378-max-it

Copy link
Copy Markdown
Collaborator Author

@CodeRabbit , Can you reeviewe this PR fully?

@clean6378-max-it

Copy link
Copy Markdown
Collaborator Author

@CodeRabbit. check this PR?

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== utils/export_engine.py ==="
cat utils/export_engine.py

echo ""
echo "=== api/export_api.py ==="
cat api/export_api.py

echo ""
echo "=== api/error_codes.py ==="
cat api/error_codes.py

echo ""
echo "=== docs/api-reference.md ==="
cat docs/api-reference.md

echo ""
echo "=== tests/test_export_api_bulk.py ==="
cat tests/test_export_api_bulk.py

Length of output: 44546


🏁 Script executed:

#!/bin/bash
echo "=== conftest.py (root) ==="
fd -i "conftest.py" --exec cat {}

echo ""
echo "=== Check failure_count usage ==="
rg -n "failure_count" --type py

echo ""
echo "=== Check ExportFailure import usage ==="
rg -n "ExportFailure" --type py

echo ""
echo "=== Check _EXPORT_WARNINGS_HEADER_MAX constants ==="
rg -n "_EXPORT_WARNINGS_HEADER_MAX" --type py

echo ""
echo "=== Check str(item.code) / StrEnum behaviour ==="
rg -n "StrEnum" --type py

Length of output: 4396



@clean6378-max-it Sure! Here is a full review of PR #73.


PR #73 Review — fix(export): surface bulk partial failures via X-Export-Warnings header

Overall the PR is well-structured, correctly maps the acceptance criteria to code, and the tests cover the main happy/sad paths. A few items worth addressing before merge:


utils/export_engine.py

1. Redundant isinstance check in failure_code_for_exception (Medium)

json.JSONDecodeError is a subclass of ValueError, and ValueError is already in EXPORT_ERRORS. This means the first branch is dead code — the second branch would catch it with the same return value:

def failure_code_for_exception(exc: Exception, *, phase: str = "parse") -> ErrorCode:
    if phase == "export":
        return ErrorCode.INTERNAL_ERROR
    if isinstance(exc, json.JSONDecodeError):   # ← dead: ValueError below already covers it
        return ErrorCode.PARSE_ERROR
    if isinstance(exc, EXPORT_ERRORS):
        return ErrorCode.PARSE_ERROR
    return ErrorCode.INTERNAL_ERROR

The first isinstance branch can be removed without any behaviour change.

2. phase parameter should use Literal (Minor)

The phase parameter accepts any str silently falling back to INTERNAL_ERROR for unknown values. Tightening the type catches misuse at static-analysis time:

from typing import Literal

def failure_code_for_exception(
    exc: Exception,
    *,
    phase: Literal["parse", "export"] = "parse",
) -> ErrorCode:

3. failure_count is now always equal to len(result.failures) (Minor)

_record_failure atomically increments failure_count and appends to failures, so they can never diverge. failure_count is still consumed by scripts/export.py:415, so it cannot be removed without a CLI-side change — but it is worth noting the dual bookkeeping. A follow-up could replace result.failure_count with a @property delegating to len(self.failures).


api/export_api.py

4. _export_warnings_header_payload — single-entry overflow edge case (Minor)

The while-loop's exit condition is len(sample) > 1, so if a single serialized entry were to exceed _EXPORT_WARNINGS_HEADER_MAX_BYTES the function would still return an oversized payload:

while (
    len(json.dumps(payload, separators=(",", ":"))) > _EXPORT_WARNINGS_HEADER_MAX_BYTES
    and len(sample) > 1   # ← loop exits here; payload may still be > 8192 bytes
):

In practice each entry is ≈150 bytes so this cannot happen today, but a defensive hard-limit on the final payload would be safer:

result_json = json.dumps(payload, separators=(",", ":"))
if len(result_json) > _EXPORT_WARNINGS_HEADER_MAX_BYTES:
    # last-resort: omit failures list entirely, keep counts
    payload = {"total_failures": total, "truncated": True, "failures": []}
return payload

5. _serialize_export_failures return type annotation is too narrow (Minor)

The function is declared as returning list[dict[str, str]] but code is str(item.code) (a StrEnum) — which is indeed a str at runtime, so this is correct. However, if a non-string value is ever added to the dict in future, the annotation would be silently wrong. list[dict[str, object]] would be more honest and consistent with _export_warnings_header_payload's return type.


tests/test_export_api_bulk.py

6. No test for byte-level truncation in _export_warnings_header_payload (Minor)

The entry-count cap (_EXPORT_WARNINGS_HEADER_MAX_ENTRIES = 20) and the byte-cap while-loop are both untested. A unit test for _export_warnings_header_payload directly (not through HTTP) would guard against regressions:

from api.export_api import _export_warnings_header_payload
from utils.export_engine import ExportFailure
from api.error_codes import ErrorCode

def test_header_payload_truncates_at_entry_limit():
    failures = [
        ExportFailure(session_id=f"sess_{i:04d}", message="Failed to parse session", code=ErrorCode.PARSE_ERROR)
        for i in range(25)
    ]
    payload = _export_warnings_header_payload(failures)
    assert payload["total_failures"] == 25
    assert payload["truncated"] is True
    assert len(payload["failures"]) <= 20

7. No test for since="incremental" with partial failures (Minor)

All four new tests use since="all". A test verifying that the failed session is excluded from the incremental state bookmark (sessions dict) when since="incremental" would complete coverage of the bookmark semantics.


docs/api-reference.md

8. EXPORT_ALL_FAILED error body failures shape could be clearer (Minor)

The error table says:

failures (same shape as X-Export-Warnings entries)

This is technically correct (each item in both arrays has session_id, code, message) but "X-Export-Warnings entries" is ambiguous because the header value is a wrapper object ({total_failures, truncated, failures}), not a plain array. Consider wording it as:

failures — flat array of {"session_id", "code", "message"} objects (same item shape as the failures array in the X-Export-Warnings header)


Summary table

# Severity File Item
1 Medium utils/export_engine.py Redundant isinstance(exc, json.JSONDecodeError) branch
2 Minor utils/export_engine.py phase should use Literal["parse","export"]
3 Minor utils/export_engine.py failure_count vs len(failures) dual bookkeeping
4 Minor api/export_api.py Single-entry overflow not guarded in _export_warnings_header_payload
5 Minor api/export_api.py _serialize_export_failures return type
6 Minor tests/test_export_api_bulk.py No test for byte-level header truncation
7 Minor tests/test_export_api_bulk.py No test for since="incremental" with partial failures
8 Minor docs/api-reference.md EXPORT_ALL_FAILED failures shape wording ambiguous

The core logic (failure accumulation, response routing, ZIP embedding, state bookmark exclusion) is correct and the acceptance criteria from issue #69 are all met. Items 1 and 4 are the most worth fixing before merge; the rest can reasonably be follow-up work.

Remove redundant JSONDecodeError branch, tighten phase typing, add
last-resort header byte cap, and add truncation/incremental bookmark tests.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_export_api_bulk.py (1)

129-141: ⚡ Quick win

Tighten the assertion for entry-limit truncation.

The test name indicates it's specifically testing entry-limit truncation, and there's a separate test for byte-size overflow (lines 144-156). The assertion on line 141 should use == 20 instead of <= 20 to precisely verify that the entry limit is applied correctly. Using <= weakens the test and could miss regressions in the truncation logic.

✏️ Proposed fix
-    assert len(payload["failures"]) <= 20
+    assert len(payload["failures"]) == 20
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_export_api_bulk.py` around lines 129 - 141, The test
test_export_warnings_header_payload_truncates_at_entry_limit should assert the
exact entry-limit truncation; update the assertion that currently checks
len(payload["failures"]) <= 20 to require len(payload["failures"]) == 20 so the
test precisely verifies _export_warnings_header_payload enforces the 20-entry
cap; modify the assertion in the test function
test_export_warnings_header_payload_truncates_at_entry_limit accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_export_api_bulk.py`:
- Around line 129-141: The test
test_export_warnings_header_payload_truncates_at_entry_limit should assert the
exact entry-limit truncation; update the assertion that currently checks
len(payload["failures"]) <= 20 to require len(payload["failures"]) == 20 so the
test precisely verifies _export_warnings_header_payload enforces the 20-entry
cap; modify the assertion in the test function
test_export_warnings_header_payload_truncates_at_entry_limit accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ec63f992-f2f4-478b-b00c-442aac2c9585

📥 Commits

Reviewing files that changed from the base of the PR and between 1029d31 and 2dc2d25.

📒 Files selected for processing (4)
  • api/export_api.py
  • docs/api-reference.md
  • tests/test_export_api_bulk.py
  • utils/export_engine.py
✅ Files skipped from review due to trivial changes (1)
  • docs/api-reference.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • utils/export_engine.py
  • api/export_api.py

Comment thread utils/export_engine.py
from datetime import date, datetime, timezone
from typing import Any, Callable, Literal, Protocol

from api.error_codes import ErrorCode

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes export_engine.py the only file in utils/ that import from api/ and since api/error_codes.py imports flask(Response, jsonify), a pure utility module now transitively pulls in the web framework.

Comment thread utils/export_engine.py
code: ErrorCode


def failure_code_for_exception(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phase="export" branch returns INTERNAL_ERROR unconditionally and ignores exc.
That's fine for now, but the exc param then does nothing on that path

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

claude-code-chat-browser: Surface bulk-export partial failures instead of silent ZIP skips

2 participants