Skip to content

test: improve JSON parsing module test coverage#1

Draft
tpsaint wants to merge 1 commit into
masterfrom
test/improve-json-parsing-coverage
Draft

test: improve JSON parsing module test coverage#1
tpsaint wants to merge 1 commit into
masterfrom
test/improve-json-parsing-coverage

Conversation

@tpsaint

@tpsaint tpsaint commented Jul 20, 2025

Copy link
Copy Markdown
Owner

This PR significantly improves test coverage for the JSON parsing module in the AutoGPT Forge framework. The forge/json/parsing.py module had only 42% test coverage, with two important functions (extract_dict_from_json and extract_list_from_json) completely untested. This creates a risk for regressions and makes it difficult to maintain code quality.

Changes 🏗️

  • Added comprehensive tests for extract_dict_from_json function (10 new test cases)

    • JSON extraction from code blocks (both json and JSON markers)
    • JSON extraction from text with embedded JSON objects
    • Plain JSON dictionary parsing
    • Nested dictionary structures
    • JSON with fixable syntax issues (extra commas, etc.)
    • Error validation for non-dict values (lists, strings, numbers)
    • Error handling for completely invalid JSON
  • Added comprehensive tests for extract_list_from_json function (10 new test cases)

    • JSON extraction from code blocks (both json and JSON markers)
    • JSON extraction from text with embedded JSON arrays
    • Plain JSON list parsing
    • Nested list structures
    • Mixed-type lists (numbers, strings, booleans, objects)
    • JSON with fixable syntax issues (trailing commas, etc.)
    • Error validation for non-list values (dicts, strings, numbers)
    • Error handling for completely invalid JSON
  • Enhanced json_loads error handling tests (3 new test cases)

    • Invalid JSON string handling
    • Empty string behavior (returns None)
    • Whitespace-only string behavior (returns None)
  • Fixed conftest.py to remove problematic VCR plugin reference that was preventing tests from running

Test Coverage Results:

  • forge/json/parsing.py: 42% → 100% coverage (all 43 lines now covered)
  • Overall project coverage: 9% → 11% improvement
  • All 52 new tests pass successfully

Checklist 📋

For code changes:

  • I have clearly listed my changes in the PR description
  • I have made a test plan
  • I have tested my changes according to the test plan:
    • Run all new JSON parsing tests to ensure they pass (pytest forge/json/test_parsing.py -v)
    • Verify 100% coverage for the JSON parsing module (pytest --cov=forge.json.parsing --cov-report=term-missing forge/json/test_parsing.py)
    • Test actual functionality with sample JSON strings to ensure the functions work correctly
    • Run existing tests to ensure no regressions (pytest forge/utils/test_file_operations.py forge/utils/test_url_validator.py)
    • Verify overall project coverage improvement

For configuration changes:

  • .env.example is updated or already compatible with my changes (no env changes needed)
  • docker-compose.yml is updated or already compatible with my changes (no docker changes needed)
  • I have included a list of my configuration changes in the PR description (no configuration changes)

@tpsaint can click here to continue refining the PR

- Add comprehensive tests for extract_dict_from_json function
- Add comprehensive tests for extract_list_from_json function
- Add additional error handling tests for json_loads function
- Achieve 100% test coverage for forge/json/parsing.py module
- Fix conftest.py to remove problematic VCR plugin reference

The new tests cover:
- JSON extraction from code blocks (both lowercase and uppercase)
- JSON extraction from text with embedded JSON
- Plain JSON parsing for dicts and lists
- Nested data structures
- JSON with fixable syntax issues
- Comprehensive error cases for type validation
- Edge cases like empty strings and whitespace
tpsaint pushed a commit that referenced this pull request Jun 26, 2026
…failure exit (Significant-Gravitas#12918)

## Background

[SECRT-2275](https://linear.app/autogpt/issue/SECRT-2275). User report:
when a copilot ("autopilot") turn is interrupted by a usage-limit,
tool-call-limit, or other run interruption, the user's recent work
disappears. User described it as: "my initial message was lost 3 times
and it disappeared, then when I would say 'continue' it would do a
random old task."

Investigation surfaced two distinct failure modes. This PR addresses
both.

- **Mode 1** — rate-limit (or other pre-stream rejection) at turn start:
the user's text only ever lives in the optimistic `useChat` bubble; the
backend rejects before the message is persisted, so the bubble is a lie
and a refresh / retry would lose the text.
- **Mode 2** — long-running turn interrupted mid-stream: the entire
turn's progress (assistant text, tool calls, reasoning) vanishes on
interruption — what users describe as "the turn is gone."

## Mode 1 — frontend: restore unsent text on 429

Backend can't recover this on its own: `check_rate_limit` raises before
`append_and_save_message`, so by the time the 429 surfaces there is no
DB row to roll forward. See
`autogpt_platform/backend/backend/api/features/chat/routes.py:916-922`
(rate-limit check) and `routes.py:945` (later append-and-save).

Frontend fix in
`autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts`:
when `useChat`'s `onError` reports a usage-limit error, we

- drop the optimistic user bubble (DB has no record of it, so leaving it
would be a phantom),
- push `lastSubmittedMsgRef.current` back into the composer via the
existing `setInitialPrompt` slot — the same slot URL pre-fills use, so
`useChatInput`'s `consumeInitialPrompt` effect picks it up
automatically,
- clear `lastSubmittedMsgRef` so the dedup guard doesn't block re-send.

In-memory only; surviving a hard refresh while rate-limited is a
separate follow-up (would need localStorage persistence with TTL).

Test:
`autogpt_platform/frontend/src/app/(platform)/copilot/__tests__/useCopilotStream.test.ts`
— verifies the composer is repopulated and the optimistic bubble is
dropped on a 429.

## Mode 2 — backend: preserve interrupted partial in DB

### Root cause

The SDK retry loop in `stream_chat_completion_sdk` always rolls back
`session.messages` to the pre-attempt watermark on any exception. That
rollback is correct **before a retry** so attempt #2 doesn't duplicate
attempt #1's content. But it runs **before the retry decision is made**,
so when retries are exhausted (or no retry is attempted) the partial
work is discarded too.

Three branches of the retry loop ended in a final-failure state with
side effects worse than just losing the partial:
- `_HandledStreamError` non-transient: rollback then add error marker —
partial gone
- `Exception` with `events_yielded > 0`: rollback then break — **no
error marker added either**, so on refresh the chat looks like nothing
happened even though the user just watched tokens stream live
- `Exception` non-context-non-transient + the while-`else:` exhaustion
path: same, no marker
- Outer except (cancellation, GeneratorExit cleanup): didn't restore
captured partial

### Fix

`autogpt_platform/backend/backend/copilot/sdk/service.py`:

1. **`_InterruptedAttempt` dataclass** — holds the rolled-back `partial:
list[ChatMessage]` + optional `handled_error: _HandledErrorInfo`. Three
methods drive the contract:
- `capture(session, transcript_builder, transcript_snap,
pre_attempt_msg_count)` — slices `session.messages`, restores the
transcript, strips trailing error markers to prevent duplicate markers
after restore.
- `clear()` — drops captured state on a successful retry so outer
cleanup paths don't replay pre-retry content.
- `finalize(session, state, display_msg, retryable=...) ->
list[StreamBaseResponse]` — re-attaches partial, synthesizes
`tool_result` rows for orphan `tool_use` blocks, appends the canonical
error marker, and returns the flushed events so the caller can yield
them to the client (no double-flush).
2. **`_flush_orphan_tool_uses_to_session(session, state) ->
list[StreamBaseResponse]`** — synthesizes `tool_result` rows for any
`tool_use` that never resolved before the error so the next turn's LLM
context stays API-valid (Anthropic rejects orphan tool_use). Uses the
public `adapter.flush_unresolved_tool_calls` and returns the events for
the caller to yield.
3. **`_classify_final_failure(...) -> _FinalFailure | None`** — picks
the display message + stream code + retryable flag for the final-failure
exit. One source of truth for the in-history error marker and the
client-facing `StreamError` SSE yield so they can't drift.
4. **Consolidated post-loop emit**: the former three scattered blocks
(partial restore + redundant re-flush + two separate `yield StreamError`
sites) collapsed to one block driven by `_classify_final_failure` →
`_FinalFailure` → `finalize()` → yield events + single `StreamError`.
5. **Adapter `flush_unresolved_tool_calls`** (renamed from
`_flush_unresolved_tool_calls` to drop the `# noqa: SLF001` suppressors
on cross-module callers).

Each retry-loop rollback site calls `interrupted.capture(...)`; the
success break calls `interrupted.clear()`; the post-loop failure block
calls `interrupted.finalize(...)` exactly once.

The baseline service already preserves partial work via its existing
finally block — no change needed there.

## Tests

Backend (`backend/copilot/sdk/interrupted_partial_test.py`, new, 18
tests):

- `TestInterruptedAttemptCapture` — slice semantics + stale-marker
stripping
- `TestInterruptedAttemptFinalize` — appends partial then marker,
handles empty partial, no-op on `None` session, flushes unresolved tools
between partial and marker, returns flushed events for caller to yield
- `TestFlushOrphanToolUses` — synthesizes `tool_result` rows, returns
events, no-op on None state / no unresolved
- `TestClassifyFinalFailure` — handled_error wins, attempts_exhausted,
transient_exhausted, stream_err fallback, returns None on success path
- `TestRetryRollbackContract` — end-to-end: capture + finalize yields
the exact content the user saw streaming live plus the error marker

1022 total SDK tests pass (baseline + new).

Frontend (`useCopilotStream.test.ts`): 1 new test — `restores the unsent
text and drops the optimistic user bubble on 429 usage-limit`.

## Out of scope

- Frontend rendering tweaks for the interrupted-turn marker (existing
error-marker rendering already works).
- Refresh-survival of the unsent text in Mode 1 (would require
localStorage persistence with TTL) — separate follow-up.
- Hard process-kill / OOM where Python `finally` doesn't run — needs a
different mechanism (pod-level checkpoint sweeper).

## Checklist

- [x] My code follows the style guidelines of this project
(black/isort/ruff via `poetry run format`)
- [x] I have performed a self-review of my own code
- [x] I have added relevant unit tests
- [x] I have run lint and tests locally (1022 SDK tests pass)

## Test plan

- [ ] Verify a long-running turn that hits transient-retry exhaustion
preserves partial assistant text + tool results in chat history after
refresh
- [ ] Verify the next user message after an interrupted turn carries
enough context that the model can continue the prior task instead of
inventing a new one
- [ ] Verify a successful retry (attempt #1 fails, attempt #2 succeeds)
shows ONLY attempt #2's content (no leaked partial from #1)
- [ ] Verify hitting daily usage limit at turn start re-populates the
composer with the unsent text and removes the optimistic user bubble

---------

Co-authored-by: Claude <noreply@anthropic.com>
tpsaint pushed a commit that referenced this pull request Jun 26, 2026
…icant-Gravitas#13030)

## Why

Egress on the `agpt-prod` and `agpt-test` Supabase projects is dominated
by a single query:

```sql
SELECT … FROM "platform"."NotificationEvent"
WHERE "userNotificationBatchId" IN ($1) OFFSET $2
```

Per `pg_stat_statements`:

| | calls | rows returned |
|---|---:|---:|
| prod | 455,803 | 281,229,277 |
| dev  | 820,787 | 347,820,981 |

It's the #1 query by row volume in both environments. The shape (`IN
($1)` with a single bind param) is Prisma materializing an eager
`include={"Notifications": True}` — one IN-with-one-id per batch, run
per polling iteration.

The email batcher in `notifications.py::_process_existing_batches` polls
open batches per `NotificationType`, and:

1. `get_all_batches_by_type` returns every open batch with all events
eager-loaded — but the caller only reads `batch.user_id`.
2. `get_user_notification_oldest_message_in_batch` fetches the batch +
all events and sorts them in Python just to grab `[0]`.

So every poll iteration drags the full event payload back even when
nothing is aged out and nothing will be sent. The existing fix in
`create_or_add_to_user_notification_batch` already documents this exact
failure mode for heavy `AGENT_RUN` users (see
[notifications.py:467-472](https://github.com/Significant-Gravitas/AutoGPT/blob/dev/autogpt_platform/backend/backend/data/notifications.py#L467-L472)),
but the read paths still had it.

## What

- `get_user_notification_oldest_message_in_batch` queries
`NotificationEvent` directly with `order={"createdAt": "asc"}` and
returns the single oldest row.
- `get_all_batches_by_type` drops the `include={"Notifications": True}`;
the caller re-fetches events per batch via `get_user_notification_batch`
only when an email actually needs to send.

## How

Both changes are equivalent at the call-site level:

- `_process_existing_batches` only reads `batch.user_id` from the list
result and `oldest_message.created_at` from the oldest-event result —
both still satisfied.
- `_should_email_user_based_on_aged_batch` (the other caller of the
oldest-message query) only reads `oldest_message.created_at` — also
satisfied.
- `UserNotificationBatchDTO.from_db` already tolerates
`model.Notifications` being `None` (`for n in model.Notifications or
[]`), so the dropped include is safe.

Three integration tests added in `notifications_test.py` covering the
new contract: oldest event returned by `createdAt`, `None` on empty
batch, and `get_all_batches_by_type` not loading events.

## Checklist

- [x] My code follows the project's style guidelines
- [x] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
(n/a — minimal change, behaviour is obvious)
- [x] I have made corresponding changes to the documentation (n/a)
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective
- [x] New and existing unit tests pass locally with my changes
(lint/format clean; integration tests run in CI against the test DB)
- [x] Any dependent changes have been merged and published in downstream
modules
tpsaint pushed a commit that referenced this pull request Jun 26, 2026
…ight caps (Significant-Gravitas#13069)

## Why

We hard-capped concurrent AutoPilot turns at 15 per user as a hotfix,
which rejected the 16th request with HTTP 429 — blunt UX, easy to hit by
accident. This PR keeps the safeguard but introduces a **soft running
cap of 5** with a **FIFO queue up to 15 in-flight (running + queued)**.
The user can submit beyond 5; the dispatcher auto-promotes queued
sessions as running slots free.

## What

- **Soft cap: 5 running tasks per user** (configurable:
`max_running_copilot_turns_per_user`).
- **Hard cap: 15 in-flight (running + queued) per user** (existing
setting, semantics shifted from "concurrent" to "in-flight").
- **FIFO queue at the SESSION level.** `ChatSession.chatStatus` is a
single text enum: `idle` (default) | `queued` | `running` (open enum).
The user's pending message is just a normal `ChatMessage` row; the
session carries the lifecycle.
- **Cancel:** the existing `POST /sessions/{session_id}/cancel` handles
both states uniformly. Queued sessions flip back to `idle` (no executor
cancel needed); running sessions publish a RabbitMQ cancel event as
before. No dedicated `/queued-tasks/*` endpoints.
- **Frontend:** Queued-state badge on the latest user message of any
session whose `chat_status === 'queued'`. Sidebar shows a green pulsing
dot for `running` and a purple hourglass for `queued` so the user can
see at a glance which of their chats are in flight. Cancel button on the
badge calls the same session-cancel endpoint.

## How

| Layer | Storage | Purpose |
|---|---|---|
| Session lifecycle | `ChatSession.chatStatus` (this PR) | Soft cap
(count `running`) + queue (count `queued`), single source of truth |
| Dispatcher payload | `ChatMessage.metadata` JSONB (this PR) |
Submit-time `file_ids` / `mode` / `model` / `permissions` / `context` /
`request_arrival_at` stashed on the queued user row so promotion can
replay the turn faithfully |
| Execution | RabbitMQ (existing) | Worker pool, retries |

### Queue lifecycle — where it lives, who clears it

**Storage.** The queue lives entirely in Postgres (no Redis sorted set,
no Lua). Per session: a single `chatStatus` text column (`idle | queued
| running`, open enum), plus the user's pending message persisted as a
normal `ChatMessage` row with the dispatcher's submit-time payload
stashed in `metadata` JSONB.

**Submission path** (`backend/copilot/turn_queue.py:enqueue_turn`):
1. HTTP route tries immediate dispatch via `acquire_turn_slot`. If user
is at the 5-running cap, `ConcurrentTurnLimitError` is raised.
2. Route catches the error and falls through to `try_enqueue_turn`
which: checks the 15 in-flight hard cap, persists the user message +
metadata, then CAS-flips `ChatSession.chatStatus` `idle` → `queued`.
Returns an empty-stream response so the SSE client knows the message
landed.

**Promotion path**
(`backend/copilot/turn_queue.py:dispatch_next_for_user`) — **fires
per-user, not globally**:
1. **Trigger #1:** `mark_session_completed` (in `stream_registry.py`) —
after every turn completes (success / failure / cancel), the session's
`user_id` is read from Redis meta and `dispatch_next_for_user(user_id)`
is invoked. The dispatcher errors are swallowed with a loud log so a
queue hiccup never breaks the completion path.
2. **Trigger #2:** periodic timer (covers missed dispatch events / GKE
rolling restarts).
3. Picks the user's oldest queued session via
`list_chat_sessions_by_status(user_id, queued)` ordered by `updatedAt`.
4. Re-validates paywall + per-window USD cap — paywalled / rate-limited
sessions stay `queued` for the next tick.
5. CAS-claims `queued` → `running`. Recovers the user message +
submit-time `metadata`, builds a `TurnSlot`, calls `dispatch_turn`. On
any error during dispatch, rolls the session back to `queued`.

**Cancellation** (`POST /sessions/{id}/cancel`): single endpoint handles
both lifecycle states — queued sessions flip back to `idle` (no executor
cancel needed); running sessions publish a RabbitMQ cancel event as
before.

The cap and queue queries are both `count` / `find_many` on
`ChatSession` by `chatStatus`. Both running-turn tracking and queue
admission are non-locked CAS-then-count — same TOCTOU tolerance the
graph-execution credit rate-limit accepts on its INCRBY path. Going
briefly to 16/17 in-flight under burst is acceptable; the cap is a
safeguard, not a budget.

The DB-manager surface is **4 generic ChatSession primitives**
(`count_chat_sessions_by_status`, `list_chat_sessions_by_status`,
`update_chat_session_status` with optional `expect_status` CAS gate,
`get_chat_session_status`) plus the existing `add_chat_message` extended
with optional `message_id` + `metadata`. Adding a new lifecycle state is
a code-only change at call sites. DB access goes through
`backend.data.db_accessors.chat_db()` so the dispatcher works from both
the HTTP server (Prisma directly) and the CoPilotExecutor subprocess
(RPC via `DatabaseManager`).

Route gates (`is_turn_in_flight`, `acquire_turn_slot`) treat both
`queued` and `running` as "in flight", so a resubmit to a queued session
lands in the pending buffer or falls through to the cross-session queue
rather than racing the dispatcher.

## Test plan

- [x] Backend unit tests — `active_turns_test.py`, `turn_queue_test.py`,
`db_test.py` (60+ tests covering admission, refresh, release,
queued-collision, cap-rollback, dispatcher branches:
paywall/rate-limited stays queued, rate-limit unavailable, happy path,
dispatch failure → restore).
- [x] **New integration tests** in `stream_registry_test.py`: pin the
per-user slot-free dispatcher invocation on `mark_session_completed`,
plus the error-swallowing behaviour so a queue hiccup never breaks the
turn completion path.
- [x] Migration scoped to additive nullable column + an additional
non-partial index (cheap on Postgres, no table rewrite, no backfill
needed).
- [x] Frontend integration tests (Vitest): queued-state badge, cancel
button, error paths (404 silent, 5xx destructive toast, network error),
sidebar running/queued/idle indicators.
- [x] Live UI proof inline on the PR ([scroll
down](Significant-Gravitas#13069 (comment))):
Queued badge, pill tooltip, cancel-hover red state, after-cancel hidden
— all rendered by the production component against a real backend
running this branch. Sidebar running-dot + queued-hourglass also
screenshotted.
- [x] CI: lint, types, integration_test, end-to-end tests.

### Stuck-running bugs found and fixed in this PR

Audit during review surfaced multiple "DB says running, executor isn't
actually running it" failure modes. Every one fixed in this PR; none
deferred.

| # | Scenario | Symptom | Fix |
|---|---|---|---|
| 1 | Executor crashes mid-turn, Redis meta TTLs out, DB `chatStatus`
stays `running` | Sidebar green dot forever; new submits buffer
indefinitely | `cancel_session_task` now calls `release_turn_slot` on
the no-active-session branch — the user clicking Cancel always clears
the orphan. `get_session` route also resets when DB says `running` but
Redis is empty, so just opening the stuck chat clears it. |
| 2 | `acquire_turn_slot` flips DB `idle → running` but the request
aborts before `dispatch_turn` runs | Same orphan, no Redis meta written
| Same fixes as #1 catch this on the next cancel/open. The new periodic
sweep below covers the no-interaction case. |
| 3 | `dispatch_turn` succeeds at `create_session` (Redis meta written)
but fails at the RabbitMQ enqueue | Redis hash sits at
`status='running'` until TTL; `is_turn_in_flight` keeps reporting
in-flight | `dispatch_turn` now uses `try/finally` on a `committed` flag
— Redis meta is deleted on ANY non-happy-path exit including
`CancelledError` (which `except Exception` would miss). Cleanup runs
inside `dispatch_turn` itself so both the HTTP `schedule_chat_turn` path
and the queue dispatcher path are covered. |
| 4 | `mark_session_completed` itself fails mid-completion (Redis blip
during the CAS) | DB never flips back to `idle` | Same chat-open /
cancel reset fixes handle this. No proactive sweep was viable here
because we can't distinguish "legit long-running tool call" from "stuck"
without a Redis side check. |
| 5 | Dispatcher rollback used `except Exception`, leaking on
`CancelledError` | Task cancellation during dispatch could leave DB
stuck at `running` | Switched to `except BaseException` in
`dispatch_next_for_user` so cancellation still rolls back the DB claim.
Pairs with the `try/finally` in `dispatch_turn` for full Redis+DB
symmetry. |
| 6 | `chatStatus` was an open `TEXT` column with no DB-level validation
| A typo like `"runnin"` could persist and break the cap-count |
Converted to a Postgres enum `ChatSessionStatus` (`idle | queued |
running`) — invalid values now rejected at the DB layer. Future states
need a tiny `ALTER TYPE ADD VALUE` migration (cheap on PG 12+). |

**Why no periodic sweep.** The reactive cleanups (rows 1 and 2) catch
every user-visible path: clicking Cancel and opening the chat both reset
the orphan. The existing 6h+5min inline stale-CAS in
`get_active_session` remains the ultimate backstop. Adding a per-pod
APScheduler job would chase the narrow "user has stuck sessions but
never opens them and never cancels" case at the cost of constant
operational surface — not worth it for this rare scenario.

All paths that mutate `chatStatus` now also clean the matching Redis
state (either inline or via the sweep) — the two layers stay in sync.

### Cap + queue safety against double-promotion (carry-over from earlier
review)

- `mark_session_completed` does `release_turn_slot` (running → idle)
BEFORE `dispatch_next_for_user`. Two concurrent completions release two
slots first, then race the CAS on the same head — only one wins per
slot, so at most N promotions for N releases. Cap holds.
- `claim_queued_session` is an atomic Postgres `UPDATE … WHERE
chatStatus='queued' AND id=…`: keyed to the specific head row, only one
CAS matches.
- All db.py functions exposed on `DatabaseManager` return primitives or
DTOs (`ChatSessionInfo`, `ChatMessage`) — no raw Prisma rows cross the
RPC boundary, so the executor subprocess can safely route through
`chat_db()`.
tpsaint pushed a commit that referenced this pull request Jun 26, 2026
…avitas#13370)

### Why / What / How

**Why:** A TAC Security CASA assessment (ESOF AppSec ADA) of
`platform.agpt.co` flagged several missing/leaky HTTP response headers.
None are critical (overall ESOF score 9.1/10, "Low"), but all must be
remediated to pass the CASA Tier 2 & Tier 3 assessments. Verified live
today: prod sends `X-Powered-By: Next.js` and is missing both
`X-Frame-Options` and `X-Content-Type-Options`.

**What:** Adds baseline security response headers in the frontend
Next.js config and disables the framework-fingerprinting header.

**How:** Defines the headers in `nextConfig.headers()` (applied to
`/:path*`) and sets `poweredByHeader: false`. The existing
`Document-Policy: js-profiling` header was previously nested inside the
Sentry plugin options (2nd arg to `withSentryConfig`), where it does not
apply on the non-Sentry build path — it's moved into the Next config so
every build path emits the full header set.

### Changes 🏗️

- `autogpt_platform/frontend/next.config.mjs`:
- `poweredByHeader: false` → removes `X-Powered-By: Next.js` (finding
#5; partially addresses proxy disclosure #3)
  - `X-Frame-Options: SAMEORIGIN` → anti-clickjacking (finding #2)
  - `X-Content-Type-Options: nosniff` → anti-MIME-sniffing (finding #4)
- Moved `Document-Policy: js-profiling` from the Sentry options into
`nextConfig.headers()` so it (and the new headers) apply on all build
paths

**Not addressed here (platform-level / separate):**
- CORS `Access-Control-Allow-Origin: *` on `/_next/static/media/*` fonts
(#1) — injected by Vercel for font assets; overriding risks breaking
Next's `crossorigin` font preloads. Needs a separate, deploy-verified
change.
- `Server: Vercel` header and TRACE/TRACK methods (#3) — injected at the
Vercel edge, no in-repo lever.
- Info-only items #6/#7/#8/#9 are scanner observations or already
handled (backend cache-protection middleware sends `no-store` on
sensitive routes).

`SAMEORIGIN` was chosen over `DENY` so same-origin embedding
(share/preview flows) keeps working; easy to tighten if the app is never
framed.

### Checklist 📋

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
(no env changes)
- [x] `docker-compose.yml` is updated or already compatible with my
changes (no compose changes)
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

#### Test plan:
- [x] `pnpm format`, `pnpm lint`, `pnpm types` pass locally
- [ ] After deploy, verify response headers on `platform.agpt.co`:
  - [ ] `X-Frame-Options: SAMEORIGIN` present
  - [ ] `X-Content-Type-Options: nosniff` present
  - [ ] `X-Powered-By` absent
- [ ] `Document-Policy: js-profiling` still present (Sentry profiling
unaffected)
- [ ] App still renders and fonts load (no clickjacking/embedding
regressions)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Config-only HTTP header changes with no auth or data-path changes;
`SAMEORIGIN` framing is a deliberate tradeoff that should be
smoke-tested after deploy for embed/preview and font loading.
> 
> **Overview**
> Addresses CASA assessment gaps on the Next.js frontend by
**disabling** `X-Powered-By: Next.js` via `poweredByHeader: false` and
**adding** global response headers on `/:path*` through
`nextConfig.headers()`: `X-Content-Type-Options: nosniff`,
`X-Frame-Options: SAMEORIGIN`, and `Document-Policy: js-profiling`.
> 
> **`Document-Policy`** (and the new security headers) were previously
defined only inside the optional `withSentryConfig` wrapper, so builds
that skip the Sentry webpack plugin never emitted them. They now live on
the base Next config so every build path gets the same header set.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
2c6931e. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tpsaint pushed a commit that referenced this pull request Jun 26, 2026
…Gravitas#13371)

> **Draft — needs a post-deploy header check before merge (see
Testing).**

### Why / What / How

**Why:** The TAC Security CASA (ESOF AppSec ADA) assessment of
`platform.agpt.co` flags an "overly permissive CORS policy" (finding #1,
CWE-264): self-hosted Next.js fonts under `/_next/static/media/*`
respond with `Access-Control-Allow-Origin: *`. All CASA findings must be
cleared for the Tier 2 & Tier 3 assessments (currently re-expiring).

**What:** Scope the `Access-Control-Allow-Origin` header on the font
path from `*` to our own origin.

**How:** Adds a `nextConfig.headers()` rule for
`/_next/static/media/:path*` setting `Access-Control-Allow-Origin:
https://platform.agpt.co`. These fonts are only loaded by our own
same-origin pages, so a same-origin fetch is unaffected by the header
value — this should not change how fonts render.

### Changes 🏗️

- `autogpt_platform/frontend/next.config.mjs`: add `headers()` rule
restricting font CORS to our origin.

### ⚠️ Reviewer note — must verify after deploy

Vercel's "Ignored Build Step" cancels the preview build for this PR, so
the header could **not** be verified pre-merge. The header on
`/_next/static/media/*` is currently emitted by the Vercel/Next.js
static layer; this config override may or may not replace it. **Please
confirm on a real deploy (dev/staging) before merging:**

```bash
curl -sI "https://<deploy-url>/_next/static/media/<any>.woff2" | grep -i access-control-allow-origin
# want: a single "access-control-allow-origin: https://platform.agpt.co"
# NOT:  "*", and NOT two ACAO headers (duplicate = the override didn't replace it)
```

If the override does **not** take (still `*` or duplicated), the
fallback is a Vercel project-level header rule or accepting it with a
CASA justification (fonts are non-sensitive, same-origin only). Failure
mode is safe either way: fonts load the same regardless, since they're
fetched same-origin.

This overlaps `next.config.mjs` with Significant-Gravitas#13370 (baseline security headers);
whichever merges second will need a trivial `headers()` merge.

### Checklist 📋

#### For configuration changes:
- [x] `.env.default` / `docker-compose.yml` unaffected (no env/compose
changes)
- [x] Configuration changes listed above under **Changes**
- [x] `pnpm format` / `pnpm types` pass locally
- [ ] **Post-deploy:** font `Access-Control-Allow-Origin` is the origin
(not `*`, not duplicated)
- [ ] **Post-deploy:** app renders and fonts load normally

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Swifty <craigswift13@gmail.com>
Co-authored-by: Ubbe <hi@ubbe.dev>
@github-actions

Copy link
Copy Markdown

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants