test: improve JSON parsing module test coverage by tpsaint · Pull Request #1 · tpsaint/AutoGPT

tpsaint · 2025-07-20T05:32:33Z

This PR significantly improves test coverage for the JSON parsing module in the AutoGPT Forge framework. The forge/json/parsing.py module had only 42% test coverage, with two important functions (extract_dict_from_json and extract_list_from_json) completely untested. This creates a risk for regressions and makes it difficult to maintain code quality.

Changes 🏗️

Added comprehensive tests for extract_dict_from_json function (10 new test cases)
- JSON extraction from code blocks (both json and JSON markers)
- JSON extraction from text with embedded JSON objects
- Plain JSON dictionary parsing
- Nested dictionary structures
- JSON with fixable syntax issues (extra commas, etc.)
- Error validation for non-dict values (lists, strings, numbers)
- Error handling for completely invalid JSON
Added comprehensive tests for extract_list_from_json function (10 new test cases)
- JSON extraction from code blocks (both json and JSON markers)
- JSON extraction from text with embedded JSON arrays
- Plain JSON list parsing
- Nested list structures
- Mixed-type lists (numbers, strings, booleans, objects)
- JSON with fixable syntax issues (trailing commas, etc.)
- Error validation for non-list values (dicts, strings, numbers)
- Error handling for completely invalid JSON
Enhanced json_loads error handling tests (3 new test cases)
- Invalid JSON string handling
- Empty string behavior (returns None)
- Whitespace-only string behavior (returns None)
Fixed conftest.py to remove problematic VCR plugin reference that was preventing tests from running

Test Coverage Results:

forge/json/parsing.py: 42% → 100% coverage (all 43 lines now covered)
Overall project coverage: 9% → 11% improvement
All 52 new tests pass successfully

Checklist 📋

For code changes:

I have clearly listed my changes in the PR description
I have made a test plan
I have tested my changes according to the test plan:
- Run all new JSON parsing tests to ensure they pass (pytest forge/json/test_parsing.py -v)
- Verify 100% coverage for the JSON parsing module (pytest --cov=forge.json.parsing --cov-report=term-missing forge/json/test_parsing.py)
- Test actual functionality with sample JSON strings to ensure the functions work correctly
- Run existing tests to ensure no regressions (pytest forge/utils/test_file_operations.py forge/utils/test_url_validator.py)
- Verify overall project coverage improvement

For configuration changes:

.env.example is updated or already compatible with my changes (no env changes needed)
docker-compose.yml is updated or already compatible with my changes (no docker changes needed)
I have included a list of my configuration changes in the PR description (no configuration changes)

@tpsaint can click here to continue refining the PR

- Add comprehensive tests for extract_dict_from_json function - Add comprehensive tests for extract_list_from_json function - Add additional error handling tests for json_loads function - Achieve 100% test coverage for forge/json/parsing.py module - Fix conftest.py to remove problematic VCR plugin reference The new tests cover: - JSON extraction from code blocks (both lowercase and uppercase) - JSON extraction from text with embedded JSON - Plain JSON parsing for dicts and lists - Nested data structures - JSON with fixable syntax issues - Comprehensive error cases for type validation - Edge cases like empty strings and whitespace

…failure exit (Significant-Gravitas#12918) ## Background [SECRT-2275](https://linear.app/autogpt/issue/SECRT-2275). User report: when a copilot ("autopilot") turn is interrupted by a usage-limit, tool-call-limit, or other run interruption, the user's recent work disappears. User described it as: "my initial message was lost 3 times and it disappeared, then when I would say 'continue' it would do a random old task." Investigation surfaced two distinct failure modes. This PR addresses both. - **Mode 1** — rate-limit (or other pre-stream rejection) at turn start: the user's text only ever lives in the optimistic `useChat` bubble; the backend rejects before the message is persisted, so the bubble is a lie and a refresh / retry would lose the text. - **Mode 2** — long-running turn interrupted mid-stream: the entire turn's progress (assistant text, tool calls, reasoning) vanishes on interruption — what users describe as "the turn is gone." ## Mode 1 — frontend: restore unsent text on 429 Backend can't recover this on its own: `check_rate_limit` raises before `append_and_save_message`, so by the time the 429 surfaces there is no DB row to roll forward. See `autogpt_platform/backend/backend/api/features/chat/routes.py:916-922` (rate-limit check) and `routes.py:945` (later append-and-save). Frontend fix in `autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts`: when `useChat`'s `onError` reports a usage-limit error, we - drop the optimistic user bubble (DB has no record of it, so leaving it would be a phantom), - push `lastSubmittedMsgRef.current` back into the composer via the existing `setInitialPrompt` slot — the same slot URL pre-fills use, so `useChatInput`'s `consumeInitialPrompt` effect picks it up automatically, - clear `lastSubmittedMsgRef` so the dedup guard doesn't block re-send. In-memory only; surviving a hard refresh while rate-limited is a separate follow-up (would need localStorage persistence with TTL). Test: `autogpt_platform/frontend/src/app/(platform)/copilot/__tests__/useCopilotStream.test.ts` — verifies the composer is repopulated and the optimistic bubble is dropped on a 429. ## Mode 2 — backend: preserve interrupted partial in DB ### Root cause The SDK retry loop in `stream_chat_completion_sdk` always rolls back `session.messages` to the pre-attempt watermark on any exception. That rollback is correct **before a retry** so attempt #2 doesn't duplicate attempt #1's content. But it runs **before the retry decision is made**, so when retries are exhausted (or no retry is attempted) the partial work is discarded too. Three branches of the retry loop ended in a final-failure state with side effects worse than just losing the partial: - `_HandledStreamError` non-transient: rollback then add error marker — partial gone - `Exception` with `events_yielded > 0`: rollback then break — **no error marker added either**, so on refresh the chat looks like nothing happened even though the user just watched tokens stream live - `Exception` non-context-non-transient + the while-`else:` exhaustion path: same, no marker - Outer except (cancellation, GeneratorExit cleanup): didn't restore captured partial ### Fix `autogpt_platform/backend/backend/copilot/sdk/service.py`: 1. **`_InterruptedAttempt` dataclass** — holds the rolled-back `partial: list[ChatMessage]` + optional `handled_error: _HandledErrorInfo`. Three methods drive the contract: - `capture(session, transcript_builder, transcript_snap, pre_attempt_msg_count)` — slices `session.messages`, restores the transcript, strips trailing error markers to prevent duplicate markers after restore. - `clear()` — drops captured state on a successful retry so outer cleanup paths don't replay pre-retry content. - `finalize(session, state, display_msg, retryable=...) -> list[StreamBaseResponse]` — re-attaches partial, synthesizes `tool_result` rows for orphan `tool_use` blocks, appends the canonical error marker, and returns the flushed events so the caller can yield them to the client (no double-flush). 2. **`_flush_orphan_tool_uses_to_session(session, state) -> list[StreamBaseResponse]`** — synthesizes `tool_result` rows for any `tool_use` that never resolved before the error so the next turn's LLM context stays API-valid (Anthropic rejects orphan tool_use). Uses the public `adapter.flush_unresolved_tool_calls` and returns the events for the caller to yield. 3. **`_classify_final_failure(...) -> _FinalFailure | None`** — picks the display message + stream code + retryable flag for the final-failure exit. One source of truth for the in-history error marker and the client-facing `StreamError` SSE yield so they can't drift. 4. **Consolidated post-loop emit**: the former three scattered blocks (partial restore + redundant re-flush + two separate `yield StreamError` sites) collapsed to one block driven by `_classify_final_failure` → `_FinalFailure` → `finalize()` → yield events + single `StreamError`. 5. **Adapter `flush_unresolved_tool_calls`** (renamed from `_flush_unresolved_tool_calls` to drop the `# noqa: SLF001` suppressors on cross-module callers). Each retry-loop rollback site calls `interrupted.capture(...)`; the success break calls `interrupted.clear()`; the post-loop failure block calls `interrupted.finalize(...)` exactly once. The baseline service already preserves partial work via its existing finally block — no change needed there. ## Tests Backend (`backend/copilot/sdk/interrupted_partial_test.py`, new, 18 tests): - `TestInterruptedAttemptCapture` — slice semantics + stale-marker stripping - `TestInterruptedAttemptFinalize` — appends partial then marker, handles empty partial, no-op on `None` session, flushes unresolved tools between partial and marker, returns flushed events for caller to yield - `TestFlushOrphanToolUses` — synthesizes `tool_result` rows, returns events, no-op on None state / no unresolved - `TestClassifyFinalFailure` — handled_error wins, attempts_exhausted, transient_exhausted, stream_err fallback, returns None on success path - `TestRetryRollbackContract` — end-to-end: capture + finalize yields the exact content the user saw streaming live plus the error marker 1022 total SDK tests pass (baseline + new). Frontend (`useCopilotStream.test.ts`): 1 new test — `restores the unsent text and drops the optimistic user bubble on 429 usage-limit`. ## Out of scope - Frontend rendering tweaks for the interrupted-turn marker (existing error-marker rendering already works). - Refresh-survival of the unsent text in Mode 1 (would require localStorage persistence with TTL) — separate follow-up. - Hard process-kill / OOM where Python `finally` doesn't run — needs a different mechanism (pod-level checkpoint sweeper). ## Checklist - [x] My code follows the style guidelines of this project (black/isort/ruff via `poetry run format`) - [x] I have performed a self-review of my own code - [x] I have added relevant unit tests - [x] I have run lint and tests locally (1022 SDK tests pass) ## Test plan - [ ] Verify a long-running turn that hits transient-retry exhaustion preserves partial assistant text + tool results in chat history after refresh - [ ] Verify the next user message after an interrupted turn carries enough context that the model can continue the prior task instead of inventing a new one - [ ] Verify a successful retry (attempt #1 fails, attempt #2 succeeds) shows ONLY attempt #2's content (no leaked partial from #1) - [ ] Verify hitting daily usage limit at turn start re-populates the composer with the unsent text and removes the optimistic user bubble --------- Co-authored-by: Claude <noreply@anthropic.com>

…icant-Gravitas#13030) ## Why Egress on the `agpt-prod` and `agpt-test` Supabase projects is dominated by a single query: ```sql SELECT … FROM "platform"."NotificationEvent" WHERE "userNotificationBatchId" IN ($1) OFFSET $2 ``` Per `pg_stat_statements`: | | calls | rows returned | |---|---:|---:| | prod | 455,803 | 281,229,277 | | dev | 820,787 | 347,820,981 | It's the #1 query by row volume in both environments. The shape (`IN ($1)` with a single bind param) is Prisma materializing an eager `include={"Notifications": True}` — one IN-with-one-id per batch, run per polling iteration. The email batcher in `notifications.py::_process_existing_batches` polls open batches per `NotificationType`, and: 1. `get_all_batches_by_type` returns every open batch with all events eager-loaded — but the caller only reads `batch.user_id`. 2. `get_user_notification_oldest_message_in_batch` fetches the batch + all events and sorts them in Python just to grab `[0]`. So every poll iteration drags the full event payload back even when nothing is aged out and nothing will be sent. The existing fix in `create_or_add_to_user_notification_batch` already documents this exact failure mode for heavy `AGENT_RUN` users (see [notifications.py:467-472](https://github.com/Significant-Gravitas/AutoGPT/blob/dev/autogpt_platform/backend/backend/data/notifications.py#L467-L472)), but the read paths still had it. ## What - `get_user_notification_oldest_message_in_batch` queries `NotificationEvent` directly with `order={"createdAt": "asc"}` and returns the single oldest row. - `get_all_batches_by_type` drops the `include={"Notifications": True}`; the caller re-fetches events per batch via `get_user_notification_batch` only when an email actually needs to send. ## How Both changes are equivalent at the call-site level: - `_process_existing_batches` only reads `batch.user_id` from the list result and `oldest_message.created_at` from the oldest-event result — both still satisfied. - `_should_email_user_based_on_aged_batch` (the other caller of the oldest-message query) only reads `oldest_message.created_at` — also satisfied. - `UserNotificationBatchDTO.from_db` already tolerates `model.Notifications` being `None` (`for n in model.Notifications or []`), so the dropped include is safe. Three integration tests added in `notifications_test.py` covering the new contract: oldest event returned by `createdAt`, `None` on empty batch, and `get_all_batches_by_type` not loading events. ## Checklist - [x] My code follows the project's style guidelines - [x] I have performed a self-review of my own code - [ ] I have commented my code, particularly in hard-to-understand areas (n/a — minimal change, behaviour is obvious) - [x] I have made corresponding changes to the documentation (n/a) - [x] My changes generate no new warnings - [x] I have added tests that prove my fix is effective - [x] New and existing unit tests pass locally with my changes (lint/format clean; integration tests run in CI against the test DB) - [x] Any dependent changes have been merged and published in downstream modules

…ight caps (Significant-Gravitas#13069) ## Why We hard-capped concurrent AutoPilot turns at 15 per user as a hotfix, which rejected the 16th request with HTTP 429 — blunt UX, easy to hit by accident. This PR keeps the safeguard but introduces a **soft running cap of 5** with a **FIFO queue up to 15 in-flight (running + queued)**. The user can submit beyond 5; the dispatcher auto-promotes queued sessions as running slots free. ## What - **Soft cap: 5 running tasks per user** (configurable: `max_running_copilot_turns_per_user`). - **Hard cap: 15 in-flight (running + queued) per user** (existing setting, semantics shifted from "concurrent" to "in-flight"). - **FIFO queue at the SESSION level.** `ChatSession.chatStatus` is a single text enum: `idle` (default) | `queued` | `running` (open enum). The user's pending message is just a normal `ChatMessage` row; the session carries the lifecycle. - **Cancel:** the existing `POST /sessions/{session_id}/cancel` handles both states uniformly. Queued sessions flip back to `idle` (no executor cancel needed); running sessions publish a RabbitMQ cancel event as before. No dedicated `/queued-tasks/*` endpoints. - **Frontend:** Queued-state badge on the latest user message of any session whose `chat_status === 'queued'`. Sidebar shows a green pulsing dot for `running` and a purple hourglass for `queued` so the user can see at a glance which of their chats are in flight. Cancel button on the badge calls the same session-cancel endpoint. ## How | Layer | Storage | Purpose | |---|---|---| | Session lifecycle | `ChatSession.chatStatus` (this PR) | Soft cap (count `running`) + queue (count `queued`), single source of truth | | Dispatcher payload | `ChatMessage.metadata` JSONB (this PR) | Submit-time `file_ids` / `mode` / `model` / `permissions` / `context` / `request_arrival_at` stashed on the queued user row so promotion can replay the turn faithfully | | Execution | RabbitMQ (existing) | Worker pool, retries | ### Queue lifecycle — where it lives, who clears it **Storage.** The queue lives entirely in Postgres (no Redis sorted set, no Lua). Per session: a single `chatStatus` text column (`idle | queued | running`, open enum), plus the user's pending message persisted as a normal `ChatMessage` row with the dispatcher's submit-time payload stashed in `metadata` JSONB. **Submission path** (`backend/copilot/turn_queue.py:enqueue_turn`): 1. HTTP route tries immediate dispatch via `acquire_turn_slot`. If user is at the 5-running cap, `ConcurrentTurnLimitError` is raised. 2. Route catches the error and falls through to `try_enqueue_turn` which: checks the 15 in-flight hard cap, persists the user message + metadata, then CAS-flips `ChatSession.chatStatus` `idle` → `queued`. Returns an empty-stream response so the SSE client knows the message landed. **Promotion path** (`backend/copilot/turn_queue.py:dispatch_next_for_user`) — **fires per-user, not globally**: 1. **Trigger #1:** `mark_session_completed` (in `stream_registry.py`) — after every turn completes (success / failure / cancel), the session's `user_id` is read from Redis meta and `dispatch_next_for_user(user_id)` is invoked. The dispatcher errors are swallowed with a loud log so a queue hiccup never breaks the completion path. 2. **Trigger #2:** periodic timer (covers missed dispatch events / GKE rolling restarts). 3. Picks the user's oldest queued session via `list_chat_sessions_by_status(user_id, queued)` ordered by `updatedAt`. 4. Re-validates paywall + per-window USD cap — paywalled / rate-limited sessions stay `queued` for the next tick. 5. CAS-claims `queued` → `running`. Recovers the user message + submit-time `metadata`, builds a `TurnSlot`, calls `dispatch_turn`. On any error during dispatch, rolls the session back to `queued`. **Cancellation** (`POST /sessions/{id}/cancel`): single endpoint handles both lifecycle states — queued sessions flip back to `idle` (no executor cancel needed); running sessions publish a RabbitMQ cancel event as before. The cap and queue queries are both `count` / `find_many` on `ChatSession` by `chatStatus`. Both running-turn tracking and queue admission are non-locked CAS-then-count — same TOCTOU tolerance the graph-execution credit rate-limit accepts on its INCRBY path. Going briefly to 16/17 in-flight under burst is acceptable; the cap is a safeguard, not a budget. The DB-manager surface is **4 generic ChatSession primitives** (`count_chat_sessions_by_status`, `list_chat_sessions_by_status`, `update_chat_session_status` with optional `expect_status` CAS gate, `get_chat_session_status`) plus the existing `add_chat_message` extended with optional `message_id` + `metadata`. Adding a new lifecycle state is a code-only change at call sites. DB access goes through `backend.data.db_accessors.chat_db()` so the dispatcher works from both the HTTP server (Prisma directly) and the CoPilotExecutor subprocess (RPC via `DatabaseManager`). Route gates (`is_turn_in_flight`, `acquire_turn_slot`) treat both `queued` and `running` as "in flight", so a resubmit to a queued session lands in the pending buffer or falls through to the cross-session queue rather than racing the dispatcher. ## Test plan - [x] Backend unit tests — `active_turns_test.py`, `turn_queue_test.py`, `db_test.py` (60+ tests covering admission, refresh, release, queued-collision, cap-rollback, dispatcher branches: paywall/rate-limited stays queued, rate-limit unavailable, happy path, dispatch failure → restore). - [x] **New integration tests** in `stream_registry_test.py`: pin the per-user slot-free dispatcher invocation on `mark_session_completed`, plus the error-swallowing behaviour so a queue hiccup never breaks the turn completion path. - [x] Migration scoped to additive nullable column + an additional non-partial index (cheap on Postgres, no table rewrite, no backfill needed). - [x] Frontend integration tests (Vitest): queued-state badge, cancel button, error paths (404 silent, 5xx destructive toast, network error), sidebar running/queued/idle indicators. - [x] Live UI proof inline on the PR ([scroll down](Significant-Gravitas#13069 (comment))): Queued badge, pill tooltip, cancel-hover red state, after-cancel hidden — all rendered by the production component against a real backend running this branch. Sidebar running-dot + queued-hourglass also screenshotted. - [x] CI: lint, types, integration_test, end-to-end tests. ### Stuck-running bugs found and fixed in this PR Audit during review surfaced multiple "DB says running, executor isn't actually running it" failure modes. Every one fixed in this PR; none deferred. | # | Scenario | Symptom | Fix | |---|---|---|---| | 1 | Executor crashes mid-turn, Redis meta TTLs out, DB `chatStatus` stays `running` | Sidebar green dot forever; new submits buffer indefinitely | `cancel_session_task` now calls `release_turn_slot` on the no-active-session branch — the user clicking Cancel always clears the orphan. `get_session` route also resets when DB says `running` but Redis is empty, so just opening the stuck chat clears it. | | 2 | `acquire_turn_slot` flips DB `idle → running` but the request aborts before `dispatch_turn` runs | Same orphan, no Redis meta written | Same fixes as #1 catch this on the next cancel/open. The new periodic sweep below covers the no-interaction case. | | 3 | `dispatch_turn` succeeds at `create_session` (Redis meta written) but fails at the RabbitMQ enqueue | Redis hash sits at `status='running'` until TTL; `is_turn_in_flight` keeps reporting in-flight | `dispatch_turn` now uses `try/finally` on a `committed` flag — Redis meta is deleted on ANY non-happy-path exit including `CancelledError` (which `except Exception` would miss). Cleanup runs inside `dispatch_turn` itself so both the HTTP `schedule_chat_turn` path and the queue dispatcher path are covered. | | 4 | `mark_session_completed` itself fails mid-completion (Redis blip during the CAS) | DB never flips back to `idle` | Same chat-open / cancel reset fixes handle this. No proactive sweep was viable here because we can't distinguish "legit long-running tool call" from "stuck" without a Redis side check. | | 5 | Dispatcher rollback used `except Exception`, leaking on `CancelledError` | Task cancellation during dispatch could leave DB stuck at `running` | Switched to `except BaseException` in `dispatch_next_for_user` so cancellation still rolls back the DB claim. Pairs with the `try/finally` in `dispatch_turn` for full Redis+DB symmetry. | | 6 | `chatStatus` was an open `TEXT` column with no DB-level validation | A typo like `"runnin"` could persist and break the cap-count | Converted to a Postgres enum `ChatSessionStatus` (`idle | queued | running`) — invalid values now rejected at the DB layer. Future states need a tiny `ALTER TYPE ADD VALUE` migration (cheap on PG 12+). | **Why no periodic sweep.** The reactive cleanups (rows 1 and 2) catch every user-visible path: clicking Cancel and opening the chat both reset the orphan. The existing 6h+5min inline stale-CAS in `get_active_session` remains the ultimate backstop. Adding a per-pod APScheduler job would chase the narrow "user has stuck sessions but never opens them and never cancels" case at the cost of constant operational surface — not worth it for this rare scenario. All paths that mutate `chatStatus` now also clean the matching Redis state (either inline or via the sweep) — the two layers stay in sync. ### Cap + queue safety against double-promotion (carry-over from earlier review) - `mark_session_completed` does `release_turn_slot` (running → idle) BEFORE `dispatch_next_for_user`. Two concurrent completions release two slots first, then race the CAS on the same head — only one wins per slot, so at most N promotions for N releases. Cap holds. - `claim_queued_session` is an atomic Postgres `UPDATE … WHERE chatStatus='queued' AND id=…`: keyed to the specific head row, only one CAS matches. - All db.py functions exposed on `DatabaseManager` return primitives or DTOs (`ChatSessionInfo`, `ChatMessage`) — no raw Prisma rows cross the RPC boundary, so the executor subprocess can safely route through `chat_db()`.

…avitas#13370) ### Why / What / How **Why:** A TAC Security CASA assessment (ESOF AppSec ADA) of `platform.agpt.co` flagged several missing/leaky HTTP response headers. None are critical (overall ESOF score 9.1/10, "Low"), but all must be remediated to pass the CASA Tier 2 & Tier 3 assessments. Verified live today: prod sends `X-Powered-By: Next.js` and is missing both `X-Frame-Options` and `X-Content-Type-Options`. **What:** Adds baseline security response headers in the frontend Next.js config and disables the framework-fingerprinting header. **How:** Defines the headers in `nextConfig.headers()` (applied to `/:path*`) and sets `poweredByHeader: false`. The existing `Document-Policy: js-profiling` header was previously nested inside the Sentry plugin options (2nd arg to `withSentryConfig`), where it does not apply on the non-Sentry build path — it's moved into the Next config so every build path emits the full header set. ### Changes 🏗️ - `autogpt_platform/frontend/next.config.mjs`: - `poweredByHeader: false` → removes `X-Powered-By: Next.js` (finding #5; partially addresses proxy disclosure #3) - `X-Frame-Options: SAMEORIGIN` → anti-clickjacking (finding #2) - `X-Content-Type-Options: nosniff` → anti-MIME-sniffing (finding #4) - Moved `Document-Policy: js-profiling` from the Sentry options into `nextConfig.headers()` so it (and the new headers) apply on all build paths **Not addressed here (platform-level / separate):** - CORS `Access-Control-Allow-Origin: *` on `/_next/static/media/*` fonts (#1) — injected by Vercel for font assets; overriding risks breaking Next's `crossorigin` font preloads. Needs a separate, deploy-verified change. - `Server: Vercel` header and TRACE/TRACK methods (#3) — injected at the Vercel edge, no in-repo lever. - Info-only items #6/#7/#8/#9 are scanner observations or already handled (backend cache-protection middleware sends `no-store` on sensitive routes). `SAMEORIGIN` was chosen over `DENY` so same-origin embedding (share/preview flows) keeps working; easy to tighten if the app is never framed. ### Checklist 📋 #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes (no env changes) - [x] `docker-compose.yml` is updated or already compatible with my changes (no compose changes) - [x] I have included a list of my configuration changes in the PR description (under **Changes**) #### Test plan: - [x] `pnpm format`, `pnpm lint`, `pnpm types` pass locally - [ ] After deploy, verify response headers on `platform.agpt.co`: - [ ] `X-Frame-Options: SAMEORIGIN` present - [ ] `X-Content-Type-Options: nosniff` present - [ ] `X-Powered-By` absent - [ ] `Document-Policy: js-profiling` still present (Sentry profiling unaffected) - [ ] App still renders and fonts load (no clickjacking/embedding regressions) 🤖 Generated with [Claude Code](https://claude.com/claude-code)  --- > [!NOTE] > **Low Risk** > Config-only HTTP header changes with no auth or data-path changes; `SAMEORIGIN` framing is a deliberate tradeoff that should be smoke-tested after deploy for embed/preview and font loading. > > **Overview** > Addresses CASA assessment gaps on the Next.js frontend by **disabling** `X-Powered-By: Next.js` via `poweredByHeader: false` and **adding** global response headers on `/:path*` through `nextConfig.headers()`: `X-Content-Type-Options: nosniff`, `X-Frame-Options: SAMEORIGIN`, and `Document-Policy: js-profiling`. > > **`Document-Policy`** (and the new security headers) were previously defined only inside the optional `withSentryConfig` wrapper, so builds that skip the Sentry webpack plugin never emitted them. They now live on the base Next config so every build path gets the same header set. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 2c6931e. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…Gravitas#13371) > **Draft — needs a post-deploy header check before merge (see Testing).** ### Why / What / How **Why:** The TAC Security CASA (ESOF AppSec ADA) assessment of `platform.agpt.co` flags an "overly permissive CORS policy" (finding #1, CWE-264): self-hosted Next.js fonts under `/_next/static/media/*` respond with `Access-Control-Allow-Origin: *`. All CASA findings must be cleared for the Tier 2 & Tier 3 assessments (currently re-expiring). **What:** Scope the `Access-Control-Allow-Origin` header on the font path from `*` to our own origin. **How:** Adds a `nextConfig.headers()` rule for `/_next/static/media/:path*` setting `Access-Control-Allow-Origin: https://platform.agpt.co`. These fonts are only loaded by our own same-origin pages, so a same-origin fetch is unaffected by the header value — this should not change how fonts render. ### Changes 🏗️ - `autogpt_platform/frontend/next.config.mjs`: add `headers()` rule restricting font CORS to our origin. ### ⚠️ Reviewer note — must verify after deploy Vercel's "Ignored Build Step" cancels the preview build for this PR, so the header could **not** be verified pre-merge. The header on `/_next/static/media/*` is currently emitted by the Vercel/Next.js static layer; this config override may or may not replace it. **Please confirm on a real deploy (dev/staging) before merging:** ```bash curl -sI "https://<deploy-url>/_next/static/media/<any>.woff2" | grep -i access-control-allow-origin # want: a single "access-control-allow-origin: https://platform.agpt.co" # NOT: "*", and NOT two ACAO headers (duplicate = the override didn't replace it) ``` If the override does **not** take (still `*` or duplicated), the fallback is a Vercel project-level header rule or accepting it with a CASA justification (fonts are non-sensitive, same-origin only). Failure mode is safe either way: fonts load the same regardless, since they're fetched same-origin. This overlaps `next.config.mjs` with Significant-Gravitas#13370 (baseline security headers); whichever merges second will need a trivial `headers()` merge. ### Checklist 📋 #### For configuration changes: - [x] `.env.default` / `docker-compose.yml` unaffected (no env/compose changes) - [x] Configuration changes listed above under **Changes** - [x] `pnpm format` / `pnpm types` pass locally - [ ] **Post-deploy:** font `Access-Control-Allow-Origin` is the origin (not `*`, not duplicated) - [ ] **Post-deploy:** app renders and fonts load normally 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Swifty <craigswift13@gmail.com> Co-authored-by: Ubbe <hi@ubbe.dev>

github-actions · 2026-06-26T03:50:32Z

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

tpsaint force-pushed the master branch from 19203f0 to e2711b1 Compare June 26, 2026 03:47

github-actions Bot added the conflicts label Jun 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: improve JSON parsing module test coverage#1

test: improve JSON parsing module test coverage#1
tpsaint wants to merge 1 commit into
masterfrom
test/improve-json-parsing-coverage

tpsaint commented Jul 20, 2025

Uh oh!

github-actions Bot commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tpsaint commented Jul 20, 2025

Changes 🏗️

Checklist 📋

For code changes:

For configuration changes:

Uh oh!

github-actions Bot commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants