Skip to content

[Snyk] Fix for 3 vulnerabilities#22

Open
snyk-io[bot] wants to merge 1558 commits into
devfrom
snyk-fix-f4d2104e1907e39a01134b4a6140d9ce
Open

[Snyk] Fix for 3 vulnerabilities#22
snyk-io[bot] wants to merge 1558 commits into
devfrom
snyk-fix-f4d2104e1907e39a01134b4a6140d9ce

Conversation

@snyk-io

@snyk-io snyk-io Bot commented Jun 26, 2026

Copy link
Copy Markdown

snyk-top-banner

Snyk has created this PR to fix 3 vulnerabilities in the pnpm dependencies of this project.

Snyk changed the following file(s):

  • autogpt_platform/frontend/package.json
⚠️ Warning
Failed to update the pnpm-lock.yaml, please update manually before merging.

Vulnerabilities that will be fixed with an upgrade:

Issue Score
critical severity Missing Authorization
SNYK-JS-VITEST-17375131
  ****  
critical severity Missing Authorization
SNYK-JS-VITESTCOVERAGEV8-17375132
  ****  
critical severity Arbitrary Command Injection
SNYK-JS-SHELLQUOTE-16799355
  535  

Breaking Change Risk

Merge Risk: Medium

Notice: This assessment is enhanced by AI.


Important

  • Check the changes in this PR to ensure they won't cause issues with your project.
  • Max score is 1000. Note that the real score may have changed since the PR was raised.
  • This PR was automatically created by Snyk using the credentials of a real user.

Note: You are seeing this because you or someone else with access to this repository has authorized Snyk to open fix PRs.

For more information:
🧐 View latest project report
📜 Customise PR templates
🛠 Adjust project settings
📚 Read about Snyk's upgrade logic


Learn how to fix vulnerabilities with free interactive lessons:

🦉 Arbitrary Command Injection

ntindle and others added 30 commits May 1, 2026 06:55
Removed COLLABORATOR checks for comments and reviews in the workflow.
…ficant-Gravitas#12980)

### Why / What / How

**Why.** Three bugs, one PR — all rooted in the onboarding wizard's lack
of persistence:

1. **422 on `/api/onboarding/profile` mid-wizard.** The wizard store was
in-memory and `useOnboardingPage`'s init effect called `reset()` on
every mount. A user who refreshed mid-flow (or navigated to `?step=4`
directly) hit the SubscriptionStep with empty `name` / `role`; clicking
*Get Pro* / *Upgrade to Max* advanced to Preparing, which POSTed empty
`user_name` / `user_role` and got rejected with `string_too_short`.
2. **LOCAL crash on plan select.** The Stripe checkout path is
unconditional. On `BEHAVE_AS=LOCAL` the backend has no Stripe wiring, so
plan selection blew up trying to start a Checkout session.
3. **Welcome detour after completion.** `handlePreparingComplete` and
`checkCompletion` called `reset()`, which set `currentStep=1` after
`router.replace("/copilot")` had already been queued. The URL-sync
effect then fired a second `router.replace("/onboarding?step=1")` that
won, stranding the user on Welcome until they refreshed.

**What.**
- Wrap the wizard store with `zustand/middleware`'s `persist`
(sessionStorage, matches the existing `STEP_STORAGE_KEY` ceiling).
`partialize` excludes `currentStep` — the URL stays the source of truth
for which step the user is on.
- Drop the unconditional `reset()` from the init effect so persisted
form data survives refreshes.
- Replace `reset()` with
`useOnboardingWizardStore.persist.clearStorage()` on completion paths to
clean storage without mutating in-memory state (no spurious re-render,
no URL-sync race).
- Short-circuit `handlePlanSelect` on `environment.isLocal()` to skip
the profile POST + Stripe Checkout and advance straight to Preparing;
cloud path is unchanged.

**How.**
- `store.ts` — `persist(...)` with SSR-safe `createJSONStorage` (no-op
stub during SSR/vitest), `partialize` exposing only the form fields.
`version: 1` so future schema changes have a migration anchor.
- `useOnboardingPage.ts` — remove `reset()` from init effect; swap
completion-path `reset()` calls for `persist.clearStorage()`. Comment
block explains the URL-sync race so the next reader doesn't reintroduce
it.
- `useSubscriptionStep.ts` — early-return on `environment.isLocal()`
after the TEAM/inflight guards, before any Stripe code.
- Tests:
- `page.test.tsx` — regression test that pre-set form data isn't wiped
on mount (the bug the persist change fixes).
- `SubscriptionStep.test.tsx` — `vi.spyOn(environment,
"isLocal").mockReturnValue(false)` in `beforeEach` so existing
Stripe-path tests stay in cloud mode; new test flips it `true` and
asserts no Stripe / profile request fires and `currentStep` advances to
5.

### Changes 🏗️

- `autogpt_platform/frontend/src/app/(no-navbar)/onboarding/store.ts` —
`zustand/middleware`'s `persist` with sessionStorage + `partialize`
excluding `currentStep`.
-
`autogpt_platform/frontend/src/app/(no-navbar)/onboarding/useOnboardingPage.ts`
— drop `reset()` from init; switch completion `reset()` to
`persist.clearStorage()`.
-
`autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/SubscriptionStep/useSubscriptionStep.ts`
— early-return on `environment.isLocal()` to skip Stripe.
-
`autogpt_platform/frontend/src/app/(no-navbar)/onboarding/__tests__/page.test.tsx`
— new "preserves form data on mount" regression test.
-
`autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/__tests__/SubscriptionStep.test.tsx`
— force cloud mode in `beforeEach`; add LOCAL bypass test.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] On `BEHAVE_AS=LOCAL`: complete onboarding through Welcome → Role →
PainPoints → Subscription, click *Get Pro* → no Stripe Checkout, lands
on Preparing, then `/copilot` (no Welcome bounce, no refresh needed).
Repeat with *Upgrade to Max*.
- [ ] On `BEHAVE_AS=LOCAL`: click *Contact sales* (Team) → Tally form
opens in new tab; wizard stays on step 4.
- [ ] On `BEHAVE_AS=CLOUD`: existing Stripe Checkout flow unchanged —
*Get Pro* / *Upgrade to Max* redirect to Stripe with the right
`success_url` / `cancel_url`; success returns to
`?step=5&subscription=success` and lands on `/copilot`; cancel returns
to `?step=4&subscription=cancelled`.
- [ ] Mid-wizard refresh on step 4 with `payments` flag enabled: name /
role / pain points stay filled in (no 422, no re-fill required).
- [ ] Direct nav to `/onboarding?step=4` without prior data: clamps back
to the highest reached step (or step 1 on a fresh session) — no
fast-forward into Subscription.
- [ ] Returning user with `VISIT_COPILOT` already complete: redirects to
`/copilot` cleanly with no Welcome flash.

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ink cleanup (Significant-Gravitas#12979)

### Why / What / How

**Why** — Follow-up to the settings v2 cleanup (Significant-Gravitas#12976). Several rough
edges remained after the route migration:

- The Team-tier "Talk to sales" button was firing into a 404
(`agpt.co/contact-sales` no longer exists).
- A bunch of in-app links were still pointing to the deprecated
`/profile/*` settings routes, which now route through the legacy layout
instead of the new `/settings/*` pages.
- The MCP integration entry in the "Connect a service" dialog was a dead
end — the provider showed up in the list but the detail view said "No
connection method available".
- `/settings/creator-dashboard` shipped a bulk-select feature (checkbox
column, select-all bar, bulk-delete dialog) that nobody is using; it
added clutter to the row UI.
- `/settings` (no subpath) flashed an empty page during the client-side
redirect to `/settings/profile`.
- The submissions API was issuing a redundant `COUNT(*)` per request
because `_get_submission_stats` already returns `total`.

**What** — One PR, several small fixes packaged together because they
all touch the same surface (creator dashboard / integrations / settings
linking).

**How** — Most changes are one-liners or small file deletions. Two are
slightly bigger:
- Backend `_get_submission_stats` is reused for pagination total, so
`get_store_submissions` drops the duplicate `.count()` call. Stats
themselves use a single Postgres query with `FILTER` clauses for
total/approved/pending/total_runs/average_rating.
- MCP setup is now rendered inline in the Connect Service dialog via a
new `McpConnectPanel` component when `provider.id === "mcp"`. Server URL
→ tries OAuth via popup → on 400 falls back to bearer-token input → on
success invalidates `getGetV1ListCredentialsQueryKey()` so the
credential shows up in the integrations list.

### Changes 🏗️

**Backend (`backend/api/features/store`)**
- Add `SubmissionStats` payload to `StoreSubmissionsResponse` (total,
approved, pending, total_runs, average_rating) computed in one Postgres
`FILTER` round-trip.
- Reuse `stats.total` for pagination instead of running a separate
`COUNT(*)` query.

**Frontend creator dashboard (`/settings/creator-dashboard`)**
- Remove the bulk-select feature: checkbox column + selection bar +
bulk-delete dialog + `useSubmissionSelection` hook +
`SubmissionSelectionBar` + `MobileSelectionBar` are all deleted.
- Update tests to drop the now-irrelevant selection cases.

**Frontend integrations (`/settings/integrations`)**
- New `McpConnectPanel` rendered inside the Connect Service dialog
DetailView when the user picks the Mcp provider. OAuth-first with
bearer-token fallback when the server returns 400.
- Light `pl-1` polish on the DetailView header to align the back button
+ avatar with the body content.

**Frontend settings link migration**
- Replace stale `/profile/*` deep links across the platform with the new
`/settings/*` routes:
- `/profile/dashboard` → `/settings/creator-dashboard` (publish modal
"View progress")
- `/profile/credits` → `/settings/billing` (usage limits, briefing
panel, wallet refill)
- `/profile/integrations` → `/settings/integrations` (integrations setup
wizard)
- `/profile/settings` → `/settings/account` (timezone notices in
scheduling dialogs)
- Legacy `/profile/(user)/layout.tsx` self-nav left alone — that whole
route group is being deprecated in a follow-up PR.

**Frontend billing**
- `TEAM_UPGRADE_URL` switched from `https://agpt.co/contact-sales` (404)
to the Tally intake form `https://tally.so/r/2Eb9zj`. Already opens with
`target="_blank"` + `noopener,noreferrer`.

**Frontend settings index**
- `/settings` renders a content skeleton during the client-side redirect
to `/settings/profile`, so the page no longer flashes empty.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Open `/settings/billing` on a MAX plan → click "Talk to sales —
Team" → Tally form opens in a new tab.
- [x] On `/settings/integrations` → "Connect Service" → pick "Mcp" →
enter a server URL → confirm OAuth popup launches; for an OAuth-less
server, confirm bearer-token fallback appears and saves the credential.
- [x] Confirm the new MCP credential shows up in the integrations list
after connect.
- [x] On `/settings/creator-dashboard` → confirm no checkbox column, no
select-all bar, no bulk-delete dialog. Row-level "Edit"/"Delete" via the
dropdown still works.
- [x] Submit an agent for review from `/build` → "View progress" → lands
on `/settings/creator-dashboard` (was 404 before).
- [x] Click any wallet "Add credits" / "Manage billing" / usage-limit
link → lands on `/settings/billing` (not `/profile/credits`).
- [x] Open the schedule dialog with no timezone set → "Set your
timezone" link goes to `/settings/account`.
- [x] Hit `/settings` directly → see the content skeleton, then redirect
to `/settings/profile`.
- [x] Verify `/v2/store/submissions` response contains `stats: { total,
approved, pending, total_runs, average_rating }` and pagination
`total_items` matches `stats.total`.
…t-Gravitas#12981)

### Why / What / How

**Why:** New users signing up on the platform currently receive a `PRO`
subscription tier by default — a holdover from the beta period. This
gives free, unrestricted access to all platform features without ever
hitting a paywall. SECRT-2295 requires closing this gap so new signups
land on `NO_TIER` and are prompted to subscribe.

**What:** Changes the default subscription tier for newly created users
from `PRO` to `NO_TIER` across the database schema, migration, and
Python model fallbacks. Existing users are **not** affected — their tier
remains whatever it already is.

**How:**
- Updates the Prisma schema `@default(PRO)` → `@default(NO_TIER)` on the
`User.subscriptionTier` column
- Adds a DDL-only migration (`ALTER COLUMN SET DEFAULT 'NO_TIER'`) that
touches zero rows and acquires only a microsecond metadata lock
- Updates two Python-side fallback defaults in `model.py` from `BASIC` →
`NO_TIER` to stay consistent with the DB
- The existing beta fallback in `rate_limit.py` already handles
`NO_TIER` users gracefully (maps to `BASIC` multiplier when
`ENABLE_PLATFORM_PAYMENT` flag is off)

### Changes 🏗️

- **`schema.prisma`**: Removed stale 7-line beta comment block, changed
`@default(PRO)` → `@default(NO_TIER)` on `subscriptionTier`
-
**`migrations/20260501172500_default_new_users_to_no_tier/migration.sql`**:
New DDL-only migration — `ALTER TABLE "User" ALTER COLUMN
"subscriptionTier" SET DEFAULT 'NO_TIER'`
- **`backend/data/model.py`**: Updated two fallback defaults from
`SubscriptionTier.BASIC` → `SubscriptionTier.NO_TIER` (field default on
line 75, `from_db()` fallback on line 151)

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [x] Signed up a new user (`horses@ntindle.com`) on local dev stack and
confirmed `subscriptionTier = NO_TIER` in the database
- [x] Verified existing users retain their current tier (no row-level
changes in migration)
- [x] Confirmed rate_limit.py beta fallback maps NO_TIER → BASIC
multiplier when payment flag is off
  - [ ] CI pipeline validation (backend tests, lint, type checks)

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

No configuration changes required — this is a schema default + Python
fallback change only.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes the default `User.subscriptionTier` at both DB and application
layers, which can affect entitlement/rate-limit behavior for all newly
created accounts. Low implementation complexity, but it touches
subscription gating assumptions and signup flows.
> 
> **Overview**
> Newly created users now default to `SubscriptionTier.NO_TIER` instead
of a paid tier.
> 
> This updates the Prisma schema default for `User.subscriptionTier`,
adds a DDL-only migration to set the Postgres column default to
`'NO_TIER'` without modifying existing rows, and aligns Python-side
fallbacks in `backend/data/model.py` (field default + `from_db()`
fallback) to `NO_TIER`.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
d3764e8. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rd (Significant-Gravitas#12982)

## Why

The "Compare plans" link on the billing settings page (next to "Your
plan" in the SubscriptionTab > YourPlanCard) was redundant — users
already see their current plan and can upgrade/downgrade directly from
the same card. The external link to `agpt.co/pricing` added noise
without adding value.

## What

- Removed the "Compare plans" anchor link (and accompanying
external-link icon) from the `Your plan` header inside `YourPlanCard`.
- Removed the now-unused `PRICING_PAGE_URL` constant.
- Kept the `ArrowSquareOutIcon` import — still used by the "Talk to
sales" upgrade button.

## How

Single-file change in
`autogpt_platform/frontend/src/app/(platform)/settings/billing/components/SubscriptionTab/YourPlanCard/YourPlanCard.tsx`:

- Dropped the `<a>` element that wrapped "Compare plans" + the external
icon.
- The header `<div>` now contains only the "Your plan" label.

## Test plan

- [ ] Navigate to `/settings/billing` and confirm the "Compare plans"
link no longer renders next to "Your plan".
- [ ] Verify the rest of `YourPlanCard` (plan label, badge,
manage/upgrade/downgrade buttons) still renders correctly for free,
paid, pending-cancel, and pending-downgrade states.
- [ ] `pnpm types` / `pnpm lint` clean.
…ore, auto-refill defaults (Significant-Gravitas#12984)

### Why / What / How

**Why:** A round of QA on the new Settings v2 surfaced several small but
visible polish issues — input focus rings being clipped on the right
edge of every dialog, the link rows in the Profile page flashing on
save, the Billing page bouncing the user back to the Subscription tab
after a Stripe topup, the Auto-refill dialog defaulting to an invalid
empty state that contradicted its own "$5 minimum" copy, and the
Integrations header recommending Figma even though it isn't a
connectable provider.

**What:** Frontend-only fix pass that addresses each of those issues at
the component level (and the dialog issue at the shared `DialogWrap`
level so every dialog benefits).

**How:**

- **Profile save flash** — `useProfilePage` previously rebuilt the form
state from scratch every time `useGetV2GetUserProfile` data changed.
After save, the refetch generated brand-new `LinkRow` IDs via
`makeLinkRow()`, which `<AnimatePresence>` keyed off of, causing every
row to exit + re-enter (the "flash"). Now the post-fetch sync skips when
the incoming profile is content-equivalent to current form state,
preserving link identity and silencing the animation.
- **Dialog focus-ring clipping** — Inputs use `ring-1` (a 1 px
box-shadow rendered outside the border). The dialog's scroll container
had `overflow-x-hidden` flush against the input edge, chopping off the
right side of every focus oval. Added `px-2` to both the dialog title
row and the inner scroll container in `DialogWrap`, giving 8 px of
horizontal breathing room across all dialogs.
- **ConnectServiceDialog inner clip** — That dialog has its own
`overflow-hidden` wrapper (used for the height/slide animation between
list and detail views), which clipped focus rings before the new
`DialogWrap` padding could help. Added matching `px-2` there.
- **Billing tab restore after Stripe** — `<TabsLine>` was uncontrolled
with `defaultValue="subscription"`, so a Stripe topup redirect
(`?topup=success|cancel`) always landed users on the Subscription tab
even though the topup flow originates from Automation Credits. Made the
tab controlled with state initialized from URL params: explicit `?tab=`
wins, else `?topup` implies Automation Credits, else fall back to
Subscription. Adds deep-link support as a side benefit.
- **Auto-refill defaults + banner copy** — The dialog opened with empty
fields (placeholder "min $5") but the form was invalid by default and
the existing yellow banner ("Auto-refill triggers at most once per agent
execution.") didn't tell users what to do about it. Pre-fill threshold
and refill amount with `"5"` so the form is valid by default and matches
the empty-state copy; rewrote the banner to: "A single agent run can
only trigger one auto-refill. Set a refill amount that covers your
typical usage so agents don't pause mid-run."
- **Integrations copy** — Replaced the Figma example in
`IntegrationsHeader` with Notion (verified Notion is a real provider in
`backend/integrations/providers.py`) so the copy doesn't reference an
integration that isn't actually offered in Connect Service.

### Changes 🏗️

- `frontend/src/app/(platform)/settings/profile/useProfilePage.ts` —
guard form re-sync on equivalent refetch.
- `frontend/src/components/molecules/Dialog/components/DialogWrap.tsx` —
`px-2` on title row and scroll container.
-
`frontend/src/app/(platform)/settings/integrations/components/ConnectServiceDialog/ConnectServiceDialog.tsx`
— `px-2` on inner animation wrapper.
- `frontend/src/app/(platform)/settings/billing/page.tsx` — controlled
tab state with URL-aware initial value.
-
`frontend/src/app/(platform)/settings/billing/components/AutomationCreditsTab/AutoRefillCard/useAutoRefillCard.ts`
— default to `"5"` when no saved config.
-
`frontend/src/app/(platform)/settings/billing/components/AutomationCreditsTab/AutoRefillCard/AutoRefillDialog.tsx`
— clearer warning banner, top-aligned icon for multi-line copy.
-
`frontend/src/app/(platform)/settings/integrations/components/IntegrationsHeader/IntegrationsHeader.tsx`
— "Figma for designs" → "Notion for documents".

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Open `/settings/profile`, edit a link, click Save → no flash on
link rows
- [x] Open Update email dialog, focus the email input → purple focus
oval is fully closed on the right
- [x] Open Connect a service dialog, focus the search input → purple
focus oval is fully closed
- [x] Open Add credits dialog and Auto-refill dialog → focus rings
render fully
- [x] Add credits via Stripe, get redirected back → land on the
Automation Credits tab (not Subscription)
- [x] Open Auto-refill dialog with no prior config → fields show $5 /
$5, Save button is enabled
- [x] Auto-refill dialog yellow banner reads as a clear, actionable
instruction
- [x] Visit `/settings/integrations` → header copy says "Notion for
documents", not "Figma for designs"
…ificant-Gravitas#12985)

### Why / What / How

**Why:** [SECRT-2314](https://linear.app/autogpt/issue/SECRT-2314). When
a user hits their daily AutoPilot limit, the popup currently shows a
single "Go to billing" CTA regardless of their plan. For users on the
highest self-serve tier (MAX) — and the contact-sales tiers above it —
there's no higher plan they can upgrade to from the billing page, so
routing them there is a dead end. Reported by John in #breakage on
2026-04-27 as a follow-up to SECRT-2294.

**What:** Branch the daily-limit popup CTA on the user's current
subscription tier:
- `NO_TIER` / `BASIC` / `PRO` (or unknown) → **"Upgrade plan"** → routes
to `/settings/billing` (drives conversion).
- `MAX` / `BUSINESS` / `ENTERPRISE` → **"Contact us"** → opens
`mailto:contact@agpt.co` in a new tab (no higher self-serve plan).

**How:**
- `RateLimitResetDialog` gains an optional `tier?: SubscriptionTier |
null` prop and decides label + click handler from a small `TOP_TIERS`
set. The body copy adapts in the same branch ("upgrade your plan" vs
"contact us if you need more capacity").
- `RateLimitGate` fetches `useGetSubscriptionStatus` (same hook
`useYourPlanCard` uses) gated on `rateLimitMessage` being present, and
forwards `tier`. The query is disabled when no rate-limit dialog is
needed, so this adds zero extra requests on the happy path.
- `mailto:contact@agpt.co` is the same address used by
`WaitlistErrorContent` — kept the convention rather than introducing a
new constant.

### Changes 🏗️

- `RateLimitResetDialog.tsx` — accept `tier`, branch CTA
label/handler/body trailer between Upgrade plan and Contact us.
- `RateLimitGate.tsx` — pull subscription status (gated on
`rateLimitMessage`), forward `tier` to the dialog.
- Tests:
- `RateLimitResetDialog.test.tsx` — parameterised over
`NO_TIER`/`BASIC`/`PRO` (Upgrade plan + router push) and
`MAX`/`BUSINESS`/`ENTERPRISE` (Contact us + `window.open`).
- `RateLimitGate.test.tsx` — added `useGetSubscriptionStatus` mock;
verified tier forwarding (MAX → "MAX", null → null) and that the
subscription query is also disabled when no rate-limit message is
present.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
  - [x] `pnpm format && pnpm lint && pnpm types` clean
  - [x] `pnpm test:unit` against `RateLimitResetDialog/` — 26/26 passing
- [ ] Manual: hit daily limit on dev as a `PRO` user → popup shows
**"Upgrade plan"**, click routes to `/settings/billing`
- [ ] Manual: hit daily limit on dev as a `MAX` user → popup shows
**"Contact us"**, click opens `mailto:contact@agpt.co` in a new tab
- [ ] Manual: hit daily limit on dev as a `NO_TIER` user → popup shows
**"Upgrade plan"**, routes to `/settings/billing`
- [ ] Manual: subscription-status request fails / returns no tier →
popup falls back to **"Upgrade plan"** (safe default)

cc @lluis-agusti — flagging for review per request. Draft until manual
QA on dev is done.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…prevent USD-cap bypass (Significant-Gravitas#12990)

## Summary

`backend/copilot/rate_limit.py:check_rate_limit` was failing **open**
when Redis was unreachable: on `RedisError` / `ConnectionError` /
`OSError` it logged a warning and returned silently, letting the request
through unrate-limited.

During the GKE auto-upgrade rolling stateful pool incident captured in
the postmortem
[Significant-Gravitas/AutoGPT_cloud_infrastructure#318](https://github.com/Significant-Gravitas/AutoGPT_cloud_infrastructure/pull/318),
this means a user (or a script-driven user) on the affected slot can
blast the LLM loop unmetered for the duration of the brown-out,
bypassing the per-user daily/weekly USD caps.

### Dollar-risk surface

* Brown-out window: ~5 min (one stateful-set rolling step) — sometimes
longer if a node goes Pending.
* Worst case per affected user: a bot driving the chat at provider-bound
throughput easily hits **$50–$500** of LLM spend in 5 minutes on a
frontier model. Multiply by the number of users on the affected slot
during the brown-out.
* The user has already paid up-front via the credit wallet for **block**
spend, but the microdollar cap on the **copilot LLM turn itself** is
operator-side infrastructure cost that we cannot recover after the fact.

### Fix

`check_rate_limit` now raises a new exception `RateLimitUnavailable` on
Redis errors. The chat route handler maps that to HTTP 503 with
`Retry-After: 30`. 503 is the right code (transient infra outage, retry
shortly) rather than 429 ("you hit your limit"); they're different UX.

### Fail-open paths kept fail-open (deliberately)

Only the **enforcement** path is fail-closed. The other Redis call sites
in `rate_limit.py` are observability or best-effort and would create
noisier failures if they 503'd:

| Function | Behaviour | Why kept fail-open |
| --- | --- | --- |
| `get_usage_status` | returns zeros | Read-only gauge for the usage
banner; returning zeros during a brown-out is fine. |
| `record_cost_usage` | logs and returns | Losing one cost increment is
preferable to 500-ing the whole turn *after* generation. The next turn
re-checks the cap. |
| `release_reset_lock` | swallows | TTL-bounded — lock will expire on
its own. |
| `increment_daily_reset_count` | logs and returns | Informational
counter, not the authoritative daily reset state. |

### Fail-closed paths preserved

These already failed closed and are **unchanged**:

| Function | Behaviour |
| --- | --- |
| `check_rate_limit` | **NEW**: raises `RateLimitUnavailable` |
| `acquire_reset_lock` | returns `False` (reset is rejected — cannot
serialise without lock) |
| `get_daily_reset_count` | returns `None` so callers refuse the billed
reset |
| `reset_daily_usage` | returns `False` |
| `reset_user_usage` | re-raises (admin resets cannot silently no-op) |

### Test coverage

*
`rate_limit_test.py::TestCheckRateLimit::test_raises_unavailable_when_redis_connection_error`
*
`rate_limit_test.py::TestCheckRateLimit::test_raises_unavailable_when_redis_redis_error`
*
`rate_limit_test.py::TestCheckRateLimit::test_raises_unavailable_when_os_error`
*
`rate_limit_test.py::TestCheckRateLimit::test_skips_redis_and_does_not_raise_unavailable_when_unlimited`
— confirms unlimited users (limit=0) don't 503 on a brown-out.
*
`routes_test.py::test_stream_chat_returns_503_with_retry_after_when_rate_limit_unavailable`
— confirms route handler maps `RateLimitUnavailable` to 503 +
`Retry-After: 30` header.

The previously-passing test `test_allows_when_redis_unavailable` (which
asserted fail-open) is replaced with the new fail-closed assertions —
that was the bug.

## Test plan

- [x] `poetry run ruff format` clean on touched files
- [x] `poetry run ruff check` clean on touched files
- [x] `poetry run pytest backend/copilot/rate_limit_test.py
backend/api/features/chat -q` green
- [ ] Manual: trigger Redis brown-out in dev, confirm chat route returns
503 with `Retry-After: 30` instead of allowing the turn through
…gnificant-Gravitas#12994)

### Why / What / How

The Discord bot can currently send the unlinked-server setup prompt from
unrelated thread messages. That is noisy in servers where the bot is
installed but setup has not been completed yet.

This PR keeps the intended server behavior: regular channel messages are
only handled after an explicit bot mention, and bot-created threads
continue working after they are subscribed. Thread messages that do not
belong to a bot-created/subscribed thread are ignored silently.

The handler now checks the thread subscription gate before checking
server link status. Once a thread passes that gate, target resolution
can reuse the thread directly.

### Changes

- Ignore unsubscribed Discord thread messages before resolving server
link status.
- Keep the /setup prompt behavior for explicit bot mentions in regular
server channels.
- Update bot handler tests to cover the thread subscription gate.
- No configuration changes.

### Checklist

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [x] poetry run python -m py_compile backend/copilot/bot/handler.py
backend/copilot/bot/handler_test.py
  - [x] git diff --check
- [ ] Deploy to dev and verify unrelated server/thread messages do not
receive the setup prompt

#### For configuration changes:

- [x] .env.default is updated or already compatible with my changes
- [x] docker-compose.yml is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under Changes)
…age (Significant-Gravitas#12999)

### Why / What / How

**Why:** The existing warning text was written from an implementation
perspective. User feedback requested clearer, user-benefit-oriented copy
that helps users budget appropriately.

**What:** Updated the warning text inside the auto-refill dialog on
`/settings/billing` to use plain language that explains the safety
mechanism and its budgeting implications.

**How:** Single string change in `AutoRefillDialog.tsx`.

### Changes 🏗️

- Updated warning copy in `AutoRefillCard/AutoRefillDialog.tsx` from:
> "A single agent run can only trigger one auto-refill. Set a refill
amount that covers your typical usage so agents don't pause mid-run."

  To:
> "As a safety mechanism, auto-refill will only trigger once per task.
Keep this in mind when budgeting to ensure your balance does not hit
zero and your tasks don't pause mid-run."

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Navigate to `/settings/billing`, open the auto-refill dialog,
verify new warning text is displayed correctly

---------

Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>
…ravitas#12995)

### Why / What / How

The bot currently creates its own Discord threads after a server-channel
mention, but it cannot be explicitly invited into an existing thread. If
someone mentions the bot inside a thread that the bot did not create,
the handler has no way to distinguish that from ambient thread chatter,
and it has no prior thread context to send to AutoPilot.

This PR adds explicit thread adoption. The Discord adapter records
whether the bot was mentioned and, when mentioned inside a thread,
fetches recent user messages from that thread. The handler uses that
mention signal to subscribe the existing thread after link validation,
then includes the recent thread context in the first AutoPilot message.

This is stacked on top of the unlinked-thread spam fix so
unrelated/unsubscribed thread messages still stay silent unless the bot
is explicitly mentioned.

### Changes

- Add `bot_mentioned` and `thread_history` to `MessageContext`.
- Fetch recent Discord thread history when the bot is mentioned inside
an existing thread.
- Subscribe an existing thread when the bot is explicitly mentioned
there and the server is linked.
- Include recent thread context in the AutoPilot prompt for the adoption
message.
- Add handler and Discord adapter tests for thread adoption and history
formatting.
- No configuration changes.

### Checklist

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [x] poetry run python -m py_compile
backend/copilot/bot/adapters/base.py
backend/copilot/bot/adapters/discord/adapter.py
backend/copilot/bot/handler.py backend/copilot/bot/handler_test.py
backend/copilot/bot/adapters/discord/adapter_test.py
  - [x] git diff --check
- [ ] Deploy to dev and verify mentioning the bot inside an existing
linked-server thread adopts the thread and replies with prior thread
context

#### For configuration changes:

- [x] .env.default is updated or already compatible with my changes
- [x] docker-compose.yml is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under Changes)
…hint for baseline (Significant-Gravitas#13002)

## Why

The autopilot SDK already carries a per-query `max_budget_usd` ceiling
that the CLI uses to nudge the model when it's close to the cap (see
`claude_agent_max_budget_usd: 10.0` in `config.py` — that's the "$10
session budget" you see in the UI). Two gaps in the current setup:

1. **The cap is static.** A user with $1.50 of daily USD headroom left
still gets `max_budget_usd=10.0`, so the in-CLI "wrap up" reminder never
fires until *after* they've blown the real cap (the post-turn Redis
recorder catches it then, which is too late for the model to pace
itself).
2. **Baseline has no equivalent.** The OpenRouter-direct path streams
completions and accumulates `cost_usd` post-turn, but the model never
sees its own running cost or remaining USD headroom mid-stream. So
baseline turns burn through to the limit blindly.

Tracked via the autopilot dev testing thread:
https://discord.com/channels/1126875755960336515/1499923303609925793/

## What

- **SDK**: per-query `max_budget_usd` now resolves dynamically to
`min(static_cap, remaining_daily_or_weekly_usd)`, floored at `$0.50` so
a near-cap user still dispatches.
- **Baseline**: parity via a small `<budget_context>` block injected
through `inject_user_context`'s existing `env_ctx` param, carrying the
same remaining-USD figure.
- Both fed by a single new helper `get_remaining_usd_budget(user_id,
daily, weekly)` in `rate_limit.py` so the source of truth stays one
place.

Note that "balance" here is the **remaining daily/weekly USD spend cap**
(the real money we infra-budget per user) — not the credit wallet. The
two budgets are separate by design (see the existing module docstring on
`rate_limit.py`); credit balance is a future unification.

## How

`backend/copilot/rate_limit.py`
- `get_remaining_usd_budget(...)`: returns the smaller of `(daily_limit
- daily_used)` and `(weekly_limit - weekly_used)` in USD. `inf` when
both caps are 0 (unlimited). Floored on Redis brown-out so observability
paths don't pretend the user has unlimited budget.
- `build_budget_env_ctx(...)`: thin wrapper that formats the result as a
`<budget_context>` block; returns `""` for unlimited / no-user-id (skip
injection).

`backend/copilot/sdk/service.py`
- New module-level `_resolve_dynamic_max_budget_usd(user_id)` reads the
user's tier limits via `get_global_rate_limits` and clamps
`claude_agent_max_budget_usd` to `[_MAX_BUDGET_USD_FLOOR,
remaining_usd]`.
- Wired into `ClaudeAgentOptions` construction (replaces the bare
`config.claude_agent_max_budget_usd`).

`backend/copilot/baseline/service.py`
- On the first user message of a turn, fetches `daily/weekly` via
`get_global_rate_limits`, builds the env_ctx block, passes it through
`inject_user_context(env_ctx=...)`. SDK does NOT do this — its CLI
already has a richer running-cost mechanism, so adding a one-shot
env_ctx hint there would just be noise.

## Test plan

- [x] `poetry run pytest
backend/copilot/rate_limit_test.py::TestGetRemainingUsdBudget
backend/copilot/rate_limit_test.py::TestBuildBudgetEnvCtx
backend/copilot/sdk/service_test.py::TestResolveDynamicMaxBudgetUsd` —
14 pass
- [x] `poetry run black` / `poetry run isort` / `poetry run ruff check`
on changed files — clean
- [ ] Manual: chat session at 90% of daily cap → SDK CLI surfaces "wrap
up" reminder ~$0.50 of spend later, not $10 later
- [ ] Manual: baseline chat with `<budget_context>` injected — verify
model is more conservative on tool depth

## Related

- Builds on the per-query `max_budget_usd` mechanism shipped earlier (P0
guardrail).
- Independent of Significant-Gravitas#12992 (re-prompt fix); both can ship in parallel.
…rollback (Significant-Gravitas#13003)

## Why

Paid→paid upgrades (the canonical case is **Pro→Max**) called
`stripe.Subscription.modify_async` with
`proration_behavior="create_prorations"`. That writes prorated line
items to the *next* invoice instead of billing the user now, so the
upgrade goes through "for free" and the user is then surprise-billed
alongside next month's full \$320 charge at cycle end. Worse, the DB
tier flip already lands before payment is collected, so if the user's
card later declines they're stuck on MAX with no charge captured.

Linear: SECRT-2315.

## What

- `modify_stripe_subscription_for_tier` upgrade branch
(`backend/data/credit.py`) now calls Stripe with:
- `proration_behavior="always_invoice"` — Stripe creates and pays the
prorated invoice synchronously instead of deferring it.
- `payment_behavior="error_if_incomplete"` — Stripe raises
`stripe.CardError` (or `InvalidRequestError` when there's no default
payment method) if the auto-charge fails, so the modify is rolled back
and we never flip the DB tier.
- `update_subscription_tier` (`backend/api/features/v1.py`) gets a new
`except stripe.CardError` branch returning **HTTP 402** with `"Your card
was declined. The plan was not changed; please update your payment
method and try again."`. Placed before the existing
`InvalidRequestError`/`StripeError` catches so the user-facing 402 wins
for declined-card failures.

The DB tier flip already runs *after* `stripe.Subscription.modify_async`
— Stripe error short-circuits before `set_subscription_tier`, so failed
payment ⇒ user stays on Pro.

## How (tests)

`backend/data/credit_subscription_test.py`:
- New
`test_modify_stripe_subscription_for_tier_pro_to_max_bills_immediately`
— Pro→Max calls Stripe with `always_invoice` + `error_if_incomplete` and
the DB tier flips.
- New
`test_modify_stripe_subscription_for_tier_pro_to_max_card_decline_does_not_flip_tier`
— Stripe raises `CardError` ⇒ function propagates the error and
`set_subscription_tier` is never awaited.
- Updated upgrade-path assertions in 4 existing tests
(`modifies_existing_sub`, `clears_cancel_at_period_end_on_upgrade`,
`upgrade_immediate_proration`, `upgrade_releases_pending_schedule`) to
expect the new kwargs.

`backend/api/features/subscription_routes_test.py`:
- New
`test_update_subscription_tier_pro_to_max_card_declined_returns_402` —
POST `/credits/subscription` with `tier=MAX` where Stripe modify raises
`CardError` returns HTTP 402 and `set_subscription_tier` is never
awaited.

## Local verification

- `poetry run ruff format` + `poetry run ruff check` on touched files:
clean.
- `pytest backend/data/credit_subscription_test.py -k
"modify_stripe_subscription_for_tier or upgrade or downgrade or
pro_to_max or schedule"`: **27 passed**.
- `pytest backend/api/features/subscription_routes_test.py -k
"card_declined or paid_to_paid or max_checkout"`: 4 of 5 pass when run
individually; the lone fail is a known pytest-asyncio event-loop scoping
flake when `paid_to_paid_modifies_subscription` happens to run first in
a fresh session — it passes when run alone or with my new test ordered
before it. Unrelated to this change.

## Risk

- Customers without a default payment method on file (rare for paid
subs) will now see a 402 instead of a silent deferred-charge upgrade.
That's the correct, intended behaviour: users must have a working
payment method to upgrade.
- Webhook idempotency unchanged — the existing
`customer.subscription.updated` handler still reconciles after a
successful modify.
…e-limit through DB-manager (Significant-Gravitas#12992)

## Why

Two production fixes surfaced from John Ababseh's dev testing on
2026-05-01 (Discord thread `1499923303609925793`):

- **Issue #5** — chat session `c93dc51f-bb38-4427-975a-6dc033358689`
finished after multiple minutes of work and showed only `(Done — no
further commentary.)` Langfuse trace `7d1a674eb7c84ffb5a4b34875306eea9`
shows the model wrote the entire restaurant-list answer **inside an
extended-thinking `ThinkingBlock`** (931 completion tokens, $0.50 spend)
and ended the turn with empty `content: []`. Our existing thinking-only
guard immediately stamped the placeholder, so the user never saw the
actual answer the model already generated.
- **Issue #2** — every image-generation request
(`AIImageCustomizerBlock` / `AIImageGeneratorBlock`) on dev failed with
`prisma.errors.ClientNotConnectedError: Client is not connected to the
query engine`. Regression from Significant-Gravitas#12780 (tier-based workspace file storage
limits): the new pre-write quota check at `util/workspace.py:225` called
`get_workspace_total_size` directly from `backend.data.workspace`, which
is a Prisma read. The copilot-executor process doesn't connect Prisma —
it RPCs into `database-manager` for everything else — so every
`manager.write_file()` from a tool blew up.

## What

- **Issue 5** — layered fallback for thinking-only final turns:
1. Adapter sets `pending_thinking_only_reprompt` and defers
placeholder/StreamFinish.
2. Driver re-enters the SDK loop and fires one synthetic
`client.query("Please write a brief user-facing summary of what you
found...")`.
3. If the re-prompt also returns thinking-only, promote the most recent
`ThinkingBlock` content to a visible `TextDelta`.
4. Only when thinking is also empty, emit the original `(Done — no
further commentary.)` placeholder.
Bounded to **one** re-prompt per turn so the worst case is ~one extra
LLM call.

- **Issue 2** — route the storage-limit pre-check through the existing
`workspace_db()` accessor and expose `get_workspace_total_size` on
`DatabaseManager` so the copilot-executor RPCs into database-manager
(where Prisma is connected), the same path other workspace queries on
this codepath use.

## How

`backend/copilot/sdk/response_adapter.py`
- New `pending_thinking_only_reprompt`, `thinking_only_reprompted`,
`_last_thinking_content` fields on `SDKResponseAdapter`.
- Capture latest `block.thinking` when streaming reasoning so the
second-tier promote-fallback has content.
- ResultMessage thinking-only branch — first hit defers; second hit
prefers `_last_thinking_content`, falls back to placeholder.

`backend/copilot/sdk/service.py`
- Wrap the `async for sdk_msg in _iter_sdk_messages(client):` block in a
`while True:` retry loop. After the inner loop ends, check
`pending_thinking_only_reprompt` — if set and not yet retried, fire
`client.query(_THINKING_ONLY_REPROMPT, ...)` and re-enter; else break.
Most of the diff is +4-space indentation churn.
- Module-level `_THINKING_ONLY_REPROMPT` constant for the re-prompt
copy.

`backend/data/db_manager.py`
- Import `get_workspace_total_size` and expose it via `_(...)` so it
becomes an RPC on `DatabaseManager` and the corresponding async client.

`backend/util/workspace.py`
- Drop the direct `get_workspace_total_size` import; call
`workspace_db().get_workspace_total_size(self.workspace_id)` instead.

`backend/util/workspace_test.py`,
`backend/copilot/sdk/response_adapter_test.py`
- Existing thinking-only test split into three: defer-on-first-pass,
promote-thinking-on-second-pass,
fallback-to-placeholder-when-no-thinking.
- Updated `test_flush_unresolved_at_result_message` to expect deferral
instead of immediate placeholder.
- New
`test_write_file_storage_check_routes_through_workspace_db_accessor`
proving the storage-limit pre-check goes through the accessor (would
have caught Issue 2).

## Test plan

- [x] `poetry run pytest backend/copilot/sdk/response_adapter_test.py
backend/util/workspace_test.py` — 67 pass
- [x] `poetry run ruff check` on changed files — clean
- [x] `poetry run black` / `poetry run isort` on changed files — clean
- [x] `/pr-test --fix` against dev preview to exercise the re-prompt +
image-write paths end-to-end
- [x] `/pr-polish` until merge-ready

## Related

- Regression introduced by Significant-Gravitas#12780 (tier-based workspace file storage
limits)
…icant-Gravitas#12996)

### Changes
Removes the country / currency selector button from the onboarding
Subscription step (`/onboarding?step=4`).

- Removed the `<CountrySelector>` render and its wrapper div from
`SubscriptionStep.tsx`
- Removed the unused `countryIdx` / `setCountryIdx` from the
`useSubscriptionStep` return shape and the now-orphaned `setCountryIdx`
helper
- Removed the `setSelectedCountryCode` selector inside the hook (store
API kept intact)
- Deleted `CountrySelector/CountrySelector.tsx` (no remaining consumers)
- Removed the "changing country persists the country code" integration
test

The `country` value derived from `selectedCountryCode` (defaulting to
`US`) is still passed to `<PlanCard>` for pricing display — only the UI
affordance for changing it is removed.

### Checklist
- [x] Removed UI element as instructed via page feedback
- [x] Cleaned up unused imports, hook returns, and stale test
- [x] No backend/API changes

Co-authored-by: Toran Bruce Richards <22963551+Torantulino@users.noreply.github.com>
…gnificant-Gravitas#13004)

## Why

The onboarding paywall already renders a Monthly/Yearly toggle, but the
toggle is purely cosmetic — the backend always charges the monthly
Stripe price. This PR wires `billing_cycle` end-to-end so the toggle
actually drives Stripe price-ID selection, plus a number of related
billing-UX bugs surfaced during /pr-test (silent Pro→Max upgrade,
missing tier downgrade dialog, yearly→monthly behaving immediately
instead of deferred, etc.) — see the in-PR comments for the full list.

Linear: [SECRT-2317](https://linear.app/agpt/issue/SECRT-2317),
[SECRT-2306](https://linear.app/agpt/issue/SECRT-2306). Replaces closed
PR Significant-Gravitas#12998 with a clean rewrite.

## What

**Backend yearly support** (`backend/data/credit.py`,
`backend/api/features/v1.py`):

- `get_subscription_price_id(tier, billing_cycle="monthly")` reads the
`copilot-tier-stripe-prices` LD flag using **additive flat suffix keys**
— monthly stays at `<TIER>` (existing key, e.g. `"PRO"`), yearly lives
at `<TIER>_YEARLY` (e.g. `"PRO_YEARLY"`). This shape is
deploy-order-safe: adding the yearly key in LD never changes what the
old code reads from `<TIER>`, so the LD edit and the code deploy can
land in either order. Yearly request for a tier without a configured
`_YEARLY` key returns `None` (fail-closed; we never silently bill
monthly when the caller asked for yearly).
- `create_subscription_checkout` and
`modify_stripe_subscription_for_tier` accept `billing_cycle` and forward
it; the Checkout Session metadata carries `billing_cycle` for
observability and the modify path refreshes sub metadata so the Stripe
Dashboard reflects the live tier+cycle.
- `sync_subscription_from_stripe` gathers monthly + yearly prices for
every priceable tier so a user on a yearly Pro plan still maps back to
`SubscriptionTier.PRO`. Same dual-cycle map is used by
`get_pending_subscription_change` so scheduled cycle-only changes
resolve to the correct tier.
- `SubscriptionTierRequest` gains `billing_cycle: Literal["monthly",
"yearly"] = "monthly"` (default preserves back-compat with the settings
billing tab where no cycle was sent). `SubscriptionStatusResponse`
exposes `billing_cycle`, `tier_costs_yearly`, and
`pending_billing_cycle` so the UI can render the right labels and copy.
- `update_subscription_tier` route: forwards `billing_cycle` to all
helpers; same-tier-cycle-downgrade (yearly→monthly) routes through
`_schedule_downgrade_at_period_end` (no immediate proration);
same-tier-cycle-upgrade (monthly→yearly) and tier upgrades stay on the
immediate `proration_behavior=always_invoice` path. Same-tier
short-circuit (release-pending-schedule) gates on the user having an
active Stripe sub so admin-granted users can still pay.

**Backend webhook robustness:**

- Handle the new Stripe API event types (`invoice_payment.paid` /
`invoice_payment.payment_failed`) by hydrating the underlying Invoice
and delegating to the legacy handlers; `_invoice_subscription_id` reads
`parent.subscription_details.subscription` first with a fallback to the
legacy `invoice.subscription` field. Without this, accounts on the new
Stripe API would flip the tier on `customer.subscription.created` but
never grant credits because `invoice.payment_succeeded` was no longer
being emitted.
- `update_subscription_tier` maps `stripe.CardError` (including
`code="authentication_required"` and the
`subscription_payment_intent_requires_action` SCA variant) to a clear
HTTP 402 with appropriate copy; `stripe.InvalidRequestError` for missing
payment method (`code in {resource_missing, missing}`, `param in
{default_payment_method, payment_method,
invoice_settings.default_payment_method}`) also maps to 402. Substring
matching on the error message is kept as a defensive fallback because
Stripe documents `e.param` as nullable.
- `tier_multipliers` lookup uses `t.value` (string) to match the
`dict[str, float]` keying that `get_tier_multipliers()` documents in its
return type — the prior enum lookup silently fell back to `1.0` for
every tier.

**Frontend** (onboarding paywall, Settings → Billing, PaywallGate):

- `PlanCard` lifted from
`(no-navbar)/onboarding/.../components/PlanCard/` into
`src/components/molecules/PlanCard/` so onboarding, Settings → Billing,
and PaywallGate share a single implementation. Plan list is driven by
`tier_costs` / `tier_costs_yearly` from the API response — adding a new
tier in LD makes it appear on every surface without code changes.
- `PaywallGate` modal renders the shared `PlanCard` grid + the same
Monthly/Yearly toggle pattern. Team (BUSINESS) opens the contact-sales
intake form rather than POSTing to `/credits/subscription`.
- `Settings → Billing` (`YourPlanCard`):
- New `CycleToggle` + `SwitchCycleDialog`: yearly→monthly dialog
promises end-of-period deferral (now actually deferred backend-side);
monthly→yearly dialog promises immediate prorated charge.
- New `SwitchTierDialog` (reused for both upgrade and downgrade):
paid→paid upgrades surface the prorated immediate-charge copy before
firing the mutation; paid→paid downgrades surface the
keep-until-period-end copy before scheduling. Downgrade also gates on
`serverCycle` so a yearly subscriber stays on yearly through the rest of
their period.
- Cycle toggle is hidden for `ENTERPRISE` (admin-managed), `BASIC`
(reserved internal slot), `NO_TIER` (no active sub to switch), and
`null` (loading).
- PaywallGate's Upgrade flow gates on `has_active_stripe_subscription` —
admin-overridden NO_TIER users with an active Stripe sub get the same
SwitchTierDialog (so the modify-in-place isn't silent); genuine fresh
NO_TIER users go straight to Stripe Checkout (the Checkout page is the
confirmation).
- `useSubscriptionStep` (onboarding) sends `billing_cycle: isYearly ?
"yearly" : "monthly"` to `useUpdateSubscriptionTier`.
- 422 toast for "yearly billing not yet available" is scoped to
`billing_cycle === "yearly"` requests so monthly-targeted 422s surface
the generic toast instead.

## How (tests)

- **Backend** (touched test files, all passing locally):
`credit_subscription_test.py` (price-id resolution incl. suffix-key
shape, yearly fail-closed, sync mapping for yearly→tier, schedule cycle
for yearly→monthly downgrade, idempotency on replay),
`subscription_routes_test.py` (modify forwarding, 422 fail-closed,
decline / SCA / no-PM 402 mapping, cycle-switch routing, admin-granted
Checkout fallthrough), `rate_limit_test.py` (drift-warning yearly skip),
`v1_test.py`.
- **Frontend** (vitest + RTL + MSW): `PaywallModal.test.tsx` (dynamic
plan rendering from `tier_costs`, cycle toggle, mutation routing, Stripe
redirect, 422 toast, empty fallback, loading skeletons, Team
contact-sales divert, dialog gate for active-sub case),
`billing-cards.test.tsx` (cycle toggle + SwitchCycleDialog +
SwitchTierDialog flows, downgrade preserves yearly, NO_TIER hides
toggle, BASIC/ENTERPRISE hides toggle), `billing-hooks.test.tsx`
(hook-level mutation paths through dialogs).
- /pr-test --fix passed all V1–V9 scenarios end-to-end against a live
Stripe sandbox (initial yearly charge, cycle switches, tier upgrade with
SCA + decline, downgrade defers via Stripe Schedule, webhook
idempotency, etc.). Several Sentry catches (SCA event type, downgrade
cycle, BASIC toggle hide, race in confirmTierDowngrade, scoped 422
yearly toast) addressed in subsequent commits per the PR conversation.

## LaunchDarkly migration (operator runbook)

This PR ships **without** any LD config change required — the suffix-key
shape is purely additive. To enable yearly billing post-merge:

1. Create yearly Stripe Price IDs (per tier, 15% off the annual
equivalent).
2. Edit `copilot-tier-stripe-prices`: keep the existing `"PRO"` /
`"MAX"` keys, **add** `"PRO_YEARLY"` / `"MAX_YEARLY"` with the new
yearly Price IDs. Test mode for dev, live mode for prod.
3. No code redeploy needed — the running backend already reads
`<TIER>_YEARLY`.

Tiers without a `_YEARLY` key keep showing only monthly in the UI (cycle
toggle present but yearly request returns 422 fail-closed → toast
`Yearly billing is not yet available for your plan.`).

## Checklist

- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] New and existing unit tests pass locally with my changes
- [x] Any dependent changes have been merged and published
…vitas#12624)

## Why
Completes the two linking flows from Significant-Gravitas#12615 / Significant-Gravitas#12618. When the bot sends
a user a one-time `{frontend}/link/{token}?platform=...` URL, this page
is where the user actually connects their AutoGPT account — whether
that's claiming a server or linking their personal DMs.

## The flows

**Server link (`linkType: SERVER`)**
1. User runs `/setup` in a Discord guild → bot replies ephemerally with
the link
2. Clicks link → logs in to AutoGPT if needed → lands here
3. Page shows: *"Set up AutoPilot for {ServerName}"* with a clear
billing notice
4. Confirm → `POST /api/platform-linking/tokens/{token}/confirm`
5. Everyone in the server can now mention the bot; all usage bills to
this user

**DM link (`linkType: USER`)**
1. User DMs the bot → bot creates a USER token and posts the link in the
DM
2. Clicks link → logs in → lands here
3. Page shows: *"Link your {Platform} DMs"* with a personal-context
billing notice
4. Confirm → `POST /api/platform-linking/user-tokens/{token}/confirm`
5. DMs now run as that user's own AutoPilot, billed to themselves

**Stacked on:** Significant-Gravitas#12618Significant-Gravitas#12615. Merge those first.

## Implementation
Single file:
`autogpt_platform/frontend/src/app/(no-navbar)/link/[token]/page.tsx`

- State machine: `loading` → `not-authenticated` | `ready` → `linking` →
`success` | `error`
- `fetchTokenInfo(token)` hits `GET
/api/platform-linking/tokens/{token}/info` — no auth needed, returns
`platform`, `server_name`, and `link_type`. The page branches all copy
and the confirm endpoint choice on `link_type`.
- `?platform=DISCORD` query-param fallback so the platform name renders
instantly even before `/info` resolves (removes a UI flash on slower
connections).
- "Signed in as {email}" footer with a one-click "Not you? Sign out"
that logs out and redirects back to `/login?next=/link/{token}` —
handles the common "wrong account" case.
- 30s `AbortController` timeout on confirm. Timeouts surface as a retry
prompt rather than hanging silently.
- Reuses `AuthCard`, `Button`, `Text`, `Link` from the design system;
Phosphor icons only (no emojis); Tailwind only.

## States
| State | What the user sees |
|-------|--------------------|
| Loading | Spinner + "Loading…" |
| Not authenticated | "Sign in to continue" →
`/login?next=/link/{token}` |
| Ready (SERVER) | *"Set up AutoPilot for {ServerName}"* — 4-bullet
explainer + billing callout |
| Ready (USER) | *"Link your {Platform} DMs"* — 3-bullet personal
explainer |
| Linking | Spinner + "Setting up AutoPilot…" |
| Success (SERVER) | CheckCircle + "*{ServerName}* is now connected —
everyone in the server can start using AutoPilot" |
| Success (USER) | CheckCircle + "your *{Platform}* account is now
connected — you can chat with AutoPilot in your DMs" |
| Error | LinkBreak + inline error + "Ask the bot for a new setup link"
|
| Malformed token | Inline error — rejects client-side before any
network call |

## Security
- **Token format validation client-side**: `^[A-Za-z0-9_-]{1,64}$` —
mirrors the backend's Path regex, so a malformed `params.token` never
hits `/api/proxy/...`.
- **All requests go through `/api/proxy`** which handles the Supabase
session cookie server-side — no session token ever touches client-side
fetch headers.
- **Confirm endpoints are JWT-authed** on the backend
(`Security(auth.requires_user)`).
- **Token info endpoint is unauthenticated by design**: 32-byte-entropy
tokens with 30-min TTL are safe to look up for display, and the confirm
step still requires JWT.

## Follow-on wiring
- `AUTOGPT_FRONTEND_URL` on the bot points back here (used in `/unlink`
and the DM auto-link message) — no hardcoded hostnames.
- Backend openapi.json fully regenerated on Significant-Gravitas#12615, typed API types
regenerate via `pnpm generate:api` if you re-run it.

<img width="617" height="613" alt="image"
src="https://github.com/user-attachments/assets/c81c6935-c21d-4f31-9d67-0a4f9f1709bf"
/>

<img width="568" height="637" alt="image"
src="https://github.com/user-attachments/assets/406148b3-a1fc-4fed-8973-22ab1e3f7f43"
/>

<img width="607" height="518" alt="image"
src="https://github.com/user-attachments/assets/de29680f-8a70-45e1-92d3-5759ecbba4c4"
/>
…ostLog (Significant-Gravitas#13009)

## Why

The post-execution **activity-status generator**
([`activity_status_generator.py`](autogpt_platform/backend/backend/executor/activity_status_generator.py))
runs an LLM call (gpt-4o-mini via the platform's
`openai_internal_api_key`) on every completed graph run to produce a 1-3
sentence user-friendly summary + correctness score. It uses
`AIStructuredResponseGeneratorBlock` to issue the call.

**Bug:** the block writes its `provider_cost` into a local
`NodeExecutionStats` via `merge_stats`, but because the block runs
**outside** the executor's `execute_node` loop, neither
[`log_system_credential_cost`](autogpt_platform/backend/backend/executor/cost_tracking.py)
nor `charge_reconciled_usage` ever fires — the cost is dropped on the
floor when the function returns.

This is an **observability gap, not a billing gap**: the platform is
paying OpenAI for activity-status generation on every completed graph
execution, but those calls don't show up in the admin cost dashboard /
per-user attribution.

The [simulator](autogpt_platform/backend/backend/executor/simulator.py)
(same shape: platform-paid LLM call on the user's behalf) already routes
through `persist_and_record_usage` so its spend is attributed correctly.
This PR brings activity-status to parity.

## What

- New `_persist_activity_status_cost(...)` helper that builds a
`PlatformCostEntry` and schedules it via `schedule_platform_cost_log`
(renamed from the previously-private `_schedule_log` — see refactor
commit) so external modules can use it without reaching into private
API.
- Call site placed after `structured_block.run()` succeeds, before
`return activity_response`. Reads `provider_cost`, `input_token_count`,
`output_token_count` off `structured_block.execution_stats`.
- Helper body is wrapped in `try/except Exception` so any cost-log
failure (transient DB / scheduling error) is logged but never strips a
successful activity-status response from the user — cost-logging is
strictly best-effort.
- Early-return guard uses `not cost_usd` so both `provider_cost is None`
and `provider_cost == 0.0` (with zero tokens) short-circuit, avoiding
empty rows that would dilute dashboard averages.
- Distinguishes `cost_usd`-tracked rows (`tracking_type="cost_usd"`)
from tokens-only rows (`tracking_type="tokens"`, `tracking_amount =
input + output`) so admins can still filter by request volume when the
provider doesn't report a USD cost.
- Deliberately **does not** bill the user's wallet (no `spend_credits`)
and **does not** count against the user's copilot rate-limit (no
`record_cost_usage`) — activity-status is platform-side overhead, not
user-triggered. Matches the simulator's stance for dry-run executions.

## How (tests)

`backend/executor/activity_status_generator_test.py`:
- `test_generate_status_persists_platform_cost`: stubs
`execution_stats=NodeExecutionStats(input=120, output=40,
provider_cost=0.0042)`, patches `schedule_platform_cost_log`, and
asserts the entry carries the right `user_id` / `graph_exec_id` /
`graph_id` / `block_name="activity_status_generator"` /
`provider="openai"` / `model="gpt-4o-mini"` / `tracking_type="cost_usd"`
/ `cost_microdollars=4200` /
`metadata.source="activity_status_generator"`.
- `test_generate_status_no_cost_no_log`: zero-cost zero-token case must
skip the log write.
- `test_generate_status_tokens_only_branch`: provider returns no USD
cost but tokens are present (input=200, output=80) — entry is logged
with `tracking_type="tokens"`, `tracking_amount=280.0`,
`cost_microdollars=None`.
- Updated three existing success-path tests to set
`mock_instance.execution_stats = NodeExecutionStats()` so the new
short-circuit has concrete numbers to compare against (was a `MagicMock`
attribute before).

`poetry run pytest backend/executor/activity_status_generator_test.py` —
17 passed locally.
`poetry run pyright` / `poetry run ruff` — clean on touched files.

## Checklist

- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] New and existing unit tests pass locally with my changes
- [x] Any dependent changes have been merged and published
… show 'Go to billing' (Significant-Gravitas#13011)

## Why

[SECRT-2296](https://linear.app/agpt/issue/SECRT-2296) — When a user
hits the daily AutoPilot limit, the CTA briefly flashed the legacy
**"Reset daily limit for $5.00"** copy before switching to **"Go to
billing."** Visible in:

- The usage dropdown in the header
- The in-chat `UsageLimitReachedCard`
- The library briefing panel's usage section

Each remount (thread switch, dropdown reopen) reproduced the flash.

### Root cause

The render branch picked the CTA based on `hasInsufficientCredits =
credits < resetCost`. `useCredits` returns `credits = null` until the
API resolves, so `hasInsufficientCredits` defaulted to **false** on
first paint:

```
isDailyExhausted && !isWeeklyExhausted && resetCost > 0 && !hasInsufficientCredits
  → renders "Reset daily limit for $5.00"   // first paint, credits not loaded
```

Once credits resolved (typically 0 for users on the free tier), the flag
flipped and the CTA swapped to "Go to billing." That's the flash.

PR Significant-Gravitas#12973 already removed this credit-based reset flow from the
rate-limit dialog (`Wait for reset` / `Go to billing`), but the same
legacy code was still living in the dropdown, the in-chat card, and the
briefing panel.

## What

Finish the migration started by PR Significant-Gravitas#12973. Kill the credit-reset path
everywhere; show a single deterministic **"Go to billing"** CTA when the
daily limit is exhausted.

- `UsagePanelContent` — removed `ResetButton`, `useResetRateLimit`,
`hasInsufficientCredits` / `isBillingEnabled` / `onCreditChange` props.
Renders one CTA, gated by `ENABLE_PLATFORM_PAYMENT`, routing to
`/settings/billing`.
- `UsageLimits` (dropdown) and `UsageLimitReachedCard` (in-chat) —
dropped `useCredits` + flag prop drilling.
- `BriefingTabContent` (library) — removed `UsageFooter` (paid reset /
add-credits). Same single CTA, same flag gate.
- Deleted the now-unused `useResetRateLimit` hook.
- Tests updated:
- Assert the legacy `Reset daily limit` button is **never** rendered, in
any state.
  - Assert "Go to billing" is gated only by `ENABLE_PLATFORM_PAYMENT`.

## How

The flash was a render-order race against an async credit fetch.
Removing the conditional path that depends on `credits` removes the race
entirely — the CTA is now a pure function of `usage` and a synchronous
LD flag, both available on the first render that has usage data.

Behavior change worth flagging for review: users who *did* have ≥ $5
credit can no longer click "Reset daily limit" to spend credits and
unblock themselves immediately. They go to billing or wait for daily
reset. This matches PR Significant-Gravitas#12973's stated intent ("Replaced the
credit-based reset flow with **Wait for reset** and **Go to billing**").
If product wants to preserve paid-reset for credit-rich users, that
needs a separate decision — happy to revert that part and instead fix
the flash by gating render on `credits !== null`.

## Test plan

- [x] `pnpm format`, `pnpm lint`, `pnpm types` all clean
- [x] `pnpm vitest run UsagePanelContentRender BriefingTabContent` — 38
passed
- [x] `pnpm vitest run CopilotPage RateLimit` — 91 passed
- [ ] Manual: hit daily limit on dev, open usage dropdown — "Go to
billing" appears immediately, no flash
- [ ] Manual: switch chat threads with limit reached — no flash on
remount
- [ ] Manual: open in-chat `UsageLimitReachedCard` — single "Go to
billing" CTA
- [ ] Manual: library Briefing panel "All" tab with limit reached —
single "Go to billing" CTA
- [ ] Manual: with `ENABLE_PLATFORM_PAYMENT` off, no "Go to billing" /
"Manage billing" rendered
- [ ] Manual: clicking "Go to billing" lands on `/settings/billing`
…Significant-Gravitas#13007)

## Why

A user opening `/library` for the first time (zero agents, zero folders)
saw an entirely blank grid below the header — no guidance, no call to
action, no indication of what to do next. Existing empty-state copy only
existed for the **Favorites** tab; the main "All" tab fell through to an
empty `InfiniteScroll` and rendered nothing.

This is a poor onboarding moment for net-new users and looks like a
broken page for users who deleted all their agents.



https://github.com/user-attachments/assets/7b3b2097-36bb-4fc6-82de-78da30e1f287



## What

Adds a dedicated `LibraryEmptyState` rendered only in the **pristine**
zero-state on the main tab:

- ✅ no agents
- ✅ no folders
- ✅ no active search term
- ✅ not inside a folder
- ✅ `statusFilter === "all"`
- ✅ not the favorites tab (which keeps its existing `HeartIcon` empty
state)

Other empty cases (search-no-results, status-filter-empty, empty
subfolder) keep their current behaviour so the CTAs don't appear in
inappropriate contexts.

## How

**New component** —
[`LibraryEmptyState/LibraryEmptyState.tsx`](autogpt_platform/frontend/src/app/(platform)/library/components/LibraryEmptyState/LibraryEmptyState.tsx)

- **Custom SVG illustration** — three stacked, progressively-wider
rounded "agent cards" (avatar circle + title bar + action squares +
trailing pill). No external assets, just inline SVG.
- **Two CTAs** using the design-system `Button` atom (`size="large"`,
`as="NextLink"`):
  - Primary → `/build` ("Build an agent")
  - Secondary → `/marketplace` ("Browse marketplace")
- **Copy** uses the design-system `Text` atom.

**Animation** — applies Emil Kowalski's animation principles:

- Staggered fade-up entrance for every child (illustration → heading →
body → CTAs)
- The 3 cards inside the illustration also stagger (back-to-front, 80 ms
apart) so the deck visually "deals out"
- Shared `cubic-bezier(0.22, 1, 0.36, 1)` (out-quint) curve, ~350 ms per
element
- Only `transform` + `opacity` animated (GPU-friendly)
- Respects `prefers-reduced-motion` via `useReducedMotion`: collapses to
a single 200 ms opacity fade with no stagger and no translate

**Wired into**
[`LibraryAgentList.tsx`](autogpt_platform/frontend/src/app/(platform)/library/components/LibraryAgentList/LibraryAgentList.tsx)
— added a single `isPristineEmpty` derivation guarding the existing
render branch ladder, just below the favorites empty state.

## Tests

New integration test file —
[`empty-state.test.tsx`](autogpt_platform/frontend/src/app/(platform)/library/__tests__/empty-state.test.tsx)
covering:

- Renders heading + body copy on zero-state
- "Build an agent" CTA points to `/build`
- "Browse marketplace" CTA points to `/marketplace`
- Empty state is **not** rendered when at least one agent exists
- Empty state is **not** rendered when folders exist (even with zero
agents)

## Test plan

- [x] `pnpm test:unit` — new `empty-state.test.tsx` passes
- [x] Visual: open `/library` with a fresh user → see illustration +
CTAs animate in
- [x] Visual: with agents present → empty state is hidden
- [x] Visual: search a non-existent term → search-empty state shown (not
new empty state)
- [x] Visual: filter by a status with no matches → existing
filter-exhaust UX preserved
- [x] Visual: navigate into an empty folder → does not render the new
empty state
- [x] A11y: with `prefers-reduced-motion: reduce`, only opacity fades —
no translate, no stagger
- [x] CTAs route via Next.js client navigation (no full page reload)
…nificant-Gravitas#12960)

### Why / What / How

**Why:** `frontend/TESTING.md` is explicit that page/component-level
integration tests (Vitest + RTL + MSW) are the default — yet ~30 hook
tests across the frontend were exercising hooks directly via
`renderHook`, with mock-heavy harnesses that drifted from how the hooks
are actually consumed. They were brittle, hard to navigate, and provided
coverage that didn't reflect user-visible behavior. A handful of test
files also lived outside `__tests__/` siblings and a couple were
duplicated. This PR cleans the surface area and brings test placement in
line with the convention.

**What:**
- Migrates 22 direct hook tests into the consumer component test where
the behavior is observable through DOM rendering. Where a sibling
component test didn't exist (`GoogleDrivePicker`, `ChatSidebar`,
`CredentialsInput`, `PushNotificationProvider`), creates a minimal one
rendered with `render()` from `@/tests/integrations/test-utils`.
- Consolidates the 3 `services/push-notifications` hook tests into a
single `PushNotificationProvider.test.tsx` that mounts the provider and
asserts via the same boundaries the hook tests previously stubbed
(`next/navigation`, `useSupabase`, service-worker registration helpers,
push API).
- Extracts pure helpers exposed by hooks into helper-test files (e.g.,
`classifyCredentials.test.ts`, `useArtifactContentHelpers.test.ts`).
Drops behaviors that were purely internal `useEffect` orchestration with
no DOM-observable surface.
- Relocates 27 misplaced test files into `__tests__/` siblings. Merges 3
helper-test duplicates (`CredentialsInput/helpers.test.ts`,
`copilot/store.test.ts`, `copilot/helpers.test.ts`) into a single
canonical copy each — no coverage lost.
- Resolves 2 real test-file duplicates (`downloadArtifact.test.ts`,
`useAutoOpenArtifacts.test.ts`).

**How:** Each migration was a strict triage:

1. **Already covered** by the consumer component test → drop the
hook-level case.
2. **Reachable through DOM rendering** of a consumer component → port as
a render-driven test using MSW handlers (and existing component-test
patterns) rather than mocking the hook itself.
3. **Pure helper logic exported from the hook module** → keep coverage
in a uniquely-named helpers test next to the hook.
4. **Pure internal `useEffect` orchestration** that can only be reached
by `renderHook` → drop, with the lost behavior documented per-hook so it
can be reinstated later.

Hook source files are untouched. The existing `CopilotPage.test.tsx`
(which mocks `useCopilotPage` directly and is therefore a smoke test
today) is also untouched — it's the natural home for the page-level
orchestration coverage that this PR drops, but rewriting it to render
with MSW is its own much larger change and is left as follow-up.

### Changes 🏗️

**Hook-test migrations (sub-component layer)**
- `useChatInput` → behaviors merged into `ChatInput.test.tsx`
- `useElapsedTimer` → `TurnStatsBar.test.tsx` (DOM-observed via a small
`TimerHarness`)
- `useArtifactContent` → `ArtifactContent.test.tsx`; cache helpers
preserved in new `useArtifactContentHelpers.test.ts`
- `useAutoOpenArtifacts` → `ChatContainer.test.tsx` (real Zustand store
+ real hook drive behavior)
- `useBuilderChatPanel` → already covered; deleted (24 internal-only
behaviors documented as dropped)
- `useDiagnosticsContent` → 2 error-coalesce cases added to
`DiagnosticsContent.test.tsx`
- `useRateLimitManager` → 12 cases ported into
`RateLimitManager.test.tsx` (now 19 tests)
- `useCredentialsInput` → 1 case ported into a new
`CredentialsInput.test.tsx`; 2 dropped as dead code (the hook exposes
`userUpgradeableCredentials` and `handleScopeUpgrade` but no consumer
reads them)
- `useGoogleDrivePicker` → new `GoogleDrivePicker.test.tsx` covers
token/error/scope paths
- `useCredentials` → 7 helper cases extracted into
`classifyCredentials.test.ts`; 2 context-plumbing cases dropped
- `usePushNotifications` + `useReportClientUrl` +
`useReportNotificationsEnabled` → consolidated into
`PushNotificationProvider.test.tsx` (19 tests)
- `useCopilotStop` → 8 cases ported into `ChatInput.test.tsx` via a
`StopHarness` (44 tests total now)
- `useSessionDeletion` → 4 cases ported into a new
`ChatSidebar.test.tsx`

**Hook-test migrations (page-level layer — coverage gap, see below)**

The remaining 14 copilot page-level hook tests under
`app/(platform)/copilot/__tests__/use*.test.ts` are deleted. Most of
their behaviors were `renderHook` + mocked-args orchestration with no
DOM surface other than through `<CopilotPage>` itself. Since
`CopilotPage.test.tsx` mocks `useCopilotPage` wholesale, none of that
orchestration is currently reachable from any rendered test. Behaviors
observable through other consumers were ported there (e.g.,
`useCopilotStop` → ChatInput); the rest are dropped.

**Documented coverage drops (highest-risk):**
- `useCopilotStream` (12 behaviors) — registry reuse, hydration-gated
resume, restore latch, 6s stall watchdog, message-snapshot persistence,
idempotent resume, `setMessages` pre-replay strip, unmount cleanup,
background-disconnect reload marking, 30s forced reconnect, 429
rate-limit recovery branches.
- `useCopilotPage` orchestrator — onSend queue-in-flight routing (5
sub-cases), active-restore trim + cached-snapshot preference,
`turnStats` merge precedence, backward pagination ordering.
- `useSendMessage` — file count cap, file size cap, all-uploads-fail
toast/throw, first-send stash + createSession flow, double-send
concurrency guard, queued first-send flush on sessionId change.
- `useStreamActivityWatchdog` (6), `useWakeResync` (8),
`useHydrateOnStreamEnd` (11), `useCopilotPendingChips` (8 promotion
lifecycle), `useLoadMoreMessages` (8 cursor/backoff),
`useSessionTitlePoll` (6), `useWorkflowImportAutoSubmit` (7),
`useChatSession` (5), `useCopilotNotifications` (4 push-SW dedupe).

**Recommended follow-ups:**
1. Rewrite `CopilotPage.test.tsx` to render with MSW (replacing the
`useCopilotPage` mock). Highest-priority targets: `onSend`
queue-in-flight routing, `useCopilotStream`'s reconnect/restore
lifecycle, `useSendMessage`'s file caps + first-send flush,
`useChatSession`'s freshSessionData masking.
2. Extract embedded pure logic from `useHydrateOnStreamEnd`
(`preservePromotedUserBubbles`, zombie ledger),
`useStreamActivityWatchdog` guards, `useCopilotPendingChips` promotion
logic, `useLoadMoreMessages` backoff/window — currently all
module-private — into siblings exports so they can be helper-tested
without page-level setup.
3. Already-exported but untested helpers flagged by agents during this
work: `getLatestAssistantStatusMessage`, `concatWithAssistantMerge`,
`deduplicateMessages`, `hasInProgressAssistantParts`,
`hasVisibleAssistantContent` — cheap follow-up additions to
`helpers.test.ts`.

**Test relocations**
27 helper/component test files moved from feature directories into
`__tests__/` siblings (renderers under `OutputRenderers/`, helpers under
copilot tools / ArtifactPanel / CredentialsInput / SubscriptionStep /
SubscriptionTierSection / APIKeyList, `route.helpers.test.ts`,
`lib/utils.test.ts`, `lib/autogpt-server-api/{client,helpers}.test.ts`,
`middleware.test.ts`,
`providers/agent-credentials/credentials-provider.test.ts`,
`types/auth.test.ts`). Three helper-test pairs that existed both inside
and outside `__tests__/` were merged into the canonical inside copy with
no coverage lost.

**Real duplicates removed**
-
`app/(platform)/copilot/components/ArtifactPanel/downloadArtifact.test.ts`
— older smaller copy, the inside `__tests__/` version is the
comprehensive one.
-
`app/(platform)/copilot/components/ChatContainer/useAutoOpenArtifacts.test.ts`
— kept the more recent outside copy by moving its content into
`__tests__/`, then migrated it into `ChatContainer.test.tsx` as part of
phase A.

**Net diff:** 65 files changed, +2,214 / −9,233 lines.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
  - [x] `pnpm format` — clean
  - [x] `pnpm lint` — clean (only pre-existing tailwind warnings)
- [x] `pnpm types` — only pre-existing PaywallGate /
SubscriptionTierSection errors (verified by stashing this PR's changes;
same errors exist on dev)
- [x] `pnpm exec vitest run --no-coverage src/` — 164 test files, 2,223
tests passing
- [x] Spot-checked targeted suites after each migration: each touched
component test file passes in isolation

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sageLimitReachedCard (Significant-Gravitas#13012)

### Why / What / How

<img width="800" alt="Screenshot 2026-05-06 at 19 04 53"
src="https://github.com/user-attachments/assets/109aeaa6-c5a8-4cd4-8aa4-8bacbb3ce784"
/>
<img width="800" alt="Screenshot 2026-05-06 at 19 05 03"
src="https://github.com/user-attachments/assets/16e908bb-fa95-4842-877d-3899870df583"
/>
<img width="800" alt="Screenshot 2026-05-06 at 19 05 56"
src="https://github.com/user-attachments/assets/11b3fd1b-de7e-41eb-8935-7a895a632c84"
/>

**Why:** The chat copilot's `UsageLimits/` folder mixed two distinct
concerns through a shared `UsagePanelContent` abstraction with three
boolean knobs (`showHeader`, `showBillingLink`, size). The shared
component made the "Go to billing" CTA conditional inside the card, even
though the chat container already gates whether the card renders at all
— which is why the legacy "Reset daily limit" button kept flashing back
through history. While doing the split it also made sense to clean up
the duplicated query config across the three new hooks, align the visual
language for usage bars and tier pills with the `AgentBriefingPanel`,
and surface a billing entry point from the popover (and on the briefing
panel itself) so users don't need to wait for the limit-reached state to
manage their plan.

**What:**
- Splits `UsageLimits/` into two fully independent components —
`<UsagePopover />` (chart-bar trigger button + popover panel) and
`<UsageLimitReachedCard />` (alert card shown above the chat input when
the limit is reached). The card always renders the "Go to billing"
button (modulo the platform billing flag) and now defensively guards on
actual exhaustion; ChatContainer's `useIsUsageLimitReached` decides
whether to mount the card at all.
- Adds a shared `useCopilotUsage` hook so the three consumer hooks
(`useUsagePopover`, `useUsageLimitReachedCard`,
`useIsUsageLimitReached`) share one `USAGE_QUERY_CONFIG` constant and
can't drift.
- Aligns the usage-bar visual language across the copilot popover, the
limit-reached card, the credits page, and the settings/billing
`AutopilotUsageCard` — single `h-2` track with blue→orange threshold at
80%, label and percent in the header row, reset/caption underneath, and
the same `<Badge variant="info" size="small"
className="bg-[rgb(224,237,255)]">{Tier} plan</Badge>` pill used by
`AgentBriefingPanel.UsageMeter`.
- Adds an always-on `Manage billing` link to the usage popover (gated by
the platform billing flag) so users can jump to `/settings/billing`
without first hitting the limit.
- Replaces the conditional "Go to billing" CTA in
`AgentBriefingPanel.UsageSection` with an unconditional top-right
`Manage billing` link, keeping the briefing card calmer and consistent
with the popover.
- Switches the popover and card integration tests from `vi.mock`-ing the
data hooks to driving them via MSW handlers on `*/api/chat/usage`, so
the new shared hooks are actually exercised end-to-end (closes the
codecov/patch gap the bot previously flagged).

**How:**
- New layout follows CONTRIBUTING.md: `ComponentName/ComponentName.tsx`
+ `useComponentName.ts` + colocated `__tests__/`.
- Extracted small render primitives `UsageBar.tsx` and `StorageBar.tsx`
so the popover, card, and `credits/page.tsx` can each render usage bars
without inheriting the old multi-mode component. `UsageBar` no longer
takes a `size` prop — the bar is `h-2` everywhere, matching the briefing
panel reference.
- `useCopilotUsage` lives at the folder root; the per-consumer hooks
(`useUsagePopover`, `useUsageLimitReachedCard`,
`useIsUsageLimitReached`) wrap it and only add the bits each consumer
needs (e.g. the platform billing flag for the popover/card).
- `useIsUsageLimitReached` lives in its own file at the folder root —
used by ChatContainer, not the card itself.
- `formatBytes` moved into `usageHelpers.ts` (alongside
`formatResetTime`) so the StorageBar primitive doesn't own a generic
helper.
- `AutopilotUsageCard` (`/settings/billing`) keeps its framer-motion
fill animation but adopts the unified `h-2` blue/orange track +
`body`-variant percent label + small caption for "Spent: $X" so it
visually matches the copilot bars.
- Stale tests for `UsagePanelContent` deleted (the `formatResetTime`
tests were redundant with `usageHelpers.test.ts`); added focused
integration tests for `UsagePopover` (9) and `UsageLimitReachedCard` (7)
plus `formatBytes` table tests. The popover and card tests now drive
`*/api/chat/usage` via MSW so the real hooks run during the test instead
of being shimmed.
- `BriefingTabContent.UsageSection` simplified — dropped the local
`showGoToBilling` derivation and the bottom CTA; the `Manage billing`
link is now the only billing entry point and renders whenever
`ENABLE_PLATFORM_PAYMENT` is on.

### Changes 🏗️

- **Added**
  - `UsagePopover/UsagePopover.tsx` + `useUsagePopover.ts`
- `UsageLimitReachedCard/UsageLimitReachedCard.tsx` +
`useUsageLimitReachedCard.ts`
- `UsageBar.tsx`, `StorageBar.tsx`, `useIsUsageLimitReached.ts`,
`useCopilotUsage.ts`
- `UsagePopover/__tests__/UsagePopover.test.tsx`,
`UsageLimitReachedCard/__tests__/UsageLimitReachedCard.test.tsx`
  - `formatBytes` in `usageHelpers.ts` + tests
- "Manage billing" link in `<UsagePopover />` (always shown when
`ENABLE_PLATFORM_PAYMENT` is on)
- **Removed**
- `UsageLimits.tsx`, `UsagePanelContent.tsx`, flat
`UsageLimitReachedCard.tsx`
- `UsageLimits/__tests__/` (replaced by per-component test folders +
`usageHelpers.test.ts`)
- "Go to billing" CTA from `AgentBriefingPanel.UsageSection` (replaced
by the existing top-right "Manage billing" link, which now always
renders when billing is on)
- **Updated**
  - `ChatSidebar.tsx` — imports `UsagePopover`
  - `ChatContainer.tsx` + its test — split imports across new file paths
- `profile/(user)/credits/page.tsx` — uses `UsageBar` + `StorageBar`
directly instead of the deleted `UsagePanelContent`
- `settings/billing/.../AutopilotUsageCard.tsx` — replaces the `h-6`
striped gray bar with the unified `h-2` blue/orange bar
- `library/.../AgentBriefingPanel/BriefingTabContent.tsx` — `Manage
billing` link is now unconditional under the billing flag
- Visual primitives `UsageBar.tsx` / `StorageBar.tsx` aligned to the
briefing-panel reference (`h-2`, blue→orange ≥80%, header row + caption
layout, `<Text>` atoms)
- Tier pill across copilot popover, limit-reached card, and briefing
panel uses the same `Badge variant="info"` with `bg-[rgb(224,237,255)]`
and `"{Tier} plan"` label

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `pnpm format` clean
  - [x] `pnpm lint` clean (only pre-existing img warnings)
- [x] `pnpm types` clean for all refactored files (pre-existing
yearly-billing type errors unrelated to this PR)
- [x] `pnpm vitest run` for all affected test files: 101/101 passing
across 7 files (`UsagePopover` 9, `UsageLimitReachedCard` 7,
`usageHelpers` 22, `ChatContainer` 2, `billing-cards` 42, `billing-page`
9, `BriefingTabContent` 10)
- [ ] Manually open `/copilot`, confirm the chart-bar popover in the
sidebar still shows usage bars, tier pill, storage section, and the new
"Manage billing" link at the bottom
- [ ] Manually trigger a daily-limit-reached state and confirm the alert
card renders with "Go to billing" leading to `/settings/billing`, plus
the defensive guard hides the card below 100%
- [ ] Open `/profile/credits` and confirm the AutoPilot Usage & Storage
section still renders bars and storage as before
- [ ] Open `/settings/billing` and confirm the Autopilot usage card now
uses the unified blue/orange `h-2` bar
- [ ] Open `/library` and confirm the Agent Briefing usage section shows
the always-on "Manage billing" link top-right (and no "Go to billing"
CTA below the bars)

#### For configuration changes:
- [x] N/A — no config touched

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backend CI was flaky because run duration was around 15 minutes (= the current cutoff)
…97) (Significant-Gravitas#13016)

### Why / What / How

Refreshes the walkthrough videos shown in the wallet onboarding
checklist ([SECRT-2297](https://linear.app/autogpt/issue/SECRT-2297)).
Old recordings were stale and one task (`SCHEDULE_AGENT`) had no
walkthrough at all.

- Replace `marketplace-add.mp4` with a re-recorded version.
- Add `agent-run.mp4` (used by `MARKETPLACE_RUN_AGENT`, replaces
`marketplace-run.mp4`).
- Add `agent-schedule.mp4` and wire it to the previously video-less
`SCHEDULE_AGENT` task.
- Delete unreferenced clips: `builder-open/save/run.mp4`,
`marketplace-visit.mp4`, `marketplace-run.mp4`.
- Add `!public/onboarding/*.mp4` exception to `.gitignore` so onboarding
videos slip through the global `*.mp4` rule.

> Note: the `TRIGGER_WEBHOOK` task video was deferred to
[SECRT-2186](https://linear.app/autogpt/issue/SECRT-2186).

### Changes 🏗️

- `frontend/.gitignore`: allow `public/onboarding/*.mp4`.
- `frontend/public/onboarding/`: refreshed `marketplace-add.mp4`; new
`agent-run.mp4`, `agent-schedule.mp4`; removed orphaned clips.
- `frontend/src/components/layout/Navbar/components/Wallet/Wallet.tsx`:
point `MARKETPLACE_RUN_AGENT` at `agent-run.mp4` and add `video` for
`SCHEDULE_AGENT`.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Open the wallet popover, expand "Get an agent from the
marketplace" and confirm the new `marketplace-add.mp4` plays.
- [ ] Expand "Open the Library page and run an agent" and confirm
`agent-run.mp4` plays.
- [ ] Expand "Schedule your first agent" and confirm
`agent-schedule.mp4` now plays (previously no video).
  - [ ] Confirm no console 404s for `/onboarding/*.mp4`.
POST /api/blocks/{block_id}/execute now charges via block_usage_cost
+ spend_credits before running, mirroring the billing wrapper used
in graph execution (manager.py:1014-1022) and the copilot tool
helper. Insufficient balance surfaces as HTTP 402.

The execution-count tier charge is intentionally omitted to match
copilot/chat-route semantics — that tier is graph-execution-only.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ignificant-Gravitas#13000)

## Changes

Implements [SECRT-2311](https://linear.app/autogpt/issue/SECRT-2311):
Makes AutoPilot aware of the user's subscription tier so it can surface
billing CTAs and tailor responses without polluting the system prompt.

### What's included

1. **`get_platform_info` tool** — A pull-model tool the agent calls
on-demand with a `subscription` topic. Returns:
   - Current tier name & description
   - Rate-limit multiplier & workspace storage limit
   - Billing portal URL
   - Available upgrade tiers (excluded for Enterprise)

2. **Tier hint in `user_context`** — Injects a one-word `Plan: PRO` line
into the existing `<user_context>` block on the first message, so the
agent passively knows the tier without a tool call.

3. **Identity fix in fallback system prompt** — Updates the hardcoded
fallback prompt from "AI automation assistant" to "AutoPilot, the AI
assistant on the AutoGPT platform" with a "never direct to external AI
services" instruction.

4. **Full registration** — `PlatformInfoResponse` model, `PLATFORM_INFO`
response type, `TOOL_REGISTRY` entry, `ToolName` literal update.

5. **12 unit tests** — All tiers, no-auth, invalid topic, DB failure,
registry presence, schema validation.

### Design decisions

- **Pull model** — Agent calls the tool when relevant (user asks about
billing, hits limits, etc.) rather than stuffing tier info into every
system prompt. Preserves LLM prompt-cache hits.
- **Topic enum** — `subscription` is the only topic for V1. Designed to
expand to `integrations`, `webhooks`, `capabilities` later.
- **Enterprise excluded** from upgrade suggestions (not self-serve).
- **Tier lookup failure is non-fatal** — silently caught, tool returns
graceful error.

### Files changed

| File | Change |
|------|--------|
| `copilot/service.py` | Identity fix + `user_id` param on
`inject_user_context()` + tier lookup |
| `copilot/tools/platform_info.py` | **NEW** — `PlatformInfoTool`
implementation |
| `copilot/tools/test_platform_info.py` | **NEW** — 12 unit tests |
| `copilot/tools/models.py` | `PlatformInfoResponse` + `PLATFORM_INFO`
response type |
| `copilot/tools/__init__.py` | Import + registry entry |
| `copilot/permissions.py` | `ToolName` literal update |
| `copilot/baseline/service.py` | Pass `user_id` to
`inject_user_context()` |
| `copilot/sdk/service.py` | Pass `user_id` to `inject_user_context()` |

### Checklist

- [x] Code follows project conventions
- [x] `poetry run format` passes
- [x] Unit tests written and passing (12/12)
- [x] No changes to Langfuse prompt (managed separately)
- [x] `inject_user_context()` signature change is backward-compatible
(`user_id` defaults to `None`)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds a new authenticated tool that exposes subscription/billing info
and injects plan context into first-turn prompts, which can affect
assistant behavior and requires correct feature-flag/tier lookups. Risk
is moderate due to new tool surface area and prompt-content changes, but
scope is contained to copilot tooling and context injection.
> 
> **Overview**
> **AutoPilot is now tier-aware without changing the cacheable system
prompt.** The first-turn `inject_user_context` flow now accepts
`user_id` and, when available, appends a `Plan: <TIER>` line (via
`get_user_tier`) into the trusted `<user_context>` prefix; both baseline
and SDK execution paths were updated to pass `user_id`.
> 
> Adds a new authenticated tool, `get_platform_info`, registered in the
tool registry and permissions, with a corresponding
`PlatformInfoResponse`/`ResponseType` and OpenAPI enum update. The tool
returns subscription tier and a billing URL (or an “open access”
response when billing is feature-flag disabled) and includes messaging
to keep billing guidance on the AutoGPT platform only.
> 
> Also updates the cacheable system prompt identity string to name
“AutoPilot on the AutoGPT platform.”
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
12c56fd. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…roxy (Significant-Gravitas#13019)

### Why / What / How

**Why.** The Supabase proxy was the bottleneck for client-side API
latency. Two things were doing avoidable work on every browser API call:

1. The Next.js middleware's matcher matched `/api/proxy/*`, so each call
ran `supabase.auth.getUser()` — a round-trip to the Supabase GoTrue auth
server — *before* the actual backend call.
2. The proxy handler parsed every request body (`req.json()` /
`req.formData()`) and re-serialised it before forwarding. For 256 MB
FormData uploads that meant the whole upload was buffered in Next.js
memory, then re-encoded for the backend. The response body was likewise
read into memory and re-wrapped with `NextResponse.json(body, { status:
200 })`, which (a) flattened backend `201` / `202` / `204` to `200` and
(b) stripped headers like `Content-Disposition`, `Cache-Control`,
`Location`, `ETag`. Empty / unparseable POST bodies fell back to the
literal string `"null"` reaching the backend.

The httpOnly-cookie / no-token-in-browser security model is unchanged —
JWTs are still injected server-side and never exposed to the browser, so
Google CASA posture is preserved.

**What.**

- Excludes `/api/proxy` from the middleware matcher.
- Rewrites `src/app/api/proxy/[...path]/route.ts` as a single
stream-through proxy: `req.body` (`ReadableStream`) is forwarded with
`duplex: "half"`, and `backendResponse.body` is piped straight back into
`NextResponse`. Status code, statusText, and a filtered set of response
headers (hop-by-hop entries dropped per RFC 7230) are preserved.
- Workspace download branch is untouched — it still buffers via
`arrayBuffer()` because of the existing Vercel/Next.js streaming
truncation bug for large binaries.
- Adds `PROXY_FOLLOWUPS.md` capturing the medium / low-impact items from
the review (cache(), sentinel cleanup, browser supabase consolidation,
useSupabase refactor, edge runtime, etc.) with suggested PR splits, so
the next pass has a clear starting point.

**How.**

- `src/middleware.ts` matcher gains `api/proxy` in its negative
lookahead. The proxy still authenticates itself (`getServerAuthToken()`
→ bearer header) and the backend re-validates the JWT, so the
middleware-level auth was pure overhead for API requests.
- The new `route.ts` is ~110 lines vs ~290 lines before. All the
per-content-type branching (`handleJsonRequest` /
`handleFormDataRequest` / `handleUrlEncodedRequest` /
`handleGetDeleteRequest`) and the `createResponse` /
`createErrorResponse` helpers collapse into one `handler()` that just
calls `fetch()`. `makeAuthenticatedRequest` /
`makeAuthenticatedFileUpload` in `lib/autogpt-server-api/helpers.ts` are
still used by the legacy `BackendAPI` server-side path, so they stay.
- Forwarded request headers are an explicit allow-list (Content-Type,
Content-Length, Accept, Accept-Language, Accept-Encoding,
X-Act-As-User-Id, X-API-Key, sentry-trace, baggage). Hop-by-hop response
headers are filtered out (Connection, Keep-Alive, Proxy-Authenticate,
Proxy-Authorization, TE, Trailer, Transfer-Encoding, Upgrade,
Content-Encoding).

### Changes 🏗️

- `src/middleware.ts`: matcher now excludes `api/proxy`.
- `src/app/api/proxy/[...path]/route.ts`: streaming pass-through proxy
with status/header propagation; workspace download branch retained.
- `src/app/api/proxy/PROXY_FOLLOWUPS.md`: new follow-up plan for the
medium/low-impact items.

### Checklist 📋

#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
  - [ ] `pnpm format`, `pnpm lint`, `pnpm types` clean
- [ ] `pnpm test:unit` — 178 test files pass (no regressions in the
existing `route.helpers.test.ts` or `helpers.test.ts` suites)
- [ ] Manual: load library, builder, copilot, monitor, settings/billing
— verify React Query queries return data, mutations work, file uploads
succeed, downloads stream
- [ ] Manual: verify backend `Cache-Control: no-store` headers now reach
the browser (devtools → network)
- [ ] Manual: verify backend 204s (e.g. delete operations) round-trip as
204, not 200
- [ ] Manual: verify large file upload (>50 MB) still works and Next.js
memory stays bounded
- [ ] Manual: verify session expiry still redirects correctly (admin
pages, protected pages) — this depended on middleware running on page
navigations, not on `/api/proxy`, so should be unaffected, but worth
confirming

#### For configuration changes:

- [ ] `.env.default` is updated or already compatible with my changes
- [ ] `docker-compose.yml` is updated or already compatible with my
changes
- [ ] I have included a list of my configuration changes in the PR
description (under **Changes**)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Abhi1992002 and others added 19 commits June 18, 2026 15:53
…Significant-Gravitas#13379)

Resolves SECRT-2424.

### Why / What / How

**Why** — In AutoPilot, messages in the middle of a conversation
silently disappeared from the rendered view during a multi-turn session
(no refresh). Scrolling up did **not** bring them back; only a full page
refresh restored them. The messages were never lost on the backend —
purely a client-side rendering hole.

**What** — Stop the post-turn force-hydrate from dropping in-memory
messages that are older than the refetched window.

**How** — AutoPilot displays `concatWithAssistantMerge(pagedMessages,
currentMessages)`:
- `GET /sessions/{id}` returns only the **most-recent ~50 messages** — a
sliding *tail window* (`backend/copilot/db.py`, `limit=50`).
- After every turn (`status: streaming → ready`),
`useHydrateOnStreamEnd` **force-replaced** `currentMessages` with that
window.
- As the conversation grows the window slides forward (its
`oldest_sequence` increases), so the replace drops every in-memory
message older than the new window's oldest sequence.
- Meanwhile `pagedMessages` (older history the user scrolled back to) is
intentionally preserved across refetches and only ever extends **older**
(`before_sequence`), so it never covers the newly-vacated region.
- Result: a hole between the top of `pagedMessages` and the bottom of
the new window. Scroll-back can't fill a *middle* hole, so the messages
stay gone until refresh rebuilds both sources contiguously.

The fix adds `retainOlderHistory(prev, hydrated)` in
`useHydrateOnStreamEnd`: before force-replacing, it keeps the `prev`
messages whose DB sequence predates the window's oldest sequence and
prepends them, keeping the result contiguous with the older history.
Streaming rows (AI-SDK uuids with no `-seq-N` id) are excluded — they're
already inside the refetched window, so there are no duplicates.

### Changes 🏗️

- `useHydrateOnStreamEnd.ts`: new `retainOlderHistory` helper; the
force-hydrate path now retains older-than-window messages instead of
dropping them.
- `helpers/convertChatSessionToUiMessages.ts`: export the existing
`extractDbSequence` (`-seq-N` parser) so the hook reuses it instead of
duplicating the regex.
- `__tests__/useHydrateOnStreamEnd.test.ts`: **new** regression coverage
(the hook previously had none).

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `pnpm format`, `pnpm lint`, `pnpm types` — clean
- [x] `pnpm test:unit` for the new test — 3/3 pass; changed hook
coverage 71% stmts / 78% lines (>70%)
- [x] Full copilot suite (`src/app/(platform)/copilot`) — 1200/1200
tests pass, no regressions
- [x] Manual: long AutoPilot session (>50 messages), scroll up to load
older history, send another message, confirm middle messages remain
after the turn completes (no refresh)

### Test plan to reproduce the original bug

1. Open an AutoPilot session with enough turns that total persisted
messages exceed 50.
2. Scroll up so older history loads (you can see the first message).
3. Send another message and let the turn finish — **without
refreshing**.
4. Before this fix: messages between the loaded older pages and the
recent window are missing and scroll-back won't recover them. After this
fix: the conversation stays contiguous.
…sult (Significant-Gravitas#13381)

### Why / What / How

**Why** — A `run_sub_session` sub-agent (Otto/AutoPilot) that does
substantial work and then *delivers it by writing workspace files* —
summarising only briefly in its final message — returns a hollow body to
the parent (SECRT-2377). Confirmed in production: a sub made 44 tool
calls and wrote ~37 KB of findings across 3 workspace files, then
returned 227 tokens of *"delivered in three workspace documents, what's
next?"*. The parent had no visibility into those files, treated the run
as empty, and re-ran the entire task — ~22.6k tokens wasted, no error
surfaced.

Root cause is a **contract gap**: `response_from_outcome` carried only
the sub's final assistant text (`response`) + a tool-call log. Workspace
files are scoped to the *sub's* session (the parent lists its own
session by default), and there was no field enumerating files the sub
produced. So "deliver via files + summarise" was silent data loss.

**What** — Add a `sub_workspace_files` manifest to
`SubSessionStatusResponse`, populated on completion, so the parent can
recover work delivered via files. Each entry has `file_id`, `name`,
fully session-qualified `path`, and `size_bytes` — `path` is directly
usable with `read_workspace_file(path=...)`.

**How** — Two sources, authoritative-first:
- **`list_sub_workspace_files`** (authoritative) reads the sub's session
for `origin=agent-created` files. This captures writes from **any** turn
— including the already-terminal / cold-poll path in
`get_sub_session_result`, where the rebuilt tool-call log reflects only
the sub's last message and would otherwise miss earlier writes. Returns
`None` on lookup failure; `[]` means the sub wrote nothing. Capped at 50
entries.
- **Tool-call mining** (`_workspace_files_from_tool_calls`) is the
fallback when the listing is unavailable, parsing `write_workspace_file`
outputs defensively (JSON string on live-drain, dict on
persisted-replay).

Both `run_sub_session._execute` and `get_sub_session_result._execute`
fetch the listing on `completed` and pass it through
`response_from_outcome`. The completed message also nudges the parent
toward the files so a hollow `response` isn't mistaken for an empty run.
(Stored `WorkspaceFile.path` is already session-qualified via
`_resolve_path` on write, so no re-prefixing is needed.)

### Changes 🏗️

- `models.py`: new `SubWorkspaceFileInfo` model; `sub_workspace_files:
list[...] | None` field on `SubSessionStatusResponse`.
- `run_sub_session.py`: `list_sub_workspace_files` (authoritative),
`_workspace_files_from_tool_calls` + `_as_payload` (fallback miner),
`workspace_files` override param on `response_from_outcome`,
completed-message nudge; `_execute` fetches the listing on completion.
- `get_sub_session_result.py`: fetches the listing on completion and
passes it through — fixes the cold-poll/terminal path.
- `sub_session_test.py`: repro at `response_from_outcome` level +
terminal-path and live-path `_execute` coverage; autouse fixture stubs
the listing so no test hits the workspace DB.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `poetry run pytest backend/copilot/tools/sub_session_test.py` — 22
passed
- [x] Repro: a completed sub that wrote files but returned a hollow
`response` now surfaces `sub_workspace_files` with session-qualified
paths
- [x] Terminal/cold-poll path (`get_sub_session_result`, last message
terminal, waiter skipped) still populates the manifest from the
authoritative listing
- [x] A sub that answered inline (no writes) yields `sub_workspace_files
= None` (no noise)
- [x] `poetry run format` and `poetry run lint` (ruff, isort, black,
pyright) clean
…#13170)

### Why / What / How

**Why:** The platform currently lacks a native way to parse JSON strings
into Python objects, or encode Python objects into JSON strings.
**What:** This PR introduces two new core data blocks:
`JSONEncoderBlock` and `JSONDecoderBlock`.
**How:** Built standard `Block` subclasses using the built-in `orjson`
wrappers (`dumps` and `loads` from `backend.util.json`) to handle the
conversion gracefully. Added comprehensive edge-case and boundary
testing for both blocks.

Closes : Significant-Gravitas#11108 

### Changes 🏗️

- Added `JSONEncoderBlock` to convert Python
dictionaries/lists/primitives into JSON strings.
- Added `JSONDecoderBlock` to parse JSON strings back into Python
dictionaries/lists.
- Added comprehensive unit tests in `test_json_blocks.py` covering
successful encoding/decoding as well as error handling for malformed
JSON strings.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Ensure automated unit tests pass for `JSONEncoderBlock` success
and error boundary conditions
- [x] Ensure automated unit tests pass for `JSONDecoderBlock` success
and error boundary conditions
- [x] Verify the blocks successfully pass the global Block Registry
schema validation

---------

Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: majdyz <zamil.majdy@agpt.co>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
Co-authored-by: Mistral Vibe <vibe@mistral.ai>
…nificant-Gravitas#13393)

### Why / What / How

**Why:** Two adjacent Copilot UX rough edges:
- Entering a session that already has files auto-opened to the Context
panel's "Files" tab — an extra click from actually seeing the most
relevant output — and once an artifact was open there was no quick way
back to the full files list. The header icon buttons also relied on
native `title` tooltips.
- The "Enable browser notifications" banner and the "out of automation
credits" banner looked inconsistent (full-width amber bar vs. bordered
card, different buttons/icons) and sat flush against each other.

**What:**
- Auto-open the **last generated file** directly in the Artifact panel
when a session already has generated files, instead of the Context panel
Files tab.
- Add a folder button to the Artifact panel header (before the close
"x") that opens the Context panel on the Files tab.
- Show design-system tooltips (positioned below the button) with the
action name for all Artifact panel header icon buttons.
- Restyle the notification banner to match the low-credit banner (shared
`Alert` molecule, `warning` variant), unify both action buttons to the
default primary style, and add a gap between the two banners.

**How:**
- New store action `autoOpenArtifact(ref)` opens the Artifact panel on a
given file, respecting the existing `_autoOpenUserClosed` guard.
`useAutoOpenForFiles` now picks the most recently generated file (by
`created_at`) and opens it via this action; the per-session `triggered`
guard and session-change reset are preserved.
- New store action `showFilesTab()` is an explicit (un-guarded) open of
the Context panel Files tab, wired through `useArtifactPanel` →
`ArtifactPanel` → `ArtifactPanelHeader` as `onOpenFiles` (desktop +
mobile).
- `HeaderButton` wraps its button in
`Tooltip`/`TooltipTrigger`/`TooltipContent` (`side="bottom"`) from
`atoms/Tooltip`, using the action name as content; native `title`
replaced by the tooltip, `aria-label` kept for accessibility. A global
`TooltipProvider` already wraps the app.
- `NotificationBanner` now renders via the `Alert` molecule. Added a
small backward-compatible `icon?` override prop to `Alert` so the banner
keeps its bell icon (defaults to the variant icon, so no existing
`Alert` usage changes). Both banners use `<Button variant="primary"
size="small">`, and `CopilotPage` wraps them in `flex flex-col gap-3 ...
empty:hidden`.

### Changes 🏗️

**Artifact / Context panel**
- `copilot/store.ts`: add `autoOpenArtifact(ref)` and `showFilesTab()`
actions.
- `ContextPanel/useAutoOpenForFiles.ts`: open the last generated file in
the Artifact panel instead of the Context panel Files tab.
- `ArtifactPanel/components/ArtifactPanelHeader.tsx`: add folder button
before close; wrap header icon buttons in tooltips positioned below.
- `ArtifactPanel/useArtifactPanel.ts` +
`ArtifactPanel/ArtifactPanel.tsx`: expose and wire `showFilesTab` as
`onOpenFiles`.
- `ContextPanel/__tests__/ContextPanelAutoOpen.test.tsx`: update
assertion to the new artifact-open behavior.

**Banners**
- `copilot/components/NotificationBanner/NotificationBanner.tsx`: use
the `Alert` molecule (warning variant) with the bell icon and a
default-primary "Enable" button.
- `components/layout/TopUpPrompt/LowCreditBanner/LowCreditBanner.tsx`:
drop the custom orange "Top up" button styling so it matches the default
primary.
- `components/molecules/Alert/Alert.tsx`: add optional `icon` override
prop.
- `copilot/CopilotPage.tsx`: wrap both banners in a flex column with a
gap.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Open a session that already has generated files → the last
generated file opens directly in the Artifact panel
- [ ] Click the folder button in the Artifact panel header → Context
panel opens on the Files tab
- [ ] Hover each header icon button (Back, Copy, Download, All files,
Close) → tooltip with the action name shows **below** the button
- [ ] Explicitly close the panel, then re-enter the session → auto-open
is suppressed (respects user close)
- [ ] Verify behavior on mobile (folder button closes the artifact
drawer and opens the Files sheet)
- [ ] With low credits + notifications available, both banners render as
matching bordered cards with a gap, identical primary buttons, and
matching orange icons

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…icant-Gravitas#13395)

### Why / What / How

**Why:** The copilot tool-call UI animations were heavy and broken — the
status line did a per-character 3D-spring-blur stagger that re-ran on
every streamed token (leaving "gray blurry lines" for seconds and
replaying for every line on reload), and the result accordion expanded
with a springy bounce into an unbounded, black code block that could
swallow the page.

**What:** Reworked the tool-call status line, the result accordion, and
the code block styling for a faster, calmer, light-mode feel.

**How:** `MorphingTextAnimation` now renders crisp text with a single
opacity-only fade gated to live streaming (`animate={isStreaming}`), so
historical lines stay static on reload and nothing smears mid-stream;
the `ToolAccordion` expand swaps the spring for a no-bounce ease-out
tween with a softer blur bridge; and the expanded content is capped at
`max-h-[24rem]` with `overflow-y-auto` while `ContentCodeBlock` moves
from black to a light `bg-neutral-100`.

### Changes 🏗️

- `MorphingTextAnimation`: removed per-character 3D/spring/blur stagger
→ single opacity fade, only for actively-streaming tool calls; respects
`prefers-reduced-motion`.
- `ToolAccordion`: replaced springy bounce with ease-out tween, reduced
blur bridge, capped content height with a scroller so long output can't
cover the page.
- `ContentCodeBlock`: switched from black to light mode
(`bg-neutral-100` / `text-neutral-800`).

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Run a copilot session with tool calls; confirm status lines fade
in once (no per-char/blur smear) while streaming
- [x] Reload a session with past tool calls; confirm lines render static
with no entrance replay
- [x] Expand a tool result accordion; confirm smooth ease-out (no
bounce) and long output scrolls within a bounded height
  - [x] Confirm the code block renders in light mode
Significant-Gravitas#13408)

### Why / What / How

**Why:** picsum.photos was used as a placeholder for marketplace agent
images in three product call sites. One of them (the publish modal's
Details-step initial image) is **not display-only** — `thumbnailSrc`
seeds the modal's image list which is submitted as `image_urls`. So
publishing **without** a custom image persisted a **random**
`picsum.photos/300/200` URL (no seed → different image each request) as
the live listing thumbnail. This is a latent bug. The same external
dependency also caused **intermittent E2E failures**: seeded picsum
images are fetched server-side via Next's `/_next/image` optimizer; when
picsum slowed down, hung image requests saturated the HTTP/1.1 localhost
connection pool and stalled the next client-side navigation (logout,
view-progress), timing out assertions in `auth-happy-path` and
`publish-happy-path`.

Decision (with @krzysztof.czerwinski): **just remove picsum.** Agents
already have proper image options (upload or AI-generate), and
image-less listings already render a solid-color fallback — so there's
no real gap to fill.

**What:** Remove the picsum.photos dependency entirely — from product
fallbacks, E2E seed data, the Next.js image-domain allow-list, and a dev
styleguide demo. Make the publish E2E hermetic.

**How:**
- Replaced the `|| "https://picsum.photos/300/200"` fallbacks with `||
""`. The existing `images.length === 0 → "At least one image is
required"` validation now genuinely requires a real image instead of
being silently satisfied by a junk URL.
- Seed data seeds `image_urls` as `[]` (cards render their built-in
fallback) and creator avatars as `""` (see note below), removing the
`get_image()` picsum helper.
- The `publish-happy-path` E2E selects a local fixture and **stubs the
media upload response** (the E2E stack has no GCS bucket); it does
**not** call the AI `generate_image` endpoint.

#### Follow-up fixes (after first CI run)

Removing picsum surfaced two unrelated latent issues that the picsum
pre-fill had been masking. Both are now fixed:

1. **Creator avatars must be non-null.** The `Creator` DB view types
`avatar_url` as a non-nullable `String` (maps `p."avatarUrl"` with no
COALESCE). Seeding `avatarUrl=None` made `GET /api/store/creators` 500
(`converting field avatar_url … found incompatible value of null`), so
the marketplace landing rendered its error card and "Become a Creator"
never appeared. Seed `""` instead — non-null, renders the frontend
fallback avatar, no external fetch.
2. **No GCS media bucket in the E2E stack** (`MEDIA_GCS_BUCKET_NAME`
empty, no emulator), so a real upload 500s. The publish spec stubs the
upload via Playwright `page.route` with a local asset URL. This spec
covers the publish → track → delete dashboard flow, not GCS storage.
(Open to wiring a fake-GCS emulator instead if preferred.)

### Changes 🏗️

- **Product fallbacks:** `usePublishAgentModal.ts` (×2) and
`useAgentSelectStep.ts` — picsum fallback → `""`.
- **E2E seed data:** `backend/test/e2e_test_data.py` &
`backend/test/test_data_creator.py` — removed `get_image()`;
`image_urls` → `[]`, creator `avatarUrl` → `""`. Updated docstrings +
`backend/backend/TEST_DATA_README.md`.
- **Playwright:** `credentials/index.ts` seed `image_urls` → `[]`;
`marketplace.page.ts` `submitAgentForReview` opens the Thumbnails
accordion, selects `assets/test-thumbnail.png` (new 1×1 PNG fixture),
and stubs the media-upload response before submitting.
- **Config:** `next.config.mjs` — removed `picsum.photos` from allowed
image domains.
- **Data migration** (`20260622120000_scrub_picsum_image_urls`): removes
legacy picsum URLs already persisted in the DB — filters them out of
`StoreListingVersion.imageUrls`, sets `Profile.avatarUrl` to `""`
(non-null, Creator view), and `LibraryAgent.imageUrl` to NULL. Real
user-chosen images are untouched.
- **Styleguide demo:** `copilot/styleguide/page.tsx` — two picsum URLs →
local `/placeholder.png`.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Full-stack E2E CI green — `publish-happy-path` and
`auth-happy-path` pass; no picsum requests / 504s in the run
- [x] `pnpm format`, `pnpm lint` (only pre-existing warnings), `pnpm
types` pass
  - [x] Publish modal unit tests pass (44 passed)
- [x] Backend seed files compile; backend `create_store_submission`
accepts empty `image_urls`

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ation + fix Exa/Airtable bugs (Significant-Gravitas#13135)

### Why / What / How

**Why.** While auditing the webhook ingress path, three issues surfaced:

1. **Exa** webhook verification was broken. The signature check ran
inside `validate_payload`, computed HMAC against `webhook.secret` (Exa
actually signs with the secret *it* returns at registration, stored in
`config["exa_secret"]`), read the wrong header (`X-Exa-Signature` vs the
real `Exa-Signature`), compared the raw `t=<ts>,v1=<hex>` header against
a bare hex digest, and signed the body without the required
`<timestamp>.` prefix. It could never have validated a real Exa
delivery.
2. **Airtable** verification had the same flavor of bug: Airtable
returns the signing secret base64-encoded as `macSecretBase64` (stored
in `config["mac_secret"]`), but the code used the base64 string verbatim
as the HMAC key instead of base64-decoding it first. Additionally, the
cursor update in `validate_payload` *replaced* the whole `config` blob,
wiping `mac_secret` after the first delivery.
3. The **generic webhook** trigger accepted any payload posted to its
URL with no way to opt into a shared-secret check, even though the user
controls both ends.

These didn't outright break anything today because Exa/Airtable gated
the check behind `if header_present:` — absent header → check skipped.
Net effect: verification was effectively a no-op everywhere except
GitHub/Telegram.

**What.** Consolidate signature verification into a dedicated,
provider-overridable step that runs *before* payload validation, and
make it actually correct for every provider whose protocol supports it.

**How.** New `BaseWebhooksManager.verify_signature(webhook, request)`
classmethod, called by the ingress router before `validate_payload`. The
default is a no-op so providers without a signing scheme (Compass,
Slant3D) don't have to fake one. Verification failures return **403**
(not 404 — that would leak webhook existence). Providers that sign
override it and `hmac.compare_digest` against the stored secret.

> **Note on feature flags:** an earlier revision of this PR gated
Exa/Airtable enforcement behind LaunchDarkly flags (`ENFORCE_*`) for a
staged rollout. We dropped that approach: a DB check confirmed **no
agents use the Exa or Airtable webhook triggers** (zero registered
webhooks for either provider), so the flags were pure ceremony — and a
`default=False` gate makes a security control fail *open* on a
LaunchDarkly outage. Verification now enforces unconditionally,
consistent with GitHub/Telegram.

### Changes 🏗️

| File | Change |
|---|---|
| `webhooks/_base.py` | Add `verify_signature` classmethod (no-op
default) |
| `api/features/integrations/router.py` | Call `verify_signature` before
`validate_payload`; 403 on failure |
| `webhooks/github.py`, `telegram.py` | Move existing checks from
`validate_payload` into `verify_signature` (pure refactor, behavior
unchanged) |
| `blocks/exa/_webhook.py` | Correct implementation: read
`Exa-Signature`, parse `t=<ts>,v1=<hex>` (multi-`v1` supported), sign
`<ts>.<raw body>`, key with `config["exa_secret"]` ([Exa
docs](https://docs.exa.ai/websets/api/webhooks/verifying-signatures)) |
| `blocks/airtable/_webhook.py` | Base64-decode `config["mac_secret"]`
before HMAC; verify `X-Airtable-Content-MAC` (`hmac-sha256=<hex>`);
merge cursor update into existing config instead of replacing it |
| `blocks/generic_webhook/_webhook.py`, `triggers.py` | Optional
`secret_token` input → require matching `X-Webhook-Secret`
(constant-time); unset = today's behavior |
| `docs/platform/new_blocks.md`, `docs/.../generic_webhook/triggers.md`
| Document the `verify_signature` extension point and the generic
`secret_token` |

**Compass / Slant3D:** no code change — their protocols have no signing
scheme; the default no-op covers them (the UUID URL is the bearer
secret).

### Test plan 📋

`backend/api/features/integrations/webhook_ingress_test.py` (24 tests):

- Unsigned providers (Compass, Slant3D) pass through.
- Always-signed providers (GitHub, Telegram) reject missing/invalid
sigs, accept valid ones.
- **Exa**: missing / malformed / wrong-signature → 403; correct `t=,v1=`
over `<ts>.<body>` → accepted; signature computed over body-only (the
old bug) → rejected; missing `config["exa_secret"]` fail-closes.
- **Airtable**: missing sig / missing `config["mac_secret"]` → 403;
correct base64-decoded HMAC → accepted; regression test that the old
"base64 string as key" HMAC does **not** verify.
- **Generic webhook**: no-token / empty-token pass through; configured
token → missing/wrong/correct `X-Webhook-Secret`.
- Ordering: `verify_signature` runs before `validate_payload`.

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `poetry run pytest
backend/api/features/integrations/webhook_ingress_test.py` — 24 passed
  - [x] `poetry run lint` / `type-check` clean; full CI green

Closes [SECRT-2359](https://linear.app/autogpt/issue/SECRT-2359).

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t at (links, replies, forwards), securely (Significant-Gravitas#13396)

### Why / What / How

**Why:** When a user pointed the Discord bot at an existing conversation
— pasting a channel/message link, `<#>`-mentioning a channel,
**replying** to a message, or **forwarding** one — the bot couldn't read
it. It has no web access to Discord and would deflect: *"I can't open
links / read Discord history — paste the text here."* But the bot is
already on the gateway and can read anything it (and the user) has
access to. The context was right there; we just weren't gathering it or
handing it to the model.

**What:** The bot now gathers the conversation context a user points it
at, from **every** way they can point at one, and feeds it into the
AutoPilot turn — so it answers from the real conversation instead of
deflecting. Every read is gated on the **requesting user's** own
permissions, so the bot never surfaces anything the user couldn't
already see.

**How:**
- A gateway-free `references.py` parses channel/thread IDs **and
specific message IDs** out of message text (permalinks + `<#id>`
mentions), de-duplicated and order-preserving.
- The Discord adapter fetches each referenced conversation via its own
gateway connection:
  - **Channel/`<#>` reference** → recent history.
- **Specific-message permalink** (`/channel/message`) → that exact
message plus the turns leading up to it (works for old messages, and is
kept even when it points at the current channel — "what was said here
`<link>`").
- **Replies** (`message.reference`) and **forwards**
(`message_snapshots`) are folded in as quoted context.
- The platform-agnostic handler renders the gathered conversations into
the prompt and leads with a firm instruction that the content is
*already supplied*, then rewrites the raw link into a readable
`#channel-name` token so the model stops fixating on a URL it thinks it
must open.

**Security model (every read gated on the requester, not the bot):**
- Same-guild only — never surfaces content from another server.
- Requester must have `view_channel` + `read_message_history` on the
target.
- **Private threads** additionally require the requester to actually be
a member (`manage_threads` bypasses, as in the client) — channel-level
perms aren't enough.
- References are scanned **only from the user's own typed message** —
links inside a forwarded/replied-to/quoted message are context, never
auto-fetched or rewritten.
- Requester resolved from `message.author` (the bot runs without the
privileged members intent, so the member cache is unreliable).
- Bounded: ≤3 conversations, ~8k chars each; LLM-produced mentions stay
on an allowlist.

### Changes 🏗️

- **New** `adapters/discord/references.py`:
`extract_referenced_targets()` (channel + optional message id),
`replace_referenced_links()`, and a frozen `ReferenceTarget` pydantic
model.
- **`adapters/base.py`**: `ReferencedConversation` (title, `channel_id`,
messages) + `MessageContext.referenced_conversations`.
- **`adapters/discord/adapter.py`**: `_fetch_referenced_conversations` /
`_fetch_one_referenced` (same-guild, requester-ACL, private-thread
membership, specific-message vs recent-history fetch, budgeted),
`_can_requester_read`, `_with_reply_context` / `_resolve_reply`,
link-rewriting; reference scanning runs on the user's own text only.
- **`handler.py`**: always surfaces referenced conversations (channels,
DMs, threads), gating only thread-history behind the first-@-into-thread
flag; firm "you already have the content, don't deflect" framing.
- Comprehensive tests across `references_test.py`, `adapter_test.py`,
`handler_test.py` (permalink parsing, ACL gates incl. private-thread
membership, reply/forward context, "quoted links aren't fetched",
requester-from-author, fail-safe paths).

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Unit/integration: 253 bot tests pass
- [x] **Local end-to-end** against the live gateway: linked a channel
and a specific-message permalink, replied to a message, forwarded a
message — confirmed the bot reads the real content instead of deflecting
- [x] Verified security gates: cross-guild skipped, private
channel/thread without access skipped, links inside quoted
(forward/reply) content not fetched
…t-Gravitas#13399)

### Why / What / How

<!-- Why: Why does this PR exist? What problem does it solve, or what's
broken/missing without it? -->
This PR addresses issue BUILDER-7HR. Previously,
`Sentry.captureConsoleIntegration()` was called without any options,
causing it to capture all console levels (including `info`, `debug`,
`log`). This led to Sentry ingesting noisy messages, such as "open
[object Event]" from wallet browser extension SDKs, which were logged to
the console at info/debug levels. This change prevents the ingestion of
irrelevant console messages, reducing Sentry noise and improving
observability.
<!-- What: What does this PR change? Summarize the changes at a high
level. -->
This PR restricts the console levels captured by Sentry in the frontend.
<!-- How: How does it work? Describe the approach, key implementation
details, or architecture decisions. -->
By modifying `autogpt_platform/frontend/instrumentation-client.ts` to
pass `{ levels: ["fatal", "error", "warn"] }` to
`Sentry.captureConsoleIntegration()`. This aligns the client-side Sentry
configuration with the already correct configuration in
`sentry.edge.config.ts`.

### Changes 🏗️

<!-- List the key changes. Keep it higher level than the diff but
specific enough to highlight what's new/modified. -->
- Modified `autogpt_platform/frontend/instrumentation-client.ts` to
explicitly configure `Sentry.captureConsoleIntegration()` to only
capture `fatal`, `error`, and `warn` console levels.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  <!-- Put your test plan here: -->
- [ ] Manually trigger `console.info()`, `console.debug()`,
`console.log()` in the frontend and verify these messages do *not*
appear in Sentry.
- [ ] Manually trigger `console.warn()`, `console.error()`,
`console.fatal()` (or throw an unhandled error) in the frontend and
verify these messages *do* appear in Sentry.
- [ ] Observe Sentry dashboard for a reduction in noisy `info`/`debug`
level console events.

<details>
  <summary>Example test plan</summary>
  
  - [ ] Create from scratch and execute an agent with at least 3 blocks
- [ ] Import an agent from file upload, and confirm it executes
correctly
  - [ ] Upload agent to marketplace
- [ ] Import an agent from marketplace and confirm it executes correctly
  - [ ] Edit an agent from monitor, and confirm it executes correctly
</details>

#### For configuration changes:

- [ ] `.env.default` is updated or already compatible with my changes
- [ ] `docker-compose.yml` is updated or already compatible with my
changes
- [ ] I have included a list of my configuration changes in the PR
description (under **Changes**)

<details>
  <summary>Examples of configuration changes</summary>

  - Changing ports
  - Adding new services that need to communicate with each other
  - Secrets or environment variable changes
  - New or infrastructure changes such as databases
</details>

Fixes BUILDER-7HR

Co-authored-by: seer-by-sentry[bot] <157164994+seer-by-sentry[bot]@users.noreply.github.com>
…nt-Gravitas#13309)

Resolves
[OPEN-3155](https://linear.app/autogpt/issue/OPEN-3155/make-trigger-agent-creation-more-consistent)

### Why / What / How

**Why:** The "Trigger Agent" pattern (Significant-Gravitas#12740) lets AutoPilot poll a data
source and run an *action agent* once per detected change via the
`AgentExecutorBlock`. Two things made it inconsistent:
1. AutoPilot often crammed the polling loop **and** the action into a
single scheduled agent. That agent runs on every poll, so its run list
is mostly empty polls and the user can't tell which runs actually did
anything.
2. A trigger agent exists only to drive its action (parent) agent and is
never listed on its own in the library — but when the action agent was
deleted, its trigger agents were left behind: orphaned, invisible, and
still firing on their schedule.

Separately, AutoPilot tended to reach for AI blocks even when plain
logic would do — AI processing costs orders of magnitude more, spending
the user's money needlessly.

**What:**
- Clarify the agent-building guide so AutoPilot reliably builds the
polling (trigger) agent and the action agent as **two separate agents**.
- Cascade-delete trigger agents when their action agent is deleted.
- Add a general guide rule to prefer deterministic blocks over AI when
an equivalent exists (cost), applying to all agents.

**How:**
- *Guide (triggers):* rewrote the `### Building Trigger Agents` section
of `agent_generation_guide.md` to lead with a decision rule (poll +
act-on-change ⇒ two separate agents), the anti-pattern and its run-list
rationale, an over-split guard (a plain "do X on a schedule" agent stays
a single agent), an explicit build order, and `AgentExecutorBlock` as
the preferred sink. All new guidance stays inside the
`GENERIC_TRIGGER_AGENTS`-gated section (bold labels only, heading
unchanged) so the feature-flag strip still works.
- *Guide (cost):* added a top **Key Rules** bullet steering AutoPilot to
deterministic blocks (`CodeExecutionBlock`, `ConditionBlock`, …) over
LLM blocks whenever a non-AI equivalent exists, reserving AI for genuine
reasoning/summarization/generation. Always-visible (not gated), so it
applies to every agent.
- *Cleanup:* `delete_library_agent` now finds the hidden agents whose
graph runs the deleted graph via an `AgentExecutorBlock` (the same
derived relationship `list_trigger_agents` uses) and recursively deletes
each, reusing the existing schedule/webhook cleanup. A trigger that also
drives a *different* action agent is kept (the deleted agent must be its
only sink). The cascade is skipped when deleting a hidden agent, which
also bounds recursion to one level.

### Changes 🏗️

- **`copilot/sdk/agent_generation_guide.md`** —
- Rewrote the "Building Trigger Agents" section: mandates the
action/trigger split with an over-split guard; `AgentExecutorBlock` is
the preferred sink, `AutoPilotBlock` the fallback. (Also ~40% more
concise than the prior version.)
- Added a first "Key Rules" bullet: prefer pure logic over AI when an
equivalent exists, to avoid unnecessary LLM cost (applies to all
agents).
- **`copilot/tools/get_agent_building_guide_test.py`** — 2 regression
tests locking the "two separate agents" + "do NOT over-split" guidance
(gating coverage inherited from the existing flag-off test).
- **`api/features/library/db.py`** — `delete_library_agent` cascades to
orphaned trigger agents via two new helpers:
`_cleanup_trigger_agents_for_graph` (finds + deletes sole-sink triggers)
and `_trigger_targets_other_graph` (the "only sink" guard).
- **`api/features/library/db_test.py`** — 5 tests: cascade fires for
visible agents, skips hidden agents, deletes sole-sink triggers, keeps
multi-sink triggers, and the sink-detection helper.

No configuration changes.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `get_agent_building_guide_test.py` — 10 passed (heading sentinel,
both new split-guidance assertions, flag on/off strip-gating); still
green after the cost-rule note
- [x] `db_test.py` cascade tests — 5 passed (visible→cascade,
hidden→skip, sole-sink→delete, multi-sink→keep, sink detection)
- [x] Full `api/features/library/db_test.py` — 25 passed (no
regressions, incl. after merging dev)
  - [x] black / isort / ruff / pyright clean on all changed files
- [x] Manual end-to-end AutoPilot eval (build a polling+action goal →
expect a visible action agent + hidden trigger; delete the action agent
→ expect the trigger gone) — not run locally; needs a
`GENERIC_TRIGGER_AGENTS`-on session

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Gravitas#13382)

### Why / What / How

**Why:** When a backing service (Redis cluster, RabbitMQ, an RPC peer)
is unreachable, the connection-acquisition retry loops kept the whole
backend from shutting down. `conn_retry` and `create_retry_decorator`
retry up to ~100 times with 30s exponential backoff
(`pyro_client_comm_retry=100`, `pyro_client_max_wait=30`) — roughly 50
minutes — with **no shutdown awareness**. The backoff sleeps run on
worker/background-event-loop threads that never receive SIGINT/SIGTERM
(only the main thread does), and the eager `await
redis_client.get_redis_async()` in `AppService.lifespan` keeps Uvicorn
stuck in *startup*, so its graceful-shutdown path never runs. Net
effect: `Ctrl+C` is ignored while a service is retrying, and the logs
spam "Acquiring connection failed … Retrying now…" indefinitely.

**What:** Make the retry loops abort promptly when the process receives
a shutdown signal.

**How:**
- Added a process-wide shutdown flag in `backend/util/retry.py` —
`request_shutdown()` / `is_shutting_down()` backed by a
`threading.Event` (signal-safe: it only sets an event).
- Added a tenacity `_StopOnShutdown` stop condition (combined via `|`
with the existing `stop_after_attempt`) so no new retry is attempted
once shutdown is requested.
- Added interruptible sleeps (`_interruptible_sleep` /
`_interruptible_async_sleep`) that wake immediately on shutdown instead
of sleeping out the full backoff. `conn_retry` and
`create_retry_decorator` pick the right one based on whether the wrapped
function is sync or async.
- Both signal handlers — `AppProcess._self_terminate` and
`AppService._self_terminate` — now call `request_shutdown()` first.

On the first `Ctrl+C`, the in-flight retry wakes from its backoff, the
stop condition fires, `connect_async()` raises, the lifespan startup
fails fast, Uvicorn proceeds to shut down, and the process exits.

### Changes 🏗️

- `backend/util/retry.py`: add
`request_shutdown()`/`is_shutting_down()`, `_StopOnShutdown` tenacity
stop condition, and `_interruptible_sleep`/`_interruptible_async_sleep`;
wire them into `conn_retry` and `create_retry_decorator` (the latter now
picks the sync/async interruptible sleep at decoration time).
Final-failure log distinguishes "aborted: shutting down" from "failed
after retries".
- `backend/util/process.py`: `AppProcess._self_terminate` calls
`request_shutdown()`.
- `backend/util/service.py`: `AppService._self_terminate` calls
`request_shutdown()` so the eager Redis connect in `lifespan` stops
blocking Uvicorn startup.
- `backend/util/retry_test.py`: 5 new tests (sync/async `conn_retry`
abort on shutdown, `create_retry_decorator` abort, both interruptible
sleeps return early) + an autouse fixture that resets the shutdown flag
between tests.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `poetry run pytest backend/util/retry_test.py` — 19 passed (14
existing + 5 new shutdown tests)
- [x] `poetry run ruff check` + `poetry run pyright` clean on all
changed files; pre-commit hooks (ruff/isort/black/pyright) pass
  - [x] `backend/util/service_test.py::test_graceful_shutdown`

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…tas#13384)

### Why / What / How

**Why:** On startup, the app outputs multiple deprecation warnings from
Pydantic v2 and field name shadowing that clutter the console and
indicate outdated code patterns. The search RPC layer also logged
return-type validation warnings.

**What:** This PR fixes deprecation/validation warnings in our own
codebase and eliminates dependency-related warnings where possible by
upgrading dependencies.

**How:**
- Converted class-based Pydantic `Config` to `SettingsConfigDict`
(Pydantic v2 style)
- Renamed shadowed field names to avoid conflicts with `BaseModel`
attributes, with a backward-compatibility validator so existing
serialized data still deserializes
- Removed an unused field from a `TypedDict` return contract that was
triggering return-type validation warnings
- Upgraded dependencies that were causing transitive deprecation
warnings

### Changes 🏗️

- `backend/copilot/config.py`: Converted `ChatConfig.Config` class to
`model_config = SettingsConfigDict(...)`; reworded the dialect docstring
so the `ollama` default key no longer trips the secret scanner
- `backend/blocks/exa/helpers.py`: Renamed `SummarySettings.schema` to
`output_schema` to avoid shadowing `BaseModel.schema`, with a
`model_validator(mode="before")` that accepts the legacy `schema` key
for backward compatibility. Also fixed zero-valued `extras` int counts
(`links`/`image_links`) being sent to the Exa API instead of omitted.
- `backend/blocks/exa/contents.py`: Updated all references to the
renamed field and applied the same zero-extras omission fix in
`ExaContentsBlock.run()`
- `backend/api/features/search/hybrid_search.py`: Removed the unused
`lexical_raw` field from `HybridSearchRow`. It is computed in SQL only
to derive `lexical_score` and is never projected into result rows, so
its required presence on the `TypedDict` made every cross-service search
RPC log a return-type validation warning.
- Tests: added coverage for the Exa `output_schema` → `schema` SDK
mapping (both `process_contents_settings` and `ExaContentsBlock.run`
paths), zero-extras omission, the legacy `schema` alias, and a
`HybridSearchRow` contract test (positive + negative)
- `pyproject.toml`: Upgraded `supabase` from exact pin `2.28.0` to caret
`^2.31.0`, aligning with the caret convention used by sibling
dependencies
- `poetry.lock`: Updated to reflect dependency changes

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified all imports work without warnings from our own code
- [x] Ran existing tests for affected modules (16 tests in
cost_tracking_test.py, 15 tests in helpers_cost_test.py)
  - [x] Confirmed no type checking errors
  - [x] Verified app startup is clean of warnings from our codebase

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Mistral Vibe <vibe@mistral.ai>
…rk PRs (Significant-Gravitas#13406)

## Why

Six pre-existing copilot tests fail on **every fork PR** while staying
green on `dev`:

-
`backend/copilot/baseline/service_unit_test.py::TestBaselineReasoningStreaming::test_reasoning_param_absent_on_non_anthropic_routes`
- `…::test_reasoning_param_suppressed_when_thinking_tokens_zero`
-
`backend/copilot/graphiti/communities_test.py::TestRebuildCommunitiesForUser::test_success_path_calls_detach_delete_then_build`
- `…::test_failure_path_returns_error_in_result`
- `…::TestRebuildForUserActivityGate::test_force_bypasses_gate`
- `…::TestRebuildPathSelection::test_uses_flex_client_by_default`

**Root cause:** these tests implicitly assume the ambient `ChatConfig`
transport is `openrouter`. `openrouter_active` is only true when
`api_key` is non-empty, and `api_key` falls back to `OPENAI_API_KEY`.
GitHub Actions **withholds repository secrets from `pull_request` runs
triggered by forks**, so `OPENAI_API_KEY` is empty there and the
transport silently drops to `direct_anthropic`:

- graphiti: `use_flex = flex_requested and
chat_cfg.transport.supports_flex_tier`; only `openrouter` has
`supports_flex_tier=True`, so the sync path runs the *real* client and
the tests that only patch `make_flex_graphiti_client` fall through
(wrong shape / no error / `execution_path == "sync"`).
- baseline: without openrouter, `extra_body` isn't built → `KeyError:
'extra_body'`.

`dev` is green because its CI runs as `push` events, where secrets are
present. This surfaced on JSON-blocks PR Significant-Gravitas#13170 purely because it was a
fork PR — its code changes were unrelated.

> **Note:** this PR is deliberately opened **from a fork**
(`Pwuts/AutoGPT`) so CI runs without secrets — verifying the fix
actually holds under the failing condition. It is the fork-CI
counterpart of the same change in Significant-Gravitas#13386.

## What

Make the affected tests hermetic by pinning the OpenRouter transport, so
they pass regardless of credential availability:

- **`service_unit_test.py`** — pin `use_openrouter` / `api_key` /
`base_url` in the two reasoning tests that relied on `extra_body` being
built (matching the existing Kimi-test pattern).
- **`communities_test.py`** — add an autouse fixture pinning the
transport so the flex-tier path is taken deterministically. The
flag-driven sync test (`test_uses_sync_client_when_flex_disabled`) is
unaffected — it forces sync via `community_rebuild_use_flex_tier`, not
the transport.

## How

Verified locally in both credential states (`-p no:cov`, 138 tests
each):

```bash
# Fork-CI simulation (secrets cleared) — previously 6 failures, now:
env CHAT_API_KEY= CHAT_BASE_URL= OPENAI_API_KEY= OPEN_ROUTER_API_KEY= OPENAI_BASE_URL= \
  poetry run pytest backend/copilot/baseline/service_unit_test.py \
  backend/copilot/graphiti/communities_test.py
# -> 138 passed

# Normal env (creds present) — no regression:
poetry run pytest backend/copilot/baseline/service_unit_test.py \
  backend/copilot/graphiti/communities_test.py
# -> 138 passed
```

The real proof is this PR's own CI (fork → no secrets) going green. No
production code changes — test-only.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ignificant-Gravitas#13186)

## Summary

Fixes the bug tracked in
[SECRT-2373](https://linear.app/autogpt/issue/SECRT-2373).

The title-generation LLM call in `_generate_session_title` was passing
the user's raw first message verbatim as the `user` turn. With no
framing around it, the model occasionally interpreted imperative user
messages (e.g. "Write me a Python script...") as direct instructions and
responded with task output rather than a short title.

## Changes

**`autogpt_platform/backend/backend/copilot/service.py`**

- Updated system prompt to explicitly tell the model it will be shown a
user message and must not follow any instructions in it.
- Wrapped the user's message in an `<conversation>` XML tag in the user
turn, clearly separating it from the model's own instructions.

---------

Co-authored-by: Toran Bruce Richards <22963551+Torantulino@users.noreply.github.com>
… model (Significant-Gravitas#13313)

## Summary

Removes a duplicate `unreal_speech_api_key` field declaration in the
`Secrets` model (`autogpt_platform/backend/backend/util/settings.py`).

The field was declared twice (lines 751 and 754) with identical
definitions. In Pydantic, the second declaration silently shadows the
first, which is confusing and could mask issues if the descriptions or
defaults ever diverge.

## Changes
- Removed the duplicate `unreal_speech_api_key` field on line 754 of
`settings.py`

## Testing
- No functional change — Pydantic uses the last declaration which had
the same default and description.
- Existing tests should pass unchanged.

## Checklist
- [x] My code follows the code style of this project
- [x] I have performed a self-review of my code

Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
…pt-in) (Significant-Gravitas#13133)

## Why

The `PlatformLinkingManager` AppService was added in PR Significant-Gravitas#12615 and
deployed to dev/prod via infra PR Significant-Gravitas#310 — but it was never added to the
local `docker-compose` stack. Anyone trying to test the bot flow
(Discord or Slack) locally hits the same wall I just did:

```
httpx.ConnectError: All connection attempts failed
... bot_backend.create_link_token → self._client.create_server_link_token
```

The rest_server tries to RPC to a service that doesn't exist in the
local Compose project. This PR backfills the missing service so local
bot testing works.

## What

- **`docker-compose.platform.yml`** — new `platform_linking_manager`
service definition (same shape as `notification_server`: same
Dockerfile, `["platform-linking-manager"]` poetry script, port 8009,
shared backend env, depends on `db` + `redis-0` + `migrate` +
`database_manager`).
- **`docker-compose.platform.yml`** (env block) — adds
`PLATFORMLINKINGMANAGER_HOST: platform_linking_manager` to the
`x-backend-env` YAML anchor that all backend services inherit. Mirrors
the existing entries for `DATABASEMANAGER_HOST`,
`NOTIFICATIONMANAGER_HOST`, etc.
- **`docker-compose.yml`** — references the new service via the existing
`extends` pattern.
- **`bot/README.md`** — adds a "Docker (local dev)" subsection under
"Running" explaining the opt-in profile and why the service is needed.

## How

**Opt-in via Docker Compose profiles.** The service has `profiles:
["bot"]` so it doesn't start with a default `docker compose up -d`. Two
ways to opt in:

```bash
# Start the linking manager alongside your existing stack
docker compose up -d platform_linking_manager

# Or include via the profile (also brings up anything else profile-tagged later)
docker compose --profile bot up -d
```

This matches the spirit of how the service is managed in dev/prod (its
own Helm chart, separately deployable) while keeping local Compose
reproducible and the default boot fast.

**No code changes.** The service definition uses the existing backend
image and the `platform-linking-manager` poetry script that's been in
`pyproject.toml` since PR Significant-Gravitas#12615. Pure Compose + env wiring.

## Tests

- `docker compose up -d platform_linking_manager` brings the service up
healthy on port 8009 ✓
- `docker compose up -d` (no profile / no explicit name) does **not**
start the service — default behaviour preserved ✓
- `docker exec autogpt_platform-rest_server-1 sh -c 'echo
$PLATFORMLINKINGMANAGER_HOST'` returns `platform_linking_manager` after
recreate, RPC works end-to-end ✓
- Slack `/setup` slash command end-to-end through `chat.postMessage`
button → confirm page → workspace link created in DB ✓ (verified via PR
Significant-Gravitas#13132 testing)

## Out of scope (intentional)

- This PR doesn't add `platform_linking_manager` to the prod
docker-compose files used elsewhere — it's already deployed as its own
Helm chart in GKE. This is purely a local-dev fix.
- This PR doesn't touch the bot service definitions themselves — Discord
runs via its own Helm chart, Slack rides the existing `autogpt-server`
pod.

## Related

- PR Significant-Gravitas#12615 — added the `PlatformLinkingManager` AppService
- PR Significant-Gravitas#310 (infra) — deployed it to dev/prod
- PR Significant-Gravitas#13130 — webhook adapter base for the bot
- PR Significant-Gravitas#13132 — Slack adapter (the thing that surfaced this gap during
local testing)

Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
…edule (Significant-Gravitas#13419)

### Why / What / How

**Why:** Agents in the Library — notably Marketplace-installed ones —
showed a **"Scheduled"** status badge even when the current user had
never scheduled them. The same flaw inflated the fleet-summary
"scheduled" count. Reported as OPEN-3184.

**What:** The frontend scheduled-predicate `isAgentScheduled()` now keys
solely off the user-scoped `is_scheduled` flag and no longer treats
`recommended_schedule_cron` as an active schedule.

**How:** `recommended_schedule_cron` comes from
`AgentGraph.recommendedScheduleCron` — a creator-defined *suggestion*
attached to the graph and shared by every user who installs the agent.
It is **not** evidence that the current user scheduled anything. The
backend already computes `is_scheduled` correctly (per-user, via
`_fetch_schedule_info(user_id=...)`); only the frontend predicate was
wrong. Since the badge, fleet summary, sitrep, status-map, and list
filter all funnel through this one predicate, the single-line change
fixes every call site.

```ts
// before
return !!agent.is_scheduled || !!agent.recommended_schedule_cron;
// after
return !!agent.is_scheduled;
```

### Changes 🏗️

- `library/hooks/executionHelpers.ts` — `isAgentScheduled()` now returns
`!!agent.is_scheduled` only; dropped `recommended_schedule_cron` from
the predicate and its signature.
- `library/hooks/executionHelpers.test.ts` — updated unit tests; added a
case asserting `recommended_schedule_cron` alone is ignored.
- `library/__tests__/filter.test.tsx` — added page-level regression
tests: a recommendation-only agent is excluded from the "scheduled"
filter and included in "idle".

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `pnpm vitest run executionHelpers.test.ts filter.test.tsx` — 9/9
pass (TDD: confirmed the new assertion failed against the old impl
first)
- [x] Full `src/app/(platform)/library` suite — 115/116 pass; the single
failure is a pre-existing, timezone-dependent test in
`followups/.../helpers.test.ts`, verified failing on the clean tree too
and untouched by this change
- [x] `pnpm format`, `pnpm lint` (only pre-existing warnings in
unrelated files), `pnpm types` — all clean

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Significant-Gravitas#13421)

### Why / What / How

**Why:** The AutoPilot skills library could only list, view, and delete
skills. Skills could only be *created* by the copilot itself via the
`store_skill` tool — users had no way to bring their own skills in or
take one out to share or back up.

**What:** Adds an **Upload skill** action (page-level) and a
**Download** action (per skill) to the library skills page, backed by a
new `POST /api/skills` upload endpoint.

**How:** The skill-write logic (validation → per-user `AsyncClusterLock`
cap → workspace write) is extracted from `StoreSkillTool` into a shared
`store_user_skill()` helper so the new REST endpoint and the copilot
tool use one code path. Upload sends the raw `SKILL.md` text, which the
backend parses with the existing `parse_skill_markdown` (malformed →
400, cap reached → 409, existing slug → upsert). Download fetches the
skill detail and reconstructs a re-uploadable `SKILL.md` client-side
(JSON string/array literals are valid YAML, so it round-trips without a
YAML lib).

### Changes 🏗️

- **Backend:** new `POST /api/skills` (`uploadCopilotSkill`) endpoint +
`UploadCopilotSkillRequest`; extracted shared `store_user_skill()`
helper and `SkillLimitError`; refactored `StoreSkillTool` to reuse it.
Added endpoint + helper tests.
- **Frontend:** `UploadSkillButton` (hidden file picker +
`useUploadCopilotSkill`) in the page header; `Download` button on each
skill row; markdown-render/download helpers. Added integration tests for
upload (success + limit) and download.
- **API client:** regenerated `openapi.json` (+49 lines, scoped to the
new endpoint/schema).

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [x] Backend: `poetry run pytest` for skills (88 passing, incl. 4 new
upload tests); ruff + pyright clean
- [x] Frontend: `pnpm test:unit` for the skills page (11 passing, incl.
upload + download tests)
- [x] Manual: upload a valid `SKILL.md` and confirm it appears in the
list
- [x] Manual: download a skill and re-upload the file to confirm it
round-trips
- [x] Manual: upload a malformed file (400) and at the 50-skill cap
(409) show error toasts

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
@snyk-io

snyk-io Bot commented Jun 26, 2026

Copy link
Copy Markdown
Author

Merge Risk: Medium

This upgrade involves a medium-risk minor version update for Vitest and a low-risk patch update for concurrently.

vitest (4.0.174.1.8)

@vitest/coverage-v8 (4.0.174.1.8)

Risk: Medium

This is a minor version upgrade for Vitest and its V8 coverage provider. The 4.1 release introduces several new features and one potential breaking change that warrants a medium risk assessment.

Potential Breaking Change:

  • test.extend API Change: The test.extend function has a new builder pattern that supports type inference. Previously, it passed an undocumented Suite object as the first argument, which is no longer the case. While the Vitest team noted the usage was likely small, any tests relying on this undocumented behavior will break.

New Features in 4.1:

  • Support for Vite 8.
  • Introduction of Test Tags for labeling and filtering tests.
  • New aroundAll and aroundEach hooks.
  • Improvements to browser mode trace views and coverage ignore hints.

Recommendation:
Review any usage of test.extend to ensure it aligns with the new, documented builder pattern. Tests not using this specific, advanced feature are unlikely to be affected.

concurrently (9.2.19.2.3)

Risk: Low

This is a patch release that includes minor bug fixes. The changes are not expected to cause breaking changes and improve compatibility on Windows and with numeric command names.

Source: Concurrently GitHub Releases

Notice 🤖: This content was augmented using artificial intelligence. AI-generated content may contain errors and should be reviewed for accuracy before use.

@snyk-io

snyk-io Bot commented Jun 26, 2026

Copy link
Copy Markdown
Author

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@snyk-io

snyk-io Bot commented Jun 26, 2026

Copy link
Copy Markdown
Author

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@github-actions

Copy link
Copy Markdown

This PR targets the master branch but does not come from dev or a hotfix/* branch.

Automatically setting the base branch to dev.

@github-actions github-actions Bot changed the base branch from master to dev June 26, 2026 03:51
@github-actions

Copy link
Copy Markdown

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.