[Snyk] Fix for 3 vulnerabilities#22
Conversation
Removed COLLABORATOR checks for comments and reviews in the workflow.
…ficant-Gravitas#12980) ### Why / What / How **Why.** Three bugs, one PR — all rooted in the onboarding wizard's lack of persistence: 1. **422 on `/api/onboarding/profile` mid-wizard.** The wizard store was in-memory and `useOnboardingPage`'s init effect called `reset()` on every mount. A user who refreshed mid-flow (or navigated to `?step=4` directly) hit the SubscriptionStep with empty `name` / `role`; clicking *Get Pro* / *Upgrade to Max* advanced to Preparing, which POSTed empty `user_name` / `user_role` and got rejected with `string_too_short`. 2. **LOCAL crash on plan select.** The Stripe checkout path is unconditional. On `BEHAVE_AS=LOCAL` the backend has no Stripe wiring, so plan selection blew up trying to start a Checkout session. 3. **Welcome detour after completion.** `handlePreparingComplete` and `checkCompletion` called `reset()`, which set `currentStep=1` after `router.replace("/copilot")` had already been queued. The URL-sync effect then fired a second `router.replace("/onboarding?step=1")` that won, stranding the user on Welcome until they refreshed. **What.** - Wrap the wizard store with `zustand/middleware`'s `persist` (sessionStorage, matches the existing `STEP_STORAGE_KEY` ceiling). `partialize` excludes `currentStep` — the URL stays the source of truth for which step the user is on. - Drop the unconditional `reset()` from the init effect so persisted form data survives refreshes. - Replace `reset()` with `useOnboardingWizardStore.persist.clearStorage()` on completion paths to clean storage without mutating in-memory state (no spurious re-render, no URL-sync race). - Short-circuit `handlePlanSelect` on `environment.isLocal()` to skip the profile POST + Stripe Checkout and advance straight to Preparing; cloud path is unchanged. **How.** - `store.ts` — `persist(...)` with SSR-safe `createJSONStorage` (no-op stub during SSR/vitest), `partialize` exposing only the form fields. `version: 1` so future schema changes have a migration anchor. - `useOnboardingPage.ts` — remove `reset()` from init effect; swap completion-path `reset()` calls for `persist.clearStorage()`. Comment block explains the URL-sync race so the next reader doesn't reintroduce it. - `useSubscriptionStep.ts` — early-return on `environment.isLocal()` after the TEAM/inflight guards, before any Stripe code. - Tests: - `page.test.tsx` — regression test that pre-set form data isn't wiped on mount (the bug the persist change fixes). - `SubscriptionStep.test.tsx` — `vi.spyOn(environment, "isLocal").mockReturnValue(false)` in `beforeEach` so existing Stripe-path tests stay in cloud mode; new test flips it `true` and asserts no Stripe / profile request fires and `currentStep` advances to 5. ### Changes 🏗️ - `autogpt_platform/frontend/src/app/(no-navbar)/onboarding/store.ts` — `zustand/middleware`'s `persist` with sessionStorage + `partialize` excluding `currentStep`. - `autogpt_platform/frontend/src/app/(no-navbar)/onboarding/useOnboardingPage.ts` — drop `reset()` from init; switch completion `reset()` to `persist.clearStorage()`. - `autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/SubscriptionStep/useSubscriptionStep.ts` — early-return on `environment.isLocal()` to skip Stripe. - `autogpt_platform/frontend/src/app/(no-navbar)/onboarding/__tests__/page.test.tsx` — new "preserves form data on mount" regression test. - `autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/__tests__/SubscriptionStep.test.tsx` — force cloud mode in `beforeEach`; add LOCAL bypass test. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [ ] On `BEHAVE_AS=LOCAL`: complete onboarding through Welcome → Role → PainPoints → Subscription, click *Get Pro* → no Stripe Checkout, lands on Preparing, then `/copilot` (no Welcome bounce, no refresh needed). Repeat with *Upgrade to Max*. - [ ] On `BEHAVE_AS=LOCAL`: click *Contact sales* (Team) → Tally form opens in new tab; wizard stays on step 4. - [ ] On `BEHAVE_AS=CLOUD`: existing Stripe Checkout flow unchanged — *Get Pro* / *Upgrade to Max* redirect to Stripe with the right `success_url` / `cancel_url`; success returns to `?step=5&subscription=success` and lands on `/copilot`; cancel returns to `?step=4&subscription=cancelled`. - [ ] Mid-wizard refresh on step 4 with `payments` flag enabled: name / role / pain points stay filled in (no 422, no re-fill required). - [ ] Direct nav to `/onboarding?step=4` without prior data: clamps back to the highest reached step (or step 1 on a fresh session) — no fast-forward into Subscription. - [ ] Returning user with `VISIT_COPILOT` already complete: redirects to `/copilot` cleanly with no Welcome flash. #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ink cleanup (Significant-Gravitas#12979) ### Why / What / How **Why** — Follow-up to the settings v2 cleanup (Significant-Gravitas#12976). Several rough edges remained after the route migration: - The Team-tier "Talk to sales" button was firing into a 404 (`agpt.co/contact-sales` no longer exists). - A bunch of in-app links were still pointing to the deprecated `/profile/*` settings routes, which now route through the legacy layout instead of the new `/settings/*` pages. - The MCP integration entry in the "Connect a service" dialog was a dead end — the provider showed up in the list but the detail view said "No connection method available". - `/settings/creator-dashboard` shipped a bulk-select feature (checkbox column, select-all bar, bulk-delete dialog) that nobody is using; it added clutter to the row UI. - `/settings` (no subpath) flashed an empty page during the client-side redirect to `/settings/profile`. - The submissions API was issuing a redundant `COUNT(*)` per request because `_get_submission_stats` already returns `total`. **What** — One PR, several small fixes packaged together because they all touch the same surface (creator dashboard / integrations / settings linking). **How** — Most changes are one-liners or small file deletions. Two are slightly bigger: - Backend `_get_submission_stats` is reused for pagination total, so `get_store_submissions` drops the duplicate `.count()` call. Stats themselves use a single Postgres query with `FILTER` clauses for total/approved/pending/total_runs/average_rating. - MCP setup is now rendered inline in the Connect Service dialog via a new `McpConnectPanel` component when `provider.id === "mcp"`. Server URL → tries OAuth via popup → on 400 falls back to bearer-token input → on success invalidates `getGetV1ListCredentialsQueryKey()` so the credential shows up in the integrations list. ### Changes 🏗️ **Backend (`backend/api/features/store`)** - Add `SubmissionStats` payload to `StoreSubmissionsResponse` (total, approved, pending, total_runs, average_rating) computed in one Postgres `FILTER` round-trip. - Reuse `stats.total` for pagination instead of running a separate `COUNT(*)` query. **Frontend creator dashboard (`/settings/creator-dashboard`)** - Remove the bulk-select feature: checkbox column + selection bar + bulk-delete dialog + `useSubmissionSelection` hook + `SubmissionSelectionBar` + `MobileSelectionBar` are all deleted. - Update tests to drop the now-irrelevant selection cases. **Frontend integrations (`/settings/integrations`)** - New `McpConnectPanel` rendered inside the Connect Service dialog DetailView when the user picks the Mcp provider. OAuth-first with bearer-token fallback when the server returns 400. - Light `pl-1` polish on the DetailView header to align the back button + avatar with the body content. **Frontend settings link migration** - Replace stale `/profile/*` deep links across the platform with the new `/settings/*` routes: - `/profile/dashboard` → `/settings/creator-dashboard` (publish modal "View progress") - `/profile/credits` → `/settings/billing` (usage limits, briefing panel, wallet refill) - `/profile/integrations` → `/settings/integrations` (integrations setup wizard) - `/profile/settings` → `/settings/account` (timezone notices in scheduling dialogs) - Legacy `/profile/(user)/layout.tsx` self-nav left alone — that whole route group is being deprecated in a follow-up PR. **Frontend billing** - `TEAM_UPGRADE_URL` switched from `https://agpt.co/contact-sales` (404) to the Tally intake form `https://tally.so/r/2Eb9zj`. Already opens with `target="_blank"` + `noopener,noreferrer`. **Frontend settings index** - `/settings` renders a content skeleton during the client-side redirect to `/settings/profile`, so the page no longer flashes empty. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Open `/settings/billing` on a MAX plan → click "Talk to sales — Team" → Tally form opens in a new tab. - [x] On `/settings/integrations` → "Connect Service" → pick "Mcp" → enter a server URL → confirm OAuth popup launches; for an OAuth-less server, confirm bearer-token fallback appears and saves the credential. - [x] Confirm the new MCP credential shows up in the integrations list after connect. - [x] On `/settings/creator-dashboard` → confirm no checkbox column, no select-all bar, no bulk-delete dialog. Row-level "Edit"/"Delete" via the dropdown still works. - [x] Submit an agent for review from `/build` → "View progress" → lands on `/settings/creator-dashboard` (was 404 before). - [x] Click any wallet "Add credits" / "Manage billing" / usage-limit link → lands on `/settings/billing` (not `/profile/credits`). - [x] Open the schedule dialog with no timezone set → "Set your timezone" link goes to `/settings/account`. - [x] Hit `/settings` directly → see the content skeleton, then redirect to `/settings/profile`. - [x] Verify `/v2/store/submissions` response contains `stats: { total, approved, pending, total_runs, average_rating }` and pagination `total_items` matches `stats.total`.
…t-Gravitas#12981) ### Why / What / How **Why:** New users signing up on the platform currently receive a `PRO` subscription tier by default — a holdover from the beta period. This gives free, unrestricted access to all platform features without ever hitting a paywall. SECRT-2295 requires closing this gap so new signups land on `NO_TIER` and are prompted to subscribe. **What:** Changes the default subscription tier for newly created users from `PRO` to `NO_TIER` across the database schema, migration, and Python model fallbacks. Existing users are **not** affected — their tier remains whatever it already is. **How:** - Updates the Prisma schema `@default(PRO)` → `@default(NO_TIER)` on the `User.subscriptionTier` column - Adds a DDL-only migration (`ALTER COLUMN SET DEFAULT 'NO_TIER'`) that touches zero rows and acquires only a microsecond metadata lock - Updates two Python-side fallback defaults in `model.py` from `BASIC` → `NO_TIER` to stay consistent with the DB - The existing beta fallback in `rate_limit.py` already handles `NO_TIER` users gracefully (maps to `BASIC` multiplier when `ENABLE_PLATFORM_PAYMENT` flag is off) ### Changes 🏗️ - **`schema.prisma`**: Removed stale 7-line beta comment block, changed `@default(PRO)` → `@default(NO_TIER)` on `subscriptionTier` - **`migrations/20260501172500_default_new_users_to_no_tier/migration.sql`**: New DDL-only migration — `ALTER TABLE "User" ALTER COLUMN "subscriptionTier" SET DEFAULT 'NO_TIER'` - **`backend/data/model.py`**: Updated two fallback defaults from `SubscriptionTier.BASIC` → `SubscriptionTier.NO_TIER` (field default on line 75, `from_db()` fallback on line 151) ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [x] Signed up a new user (`horses@ntindle.com`) on local dev stack and confirmed `subscriptionTier = NO_TIER` in the database - [x] Verified existing users retain their current tier (no row-level changes in migration) - [x] Confirmed rate_limit.py beta fallback maps NO_TIER → BASIC multiplier when payment flag is off - [ ] CI pipeline validation (backend tests, lint, type checks) #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under **Changes**) No configuration changes required — this is a schema default + Python fallback change only. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Changes the default `User.subscriptionTier` at both DB and application layers, which can affect entitlement/rate-limit behavior for all newly created accounts. Low implementation complexity, but it touches subscription gating assumptions and signup flows. > > **Overview** > Newly created users now default to `SubscriptionTier.NO_TIER` instead of a paid tier. > > This updates the Prisma schema default for `User.subscriptionTier`, adds a DDL-only migration to set the Postgres column default to `'NO_TIER'` without modifying existing rows, and aligns Python-side fallbacks in `backend/data/model.py` (field default + `from_db()` fallback) to `NO_TIER`. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit d3764e8. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rd (Significant-Gravitas#12982) ## Why The "Compare plans" link on the billing settings page (next to "Your plan" in the SubscriptionTab > YourPlanCard) was redundant — users already see their current plan and can upgrade/downgrade directly from the same card. The external link to `agpt.co/pricing` added noise without adding value. ## What - Removed the "Compare plans" anchor link (and accompanying external-link icon) from the `Your plan` header inside `YourPlanCard`. - Removed the now-unused `PRICING_PAGE_URL` constant. - Kept the `ArrowSquareOutIcon` import — still used by the "Talk to sales" upgrade button. ## How Single-file change in `autogpt_platform/frontend/src/app/(platform)/settings/billing/components/SubscriptionTab/YourPlanCard/YourPlanCard.tsx`: - Dropped the `<a>` element that wrapped "Compare plans" + the external icon. - The header `<div>` now contains only the "Your plan" label. ## Test plan - [ ] Navigate to `/settings/billing` and confirm the "Compare plans" link no longer renders next to "Your plan". - [ ] Verify the rest of `YourPlanCard` (plan label, badge, manage/upgrade/downgrade buttons) still renders correctly for free, paid, pending-cancel, and pending-downgrade states. - [ ] `pnpm types` / `pnpm lint` clean.
…ore, auto-refill defaults (Significant-Gravitas#12984) ### Why / What / How **Why:** A round of QA on the new Settings v2 surfaced several small but visible polish issues — input focus rings being clipped on the right edge of every dialog, the link rows in the Profile page flashing on save, the Billing page bouncing the user back to the Subscription tab after a Stripe topup, the Auto-refill dialog defaulting to an invalid empty state that contradicted its own "$5 minimum" copy, and the Integrations header recommending Figma even though it isn't a connectable provider. **What:** Frontend-only fix pass that addresses each of those issues at the component level (and the dialog issue at the shared `DialogWrap` level so every dialog benefits). **How:** - **Profile save flash** — `useProfilePage` previously rebuilt the form state from scratch every time `useGetV2GetUserProfile` data changed. After save, the refetch generated brand-new `LinkRow` IDs via `makeLinkRow()`, which `<AnimatePresence>` keyed off of, causing every row to exit + re-enter (the "flash"). Now the post-fetch sync skips when the incoming profile is content-equivalent to current form state, preserving link identity and silencing the animation. - **Dialog focus-ring clipping** — Inputs use `ring-1` (a 1 px box-shadow rendered outside the border). The dialog's scroll container had `overflow-x-hidden` flush against the input edge, chopping off the right side of every focus oval. Added `px-2` to both the dialog title row and the inner scroll container in `DialogWrap`, giving 8 px of horizontal breathing room across all dialogs. - **ConnectServiceDialog inner clip** — That dialog has its own `overflow-hidden` wrapper (used for the height/slide animation between list and detail views), which clipped focus rings before the new `DialogWrap` padding could help. Added matching `px-2` there. - **Billing tab restore after Stripe** — `<TabsLine>` was uncontrolled with `defaultValue="subscription"`, so a Stripe topup redirect (`?topup=success|cancel`) always landed users on the Subscription tab even though the topup flow originates from Automation Credits. Made the tab controlled with state initialized from URL params: explicit `?tab=` wins, else `?topup` implies Automation Credits, else fall back to Subscription. Adds deep-link support as a side benefit. - **Auto-refill defaults + banner copy** — The dialog opened with empty fields (placeholder "min $5") but the form was invalid by default and the existing yellow banner ("Auto-refill triggers at most once per agent execution.") didn't tell users what to do about it. Pre-fill threshold and refill amount with `"5"` so the form is valid by default and matches the empty-state copy; rewrote the banner to: "A single agent run can only trigger one auto-refill. Set a refill amount that covers your typical usage so agents don't pause mid-run." - **Integrations copy** — Replaced the Figma example in `IntegrationsHeader` with Notion (verified Notion is a real provider in `backend/integrations/providers.py`) so the copy doesn't reference an integration that isn't actually offered in Connect Service. ### Changes 🏗️ - `frontend/src/app/(platform)/settings/profile/useProfilePage.ts` — guard form re-sync on equivalent refetch. - `frontend/src/components/molecules/Dialog/components/DialogWrap.tsx` — `px-2` on title row and scroll container. - `frontend/src/app/(platform)/settings/integrations/components/ConnectServiceDialog/ConnectServiceDialog.tsx` — `px-2` on inner animation wrapper. - `frontend/src/app/(platform)/settings/billing/page.tsx` — controlled tab state with URL-aware initial value. - `frontend/src/app/(platform)/settings/billing/components/AutomationCreditsTab/AutoRefillCard/useAutoRefillCard.ts` — default to `"5"` when no saved config. - `frontend/src/app/(platform)/settings/billing/components/AutomationCreditsTab/AutoRefillCard/AutoRefillDialog.tsx` — clearer warning banner, top-aligned icon for multi-line copy. - `frontend/src/app/(platform)/settings/integrations/components/IntegrationsHeader/IntegrationsHeader.tsx` — "Figma for designs" → "Notion for documents". ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Open `/settings/profile`, edit a link, click Save → no flash on link rows - [x] Open Update email dialog, focus the email input → purple focus oval is fully closed on the right - [x] Open Connect a service dialog, focus the search input → purple focus oval is fully closed - [x] Open Add credits dialog and Auto-refill dialog → focus rings render fully - [x] Add credits via Stripe, get redirected back → land on the Automation Credits tab (not Subscription) - [x] Open Auto-refill dialog with no prior config → fields show $5 / $5, Save button is enabled - [x] Auto-refill dialog yellow banner reads as a clear, actionable instruction - [x] Visit `/settings/integrations` → header copy says "Notion for documents", not "Figma for designs"
…ificant-Gravitas#12985) ### Why / What / How **Why:** [SECRT-2314](https://linear.app/autogpt/issue/SECRT-2314). When a user hits their daily AutoPilot limit, the popup currently shows a single "Go to billing" CTA regardless of their plan. For users on the highest self-serve tier (MAX) — and the contact-sales tiers above it — there's no higher plan they can upgrade to from the billing page, so routing them there is a dead end. Reported by John in #breakage on 2026-04-27 as a follow-up to SECRT-2294. **What:** Branch the daily-limit popup CTA on the user's current subscription tier: - `NO_TIER` / `BASIC` / `PRO` (or unknown) → **"Upgrade plan"** → routes to `/settings/billing` (drives conversion). - `MAX` / `BUSINESS` / `ENTERPRISE` → **"Contact us"** → opens `mailto:contact@agpt.co` in a new tab (no higher self-serve plan). **How:** - `RateLimitResetDialog` gains an optional `tier?: SubscriptionTier | null` prop and decides label + click handler from a small `TOP_TIERS` set. The body copy adapts in the same branch ("upgrade your plan" vs "contact us if you need more capacity"). - `RateLimitGate` fetches `useGetSubscriptionStatus` (same hook `useYourPlanCard` uses) gated on `rateLimitMessage` being present, and forwards `tier`. The query is disabled when no rate-limit dialog is needed, so this adds zero extra requests on the happy path. - `mailto:contact@agpt.co` is the same address used by `WaitlistErrorContent` — kept the convention rather than introducing a new constant. ### Changes 🏗️ - `RateLimitResetDialog.tsx` — accept `tier`, branch CTA label/handler/body trailer between Upgrade plan and Contact us. - `RateLimitGate.tsx` — pull subscription status (gated on `rateLimitMessage`), forward `tier` to the dialog. - Tests: - `RateLimitResetDialog.test.tsx` — parameterised over `NO_TIER`/`BASIC`/`PRO` (Upgrade plan + router push) and `MAX`/`BUSINESS`/`ENTERPRISE` (Contact us + `window.open`). - `RateLimitGate.test.tsx` — added `useGetSubscriptionStatus` mock; verified tier forwarding (MAX → "MAX", null → null) and that the subscription query is also disabled when no rate-limit message is present. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [x] `pnpm format && pnpm lint && pnpm types` clean - [x] `pnpm test:unit` against `RateLimitResetDialog/` — 26/26 passing - [ ] Manual: hit daily limit on dev as a `PRO` user → popup shows **"Upgrade plan"**, click routes to `/settings/billing` - [ ] Manual: hit daily limit on dev as a `MAX` user → popup shows **"Contact us"**, click opens `mailto:contact@agpt.co` in a new tab - [ ] Manual: hit daily limit on dev as a `NO_TIER` user → popup shows **"Upgrade plan"**, routes to `/settings/billing` - [ ] Manual: subscription-status request fails / returns no tier → popup falls back to **"Upgrade plan"** (safe default) cc @lluis-agusti — flagging for review per request. Draft until manual QA on dev is done. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…prevent USD-cap bypass (Significant-Gravitas#12990) ## Summary `backend/copilot/rate_limit.py:check_rate_limit` was failing **open** when Redis was unreachable: on `RedisError` / `ConnectionError` / `OSError` it logged a warning and returned silently, letting the request through unrate-limited. During the GKE auto-upgrade rolling stateful pool incident captured in the postmortem [Significant-Gravitas/AutoGPT_cloud_infrastructure#318](https://github.com/Significant-Gravitas/AutoGPT_cloud_infrastructure/pull/318), this means a user (or a script-driven user) on the affected slot can blast the LLM loop unmetered for the duration of the brown-out, bypassing the per-user daily/weekly USD caps. ### Dollar-risk surface * Brown-out window: ~5 min (one stateful-set rolling step) — sometimes longer if a node goes Pending. * Worst case per affected user: a bot driving the chat at provider-bound throughput easily hits **$50–$500** of LLM spend in 5 minutes on a frontier model. Multiply by the number of users on the affected slot during the brown-out. * The user has already paid up-front via the credit wallet for **block** spend, but the microdollar cap on the **copilot LLM turn itself** is operator-side infrastructure cost that we cannot recover after the fact. ### Fix `check_rate_limit` now raises a new exception `RateLimitUnavailable` on Redis errors. The chat route handler maps that to HTTP 503 with `Retry-After: 30`. 503 is the right code (transient infra outage, retry shortly) rather than 429 ("you hit your limit"); they're different UX. ### Fail-open paths kept fail-open (deliberately) Only the **enforcement** path is fail-closed. The other Redis call sites in `rate_limit.py` are observability or best-effort and would create noisier failures if they 503'd: | Function | Behaviour | Why kept fail-open | | --- | --- | --- | | `get_usage_status` | returns zeros | Read-only gauge for the usage banner; returning zeros during a brown-out is fine. | | `record_cost_usage` | logs and returns | Losing one cost increment is preferable to 500-ing the whole turn *after* generation. The next turn re-checks the cap. | | `release_reset_lock` | swallows | TTL-bounded — lock will expire on its own. | | `increment_daily_reset_count` | logs and returns | Informational counter, not the authoritative daily reset state. | ### Fail-closed paths preserved These already failed closed and are **unchanged**: | Function | Behaviour | | --- | --- | | `check_rate_limit` | **NEW**: raises `RateLimitUnavailable` | | `acquire_reset_lock` | returns `False` (reset is rejected — cannot serialise without lock) | | `get_daily_reset_count` | returns `None` so callers refuse the billed reset | | `reset_daily_usage` | returns `False` | | `reset_user_usage` | re-raises (admin resets cannot silently no-op) | ### Test coverage * `rate_limit_test.py::TestCheckRateLimit::test_raises_unavailable_when_redis_connection_error` * `rate_limit_test.py::TestCheckRateLimit::test_raises_unavailable_when_redis_redis_error` * `rate_limit_test.py::TestCheckRateLimit::test_raises_unavailable_when_os_error` * `rate_limit_test.py::TestCheckRateLimit::test_skips_redis_and_does_not_raise_unavailable_when_unlimited` — confirms unlimited users (limit=0) don't 503 on a brown-out. * `routes_test.py::test_stream_chat_returns_503_with_retry_after_when_rate_limit_unavailable` — confirms route handler maps `RateLimitUnavailable` to 503 + `Retry-After: 30` header. The previously-passing test `test_allows_when_redis_unavailable` (which asserted fail-open) is replaced with the new fail-closed assertions — that was the bug. ## Test plan - [x] `poetry run ruff format` clean on touched files - [x] `poetry run ruff check` clean on touched files - [x] `poetry run pytest backend/copilot/rate_limit_test.py backend/api/features/chat -q` green - [ ] Manual: trigger Redis brown-out in dev, confirm chat route returns 503 with `Retry-After: 30` instead of allowing the turn through
…gnificant-Gravitas#12994) ### Why / What / How The Discord bot can currently send the unlinked-server setup prompt from unrelated thread messages. That is noisy in servers where the bot is installed but setup has not been completed yet. This PR keeps the intended server behavior: regular channel messages are only handled after an explicit bot mention, and bot-created threads continue working after they are subscribed. Thread messages that do not belong to a bot-created/subscribed thread are ignored silently. The handler now checks the thread subscription gate before checking server link status. Once a thread passes that gate, target resolution can reuse the thread directly. ### Changes - Ignore unsubscribed Discord thread messages before resolving server link status. - Keep the /setup prompt behavior for explicit bot mentions in regular server channels. - Update bot handler tests to cover the thread subscription gate. - No configuration changes. ### Checklist #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [x] poetry run python -m py_compile backend/copilot/bot/handler.py backend/copilot/bot/handler_test.py - [x] git diff --check - [ ] Deploy to dev and verify unrelated server/thread messages do not receive the setup prompt #### For configuration changes: - [x] .env.default is updated or already compatible with my changes - [x] docker-compose.yml is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under Changes)
…age (Significant-Gravitas#12999) ### Why / What / How **Why:** The existing warning text was written from an implementation perspective. User feedback requested clearer, user-benefit-oriented copy that helps users budget appropriately. **What:** Updated the warning text inside the auto-refill dialog on `/settings/billing` to use plain language that explains the safety mechanism and its budgeting implications. **How:** Single string change in `AutoRefillDialog.tsx`. ### Changes 🏗️ - Updated warning copy in `AutoRefillCard/AutoRefillDialog.tsx` from: > "A single agent run can only trigger one auto-refill. Set a refill amount that covers your typical usage so agents don't pause mid-run." To: > "As a safety mechanism, auto-refill will only trigger once per task. Keep this in mind when budgeting to ensure your balance does not hit zero and your tasks don't pause mid-run." ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Navigate to `/settings/billing`, open the auto-refill dialog, verify new warning text is displayed correctly --------- Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>
…ravitas#12995) ### Why / What / How The bot currently creates its own Discord threads after a server-channel mention, but it cannot be explicitly invited into an existing thread. If someone mentions the bot inside a thread that the bot did not create, the handler has no way to distinguish that from ambient thread chatter, and it has no prior thread context to send to AutoPilot. This PR adds explicit thread adoption. The Discord adapter records whether the bot was mentioned and, when mentioned inside a thread, fetches recent user messages from that thread. The handler uses that mention signal to subscribe the existing thread after link validation, then includes the recent thread context in the first AutoPilot message. This is stacked on top of the unlinked-thread spam fix so unrelated/unsubscribed thread messages still stay silent unless the bot is explicitly mentioned. ### Changes - Add `bot_mentioned` and `thread_history` to `MessageContext`. - Fetch recent Discord thread history when the bot is mentioned inside an existing thread. - Subscribe an existing thread when the bot is explicitly mentioned there and the server is linked. - Include recent thread context in the AutoPilot prompt for the adoption message. - Add handler and Discord adapter tests for thread adoption and history formatting. - No configuration changes. ### Checklist #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [x] poetry run python -m py_compile backend/copilot/bot/adapters/base.py backend/copilot/bot/adapters/discord/adapter.py backend/copilot/bot/handler.py backend/copilot/bot/handler_test.py backend/copilot/bot/adapters/discord/adapter_test.py - [x] git diff --check - [ ] Deploy to dev and verify mentioning the bot inside an existing linked-server thread adopts the thread and replies with prior thread context #### For configuration changes: - [x] .env.default is updated or already compatible with my changes - [x] docker-compose.yml is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under Changes)
…hint for baseline (Significant-Gravitas#13002) ## Why The autopilot SDK already carries a per-query `max_budget_usd` ceiling that the CLI uses to nudge the model when it's close to the cap (see `claude_agent_max_budget_usd: 10.0` in `config.py` — that's the "$10 session budget" you see in the UI). Two gaps in the current setup: 1. **The cap is static.** A user with $1.50 of daily USD headroom left still gets `max_budget_usd=10.0`, so the in-CLI "wrap up" reminder never fires until *after* they've blown the real cap (the post-turn Redis recorder catches it then, which is too late for the model to pace itself). 2. **Baseline has no equivalent.** The OpenRouter-direct path streams completions and accumulates `cost_usd` post-turn, but the model never sees its own running cost or remaining USD headroom mid-stream. So baseline turns burn through to the limit blindly. Tracked via the autopilot dev testing thread: https://discord.com/channels/1126875755960336515/1499923303609925793/ ## What - **SDK**: per-query `max_budget_usd` now resolves dynamically to `min(static_cap, remaining_daily_or_weekly_usd)`, floored at `$0.50` so a near-cap user still dispatches. - **Baseline**: parity via a small `<budget_context>` block injected through `inject_user_context`'s existing `env_ctx` param, carrying the same remaining-USD figure. - Both fed by a single new helper `get_remaining_usd_budget(user_id, daily, weekly)` in `rate_limit.py` so the source of truth stays one place. Note that "balance" here is the **remaining daily/weekly USD spend cap** (the real money we infra-budget per user) — not the credit wallet. The two budgets are separate by design (see the existing module docstring on `rate_limit.py`); credit balance is a future unification. ## How `backend/copilot/rate_limit.py` - `get_remaining_usd_budget(...)`: returns the smaller of `(daily_limit - daily_used)` and `(weekly_limit - weekly_used)` in USD. `inf` when both caps are 0 (unlimited). Floored on Redis brown-out so observability paths don't pretend the user has unlimited budget. - `build_budget_env_ctx(...)`: thin wrapper that formats the result as a `<budget_context>` block; returns `""` for unlimited / no-user-id (skip injection). `backend/copilot/sdk/service.py` - New module-level `_resolve_dynamic_max_budget_usd(user_id)` reads the user's tier limits via `get_global_rate_limits` and clamps `claude_agent_max_budget_usd` to `[_MAX_BUDGET_USD_FLOOR, remaining_usd]`. - Wired into `ClaudeAgentOptions` construction (replaces the bare `config.claude_agent_max_budget_usd`). `backend/copilot/baseline/service.py` - On the first user message of a turn, fetches `daily/weekly` via `get_global_rate_limits`, builds the env_ctx block, passes it through `inject_user_context(env_ctx=...)`. SDK does NOT do this — its CLI already has a richer running-cost mechanism, so adding a one-shot env_ctx hint there would just be noise. ## Test plan - [x] `poetry run pytest backend/copilot/rate_limit_test.py::TestGetRemainingUsdBudget backend/copilot/rate_limit_test.py::TestBuildBudgetEnvCtx backend/copilot/sdk/service_test.py::TestResolveDynamicMaxBudgetUsd` — 14 pass - [x] `poetry run black` / `poetry run isort` / `poetry run ruff check` on changed files — clean - [ ] Manual: chat session at 90% of daily cap → SDK CLI surfaces "wrap up" reminder ~$0.50 of spend later, not $10 later - [ ] Manual: baseline chat with `<budget_context>` injected — verify model is more conservative on tool depth ## Related - Builds on the per-query `max_budget_usd` mechanism shipped earlier (P0 guardrail). - Independent of Significant-Gravitas#12992 (re-prompt fix); both can ship in parallel.
…rollback (Significant-Gravitas#13003) ## Why Paid→paid upgrades (the canonical case is **Pro→Max**) called `stripe.Subscription.modify_async` with `proration_behavior="create_prorations"`. That writes prorated line items to the *next* invoice instead of billing the user now, so the upgrade goes through "for free" and the user is then surprise-billed alongside next month's full \$320 charge at cycle end. Worse, the DB tier flip already lands before payment is collected, so if the user's card later declines they're stuck on MAX with no charge captured. Linear: SECRT-2315. ## What - `modify_stripe_subscription_for_tier` upgrade branch (`backend/data/credit.py`) now calls Stripe with: - `proration_behavior="always_invoice"` — Stripe creates and pays the prorated invoice synchronously instead of deferring it. - `payment_behavior="error_if_incomplete"` — Stripe raises `stripe.CardError` (or `InvalidRequestError` when there's no default payment method) if the auto-charge fails, so the modify is rolled back and we never flip the DB tier. - `update_subscription_tier` (`backend/api/features/v1.py`) gets a new `except stripe.CardError` branch returning **HTTP 402** with `"Your card was declined. The plan was not changed; please update your payment method and try again."`. Placed before the existing `InvalidRequestError`/`StripeError` catches so the user-facing 402 wins for declined-card failures. The DB tier flip already runs *after* `stripe.Subscription.modify_async` — Stripe error short-circuits before `set_subscription_tier`, so failed payment ⇒ user stays on Pro. ## How (tests) `backend/data/credit_subscription_test.py`: - New `test_modify_stripe_subscription_for_tier_pro_to_max_bills_immediately` — Pro→Max calls Stripe with `always_invoice` + `error_if_incomplete` and the DB tier flips. - New `test_modify_stripe_subscription_for_tier_pro_to_max_card_decline_does_not_flip_tier` — Stripe raises `CardError` ⇒ function propagates the error and `set_subscription_tier` is never awaited. - Updated upgrade-path assertions in 4 existing tests (`modifies_existing_sub`, `clears_cancel_at_period_end_on_upgrade`, `upgrade_immediate_proration`, `upgrade_releases_pending_schedule`) to expect the new kwargs. `backend/api/features/subscription_routes_test.py`: - New `test_update_subscription_tier_pro_to_max_card_declined_returns_402` — POST `/credits/subscription` with `tier=MAX` where Stripe modify raises `CardError` returns HTTP 402 and `set_subscription_tier` is never awaited. ## Local verification - `poetry run ruff format` + `poetry run ruff check` on touched files: clean. - `pytest backend/data/credit_subscription_test.py -k "modify_stripe_subscription_for_tier or upgrade or downgrade or pro_to_max or schedule"`: **27 passed**. - `pytest backend/api/features/subscription_routes_test.py -k "card_declined or paid_to_paid or max_checkout"`: 4 of 5 pass when run individually; the lone fail is a known pytest-asyncio event-loop scoping flake when `paid_to_paid_modifies_subscription` happens to run first in a fresh session — it passes when run alone or with my new test ordered before it. Unrelated to this change. ## Risk - Customers without a default payment method on file (rare for paid subs) will now see a 402 instead of a silent deferred-charge upgrade. That's the correct, intended behaviour: users must have a working payment method to upgrade. - Webhook idempotency unchanged — the existing `customer.subscription.updated` handler still reconciles after a successful modify.
…e-limit through DB-manager (Significant-Gravitas#12992) ## Why Two production fixes surfaced from John Ababseh's dev testing on 2026-05-01 (Discord thread `1499923303609925793`): - **Issue #5** — chat session `c93dc51f-bb38-4427-975a-6dc033358689` finished after multiple minutes of work and showed only `(Done — no further commentary.)` Langfuse trace `7d1a674eb7c84ffb5a4b34875306eea9` shows the model wrote the entire restaurant-list answer **inside an extended-thinking `ThinkingBlock`** (931 completion tokens, $0.50 spend) and ended the turn with empty `content: []`. Our existing thinking-only guard immediately stamped the placeholder, so the user never saw the actual answer the model already generated. - **Issue #2** — every image-generation request (`AIImageCustomizerBlock` / `AIImageGeneratorBlock`) on dev failed with `prisma.errors.ClientNotConnectedError: Client is not connected to the query engine`. Regression from Significant-Gravitas#12780 (tier-based workspace file storage limits): the new pre-write quota check at `util/workspace.py:225` called `get_workspace_total_size` directly from `backend.data.workspace`, which is a Prisma read. The copilot-executor process doesn't connect Prisma — it RPCs into `database-manager` for everything else — so every `manager.write_file()` from a tool blew up. ## What - **Issue 5** — layered fallback for thinking-only final turns: 1. Adapter sets `pending_thinking_only_reprompt` and defers placeholder/StreamFinish. 2. Driver re-enters the SDK loop and fires one synthetic `client.query("Please write a brief user-facing summary of what you found...")`. 3. If the re-prompt also returns thinking-only, promote the most recent `ThinkingBlock` content to a visible `TextDelta`. 4. Only when thinking is also empty, emit the original `(Done — no further commentary.)` placeholder. Bounded to **one** re-prompt per turn so the worst case is ~one extra LLM call. - **Issue 2** — route the storage-limit pre-check through the existing `workspace_db()` accessor and expose `get_workspace_total_size` on `DatabaseManager` so the copilot-executor RPCs into database-manager (where Prisma is connected), the same path other workspace queries on this codepath use. ## How `backend/copilot/sdk/response_adapter.py` - New `pending_thinking_only_reprompt`, `thinking_only_reprompted`, `_last_thinking_content` fields on `SDKResponseAdapter`. - Capture latest `block.thinking` when streaming reasoning so the second-tier promote-fallback has content. - ResultMessage thinking-only branch — first hit defers; second hit prefers `_last_thinking_content`, falls back to placeholder. `backend/copilot/sdk/service.py` - Wrap the `async for sdk_msg in _iter_sdk_messages(client):` block in a `while True:` retry loop. After the inner loop ends, check `pending_thinking_only_reprompt` — if set and not yet retried, fire `client.query(_THINKING_ONLY_REPROMPT, ...)` and re-enter; else break. Most of the diff is +4-space indentation churn. - Module-level `_THINKING_ONLY_REPROMPT` constant for the re-prompt copy. `backend/data/db_manager.py` - Import `get_workspace_total_size` and expose it via `_(...)` so it becomes an RPC on `DatabaseManager` and the corresponding async client. `backend/util/workspace.py` - Drop the direct `get_workspace_total_size` import; call `workspace_db().get_workspace_total_size(self.workspace_id)` instead. `backend/util/workspace_test.py`, `backend/copilot/sdk/response_adapter_test.py` - Existing thinking-only test split into three: defer-on-first-pass, promote-thinking-on-second-pass, fallback-to-placeholder-when-no-thinking. - Updated `test_flush_unresolved_at_result_message` to expect deferral instead of immediate placeholder. - New `test_write_file_storage_check_routes_through_workspace_db_accessor` proving the storage-limit pre-check goes through the accessor (would have caught Issue 2). ## Test plan - [x] `poetry run pytest backend/copilot/sdk/response_adapter_test.py backend/util/workspace_test.py` — 67 pass - [x] `poetry run ruff check` on changed files — clean - [x] `poetry run black` / `poetry run isort` on changed files — clean - [x] `/pr-test --fix` against dev preview to exercise the re-prompt + image-write paths end-to-end - [x] `/pr-polish` until merge-ready ## Related - Regression introduced by Significant-Gravitas#12780 (tier-based workspace file storage limits)
…icant-Gravitas#12996) ### Changes Removes the country / currency selector button from the onboarding Subscription step (`/onboarding?step=4`). - Removed the `<CountrySelector>` render and its wrapper div from `SubscriptionStep.tsx` - Removed the unused `countryIdx` / `setCountryIdx` from the `useSubscriptionStep` return shape and the now-orphaned `setCountryIdx` helper - Removed the `setSelectedCountryCode` selector inside the hook (store API kept intact) - Deleted `CountrySelector/CountrySelector.tsx` (no remaining consumers) - Removed the "changing country persists the country code" integration test The `country` value derived from `selectedCountryCode` (defaulting to `US`) is still passed to `<PlanCard>` for pricing display — only the UI affordance for changing it is removed. ### Checklist - [x] Removed UI element as instructed via page feedback - [x] Cleaned up unused imports, hook returns, and stale test - [x] No backend/API changes Co-authored-by: Toran Bruce Richards <22963551+Torantulino@users.noreply.github.com>
…gnificant-Gravitas#13004) ## Why The onboarding paywall already renders a Monthly/Yearly toggle, but the toggle is purely cosmetic — the backend always charges the monthly Stripe price. This PR wires `billing_cycle` end-to-end so the toggle actually drives Stripe price-ID selection, plus a number of related billing-UX bugs surfaced during /pr-test (silent Pro→Max upgrade, missing tier downgrade dialog, yearly→monthly behaving immediately instead of deferred, etc.) — see the in-PR comments for the full list. Linear: [SECRT-2317](https://linear.app/agpt/issue/SECRT-2317), [SECRT-2306](https://linear.app/agpt/issue/SECRT-2306). Replaces closed PR Significant-Gravitas#12998 with a clean rewrite. ## What **Backend yearly support** (`backend/data/credit.py`, `backend/api/features/v1.py`): - `get_subscription_price_id(tier, billing_cycle="monthly")` reads the `copilot-tier-stripe-prices` LD flag using **additive flat suffix keys** — monthly stays at `<TIER>` (existing key, e.g. `"PRO"`), yearly lives at `<TIER>_YEARLY` (e.g. `"PRO_YEARLY"`). This shape is deploy-order-safe: adding the yearly key in LD never changes what the old code reads from `<TIER>`, so the LD edit and the code deploy can land in either order. Yearly request for a tier without a configured `_YEARLY` key returns `None` (fail-closed; we never silently bill monthly when the caller asked for yearly). - `create_subscription_checkout` and `modify_stripe_subscription_for_tier` accept `billing_cycle` and forward it; the Checkout Session metadata carries `billing_cycle` for observability and the modify path refreshes sub metadata so the Stripe Dashboard reflects the live tier+cycle. - `sync_subscription_from_stripe` gathers monthly + yearly prices for every priceable tier so a user on a yearly Pro plan still maps back to `SubscriptionTier.PRO`. Same dual-cycle map is used by `get_pending_subscription_change` so scheduled cycle-only changes resolve to the correct tier. - `SubscriptionTierRequest` gains `billing_cycle: Literal["monthly", "yearly"] = "monthly"` (default preserves back-compat with the settings billing tab where no cycle was sent). `SubscriptionStatusResponse` exposes `billing_cycle`, `tier_costs_yearly`, and `pending_billing_cycle` so the UI can render the right labels and copy. - `update_subscription_tier` route: forwards `billing_cycle` to all helpers; same-tier-cycle-downgrade (yearly→monthly) routes through `_schedule_downgrade_at_period_end` (no immediate proration); same-tier-cycle-upgrade (monthly→yearly) and tier upgrades stay on the immediate `proration_behavior=always_invoice` path. Same-tier short-circuit (release-pending-schedule) gates on the user having an active Stripe sub so admin-granted users can still pay. **Backend webhook robustness:** - Handle the new Stripe API event types (`invoice_payment.paid` / `invoice_payment.payment_failed`) by hydrating the underlying Invoice and delegating to the legacy handlers; `_invoice_subscription_id` reads `parent.subscription_details.subscription` first with a fallback to the legacy `invoice.subscription` field. Without this, accounts on the new Stripe API would flip the tier on `customer.subscription.created` but never grant credits because `invoice.payment_succeeded` was no longer being emitted. - `update_subscription_tier` maps `stripe.CardError` (including `code="authentication_required"` and the `subscription_payment_intent_requires_action` SCA variant) to a clear HTTP 402 with appropriate copy; `stripe.InvalidRequestError` for missing payment method (`code in {resource_missing, missing}`, `param in {default_payment_method, payment_method, invoice_settings.default_payment_method}`) also maps to 402. Substring matching on the error message is kept as a defensive fallback because Stripe documents `e.param` as nullable. - `tier_multipliers` lookup uses `t.value` (string) to match the `dict[str, float]` keying that `get_tier_multipliers()` documents in its return type — the prior enum lookup silently fell back to `1.0` for every tier. **Frontend** (onboarding paywall, Settings → Billing, PaywallGate): - `PlanCard` lifted from `(no-navbar)/onboarding/.../components/PlanCard/` into `src/components/molecules/PlanCard/` so onboarding, Settings → Billing, and PaywallGate share a single implementation. Plan list is driven by `tier_costs` / `tier_costs_yearly` from the API response — adding a new tier in LD makes it appear on every surface without code changes. - `PaywallGate` modal renders the shared `PlanCard` grid + the same Monthly/Yearly toggle pattern. Team (BUSINESS) opens the contact-sales intake form rather than POSTing to `/credits/subscription`. - `Settings → Billing` (`YourPlanCard`): - New `CycleToggle` + `SwitchCycleDialog`: yearly→monthly dialog promises end-of-period deferral (now actually deferred backend-side); monthly→yearly dialog promises immediate prorated charge. - New `SwitchTierDialog` (reused for both upgrade and downgrade): paid→paid upgrades surface the prorated immediate-charge copy before firing the mutation; paid→paid downgrades surface the keep-until-period-end copy before scheduling. Downgrade also gates on `serverCycle` so a yearly subscriber stays on yearly through the rest of their period. - Cycle toggle is hidden for `ENTERPRISE` (admin-managed), `BASIC` (reserved internal slot), `NO_TIER` (no active sub to switch), and `null` (loading). - PaywallGate's Upgrade flow gates on `has_active_stripe_subscription` — admin-overridden NO_TIER users with an active Stripe sub get the same SwitchTierDialog (so the modify-in-place isn't silent); genuine fresh NO_TIER users go straight to Stripe Checkout (the Checkout page is the confirmation). - `useSubscriptionStep` (onboarding) sends `billing_cycle: isYearly ? "yearly" : "monthly"` to `useUpdateSubscriptionTier`. - 422 toast for "yearly billing not yet available" is scoped to `billing_cycle === "yearly"` requests so monthly-targeted 422s surface the generic toast instead. ## How (tests) - **Backend** (touched test files, all passing locally): `credit_subscription_test.py` (price-id resolution incl. suffix-key shape, yearly fail-closed, sync mapping for yearly→tier, schedule cycle for yearly→monthly downgrade, idempotency on replay), `subscription_routes_test.py` (modify forwarding, 422 fail-closed, decline / SCA / no-PM 402 mapping, cycle-switch routing, admin-granted Checkout fallthrough), `rate_limit_test.py` (drift-warning yearly skip), `v1_test.py`. - **Frontend** (vitest + RTL + MSW): `PaywallModal.test.tsx` (dynamic plan rendering from `tier_costs`, cycle toggle, mutation routing, Stripe redirect, 422 toast, empty fallback, loading skeletons, Team contact-sales divert, dialog gate for active-sub case), `billing-cards.test.tsx` (cycle toggle + SwitchCycleDialog + SwitchTierDialog flows, downgrade preserves yearly, NO_TIER hides toggle, BASIC/ENTERPRISE hides toggle), `billing-hooks.test.tsx` (hook-level mutation paths through dialogs). - /pr-test --fix passed all V1–V9 scenarios end-to-end against a live Stripe sandbox (initial yearly charge, cycle switches, tier upgrade with SCA + decline, downgrade defers via Stripe Schedule, webhook idempotency, etc.). Several Sentry catches (SCA event type, downgrade cycle, BASIC toggle hide, race in confirmTierDowngrade, scoped 422 yearly toast) addressed in subsequent commits per the PR conversation. ## LaunchDarkly migration (operator runbook) This PR ships **without** any LD config change required — the suffix-key shape is purely additive. To enable yearly billing post-merge: 1. Create yearly Stripe Price IDs (per tier, 15% off the annual equivalent). 2. Edit `copilot-tier-stripe-prices`: keep the existing `"PRO"` / `"MAX"` keys, **add** `"PRO_YEARLY"` / `"MAX_YEARLY"` with the new yearly Price IDs. Test mode for dev, live mode for prod. 3. No code redeploy needed — the running backend already reads `<TIER>_YEARLY`. Tiers without a `_YEARLY` key keep showing only monthly in the UI (cycle toggle present but yearly request returns 422 fail-closed → toast `Yearly billing is not yet available for your plan.`). ## Checklist - [x] My code follows the style guidelines of this project - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas - [x] I have added tests that prove my fix is effective or that my feature works - [x] New and existing unit tests pass locally with my changes - [x] Any dependent changes have been merged and published
…vitas#12624) ## Why Completes the two linking flows from Significant-Gravitas#12615 / Significant-Gravitas#12618. When the bot sends a user a one-time `{frontend}/link/{token}?platform=...` URL, this page is where the user actually connects their AutoGPT account — whether that's claiming a server or linking their personal DMs. ## The flows **Server link (`linkType: SERVER`)** 1. User runs `/setup` in a Discord guild → bot replies ephemerally with the link 2. Clicks link → logs in to AutoGPT if needed → lands here 3. Page shows: *"Set up AutoPilot for {ServerName}"* with a clear billing notice 4. Confirm → `POST /api/platform-linking/tokens/{token}/confirm` 5. Everyone in the server can now mention the bot; all usage bills to this user **DM link (`linkType: USER`)** 1. User DMs the bot → bot creates a USER token and posts the link in the DM 2. Clicks link → logs in → lands here 3. Page shows: *"Link your {Platform} DMs"* with a personal-context billing notice 4. Confirm → `POST /api/platform-linking/user-tokens/{token}/confirm` 5. DMs now run as that user's own AutoPilot, billed to themselves **Stacked on:** Significant-Gravitas#12618 → Significant-Gravitas#12615. Merge those first. ## Implementation Single file: `autogpt_platform/frontend/src/app/(no-navbar)/link/[token]/page.tsx` - State machine: `loading` → `not-authenticated` | `ready` → `linking` → `success` | `error` - `fetchTokenInfo(token)` hits `GET /api/platform-linking/tokens/{token}/info` — no auth needed, returns `platform`, `server_name`, and `link_type`. The page branches all copy and the confirm endpoint choice on `link_type`. - `?platform=DISCORD` query-param fallback so the platform name renders instantly even before `/info` resolves (removes a UI flash on slower connections). - "Signed in as {email}" footer with a one-click "Not you? Sign out" that logs out and redirects back to `/login?next=/link/{token}` — handles the common "wrong account" case. - 30s `AbortController` timeout on confirm. Timeouts surface as a retry prompt rather than hanging silently. - Reuses `AuthCard`, `Button`, `Text`, `Link` from the design system; Phosphor icons only (no emojis); Tailwind only. ## States | State | What the user sees | |-------|--------------------| | Loading | Spinner + "Loading…" | | Not authenticated | "Sign in to continue" → `/login?next=/link/{token}` | | Ready (SERVER) | *"Set up AutoPilot for {ServerName}"* — 4-bullet explainer + billing callout | | Ready (USER) | *"Link your {Platform} DMs"* — 3-bullet personal explainer | | Linking | Spinner + "Setting up AutoPilot…" | | Success (SERVER) | CheckCircle + "*{ServerName}* is now connected — everyone in the server can start using AutoPilot" | | Success (USER) | CheckCircle + "your *{Platform}* account is now connected — you can chat with AutoPilot in your DMs" | | Error | LinkBreak + inline error + "Ask the bot for a new setup link" | | Malformed token | Inline error — rejects client-side before any network call | ## Security - **Token format validation client-side**: `^[A-Za-z0-9_-]{1,64}$` — mirrors the backend's Path regex, so a malformed `params.token` never hits `/api/proxy/...`. - **All requests go through `/api/proxy`** which handles the Supabase session cookie server-side — no session token ever touches client-side fetch headers. - **Confirm endpoints are JWT-authed** on the backend (`Security(auth.requires_user)`). - **Token info endpoint is unauthenticated by design**: 32-byte-entropy tokens with 30-min TTL are safe to look up for display, and the confirm step still requires JWT. ## Follow-on wiring - `AUTOGPT_FRONTEND_URL` on the bot points back here (used in `/unlink` and the DM auto-link message) — no hardcoded hostnames. - Backend openapi.json fully regenerated on Significant-Gravitas#12615, typed API types regenerate via `pnpm generate:api` if you re-run it. <img width="617" height="613" alt="image" src="https://github.com/user-attachments/assets/c81c6935-c21d-4f31-9d67-0a4f9f1709bf" /> <img width="568" height="637" alt="image" src="https://github.com/user-attachments/assets/406148b3-a1fc-4fed-8973-22ab1e3f7f43" /> <img width="607" height="518" alt="image" src="https://github.com/user-attachments/assets/de29680f-8a70-45e1-92d3-5759ecbba4c4" />
…ostLog (Significant-Gravitas#13009) ## Why The post-execution **activity-status generator** ([`activity_status_generator.py`](autogpt_platform/backend/backend/executor/activity_status_generator.py)) runs an LLM call (gpt-4o-mini via the platform's `openai_internal_api_key`) on every completed graph run to produce a 1-3 sentence user-friendly summary + correctness score. It uses `AIStructuredResponseGeneratorBlock` to issue the call. **Bug:** the block writes its `provider_cost` into a local `NodeExecutionStats` via `merge_stats`, but because the block runs **outside** the executor's `execute_node` loop, neither [`log_system_credential_cost`](autogpt_platform/backend/backend/executor/cost_tracking.py) nor `charge_reconciled_usage` ever fires — the cost is dropped on the floor when the function returns. This is an **observability gap, not a billing gap**: the platform is paying OpenAI for activity-status generation on every completed graph execution, but those calls don't show up in the admin cost dashboard / per-user attribution. The [simulator](autogpt_platform/backend/backend/executor/simulator.py) (same shape: platform-paid LLM call on the user's behalf) already routes through `persist_and_record_usage` so its spend is attributed correctly. This PR brings activity-status to parity. ## What - New `_persist_activity_status_cost(...)` helper that builds a `PlatformCostEntry` and schedules it via `schedule_platform_cost_log` (renamed from the previously-private `_schedule_log` — see refactor commit) so external modules can use it without reaching into private API. - Call site placed after `structured_block.run()` succeeds, before `return activity_response`. Reads `provider_cost`, `input_token_count`, `output_token_count` off `structured_block.execution_stats`. - Helper body is wrapped in `try/except Exception` so any cost-log failure (transient DB / scheduling error) is logged but never strips a successful activity-status response from the user — cost-logging is strictly best-effort. - Early-return guard uses `not cost_usd` so both `provider_cost is None` and `provider_cost == 0.0` (with zero tokens) short-circuit, avoiding empty rows that would dilute dashboard averages. - Distinguishes `cost_usd`-tracked rows (`tracking_type="cost_usd"`) from tokens-only rows (`tracking_type="tokens"`, `tracking_amount = input + output`) so admins can still filter by request volume when the provider doesn't report a USD cost. - Deliberately **does not** bill the user's wallet (no `spend_credits`) and **does not** count against the user's copilot rate-limit (no `record_cost_usage`) — activity-status is platform-side overhead, not user-triggered. Matches the simulator's stance for dry-run executions. ## How (tests) `backend/executor/activity_status_generator_test.py`: - `test_generate_status_persists_platform_cost`: stubs `execution_stats=NodeExecutionStats(input=120, output=40, provider_cost=0.0042)`, patches `schedule_platform_cost_log`, and asserts the entry carries the right `user_id` / `graph_exec_id` / `graph_id` / `block_name="activity_status_generator"` / `provider="openai"` / `model="gpt-4o-mini"` / `tracking_type="cost_usd"` / `cost_microdollars=4200` / `metadata.source="activity_status_generator"`. - `test_generate_status_no_cost_no_log`: zero-cost zero-token case must skip the log write. - `test_generate_status_tokens_only_branch`: provider returns no USD cost but tokens are present (input=200, output=80) — entry is logged with `tracking_type="tokens"`, `tracking_amount=280.0`, `cost_microdollars=None`. - Updated three existing success-path tests to set `mock_instance.execution_stats = NodeExecutionStats()` so the new short-circuit has concrete numbers to compare against (was a `MagicMock` attribute before). `poetry run pytest backend/executor/activity_status_generator_test.py` — 17 passed locally. `poetry run pyright` / `poetry run ruff` — clean on touched files. ## Checklist - [x] My code follows the style guidelines of this project - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas - [x] I have added tests that prove my fix is effective or that my feature works - [x] New and existing unit tests pass locally with my changes - [x] Any dependent changes have been merged and published
… show 'Go to billing' (Significant-Gravitas#13011) ## Why [SECRT-2296](https://linear.app/agpt/issue/SECRT-2296) — When a user hits the daily AutoPilot limit, the CTA briefly flashed the legacy **"Reset daily limit for $5.00"** copy before switching to **"Go to billing."** Visible in: - The usage dropdown in the header - The in-chat `UsageLimitReachedCard` - The library briefing panel's usage section Each remount (thread switch, dropdown reopen) reproduced the flash. ### Root cause The render branch picked the CTA based on `hasInsufficientCredits = credits < resetCost`. `useCredits` returns `credits = null` until the API resolves, so `hasInsufficientCredits` defaulted to **false** on first paint: ``` isDailyExhausted && !isWeeklyExhausted && resetCost > 0 && !hasInsufficientCredits → renders "Reset daily limit for $5.00" // first paint, credits not loaded ``` Once credits resolved (typically 0 for users on the free tier), the flag flipped and the CTA swapped to "Go to billing." That's the flash. PR Significant-Gravitas#12973 already removed this credit-based reset flow from the rate-limit dialog (`Wait for reset` / `Go to billing`), but the same legacy code was still living in the dropdown, the in-chat card, and the briefing panel. ## What Finish the migration started by PR Significant-Gravitas#12973. Kill the credit-reset path everywhere; show a single deterministic **"Go to billing"** CTA when the daily limit is exhausted. - `UsagePanelContent` — removed `ResetButton`, `useResetRateLimit`, `hasInsufficientCredits` / `isBillingEnabled` / `onCreditChange` props. Renders one CTA, gated by `ENABLE_PLATFORM_PAYMENT`, routing to `/settings/billing`. - `UsageLimits` (dropdown) and `UsageLimitReachedCard` (in-chat) — dropped `useCredits` + flag prop drilling. - `BriefingTabContent` (library) — removed `UsageFooter` (paid reset / add-credits). Same single CTA, same flag gate. - Deleted the now-unused `useResetRateLimit` hook. - Tests updated: - Assert the legacy `Reset daily limit` button is **never** rendered, in any state. - Assert "Go to billing" is gated only by `ENABLE_PLATFORM_PAYMENT`. ## How The flash was a render-order race against an async credit fetch. Removing the conditional path that depends on `credits` removes the race entirely — the CTA is now a pure function of `usage` and a synchronous LD flag, both available on the first render that has usage data. Behavior change worth flagging for review: users who *did* have ≥ $5 credit can no longer click "Reset daily limit" to spend credits and unblock themselves immediately. They go to billing or wait for daily reset. This matches PR Significant-Gravitas#12973's stated intent ("Replaced the credit-based reset flow with **Wait for reset** and **Go to billing**"). If product wants to preserve paid-reset for credit-rich users, that needs a separate decision — happy to revert that part and instead fix the flash by gating render on `credits !== null`. ## Test plan - [x] `pnpm format`, `pnpm lint`, `pnpm types` all clean - [x] `pnpm vitest run UsagePanelContentRender BriefingTabContent` — 38 passed - [x] `pnpm vitest run CopilotPage RateLimit` — 91 passed - [ ] Manual: hit daily limit on dev, open usage dropdown — "Go to billing" appears immediately, no flash - [ ] Manual: switch chat threads with limit reached — no flash on remount - [ ] Manual: open in-chat `UsageLimitReachedCard` — single "Go to billing" CTA - [ ] Manual: library Briefing panel "All" tab with limit reached — single "Go to billing" CTA - [ ] Manual: with `ENABLE_PLATFORM_PAYMENT` off, no "Go to billing" / "Manage billing" rendered - [ ] Manual: clicking "Go to billing" lands on `/settings/billing`
…Significant-Gravitas#13007) ## Why A user opening `/library` for the first time (zero agents, zero folders) saw an entirely blank grid below the header — no guidance, no call to action, no indication of what to do next. Existing empty-state copy only existed for the **Favorites** tab; the main "All" tab fell through to an empty `InfiniteScroll` and rendered nothing. This is a poor onboarding moment for net-new users and looks like a broken page for users who deleted all their agents. https://github.com/user-attachments/assets/7b3b2097-36bb-4fc6-82de-78da30e1f287 ## What Adds a dedicated `LibraryEmptyState` rendered only in the **pristine** zero-state on the main tab: - ✅ no agents - ✅ no folders - ✅ no active search term - ✅ not inside a folder - ✅ `statusFilter === "all"` - ✅ not the favorites tab (which keeps its existing `HeartIcon` empty state) Other empty cases (search-no-results, status-filter-empty, empty subfolder) keep their current behaviour so the CTAs don't appear in inappropriate contexts. ## How **New component** — [`LibraryEmptyState/LibraryEmptyState.tsx`](autogpt_platform/frontend/src/app/(platform)/library/components/LibraryEmptyState/LibraryEmptyState.tsx) - **Custom SVG illustration** — three stacked, progressively-wider rounded "agent cards" (avatar circle + title bar + action squares + trailing pill). No external assets, just inline SVG. - **Two CTAs** using the design-system `Button` atom (`size="large"`, `as="NextLink"`): - Primary → `/build` ("Build an agent") - Secondary → `/marketplace` ("Browse marketplace") - **Copy** uses the design-system `Text` atom. **Animation** — applies Emil Kowalski's animation principles: - Staggered fade-up entrance for every child (illustration → heading → body → CTAs) - The 3 cards inside the illustration also stagger (back-to-front, 80 ms apart) so the deck visually "deals out" - Shared `cubic-bezier(0.22, 1, 0.36, 1)` (out-quint) curve, ~350 ms per element - Only `transform` + `opacity` animated (GPU-friendly) - Respects `prefers-reduced-motion` via `useReducedMotion`: collapses to a single 200 ms opacity fade with no stagger and no translate **Wired into** [`LibraryAgentList.tsx`](autogpt_platform/frontend/src/app/(platform)/library/components/LibraryAgentList/LibraryAgentList.tsx) — added a single `isPristineEmpty` derivation guarding the existing render branch ladder, just below the favorites empty state. ## Tests New integration test file — [`empty-state.test.tsx`](autogpt_platform/frontend/src/app/(platform)/library/__tests__/empty-state.test.tsx) covering: - Renders heading + body copy on zero-state - "Build an agent" CTA points to `/build` - "Browse marketplace" CTA points to `/marketplace` - Empty state is **not** rendered when at least one agent exists - Empty state is **not** rendered when folders exist (even with zero agents) ## Test plan - [x] `pnpm test:unit` — new `empty-state.test.tsx` passes - [x] Visual: open `/library` with a fresh user → see illustration + CTAs animate in - [x] Visual: with agents present → empty state is hidden - [x] Visual: search a non-existent term → search-empty state shown (not new empty state) - [x] Visual: filter by a status with no matches → existing filter-exhaust UX preserved - [x] Visual: navigate into an empty folder → does not render the new empty state - [x] A11y: with `prefers-reduced-motion: reduce`, only opacity fades — no translate, no stagger - [x] CTAs route via Next.js client navigation (no full page reload)
…nificant-Gravitas#12960) ### Why / What / How **Why:** `frontend/TESTING.md` is explicit that page/component-level integration tests (Vitest + RTL + MSW) are the default — yet ~30 hook tests across the frontend were exercising hooks directly via `renderHook`, with mock-heavy harnesses that drifted from how the hooks are actually consumed. They were brittle, hard to navigate, and provided coverage that didn't reflect user-visible behavior. A handful of test files also lived outside `__tests__/` siblings and a couple were duplicated. This PR cleans the surface area and brings test placement in line with the convention. **What:** - Migrates 22 direct hook tests into the consumer component test where the behavior is observable through DOM rendering. Where a sibling component test didn't exist (`GoogleDrivePicker`, `ChatSidebar`, `CredentialsInput`, `PushNotificationProvider`), creates a minimal one rendered with `render()` from `@/tests/integrations/test-utils`. - Consolidates the 3 `services/push-notifications` hook tests into a single `PushNotificationProvider.test.tsx` that mounts the provider and asserts via the same boundaries the hook tests previously stubbed (`next/navigation`, `useSupabase`, service-worker registration helpers, push API). - Extracts pure helpers exposed by hooks into helper-test files (e.g., `classifyCredentials.test.ts`, `useArtifactContentHelpers.test.ts`). Drops behaviors that were purely internal `useEffect` orchestration with no DOM-observable surface. - Relocates 27 misplaced test files into `__tests__/` siblings. Merges 3 helper-test duplicates (`CredentialsInput/helpers.test.ts`, `copilot/store.test.ts`, `copilot/helpers.test.ts`) into a single canonical copy each — no coverage lost. - Resolves 2 real test-file duplicates (`downloadArtifact.test.ts`, `useAutoOpenArtifacts.test.ts`). **How:** Each migration was a strict triage: 1. **Already covered** by the consumer component test → drop the hook-level case. 2. **Reachable through DOM rendering** of a consumer component → port as a render-driven test using MSW handlers (and existing component-test patterns) rather than mocking the hook itself. 3. **Pure helper logic exported from the hook module** → keep coverage in a uniquely-named helpers test next to the hook. 4. **Pure internal `useEffect` orchestration** that can only be reached by `renderHook` → drop, with the lost behavior documented per-hook so it can be reinstated later. Hook source files are untouched. The existing `CopilotPage.test.tsx` (which mocks `useCopilotPage` directly and is therefore a smoke test today) is also untouched — it's the natural home for the page-level orchestration coverage that this PR drops, but rewriting it to render with MSW is its own much larger change and is left as follow-up. ### Changes 🏗️ **Hook-test migrations (sub-component layer)** - `useChatInput` → behaviors merged into `ChatInput.test.tsx` - `useElapsedTimer` → `TurnStatsBar.test.tsx` (DOM-observed via a small `TimerHarness`) - `useArtifactContent` → `ArtifactContent.test.tsx`; cache helpers preserved in new `useArtifactContentHelpers.test.ts` - `useAutoOpenArtifacts` → `ChatContainer.test.tsx` (real Zustand store + real hook drive behavior) - `useBuilderChatPanel` → already covered; deleted (24 internal-only behaviors documented as dropped) - `useDiagnosticsContent` → 2 error-coalesce cases added to `DiagnosticsContent.test.tsx` - `useRateLimitManager` → 12 cases ported into `RateLimitManager.test.tsx` (now 19 tests) - `useCredentialsInput` → 1 case ported into a new `CredentialsInput.test.tsx`; 2 dropped as dead code (the hook exposes `userUpgradeableCredentials` and `handleScopeUpgrade` but no consumer reads them) - `useGoogleDrivePicker` → new `GoogleDrivePicker.test.tsx` covers token/error/scope paths - `useCredentials` → 7 helper cases extracted into `classifyCredentials.test.ts`; 2 context-plumbing cases dropped - `usePushNotifications` + `useReportClientUrl` + `useReportNotificationsEnabled` → consolidated into `PushNotificationProvider.test.tsx` (19 tests) - `useCopilotStop` → 8 cases ported into `ChatInput.test.tsx` via a `StopHarness` (44 tests total now) - `useSessionDeletion` → 4 cases ported into a new `ChatSidebar.test.tsx` **Hook-test migrations (page-level layer — coverage gap, see below)** The remaining 14 copilot page-level hook tests under `app/(platform)/copilot/__tests__/use*.test.ts` are deleted. Most of their behaviors were `renderHook` + mocked-args orchestration with no DOM surface other than through `<CopilotPage>` itself. Since `CopilotPage.test.tsx` mocks `useCopilotPage` wholesale, none of that orchestration is currently reachable from any rendered test. Behaviors observable through other consumers were ported there (e.g., `useCopilotStop` → ChatInput); the rest are dropped. **Documented coverage drops (highest-risk):** - `useCopilotStream` (12 behaviors) — registry reuse, hydration-gated resume, restore latch, 6s stall watchdog, message-snapshot persistence, idempotent resume, `setMessages` pre-replay strip, unmount cleanup, background-disconnect reload marking, 30s forced reconnect, 429 rate-limit recovery branches. - `useCopilotPage` orchestrator — onSend queue-in-flight routing (5 sub-cases), active-restore trim + cached-snapshot preference, `turnStats` merge precedence, backward pagination ordering. - `useSendMessage` — file count cap, file size cap, all-uploads-fail toast/throw, first-send stash + createSession flow, double-send concurrency guard, queued first-send flush on sessionId change. - `useStreamActivityWatchdog` (6), `useWakeResync` (8), `useHydrateOnStreamEnd` (11), `useCopilotPendingChips` (8 promotion lifecycle), `useLoadMoreMessages` (8 cursor/backoff), `useSessionTitlePoll` (6), `useWorkflowImportAutoSubmit` (7), `useChatSession` (5), `useCopilotNotifications` (4 push-SW dedupe). **Recommended follow-ups:** 1. Rewrite `CopilotPage.test.tsx` to render with MSW (replacing the `useCopilotPage` mock). Highest-priority targets: `onSend` queue-in-flight routing, `useCopilotStream`'s reconnect/restore lifecycle, `useSendMessage`'s file caps + first-send flush, `useChatSession`'s freshSessionData masking. 2. Extract embedded pure logic from `useHydrateOnStreamEnd` (`preservePromotedUserBubbles`, zombie ledger), `useStreamActivityWatchdog` guards, `useCopilotPendingChips` promotion logic, `useLoadMoreMessages` backoff/window — currently all module-private — into siblings exports so they can be helper-tested without page-level setup. 3. Already-exported but untested helpers flagged by agents during this work: `getLatestAssistantStatusMessage`, `concatWithAssistantMerge`, `deduplicateMessages`, `hasInProgressAssistantParts`, `hasVisibleAssistantContent` — cheap follow-up additions to `helpers.test.ts`. **Test relocations** 27 helper/component test files moved from feature directories into `__tests__/` siblings (renderers under `OutputRenderers/`, helpers under copilot tools / ArtifactPanel / CredentialsInput / SubscriptionStep / SubscriptionTierSection / APIKeyList, `route.helpers.test.ts`, `lib/utils.test.ts`, `lib/autogpt-server-api/{client,helpers}.test.ts`, `middleware.test.ts`, `providers/agent-credentials/credentials-provider.test.ts`, `types/auth.test.ts`). Three helper-test pairs that existed both inside and outside `__tests__/` were merged into the canonical inside copy with no coverage lost. **Real duplicates removed** - `app/(platform)/copilot/components/ArtifactPanel/downloadArtifact.test.ts` — older smaller copy, the inside `__tests__/` version is the comprehensive one. - `app/(platform)/copilot/components/ChatContainer/useAutoOpenArtifacts.test.ts` — kept the more recent outside copy by moving its content into `__tests__/`, then migrated it into `ChatContainer.test.tsx` as part of phase A. **Net diff:** 65 files changed, +2,214 / −9,233 lines. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [x] `pnpm format` — clean - [x] `pnpm lint` — clean (only pre-existing tailwind warnings) - [x] `pnpm types` — only pre-existing PaywallGate / SubscriptionTierSection errors (verified by stashing this PR's changes; same errors exist on dev) - [x] `pnpm exec vitest run --no-coverage src/` — 164 test files, 2,223 tests passing - [x] Spot-checked targeted suites after each migration: each touched component test file passes in isolation #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under **Changes**) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sageLimitReachedCard (Significant-Gravitas#13012) ### Why / What / How <img width="800" alt="Screenshot 2026-05-06 at 19 04 53" src="https://github.com/user-attachments/assets/109aeaa6-c5a8-4cd4-8aa4-8bacbb3ce784" /> <img width="800" alt="Screenshot 2026-05-06 at 19 05 03" src="https://github.com/user-attachments/assets/16e908bb-fa95-4842-877d-3899870df583" /> <img width="800" alt="Screenshot 2026-05-06 at 19 05 56" src="https://github.com/user-attachments/assets/11b3fd1b-de7e-41eb-8935-7a895a632c84" /> **Why:** The chat copilot's `UsageLimits/` folder mixed two distinct concerns through a shared `UsagePanelContent` abstraction with three boolean knobs (`showHeader`, `showBillingLink`, size). The shared component made the "Go to billing" CTA conditional inside the card, even though the chat container already gates whether the card renders at all — which is why the legacy "Reset daily limit" button kept flashing back through history. While doing the split it also made sense to clean up the duplicated query config across the three new hooks, align the visual language for usage bars and tier pills with the `AgentBriefingPanel`, and surface a billing entry point from the popover (and on the briefing panel itself) so users don't need to wait for the limit-reached state to manage their plan. **What:** - Splits `UsageLimits/` into two fully independent components — `<UsagePopover />` (chart-bar trigger button + popover panel) and `<UsageLimitReachedCard />` (alert card shown above the chat input when the limit is reached). The card always renders the "Go to billing" button (modulo the platform billing flag) and now defensively guards on actual exhaustion; ChatContainer's `useIsUsageLimitReached` decides whether to mount the card at all. - Adds a shared `useCopilotUsage` hook so the three consumer hooks (`useUsagePopover`, `useUsageLimitReachedCard`, `useIsUsageLimitReached`) share one `USAGE_QUERY_CONFIG` constant and can't drift. - Aligns the usage-bar visual language across the copilot popover, the limit-reached card, the credits page, and the settings/billing `AutopilotUsageCard` — single `h-2` track with blue→orange threshold at 80%, label and percent in the header row, reset/caption underneath, and the same `<Badge variant="info" size="small" className="bg-[rgb(224,237,255)]">{Tier} plan</Badge>` pill used by `AgentBriefingPanel.UsageMeter`. - Adds an always-on `Manage billing` link to the usage popover (gated by the platform billing flag) so users can jump to `/settings/billing` without first hitting the limit. - Replaces the conditional "Go to billing" CTA in `AgentBriefingPanel.UsageSection` with an unconditional top-right `Manage billing` link, keeping the briefing card calmer and consistent with the popover. - Switches the popover and card integration tests from `vi.mock`-ing the data hooks to driving them via MSW handlers on `*/api/chat/usage`, so the new shared hooks are actually exercised end-to-end (closes the codecov/patch gap the bot previously flagged). **How:** - New layout follows CONTRIBUTING.md: `ComponentName/ComponentName.tsx` + `useComponentName.ts` + colocated `__tests__/`. - Extracted small render primitives `UsageBar.tsx` and `StorageBar.tsx` so the popover, card, and `credits/page.tsx` can each render usage bars without inheriting the old multi-mode component. `UsageBar` no longer takes a `size` prop — the bar is `h-2` everywhere, matching the briefing panel reference. - `useCopilotUsage` lives at the folder root; the per-consumer hooks (`useUsagePopover`, `useUsageLimitReachedCard`, `useIsUsageLimitReached`) wrap it and only add the bits each consumer needs (e.g. the platform billing flag for the popover/card). - `useIsUsageLimitReached` lives in its own file at the folder root — used by ChatContainer, not the card itself. - `formatBytes` moved into `usageHelpers.ts` (alongside `formatResetTime`) so the StorageBar primitive doesn't own a generic helper. - `AutopilotUsageCard` (`/settings/billing`) keeps its framer-motion fill animation but adopts the unified `h-2` blue/orange track + `body`-variant percent label + small caption for "Spent: $X" so it visually matches the copilot bars. - Stale tests for `UsagePanelContent` deleted (the `formatResetTime` tests were redundant with `usageHelpers.test.ts`); added focused integration tests for `UsagePopover` (9) and `UsageLimitReachedCard` (7) plus `formatBytes` table tests. The popover and card tests now drive `*/api/chat/usage` via MSW so the real hooks run during the test instead of being shimmed. - `BriefingTabContent.UsageSection` simplified — dropped the local `showGoToBilling` derivation and the bottom CTA; the `Manage billing` link is now the only billing entry point and renders whenever `ENABLE_PLATFORM_PAYMENT` is on. ### Changes 🏗️ - **Added** - `UsagePopover/UsagePopover.tsx` + `useUsagePopover.ts` - `UsageLimitReachedCard/UsageLimitReachedCard.tsx` + `useUsageLimitReachedCard.ts` - `UsageBar.tsx`, `StorageBar.tsx`, `useIsUsageLimitReached.ts`, `useCopilotUsage.ts` - `UsagePopover/__tests__/UsagePopover.test.tsx`, `UsageLimitReachedCard/__tests__/UsageLimitReachedCard.test.tsx` - `formatBytes` in `usageHelpers.ts` + tests - "Manage billing" link in `<UsagePopover />` (always shown when `ENABLE_PLATFORM_PAYMENT` is on) - **Removed** - `UsageLimits.tsx`, `UsagePanelContent.tsx`, flat `UsageLimitReachedCard.tsx` - `UsageLimits/__tests__/` (replaced by per-component test folders + `usageHelpers.test.ts`) - "Go to billing" CTA from `AgentBriefingPanel.UsageSection` (replaced by the existing top-right "Manage billing" link, which now always renders when billing is on) - **Updated** - `ChatSidebar.tsx` — imports `UsagePopover` - `ChatContainer.tsx` + its test — split imports across new file paths - `profile/(user)/credits/page.tsx` — uses `UsageBar` + `StorageBar` directly instead of the deleted `UsagePanelContent` - `settings/billing/.../AutopilotUsageCard.tsx` — replaces the `h-6` striped gray bar with the unified `h-2` blue/orange bar - `library/.../AgentBriefingPanel/BriefingTabContent.tsx` — `Manage billing` link is now unconditional under the billing flag - Visual primitives `UsageBar.tsx` / `StorageBar.tsx` aligned to the briefing-panel reference (`h-2`, blue→orange ≥80%, header row + caption layout, `<Text>` atoms) - Tier pill across copilot popover, limit-reached card, and briefing panel uses the same `Badge variant="info"` with `bg-[rgb(224,237,255)]` and `"{Tier} plan"` label ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `pnpm format` clean - [x] `pnpm lint` clean (only pre-existing img warnings) - [x] `pnpm types` clean for all refactored files (pre-existing yearly-billing type errors unrelated to this PR) - [x] `pnpm vitest run` for all affected test files: 101/101 passing across 7 files (`UsagePopover` 9, `UsageLimitReachedCard` 7, `usageHelpers` 22, `ChatContainer` 2, `billing-cards` 42, `billing-page` 9, `BriefingTabContent` 10) - [ ] Manually open `/copilot`, confirm the chart-bar popover in the sidebar still shows usage bars, tier pill, storage section, and the new "Manage billing" link at the bottom - [ ] Manually trigger a daily-limit-reached state and confirm the alert card renders with "Go to billing" leading to `/settings/billing`, plus the defensive guard hides the card below 100% - [ ] Open `/profile/credits` and confirm the AutoPilot Usage & Storage section still renders bars and storage as before - [ ] Open `/settings/billing` and confirm the Autopilot usage card now uses the unified blue/orange `h-2` bar - [ ] Open `/library` and confirm the Agent Briefing usage section shows the always-on "Manage billing" link top-right (and no "Go to billing" CTA below the bars) #### For configuration changes: - [x] N/A — no config touched --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backend CI was flaky because run duration was around 15 minutes (= the current cutoff)
…97) (Significant-Gravitas#13016) ### Why / What / How Refreshes the walkthrough videos shown in the wallet onboarding checklist ([SECRT-2297](https://linear.app/autogpt/issue/SECRT-2297)). Old recordings were stale and one task (`SCHEDULE_AGENT`) had no walkthrough at all. - Replace `marketplace-add.mp4` with a re-recorded version. - Add `agent-run.mp4` (used by `MARKETPLACE_RUN_AGENT`, replaces `marketplace-run.mp4`). - Add `agent-schedule.mp4` and wire it to the previously video-less `SCHEDULE_AGENT` task. - Delete unreferenced clips: `builder-open/save/run.mp4`, `marketplace-visit.mp4`, `marketplace-run.mp4`. - Add `!public/onboarding/*.mp4` exception to `.gitignore` so onboarding videos slip through the global `*.mp4` rule. > Note: the `TRIGGER_WEBHOOK` task video was deferred to [SECRT-2186](https://linear.app/autogpt/issue/SECRT-2186). ### Changes 🏗️ - `frontend/.gitignore`: allow `public/onboarding/*.mp4`. - `frontend/public/onboarding/`: refreshed `marketplace-add.mp4`; new `agent-run.mp4`, `agent-schedule.mp4`; removed orphaned clips. - `frontend/src/components/layout/Navbar/components/Wallet/Wallet.tsx`: point `MARKETPLACE_RUN_AGENT` at `agent-run.mp4` and add `video` for `SCHEDULE_AGENT`. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [ ] Open the wallet popover, expand "Get an agent from the marketplace" and confirm the new `marketplace-add.mp4` plays. - [ ] Expand "Open the Library page and run an agent" and confirm `agent-run.mp4` plays. - [ ] Expand "Schedule your first agent" and confirm `agent-schedule.mp4` now plays (previously no video). - [ ] Confirm no console 404s for `/onboarding/*.mp4`.
POST /api/blocks/{block_id}/execute now charges via block_usage_cost
+ spend_credits before running, mirroring the billing wrapper used
in graph execution (manager.py:1014-1022) and the copilot tool
helper. Insufficient balance surfaces as HTTP 402.
The execution-count tier charge is intentionally omitted to match
copilot/chat-route semantics — that tier is graph-execution-only.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ignificant-Gravitas#13000) ## Changes Implements [SECRT-2311](https://linear.app/autogpt/issue/SECRT-2311): Makes AutoPilot aware of the user's subscription tier so it can surface billing CTAs and tailor responses without polluting the system prompt. ### What's included 1. **`get_platform_info` tool** — A pull-model tool the agent calls on-demand with a `subscription` topic. Returns: - Current tier name & description - Rate-limit multiplier & workspace storage limit - Billing portal URL - Available upgrade tiers (excluded for Enterprise) 2. **Tier hint in `user_context`** — Injects a one-word `Plan: PRO` line into the existing `<user_context>` block on the first message, so the agent passively knows the tier without a tool call. 3. **Identity fix in fallback system prompt** — Updates the hardcoded fallback prompt from "AI automation assistant" to "AutoPilot, the AI assistant on the AutoGPT platform" with a "never direct to external AI services" instruction. 4. **Full registration** — `PlatformInfoResponse` model, `PLATFORM_INFO` response type, `TOOL_REGISTRY` entry, `ToolName` literal update. 5. **12 unit tests** — All tiers, no-auth, invalid topic, DB failure, registry presence, schema validation. ### Design decisions - **Pull model** — Agent calls the tool when relevant (user asks about billing, hits limits, etc.) rather than stuffing tier info into every system prompt. Preserves LLM prompt-cache hits. - **Topic enum** — `subscription` is the only topic for V1. Designed to expand to `integrations`, `webhooks`, `capabilities` later. - **Enterprise excluded** from upgrade suggestions (not self-serve). - **Tier lookup failure is non-fatal** — silently caught, tool returns graceful error. ### Files changed | File | Change | |------|--------| | `copilot/service.py` | Identity fix + `user_id` param on `inject_user_context()` + tier lookup | | `copilot/tools/platform_info.py` | **NEW** — `PlatformInfoTool` implementation | | `copilot/tools/test_platform_info.py` | **NEW** — 12 unit tests | | `copilot/tools/models.py` | `PlatformInfoResponse` + `PLATFORM_INFO` response type | | `copilot/tools/__init__.py` | Import + registry entry | | `copilot/permissions.py` | `ToolName` literal update | | `copilot/baseline/service.py` | Pass `user_id` to `inject_user_context()` | | `copilot/sdk/service.py` | Pass `user_id` to `inject_user_context()` | ### Checklist - [x] Code follows project conventions - [x] `poetry run format` passes - [x] Unit tests written and passing (12/12) - [x] No changes to Langfuse prompt (managed separately) - [x] `inject_user_context()` signature change is backward-compatible (`user_id` defaults to `None`) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Adds a new authenticated tool that exposes subscription/billing info and injects plan context into first-turn prompts, which can affect assistant behavior and requires correct feature-flag/tier lookups. Risk is moderate due to new tool surface area and prompt-content changes, but scope is contained to copilot tooling and context injection. > > **Overview** > **AutoPilot is now tier-aware without changing the cacheable system prompt.** The first-turn `inject_user_context` flow now accepts `user_id` and, when available, appends a `Plan: <TIER>` line (via `get_user_tier`) into the trusted `<user_context>` prefix; both baseline and SDK execution paths were updated to pass `user_id`. > > Adds a new authenticated tool, `get_platform_info`, registered in the tool registry and permissions, with a corresponding `PlatformInfoResponse`/`ResponseType` and OpenAPI enum update. The tool returns subscription tier and a billing URL (or an “open access” response when billing is feature-flag disabled) and includes messaging to keep billing guidance on the AutoGPT platform only. > > Also updates the cacheable system prompt identity string to name “AutoPilot on the AutoGPT platform.” > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 12c56fd. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…roxy (Significant-Gravitas#13019) ### Why / What / How **Why.** The Supabase proxy was the bottleneck for client-side API latency. Two things were doing avoidable work on every browser API call: 1. The Next.js middleware's matcher matched `/api/proxy/*`, so each call ran `supabase.auth.getUser()` — a round-trip to the Supabase GoTrue auth server — *before* the actual backend call. 2. The proxy handler parsed every request body (`req.json()` / `req.formData()`) and re-serialised it before forwarding. For 256 MB FormData uploads that meant the whole upload was buffered in Next.js memory, then re-encoded for the backend. The response body was likewise read into memory and re-wrapped with `NextResponse.json(body, { status: 200 })`, which (a) flattened backend `201` / `202` / `204` to `200` and (b) stripped headers like `Content-Disposition`, `Cache-Control`, `Location`, `ETag`. Empty / unparseable POST bodies fell back to the literal string `"null"` reaching the backend. The httpOnly-cookie / no-token-in-browser security model is unchanged — JWTs are still injected server-side and never exposed to the browser, so Google CASA posture is preserved. **What.** - Excludes `/api/proxy` from the middleware matcher. - Rewrites `src/app/api/proxy/[...path]/route.ts` as a single stream-through proxy: `req.body` (`ReadableStream`) is forwarded with `duplex: "half"`, and `backendResponse.body` is piped straight back into `NextResponse`. Status code, statusText, and a filtered set of response headers (hop-by-hop entries dropped per RFC 7230) are preserved. - Workspace download branch is untouched — it still buffers via `arrayBuffer()` because of the existing Vercel/Next.js streaming truncation bug for large binaries. - Adds `PROXY_FOLLOWUPS.md` capturing the medium / low-impact items from the review (cache(), sentinel cleanup, browser supabase consolidation, useSupabase refactor, edge runtime, etc.) with suggested PR splits, so the next pass has a clear starting point. **How.** - `src/middleware.ts` matcher gains `api/proxy` in its negative lookahead. The proxy still authenticates itself (`getServerAuthToken()` → bearer header) and the backend re-validates the JWT, so the middleware-level auth was pure overhead for API requests. - The new `route.ts` is ~110 lines vs ~290 lines before. All the per-content-type branching (`handleJsonRequest` / `handleFormDataRequest` / `handleUrlEncodedRequest` / `handleGetDeleteRequest`) and the `createResponse` / `createErrorResponse` helpers collapse into one `handler()` that just calls `fetch()`. `makeAuthenticatedRequest` / `makeAuthenticatedFileUpload` in `lib/autogpt-server-api/helpers.ts` are still used by the legacy `BackendAPI` server-side path, so they stay. - Forwarded request headers are an explicit allow-list (Content-Type, Content-Length, Accept, Accept-Language, Accept-Encoding, X-Act-As-User-Id, X-API-Key, sentry-trace, baggage). Hop-by-hop response headers are filtered out (Connection, Keep-Alive, Proxy-Authenticate, Proxy-Authorization, TE, Trailer, Transfer-Encoding, Upgrade, Content-Encoding). ### Changes 🏗️ - `src/middleware.ts`: matcher now excludes `api/proxy`. - `src/app/api/proxy/[...path]/route.ts`: streaming pass-through proxy with status/header propagation; workspace download branch retained. - `src/app/api/proxy/PROXY_FOLLOWUPS.md`: new follow-up plan for the medium/low-impact items. ### Checklist 📋 #### For code changes: - [ ] I have clearly listed my changes in the PR description - [ ] I have made a test plan - [ ] I have tested my changes according to the test plan: - [ ] `pnpm format`, `pnpm lint`, `pnpm types` clean - [ ] `pnpm test:unit` — 178 test files pass (no regressions in the existing `route.helpers.test.ts` or `helpers.test.ts` suites) - [ ] Manual: load library, builder, copilot, monitor, settings/billing — verify React Query queries return data, mutations work, file uploads succeed, downloads stream - [ ] Manual: verify backend `Cache-Control: no-store` headers now reach the browser (devtools → network) - [ ] Manual: verify backend 204s (e.g. delete operations) round-trip as 204, not 200 - [ ] Manual: verify large file upload (>50 MB) still works and Next.js memory stays bounded - [ ] Manual: verify session expiry still redirects correctly (admin pages, protected pages) — this depended on middleware running on page navigations, not on `/api/proxy`, so should be unaffected, but worth confirming #### For configuration changes: - [ ] `.env.default` is updated or already compatible with my changes - [ ] `docker-compose.yml` is updated or already compatible with my changes - [ ] I have included a list of my configuration changes in the PR description (under **Changes**) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n /api/proxy (Significant-Gravitas#13019)" This reverts commit 431730c.
…Significant-Gravitas#13379) Resolves SECRT-2424. ### Why / What / How **Why** — In AutoPilot, messages in the middle of a conversation silently disappeared from the rendered view during a multi-turn session (no refresh). Scrolling up did **not** bring them back; only a full page refresh restored them. The messages were never lost on the backend — purely a client-side rendering hole. **What** — Stop the post-turn force-hydrate from dropping in-memory messages that are older than the refetched window. **How** — AutoPilot displays `concatWithAssistantMerge(pagedMessages, currentMessages)`: - `GET /sessions/{id}` returns only the **most-recent ~50 messages** — a sliding *tail window* (`backend/copilot/db.py`, `limit=50`). - After every turn (`status: streaming → ready`), `useHydrateOnStreamEnd` **force-replaced** `currentMessages` with that window. - As the conversation grows the window slides forward (its `oldest_sequence` increases), so the replace drops every in-memory message older than the new window's oldest sequence. - Meanwhile `pagedMessages` (older history the user scrolled back to) is intentionally preserved across refetches and only ever extends **older** (`before_sequence`), so it never covers the newly-vacated region. - Result: a hole between the top of `pagedMessages` and the bottom of the new window. Scroll-back can't fill a *middle* hole, so the messages stay gone until refresh rebuilds both sources contiguously. The fix adds `retainOlderHistory(prev, hydrated)` in `useHydrateOnStreamEnd`: before force-replacing, it keeps the `prev` messages whose DB sequence predates the window's oldest sequence and prepends them, keeping the result contiguous with the older history. Streaming rows (AI-SDK uuids with no `-seq-N` id) are excluded — they're already inside the refetched window, so there are no duplicates. ### Changes 🏗️ - `useHydrateOnStreamEnd.ts`: new `retainOlderHistory` helper; the force-hydrate path now retains older-than-window messages instead of dropping them. - `helpers/convertChatSessionToUiMessages.ts`: export the existing `extractDbSequence` (`-seq-N` parser) so the hook reuses it instead of duplicating the regex. - `__tests__/useHydrateOnStreamEnd.test.ts`: **new** regression coverage (the hook previously had none). ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `pnpm format`, `pnpm lint`, `pnpm types` — clean - [x] `pnpm test:unit` for the new test — 3/3 pass; changed hook coverage 71% stmts / 78% lines (>70%) - [x] Full copilot suite (`src/app/(platform)/copilot`) — 1200/1200 tests pass, no regressions - [x] Manual: long AutoPilot session (>50 messages), scroll up to load older history, send another message, confirm middle messages remain after the turn completes (no refresh) ### Test plan to reproduce the original bug 1. Open an AutoPilot session with enough turns that total persisted messages exceed 50. 2. Scroll up so older history loads (you can see the first message). 3. Send another message and let the turn finish — **without refreshing**. 4. Before this fix: messages between the loaded older pages and the recent window are missing and scroll-back won't recover them. After this fix: the conversation stays contiguous.
…sult (Significant-Gravitas#13381) ### Why / What / How **Why** — A `run_sub_session` sub-agent (Otto/AutoPilot) that does substantial work and then *delivers it by writing workspace files* — summarising only briefly in its final message — returns a hollow body to the parent (SECRT-2377). Confirmed in production: a sub made 44 tool calls and wrote ~37 KB of findings across 3 workspace files, then returned 227 tokens of *"delivered in three workspace documents, what's next?"*. The parent had no visibility into those files, treated the run as empty, and re-ran the entire task — ~22.6k tokens wasted, no error surfaced. Root cause is a **contract gap**: `response_from_outcome` carried only the sub's final assistant text (`response`) + a tool-call log. Workspace files are scoped to the *sub's* session (the parent lists its own session by default), and there was no field enumerating files the sub produced. So "deliver via files + summarise" was silent data loss. **What** — Add a `sub_workspace_files` manifest to `SubSessionStatusResponse`, populated on completion, so the parent can recover work delivered via files. Each entry has `file_id`, `name`, fully session-qualified `path`, and `size_bytes` — `path` is directly usable with `read_workspace_file(path=...)`. **How** — Two sources, authoritative-first: - **`list_sub_workspace_files`** (authoritative) reads the sub's session for `origin=agent-created` files. This captures writes from **any** turn — including the already-terminal / cold-poll path in `get_sub_session_result`, where the rebuilt tool-call log reflects only the sub's last message and would otherwise miss earlier writes. Returns `None` on lookup failure; `[]` means the sub wrote nothing. Capped at 50 entries. - **Tool-call mining** (`_workspace_files_from_tool_calls`) is the fallback when the listing is unavailable, parsing `write_workspace_file` outputs defensively (JSON string on live-drain, dict on persisted-replay). Both `run_sub_session._execute` and `get_sub_session_result._execute` fetch the listing on `completed` and pass it through `response_from_outcome`. The completed message also nudges the parent toward the files so a hollow `response` isn't mistaken for an empty run. (Stored `WorkspaceFile.path` is already session-qualified via `_resolve_path` on write, so no re-prefixing is needed.) ### Changes 🏗️ - `models.py`: new `SubWorkspaceFileInfo` model; `sub_workspace_files: list[...] | None` field on `SubSessionStatusResponse`. - `run_sub_session.py`: `list_sub_workspace_files` (authoritative), `_workspace_files_from_tool_calls` + `_as_payload` (fallback miner), `workspace_files` override param on `response_from_outcome`, completed-message nudge; `_execute` fetches the listing on completion. - `get_sub_session_result.py`: fetches the listing on completion and passes it through — fixes the cold-poll/terminal path. - `sub_session_test.py`: repro at `response_from_outcome` level + terminal-path and live-path `_execute` coverage; autouse fixture stubs the listing so no test hits the workspace DB. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `poetry run pytest backend/copilot/tools/sub_session_test.py` — 22 passed - [x] Repro: a completed sub that wrote files but returned a hollow `response` now surfaces `sub_workspace_files` with session-qualified paths - [x] Terminal/cold-poll path (`get_sub_session_result`, last message terminal, waiter skipped) still populates the manifest from the authoritative listing - [x] A sub that answered inline (no writes) yields `sub_workspace_files = None` (no noise) - [x] `poetry run format` and `poetry run lint` (ruff, isort, black, pyright) clean
…#13170) ### Why / What / How **Why:** The platform currently lacks a native way to parse JSON strings into Python objects, or encode Python objects into JSON strings. **What:** This PR introduces two new core data blocks: `JSONEncoderBlock` and `JSONDecoderBlock`. **How:** Built standard `Block` subclasses using the built-in `orjson` wrappers (`dumps` and `loads` from `backend.util.json`) to handle the conversion gracefully. Added comprehensive edge-case and boundary testing for both blocks. Closes : Significant-Gravitas#11108 ### Changes 🏗️ - Added `JSONEncoderBlock` to convert Python dictionaries/lists/primitives into JSON strings. - Added `JSONDecoderBlock` to parse JSON strings back into Python dictionaries/lists. - Added comprehensive unit tests in `test_json_blocks.py` covering successful encoding/decoding as well as error handling for malformed JSON strings. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Ensure automated unit tests pass for `JSONEncoderBlock` success and error boundary conditions - [x] Ensure automated unit tests pass for `JSONDecoderBlock` success and error boundary conditions - [x] Verify the blocks successfully pass the global Block Registry schema validation --------- Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co> Co-authored-by: majdyz <zamil.majdy@agpt.co> Co-authored-by: Reinier van der Leer <pwuts@agpt.co> Co-authored-by: Mistral Vibe <vibe@mistral.ai>
…nificant-Gravitas#13393) ### Why / What / How **Why:** Two adjacent Copilot UX rough edges: - Entering a session that already has files auto-opened to the Context panel's "Files" tab — an extra click from actually seeing the most relevant output — and once an artifact was open there was no quick way back to the full files list. The header icon buttons also relied on native `title` tooltips. - The "Enable browser notifications" banner and the "out of automation credits" banner looked inconsistent (full-width amber bar vs. bordered card, different buttons/icons) and sat flush against each other. **What:** - Auto-open the **last generated file** directly in the Artifact panel when a session already has generated files, instead of the Context panel Files tab. - Add a folder button to the Artifact panel header (before the close "x") that opens the Context panel on the Files tab. - Show design-system tooltips (positioned below the button) with the action name for all Artifact panel header icon buttons. - Restyle the notification banner to match the low-credit banner (shared `Alert` molecule, `warning` variant), unify both action buttons to the default primary style, and add a gap between the two banners. **How:** - New store action `autoOpenArtifact(ref)` opens the Artifact panel on a given file, respecting the existing `_autoOpenUserClosed` guard. `useAutoOpenForFiles` now picks the most recently generated file (by `created_at`) and opens it via this action; the per-session `triggered` guard and session-change reset are preserved. - New store action `showFilesTab()` is an explicit (un-guarded) open of the Context panel Files tab, wired through `useArtifactPanel` → `ArtifactPanel` → `ArtifactPanelHeader` as `onOpenFiles` (desktop + mobile). - `HeaderButton` wraps its button in `Tooltip`/`TooltipTrigger`/`TooltipContent` (`side="bottom"`) from `atoms/Tooltip`, using the action name as content; native `title` replaced by the tooltip, `aria-label` kept for accessibility. A global `TooltipProvider` already wraps the app. - `NotificationBanner` now renders via the `Alert` molecule. Added a small backward-compatible `icon?` override prop to `Alert` so the banner keeps its bell icon (defaults to the variant icon, so no existing `Alert` usage changes). Both banners use `<Button variant="primary" size="small">`, and `CopilotPage` wraps them in `flex flex-col gap-3 ... empty:hidden`. ### Changes 🏗️ **Artifact / Context panel** - `copilot/store.ts`: add `autoOpenArtifact(ref)` and `showFilesTab()` actions. - `ContextPanel/useAutoOpenForFiles.ts`: open the last generated file in the Artifact panel instead of the Context panel Files tab. - `ArtifactPanel/components/ArtifactPanelHeader.tsx`: add folder button before close; wrap header icon buttons in tooltips positioned below. - `ArtifactPanel/useArtifactPanel.ts` + `ArtifactPanel/ArtifactPanel.tsx`: expose and wire `showFilesTab` as `onOpenFiles`. - `ContextPanel/__tests__/ContextPanelAutoOpen.test.tsx`: update assertion to the new artifact-open behavior. **Banners** - `copilot/components/NotificationBanner/NotificationBanner.tsx`: use the `Alert` molecule (warning variant) with the bell icon and a default-primary "Enable" button. - `components/layout/TopUpPrompt/LowCreditBanner/LowCreditBanner.tsx`: drop the custom orange "Top up" button styling so it matches the default primary. - `components/molecules/Alert/Alert.tsx`: add optional `icon` override prop. - `copilot/CopilotPage.tsx`: wrap both banners in a flex column with a gap. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [ ] Open a session that already has generated files → the last generated file opens directly in the Artifact panel - [ ] Click the folder button in the Artifact panel header → Context panel opens on the Files tab - [ ] Hover each header icon button (Back, Copy, Download, All files, Close) → tooltip with the action name shows **below** the button - [ ] Explicitly close the panel, then re-enter the session → auto-open is suppressed (respects user close) - [ ] Verify behavior on mobile (folder button closes the artifact drawer and opens the Files sheet) - [ ] With low credits + notifications available, both banners render as matching bordered cards with a gap, identical primary buttons, and matching orange icons --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…icant-Gravitas#13395) ### Why / What / How **Why:** The copilot tool-call UI animations were heavy and broken — the status line did a per-character 3D-spring-blur stagger that re-ran on every streamed token (leaving "gray blurry lines" for seconds and replaying for every line on reload), and the result accordion expanded with a springy bounce into an unbounded, black code block that could swallow the page. **What:** Reworked the tool-call status line, the result accordion, and the code block styling for a faster, calmer, light-mode feel. **How:** `MorphingTextAnimation` now renders crisp text with a single opacity-only fade gated to live streaming (`animate={isStreaming}`), so historical lines stay static on reload and nothing smears mid-stream; the `ToolAccordion` expand swaps the spring for a no-bounce ease-out tween with a softer blur bridge; and the expanded content is capped at `max-h-[24rem]` with `overflow-y-auto` while `ContentCodeBlock` moves from black to a light `bg-neutral-100`. ### Changes 🏗️ - `MorphingTextAnimation`: removed per-character 3D/spring/blur stagger → single opacity fade, only for actively-streaming tool calls; respects `prefers-reduced-motion`. - `ToolAccordion`: replaced springy bounce with ease-out tween, reduced blur bridge, capped content height with a scroller so long output can't cover the page. - `ContentCodeBlock`: switched from black to light mode (`bg-neutral-100` / `text-neutral-800`). ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Run a copilot session with tool calls; confirm status lines fade in once (no per-char/blur smear) while streaming - [x] Reload a session with past tool calls; confirm lines render static with no entrance replay - [x] Expand a tool result accordion; confirm smooth ease-out (no bounce) and long output scrolls within a bounded height - [x] Confirm the code block renders in light mode
Significant-Gravitas#13408) ### Why / What / How **Why:** picsum.photos was used as a placeholder for marketplace agent images in three product call sites. One of them (the publish modal's Details-step initial image) is **not display-only** — `thumbnailSrc` seeds the modal's image list which is submitted as `image_urls`. So publishing **without** a custom image persisted a **random** `picsum.photos/300/200` URL (no seed → different image each request) as the live listing thumbnail. This is a latent bug. The same external dependency also caused **intermittent E2E failures**: seeded picsum images are fetched server-side via Next's `/_next/image` optimizer; when picsum slowed down, hung image requests saturated the HTTP/1.1 localhost connection pool and stalled the next client-side navigation (logout, view-progress), timing out assertions in `auth-happy-path` and `publish-happy-path`. Decision (with @krzysztof.czerwinski): **just remove picsum.** Agents already have proper image options (upload or AI-generate), and image-less listings already render a solid-color fallback — so there's no real gap to fill. **What:** Remove the picsum.photos dependency entirely — from product fallbacks, E2E seed data, the Next.js image-domain allow-list, and a dev styleguide demo. Make the publish E2E hermetic. **How:** - Replaced the `|| "https://picsum.photos/300/200"` fallbacks with `|| ""`. The existing `images.length === 0 → "At least one image is required"` validation now genuinely requires a real image instead of being silently satisfied by a junk URL. - Seed data seeds `image_urls` as `[]` (cards render their built-in fallback) and creator avatars as `""` (see note below), removing the `get_image()` picsum helper. - The `publish-happy-path` E2E selects a local fixture and **stubs the media upload response** (the E2E stack has no GCS bucket); it does **not** call the AI `generate_image` endpoint. #### Follow-up fixes (after first CI run) Removing picsum surfaced two unrelated latent issues that the picsum pre-fill had been masking. Both are now fixed: 1. **Creator avatars must be non-null.** The `Creator` DB view types `avatar_url` as a non-nullable `String` (maps `p."avatarUrl"` with no COALESCE). Seeding `avatarUrl=None` made `GET /api/store/creators` 500 (`converting field avatar_url … found incompatible value of null`), so the marketplace landing rendered its error card and "Become a Creator" never appeared. Seed `""` instead — non-null, renders the frontend fallback avatar, no external fetch. 2. **No GCS media bucket in the E2E stack** (`MEDIA_GCS_BUCKET_NAME` empty, no emulator), so a real upload 500s. The publish spec stubs the upload via Playwright `page.route` with a local asset URL. This spec covers the publish → track → delete dashboard flow, not GCS storage. (Open to wiring a fake-GCS emulator instead if preferred.) ### Changes 🏗️ - **Product fallbacks:** `usePublishAgentModal.ts` (×2) and `useAgentSelectStep.ts` — picsum fallback → `""`. - **E2E seed data:** `backend/test/e2e_test_data.py` & `backend/test/test_data_creator.py` — removed `get_image()`; `image_urls` → `[]`, creator `avatarUrl` → `""`. Updated docstrings + `backend/backend/TEST_DATA_README.md`. - **Playwright:** `credentials/index.ts` seed `image_urls` → `[]`; `marketplace.page.ts` `submitAgentForReview` opens the Thumbnails accordion, selects `assets/test-thumbnail.png` (new 1×1 PNG fixture), and stubs the media-upload response before submitting. - **Config:** `next.config.mjs` — removed `picsum.photos` from allowed image domains. - **Data migration** (`20260622120000_scrub_picsum_image_urls`): removes legacy picsum URLs already persisted in the DB — filters them out of `StoreListingVersion.imageUrls`, sets `Profile.avatarUrl` to `""` (non-null, Creator view), and `LibraryAgent.imageUrl` to NULL. Real user-chosen images are untouched. - **Styleguide demo:** `copilot/styleguide/page.tsx` — two picsum URLs → local `/placeholder.png`. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Full-stack E2E CI green — `publish-happy-path` and `auth-happy-path` pass; no picsum requests / 504s in the run - [x] `pnpm format`, `pnpm lint` (only pre-existing warnings), `pnpm types` pass - [x] Publish modal unit tests pass (44 passed) - [x] Backend seed files compile; backend `create_store_submission` accepts empty `image_urls` #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under **Changes**) --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ation + fix Exa/Airtable bugs (Significant-Gravitas#13135) ### Why / What / How **Why.** While auditing the webhook ingress path, three issues surfaced: 1. **Exa** webhook verification was broken. The signature check ran inside `validate_payload`, computed HMAC against `webhook.secret` (Exa actually signs with the secret *it* returns at registration, stored in `config["exa_secret"]`), read the wrong header (`X-Exa-Signature` vs the real `Exa-Signature`), compared the raw `t=<ts>,v1=<hex>` header against a bare hex digest, and signed the body without the required `<timestamp>.` prefix. It could never have validated a real Exa delivery. 2. **Airtable** verification had the same flavor of bug: Airtable returns the signing secret base64-encoded as `macSecretBase64` (stored in `config["mac_secret"]`), but the code used the base64 string verbatim as the HMAC key instead of base64-decoding it first. Additionally, the cursor update in `validate_payload` *replaced* the whole `config` blob, wiping `mac_secret` after the first delivery. 3. The **generic webhook** trigger accepted any payload posted to its URL with no way to opt into a shared-secret check, even though the user controls both ends. These didn't outright break anything today because Exa/Airtable gated the check behind `if header_present:` — absent header → check skipped. Net effect: verification was effectively a no-op everywhere except GitHub/Telegram. **What.** Consolidate signature verification into a dedicated, provider-overridable step that runs *before* payload validation, and make it actually correct for every provider whose protocol supports it. **How.** New `BaseWebhooksManager.verify_signature(webhook, request)` classmethod, called by the ingress router before `validate_payload`. The default is a no-op so providers without a signing scheme (Compass, Slant3D) don't have to fake one. Verification failures return **403** (not 404 — that would leak webhook existence). Providers that sign override it and `hmac.compare_digest` against the stored secret. > **Note on feature flags:** an earlier revision of this PR gated Exa/Airtable enforcement behind LaunchDarkly flags (`ENFORCE_*`) for a staged rollout. We dropped that approach: a DB check confirmed **no agents use the Exa or Airtable webhook triggers** (zero registered webhooks for either provider), so the flags were pure ceremony — and a `default=False` gate makes a security control fail *open* on a LaunchDarkly outage. Verification now enforces unconditionally, consistent with GitHub/Telegram. ### Changes 🏗️ | File | Change | |---|---| | `webhooks/_base.py` | Add `verify_signature` classmethod (no-op default) | | `api/features/integrations/router.py` | Call `verify_signature` before `validate_payload`; 403 on failure | | `webhooks/github.py`, `telegram.py` | Move existing checks from `validate_payload` into `verify_signature` (pure refactor, behavior unchanged) | | `blocks/exa/_webhook.py` | Correct implementation: read `Exa-Signature`, parse `t=<ts>,v1=<hex>` (multi-`v1` supported), sign `<ts>.<raw body>`, key with `config["exa_secret"]` ([Exa docs](https://docs.exa.ai/websets/api/webhooks/verifying-signatures)) | | `blocks/airtable/_webhook.py` | Base64-decode `config["mac_secret"]` before HMAC; verify `X-Airtable-Content-MAC` (`hmac-sha256=<hex>`); merge cursor update into existing config instead of replacing it | | `blocks/generic_webhook/_webhook.py`, `triggers.py` | Optional `secret_token` input → require matching `X-Webhook-Secret` (constant-time); unset = today's behavior | | `docs/platform/new_blocks.md`, `docs/.../generic_webhook/triggers.md` | Document the `verify_signature` extension point and the generic `secret_token` | **Compass / Slant3D:** no code change — their protocols have no signing scheme; the default no-op covers them (the UUID URL is the bearer secret). ### Test plan 📋 `backend/api/features/integrations/webhook_ingress_test.py` (24 tests): - Unsigned providers (Compass, Slant3D) pass through. - Always-signed providers (GitHub, Telegram) reject missing/invalid sigs, accept valid ones. - **Exa**: missing / malformed / wrong-signature → 403; correct `t=,v1=` over `<ts>.<body>` → accepted; signature computed over body-only (the old bug) → rejected; missing `config["exa_secret"]` fail-closes. - **Airtable**: missing sig / missing `config["mac_secret"]` → 403; correct base64-decoded HMAC → accepted; regression test that the old "base64 string as key" HMAC does **not** verify. - **Generic webhook**: no-token / empty-token pass through; configured token → missing/wrong/correct `X-Webhook-Secret`. - Ordering: `verify_signature` runs before `validate_payload`. #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `poetry run pytest backend/api/features/integrations/webhook_ingress_test.py` — 24 passed - [x] `poetry run lint` / `type-check` clean; full CI green Closes [SECRT-2359](https://linear.app/autogpt/issue/SECRT-2359). --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t at (links, replies, forwards), securely (Significant-Gravitas#13396) ### Why / What / How **Why:** When a user pointed the Discord bot at an existing conversation — pasting a channel/message link, `<#>`-mentioning a channel, **replying** to a message, or **forwarding** one — the bot couldn't read it. It has no web access to Discord and would deflect: *"I can't open links / read Discord history — paste the text here."* But the bot is already on the gateway and can read anything it (and the user) has access to. The context was right there; we just weren't gathering it or handing it to the model. **What:** The bot now gathers the conversation context a user points it at, from **every** way they can point at one, and feeds it into the AutoPilot turn — so it answers from the real conversation instead of deflecting. Every read is gated on the **requesting user's** own permissions, so the bot never surfaces anything the user couldn't already see. **How:** - A gateway-free `references.py` parses channel/thread IDs **and specific message IDs** out of message text (permalinks + `<#id>` mentions), de-duplicated and order-preserving. - The Discord adapter fetches each referenced conversation via its own gateway connection: - **Channel/`<#>` reference** → recent history. - **Specific-message permalink** (`/channel/message`) → that exact message plus the turns leading up to it (works for old messages, and is kept even when it points at the current channel — "what was said here `<link>`"). - **Replies** (`message.reference`) and **forwards** (`message_snapshots`) are folded in as quoted context. - The platform-agnostic handler renders the gathered conversations into the prompt and leads with a firm instruction that the content is *already supplied*, then rewrites the raw link into a readable `#channel-name` token so the model stops fixating on a URL it thinks it must open. **Security model (every read gated on the requester, not the bot):** - Same-guild only — never surfaces content from another server. - Requester must have `view_channel` + `read_message_history` on the target. - **Private threads** additionally require the requester to actually be a member (`manage_threads` bypasses, as in the client) — channel-level perms aren't enough. - References are scanned **only from the user's own typed message** — links inside a forwarded/replied-to/quoted message are context, never auto-fetched or rewritten. - Requester resolved from `message.author` (the bot runs without the privileged members intent, so the member cache is unreliable). - Bounded: ≤3 conversations, ~8k chars each; LLM-produced mentions stay on an allowlist. ### Changes 🏗️ - **New** `adapters/discord/references.py`: `extract_referenced_targets()` (channel + optional message id), `replace_referenced_links()`, and a frozen `ReferenceTarget` pydantic model. - **`adapters/base.py`**: `ReferencedConversation` (title, `channel_id`, messages) + `MessageContext.referenced_conversations`. - **`adapters/discord/adapter.py`**: `_fetch_referenced_conversations` / `_fetch_one_referenced` (same-guild, requester-ACL, private-thread membership, specific-message vs recent-history fetch, budgeted), `_can_requester_read`, `_with_reply_context` / `_resolve_reply`, link-rewriting; reference scanning runs on the user's own text only. - **`handler.py`**: always surfaces referenced conversations (channels, DMs, threads), gating only thread-history behind the first-@-into-thread flag; firm "you already have the content, don't deflect" framing. - Comprehensive tests across `references_test.py`, `adapter_test.py`, `handler_test.py` (permalink parsing, ACL gates incl. private-thread membership, reply/forward context, "quoted links aren't fetched", requester-from-author, fail-safe paths). ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Unit/integration: 253 bot tests pass - [x] **Local end-to-end** against the live gateway: linked a channel and a specific-message permalink, replied to a message, forwarded a message — confirmed the bot reads the real content instead of deflecting - [x] Verified security gates: cross-guild skipped, private channel/thread without access skipped, links inside quoted (forward/reply) content not fetched
…t-Gravitas#13399) ### Why / What / How <!-- Why: Why does this PR exist? What problem does it solve, or what's broken/missing without it? --> This PR addresses issue BUILDER-7HR. Previously, `Sentry.captureConsoleIntegration()` was called without any options, causing it to capture all console levels (including `info`, `debug`, `log`). This led to Sentry ingesting noisy messages, such as "open [object Event]" from wallet browser extension SDKs, which were logged to the console at info/debug levels. This change prevents the ingestion of irrelevant console messages, reducing Sentry noise and improving observability. <!-- What: What does this PR change? Summarize the changes at a high level. --> This PR restricts the console levels captured by Sentry in the frontend. <!-- How: How does it work? Describe the approach, key implementation details, or architecture decisions. --> By modifying `autogpt_platform/frontend/instrumentation-client.ts` to pass `{ levels: ["fatal", "error", "warn"] }` to `Sentry.captureConsoleIntegration()`. This aligns the client-side Sentry configuration with the already correct configuration in `sentry.edge.config.ts`. ### Changes 🏗️ <!-- List the key changes. Keep it higher level than the diff but specific enough to highlight what's new/modified. --> - Modified `autogpt_platform/frontend/instrumentation-client.ts` to explicitly configure `Sentry.captureConsoleIntegration()` to only capture `fatal`, `error`, and `warn` console levels. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: <!-- Put your test plan here: --> - [ ] Manually trigger `console.info()`, `console.debug()`, `console.log()` in the frontend and verify these messages do *not* appear in Sentry. - [ ] Manually trigger `console.warn()`, `console.error()`, `console.fatal()` (or throw an unhandled error) in the frontend and verify these messages *do* appear in Sentry. - [ ] Observe Sentry dashboard for a reduction in noisy `info`/`debug` level console events. <details> <summary>Example test plan</summary> - [ ] Create from scratch and execute an agent with at least 3 blocks - [ ] Import an agent from file upload, and confirm it executes correctly - [ ] Upload agent to marketplace - [ ] Import an agent from marketplace and confirm it executes correctly - [ ] Edit an agent from monitor, and confirm it executes correctly </details> #### For configuration changes: - [ ] `.env.default` is updated or already compatible with my changes - [ ] `docker-compose.yml` is updated or already compatible with my changes - [ ] I have included a list of my configuration changes in the PR description (under **Changes**) <details> <summary>Examples of configuration changes</summary> - Changing ports - Adding new services that need to communicate with each other - Secrets or environment variable changes - New or infrastructure changes such as databases </details> Fixes BUILDER-7HR Co-authored-by: seer-by-sentry[bot] <157164994+seer-by-sentry[bot]@users.noreply.github.com>
…nt-Gravitas#13309) Resolves [OPEN-3155](https://linear.app/autogpt/issue/OPEN-3155/make-trigger-agent-creation-more-consistent) ### Why / What / How **Why:** The "Trigger Agent" pattern (Significant-Gravitas#12740) lets AutoPilot poll a data source and run an *action agent* once per detected change via the `AgentExecutorBlock`. Two things made it inconsistent: 1. AutoPilot often crammed the polling loop **and** the action into a single scheduled agent. That agent runs on every poll, so its run list is mostly empty polls and the user can't tell which runs actually did anything. 2. A trigger agent exists only to drive its action (parent) agent and is never listed on its own in the library — but when the action agent was deleted, its trigger agents were left behind: orphaned, invisible, and still firing on their schedule. Separately, AutoPilot tended to reach for AI blocks even when plain logic would do — AI processing costs orders of magnitude more, spending the user's money needlessly. **What:** - Clarify the agent-building guide so AutoPilot reliably builds the polling (trigger) agent and the action agent as **two separate agents**. - Cascade-delete trigger agents when their action agent is deleted. - Add a general guide rule to prefer deterministic blocks over AI when an equivalent exists (cost), applying to all agents. **How:** - *Guide (triggers):* rewrote the `### Building Trigger Agents` section of `agent_generation_guide.md` to lead with a decision rule (poll + act-on-change ⇒ two separate agents), the anti-pattern and its run-list rationale, an over-split guard (a plain "do X on a schedule" agent stays a single agent), an explicit build order, and `AgentExecutorBlock` as the preferred sink. All new guidance stays inside the `GENERIC_TRIGGER_AGENTS`-gated section (bold labels only, heading unchanged) so the feature-flag strip still works. - *Guide (cost):* added a top **Key Rules** bullet steering AutoPilot to deterministic blocks (`CodeExecutionBlock`, `ConditionBlock`, …) over LLM blocks whenever a non-AI equivalent exists, reserving AI for genuine reasoning/summarization/generation. Always-visible (not gated), so it applies to every agent. - *Cleanup:* `delete_library_agent` now finds the hidden agents whose graph runs the deleted graph via an `AgentExecutorBlock` (the same derived relationship `list_trigger_agents` uses) and recursively deletes each, reusing the existing schedule/webhook cleanup. A trigger that also drives a *different* action agent is kept (the deleted agent must be its only sink). The cascade is skipped when deleting a hidden agent, which also bounds recursion to one level. ### Changes 🏗️ - **`copilot/sdk/agent_generation_guide.md`** — - Rewrote the "Building Trigger Agents" section: mandates the action/trigger split with an over-split guard; `AgentExecutorBlock` is the preferred sink, `AutoPilotBlock` the fallback. (Also ~40% more concise than the prior version.) - Added a first "Key Rules" bullet: prefer pure logic over AI when an equivalent exists, to avoid unnecessary LLM cost (applies to all agents). - **`copilot/tools/get_agent_building_guide_test.py`** — 2 regression tests locking the "two separate agents" + "do NOT over-split" guidance (gating coverage inherited from the existing flag-off test). - **`api/features/library/db.py`** — `delete_library_agent` cascades to orphaned trigger agents via two new helpers: `_cleanup_trigger_agents_for_graph` (finds + deletes sole-sink triggers) and `_trigger_targets_other_graph` (the "only sink" guard). - **`api/features/library/db_test.py`** — 5 tests: cascade fires for visible agents, skips hidden agents, deletes sole-sink triggers, keeps multi-sink triggers, and the sink-detection helper. No configuration changes. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `get_agent_building_guide_test.py` — 10 passed (heading sentinel, both new split-guidance assertions, flag on/off strip-gating); still green after the cost-rule note - [x] `db_test.py` cascade tests — 5 passed (visible→cascade, hidden→skip, sole-sink→delete, multi-sink→keep, sink detection) - [x] Full `api/features/library/db_test.py` — 25 passed (no regressions, incl. after merging dev) - [x] black / isort / ruff / pyright clean on all changed files - [x] Manual end-to-end AutoPilot eval (build a polling+action goal → expect a visible action agent + hidden trigger; delete the action agent → expect the trigger gone) — not run locally; needs a `GENERIC_TRIGGER_AGENTS`-on session --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Gravitas#13382) ### Why / What / How **Why:** When a backing service (Redis cluster, RabbitMQ, an RPC peer) is unreachable, the connection-acquisition retry loops kept the whole backend from shutting down. `conn_retry` and `create_retry_decorator` retry up to ~100 times with 30s exponential backoff (`pyro_client_comm_retry=100`, `pyro_client_max_wait=30`) — roughly 50 minutes — with **no shutdown awareness**. The backoff sleeps run on worker/background-event-loop threads that never receive SIGINT/SIGTERM (only the main thread does), and the eager `await redis_client.get_redis_async()` in `AppService.lifespan` keeps Uvicorn stuck in *startup*, so its graceful-shutdown path never runs. Net effect: `Ctrl+C` is ignored while a service is retrying, and the logs spam "Acquiring connection failed … Retrying now…" indefinitely. **What:** Make the retry loops abort promptly when the process receives a shutdown signal. **How:** - Added a process-wide shutdown flag in `backend/util/retry.py` — `request_shutdown()` / `is_shutting_down()` backed by a `threading.Event` (signal-safe: it only sets an event). - Added a tenacity `_StopOnShutdown` stop condition (combined via `|` with the existing `stop_after_attempt`) so no new retry is attempted once shutdown is requested. - Added interruptible sleeps (`_interruptible_sleep` / `_interruptible_async_sleep`) that wake immediately on shutdown instead of sleeping out the full backoff. `conn_retry` and `create_retry_decorator` pick the right one based on whether the wrapped function is sync or async. - Both signal handlers — `AppProcess._self_terminate` and `AppService._self_terminate` — now call `request_shutdown()` first. On the first `Ctrl+C`, the in-flight retry wakes from its backoff, the stop condition fires, `connect_async()` raises, the lifespan startup fails fast, Uvicorn proceeds to shut down, and the process exits. ### Changes 🏗️ - `backend/util/retry.py`: add `request_shutdown()`/`is_shutting_down()`, `_StopOnShutdown` tenacity stop condition, and `_interruptible_sleep`/`_interruptible_async_sleep`; wire them into `conn_retry` and `create_retry_decorator` (the latter now picks the sync/async interruptible sleep at decoration time). Final-failure log distinguishes "aborted: shutting down" from "failed after retries". - `backend/util/process.py`: `AppProcess._self_terminate` calls `request_shutdown()`. - `backend/util/service.py`: `AppService._self_terminate` calls `request_shutdown()` so the eager Redis connect in `lifespan` stops blocking Uvicorn startup. - `backend/util/retry_test.py`: 5 new tests (sync/async `conn_retry` abort on shutdown, `create_retry_decorator` abort, both interruptible sleeps return early) + an autouse fixture that resets the shutdown flag between tests. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `poetry run pytest backend/util/retry_test.py` — 19 passed (14 existing + 5 new shutdown tests) - [x] `poetry run ruff check` + `poetry run pyright` clean on all changed files; pre-commit hooks (ruff/isort/black/pyright) pass - [x] `backend/util/service_test.py::test_graceful_shutdown` --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…tas#13384) ### Why / What / How **Why:** On startup, the app outputs multiple deprecation warnings from Pydantic v2 and field name shadowing that clutter the console and indicate outdated code patterns. The search RPC layer also logged return-type validation warnings. **What:** This PR fixes deprecation/validation warnings in our own codebase and eliminates dependency-related warnings where possible by upgrading dependencies. **How:** - Converted class-based Pydantic `Config` to `SettingsConfigDict` (Pydantic v2 style) - Renamed shadowed field names to avoid conflicts with `BaseModel` attributes, with a backward-compatibility validator so existing serialized data still deserializes - Removed an unused field from a `TypedDict` return contract that was triggering return-type validation warnings - Upgraded dependencies that were causing transitive deprecation warnings ### Changes 🏗️ - `backend/copilot/config.py`: Converted `ChatConfig.Config` class to `model_config = SettingsConfigDict(...)`; reworded the dialect docstring so the `ollama` default key no longer trips the secret scanner - `backend/blocks/exa/helpers.py`: Renamed `SummarySettings.schema` to `output_schema` to avoid shadowing `BaseModel.schema`, with a `model_validator(mode="before")` that accepts the legacy `schema` key for backward compatibility. Also fixed zero-valued `extras` int counts (`links`/`image_links`) being sent to the Exa API instead of omitted. - `backend/blocks/exa/contents.py`: Updated all references to the renamed field and applied the same zero-extras omission fix in `ExaContentsBlock.run()` - `backend/api/features/search/hybrid_search.py`: Removed the unused `lexical_raw` field from `HybridSearchRow`. It is computed in SQL only to derive `lexical_score` and is never projected into result rows, so its required presence on the `TypedDict` made every cross-service search RPC log a return-type validation warning. - Tests: added coverage for the Exa `output_schema` → `schema` SDK mapping (both `process_contents_settings` and `ExaContentsBlock.run` paths), zero-extras omission, the legacy `schema` alias, and a `HybridSearchRow` contract test (positive + negative) - `pyproject.toml`: Upgraded `supabase` from exact pin `2.28.0` to caret `^2.31.0`, aligning with the caret convention used by sibling dependencies - `poetry.lock`: Updated to reflect dependency changes ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified all imports work without warnings from our own code - [x] Ran existing tests for affected modules (16 tests in cost_tracking_test.py, 15 tests in helpers_cost_test.py) - [x] Confirmed no type checking errors - [x] Verified app startup is clean of warnings from our codebase --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Mistral Vibe <vibe@mistral.ai>
…rk PRs (Significant-Gravitas#13406) ## Why Six pre-existing copilot tests fail on **every fork PR** while staying green on `dev`: - `backend/copilot/baseline/service_unit_test.py::TestBaselineReasoningStreaming::test_reasoning_param_absent_on_non_anthropic_routes` - `…::test_reasoning_param_suppressed_when_thinking_tokens_zero` - `backend/copilot/graphiti/communities_test.py::TestRebuildCommunitiesForUser::test_success_path_calls_detach_delete_then_build` - `…::test_failure_path_returns_error_in_result` - `…::TestRebuildForUserActivityGate::test_force_bypasses_gate` - `…::TestRebuildPathSelection::test_uses_flex_client_by_default` **Root cause:** these tests implicitly assume the ambient `ChatConfig` transport is `openrouter`. `openrouter_active` is only true when `api_key` is non-empty, and `api_key` falls back to `OPENAI_API_KEY`. GitHub Actions **withholds repository secrets from `pull_request` runs triggered by forks**, so `OPENAI_API_KEY` is empty there and the transport silently drops to `direct_anthropic`: - graphiti: `use_flex = flex_requested and chat_cfg.transport.supports_flex_tier`; only `openrouter` has `supports_flex_tier=True`, so the sync path runs the *real* client and the tests that only patch `make_flex_graphiti_client` fall through (wrong shape / no error / `execution_path == "sync"`). - baseline: without openrouter, `extra_body` isn't built → `KeyError: 'extra_body'`. `dev` is green because its CI runs as `push` events, where secrets are present. This surfaced on JSON-blocks PR Significant-Gravitas#13170 purely because it was a fork PR — its code changes were unrelated. > **Note:** this PR is deliberately opened **from a fork** (`Pwuts/AutoGPT`) so CI runs without secrets — verifying the fix actually holds under the failing condition. It is the fork-CI counterpart of the same change in Significant-Gravitas#13386. ## What Make the affected tests hermetic by pinning the OpenRouter transport, so they pass regardless of credential availability: - **`service_unit_test.py`** — pin `use_openrouter` / `api_key` / `base_url` in the two reasoning tests that relied on `extra_body` being built (matching the existing Kimi-test pattern). - **`communities_test.py`** — add an autouse fixture pinning the transport so the flex-tier path is taken deterministically. The flag-driven sync test (`test_uses_sync_client_when_flex_disabled`) is unaffected — it forces sync via `community_rebuild_use_flex_tier`, not the transport. ## How Verified locally in both credential states (`-p no:cov`, 138 tests each): ```bash # Fork-CI simulation (secrets cleared) — previously 6 failures, now: env CHAT_API_KEY= CHAT_BASE_URL= OPENAI_API_KEY= OPEN_ROUTER_API_KEY= OPENAI_BASE_URL= \ poetry run pytest backend/copilot/baseline/service_unit_test.py \ backend/copilot/graphiti/communities_test.py # -> 138 passed # Normal env (creds present) — no regression: poetry run pytest backend/copilot/baseline/service_unit_test.py \ backend/copilot/graphiti/communities_test.py # -> 138 passed ``` The real proof is this PR's own CI (fork → no secrets) going green. No production code changes — test-only. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ignificant-Gravitas#13186) ## Summary Fixes the bug tracked in [SECRT-2373](https://linear.app/autogpt/issue/SECRT-2373). The title-generation LLM call in `_generate_session_title` was passing the user's raw first message verbatim as the `user` turn. With no framing around it, the model occasionally interpreted imperative user messages (e.g. "Write me a Python script...") as direct instructions and responded with task output rather than a short title. ## Changes **`autogpt_platform/backend/backend/copilot/service.py`** - Updated system prompt to explicitly tell the model it will be shown a user message and must not follow any instructions in it. - Wrapped the user's message in an `<conversation>` XML tag in the user turn, clearly separating it from the model's own instructions. --------- Co-authored-by: Toran Bruce Richards <22963551+Torantulino@users.noreply.github.com>
… model (Significant-Gravitas#13313) ## Summary Removes a duplicate `unreal_speech_api_key` field declaration in the `Secrets` model (`autogpt_platform/backend/backend/util/settings.py`). The field was declared twice (lines 751 and 754) with identical definitions. In Pydantic, the second declaration silently shadows the first, which is confusing and could mask issues if the descriptions or defaults ever diverge. ## Changes - Removed the duplicate `unreal_speech_api_key` field on line 754 of `settings.py` ## Testing - No functional change — Pydantic uses the last declaration which had the same default and description. - Existing tests should pass unchanged. ## Checklist - [x] My code follows the code style of this project - [x] I have performed a self-review of my code Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
…pt-in) (Significant-Gravitas#13133) ## Why The `PlatformLinkingManager` AppService was added in PR Significant-Gravitas#12615 and deployed to dev/prod via infra PR Significant-Gravitas#310 — but it was never added to the local `docker-compose` stack. Anyone trying to test the bot flow (Discord or Slack) locally hits the same wall I just did: ``` httpx.ConnectError: All connection attempts failed ... bot_backend.create_link_token → self._client.create_server_link_token ``` The rest_server tries to RPC to a service that doesn't exist in the local Compose project. This PR backfills the missing service so local bot testing works. ## What - **`docker-compose.platform.yml`** — new `platform_linking_manager` service definition (same shape as `notification_server`: same Dockerfile, `["platform-linking-manager"]` poetry script, port 8009, shared backend env, depends on `db` + `redis-0` + `migrate` + `database_manager`). - **`docker-compose.platform.yml`** (env block) — adds `PLATFORMLINKINGMANAGER_HOST: platform_linking_manager` to the `x-backend-env` YAML anchor that all backend services inherit. Mirrors the existing entries for `DATABASEMANAGER_HOST`, `NOTIFICATIONMANAGER_HOST`, etc. - **`docker-compose.yml`** — references the new service via the existing `extends` pattern. - **`bot/README.md`** — adds a "Docker (local dev)" subsection under "Running" explaining the opt-in profile and why the service is needed. ## How **Opt-in via Docker Compose profiles.** The service has `profiles: ["bot"]` so it doesn't start with a default `docker compose up -d`. Two ways to opt in: ```bash # Start the linking manager alongside your existing stack docker compose up -d platform_linking_manager # Or include via the profile (also brings up anything else profile-tagged later) docker compose --profile bot up -d ``` This matches the spirit of how the service is managed in dev/prod (its own Helm chart, separately deployable) while keeping local Compose reproducible and the default boot fast. **No code changes.** The service definition uses the existing backend image and the `platform-linking-manager` poetry script that's been in `pyproject.toml` since PR Significant-Gravitas#12615. Pure Compose + env wiring. ## Tests - `docker compose up -d platform_linking_manager` brings the service up healthy on port 8009 ✓ - `docker compose up -d` (no profile / no explicit name) does **not** start the service — default behaviour preserved ✓ - `docker exec autogpt_platform-rest_server-1 sh -c 'echo $PLATFORMLINKINGMANAGER_HOST'` returns `platform_linking_manager` after recreate, RPC works end-to-end ✓ - Slack `/setup` slash command end-to-end through `chat.postMessage` button → confirm page → workspace link created in DB ✓ (verified via PR Significant-Gravitas#13132 testing) ## Out of scope (intentional) - This PR doesn't add `platform_linking_manager` to the prod docker-compose files used elsewhere — it's already deployed as its own Helm chart in GKE. This is purely a local-dev fix. - This PR doesn't touch the bot service definitions themselves — Discord runs via its own Helm chart, Slack rides the existing `autogpt-server` pod. ## Related - PR Significant-Gravitas#12615 — added the `PlatformLinkingManager` AppService - PR Significant-Gravitas#310 (infra) — deployed it to dev/prod - PR Significant-Gravitas#13130 — webhook adapter base for the bot - PR Significant-Gravitas#13132 — Slack adapter (the thing that surfaced this gap during local testing) Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
…edule (Significant-Gravitas#13419) ### Why / What / How **Why:** Agents in the Library — notably Marketplace-installed ones — showed a **"Scheduled"** status badge even when the current user had never scheduled them. The same flaw inflated the fleet-summary "scheduled" count. Reported as OPEN-3184. **What:** The frontend scheduled-predicate `isAgentScheduled()` now keys solely off the user-scoped `is_scheduled` flag and no longer treats `recommended_schedule_cron` as an active schedule. **How:** `recommended_schedule_cron` comes from `AgentGraph.recommendedScheduleCron` — a creator-defined *suggestion* attached to the graph and shared by every user who installs the agent. It is **not** evidence that the current user scheduled anything. The backend already computes `is_scheduled` correctly (per-user, via `_fetch_schedule_info(user_id=...)`); only the frontend predicate was wrong. Since the badge, fleet summary, sitrep, status-map, and list filter all funnel through this one predicate, the single-line change fixes every call site. ```ts // before return !!agent.is_scheduled || !!agent.recommended_schedule_cron; // after return !!agent.is_scheduled; ``` ### Changes 🏗️ - `library/hooks/executionHelpers.ts` — `isAgentScheduled()` now returns `!!agent.is_scheduled` only; dropped `recommended_schedule_cron` from the predicate and its signature. - `library/hooks/executionHelpers.test.ts` — updated unit tests; added a case asserting `recommended_schedule_cron` alone is ignored. - `library/__tests__/filter.test.tsx` — added page-level regression tests: a recommendation-only agent is excluded from the "scheduled" filter and included in "idle". ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `pnpm vitest run executionHelpers.test.ts filter.test.tsx` — 9/9 pass (TDD: confirmed the new assertion failed against the old impl first) - [x] Full `src/app/(platform)/library` suite — 115/116 pass; the single failure is a pre-existing, timezone-dependent test in `followups/.../helpers.test.ts`, verified failing on the clean tree too and untouched by this change - [x] `pnpm format`, `pnpm lint` (only pre-existing warnings in unrelated files), `pnpm types` — all clean Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Significant-Gravitas#13421) ### Why / What / How **Why:** The AutoPilot skills library could only list, view, and delete skills. Skills could only be *created* by the copilot itself via the `store_skill` tool — users had no way to bring their own skills in or take one out to share or back up. **What:** Adds an **Upload skill** action (page-level) and a **Download** action (per skill) to the library skills page, backed by a new `POST /api/skills` upload endpoint. **How:** The skill-write logic (validation → per-user `AsyncClusterLock` cap → workspace write) is extracted from `StoreSkillTool` into a shared `store_user_skill()` helper so the new REST endpoint and the copilot tool use one code path. Upload sends the raw `SKILL.md` text, which the backend parses with the existing `parse_skill_markdown` (malformed → 400, cap reached → 409, existing slug → upsert). Download fetches the skill detail and reconstructs a re-uploadable `SKILL.md` client-side (JSON string/array literals are valid YAML, so it round-trips without a YAML lib). ### Changes 🏗️ - **Backend:** new `POST /api/skills` (`uploadCopilotSkill`) endpoint + `UploadCopilotSkillRequest`; extracted shared `store_user_skill()` helper and `SkillLimitError`; refactored `StoreSkillTool` to reuse it. Added endpoint + helper tests. - **Frontend:** `UploadSkillButton` (hidden file picker + `useUploadCopilotSkill`) in the page header; `Download` button on each skill row; markdown-render/download helpers. Added integration tests for upload (success + limit) and download. - **API client:** regenerated `openapi.json` (+49 lines, scoped to the new endpoint/schema). ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [x] Backend: `poetry run pytest` for skills (88 passing, incl. 4 new upload tests); ruff + pyright clean - [x] Frontend: `pnpm test:unit` for the skills page (11 passing, incl. upload + download tests) - [x] Manual: upload a valid `SKILL.md` and confirm it appears in the list - [x] Manual: download a skill and re-upload the file to confirm it round-trips - [x] Manual: upload a malformed file (400) and at the 50-skill cap (409) show error toasts #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under **Changes**)
The following vulnerabilities are fixed with an upgrade: - https://snyk.io/vuln/SNYK-JS-VITEST-17375131 - https://snyk.io/vuln/SNYK-JS-VITESTCOVERAGEV8-17375132 - https://snyk.io/vuln/SNYK-JS-SHELLQUOTE-16799355
|
This upgrade involves a medium-risk minor version update for Vitest and a low-risk patch update for concurrently. vitest (
|
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
|
This PR targets the Automatically setting the base branch to |
|
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request. |
Snyk has created this PR to fix 3 vulnerabilities in the pnpm dependencies of this project.
Snyk changed the following file(s):
autogpt_platform/frontend/package.jsonVulnerabilities that will be fixed with an upgrade:
SNYK-JS-VITEST-17375131
SNYK-JS-VITESTCOVERAGEV8-17375132
SNYK-JS-SHELLQUOTE-16799355
Breaking Change Risk
Important
Note: You are seeing this because you or someone else with access to this repository has authorized Snyk to open fix PRs.
For more information:
🧐 View latest project report
📜 Customise PR templates
🛠 Adjust project settings
📚 Read about Snyk's upgrade logic
Learn how to fix vulnerabilities with free interactive lessons:
🦉 Arbitrary Command Injection