Run remaining benchmark scenarios in fedify bench #802
Implement actor and object benchmark runners so read-path workloads can fetch actor documents and object documents after discovery. Authenticated reads use the synthetic actor server, and dry runs now describe the resolved read targets before any load is sent. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
Implement the fanout scenario with benchmark-owned sink inboxes, trigger endpoint delivery, stats-based queue drain polling, and delivery-throughput expectations. Queue task counters are now projected from benchmark stats so the runner can tell when fanout and outbox work has drained. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
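The stats-based drain check can be sketched as a polling loop that diffs queue counters against a baseline snapshot taken before the trigger. The QueueSnapshot shape, its field names, and the waitForDrain helper are hypothetical illustrations, not the benchmark's actual stats schema. The poll and sleep functions are injected so that the loop can be tested without a network or timers.

```typescript
// Hypothetical counter shape; the real benchmark stats payload may differ.
interface QueueSnapshot {
  enqueued: number;
  completed: number;
  failed: number;
}

interface DrainResult {
  drained: boolean;
  polls: number;
  completed: number; // tasks completed since the baseline
  failed: number; // tasks failed since the baseline (count as failed samples)
}

// Poll until every task enqueued since `baseline` has either completed or
// failed, or give up after `maxPolls` snapshots.
async function waitForDrain(
  baseline: QueueSnapshot,
  poll: () => Promise<QueueSnapshot>,
  sleep: () => Promise<void>,
  maxPolls: number,
): Promise<DrainResult> {
  let last: DrainResult = { drained: false, polls: 0, completed: 0, failed: 0 };
  for (let polls = 1; polls <= maxPolls; polls++) {
    const now = await poll();
    const enqueued = now.enqueued - baseline.enqueued;
    const completed = now.completed - baseline.completed;
    const failed = now.failed - baseline.failed;
    // Require at least one new task so an idle queue is not mistaken for a drain.
    const drained = enqueued > 0 && completed + failed >= enqueued;
    last = { drained, polls, completed, failed };
    if (drained) return last;
    if (polls < maxPolls) await sleep();
  }
  return last;
}
```

Requiring at least one newly enqueued task is a design choice of this sketch. Without that check, a poll taken before the target has enqueued anything would look like a completed drain.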
Implement expected-outcome failure benchmarking and weighted mixed scenario execution so the remaining executable scenario types can run under the published suite schema. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
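One plausible reading of weighted mixed execution is that each child receives a share of the parent's load proportional to its weight. The sketch below is based on that assumption, and MixedChild, ChildLoad, and splitLoad are hypothetical names. Closed-loop loads have no ratePerSec, so the sketch leaves the child rate undefined rather than deriving NaN from it.

```typescript
interface MixedChild {
  name: string;
  weight: number;
}

interface ChildLoad {
  name: string;
  share: number; // fraction of the parent's load, in [0, 1]
  ratePerSec?: number; // only set for open-loop parents
}

// Hypothetical sketch: split the parent's load across children by weight.
function splitLoad(children: readonly MixedChild[], ratePerSec?: number): ChildLoad[] {
  const total = children.reduce((sum, child) => sum + child.weight, 0);
  if (children.some((child) => !(child.weight >= 0)) || !(total > 0)) {
    throw new RangeError("weights must be non-negative and sum to a positive number");
  }
  return children.map((child) => {
    const share = child.weight / total;
    // Closed-loop parents have no rate, so do not derive one (avoids NaN).
    return ratePerSec === undefined
      ? { name: child.name, share }
      : { name: child.name, share, ratePerSec: ratePerSec * share };
  });
}
```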
Update the benchmarking manual and changelog for the newly executable bench scenario runners, including queue-backend, failure-outcome, and mixed-run semantics. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
Keep unauthenticated actor and object reads on a read-only destination gate so remote private reads do not require a reachable synthetic actor server. Also validate mixed child references during runner preflight so bad suites exit as configuration errors before probing the target. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
Gate object-discovery actor and collection fetches before crawling them, serialize fanout trigger and drain windows, and merge mixed scenario latency histograms from the underlying samples. These fixes keep benchmark safety gates ahead of public discovery requests and make reported fanout and mixed metrics match the traffic actually measured. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
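Latency has to be merged from the underlying samples because percentiles do not average. The tail of the combined traffic generally differs from the mean of each child's percentile. The sketch below simplifies the merge to pooled raw samples with a nearest-rank percentile. A bucketed histogram, which the runner may use instead, would be merged by adding per-bucket counts.

```typescript
// Nearest-rank percentile over raw latency samples in milliseconds.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) return Number.NaN;
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

// Merge children by pooling their samples, then recompute the percentile.
function mergedPercentile(children: readonly (readonly number[])[], p: number): number {
  const pooled: number[] = [];
  for (const child of children) {
    for (const sample of child) pooled.push(sample);
  }
  return percentile(pooled, p);
}

const reads = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10];
const fanout = [400, 400];
// Averaging the child medians gives (10 + 400) / 2 = 205 ms, but the pooled
// median of all 12 samples is 10 ms and the pooled p99 is 400 ms.
```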
Apply mixed maxInFlight as a parent-wide fetch limiter instead of copying the same cap into every concurrent child scenario. This keeps the configured in-flight safety limit intact even when children run in parallel. Also skip URL-only collection items when an object source type filter is set, because their type cannot be checked without additional discovery fetches. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
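A parent-wide limiter can be sketched as a single counting semaphore. The mixed parent creates it once and passes it to every child, so the combined number of in-flight fetches never exceeds maxInFlight. FetchLimiter and its run method are hypothetical names used for illustration only.

```typescript
// Hypothetical sketch: one limiter instance shared by every child scenario,
// so the parent's maxInFlight caps the combined number of in-flight fetches.
class FetchLimiter {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly maxInFlight: number;
  peak = 0; // highest concurrency observed, handy for checking the cap

  constructor(maxInFlight: number) {
    if (!Number.isInteger(maxInFlight) || maxInFlight < 1) {
      throw new RangeError("maxInFlight must be a positive integer");
    }
    this.maxInFlight = maxInFlight;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    // Re-check after every wake-up: a new caller may have taken the slot first.
    while (this.active >= this.maxInFlight) {
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.active++;
    this.peak = Math.max(this.peak, this.active);
    try {
      return await task();
    } finally {
      this.active--;
      this.waiters.shift()?.(); // wake one waiter, if any
    }
  }
}
```

Each child would wrap its fetches in the same limiter.run(...) call. In the per-child approach this commit replaces, each child builds its own limiter from a copy of maxInFlight, which lets the total in flight reach maxInFlight times the number of children.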
Pass the dry-run fetch and read destination gate into object crawl discovery so inspection cannot contact off-target public actor or collection URLs before the same safety checks used by real runs. Also validate explicit object source URLs and crawl seed identifiers during runner preflight, making malformed object sources fail as configuration errors before any probe or load is sent. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
Pass the configured dry-run fetch into actor handle discovery so acct: actor plans use the same injected fetch and User-Agent wrapper as real actor runs. This prevents dry-run planning from falling back to the global network fetch. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
Route authenticated actor and object reads through the configured signing pipeline so presign and pipeline modes do not silently fall back to inline JIT signing during the measured send path. Resolve and gate inbound failure scenario inboxes before starting the load loop, then reuse the discovered delivery target for each generated failure request. This keeps repeated WebFinger and actor discovery out of the measured samples. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
Treat drained fanout queue failures as failed benchmark samples so failed sink deliveries cannot inflate success rate or delivery throughput. Allow missing-actor-only failure scenarios to run against non-loopback targets without --advertise-host, since the missing actor identity is placed under the benchmark target rather than the synthetic actor server. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Route missing-actor failure destinations through the gate that does not require a reachable synthetic actor server, while still applying public load safety checks. Track fanout delivery throughput separately from request throughput so mixed scenarios cannot satisfy deliveryThroughput expectations with read traffic from other children. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Leave fanout throughputPerSec as the trigger request rate and report recipient delivery rate only through deliveryThroughputPerSec. This keeps fanout reports consistent with request counts and prevents mixed scenarios from folding delivery throughput into ordinary request throughput. fedify-dev#785 Assisted-by: Codex:gpt-5.5
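Here is a worked example of this split, using hypothetical field names. Suppose 20 trigger requests each fan out to 50 sink inboxes over a 10-second window. Then throughputPerSec = 20 / 10 = 2 and deliveryThroughputPerSec = 1000 / 10 = 100. The sketch makes two assumptions: the window spans both the trigger and drain phases, and only successful sink deliveries are counted, since failed deliveries must not inflate delivery throughput.

```typescript
// Hypothetical field names for one measured fanout window.
interface FanoutWindow {
  triggerRequests: number; // POSTs sent to the benchmark trigger endpoint
  deliveredRecipients: number; // successful sink deliveries after queue drain
  durationSec: number; // assumed to cover the trigger and drain phases
}

interface FanoutRates {
  throughputPerSec: number; // request rate, comparable with other scenarios
  deliveryThroughputPerSec: number; // recipient delivery rate, reported separately
}

function fanoutRates(measured: FanoutWindow): FanoutRates {
  if (!(measured.durationSec > 0)) throw new RangeError("durationSec must be positive");
  return {
    throughputPerSec: measured.triggerRequests / measured.durationSec,
    deliveryThroughputPerSec: measured.deliveredRecipients / measured.durationSec,
  };
}
```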
Send ActivityPub Accept headers while crawling actor and collection sources for object benchmarks, keeping WebFinger discovery on a JRD Accept header. Carry deliveryThroughputPerSec into scenario reports, render it in text and Markdown, and publish report schema v2 so the JSON report preserves the new field without mutating the immutable v1 schema. fedify-dev#785 Assisted-by: Codex:gpt-5.5
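The following minimal sketch shows the header split using an injected fetch-like function, which is how the commit messages describe dry-run discovery. The application/jrd+json type comes from RFC 7033 (WebFinger), and the ActivityPub Accept value comes from the ActivityPub specification. FetchLike, withBenchHeaders, and the User-Agent string are hypothetical.

```typescript
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number }>;

type RequestKind = "webfinger" | "activitypub";

// Media types from RFC 7033 (WebFinger) and the ActivityPub specification.
const ACCEPT_BY_KIND: Record<RequestKind, string> = {
  webfinger: "application/jrd+json",
  activitypub:
    'application/activity+json, application/ld+json; profile="https://www.w3.org/ns/activitystreams"',
};

// Hypothetical wrapper: every discovery request goes through the injected
// fetch (never the global one) with a fixed User-Agent and a kind-specific Accept.
function withBenchHeaders(fetchImpl: FetchLike, userAgent: string) {
  return (url: string, kind: RequestKind) =>
    fetchImpl(url, {
      headers: { "User-Agent": userAgent, Accept: ACCEPT_BY_KIND[kind] },
    });
}
```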
Keep inbound failure scenarios from treating target-side 5xx responses as expected client-fault rejections, so server crashes remain visible in the error buckets. Reject ambiguous mixed child references when duplicate scenario names would otherwise make child lookup bind the first matching scenario silently. fedify-dev#785 Assisted-by: Codex:gpt-5.5
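The ambiguity check for mixed child references can be sketched as a name lookup that rejects both unknown and duplicated names, instead of binding the first match. NamedScenario and resolveChildren are hypothetical names. The real suite validation may also check other properties of each child.

```typescript
interface NamedScenario {
  name: string;
  type: string;
}

// Hypothetical preflight: resolve mixed child references by name, rejecting
// unknown and duplicated names instead of binding the first match.
function resolveChildren(
  scenarios: readonly NamedScenario[],
  childNames: readonly string[],
): NamedScenario[] {
  const byName = new Map<string, NamedScenario[]>();
  for (const scenario of scenarios) {
    const matches = byName.get(scenario.name) ?? [];
    matches.push(scenario);
    byName.set(scenario.name, matches);
  }
  return childNames.map((name) => {
    const matches = byName.get(name) ?? [];
    if (matches.length === 0) throw new Error(`unknown mixed child: ${name}`);
    if (matches.length > 1) {
      throw new Error(`ambiguous mixed child: ${name} matches ${matches.length} scenarios`);
    }
    return matches[0];
  });
}
```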
Make remote failure benchmark modes call the target benchmark trigger and observe the target's queue metrics before reporting the expected fault as successful. This keeps offline or non-participating targets from producing synthetic passing samples. Require a sender for remote failure faults and require advertised sink reachability for non-loopback targets, matching the benchmark-owned sink server used to exercise the outbound path. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Unwrap ActivityPub activity items while crawling object sources so outbox entries benchmark their contained objects instead of the wrapper activity. Apply source type filters after that unwrap step. Validate actor recipients during runner preflight so malformed values fail as configuration errors before probing or sending benchmark load. fedify-dev#785 Assisted-by: Codex:gpt-5.5
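The unwrap-then-filter order can be sketched over plain JSON objects. The set of wrapper activity types and the depth constant are assumptions made for this example; the reviews mention a MAX_ACTIVITY_UNWRAP_DEPTH constant but not its value. selectTypedObjects is a hypothetical name.

```typescript
type JsonObject = { [key: string]: unknown };

// Assumptions for this sketch: which types count as wrapper activities, and
// how deep nested wrappers (e.g. Announce of Create) are followed.
const WRAPPER_ACTIVITY_TYPES = new Set(["Create", "Update", "Announce", "Add"]);
const MAX_UNWRAP_DEPTH = 4;

function isJsonObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function typesOf(item: JsonObject): string[] {
  const type = item["type"];
  if (typeof type === "string") return [type];
  if (Array.isArray(type)) return type.filter((t): t is string => typeof t === "string");
  return [];
}

// Replace wrapper activities with the embedded objects they carry.
// URL-only entries are skipped here; they need a (gated) fetch first.
function unwrapCandidates(item: unknown, depth = 0): JsonObject[] {
  if (!isJsonObject(item)) return [];
  if (!typesOf(item).some((t) => WRAPPER_ACTIVITY_TYPES.has(t))) return [item];
  if (depth >= MAX_UNWRAP_DEPTH) return [];
  const inner = item["object"];
  const out: JsonObject[] = [];
  for (const candidate of Array.isArray(inner) ? inner : [inner]) {
    out.push(...unwrapCandidates(candidate, depth + 1));
  }
  return out;
}

// Apply the source type filter only after unwrapping, across every candidate.
function selectTypedObjects(item: unknown, wantedType?: string): JsonObject[] {
  const candidates = unwrapCandidates(item);
  return wantedType === undefined
    ? candidates
    : candidates.filter((c) => typesOf(c).includes(wantedType));
}
```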
Mixed scenarios currently merge client-side measurements but discard child server metrics, so server-side expectations would pass preflight and then fail after load with null actual values. Reject those expectations during mixed validation instead, before probing or sending benchmark load. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Validate actor and object read URLs before safety gates and load scheduling so malformed resolved destinations do not turn into full-duration failed sample runs. fedify-dev#785 Assisted-by: Codex:gpt-5.5
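A preflight check of this kind can be as small as parsing each resolved read URL and rejecting non-HTTP(S) schemes before any gate or load runs. ConfigurationError and validateReadUrls are hypothetical names. The runner's actual error types and accepted schemes are not shown here.

```typescript
class ConfigurationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConfigurationError";
  }
}

// Hypothetical preflight: fail fast on malformed or non-HTTP(S) read targets
// instead of letting every sample fail for the whole run duration.
function validateReadUrls(urls: readonly string[]): URL[] {
  return urls.map((raw, index) => {
    let url: URL;
    try {
      url = new URL(raw);
    } catch {
      throw new ConfigurationError(`read target #${index} is not a valid URL: ${raw}`);
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      throw new ConfigurationError(`read target #${index} must use http(s): ${raw}`);
    }
    return url;
  });
}
```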
Fail actor and object read scenarios during runner validation when resolved URLs cannot be fetched safely, and reject mixed queue-observing children that would share unscoped target queue counters with concurrent queue producers. fedify-dev#785 Assisted-by: Codex:gpt-5.5
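The queue-attribution rule can be sketched as a pairwise check over mixed children. The observesQueue and producesQueueWork flags are hypothetical, since the suite schema does not expose such fields. In the runner, these properties would presumably be derived from each child's scenario type.

```typescript
// Hypothetical child description; the real suite schema has no such flags.
interface MixedChildInfo {
  name: string;
  observesQueue: boolean; // reads target queue counters (e.g. fanout drain)
  producesQueueWork: boolean; // causes the target to enqueue tasks
}

// Target queue counters are not scoped per child, so a child that observes
// them cannot run beside any other child that also produces queue work.
function queueIsolationErrors(children: readonly MixedChildInfo[]): string[] {
  const errors: string[] = [];
  for (const child of children) {
    if (!child.observesQueue) continue;
    const producers = children.filter((other) => other !== child && other.producesQueueWork);
    if (producers.length > 0) {
      errors.push(
        `${child.name} shares target queue counters with ${producers.map((p) => p.name).join(", ")}`,
      );
    }
  }
  return errors;
}
```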
Treat a completed outbound task with a remaining retry as the expected network-error signal, matching Fedify's normal retry path where transport failures enqueue follow-up work instead of failing the current task. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Only report fanout queue-drain latency after at least one measured drain sample, and validate explicit failure inbox selectors before discovery can turn typos into runtime URL errors. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Select typed objects from every unwrapped activity object candidate, and wait for all mixed child runners to settle before rethrowing a child setup error. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Describe the newly executable benchmark scenario types in the CLI help and keep nearby schema/type comments aligned with collection being the only reserved scenario type. fedify-dev#785 Assisted-by: Codex:gpt-5.5
📝 Walkthrough
Adds actor, object, fanout, failure, and mixed bench scenario runners; shared discovery/read helpers and destination gating; queue-task and permanent-failure snapshot parsing; delivery-throughput and report-v2 support; CLI dry-run/preflight wiring; extensive tests, fixtures, schema, and docs.
Changes
Fedify Benchmark Scenario Runners: Actor, Object, Fanout, Failure, Mixed
Sequence Diagram(s)
sequenceDiagram
participant Runner
participant SyntheticServer
participant Target
participant StatsStore
participant Reporter
Runner->>SyntheticServer: spawn sink servers / create signing fleet
Runner->>Target: discovery requests (WebFinger, collections) and read requests
Runner->>Target: POST trigger (fanout / remote-failure)
Target->>StatsStore: update scopeMetrics (queue tasks, delivery failures)
Runner->>StatsStore: poll snapshots (baseline & end)
StatsStore->>Runner: return snapshots
Runner->>Runner: diff snapshots, aggregate client/server metrics
Runner->>Reporter: build report-v2 and render outputs
Estimated code review effort: 🎯 4 (Complex), ⏱️ ~45 minutes
Fanout and remote failure scenarios now pass generated sink inboxes through the actorless destination gate before posting benchmark trigger payloads, so safe targets cannot be used to relay outbound load to unchecked public sinks. fedify-dev#802 (comment) fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
@codex review
/gemini review |
Code Review
This pull request introduces version 2 of the benchmark scenario and report schemas, implementing new scenario runners for actor, object, fanout, failure, and mixed types in fedify bench. It includes comprehensive validation, tests, and documentation updates for these runners, alongside new metrics for tracking delivery throughput and queue drain performance. Feedback is provided regarding a Time-of-Check to Time-of-Use (TOCTOU) race condition in the reservePort test helper, which could cause test flakiness under concurrent execution.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4ee6ba0530
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Inbound failure scenarios now require the expected 401 rejection instead of accepting any client error as an observed fault. The action-level remote failure test no longer reserves and releases a port before the runner starts; it exercises the advertised sink path so the sink server binds its own available port. fedify-dev#802 (comment) fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
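Here is a sketch of the outcome rule after this change, assuming one expected status per probe. Only the exact 401 counts as the observed fault, and 5xx responses keep their own bucket so that target crashes stay visible, as an earlier commit in this PR requires. The outcome labels and classifyInboundFault are hypothetical.

```typescript
type InboundFaultOutcome =
  | "expected-rejection"
  | "unexpected-success"
  | "server-error"
  | "other-status";

// Hypothetical classifier for an inbound fault probe (invalid signature or
// missing actor). Only the exact expected status counts as an observed fault;
// 5xx responses stay in their own bucket so target crashes remain visible.
function classifyInboundFault(status: number, expectedStatus = 401): InboundFaultOutcome {
  if (status === expectedStatus) return "expected-rejection";
  if (status >= 500) return "server-error";
  if (status >= 200 && status < 300) return "unexpected-success";
  return "other-status";
}
```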
/gemini review |
@codex review |
Code Review
This pull request adds actor, object, fanout, failure, and mixed scenario runners to fedify bench, upgrading the benchmark scenario and report schemas to version 2. Key feedback on these changes highlights two issues: first, in runner.ts, scenario.load.ratePerSec is undefined in closed-loop scenarios, which propagates NaN and should be guarded; second, in fanout.ts, the parsed HTTP status code from sinkBehavior lacks validation within the 100-599 range, which can cause a runtime RangeError crash.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4830f150ca
Type-filtered object discovery now dereferences IRI candidates before deciding whether to skip them, preserving common collection shapes that contain object references instead of embedded objects. Benchmark sink behavior status values are also clamped to valid HTTP response codes so invalid configuration falls back to the default accepted response instead of crashing the sink server. fedify-dev#802 (comment) fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
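Here is a sketch of the status fallback. It assumes that the configured status arrives as a string or a number and that 202 Accepted is the "default accepted response"; the commit message does not name the code. The sketch accepts 200 to 599 because the Fetch Response constructor throws a RangeError for statuses outside that range. The review comment above cites 100 to 599, so the exact bounds in the PR may differ.

```typescript
// Assumed default: the commit message only says "default accepted response".
const DEFAULT_SINK_STATUS = 202;

// Hypothetical sketch: normalize a configured sink status, falling back to the
// default instead of letting an invalid value crash the sink server.
function sinkStatus(raw: unknown): number {
  const status =
    typeof raw === "number" ? raw
    : typeof raw === "string" ? Number(raw.trim())
    : Number.NaN;
  // new Response(body, { status }) throws a RangeError outside 200-599.
  return Number.isInteger(status) && status >= 200 && status <= 599
    ? status
    : DEFAULT_SINK_STATUS;
}
```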
@codex review
/gemini review |
Code Review
This pull request adds several new scenario runners (actor, object, fanout, failure, and mixed) to the fedify bench command, updating the benchmark scenario and report schemas to version 2. It includes new validation rules, tests, and updated documentation. The review feedback highlights three main areas for improvement: optimizing objectUrlsFromSource in object-discovery.ts by adding early limit checks to avoid redundant network requests, adding a failure flag in mapWithConcurrency within read.ts to stop workers once an error occurs, and using index access point["value"] in stats-client.ts to prevent strict TypeScript compilation errors on Record<string, unknown> types.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c88845cbb5
Read destination gating now stops scheduling additional gate work after the first failure, preventing background destination checks from continuing after preflight has already rejected the scenario. The benchmark stats parser uses index access for raw metric point values, and actor scenario expect validation now mirrors object scenarios by allowing signature metrics only for authenticated reads. fedify-dev#802 (comment) fedify-dev#802 (comment) fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
@codex review
/gemini review |
Code Review
This pull request implements the actor, object, fanout, failure, and mixed scenario runners for the fedify bench tool, alongside updating the scenario and report schemas to version 2. It includes comprehensive test suites, documentation updates, and validation logic for these new runners. The review feedback highlights a potential issue in the mapWithConcurrency helper within packages/cli/src/bench/scenarios/read.ts, where concurrent execution could lead to unhandled promise rejections if multiple workers fail, and suggests a robust error-handling alternative.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 376057417a
Read destination preflight now waits for already-started gate workers before returning the first failure, so later failures do not keep running outside the caller's error path. Object source discovery now applies the activity unwrapping path to fetched activity references before type filtering, including a visited URL guard for cyclic references. fedify-dev#802 (comment) fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
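The gate-worker behavior described in this follow-up and the previous one can be sketched as a small worker pool: it stops scheduling after the first failure, but lets already-started checks finish before rethrowing. This mapWithConcurrency is a hypothetical reconstruction that only shares its name with the helper mentioned in the reviews; it is not the code in read.ts. Because workers catch their own errors, Promise.all never rejects early and no rejection is left unhandled, which addresses the concern raised in the review above.

```typescript
// Hypothetical sketch of a bounded worker pool that stops scheduling new items
// after the first failure but waits for already-started work before rethrowing.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;
  let failed = false;
  let firstError: unknown;

  async function worker(): Promise<void> {
    while (!failed && next < items.length) {
      const index = next++;
      try {
        results[index] = await fn(items[index], index);
      } catch (error) {
        if (!failed) {
          failed = true;
          firstError = error;
        }
      }
    }
  }

  const workerCount = Math.min(Math.max(1, limit), items.length);
  // Workers never reject, so this resolves only after every started item settles.
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  if (failed) throw firstError;
  return results;
}
```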
@codex review
/gemini review |
Code Review
This pull request implements the remaining benchmark scenario runners (actor, object, fanout, failure, and mixed) for the fedify bench tool, updating both the scenario and report schemas to version 2. It introduces comprehensive discovery helpers, workload generation, queue drain observation, and measurement merging for mixed scenarios, accompanied by extensive unit tests and documentation updates. Feedback on the changes highlights a potential Denial of Service vulnerability in the object discovery helper (typedReferencedObjectUrl) due to the lack of a recursion depth limit when traversing deep or circular JSON-LD/ActivityPub object chains, and suggests enforcing a limit using the existing MAX_ACTIVITY_UNWRAP_DEPTH constant.
Codex Review: Didn't find any major issues. You're on a roll. Reviewed commit:
Typed object discovery now caps recursive URL reference traversal with the same depth limit used for embedded activity unwrapping. This prevents deep reference chains from driving unbounded fetch recursion during object source crawls. fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
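Reference traversal with both guards can be sketched as a bounded recursive lookup. It follows an activity's object URL until a document of the wanted type appears, and stops on a revisited URL or at the depth cap. The depth value, FetchJson, and resolveTypedReference are assumptions for illustration. In the runner, each fetch would also pass through the destination gate.

```typescript
type FetchJson = (url: string) => Promise<unknown>;

// Assumed value; the commit only says it matches the embedded-unwrap limit.
const MAX_REFERENCE_DEPTH = 4;

// Hypothetical sketch: follow `object` URL references from a fetched document
// until one has the wanted type, bounded by depth and a visited-URL set.
async function resolveTypedReference(
  url: string,
  wantedType: string,
  fetchJson: FetchJson,
  depth = 0,
  visited: Set<string> = new Set(),
): Promise<string | undefined> {
  if (depth > MAX_REFERENCE_DEPTH || visited.has(url)) return undefined;
  visited.add(url);
  const doc = await fetchJson(url);
  if (typeof doc !== "object" || doc === null) return undefined;
  const record = doc as Record<string, unknown>;
  if (record["type"] === wantedType) return url;
  const inner = record["object"];
  return typeof inner === "string"
    ? resolveTypedReference(inner, wantedType, fetchJson, depth + 1, visited)
    : undefined;
}
```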
@codex review
/gemini review |
Code Review
This pull request implements several new scenario runners—actor, object, fanout, failure, and mixed—for the fedify bench tool, upgrading the benchmark scenario and report schemas to version 2. It introduces robust discovery helpers for crawling actor collections and resolving WebFinger handles, adds destination gating for enhanced benchmark safety, and implements queue-drain tracking to measure delivery throughput. Additionally, the CLI rendering and expectation evaluation have been updated to support these new metrics. I have no feedback to provide as there are no review comments.
Codex Review: Didn't find any major issues. Another round soon, please! Reviewed commit:
Background
This finishes the scenario runner work split out from #744 into #785. Before this change, the benchmark suite format could describe more scenario types than the CLI could actually run. That made the schema useful as a sketch of the intended benchmark surface, but left most of it unavailable to users.
What changed
This PR adds executable runners for actor, object, fanout, failure, and mixed scenarios. The existing inbox and webfinger runners remain in place, and collection stays reserved by the suite format.

Read scenarios now cover actor and object document fetches, including authenticated GET requests. Object discovery can crawl actor collections, unwrap activities, apply type filters, send ActivityPub Accept headers, and gate every discovered read destination before fetching it.

Fanout scenarios now drive the benchmark trigger endpoint, start local sink inboxes, wait for target queue drain, report delivery throughput separately from trigger request throughput, and avoid reporting queue-drain latency when no drain sample was measured.

Failure scenarios now exercise both inbound and outbound fault paths. Inbound faults cover invalid signatures and missing actors. Remote faults cover 404, 410, slow inbox, and network-error delivery behavior through the target's normal outbound path.

Mixed scenarios now run weighted child scenario blends, share the parent maxInFlight cap, reject queue-metric combinations that cannot be attributed cleanly, and wait for child runners to settle before reporting setup errors.

The report model and schema now preserve deliveryThroughputPerSec, published as schema/bench/report-v2.json. Text and Markdown renderers show delivery throughput when a scenario measured it.

The user-facing docs and fixtures were updated in docs/manual/benchmarking.md, CHANGES.md, packages/cli/src/bench/command.ts, and the benchmark scenario fixtures.
Closes #785. Part of #744.
Validation
I used targeted scenario tests throughout the implementation and ran the CLI package checks before each commit. The final branch has also passed the review follow-ups for destination gating, dry-run discovery, signing modes, fanout drain accounting, mixed result aggregation, malformed URL preflight, and help text.
Commands used during the last verification passes included:
- mise run check-each cli
- deno test -A packages/cli/src/bench/scenarios/object.test.ts packages/cli/src/bench/scenarios/mixed.test.ts
- deno test -A packages/cli/src/bench/scenarios/registry.test.ts packages/cli/src/bench/command.test.ts packages/cli/src/bench/schema.test.ts
- git diff --check