Skip to content

feat: add pre-tool GuardrailProvider adapter#6432

Open
safal207 wants to merge 10 commits into
crewAIInc:mainfrom
safal207:feat/provider-adapter
Open

feat: add pre-tool GuardrailProvider adapter#6432
safal207 wants to merge 10 commits into
crewAIInc:mainfrom
safal207:feat/provider-adapter

Conversation

@safal207

@safal207 safal207 commented Jul 2, 2026

Copy link
Copy Markdown

Summary

Add the minimal provider-agnostic GuardrailProvider boundary proposed in #4877 on top of CrewAI's existing global before-tool-call hook registry.

This first PR is intentionally binary and narrow: external providers receive a detached request and return allow/deny. Async suspension, continuation, decision receipts, and executor-side verification remain outside this change.

Changes

  • add frozen GuardrailRequest and GuardrailDecision records;
  • add a minimal GuardrailProvider protocol with only evaluate();
  • add enable_guardrail() as a thin adapter over register_before_tool_call_hook();
  • expose the API from crewai.hooks;
  • add focused regressions for allow, deny, fail-closed, explicit fail-open, detached tool input, invalid provider results, non-boolean verdicts, original tool name vs sanitized alias, stable agent identity, and unregistering the returned hook.

Minimal request boundary

GuardrailRequest contains only:

  • original tool name;
  • sanitized tool alias;
  • detached tool input;
  • optional agent ID;
  • optional agent role.

It deliberately does not duplicate task, crew, policy-engine identity, continuation, or governance-record fields.

Safety behavior

  • provider input is deep-copied from mutable live tool arguments;
  • provider/request failures block by default;
  • invalid return types and non-boolean verdicts follow the same failure policy;
  • fail_closed=False is an explicit availability-over-enforcement opt-out;
  • provider exception text is logged internally and is not returned through the hook result;
  • GuardrailProvider.evaluate() is synchronous; providers that perform I/O must enforce their own finite timeout and return promptly. The adapter does not claim it can safely preempt arbitrary synchronous Python code.

Compatibility

  • no change to the tool execution pipeline;
  • no change to existing hook ordering or return semantics;
  • providers are optional and register through the current global before-tool-call registry;
  • the returned hook can be removed with unregister_before_tool_call_hook().

Relationship to #6165

#6165 defines a richer runtime hook-result type (PROCEED, NEEDS_REVIEW, SILENCE) and propagates it through execution paths. This PR defines the external provider boundary only.

On current main, GuardrailDecision(allow=True) maps to None and allow=False maps to False. If #6165 lands first, this adapter can be rebased onto ToolCallDecision without duplicating its execution-path changes.

Out of scope

  • suspend / resolve() / resume semantics;
  • human-approval continuation;
  • bundled policy engines or RBAC;
  • GovernanceDecision / GovernanceOutcome duplication from feat: Add GovernanceDecision and GovernanceOutcome contract types #6030;
  • executor-side intent or target-state verification;
  • signed audit records and durable replay storage;
  • changes to task guardrails.

Related to #4877.

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 557b8b26-290d-41a4-bb4c-532dc6b950a6

📥 Commits

Reviewing files that changed from the base of the PR and between 20d48fb and 31ebaba.

📒 Files selected for processing (1)
  • lib/crewai/tests/hooks/test_guardrail_provider.py

📝 Walkthrough

Walkthrough

The PR adds guardrail request and decision types, a before-tool-call hook that evaluates a provider response, blocks denied tool calls, and handles provider errors. It also re-exports the new hook API and extends tests for registration, detached input, and failure handling.

Changes

Guardrail hook API

Layer / File(s) Summary
Guardrail types and hook
lib/crewai/src/crewai/hooks/tool_hooks.py
Adds GuardrailRequest, GuardrailDecision, GuardrailProvider, a module logger, and enable_guardrail() with request snapshotting, runtime validation, and fail-open/fail-closed handling.
Public exports
lib/crewai/src/crewai/hooks/__init__.py
Re-exports the guardrail API symbols from the hooks package namespace.
Guardrail hook tests
lib/crewai/tests/hooks/test_decorators.py, lib/crewai/tests/hooks/test_guardrail_provider.py
Adds decorator and registry tests for request mapping, detached tool input, allow/deny behavior, provider failures, and hook unregistering.

Sequence Diagram(s)

sequenceDiagram
  participant ToolCallHookContext
  participant enable_guardrail
  participant GuardrailProvider
  participant logger

  ToolCallHookContext->>enable_guardrail: before-tool-call context
  enable_guardrail->>enable_guardrail: deepcopy tool_input into GuardrailRequest
  enable_guardrail->>GuardrailProvider: evaluate(request)
  GuardrailProvider-->>enable_guardrail: GuardrailDecision or error
  enable_guardrail->>logger: log exception or invalid result
Loading
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding a pre-tool GuardrailProvider adapter.
Description check ✅ Passed The description is directly related to the guardrail adapter, its API, safety behavior, and tests.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

safal207 commented Jul 2, 2026

Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@safal207 safal207 marked this pull request as ready for review July 2, 2026 12:28

safal207 commented Jul 2, 2026

Copy link
Copy Markdown
Author

Relationship to #6165 (ToolCallDecision)

The two changes operate at different layers:

external guardrail engine
  -> GuardrailProvider.evaluate(GuardrailRequest)
  -> GuardrailDecision
  -> adapter
  -> existing before-tool hook result
  -> runtime execution path

The first #6432 contract is intentionally binary: allow=True maps to None, and allow=False maps to False, preserving current hook semantics. suspend / resume and human-approval continuation remain explicitly out of scope.

If #6165 lands first, this adapter can be rebased onto ToolCallDecision without duplicating its execution-path changes. There is no need for #6432 to introduce a parallel runtime verdict model.

@maxpetrusenkoagent your view on this seam would be useful before either PR is finalized.

@safal207 safal207 changed the title feat: add pre-tool policy provider adapter feat: add pre-tool GuardrailProvider adapter Jul 2, 2026

safal207 commented Jul 2, 2026

Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

safal207 commented Jul 2, 2026

Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

safal207 commented Jul 2, 2026

Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@rpelevin

rpelevin commented Jul 2, 2026

Copy link
Copy Markdown

Good first boundary. I would keep this PR exactly this narrow.

For this layer, the useful contract is:

  1. the provider receives a detached snapshot, not mutable runtime context.
  2. original tool name and sanitized alias are both visible.
  3. agent identity is stable when present.
  4. provider exceptions fail closed by default.
  5. invalid provider results fail closed by default.
  6. explicit fail-open is opt-in and visible.
  7. deny maps to the existing before-tool hook block path before execution.

I would avoid calling this a full authority receipt yet. This is the adapter boundary. The next layer can bind decision receipts, continuation or resume state, mode-compatible evidence, and executor-side verification, but those should stay separate from the first adapter PR.

For merge, I would want the tests to prove:

  • tool_input is deeply detached before provider evaluation.
  • non-GuardrailDecision results follow the failure policy.
  • non-boolean allow values follow the failure policy.
  • fail_closed default blocks on provider error.
  • original tool name versus sanitized alias is preserved.
  • the returned hook can be unregistered through the existing hook registry.

That keeps the claim crisp: minimal provider boundary is in place; authority receipts and mode-compatible transition proof remain the next layer.

safal207 commented Jul 2, 2026

Copy link
Copy Markdown
Author

Exact-head validation completed for 20d48fbeef9abdb14ed19968d9f1f4652a9a16f2 using the same workflow definitions in the fork validation PR.

Results:

  • Run Tests: success after re-running only the initially failed/cancelled matrix jobs;
  • Lint: success;
  • Run Type Checks: success across Python 3.10–3.13;
  • CodeQL Advanced: success;
  • PR Title Check: success;
  • PR Size Check: success;
  • CodeRabbit: success with no actionable review threads.

The first test attempt had one matrix failure that triggered fail-fast cancellation of several jobs. Re-running only failed/cancelled jobs on the unchanged SHA completed successfully, so no code change was required.

Fork validation run: https://github.com/safal207/crewAI-upstream/actions/runs/28592141628

Upstream workflows on this exact SHA still show action_required because they require maintainer approval before jobs can start.

safal207 commented Jul 2, 2026

Copy link
Copy Markdown
Author

@rpelevin thank you — agreed on keeping this as the adapter boundary rather than describing it as an authority receipt.

I checked the current tests against your merge list. The existing suite already covered deep detachment, invalid result handling, non-boolean verdicts, fail-closed provider errors, and original tool name versus sanitized alias. The one missing explicit assertion was registry removal, so I added a focused regression proving that the hook returned by enable_guardrail() can be removed with unregister_before_tool_call_hook().

I also clarified the synchronous timing contract in the PR description: providers that perform I/O must enforce their own finite timeout and return promptly. The adapter does not claim safe preemption of arbitrary synchronous Python code.

Current exact head: 31ebabad79c54e88a750c73f92838b859e6f7b15.

On that head, Lint, Type Checks (Python 3.10–3.13), CodeQL, PR Title, PR Size, and CodeRabbit are green. The full fork test matrix is running on the same SHA; completed shards are green so far.

Receipts, continuation/resume, mode-compatible evidence, and executor-side verification remain explicitly outside this PR.

@babyblueviper1

Copy link
Copy Markdown

This is a clean, minimal boundary — exactly narrow enough to compose with an external provider without the adapter needing to know anything about who's on the other end. Built a real one against it to make sure the shape actually works end-to-end, not just in theory:

class InvinoveritasGuardrailProvider:
    """Backed by invinoveritas's /review — an independent, third-party pre-action
    verdict (approve / approve_with_concerns / reject) with a signed, recomputable proof."""

    def evaluate(self, request: GuardrailRequest) -> GuardrailDecision:
        artifact = f"Tool call: {request.tool_name} Input: {dict(request.tool_input)}" \
                   f" Agent role: {request.agent_role}"
        try:
            resp = requests.post(ENDPOINT, headers={"Authorization": f"Bearer {api_key}"},
                                 json={"artifact_type": "agent_output", "artifact": artifact},
                                 timeout=8)
        except requests.exceptions.RequestException:
            return GuardrailDecision(allow=True, reason="unreachable — failing open")  # never block on OUR availability
        verdict = resp.json().get("verdict") if resp.status_code == 200 else None
        if verdict == "reject":
            return GuardrailDecision(allow=False, reason=resp.json()["summary"])
        return GuardrailDecision(allow=True, reason=resp.json().get("summary") if verdict else None)

Ran it live against a deliberately reckless synthetic tool call (place_order, BTC-USD, $5000 at 20x leverage) to check it actually catches something, not just plumbs through:

GuardrailDecision(
    allow=False,
    reason="invinoveritas rejected: The proposed trade poses significant risk due to "
           "excessive leverage and potential account-threatening size.",
    metadata={"invinoveritas_verdict": "reject", "issues": [
        {"severity": "blocker", "category": "position_sizing",
         "description": "The leverage of 20x on a $5000 position is excessively high and "
                        "could lead to rapid liquidation in the event of a small adverse "
                        "price movement."},
        {"severity": "high", "category": "drawdown_risk", ...}
    ]}
)

And confirmed the fail-open path independently (mocked a ConnectionError) — returns allow=True with the reason recorded, never raises, never blocks on our own downtime. Combined with enable_guardrail(provider, fail_closed=False), an external provider going down degrades to no-op rather than freezing the crew.

One open question for the design: GuardrailDecision is currently binary (allow: bool). Our own /review has a third state — approve_with_concerns — that this provider currently maps to allow=True with the concern surfaced only in reason/metadata, which the caller has to know to check. Is a ternary decision (or a structured severity field alongside allow) in scope for a later iteration, or deliberately out of scope for this first PR the way async/continuation is? Not blocking either way — just flagging it since "approved but flagged" is a real, common outcome for an external reviewer, not just allow/deny.

safal207 commented Jul 3, 2026

Copy link
Copy Markdown
Author

@babyblueviper1 thank you — this is exactly the kind of independent integration proof this boundary was meant to enable. The live place_order rejection and the explicit provider-side timeout are especially useful evidence that the adapter shape works outside the in-repo fixtures.

On the open question: I would keep GuardrailDecision deliberately binary in this first PR.

approve_with_concerns is a real provider outcome, but there are two different runtime meanings it could have:

  1. Proceed, but retain structured advisory evidence — this can remain allow=True plus a later structured status/severity field or external receipt.
  2. Do not proceed until review/approval — this requires the richer runtime decision and continuation/resume path, so it belongs with the later NEEDS_REVIEW / suspension layer rather than this adapter.

One important current limitation: enable_guardrail() maps the provider decision into the existing None / False hook result, so reason and metadata are not propagated to the tool caller today. A provider can persist or sign that evidence externally, as your implementation does, but the adapter should not imply that advisory details are surfaced through the current hook pipeline.

So the intended sequence is:

  • this PR: stable detached request + binary pre-execution allow/deny boundary;
  • later iteration: structured provider outcome such as ALLOW, ALLOW_WITH_CONCERNS, DENY (or a severity field), mapped onto the richer runtime hook result;
  • separate continuation layer: NEEDS_REVIEW / suspend / resume when a concern must block pending human or external resolution.

Also, the provider-level allow=True on RequestException and adapter-level fail_closed=False are complementary but distinct: the former is your provider policy for known availability failures; the latter covers unexpected provider exceptions or invalid results that escape evaluate().

Current exact head 31ebabad79c54e88a750c73f92838b859e6f7b15 is now green across the full fork test matrix, Lint, Type Checks (Python 3.10–3.13), CodeQL, PR Title, PR Size, and CodeRabbit.

@babyblueviper1

Copy link
Copy Markdown

That resolves it cleanly, thanks — the split makes sense: this PR is the detached request/binary boundary, ALLOW_WITH_CONCERNS-style richness is a later runtime-hook change, and anything that has to actually block-and-wait belongs to the continuation layer, not this adapter. Trying to smuggle a three-state outcome through a binary hook today would've been the wrong kind of "helpful."

Good catch on the propagation gap too — worth being explicit about for anyone adapting this provider: right now reason/metadata only reach whoever's holding the GuardrailDecision object in-process. If a concern needs to be visible downstream of that (an audit trail, a human review queue, another agent checking later), the provider has to persist it somewhere externally itself — which is exactly why /review returns a signed, independently-verifiable proof rather than just a string: the adapter can drop the proof reference in metadata today even though the hook pipeline doesn't surface it, so nothing is lost, it's just not wired through yet.

Will hold here for the ALLOW_WITH_CONCERNS iteration rather than pre-building against a hook shape that doesn't exist yet. Nice, tight scoping on this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants