feat(mcp): add SemanticToolFilter for embedding-based tool filtering by C0d3N1nja97342 · Pull Request #6454 · crewAIInc/crewAI

C0d3N1nja97342 · 2026-07-03T23:57:48Z

Problem

MCP servers can expose many tools, and CrewAI injects all of them into the agent. When a server exposes more than ~20 tools, the full set of tool schemas bloats the prompt and the LLM is more likely to pick the wrong tool, often one that is only semantically similar to the right one. The existing tool_filter on MCP server configs only supports static allow/block lists (StaticToolFilter) or a hand-written callable. There is no way to filter tools by relevance to the requesting agent.

Solution

Add SemanticToolFilter, a dynamic filter that implements the existing per-tool ToolFilter protocol (__call__(context, tool) -> bool). It embeds a query derived from the requesting agent's role/goal/backstory (or run_context["query"] when populated) and each tool's name + description, including the tool only when cosine similarity is at least threshold.

It plugs into tool_resolver with no resolver changes — the resolver already constructs ToolFilterContext(agent=self._agent, ...) and calls tool_filter(context, tool) for the 2-arg dynamic form. It reuses embedders from crewai.rag.embeddings (build_embedder), computes cosine similarity in pure Python (no new dependency), and caches embeddings per text.

It is threshold-based rather than top-K because the ToolFilter protocol decides each tool independently (it has no view of the full set or a budget). Fail-open: when no query can be derived or the embedder raises, the tool is included — a misconfigured embedder never silently strips all tools.

Changes

lib/crewai/src/crewai/mcp/filters.py: SemanticToolFilter class (__init__(embedder, threshold=0.3, run_context_key="query"), 2-arg __call__, per-text embedding cache, fail-open), _cosine_similarity helper, create_semantic_tool_filter factory.
lib/crewai/src/crewai/mcp/__init__.py: export SemanticToolFilter and create_semantic_tool_filter.
lib/crewai/tests/mcp/test_semantic_tool_filter.py: 8 tests covering relevant-vs-irrelevant filtering by agent profile, fail-open (no query / embedder raises / empty tool), run_context query precedence, threshold boundary, embedding caching, and the factory.

Testing

New unit tests pass: uv run pytest lib/crewai/tests/mcp/test_semantic_tool_filter.py → 8 passed.
ruff check and ruff format --check clean on changed files.
A minimal runtime demo (deterministic bag-of-words embedder, no API key) verified that a Researcher agent keeps web_search while a Coder agent keeps run_code from the same shared tool list.

Notes for Reviewer

This is an AI-assisted contribution; per .github/CONTRIBUTING.md, the llm-generated label is requested (applied if permissions allow).
Design choice: implemented as a per-tool threshold filter fitting the existing ToolFilter protocol (MCP-only, load-time) rather than a top-K retrieval over all tools at injection time. The latter would require changing the core tool-injection path and is a larger, separate piece of work; this PR is additive and non-invasive. run_context-based query is supported in the API for forward compatibility, though the current resolver passes run_context=None.

Add SemanticToolFilter, a dynamic MCP tool filter that includes a tool only when its embedding (name + description) is semantically similar to a query derived from the requesting agent's role/goal/backstory, or from run_context[run_context_key] when populated. It implements the existing per-tool ToolFilter protocol (2-arg __call__), so it plugs into tool_resolver with no resolver changes. Reuses embedders from crewai.rag.embeddings (build_embedder) and computes cosine similarity in pure Python (no new dependency). Embeddings are cached per text. Fails open: when no query can be derived or the embedder raises, the tool is included so a misconfigured embedder never silently strips all tools. Also adds a create_semantic_tool_filter factory and exports both from crewai.mcp.

Add tests for SemanticToolFilter: relevant-vs-irrelevant filtering by agent profile, fail-open when no query is available, fail-open when the embedder raises, run_context query taking precedence over the agent profile, threshold boundary, per-text embedding caching, the factory, and empty tool name/description failing open. Uses a deterministic bag-of-words embedder so no model or network is required.

coderabbitai · 2026-07-03T23:58:15Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c2af1e8e-6433-47e7-b824-b9a2b88f49f8

📥 Commits

Reviewing files that changed from the base of the PR and between 31f8876 and 155e901.

📒 Files selected for processing (2)

lib/crewai/src/crewai/mcp/filters.py
lib/crewai/tests/mcp/test_semantic_tool_filter.py

🚧 Files skipped from review as they are similar to previous changes (2)

lib/crewai/tests/mcp/test_semantic_tool_filter.py
lib/crewai/src/crewai/mcp/filters.py

📝 Walkthrough

Adds semantic, embedding-based MCP tool filtering with a new filter class and factory in crewai.mcp.filters, exports them from crewai.mcp, and adds tests for similarity decisions, fail-open cases, caching, and factory wiring.

Changes

Semantic Tool Filter

| Layer / File(s) | Summary |
| --- | --- |
| Cosine similarity and filter implementation `lib/crewai/src/crewai/mcp/filters.py` | Adds `math` import and `TYPE_CHECKING`-only `EmbeddingFunction` reference, a pure-Python `_cosine_similarity` helper, `SemanticToolFilter` with query derivation, embedding caching, threshold-based inclusion, and fail-open behavior, plus `create_semantic_tool_filter`. |
| Package export wiring `lib/crewai/src/crewai/mcp/__init__.py` | Adds `SemanticToolFilter` and `create_semantic_tool_filter` to the MCP package imports and `__all__`. |
| Tests `lib/crewai/tests/mcp/test_semantic_tool_filter.py` | Adds deterministic embedder doubles, an `_agent` helper, and tests for inclusion/exclusion, fail-open behavior, query precedence, threshold boundaries, caching, factory behavior, and empty tool text handling. |

Sequence Diagram(s)

sequenceDiagram
  participant Agent
  participant SemanticToolFilter
  participant EmbeddingFunction
  participant Tool

  Agent->>SemanticToolFilter: filter(context, tool)
  SemanticToolFilter->>SemanticToolFilter: derive query from run_context or agent role/goal/backstory
  SemanticToolFilter->>Tool: build embedding text from name/description
  SemanticToolFilter->>EmbeddingFunction: embed(query)
  SemanticToolFilter->>EmbeddingFunction: embed(tool text)
  EmbeddingFunction-->>SemanticToolFilter: embeddings
  SemanticToolFilter->>SemanticToolFilter: compute cosine similarity
  SemanticToolFilter-->>Agent: include tool if similarity >= threshold

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding SemanticToolFilter for embedding-based MCP tool filtering. |
| Description check | ✅ Passed | The description accurately matches the PR’s changes to SemanticToolFilter, exports, tests, and fail-open behavior. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

lib/crewai/src/crewai/mcp/filters.py (1)
182-182: 🎯 Functional Correctness | 🔵 Trivial | 💤 Low value

zip(..., strict=False) silently masks dimension mismatches.

If a and b ever differ in length (e.g., query/tool embedded by different models), the extra components are dropped and a wrong similarity is returned instead of surfacing the misconfiguration. Since both vectors here come from the same embedder this is unlikely, but strict=True would turn a latent config error into an explicit failure (which then fails open via the except in __call__).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/mcp/filters.py` at line 182, The cosine-similarity
calculation in the dot-product helper currently uses zip(..., strict=False),
which can hide embedding length mismatches and return an incorrect score. Update
the vector pairing in the similarity logic in filters.py to enforce equal
lengths with strict=True (or otherwise validate lengths before summing) so
mismatched embeddings fail explicitly and are handled by the existing fallback
path in __call__.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/crewai/src/crewai/mcp/filters.py`:
- Around lines 261-267: The fail-open path in `SemanticToolFilter` is bypassed when
`_embed()` returns an empty vector instead of raising, causing
`_cosine_similarity()` to exclude tools silently. Update the `SemanticToolFilter` logic
around the `try`/`except` block to treat empty `query_vec` or `tool_vec` the
same as an embedding failure, and return `True` in that case so the class-level
fail-open guarantee holds. Use the `_embed`, `_cosine_similarity`, and
`threshold` flow in `filters.py` to locate the fix.

---

Nitpick comments:
In `@lib/crewai/src/crewai/mcp/filters.py`:
- Line 182: The cosine-similarity calculation in the dot-product helper
currently uses zip(..., strict=False), which can hide embedding length
mismatches and return an incorrect score. Update the vector pairing in the
similarity logic in filters.py to enforce equal lengths with strict=True (or
otherwise validate lengths before summing) so mismatched embeddings fail
explicitly and are handled by the existing fallback path in __call__.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 81bc4c4c-36ed-424d-b93d-0841761fcf60

📥 Commits

Reviewing files that changed from the base of the PR and between 2b90117 and 31f8876.

📒 Files selected for processing (3)

lib/crewai/src/crewai/mcp/__init__.py
lib/crewai/src/crewai/mcp/filters.py
lib/crewai/tests/mcp/test_semantic_tool_filter.py

Address CodeRabbit review: treat empty embedding vectors as a failure (return True, fail open) instead of letting _cosine_similarity return 0.0 and silently exclude the tool; move the cosine call inside the try/except so it is covered by the fail-open path; and use zip(strict=True) so mismatched embedding dimensions raise explicitly and are caught by the same fallback. Add tests for empty-vector and mismatched-length fail-open.

C0d3N1nja97342 added 2 commits July 4, 2026 07:54

coderabbitai Bot reviewed Jul 4, 2026

Comment thread lib/crewai/src/crewai/mcp/filters.py Outdated

C0d3N1nja97342 wants to merge 3 commits into crewAIInc:main from C0d3N1nja97342:feat/mcp-semantic-tool-filter
