
feat(mcp): add SemanticToolFilter for embedding-based tool filtering #6454

Open
C0d3N1nja97342 wants to merge 3 commits into
crewAIInc:mainfrom
C0d3N1nja97342:feat/mcp-semantic-tool-filter

Conversation

@C0d3N1nja97342


Problem

MCP servers can expose many tools, and CrewAI injects all of them into the agent. When a server exposes more than ~20 tools, the full tool schema bloats the prompt and the LLM is more likely to pick the wrong tool, often one that is semantically similar to the right one. The existing tool_filter on MCP server configs only supports static allow/block lists (StaticToolFilter) or a hand-written callable; there is no way to filter tools by relevance to the requesting agent.

Solution

Add SemanticToolFilter, a dynamic filter that implements the existing per-tool ToolFilter protocol (__call__(context, tool) -> bool). It embeds a query derived from the requesting agent's role/goal/backstory (or run_context["query"] when populated) and each tool's name + description, including the tool only when cosine similarity is at least threshold.
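The query derivation described above could look roughly like the sketch below. It is not the code from this PR: AgentProfile, derive_query, and tool_text are hypothetical names, and the whitespace handling is an assumption. It only illustrates the stated precedence (run_context first, then role/goal/backstory) and the name + description tool text.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AgentProfile:
    """Hypothetical stand-in for the agent attributes the filter reads."""

    role: str = ""
    goal: str = ""
    backstory: str = ""


def derive_query(
    agent: Optional[AgentProfile],
    run_context: Optional[dict[str, Any]],
    run_context_key: str = "query",
) -> Optional[str]:
    """Return the text to embed as the query, or None if nothing usable exists."""
    # An explicit, non-blank query in run_context wins over the agent profile.
    if run_context:
        explicit = run_context.get(run_context_key)
        if isinstance(explicit, str) and explicit.strip():
            return explicit.strip()
    if agent is None:
        return None
    parts = (agent.role, agent.goal, agent.backstory)
    text = " ".join(p.strip() for p in parts if p and p.strip())
    return text or None


def tool_text(name: str, description: str) -> str:
    """Build the text embedded for a tool from its name and description."""
    return " ".join(p.strip() for p in (name, description) if p and p.strip())
```

A None result is the "no query can be derived" case, which the filter treats as fail-open.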

It plugs into tool_resolver with no resolver changes — the resolver already constructs ToolFilterContext(agent=self._agent, ...) and calls tool_filter(context, tool) for the 2-arg dynamic form. It reuses embedders from crewai.rag.embeddings (build_embedder), computes cosine similarity in pure Python (no new dependency), and caches embeddings per text.
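Per-text caching matters because the same query text is compared against every tool, so without a cache it would be embedded once per tool. A minimal sketch of the idea, assuming the embedder is a plain callable from text to a vector (CachedEmbedder and fake_embed are illustrative names, not the PR's):

```python
from typing import Callable

Vector = list[float]


class CachedEmbedder:
    """Wrap an embedding callable so each distinct text is embedded once."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self._embed = embed
        self._cache: dict[str, Vector] = {}

    def __call__(self, text: str) -> Vector:
        # Only call the underlying embedder on a cache miss.
        if text not in self._cache:
            self._cache[text] = self._embed(text)
        return self._cache[text]


calls: list[str] = []


def fake_embed(text: str) -> Vector:
    """Toy embedder that records every call; not a real model."""
    calls.append(text)
    return [float(len(text)), 1.0]


embedder = CachedEmbedder(fake_embed)
embedder("Researcher find sources")
embedder("web_search Search the web")
embedder("Researcher find sources")  # served from the cache, no new call
```

After these three lookups, fake_embed has been called only twice.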

It is threshold-based rather than top-K because the ToolFilter protocol decides each tool independently (it has no view of the full set or a budget). Fail-open: when no query can be derived or the embedder raises, the tool is included, so a misconfigured embedder never silently strips all tools.
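The difference between the two strategies can be seen in a small sketch. The scores here are made-up illustrative values, and passes_threshold and top_k are hypothetical helpers: a threshold check needs only one tool's score, while top-K needs every score before it can decide anything.

```python
def passes_threshold(score: float, threshold: float = 0.3) -> bool:
    """Per-tool decision: needs only this tool's similarity score."""
    return score >= threshold


def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Set-level decision: needs the scores of all tools at once."""
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:k]


# Illustrative similarity scores, not measured values.
scores = {"web_search": 0.62, "run_code": 0.12, "read_file": 0.35}

kept_by_threshold = [name for name, s in scores.items() if passes_threshold(s)]
kept_by_top_k = top_k(scores, k=1)
```

With these numbers the threshold keeps web_search and read_file, while top-1 keeps only web_search. The threshold variant fits a per-tool predicate; the top-K variant would need access to the whole tool list.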

Changes

  • lib/crewai/src/crewai/mcp/filters.py: SemanticToolFilter class (__init__(embedder, threshold=0.3, run_context_key="query"), 2-arg __call__, per-text embedding cache, fail-open), _cosine_similarity helper, create_semantic_tool_filter factory.
  • lib/crewai/src/crewai/mcp/__init__.py: export SemanticToolFilter and create_semantic_tool_filter.
  • lib/crewai/tests/mcp/test_semantic_tool_filter.py: 8 tests covering relevant-vs-irrelevant filtering by agent profile, fail-open (no query / embedder raises / empty tool), run_context query precedence, threshold boundary, embedding caching, and the factory.

Testing

  • New unit tests pass: uv run pytest lib/crewai/tests/mcp/test_semantic_tool_filter.py (8 passed).
  • ruff check and ruff format --check clean on changed files.
  • A minimal runtime demo (deterministic bag-of-words embedder, no API key) verified that a Researcher agent keeps web_search while a Coder agent keeps run_code from the same shared tool list.
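A demo along those lines can be approximated without any model or API key. The sketch below is not the demo from this PR: the tool descriptions, agent profiles, and helper names are invented for illustration, and the embedder is a simple token-count bag of words.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Deterministic bag-of-words 'embedding': lowercase token counts."""
    return Counter(text.lower().replace("_", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical shared tool list and agent profiles.
TOOLS = {
    "web_search": "search the web for research sources",
    "run_code": "run python code in a sandbox",
}
AGENTS = {
    "Researcher": "researcher who must search the web for sources",
    "Coder": "coder who must write and run python code",
}


def kept_tools(profile: str, threshold: float = 0.3) -> list[str]:
    """Return the tools whose name + description clear the threshold."""
    query = bow(profile)
    return [
        name
        for name, description in TOOLS.items()
        if cosine(query, bow(f"{name} {description}")) >= threshold
    ]
```

With these inputs, kept_tools(AGENTS["Researcher"]) keeps only web_search and kept_tools(AGENTS["Coder"]) keeps only run_code, because each profile shares no tokens with the other tool's text.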

Notes for Reviewer

  • This is an AI-assisted contribution; per .github/CONTRIBUTING.md, the llm-generated label is requested (applied if permissions allow).
  • Design choice: implemented as a per-tool threshold filter fitting the existing ToolFilter protocol (MCP-only, load-time) rather than a top-K retrieval over all tools at injection time. The latter would require changing the core tool-injection path and is a larger, separate piece of work; this PR is additive and non-invasive. run_context-based query is supported in the API for forward compatibility, though the current resolver passes run_context=None.

Add SemanticToolFilter, a dynamic MCP tool filter that includes a tool only when its embedding (name + description) is semantically similar to a query derived from the requesting agent's role/goal/backstory, or from run_context[run_context_key] when populated. It implements the existing per-tool ToolFilter protocol (2-arg __call__), so it plugs into tool_resolver with no resolver changes.

Reuses embedders from crewai.rag.embeddings (build_embedder) and computes cosine similarity in pure Python (no new dependency). Embeddings are cached per text. Fails open: when no query can be derived or the embedder raises, the tool is included so a misconfigured embedder never silently strips all tools. Also adds a create_semantic_tool_filter factory and exports both from crewai.mcp.
Add tests for SemanticToolFilter: relevant-vs-irrelevant filtering by agent profile, fail-open when no query is available, fail-open when the embedder raises, run_context query taking precedence over the agent profile, threshold boundary, per-text embedding caching, the factory, and empty tool name/description failing open. Uses a deterministic bag-of-words embedder so no model or network is required.
@coderabbitai

coderabbitai Bot commented Jul 3, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c2af1e8e-6433-47e7-b824-b9a2b88f49f8

📥 Commits

Reviewing files that changed from the base of the PR and between 31f8876 and 155e901.

📒 Files selected for processing (2)
  • lib/crewai/src/crewai/mcp/filters.py
  • lib/crewai/tests/mcp/test_semantic_tool_filter.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/crewai/tests/mcp/test_semantic_tool_filter.py
  • lib/crewai/src/crewai/mcp/filters.py

📝 Walkthrough


Adds semantic, embedding-based MCP tool filtering with a new filter class and factory in crewai.mcp.filters, exports them from crewai.mcp, and adds tests for similarity decisions, fail-open cases, caching, and factory wiring.

Changes

Semantic Tool Filter

| Layer / File(s) | Summary |
| --- | --- |
| Cosine similarity and filter implementation (lib/crewai/src/crewai/mcp/filters.py) | Adds math import and TYPE_CHECKING-only EmbeddingFunction reference, a pure-Python _cosine_similarity helper, SemanticToolFilter with query derivation, embedding caching, threshold-based inclusion, and fail-open behavior, plus create_semantic_tool_filter. |
| Package export wiring (lib/crewai/src/crewai/mcp/\_\_init\_\_.py) | Adds SemanticToolFilter and create_semantic_tool_filter to the MCP package imports and \_\_all\_\_. |
| Tests (lib/crewai/tests/mcp/test_semantic_tool_filter.py) | Adds deterministic embedder doubles, an _agent helper, and tests for inclusion/exclusion, fail-open behavior, query precedence, threshold boundaries, caching, factory behavior, and empty tool text handling. |

Sequence Diagram(s)

sequenceDiagram
  participant Agent
  participant SemanticToolFilter
  participant EmbeddingFunction
  participant Tool

  Agent->>SemanticToolFilter: filter(context, tool)
  SemanticToolFilter->>SemanticToolFilter: derive query from run_context or agent role/goal/backstory
  SemanticToolFilter->>Tool: build embedding text from name/description
  SemanticToolFilter->>EmbeddingFunction: embed(query)
  SemanticToolFilter->>EmbeddingFunction: embed(tool text)
  EmbeddingFunction-->>SemanticToolFilter: embeddings
  SemanticToolFilter->>SemanticToolFilter: compute cosine similarity
  SemanticToolFilter-->>Agent: include tool if similarity >= threshold
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding SemanticToolFilter for embedding-based MCP tool filtering. |
| Description check | ✅ Passed | The description accurately matches the PR's changes to SemanticToolFilter, exports, tests, and fail-open behavior. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
lib/crewai/src/crewai/mcp/filters.py (1)

182-182: 🎯 Functional Correctness | 🔵 Trivial | 💤 Low value

zip(..., strict=False) silently masks dimension mismatches.

If a and b ever differ in length (e.g., query/tool embedded by different models), the extra components are dropped and a wrong similarity is returned instead of surfacing the misconfiguration. Since both vectors here come from the same embedder this is unlikely, but strict=True would turn a latent config error into an explicit failure (which then fails open via the except in __call__).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/mcp/filters.py` at line 182, The cosine-similarity
calculation in the dot-product helper currently uses zip(..., strict=False),
which can hide embedding length mismatches and return an incorrect score. Update
the vector pairing in the similarity logic in filters.py to enforce equal
lengths with strict=True (or otherwise validate lengths before summing) so
mismatched embeddings fail explicitly and are handled by the existing fallback
path in __call__.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/crewai/src/crewai/mcp/filters.py`:
- Around line 261-267: The fail-open path in `SemanticToolFilter` is bypassed when
`_embed()` returns an empty vector instead of raising, causing
`_cosine_similarity()` to exclude tools silently. Update the `SemanticToolFilter` logic
around the `try`/`except` block to treat empty `query_vec` or `tool_vec` the
same as an embedding failure, and return `True` in that case so the class-level
fail-open guarantee holds. Use the `_embed`, `_cosine_similarity`, and
`threshold` flow in `filters.py` to locate the fix.

---

Nitpick comments:
In `@lib/crewai/src/crewai/mcp/filters.py`:
- Line 182: The cosine-similarity calculation in the dot-product helper
currently uses zip(..., strict=False), which can hide embedding length
mismatches and return an incorrect score. Update the vector pairing in the
similarity logic in filters.py to enforce equal lengths with strict=True (or
otherwise validate lengths before summing) so mismatched embeddings fail
explicitly and are handled by the existing fallback path in __call__.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 81bc4c4c-36ed-424d-b93d-0841761fcf60

📥 Commits

Reviewing files that changed from the base of the PR and between 2b90117 and 31f8876.

📒 Files selected for processing (3)
  • lib/crewai/src/crewai/mcp/__init__.py
  • lib/crewai/src/crewai/mcp/filters.py
  • lib/crewai/tests/mcp/test_semantic_tool_filter.py

Comment thread lib/crewai/src/crewai/mcp/filters.py Outdated
Address CodeRabbit review: treat empty embedding vectors as a failure (return True, fail open) instead of letting _cosine_similarity return 0.0 and silently exclude the tool; move the cosine call inside the try/except so it is covered by the fail-open path; and use zip(strict=True) so mismatched embedding dimensions raise explicitly and are caught by the same fallback. Add tests for empty-vector and mismatched-length fail-open.