fix(agents): recover real tool call from fabricated Observation continuations#6450
fix(agents): recover real tool call from fabricated Observation continuations#6450ritsth wants to merge 2 commits into
Conversation
…nuations Models that reject the stop parameter (gpt-5 and o1 families, many custom BaseLLM implementations) generate past the point where the "\nObservation:" stop sequence would end the completion, fabricating an Observation and a Final Answer after a genuine Action. Since crewAIInc#2483 removed the parser's both-action-and-final-answer error, the recovery block in process_llm_response could never trigger: it caught an exception that is no longer raised, so the fabricated Final Answer always won and the real tool was never executed (reported symptom: crewAIInc#3154). Replace the dead except block with an explicit position check: when stop words are unsupported and a parseable Action precedes the Final Answer, truncate at the fabricated Observation between them so the actual tool call executes. Behavior for stop-word models and for all other response shapes is unchanged.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThis PR updates ChangesFabricated Observation Handling
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
lib/crewai/src/crewai/utilities/agent_utils.py (1)
579-591: 🎯 Functional Correctness | 🔵 Trivial | 💤 Low valueTruncation boundary correctly gated on stop-word support and action-before-final-answer ordering.
Verified against the added tests: the
action_match.start() < final_answer_idxcheck correctly excludes the case where Action/Action Input/Observation text is merely quoted inside a real final answer (test_final_answer_before_action_text_unchanged), and theuse_stop_words=Truebranch preserves prior final-answer-wins semantics. Logic looks correct for the targeted fabrication pattern.One narrow edge case:
answer.find("Observation:", action_match.start(), final_answer_idx)searches from the start of theAction:match rather than after the action input, so if the actualAction Inputpayload itself contains the literal substring"Observation:"(e.g., a search query mentioning it), truncation would cut the input short before the true fabricated observation. This has a safe-ish fallback (parse likely fails and falls back to rawAgentFinish), so it's low priority, but worth being aware of.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai/src/crewai/utilities/agent_utils.py` around lines 579 - 591, The truncation logic in agent_utils should avoid cutting inside a legitimate Action Input payload when it contains the literal text "Observation:". Update the fabricated-observation trim in the `action_match` / `FINAL_ANSWER_ACTION` branch so the search for `Observation:` starts after the action input boundary rather than from `action_match.start()`, while keeping the existing `use_stop_words` and `action_match.start() < final_answer_idx` guards unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@lib/crewai/src/crewai/utilities/agent_utils.py`:
- Around line 579-591: The truncation logic in agent_utils should avoid cutting
inside a legitimate Action Input payload when it contains the literal text
"Observation:". Update the fabricated-observation trim in the `action_match` /
`FINAL_ANSWER_ACTION` branch so the search for `Observation:` starts after the
action input boundary rather than from `action_match.start()`, while keeping the
existing `use_stop_words` and `action_match.start() < final_answer_idx` guards
unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 1b86d47c-e168-46bc-af7e-dbf89216fff3
📒 Files selected for processing (2)
lib/crewai/src/crewai/utilities/agent_utils.pylib/crewai/tests/utilities/test_agent_utils.py
…ence
Truncate at "\nObservation:" (the actual stop word crewAI configures, see
I18N_DEFAULT.slice("observation")) rather than a bare "Observation:". This
mirrors exactly what the stop sequence would have removed and prevents an
"Observation:" substring inside the Action Input payload from being treated
as the fabricated-Observation boundary. Adds a regression test for that case.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Thanks @coderabbitai — good catch on the Instead I anchored the trim on |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
cd $(fd -t d crewai -d 3 | head -1) 2>/dev/null || true
find / -maxdepth 2 -iname "*crew*" 2>/dev/nullLength of output: 253 🏁 Script executed: #!/bin/bash
set -e
gh pr diff 6450 --repo crewAIInc/crewAI > /tmp/pr.diff || true
wc -l /tmp/pr.diff
grep -n "Observation" /tmp/pr.diff | head -50Length of output: 1711 🏁 Script executed: #!/bin/bash
sed -n '1,168p' /tmp/pr.diffLength of output: 7880 This confirms the fix addresses the edge case correctly — anchoring on
✏️ Learnings added
|
Closes #6449. Root cause behind the reports collected in #3154.
Problem
For models that don't support the
stopparameter (use_stop_words=False: gpt-5 family, o1 family, customBaseLLMs), nothing cuts generation at"\nObservation:", so the LLM emits a realActionfollowed by a fabricatedObservationandFinal Answerin one completion.parse()returns the fabricatedAgentFinish, and the real tool never executes.process_llm_response()contains a recovery block designed for exactly this — but it has been dead code since #2483 (efe27bd5, Apr 2025), which removed the parser raise (FINAL_ANSWER_AND_PARSABLE_ACTION_ERROR_MESSAGE) that the block catches. The recovery worked from #1322 (Oct 2024) until then. Full regression timeline and analysis in #6449.This explains the correlation users kept reporting in #3154: gpt-4.1 fine, gpt-5/o-series fabricating —
supports_stop_words()returnsFalsefor exactly the failing models.Fix
Replace the dead
exceptblock with an explicit position check (no exception plumbing, so it is unaffected byformat_answer()'s broadexcept Exception— the separate issue #3771):use_stop_words=False— behavior for stop-word-supporting models is untouched.Action<Observation<Final Answer(boundedfind), so:AgentExecutor,CrewAgentExecutor,LiteAgent,step_executor) and makes the experimental executor's existing "Final Answer:but parsed as AgentAction" warning reachable, giving users visibility when recovery happens.Tests
New
TestProcessLlmResponseclass inlib/crewai/tests/utilities/test_agent_utils.py(8 tests):test_fabricated_observation_recovers_real_action— fails on currentmain(returns the fabricatedAgentFinish), passes with this fix:tests/utilities/test_agent_utils.py,tests/agents/test_crew_agent_parser.py, andtests/agents/test_agent_executor.pyall pass (209 tests).ruff check,ruff format --check, andmypyclean on the changed files.Disclosure
This fix was developed with AI assistance (Claude); the regression history, reproduction, and red/green results were verified by hand. I don't have permission to add labels — could a maintainer add the
llm-generatedlabel per the contribution guidelines?