fix(agents): recover real tool call from fabricated Observation continuations by ritsth · Pull Request #6450 · crewAIInc/crewAI

ritsth · 2026-07-03T09:06:10Z

Closes #6449. Root cause behind the reports collected in #3154.

Problem

For models that don't support the stop parameter (use_stop_words=False: gpt-5 family, o1 family, custom BaseLLMs), nothing cuts generation at "\nObservation:", so the LLM emits a real Action followed by a fabricated Observation and Final Answer in one completion. parse() returns the fabricated AgentFinish, and the real tool never executes.
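To make the failure shape concrete, here is a minimal sketch of what a provider-side stop sequence would normally remove. The transcript and the `apply_stop` helper are illustrative assumptions, not crewAI code; only the `"\nObservation:"` stop sequence comes from this PR.

```python
# Illustrative only: emulate what a provider-side stop sequence does.
STOP = "\nObservation:"  # the stop sequence named in this PR

# Hypothetical completion from a model that ignores `stop`:
# a real Action followed by a fabricated Observation and Final Answer.
completion = (
    "Thought: I should look this up.\n"
    "Action: search\n"
    'Action Input: {"query": "weather in Paris"}\n'
    "Observation: It is sunny and 25C.\n"
    "Thought: I now know the final answer\n"
    "Final Answer: It is sunny in Paris."
)


def apply_stop(text: str, stop: str = STOP) -> str:
    """Cut the text at the first occurrence of the stop sequence, if any."""
    idx = text.find(stop)
    return text if idx == -1 else text[:idx]


with_stop = apply_stop(completion)

# Without the stop sequence the fabricated Final Answer is present;
# with it, the text ends right after the real Action Input.
print("Final Answer:" in completion)
print("Final Answer:" in with_stop)
```

With the stop sequence applied, only the real Action and Action Input remain, so a parser would see a tool call rather than a final answer.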

process_llm_response() contains a recovery block designed for exactly this — but it has been dead code since #2483 (efe27bd5, Apr 2025), which removed the parser raise (FINAL_ANSWER_AND_PARSABLE_ACTION_ERROR_MESSAGE) that the block catches. The recovery worked from #1322 (Oct 2024) until then. Full regression timeline and analysis in #6449.
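The dead-code mechanism can be illustrated with a toy model. Everything below is a hypothetical stand-in (the exception class, both parsers, and the body of the recovery branch are assumptions); it only shows why an `except` branch stops running once the matching `raise` is removed.

```python
class BothActionAndFinalAnswer(Exception):
    """Hypothetical stand-in for the parser error that was removed."""


def parse_old(text: str) -> str:
    # Old behavior (assumed shape): refuse ambiguous output.
    if "Action:" in text and "Final Answer:" in text:
        raise BothActionAndFinalAnswer
    return "finish" if "Final Answer:" in text else "action"


def parse_new(text: str) -> str:
    # New behavior (assumed shape): Final Answer wins, nothing is raised.
    return "finish" if "Final Answer:" in text else "action"


def process(text: str, parse) -> str:
    try:
        return parse(text)
    except BothActionAndFinalAnswer:
        # Recovery branch: only reachable if the parser still raises.
        return parse(text.split("\nObservation:")[0])


fabricated = (
    "Action: search\n"
    "Action Input: {}\n"
    "Observation: made up\n"
    "Final Answer: made up"
)

print(process(fabricated, parse_old))  # recovery branch runs
print(process(fabricated, parse_new))  # recovery branch is never reached
```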

This explains the correlation users kept reporting in #3154: gpt-4.1 fine, gpt-5/o-series fabricating — supports_stop_words() returns False for exactly the failing models.

Fix

Replace the dead except block with an explicit position check. It involves no exception plumbing, so it is unaffected by format_answer()'s broad except Exception (the separate issue #3771).

- Gated to use_stop_words=False: behavior for stop-word-supporting models is untouched.
- Requires the exact fabrication shape Action < Observation < Final Answer (bounded find), so:
  - "Final Answer first, ReAct format quoted after" still returns the final answer;
  - "Action then Final Answer with no Observation" keeps current behavior;
  - every other shape is unchanged.
- Fixes all four call sites at once (experimental AgentExecutor, CrewAgentExecutor, LiteAgent, step_executor) and makes the experimental executor's existing "Final Answer: but parsed as AgentAction" warning reachable, giving users visibility when recovery happens.
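The position check described above can be sketched roughly as follows. This is not the PR's diff: `ACTION_RE` is a hypothetical pattern (crewAI's real `ACTION_INPUT_REGEX` may differ), `truncate_fabricated` is an illustrative name, and the sketch anchors on `"\nObservation:"` as in the follow-up commit discussed later in this thread.

```python
import re

# Hypothetical pattern; crewAI's real ACTION_INPUT_REGEX may differ.
ACTION_RE = re.compile(r"Action\s*:\s*(.+?)\s*Action\s*Input\s*:", re.DOTALL)
FINAL_ANSWER = "Final Answer:"
OBSERVATION = "\nObservation:"


def truncate_fabricated(answer: str, use_stop_words: bool) -> str:
    """Return `answer` cut before a fabricated Observation, else unchanged."""
    if use_stop_words:
        return answer  # the provider-side stop sequence already handles this
    action = ACTION_RE.search(answer)
    final_idx = answer.find(FINAL_ANSWER)
    # Require a parseable Action that starts before the Final Answer marker.
    if action is None or final_idx == -1 or action.start() >= final_idx:
        return answer
    # Bounded find: the Observation must sit between Action and Final Answer.
    obs_idx = answer.find(OBSERVATION, action.start(), final_idx)
    return answer if obs_idx == -1 else answer[:obs_idx]


fabricated = (
    "Thought: x\nAction: search\nAction Input: {\"q\": \"a\"}\n"
    "Observation: fake\nFinal Answer: done"
)
quoted = (
    "Final Answer: use this format\nAction: search\n"
    "Action Input: {}\nObservation: result"
)
no_observation = "Action: search\nAction Input: {}\nFinal Answer: done"

print(repr(truncate_fabricated(fabricated, use_stop_words=False)))
```

Only the first shape is truncated; `quoted` and `no_observation` come back unchanged, as does any input when `use_stop_words` is true.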

Tests

New TestProcessLlmResponse class in lib/crewai/tests/utilities/test_agent_utils.py (8 tests):

test_fabricated_observation_recovers_real_action — fails on current main (returns the fabricated AgentFinish), passes with this fix:

# main (fix stashed):
FAILED tests/utilities/test_agent_utils.py::TestProcessLlmResponse::test_fabricated_observation_recovers_real_action
1 failed, 7 passed

# with fix:
8 passed

The other 7 tests pin the non-fabrication shapes and pass on both sides, demonstrating no behavior change outside the targeted case.

tests/utilities/test_agent_utils.py, tests/agents/test_crew_agent_parser.py, and tests/agents/test_agent_executor.py all pass (209 tests). ruff check, ruff format --check, and mypy clean on the changed files.

Disclosure

This fix was developed with AI assistance (Claude); the regression history, reproduction, and red/green results were verified by hand. I don't have permission to add labels — could a maintainer add the llm-generated label per the contribution guidelines?

…nuations

Models that reject the stop parameter (gpt-5 and o1 families, many custom BaseLLM implementations) generate past the point where the "\nObservation:" stop sequence would end the completion, fabricating an Observation and a Final Answer after a genuine Action.

Since crewAIInc#2483 removed the parser's both-action-and-final-answer error, the recovery block in process_llm_response could never trigger: it caught an exception that is no longer raised, so the fabricated Final Answer always won and the real tool was never executed (reported symptom: crewAIInc#3154).

Replace the dead except block with an explicit position check: when stop words are unsupported and a parseable Action precedes the Final Answer, truncate at the fabricated Observation between them so the actual tool call executes. Behavior for stop-word models and for all other response shapes is unchanged.

coderabbitai · 2026-07-03T09:06:41Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7c801245-3d3a-487b-ad76-c4fb18deca28

📥 Commits

Reviewing files that changed from the base of the PR and between 8f91862 and 886e830.

📒 Files selected for processing (2)

lib/crewai/src/crewai/utilities/agent_utils.py
lib/crewai/tests/utilities/test_agent_utils.py

🚧 Files skipped from review as they are similar to previous changes (2)

lib/crewai/src/crewai/utilities/agent_utils.py
lib/crewai/tests/utilities/test_agent_utils.py

📝 Walkthrough

Walkthrough

This PR updates process_llm_response to detect fabricated Observation: text using ACTION_INPUT_REGEX and FINAL_ANSWER_ACTION when stop words are disabled, and adds tests covering the parsing outcomes for several transcript shapes.

Changes

Fabricated Observation Handling

Layer / File(s)	Summary
Truncation logic in process_llm_response `lib/crewai/src/crewai/utilities/agent_utils.py`	Imports `ACTION_INPUT_REGEX` and `FINAL_ANSWER_ACTION`; when `use_stop_words` is disabled, detects a fabricated `"Observation:"` occurring between a matched action and the final-answer marker, truncating the answer before calling `format_answer`.
Tests for process_llm_response scenarios `lib/crewai/tests/utilities/test_agent_utils.py`	Adds `TestProcessLlmResponse` with a fabricated transcript constant and test methods verifying `AgentAction`/`AgentFinish` outcomes across `use_stop_words` settings and observation/final-answer orderings.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main fix: recovering the real tool call from fabricated Observation continuations.
Description check	✅ Passed	The description is directly related to the change and accurately explains the bug, fix, and tests.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

🧹 Nitpick comments (1)

lib/crewai/src/crewai/utilities/agent_utils.py (1)
579-591: 🎯 Functional Correctness | 🔵 Trivial | 💤 Low value

Truncation boundary correctly gated on stop-word support and action-before-final-answer ordering.

Verified against the added tests: the action_match.start() < final_answer_idx check correctly excludes the case where Action/Action Input/Observation text is merely quoted inside a real final answer (test_final_answer_before_action_text_unchanged), and the use_stop_words=True branch preserves prior final-answer-wins semantics. Logic looks correct for the targeted fabrication pattern.

One narrow edge case: answer.find("Observation:", action_match.start(), final_answer_idx) searches from the start of the Action: match rather than after the action input, so if the actual Action Input payload itself contains the literal substring "Observation:" (e.g., a search query mentioning it), truncation would cut the input short before the true fabricated observation. This has a safe-ish fallback (parse likely fails and falls back to raw AgentFinish), so it's low priority, but worth being aware of.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/utilities/agent_utils.py` around lines 579 - 591, The
truncation logic in agent_utils should avoid cutting inside a legitimate Action
Input payload when it contains the literal text "Observation:". Update the
fabricated-observation trim in the `action_match` / `FINAL_ANSWER_ACTION` branch
so the search for `Observation:` starts after the action input boundary rather
than from `action_match.start()`, while keeping the existing `use_stop_words`
and `action_match.start() < final_answer_idx` guards unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1b86d47c-e168-46bc-af7e-dbf89216fff3

📥 Commits

Reviewing files that changed from the base of the PR and between 2b90117 and 8f91862.

📒 Files selected for processing (2)

lib/crewai/src/crewai/utilities/agent_utils.py
lib/crewai/tests/utilities/test_agent_utils.py

…ence

Truncate at "\nObservation:" (the actual stop word crewAI configures, see I18N_DEFAULT.slice("observation")) rather than a bare "Observation:". This mirrors exactly what the stop sequence would have removed and prevents an "Observation:" substring inside the Action Input payload from being treated as the fabricated-Observation boundary. Adds a regression test for that case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ritsth · 2026-07-03T19:17:43Z

Thanks @coderabbitai — good catch on the Observation:-inside-payload edge case. Addressed in 886e830, though via a slightly different route than suggested: searching after the Action Input boundary wouldn't help, since the payload (and thus any Observation: substring in it) comes after Action Input:.

Instead I anchored the trim on "\nObservation:" — which is the actual stop sequence crewAI configures (I18N_DEFAULT.slice("observation") at agent/core.py:1016). This mirrors exactly what the stop word would have removed, and because the fabricated Observation is always emitted on its own line per the ReAct format, an Observation: substring inside a single-line JSON Action Input is no longer a match. Added test_observation_substring_in_action_input_preserved to lock it in.
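A small sketch of the difference between the two anchors, using a made-up transcript whose Action Input payload mentions "Observation:". The transcript and variable names are illustrative assumptions; only the two search strings come from this thread.

```python
# Hypothetical transcript: the payload itself contains "Observation:".
answer = (
    "Action: search\n"
    'Action Input: {"query": "Observation: what does it mean in ReAct?"}\n'
    "Observation: fabricated result\n"
    "Final Answer: fabricated answer"
)

start = answer.find("Action:")
end = answer.find("Final Answer:")

# Bare search: the first hit is inside the JSON payload, so the cut is too early.
bare_cut = answer[: answer.find("Observation:", start, end)]

# Newline-anchored search: only an Observation on its own line matches.
anchored_cut = answer[: answer.find("\nObservation:", start, end)]

print(repr(bare_cut))
print(repr(anchored_cut))
```

The bare search leaves a truncated, unparseable payload, while the anchored search keeps the full single-line Action Input intact.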

coderabbitai · 2026-07-03T19:18:56Z

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cd $(fd -t d crewai -d 3 | head -1) 2>/dev/null || true
find / -maxdepth 2 -iname "*crew*" 2>/dev/null

Length of output: 253

🏁 Script executed:

#!/bin/bash
set -e
gh pr diff 6450 --repo crewAIInc/crewAI > /tmp/pr.diff || true
wc -l /tmp/pr.diff
grep -n "Observation" /tmp/pr.diff | head -50

Length of output: 1711

🏁 Script executed:

#!/bin/bash
sed -n '1,168p' /tmp/pr.diff

Length of output: 7880

This confirms the fix addresses the edge case correctly — anchoring on "\nObservation:" (the real stop sequence) instead of a bare "Observation:" search avoids false positives when that word legitimately appears inside a single-line Action Input JSON payload. The new test_observation_substring_in_action_input_preserved test locks in exactly this behavior.

@ritsth Thanks for the detailed explanation and the fix in 886e830 — your reasoning makes sense. Anchoring on "\nObservation:" is indeed a more faithful match for the actual stop sequence (I18N_DEFAULT.slice("observation")), and it correctly sidesteps the case where the literal string "Observation:" appears inside a single-line JSON Action Input payload, since the fabricated observation is always on its own line per the ReAct format. This is a cleaner approach than what I suggested. The added test covers the edge case well.

✏️ Learnings added

Learnt from: ritsth
URL: https://github.com/crewAIInc/crewAI/pull/6450

Timestamp: 2026-07-03T19:18:55.512Z
Learning: In `lib/crewai/src/crewai/utilities/agent_utils.py`, the `process_llm_response` function's fabricated-Observation recovery logic (for models with `use_stop_words=False`) anchors truncation on the literal `"\nObservation:"` substring (with leading newline) rather than a bare `"Observation:"` search. This is intentional: `"\nObservation:"` mirrors the actual stop sequence crewAI configures (`I18N_DEFAULT.slice("observation")`), and since the fabricated Observation is always emitted on its own line per the ReAct format, this avoids false-positive truncation when the substring "Observation:" legitimately appears inside a single-line JSON `Action Input` payload. Test coverage for this edge case exists in `test_observation_substring_in_action_input_preserved` in `lib/crewai/tests/utilities/test_agent_utils.py`.

ritsth mentioned this pull request Jul 3, 2026

[BUG] 🐞Agent does not actually invoke tools, only simulates tool usage with fabricated output #3154

Closed

ritsth wants to merge 2 commits into crewAIInc:main from ritsth:fix/recover-real-action-from-fabricated-observation
