fix: re-validate interpolated output_file against untrusted inputs#6447
fix: re-validate interpolated output_file against untrusted inputs#6447theCyberTech wants to merge 3 commits into
Conversation
📝 WalkthroughWalkthroughAdds path sanitization and a runtime input validator for ChangesTask output_file path validation
Vulnerability Scan Advisory Update
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Task
participant Validator as _validate_output_file_input_values
participant Interpolator as interpolate_only
Caller->>Task: interpolate_inputs_and_add_conversation_history(inputs)
Task->>Validator: validate output_file template variables
alt unsafe value detected
Validator-->>Task: raise ValueError
else values safe
Validator-->>Task: pass
Task->>Interpolator: interpolate_only(output_file, inputs)
Interpolator-->>Task: sanitized output_file
end
Suggested reviewers: 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
The output_file field validator accepts {var} templates unchecked, and the
concrete path produced by interpolate_inputs_and_add_conversation_history was
assigned without re-validation. An untrusted crew.kickoff(inputs=...) value
could inject '..', an absolute path, or ~/$ expansion into a templated
output_file and write outside the working directory.
Validate the interpolated variable values (only those appearing in the
output_file template) for traversal, absolute paths, shell expansion, and
shell metacharacters before interpolation. The developer-authored template
(including an absolute base directory) stays trusted, so legitimate templated
paths are unaffected.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
246a8b3 to
541bb6c
Compare
nltk 3.9.4 path-traversal advisory (percent-encoded sequences bypassing the data.load()/find() check; incomplete fix for nltk#3504). 3.9.4 is the latest release and the advisory lists no fixed version, so the pin cannot be bumped to clear it. nltk is transitive via unstructured and crewai never calls nltk.data.load()/find() with untrusted input. Mirrors the existing PYSEC-2026-97 nltk ignore. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/vulnerability-scan.yml:
- Line 54: The vulnerability ignore list is out of sync between the workflow and
the pre-commit hook, so local runs can still block on PYSEC-2026-597. Update the
`--ignore-vuln` list in the `vulnerability-scan` workflow and the matching
pre-commit configuration so they both include the same advisory set, using the
existing ignore-list block comments as the place to keep them aligned.
In `@lib/crewai/src/crewai/task.py`:
- Around line 549-594: The current _validate_output_file_input_values method
only checks each placeholder value independently, so concatenated inputs in the
output_file template can still form path traversal or absolute paths. Update the
validation in task.py to also evaluate the fully interpolated output_file path
after substitution and ensure it remains contained within the trusted template
base, alongside the existing per-value checks. Use the _original_output_file
template and the interpolation logic around _validate_output_file_input_values
to locate the fix.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: e94f5828-a682-404c-a533-475716376fa7
📒 Files selected for processing (3)
.github/workflows/vulnerability-scan.ymllib/crewai/src/crewai/task.pylib/crewai/tests/test_task.py
| --ignore-vuln PYSEC-2024-277 \ | ||
| --ignore-vuln PYSEC-2026-89 \ | ||
| --ignore-vuln PYSEC-2026-97 \ | ||
| --ignore-vuln PYSEC-2026-597 \ |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Sync .pre-commit-config.yaml ignore list.
The pre-commit hook's own comment states "Keep this ignore list in sync with .github/workflows/vulnerability-scan.yml." but its --ignore-vuln list does not yet include PYSEC-2026-597. Without updating it there too, local pre-commit runs will still fail/block on this advisory even though CI now ignores it.
🔧 Suggested fix for .pre-commit-config.yaml
--ignore-vuln PYSEC-2026-89
--ignore-vuln PYSEC-2026-97
+ --ignore-vuln PYSEC-2026-597
--ignore-vuln PYSEC-2025-148Also applies to: 82-82
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/vulnerability-scan.yml at line 54, The vulnerability
ignore list is out of sync between the workflow and the pre-commit hook, so
local runs can still block on PYSEC-2026-597. Update the `--ignore-vuln` list in
the `vulnerability-scan` workflow and the matching pre-commit configuration so
they both include the same advisory set, using the existing ignore-list block
comments as the place to keep them aligned.
Source: Linked repositories
| def _validate_output_file_input_values( | ||
| self, inputs: dict[str, str | int | float | dict[str, Any] | list[Any]] | ||
| ) -> None: | ||
| """Reject untrusted input values that would escape the output path. | ||
|
|
||
| Only the variables that actually appear in the ``output_file`` template | ||
| are checked. The developer-authored template is trusted (it may contain | ||
| an absolute base directory), but a value substituted into it must not | ||
| introduce path traversal (``..``), an absolute path, a home/variable | ||
| expansion (``~``/``$``), or shell metacharacters that would redirect the | ||
| write outside the intended location. | ||
| """ | ||
| if not self._original_output_file: | ||
| return | ||
|
|
||
| template_vars = [ | ||
| part.split("}")[0] for part in self._original_output_file.split("{")[1:] | ||
| ] | ||
| for var in template_vars: | ||
| if var not in inputs: | ||
| continue | ||
| value = str(inputs[var]) | ||
| if ".." in value: | ||
| raise ValueError( | ||
| f"Invalid value for output_file variable '{var}': path " | ||
| "traversal sequences ('..') are not allowed" | ||
| ) | ||
| if value.startswith(("~", "$")): | ||
| raise ValueError( | ||
| f"Invalid value for output_file variable '{var}': shell " | ||
| "expansion characters are not allowed" | ||
| ) | ||
| if any(char in value for char in ["|", ">", "<", "&", ";"]): | ||
| raise ValueError( | ||
| f"Invalid value for output_file variable '{var}': shell " | ||
| "special characters are not allowed" | ||
| ) | ||
| if ( | ||
| PurePosixPath(value).is_absolute() | ||
| or PureWindowsPath(value).is_absolute() | ||
| ): | ||
| raise ValueError( | ||
| f"Invalid value for output_file variable '{var}': absolute " | ||
| "paths are not allowed" | ||
| ) | ||
|
|
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the target file and inspect the relevant range with line numbers.
git ls-files 'lib/crewai/src/crewai/task.py'
wc -l lib/crewai/src/crewai/task.py
sed -n '480,620p' lib/crewai/src/crewai/task.py | cat -n
# Also inspect any nearby references to output_file validation/commentary.
rg -n "output_file|validate_output_file|fully interpolated|re-validated" lib/crewai/src/crewai/task.pyRepository: crewAIInc/crewAI
Length of output: 9365
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1084,1110p' lib/crewai/src/crewai/task.py | cat -n
printf '\n---\n'
sed -n '1258,1285p' lib/crewai/src/crewai/task.py | cat -nRepository: crewAIInc/crewAI
Length of output: 2947
Validate the interpolated output_file path as a whole
Per-value checks can be bypassed when adjacent placeholders concatenate into .. or an absolute path, so the final interpolated path still needs a whole-path containment check against the trusted template base.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/crewai/src/crewai/task.py` around lines 549 - 594, The current
_validate_output_file_input_values method only checks each placeholder value
independently, so concatenated inputs in the output_file template can still form
path traversal or absolute paths. Update the validation in task.py to also
evaluate the fully interpolated output_file path after substitution and ensure
it remains contained within the trusted template base, alongside the existing
per-value checks. Use the _original_output_file template and the interpolation
logic around _validate_output_file_input_values to locate the fix.
There was a problem hiding this comment.
Pull request overview
This PR hardens Task.output_file templating so untrusted crew.kickoff(inputs=...) values can’t inject path traversal or absolute paths when interpolated into an output_file template.
Changes:
- Adds runtime validation for values interpolated into
Task.output_filetemplates to block traversal, absolute paths, shell expansion, and shell metacharacters. - Refactors shared literal-path checks into
Task._sanitize_output_file_path. - Adds tests covering unsafe/safe
output_fileinterpolation cases and updates vulnerability-scan ignore list with justification.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
lib/crewai/src/crewai/task.py |
Adds input-value validation for output_file interpolation and extracts shared sanitization logic. |
lib/crewai/tests/test_task.py |
Adds regression tests to ensure interpolated output_file rejects malicious inputs and allows safe ones. |
.github/workflows/vulnerability-scan.yml |
Ignores a newly reported transitive advisory with rationale in comments. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if ( | ||
| PurePosixPath(value).is_absolute() | ||
| or PureWindowsPath(value).is_absolute() | ||
| ): |
| ("{p}", {"p": "~/.bashrc"}, "shell expansion"), | ||
| ("{p}", {"p": "x;rm -rf /"}, "shell special characters"), | ||
| ("{p}", {"p": r"C:\Windows\evil"}, "absolute paths"), | ||
| ], |
Summary
Task.output_filesupports{var}templates that are filled fromcrew.kickoff(inputs=...). The field validator (output_file_validation) blocks.., absolute paths,~/$, and shell metacharacters — but it returns templated values unchecked, and the concrete path produced later byinterpolate_inputs_and_add_conversation_historyis assigned toself.output_filewithout re-validation (Task.model_configdoes not setvalidate_assignment=True).An untrusted
inputsvalue can therefore inject traversal / an absolute path / expansion into a templatedoutput_fileand write the task output outside the working directory:This is a live variant of the earlier
output_filepath-traversal fix (73f3288): that fix added the validator, but the template-interpolation path bypasses it.Fix
Validate the interpolated variable values (only those appearing in the
output_filetemplate) before interpolation, rejecting.., absolute paths (POSIX or Windows),~/$expansion, and shell metacharacters. The developer-authored template — including a legitimate absolute base directory — stays trusted, so existing templated-path behavior is unaffected.Task._validate_output_file_input_values(inputs)runs before output-path interpolation.Task._sanitize_output_file_path(definition-time behavior preserved).Tests
test_interpolate_output_file_rejects_unsafe_inputs— 5 malicious-input cases (traversal, absolute path,~expansion, shell char, Windows absolute) all raiseValueError.test_interpolate_output_file_allows_safe_inputs— safe relative templates and developer-chosen absolute base templates still work.tests/test_task.pygreen (73 passed), including the pre-existingtest_interpolate_inputs;ruffclean.