
feat(analyzer): detect insecure deserialization (AST10, TT6, DS1–DS4)#246

Open
AbhiramDwivedi wants to merge 1 commit into NVIDIA:main from AbhiramDwivedi:feat/insecure-deserialization-detection


@AbhiramDwivedi

Contributor

What & why

Closes #245. Adds insecure-deserialization detection (CWE-502; OWASP ASI05 – Unexpected Code Execution). Before this change, an RCE-class skill (e.g. PHP unserialize($_GET…)) was scanned as SAFE / 0.

Changes

  • behavioral_ast (AST10): flags pickle/cPickle/_pickle/marshal/dill/jsonpickle/joblib/pandas.read_pickle, plus argument-aware checks for yaml.load, torch.load, and numpy.load, so hardened forms (SafeLoader, weights_only=True, default allow_pickle=False) are not flagged. MEDIUM / 0.70.
  • behavioral_taint_tracking (TT6) — external or file input → deserialization sink; HIGH / 0.85 (the deserialization analogue of TT5). File-read sources are deliberately included: loading a bundled/downloaded blob is the classic skill vector.
  • static_patterns_deserialization (DS1–DS4) — new language-gated regex module for the non-Python scripts a skill may bundle: PHP unserialize (DS1), Ruby Marshal/restore (DS2), Ruby YAML/Psych/Oj (DS3), JS node-serialize/serialize-to-js/funcster (DS4). Registered in the analyzer registry; Python is intentionally excluded here (covered with AST/taint precision above).
  • pattern_defaults.py — new Insecure Deserialization category plus explanation / remediation / display-name / category metadata for every new rule.
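To illustrate what "argument-aware" means for AST10, here is a minimal sketch built on the standard `ast` module. It is not the code in this PR: the names `ALWAYS_UNSAFE`, `SAFE_YAML_LOADERS`, `is_unsafe_call`, and `find_unsafe_deserialization` are hypothetical, the call list is abbreviated, and import aliases other than `np` are not resolved.

```python
import ast

# Hypothetical, abbreviated rule data (assumption: not the PR's real tables).
ALWAYS_UNSAFE = {
    "pickle.load", "pickle.loads", "marshal.load", "marshal.loads",
    "dill.load", "dill.loads", "jsonpickle.decode", "joblib.load",
    "pandas.read_pickle",
}
SAFE_YAML_LOADERS = {"SafeLoader", "CSafeLoader"}


def _dotted(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def _kw(call, name):
    """Return the AST node passed for keyword `name`, or None if absent."""
    for kw in call.keywords:
        if kw.arg == name:
            return kw.value
    return None


def _is_true(node):
    return isinstance(node, ast.Constant) and node.value is True


def is_unsafe_call(call):
    name = _dotted(call.func)
    if name in ALWAYS_UNSAFE:
        return True
    if name == "yaml.load":
        # Loader may be passed by keyword or as the second positional argument.
        loader = _kw(call, "Loader")
        if loader is None and len(call.args) > 1:
            loader = call.args[1]
        loader_name = _dotted(loader)
        return loader_name is None or loader_name.split(".")[-1] not in SAFE_YAML_LOADERS
    if name == "torch.load":
        # Only an explicit weights_only=True is treated as hardened here.
        return not _is_true(_kw(call, "weights_only"))
    if name in ("numpy.load", "np.load"):
        # allow_pickle defaults to False, so only an explicit True is flagged.
        return _is_true(_kw(call, "allow_pickle"))
    return False


def find_unsafe_deserialization(source):
    """Return sorted (line, call name) pairs for calls judged unsafe."""
    hits = [
        (node.lineno, _dotted(node.func))
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call) and is_unsafe_call(node)
    ]
    return sorted(hits)
```

Under these assumptions, `yaml.load(text)` and `np.load(path, allow_pickle=True)` are reported, while `yaml.load(text, Loader=yaml.SafeLoader)`, `torch.load(path, weights_only=True)`, and a bare `np.load(path)` are not.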

Scope

Only the languages SkillSpector already supports — Python (deep), JS/TS, Ruby, PHP (breadth). Java/.NET are intentionally out of scope.
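The breadth coverage for the non-Python languages is regex-based and language-gated (DS1–DS4). A rough sketch of how gating by file extension could work is shown below. The rule table, the regexes, the extension map, and the `scan` function are all illustrative assumptions, not the patterns shipped in this PR.

```python
import re
from pathlib import PurePath

# Hypothetical rule table: rule id -> (languages it applies to, pattern).
# The regexes are simplified guesses for illustration only.
RULES = {
    "DS1": ({"php"}, re.compile(r"\bunserialize\s*\(")),
    "DS2": ({"ruby"}, re.compile(r"\bMarshal\.(?:load|restore)\b")),
    "DS3": ({"ruby"}, re.compile(r"\b(?:YAML|Psych)\.(?:load|unsafe_load)\b|\bOj\.load\b")),
    "DS4": (
        {"javascript"},
        re.compile(r"""require\(\s*['"](?:node-serialize|serialize-to-js|funcster)['"]\s*\)"""),
    ),
}

# Python is deliberately absent: it is covered by the AST and taint analyzers.
EXT_TO_LANG = {
    ".php": "php",
    ".rb": "ruby",
    ".js": "javascript",
    ".ts": "javascript",
}


def scan(filename, text):
    """Return the ids of rules whose language gate and pattern both match."""
    lang = EXT_TO_LANG.get(PurePath(filename).suffix.lower())
    if lang is None:
        return []
    return [
        rule_id
        for rule_id, (langs, pattern) in RULES.items()
        if lang in langs and pattern.search(text)
    ]
```

With this gating, the same `unserialize(` text matches DS1 in a `.php` file but yields nothing in a `.py` file, which keeps the regex rules from firing on languages they were not written for.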

Tests / validation

  • New unit tests: test_behavioral_ast.py, test_behavioral_taint_tracking.py, test_static_patterns_deserialization.py; test_registry.py updated for the new node.
  • make lint clean; ruff format --check clean; full unit suite: 1294 passed, 15 skipped, 6 xfailed.
  • E2E through the graph: a multi-language fixture → AST10 + TT6 + DS1 + DS2 + DS4, verdict DO_NOT_INSTALL / 90. Hardened Python forms (yaml.safe_load, weights_only=True) confirmed to produce no false positives.
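For readers unfamiliar with the TT6 finding in the fixture result above, the idea is a flow from external or file input into a deserialization sink. The toy sketch below shows that idea only. It assumes straight-line, module-level code, tracks plain variable assignments, and uses hypothetical `SOURCES` and `SINKS` sets; the taint engine in this PR is not shown here and may work quite differently.

```python
import ast

# Hypothetical source and sink sets (assumption: abbreviated for illustration).
SOURCES = {"input", "open", "requests.get", "urllib.request.urlopen"}
SINKS = {"pickle.load", "pickle.loads", "marshal.loads", "yaml.load"}


def _dotted(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def _is_tainted(expr, tainted):
    """True if expr contains a source call or reads a tainted variable."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Call) and _dotted(node.func) in SOURCES:
            return True
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
    return False


def find_tainted_deserialization(source):
    """Return (line, sink name) pairs where a tainted value reaches a sink."""
    tainted = set()
    findings = []
    # Top-level statements are visited in source order; no branches or calls
    # into other functions are followed in this sketch.
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign) and _is_tainted(stmt.value, tainted):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    tainted.add(target.id)
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and _dotted(node.func) in SINKS:
                if any(_is_tainted(arg, tainted) for arg in node.args):
                    findings.append((node.lineno, _dotted(node.func)))
    return findings
```

In this sketch a `pickle.loads` call on bytes read from a file is reported, while `pickle.loads(b'constant')` is not. That mirrors the split described under Changes: the bare call is an AST10-style finding, and the input-to-sink flow is what raises it to a TT6-style finding.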

All commits are DCO signed-off.

Closes the insecure-deserialization gap (OWASP ASI05 - Unexpected Code
Execution) across the analyzer stack:

- behavioral_ast (AST10): flags pickle / marshal / dill / jsonpickle /
  joblib / pandas.read_pickle, plus argument-aware yaml.load, torch.load,
  and numpy.load so the hardened forms (SafeLoader, weights_only=True,
  default allow_pickle=False) are not false-positived.
- behavioral_taint_tracking (TT6): external or file input -> deserialization
  sink, the RCE-class flow analogue of TT5.
- static_patterns_deserialization (DS1-DS4): language-gated regex breadth
  for the non-Python scripts a skill may bundle (PHP unserialize, Ruby
  Marshal/YAML/Oj, JS node-serialize/funcster).

Registers the new analyzer node, adds rule metadata (explanations,
remediations, category, pattern names), and ships unit tests for all rules
including hardened-form and language-gating negative cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Ram Dwivedi <abhiram.dwivedi@yahoo.com>
Development

Successfully merging this pull request may close these issues.

[Security] Insecure deserialization (pickle / PHP unserialize / Ruby Marshal / JS node-serialize) not detected — an RCE-class skill scores 0/SAFE
