129 changes: 129 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,135 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [8.2.0] — Unreleased

### OSV Cache Staleness Flags + `cache_info` Output (issue #30)

Phase 1 roadmap (#21) checklist item: "Fix vuln DB staleness (OSV.dev
API, update scheduler)". The OSV client had a 24h TTL cache with
`cleanup()` but **no staleness indicator in vuln-scan output** and
**no way to force a refresh**. Agents consuming `vuln-scan` could not
tell whether the cached CVE data was fresh or stale, and could not
override the 24h TTL for a single run.

This change adds three things:

1. **`cache_info` block in vuln-scan output** — a new top-level key
   in the `vuln-scan` JSON describing OSV cache freshness:
   ```json
   "cache_info": {
     "last_refresh": "2026-06-28T10:00:00Z",
     "age_hours": 23.5,
     "ttl_hours": 24,
     "is_stale": false,
     "stale_packages": []
   }
   ```
   `last_refresh` is the ISO 8601 UTC timestamp of the most recent
   cache entry among the packages queried in this run. `age_hours` is
   its age in hours. `is_stale` is `true` when any queried package's
   cache entry is past TTL or missing. `stale_packages` lists the
   `"name@version"` strings of stale/missing packages (sorted for
   deterministic output).

2. **`--refresh` flag** — `codelens vuln-scan --refresh` bypasses the
OSV cache and forces a fresh OSV.dev API call for every package.
The cache is updated with the new results. Silently ignored in
`--offline` mode (no network to refresh from).

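3. **`--max-age Nh` flag** — `codelens vuln-scan --max-age 6h` treats
   cache entries older than 6 hours as stale for this run only,
   re-fetching them from the API. The stored TTL is **not** modified
   (per-run override only). Accepts `Nh` (hours), `Nm` (minutes),
   `Ns` (seconds), `Nd` (days), or a bare number (interpreted as
   hours; note that `--osv-ttl` takes seconds). Decimal values such as
   `1.5h` are truncated to whole seconds. The value must be positive:
   `--max-age 0` is rejected as an invalid argument, so use `--refresh`
   to bypass the cache entirely. (At the Python API level,
   `query_packages(max_age=0)` still behaves like `force_refresh=True`
   for cached entries.)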
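
How the `cache_info` fields relate to each other can be sketched as follows. This is a minimal illustration, not the PR's `get_cache_info()`: the helper name `build_cache_info`, its input shape (a dict of `"name@version"` to epoch timestamp, or `None` for a missing entry), and the strict "older than TTL" boundary are all assumptions.

```python
import time
from datetime import datetime, timezone


def build_cache_info(entries, ttl_seconds, now=None):
    """Summarise OSV cache freshness for the packages queried in one run.

    Hypothetical helper (not the PR's get_cache_info). ``entries`` maps
    "name@version" to the cache entry's epoch timestamp, or to None when
    the package has no cache entry.
    """
    now = time.time() if now is None else now
    # Missing entries and entries older than the TTL both count as stale.
    stale = sorted(
        key for key, ts in entries.items()
        if ts is None or now - ts > ttl_seconds
    )
    present = [ts for ts in entries.values() if ts is not None]
    newest = max(present) if present else None  # most recent cache write
    return {
        "last_refresh": (
            datetime.fromtimestamp(newest, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
            if newest is not None else None
        ),
        "age_hours": round((now - newest) / 3600, 2) if newest is not None else None,
        "ttl_hours": ttl_seconds / 3600,
        "is_stale": bool(stale),
        "stale_packages": stale,
    }
```

Note that `is_stale` can be `true` even when `age_hours` is small: `last_refresh` tracks the newest entry, while staleness is decided per package.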

Network calls happen only when `--refresh` is set OR the cache is
expired/missing/stale-per-`--max-age`. Default behaviour (no flags)
is unchanged: cached entries within TTL are served from the cache.
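
The per-package network decision described above can be summarised in one small function. This is a sketch under stated assumptions: `should_fetch` and its parameters are hypothetical names, and treating `max_age` as a replacement for the stored TTL (rather than a minimum of the two) follows the flag's "overrides the TTL for this run" description.

```python
import time


def should_fetch(cached_ts, stored_ttl, *, offline=False, force_refresh=False,
                 max_age=None, now=None):
    """Hypothetical sketch: decide whether one package needs an OSV API call.

    cached_ts:  epoch seconds of the cache entry, or None if it is missing.
    stored_ttl: TTL recorded with the entry, in seconds.
    max_age:    optional per-run threshold in seconds; used instead of
                stored_ttl for this decision only and never written back.
    """
    if offline:
        # Offline mode never touches the network, so --refresh is ignored.
        return False
    if force_refresh or cached_ts is None:
        # --refresh, or nothing cached: always ask the API.
        return True
    now = time.time() if now is None else now
    threshold = stored_ttl if max_age is None else max_age
    return now - cached_ts > threshold
```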

### Added (issue #30)

- **`scripts/osv_client.py:OSVCache.peek(key)`** — New method.
Returns the raw `(response, timestamp, ttl)` tuple WITHOUT applying
the stored TTL or deleting the entry. This is what `--max-age`
relies on to apply a per-run TTL threshold without mutating stored
state. Corrupt entries (invalid JSON) are deleted and treated as
missing, matching `get()`'s behaviour.
- **`scripts/osv_client.py:OSVClient.query_packages(packages,
force_refresh=False, max_age=None)`** — New optional params.
`force_refresh=True` bypasses the cache entirely (issue #30
`--refresh`); `max_age=N` (seconds) uses `peek()` to apply a
per-run TTL threshold (issue #30 `--max-age`). Behaviour is
unchanged when both are unset.
- **`scripts/osv_client.py:OSVClient._parse_cached_response(cached,
package)`** — New private helper. Factors the two-shape cache
parsing (list of vuln IDs vs list of full vuln dicts) out of
`query_packages` so all three code paths (normal, force_refresh,
max_age) share it. The inline parsing logic was moved, not
duplicated, so no dead code remains.
- **`scripts/osv_client.py:OSVClient.get_cache_info(packages)`** —
New method. Returns the `cache_info` dict described above.
Packages with unsupported ecosystems are skipped. Missing entries
are treated as stale.
- **`scripts/commands/vuln_scan.py:_parse_max_age(raw)`** — New
helper. Parses `--max-age` duration strings into seconds.
- **`scripts/commands/vuln_scan.py`** — New `--refresh` and
`--max-age` CLI flags.
- **`tests/test_vuln_staleness.py`** — 39 tests across 7 classes
covering `_parse_max_age`, `OSVCache.peek`, `get_cache_info`
(empty/all-stale/all-fresh/mixed/sorted/ttl), `force_refresh`
(bypasses cache / uses cache / ignored offline), `max_age`
(old→stale / young→fresh / stored TTL unchanged / `0`=refresh),
end-to-end `scan_vulnerabilities` output on `clean_app` and
`vulnerable_app` fixtures, and CLI arg wiring. All network-free
(API calls mocked via `unittest.mock.patch.object`).
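
The difference between `get()` and the new `peek()` can be illustrated with a toy in-memory cache. `TinyCache` and `is_fresh_for_run` are hypothetical stand-ins, not `OSVCache`'s real storage; the sketch only mirrors the behaviour listed above: `get()` applies the stored TTL and deletes expired rows, `peek()` returns the raw `(response, timestamp, ttl)` tuple untouched, and corrupt JSON is deleted and treated as missing by both.

```python
import json
import time


class TinyCache:
    """Hypothetical in-memory stand-in for OSVCache, showing get() vs peek().

    Responses are stored as JSON strings so the corrupt-entry path exists.
    """

    def __init__(self):
        self._rows = {}  # key -> (raw_json, timestamp, ttl)

    def set(self, key, response, ttl, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        self._rows[key] = (json.dumps(response), ts, ttl)

    def _decode(self, key):
        row = self._rows.get(key)
        if row is None:
            return None
        raw, ts, ttl = row
        try:
            return json.loads(raw), ts, ttl
        except json.JSONDecodeError:
            # Corrupt entry: delete it and report it as missing.
            del self._rows[key]
            return None

    def get(self, key, now=None):
        """Apply the stored TTL; expired entries are deleted."""
        decoded = self._decode(key)
        if decoded is None:
            return None
        response, ts, ttl = decoded
        now = time.time() if now is None else now
        if now - ts > ttl:
            del self._rows[key]
            return None
        return response

    def peek(self, key):
        """Return (response, timestamp, ttl) without TTL checks or deletion."""
        return self._decode(key)


def is_fresh_for_run(cache, key, max_age, now=None):
    """Per-run freshness check in the spirit of --max-age: reads via peek(),
    so neither the entry nor its stored TTL is modified."""
    row = cache.peek(key)
    if row is None:
        return False
    _, ts, _ = row
    now = time.time() if now is None else now
    return now - ts <= max_age
```

Reading through `peek()` is what lets a per-run threshold be looser or stricter than the stored TTL without side effects.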

### Changed (issue #30)

- **`scripts/vulnscan_engine.py:scan_vulnerabilities()`** — Gains
`refresh` and `max_age` params, forwarded to
`osv_client.query_packages(force_refresh=, max_age=)`. Computes a
`cache_info` block after the OSV query (three code paths: success
→ from `get_cache_info()`; no packages → empty shape; OSV
exception → empty shape with `error` field). The return dict now
includes a `cache_info` key.
- **`scripts/commands/vuln_scan.py:execute()`** — Validates
`--max-age` via `_parse_max_age()` before calling the engine.
Invalid `--max-age` returns a structured
`{status:'error', error:'invalid_argument', message:...}` dict
instead of raising.
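
The three `cache_info` paths in `scan_vulnerabilities()` can be sketched as below. The function name `run_osv_phase` is hypothetical, and the field values of the "empty shape" (`None` timestamps, `is_stale: False`) are assumptions; only the calls `query_packages(packages, force_refresh=, max_age=)` and `get_cache_info(packages)` come from the entries above.

```python
def run_osv_phase(osv_client, packages, ttl_seconds, refresh=False, max_age=None):
    """Hypothetical sketch of how the engine could pick its cache_info block.

    The "empty shape" field values below are assumptions for illustration.
    """
    empty = {
        "last_refresh": None,
        "age_hours": None,
        "ttl_hours": ttl_seconds / 3600,
        "is_stale": False,
        "stale_packages": [],
    }
    if not packages:
        # No packages: nothing was queried, so report the empty shape.
        return [], empty
    try:
        results = osv_client.query_packages(
            packages, force_refresh=refresh, max_age=max_age
        )
    except Exception as exc:
        # OSV failure: empty shape plus an error field.
        return [], {**empty, "error": str(exc)}
    # Success: freshness comes from the client's own cache bookkeeping.
    return results, osv_client.get_cache_info(packages)
```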

### Non-Breaking (issue #30)

- The `cache_info` block is additive — no existing `vuln-scan`
output key is removed or renamed. Consumers who don't read
`cache_info` see no change.
- `scan_vulnerabilities()`'s new params (`refresh`, `max_age`) are
optional with defaults (`False`, `None`), so existing callers are
unaffected.
- `OSVClient.query_packages()`'s new params are optional with
defaults (`False`, `None`); existing callers (including
`query_single`, `batch_query`, and `scan_with_osv`) are
unaffected.
- `OSVCache.peek()` is a new method; no existing method's signature
or behaviour changes.
- Network behaviour is unchanged by default: the OSV API is only
contacted when `--refresh` is set OR a cache entry is expired /
missing / stale per `--max-age`. The default 24h TTL path behaves
identically to the pre-issue-#30 code; the cached-response parsing
was moved into `_parse_cached_response()` without changing its logic.
- `--refresh` is silently ignored in `--offline` mode (matches the
existing offline contract — no network calls are ever attempted
when `offline=True`).

### Migration Notes for Agent Authors (issue #30)

Agents that consume `vuln-scan` output can now check
`cache_info.is_stale` to decide whether to trust the cached CVE
results. If stale, re-run with `--refresh` (force fresh API calls
for all packages) or `--max-age 6h` (only re-fetch entries older
than 6 hours, cheaper than a full refresh). `stale_packages` lists
the specific packages that need attention.
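
An agent-side policy for acting on `cache_info` might look like the sketch below. `plan_rerun`, `rerun_command`, and the `full_refresh_at` cut-over are hypothetical, an arbitrary example of choosing between `--refresh` and `--max-age`; the command shape is assumed from the README's `python3 scripts/codelens.py ...` usage.

```python
def plan_rerun(scan_result, max_age="6h", full_refresh_at=10):
    """Hypothetical agent-side policy: decide how to re-run vuln-scan.

    Returns None if the cached CVE data can be trusted, else the extra CLI
    flags for the next run. ``full_refresh_at`` is an arbitrary example
    threshold, not something codelens defines.
    """
    info = scan_result.get("cache_info") or {}  # absent on older output
    if not info.get("is_stale"):
        return None
    stale = info.get("stale_packages") or []
    if len(stale) >= full_refresh_at:
        return ["--refresh"]  # many stale entries: refetch everything
    return ["--max-age", max_age]  # few stale entries: cheaper re-fetch


def rerun_command(extra_flags, workspace="."):
    # Command shape assumed from the README's `python3 scripts/codelens.py` usage.
    return ["python3", "scripts/codelens.py", "vuln-scan", workspace, *extra_flags]
```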

### Incremental Graph Update (issue #25)

Previously, `scan --incremental` updated only the flat backend registry
2 changes: 1 addition & 1 deletion README.md
@@ -121,7 +121,7 @@ python3 scripts/codelens.py query "myFunction" --lite
| Command | Description |
|---------|-------------|
| `secrets [workspace] [--severity ...]` | Detect hardcoded API keys, passwords, tokens |
| `vuln-scan [workspace]` | Scan dependencies for known CVEs (OSV.dev + native audit) |
| `vuln-scan [workspace] [--severity ...] [--offline] [--osv-ttl N] [--refresh] [--max-age Nh]` | Scan dependencies for known CVEs (OSV.dev + native audit). `--refresh` bypasses the OSV cache and forces fresh API calls; `--max-age Nh` treats cache entries older than N hours as stale for this run only (issue #30). Output includes a `cache_info` block (`last_refresh`, `age_hours`, `ttl_hours`, `is_stale`, `stale_packages`) so agents can decide whether to trust the cached CVE data. |
| `taint [workspace]` | Run AST-based taint analysis for vulnerability detection |
| `dataflow [workspace] [--source] [--sink]` | Data flow taint analysis with cross-file call graph |
| `env-check [workspace] [--var NAME]` | Audit environment variables |
6 changes: 4 additions & 2 deletions SKILL-QUICK.md
@@ -45,7 +45,8 @@ $CLI list --limit 5 --offset 10 --format compact # → paginated + co
| `debug-leak` | `{stats, top_leaks[], leaks_total}` |
| `perf-hint` | `{risk, stats, top_hints[], hints_total}` |
| `secrets` | `{risk, action, stats, top_findings[]}` |
| `a11y` / `css-deep` / `regex-audit` / `vuln-scan` | `{risk, stats, top_items[], recommendations[]}` |
| `a11y` / `css-deep` / `regex-audit` | `{risk, stats, top_items[], recommendations[]}` |
| `vuln-scan` | `{risk, stats, findings[], osv_stats, cache_info{last_refresh, age_hours, ttl_hours, is_stale, stale_packages[]}, recommendations[]}` — `cache_info.is_stale` tells agents whether to re-run with `--refresh` (issue #30) |
| `taint` | `{status, stats, top_violations[], recommendations[]}` |
| `guard` | `{status, risk, action, blocked_reason?}` |
| `check` | `{status, exit_code, total_findings, critical_count}` |
@@ -74,6 +75,7 @@ $CLI list --limit 5 --offset 10 --format compact # → paginated + co
| "safe to rename?" | `refactor-safe` |
| "production ready?" | `smell` → `complexity` → `debug-leak` → `secrets` |
| "security audit" | `secrets` → `dataflow` → `env-check` → `vuln-scan` |
| "are CVE results fresh?" | `vuln-scan` → check `cache_info.is_stale` → if stale, re-run `vuln-scan --refresh` or `vuln-scan --max-age 6h` (issue #30) |
| "taint analysis" | `taint` (AST) or `dataflow` (cross-file) |
| "what to refactor?" | `smell` |
| "too complex?" | `complexity` |
@@ -126,7 +128,7 @@ $CLI list --limit 5 --offset 10 --format compact # → paginated + co
`entrypoints` · `api-map` · `state-map` · `detect` · `handbook` · `diff [--git-aware]` · `dashboard` · `history` · `graph-schema` · `resolve-types`

### Security (5)
`secrets [--severity ...]` · `taint` (AST-based) · `dataflow [--source ...] [--sink ...]` (cross-file) · `vuln-scan` (OSV.dev + native audit) · `env-check [--var NAME]`
`secrets [--severity ...]` · `taint` (AST-based) · `dataflow [--source ...] [--sink ...]` (cross-file) · `vuln-scan [--offline] [--osv-ttl N] [--refresh] [--max-age Nh]` (OSV.dev + native audit; `--refresh` bypasses cache, `--max-age Nh` overrides per-run TTL, `cache_info` in output signals staleness — issue #30) · `env-check [--var NAME]`

### Quality (9)
`smell [--categories ...] [--severity ...]` · `complexity [--name FN] [--threshold N] [--sort ...]` · `dead-code [--categories ...]` · `debug-leak [--category ...]` · `circular [--domain ...]` · `missing-refs` · `side-effect [--name FN]` · `perf-hint [--severity ...] [--category ...]` · `fix [--apply]`
66 changes: 66 additions & 0 deletions scripts/commands/vuln_scan.py
@@ -1,9 +1,55 @@
"""Vuln-scan command — Scan dependencies for known CVEs using OSV.dev + native audit tools."""

import re

from vulnscan_engine import scan_vulnerabilities
from commands import register_command


# --max-age accepts a duration string like "6h", "30m", "2d", or a bare
# number (interpreted as hours; note that --osv-ttl takes seconds).
# _parse_max_age() returns the value in seconds.
_MAX_AGE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([hmsd]?)\s*$", re.IGNORECASE)
_MAX_AGE_UNITS = {
    "": 3600,  # bare number → hours (unlike --osv-ttl, which is in seconds)
    "h": 3600,
    "m": 60,
    "s": 1,
    "d": 86400,
}


def _parse_max_age(raw):
    """Parse a --max-age duration string into seconds.

    Accepts forms like ``6h`` (6 hours), ``30m`` (30 minutes), ``2d``
    (2 days), ``90s`` (90 seconds), or a bare number (interpreted as
    hours; note that ``--osv-ttl`` takes seconds). Decimal values such
    as ``1.5h`` are truncated to whole seconds.

    Args:
        raw: The raw string from argparse. May be None.

    Returns:
        int number of seconds, or None if ``raw`` is None.

    Raises:
        ValueError: If the string cannot be parsed or does not resolve
            to a positive number of seconds.
    """
    if raw is None:
        return None
    match = _MAX_AGE_RE.match(str(raw))
    if match is None:
        raise ValueError(
            f"invalid --max-age value {raw!r} — expected forms like "
            f"'6h', '30m', '2d', '90s', or a bare integer (hours)"
        )
    value = float(match.group(1))
    unit = match.group(2).lower()
    seconds = int(value * _MAX_AGE_UNITS[unit])
    if seconds <= 0:
        raise ValueError(f"--max-age must be positive, got {raw!r}")
    return seconds


def add_args(parser):
    parser.add_argument("workspace", nargs="?", default=None,
                        help="Path to workspace root (auto-detected if omitted)")
@@ -13,14 +59,34 @@ def add_args(parser):
                        help="Skip OSV.dev API queries (use cached data only)")
    parser.add_argument("--osv-ttl", type=int, default=86400,
                        help="OSV cache TTL in seconds (default: 86400 = 24h)")
    parser.add_argument("--refresh", action="store_true", default=False,
                        help="Bypass OSV cache and force fresh API calls for every "
                             "package (issue #30). Updates the cache with new results. "
                             "Ignored in --offline mode.")
    parser.add_argument("--max-age", dest="max_age", default=None,
                        help="Treat OSV cache entries older than this as stale for "
                             "this run only (issue #30). Examples: '6h' (6 hours), "
                             "'30m' (30 minutes), '2d' (2 days), '90s' (90 seconds), "
                             "or a bare integer (interpreted as hours). Overrides the "
                             "default 24h TTL for this run; does not change stored TTL.")


def execute(args, workspace):
    try:
        max_age_seconds = _parse_max_age(getattr(args, "max_age", None))
    except ValueError as exc:
        return {
            "status": "error",
            "error": "invalid_argument",
            "message": str(exc),
        }
    return scan_vulnerabilities(
        workspace,
        severity=args.severity,
        offline=args.offline,
        osv_ttl=args.osv_ttl,
        refresh=args.refresh,
        max_age=max_age_seconds,
    )

