diff --git a/CHANGELOG.md b/CHANGELOG.md index 6a68bb8..68a7859 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,135 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [8.2.0] — Unreleased +### OSV Cache Staleness Flags + `cache_info` Output (issue #30) + +Phase 1 roadmap (#21) checklist item: "Fix vuln DB staleness (OSV.dev +API, update scheduler)". The OSV client had a 24h TTL cache with +`cleanup()` but **no staleness indicator in vuln-scan output** and +**no way to force a refresh** — agents consuming `vuln-scan` had no +way to know whether the cached CVE data was fresh or stale, and no +way to override the 24h TTL for a single run. + +This change adds three things: + +1. **`cache_info` block in vuln-scan output** — a new top-level key + in the `vuln-scan` JSON describing OSV cache freshness: + ```json + "cache_info": { + "last_refresh": "2026-06-28T10:00:00Z", + "age_hours": 23.5, + "ttl_hours": 24.0, + "is_stale": false, + "stale_packages": [] + } + ``` + `last_refresh` is the ISO 8601 UTC timestamp of the most-recent + cache entry among the packages queried in this run. `age_hours` is + its age. `is_stale` is `true` when any queried package's cache + entry is past TTL or missing. `stale_packages` lists the + `"name@version"` strings of stale/missing packages (sorted for + deterministic output). + +2. **`--refresh` flag** — `codelens vuln-scan --refresh` bypasses the + OSV cache and forces a fresh OSV.dev API call for every package. + The cache is updated with the new results. Silently ignored in + `--offline` mode (no network to refresh from). + +3. **`--max-age Nh` flag** — `codelens vuln-scan --max-age 6h` treats + cache entries older than 6 hours as stale for this run only, + re-fetching them from the API. The stored TTL is **not** modified + (per-run override only). Accepts `Nh` (hours), `Nm` (minutes), + `Ns` (seconds), `Nd` (days), or a bare integer (interpreted as + hours, unlike `--osv-ttl`, which takes seconds). 
`--max-age 0` is + rejected with an `invalid_argument` error (the value must be positive); at the API level, `max_age=0` treats every cached entry as stale, like `force_refresh=True`. + +Network calls happen only when `--refresh` is set OR the cache is +expired/missing/stale-per-`--max-age`, and never in `--offline` mode. Default behaviour (no flags) +is unchanged: cached entries within TTL are served from the cache. + +### Added (issue #30) + +- **`scripts/osv_client.py:OSVCache.peek(key)`** — New method. + Returns the raw `(response, timestamp, ttl)` tuple WITHOUT applying + the stored TTL or deleting the entry. This is what `--max-age` + relies on to apply a per-run TTL threshold without mutating stored + state. Corrupt entries (invalid JSON) are deleted and treated as + missing, matching `get()`'s behaviour. +- **`scripts/osv_client.py:OSVClient.query_packages(packages, + force_refresh=False, max_age=None)`** — New optional params. + `force_refresh=True` bypasses the cache entirely (issue #30 + `--refresh`); `max_age=N` (seconds) uses `peek()` to apply a + per-run TTL threshold (issue #30 `--max-age`). Behaviour is + unchanged when both are unset. +- **`scripts/osv_client.py:OSVClient._parse_cached_response(cached, + package)`** — New private helper. Factors the two-shape cache + parsing (list of vuln IDs vs list of full vuln dicts) out of + `query_packages` so both cache-reading paths (normal TTL and + `max_age`) share it. The inline parsing logic was + moved, not duplicated. +- **`scripts/osv_client.py:OSVClient.get_cache_info(packages)`** — + New method. Returns the `cache_info` dict described above. + Packages with unsupported ecosystems are skipped. Missing entries + are treated as stale. +- **`scripts/commands/vuln_scan.py:_parse_max_age(raw)`** — New + helper. Parses `--max-age` duration strings into seconds. +- **`scripts/commands/vuln_scan.py`** — New `--refresh` and + `--max-age` CLI flags.
+- **`tests/test_vuln_staleness.py`** — 39 tests across 7 classes + covering `_parse_max_age`, `OSVCache.peek`, `get_cache_info` + (empty/all-stale/all-fresh/mixed/sorted/ttl), `force_refresh` + (bypasses cache / uses cache / ignored offline), `max_age` + (old→stale / young→fresh / stored TTL unchanged / `0`=refresh), + end-to-end `scan_vulnerabilities` output on `clean_app` and + `vulnerable_app` fixtures, and CLI arg wiring. All network-free + (API calls mocked via `unittest.mock.patch.object`). + +### Changed (issue #30) + +- **`scripts/vulnscan_engine.py:scan_vulnerabilities()`** — Gains + `refresh` and `max_age` params, forwarded to + `osv_client.query_packages(force_refresh=, max_age=)`. Computes a + `cache_info` block after the OSV query (three code paths: success + → from `get_cache_info()`; no packages → empty shape; OSV + exception → empty shape with `error` field). The return dict now + includes a `cache_info` key. +- **`scripts/commands/vuln_scan.py:execute()`** — Validates + `--max-age` via `_parse_max_age()` before calling the engine. + Invalid `--max-age` returns a structured + `{status:'error', error:'invalid_argument', message:...}` dict + instead of raising. + +### Non-Breaking (issue #30) + +- The `cache_info` block is additive — no existing `vuln-scan` + output key is removed or renamed. Consumers who don't read + `cache_info` see no change. +- `scan_vulnerabilities()`'s new params (`refresh`, `max_age`) are + optional with defaults (`False`, `None`), so existing callers are + unaffected. +- `OSVClient.query_packages()`'s new params are optional with + defaults (`False`, `None`); existing callers (including + `query_single`, `batch_query`, and `scan_with_osv`) are + unaffected. +- `OSVCache.peek()` is a new method; no existing method's signature + or behaviour changes. +- Network behaviour is unchanged by default: the OSV API is only + contacted when `--refresh` is set OR a cache entry is expired / + missing / stale per `--max-age`. 
The default 24h TTL path is + behaviourally identical to the pre-issue-#30 code. +- `--refresh` is silently ignored in `--offline` mode (matches the + existing offline contract — no network calls are ever attempted + when `offline=True`). + +### Migration Notes for Agent Authors (issue #30) + +Agents that consume `vuln-scan` output can now check +`cache_info.is_stale` to decide whether to trust the cached CVE +results. If stale, re-run with `--refresh` (force fresh API calls +for all packages) or `--max-age 6h` (only re-fetch entries older +than 6 hours, cheaper than a full refresh). `stale_packages` lists +the specific packages that need attention. + ### Incremental Graph Update (issue #25) Previously, `scan --incremental` updated only the flat backend registry diff --git a/README.md b/README.md index a05b208..ea56752 100644 --- a/README.md +++ b/README.md @@ -121,7 +121,7 @@ python3 scripts/codelens.py query "myFunction" --lite | Command | Description | |---------|-------------| | `secrets [workspace] [--severity ...]` | Detect hardcoded API keys, passwords, tokens | -| `vuln-scan [workspace]` | Scan dependencies for known CVEs (OSV.dev + native audit) | +| `vuln-scan [workspace] [--severity ...] [--offline] [--osv-ttl N] [--refresh] [--max-age Nh]` | Scan dependencies for known CVEs (OSV.dev + native audit). `--refresh` bypasses the OSV cache and forces fresh API calls; `--max-age Nh` treats cache entries older than N hours as stale for this run only (issue #30). Output includes a `cache_info` block (`last_refresh`, `age_hours`, `ttl_hours`, `is_stale`, `stale_packages`) so agents can decide whether to trust the cached CVE data. 
| | `taint [workspace]` | Run AST-based taint analysis for vulnerability detection | | `dataflow [workspace] [--source] [--sink]` | Data flow taint analysis with cross-file call graph | | `env-check [workspace] [--var NAME]` | Audit environment variables | diff --git a/SKILL-QUICK.md b/SKILL-QUICK.md index 0d0869e..1d7f58c 100755 --- a/SKILL-QUICK.md +++ b/SKILL-QUICK.md @@ -45,7 +45,8 @@ $CLI list --limit 5 --offset 10 --format compact # → paginated + co | `debug-leak` | `{stats, top_leaks[], leaks_total}` | | `perf-hint` | `{risk, stats, top_hints[], hints_total}` | | `secrets` | `{risk, action, stats, top_findings[]}` | -| `a11y` / `css-deep` / `regex-audit` / `vuln-scan` | `{risk, stats, top_items[], recommendations[]}` | +| `a11y` / `css-deep` / `regex-audit` | `{risk, stats, top_items[], recommendations[]}` | +| `vuln-scan` | `{risk, stats, findings[], osv_stats, cache_info{last_refresh, age_hours, ttl_hours, is_stale, stale_packages[]}, recommendations[]}` — `cache_info.is_stale` tells agents whether to re-run with `--refresh` (issue #30) | | `taint` | `{status, stats, top_violations[], recommendations[]}` | | `guard` | `{status, risk, action, blocked_reason?}` | | `check` | `{status, exit_code, total_findings, critical_count}` | @@ -74,6 +75,7 @@ $CLI list --limit 5 --offset 10 --format compact # → paginated + co | "safe to rename?" | `refactor-safe` | | "production ready?" | `smell` → `complexity` → `debug-leak` → `secrets` | | "security audit" | `secrets` → `dataflow` → `env-check` → `vuln-scan` | +| "are CVE results fresh?" | `vuln-scan` → check `cache_info.is_stale` → if stale, re-run `vuln-scan --refresh` or `vuln-scan --max-age 6h` (issue #30) | | "taint analysis" | `taint` (AST) or `dataflow` (cross-file) | | "what to refactor?" | `smell` | | "too complex?" 
| `complexity` | @@ -126,7 +128,7 @@ $CLI list --limit 5 --offset 10 --format compact # → paginated + co `entrypoints` · `api-map` · `state-map` · `detect` · `handbook` · `diff [--git-aware]` · `dashboard` · `history` · `graph-schema` · `resolve-types` ### Security (5) -`secrets [--severity ...]` · `taint` (AST-based) · `dataflow [--source ...] [--sink ...]` (cross-file) · `vuln-scan` (OSV.dev + native audit) · `env-check [--var NAME]` +`secrets [--severity ...]` · `taint` (AST-based) · `dataflow [--source ...] [--sink ...]` (cross-file) · `vuln-scan [--offline] [--osv-ttl N] [--refresh] [--max-age Nh]` (OSV.dev + native audit; `--refresh` bypasses cache, `--max-age Nh` overrides per-run TTL, `cache_info` in output signals staleness — issue #30) · `env-check [--var NAME]` ### Quality (9) `smell [--categories ...] [--severity ...]` · `complexity [--name FN] [--threshold N] [--sort ...]` · `dead-code [--categories ...]` · `debug-leak [--category ...]` · `circular [--domain ...]` · `missing-refs` · `side-effect [--name FN]` · `perf-hint [--severity ...] [--category ...]` · `fix [--apply]` diff --git a/scripts/commands/vuln_scan.py b/scripts/commands/vuln_scan.py index 96fb006..6305295 100644 --- a/scripts/commands/vuln_scan.py +++ b/scripts/commands/vuln_scan.py @@ -1,9 +1,55 @@ """Vuln-scan command — Scan dependencies for known CVEs using OSV.dev + native audit tools.""" +import re + from vulnscan_engine import scan_vulnerabilities from commands import register_command +# --max-age accepts a duration string like "6h", "30m", "2d", or a bare +# integer (interpreted as hours). Returns the value in seconds. +_MAX_AGE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([hmsd]?)\s*$", re.IGNORECASE) +_MAX_AGE_UNITS = { + "": 3600, # bare number → hours (unlike --osv-ttl, which takes seconds) + "h": 3600, + "m": 60, + "s": 1, + "d": 86400, +} + + +def _parse_max_age(raw): + """Parse a --max-age duration string into seconds.
+ + Accepts forms like ``6h`` (6 hours), ``30m`` (30 minutes), ``2d`` + (2 days), ``90s`` (90 seconds), or a bare integer (interpreted as + hours, unlike ``--osv-ttl``, which takes seconds). + + Args: + raw: The raw string from argparse. May be None. + + Returns: + int number of seconds, or None if ``raw`` is None. + + Raises: + ValueError: If the string cannot be parsed or the duration is not positive. + """ + if raw is None: + return None + match = _MAX_AGE_RE.match(str(raw)) + if match is None: + raise ValueError( + f"invalid --max-age value {raw!r} — expected forms like " + f"'6h', '30m', '2d', '90s', or a bare integer (hours)" + ) + value = float(match.group(1)) + unit = match.group(2).lower() + seconds = int(value * _MAX_AGE_UNITS[unit]) + if seconds <= 0: + raise ValueError(f"--max-age must be positive, got {raw!r}") + return seconds + + def add_args(parser): parser.add_argument("workspace", nargs="?", default=None, help="Path to workspace root (auto-detected if omitted)") @@ -13,14 +59,34 @@ def add_args(parser): help="Skip OSV.dev API queries (use cached data only)") parser.add_argument("--osv-ttl", type=int, default=86400, help="OSV cache TTL in seconds (default: 86400 = 24h)") + parser.add_argument("--refresh", action="store_true", default=False, + help="Bypass OSV cache and force fresh API calls for every " + "package (issue #30). Updates the cache with new results. " + "Ignored in --offline mode.") + parser.add_argument("--max-age", dest="max_age", default=None, + help="Treat OSV cache entries older than this as stale for " + "this run only (issue #30). Examples: '6h' (6 hours), " + "'30m' (30 minutes), '2d' (2 days), '90s' (90 seconds), " + "or a bare integer (interpreted as hours). 
Overrides the " + "default 24h TTL for this run; does not change stored TTL.") def execute(args, workspace): + try: + max_age_seconds = _parse_max_age(getattr(args, "max_age", None)) + except ValueError as exc: + return { + "status": "error", + "error": "invalid_argument", + "message": str(exc), + } return scan_vulnerabilities( workspace, severity=args.severity, offline=args.offline, osv_ttl=args.osv_ttl, + refresh=args.refresh, + max_age=max_age_seconds, ) diff --git a/scripts/osv_client.py b/scripts/osv_client.py index fa70248..454accc 100644 --- a/scripts/osv_client.py +++ b/scripts/osv_client.py @@ -249,6 +249,51 @@ def get(self, key: str) -> Optional[List[Dict[str, Any]]]: finally: conn.close() + def peek(self, key: str) -> Optional[Tuple[List[Dict[str, Any]], float, int]]: + """Retrieve a cache entry WITHOUT TTL check or deletion. + + Unlike :meth:`get`, this method does not apply the stored TTL when + deciding whether to return the entry. Callers receive the raw + ``(response, timestamp, ttl)`` tuple and decide staleness themselves + — for example using a ``--max-age`` override (issue #30). + + Corrupt entries (invalid JSON) are deleted and treated as missing. + + Args: + key: Cache key (e.g., ``"npm|lodash|4.17.15"``) + + Returns: + Tuple of ``(response, timestamp, ttl)`` or ``None`` if the key is + not present or corrupt. ``timestamp`` is a Unix epoch float; + ``ttl`` is the stored TTL in seconds. 
+ """ + with self._lock: + conn = sqlite3.connect(self.db_path) + try: + cursor = conn.execute( + "SELECT response_json, timestamp, ttl FROM cache " + "WHERE package_ecosystem_version = ?", + (key,) + ) + row = cursor.fetchone() + if row is None: + return None + + response_json, timestamp, ttl = row + try: + response = json.loads(response_json) + except json.JSONDecodeError: + # Corrupt cache entry — delete and treat as missing + conn.execute( + "DELETE FROM cache WHERE package_ecosystem_version = ?", + (key,) + ) + conn.commit() + return None + return (response, timestamp, ttl) + finally: + conn.close() + def set(self, key: str, response: List[Dict[str, Any]], ttl: Optional[int] = None): """Cache an OSV API response. @@ -449,6 +494,8 @@ def query_single( def query_packages( self, packages: List[OSVPackage], + force_refresh: bool = False, + max_age: Optional[int] = None, ) -> List[OSVVulnerability]: """Query multiple packages against OSV.dev. @@ -457,6 +504,16 @@ def query_packages( Args: packages: List of OSVPackage objects to query + force_refresh: If True, bypass the OSV cache and force fresh + API calls for every package (issue #30 ``--refresh`` flag). + Silently ignored when ``self.offline`` is True (no network + available). Cached entries are still updated with new results. + max_age: Optional per-run TTL override in seconds. When set, + cached entries older than ``max_age`` are treated as stale + and re-fetched from the API for this run only (issue #30 + ``--max-age`` flag). The stored TTL is unchanged. Use + ``max_age=0`` to force-refresh all entries without the + ``force_refresh`` flag. Returns: List of OSVVulnerability objects (deduplicated) @@ -468,31 +525,48 @@ def query_packages( uncached_packages: List[OSVPackage] = [] uncached_keys: List[str] = [] - # Check cache first + # --refresh is meaningless in offline mode (no network to refresh + # from). Fall back to normal cache behaviour so users still get + # whatever cached data exists. 
+ effective_force_refresh = force_refresh and not self.offline + for pkg in packages: if pkg.ecosystem is None: continue # Not supported by OSV cache_key = pkg.cache_key() + + if effective_force_refresh: + # Issue #30 --refresh: bypass cache, force fresh API call. + uncached_packages.append(pkg) + uncached_keys.append(cache_key) + continue + + if max_age is not None: + # Issue #30 --max-age: apply a per-run TTL threshold using + # peek() so the stored entry (and its stored TTL) is left + # intact for future runs. + entry = self.cache.peek(cache_key) + if entry is None: + uncached_packages.append(pkg) + uncached_keys.append(cache_key) + continue + response, timestamp, _stored_ttl = entry + if (time.time() - timestamp) > max_age: + # Stale per --max-age — re-fetch + uncached_packages.append(pkg) + uncached_keys.append(cache_key) + continue + # Fresh per --max-age — use cached response + self._cache_hit_count += 1 + all_vulns.extend(self._parse_cached_response(response, pkg)) + continue + + # Normal mode — TTL-based cache.get() cached = self.cache.get(cache_key) if cached is not None: self._cache_hit_count += 1 - - # cached can be: - # 1. A list of vuln IDs (from /v1/querybatch cache) → fetch each detail - # 2. 
A list of full vuln dicts (from /v1/query fallback cache) → parse directly - if cached and isinstance(cached[0], str): - # List of vuln IDs → fetch details from cache/API - for vuln_id in cached: - vuln_detail = self._fetch_vuln_detail(vuln_id) - if vuln_detail is not None: - parsed = self._parse_single_vuln(vuln_detail, pkg) - if parsed is not None: - all_vulns.append(parsed) - else: - # List of full vuln dicts → parse directly - vulns = self._parse_osv_response(cached, pkg) - all_vulns.extend(vulns) + all_vulns.extend(self._parse_cached_response(cached, pkg)) else: uncached_packages.append(pkg) uncached_keys.append(cache_key) @@ -504,6 +578,47 @@ def query_packages( return all_vulns + def _parse_cached_response( + self, + cached: List[Any], + package: OSVPackage, + ) -> List[OSVVulnerability]: + """Parse a cached OSV response for a single package. + + The OSV cache stores two response shapes (both as JSON lists): + + 1. A list of vulnerability IDs (strings) — produced by the + ``/v1/querybatch`` endpoint. Each ID must be resolved to its + full detail via :meth:`_fetch_vuln_detail` (which itself uses + the cache). + 2. A list of full vulnerability dicts — produced by the + ``/v1/query`` fallback path. Parsed directly via + :meth:`_parse_osv_response`. + + Args: + cached: The cached JSON list (may be empty). + package: The OSVPackage this cache entry belongs to. + + Returns: + List of OSVVulnerability objects (possibly empty). 
+ """ + if not cached: + return [] + + if isinstance(cached[0], str): + # List of vuln IDs → fetch details from cache/API + results: List[OSVVulnerability] = [] + for vuln_id in cached: + vuln_detail = self._fetch_vuln_detail(vuln_id) + if vuln_detail is not None: + parsed = self._parse_single_vuln(vuln_detail, package) + if parsed is not None: + results.append(parsed) + return results + + # List of full vuln dicts → parse directly + return self._parse_osv_response(cached, package) + def batch_query( self, packages: List[OSVPackage], @@ -1106,6 +1221,90 @@ def _parse_affected( return (affected_str, fixed_str) + def get_cache_info( + self, + packages: List[OSVPackage], + ) -> Dict[str, Any]: + """Compute OSV cache freshness info for the queried packages. + + Implements the ``cache_info`` block requested in issue #30 so that + agents consuming ``vuln-scan`` output can decide whether to trust + the cached CVE data or trigger a refresh. + + The staleness assessment covers only the packages that were + actually queried in this run — other cache entries (from previous + scans of different packages) are ignored. A package counts as + stale if its cache entry is missing OR past the cache's stored + TTL. + + Args: + packages: List of OSVPackage objects that were queried in + this run (typically the result of + ``OSVQueryBuilder.build_from_workspace``). + + Returns: + Dict with the following keys: + + - ``last_refresh``: ISO 8601 UTC timestamp (``YYYY-MM-DDTHH:MM:SSZ``) + of the most recently written cache entry among the queried + packages, or ``None`` if no entries exist. + - ``age_hours``: Age in hours of that most-recent entry + (i.e., how long ago the cache was last refreshed for any + of the queried packages), or ``None`` if no entries exist. + - ``ttl_hours``: The cache TTL in hours (from + ``self.cache.ttl``), rounded to 2 decimals. + - ``is_stale``: ``True`` if any queried package's cache entry + is past TTL or missing. 
+ - ``stale_packages``: List of ``"name@version"`` strings for + stale or missing packages (sorted for deterministic output). + """ + now = time.time() + ttl_seconds = self.cache.ttl + ttl_hours = round(ttl_seconds / 3600.0, 2) + + latest_timestamp: Optional[float] = None + stale_packages: List[str] = [] + + for pkg in packages: + if pkg.ecosystem is None: + continue # Not supported by OSV — skip staleness check + + cache_key = pkg.cache_key() + entry = self.cache.peek(cache_key) + if entry is None: + # Missing cache entry → treat as stale (needs fetch). + stale_packages.append(f"{pkg.name}@{pkg.version}") + continue + + _response, timestamp, _stored_ttl = entry + if latest_timestamp is None or timestamp > latest_timestamp: + latest_timestamp = timestamp + + if (now - timestamp) > ttl_seconds: + stale_packages.append(f"{pkg.name}@{pkg.version}") + + # Deterministic ordering for stable test/output assertions. + stale_packages.sort() + + if latest_timestamp is not None: + last_refresh = time.strftime( + "%Y-%m-%dT%H:%M:%SZ", time.gmtime(latest_timestamp) + ) + age_hours = round((now - latest_timestamp) / 3600.0, 2) + else: + last_refresh = None + age_hours = None + + is_stale = bool(stale_packages) + + return { + "last_refresh": last_refresh, + "age_hours": age_hours, + "ttl_hours": ttl_hours, + "is_stale": is_stale, + "stale_packages": stale_packages, + } + # ─── Statistics ────────────────────────────────────────────── def get_stats(self) -> Dict[str, Any]: diff --git a/scripts/vulnscan_engine.py b/scripts/vulnscan_engine.py index 3402b4a..31aa10b 100755 --- a/scripts/vulnscan_engine.py +++ b/scripts/vulnscan_engine.py @@ -507,6 +507,8 @@ def scan_vulnerabilities( config: Optional[Dict] = None, offline: bool = False, osv_ttl: int = 86400, + refresh: bool = False, + max_age: Optional[int] = None, ) -> Dict[str, Any]: """ Scan dependency files for known vulnerabilities. 
@@ -522,9 +524,18 @@ def scan_vulnerabilities( "vulnscan.skip_audit_tools" options) offline: If True, skip OSV API queries (use cache only) osv_ttl: Cache TTL for OSV results in seconds (default 86400 = 24h) + refresh: If True, bypass the OSV cache and force fresh API calls + for every package (issue #30 --refresh flag). Ignored when + ``offline`` is True. + max_age: Optional per-run TTL override in seconds. When set, cached + OSV entries older than ``max_age`` are treated as stale and + re-fetched from the API for this run only (issue #30 --max-age + flag). The stored TTL is unchanged. Returns: - Dict with findings, stats, risk level, audit availability, and recommendations + Dict with findings, stats, risk level, audit availability, + recommendations, and a ``cache_info`` block (issue #30) describing + OSV cache freshness. """ workspace = os.path.abspath(workspace) @@ -548,6 +559,7 @@ def scan_vulnerabilities( ignore_packages: Set[str] = set() skip_audit: bool = False osv_stats: Optional[Dict[str, Any]] = None + cache_info: Optional[Dict[str, Any]] = None # Parse config if config: @@ -568,7 +580,11 @@ def scan_vulnerabilities( osv_packages = OSVQueryBuilder.build_from_workspace(workspace) if osv_packages: - osv_vulns = osv_client.query_packages(osv_packages) + osv_vulns = osv_client.query_packages( + osv_packages, + force_refresh=refresh, + max_age=max_age, + ) osv_findings = [v.to_finding() for v in osv_vulns] # Tag OSV findings so we can prioritize them @@ -584,13 +600,35 @@ def scan_vulnerabilities( } logger.info("OSV.dev: queried %d packages, found %d vulnerabilities", len(osv_packages), len(osv_findings)) + + # Issue #30: cache freshness info (computed AFTER the query + # so it reflects the post-query state — any package just + # fetched or refreshed is now fresh). 
+ cache_info = osv_client.get_cache_info(osv_packages) else: osv_stats = {"packages_queried": 0, "vulnerabilities_found": 0} logger.debug("OSV.dev: no packages to query") + # No packages → no cache entries to inspect. Still surface + # the cache_info block so consumers can rely on the shape. + cache_info = { + "last_refresh": None, + "age_hours": None, + "ttl_hours": round(osv_ttl / 3600.0, 2), + "is_stale": False, + "stale_packages": [], + } except Exception as exc: logger.warning("OSV.dev integration failed, continuing with native audit: %s", exc) osv_stats = {"error": str(exc)} + cache_info = { + "last_refresh": None, + "age_hours": None, + "ttl_hours": round(osv_ttl / 3600.0, 2), + "is_stale": False, + "stale_packages": [], + "error": str(exc), + } else: logger.debug("OSV.dev client not available (osv_client.py not importable)") @@ -717,6 +755,7 @@ def scan_vulnerabilities( "findings": findings[:200], # Cap to avoid explosion "audit_available": any_audit_available, "osv_stats": osv_stats, + "cache_info": cache_info, "recommendations": recommendations, } diff --git a/tests/test_vuln_staleness.py b/tests/test_vuln_staleness.py new file mode 100644 index 0000000..967c463 --- /dev/null +++ b/tests/test_vuln_staleness.py @@ -0,0 +1,513 @@ +""" +Tests for OSV cache staleness flags (issue #30). + +Covers the three deliverables of issue #30: + +1. ``cache_info`` block in vuln-scan output (``last_refresh``, ``age_hours``, + ``ttl_hours``, ``is_stale``, ``stale_packages``). +2. ``--refresh`` flag — bypasses the OSV cache and forces fresh API calls. +3. ``--max-age Nh`` flag — treats cache entries older than N hours as stale + for the current run only (stored TTL unchanged). + +Network access is never required: API calls are mocked via +``unittest.mock.patch.object(OSVClient, "_batch_query_api", ...)``. 
+""" + +import os +import shutil +import sqlite3 +import sys +import tempfile +import time +from argparse import Namespace +from unittest.mock import patch + +import pytest + +SCRIPT_DIR = os.path.join( + os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "scripts" +) +sys.path.insert(0, SCRIPT_DIR) + +from osv_client import DEFAULT_TTL, OSVCache, OSVClient, OSVPackage # noqa: E402 +from commands.vuln_scan import _parse_max_age # noqa: E402 +from commands import vuln_scan as vuln_scan_cmd # noqa: E402 +from vulnscan_engine import scan_vulnerabilities # noqa: E402 + +FIXTURES_DIR = os.path.join( + os.path.dirname(os.path.dirname(os.path.abspath(__file__))), + "benchmarks", + "fixtures", +) + + +# ─── Fixtures & helpers ──────────────────────────────────────── + + +@pytest.fixture +def tmp_workspace(): + """Provide a temp workspace dir, cleaned up after the test.""" + ws = tempfile.mkdtemp(prefix="codelens_vuln_test_") + yield ws + shutil.rmtree(ws, ignore_errors=True) + + +@pytest.fixture +def fresh_client(tmp_workspace): + """OSVClient in offline mode against an empty cache.""" + return OSVClient(workspace=tmp_workspace, ttl=DEFAULT_TTL, offline=True) + + +def _make_pkg(name="lodash", version="4.17.15", ecosystem="npm"): + """Build an OSVPackage with a supported ecosystem by default.""" + return OSVPackage(name=name, version=version, ecosystem=ecosystem) + + +def _fake_vuln(name="lodash", ecosystem="npm"): + """Minimal OSV vuln dict that ``_parse_single_vuln`` can handle.""" + return { + "id": "GHSA-test-test-test", + "summary": f"Test vulnerability for {name}", + "severity": [{"type": "CVSS_V3", "score": "7.5"}], + "affected": [ + { + "package": {"name": name, "ecosystem": ecosystem}, + "ranges": [ + { + "type": "SEMVER", + "events": [ + {"introduced": "0"}, + {"fixed": "4.17.21"}, + ], + } + ], + } + ], + "references": [], + } + + +def _set_cache_timestamp(cache, key, age_seconds): + """Rewrite a cache entry's timestamp to make it artificially old. 
+ + Used to simulate stale cache entries without waiting for real time + to pass. + """ + conn = sqlite3.connect(cache.db_path) + try: + conn.execute( + "UPDATE cache SET timestamp = ? " + "WHERE package_ecosystem_version = ?", + (time.time() - age_seconds, key), + ) + conn.commit() + finally: + conn.close() + + +# ─── _parse_max_age ──────────────────────────────────────────── + + +class TestParseMaxAge: + """``--max-age`` duration string parsing.""" + + @pytest.mark.parametrize( + "raw, expected", + [ + ("6h", 21600), + ("30m", 1800), + ("2d", 172800), + ("90s", 90), + ("48", 172800), # bare integer → hours (unlike --osv-ttl, which takes seconds) + ("1.5h", 5400), + ("1H", 3600), # case-insensitive unit + (" 12h ", 43200), # whitespace tolerated + ], + ) + def test_valid_forms(self, raw, expected): + assert _parse_max_age(raw) == expected + + def test_none_returns_none(self): + assert _parse_max_age(None) is None + + @pytest.mark.parametrize("raw", ["abc", "-5h", "", "h", "5x", "5hrs"]) + def test_invalid_raises_value_error(self, raw): + with pytest.raises(ValueError): + _parse_max_age(raw) + + +# ─── OSVCache.peek ───────────────────────────────────────────── + + +class TestOSVCachePeek: + """``OSVCache.peek()`` returns entries without TTL check or deletion.""" + + def test_missing_key_returns_none(self, tmp_workspace): + cache = OSVCache(tmp_workspace) + assert cache.peek("nonexistent|key|1.0.0") is None + + def test_returns_entry_tuple(self, tmp_workspace): + cache = OSVCache(tmp_workspace) + cache.set("npm|lodash|4.17.15", [{"id": "VULN-1"}], ttl=3600) + entry = cache.peek("npm|lodash|4.17.15") + assert entry is not None + response, timestamp, ttl = entry + assert response == [{"id": "VULN-1"}] + assert isinstance(timestamp, float) + assert ttl == 3600 + + def test_does_not_apply_ttl(self, tmp_workspace): + """An entry past its TTL should still be returned by ``peek()``.
+ + This is what distinguishes ``peek`` from ``get`` and is what + ``--max-age`` relies on to apply its own per-run threshold. + """ + cache = OSVCache(tmp_workspace, ttl=1) + cache.set("npm|lodash|4.17.15", [{"id": "VULN-1"}], ttl=1) + _set_cache_timestamp(cache, "npm|lodash|4.17.15", age_seconds=3600) + + # peek ignores TTL — entry is still returned + assert cache.peek("npm|lodash|4.17.15") is not None + # get applies TTL — same entry is treated as expired and deleted + assert cache.get("npm|lodash|4.17.15") is None + + def test_corrupt_json_returns_none_and_deletes(self, tmp_workspace): + cache = OSVCache(tmp_workspace) + # Insert a corrupt-JSON entry directly into the DB. + conn = sqlite3.connect(cache.db_path) + try: + conn.execute( + "INSERT OR REPLACE INTO cache " + "(package_ecosystem_version, response_json, timestamp, ttl) " + "VALUES (?, ?, ?, ?)", + ("npm|corrupt|1.0.0", "{not valid json", time.time(), 86400), + ) + conn.commit() + finally: + conn.close() + + assert cache.peek("npm|corrupt|1.0.0") is None + + # Corrupt entry should have been deleted. 
+ conn = sqlite3.connect(cache.db_path) + try: + row = conn.execute( + "SELECT COUNT(*) FROM cache " + "WHERE package_ecosystem_version = ?", + ("npm|corrupt|1.0.0",), + ).fetchone() + finally: + conn.close() + assert row[0] == 0 + + +# ─── OSVClient.get_cache_info ────────────────────────────────── + + +class TestGetCacheInfo: + """``OSVClient.get_cache_info()`` — the ``cache_info`` block (issue #30).""" + + def test_empty_packages(self, fresh_client): + info = fresh_client.get_cache_info([]) + assert info == { + "last_refresh": None, + "age_hours": None, + "ttl_hours": 24.0, + "is_stale": False, + "stale_packages": [], + } + + def test_no_cache_entries_all_stale(self, fresh_client): + """Packages with no cache entries are reported as stale.""" + pkgs = [ + _make_pkg("lodash", "4.17.15"), + _make_pkg("express", "4.17.0"), + ] + info = fresh_client.get_cache_info(pkgs) + assert info["is_stale"] is True + assert sorted(info["stale_packages"]) == ["express@4.17.0", "lodash@4.17.15"] + assert info["last_refresh"] is None + assert info["age_hours"] is None + assert info["ttl_hours"] == 24.0 + + def test_all_fresh_entries(self, fresh_client): + pkg = _make_pkg() + fresh_client.cache.set(pkg.cache_key(), [_fake_vuln()]) + info = fresh_client.get_cache_info([pkg]) + + assert info["is_stale"] is False + assert info["stale_packages"] == [] + assert info["last_refresh"] is not None + # ISO 8601 UTC with a trailing Z + assert info["last_refresh"].endswith("Z") + assert "T" in info["last_refresh"] + # age_hours should be small (entry was just written) + assert info["age_hours"] is not None + assert info["age_hours"] < 1.0 + assert info["ttl_hours"] == 24.0 + + def test_one_stale_entry(self, fresh_client): + fresh = _make_pkg("fresh", "1.0.0") + stale = _make_pkg("stale", "2.0.0") + fresh_client.cache.set(fresh.cache_key(), []) + fresh_client.cache.set(stale.cache_key(), []) + # Make 'stale' artificially old (48h, past the 24h TTL). 
+        _set_cache_timestamp(fresh_client.cache, stale.cache_key(), age_seconds=48 * 3600)
+
+        info = fresh_client.get_cache_info([fresh, stale])
+        assert info["is_stale"] is True
+        assert info["stale_packages"] == ["stale@2.0.0"]
+        # last_refresh & age_hours reflect the FRESH (most recent) entry
+        assert info["last_refresh"] is not None
+        assert info["age_hours"] < 1.0
+
+    def test_stale_packages_sorted(self, fresh_client):
+        pkgs = [
+            _make_pkg("zebra", "1.0.0"),
+            _make_pkg("alpha", "1.0.0"),
+            _make_pkg("mid", "1.0.0"),
+        ]
+        info = fresh_client.get_cache_info(pkgs)
+        assert info["stale_packages"] == [
+            "alpha@1.0.0",
+            "mid@1.0.0",
+            "zebra@1.0.0",
+        ]
+
+    def test_ttl_hours_reflects_cache_ttl(self, tmp_workspace):
+        client = OSVClient(workspace=tmp_workspace, ttl=3600, offline=True)
+        info = client.get_cache_info([])
+        assert info["ttl_hours"] == 1.0
+
+
+# ─── OSVClient.query_packages: force_refresh ───────────────────
+
+
+class TestForceRefresh:
+    """``--refresh`` flag bypasses the OSV cache."""
+
+    def test_force_refresh_bypasses_cache(self, tmp_workspace):
+        """With ``force_refresh=True``, cached entries are ignored and the API is hit."""
+        # Online client so force_refresh actually takes effect.
+        client = OSVClient(workspace=tmp_workspace, ttl=DEFAULT_TTL, offline=False)
+        pkg = _make_pkg()
+        # Pre-populate cache with a vuln that would be returned if the
+        # cache was consulted.
+        client.cache.set(pkg.cache_key(), [_fake_vuln()])
+
+        with patch.object(client, "_batch_query_api", return_value=[]) as mock_api:
+            result = client.query_packages([pkg], force_refresh=True)
+            # Cache was bypassed → only the mocked API (empty) contributed.
+            assert result == []
+            # API was called for the force-refreshed package.
+            mock_api.assert_called_once()
+
+    def test_no_force_refresh_uses_cache(self, tmp_workspace):
+        """Without ``force_refresh``, cached entries are used and the API is NOT hit."""
+        client = OSVClient(workspace=tmp_workspace, ttl=DEFAULT_TTL, offline=False)
+        pkg = _make_pkg()
+        client.cache.set(pkg.cache_key(), [_fake_vuln()])
+
+        with patch.object(client, "_batch_query_api", return_value=[]) as mock_api:
+            result = client.query_packages([pkg], force_refresh=False)
+            # Cache was used → vuln returned.
+            assert len(result) == 1
+            assert result[0].id == "GHSA-test-test-test"
+            # API was NOT called.
+            mock_api.assert_not_called()
+
+    def test_force_refresh_ignored_in_offline(self, tmp_workspace):
+        """In offline mode, ``--refresh`` is silently ignored (no network)."""
+        client = OSVClient(workspace=tmp_workspace, ttl=DEFAULT_TTL, offline=True)
+        pkg = _make_pkg()
+        client.cache.set(pkg.cache_key(), [_fake_vuln()])
+
+        with patch.object(client, "_batch_query_api", return_value=[]) as mock_api:
+            result = client.query_packages([pkg], force_refresh=True)
+            # Cache was used (force_refresh ignored in offline mode).
+            assert len(result) == 1
+            mock_api.assert_not_called()
+
+
+# ─── OSVClient.query_packages: max_age ─────────────────────────
+
+
+class TestMaxAge:
+    """``--max-age`` flag overrides TTL for the current run only."""
+
+    def test_max_age_marks_old_entry_stale(self, tmp_workspace):
+        """An entry older than ``max_age`` is re-fetched from the API."""
+        client = OSVClient(workspace=tmp_workspace, ttl=DEFAULT_TTL, offline=False)
+        pkg = _make_pkg()
+        client.cache.set(pkg.cache_key(), [_fake_vuln()])
+        # Make entry 10h old.
+        _set_cache_timestamp(client.cache, pkg.cache_key(), age_seconds=10 * 3600)
+
+        with patch.object(client, "_batch_query_api", return_value=[]) as mock_api:
+            # max_age=6h → entry is stale (10h > 6h) → API hit
+            result = client.query_packages([pkg], max_age=6 * 3600)
+            assert result == []  # mocked API returned nothing
+            mock_api.assert_called_once()
+
+    def test_max_age_keeps_fresh_entry(self, tmp_workspace):
+        """An entry younger than ``max_age`` is served from the cache."""
+        client = OSVClient(workspace=tmp_workspace, ttl=DEFAULT_TTL, offline=False)
+        pkg = _make_pkg()
+        client.cache.set(pkg.cache_key(), [_fake_vuln()])
+        # Make entry 10h old.
+        _set_cache_timestamp(client.cache, pkg.cache_key(), age_seconds=10 * 3600)
+
+        with patch.object(client, "_batch_query_api", return_value=[]) as mock_api:
+            # max_age=24h → entry is fresh (10h < 24h) → cache used
+            result = client.query_packages([pkg], max_age=24 * 3600)
+            assert len(result) == 1
+            assert result[0].id == "GHSA-test-test-test"
+            mock_api.assert_not_called()
+
+    def test_max_age_does_not_change_stored_ttl(self, tmp_workspace):
+        """``--max-age`` must NOT modify the stored TTL (per-run override only)."""
+        client = OSVClient(workspace=tmp_workspace, ttl=DEFAULT_TTL, offline=False)
+        pkg = _make_pkg()
+        client.cache.set(pkg.cache_key(), [_fake_vuln()], ttl=86400)
+        _set_cache_timestamp(client.cache, pkg.cache_key(), age_seconds=10 * 3600)
+
+        with patch.object(client, "_batch_query_api", return_value=[]):
+            client.query_packages([pkg], max_age=6 * 3600)
+
+        entry = client.cache.peek(pkg.cache_key())
+        assert entry is not None
+        _, _, stored_ttl = entry
+        assert stored_ttl == 86400
+
+    def test_max_age_zero_acts_like_refresh(self, tmp_workspace):
+        """``max_age=0`` treats every entry as stale (force-refresh equivalent)."""
+        client = OSVClient(workspace=tmp_workspace, ttl=DEFAULT_TTL, offline=False)
+        pkg = _make_pkg()
+        client.cache.set(pkg.cache_key(), [_fake_vuln()])
+        # Even a freshly-written entry (age ~0s) is stale per max_age=0.
+
+        with patch.object(client, "_batch_query_api", return_value=[]) as mock_api:
+            result = client.query_packages([pkg], max_age=0)
+            assert result == []
+            mock_api.assert_called_once()
+
+
+# ─── scan_vulnerabilities: cache_info in output ────────────────
+
+
+class TestScanVulnerabilitiesCacheInfo:
+    """End-to-end: ``scan_vulnerabilities()`` output includes ``cache_info``."""
+
+    def test_clean_app_no_deps_has_cache_info(self):
+        """``clean_app`` has no dependency files → empty ``cache_info``."""
+        fixture = os.path.join(FIXTURES_DIR, "clean_app")
+        result = scan_vulnerabilities(fixture, offline=True)
+        assert result["status"] == "ok"
+        assert "cache_info" in result
+        info = result["cache_info"]
+        # No packages queried → cache_info takes the empty shape.
+        assert info["is_stale"] is False
+        assert info["stale_packages"] == []
+        assert info["ttl_hours"] == 24.0
+
+    def test_vulnerable_app_has_stale_cache_info(self):
+        """``vulnerable_app`` has npm deps; offline mode → all stale."""
+        fixture = os.path.join(FIXTURES_DIR, "vulnerable_app")
+        result = scan_vulnerabilities(fixture, offline=True)
+        assert result["status"] == "ok"
+        assert "cache_info" in result
+        info = result["cache_info"]
+        # Offline mode → no cache entries written → all queried packages stale.
+        assert info["is_stale"] is True
+        assert len(info["stale_packages"]) > 0
+        # npm packages from vulnerable_app's package.json
+        assert "lodash@4.17.15" in info["stale_packages"]
+
+    def test_cache_info_shape(self):
+        """``cache_info`` dict has exactly the keys specified in issue #30."""
+        fixture = os.path.join(FIXTURES_DIR, "clean_app")
+        result = scan_vulnerabilities(fixture, offline=True)
+        info = result["cache_info"]
+        expected_keys = {
+            "last_refresh",
+            "age_hours",
+            "ttl_hours",
+            "is_stale",
+            "stale_packages",
+        }
+        assert set(info.keys()) == expected_keys
+
+    def test_cache_info_is_additive(self):
+        """Adding cache_info must not remove or rename existing output keys."""
+        fixture = os.path.join(FIXTURES_DIR, "clean_app")
+        result = scan_vulnerabilities(fixture, offline=True)
+        # Pre-issue-#30 output keys must still be present.
+        for key in (
+            "status",
+            "workspace",
+            "stats",
+            "risk",
+            "findings",
+            "audit_available",
+            "osv_stats",
+            "recommendations",
+        ):
+            assert key in result, f"missing pre-existing key: {key}"
+
+
+# ─── vuln-scan CLI: arg parsing & wiring ───────────────────────
+
+
+class TestVulnScanCLI:
+    """``--refresh`` and ``--max-age`` flags are parsed and wired through."""
+
+    def test_execute_passes_refresh_and_max_age(self, tmp_workspace):
+        """``execute()`` forwards ``--refresh`` and ``--max-age`` to the engine."""
+        args = Namespace(
+            workspace=None,
+            severity=None,
+            offline=True,
+            osv_ttl=86400,
+            refresh=True,
+            max_age="6h",
+        )
+        with patch.object(
+            vuln_scan_cmd, "scan_vulnerabilities", return_value={"status": "ok"}
+        ) as mock_scan:
+            result = vuln_scan_cmd.execute(args, tmp_workspace)
+            assert result == {"status": "ok"}
+            mock_scan.assert_called_once()
+            _, kwargs = mock_scan.call_args
+            assert kwargs.get("refresh") is True
+            assert kwargs.get("max_age") == 21600  # 6h in seconds
+
+    def test_execute_invalid_max_age_returns_error(self, tmp_workspace):
+        args = Namespace(
+            workspace=None,
+            severity=None,
+            offline=False,
+            osv_ttl=86400,
+            refresh=False,
+            max_age="not-a-duration",
+        )
+        result = vuln_scan_cmd.execute(args, tmp_workspace)
+        assert result["status"] == "error"
+        assert result["error"] == "invalid_argument"
+        assert "--max-age" in result["message"]
+
+    def test_execute_no_flags_passes_defaults(self, tmp_workspace):
+        args = Namespace(
+            workspace=None,
+            severity=None,
+            offline=False,
+            osv_ttl=86400,
+            refresh=False,
+            max_age=None,
+        )
+        with patch.object(
+            vuln_scan_cmd, "scan_vulnerabilities", return_value={"status": "ok"}
+        ) as mock_scan:
+            vuln_scan_cmd.execute(args, tmp_workspace)
+            _, kwargs = mock_scan.call_args
+            assert kwargs.get("refresh") is False
+            assert kwargs.get("max_age") is None