Skip to content

fix: consolidate _default_db_path to utils.py — closes #40#71

Open
Wolfvin wants to merge 1 commit into
mainfrom
fix/40-default-db-path-dedup
Open

fix: consolidate _default_db_path to utils.py — closes #40#71
Wolfvin wants to merge 1 commit into
mainfrom
fix/40-default-db-path-dedup

Conversation

@Wolfvin

@Wolfvin Wolfvin commented Jun 28, 2026

Copy link
Copy Markdown
Owner

Summary

Consolidates the default-db-path logic from 4 duplicated sites into a single helper in scripts/utils.py. Pure refactor — no behavior change. The default path is still <workspace>/.codelens/codelens.db.

Before this PR, the same one-liner os.path.join(workspace, ".codelens", "codelens.db") lived in four places:

File Location Form
scripts/commands/graph_schema.py L29-31 _default_db_path() function
scripts/graph_model.py L161-163 _default_db_path() function (identical copy)
scripts/persistent_registry.py L132-133 inline os.path.join in PersistentRegistry.__init__
scripts/persistent_registry.py L858 inline os.path.join in db_exists()

Any future change to the default path — e.g., honoring a CODELENS_DB_PATH env var, or moving the db out of .codelens/ — had to be made in four places. Forgetting one would silently create a bug where graph-schema and graph_model queries hit different databases.

Why

Closes #40.

Single source of truth. The issue itself flags the drift risk: "Any future change to the default path ... must be made in both places. Forgetting one creates a silent bug where graph-schema and graph_model queries hit different databases."

This PR extracts the helper to utils.default_db_path(workspace: str) -> str and replaces all four call sites with imports / calls to it. Future changes to the path logic now happen in exactly one place.

Files changed

File Change
scripts/utils.py Added default_db_path(workspace: str) -> str helper with full docstring (single source of truth).
scripts/commands/graph_schema.py Deleted local _default_db_path def; added from utils import default_db_path; updated the one call site in get_graph_schema.
scripts/graph_model.py Replaced local _default_db_path def with from utils import default_db_path + a backward-compat alias _default_db_path = default_db_path (preserves the underscore-private symbol imported by tests/test_graph_model.py and tests/test_hybrid_type_resolver.py — both outside the file allowlist for this PR, so removing the symbol would have broken them). Internal call sites now use the public name.
scripts/persistent_registry.py __init__ and db_exists() now call default_db_path(workspace). DB_FILENAME = "codelens.db" constant preserved as a module-level export with a comment noting it's kept for backwards-compat with external importers (no longer used internally).
tests/test_graph_model.py Added TestDefaultDbPathConsistency.test_default_db_path_consistency — asserts all three call sites (utils, graph_model alias, graph_schema import) resolve to the same path for the same workspace, and that graph_schema.default_db_path is utils.default_db_path (same callable, not a re-definition). Also covers PersistentRegistry default and db_exists returning False for a non-existent workspace.

No other stacks or files touched. Total: 5 files, +115 / −17.

Architectural decisions

  1. Why keep graph_model._default_db_path as an alias instead of deleting it. The constraint "Hanya touch file: ... tests/test_graph_model.py" plus scope-discipline forbids modifying tests/test_hybrid_type_resolver.py, which imports _default_db_path from graph_model in 9 places. Deleting the symbol would break those imports and force touching a file outside the allow-list. The alias _default_db_path = default_db_path preserves the convention-based API while making utils.default_db_path the single source of truth for the path logic. The new test asserts the alias is the same callable, so any future drift (e.g. someone re-defining _default_db_path locally) will be caught.

  2. Why also update db_exists() in persistent_registry.py (not just __init__). The DoD explicitly calls out __init__, but db_exists() at L858 is a second path-construction site in the same file using the same os.path.join(workspace, ".codelens", DB_FILENAME) pattern. Leaving it untouched would defeat the single-source-of-truth goal (the issue text says "three sources of truth actually"). It's the same file already in scope, so updating it keeps the refactor consistent. Noted explicitly here for reviewer awareness.

  3. Why preserve DB_FILENAME. The constraint says "Preserve DB_FILENAME = "codelens.db" constant in persistent_registry.py if ada code lain yang pakai". A repo-wide grep shows it's no longer referenced anywhere internally after this refactor, but it's a module-level public symbol that external plugins/importers could depend on. Keeping it (with a comment) is the conservative choice; deleting it would be a separate breaking-API decision for the BOS to make.

Verification

1. Functional test — new TestDefaultDbPathConsistency proves the consolidation works

$ python3 -m pytest tests/test_graph_model.py::TestDefaultDbPathConsistency -v
============================== test session starts ==============================
platform linux -- Python 3.12.13, pytest-9.1.2, ...
collected 1 item

tests/test_graph_model.py::TestDefaultDbPathConsistency::test_default_db_path_consistency PASSED

The test asserts:

  • graph_model._default_db_path(ws) == utils.default_db_path(ws) (alias works)
  • graph_schema.default_db_path is utils.default_db_path (same callable, not a re-definition)
  • graph_schema.default_db_path(ws) == utils.default_db_path(ws) (resolves to same path)
  • PersistentRegistry(ws).db_path == utils.default_db_path(ws) (registry uses the helper)
  • db_exists(ws) is False for a non-existent workspace (helper doesn't crash)
  • Canonical path ends with .codelens/codelens.db (sanity)

2. All existing tests still pass — no regressions in any storage-related test

$ python3 -m pytest tests/test_graph_model.py tests/test_graph_incremental.py \
    tests/test_hybrid_type_resolver.py tests/test_persistent_registry.py \
    tests/test_persistent_registry_extra.py tests/test_registry.py
..................................................... [ 24%]
..................................................... [ 48%]
..................................................... [ 72%]
..................................................... [ 96%]
...........                                          [100%]
299 passed in 4.64s
$ python3 -m pytest tests/test_graph_model.py -v
============================== 21 passed in 1.27s ==============================

(20 existing + 1 new = 21.)

$ python3 -m pytest tests/test_cli.py
15 passed in 1.15s

Also ran broader engine + parser suites (architecture, circular, complexity, dataflow, deadcode, history, smell, search, secrets, semantic, perfhint, codelens, css/html/js/rust parsers, hybrid engine, vuln_staleness): 393 passed, 1 pre-existing failure (see "Out of scope" below), 12 skipped.

3. DoD #8python3 -c "from utils import default_db_path; print(default_db_path('/tmp/test'))"

$ cd scripts && python3 -c "from utils import default_db_path; print(default_db_path('/tmp/test'))"
/tmp/test/.codelens/codelens.db

4. DoD #7python3 scripts/codelens.py graph-schema works on test fixtures

$ python3 scripts/codelens.py graph-schema benchmarks/fixtures/clean_app
[CodeLens] Auto-detected workspace: .../clean_app
{
  "status": "ok",
  "workspace": ".../clean_app",
  "nodes": 0,
  "edges": 0,
  "node_types": {},
  "edge_types": {},
  "indexes": 0,
  "note": "database does not exist; run 'scan' first"
}

$ python3 scripts/codelens.py scan benchmarks/fixtures/clean_app
{..."outline_generated": true, "history_snapshot_saved": true...}

$ python3 scripts/codelens.py graph-schema benchmarks/fixtures/clean_app
{
  "status": "ok",
  "workspace": ".../clean_app",
  "nodes": 31,
  "edges": 134,
  "node_types": {"function": 30, "class": 1},
  "edge_types": {"CALLS": 97, "IMPORTS": 37},
  "indexes": 6
}

graph-schema works both pre-scan (zero-shaped) and post-scan (31 nodes / 134 edges / 6 indexes — same as before refactor). The .codelens/ artifacts were cleaned up after the test so they don't pollute the PR.

5. All modules import cleanly + alias is the same callable

$ python3 -c "
import sys; sys.path.insert(0, 'scripts')
import utils, graph_model, persistent_registry
from commands import graph_schema
print('graph_model._default_db_path is utils.default_db_path:',
       graph_model._default_db_path is utils.default_db_path)
print('graph_schema.default_db_path is utils.default_db_path:',
       graph_schema.default_db_path is utils.default_db_path)
"
graph_model._default_db_path is utils.default_db_path: True
graph_schema.default_db_path is utils.default_db_path: True

6. PEP 8 / line length

All lines I added respect the 120-character limit (verified by a per-line check on the diff). The 5 pre-existing 120+ char lines in scripts/utils.py (L354-362, in check_gitignore_health recommendations) are untouched by this PR — out of scope per scope-discipline.

7. Byte-compile check

$ python3 -m py_compile scripts/utils.py scripts/commands/graph_schema.py \
    scripts/graph_model.py scripts/persistent_registry.py tests/test_graph_model.py
$ echo $?
0

Out of scope (not fixed)

  • tests/test_vuln_staleness.py::test_vulnerable_app_has_stale_cache_info fails on both main and this branch — pre-existing date-based flake (WARNING: Built-in VULN_DB is 485 days old (last updated 2025-02-28)). Verified by checking out main and re-running the same test: same failure, same warning. Unrelated to this refactor.

Issue link

Closes #40: #40

The default-db-path logic was triplicated across:
  - scripts/commands/graph_schema.py:29-31  (_default_db_path)
  - scripts/graph_model.py:161-163          (_default_db_path)
  - scripts/persistent_registry.py:132-133  (__init__ os.path.join)
  - scripts/persistent_registry.py:858      (db_exists os.path.join)

All four sites returned os.path.join(workspace, '.codelens', 'codelens.db')
verbatim. Any future change (e.g. honoring CODELENS_DB_PATH env var, moving
the db out of .codelens/) had to be made in four places — easy to miss one
and create a silent bug where graph-schema and graph_model queries hit
different databases.

This refactor consolidates to a single source of truth:
utils.default_db_path(workspace: str) -> str, with a docstring.

Changes per file:
- scripts/utils.py: add default_db_path helper (single source of truth).
- scripts/commands/graph_schema.py: delete local _default_db_path; import
  default_db_path from utils; update call site.
- scripts/graph_model.py: replace local _default_db_path definition with
  import + backward-compat alias
  (preserves the underscore-private symbol used by existing tests and
  test_hybrid_type_resolver.py); internal call sites now use the public
  name.
- scripts/persistent_registry.py: __init__ and db_exists() now call
  default_db_path(workspace). DB_FILENAME constant preserved as a
  module-level export (no longer used internally; comment notes it's
  kept for backwards-compat with external importers).
- tests/test_graph_model.py: add TestDefaultDbPathConsistency test that
  asserts all three call sites resolve to the same path for the same
  workspace (and graph_schema imports the same callable, not a
  re-definition).

Pure refactor — no behavior change. The default path is still
<workspace>/.codelens/codelens.db.

Verification:
- python3 -m pytest tests/test_graph_model.py -v → 21/21 pass
  (20 existing + 1 new TestDefaultDbPathConsistency)
- python3 -m pytest tests/test_graph_model.py tests/test_graph_incremental.py
  tests/test_hybrid_type_resolver.py tests/test_persistent_registry.py
  tests/test_persistent_registry_extra.py tests/test_registry.py
  → 129/129 pass (no regressions in any storage-related test)
- python3 -m pytest tests/test_cli.py → 15/15 pass
- python3 -c "from utils import default_db_path; print(default_db_path('/tmp/test'))"
  → /tmp/test/.codelens/codelens.db
- python3 scripts/codelens.py graph-schema benchmarks/fixtures/clean_app
  → runs clean (zero-shaped before scan; 31 nodes/134 edges/6 indexes
  after scan — same as before refactor)
- All modified files byte-compile with py_compile (no syntax errors)
- All lines I added respect the 120-char limit (PEP 8 + repo convention)

Out of scope (not fixed):
- tests/test_vuln_staleness.py::test_vulnerable_app_has_stale_cache_info
  fails on both main and this branch — pre-existing date-based flake
  ('VULN_DB is 485 days old'), unrelated to this refactor.
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG-10] _default_db_path() duplicated in graph_schema.py and graph_model.py — divergence risk

1 participant