
fix(update): honor yaml source_root so update stops mass-deleting the index #320

Merged
HumanBean17 merged 1 commit into master from fix/update-source-root-resolution
Jun 14, 2026
Conversation

@HumanBean17

Owner

Summary

java-codebase-rag update resolved source_root differently from every other operator command. For the documented nested-config layout, it pointed cocoindex at the config subdir (no Java) while index_dir still pointed at the real, fully populated index, so cocoindex treated every indexed file as removed and mass-deleted them.

Root cause

run_update passed the discovered config dir as an explicit source_root to resolve_operator_config (installer.py):

cfg = resolve_operator_config(source_root=project_root, cli_index_dir=None)

A non-None source_root routes into the explicit-override branch, which skips the YAML source_root field (config.py). With a config in my-project-context/ that sets source_root: ../, update resolved source_root to my-project-context/ (which holds only the YAML) but index_dir to the real index one level up (../.java-codebase-rag). cocoindex, told that the source had no Java but aimed at a full index, began deleting every vector.
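
The two branches described above can be illustrated with a minimal sketch. The function below is a hypothetical simplification, not the project's resolve_operator_config: its name, parameters, and fallback are assumptions chosen only to show why a non-None source_root bypasses the YAML field.

```python
from pathlib import Path
from typing import Optional


def resolve_source_root_sketch(
    config_dir: Path,
    yaml_source_root: Optional[str],
    explicit_source_root: Optional[Path],
) -> Path:
    """Hypothetical stand-in for the source_root branch logic (not the real API)."""
    if explicit_source_root is not None:
        # Explicit-override branch: the YAML field is never consulted.
        return explicit_source_root.resolve()
    if yaml_source_root is not None:
        # YAML branch: a relative value is resolved against the config's directory.
        return (config_dir / yaml_source_root).resolve()
    # Assumed fallback: treat the config directory itself as the source root.
    return config_dir.resolve()
```

Under these assumptions, calling the sketch with config_dir set to my-project-context/ and a YAML value of ../ returns the config dir when that same dir is passed as explicit_source_root (the pre-fix behaviour of update), and returns the parent directory when explicit_source_root is None (the post-fix behaviour).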

This is the identical bug class #316 fixed for the MCP server — its own docstring warns that a non-None source_root skips the YAML field. run_update (added in #290, before #316 existed) was the last production caller still passing a discovered dir.

Symptoms this explains

  • increment = clean 6 s no-op, but update's identical index phase hung 5 min+ at "Updating index (Lance + graph)…".
  • Lance _deletions grew monotonically (the mass row-deletions).
  • After ctrl+C left cocoindex.db mid-reconcile, the next increment (which resolves the root correctly) re-embedded nearly everything → 1000 s+.

The fix

One line — pass source_root=None so the YAML source_root is honored exactly like increment/init/reprocess:

cfg = resolve_operator_config(source_root=None, cli_index_dir=None)

The discover_project_root(cwd) no-config guard above it is unchanged. run_install is unaffected (it passes the user-confirmed Java root). run_update is now the only production caller of resolve_operator_config besides the (already-fixed) MCP server and the CLI, and all honor the YAML field consistently.

Validation (TDD)

  • New regression test test_update_honors_yaml_source_root_for_nested_config_dir mirrors the reported layout, captures the env handed to cocoindex, asserts JAVA_CODEBASE_RAG_SOURCE_ROOT = the YAML root (not the config dir) and JAVA_CODEBASE_RAG_INDEX_DIR = the real index. Watched it fail RED (SOURCE_ROOT off by one level), then pass GREEN.
  • .venv/bin/ruff check . — clean.
  • .venv/bin/python -m pytest tests -q: 775 passed, 11 skipped (heavy-gated). No regressions.
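
The capture-and-assert pattern of the regression test can be sketched as follows. This is not the actual test: update_stand_in and check_nested_layout are hypothetical names, the launcher callback replaces the real cocoindex invocation, and the placement of the index in .java-codebase-rag under the source root is an assumption taken from the layout described above. Only the two environment variable names come from this PR.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def update_stand_in(
    config_dir: Path,
    yaml_source_root: str,
    launch: Callable[[Dict[str, str]], None],
) -> None:
    """Hypothetical stand-in for the fixed update path: resolve from YAML, then launch."""
    source_root = (config_dir / yaml_source_root).resolve()
    launch(
        {
            "JAVA_CODEBASE_RAG_SOURCE_ROOT": str(source_root),
            # Assumption: the index lives in .java-codebase-rag under the source root.
            "JAVA_CODEBASE_RAG_INDEX_DIR": str(source_root / ".java-codebase-rag"),
        }
    )


def check_nested_layout() -> Dict[str, str]:
    """Build the nested layout, capture the env, and assert on both variables."""
    captured: List[Dict[str, str]] = []
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp).resolve()
        config_dir = project / "my-project-context"
        config_dir.mkdir()
        (project / ".java-codebase-rag").mkdir()
        # The fake launcher records the env instead of starting a subprocess.
        update_stand_in(config_dir, "../", captured.append)
        env = captured[0]
        assert env["JAVA_CODEBASE_RAG_SOURCE_ROOT"] == str(project)
        assert env["JAVA_CODEBASE_RAG_INDEX_DIR"] == str(project / ".java-codebase-rag")
    return env
```

Injecting the launcher keeps the check fast and free of any real indexing work, which is one plausible way to observe the handed-over environment without running cocoindex.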

User-visible behaviour changes

  • java-codebase-rag update now resolves source_root consistently with increment/init/reprocess/the MCP server. For the common case (config at the source root, or running from the source root) behaviour is unchanged. For a config living in a subdirectory of the Java tree (the my-context/ + source_root: ../ layout documented in fix(config): consistent index_dir/source_root resolution for CLI and MCP #316), update's index phase now operates on the correct source instead of mass-deleting the index.

  • No reindex required. No schema, ontology, embedding, or env-var change. Existing indexes remain valid.

Recovery for affected indexes

If an index was damaged by the bug (that is, update was run from a nested config dir before this fix), run the following from the config dir:

pkill -f "cocoindex update"; pkill -f "build_ast_graph"   # stop any stuck run
java-codebase-rag erase --yes                              # wipe corrupted index (config .yml preserved)
java-codebase-rag init                                     # re-create from correct source_root

increment/init/reprocess resolve the root correctly, so they are safe to run even before applying this fix; only update's index phase was affected.

🤖 Generated with Claude Code

HumanBean17 added a commit that referenced this pull request Jun 14, 2026
A pyarrow/lance worker thread (loaded via lancedb in lifecycle commands) can
outlive CPython finalization in a one-shot CLI subprocess and trip
PyGILState_Release (SIGABRT, exit -6). It's a thread-timing race — flaky —
and it intermittently red-blocked unrelated PRs: it killed the erase step of
test_cli_lifecycle_round_trip_init_increment_meta_erase on PR #320 (which
touches only installer.py), while the same test passed on green master #319.

Route the installed `java-codebase-rag` entry through _console_script_main,
which flushes stdout/stderr and os._exit(rc) instead of returning into the
racy teardown. main() stays return-based so in-process test callers keep
working.

Co-authored-by: Claude <noreply@anthropic.com>
… index

run_update passed the discovered config dir as an explicit source_root to
resolve_operator_config, routing it into the branch that SKIPS the YAML
source_root field. With a config living in a subdir next to
`source_root: ../`, update then indexed that subdir (no Java) against the
real index one level up, so cocoindex treated every indexed file as removed
and deleted them — the "Updating index (Lance + graph)..." hang, and the
ever-growing Lance `_deletions` + 1000s+ increment after a ctrl+C left
cocoindex.db mid-reconcile.

This is the same bug class #316 fixed for the MCP server (its docstring
warns that a non-None source_root skips the YAML field); run_update was the
last production caller still passing a discovered dir. Pass source_root=None
so the YAML source_root is honored exactly like increment/init/reprocess.
run_install is unaffected (it passes the user-confirmed Java root).

Adds a regression test mirroring the reported layout (config in
my-project-context/, source_root: ../, real index one level up) that
captures the env handed to cocoindex and asserts SOURCE_ROOT resolves to
the YAML root, not the config dir.

No schema, ontology, embedding, or env-var change. Existing indexes remain
valid; no reindex required.

Co-Authored-By: Claude <noreply@anthropic.com>
@HumanBean17 force-pushed the fix/update-source-root-resolution branch from 053de82 to a953461 June 14, 2026 18:34
@HumanBean17 HumanBean17 merged commit 3c3343b into master Jun 14, 2026
1 check passed
HumanBean17 added a commit that referenced this pull request Jun 15, 2026
Catch-up: master advanced (#322 installer cross_service_resolution,
#323 config embedding.model resolution, #325 version 0.6.2, #326 PR-1
progress.py) while the index-output-rework stack was based on #320.
This merges those in so the catch-up PR (#330) carries only PR-2/3/4.

Conflicts resolved (both add/add, feature branch is the superset):
- java_codebase_rag/progress.py  (master had PR-1 state; branch has
  PR-1 + CallbackRenderer/make_relay/build_index_progress_context)
- tests/test_progress.py         (master had PR-1's 14 tests; branch
  adds PR-2/3/4 tests)

Auto-merged cleanly: installer.py (#322 + PR-4), pyproject.toml
(version 0.6.2 + rich>=14,<15), tests/test_installer.py.

Verified: ruff clean; full suite 833 passed, 13 skipped (heavy-gated).

Co-Authored-By: Claude <noreply@anthropic.com>