repomap: boost co-changed files in personalization via git history#5207
Open
De-Cri wants to merge 1 commit into
Files that are committed together often share implicit coupling that the
reference graph can't see: a base class and its subclasses, parallel
implementations, config paired with logic. In the reference graph the edge
goes from subclass -> base, so when the base is being edited, PageRank
distributes weight along the base's outgoing edges (its own dependencies)
and the subclasses never surface.
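A toy illustration of the direction problem. It assumes importer -> imported edges and uses a hand-rolled personalized PageRank (power iteration, with dangling nodes returning their mass through the personalization vector; as far as I know that is also networkx's default when `dangling` is unset). This is not aider's code, and the file names are hypothetical. Personalizing only `base.py` leaves both subclasses at exactly zero; giving one of them some personalization weight, which is what the co-change boost does, lifts it.

```python
def personalized_pagerank(edges, personalization, damping=0.85, iters=100):
    """Power-iteration PageRank with a personalization (teleport) vector.

    Dangling nodes (no outgoing edges) send their mass back according to the
    personalization vector, so the ranks always sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge} | set(personalization))
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    total = sum(personalization.values())
    teleport = {n: personalization.get(n, 0.0) / total for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}

    for _ in range(iters):
        new = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            if out[n]:
                # Follow outgoing edges only: importer -> imported.
                share = damping * rank[n] / len(out[n])
                for dst in out[n]:
                    new[dst] += share
            else:
                # Dangling node: redistribute by the personalization vector.
                for m in nodes:
                    new[m] += damping * rank[n] * teleport[m]
        rank = new
    return rank


# Hypothetical files; edges point from the importing file to the imported one.
EDGES = [
    ("subclass_a.py", "base.py"),
    ("subclass_b.py", "base.py"),
    ("base.py", "util.py"),
]

editing_base = personalized_pagerank(EDGES, {"base.py": 1.0})
print(editing_base)  # subclass_a.py and subclass_b.py stay at 0.0

boosted = personalized_pagerank(EDGES, {"base.py": 1.0, "subclass_a.py": 0.3})
print(boosted["subclass_a.py"] > 0)  # True once it has teleport weight
```

Nothing points into the subclasses, so the only way a random walk can reach them is by teleporting there, which is exactly the lever the personalization dict controls.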
_git_cochange_partners() runs git log --name-only -n500 for each active
file, counts file co-occurrences per commit, and returns a coupling score
(Jaccard rate * log frequency). The score is added to the personalization
dict before PageRank, using the same personalize multiplier as existing
signals. Files not already in the repo scan are ignored.
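A minimal sketch of the scoring step, assuming the log has already been parsed into one file list per commit that touched the target. The score follows the `rate * log(1 + count)` shape described in the PR; here `rate` is assumed to be `count / commits_touching_target`, which may not be the PR's exact denominator, and `min_count` is treated as an inclusive threshold. `cochange_scores` is a hypothetical name, and the history below is invented toy input.

```python
import math
from collections import Counter


def cochange_scores(target, commits, min_count=2):
    """Score files that change together with `target`.

    `commits` holds one file list per commit that touched `target`.
    Assumption: rate = count / len(commits); the PR's exact rate may differ.
    """
    if not commits:
        return {}  # no history: nothing to boost
    counts = Counter()
    for files in commits:
        for fname in set(files):  # count each file at most once per commit
            if fname != target:
                counts[fname] += 1
    scores = {}
    for fname, count in counts.items():
        if count >= min_count:  # assumption: inclusive threshold
            rate = count / len(commits)
            scores[fname] = rate * math.log(1 + count)
    return scores


# Invented toy history for base.py (not real repository data).
history = [
    ["base.py", "sub_a.py"],
    ["base.py", "sub_a.py", "sub_b.py"],
    ["base.py", "sub_b.py"],
    ["base.py", "sub_a.py"],
    ["base.py", "README.md"],
]
print(cochange_scores("base.py", history))  # README.md (1 co-change) is dropped
```

Presumably the `log(1 + count)` factor is there so that a pair seen 2 times out of 2 (score about 1.10) does not outrank a pair seen 80 times out of 100 (about 3.52) on rate alone.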
Gracefully returns {} if git is unavailable, the repo has no history,
or the command times out. No new dependencies, no new flags, no API change.
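The fallback behaviour can be sketched with `subprocess.run` and a timeout: every failure mode (git not installed, a directory that is not a repo, a repo with no commits yet, a slow log) collapses to an empty commit list, which the scorer then turns into `{}`. The helper names and the parsing of `--format=COMMIT` output are assumptions, not the PR's code.

```python
import subprocess


def parse_name_only_log(text):
    """Split `git log --name-only --format=COMMIT` output into per-commit file lists."""
    commits = []
    for line in text.splitlines():
        line = line.strip()
        if line == "COMMIT":  # marker printed once per commit
            commits.append([])
        elif line and commits:  # skip blank separator lines
            commits[-1].append(line)
    return commits


def git_log_commits(repo_root, rel_fname, max_commits=500, timeout=5.0):
    """Return file lists for recent non-merge commits touching rel_fname, or []."""
    cmd = [
        "git", "log", "--no-merges", f"-n{max_commits}",
        "--name-only", "--format=COMMIT", "--", rel_fname,
    ]
    try:
        proc = subprocess.run(
            cmd, cwd=repo_root, capture_output=True, text=True,
            timeout=timeout, check=True,
        )
    except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # OSError: git missing or bad cwd; CalledProcessError: not a repo or
        # no commits yet; TimeoutExpired: log took longer than `timeout` seconds.
        return []
    return parse_name_only_log(proc.stdout)
```

A tracked file literally named `COMMIT` would confuse this parser; a rarer marker string avoids that.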
Resolves: Aider-AI#5203
Closes #5203.
When editing `group.py` in brian2, the repomap correctly surfaces its dependencies (`codeobject.py`, `variables.py`, `base.py`). It does not surface `neurongroup.py` or `synapses.py`, even though both import from `group.py` and are almost certainly affected by any change to it. The reference graph edges go `neurongroup → group` and `synapses → group`. PageRank distributes weight along `group.py`'s outgoing edges, so the subclasses, which point toward it, never surface.

In 5 years of brian2 history, `group.py` has been committed alongside `synapses.py` 111 times and `neurongroup.py` 76 times. Git history sees the coupling that the reference graph misses.

Implementation
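New private method `_git_cochange_partners(abs_fname)`:

- Runs `git log --no-merges -n500 --name-only --format=COMMIT -- <file>`
- Returns `{rel_fname: rate * log(1 + count)}` for pairs above `min_count=2`
- Returns `{}` gracefully if git is unavailable, the repo has no history, or the command times out (5s)

In `get_ranked_tags()`, after the existing personalization loop, each partner's score is added to the personalization dict using the same `personalize` multiplier as the existing signals; partners not in the repo scan are skipped.

No new dependencies. No new flags. No API change. Purely additive to the existing ranking.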
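Adding the scores to the personalization dict, as the PR describes, amounts to a few lines. This sketch assumes personalization is keyed by the same relative paths the scorer returns and that `personalize` is the per-file multiplier the existing loop already uses; `add_cochange_boost` and its parameters are hypothetical names.

```python
def add_cochange_boost(personalization, partner_scores, repo_fnames, personalize):
    """Add scaled co-change scores into a PageRank personalization dict (in place)."""
    for fname, score in partner_scores.items():
        if fname not in repo_fnames:
            continue  # not part of the repo scan: ignore
        # Additive: a file that already has a personalization weight keeps it.
        personalization[fname] = personalization.get(fname, 0.0) + score * personalize
    return personalization


weights = add_cochange_boost(
    {"group.py": 10.0},                       # existing signal for the edited file
    {"synapses.py": 0.5, "deleted.py": 1.0},  # hypothetical co-change scores
    {"group.py", "synapses.py"},              # files found by the repo scan
    personalize=10.0,
)
print(weights)  # {'group.py': 10.0, 'synapses.py': 5.0}
```

Because the boost only adds teleport weight, files with no co-change history are ranked exactly as before.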
Benchmark
Ground truth: held-out recent commits. For each held-out commit that modified two or more source files, the tool is given one of those files: does it surface the others?
(brian2tools, 25 held-out commits, 233 training commits.)
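One way to score this held-out setup, sketched under assumptions: `suggest(fname)` stands in for whatever ranked file list the tool returns, commits are already filtered to source files, and the metric is mean recall@k over every (commit, query file) pair. The PR does not state its exact metric or `k`, so `heldout_recall` is illustrative only.

```python
def heldout_recall(heldout_commits, suggest, k=10):
    """Mean recall@k over (commit, query file) pairs.

    For each commit with 2+ files, query each file in turn and measure what
    fraction of the commit's other files appear in the top k suggestions.
    """
    recalls = []
    for files in heldout_commits:
        files = set(files)
        if len(files) < 2:
            continue  # nothing to recover from a single-file commit
        for query in files:
            others = files - {query}
            top_k = set(suggest(query)[:k])
            recalls.append(len(others & top_k) / len(others))
    return sum(recalls) / len(recalls) if recalls else 0.0


# Toy ranked suggestions per query file (invented).
table = {
    "a.py": ["b.py", "x.py"],
    "b.py": ["a.py", "c.py"],
    "c.py": ["z.py"],
    "d.py": ["e.py"],
    "e.py": [],
}
commits = [["a.py", "b.py", "c.py"], ["d.py", "e.py"], ["f.py"]]
print(heldout_recall(commits, lambda f: table.get(f, [])))  # 0.5
```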
The signal degrades gracefully on repos with little history — below ~50 commits there are few reliable pairs and the boost is near zero.