repomap: boost co-changed files in personalization via git history #5207

Open: De-Cri wants to merge 1 commit into Aider-AI:main from De-Cri:cochange-repomap-personalization

@De-Cri De-Cri commented Jun 1, 2026

Closes #5203.

When editing group.py in brian2, the repomap correctly surfaces its dependencies (codeobject.py, variables.py, base.py). It does not surface neurongroup.py or synapses.py, even though both import from group.py and are almost certainly affected by any change to it. The reference graph edges run from the importing file to the imported one: neurongroup → group and synapses → group. PageRank distributes weight along group.py's outgoing edges, toward its own dependencies, so the subclasses, which only point toward it, never surface.

In 5 years of brian2 history, group.py has been committed alongside synapses.py 111 times and neurongroup.py 76 times. Git history sees the coupling that the reference graph misses.
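
The ranking gap described above can be reproduced on a toy graph. The sketch below is a minimal, unweighted personalized PageRank written from scratch for illustration; it is not aider's ranking code, and the function name, damping value, and four-edge graph are assumptions. It shows that files which only point at the edited file receive no rank, and that adding a partner to the personalization dict is enough to surface it.

```python
def personalized_pagerank(edges, personalization, damping=0.85, iters=50):
    """Toy personalized PageRank by power iteration (unweighted edges)."""
    nodes = sorted({n for edge in edges for n in edge} | set(personalization))
    total = sum(personalization.values())
    # Teleport distribution: all restart mass goes to personalized nodes.
    teleport = {n: personalization.get(n, 0.0) / total for n in nodes}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    rank = dict(teleport)
    for _ in range(iters):
        new = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            if out[n]:
                # Rank flows along outgoing edges only.
                share = damping * rank[n] / len(out[n])
                for dst in out[n]:
                    new[dst] += share
            else:
                # Dangling node: hand its rank back via the teleport vector.
                for m in nodes:
                    new[m] += damping * rank[n] * teleport[m]
        rank = new
    return rank


# Hypothetical slice of the reference graph: importer -> imported file.
EDGES = [
    ("neurongroup.py", "group.py"),
    ("synapses.py", "group.py"),
    ("group.py", "codeobject.py"),
    ("group.py", "variables.py"),
]

# Only the edited file is personalized: the subclasses get no rank.
plain = personalized_pagerank(EDGES, {"group.py": 1.0})
# A co-change boost on synapses.py (made-up value) lets it surface.
boosted = personalized_pagerank(EDGES, {"group.py": 1.0, "synapses.py": 0.5})
```

In `plain`, neurongroup.py and synapses.py have no incoming edges and no restart mass, so their rank stays at zero; in `boosted`, synapses.py picks up rank directly from the personalization vector.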

Implementation

New private method _git_cochange_partners(abs_fname):

  • Runs git log --no-merges -n500 --name-only --format=COMMIT -- <file>
  • Counts file co-occurrences per commit
  • Returns {rel_fname: rate * log(1 + count)} for pairs above min_count=2
  • Returns {} if git is unavailable, the repo has no history, or the command times out (5s)
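
The counting and scoring steps above can be sketched as a pure function over the log text. This is an illustrative sketch, not the code in this PR: the name `cochange_scores` is hypothetical, `rate` is assumed to be the share of the target file's commits that also touch the partner, and "above min_count" is read as `count >= min_count`.

```python
import math
from collections import Counter


def cochange_scores(log_text, target, min_count=2):
    """Score co-change partners of `target` from `git log` text.

    Expects output shaped like `--name-only --format=COMMIT`: a literal
    COMMIT line per commit, followed by the file names it changed.
    """
    commits = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if line == "COMMIT":
            commits.append(set())      # start a new commit
        elif line and commits:
            commits[-1].add(line)      # file name within the current commit

    counts = Counter()
    for files in commits:
        counts.update(files - {target})

    n_commits = len(commits)
    scores = {}
    for partner, count in counts.items():
        if count >= min_count:         # assumption: threshold is inclusive
            rate = count / n_commits   # assumption: rate relative to target's commits
            scores[partner] = rate * math.log(1 + count)
    return scores


# Made-up log text for illustration only.
SAMPLE_LOG = """COMMIT

group.py
synapses.py

COMMIT

group.py
synapses.py
neurongroup.py

COMMIT

group.py
"""
scores = cochange_scores(SAMPLE_LOG, "group.py")
```

Here synapses.py appears in two of three commits and is scored, while neurongroup.py appears once and is dropped by the threshold. One detail to check when feeding real output into such a parser: git applies a pathspec to the listed names as well as to commit selection unless `--full-diff` is passed, so that flag may be needed for partner files to appear in the log at all.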

In get_ranked_tags(), after the existing personalization loop:

all_rel_fnames = {self.get_rel_fname(f) for f in fnames}
for fname in chat_fnames:
    for partner_rel, score in self._git_cochange_partners(fname).items():
        if partner_rel in all_rel_fnames:
            personalization[partner_rel] = (
                personalization.get(partner_rel, 0) + score * personalize
            )

No new dependencies. No new flags. No API change. Purely additive to the existing ranking.

Benchmark

Ground truth: held-out recent commits. For each held-out commit that modified 2+ source files, one file is given as the query and the benchmark checks whether the tool surfaces the others.

                      Recall@5
Without this change   24%
With this change      88%

(brian2tools, 25 held-out commits, 233 training commits.)
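
One way to compute such a metric is sketched below. The exact protocol behind the numbers above is not spelled out here, so this is an assumed reading: every file of a held-out commit is used as a query in turn, and recall is pooled over all (query, expected partner) pairs. The names `recall_at_k` and `rank_fn` are hypothetical; `rank_fn` stands in for whatever returns the tool's ranked file list for a query file.

```python
def recall_at_k(held_out_commits, rank_fn, k=5):
    """Fraction of co-changed files found in the top-k results.

    held_out_commits: iterable of file-name lists, one per commit.
    rank_fn: callable mapping a query file to a ranked list of files.
    """
    hits = 0
    total = 0
    for commit_files in held_out_commits:
        files = sorted(set(commit_files))
        if len(files) < 2:
            continue                   # single-file commits carry no signal
        for query in files:
            top = set(rank_fn(query)[:k])
            others = [f for f in files if f != query]
            hits += sum(1 for f in others if f in top)
            total += len(others)
    return hits / total if total else 0.0


# Made-up commits and rankings for illustration only.
commits = [["a.py", "b.py"], ["a.py", "c.py", "d.py"], ["x.py"]]
rankings = {"a.py": ["b.py", "c.py"], "b.py": ["a.py"], "c.py": ["a.py"], "d.py": []}
recall = recall_at_k(commits, lambda q: rankings.get(q, []))
```

On this toy input, 4 of the 8 expected partners are surfaced, so `recall` is 0.5.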

The signal degrades gracefully on repos with little history: below ~50 commits there are few reliable pairs and the boost is near zero.

Files that are committed together often share implicit coupling that the
reference graph can't see: a base class and its subclasses, parallel
implementations, config paired with logic. The reference graph edge goes
from subclass -> base, so when editing the base, PageRank distributes
weight along its outgoing edges (its own dependencies) and the subclasses
never surface.

_git_cochange_partners() runs git log --name-only -n500 for each active
file, counts file co-occurrences per commit, and returns a coupling score
(Jaccard rate * log frequency). The score is added to the personalization
dict before PageRank, using the same personalize multiplier as existing
signals. Files not already in the repo scan are ignored.

Gracefully returns {} if git is unavailable, the repo has no history,
or the command times out. No new dependencies, no new flags, no API change.

Resolves: Aider-AI#5203

CLAassistant commented Jun 1, 2026

CLA assistant check
All committers have signed the CLA.

Development

Successfully merging this pull request may close these issues.

Add git co-change signal to repomap personalization
