Turn git survey into a deprecated shim over git repo structure#6268

Draft
dscho wants to merge 8 commits into main from survey-pivot-into-git-repo

@dscho dscho commented Jun 6, 2026

git survey was always experimental, and git repo structure covers most of the same ground with a cleaner option surface and a stable output contract. This PR closes the remaining gap (annotated-tag breakdown, ref scoping, top-N paths by count/disk/inflated, and the corresponding configuration knob) and then turns git survey into a thin shim that warns about deprecation, translates its old command line into the equivalent git repo structure invocation, and re-execs the canonical command. Net result: one user-facing tool to maintain and to teach instead of two.

The intent is that scripts pinned to git survey keep working (a warning aside), and that operators have a single answer when they ask "how do I see what's making my repository large?". The survey.* configuration keys are intentionally dropped; the only one that mattered, survey.top, has a direct replacement in repo.structure.top.

Marking as a draft because the first chunk of commits (everything except the deprecation notice and the shim) makes sense to upstream on its own merits and may want to ship through gitgitgadget independently before this Git-for-Windows-only deprecation lands.

dscho added 8 commits June 5, 2026 23:33
Mirror what git survey already reports. Lightweight tags (pointing
straight at a commit, tree, or blob) and annotated tags (pointing at
an OBJ_TAG that is itself stored as a separate object) are different
things, and in many monorepo contexts the distinction is one that
git survey users routinely care about. Add an annotated_tags counter
to struct ref_stats, populate it in
count_references() by peeking at the ref OID's object type, and
expose it as a sub-row under Tags in the table output and as
references.tags.annotated.count in the machine-readable formats.
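
The counting rule described above can be modelled in isolation. This is a minimal sketch, not the code from this series: `enum obj_kind`, `struct tag_stats`, and `count_tag_ref()` are hypothetical stand-ins for Git's object types, `struct ref_stats`, and the tag branch of `count_references()`.

```c
#include <assert.h>
#include <stddef.h>

/* Object kinds a ref tip can name (illustrative, not Git's enum). */
enum obj_kind { KIND_COMMIT, KIND_TREE, KIND_BLOB, KIND_TAG };

/* Hypothetical stand-in for the tag-related part of struct ref_stats. */
struct tag_stats {
	size_t tags;           /* every ref under refs/tags/ */
	size_t annotated_tags; /* subset whose tip is a tag object */
};

/* Count one refs/tags/ ref, given the kind of object its OID names. */
static void count_tag_ref(struct tag_stats *stats, enum obj_kind tip_kind)
{
	assert(stats);
	stats->tags++;
	/* Only a tip that is itself a tag object makes the tag annotated. */
	if (tip_kind == KIND_TAG)
		stats->annotated_tags++;
}
```

A lightweight tag still increments `tags`, so the annotated count can be rendered as a sub-row of the total rather than as a separate top-level number.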

This is a step toward pivoting the standalone git survey command onto
git repo structure; it fills the first of the four feature gaps
documented in the assessment.

The tests in t1901 are widened to assert the new row and key.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
`git repo structure` walks every reference enumerated by
`refs_for_each_ref()` and feeds each reference's tip into the path
walk that produces the object counts. There is no way to scope the
inquiry to a subset of refs, even though that is the most common
need when an operator is investigating what part of the history is
driving cost: only branches, only release tags, only one remote's
view, etc.

Add a single `--ref-filter=<pattern>` option that, when given,
restricts both the reference count and the object walk to refs whose
full name matches one of the patterns. The option is repeatable;
multiple patterns form a union, so `--ref-filter='refs/heads/*'
--ref-filter='refs/tags/v*'` includes local branches and tags whose
short name starts with `v`. Patterns use `wildmatch()` with
`WM_PATHNAME` semantics so a `*` does not cross `/`, matching the
convention used by `git for-each-ref` positional arguments.
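
The union semantics can be sketched outside of Git as follows. Note the assumptions: POSIX `fnmatch()` with `FNM_PATHNAME` stands in for `wildmatch()` with `WM_PATHNAME` (the two are not identical in every corner case), and `ref_matches_filters()` is a hypothetical helper name, not a function from this series.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Return 1 if refname matches at least one pattern (union semantics),
 * or if no patterns were given at all (filtering disabled).
 * FNM_PATHNAME keeps `*` from matching across a `/`.
 */
static int ref_matches_filters(const char *refname,
			       const char *const *patterns, size_t nr)
{
	size_t i;

	assert(refname);
	if (!nr)
		return 1;
	for (i = 0; i < nr; i++)
		if (!fnmatch(patterns[i], refname, FNM_PATHNAME))
			return 1;
	return 0;
}
```

Under these semantics `refs/heads/*` accepts `refs/heads/main` but not `refs/heads/topic/x`, because the `*` stops at the slash.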

Choosing a single flexible filter, rather than a proliferation of
per-kind flags like `--branches`, `--tags`, `--remotes`, keeps the
option surface small and lets the same mechanism express
narrow selections the per-kind flags could not, such as "only release
tags" (`'refs/tags/v*'`) or "only one remote's branches"
(`'refs/remotes/origin/*'`). Without `--ref-filter`, behaviour is
unchanged: every ref `refs_for_each_ref()` enumerates contributes.

Both the reference counter and the path-walk seeding (via
`add_pending_oid()`) sit on the same callback, so an early return
when no pattern matches naturally excludes a ref from both. No
separate object-walk machinery is needed.

Cover the two interesting code paths with tests in t1901: a single
filter narrowing to branches, and two filters unioning to include
both branches and tags.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
`git survey` distinguishes itself from `git repo structure` largely by
its path-level reporting: in addition to whole-repo totals it lists the
paths whose object histories dominate the repository, ranked by raw
count, on-disk size, and inflated size, separately for trees and blobs.
That is often the most actionable output from `git survey`, since it
points an operator at the directories and files that should be reviewed
for cleanup, sparse-checkout exclusion, or rewriting.

`git repo structure` already drives the same path-walk traversal that
`git survey` uses to gather its per-path numbers; the callback simply
discards the path. Aggregate per-(path, type) summaries inside that
existing callback and add a bounded, descending-sorted "top-N" table
keyed by each of the three axes. Gate the feature behind a new
`--top=<n>` option, defaulting to 0, so unadorned invocations are
unaffected and pay no extra work for the top-N tracking.

Mirror the sort and eviction strategy from `builtin/survey.c`: keep an
array of at most N entries sorted from largest to smallest, walk it
from the bottom on each candidate, and shift entries down when a new
one belongs. Compared to `builtin/survey.c`, drop the void-pointer
indirection in the table data, type the comparator's arguments, and
fold the trivial comparators into the `(a > b) - (a < b)` idiom.
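
As an illustration of that strategy, here is a self-contained sketch of a bounded, descending-sorted table. The names `struct top_entry`, `struct top_table`, `cmp_u64()`, and `top_table_offer()` are hypothetical and the storage is caller-provided for brevity; the actual code in this series may differ, including in how ties are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One ranked entry; field names are illustrative, not Git's. */
struct top_entry {
	const char *path;
	uint64_t value;
};

/* Bounded table kept sorted from largest (index 0) to smallest. */
struct top_table {
	struct top_entry *items; /* caller-provided storage, `cap` slots */
	size_t nr, cap;
};

/* Three-way compare yielding -1, 0 or 1. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Offer a candidate; it is dropped if it does not make the top `cap`. */
static void top_table_offer(struct top_table *t, const char *path,
			    uint64_t value)
{
	size_t pos, i;

	assert(t->nr <= t->cap);
	if (!t->cap)
		return; /* a limit of zero tracks nothing */

	/* Walk up from the bottom past every entry the candidate beats. */
	pos = t->nr;
	while (pos > 0 && cmp_u64(value, t->items[pos - 1].value) > 0)
		pos--;
	if (pos == t->cap)
		return; /* table is full and the candidate beats nothing */

	/* Shift smaller entries down; the last one falls off when full. */
	if (t->nr < t->cap)
		t->nr++;
	for (i = t->nr - 1; i > pos; i--)
		t->items[i] = t->items[i - 1];
	t->items[pos].path = path;
	t->items[pos].value = value;
}
```

Each offer costs at most N comparisons and N moves, which stays cheap for the small N values a report like this typically uses.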

For the human-readable `table` output, extend the existing nested
bullet layout with two new top-level sections, `* Top trees` and
`* Top blobs`, each containing three sub-tables (`Top by count`,
`Top by disk size`, `Top by inflated size`). The path becomes the row
name and the relevant scalar becomes the value, reusing
`stats_table_count_addf` and `stats_table_size_addf` so units and
column alignment match the rest of the table.

For the `lines`/`nul` key-value formats, emit one
`objects.<type>.top.by_<axis>.<rank>.path=<path>` entry alongside an
`objects.<type>.top.by_<axis>.<rank>.<axis>=<value>` entry per ranked
path, so consumers can dispatch by axis without parsing the schema.
The root tree's path is the empty string as produced by the path-walk
machinery; preserve that as-is to stay faithful to the upstream
representation rather than fabricating a placeholder.
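
To make the key layout concrete, the following sketch formats the pair of entries for one ranked path in the newline-terminated flavour. It is only an illustration under assumptions: `format_top_entry()` is a hypothetical helper, ranks are assumed to start at 1, `<type>` is assumed to be `trees` or `blobs`, and any quoting of unusual path characters is left out.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the two entries for one ranked path into buf, each terminated
 * by '\n'. Returns snprintf()'s result, so a value >= len signals
 * truncation.
 * type: "trees" or "blobs"; axis: "count", "disk_size", "inflated_size".
 */
static int format_top_entry(char *buf, size_t len, const char *type,
			    const char *axis, unsigned rank,
			    const char *path, uint64_t value)
{
	assert(buf && type && axis && path);
	return snprintf(buf, len,
			"objects.%s.top.by_%s.%u.path=%s\n"
			"objects.%s.top.by_%s.%u.%s=%" PRIu64 "\n",
			type, axis, rank, path,
			type, axis, rank, axis, value);
}
```

Passing an empty `path` for the root tree yields a key with nothing after the `=`, which matches the decision above not to fabricate a placeholder.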

This is the first piece of folding `git survey`'s functionality into
`git repo structure`. Subsequent commits will add the corresponding
configuration knob and, eventually, turn `git survey` into a thin
deprecated shim over `git repo structure`.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The preceding commit added `--top=<n>` to `git repo structure`,
reporting the top-N paths per type ranked by count, on-disk size, and
inflated size. Cover the three behaviors that matter for that option:

  * Without `--top`, the key-value output emits no `top.*` keys, so
    existing parsers stay unaffected.

  * `--top=N` produces exactly N ranked entries on each of the six
    `objects.<type>.top.by_<axis>` axes (count/disk_size/inflated_size
    crossed with trees/blobs), and a constructed input where one blob
    is several orders of magnitude bigger than the other lets us
    assert the ordering on the disk-size and inflated-size axes.

  * A negative `--top` is rejected with a non-zero exit and a message
    naming the constraint, so a typo cannot silently degrade into the
    default zero.

Avoid grep patterns starting with `--` in these tests; grep would
parse such a pattern as an option rather than as a pattern.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
`git survey` exposes its `--top` default via `survey.top` so that a
site or per-repository operator can switch the detail tables on once
and have every subsequent invocation include them. Mirror those
ergonomics for `git repo structure` so that, as `git survey`'s
functionality is folded into `git repo structure`, the configuration
side of the migration story stays equivalent.

Add a small `git_config_int` callback bound to `repo.structure.top`
and invoke it before `parse_options()`, so a `--top=<N>` on the
command line cleanly overrides the configured default (including
`--top=0` to opt out of the detail tables when configuration enables
them). Reject negative configured values with the same wording as the
command-line guard, since `git_config_int()` happily returns negative
integers.
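
The precedence rule can be modelled without any Git internals. In this sketch `resolve_top()` is a hypothetical function: `configured` plays the role of the value read from `repo.structure.top` (0 when unset) and `cli` the value of `--top`, with NULL meaning the option was not given.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Resolve the effective top-N limit. The configured value is applied
 * first, then a command-line value (if any) overrides it, mirroring
 * "read config, then parse options". Returns -1 for a negative value
 * from either source, 0 on success.
 */
static int resolve_top(int configured, const int *cli, int *out)
{
	assert(out);
	if (configured < 0)
		return -1; /* config parsing accepts negatives; reject here */
	*out = configured;
	if (cli) {
		if (*cli < 0)
			return -1;
		*out = *cli; /* includes an explicit 0 to opt out */
	}
	return 0;
}
```

The distinction between "option not given" and "option given as 0" is what lets `--top=0` switch the tables off when configuration enables them.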

Document the new variable in a fresh `Documentation/config/repo.adoc`
and wire it into the alphabetical includes in `Documentation/config.adoc`
between `repack.adoc` and `rerere.adoc`. Cover the precedence
behaviour with a t1901 test: a configured value enables the tables by
default, and a command-line `--top=0` suppresses them again.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
`git survey` started life as an experimental scale-measurement tool;
the preceding commits give `git repo structure` the path-level detail
tables and ref-scoping mechanism that were `git survey`'s main draw,
so the two now overlap substantially. Plan the migration explicitly:
add a short notice at the top of the description making clear which
of `git survey`'s knobs map to which `git repo structure` option, and
state that a future release will turn `git survey` into a thin shim
over `git repo structure`.

Putting the notice in the description (rather than only the synopsis)
ensures it shows up in `git help survey` rendering before the reader
sees any option specifics, so an operator skimming the page learns
about the replacement before adopting any survey-specific flags.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
`git survey` was an experimental scale-measurement tool whose
distinctive features (ref-kind filters, top-N path tables) are now
all available in `git repo structure`. With ref scoping and
path-level reporting in place (commits "repo: filter the structure
scope via --ref-filter=<pattern>" and "repo: report top-N paths by
count, disk, and inflated size in structure"), there is no core
functionality `git survey` provides that `git repo structure` cannot.

Replace the 764-line `git survey` implementation with a roughly
hundred-line shim that:

  * Accepts the existing `git survey` command line so callers in
    scripts continue to parse without changes.
  * Emits a deprecation warning naming the replacement command, so
    interactive users learn about the migration target.
  * Translates the survey-specific knobs into the equivalent
    `git repo structure` invocation and re-execs the canonical
    command via `execv_git_cmd()`. Per-kind ref selectors fan out
    into the corresponding `refs/heads/*`, `refs/tags/*`, etc.
    `--ref-filter` patterns; `--top=<N>` is forwarded directly;
    `--all-refs` becomes the absence of any `--ref-filter`.
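
The translation step in the last bullet can be sketched as a pure function over one option at a time. This is an illustration, not the shim itself: `translate_survey_opt()` is a hypothetical name, the legacy spellings `--branches` and `--tags` are assumed, only the two patterns named above are covered, and the warning and the `execv_git_cmd()` re-exec are not modelled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Translate one legacy option into zero or one `git repo structure`
 * argument written to out. Returns 1 if an argument was produced,
 * 0 if the option maps to "no argument", -1 if it is not handled here.
 * The legacy option spellings are assumptions.
 */
static int translate_survey_opt(const char *opt, char *out, size_t len)
{
	static const struct {
		const char *opt, *pattern;
	} kinds[] = {
		{ "--branches", "refs/heads/*" },
		{ "--tags", "refs/tags/*" },
	};
	size_t i;

	assert(opt && out);
	for (i = 0; i < sizeof(kinds) / sizeof(kinds[0]); i++) {
		if (!strcmp(opt, kinds[i].opt)) {
			snprintf(out, len, "--ref-filter=%s", kinds[i].pattern);
			return 1;
		}
	}
	if (!strncmp(opt, "--top=", 6)) {
		snprintf(out, len, "%s", opt); /* forwarded verbatim */
		return 1;
	}
	if (!strcmp(opt, "--all-refs"))
		return 0; /* no --ref-filter means every ref contributes */
	return -1;
}
```

Keeping the mapping in a table makes it easy to see at a glance which selectors fan out into which patterns.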

Two survey options have no `git repo structure` counterpart:
`--verbose` controlled per-step trace output that the new command
does not emit, and `--[no-]detached` selected the detached HEAD,
which `git repo structure` does not enumerate separately. Both are
still accepted, and each produces a single warning, so old
invocations keep working while the absence of these knobs in `git
repo structure` is made visible.

Rewrite t8100 to assert the shim's contract: the deprecation
warning is printed, the output is byte-identical to a corresponding
`git repo structure` invocation, and the per-kind selector
translation produces the right `--ref-filter` pattern. The
preceding survey-specific output assertions (the multi-column
plaintext tables) no longer apply, since `git repo structure`'s
output format is now the canonical one and is covered by t1901.

The `survey.*` configuration keys (`survey.top`, `survey.progress`,
`survey.verbose`) are no longer honored by the shim. Only the most
useful knob, `survey.top`, has a replacement, added by the preceding
`repo.structure.top` work; users with `survey.top` set in config
should migrate to `repo.structure.top`. This is a
backward-incompatible removal, documented by the deprecation notice
in `git-survey.adoc`.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho force-pushed the survey-pivot-into-git-repo branch from dd46870 to d14deae Compare June 7, 2026 01:05