Files
git/Documentation/config/repack.adoc
Taylor Blau 5ee86c273b repack: exclude cruft pack(s) from the MIDX where possible
In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
MIDX with '--write-midx' to ensure that the resulting MIDX was always
closed under reachability in order to generate reachability bitmaps.

While the previous patch added the '--stdin-packs=follow' option to
pack-objects, it is not yet on by default. Given that, suppose you have
a once-unreachable object packed in a cruft pack, which later becomes
reachable from one or more objects in a geometrically repacked pack.
That once-unreachable object *won't* appear in the new pack, since the
cruft pack was not specified as included or excluded when the
geometrically repacked pack was created with 'pack-objects
--stdin-packs' (*not* '--stdin-packs=follow', which is not on). If that
new pack is included in a MIDX without the cruft pack, then trying to
generate bitmaps for that MIDX may fail. This happens when the bitmap
selection process picks one or more commits which reach the
once-unreachable objects.

To mitigate this failure mode, commit ddee3703b3 ensures that the MIDX
will be closed under reachability by including cruft pack(s). If cruft
pack(s) were not included, we would fail to generate a MIDX bitmap. But
ddee3703b3 alludes to the fact that this is sub-optimal by saying

    [...] it's desirable to avoid including cruft packs in the MIDX
    because it causes the MIDX to store a bunch of objects which are
    likely to get thrown away.

, which is true, but hides an even larger problem. If repositories
rarely prune their unreachable objects and/or have many of them, the
MIDX must keep track of a large number of objects which bloats the MIDX
and slows down object lookup.

This is doubly unfortunate because the vast majority of objects in cruft
pack(s) are unlikely to be read. But any object lookups that go through
the MIDX must binary search over them anyway, slowing down object
lookups using the MIDX.

This patch causes geometrically-repacked packs to contain a copy of any
once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
allowing us to avoid including any cruft packs in the MIDX. This is
because a sequence of geometrically-repacked packs that were all
generated with '--stdin-packs=follow' are guaranteed to have their union
be closed under reachability.

Note that you cannot guarantee that a collection of packs is closed
under reachability if not all of them were generated with "following" as
above. One tell-tale sign that not all geometrically-repacked packs in
the MIDX were generated with "following" is to see if there is a pack in
the existing MIDX that is not going to be somehow represented (either
verbatim or as part of a geometric rollup) in the new MIDX.

If there is, then starting to generate packs with "following" during
geometric repacking won't work, since it's open to the same race as
described above.

But if you're starting from scratch (e.g., building the first MIDX after
an all-into-one '--cruft' repack), then you can guarantee that the union
of subsequently generated packs from geometric repacking *is* closed
under reachability.

(One exception here is when "starting from scratch" results in a noop
repack, e.g., because the non-cruft pack(s) in a repository already form
a geometric progression. Since we can't tell whether or not those were
generated with '--stdin-packs=follow', they may depend on
once-unreachable objects, so we have to include the cruft pack in the
MIDX in this case.)

Detect when this is the case and avoid including cruft packs in the MIDX
where possible. The existing behavior remains the default, and the new
behavior is available with the config 'repack.midxMustIncludeCruft' set
to 'false'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 15:41:38 -07:00

49 lines
2.1 KiB
Plaintext

repack.useDeltaBaseOffset::
By default, linkgit:git-repack[1] creates packs that use
delta-base offset. If you need to share your repository with
Git older than version 1.4.4, either directly or via a dumb
protocol such as http, then you need to set this option to
"false" and repack. Access from old Git versions over the
native protocol are unaffected by this option.
repack.packKeptObjects::
If set to true, makes `git repack` act as if
`--pack-kept-objects` was passed. See linkgit:git-repack[1] for
details. Defaults to `false` normally, but `true` if a bitmap
index is being written (either via `--write-bitmap-index` or
`repack.writeBitmaps`).
repack.useDeltaIslands::
If set to true, makes `git repack` act as if `--delta-islands`
was passed. Defaults to `false`.
repack.writeBitmaps::
When true, git will write a bitmap index when packing all
objects to disk (e.g., when `git repack -a` is run). This
index can speed up the "counting objects" phase of subsequent
packs created for clones and fetches, at the cost of some disk
space and extra time spent on the initial repack. This has
no effect if multiple packfiles are created.
Defaults to true on bare repos, false otherwise.
repack.updateServerInfo::
If set to false, linkgit:git-repack[1] will not run
linkgit:git-update-server-info[1]. Defaults to true. Can be overridden
when true by the `-n` option of linkgit:git-repack[1].
repack.cruftWindow::
repack.cruftWindowMemory::
repack.cruftDepth::
repack.cruftThreads::
Parameters used by linkgit:git-pack-objects[1] when generating
a cruft pack and the respective parameters are not given over
the command line. See similarly named `pack.*` configuration
variables for defaults and meaning.
repack.midxMustContainCruft::
When set to true, linkgit:git-repack[1] will unconditionally include
cruft pack(s), if any, in the multi-pack index when invoked with
`--write-midx`. When false, cruft packs are only included in the MIDX
when necessary (e.g., because they might be required to form a
reachability closure with MIDX bitmaps). Defaults to true.