Commit Graph

78508 Commits

Author SHA1 Message Date
Patrick Steinhardt
0e994d9f38 builtin/maintenance: extend "maintenance.strategy" to manual maintenance
The "maintenance.strategy" configuration allows users to configure how
Git is supposed to perform repository maintenance. The idea is that we
provide a set of high-level strategies that may be useful in different
contexts, like for example when handling a large monorepo. Furthermore,
the strategy can be tweaked by the user by overriding specific tasks.

In its current form though, the strategy only applies to scheduled
maintenance. This creates something of a gap, as scheduled and manual
maintenance will now use _different_ strategies as the latter would
continue to use git-gc(1) by default. This makes the strategies way less
useful than they could be on the one hand. But even more importantly,
the two different strategies might clash with one another, where one of
the strategies performs maintenance in such a way that it discards
benefits from the other strategy.

So ideally, it should be possible to pick one strategy that then applies
globally to all the different ways that we perform maintenance. This
doesn't necessarily mean that the strategy always does the _same_ thing
for every maintenance type. But it means that the strategy can configure
the different types to work in tandem with each other.

Change the meaning of "maintenance.strategy" accordingly so that the
strategy is applied to both types, manual and scheduled. As preceding
commits have introduced logic to run maintenance tasks depending on this
type we can tweak strategies so that they perform those tasks depending
on the context.

Note that this raises the question of backwards compatibility: when the
user has configured the "incremental" strategy we would have ignored
that strategy beforehand. Instead, repository maintenance would have
continued to use git-gc(1) by default.

But luckily, we can match that behaviour by:

  - Keeping all current tasks of the incremental strategy as
    `MAINTENANCE_TYPE_SCHEDULED`. This ensures that those tasks will not
    run during manual maintenance.

  - Configuring the "gc" task so that it is invoked during manual
    maintenance.

Like this, the user shouldn't observe any difference in behaviour.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 13:42:44 -07:00
Patrick Steinhardt
6a7d3eeb47 builtin/maintenance: run maintenance tasks depending on type
We basically have three different ways to execute repository
maintenance:

  1. Manual maintenance via `git maintenance run`.

  2. Automatic maintenance via `git maintenance run --auto`.

  3. Scheduled maintenance via `git maintenance run --schedule=`.

At the moment, maintenance strategies only have an effect for the last
type of maintenance. This is about to change in subsequent commits, but
to do so we need to be able to skip some tasks depending on how exactly
maintenance was invoked.

Introduce a new maintenance type that discern between manual (1 & 2) and
scheduled (3) maintenance. Convert the `enabled` field into a bitset so
that it becomes possible to specifiy which tasks exactly should run in a
specific context.

The types picked for existing strategies match the status quo:

  - The default strategy is only ever executed as part of a manual
    maintenance run. It is not possible to use it for scheduled
    maintenance.

  - The incremental strategy is only ever executed as part of a
    scheduled maintenance run. It is not possible to use it for manual
    maintenance.

The strategies will be tweaked in subsequent commits to make use of this
new infrastructure.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 13:42:44 -07:00
Patrick Steinhardt
e83e92e876 builtin/maintenance: improve readability of strategies
Our maintenance strategies are essentially a large array of structures,
where each of the tasks can be enabled and scheduled individually. With
the current layout though all the configuration sits on the same nesting
layer, which makes it a bit hard to discern which initialized fields
belong to what task.

Improve readability of the individual tasks by using nested designated
initializers instead.

Suggested-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 13:42:44 -07:00
Patrick Steinhardt
d465be2327 builtin/maintenance: don't silently ignore invalid strategy
When parsing maintenance strategies we completely ignore the
user-configured value in case it is unknown to us. This makes it
basically undiscoverable to the user that scheduled maintenance is
devolving into a no-op.

Change this to instead die when seeing an unknown maintenance strategy.
While at it, pull out the parsing logic into a separate function so that
we can reuse it in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 13:42:43 -07:00
Patrick Steinhardt
5c2ad50193 builtin/maintenance: make the geometric factor configurable
The geometric repacking task uses a factor of two for its geometric
sequence, meaning that each next pack must contain at least twice as
many objects as the next-smaller one. In some cases it may be helpful to
configure this factor though to reduce the number of packfile merges
even further, e.g. in very big repositories. But while git-repack(1)
itself supports doing this, the maintenance task does not give us a way
to tune it.

Introduce a new "maintenance.geometric-repack.splitFactor" configuration
to plug this gap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 13:42:43 -07:00
Patrick Steinhardt
9bc151850c builtin/maintenance: introduce "geometric-repack" task
Introduce a new "geometric-repack" task. This task uses our geometric
repack infrastructure as provided by git-repack(1) itself, which is a
strategy that especially hosting providers tend to use to amortize the
costs of repacking objects.

There is one issue though with geometric repacks, namely that they
unconditionally pack all loose objects, regardless of whether or not
they are reachable. This is done because it means that we can completely
skip the reachability step, which significantly speeds up the operation.
But it has the big downside that we are unable to expire objects over
time.

To address this issue we thus use a split strategy in this new task:
whenever a geometric repack would merge together all packs, we instead
do an all-into-one repack. By default, these all-into-one repacks have
cruft packs enabled, so unreachable objects would now be written into
their own pack. Consequently, they won't be soaked up during geometric
repacking anymore and can be expired with the next full repack, assuming
that their expiry date has surpassed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 13:42:43 -07:00
Patrick Steinhardt
60c0af8e20 builtin/gc: make too_many_loose_objects() reusable without GC config
To decide whether or not a repository needs to be repacked we estimate
the number of loose objects. If the number exceeds a certain threshold
we perform the repack, otherwise we don't.

This is done via `too_many_loose_objects()`, which takes as parameter
the `struct gc_config`. This configuration is only used to determine the
threshold. In a subsequent commit we'll add another caller of this
function that wants to pass a different limit than the one stored in
that structure.

Refactor the function accordingly so that we only take the limit as
parameter instead of the whole structure.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 13:42:42 -07:00
Patrick Steinhardt
0ea94b023a builtin/gc: remove global repack variable
The global `repack` variable is used to store all command line arguments
that we eventually want to pass to git-repack(1). It is being appended
to from multiple different functions, which makes it hard to follow the
logic. Besides being hard to follow, it also makes it unnecessarily hard
to reuse this infrastructure in new code.

Refactor the code so that we store this variable on the stack and pass
a pointer to it around as needed. This is done so that we can reuse
`add_repack_all_options()` in a subsequent commit.

The refactoring itself is straight-forward. One function that deserves
attention though is `need_to_gc()`: this function determines whether or
not we need to execute garbage collection for `git gc --auto`, but also
for `git maintenance run --auto`. But besides figuring out whether we
have to perform GC, the function also sets up the `repack` arguments.

For `git gc --auto` it's trivial to adapt, as we already have the
on-stack variable at our fingertips. But for the maintenance condition
it's less obvious what to do.

As it turns out, we can just use another temporary variable there that
we then immediately discard. If we need to perform GC we execute a child
git-gc(1) process to repack objects for us, and that process will have
to recompute the arguments anyway.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 13:42:42 -07:00
Junio C Hamano
8bca1c5d59 Merge branch 'tb/incremental-midx-part-3.1' into ps/maintenance-geometric
* tb/incremental-midx-part-3.1: (64 commits)
  builtin/repack.c: clean up unused `#include`s
  repack: move `write_cruft_pack()` out of the builtin
  repack: move `write_filtered_pack()` out of the builtin
  repack: move `pack_kept_objects` to `struct pack_objects_args`
  repack: move `finish_pack_objects_cmd()` out of the builtin
  builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()`
  repack: extract `write_pack_opts_is_local()`
  repack: move `find_pack_prefix()` out of the builtin
  builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()`
  builtin/repack.c: introduce `struct write_pack_opts`
  repack: 'write_midx_included_packs' API from the builtin
  builtin/repack.c: inline packs within `write_midx_included_packs()`
  builtin/repack.c: pass `repack_write_midx_opts` to `midx_included_packs`
  builtin/repack.c: inline `remove_redundant_bitmaps()`
  builtin/repack.c: reorder `remove_redundant_bitmaps()`
  repack: keep track of MIDX pack names using existing_packs
  builtin/repack.c: use a string_list for 'midx_pack_names'
  builtin/repack.c: extract opts struct for 'write_midx_included_packs()'
  builtin/repack.c: remove ref snapshotting from builtin
  repack: remove pack_geometry API from the builtin
  ...
2025-10-21 11:39:31 -07:00
Taylor Blau
935ab44a0a builtin/repack.c: clean up unused #includes
Over the past several dozen commits, we have moved a large amount of
functionality out of the repack builtin and into other files like
repack.c, repack-cruft.c, repack-filtered.c, repack-midx.c, and
repack-promisor.c.

These files specify the minimal set of `#include`s that they need to
compile successfully, but we did not change the set of `#include`s in
the repack builtin itself.

Now that the code movement is complete, let's clean up that set of
`#include`s and trim down the builtin to include the minimal amount of
external headers necessary to compile.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:57 -07:00
Taylor Blau
09797bd966 repack: move write_cruft_pack() out of the builtin
In an identical fashion as the previous commit, move the function
`write_cruft_pack()` into its own compilation unit, and make the
function visible through the repack.h API.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:57 -07:00
Taylor Blau
7ac4231b42 repack: move write_filtered_pack() out of the builtin
In a similar fashion as in previous commits, move the function
`write_filtered_pack()` out of the builtin and into its own compilation
unit.

This function is now part of the repack.h API, but implemented in its
own "repack-filtered.c" unit as it is a separate component from other
kinds of repacking operations.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:57 -07:00
Taylor Blau
d278970aef repack: move pack_kept_objects to struct pack_objects_args
The "pack_kept_objects" variable is defined as static to the repack
builtin, but is inherently related to the pack-objects arguments that
the builtin uses when generating new packs.

Move that field into the "struct pack_objects_args", and shuffle around
where we append the corresponding command-line option when preparing a
pack-objects process. Specifically:

 - `write_cruft_pack()` always wants to pass "--honor-pack-keep", so
   explicitly set the `pack_kept_objects` field to "0" when initializing
   the `write_pack_opts` struct before calling `write_cruft_pack()`.

 - `write_filtered_pack()` no longer needs to handle writing the
   command-line option "--honor-pack-keep" when preparing a pack-objects
   process, since its call to `prepare_pack_objects()` will have already
   taken care of that.

   `write_filtered_pack()` also reads the `pack_kept_objects` field to
   determine whether to write the existing kept packs with a leading "^"
   character, so update that to read through the `po_args` pointer
   instead.

 - `cmd_repack()` also no longer has to write the "--honor-pack-keep"
   flag explicitly, since this is also handled via its call to
   `prepare_pack_objects()`.

Since there is a default value for "pack_kept_objects" that relies on
whether or not we are writing a bitmap (and not writing a MIDX), extract
a default initializer for `struct pack_objects_args` that keeps this
conditional default behavior.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:57 -07:00
Taylor Blau
fa0787a6cc repack: move finish_pack_objects_cmd() out of the builtin
In a similar spirit as the previous commit(s), now that the function
`finish_pack_objects_cmd()` has no explicit dependencies within the
repack builtin, let's extract it.

This prepares us to extract the remaining two functions within the
repack builtin that explicitly write packfiles, which are
`write_cruft_pack()` and `write_filtered_pack()`, which will be done in
the future commits.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
80db3cd189 builtin/repack.c: pass write_pack_opts to finish_pack_objects_cmd()
To prepare to move the `finish_pack_objects_cmd()` function out of the
builtin and into the repack.h API, there are a couple of things we need
to do first:

 - First, let's take advantage of `write_pack_opts_is_local()` function
   introduced in the previous commit instead of passing "local"
   explicitly.

 - Let's also avoid referring to the static 'packtmp' field within
   builtin/repack.c by instead accessing it through the write_pack_opts
   argument.

There are three callers which need to adjust themselves in order to
account for this change. The callers which reside in write_cruft_pack()
and write_filtered_pack() both already have an "opts" in scope, so they
can pass it through transparently.

The other call (at the bottom of `cmd_repack()`) needs to initialize its
own write_pack_opts to pass the necessary fields over to the direct call
to `finish_pack_objects_cmd()`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
2f79c79bba repack: extract write_pack_opts_is_local()
Similar to the previous commit, the functions `write_cruft_pack()` and
`write_filtered_pack()` both compute a "local" variable via the exact
same mechanism:

    const char *scratch;
    int local = skip_prefix(opts->destination, opts->packdir, &scratch);

Not only does this cause us to repeat the same pair of lines, it also
introduces an unnecessary "scratch" variable that is common between both
functions.

Instead of repeating ourselves, let's extract that functionality into a
new function in the repack.h API called "write_pack_opts_is_local()".
That function takes a pointer to a "struct write_pack_opts" (which has
as fields both "destination" and "packdir"), and can encapsulate the
dangling "scratch" field.

Extract that function and make it visible within the repack.h API, and
use it within both `write_cruft_pack()` and `write_filtered_pack()`.
While we're at it, match our modern conventions by returning a "bool"
instead of "int", and use `starts_with()` instead of `skip_prefix()` to
avoid storing the dummy "scratch" variable.

The remaining duplication (that is, that both `write_cruft_pack()` and
`write_filtered_pack()` still both call `write_pack_opts_is_local()`)
will be addressed in the following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
98fa0d50a7 repack: move find_pack_prefix() out of the builtin
Both callers within the repack builtin which call functions that take a
'write_pack_opts' structure have the following pattern:

    struct write_pack_opts opts = {
        .packdir = packdir,
        .packtmp = packtmp,
        .pack_prefix = find_pack_prefix(packdir, packtmp),
        /* ... */
    };
    int ret = write_some_kind_of_pack(&opts, /* ... */);

, but both "packdir" and "packtmp" are fields within the write_pack_opts
struct itself!

Instead of also computing the pack_prefix ahead of time, let's have the
callees compute it themselves by moving `find_pack_prefix()` out of the
repack builtin, and have it take a write_pack_opts pointer instead of
the "packdir" and "packtmp" fields directly.

This avoids the callers having to do some prep work that is common
between the two of them, but also avoids the potential pitfall of
accidentally writing:

    .pack_prefix = find_pack_prefix(packtmp, packdir),

(which is well-typed) when the caller meant to instead write:

    .pack_prefix = find_pack_prefix(packdir, packtmp),

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
3d2ac2065e builtin/repack.c: use write_pack_opts within write_cruft_pack()
Similar to the changes made in the previous commit to
`write_filtered_pack()`, teach `write_cruft_pack()` to take a
`write_pack_opts` struct and use that where possible.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
7a9c81a38d builtin/repack.c: introduce struct write_pack_opts
There are various functions within the 'repack' builtin which are
responsible for writing different kinds of packs. They include:

 - `static int write_filtered_pack(...)`
 - `static int write_cruft_pack(...)`

as well as the function `finish_pack_objects_cmd()`, which is
responsible for finalizing a new pack write, and recording the checksum
of its contents in the 'names' list.

Both of these `write_` functions have a few things in common. They both
take a pointer to the 'pack_objects_args' struct, as well as a pair of
character pointers for `destination` and `pack_prefix`.

Instead of repeating those arguments for each function, let's extract an
options struct called "write_pack_opts" which has these three parameters
as member fields. While we're at it, add fields for "packdir," and
"packtmp", both of which are static variables within the builtin, and
need to be read from within these two functions.

This will shorten the list of parameters that callers have to provide to
`write_filtered_pack()`, avoid ambiguity when passing multiple variables
of the same type, and provide a unified interface for the two functions
mentioned earlier.

(Note that "pack_prefix" can be derived on the fly as a function of
"packdir" and "packtmp", making it unnecessary to store "pack_prefix"
explicitly. This commit ignores that potential cleanup in the name of
doing as few things as possible, but a later commit will make that
change.)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
6d05eb135f repack: 'write_midx_included_packs' API from the builtin
Now that we have sufficiently cleaned up the write_midx_included_packs()
function, we can move it (along with the struct repack_write_midx_opts)
out of the builtin, and into the repack.h header.

Since this function (and the static ones that it depends on) are
MIDX-specific details of the repacking process, move them to the
repack-midx.c compilation unit instead of the general repack.c one.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
f17757487b builtin/repack.c: inline packs within write_midx_included_packs()
To write a MIDX at the end of a repack operation, 'git repack' presently
computes the set of packs to write into the MIDX, before invoking
`write_midx_included_packs()` with a `string_list` containing those
packs.

The logic for computing which packs are supposed to appear in the
resulting MIDX is within `midx_included_packs()`, where it is aware of
details like which cruft pack(s) were written/combined, if/how we did a
geometric repack, etc.

Computing this list ourselves before providing it to the sole function
to make use of that list `write_midx_included_packs()` is somewhat
awkward. In the future, repack will learn how to write incremental
MIDXs, which will use a very different pack selection routine.

Instead of doing something like:

    struct string_list included_packs = STRING_LIST_INIT_DUP;
    if (incremental) {
        midx_incremental_included_packs(&included_packs, ...):
        write_midx_incremental_included_packs(&included_packs, ...);
    } else {
        midx_included_packs(&included_packs, ...):
        write_midx_included_packs(&included_packs, ...);
    }

in the future, let's have each function that writes a MIDX be
responsible for itself computing the list of included packs. Inline the
declaration and initialization of `included_packs` into the
`write_midx_included_packs()` function itself, and repeat that pattern
in the future when we introduce new ways to write MIDXs.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
f07263fd9f builtin/repack.c: pass repack_write_midx_opts to midx_included_packs
Instead of passing individual parameters (in this case, "existing",
"names", and "geometry") to `midx_included_packs()`, pass a pointer to a
`repack_write_midx_opts` structure instead.

Besides reducing the number of parameters necessary to call the
`midx_included_packs` function, this refactoring sets us up nicely to
inline the call to `midx_included_packs()` into
`write_midx_included_packs()`, thus making the caller (in this case,
`cmd_repack()`) oblivious to the set of packs being written into the
MIDX.

In order to do this, `repack_write_midx_opts` has to keep track of the
set of existing packs, so add an additional field to point to that set.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
337baea721 builtin/repack.c: inline remove_redundant_bitmaps()
After writing a new MIDX, the repack command removes any bitmaps
belonging to packs which were written into the MIDX.

This is currently done in a separate function outside of
`write_midx_included_packs()`, which forces the caller to keep track of
the set of packs written into the MIDX.

Prepare to no longer require the caller to keep track of such
information by inlining the clean-up into `write_midx_included_packs()`.
Future commits will make the caller oblivious to the set of packs
included in the MIDX altogether.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
42088e3d4a builtin/repack.c: reorder remove_redundant_bitmaps()
The next commit will inline the call to `remove_redundant_bitmaps()`
into `write_midx_included_packs()`. Reorder these two functions to avoid
a forward declaration to `remove_redundant_bitmaps()`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
2fee63a71a repack: keep track of MIDX pack names using existing_packs
Instead of storing the list of MIDX pack names separately, let's inline
it into the existing_packs struct, further reducing the number of
parameters we have to pass around.

This amounts to adding a new string_list to the existing_packs struct,
and populating it via `existing_packs_collect()`. This is fairly
straightforward to do, since we are already looping over all packs, all
we need to do is:

    if (p->multi_pack_index)
        string_list_append(&existing->midx_packs, pack_basename(p));

Note, however, that this check *must* come before other conditions where
we discard and do not keep track of a pack, including the condition "if
(!p->pack_local)" immediately below. This is because the existing
routine which collects MIDX pack names does so blindly, and does not
discard, for example, non-local packs.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:55 -07:00
Taylor Blau
c3690c97d7 builtin/repack.c: use a string_list for 'midx_pack_names'
When writing a new MIDX, repack must determine whether or not there are
any packs in the MIDX it is replacing (if one exists) that are not
somehow represented in the new MIDX (e.g., either by preserving the pack
verbatim, or rolling it up as part of a geometric repack, etc.).

In order to do this, it keeps track of a list of pack names from the
MIDX present in the repository at the start of the repack operation.
Since we manipulate and close the object store, we cannot rely on the
repository's in-core representation of the MIDX, since this is subject
to change and/or go away.

When this behavior was introduced in 5ee86c273b (repack: exclude cruft
pack(s) from the MIDX where possible, 2025-06-23), we maintained an
array of character pointers instead of using a convenience API, such as
string-list.h.

Store the list of MIDX pack names in a string_list, thereby reducing the
number of parameters we have to pass to `midx_has_unknown_packs()`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:55 -07:00
Taylor Blau
e6b0907721 builtin/repack.c: extract opts struct for 'write_midx_included_packs()'
The function 'write_midx_included_packs()', which is responsible for
writing a new MIDX with a given set of included packs, currently takes a
list of six arguments.

In order to extract this function out of the builtin, we have to pass
in a few additional parameters, like 'midx_must_contain_cruft' and
'packdir', which are currently declared as static variables within the
builtin/repack.c compilation unit.

Instead of adding additional parameters to `write_midx_included_packs()`
extract out an "opts" struct that names these parameters, and pass a
pointer to that, making it less cumbersome to add additional parameters.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:55 -07:00
Taylor Blau
ccb7f822d5 builtin/repack.c: remove ref snapshotting from builtin
When writing a MIDX, 'git repack' takes a snapshot of the repository's
references and writes the result out to a file, which it then passes to
'git multi-pack-index write' via the '--refs-snapshot'.

This is done in order to make bitmap selections with respect to what we
are packing, thus avoiding a race where an incoming reference update
causes us to try and write a bitmap for a commit not present in the
MIDX.

Extract this functionality out into a new repack-midx.c compilation
unit, and expose the necessary functions via the repack.h API.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:55 -07:00
Taylor Blau
62d3fa09b3 repack: remove pack_geometry API from the builtin
Now that the pack_geometry API is fully factored and isolated from the
rest of the builtin, declare it within repack.h and move its
implementation to "repack-geometry.c" as a separate component.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:55 -07:00
Taylor Blau
b2ebeed1d8 builtin/repack.c: pass 'packdir' to pack_geometry_remove_redundant()
For similar reasons as the preceding commit, pass the "packdir" variable
directly to `pack_geometry_remove_redundant()` as a parameter to the
function.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:55 -07:00
Taylor Blau
2a15a739a2 builtin/repack.c: pass 'pack_kept_objects' to pack_geometry_init()
Prepare to move pack_geometry-related APIs to their own compilation unit
by passing in the static "pack_kept_objects" variable directly as a
parameter to the 'pack_geometry_init()' function.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:55 -07:00
Taylor Blau
e05c2d5566 builtin/repack.c: rename various pack_geometry functions
Rename functions which work with 'struct pack_geometry' to begin with
"pack_geometry_". While we're at it, change `free_pack_geometry()` to
instead be named `pack_geometry_release()` to match our conventions, and
make clear that that function frees the contents of the struct, not the
memory allocated to hold the struct itself.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:55 -07:00
Taylor Blau
29e935515d builtin/repack.c: remove "repack_promisor_objects()" from the builtin
Now that we have properly factored the portion of the builtin which is
responsible for repacking promisor objects, we can move that function
(and associated dependencies) out of the builtin entirely.

Similar to previous extractions, this function is declared in repack.h,
but implemented in a separate repack-promisor.c file. This is done to
separate promisor-specific repacking functionality from generic repack
utilities (like "existing_packs", and "generated_pack" APIs).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:55 -07:00
Taylor Blau
bebf941f7d builtin/repack.c: pass "packtmp" to repack_promisor_objects()
In a similar spirit as previous commit(s), pass the "packtmp" variable
to "repack_promisor_objects()" as an explicit parameter of the function,
preparing us to move this function in a following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:55 -07:00
Taylor Blau
f053ab6c2b repack: remove 'generated_pack' API from the builtin
Now that we have factored the "generated_pack" API, we can move it to
repack.ch, further slimming down builtin/repack.c.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:55 -07:00
Taylor Blau
7036d131ae builtin/repack.c: provide pack locations to generated_pack_install()
Repeat what was done in the preceding commit for the
`generated_pack_install()` function, which needs both "packdir" and
"packtmp".

(As an aside, it is somewhat unfortunate that the final three parameters
to this function are all "const char *", making errors like passing
"packdir" and "packtmp" in the wrong order easy. We could define a new
structure here, but that may be too heavy-handed.)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:54 -07:00
Taylor Blau
184f0abeb8 builtin/repack.c: pass "packtmp" to generated_pack_populate()
In a similar spirit as previous commits, this function needs to know the
temporary pack prefix, which it currently accesses through the static
"packtmp" variable within builtin/repack.c.

Pass it explicitly as a function parameter to facilitate moving this
function out of builtin/repack.c entirely.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:54 -07:00
Taylor Blau
c0427692cb builtin/repack.c: factor out "generated_pack_install"
Once all new packs are known to exist, 'repack' installs their contents
from their temporary location into their permanent one. This is a
semi-involved procedure for each pack, since for each extension (e.g.,
".idx", ".pack", ".mtimes", and so on) we have to either:

 - adjust the filemode of the temporary file before renaming it into
   place, or

 - die() if we are missing a non-optional extension, or

 - unlink() any existing file for extensions that we did not generate
   (e.g., if a non-cruft pack we generated was identical to, say, a
   cruft pack which existed at the beginning of the process, we have to
   remove the ".mtimes" file).

Extract this procedure into its own function, and call it
"generated_pack_install"(). This will set us up for pulling this
function out of the builtin entirely and making it part of the repack.h
API, which will be done in a future commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:54 -07:00
Taylor Blau
2b72c12367 builtin/repack.c: rename "struct generated_pack_data"
The name "generated_pack_data" is somewhat redundant, since the contents
of the struct *is* the data associated with the generated pack.

Rename the structure to just "generated_pack", resulting in less awkward
function names, like "generated_pack_has_ext()" which is preferable to
"generated_pack_data_has_ext()".

Rename a few related functions to align with the convention that
functions to do with a struct "S" should be prefixed with "S_".

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:54 -07:00
Taylor Blau
7d1f442588 repack: remove 'existing_packs' API from the builtin
The repack builtin defines an API for keeping track of which packs
were found in the repository at the beginning of the repack operation.
This is used to classify what state a pack was in (kept, non-kept, or
cruft), and is also used to mark which packs to delete (or keep) at the
end of a repack operation.

Now that the prerequisite refactoring is complete, this API is isolated
enough that it can be moved out to repack.[ch] and removed from the
builtin entirely.

As a result, some of its functions become static within repack.c,
cleaning up the visible API.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:54 -07:00
Taylor Blau
dab24e4bcb builtin/repack.c: avoid unnecessary numeric casts in existing_packs
There are a couple of spots that cause warnings within the
existing_packs API without DISABLE_SIGN_COMPARE_WARNINGS under
DEVELOPER=1 mode.

In both cases, we have int values that are being compared against size_t
ones. Neither of these two cases are incorrect, and the cast is
completely OK in practice. But both are unnecessary, since:

 - in existing_packs_mark_for_deletion_1(), 'hexsz' should be defined as
   a size_t anyway, since algop->hexsz is.

 - in existing_packs_collect(), 'i' should be defined as a size_t since
   it is counting up to the value of a string_list's 'nr' field.

(This patch is a little bit of noise, but I would rather see us squelch
these warnings ahead of moving the existing_packs API into a separate
compilation unit to avoid having to define DISABLE_SIGN_COMPARE_WARNINGS
in repack.c.)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:54 -07:00
Taylor Blau
9574e8f31d builtin/repack.c: pass "packdir" when removing packs
builtin/repack.c defines a static "packdir" to instruct pack-objects on
where to write any new packfiles. This is also the directory scanned
when removing any packfiles which were made redundant by the latest
repack.

Prepare to move the "existing_packs_remove_redundant" function to its
own compilation unit by passing in this information as a parameter to
that function.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:54 -07:00
Taylor Blau
f905f49c68 repack: remove 'remove_redundant_pack' from the builtin
Extract "remove_redundant_pack()" as generic repack-related
functionality by moving its implementation to the repack.[ch]
compilation unit.

This is a prerequisite to moving the "existing_packs" API, which is one
of the callers of this function. (The remaining caller in the pack
geometry code will eventually move to its own compilation unit as well,
and will likewise rely on this function.)

While moving it over, prefix the function name with "repack_" to
indicate that it belongs to the repack-subsystem.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:54 -07:00
Taylor Blau
a0dcecb146 builtin/repack.c: rename many 'struct existing_packs' functions
Rename many of the 'struct existing_packs'-related functions according
to the convention introduced in and described by 541204aabe
(Documentation: document naming schema for structs and their functions,
2024-07-30).

Note that some functions which operate over an individual entry in the
list of existing packs are prefixed with "existing_pack_" instead of the
plural form.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:54 -07:00
Taylor Blau
7005d2594b repack: remove 'prepare_pack_objects' from the builtin
Now that the 'prepare_pack_objects' function no longer refers to
external, static variables, move it out to repack.h as generic
functionality.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:54 -07:00
Taylor Blau
e35ef71e00 repack: move 'delta_base_offset' to 'struct pack_objects_args'
The static variable 'delta_base_offset' determines whether or not we
pass the "--delta-base-offset" command-line argument when spawning
pack-objects as a child process. Its introduction dates back to when
repack was rewritten in C, all the way back in a1bbc6c017 (repack:
rewrite the shell script in C, 2013-09-15).

'struct pack_objects_args' was introduced much later on in 4571324b99
(builtin/repack.c: allow configuring cruft pack generation, 2022-05-20),
but did not move the 'delta_base_offset' variable.

Since the 'delta_base_offset' is a property of an individual
pack-objects command, re-introduce that variable as a member of 'struct
pack_objects_args', which will enable further code movement in the
subsequent commits.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:54 -07:00
Taylor Blau
19f6e8d023 builtin/repack.c: pass both pack_objects args to repack_config
A subsequent commit will remove 'delta_base_offset' as a static variable
within builtin/repack.c, and reintroduce it as a member of the 'struct
pack_objects_args'.

As a result, the repack_config callback will need to have both the
cruft- and non-cruft 'struct pack_objects_args's in scope. Introduce a
new 'struct repack_config_ctx' to allow the callee to provide both
pointers to the callback.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:53 -07:00
Taylor Blau
c7a120722e repack: introduce new compilation unit
Over the years, builtin/repack.c has turned into a grab-bag of
functionality powering the 'git repack' builtin. Among its many
capabilities, it:

 - can build and spawn 'git pack-objects' commands, which in turn
   generate new packs
 - has infrastructure to manage the set of existing packs in a
   repository
 - has infrastructure to split a sequence of packs into a geometric
   progression based on object size
 - can manage both generating and combining cruft packs together
 - can write new MIDXs

to name a few.

As a result, this builtin has accumulated a lot of code, making adding
new functionality difficult. In the future, 'repack' will learn how to
manage a chain of incremental MIDXs, adding yet more functionality into
the builtin.

As a prerequisite step, let's first move some of the functionality in
the builtin into its own repack.[ch].

This will be done over the course of many steps, since there are many
individual components, some of which will end up in other, yet-to-exist
compilation units of their own. Some of the code movement here is also
non-trivial, so performing it in individual steps will make it easier to
verify.

Let's start by migrating 'struct pack_objects_args' (and the related
corresponding pack_objects_args_release() function) into repack.h, and
teach both the Makefile and Meson how to build the new compilation unit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:53 -07:00
Taylor Blau
8a5d4bd87d builtin/repack.c: avoid using hash_to_hex() in pack geometry
In previous commits, we started passing either repository or
git_hash_algo pointers around to various spots within builtin/repack.c
to reduce our dependency on the_repository in the hope of undef'ing
USE_THE_REPOSITORY_VARIABLE.

This commit takes us as far as we can (easily) go in that direction by
removing the only use of a convenience function that only exists when
USE_THE_REPOSITORY_VARIABLE is defined.

Unfortunately, the only other such function is "is_bare_repository()",
which is less than straightforward to convert into, say,
"repo_is_bare()", the latter of the two accepting a repository pointer.

Punt on that for now, and declare this commit as the stopping point for
our efforts in the direction of undef'ing USE_THE_REPOSITORY_VARIABLE.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:53 -07:00
Taylor Blau
c660b0dbcb builtin/repack.c: avoid "the_hash_algo" in finish_pack_objects_cmd()
In a similar spirit as previous commits, avoid referring directly to
"the_hash_algo" in builtin/repack.c::finish_pack_objects_cmd() and
instead accept one as a parameter to the function.

Since this function has a number of callers throughout the builtin, the
diff is a little noisier than previous commits. However, each hunk is
limited to passing the hash_algo parameter from a repository pointer
that is already in scope.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:53 -07:00