Commit Graph

78767 Commits

Author SHA1 Message Date
Patrick Steinhardt
e66077ae45 ref-filter: detect broken tags when dereferencing them
Users can ask git-for-each-ref(1) to peel tags and return information of
the tagged object by adding an asterisk to the format, like for example
"%(*$objectname)". If so, git-for-each-ref(1) peels that object to the
first non-tag object and then returns its values.

As mentioned in preceding commits, it can happen that the tagged object
type and the claimed object type differ, effectively resulting in a
corrupt tag. git-for-each-ref(1) would notice this mismatch, print an
error and then bail out when trying to peel the tag.

But we only notice this corruption in some very specific edge cases!
While we have a test in "t/for-each-ref-tests.sh" that verifies the
above scenario, this test is specifically crafted to detect the issue at
hand. Namely, we create two tags:

  - One tag points to a specific object with the correct type.

  - The other tag points to the *same* object with a different type.

The fact that both tags point to the same object is important here:
`peel_object()` wouldn't notice the corruption if the tagged objects
were different.

The root cause is that `peel_object()` calls `lookup_${type}()`
eventually, where the type is the same type declared in the tag object.
Consequently, when we have two tags pointing to the same object but with
different declared types we'll call two different lookup functions. The
first lookup will store the object with an unverified type A, whereas
the second lookup will try to look up the object with a different
unverified type B. And it is only now that we notice the discrepancy in
object types, even though type A could've already been the wrong type.

Fix the issue by verifying the object type in `populate_value()`. With
this change we'll also notice type mismatches when only dereferencing a
tag once.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt
6ec4c0b45b refs: don't store peeled object IDs for invalid tags
Both the "files" and "reftable" backend store peeled object IDs for
references that point to tags:

  - The "files" backend stores the value when packing refs, where each
    peeled object ID is prefixed with "^".

  - The "reftable" backend stores the value whenever writing a new
    reference that points to a tag via a special ref record type.

Both of these backends use `peel_object()` to find the peeled object ID.
But as explained in the preceding commit, that function does not detect
the case where the tag's tagged object and its claimed type mismatch.

The consequence of storing these bogus peeled object IDs is that we're
less likely to detect such corruption in other parts of Git.
git-for-each-ref(1) for example does not notice anymore that the tag is
broken when using "--format=%(*objectname)" to dereference tags.

One could claim that this is good, because it still allows us to mostly
use the tag as intended. But the biggest problem here is that we now
have different behaviour for such a broken tag depending on whether or
not we have its peeled value in the refdb.

Fix the issue by verifying the object type when peeling the object. If
that verification fails we simply skip storing the peeled value in
either of the reference formats.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt
7ec85185b1 object: add flag to peel_object() to verify object type
When peeling a tag to a non-tag object we repeatedly call
`parse_object()` on the tagged object until we find the first object
that isn't a tag. While this feels sensible at first, there is a big
catch here: `parse_object()` doesn't actually verify the type of the
tagged object.

The relevant code path here eventually ends up in `parse_tag_buffer()`.
Here, we parse the various fields of the tag, including the "type". Once
we've figured out the type and the tagged object ID, we call one of the
`lookup_${type}()` functions for whatever type we have found. There is
two possible outcomes in the successful case:

  1. The object is already part of our cached objects. In that case we
     double-check whether the type we're trying to look up matches the
     type that was cached.

  2. The object is _not_ part of our cached objects. In that case, we
     simply create a new object with the expected type, but we don't
     parse that object.

In the first case we might notice type mismatches, but only in the case
where our cache has the object with the correct type. In the second
case, we'll blindly assume that the type is correct and then go with it.
We'll only notice that the type might be wrong when we try to parse the
object at a later point.

Now arguably, we could change `parse_tag_buffer()` to verify the tagged
object's type for us. But that would have the effect that such a tag
cannot be parsed at all anymore, and we have a small bunch of tests for
exactly this case that assert we still can open such tags. So this
change does not feel like something we can retroactively tighten, even
though one shouldn't ever hit such corrupted tags.

Instead, add a new `flags` field to `peel_object()` that allows the
caller to opt in to strict object verification. This will be wired up at
a subset of callsites over the next few commits.

Note that this change also inlines `deref_tag_noverify()`. There's only
been two callsites of that function, the one we're changing and one in
our test helpers. The latter callsite can trivially use `deref_tag()`
instead, so by inlining the function we avoid having to pass down the
flag.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt
705114772e refs: drop infrastructure to peel via iterators
Now that the peeled object ID gets propagated via the `struct reference`
there is no need anymore to call into the reference iterator itself to
dereference an object. Remove this infrastructure.

Most of the changes are straight-forward deletions of code. There is one
exception though in `refs/packed-backend.c::write_with_updates()`. Here
we stop peeling the iterator and instead just pass the peeled object ID
of that iterator directly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt
5a5c7359f7 refs: drop current_ref_iter hack
In preceding commits we have refactored all callers of
`peel_iterated_oid()` to instead use `reference_get_peeled_oid()`. This
allows us to thus get rid of the former function.

Getting rid of that function is nice, but even nicer is that this also
allows us to get rid of the `current_ref_iter` hack. This global
variable tracked the currently-active ref iterator so that we can use it
to peel an object ID. Now that the peeled object ID is propagated via
`struct reference` though we don't have to depend on this hack anymore,
which makes for a more robust and easier-to-understand infrastructure.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt
feaaea4c12 builtin/show-ref: convert to use reference_get_peeled_oid()
The git-show-ref(1) command has multiple different modes:

  - It knows to show all references matching a pattern.

  - It knows to list all references that are an exact match to whatever
    the user has provided.

  - It knows to check for reference existence.

The first two commands use mostly the same infrastructure to print the
references via `show_one()`. But while the former mode uses a proper
iterator and thus has a `struct reference` available in its context, the
latter calls `refs_read_ref()` and thus doesn't. Consequently, we cannot
easily use `reference_get_peeled_oid()` to print the peeled value.

Adapt the code so that we manually construct a `struct reference` when
verifying refs. We wouldn't ever have the peeled value available anyway
as we're not using an iterator here, so we can simply plug in the values
we _do_ have.

With this change we now have a `struct reference` available at both
callsites of `show_one()` and can thus pass it, which allows us to use
`reference_get_peeled_oid()` instead of `peel_iterated_oid()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt
70b783c3a1 ref-filter: propagate peeled object ID
When queueing a reference in the "ref-filter" subsystem we end up
creating a new ref array item that contains the reference's info. One
bit of info that we always discard though is the peeled object ID, and
because of that we are forced to use `peel_iterated_oid()`.

Refactor the code to propagate the peeled object ID via the ref array,
if available. This allows us to manually peel tags without having to go
through the object database.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt
adecd5f0b6 upload-pack: convert to use reference_get_peeled_oid()
The `write_v0_ref()` callback is invoked from two callsites:

  - Once via `send_ref()` which is a callback passed to
    `for_each_namespaced_ref_1()` and `refs_head_ref_namespaced()`.

  - Once manually to announce capabilities.

When sending references to the client we also send the peeled value of
tags. As we don't have a `struct reference` available in the second
case, we cannot easily peel by calling `reference_get_peeled_oid()`, but
we instead have to depend on on global state via `peel_iterated_oid()`.

We do have a reference available though in the first case, it's only the
second case that keeps us from using `reference_get_peeled_oid()`. But
that second case only announces capabilities anyway, so we're not really
handling a reference at all here.

Adapt that case to construct a reference manually and pass that to
`write_v0_ref()`. Start to use `reference_get_peeled_oid()` now that we
always have a `struct reference` available.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt
f898661637 refs: expose peeled object ID via the iterator
Both the "files" and "reftable" backend are able to store peeled values
for tags in the respective formats. This allows for a more efficient
lookup of the target object of such a tag without having to manually
peel via the object database.

The infrastructure to access these peeled object IDs is somewhat funky
though. When iterating through objects, we store a pointer reference to
the current iterator in a global variable. The callbacks invoked by that
iterator are then expected to call `peel_iterated_oid()`, which checks
whether the globally-stored iterator's current reference refers to the
one handed into that function. If so, we ask the iterator to peel the
object, otherwise we manually peel the object via the object database.
Depending on global state like this is somewhat weird and also quite
fragile.

Introduce a new `struct reference::peeled_oid` field that can be
populated by the reference backends. This field can be accessed via a
new function `reference_get_peeled_oid()` that either uses that value,
if set, or alternatively peels via the ODB. With this change we don't
have to rely on global state anymore, but make the peeled object ID
available to the callback functions directly.

Adjust trivial callers that already have a `struct reference` available.
Remaining callers will be adjusted in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt
eb2934d94b refs: refactor reference status flags
The reference flags encode information like whether or not a reference
is a symbolic reference or whether it may be broken. This information is
stored in a `int flags` bitfield, which is in conflict with our modern
best practices; we tend to use an unsigned integer to store flags.

Change the type of the field to be `unsigned`. While at it, refactor the
individual flags to be part of an `enum` instead of using preprocessor
defines.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt
4cea042287 refs: fully reset struct ref_iterator::ref on iteration
With the introduction of the `struct ref_iterator::ref` field it now is
a whole lot easier to introduce new fields that become accessible to the
caller without having to adapt every single callsite. But there's a
downside: when a new field is introduced we always have to adapt all
backends to set that field.

This isn't something we can avoid in the general case: when the new
field is expected to be populated by all backends we of course cannot
avoid doing so. But new fields may be entirely optional, in which case
we'd still have such churn. And furthermore, it is very easy right now
to leak state from a previous iteration into the next iteration.

Address this issue by ensuring that the reference backends all fully
reset the field on every single iteration. This ensures that no state
from previous iterations can leak into the next one. And it ensures that
any newly introduced fields will be zeroed out by default.

Note that we don't have to explicitly adapt the "files" backend, as it
uses the `cache_ref_iterator` internally. Furthermore, other "wrapping"
iterators like for example the `prefix_ref_iterator` copy around the
whole reference, so these don't need to be adapted either.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt
89baa52da6 refs: introduce .ref field for the base iterator
The base iterator has a couple of fields that tracks the name, target,
object ID and flags for the current reference. Due to this design we
have to create a new `struct reference` whenever we want to hand over
that reference to the callback function, which is tedious and not very
efficient.

Convert the structure to instead contain a `struct reference` as member.
This member is expected to be populated by the implementations of the
iterator and is handed over to the callback directly.

While at it, simplify `should_pack_ref()` to take a `struct reference`
directly instead of passing its respective fields.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt
bdbebe5714 refs: introduce wrapper struct for each_ref_fn
The `each_ref_fn` callback function type is used across our code base
for several different functions that iterate through reference. There's
a bunch of callbacks implementing this type, which makes any changes to
the callback signature extremely noisy. An example of the required churn
is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a
single argument required us to change 48 files.

It was already proposed back then [1] that we might want to introduce a
wrapper structure to alleviate the pain going forward. While this of
course requires the same kind of global refactoring as just introducing
a new parameter, it at least allows us to more change the callback type
afterwards by just extending the wrapper structure.

One counterargument to this refactoring is that it makes the structure
more opaque. While it is obvious which callsites need to be fixed up
when we change the function type, it's not obvious anymore once we use
a structure. That being said, we only have a handful of sites that
actually need to populate this wrapper structure: our ref backends,
"refs/iterator.c" as well as very few sites that invoke the iterator
callback functions directly.

Introduce this wrapper structure so that we can adapt the iterator
interfaces more readily.

[1]: <ZmarVcF5JjsZx0dl@tanuki>

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:24 -08:00
Junio C Hamano
f2bf477c7e Merge branch 'jt/repo-structure' into ps/ref-peeled-tags
* jt/repo-structure:
  builtin/repo: add progress meter for structure stats
  builtin/repo: add keyvalue and nul format for structure stats
  builtin/repo: add object counts in structure output
  builtin/repo: introduce structure subcommand
  ref-filter: export ref_kind_from_refname()
  ref-filter: allow NULL filter pattern
  builtin/repo: rename repo_info() to cmd_repo_info()
2025-10-22 07:47:24 -07:00
Junio C Hamano
6131a76399 Merge branch 'tb/incremental-midx-part-3.1' into ps/ref-peeled-tags
* tb/incremental-midx-part-3.1: (49 commits)
  builtin/repack.c: clean up unused `#include`s
  repack: move `write_cruft_pack()` out of the builtin
  repack: move `write_filtered_pack()` out of the builtin
  repack: move `pack_kept_objects` to `struct pack_objects_args`
  repack: move `finish_pack_objects_cmd()` out of the builtin
  builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()`
  repack: extract `write_pack_opts_is_local()`
  repack: move `find_pack_prefix()` out of the builtin
  builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()`
  builtin/repack.c: introduce `struct write_pack_opts`
  repack: 'write_midx_included_packs' API from the builtin
  builtin/repack.c: inline packs within `write_midx_included_packs()`
  builtin/repack.c: pass `repack_write_midx_opts` to `midx_included_packs`
  builtin/repack.c: inline `remove_redundant_bitmaps()`
  builtin/repack.c: reorder `remove_redundant_bitmaps()`
  repack: keep track of MIDX pack names using existing_packs
  builtin/repack.c: use a string_list for 'midx_pack_names'
  builtin/repack.c: extract opts struct for 'write_midx_included_packs()'
  builtin/repack.c: remove ref snapshotting from builtin
  repack: remove pack_geometry API from the builtin
  ...
2025-10-22 07:47:01 -07:00
Justin Tobler
16a93c03c7 builtin/repo: add progress meter for structure stats
When using the structure subcommand for git-repo(1), evaluating a
repository may take some time depending on its shape. Add a progress
meter to provide feedback to the user about what is happening. The
progress meter is enabled by default when the command is executed from a
tty. It can also be explicitly enabled/disabled via the --[no-]progress
option.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-21 14:40:38 -07:00
Justin Tobler
17215675b5 builtin/repo: add keyvalue and nul format for structure stats
All repository structure stats are outputted in a human-friendly table
form. This format is not suitable for machine parsing. Add a --format
option that supports three output modes: `table`, `keyvalue`, and `nul`.
The `table` mode is the default format and prints the same table output
as before.

With the `keyvalue` mode, each line of output contains a key-value pair
of a repository stat. The '=' character is used to delimit between keys
and values. The `nul` mode is similar to `keyvalue`, but key-values are
delimited by a NUL character instead of a newline. Also, instead of a
'=' character to delimit between keys and values, a newline character is
used. This allows stat values to support special characters without
having to cquote them. These two new modes provides output that is more
machine-friendly.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-21 14:40:38 -07:00
Justin Tobler
eb5cf58ffc builtin/repo: add object counts in structure output
The amount of objects in a repository can provide insight regarding its
shape. To surface this information, use the path-walk API to count the
number of reachable objects in the repository by object type. All
regular references are used to determine the reachable set of objects.
The object counts are appended to the same table containing the
reference information.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-21 14:40:38 -07:00
Justin Tobler
bbb2b93348 builtin/repo: introduce structure subcommand
The structure of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
means to surface repository metrics regarding its structure/shape via a
single command. Acquiring this information requires users to be familiar
with the relevant data points and the various Git commands required to
surface them. To fill this gap, supplemental tools such as git-sizer(1)
have been developed.

To allow users to more readily identify repository structure related
information, introduce the "structure" subcommand in git-repo(1). The
goal of this subcommand is to eventually provide similar functionality
to git-sizer(1), but natively in Git.

The initial version of this command only iterates through all references
in the repository and tracks the count of branches, tags, remote refs,
and other reference types. The corresponding information is displayed in
a human-friendly table formatted in a very similar manner to
git-sizer(1). The width of each table column is adjusted automatically
to satisfy the requirements of the widest row contained.

Subsequent commits will surface additional relevant data points to
output and also provide other more machine-friendly output formats.

Based-on-patch-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-21 14:40:37 -07:00
Justin Tobler
6d1997f6cb ref-filter: export ref_kind_from_refname()
When filtering refs, `ref_kind_from_refname()` is used to determine the
ref type. In a subsequent commit, this same logic is reused when
counting refs by type. Export the function to prepare for this change.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-21 14:40:37 -07:00
Justin Tobler
eafc03dbe3 ref-filter: allow NULL filter pattern
When setting up `struct ref_filter` for filter_refs(), the
`name_patterns` field must point to an array of pattern strings even if
no patterns are required. To improve this interface, treat a NULL
`name_patterns` field the same as when it points to an empty array.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-21 14:40:37 -07:00
Justin Tobler
026ad60160 builtin/repo: rename repo_info() to cmd_repo_info()
Subcommand functions are often prefixed with `cmd_` to denote that they
are an entrypoint. Rename repo_info() to cmd_repo_info() accordingly.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-21 14:40:37 -07:00
Junio C Hamano
133d151831 The twenty-first batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-20 14:12:18 -07:00
Junio C Hamano
8329f6724b Merge branch 'tb/cat-file-objectmode-update'
Code clean-up.

* tb/cat-file-objectmode-update:
  builtin/cat-file.c: simplify calling `report_object_status()`
2025-10-20 14:12:18 -07:00
Junio C Hamano
a23c82509f Merge branch 'kh/doc-continued-paragraph-fix'
Doc mark-up fixes.

* kh/doc-continued-paragraph-fix:
  doc: fix accidental literal blocks
2025-10-20 14:12:17 -07:00
Junio C Hamano
5a34f66fb9 Merge branch 'js/unreachable-workaround-for-no-symlink-head'
Code clean-up.

* js/unreachable-workaround-for-no-symlink-head:
  refs: forbid clang to complain about unreachable code
2025-10-20 14:12:17 -07:00
Junio C Hamano
fc00bf0f9c Merge branch 'js/mingw-includes-cleanup'
Code clean-up.

* js/mingw-includes-cleanup:
  mingw: order `#include`s alphabetically
  mingw: avoid relative `#include`s
2025-10-20 14:12:17 -07:00
Junio C Hamano
29b0700515 Merge branch 'dk/stash-apply-index'
Doc update.

* dk/stash-apply-index:
  doc: explain the impact of stash.index on --autostash options
2025-10-20 14:12:17 -07:00
Junio C Hamano
f229982df1 The twentieth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-17 14:02:17 -07:00
Junio C Hamano
e0fe91489f Merge branch 'jk/diff-no-index-with-pathspec-fix'
An earlier addition to "git diff --no-index A B" to limit the
output with pathspec after the two directories misbehaved when
these directories were given with a trailing slash, which has been
corrected.

* jk/diff-no-index-with-pathspec-fix:
  diff --no-index: fix logic for paths ending in '/'
2025-10-17 14:02:17 -07:00
Junio C Hamano
ab447045ed Merge branch 'tb/doc-submitting-patches'
A few more things that patch authors can do to help maintainer to
keep track of their topics better.

* tb/doc-submitting-patches:
  SubmittingPatches: guidance for multi-series efforts
  SubmittingPatches: extend release-notes experiment to topic names
2025-10-17 14:02:17 -07:00
Junio C Hamano
cd6c082b44 Merge branch 'rs/add-patch-options-fix'
The code in "git add -p" and friends to iterate over hunks was
riddled with bugs, which has been corrected.

* rs/add-patch-options-fix:
  add-patch: reset "permitted" at loop start
  add-patch: let options a and d roll over like y and n
  add-patch: let options k and K roll over like j and J
  add-patch: let options y, n, j, and e roll over to next undecided
  add-patch: document that option J rolls over
  add-patch: improve help for options j, J, k, and K
2025-10-17 14:02:17 -07:00
Junio C Hamano
282a9684ab Merge branch 'en/make-libgit-a'
Instead of three library archives (one for git, one for reftable,
and one for xdiff), roll everything into a single libgit.a archive.
This would help later effort to FFI into Rust.

* en/make-libgit-a:
  make: delete REFTABLE_LIB, add reftable to LIB_OBJS
  make: delete XDIFF_LIB, add xdiff to LIB_OBJS
2025-10-17 14:02:16 -07:00
Taylor Blau
935ab44a0a builtin/repack.c: clean up unused #includes
Over the past several dozen commits, we have moved a large amount of
functionality out of the repack builtin and into other files like
repack.c, repack-cruft.c, repack-filtered.c, repack-midx.c, and
repack-promisor.c.

These files specify the minimal set of `#include`s that they need to
compile successfully, but we did not change the set of `#include`s in
the repack builtin itself.

Now that the code movement is complete, let's clean up that set of
`#include`s and trim down the builtin to include the minimal amount of
external headers necessary to compile.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:57 -07:00
Taylor Blau
09797bd966 repack: move write_cruft_pack() out of the builtin
In an identical fashion as the previous commit, move the function
`write_cruft_pack()` into its own compilation unit, and make the
function visible through the repack.h API.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:57 -07:00
Taylor Blau
7ac4231b42 repack: move write_filtered_pack() out of the builtin
In a similar fashion as in previous commits, move the function
`write_filtered_pack()` out of the builtin and into its own compilation
unit.

This function is now part of the repack.h API, but implemented in its
own "repack-filtered.c" unit as it is a separate component from other
kinds of repacking operations.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:57 -07:00
Taylor Blau
d278970aef repack: move pack_kept_objects to struct pack_objects_args
The "pack_kept_objects" variable is defined as static to the repack
builtin, but is inherently related to the pack-objects arguments that
the builtin uses when generating new packs.

Move that field into the "struct pack_objects_args", and shuffle around
where we append the corresponding command-line option when preparing a
pack-objects process. Specifically:

 - `write_cruft_pack()` always wants to pass "--honor-pack-keep", so
   explicitly set the `pack_kept_objects` field to "0" when initializing
   the `write_pack_opts` struct before calling `write_cruft_pack()`.

 - `write_filtered_pack()` no longer needs to handle writing the
   command-line option "--honor-pack-keep" when preparing a pack-objects
   process, since its call to `prepare_pack_objects()` will have already
   taken care of that.

   `write_filtered_pack()` also reads the `pack_kept_objects` field to
   determine whether to write the existing kept packs with a leading "^"
   character, so update that to read through the `po_args` pointer
   instead.

 - `cmd_repack()` also no longer has to write the "--honor-pack-keep"
   flag explicitly, since this is also handled via its call to
   `prepare_pack_objects()`.

Since there is a default value for "pack_kept_objects" that relies on
whether or not we are writing a bitmap (and not writing a MIDX), extract
a default initializer for `struct pack_objects_args` that keeps this
conditional default behavior.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:57 -07:00
Taylor Blau
fa0787a6cc repack: move finish_pack_objects_cmd() out of the builtin
In a similar spirit as the previous commit(s), now that the function
`finish_pack_objects_cmd()` has no explicit dependencies within the
repack builtin, let's extract it.

This prepares us to extract the remaining two functions within the
repack builtin that explicitly write packfiles, which are
`write_cruft_pack()` and `write_filtered_pack()`, which will be done in
the future commits.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
80db3cd189 builtin/repack.c: pass write_pack_opts to finish_pack_objects_cmd()
To prepare to move the `finish_pack_objects_cmd()` function out of the
builtin and into the repack.h API, there are a couple of things we need
to do first:

 - First, let's take advantage of `write_pack_opts_is_local()` function
   introduced in the previous commit instead of passing "local"
   explicitly.

 - Let's also avoid referring to the static 'packtmp' field within
   builtin/repack.c by instead accessing it through the write_pack_opts
   argument.

There are three callers which need to adjust themselves in order to
account for this change. The callers which reside in write_cruft_pack()
and write_filtered_pack() both already have an "opts" in scope, so they
can pass it through transparently.

The other call (at the bottom of `cmd_repack()`) needs to initialize its
own write_pack_opts to pass the necessary fields over to the direct call
to `finish_pack_objects_cmd()`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
2f79c79bba repack: extract write_pack_opts_is_local()
Similar to the previous commit, the functions `write_cruft_pack()` and
`write_filtered_pack()` both compute a "local" variable via the exact
same mechanism:

    const char *scratch;
    int local = skip_prefix(opts->destination, opts->packdir, &scratch);

Not only does this cause us to repeat the same pair of lines, it also
introduces an unnecessary "scratch" variable that is common between both
functions.

Instead of repeating ourselves, let's extract that functionality into a
new function in the repack.h API called "write_pack_opts_is_local()".
That function takes a pointer to a "struct write_pack_opts" (which has
as fields both "destination" and "packdir"), and can encapsulate the
dangling "scratch" field.

Extract that function and make it visible within the repack.h API, and
use it within both `write_cruft_pack()` and `write_filtered_pack()`.
While we're at it, match our modern conventions by returning a "bool"
instead of "int", and use `starts_with()` instead of `skip_prefix()` to
avoid storing the dummy "scratch" variable.

The remaining duplication (that is, that both `write_cruft_pack()` and
`write_filtered_pack()` still both call `write_pack_opts_is_local()`)
will be addressed in the following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
98fa0d50a7 repack: move find_pack_prefix() out of the builtin
Both callers within the repack builtin which call functions that take a
'write_pack_opts' structure have the following pattern:

    struct write_pack_opts opts = {
        .packdir = packdir,
        .packtmp = packtmp,
        .pack_prefix = find_pack_prefix(packdir, packtmp),
        /* ... */
    };
    int ret = write_some_kind_of_pack(&opts, /* ... */);

, but both "packdir" and "packtmp" are fields within the write_pack_opts
struct itself!

Instead of also computing the pack_prefix ahead of time, let's have the
callees compute it themselves by moving `find_pack_prefix()` out of the
repack builtin, and have it take a write_pack_opts pointer instead of
the "packdir" and "packtmp" fields directly.

This avoids the callers having to do some prep work that is common
between the two of them, but also avoids the potential pitfall of
accidentally writing:

    .pack_prefix = find_pack_prefix(packtmp, packdir),

(which is well-typed) when the caller meant to instead write:

    .pack_prefix = find_pack_prefix(packdir, packtmp),

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
3d2ac2065e builtin/repack.c: use write_pack_opts within write_cruft_pack()
Similar to the changes made in the previous commit to
`write_filtered_pack()`, teach `write_cruft_pack()` to take a
`write_pack_opts` struct and use that where possible.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
7a9c81a38d builtin/repack.c: introduce struct write_pack_opts
There are various functions within the 'repack' builtin which are
responsible for writing different kinds of packs. They include:

 - `static int write_filtered_pack(...)`
 - `static int write_cruft_pack(...)`

as well as the function `finish_pack_objects_cmd()`, which is
responsible for finalizing a new pack write, and recording the checksum
of its contents in the 'names' list.

Both of these `write_` functions have a few things in common. They both
take a pointer to the 'pack_objects_args' struct, as well as a pair of
character pointers for `destination` and `pack_prefix`.

Instead of repeating those arguments for each function, let's extract an
options struct called "write_pack_opts" which has these three parameters
as member fields. While we're at it, add fields for "packdir," and
"packtmp", both of which are static variables within the builtin, and
need to be read from within these two functions.

This will shorten the list of parameters that callers have to provide to
`write_filtered_pack()`, avoid ambiguity when passing multiple variables
of the same type, and provide a unified interface for the two functions
mentioned earlier.

(Note that "pack_prefix" can be derived on the fly as a function of
"packdir" and "packtmp", making it unnecessary to store "pack_prefix"
explicitly. This commit ignores that potential cleanup in the name of
doing as few things as possible, but a later commit will make that
change.)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
6d05eb135f repack: 'write_midx_included_packs' API from the builtin
Now that we have sufficiently cleaned up the write_midx_included_packs()
function, we can move it (along with the struct repack_write_midx_opts)
out of the builtin, and into the repack.h header.

Since this function (and the static ones that it depends on) are
MIDX-specific details of the repacking process, move them to the
repack-midx.c compilation unit instead of the general repack.c one.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
f17757487b builtin/repack.c: inline packs within write_midx_included_packs()
To write a MIDX at the end of a repack operation, 'git repack' presently
computes the set of packs to write into the MIDX, before invoking
`write_midx_included_packs()` with a `string_list` containing those
packs.

The logic for computing which packs are supposed to appear in the
resulting MIDX is within `midx_included_packs()`, where it is aware of
details like which cruft pack(s) were written/combined, if/how we did a
geometric repack, etc.

Computing this list ourselves before providing it to the sole function
to make use of that list `write_midx_included_packs()` is somewhat
awkward. In the future, repack will learn how to write incremental
MIDXs, which will use a very different pack selection routine.

Instead of doing something like:

    struct string_list included_packs = STRING_LIST_INIT_DUP;
    if (incremental) {
        midx_incremental_included_packs(&included_packs, ...):
        write_midx_incremental_included_packs(&included_packs, ...);
    } else {
        midx_included_packs(&included_packs, ...):
        write_midx_included_packs(&included_packs, ...);
    }

in the future, let's have each function that writes a MIDX be
responsible for itself computing the list of included packs. Inline the
declaration and initialization of `included_packs` into the
`write_midx_included_packs()` function itself, and repeat that pattern
in the future when we introduce new ways to write MIDXs.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
f07263fd9f builtin/repack.c: pass repack_write_midx_opts to midx_included_packs
Instead of passing individual parameters (in this case, "existing",
"names", and "geometry") to `midx_included_packs()`, pass a pointer to a
`repack_write_midx_opts` structure instead.

Besides reducing the number of parameters necessary to call the
`midx_included_packs` function, this refactoring sets us up nicely to
inline the call to `midx_included_packs()` into
`write_midx_included_packs()`, thus making the caller (in this case,
`cmd_repack()`) oblivious to the set of packs being written into the
MIDX.

In order to do this, `repack_write_midx_opts` has to keep track of the
set of existing packs, so add an additional field to point to that set.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
337baea721 builtin/repack.c: inline remove_redundant_bitmaps()
After writing a new MIDX, the repack command removes any bitmaps
belonging to packs which were written into the MIDX.

This is currently done in a separate function outside of
`write_midx_included_packs()`, which forces the caller to keep track of
the set of packs written into the MIDX.

Prepare to no longer require the caller to keep track of such
information by inlining the clean-up into `write_midx_included_packs()`.
Future commits will make the caller oblivious to the set of packs
included in the MIDX altogether.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
42088e3d4a builtin/repack.c: reorder remove_redundant_bitmaps()
The next commit will inline the call to `remove_redundant_bitmaps()`
into `write_midx_included_packs()`. Reorder these two functions to avoid
a forward declaration to `remove_redundant_bitmaps()`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
2fee63a71a repack: keep track of MIDX pack names using existing_packs
Instead of storing the list of MIDX pack names separately, let's inline
it into the existing_packs struct, further reducing the number of
parameters we have to pass around.

This amounts to adding a new string_list to the existing_packs struct,
and populating it via `existing_packs_collect()`. This is fairly
straightforward to do, since we are already looping over all packs, all
we need to do is:

    if (p->multi_pack_index)
        string_list_append(&existing->midx_packs, pack_basename(p));

Note, however, that this check *must* come before other conditions where
we discard and do not keep track of a pack, including the condition "if
(!p->pack_local)" immediately below. This is because the existing
routine which collects MIDX pack names does so blindly, and does not
discard, for example, non-local packs.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:55 -07:00
Taylor Blau
c3690c97d7 builtin/repack.c: use a string_list for 'midx_pack_names'
When writing a new MIDX, repack must determine whether or not there are
any packs in the MIDX it is replacing (if one exists) that are not
somehow represented in the new MIDX (e.g., either by preserving the pack
verbatim, or rolling it up as part of a geometric repack, etc.).

In order to do this, it keeps track of a list of pack names from the
MIDX present in the repository at the start of the repack operation.
Since we manipulate and close the object store, we cannot rely on the
repository's in-core representation of the MIDX, since this is subject
to change and/or go away.

When this behavior was introduced in 5ee86c273b (repack: exclude cruft
pack(s) from the MIDX where possible, 2025-06-23), we maintained an
array of character pointers instead of using a convenience API, such as
string-list.h.

Store the list of MIDX pack names in a string_list, thereby reducing the
number of parameters we have to pass to `midx_has_unknown_packs()`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:55 -07:00