Commit Graph

694 Commits

Author SHA1 Message Date
Junio C Hamano
7789d98c88 Merge branch 'ps/odb-for-each-object' into jch
Revamp object enumeration API around odb.

* ps/odb-for-each-object:
  odb: drop unused `for_each_{loose,packed}_object()` functions
  reachable: convert to use `odb_for_each_object()`
  builtin/pack-objects: use `packfile_store_for_each_object()`
  odb: introduce mtime fields for object info requests
  treewide: drop uses of `for_each_{loose,packed}_object()`
  treewide: enumerate promisor objects via `odb_for_each_object()`
  builtin/fsck: refactor to use `odb_for_each_object()`
  odb: introduce `odb_for_each_object()`
  packfile: introduce function to iterate through objects
  packfile: extract function to iterate through objects of a store
  object-file: introduce function to iterate through objects
  object-file: extract function to read object info from path
  odb: fix flags parameter to be unsigned
  odb: rename `FOR_EACH_OBJECT_*` flags
2026-02-23 14:25:48 -08:00
Junio C Hamano
b1b36c2ca7 Merge branch 'ps/for-each-ref-in-fixes' into jch
A handful of places used refs_for_each_ref_in() API incorrectly,
which has been corrected.

* ps/for-each-ref-in-fixes:
  bisect: simplify string_list memory handling
  bisect: fix misuse of `refs_for_each_ref_in()`
  pack-bitmap: fix bug with exact ref match in "pack.preferBitmapTips"
  pack-bitmap: deduplicate logic to iterate over preferred bitmap tips
2026-02-23 14:25:43 -08:00
Junio C Hamano
4855652e76 Merge branch 'ps/pack-concat-wo-backfill' into jch
"git pack-objects --stdin-packs" with "--exclude-promisor-objects"
fetched objects that are promised, which was not wanted.  This has
been fixed.

* ps/pack-concat-wo-backfill:
  builtin/pack-objects: don't fetch objects when merging packs
2026-02-23 14:25:42 -08:00
Patrick Steinhardt
ed693078e9 pack-bitmap: deduplicate logic to iterate over preferred bitmap tips
We have two locations that iterate over the preferred bitmap tips as
configured by the user via "pack.preferBitmapTips". Both of these
callsites are subtly wrong: when the preferred bitmap tips contain an
exact refname match, then we will hit a `BUG()`.

Prepare for the fix by unifying the two callsites into a new
`for_each_preferred_bitmap_tip()` function.

This removes the last callsite of `bitmap_preferred_tips()` outside of
"pack-bitmap.c". As such, convert the function to be local to that file
only. Note that the function is still used by a second caller, so we
cannot just inline it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-19 10:41:18 -08:00
Junio C Hamano
354b8d89ac Merge branch 'rs/clean-includes'
Clean up redundant includes of header files.

* rs/clean-includes:
  remove duplicate includes
2026-02-17 13:30:42 -08:00
Patrick Steinhardt
f4eff7116d builtin/pack-objects: don't fetch objects when merging packs
The "--stdin-packs" option can be used to merge objects from multiple
packfiles given via stdin into a new packfile. One big upside of this
option is that we don't have to perform a complete rev walk to enumerate
objects. Instead, we can simply enumerate all objects that are part of
the specified packfiles, which can be significantly faster in very large
repositories.

There is one downside though: when we don't perform a rev walk we also
don't have a good way to learn about the respective object's names. As a
consequence, we cannot use the name hashes as a heuristic to get better
delta selection.

We try to offset this downside though by performing a localized rev
walk: we queue all objects that we're about to repack as interesting,
and all objects from excluded packfiles as uninteresting. We then
perform a best-effort rev walk that allows us to fill in object names.

There is one gotcha here though: when "--exclude-promisor-objects" has
not been given we will perform backfill fetches for any promised objects
that are missing. This used to not be an issue though as this option was
mutually exclusive with "--stdin-packs". But that has changed recently,
and starting with dcc9c7ef47 (builtin/repack: handle promisor packs with
geometric repacking, 2026-01-05) we will now repack promisor packs
during geometric compaction. The consequence is that a geometric repack
may now perform a bunch of backfill fetches.

We of course cannot pass "--exclude-promisor-objects" to fix this
issue -- after all, the whole intent is to repack objects part of a
promisor pack. But arguably we don't have to: the rev walk is intended
as best effort, and we already configure it to ignore missing links to
other objects. So we can adapt the walk to unconditionally disable
fetching any missing objects.

Do so and add a test that verifies we don't backfill any objects.

Reported-by: Lukas Wanko <lwanko@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-11 09:44:09 -08:00
René Scharfe
10c68d2577 remove duplicate includes
The following command reports that some header files are included twice:

   $ git grep '#include' '*.c' | sort | uniq -cd

Remove the second #include line in each case, as it has no effect.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-08 15:03:06 -08:00
Amisha Chhajed
2e711acfbd string-list: add string_list_sort_u() that mimics "sort -u"
Many callsites of string_list_remove_duplicates() call it
immdediately after calling string_list_sort(), understandably
as the former requires string-list to be sorted, it is clear
that these places are sorting only to remove duplicates and
for no other reason.

Introduce a helper function string_list_sort_u that combines
these two calls that often appear together, to simplify
these callsites. Replace the current calls of those methods with
string_list_sort_u().

Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-29 09:32:50 -08:00
Patrick Steinhardt
dd097bbe29 builtin/pack-objects: use packfile_store_for_each_object()
When enumerating objects that are supposed to be stored in a new cruft
pack we use `for_each_packed_object()` and then derive each object's
mtime individually. Refactor this logic to instead use the new
`packfile_store_for_each_object()` function with an object info request
that asks for the respective mtimes.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 08:26:08 -08:00
Patrick Steinhardt
bd1855b897 odb: rename FOR_EACH_OBJECT_* flags
Rename the `FOR_EACH_OBJECT_*` flags to have an `ODB_` prefix. This
prepares us for a new upcoming `odb_for_each_object()` function and
ensures that both the function and its flags have the same prefix.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 08:26:06 -08:00
Junio C Hamano
070fa41675 Merge branch 'ps/geometric-repacking-with-promisor-remotes'
"git repack --geometric" did not work with promisor packs, which
has been corrected.

* ps/geometric-repacking-with-promisor-remotes:
  builtin/repack: handle promisor packs with geometric repacking
  repack-promisor: extract function to remove redundant packs
  repack-promisor: extract function to finalize repacking
  repack-geometry: extract function to compute repacking split
  builtin/pack-objects: exclude promisor objects with "--stdin-packs"
2026-01-21 16:16:27 -08:00
Junio C Hamano
bc5cbbe246 Merge branch 'ps/read-object-info-improvements'
The object-info API has been cleaned up.

* ps/read-object-info-improvements:
  packfile: drop repository parameter from `packed_object_info()`
  packfile: skip unpacking object header for disk size requests
  packfile: disentangle return value of `packed_object_info()`
  packfile: always populate pack-specific info when reading object info
  packfile: extend `is_delta` field to allow for "unknown" state
  packfile: always declare object info to be OI_PACKED
  object-file: always set OI_LOOSE when reading object info
2026-01-21 08:29:00 -08:00
Junio C Hamano
d627023d80 Merge branch 'ps/packfile-store-in-odb-source'
The packfile_store data structure is moved from object store to odb
source.

* ps/packfile-store-in-odb-source:
  packfile: move MIDX into packfile store
  packfile: refactor `find_pack_entry()` to work on the packfile store
  packfile: inline `find_kept_pack_entry()`
  packfile: only prepare owning store in `packfile_store_prepare()`
  packfile: only prepare owning store in `packfile_store_get_packs()`
  packfile: move packfile store into object source
  packfile: refactor misleading code when unusing pack windows
  packfile: refactor kept-pack cache to work with packfile stores
  packfile: pass source to `prepare_pack()`
  packfile: create store via its owning source
2026-01-21 08:28:59 -08:00
Junio C Hamano
ec16dde5c8 Merge branch 'ps/packfile-store-in-odb-source' into ps/odb-for-each-object
* ps/packfile-store-in-odb-source:
  packfile: move MIDX into packfile store
  packfile: refactor `find_pack_entry()` to work on the packfile store
  packfile: inline `find_kept_pack_entry()`
  packfile: only prepare owning store in `packfile_store_prepare()`
  packfile: only prepare owning store in `packfile_store_get_packs()`
  packfile: move packfile store into object source
  packfile: refactor misleading code when unusing pack windows
  packfile: refactor kept-pack cache to work with packfile stores
  packfile: pass source to `prepare_pack()`
  packfile: create store via its owning source
  odb: properly close sources before freeing them
  builtin/gc: fix condition for whether to write commit graphs
2026-01-15 05:50:16 -08:00
Junio C Hamano
c8e1706e8d Merge branch 'ps/read-object-info-improvements' into ps/odb-for-each-object
* ps/read-object-info-improvements:
  packfile: drop repository parameter from `packed_object_info()`
  packfile: skip unpacking object header for disk size requests
  packfile: disentangle return value of `packed_object_info()`
  packfile: always populate pack-specific info when reading object info
  packfile: extend `is_delta` field to allow for "unknown" state
  packfile: always declare object info to be OI_PACKED
  object-file: always set OI_LOOSE when reading object info
2026-01-15 05:47:47 -08:00
Patrick Steinhardt
0cd306ebc8 builtin/pack-objects: exclude promisor objects with "--stdin-packs"
It is currently not possible to combine "--exclude-promisor-objects"
with "--stdin-packs" because both flags want to set up a revision walk
to enumerate the objects to pack. In a subsequent commit though we want
to extend geometric repacks to support promisor objects, and for that we
need to handle the combination of both flags.

There are two cases we have to think about here:

  - "--stdin-packs" asks us to pack exactly the objects part of the
    specified packfiles. It is somewhat questionable what to do in the
    case where the user asks us to exclude promisor objects, but at the
    same time explicitly passes a promisor pack to us. For now, we
    simply abort the request as it is self-contradicting. As we have
    also been dying before this commit there is no regression here.

  - "--stdin-packs=follow" does the same as the first flag, but it also
    asks us to include all objects transitively reachable from any
    object in the packs we are about to repack. This is done by doing
    the revision walk mentioned further up. Luckily, fixing this case is
    trivial: we only need to modify the revision walk to also set the
    `exclude_promisor_objects` field.

Note that we do not support the "--exclude-promisor-objects-best-effort"
flag for now as we don't need it to support geometric repacking with
promisor objects.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-14 06:29:24 -08:00
Patrick Steinhardt
12d3b58b55 packfile: drop repository parameter from packed_object_info()
The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.

Drop the redundant parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:51:15 -08:00
Patrick Steinhardt
84f0e60b28 packfile: move packfile store into object source
The packfile store is a member of `struct object_database`, which means
that we have a single store per database. This doesn't really make much
sense though: each source connected to the database has its own set of
packfiles, so there is a conceptual mismatch here. This hasn't really
caused much of a problem in the past, but with the advent of pluggable
object databases this is becoming more of a problem because some of the
sources may not even use packfiles in the first place.

Move the packfile store down by one level from the object database into
the object database source. This ensures that each source now has its
own packfile store, and we can eventually start to abstract it away
entirely so that the caller doesn't even know what kind of store it
uses.

Note that we only need to adjust a relatively small number of callers,
way less than one might expect. This is because most callers are using
`repo_for_each_pack()`, which handles enumeration of all packfiles that
exist in the repository. So for now, none of these callers need to be
adapted. The remaining callers that iterate through the packfiles
directly and that need adjustment are those that are a bit more tangled
with packfiles. These will be adjusted over time.

Note that this patch only moves the packfile store, and there is still a
bunch of functions that seemingly operate on a packfile store but that
end up iterating over all sources. These will be adjusted in subsequent
commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09 06:40:07 -08:00
Patrick Steinhardt
085de91b95 packfile: refactor kept-pack cache to work with packfile stores
The kept pack cache is a cache of packfiles that are marked as kept
either via an accompanying ".kept" file or via an in-memory flag. The
cache can be retrieved via `kept_pack_cache()`, where one needs to pass
in a repository.

Ultimately though the kept-pack cache is a property of the packfile
store, and this causes problems in a subsequent commit where we want to
move down the packfile store to be a per-object-source entity.

Prepare for this and refactor the kept-pack cache to work on top of a
packfile store instead. While at it, rename both the function and flags
specific to the kept-pack cache so that they can be properly attributed
to the respective subsystems.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09 06:40:06 -08:00
René Scharfe
b6e4cc8c32 tag: support arbitrary repositories in parse_tag()
Allow callers of parse_tag() pass in the repository to use.  Let most of
them pass in the_repository to get the same result as before.  One of
them has stopped using the_repository in ef9b0370da (sha1-name.c: store
and use repo in struct disambiguate_state, 2019-04-16); let it pass in
its stored repository.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-12-29 22:02:54 +09:00
Junio C Hamano
dbe54273a7 Merge branch 'ps/object-read-stream'
The "git_istream" abstraction has been revamped to make it easier
to interface with pluggable object database design.

* ps/object-read-stream:
  streaming: drop redundant type and size pointers
  streaming: move into object database subsystem
  streaming: refactor interface to be object-database-centric
  streaming: move logic to read packed objects streams into backend
  streaming: move logic to read loose objects streams into backend
  streaming: make the `odb_read_stream` definition public
  streaming: get rid of `the_repository`
  streaming: rely on object sources to create object stream
  packfile: introduce function to read object info from a store
  streaming: move zlib stream into backends
  streaming: create structure for filtered object streams
  streaming: create structure for packed object streams
  streaming: create structure for loose object streams
  streaming: create structure for in-core object streams
  streaming: allocate stream inside the backend-specific logic
  streaming: explicitly pass packfile info when streaming a packed object
  streaming: propagate final object type via the stream
  streaming: drop the `open()` callback function
  streaming: rename `git_istream` into `odb_read_stream`
2025-12-16 11:08:34 +09:00
Junio C Hamano
f1799202ea Merge branch 'ps/object-read-stream' into ps/packfile-store-in-odb-source
* ps/object-read-stream:
  streaming: drop redundant type and size pointers
  streaming: move into object database subsystem
  streaming: refactor interface to be object-database-centric
  streaming: move logic to read packed objects streams into backend
  streaming: move logic to read loose objects streams into backend
  streaming: make the `odb_read_stream` definition public
  streaming: get rid of `the_repository`
  streaming: rely on object sources to create object stream
  packfile: introduce function to read object info from a store
  streaming: move zlib stream into backends
  streaming: create structure for filtered object streams
  streaming: create structure for packed object streams
  streaming: create structure for loose object streams
  streaming: create structure for in-core object streams
  streaming: allocate stream inside the backend-specific logic
  streaming: explicitly pass packfile info when streaming a packed object
  streaming: propagate final object type via the stream
  streaming: drop the `open()` callback function
  streaming: rename `git_istream` into `odb_read_stream`
2025-12-15 17:40:31 +09:00
Junio C Hamano
a545103244 Merge branch 'ps/object-source-loose'
A part of code paths that deals with loose objects has been cleaned
up.

* ps/object-source-loose:
  object-file: refactor writing objects via a stream
  object-file: rename `write_object_file()`
  object-file: refactor freshening of objects
  object-file: rename `has_loose_object()`
  object-file: read objects via the loose object source
  object-file: move loose object map into loose source
  object-file: hide internals when we need to reprepare loose sources
  object-file: move loose object cache into loose source
  object-file: introduce `struct odb_source_loose`
  object-file: move `fetch_if_missing`
  odb: adjust naming to free object sources
  odb: introduce `odb_source_new()`
  odb: fix subtle logic to check whether an alternate is usable
2025-11-24 15:46:41 -08:00
Patrick Steinhardt
7b94028652 streaming: drop redundant type and size pointers
In the preceding commits we have turned `struct odb_read_stream` into a
publicly visible structure. Furthermore, this structure now contains the
type and size of the object that we are about to stream. Consequently,
the out-pointers that we used before to propagate the type and size of
the streamed object are now somewhat redundant with the data contained
in the structure itself.

Drop these out-pointers and adapt callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-23 12:56:46 -08:00
Patrick Steinhardt
1599b68d5e streaming: move into object database subsystem
The "streaming" terminology is somewhat generic, so it may not be
immediately obvious that "streaming.{c,h}" is specific to the object
database. Rectify this by moving it into the "odb/" directory so that it
can be immediately attributed to the object subsystem.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-23 12:56:46 -08:00
Patrick Steinhardt
378ec56beb streaming: refactor interface to be object-database-centric
Refactor the streaming interface to be centered around object databases
instead of centered around the repository. Rename the functions
accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-23 12:56:45 -08:00
Patrick Steinhardt
6bdda3a3b0 streaming: rename git_istream into odb_read_stream
In the following patches we are about to make the `git_istream` more
generic so that it becomes fully controlled by the specific object
source that wants to create it. As part of these refactorings we'll
fully move the structure into the object database subsystem.

Prepare for this change by renaming the structure from `git_istream`
to `odb_read_stream`. This mirrors the `odb_write_stream` structure that
we already have.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-23 12:56:44 -08:00
Junio C Hamano
13134cecb0 Merge branch 'ps/ref-peeled-tags'
Some ref backend storage can hold not just the object name of an
annotated tag, but the object name of the object the tag points at.
The code to handle this information has been streamlined.

* ps/ref-peeled-tags:
  t7004: do not chdir around in the main process
  ref-filter: fix stale parsed objects
  ref-filter: parse objects on demand
  ref-filter: detect broken tags when dereferencing them
  refs: don't store peeled object IDs for invalid tags
  object: add flag to `peel_object()` to verify object type
  refs: drop infrastructure to peel via iterators
  refs: drop `current_ref_iter` hack
  builtin/show-ref: convert to use `reference_get_peeled_oid()`
  ref-filter: propagate peeled object ID
  upload-pack: convert to use `reference_get_peeled_oid()`
  refs: expose peeled object ID via the iterator
  refs: refactor reference status flags
  refs: fully reset `struct ref_iterator::ref` on iteration
  refs: introduce `.ref` field for the base iterator
  refs: introduce wrapper struct for `each_ref_fn`
2025-11-19 10:55:39 -08:00
Patrick Steinhardt
f898661637 refs: expose peeled object ID via the iterator
Both the "files" and "reftable" backend are able to store peeled values
for tags in the respective formats. This allows for a more efficient
lookup of the target object of such a tag without having to manually
peel via the object database.

The infrastructure to access these peeled object IDs is somewhat funky
though. When iterating through objects, we store a pointer reference to
the current iterator in a global variable. The callbacks invoked by that
iterator are then expected to call `peel_iterated_oid()`, which checks
whether the globally-stored iterator's current reference refers to the
one handed into that function. If so, we ask the iterator to peel the
object, otherwise we manually peel the object via the object database.
Depending on global state like this is somewhat weird and also quite
fragile.

Introduce a new `struct reference::peeled_oid` field that can be
populated by the reference backends. This field can be accessed via a
new function `reference_get_peeled_oid()` that either uses that value,
if set, or alternatively peels via the ODB. With this change we don't
have to rely on global state anymore, but make the peeled object ID
available to the callback functions directly.

Adjust trivial callers that already have a `struct reference` available.
Remaining callers will be adjusted in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt
bdbebe5714 refs: introduce wrapper struct for each_ref_fn
The `each_ref_fn` callback function type is used across our code base
for several different functions that iterate through reference. There's
a bunch of callbacks implementing this type, which makes any changes to
the callback signature extremely noisy. An example of the required churn
is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a
single argument required us to change 48 files.

It was already proposed back then [1] that we might want to introduce a
wrapper structure to alleviate the pain going forward. While this of
course requires the same kind of global refactoring as just introducing
a new parameter, it at least allows us to more change the callback type
afterwards by just extending the wrapper structure.

One counterargument to this refactoring is that it makes the structure
more opaque. While it is obvious which callsites need to be fixed up
when we change the function type, it's not obvious anymore once we use
a structure. That being said, we only have a handful of sites that
actually need to populate this wrapper structure: our ref backends,
"refs/iterator.c" as well as very few sites that invoke the iterator
callback functions directly.

Introduce this wrapper structure so that we can adapt the iterator
interfaces more readily.

[1]: <ZmarVcF5JjsZx0dl@tanuki>

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:24 -08:00
Patrick Steinhardt
05130c6c9e object-file: rename has_loose_object()
Rename `has_loose_object()` to `odb_source_loose_has_object()` so that
it becomes clear that this is tied to a specific loose object source.
This matches our modern naming schema for functions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-03 12:18:47 -08:00
Junio C Hamano
5554738038 Merge branch 'ps/remove-packfile-store-get-packs'
Two slightly different ways to get at "all the packfiles" in API
has been cleaned up.

* ps/remove-packfile-store-get-packs:
  packfile: rename `packfile_store_get_all_packs()`
  packfile: introduce macro to iterate through packs
  packfile: drop `packfile_store_get_packs()`
  builtin/grep: simplify how we preload packs
  builtin/gc: convert to use `packfile_store_get_all_packs()`
  object-name: convert to use `packfile_store_get_all_packs()`
2025-10-30 08:00:19 -07:00
Patrick Steinhardt
c31bad4f7d packfile: track packs via the MRU list exclusively
We track packfiles via two different lists:

  - `struct packfile_store::packs` is a list that sorts local packs
    first. In addition, these packs are sorted so that younger packs are
    sorted towards the front.

  - `struct packfile_store::mru` is a list that sorts packs so that
    most-recently used packs are at the front.

The reasoning behind the ordering in the `packs` list is that younger
objects stored in the local object store tend to be accessed more
frequently, and that is certainly true for some cases. But there are
going to be lots of cases where that isn't true. Especially when
traversing history it is likely that one needs to access many older
objects, and due to our housekeeping it is very likely that almost all
of those older objects will be contained in one large pack that is
oldest.

So whether or not the ordering makes sense really depends on the use
case at hand. A flexible approach like our MRU list addresses that need,
as it will sort packs towards the front that are accessed all the time.
Intuitively, this approach is thus able to satisfy more use cases more
efficiently.

This reasoning casts some doubt on whether or not it really makes sense
to track packs via two different lists. It causes confusion, and it is
not clear whether there are use cases where the `packs` list really is
such an obvious choice.

Merge these two lists into one most-recently-used list.

Note that there is one important edge case: `for_each_packed_object()`
uses the MRU list to iterate through packs, and then it lists each
object in those packs. This would have the effect that we now sort the
current pack towards the front, thus modifying the list of packfiles we
are iterating over, with the consequence that we'll see an infinite
loop. This edge case is worked around by introducing a new field that
allows us to skip updating the MRU.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30 07:09:53 -07:00
Patrick Steinhardt
0d0e4b5954 builtin/pack-objects: simplify logic to find kept or nonlocal objects
The function `has_sha1_pack_kept_or_nonlocal()` takes an object ID and
then searches through packed objects to figure out whether the object
exists in a kept or non-local pack. As a performance optimization we
remember the packfile that contains a given object ID so that the next
call to the function first checks that same packfile again.

The way this is written is rather hard to follow though, as the caching
mechanism is intertwined with the loop that iterates through the packs.
Consequently, we need to do some gymnastics to re-start the iteration if
the cached pack does not contain the objects.

Refactor this so that we check the cached packfile at the beginning. We
don't have to re-verify whether the packfile meets the properties as we
have already verified those when storing the pack in `last_found` in the
first place. So all we need to do is to use `find_pack_entry_one()` to
check whether the pack contains the object ID, and to skip the cached
pack in the loop so that we don't search it twice.

Furthermore, stop using the `(void *)1` sentinel value and instead use a
simple `NULL` pointer to indicate that we don't have a last-found pack
yet.

This refactoring significantly simplifies the logic and makes it much
easier to follow.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30 07:09:53 -07:00
Patrick Steinhardt
f905a855b1 packfile: move the MRU list into the packfile store
Packfiles have two lists associated to them:

  - A list that keeps track of packfiles in the order that they were
    added to a packfile store.

  - A list that keeps track of packfiles in most-recently-used order so
    that packfiles that are more likely to contain a specific object are
    ordered towards the front.

Both of these lists are hosted by `struct packed_git` itself, So to
identify all packfiles in a repository you simply need to grab the first
packfile and then iterate the `->next` pointers or the MRU list. This
pattern has the problem that all packfiles are part of the same list,
regardless of whether or not they belong to the same object source.

With the upcoming pluggable object database effort this needs to change:
packfiles should be contained by a single object source, and reading an
object from any such packfile should use that source to look up the
object. Consequently, we need to break up the global lists of packfiles
into per-object-source lists.

A first step towards this goal is to move those lists out of `struct
packed_git` and into the packfile store. While the packfile store is
currently sitting on the `struct object_database` level, the intent is
to push it down one level into the `struct odb_source` in a subsequent
patch series.

Introduce a new `struct packfile_list` that is used to manage lists of
packfiles and use it to store the list of most-recently-used packfiles
in `struct packfile_store`. For now, the new list type is only used in a
single spot, but we'll expand its usage in subsequent patches.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30 07:09:52 -07:00
Junio C Hamano
ed3305fff7 Merge branch 'ps/remove-packfile-store-get-packs' into ps/packed-git-in-object-store
* ps/remove-packfile-store-get-packs: (55 commits)
  packfile: rename `packfile_store_get_all_packs()`
  packfile: introduce macro to iterate through packs
  packfile: drop `packfile_store_get_packs()`
  builtin/grep: simplify how we preload packs
  builtin/gc: convert to use `packfile_store_get_all_packs()`
  object-name: convert to use `packfile_store_get_all_packs()`
  builtin/repack.c: clean up unused `#include`s
  repack: move `write_cruft_pack()` out of the builtin
  repack: move `write_filtered_pack()` out of the builtin
  repack: move `pack_kept_objects` to `struct pack_objects_args`
  repack: move `finish_pack_objects_cmd()` out of the builtin
  builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()`
  repack: extract `write_pack_opts_is_local()`
  repack: move `find_pack_prefix()` out of the builtin
  builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()`
  builtin/repack.c: introduce `struct write_pack_opts`
  repack: 'write_midx_included_packs' API from the builtin
  builtin/repack.c: inline packs within `write_midx_included_packs()`
  builtin/repack.c: pass `repack_write_midx_opts` to `midx_included_packs`
  builtin/repack.c: inline `remove_redundant_bitmaps()`
  ...
2025-10-28 10:00:56 -07:00
Patrick Steinhardt
ecad863c12 packfile: rename packfile_store_get_all_packs()
In a preceding commit we have removed `packfile_store_get_packs()`. With
this function removed it's somewhat useless to still have the "all"
infix in `packfile_store_get_all_packs()`. Rename the latter to drop
that infix.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 14:42:40 -07:00
Patrick Steinhardt
86d8c62f48 packfile: introduce macro to iterate through packs
We have a bunch of different sites that want to iterate through all
packs of a given `struct packfile_store`. This pattern is somewhat
verbose and repetitive, which makes it somewhat cumbersome.

Introduce a new macro `repo_for_each_pack()` that removes some of the
boilerplate.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 14:42:39 -07:00
Junio C Hamano
057a94fbbb Merge branch 'tb/incremental-midx-part-3.1' into ps/remove-packfile-store-get-packs
* tb/incremental-midx-part-3.1: (64 commits)
  builtin/repack.c: clean up unused `#include`s
  repack: move `write_cruft_pack()` out of the builtin
  repack: move `write_filtered_pack()` out of the builtin
  repack: move `pack_kept_objects` to `struct pack_objects_args`
  repack: move `finish_pack_objects_cmd()` out of the builtin
  builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()`
  repack: extract `write_pack_opts_is_local()`
  repack: move `find_pack_prefix()` out of the builtin
  builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()`
  builtin/repack.c: introduce `struct write_pack_opts`
  repack: 'write_midx_included_packs' API from the builtin
  builtin/repack.c: inline packs within `write_midx_included_packs()`
  builtin/repack.c: pass `repack_write_midx_opts` to `midx_included_packs`
  builtin/repack.c: inline `remove_redundant_bitmaps()`
  builtin/repack.c: reorder `remove_redundant_bitmaps()`
  repack: keep track of MIDX pack names using existing_packs
  builtin/repack.c: use a string_list for 'midx_pack_names'
  builtin/repack.c: extract opts struct for 'write_midx_included_packs()'
  builtin/repack.c: remove ref snapshotting from builtin
  repack: remove pack_geometry API from the builtin
  ...
2025-10-16 14:42:27 -07:00
Junio C Hamano
fbd67ab9a4 Merge branch 'ps/odb-clean-stale-wrappers'
Code clean-up.

* ps/odb-clean-stale-wrappers:
  odb: drop deprecated wrapper functions
2025-10-07 12:25:28 -07:00
Junio C Hamano
8c13c31404 Merge branch 'ps/packfile-store'
Code clean-up around the in-core list of all the pack files and
object database(s).

* ps/packfile-store:
  packfile: refactor `get_packed_git_mru()` to work on packfile store
  packfile: refactor `get_all_packs()` to work on packfile store
  packfile: refactor `get_packed_git()` to work on packfile store
  packfile: move `get_multi_pack_index()` into "midx.c"
  packfile: introduce function to load and add packfiles
  packfile: refactor `install_packed_git()` to work on packfile store
  packfile: split up responsibilities of `reprepare_packed_git()`
  packfile: refactor `prepare_packed_git()` to work on packfile store
  packfile: reorder functions to avoid function declaration
  odb: move kept cache into `struct packfile_store`
  odb: move MRU list of packfiles into `struct packfile_store`
  odb: move packfile map into `struct packfile_store`
  odb: move initialization bit into `struct packfile_store`
  odb: move list of packfiles into `struct packfile_store`
  packfile: introduce a new `struct packfile_store`
2025-10-07 12:25:27 -07:00
Junio C Hamano
4bac57bc67 Merge branch 'jk/setup-revisions-freefix'
There are double frees and leaks around setup_revisions() API used
in "git stash show", which has been fixed, and setup_revisions()
API gained a wrapper to make it more ergonomic when using it with
strvec-manged argc/argv pairs.

* jk/setup-revisions-freefix:
  revision: retain argv NULL invariant in setup_revisions()
  treewide: pass strvecs around for setup_revisions_from_strvec()
  treewide: use setup_revisions_from_strvec() when we have a strvec
  revision: add wrapper to setup_revisions() from a strvec
  revision: manage memory ownership of argv in setup_revisions()
  stash: tell setup_revisions() to free our allocated strings
2025-09-29 11:40:34 -07:00
Patrick Steinhardt
dd52a29b78 packfile: refactor get_packed_git_mru() to work on packfile store
The `get_packed_git_mru()` function prepares the packfile store and then
returns its packfiles in most-recently-used order. Refactor it to accept
a packfile store instead of a repository to clarify its scope.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24 11:53:51 -07:00
Patrick Steinhardt
d2779beb36 packfile: refactor get_all_packs() to work on packfile store
The `get_all_packs()` function prepares the packfile store and then
returns its packfiles. Refactor it to accept a packfile store instead of
a repository to clarify its scope.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24 11:53:51 -07:00
Jeff King
18068139f2 treewide: pass strvecs around for setup_revisions_from_strvec()
The previous commit converted callers of setup_revisions() with a strvec
to use the safer and easier _from_strvec() variant.

Let's now convert spots that don't directly have a strvec, but receive
an argc/argv pair that eventually comes from one. We'll instead pass the
strvec down to the point where we call setup_revisions().

That makes these functions slightly less flexible if they were to grow
other callers that don't use strvecs, but this rigidity is buying us
some safety. It is only safe to pass the free_removed_argv_elements
option to setup_revisions() if we know the elements of argv/argc are
allocated on the heap. That isn't communicated in the type system when
we are passed the bare elements. But if we get a strvec, we know that
the elements are allocated strings.

And at any rate, each of these modified functions has only a single
caller (that has a strvec), so the loss of flexibility is unlikely to
ever matter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-22 14:27:03 -07:00
Junio C Hamano
c31a276f12 Merge branch 'ps/object-store-midx-dedup-info'
Further code clean-up for multi-pack-index code paths.

* ps/object-store-midx-dedup-info:
  midx: compute paths via their source
  midx: stop duplicating info redundant with its owning source
  midx: write multi-pack indices via their source
  midx: load multi-pack indices via their source
  midx: drop redundant `struct repository` parameter
  odb: simplify calling `link_alt_odb_entry()`
  odb: return newly created in-memory sources
  odb: consistently use "dir" to refer to alternate's directory
  odb: allow `odb_find_source()` to fail
  odb: store locality in object database sources
2025-09-12 10:41:18 -07:00
Patrick Steinhardt
e1d062e8ba odb: drop deprecated wrapper functions
In the Git 2.51 release cycle we've refactored the object database layer
to access objects via `struct object_database` directly. To make the
transition a bit easier we have retained some of the old-style functions
in case those were widely used.

Now that Git 2.51 has been released it's time to clean up though and
drop these old wrappers. Do so and adapt the small number of newly added
users to use the new functions instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-11 09:10:28 -07:00
Junio C Hamano
0b71555742 Merge branch 'ps/object-store-midx-dedup-info' into ps/packfile-store
* ps/object-store-midx-dedup-info:
  midx: compute paths via their source
  midx: stop duplicating info redundant with its owning source
  midx: write multi-pack indices via their source
  midx: load multi-pack indices via their source
  midx: drop redundant `struct repository` parameter
  odb: simplify calling `link_alt_odb_entry()`
  odb: return newly created in-memory sources
  odb: consistently use "dir" to refer to alternate's directory
  odb: allow `odb_find_source()` to fail
  odb: store locality in object database sources
2025-09-02 09:38:03 -07:00
Patrick Steinhardt
9ff2129615 midx: drop redundant struct repository parameter
There are a couple of functions that take both a `struct repository` and
a `struct multi_pack_index`. This provides redundant information though
without much benefit given that the multi-pack index already has a
pointer to its owning repository.

Drop the `struct repository` parameter from such functions. While at it,
reorder the list of parameters of `fill_midx_entry()` so that the MIDX
comes first to better align with our coding guidelines.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-11 09:22:22 -07:00
Junio C Hamano
4ce0caa7cc Merge branch 'ps/object-file-wo-the-repository'
Reduce implicit assumption and dependence on the_repository in the
object-file subsystem.

* ps/object-file-wo-the-repository:
  object-file: get rid of `the_repository` in index-related functions
  object-file: get rid of `the_repository` in `force_object_loose()`
  object-file: get rid of `the_repository` in `read_loose_object()`
  object-file: get rid of `the_repository` in loose object iterators
  object-file: remove declaration for `for_each_file_in_obj_subdir()`
  object-file: inline `for_each_loose_file_in_objdir_buf()`
  object-file: get rid of `the_repository` when writing objects
  odb: introduce `odb_write_object()`
  loose: write loose objects map via their source
  object-file: get rid of `the_repository` in `finalize_object_file()`
  object-file: get rid of `the_repository` in `loose_object_info()`
  object-file: get rid of `the_repository` when freshening objects
  object-file: inline `check_and_freshen()` functions
  object-file: get rid of `the_repository` in `has_loose_object()`
  object-file: stop using `the_hash_algo`
  object-file: fix -Wsign-compare warnings
2025-08-05 11:53:55 -07:00