Commit Graph

12684 Commits

Author SHA1 Message Date
Junio C Hamano
72801dfde1 Merge branch 'ua/update-update-server-info'
Code simplification.

* ua/update-update-server-info:
  builtin/update-server-info: remove unnecessary if statement
2025-04-17 10:28:19 -07:00
Junio C Hamano
c3ebf18eb2 Merge branch 'en/merge-recursive-debug'
Remove remnants of the recursive merge strategy backend, which was
superseded by the ort merge strategy.

* en/merge-recursive-debug:
  builtin/{merge,rebase,revert}: remove GIT_TEST_MERGE_ALGORITHM
  tests: remove GIT_TEST_MERGE_ALGORITHM and test_expect_merge_algorithm
  merge-recursive.[ch]: thoroughly debug these
  merge, sequencer: switch recursive merges over to ort
  sequencer: switch non-recursive merges over to ort
  merge-ort: enable diff-algorithms other than histogram
  builtin/merge-recursive: switch to using merge_ort_generic()
  checkout: replace merge_trees() with merge_ort_nonrecursive()
2025-04-17 10:28:18 -07:00
Junio C Hamano
fe7ae3b87e Merge branch 'kn/blame-porcelain-unblamable'
"git blame --porcelain" mode now talks about unblamable lines and
lines that are blamed to an ignored commit.

* kn/blame-porcelain-unblamable:
  blame: print unblamable and ignored commits in porcelain mode
2025-04-17 10:28:18 -07:00
Junio C Hamano
b45113f581 Merge branch 'jk/fetch-follow-remote-head-fix'
"git fetch [<remote>]" with only the configured fetch refspec
should be the only thing to update refs/remotes/<remote>/HEAD,
but the code was overly eager to do so in other cases.

* jk/fetch-follow-remote-head-fix:
  fetch: make set_head() call easier to read
  fetch: don't ask for remote HEAD if followRemoteHEAD is "never"
  fetch: only respect followRemoteHEAD with configured refspecs
2025-04-17 10:28:17 -07:00
Junio C Hamano
a271b05066 Merge branch 'ps/cat-file-filter-batch'
"git cat-file --batch" and friends learned to allow "--filter=" to
omit certain objects, just like the transport layer does.

* ps/cat-file-filter-batch:
  builtin/cat-file: use bitmaps to efficiently filter by object type
  builtin/cat-file: deduplicate logic to iterate over all objects
  pack-bitmap: introduce function to check whether a pack is bitmapped
  pack-bitmap: add function to iterate over filtered bitmapped objects
  pack-bitmap: allow passing payloads to `show_reachable_fn()`
  builtin/cat-file: support "object:type=" objects filter
  builtin/cat-file: support "blob:limit=" objects filter
  builtin/cat-file: support "blob:none" objects filter
  builtin/cat-file: wire up an option to filter objects
  builtin/cat-file: introduce function to report object status
  builtin/cat-file: rename variable that tracks usage
2025-04-16 13:54:21 -07:00
Junio C Hamano
47478802da Merge branch 'kn/non-transactional-batch-updates'
Updating multiple references have only been possible in all-or-none
fashion with transactions, but it can be more efficient to batch
multiple updates even when some of them are allowed to fail in a
best-effort manner.  A new "best effort batches of updates" mode
has been introduced.

* kn/non-transactional-batch-updates:
  update-ref: add --batch-updates flag for stdin mode
  refs: support rejection in batch updates during F/D checks
  refs: implement batch reference update support
  refs: introduce enum-based transaction error types
  refs/reftable: extract code from the transaction preparation
  refs/files: remove duplicate duplicates check
  refs: move duplicate refname update check to generic layer
  refs/files: remove redundant check in split_symref_update()
2025-04-16 13:54:19 -07:00
Junio C Hamano
01a6e244f9 Merge branch 'ps/maintenance-reflog-expire'
"git maintenance" learns a new task to expire reflog entries.

* ps/maintenance-reflog-expire:
  builtin/maintenance: introduce "reflog-expire" task
  builtin/gc: split out function to expire reflog entries
  builtin/reflog: make functions regarding `reflog_expire_options` public
  builtin/reflog: stop storing per-reflog expiry dates globally
  builtin/reflog: stop storing default reflog expiry dates globally
  reflog: rename `cmd_reflog_expire_cb` to `reflog_expire_options`
2025-04-16 13:54:19 -07:00
Junio C Hamano
1a1661bd41 Merge branch 'jt/rev-list-z'
"git rev-list" learns machine-parsable output format that delimits
each field with NUL.

* jt/rev-list-z:
  rev-list: support NUL-delimited --missing option
  rev-list: support NUL-delimited --boundary option
  rev-list: support delimiting objects with NUL bytes
  rev-list: refactor early option parsing
  rev-list: inline `show_object_with_name()` in `show_object()`
2025-04-16 13:54:18 -07:00
Junio C Hamano
743d3a54f2 Merge branch 'ab/rm-sign-compare'
Some warnings from "-Wsign-compare" for builtin/rm.c have been
squelched.

* ab/rm-sign-compare:
  rm: fix sign comparison warnings
2025-04-16 13:54:17 -07:00
Junio C Hamano
518ed014f6 Merge branch 'jt/ref-transaction-abort-fix'
A ref transaction corner case fix.

* jt/ref-transaction-abort-fix:
  builtin/fetch: avoid aborting closed reference transaction
2025-04-16 13:54:17 -07:00
Junio C Hamano
7b03646f85 Merge branch 'js/comma-semicolon-confusion'
Code clean-up.

* js/comma-semicolon-confusion:
  detect-compiler: detect clang even if it found CUDA
  clang: warn when the comma operator is used
  compat/regex: explicitly mark intentional use of the comma operator
  wildmatch: avoid using of the comma operator
  diff-delta: avoid using the comma operator
  xdiff: avoid using the comma operator unnecessarily
  clar: avoid using the comma operator unnecessarily
  kwset: avoid using the comma operator unnecessarily
  rebase: avoid using the comma operator unnecessarily
  remote-curl: avoid using the comma operator unnecessarily
2025-04-15 13:50:17 -07:00
Junio C Hamano
a8c207797f Merge branch 'jt/clone-guess-remote-head-fix'
"git clone" still gave the message about the default branch name;
this message has been turned into an advice message that can be
turned off.

* jt/clone-guess-remote-head-fix:
  advice: allow disabling default branch name advice
  builtin/clone: suppress unexpected default branch advice
  remote: allow `guess_remote_head()` to suppress advice
2025-04-15 13:50:16 -07:00
Junio C Hamano
d690c44846 Merge branch 'ds/maintenance-loose-objects-batchsize'
The job to coalesce loose objects into packfiles in "git
maintenance" now has configurable batch size.

* ds/maintenance-loose-objects-batchsize:
  maintenance: add loose-objects.batchSize config
  maintenance: force progress/no-quiet to children
2025-04-15 13:50:16 -07:00
Junio C Hamano
03633a288c Merge branch 'kn/reflog-drop'
"git reflog" learns "drop" subcommand, that discards the entire
reflog data for a ref.

* kn/reflog-drop:
  reflog: implement subcommand to drop reflogs
  reflog: improve error for when reflog is not found
2025-04-15 13:50:15 -07:00
Junio C Hamano
ee847e0034 Merge branch 'ps/object-wo-the-repository'
The object layer has been updated to take an explicit repository
instance as a parameter in more code paths.

* ps/object-wo-the-repository:
  hash: stop depending on `the_repository` in `null_oid()`
  hash: fix "-Wsign-compare" warnings
  object-file: split out logic regarding hash algorithms
  delta-islands: stop depending on `the_repository`
  object-file-convert: stop depending on `the_repository`
  pack-bitmap-write: stop depending on `the_repository`
  pack-revindex: stop depending on `the_repository`
  pack-check: stop depending on `the_repository`
  environment: move access to "core.bigFileThreshold" into repo settings
  pack-write: stop depending on `the_repository` and `the_hash_algo`
  object: stop depending on `the_repository`
  csum-file: stop depending on `the_repository`
2025-04-15 13:50:15 -07:00
Jeff King
f9356f9cb4 fetch: make set_head() call easier to read
We ignore any error returned from set_head(), but 638060dcb9 (fetch
set_head: refactor to use remote directly, 2025-01-26) left its call in
a noop "if" conditional as a sort of note-to-self.

When c834d1a7ce (fetch: only respect followRemoteHEAD with configured
refspecs, 2025-03-18) added a "do_set_head" flag, it was rolled into the
same conditional, putting set_head() on the right-hand side of a
short-circuit AND.

That's not wrong, but it really hides the point of the line, which
is (maybe) calling the function.

Instead, let's have a full if() block for the flag, and then our comment
(with some rewording) will be sufficient to clarify the error handling.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-09 09:03:47 -07:00
Usman Akinyemi
9ec327d922 builtin/update-server-info: remove unnecessary if statement
Since we already teach the `repo_config()` in f29f1990 (config:
teach repo_config to allow `repo` to be NULL, 2025-03-08) to allow
`repo` to be NULL, no need to check if `repo` is NULL before calling
`repo_config()`.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 14:47:37 -07:00
Elijah Newren
170e30d695 builtin/{merge,rebase,revert}: remove GIT_TEST_MERGE_ALGORITHM
This environment variable existed to allow the testsuite to reuse all
the merge-related tests in the testsuite while easily flipping between
the 'recursive' and the 'ort' backends.  Now that we have removed
merge-recursive and remapped 'recursive' to mean 'ort', we don't need
this scaffolding anymore.  Remove it from these three builtins.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 13:59:14 -07:00
Elijah Newren
75cd9ae05f merge, sequencer: switch recursive merges over to ort
More precisely, replace calls to merge_recursive() with
merge_ort_recursive().

Also change t7615 to quit calling out recursive; it is not needed
anymore, and we are in fact using ort now.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 13:59:12 -07:00
Elijah Newren
77c029493a builtin/merge-recursive: switch to using merge_ort_generic()
Switch from merge-recursive to merge-ort.  Adjust the following
testcases due to the switch:

* t6430: most of the test differences here were due to improved D/F
  conflict handling explained in more detail in ef52778708 (merge
  tests: expect improved directory/file conflict handling in ort,
  2020-10-26).  These changes weren't made to this test back in that
  commit simply because I had been looking at `git merge` rather than
  `git merge-recursive`.  The final test in this testsuite, though, was
  expunged because it was looking for specific output, and the calls to
  output_commit_title() were discarded from merge_ort_internal() in its
  adaptation from merge_recursive_internal(); see 8119214f4e
  (merge-ort: implement merge_incore_recursive(), 2020-12-16).

* t6434: This test is built entirely around rename/delete conflicts,
  which had a suboptimal handling under merge-recursive.  As explained
  in more detail in commits 1f3c9ba707 ("t6425: be more flexible with
  rename/delete conflict messages", 2020-08-10) and 727c75b23f ("t6404,
  t6423: expect improved rename/delete handling in ort backend",
  2020-10-26), rename/delete conflicts should each have two entries in
  the index rather than just one.  Adjust the expectations for all the
  tests in this testcase to see the two entries per rename/delete
  conflict.

* t6424: merge-recursive had a special check-if-toplevel-trees-match
  check that it ran at the beginning on both the merge-base and the
  other side being merged in.  In such a case, it exited early and
  printed an "Already up to date." message.  merge-ort got rid of
  this, and instead checks the merge base tree matching the other
  side throughout the tree instead of just at the toplevel, allowing
  it to avoid recursing into various subtrees.  As part of that, it
  got rid of the specialty toplevel message.  That message hasn't
  been missed for years from `git merge`, so I don't think it is
  necessary to keep it just for `git merge-recursive`, especially
  since the latter is rarely used.  (git itself only references it
  in the testsuite, whereas it used to power one of the three
  rebase backends that existed once upon a time.)

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 13:59:11 -07:00
Elijah Newren
b5dff2bd61 checkout: replace merge_trees() with merge_ort_nonrecursive()
Replace the use of merge_trees() from merge-recursive.[ch] with the
merge-ort equivalent.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 13:59:11 -07:00
Junio C Hamano
23ee5065c2 Merge branch 'tb/incremental-midx-part-2'
Incrementally updating multi-pack index files.

* tb/incremental-midx-part-2:
  midx: implement writing incremental MIDX bitmaps
  pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
  pack-bitmap.c: keep track of each layer's type bitmaps
  ewah: implement `struct ewah_or_iterator`
  pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
  pack-bitmap.c: compute disk-usage with incremental MIDXs
  pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
  pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
  pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
  pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
  pack-bitmap.c: open and store incremental bitmap layers
  pack-revindex: prepare for incremental MIDX bitmaps
  Documentation: describe incremental MIDX bitmaps
  Documentation: remove a "future work" item from the MIDX docs
2025-04-08 11:43:14 -07:00
Junio C Hamano
c6b3824a19 Merge branch 'tb/refspec-fetch-cleanup'
Code clean-up.

* tb/refspec-fetch-cleanup:
  refspec: replace `refspec_item_init()` with fetch/push variants
  refspec: remove refspec_item_init_or_die()
  refspec: replace `refspec_init()` with fetch/push variants
  refspec: treat 'fetch' as a Boolean value
2025-04-08 11:43:13 -07:00
Karthik Nayak
221e8fcb7f update-ref: add --batch-updates flag for stdin mode
When updating multiple references through stdin, Git's update-ref
command normally aborts the entire transaction if any single update
fails. This atomic behavior prevents partial updates. Introduce a new
batch update system, where the updates the performed together similar
but individual updates are allowed to fail.

Add a new `--batch-updates` flag that allows the transaction to continue
even when individual reference updates fail. This flag can only be used
in `--stdin` mode and builds upon the batch update support added to the
refs subsystem in the previous commits. When enabled, failed updates are
reported in the following format:

  rejected SP (<old-oid> | <old-target>) SP (<new-oid> | <new-target>) SP <rejection-reason> LF

Update the documentation to reflect this change and also tests to cover
different scenarios where an update could be rejected.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:59:49 -07:00
Karthik Nayak
76e760b999 refs: introduce enum-based transaction error types
Replace preprocessor-defined transaction errors with a strongly-typed
enum `ref_transaction_error`. This change:

  - Improves type safety and function signature clarity.
  - Makes error handling more explicit and discoverable.
  - Maintains existing error cases, while adding new error cases for
    common scenarios.

This refactoring paves the way for more comprehensive error handling
which we will utilize in the upcoming commits to add batch reference
update support.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:57:20 -07:00
Patrick Steinhardt
8e0a1ec076 builtin/maintenance: introduce "reflog-expire" task
By default, git-maintenance(1) uses the "gc" task to ensure that the
repository is well-maintained. This can be changed, for example by
either explicitly configuring which tasks should be enabled or by using
the "incremental" maintenance strategy. If so, git-maintenance(1) does
not know to expire reflog entries, which is a subtask that git-gc(1)
knows to perform for the user. Consequently, the reflog will grow
indefinitely unless the user manually trims it.

Introduce a new "reflog-expire" task that plugs this gap:

  - When running the task directly, then we simply execute `git reflog
    expire --all`, which is the same as git-gc(1).

  - When running git-maintenance(1) with the `--auto` flag, then we only
    run the task in case the "HEAD" reflog has at least N reflog entries
    that would be discarded. By default, N is set to 100, but this can
    be configured via "maintenance.reflog-expire.auto". When a negative
    integer has been provided we always expire entries, zero causes us
    to never expire entries, and a positive value specifies how many
    entries need to exist before we consider pruning the entries.

Note that the condition for the `--auto` flags is merely a heuristic and
optimized for being fast. This is because `git maintenance run --auto`
will be executed quite regularly, so scanning through all reflogs would
likely be too expensive in many repositories.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:53:27 -07:00
Patrick Steinhardt
3fef24ac3f builtin/gc: split out function to expire reflog entries
We're about to introduce a new task for git-maintenance(1) that knows to
expire reflog entries. The logic will be shared with git-gc(1), which
already knows how to do this.

Pull out the common logic into a separate function so that we can share
the implementation between both builtins.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:53:27 -07:00
Patrick Steinhardt
d20fc193b6 builtin/reflog: make functions regarding reflog_expire_options public
Make functions that are required to manage `reflog_expire_options`
available elsewhere by moving them into "reflog.c" and exposing them in
the corresponding header. The functions will be used in a subsequent
commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:53:27 -07:00
Patrick Steinhardt
964f364de9 builtin/reflog: stop storing per-reflog expiry dates globally
As described in the preceding commit, the per-reflog expiry dates are
stored in a global pair of variables. Refactor the code so that they are
contained in `struct reflog_expire_options` to make the structure useful
in other contexts.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:53:26 -07:00
Patrick Steinhardt
8565827570 builtin/reflog: stop storing default reflog expiry dates globally
When expiring reflog entries, it is possible to configure expiry dates
that depend on the name of the reflog. This requires us to store a
couple of different expiry dates:

  - The default expiry date for reflog entries that aren't otherwise
    specified.

  - The per-reflog expiry date.

  - The currently active set of expiry dates for a given reference.

While the last item is stored in `struct reflog_expire_options`, the
other items aren't, which makes it hard to reuse the structure in other
places.

Refactor the code so that the default expiry date is stored as part of
the structure. The per-reflog expiry dates will be adapted accordingly
in the subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:53:26 -07:00
Patrick Steinhardt
2ed8008399 reflog: rename cmd_reflog_expire_cb to reflog_expire_options
We're about to expose `struct cmd_reflog_expire_cb` via "reflog.h" so
that we can also use this structure in "builtin/gc.c". Once we make it
accessible to a wider scope though it becomes awkwardly named, as it
isn't only useful in the context of a callback. Instead, the function is
containing all kinds of options relevant to whether or not a reflog
entry should be expired.

Rename the structure to `reflog_expire_options` to prepare for this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:53:25 -07:00
Karthik Nayak
4d253071dd blame: print unblamable and ignored commits in porcelain mode
The 'git-blame(1)' command allows users to ignore specific revisions via
the '--ignore-rev <rev>' and '--ignore-revs-file <file>' flags. These
flags are often combined with the 'blame.markIgnoredLines' and
'blame.markUnblamableLines' config options. These config options prefix
ignored and unblamable lines with a '?' and '*', respectively.

However, this option was never extended to the porcelain mode of
'git-blame(1)'. Since the documentation does not indicate this
exclusion, it is a bug.

Fix this by printing 'ignored' and 'unblamable' respectively for the
options when using the porcelain modes.

Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Toon Claes <toon@iotcl.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:50:18 -07:00
Patrick Steinhardt
8002e8ee18 builtin/cat-file: use bitmaps to efficiently filter by object type
While it is now possible to filter objects by type, this mechanism is
for now mostly a convenience. Most importantly, we still have to iterate
through the whole packfile to find all objects of a specific type. This
can be prohibitively expensive depending on the size of the packfiles.

It isn't really possible to do better than this when only considering a
packfile itself, as the order of objects is not fixed. But when we have
a packfile with a corresponding bitmap, either because the packfile
itself has one or because the multi-pack index has a bitmap for it, then
we can use these bitmaps to improve the runtime.

While bitmaps are typically used to compute reachability of objects,
they also contain one bitmap per object type that encodes which object
has what type. So instead of reading through the whole packfile(s), we
can use the bitmaps and iterate through the type-specific bitmap.
Typically, only a subset of packfiles will have a bitmap. But this isn't
really much of a problem: we can use bitmaps when available, and then
use the non-bitmap walk for every packfile that isn't covered by one.

Overall, this leads to quite a significant speedup depending on how many
objects of a certain type exist. The following benchmarks have been
executed in the Chromium repository, which has a 50GB packfile with
almost 25 million objects. As expected, there isn't really much of a
change in performance without an object filter:

    Benchmark 1: cat-file with no-filter (revision = HEAD~)
      Time (mean ± σ):     89.675 s ±  4.527 s    [User: 40.807 s, System: 10.782 s]
      Range (min … max):   83.052 s … 96.084 s    10 runs

    Benchmark 2: cat-file with no-filter (revision = HEAD)
      Time (mean ± σ):     88.991 s ±  2.488 s    [User: 42.278 s, System: 10.305 s]
      Range (min … max):   82.843 s … 91.271 s    10 runs

    Summary
      cat-file with no-filter (revision = HEAD) ran
        1.01 ± 0.06 times faster than cat-file with no-filter (revision = HEAD~)

We still have to scan through all objects as we yield all of them, so
using the bitmap in this case doesn't really buy us anything. What is
noticeable in this benchmark is that we're I/O-bound, not CPU-bound, as
can be seen from the user/system runtimes, which combined are way lower
than the overall benchmarked runtime.

But when we do use a filter we can see a significant improvement:

    Benchmark 1: cat-file with filter=object:type=commit (revision = HEAD~)
      Time (mean ± σ):     86.444 s ±  4.081 s    [User: 36.830 s, System: 11.312 s]
      Range (min … max):   80.305 s … 93.104 s    10 runs

    Benchmark 2: cat-file with filter=object:type=commit (revision = HEAD)
      Time (mean ± σ):      2.089 s ±  0.015 s    [User: 1.872 s, System: 0.207 s]
      Range (min … max):    2.073 s …  2.119 s    10 runs

    Summary
      cat-file with filter=object:type=commit (revision = HEAD) ran
       41.38 ± 1.98 times faster than cat-file with filter=object:type=commit (revision = HEAD~)

This is because we don't have to scan through all packfiles anymore, but
can instead directly look up relevant objects.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:43:52 -07:00
Patrick Steinhardt
d5ec7027bc builtin/cat-file: deduplicate logic to iterate over all objects
Pull out a common function that allows us to iterate over all objects in
a repository. Right now the logic is trivial and would only require two
function calls, making this refactoring a bit pointless. But in the next
commit we will iterate on this logic to make use of bitmaps, so this is
about to become a bit more complex.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:43:52 -07:00
Patrick Steinhardt
3d45483846 pack-bitmap: allow passing payloads to show_reachable_fn()
The `show_reachable_fn` callback is used by a couple of functions to
present reachable objects to the caller. The function does not provide a
way for the caller to pass a payload though, which is functionality that
we'll require in a subsequent commit.

Change the callback type to accept a payload and adapt all callsites
accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:43:51 -07:00
Patrick Steinhardt
8fa9fe171a builtin/cat-file: support "object:type=" objects filter
Implement support for the "object:type=" filter in git-cat-file(1),
which causes us to omit all objects that don't match the provided object
type.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:43:51 -07:00
Patrick Steinhardt
dbe1b32d59 builtin/cat-file: support "blob:limit=" objects filter
Implement support for the "blob:limit=" filter in git-cat-file(1), which
causes us to omit all blobs that are bigger than a certain size.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:43:50 -07:00
Patrick Steinhardt
3794e9bf98 builtin/cat-file: support "blob:none" objects filter
Implement support for the "blob:none" filter in git-cat-file(1), which
causes us to omit all blobs.

Note that this new filter requires us to read the object type via
`oid_object_info_extended()` in `batch_object_write()`. But as we try to
optimize away reading objects from the database the `data->info.typep`
pointer may not be set. We thus have to adapt the logic to conditionally
set the pointer in cases where the filter is given.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:43:50 -07:00
Patrick Steinhardt
eb83e4c64b builtin/cat-file: wire up an option to filter objects
In batch mode, git-cat-file(1) enumerates all objects and prints them
by iterating through both loose and packed objects. This works without
considering their reachability at all, and consequently most options to
filter objects as they exist in e.g. git-rev-list(1) are not applicable.
In some situations it may still be useful though to filter objects based
on properties that are inherent to them. This includes the object size
as well as its type.

Such a filter already exists in git-rev-list(1) with the `--filter=`
command line option. While this option supports a couple of filters that
are not applicable to our usecase, some of them are quite a neat fit.

Wire up the filter as an option for git-cat-file(1). This allows us to
reuse the same syntax as in git-rev-list(1) so that we don't have to
reinvent the wheel. For now, we die when any of the filter options has
been passed by the user, but they will be wired up in subsequent
commits.

Further note that the filters that we are about to introduce don't
significantly speed up the runtime of git-cat-file(1). While we can skip
emitting a lot of objects in case they are uninteresting to us, the
majority of time is spent reading the packfile, which is bottlenecked by
I/O and not the processor. This will change though once we start to make
use of bitmaps, which will allow us to skip reading the whole packfile.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:43:50 -07:00
Patrick Steinhardt
1914ae0d70 builtin/cat-file: introduce function to report object status
We have multiple callsites that report the status of an object, for
example when the objec tis missing or its name is ambiguous. We're about
to add a couple more such callsites to report on "excluded" objects.

Prepare for this by introducing a new function `report_object_status()`
that encapsulates the functionality.

Note that this function also flushes stdout, which is a requirement so
that request-response style batched modes can learn about the status
before proceeding to the next object. We already flush correctly at all
existing callsites, even though the flush in `batch_one_object()` only
comes after the switch statement. That flush is now redundant, and we
could in theory deduplicate it by moving it into all branches that don't
use `report_object_status()`. But that doesn't quite feel sensible:

  - The duplicate flush should ultimately just be a no-op for us and
    thus shouldn't impact performance significantly.

  - By keeping the flush in `report_object_status()` we ensure that all
    future callers get semantics correct.

So let's just be pragmatic and live with the duplicated flush.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:43:49 -07:00
Patrick Steinhardt
84a1d0039a builtin/cat-file: rename variable that tracks usage
The usage strings for git-cat-file(1) that we pass to `parse_options()`
and `usage_msg_optf()` are stored in a variable called `usage`. This
variable shadows the declaration of `usage()`, which we'll want to use
in a subsequent commit.

Rename the variable to `builtin_catfile_usage`, which is in line with
how the variable is typically called in other builtins.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:43:49 -07:00
Junio C Hamano
8a753b9a44 Merge branch 'jh/hash-init-fixes'
An earlier code refactoring of the hash machinery missed a few
required calls to init_fn.

* jh/hash-init-fixes:
  index-pack, unpack-objects: restore missing ->init_fn
2025-04-07 14:23:18 -07:00
Junio C Hamano
58a8c38226 Merge branch 'tb/combine-cruft-below-size'
"git repack" learned "--combine-cruft-below-size" option that
controls how cruft-packs are combined.

* tb/combine-cruft-below-size:
  repack: begin combining cruft packs with `--combine-cruft-below-size`
  repack: avoid combining cruft packs with `--max-cruft-size`
  t/t7704-repack-cruft.sh: consolidate `write_blob()`
  t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
  t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
2025-04-07 14:23:18 -07:00
Junio C Hamano
477cc3b6c7 Merge branch 'jc/name-rev-stdin'
Using "git name-rev --stdin" as an example, improve the framework to
prepare tests to pretend to be in the future where the breaking
changes have already happened.

* jc/name-rev-stdin:
  name-rev: remove "--stdin" support
  t6120: further modernize
  t6120: avoid hiding "git" exit status
  t: introduce WITH_BREAKING_CHANGES prerequisite
  t: extend test_lazy_prereq
  t: document test_lazy_prereq
2025-04-07 14:23:17 -07:00
Arnav Bhate
d2fc29380a rm: fix sign comparison warnings
There are multiple places in loops, where a signed and an
unsigned data type are compared. Git uses a mix of signed and unsigned
types to store lengths of arrays. This sometimes leads to using a signed
index for an array whose length is stored in an unsigned variable or
vice versa.

get_ours_cache_pos is a special case where i, though derived from a
signed variable is never negative. Move this part to the caller side
and make i an unsigned argument of the function. Rename i to
pos to make it descriptive, now that it is a function argument.

Replace signed data types with unsigned data types and vice versa
wherever necessary. Where both signed and unsigned data types have been
used, define a new variable in the scope of the for loop for use as the
iterator. Remove #define DISABLE_SIGN_COMPARE_WARNINGS.

Signed-off-by: Arnav Bhate <bhatearnav@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-29 01:04:40 -07:00
Junio C Hamano
ff926a6d1b Merge branch 'en/random-cleanups'
Miscellaneous code clean-ups.

* en/random-cleanups:
  merge-ort: remove extraneous word in comment
  merge-ort: fix accidental strset<->strintmap
  t7615: be more explicit about diff algorithm used
  t6423: fix a comment that accidentally reversed two commits
  stash: remove merge-recursive.h include
2025-03-29 16:39:10 +09:00
Junio C Hamano
27fe152e88 Merge branch 'tb/multi-cruft-pack-refresh-fix'
Certain "cruft" objects would have never been refreshed when there
are multiple cruft packs in the repository, which has been
corrected.

* tb/multi-cruft-pack-refresh-fix:
  builtin/pack-objects.c: freshen objects from existing cruft packs
2025-03-29 16:39:09 +09:00
Junio C Hamano
650b2e2fdb Merge branch 'jk/fetch-ref-prefix-cleanup'
In protocol v2 where the refs advertisement is constrained, we try
to tell the server side not to limit the advertisement when there
is no specific need to, which has been the source of confusion and
recent bugs.  Revamp the logic to simplify.

* jk/fetch-ref-prefix-cleanup:
  fetch: use ref prefix list to skip ls-refs
  fetch: avoid ls-refs only to ask for HEAD symref update
  fetch: stop protecting additions to ref-prefix list
  fetch: ask server to advertise HEAD for config-less fetch
  refspec_ref_prefixes(): clean up refspec_item logic
  t5516: beef up exact-oid ref prefixes test
  t5516: drop NEEDSWORK about v2 reachability behavior
  t5516: prefer "oid" to "sha1" in some test titles
  t5702: fix typo in test name
2025-03-29 16:39:08 +09:00
Junio C Hamano
eb7923be1f Merge branch 'en/merge-ort-prepare-to-remove-recursive'
First step of deprecating and removing merge-recursive.

* en/merge-ort-prepare-to-remove-recursive:
  am: switch from merge_recursive_generic() to merge_ort_generic()
  merge-ort: fix merge.directoryRenames=false
  t3650: document bug when directory renames are turned off
  merge-ort: support having merge verbosity be set to 0
  merge-ort: allow rename detection to be disabled
  merge-ort: add new merge_ort_generic() function
2025-03-29 16:39:07 +09:00
Junio C Hamano
8d6413a1be Merge branch 'ps/refname-avail-check-optim'
The code paths to check whether a refname X is available (by seeing
if another ref X/Y exists, etc.) have been optimized.

* ps/refname-avail-check-optim:
  refs: reuse iterators when determining refname availability
  refs/iterator: implement seeking for files iterators
  refs/iterator: implement seeking for packed-ref iterators
  refs/iterator: implement seeking for ref-cache iterators
  refs/iterator: implement seeking for reftable iterators
  refs/iterator: implement seeking for merged iterators
  refs/iterator: provide infrastructure to re-seek iterators
  refs/iterator: separate lifecycle from iteration
  refs: stop re-verifying common prefixes for availability
  refs/files: batch refname availability checks for initial transactions
  refs/files: batch refname availability checks for normal transactions
  refs/reftable: batch refname availability checks
  refs: introduce function to batch refname availability checks
  builtin/update-ref: skip ambiguity checks when parsing object IDs
  object-name: allow skipping ambiguity checks in `get_oid()` family
  object-name: introduce `repo_get_oid_with_flags()`
2025-03-29 16:39:07 +09:00