Commit Graph

13509 Commits

Author SHA1 Message Date
Junio C Hamano
7290368147 Merge branch 'pt/fsmonitor-linux' into jch
The fsmonitor daemon has been implemented for Linux.

* pt/fsmonitor-linux:
  fsmonitor: convert shown khash to strset in do_handle_client
  fsmonitor: add tests for Linux
  fsmonitor: add timeout to daemon stop command
  fsmonitor: close inherited file descriptors and detach in daemon
  run-command: add close_fd_above_stderr option
  fsmonitor: implement filesystem change listener for Linux
  fsmonitor: rename fsm-settings-darwin.c to fsm-settings-unix.c
  fsmonitor: rename fsm-ipc-darwin.c to fsm-ipc-unix.c
  fsmonitor: use pthread_cond_timedwait for cookie wait
  compat/win32: add pthread_cond_timedwait
  fsmonitor: fix hashmap memory leak in fsmonitor_run_daemon
  fsmonitor: fix khash memory leak in do_handle_client
2026-03-05 13:48:56 -08:00
Junio C Hamano
ce4412d225 Merge branch 'ps/upload-pack-buffer-more-writes' into jch
Reduce system overhead "git upload-pack" spends relaying "git
pack-objects" output to the "git fetch" running on the other end of
the connection.

Comments?
cf. <xmqqseaf5k5t.fsf@gitster.g>

* ps/upload-pack-buffer-more-writes:
  builtin/pack-objects: reduce lock contention when writing packfile data
  csum-file: drop `hashfd_throughput()`
  csum-file: introduce `hashfd_ext()`
  sideband: use writev(3p) to send pktlines
  wrapper: introduce writev(3p) wrappers
  compat/posix: introduce writev(3p) wrapper
  git-compat-util: introduce `cast_size_t_to_ssize_t()`
  upload-pack: reduce lock contention when writing packfile data
  upload-pack: adapt keepalives based on buffering
  upload-pack: fix debug statement when flushing packfile data
2026-03-05 13:48:54 -08:00
Junio C Hamano
ad4844f40a Merge branch 'sk/oidmap-clear-with-custom-free-func' into jch
A bit of OIDmap API enhancement and cleanup.

Comments?

* sk/oidmap-clear-with-custom-free-func:
  builtin/rev-list: migrate missing_objects cleanup to oidmap_clear_with_free()
  oidmap: make entry cleanup explicit in oidmap_clear
2026-03-05 13:48:51 -08:00
Junio C Hamano
c3e201d137 Merge branch 'mf/format-patch-cover-letter-format' into jch
"git format-patch --cover-letter" learns to use a simpler format
instead of the traditional shortlog format to list its commits with
a new --cover-letter-format option and format.commitListFormat
configuration variable.

* mf/format-patch-cover-letter-format:
  docs: add usage for the cover-letter fmt feature
  format-patch: add commitListFormat config
  format-patch: add ability to use alt cover format
  format-patch: move cover letter summary generation
  pretty.c: add %(count) and %(total) placeholders
2026-03-05 13:48:50 -08:00
Junio C Hamano
495417043e Merge branch 'tb/incremental-midx-part-3.2' into jch
Further work on incremental repacking using MIDX/bitmap

* tb/incremental-midx-part-3.2:
  midx: enable reachability bitmaps during MIDX compaction
  midx: implement MIDX compaction
  t/helper/test-read-midx.c: plug memory leak when selecting layer
  midx-write.c: factor fanout layering from `compute_sorted_entries()`
  midx-write.c: enumerate `pack_int_id` values directly
  midx-write.c: extract `fill_pack_from_midx()`
  midx-write.c: introduce `midx_pack_perm()` helper
  midx: do not require packs to be sorted in lexicographic order
  midx-write.c: introduce `struct write_midx_opts`
  midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
  t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
  git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
  git-multi-pack-index(1): remove non-existent incompatibility
  builtin/multi-pack-index.c: make '--progress' a common option
  midx: introduce `midx_get_checksum_hex()`
  midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`
  midx: mark `get_midx_checksum()` arguments as const
2026-03-05 13:48:48 -08:00
Junio C Hamano
3adc836dcb Merge branch 'lc/rebase-trailer' into jch
"git rebase" learns "--trailer" command to drive the
interpret-trailers machinery.

* lc/rebase-trailer:
  rebase: support --trailer
  commit, tag: parse --trailer with OPT_STRVEC
  trailer: append trailers without fork/exec
  trailer: move process_trailers to trailer.h
  interpret-trailers: factor trailer rewriting
2026-03-05 13:48:47 -08:00
Junio C Hamano
99fdbc5f88 Merge branch 'ps/odb-sources' into jch
The object source API is getting restructured to allow plugging new
backends.

Being reviewed.
cf. <aaidbdpkpH7tfn9x@denethor>

* ps/odb-sources:
  odb/source: make `begin_transaction()` function pluggable
  odb/source: make `write_alternate()` function pluggable
  odb/source: make `read_alternates()` function pluggable
  odb/source: make `write_object_stream()` function pluggable
  odb/source: make `write_object()` function pluggable
  odb/source: make `freshen_object()` function pluggable
  odb/source: make `for_each_object()` function pluggable
  odb/source: make `read_object_stream()` function pluggable
  odb/source: make `read_object_info()` function pluggable
  odb/source: make `close()` function pluggable
  odb/source: make `reprepare()` function pluggable
  odb/source: make `free()` function pluggable
  odb/source: introduce source type for robustness
  odb: move reparenting logic into respective subsystems
  odb: embed base source in the "files" backend
  odb: introduce "files" source
  odb: split `struct odb_source` into separate header
2026-03-05 13:48:45 -08:00
Junio C Hamano
f71d3319b8 Merge branch 'jt/repo-structure-extrema' into jch
"git repo structure" command learns to report maximum values on
various aspects of objects it inspects.

* jt/repo-structure-extrema:
  builtin/repo: find tree with most entries
  builtin/repo: find commit with most parents
  builtin/repo: add OID annotations to table output
  builtin/repo: collect largest inflated objects
  builtin/repo: add helper for printing keyvalue output
  builtin/repo: update stats for each object
2026-03-05 13:48:38 -08:00
Junio C Hamano
a08540854f Merge branch 'ac/help-sort-correctly' into jch
The code in "git help" that shows configuration items in sorted
order was awkwardly organized and prone to bugs.

* ac/help-sort-correctly:
  help: cleanup the contruction of keys_uniq
2026-03-05 13:48:37 -08:00
Junio C Hamano
cde05466f7 Merge branch 'sa/replay-revert' into jch
"git replay" (experimental) learns, in addition to "pick" and
"replay", a new operating mode "revert".

* sa/replay-revert:
  replay: add --revert mode to reverse commit changes
  sequencer: extract revert message formatting into shared function
2026-03-05 13:48:35 -08:00
Junio C Hamano
a0e19535fb Merge branch 'ds/for-each-repo-w-worktree' into jch
"git for-each-repo" started from a secondary worktree did not work
as expected, which has been corrected.

* ds/for-each-repo-w-worktree:
  for-each-repo: simplify passing of parameters
  for-each-repo: work correctly in a worktree
  run-command: extract sanitize_repo_env helper
  for-each-repo: test outside of repo context
2026-03-05 13:48:27 -08:00
Patrick Steinhardt
d9ecf268ef odb: embed base source in the "files" backend
The "files" backend is implemented as a pointer in the `struct
odb_source`. This contradicts our typical pattern for pluggable backends
like we use it for example in the ref store or for object database
streams, where we typically embed the generic base structure in the
specialized implementation. This pattern has a couple of small benefits:

  - We avoid an extra allocation.

  - We hide implementation details in the generic structure.

  - We can easily downcast from a generic backend to the specialized
    structure and vice versa because the offsets are known at compile
    time.

  - It becomes trivial to identify locations where we depend on backend
    specific logic because the cast needs to be explicit.

Refactor our "files" object database source to do the same and embed the
`struct odb_source` in the `struct odb_source_files`.

There are still a bunch of sites in our code base where we do have to
access internals of the "files" backend. The intent is that those will
go away over time, but this will certainly take a while. Meanwhile,
provide a `odb_source_files_downcast()` function that can convert a
generic source into a "files" source.

As we only have a single source the downcast succeeds unconditionally
for now. Eventually though the intent is to make the cast `BUG()` in
case the caller requests to downcast a non-"files" backend to a "files"
backend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 11:45:15 -08:00
Patrick Steinhardt
cb506a8a69 odb: introduce "files" source
Introduce a new "files" object database source. This source encapsulates
access to both loose object files and the packfile store, similar to how
the "files" backend for refs encapsulates access to loose refs and the
packed-refs file.

Note that for now the "files" source is still a direct member of a
`struct odb_source`. This architecture will be reversed in the next
commit so that the files source contains a `struct odb_source`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 11:45:14 -08:00
Seyi Kufoiji
a98ea50288 builtin/rev-list: migrate missing_objects cleanup to oidmap_clear_with_free()
As part of the conversion away from oidmap_clear(), switch the
missing_objects map to use oidmap_clear_with_free().

missing_objects stores struct missing_objects_map_entry instances,
which own an xstrdup()'d path string in addition to the container
struct itself. Previously, rev-list manually freed entry->path
before calling oidmap_clear(&missing_objects, true).

Introduce a dedicated free callback and pass it to
oidmap_clear_with_free(), consolidating entry teardown into a
single place and making cleanup semantics explicit.

Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 11:16:18 -08:00
Paul Tarjan
1defaf5665 fsmonitor: convert shown khash to strset in do_handle_client
Replace the khash-based string set used for deduplicating pathnames
in do_handle_client() with a strset, which provides a cleaner
interface for the same purpose.

Since the paths are interned strings from the batch data, use
strdup_strings=0 to avoid unnecessary copies.

Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Paul Tarjan <github@paulisageek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 11:03:42 -08:00
Paul Tarjan
9f79264a5e fsmonitor: add timeout to daemon stop command
The "fsmonitor--daemon stop" command polls in a loop waiting for the
daemon to exit after sending a "quit" command over IPC.  If the daemon
fails to shut down (e.g. it is stuck or wedged), this loop spins
forever.

Add a 30-second timeout so the stop command returns an error instead
of blocking indefinitely.

Signed-off-by: Paul Tarjan <github@paulisageek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 11:03:42 -08:00
Paul Tarjan
30f05c1b6c fsmonitor: close inherited file descriptors and detach in daemon
When the fsmonitor daemon is spawned as a background process, it may
inherit file descriptors from its parent that it does not need.  In
particular, when the test harness or a CI system captures output through
pipes, the daemon can inherit duplicated pipe endpoints.  If the daemon
holds these open, the parent process never sees EOF and may appear to
hang.

Set close_fd_above_stderr on the child process at both daemon startup
paths: the explicit "fsmonitor--daemon start" command and the implicit
spawn triggered by fsmonitor-ipc when a client finds no running daemon.
Also suppress stdout and stderr on the implicit spawn path to prevent
the background daemon from writing to the client's terminal.

Additionally, call setsid() when the daemon starts with --detach to
create a new session and process group.  This prevents the daemon
from being part of the spawning shell's process group, which could
cause the shell's "wait" to block until the daemon exits.

Signed-off-by: Paul Tarjan <github@paulisageek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 11:03:41 -08:00
Paul Tarjan
f9cc37dc05 fsmonitor: use pthread_cond_timedwait for cookie wait
The cookie wait in with_lock__wait_for_cookie() uses an infinite
pthread_cond_wait() loop.  The existing comment notes the desire
to switch to pthread_cond_timedwait(), but the routine was not
available in git thread-utils.

On certain container or overlay filesystems, inotify watches may
succeed but events are never delivered.  In this case the daemon
would hang indefinitely waiting for the cookie event, which in
turn causes the client to hang.

Replace the infinite wait with a one-second timeout using
pthread_cond_timedwait().  If the timeout fires, report an
error and let the client proceed with a trivial (full-scan)
response rather than blocking forever.

Signed-off-by: Paul Tarjan <github@paulisageek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 11:03:39 -08:00
Paul Tarjan
d55ffd0ee4 fsmonitor: fix hashmap memory leak in fsmonitor_run_daemon
The `state.cookies` hashmap is initialized during daemon startup but
never freed during cleanup in the `done:` label of
fsmonitor_run_daemon().  The cookie entries also have names allocated
via strbuf_detach() that must be freed individually.

Iterate the hashmap to free each cookie name, then call
hashmap_clear_and_free() to release the entries and table.

Signed-off-by: Paul Tarjan <github@paulisageek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 11:03:39 -08:00
Paul Tarjan
6b837b4c3c fsmonitor: fix khash memory leak in do_handle_client
The `shown` kh_str_t was freed with kh_release_str() at a point in
the code only reachable in the non-trivial response path.  When the
client receives a trivial response, the code jumps to the `cleanup`
label, skipping the kh_release_str() call entirely and leaking the
hash table.

Fix this by initializing `shown` to NULL and moving the cleanup to the
`cleanup` label using kh_destroy_str(), which is safe to call on NULL.
This ensures the hash table is freed regardless of which code path is
taken.

Signed-off-by: Paul Tarjan <github@paulisageek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 11:03:39 -08:00
Junio C Hamano
0d26ac63a2 Merge branch 'rs/parse-options-duplicated-long-options' into jch
The parse-options API learned to notice an options[] array with
duplicated long options.

* rs/parse-options-duplicated-long-options:
  parseopt: check for duplicate long names and numerical options
  pack-objects: remove duplicate --stdin-packs definition
2026-03-05 10:05:14 -08:00
Junio C Hamano
9fb5bf493a Merge branch 'kh/format-patch-noprefix-is-boolean' into jch
The configuration variable format.noprefix did not behave as a
proper boolean variable, which has now been fixed and documented.

* kh/format-patch-noprefix-is-boolean:
  doc: diff-options.adoc: make *.noprefix split translatable
  doc: diff-options.adoc: show format.noprefix for format-patch
  format-patch: make format.noprefix a boolean
2026-03-05 10:05:13 -08:00
Junio C Hamano
5e727199f1 Merge branch 'jk/repo-structure-cleanup' into jch
Code clean-up.

* jk/repo-structure-cleanup:
  repo: remove unnecessary variable shadow
2026-03-05 10:05:13 -08:00
Junio C Hamano
e826132f04 Merge branch 'cs/add-skip-submodule-ignore-all' into jch
"git add <submodule>" has been taught to honor
submodule.<name>.ignore that is set to "all" (and requires "git add
-f" to override it).

* cs/add-skip-submodule-ignore-all:
  Documentation: update add --force option + ignore=all config
  tests: fix existing tests when add an ignore=all submodule
  tests: t2206-add-submodule-ignored: ignore=all and add --force tests
  read-cache: submodule add need --force given ignore=all configuration
  read-cache: update add_files_to_cache take param ignored_too
2026-03-05 10:05:12 -08:00
Junio C Hamano
4c166899cb Merge branch 'ps/refs-for-each' into jch
Code refactoring around refs-for-each-* API functions.

* ps/refs-for-each:
  refs: replace `refs_for_each_fullref_in()`
  refs: replace `refs_for_each_namespaced_ref()`
  refs: replace `refs_for_each_glob_ref()`
  refs: replace `refs_for_each_glob_ref_in()`
  refs: replace `refs_for_each_rawref_in()`
  refs: replace `refs_for_each_rawref()`
  refs: replace `refs_for_each_ref_in()`
  refs: improve verification for-each-ref options
  refs: generalize `refs_for_each_fullref_in_prefixes()`
  refs: generalize `refs_for_each_namespaced_ref()`
  refs: speed up `refs_for_each_glob_ref_in()`
  refs: introduce `refs_for_each_ref_ext`
  refs: rename `each_ref_fn`
  refs: rename `do_for_each_ref_flags`
  refs: move `do_for_each_ref_flags` further up
  refs: move `refs_head_ref_namespaced()`
  refs: remove unused `refs_for_each_include_root_ref()`
2026-03-05 10:05:12 -08:00
Junio C Hamano
6e626f2e69 Merge branch 'ar/config-hooks' (early part) into jch
* 'ar/config-hooks' (early part):
  hook: add -z option to "git hook list"
  hook: allow out-of-repo 'git hook' invocations
  hook: allow event = "" to overwrite previous values
  hook: allow disabling config hooks
  hook: include hooks from the config
  hook: add "git hook list" command
  hook: run a list of hooks to prepare for multihook support
  hook: add internal state alloc/free callbacks
2026-03-05 10:05:11 -08:00
Junio C Hamano
daf0a687ae Merge branch 'ar/run-command-hook-take-2' into jch
Use the hook API to replace ad-hoc invocation of hook scripts via
the run_command() API.

* ar/run-command-hook-take-2:
  builtin/receive-pack: avoid spinning no-op sideband async threads
  receive-pack: convert receive hooks to hook API
  receive-pack: convert update hooks to new API
  run-command: poll child input in addition to output
  hook: add jobs option
  reference-transaction: use hook API instead of run-command
  transport: convert pre-push to hook API
  hook: allow separate std[out|err] streams
  hook: convert 'post-rewrite' hook in sequencer.c to hook API
  hook: provide stdin via callback
  run-command: add stdin callback for parallelization
  run-command: add helper for pp child states
  t1800: add hook output stream tests
2026-03-05 10:05:10 -08:00
Junio C Hamano
db227bce22 Merge branch 'ob/core-attributesfile-in-repository'
The core.attributesfile is intended to be set per repository, but
were kept track of by a single global variable in-core, which has
been corrected by moving it to per-repository data structure.

* ob/core-attributesfile-in-repository:
  environment: move "branch.autoSetupMerge" into `struct repo_config_values`
  environment: stop using core.sparseCheckout globally
  environment: stop storing `core.attributesFile` globally
2026-03-05 10:04:49 -08:00
Junio C Hamano
a1b15cc9d1 Merge branch 'cx/fetch-display-ubfix'
Undefined-behaviour fix in "git fetch".

* cx/fetch-display-ubfix:
  fetch: fix wrong evaluation order in URL trailing-slash trimming
2026-03-04 10:53:02 -08:00
Junio C Hamano
a31d4f1860 Merge branch 'ds/config-list-with-type'
"git config list" is taught to show the values interpreted for
specific type with "--type=<X>" option.

* ds/config-list-with-type:
  config: use an enum for type
  config: restructure format_config()
  config: format colors quietly
  color: add color_parse_quietly()
  config: format expiry dates quietly
  config: format paths gently
  config: format bools or strings in helper
  config: format bools or ints gently
  config: format bools gently
  config: format int64s gently
  config: make 'git config list --type=<X>' work
  config: add 'gently' parameter to format_config()
  config: move show_all_config()
2026-03-04 10:53:02 -08:00
Junio C Hamano
34af1d6e87 Merge branch 'lo/repo-leftover-bits'
Clean-up the code around "git repo info" command.

* lo/repo-leftover-bits:
  Documentation/git-repo: capitalize format descriptions
  Documentation/git-repo: replace 'NUL' with '_NUL_'
  t1901: adjust nul format output instead of expected value
  t1900: rename t1900-repo to t1900-repo-info
  repo: rename struct field to repo_info_field
  repo: replace get_value_fn_for_key by get_repo_info_field
  repo: rename repo_info_fields to repo_info_field
  CodingGuidelines: instruct to name arrays in singular
2026-03-04 10:53:01 -08:00
Junio C Hamano
50d7425767 Merge branch 'ps/maintenance-geometric-default'
"git maintenance" starts using the "geometric" strategy by default.

* ps/maintenance-geometric-default:
  builtin/maintenance: use "geometric" strategy by default
  t7900: prepare for switch of the default strategy
  t6500: explicitly use "gc" strategy
  t5510: explicitly use "gc" strategy
  t5400: explicitly use "gc" strategy
  t34xx: don't expire reflogs where it matters
  t: disable maintenance where we verify object database structure
  t: fix races caused by background maintenance
2026-03-04 10:53:01 -08:00
Junio C Hamano
7b7d67104e Merge branch 'pw/no-more-NULL-means-current-worktree'
API clean-up for the worktree subsystem.

* pw/no-more-NULL-means-current-worktree:
  path: remove repository argument from worktree_git_path()
  wt-status: avoid passing NULL worktree
2026-03-04 10:53:00 -08:00
Junio C Hamano
2d843a2d3d Merge branch 'bk/mailmap-wo-the-repository'
Wean the mailmap code off of the_repository dependency.

* bk/mailmap-wo-the-repository:
  mailmap: drop global config variables
  mailmap: stop using the_repository
2026-03-04 10:53:00 -08:00
Junio C Hamano
1d0a2acb78 Merge branch 'kn/ref-location'
Allow the directory in which reference backends store their data to
be specified.

* kn/ref-location:
  refs: add GIT_REFERENCE_BACKEND to specify reference backend
  refs: allow reference location in refstorage config
  refs: receive and use the reference storage payload
  refs: move out stub modification to generic layer
  refs: extract out `refs_create_refdir_stubs()`
  setup: don't modify repo in `create_reference_database()`
2026-03-04 10:52:59 -08:00
Junio C Hamano
a751e79aca Merge branch 'aa/add-p-no-auto-advance'
"git add -p" learned a new mode that allows the user to revisit a
file that was already dealt with.

* aa/add-p-no-auto-advance:
  add-patch: allow interfile navigation when selecting hunks
  add-patch: allow all-or-none application of patches
  add-patch: modify patch_update_file() signature
  interactive -p: add new `--auto-advance` flag
2026-03-03 11:08:13 -08:00
Junio C Hamano
d455602a4d Merge branch 'sc/pack-redundant-leakfix'
Leakfix.

* sc/pack-redundant-leakfix:
  pack-redundant: fix memory leak when open_pack_index() fails
2026-03-03 11:08:12 -08:00
Derrick Stolee
e87493b9b4 for-each-repo: simplify passing of parameters
This change simplifies the code somewhat from its original
implementation.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-03 10:20:00 -08:00
Derrick Stolee
2ef539bcee for-each-repo: work correctly in a worktree
When run in a worktree, the GIT_DIR directory is set in a different way
than in a typical repository. Show this by updating t0068 to include a
worktree and add a test that runs from that worktree. This requires
moving the repo.key config into a global config instead of the base test
repository's local config (demonstrating that it worked with
non-worktree Git repositories).

We need to be careful to unset the local Git environment variables and
let the child process rediscover them, while also reinstating those
variables in the parent process afterwards. Update run_command_on_repo()
to use the new sanitize_repo_env() helper method to erase these
environment variables.

During review of this bug fix, there were several incorrect patches
demonstrating different bad behaviors. Most of these are covered by
tests, when it is not too expensive to set it up. One case that would be
expensive to set up is the GIT_NO_REPLACE_OBJECTS environment variable,
but we trust that using sanitize_repo_env() will be sufficient to
capture these uncovered cases by using the common code for resetting
environment variables.

Reported-by: Matthew Gabeler-Lee <fastcat@gmail.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-03 10:20:00 -08:00
Patrick Steinhardt
b095803c90 builtin/pack-objects: reduce lock contention when writing packfile data
When running `git pack-objects --stdout` we feed the data through
`hashfd_ext()` with a progress meter and a smaller-than-usual buffer
length of 8kB so that we can track throughput more granularly. But as
packfiles tend to be on the larger side, this small buffer size may
cause a ton of write(3p) syscalls.

Originally, the buffer we used in `hashfd()` was 8kB for all use cases.
This was changed though in 2ca245f8be (csum-file.h: increase hashfile
buffer size, 2021-05-18) because we noticed that the number of writes
can have an impact on performance. So the buffer size was increased to
128kB, which improved performance a bit for some use cases.

But the commit didn't touch the buffer size for `hashd_throughput()`.
The reasoning here was that callers expect the progress indicator to
update frequently, and a larger buffer size would of course reduce the
update frequency especially on slow networks.

While that is of course true, there was (and still is, even though it's
now a call to `hashfd_ext()`) only a single caller of this function in
git-pack-objects(1). This command is responsible for writing packfiles,
and those packfiles are often on the bigger side. So arguably:

  - The user won't care about increments of 8kB when packfiles tend to
    be megabytes or even gigabytes in size.

  - Reducing the number of syscalls would be even more valuable here
    than it would be for multi-pack indices, which was the benchmark
    done in the mentioned commit, as MIDXs are typically significantly
    smaller than packfiles.

  - Nowadays, many internet connections should be able to transfer data
    at a rate significantly higher than 8kB per second.

Update the buffer to instead have a size of `LARGE_PACKET_DATA_MAX - 1`,
which translates to ~64kB. This limit was chosen because `git
pack-objects --stdout` is most often used when sending packfiles via
git-upload-pack(1), where packfile data is chunked into pktlines when
using the sideband. Furthermore, most internet connections should have a
bandwidth signifcantly higher than 64kB/s, so we'd still be able to
observe progress updates at a rate of at least once per second.

This change significantly reduces the number of write(3p) syscalls from
355,000 to 44,000 when packing the Linux repository. While this results
in a small performance improvement on an otherwise-unused system, this
improvement is mostly negligible. More importantly though, it will
reduce lock contention in the kernel on an extremely busy system where
we have many processes writing data at once.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-03 09:52:35 -08:00
Patrick Steinhardt
b7b5cde063 csum-file: drop hashfd_throughput()
The `hashfd_throughput()` function is used by a single callsite in
git-pack-objects(1). In contrast to `hashfd()`, this function uses a
progress meter to measure throughput and a smaller buffer length so that
the progress meter can provide more granular metrics.

We're going to change that caller in the next commit to be a bit more
specific to packing objects. As such, `hashfd_throughput()` will be a
somewhat unfitting mechanism for any potential new callers.

Drop the function and replace it with a call to `hashfd_ext()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-03 09:52:35 -08:00
Junio C Hamano
10dd04a3fe Merge branch 'ps/object-info-bits-cleanup'
A couple of bugs in use of flag bits around odb API has been
corrected, and the flag bits reordered.

* ps/object-info-bits-cleanup:
  odb: convert `odb_has_object()` flags into an enum
  odb: convert object info flags into an enum
  odb: drop gaps in object info flag values
  builtin/fsck: fix flags passed to `odb_has_object()`
  builtin/backfill: fix flags passed to `odb_has_object()`
2026-03-02 17:06:52 -08:00
Junio C Hamano
9eb5b3b999 Merge branch 'ps/odb-for-each-object'
Revamp object enumeration API around odb.

* ps/odb-for-each-object:
  odb: drop unused `for_each_{loose,packed}_object()` functions
  reachable: convert to use `odb_for_each_object()`
  builtin/pack-objects: use `packfile_store_for_each_object()`
  odb: introduce mtime fields for object info requests
  treewide: drop uses of `for_each_{loose,packed}_object()`
  treewide: enumerate promisor objects via `odb_for_each_object()`
  builtin/fsck: refactor to use `odb_for_each_object()`
  odb: introduce `odb_for_each_object()`
  packfile: introduce function to iterate through objects
  packfile: extract function to iterate through objects of a store
  object-file: introduce function to iterate through objects
  object-file: extract function to read object info from path
  odb: fix flags parameter to be unsigned
  odb: rename `FOR_EACH_OBJECT_*` flags
2026-03-02 17:06:50 -08:00
Adrian Ratiu
005f3fbe07 builtin/receive-pack: avoid spinning no-op sideband async threads
Exit early if the hooks do not exist, to avoid spinning up/down
sideband async threads which no-op.

It is important to call the hook_exists() API provided by hook.[ch]
because it covers both config-defined hooks and the "traditional"
hooks from the hookdir. find_hook() only covers the hookdir hooks.

The regression happened because the no-op async threads add some
additional overhead which can be measured with the receive-refs test
of the benchmarks suite [1].

Reproduced using:
cd benchmarks/receive-refs && \
./run --revisions /path/to/git \
fc148b146ad41be71a7852c4867f0773cbfe1ff9~,fc148b146ad41be71a7852c4867f0773cbfe1ff9 \
--parameter-list refformat reftable --parameter-list refcount 10000

1: https://gitlab.com/gitlab-org/data-access/git/benchmarks

Fixes: fc148b146a ("receive-pack: convert update hooks to new API")
Reported-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
[jc: avoid duplicated hardcoded hook names]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-02 16:00:43 -08:00
Justin Tobler
42e6959411 builtin/repo: find tree with most entries
The size of a tree object usually corresponds with the number of entries
it has. While iterating through objects in the repository for
git-repo-structure, identify the tree with the most entries and display
it in the output.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-02 13:54:53 -08:00
Justin Tobler
18952a1ef1 builtin/repo: find commit with most parents
Complex merge events may produce an octopus merge where the resulting
merge commit has more than two parents. While iterating through objects
in the repository for git-repo-structure, identify the commit with the
most parents and display it in the output.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-02 13:54:52 -08:00
Justin Tobler
e00bb8c76e builtin/repo: add OID annotations to table output
The "structure" output for git-repo(1) does not show the corresponding
OIDs for the largest objects in its "table" output. Update the output to
include a list of OID annotations with an index to the corresponding row
in the table.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-02 13:54:52 -08:00
Justin Tobler
e33ac9cc9e builtin/repo: collect largest inflated objects
The "structure" output for git-repo(1) shows the total inflated and disk
sizes of reachable objects in the repository, but doesn't show the size
of the largest individual objects. Since an individual object may be a
large contributor to the overall repository size, it is useful for users
to know the maximum size of individual objects.

While interating across objects, record the size and OID of the largest
objects encountered for each object type to provide as output. Note that
the default "table" output format only displays size information and not
the corresponding OID. In a subsequent commit, the table format is
updated to add table annotations that mention the OID.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-02 13:54:52 -08:00
Justin Tobler
fa17527927 builtin/repo: add helper for printing keyvalue output
The machine-parsable formats for the git-repo(1) "structure" subcommand
print output in keyvalue pairs. Introduce the helper function
`print_keyvalue()` to remove some code duplication and improve
readability.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-02 13:54:51 -08:00
Justin Tobler
31c771ab44 builtin/repo: update stats for each object
When walking reachable objects in the repository, `count_objects()`
processes a set of objects and updates the `struct object_stats`. In
preparation for more granular statistics being collected, update the
`struct object_stats` for each individual object instead.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-02 13:54:51 -08:00