A bit of OIDmap API enhancement and cleanup.
* sk/oidmap-clear-with-custom-free-func:
builtin/rev-list: migrate missing_objects cleanup to oidmap_clear_with_free()
oidmap: make entry cleanup explicit in oidmap_clear
The object source API is getting restructured to allow plugging new
backends.
* ps/odb-sources:
odb/source: make `begin_transaction()` function pluggable
odb/source: make `write_alternate()` function pluggable
odb/source: make `read_alternates()` function pluggable
odb/source: make `write_object_stream()` function pluggable
odb/source: make `write_object()` function pluggable
odb/source: make `freshen_object()` function pluggable
odb/source: make `for_each_object()` function pluggable
odb/source: make `read_object_stream()` function pluggable
odb/source: make `read_object_info()` function pluggable
odb/source: make `close()` function pluggable
odb/source: make `reprepare()` function pluggable
odb/source: make `free()` function pluggable
odb/source: introduce source type for robustness
odb: move reparenting logic into respective subsystems
odb: embed base source in the "files" backend
odb: introduce "files" source
odb: split `struct odb_source` into separate header
"git for-each-repo" started from a secondary worktree did not work
as expected, which has been corrected.
* ds/for-each-repo-w-worktree:
for-each-repo: simplify passing of parameters
for-each-repo: work correctly in a worktree
run-command: extract sanitize_repo_env helper
for-each-repo: test outside of repo context
The parse-options API learned to notice an options[] array with
duplicated long options.
* rs/parse-options-duplicated-long-options:
parseopt: check for duplicate long names and numerical options
pack-objects: remove duplicate --stdin-packs definition
The configuration variable format.noprefix did not behave as a
proper boolean variable, which has now been fixed and documented.
* kh/format-patch-noprefix-is-boolean:
doc: diff-options.adoc: make *.noprefix split translatable
doc: diff-options.adoc: show format.noprefix for format-patch
format-patch: make format.noprefix a boolean
"git add <submodule>" has been taught to honor
submodule.<name>.ignore that is set to "all" (and requires "git add
-f" to override it).
* cs/add-skip-submodule-ignore-all:
Documentation: update add --force option + ignore=all config
tests: fix existing tests when add an ignore=all submodule
tests: t2206-add-submodule-ignored: ignore=all and add --force tests
read-cache: submodule add need --force given ignore=all configuration
read-cache: update add_files_to_cache take param ignored_too
* 'ar/config-hooks' (early part):
hook: add -z option to "git hook list"
hook: allow out-of-repo 'git hook' invocations
hook: allow event = "" to overwrite previous values
hook: allow disabling config hooks
hook: include hooks from the config
hook: add "git hook list" command
hook: run a list of hooks to prepare for multihook support
hook: add internal state alloc/free callbacks
Use the hook API to replace ad-hoc invocation of hook scripts via
the run_command() API.
* ar/run-command-hook-take-2:
builtin/receive-pack: avoid spinning no-op sideband async threads
receive-pack: convert receive hooks to hook API
receive-pack: convert update hooks to new API
run-command: poll child input in addition to output
hook: add jobs option
reference-transaction: use hook API instead of run-command
transport: convert pre-push to hook API
hook: allow separate std[out|err] streams
hook: convert 'post-rewrite' hook in sequencer.c to hook API
hook: provide stdin via callback
run-command: add stdin callback for parallelization
run-command: add helper for pp child states
t1800: add hook output stream tests
The "files" backend is implemented as a pointer in the `struct
odb_source`. This contradicts our typical pattern for pluggable backends
like we use it for example in the ref store or for object database
streams, where we typically embed the generic base structure in the
specialized implementation. This pattern has a couple of small benefits:
- We avoid an extra allocation.
- We hide implementation details in the generic structure.
- We can easily downcast from a generic backend to the specialized
structure and vice versa because the offsets are known at compile
time.
- It becomes trivial to identify locations where we depend on backend
specific logic because the cast needs to be explicit.
Refactor our "files" object database source to do the same and embed the
`struct odb_source` in the `struct odb_source_files`.
There are still a bunch of sites in our code base where we do have to
access internals of the "files" backend. The intent is that those will
go away over time, but this will certainly take a while. Meanwhile,
provide a `odb_source_files_downcast()` function that can convert a
generic source into a "files" source.
As we only have a single source the downcast succeeds unconditionally
for now. Eventually though the intent is to make the cast `BUG()` in
case the caller requests to downcast a non-"files" backend to a "files"
backend.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new "files" object database source. This source encapsulates
access to both loose object files and the packfile store, similar to how
the "files" backend for refs encapsulates access to loose refs and the
packed-refs file.
Note that for now the "files" source is still a direct member of a
`struct odb_source`. This architecture will be reversed in the next
commit so that the files source contains a `struct odb_source`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of the conversion away from oidmap_clear(), switch the
missing_objects map to use oidmap_clear_with_free().
missing_objects stores struct missing_objects_map_entry instances,
which own an xstrdup()'d path string in addition to the container
struct itself. Previously, rev-list manually freed entry->path
before calling oidmap_clear(&missing_objects, true).
Introduce a dedicated free callback and pass it to
oidmap_clear_with_free(), consolidating entry teardown into a
single place and making cleanup semantics explicit.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The core.attributesfile is intended to be set per repository, but
were kept track of by a single global variable in-core, which has
been corrected by moving it to per-repository data structure.
* ob/core-attributesfile-in-repository:
environment: move "branch.autoSetupMerge" into `struct repo_config_values`
environment: stop using core.sparseCheckout globally
environment: stop storing `core.attributesFile` globally
"git config list" is taught to show the values interpreted for
specific type with "--type=<X>" option.
* ds/config-list-with-type:
config: use an enum for type
config: restructure format_config()
config: format colors quietly
color: add color_parse_quietly()
config: format expiry dates quietly
config: format paths gently
config: format bools or strings in helper
config: format bools or ints gently
config: format bools gently
config: format int64s gently
config: make 'git config list --type=<X>' work
config: add 'gently' parameter to format_config()
config: move show_all_config()
Clean-up the code around "git repo info" command.
* lo/repo-leftover-bits:
Documentation/git-repo: capitalize format descriptions
Documentation/git-repo: replace 'NUL' with '_NUL_'
t1901: adjust nul format output instead of expected value
t1900: rename t1900-repo to t1900-repo-info
repo: rename struct field to repo_info_field
repo: replace get_value_fn_for_key by get_repo_info_field
repo: rename repo_info_fields to repo_info_field
CodingGuidelines: instruct to name arrays in singular
"git maintenance" starts using the "geometric" strategy by default.
* ps/maintenance-geometric-default:
builtin/maintenance: use "geometric" strategy by default
t7900: prepare for switch of the default strategy
t6500: explicitly use "gc" strategy
t5510: explicitly use "gc" strategy
t5400: explicitly use "gc" strategy
t34xx: don't expire reflogs where it matters
t: disable maintenance where we verify object database structure
t: fix races caused by background maintenance
API clean-up for the worktree subsystem.
* pw/no-more-NULL-means-current-worktree:
path: remove repository argument from worktree_git_path()
wt-status: avoid passing NULL worktree
Wean the mailmap code off of the_repository dependency.
* bk/mailmap-wo-the-repository:
mailmap: drop global config variables
mailmap: stop using the_repository
Allow the directory in which reference backends store their data to
be specified.
* kn/ref-location:
refs: add GIT_REFERENCE_BACKEND to specify reference backend
refs: allow reference location in refstorage config
refs: receive and use the reference storage payload
refs: move out stub modification to generic layer
refs: extract out `refs_create_refdir_stubs()`
setup: don't modify repo in `create_reference_database()`
"git add -p" learned a new mode that allows the user to revisit a
file that was already dealt with.
* aa/add-p-no-auto-advance:
add-patch: allow interfile navigation when selecting hunks
add-patch: allow all-or-none application of patches
add-patch: modify patch_update_file() signature
interactive -p: add new `--auto-advance` flag
This change simplifies the code somewhat from its original
implementation.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When run in a worktree, the GIT_DIR directory is set in a different way
than in a typical repository. Show this by updating t0068 to include a
worktree and add a test that runs from that worktree. This requires
moving the repo.key config into a global config instead of the base test
repository's local config (demonstrating that it worked with
non-worktree Git repositories).
We need to be careful to unset the local Git environment variables and
let the child process rediscover them, while also reinstating those
variables in the parent process afterwards. Update run_command_on_repo()
to use the new sanitize_repo_env() helper method to erase these
environment variables.
During review of this bug fix, there were several incorrect patches
demonstrating different bad behaviors. Most of these are covered by
tests, when it is not too expensive to set it up. One case that would be
expensive to set up is the GIT_NO_REPLACE_OBJECTS environment variable,
but we trust that using sanitize_repo_env() will be sufficient to
capture these uncovered cases by using the common code for resetting
environment variables.
Reported-by: Matthew Gabeler-Lee <fastcat@gmail.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A couple of bugs in use of flag bits around odb API has been
corrected, and the flag bits reordered.
* ps/object-info-bits-cleanup:
odb: convert `odb_has_object()` flags into an enum
odb: convert object info flags into an enum
odb: drop gaps in object info flag values
builtin/fsck: fix flags passed to `odb_has_object()`
builtin/backfill: fix flags passed to `odb_has_object()`
Revamp object enumeration API around odb.
* ps/odb-for-each-object:
odb: drop unused `for_each_{loose,packed}_object()` functions
reachable: convert to use `odb_for_each_object()`
builtin/pack-objects: use `packfile_store_for_each_object()`
odb: introduce mtime fields for object info requests
treewide: drop uses of `for_each_{loose,packed}_object()`
treewide: enumerate promisor objects via `odb_for_each_object()`
builtin/fsck: refactor to use `odb_for_each_object()`
odb: introduce `odb_for_each_object()`
packfile: introduce function to iterate through objects
packfile: extract function to iterate through objects of a store
object-file: introduce function to iterate through objects
object-file: extract function to read object info from path
odb: fix flags parameter to be unsigned
odb: rename `FOR_EACH_OBJECT_*` flags
Exit early if the hooks do not exist, to avoid spinning up/down
sideband async threads which no-op.
It is important to call the hook_exists() API provided by hook.[ch]
because it covers both config-defined hooks and the "traditional"
hooks from the hookdir. find_hook() only covers the hookdir hooks.
The regression happened because the no-op async threads add some
additional overhead which can be measured with the receive-refs test
of the benchmarks suite [1].
Reproduced using:
cd benchmarks/receive-refs && \
./run --revisions /path/to/git \
fc148b146ad41be71a7852c4867f0773cbfe1ff9~,fc148b146ad41be71a7852c4867f0773cbfe1ff9 \
--parameter-list refformat reftable --parameter-list refcount 10000
1: https://gitlab.com/gitlab-org/data-access/git/benchmarks
Fixes: fc148b146a ("receive-pack: convert update hooks to new API")
Reported-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
[jc: avoid duplicated hardcoded hook names]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git format-patch --from=<me>" did not honor the command line
option when writing out the cover letter, which has been corrected.
* mf/format-patch-honor-from-for-cover-letter:
format-patch: fix From header in cover letter
Extend the alias configuration syntax to allow aliases using
characters outside ASCII alphanumeric (plus '-').
* jh/alias-i18n:
completion: fix zsh alias listing for subsection aliases
alias: support non-alphanumeric names via subsection syntax
alias: prepare for subsection aliases
help: use list_aliases() for alias listing
"git switch <name>", in an attempt to create a local branch <name>
after a remote tracking branch of the same name gave an advise
message to disambiguate using "git checkout", which has been
updated to use "git switch".
* jc/checkout-switch-restore:
checkout: tell "parse_remote_branch" which command is calling it
checkout: pass program-readable token to unified "main"
UI improvements for "git history reword".
* ps/history-ergonomics-updates:
Documentation/git-history: document default for "--update-refs="
builtin/history: rename "--ref-action=" to "--update-refs="
builtin/history: replace "--ref-action=print" with "--dry-run"
builtin/history: check for merges before asking for user input
builtin/history: perform revwalk checks before asking for user input
A handful of places used refs_for_each_ref_in() API incorrectly,
which has been corrected.
* ps/for-each-ref-in-fixes:
bisect: simplify string_list memory handling
bisect: fix misuse of `refs_for_each_ref_in()`
pack-bitmap: fix bug with exact ref match in "pack.preferBitmapTips"
pack-bitmap: deduplicate logic to iterate over preferred bitmap tips
"git repo info" learns "--keys" action to list known keys.
* lo/repo-info-keys:
repo: add new flag --keys to git-repo-info
repo: rename the output format "keyvalue" to "lines"
cd846bacc7 (pack-objects: introduce '--stdin-packs=follow', 2025-06-23)
added a new definition of the option --stdin-packs that accepts an
argument. It kept the old definition, which still shows up in the short
help, but is shadowed by the new one. Remove it.
Hinted-at-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The config value `branch.autoSetupMerge` is parsed in
`git_default_branch_config()` and stored in the global variable
`git_branch_track`. This global variable can be overwritten
by another repository when multiple Git repos run in the the same process.
Move this value into `struct repo_config_values` in the_repository to
retain current behaviours and move towards libifying Git.
Since the variable is no longer a global variable, it has been renamed to
`branch_track` in the struct `repo_config_values`.
Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The config value `core.sparseCheckout` is parsed in
`git_default_core_config()` and stored globally in
`core_apply_sparse_checkout`. This could cause it to be overwritten
by another repository when different Git repositories run in the same
process.
Move the parsed value into `struct repo_config_values` in the_repository
to retain current behaviours and move towards libifying Git.
Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git pack-objects --stdin-packs" with "--exclude-promisor-objects"
fetched objects that are promised, which was not wanted. This has
been fixed.
* ps/pack-concat-wo-backfill:
builtin/pack-objects: don't fetch objects when merging packs
"auto filter" logic for large-object promisor remote.
* cc/lop-filter-auto:
fetch-pack: wire up and enable auto filter logic
promisor-remote: change promisor_remote_reply()'s signature
promisor-remote: keep advertised filters in memory
list-objects-filter-options: support 'auto' mode for --filter
doc: fetch: document `--filter=<filter-spec>` option
fetch: make filter_options local to cmd_fetch()
clone: make filter_options local to cmd_clone()
promisor-remote: allow a client to store fields
promisor-remote: refactor initialising field lists
Change the name of the struct field to repo_info_field, making it
explicit that it is an internal data type of git-repo-info.
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the function `get_value_fn_for_key`, which returns a function that
retrieves a value for a certain repo info key. Introduce `get_repo_info_field`
instead, which returns a struct field.
This refactor makes the structure of the function print_fields more consistent
to the function print_all_fields, improving its readability.
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename repo_info_fields as repo_info_field, following the CodingGuidelines rule
for naming arrays in singular. Rename all the references to that array
accordingly.
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'extensions.refStorage' config is used to specify the reference
backend for a given repository. Both the 'files' and 'reftable' backends
utilize the $GIT_DIR as the reference folder by default in
`get_main_ref_store()`.
Since the reference backends are pluggable, this means that they could
work with out-of-tree reference directories too. Extend the 'refStorage'
config to also support taking an URI input, where users can specify the
reference backend and the location.
Add the required changes to obtain and propagate this value to the
individual backends. Add the necessary documentation and tests.
Traditionally, for linked worktrees, references were stored in the
'$GIT_DIR/worktrees/<wt_id>' path. But when using an alternate reference
storage path, it doesn't make sense to store the main worktree
references in the new path, and the linked worktree references in the
$GIT_DIR. So, let's store linked worktree references in
'$ALTERNATE_REFERENCE_DIR/worktrees/<wt_id>'. To do this, create the
necessary files and folders while also adding stubs in the $GIT_DIR path
to ensure that it is still considered a Git directory.
Ideally, we would want to pass in a `struct worktree *` to individual
backends, instead of passing the `gitdir`. This allows them to handle
worktree specific logic. Currently, that is not possible since the
worktree code is:
- Tied to using the global `the_repository` variable.
- Is not setup before the reference database during initialization of
the repository.
Add a TODO in 'refs.c' to ensure we can eventually make that change.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For Git to recognize a directory as a Git directory, it requires the
directory to contain:
1. 'HEAD' file
2. 'objects/' directory
3. 'refs/' directory
Here, #1 and #3 are part of the reference storage mechanism,
specifically the files backend. Since then, newer backends such as the
reftable backend have moved to using their own path ('reftable/') for
storing references. But to ensure Git still recognizes the directory as
a Git directory, we create stubs.
There are two locations where we create stubs:
- In 'refs/reftable-backend.c' when creating the reftable backend.
- In 'clone.c' before spawning transport helpers.
In a following commit, we'll add another instance. So instead of
repeating the code, let's extract out this code to
`refs_create_refdir_stubs()` and use it.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `create_reference_database()` function is used to create the
reference database during initialization of a repository. The function
calls `repo_set_ref_storage_format()` to set the repositories reference
format. This is an unexpected side-effect of the function. More so
because the function is only called in two locations:
1. During git-init(1) where the value is propagated from the `struct
repository_format repo_fmt` value.
2. During git-clone(1) where the value is propagated from the
`the_repository` value.
The former is valid, however the flow already calls
`repo_set_ref_storage_format()`, so this effort is simply duplicated.
The latter sets the existing value in `the_repository` back to itself.
While this is okay for now, introduction of more fields in
`repo_set_ref_storage_format()` would cause issues, especially
dynamically allocated strings, where we would free/allocate the same
string back into `the_repostiory`.
To avoid all this confusion, clean up the function to no longer take in
and set the repo's reference storage format.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-gc(1) command has been introduced in the early days of Git in
30f610b7b0 (Create 'git gc' to perform common maintenance operations.,
2006-12-27) as the main repository maintenance utility. And while the
tool has of course evolved since then to cover new parts, the basic
strategy it uses has never really changed much.
It is safe to say that since 2006 the Git ecosystem has changed quite a
bit. Repositories tend to be much larger nowadays than they have been
almost 20 years ago, and large parts of the industry went crazy for
monorepos (for various wildly different definitions of "monorepo"). So
the maintenance strategy we used back then may not be the best fit
nowadays anymore.
Arguably, most of the maintenance tasks that git-gc(1) does are still
perfectly fine today: repacking references, expiring various data
structures and things like tend to not cause huge problems. But the big
exception is the way we repack objects.
git-gc(1) by default uses a split strategy: it performs incremental
repacks by default, and then whenever we have too many packs we perform
a large all-into-one repack. This all-into-one repack is what is causing
problems nowadays, as it is an operation that is quite expensive. While
it is wasteful in small- and medium-sized repositories, in large repos
it may even be prohibitively expensive.
We have eventually introduced git-maintenance(1) that was slated as a
replacement for git-gc(1). In contrast to git-gc(1), it is much more
flexible as it is structured around configurable tasks and strategies.
So while its default "gc" strategy still uses git-gc(1) under the hood,
it allows us to iterate.
A second strategy it knows about is the "incremental" strategy, which we
configure when registering a repository for scheduled maintenance. This
strategy isn't really a full replacement for git-gc(1) though, as it
doesn't know to expire unused data structures. In Git 2.52 we have thus
introduced a new "geometric" strategy that is a proper replacement for
the old git-gc(1).
In contrast to the incremental/all-into-one split used by git-gc(1), the
new "geometric" strategy maintains a geometric progression of packfiles,
which significantly reduces the number of all-into-one repacks that we
have to perform in large repositories. It is thus a much better fit for
large repositories than git-gc(1).
Note that the "geometric" strategy isn't perfect though: while we
perform way less all-into-one repacks compared to git-gc(1), we still
have to perform them eventually. But for the largest repositories out
there this may not be an option either, as client machines might not be
powerful enough to perform such a repack in the first place. These cases
would thus still be covered by the "incremental" strategy.
Switch the default strategy away from "gc" to "geometric", but retain
the "incremental" strategy configured when registering background
maintenance with `git maintenance register`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The config `format.noprefix` was added in 8d5213de (format-patch: add
format.noprefix option, 2023-03-09) to support no-prefix on paths.
That was immediately after making git-format-patch(1) not respect
`diff.noprefix`.[1]
The intent was to mirror `diff.noprefix`. But this config was
unintentionally[2] implemented by enabling no-prefix if any kind of
value is set.
† 1: c169af8f (format-patch: do not respect diff.noprefix, 2023-03-09)
† 2: https://lore.kernel.org/all/20260211073553.GA1867915@coredump.intra.peff.net/
Let’s indeed mirror `diff.noprefix` by treating it as a boolean.
This is a breaking change. And as far as breaking changes go it is
pretty benign:
• The documentation claims that this config is equivalent to
`diff.noprefix`; this is just a bug fix if the documentation is
what defines the application interface
• Only users with non-boolean values will run into problems when we
try to parse it as a boolean. But what would (1) make them suspect
they could do that in the first place, and (2) have motivated them to
do it?
• Users who have set this to `false` and expect that to mean *enable
format.noprefix* (current behavior) will now have the opposite
experience. Which is not a reasonable setup.
Let’s only offer a breaking change fig leaf by advising about the
previous behavior before dying.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>