Commit Graph

23693 Commits

Author SHA1 Message Date
Junio C Hamano
01148cafa4 Merge branch 'rc/userdiff-r'
Userdiff patterns for the R language.

* rc/userdiff-r:
  userdiff: add support for R programming language
2025-06-17 10:44:39 -07:00
Junio C Hamano
88134a8417 Merge branch 'ds/path-walk-2'
"git pack-objects" learns to find delta bases from blobs at the
same path, using the --path-walk API.

* ds/path-walk-2:
  pack-objects: allow --shallow and --path-walk
  path-walk: add new 'edge_aggressive' option
  pack-objects: thread the path-based compression
  pack-objects: refactor path-walk delta phase
  scalar: enable path-walk during push via config
  pack-objects: enable --path-walk via config
  repack: add --path-walk option
  t5538: add tests to confirm deltas in shallow pushes
  pack-objects: introduce GIT_TEST_PACK_PATH_WALK
  p5313: add performance tests for --path-walk
  pack-objects: update usage to match docs
  pack-objects: add --path-walk option
  pack-objects: extract should_attempt_deltas()
2025-06-17 10:44:38 -07:00
Junio C Hamano
427b538fc3 Merge branch 'mm/test-in-absolute-home'
Tests that compare $HOME and $(pwd), which should be the same
directory unless the tests chdir's around, would fail when the user
enters the test directory via symbolic links, which has been
corrected.

* mm/test-in-absolute-home:
  t: run tests from a normalized working directory
2025-06-09 07:15:51 -07:00
Junio C Hamano
14de3eb344 Merge branch 'js/t5410-tee-hang-workaround'
* js/t5410-tee-hang-workaround:
  t5410: avoid hangs in CI runs in the win+Meson test jobs
2025-06-05 11:56:29 -07:00
Johannes Schindelin
52a86dd26d t5410: avoid hangs in CI runs in the win+Meson test jobs
In the GitHub workflow used in Git's CI builds, the `vs test` jobs use a
subset of a specific revision of Git for Windows' SDK to run Git's test
suite. This revision is validated by another CI workflow to ensure that
said revision _can_ run Git's test suite successfully, skipping buggy
updates in Git for Windows' SDK.

The `win+Meson test` jobs do things differently, quite differently. They
use the Bash of the Git for Windows version that is installed on the
runners to run Git's test suite.

This difference has consequences.

When 68cb0b5253 (builtin/receive-pack: add option to skip connectivity
check, 2025-05-20) introduced a test case that uses `tee <file> | git
receive-pack` as `--receive-pack` parameter (imitating an existing
pattern in the same test script), it hit just the sweet spot to trigger
a bug in the MSYS2 runtime shipped in Git for Windows v2.49.0. This
version is the one currently installed on GitHub's runners.

The problem is that the `git receive-pack` process finishes while the
`tee` process does not need to write anything anymore and therefore does
not receive an EOF. Instead, it should receive a SIGPIPE, but the bug in
the MSYS2 runtime prevents that from working as intended. As a
consequence, the `tee` process waits for more input from the `git.exe
send-pack` process but none is coming, and the test script patiently
waits until the 6h timeout hits.

Only every once in a while, the `git receive-pack` process manages to
send an EOF to the `tee` process and no hang occurs. Therefore, the
problem can be worked around by cancelling the clearly-hanging job after
twenty or so minutes and re-running it, repeating the process about half
a dozen times, until the hang was successfully avoided.

This bug in the MSYS2 runtime has been fixed in the meantime, which is
the reason why the same test case causes no problems in the `win test`
and the `vs test` jobs.

This will continue to be the case until the Git for Windows version on
the GitHub runners is upgraded to a version that distributes a newer
MSYS2 runtime version. However, as of time of writing, this _is_ the
latest Git for Windows version, and will be for another 1.5 weeks, until
Git v2.50.0 is scheduled to appear (and shortly thereafter Git for
Windows v2.50.0). Traditionally it takes a while before the runners pick
up the new version.

We could just wait it out, six hours at a time.

Here, I opt for an alternative: Detect the buggy MSYS2 runtime and
simply skip the test case. It's not like the `receive-pack` test cases
are specific to Windows, and even then, to my chagrin the CI runs in
git-for-windows/git spend around ten hours of compute time each and
every time to run the entire test suite on all the platforms, even the
tests that cover cross-platform code, and for Windows alone we do that
three times: with GCC, with MSVC, and with MSVC via Meson. Therefore, I
deem it more than acceptable to skip this test case in one of those
matrices.

For good luck, also the preceding test case is skipped in that scenario,
as it uses the same `--receive-pack=tee <file> | git receive-pack`
pattern, even though I never observed that test case to hang in
practice.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-05 09:45:42 -07:00
Junio C Hamano
c38b74f286 Merge branch 'sj/ref-contents-check-fix'
"git verify-refs" (and hence "git fsck --reference") started
erroring out in a repository in which secondary worktrees were
prepared with Git 2.43 or lower.

* sj/ref-contents-check-fix:
  fsck: ignore missing "refs" directory for linked worktrees
2025-06-03 08:55:23 -07:00
shejialuo
d5b3c38b8a fsck: ignore missing "refs" directory for linked worktrees
"git refs verify" doesn't work if there are worktrees created on Git
v2.43.0 or older versions. These versions don't automatically create the
"refs" directory, causing the error:

    error: cannot open directory .git/worktrees/<worktree name>/refs:
    No such file or directory

Since 8f4c00de95 (builtin/worktree: create refdb via ref backend,
2024-01-08), we automatically create the "refs" directory for new
worktrees. And in 7c78d819e6 (ref: support multiple worktrees check for
refs, 2024-11-20), we assume that all linked worktrees have this
directory and would wrongly report an error to the user, thus
introducing compatibility issue.

Check for ENOENT errno before reporting directory access errors for
linked worktrees to maintain backward compatibility.

Reported-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-02 11:20:19 -07:00
Junio C Hamano
bbe8a3723b Merge branch 'jc/signed-fast-export-is-experimental'
Mark a new feature added during this cycle as experimental and fix
its default so that existing users of the fast-export command is
not broken.

* jc/signed-fast-export-is-experimental:
  fast-export: --signed-commits is experimental
2025-06-02 09:25:34 -07:00
Mark Mentovai
2d207ed1ec t: run tests from a normalized working directory
Some tests make git perform actions that produce observable pathnames,
and have expectations on those paths. Tests run with $HOME set to a
$TRASH_DIRECTORY, and with their working directory the same
$TRASH_DIRECTORY, although these paths are logically identical, they do
not observe the same pathname canonicalization rules and thus might not
be represented by strings that compare equal. In particular, no pathname
normalization is applied to $TRASH_DIRECTORY or $HOME, while tests
change their working directory with `cd -P`, which normalizes the
working directory's path by fully resolving symbolic links.

t7900's macOS maintenance tests (which are not limited to running on
macOS) have an expectation on a path that `git maintenance` forms by
using abspath.c strbuf_realpath() to resolve a canonical absolute path
based on $HOME. When t7900 runs from a working directory that contains
symbolic links in its pathname, $HOME will also contain symbolic links,
which `git maintenance` resolves but the test's expectation does not,
causing a test failure.

Align $TRASH_DIRECTORY and $HOME with the normalized path as used for
the working directory by resetting them to match the working directory
after it's established by `cd -P`. With all paths in agreement and
symbolic links resolved, pathname expectations can be set and met based
on string comparison without regard to external environmental factors
such as the presence of symbolic links in a path.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Mark Mentovai <mark@chromium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-30 14:55:03 -07:00
Junio C Hamano
5d2812ff3c Merge branch 'mm/apply-reverse-mode-of-deleted-path'
"git apply --index/--cached" when applying a deletion patch in
reverse failed to give the mode bits of the path "removed" by the
patch to the file it creates, which has been corrected.

* mm/apply-reverse-mode-of-deleted-path:
  apply: set file mode when --reverse creates a deleted file
  t4129: test that git apply warns for unexpected mode changes
2025-05-30 11:59:17 -07:00
Rodrigo Carvalho
1d9526df8d userdiff: add support for R programming language
Add userdiff patterns to support R programming language.

Also, add three userdiff tests for R programming language
files. These files define simple function and nested function,
with and without indentation.

Signed-off-by: Rodrigo Carvalho <rodrigorsdc@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-29 15:29:37 -07:00
Junio C Hamano
0b4c6baa70 fast-export: --signed-commits is experimental
As the design of signature handling is still being discussed, it is
likely that the data stream produced by the code in Git 2.50 would
have to be changed in such a way that is not backward compatible.

Mark the feature as experimental and discourge its use for now.

Also flip the default on the generation side to "strip"; users of
existing versions would not have passed --signed-commits=strip and
will be broken by this change if the default is made to abort, and
will be encouraged by the error message to produce data stream with
future breakage guarantees by passing --signed-commits option.

As we tone down the default behaviour, we no longer need the
FAST_EXPORT_SIGNED_COMMITS_NOABORT environment variable, which was
not discoverable enough.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-28 10:30:47 -07:00
Junio C Hamano
b4847a4477 Merge branch 'jt/receive-pack-skip-connectivity-check'
"git receive-pack" optionally learns not to care about connectivity
check, which can be useful when the repository arranges to ensure
connectivity by some other means.

* jt/receive-pack-skip-connectivity-check:
  builtin/receive-pack: add option to skip connectivity check
  t5410: test receive-pack connectivity check
2025-05-28 07:59:56 -07:00
Junio C Hamano
b5afd0a7ee Merge branch 'kn/passing-leak-tests'
Remove the leftover hints to the test framework to mark tests that
do not pass the leak checker tests, as they should no longer be
needed.

* kn/passing-leak-tests:
  t: remove unexpected SANITIZE_LEAK variables
2025-05-28 07:59:56 -07:00
Junio C Hamano
80f49f2ae7 Merge branch 'en/sequencer-comment-messages'
Prefix '#' to the commit title in the "rebase -i" todo file, just
like a merge commit being replayed.

* en/sequencer-comment-messages:
  sequencer: make it clearer that commit descriptions are just comments
2025-05-27 13:59:11 -07:00
Junio C Hamano
d8b48af391 Merge branch 'sj/use-mmap-to-check-packed-refs'
The code path to access the "packed-refs" file while "fsck" is
taught to mmap the file, instead of reading the whole file in the
memory.

* sj/use-mmap-to-check-packed-refs:
  packed-backend: mmap large "packed-refs" file during fsck
  packed-backend: extract snapshot allocation in `load_contents`
  packed-backend: fsck should warn when "packed-refs" file is empty
2025-05-27 13:59:10 -07:00
Junio C Hamano
6e5fb398d3 Merge branch 'ds/sparse-apply-add-p'
"git apply" and "git add -i/-p" code paths no longer unnecessarily
expand sparse-index while working.

* ds/sparse-apply-add-p:
  p2000: add performance test for patch-mode commands
  reset: integrate sparse index with --patch
  git add: make -p/-i aware of sparse index
  apply: integrate with the sparse index
2025-05-27 13:59:09 -07:00
Junio C Hamano
f545f401be Merge branch 'en/merge-tree-check'
"git merge-tree" learned an option to see if it resolves cleanly
without actually creating a result.

* en/merge-tree-check:
  merge-tree: add a new --quiet flag
  merge-ort: add a new mergeability_only option
2025-05-27 13:59:08 -07:00
Junio C Hamano
17d9dbd3c2 Merge branch 'jk/no-funny-object-types'
Support to create a loose object file with unknown object type has
been dropped.

* jk/no-funny-object-types:
  object-file: drop support for writing objects with unknown types
  hash-object: handle --literally with OPT_NEGBIT
  hash-object: merge HASH_* and INDEX_* flags
  hash-object: stop allowing unknown types
  t: add lib-loose.sh
  t/helper: add zlib test-tool
  oid_object_info(): drop type_name strbuf
  fsck: stop using object_info->type_name strbuf
  oid_object_info_convert(): stop using string for object type
  cat-file: use type enum instead of buffer for -t option
  object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag
  cat-file: make --allow-unknown-type a noop
  object-file.h: fix typo in variable declaration
2025-05-27 13:59:08 -07:00
Junio C Hamano
dcb89740a0 Merge branch 'md/userdiff-bash-shell-function'
The userdiff pattern for shell scripts has been updated to cope
with more bash-isms.

* md/userdiff-bash-shell-function:
  userdiff: extend Bash pattern to cover more shell function forms
2025-05-27 13:59:06 -07:00
Mark Mentovai
1d9a66493b apply: set file mode when --reverse creates a deleted file
Commit 01aff0a (apply: correctly reverse patch's pre- and post-image
mode bits, 2023-12-26) revised reverse_patches() to maintain the desired
property that when only one of patch::old_mode and patch::new_mode is
set, the mode will be carried in old_mode. That property is generally
correct, with one notable exception: when creating a file, only new_mode
will be set. Since reversing a deletion results in a creation, new_mode
must be set in that case.

Omitting handling for this case means that reversing a patch that
removes an executable file will not result in the executable permission
being set on the re-created file. Existing test coverage for file modes
focuses only on mode changes of existing files.

Swap old_mode and new_mode in reverse_patches() for what's represented
in the patch as a file deletion, as it is transformed into a file
creation under reversal. This causes git apply --reverse to set the
executable permission properly when re-creating a deleted executable
file.

Add tests ensuring that git apply sets file modes correctly on file
creation, both in the forward and reverse directions.

Signed-off-by: Mark Mentovai <mark@chromium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-27 06:48:07 -07:00
Mark Mentovai
2cc8c17d67 t4129: test that git apply warns for unexpected mode changes
There is no test covering what commit 01aff0a (apply: correctly reverse
patch's pre- and post-image mode bits, 2023-12-26) addressed. Prior to
that commit, git apply was erroneously unaware of a file's expected mode
while reverse-patching a file whose mode was not changing.

Add the missing test coverage to assure that git apply is aware of the
expected mode of a file being patched when the patch does not indicate
that the file's mode is changing. This is achieved by arranging a file
mode so that it doesn't agree with patch being applied, and checking git
apply's output for the warning it's supposed to raise in this situation.
Test in both reverse and normal (forward) directions.

Signed-off-by: Mark Mentovai <mark@chromium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-27 06:48:07 -07:00
Junio C Hamano
95c79efb8d Merge branch 'ds/scalar-no-maintenance'
Two "scalar" subcommands that adds a repository that hasn't been
under "scalar"'s control are taught an option not to enable the
scheduled maintenance on it.

* ds/scalar-no-maintenance:
  scalar reconfigure: improve --maintenance docs
  scalar reconfigure: add --maintenance=<mode> option
  scalar clone: add --no-maintenance option
  scalar register: add --no-maintenance option
  scalar: customize register_dir()'s behavior
2025-05-23 15:34:07 -07:00
Karthik Nayak
368d8c86f7 t: remove unexpected SANITIZE_LEAK variables
As of 1fc7ddf35b (test-lib: unconditionally enable leak checking,
2024-11-20), both the `GIT_TEST_PASSING_SANITIZE_LEAK` and
`TEST_PASSES_SANITIZE_LEAK` variables no longer have any meaning, the
leak checks are enabled by default. However, some newly added tests
include them by mistake. Let's clean this up.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-20 15:09:33 -07:00
Justin Tobler
68cb0b5253 builtin/receive-pack: add option to skip connectivity check
During git-receive-pack(1), connectivity of the object graph is
validated to ensure that the received packfile does not leave the
repository in a broken state. This is done via git-rev-list(1) and
walking the objects, which can be expensive for large repositories.

Generally, this check is critical to avoid an incomplete received
packfile from corrupting a repository. Server operators may have
additional knowledge though around exactly how Git is being used on the
server-side which can be used to facilitate more efficient connectivity
computation of incoming objects.

For example, if it can be ensured that all objects in a repository are
connected and do not depend on any missing objects, the connectivity of
newly written objects can be checked by walking the object graph
containing only the new objects from the updated tips and identifying
the missing objects which represent the boundary between the new objects
and the repository. These boundary objects can be checked in the
canonical repository to ensure the new objects connect as expected and
thus avoid walking the rest of the object graph.

Git itself cannot make the guarantees required for such an optimization
as it is possible for a repository to contain an unreachable object that
references a missing object without the repository being considered
corrupt.

Introduce the --skip-connectivity-check option for git-receive-pack(1)
which bypasses this connectivity check to give more control to the
server-side. Note that without proper server-side validation of newly
received objects handled outside of Git, usage of this option risks
corrupting a repository.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-20 11:43:36 -07:00
Justin Tobler
95262afe78 t5410: test receive-pack connectivity check
As part of git-recieve-pack(1), the connectivity of objects is checked.
Add a test validating that git-receive-pack(1) fails due to an incoming
packfile that would leave the repository with missing objects. Instead
of creating a new test file, "t5410" is generalized for receive-pack
testing.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-20 11:43:36 -07:00
Junio C Hamano
90eedabbf7 Merge branch 'ps/reftable-read-block-perffix'
Performance regression in not-yet-released code has been corrected.

* ps/reftable-read-block-perffix:
  reftable: fix perf regression when reading blocks of unwanted type
2025-05-19 16:02:48 -07:00
Junio C Hamano
a9dcacbf2a Merge branch 'jk/oidmap-cleanup'
Code cleanup.

* jk/oidmap-cleanup:
  raw_object_store: drop extra pointer to replace_map
  oidmap: add size function
  oidmap: rename oidmap_free() to oidmap_clear()
2025-05-19 16:02:47 -07:00
Junio C Hamano
9af978fa04 Merge branch 'rc/t1001-test-path-is-file'
Test update.

* rc/t1001-test-path-is-file:
  t1001: replace 'test -f' with 'test_path_is_file'
2025-05-19 16:02:47 -07:00
Junio C Hamano
4bb72548fc Merge branch 'sc/bundle-uri-use-all-refs-in-bundle'
Bundle-URI feature did not use refs recorded in the bundle other
than normal branches as anchoring points to optimize the follow-up
fetch during "git clone"; now it is told to utilize all.

* sc/bundle-uri-use-all-refs-in-bundle:
  bundle-uri: add test for bundle-uri clones with tags
  bundle-uri: copy all bundle references ino the refs/bundle space
2025-05-19 16:02:45 -07:00
Junio C Hamano
0b8d22fd40 Merge branch 'pw/sequencer-reflog-use-after-free'
Use-after-free fix in the sequencer.

* pw/sequencer-reflog-use-after-free:
  sequencer: rework reflog message handling
  sequencer: move reflog message functions
2025-05-19 16:02:44 -07:00
Elijah Newren
29d7bf1951 merge-tree: add a new --quiet flag
Git Forges may be interested in whether two branches can be merged while
not being interested in what the resulting merge tree is nor which files
conflicted.  For such cases, add a new --quiet flag which
will make use of the new mergeability_only flag added to merge-ort in
the previous commit.  This option allows the merge machinery to, in the
outer layer of the merge:
    * exit early when a conflict is detected
    * avoid writing (most) merged blobs/trees to the object store

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 15:09:14 -07:00
Elijah Newren
e42667241d sequencer: make it clearer that commit descriptions are just comments
Every once in a while, users report that editing the commit summaries
in the todo list does not get reflected in the rebase operation,
suggesting that users are (a) only using one-line commit messages, and
(b) not understanding that the commit summaries are merely helpful
comments to help them find the right hashes.

It may be difficult to correct users' poor commit messages, but we can
at least try to make it clearer that the commit summaries are not
directives of some sort by inserting a comment character.  Hopefully
that leads to them looking a little further and noticing the hints at
the bottom to use 'reword' or 'edit' directives.

Yes, this change may look funny at first since it hardcodes '#' rather
than using comment_line_str.  However:

  * comment_line_str exists to allow disambiguation between lines in
    a commit message and lines that are instructions to users editing
    the commit message.  No such disambiguation is needed for these
    comments that occur on the same line after existing directives
  * the exact "comment" character(s) on regular pick lines used aren't
    actually important; I could have used anything, including completely
    random variable length text for each line and it'd work because we
    ignore everything after 'pick' and the hash.
  * The whole point of this change is to signal to users that they
    should NOT be editing any part of the line after the hash (and if
    they do so, their edits will be ignored), while the whole point of
    comment_line_str is to allow highly flexible editing.  So making
    it more general by using comment_line_str actually feels
    counterproductive.
  * The character for merge directives absolutely must be '#'; that
    has been deeply hardcoded for a long time (see below), and will
    break if some other comment character is used instead.  In a
    desire to have pick and merge directives be similar, I use the
    same comment character for both.
  * Perhaps merge directives could be fixed to not be inflexible about
    the comment character used, if someone feels highly motivated, but
    I think that should be done in a separate follow-on patch.

Here are (some of?) the locations where '#' has already been hardcoded
for a long time for merges:

  1) In check_label_or_ref_arg():
	case TODO_LABEL:
		/*
		 * '#' is not a valid label as the merge command uses it to
		 * separate merge parents from the commit subject.
		 */

  2) In do_merge():

	/*
	 * For octopus merges, the arg starts with the list of revisions to be
	 * merged. The list is optionally followed by '#' and the oneline.
	 */
	merge_arg_len = oneline_offset = arg_len;
	for (p = arg; p - arg < arg_len; p += strspn(p, " \t\n")) {
		if (!*p)
			break;
		if (*p == '#' && (!p[1] || isspace(p[1]))) {

  3) In label_oid():

		if ((buf->len == the_hash_algo->hexsz &&
		     !get_oid_hex(label, &dummy)) ||
		    (buf->len == 1 && *label == '#') ||
		    hashmap_get_from_hash(&state->labels,
					  strihash(label), label)) {
			/*
			 * If the label already exists, or if the label is a
			 * valid full OID, or the label is a '#' (which we use
			 * as a separator between merge heads and oneline), we
			 * append a dash and a number to make it unique.
			 */

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:28:27 -07:00
Derrick Stolee
c178b02e29 pack-objects: allow --shallow and --path-walk
There does not appear to be anything particularly incompatible about the
--shallow and --path-walk options of 'git pack-objects'. If shallow
commits are to be handled differently, then it is by the revision walk
that defines the commit set and which are interesting or uninteresting.

However, before the previous change, a trivial removal of the warning
would cause a failure in t5500-fetch-pack.sh when
GIT_TEST_PACK_PATH_WALK is enabled. The shallow fetch would provide more
objects than we desired, due to some incorrect behavior of the path-walk
API, especially around walking uninteresting objects.

The recently-added tests in t5538-push-shallow.sh help to confirm this
behavior is working with the --path-walk option if
GIT_TEST_PACK_PATH_WALK is enabled. These tests passed previously due to
the --path-walk feature being disabled in the presence of a shallow
clone.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:41 -07:00
Derrick Stolee
4705889c3d path-walk: add new 'edge_aggressive' option
In preparation for allowing both the --shallow and --path-walk options
in the 'git pack-objects' builtin, create a new 'edge_aggressive' option
in the path-walk API. This option will help walk the boundary more
thoroughly and help avoid sending extra objects during fetches and
pushes.

The only use of the 'edge_hint_aggressive' option in the revision API is
within mark_edges_uninteresting(), which is usually called before
between prepare_revision_walk() and before visiting commits with
get_revision(). In prepare_revision_walk(), the UNINTERESTING commits
are walked until a boundary is found.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:40 -07:00
Derrick Stolee
206a1bb203 pack-objects: refactor path-walk delta phase
Previously, the --path-walk option to 'git pack-objects' would compute
deltas inline with the path-walk logic. This would make the progress
indicator look like it is taking a long time to enumerate objects, and
then very quickly computed deltas.

Instead of computing deltas on each region of objects organized by tree,
store a list of regions corresponding to these groups. These can later
be pulled from the list for delta compression before doing the "global"
delta search.

This presents a new progress indicator that can be used in tests to
verify that this stage is happening.

The current implementation is not integrated with threads, but we are
setting it up to arrive in the next change.

Since we do not attempt to sort objects by size until after exploring
all trees, we can remove the previous change to t5530 due to a different
error message appearing first.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:40 -07:00
Derrick Stolee
4f7f571204 pack-objects: enable --path-walk via config
Users may want to enable the --path-walk option for 'git pack-objects' by
default, especially underneath commands like 'git push' or 'git repack'.

This should be limited to client repositories, since the --path-walk option
disables bitmap walks, so would be bad to include in Git servers when
serving fetches and clones. There is potential that it may be helpful to
consider when repacking the repository, to take advantage of improved deltas
across historical versions of the same files.

Much like how "pack.useSparse" was introduced and included in
"feature.experimental" before being enabled by default, use the repository
settings infrastructure to make the new "pack.usePathWalk" config enabled by
"feature.experimental" and "feature.manyFiles".

In order to test that this config works, add a new trace2 region around
the path walk code that can be checked by a 'git push' command.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:39 -07:00
Derrick Stolee
5f711504d9 repack: add --path-walk option
Since 'git pack-objects' supports a --path-walk option, allow passing it
through in 'git repack'. This presents interesting testing opportunities for
comparing the different repacking strategies against each other.

Add the --path-walk option to the performance tests in p5313.

For the microsoft/fluentui repo [1] checked out at a specific commit [2],
the --path-walk tests in p5313 look like this:

Test                                                     this tree
-------------------------------------------------------------------------
5313.18: thin pack with --path-walk                      0.08(0.06+0.02)
5313.19: thin pack size with --path-walk                           18.4K
5313.20: big pack with --path-walk                       2.10(7.80+0.26)
5313.21: big pack size with --path-walk                            19.8M
5313.22: shallow fetch pack with --path-walk             1.62(3.38+0.17)
5313.23: shallow pack size with --path-walk                        33.6M
5313.24: repack with --path-walk                         81.29(96.08+0.71)
5313.25: repack size with --path-walk                             142.5M

[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08

Along with the earlier tests in p5313, I'll instead reformat the
comparison as follows:

Repack Method    Pack Size       Time
---------------------------------------
Hash v1             439.4M      87.24s
Hash v2             161.7M      21.51s
Path Walk           142.5M      81.29s

There are a few things to notice here:

 1. The benefits of --name-hash-version=2 over --name-hash-version=1 are
    significant, but --path-walk still compresses better than that
    option.

 2. The --path-walk command is still using --name-hash-version=1 for the
    second pass of delta computation, using the increased name hash
    collisions as a potential method for opportunistic compression on
    top of the path-focused compression.

 3. The --path-walk algorithm is currently sequential and does not use
    multiple threads for delta compression. Threading will be
    implemented in a future change so the computation time will improve
    to better compete in this metric.

There are small benefits in size for my copy of the Git repository:

Repack Method    Pack Size       Time
---------------------------------------
Hash v1             248.8M      30.44s
Hash v2             249.0M      30.15s
Path Walk           213.2M     142.50s

As well as in the nodejs/node repository [3]:

Repack Method    Pack Size       Time
---------------------------------------
Hash v1             739.9M      71.18s
Hash v2             764.6M      67.82s
Path Walk           698.1M     208.10s

[3] https://github.com/nodejs/node

This benefit also repeats in my copy of the Linux kernel repository:

Repack Method    Pack Size       Time
---------------------------------------
Hash v1               2.5G     554.41s
Hash v2               2.5G     549.62s
Path Walk             2.2G    1562.36s

It is important to see that even when the repository shape does not have
many name-hash collisions, there is a slight space boost to be found
using this method.

As this repacking strategy was released in Git for Windows 2.47.0, some
users have reported cases where the --path-walk compression is slightly
worse than the --name-hash-version=2 option. In those cases, it may be
beneficial to combine the two options. However, there has not been a
released version of Git that has both options and I don't have access to
these repos for testing.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:39 -07:00
Derrick Stolee
6e95bf80b5 t5538: add tests to confirm deltas in shallow pushes
It can be notoriously difficult to detect if delta bases are being
computed properly during 'git push'. Construct an example where it will
make a kilobyte worth of difference when a delta base is not found. We
can then use the progress indicators to distinguish between bytes and
KiB depending on whether the delta base is found and used.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:39 -07:00
Derrick Stolee
861d4bc292 pack-objects: introduce GIT_TEST_PACK_PATH_WALK
There are many tests that validate whether 'git pack-objects' works as
expected. Instead of duplicating these tests, add a new test environment
variable, GIT_TEST_PACK_PATH_WALK, that implies --path-walk by default
when specified.

This was useful in testing the implementation of the --path-walk
implementation, helping to find tests that are overly specific to the
default object walk. These include:

 - t0411-clone-from-partial.sh : One test fetches from a repo that does
   not have the boundary objects. This causes the path-based walk to
   fail. Disable the variable for this test.

 - t5306-pack-nobase.sh : Similar to t0411, one test fetches from a repo
   without a boundary object.

 - t5310-pack-bitmaps.sh : One test compares the case when packing with
   bitmaps to the case when packing without them. Since we disable the
   test variable when writing bitmaps, this causes a difference in the
   object list (the --path-walk option adds an extra object). Specify
   --no-path-walk in both processes for the comparison. Another test
   checks for a specific delta base, but when computing dynamically
   without using bitmaps, the base object it too small to be considered
   in the delta calculations so no base is used.

 - t5316-pack-delta-depth.sh : This script cares about certain delta
   choices and their chain lengths. The --path-walk option changes how
   these chains are selected, and thus changes the results of this test.

 - t5322-pack-objects-sparse.sh : This demonstrates the effectiveness of
   the --sparse option and how it combines with --path-walk.

 - t5332-multi-pack-reuse.sh : This test verifies that the preferred
   pack is used for delta reuse when possible. The --path-walk option is
   not currently aware of the preferred pack at all, so finds a
   different delta base.

 - t7406-submodule-update.sh : When using the variable, the --depth
   option collides with the --path-walk feature, resulting in a warning
   message. Disable the variable so this warning does not appear.

I want to call out one specific test change that is only temporary:

 - t5530-upload-pack-error.sh : One test cares specifically about an
   "unable to read" error message. Since the current implementation
   performs delta calculations within the path-walk API callback, a
   different "unable to get size" error message appears. When this
   is changed in a future refactoring, this test change can be reverted.

Similar to GIT_TEST_NAME_HASH_VERSION, we do not add this option to the
linux-TEST-vars CI build as that's already an overloaded build.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:39 -07:00
Derrick Stolee
3ce9e5f293 p5313: add performance tests for --path-walk
The previous change added a --path-walk option to 'git pack-objects'.
Create a performance test that demonstrates the time and space benefits
of the feature.

In order to get an appropriate comparison, we need to avoid reusing
deltas and recompute them from scratch.

Compare the creation of a thin pack representing a small push and the
creation of a relatively large non-thin pack.

Running on my copy of the Git repository results in this data (removing
the repack tests for --name-hash-version):

Test                                                     this tree
------------------------------------------------------------------------
5313.2: thin pack with --name-hash-version=1             0.02(0.01+0.01)
5313.3: thin pack size with --name-hash-version=1                   1.6K
5313.4: big pack with --name-hash-version=1              2.55(4.20+0.26)
5313.5: big pack size with --name-hash-version=1                   16.4M
5313.6: shallow fetch pack with --name-hash-version=1    1.24(2.03+0.08)
5313.7: shallow pack size with --name-hash-version=1               12.2M
5313.10: thin pack with --name-hash-version=2            0.03(0.01+0.01)
5313.11: thin pack size with --name-hash-version=2                  1.6K
5313.12: big pack with --name-hash-version=2             1.91(3.23+0.20)
5313.13: big pack size with --name-hash-version=2                  16.4M
5313.14: shallow fetch pack with --name-hash-version=2   1.06(1.57+0.10)
5313.15: shallow pack size with --name-hash-version=2              12.5M
5313.18: thin pack with --path-walk                      0.03(0.01+0.01)
5313.19: thin pack size with --path-walk                            1.6K
5313.20: big pack with --path-walk                       2.05(3.24+0.27)
5313.21: big pack size with --path-walk                            16.3M
5313.22: shallow fetch pack with --path-walk             1.08(1.66+0.07)
5313.23: shallow pack size with --path-walk                        12.4M

This can be reformatted as follows:

Pack Type            Hash v1   Hash v2     Path Walk
---------------------------------------------------
thin pack    (time)    0.02s      0.03s      0.03s
             (size)    1.6K       1.6K       1.6K
big pack     (time)    2.55s      1.91s      2.05s
             (size)   16.4M      16.4M      16.3M
shallow pack (time)    1.24s      1.06s      1.08s
             (size)   12.2M      12.5M      12.4M

Note that the timing is slower because there is no threading in the
--path-walk case (yet). Also, the shallow pack cases are really not
using the --path-walk logic right now because it is disabled until some
additions are made to the path walk API.

The cases where the --path-walk option really shines is when the default
name-hash is overwhelmed with unhelpful collisions. An open source
example can be found in the microsoft/fluentui repo [1] at a certain
commit [2].

[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08

Running the tests on this repo results in the following comparison table:

Pack Type            Hash v1    Hash v2    Path Walk
---------------------------------------------------
thin pack    (time)    0.36s      0.12s      0.08s
             (size)    1.2M      22.0K      18.4K
big pack     (time)    2.00s      2.90s      2.21s
             (size)   20.4M      25.9M      19.5M
shallow pack (time)    1.41s      1.80s      1.65s
             (size)   34.4M      33.7M      33.6M

Notice in particular that in the small thin pack, the time performance
has improved from 0.36s for --name-hash-version=1 to 0.08s and this is
likely due to the improved size of the resulting pack: 18.4K instead of
1.2M.  The relatively new --name-hash-version=2 is competitive with
--path-walk (0.12s and 22.0K) but not quite as successful.

Finally, running this on a copy of the Linux kernel repository results
in these data points:

Pack Type            Hash v1    Hash v2    Path Walk
---------------------------------------------------
thin pack    (time)    0.03s      0.13s      0.03s
             (size)    4.6K       4.6K       4.6K
big pack     (time)   15.29s     12.32s     13.92s
             (size)  201.1M     159.1M     158.5M
shallow pack (time)   10.88s     22.93s     22.74s
             (size)  269.2M     273.8M     267.7M

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:38 -07:00
Derrick Stolee
9fcfe12ac4 pack-objects: update usage to match docs
The t0450 test script verifies that builtin usage matches the synopsis
in the documentation. Adjust the builtin to match and then remove 'git
pack-objects' from the exception list.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:38 -07:00
Derrick Stolee
70664d2865 pack-objects: add --path-walk option
In order to more easily compute delta bases among objects that appear at
the exact same path, add a --path-walk option to 'git pack-objects'.

This option will use the path-walk API instead of the object walk given
by the revision machinery. Since objects will be provided in batches
representing a common path, those objects can be tested for delta bases
immediately instead of waiting for a sort of the full object list by
name-hash. This has multiple benefits, including avoiding collisions by
name-hash.

The objects marked as UNINTERESTING are included in these batches, so we
are guaranteeing some locality to find good delta bases.

After the individual passes are done on a per-path basis, the default
name-hash is used to find other opportunistic delta bases that did not
match exactly by the full path name.

The current implementation performs delta calculations while walking
objects, which is not ideal for a few reasons. First, this will cause
the "Enumerating objects" phase to be much longer than usual. Second, it
does not take advantage of threading during the path-scoped delta
calculations. Even with this lack of threading, the path-walk option is
sometimes faster than the usual approach. Future changes will refactor
this code to allow for threading, but that complexity is deferred until
later to keep this patch as simple as possible.

This new walk is incompatible with some features and is ignored by
others:

 * Object filters are not currently integrated with the path-walk API,
   such as sparse-checkout or tree depth. A blobless packfile could be
   integrated easily, but that is deferred for later.

 * Server-focused features such as delta islands, shallow packs, and
   using a bitmap index are incompatible with the path-walk API.

 * The path walk API is only compatible with the --revs option, not
   taking object lists or pack lists over stdin. These alternative ways
   to specify the objects currently ignores the --path-walk option
   without even a warning.

Future changes will create performance tests that demonstrate the power
of this approach.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:38 -07:00
Derrick Stolee
ecf9ba20e3 p2000: add performance test for patch-mode commands
The previous three changes contributed performance improvements to 'git
apply', 'git add -p', and 'git reset -p' when using a sparse index. The
improvement to 'git apply' also improved 'git checkout -p'. Add
performance tests to demonstrate this (and to help validate that
performance remains good in the future).

In the truncated test output below, we see that the full checkout
performance changes within noise expectations, but the sparse index
cases improve 33% and then 96% for 'git add -p' and 41% and then 95% for
'git reset -p'. 'git checkout -p' improves immediatley by 91% because it
does not need any change to its builtin.

  Test                                    HEAD~4  HEAD~3       HEAD~2       HEAD~1
  -------------------------------------------------------------------------------------
  2000.118: ... git add -p (full-v3)        0.79  0.79  +0.0%  0.82  +3.8%  0.82  +3.8%
  2000.119: ... git add -p (full-v4)        0.74  0.76  +2.7%  0.74  +0.0%  0.76  +2.7%
  2000.120: ... git add -p (sparse-v3)      1.94  1.28 -34.0%  0.07 -96.4%  0.07 -96.4%
  2000.121: ... git add -p (sparse-v4)      1.93  1.28 -33.7%  0.06 -96.9%  0.06 -96.9%
  2000.122: ... git checkout -p (full-v3)   1.18  1.18  +0.0%  1.18  +0.0%  1.19  +0.8%
  2000.123: ... git checkout -p (full-v4)   1.10  1.12  +1.8%  1.11  +0.9%  1.11  +0.9%
  2000.124: ... git checkout -p (sparse-v3) 1.31  0.11 -91.6%  0.11 -91.6%  0.11 -91.6%
  2000.125: ... git checkout -p (sparse-v4) 1.29  0.11 -91.5%  0.11 -91.5%  0.11 -91.5%
  2000.126: ... git reset -p (full-v3)      0.81  0.80  -1.2%  0.83  +2.5%  0.83  +2.5%
  2000.127: ... git reset -p (full-v4)      0.78  0.77  -1.3%  0.77  -1.3%  0.78  +0.0%
  2000.128: ... git reset -p (sparse-v3)    1.58  0.92 -41.8%  0.91 -42.4%  0.07 -95.6%
  2000.129: ... git reset -p (sparse-v4)    1.58  0.92 -41.8%  0.92 -41.8%  0.07 -95.6%

It is worth noting that if our test was more involved and had multiple
hunks to evaluate, then the time spent in 'git apply' would dominate due
to multiple index loads and writes. As it stands, we need the sparse
index improvement in 'git add -p' itself to confirm this performance
improvement.

Since the change for 'git add -i' is identical, we avoid a second test
case for that similar operation.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:02:47 -07:00
Derrick Stolee
efab7dc1f4 reset: integrate sparse index with --patch
Similar to the previous change for 'git add -p', the reset builtin
checked for integration with the sparse index after possibly redirecting
its logic toward the interactive logic. This means that the builtin
would expand the sparse index to a full one upon read.

Move this check earlier within cmd_reset() to improve performance here.

Add tests to guarantee that we are not universally expanding the index.
Add behavior tests to check that we are doing the same operations as a
full index.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:02:47 -07:00
Derrick Stolee
02ed8555f6 git add: make -p/-i aware of sparse index
It is slow to expand a sparse index in-memory due to parsing of trees.
We aim to minimize that performance cost when possible. 'git add -p'
uses 'git apply' child processes to modify the index, but still there
are some expansions that occur.

It turns out that control flows out of cmd_add() in the interactive
cases before the lines that confirm that the builtin is integrated with
the sparse index.

Moving that integration point earlier in cmd_add() allows 'git add -i'
and 'git add -p' to operate without expanding a sparse index to a full
one.

Add test cases that confirm that these interactive add options work with
the sparse index.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:01:51 -07:00
Derrick Stolee
952de281fe apply: integrate with the sparse index
The sparse index allows storing directory entries in the index, marked
with the skip-wortkree bit and pointing to a tree object. This may be an
unexpected data shape for some implementation areas, so we are rolling
it out incrementally on a builtin-per-builtin basis.

This change enables the sparse index for 'git apply'. The main
motivation for this change is that 'git apply' is used as a child
process of 'git add -p' and expanding the sparse index for each of those
child processes can lead to significant performance issues.

The good news is that the actual index manipulation code used by 'git
apply' is already integrated with the sparse index, so the only product
change is to mark the builtin as allowing the sparse index so it isn't
inflated on read.

The more involved part of this change is around adding tests that verify
how 'git apply' behaves in a sparse-checkout environment and whether or
not the index expands in certain operations.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:00:33 -07:00
Moumita Dhar
ea8a71b40d userdiff: extend Bash pattern to cover more shell function forms
The previous function regex required explicit matching of function
bodies using `{`, `(`, `((`, or `[[`, which caused several issues:

- It failed to capture valid functions where `{` was on the next line
  due to line continuation (`\`).
- It did not recognize functions with single  command body, such as
  `x () echo hello`.

Replacing the function body matching logic with `.*$`, ensures
that everything on the function definition line is captured.

Additionally, the word regex is refined to better recognize shell
syntax, including additional parameter expansion operators and
command-line options.

Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 11:52:41 -07:00
Jeff King
65a6a79b42 hash-object: stop allowing unknown types
When passed the "--literally" option, hash-object will allow any
arbitrary string for its "-t" type option. Such objects are only useful
for testing or debugging, as they cannot be used in the normal way
(e.g., you cannot fetch their contents!).

Let's drop this feature, which will eventually let us simplify the
object-writing code. This is technically backwards incompatible, but
since such objects were never really functional, it seems unlikely that
anybody will notice.

We will retain the --literally flag, as it also instructs hash-object
not to worry about other format issues (e.g., type-specific things that
fsck would complain about). The documentation does not need to be
updated, as it was always vague about which checks we're loosening (it
uses only the phrase "any garbage").

The code change is a bit hard to verify from just the patch text. We can
drop our local hash_literally() helper, but it was really just wrapping
write_object_file_literally(). We now replace that with calling
index_fd(), as we do for the non-literal code path, but dropping the
INDEX_FORMAT_CHECK flag. This ends up being the same semantically as
what the _literally() code path was doing (modulo handling unknown
types, which is our goal).

We'll be able to clean up these code paths a bit more in subsequent
patches.

The existing test is flipped to show that we now reject the unknown
type. The additional "extra-long type" test is now redundant, as we bail
early upon seeing a bogus type.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:11 -07:00
Jeff King
b5643b60ac t: add lib-loose.sh
This commit adds a shell library for writing raw loose objects into the
object database. Normally this is done with hash-object, but the
specific intent here is to allow broken objects that hash-object may not
support.

We'll convert several cases that use "hash-object --literally" to write
objects with invalid types. That works currently, but dropping this
dependency will allow us to remove that feature and simplify the
object-writing code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:11 -07:00