Commit Graph

75368 Commits

Author SHA1 Message Date
Junio C Hamano
5193aee2a3 Merge branch 'ne/doc-filter-blob-limit-fix' into maint-2.43
Docfix.

* ne/doc-filter-blob-limit-fix:
  rev-list-options: fix off-by-one in '--filter=blob:limit=<n>' explainer
2024-02-13 14:44:49 -08:00
Junio C Hamano
7687ca5a90 Merge branch 'cp/git-flush-is-an-env-bool' into maint-2.43
Recent conversion to allow more than 0/1 in GIT_FLUSH broke the
mechanism by flipping what yes/no means by mistake, which has been
corrected.

* cp/git-flush-is-an-env-bool:
  write-or-die: fix the polarity of GIT_FLUSH environment variable
2024-02-13 14:44:49 -08:00
Junio C Hamano
bd10c45672 Merge branch 'ps/report-failure-from-git-stash' into maint-2.43
"git stash" sometimes was silent even when it failed due to
unwritable index file, which has been corrected.

* ps/report-failure-from-git-stash:
  builtin/stash: report failure to write to index
2024-02-13 14:44:49 -08:00
Junio C Hamano
07fa383615 Merge branch 'jc/sign-buffer-failure-propagation-fix' into maint-2.43
A failed "git tag -s" did not necessarily result in an error
depending on the crypto backend, which has been corrected.

* jc/sign-buffer-failure-propagation-fix:
  ssh signing: signal an error with a negative return value
  tag: fix sign_buffer() call to create a signed tag
2024-02-13 14:44:48 -08:00
Junio C Hamano
a1cd814f1f Merge branch 'jc/comment-style-fixes' into maint-2.43
Rewrite //-comments to /* comments */ in files whose comments
prevalently use the latter.

* jc/comment-style-fixes:
  reftable/pq_test: comment style fix
  merge-ort.c: comment style fix
  builtin/worktree: comment style fixes
2024-02-13 14:44:48 -08:00
Junio C Hamano
5071cb78a3 Merge branch 'jk/diff-external-with-no-index' into maint-2.43
"git diff --no-index file1 file2" segfaulted while invoking the
external diff driver, which has been corrected.

* jk/diff-external-with-no-index:
  diff: handle NULL meta-info when spawning external diff
2024-02-13 14:44:48 -08:00
Junio C Hamano
d982de5d32 Merge branch 'rs/parse-options-with-keep-unknown-abbrev-fix' into maint-2.43
"git diff --no-rename A B" did not disable rename detection but did
not trigger an error from the command line parser.

* rs/parse-options-with-keep-unknown-abbrev-fix:
  parse-options: simplify positivation handling
  parse-options: fully disable option abbreviation with PARSE_OPT_KEEP_UNKNOWN
2024-02-13 14:44:48 -08:00
Junio C Hamano
904ca69428 Merge branch 'en/diffcore-delta-final-line-fix' into maint-2.43
Rename detection logic ignored the final line of a file if it is an
incomplete line.

* en/diffcore-delta-final-line-fix:
  diffcore-delta: avoid ignoring final 'line' of file
2024-02-13 14:44:48 -08:00
Junio C Hamano
908fde12b0 Merge branch 'tc/show-ref-exists-fix' into maint-2.43
Update to a new feature recently added, "git show-ref --exists".

* tc/show-ref-exists-fix:
  builtin/show-ref: treat directory as non-existing in --exists
2024-02-13 14:44:47 -08:00
Junio C Hamano
4cde9f0726 A few more fixes before -rc1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-13 14:31:12 -08:00
Junio C Hamano
4abab9e51a Merge branch 'cp/git-flush-is-an-env-bool'
Recent conversion to allow more than 0/1 in GIT_FLUSH broke the
mechanism by flipping what yes/no means by mistake, which has been
corrected.

* cp/git-flush-is-an-env-bool:
  write-or-die: fix the polarity of GIT_FLUSH environment variable
2024-02-13 14:31:12 -08:00
Junio C Hamano
9115864cb5 Merge branch 'jc/unit-tests-make-relative-fix'
The mechanism to report the filename in the source code, used by
the unit-test machinery, assumed that the compiler expanded __FILE__
to the path to the source given to the $(CC), but some compilers
give full path, breaking the output.  This has been corrected.

* jc/unit-tests-make-relative-fix:
  unit-tests: do show relative file paths on non-Windows, too
2024-02-13 14:31:11 -08:00
Junio C Hamano
c2914d4677 Merge branch 'js/github-actions-update'
Update remaining GitHub Actions jobs to avoid warnings against
using deprecated version of Node.js.

* js/github-actions-update:
  ci(linux32): add a note about Actions that must not be updated
  ci: bump remaining outdated Actions versions
2024-02-13 14:31:11 -08:00
Junio C Hamano
133a7b08dc Merge branch 'jc/github-actions-update'
Squelch node.js 16 deprecation warnings from GitHub Actions CI
by updating actions/github-script and actions/checkout that use
node.js 20.

* jc/github-actions-update:
  GitHub Actions: update to github-script@v7
  GitHub Actions: update to checkout@v4
2024-02-13 14:31:11 -08:00
Ghanshyam Thakkar
7abc1869e5 add -p tests: remove PERL prerequisites
The Perl version of the add -i/-p commands has been removed since
20b813d (add: remove "add.interactive.useBuiltin" & Perl "git
add--interactive", 2023-02-07)

Therefore, Perl prerequisite in the test scripts which use the patch
mode functionality is not necessary.

Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-13 14:12:53 -08:00
Ghanshyam Thakkar
5a8ed3fe45 add-patch: classify '@' as a synonym for 'HEAD'
Currently, (restore, checkout, reset) commands correctly take '@' as a
synonym for 'HEAD'. However, in patch mode different prompts/messages
are given on command line due to patch mode machinery not considering
'@' to be a synonym for 'HEAD' due to literal string comparison with
the word 'HEAD', and therefore assigning patch_mode_($command)_nothead
and triggering reverse mode (-R in diff-index). The NEEDSWORK comment
suggested comparing commit objects to get around this. However, doing
so would also take a non-checked out branch pointing to the same commit
as HEAD, as HEAD. This would cause confusion to the user.

Therefore, after parsing '@', replace it with 'HEAD' as reasonably
early as possible. This also solves another problem of disparity
between 'git checkout HEAD' and 'git checkout @' (latter detaches at
the HEAD commit and the former does not).
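
The early replacement could be sketched roughly as follows. The function name `normalize_rev()` is hypothetical and is not the helper used in Git's sources; it only illustrates replacing '@' with 'HEAD' before any literal string comparison happens.

```c
#include <string.h>

/*
 * Hypothetical normalizer: map the shortcut "@" to "HEAD" right after
 * argument parsing, so that later literal comparisons only see "HEAD".
 * Any other revision string (including "@~1") is returned unchanged.
 */
static const char *normalize_rev(const char *rev)
{
	if (rev && !strcmp(rev, "@"))
		return "HEAD";
	return rev;
}
```

Because the substitution happens once and early, the rest of the patch-mode machinery can keep comparing against the word 'HEAD' without special cases.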

Trade-offs:
- Some of the errors would show the revision argument as 'HEAD' when
  given '@'. This should be fine, as most users who probably use '@'
  would be aware that it is a shortcut for 'HEAD' and most probably
  used to use 'HEAD'. There is also relevant documentation in
  'gitrevisions' manpage about '@' being the shortcut for 'HEAD'. Also,
  the simplicity of the solution far outweighs this cost.

- Consider '@' as a shortcut for 'HEAD' even if 'refs/heads/@' exists
  at a different commit. Naming a branch '@' is an obvious foot-gun and
  many existing commands already take '@' for 'HEAD' even if
  'refs/heads/@' exists at a different commit or does not exist at all
  (e.g. 'git log @', 'git push origin @' etc.). Therefore this is an
  existing assumption and should not be a problem.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-13 14:12:51 -08:00
Junio C Hamano
c784b0a5b9 git: --no-lazy-fetch option
Sometimes, especially during tests of low level machinery, it is
handy to have a way to disable lazy fetching of objects.  This
allows us to say, for example, "git cat-file -e <object-name>", to
see if the object is locally available.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-13 12:53:15 -08:00
Junio C Hamano
b40ba17e44 write-or-die: fix the polarity of GIT_FLUSH environment variable
When GIT_FLUSH is set to 1, true, on, yes, then we should disable
skip_stdout_flush, but the conversion somehow did the opposite.

With the understanding of the original motivation behind "skip" in
06f59e9f (Don't fflush(stdout) when it's not helpful, 2007-06-29),
we can sympathize with the current naming (we wanted to avoid
useless flushing of stdout by default, with an escape hatch to
always flush), but it is still not a good excuse.

Retire the "skip_stdout_flush" variable and replace it with "flush_stdout"
that tells if we do or do not want to run fflush().
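
A minimal sketch of the positive naming, under stated assumptions: `parse_bool_env()` and `want_flush_stdout()` are illustrative names rather than the functions in write-or-die.c, and the fallback used when the variable is unset (do not flush when stdout is a regular file) is assumed from the motivation described above.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper: returns 1 for "1", "true", "on", "yes",
 * 0 for "0", "false", "off", "no", and -1 when the variable is
 * unset or holds an unrecognized value.
 */
static int parse_bool_env(const char *name)
{
	const char *v = getenv(name);

	if (!v)
		return -1;
	if (!strcmp(v, "1") || !strcasecmp(v, "true") ||
	    !strcasecmp(v, "on") || !strcasecmp(v, "yes"))
		return 1;
	if (!strcmp(v, "0") || !strcasecmp(v, "false") ||
	    !strcasecmp(v, "off") || !strcasecmp(v, "no"))
		return 0;
	return -1;
}

/*
 * Positive polarity: a true GIT_FLUSH means "do flush".
 * stdout_is_regular_file drives the assumed default when the
 * variable gives no answer.
 */
static int want_flush_stdout(int stdout_is_regular_file)
{
	int v = parse_bool_env("GIT_FLUSH");

	if (v >= 0)
		return v;
	return !stdout_is_regular_file;
}
```

With a variable named for what it enables, "yes" maps directly to flushing and there is no negation to get backwards.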

Reported-by: Xiaoguang WANG <wxiaoguang@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-13 11:57:28 -08:00
Kristoffer Haugsbakk
76fb807faa column: guard against negative padding
Make sure that client code can’t pass in a negative padding by accident.

Suggested-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-13 10:18:57 -08:00
Kristoffer Haugsbakk
f2d31c69ce column: disallow negative padding
A negative padding does not make sense and can cause errors in the
memory allocator since it’s interpreted as an unsigned integer.
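
As an illustration of the failure mode, a negative `int` converted to `size_t` becomes a huge value, so an allocation sized from it cannot succeed. The sketch below uses a hypothetical `checked_padding()` helper (not the check added to the column code) to show a guard that rejects the value before any conversion.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical validator: reject a negative padding before it can be
 * converted to an unsigned size. Returns 0 on success and stores the
 * converted value in *out; returns -1 on error and leaves *out alone.
 */
static int checked_padding(int padding, size_t *out)
{
	if (padding < 0) {
		fprintf(stderr, "error: padding must be non-negative\n");
		return -1;
	}
	*out = (size_t)padding;
	return 0;
}
```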

Reported-by: Tiago Pascoal <tiago@pascoal.net>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-13 10:18:50 -08:00
Junio C Hamano
2996f11c1d Sync with 'maint'
2024-02-12 13:17:06 -08:00
Junio C Hamano
ad1a669545 A few more topics before -rc1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 13:16:12 -08:00
Junio C Hamano
3b89ff16aa Merge branch 'tb/multi-pack-reuse-experiment'
Setting `feature.experimental` opts the user into the multi-pack reuse
experiment.

* tb/multi-pack-reuse-experiment:
  pack-objects: enable multi-pack reuse via `feature.experimental`
  t5332-multi-pack-reuse.sh: extract pack-objects helper functions
2024-02-12 13:16:11 -08:00
Junio C Hamano
d4833b22ab Merge branch 'vd/for-each-ref-sort-with-formatted-timestamp'
"git branch" and friends learned to use the formatted text as
sorting key, not the underlying timestamp value, when the --sort
option is used with author or committer timestamp with a format
specifier (e.g., "--sort=creatordate:format:%H:%M:%S").

* vd/for-each-ref-sort-with-formatted-timestamp:
  ref-filter.c: sort formatted dates by byte value
2024-02-12 13:16:11 -08:00
Junio C Hamano
b3370dd51e Merge branch 'pw/show-ref-pseudorefs'
"git show-ref --verify" did not show things like "CHERRY_PICK_HEAD",
which has been corrected.

* pw/show-ref-pseudorefs:
  t1400: use show-ref to check pseudorefs
  show-ref --verify: accept pseudorefs
2024-02-12 13:16:11 -08:00
Junio C Hamano
70550a2242 Merge branch 'ps/report-failure-from-git-stash'
"git stash" sometimes was silent even when it failed due to
unwritable index file, which has been corrected.

* ps/report-failure-from-git-stash:
  builtin/stash: report failure to write to index
2024-02-12 13:16:11 -08:00
Junio C Hamano
32c5ab6ee4 Merge branch 'pb/template-for-single-commit-pr'
Doc update.

* pb/template-for-single-commit-pr:
  .github/PULL_REQUEST_TEMPLATE.md: add a note about single-commit PRs
2024-02-12 13:16:11 -08:00
Junio C Hamano
05c5a6db80 Merge branch 'jc/sign-buffer-failure-propagation-fix'
A failed "git tag -s" did not necessarily result in an error
depending on the crypto backend, which has been corrected.

* jc/sign-buffer-failure-propagation-fix:
  ssh signing: signal an error with a negative return value
  tag: fix sign_buffer() call to create a signed tag
2024-02-12 13:16:11 -08:00
Junio C Hamano
13fdf82e09 Merge branch 'jc/bisect-doc'
Doc update.

* jc/bisect-doc:
  bisect: document command line arguments for "bisect start"
  bisect: document "terms" subcommand more fully
2024-02-12 13:16:10 -08:00
Junio C Hamano
46761378c3 Merge branch 'bk/complete-bisect'
Command line completion support (in contrib/) has been
updated for "git bisect".

* bk/complete-bisect:
  completion: bisect: recognize but do not complete view subcommand
  completion: bisect: complete log opts for visualize subcommand
  completion: new function __git_complete_log_opts
  completion: bisect: complete missing --first-parent and --no-checkout options
  completion: bisect: complete custom terms and related options
  completion: bisect: complete bad, new, old, and help subcommands
  completion: tests: always use 'master' for default initial branch name
2024-02-12 13:16:10 -08:00
Junio C Hamano
f424d7c33d Merge branch 'ps/reftable-styles'
Code clean-up in various reftable code paths.

* ps/reftable-styles:
  reftable/record: improve semantics when initializing records
  reftable/merged: refactor initialization of iterators
  reftable/merged: refactor seeking of records
  reftable/stack: use `size_t` to track stack length
  reftable/stack: use `size_t` to track stack slices during compaction
  reftable/stack: index segments with `size_t`
  reftable/stack: fix parameter validation when compacting range
  reftable: introduce macros to allocate arrays
  reftable: introduce macros to grow arrays
2024-02-12 13:16:10 -08:00
Junio C Hamano
cf4a3bd8f1 Merge branch 'ps/reftable-multi-level-indices-fix'
Writing multi-level indices for reftable has been corrected.

* ps/reftable-multi-level-indices-fix:
  reftable: document reading and writing indices
  reftable/writer: fix writing multi-level indices
  reftable/writer: simplify writing index records
  reftable/writer: use correct type to iterate through index entries
  reftable/reader: be more careful about errors in indexed seeks
2024-02-12 13:16:10 -08:00
Junio C Hamano
c684b582bc Merge branch 'ps/reftable-backend' into kn/for-all-refs
* ps/reftable-backend:
  refs/reftable: fix leak when copying reflog fails
  ci: add jobs to test with the reftable backend
  refs: introduce reftable backend
2024-02-12 10:09:19 -08:00
Junio C Hamano
7adf215fed Merge branch 'pb/imap-send-wo-curl-build-fix' into maint-2.43
* pb/imap-send-wo-curl-build-fix:
  imap-send: add missing "strbuf.h" include under NO_CURL
2024-02-12 09:57:59 -08:00
Philippe Blain
6e32f718ff completion: add and use __git_compute_second_level_config_vars_for_section
In a previous commit we removed some hardcoded config variable names from
function __git_complete_config_variable_name in the completion script by
introducing a new function,
__git_compute_first_level_config_vars_for_section.

The remaining hardcoded config variables are "second level"
configuration variables, meaning 'branch.<name>.upstream',
'remote.<name>.url', etc. where <name> is a user-defined name.

Making use of the existing --config flag to 'git help', add a new
function, __git_compute_second_level_config_vars_for_section. This
function takes as argument a config section name and computes the
corresponding second-level config variables, i.e. those that contain a
'<' which indicates the start of a placeholder. Note that as in
__git_compute_first_level_config_vars_for_section added previously, we
use indirect expansion instead of associative arrays to stay compatible
with Bash 3 on which macOS is stuck for licensing reasons.

As explained in the previous commit, we use the existing pattern in the
completion script of using global variables to cache the list of
variables for each section.

Use this new function and the variables it defines in
__git_complete_config_variable_name to remove hardcoded config
variables, and add a test to verify the new function.  Use a single
'case' for all sections with second-level variables names, since the
code for each of them is now exactly the same.

Adjust the name of a test added in a previous commit to reflect that it
now tests the added function.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:43:42 -08:00
Philippe Blain
1e0ee4087e completion: add and use __git_compute_first_level_config_vars_for_section
The function __git_complete_config_variable_name in the Bash completion
script hardcodes several config variable names. These variables are
those in config sections where user-defined names can appear, such as
"branch.<name>". These sections are treated first by the case statement,
and the two last "catch all" cases are used for other sections, making
use of the __git_compute_config_vars and __git_compute_config_sections
functions, which omit listing any variables containing wildcards or
placeholders. Having hardcoded config variables introduces the risk of
the completion code becoming out of sync with the actual config
variables accepted by Git.

To avoid these hardcoded config variables, introduce a new function,
__git_compute_first_level_config_vars_for_section, making use of the
existing __git_config_vars variable. This function takes as argument a
config section name and computes the matching "first level" config
variables for that section, i.e. those _not_ containing any placeholder,
like 'branch.autoSetupMerge', 'remote.pushDefault', etc.  Use this
function and the variables it defines in the 'branch.*', 'remote.*' and
'submodule.*' switches of the case statement instead of hardcoding the
corresponding config variables.  Note that we use indirect expansion to
create a variable for each section, instead of using a single
associative array indexed by section names, because associative arrays
are not supported in Bash 3, on which macOS is stuck for licensing
reasons.

Use the existing pattern in the completion script of using global
variables to cache the list of config variables for each section. The
rationale for such caching is explained in eaa4e6ee2a (Speed up bash
completion loading, 2009-11-17), and the current approach to using and
defining them via 'test -n' is explained in cf0ff02a38 (completion: work
around zsh option propagation bug, 2012-02-02).

Adjust the name of one of the tests added in the previous commit,
reflecting that it now also tests the new function.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:43:42 -08:00
Philippe Blain
b1d0cc68d1 completion: complete 'submodule.*' config variables
In the Bash completion script, function
__git_complete_config_variable_name completes config variables and has
special logic to deal with config variables involving user-defined
names, like branch.<name>.* and remote.<name>.*.

This special logic is missing for submodule-related config variables.
Add the appropriate branches to the case statement, making use of the
in-tree '.gitmodules' to list relevant submodules.

Add corresponding tests in t9902-completion.sh, making sure we complete
both first level submodule config variables as well as second level
variables involving submodule names.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:43:42 -08:00
Philippe Blain
30bd55f901 completion: add space after config variable names also in Bash 3
In be6444d1ca (completion: bash: add correct suffix in variables,
2021-08-16), __git_complete_config_variable_name was changed to use
"${sfx- }" instead of "$sfx" as the fourth argument of _gitcomp_nl and
_gitcomp_nl_append, such that this argument evaluates to a space if sfx
is unset. This was to ensure that e.g.

	git config branch.autoSetupMe[TAB]

correctly completes to 'branch.autoSetupMerge ' with the trailing space.
This commit notes that the fix only works in Bash 4 because in Bash 3
the 'local sfx' construct at the beginning of
__git_complete_config_variable_name creates an empty string.

Make the fix also work for Bash 3 by using the "unset or null" parameter
expansion syntax ("${sfx:- }"), such that the parameter is also expanded
to a space if it is set but null, as is the behaviour of 'local sfx' in
Bash 3.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:43:41 -08:00
René Scharfe
f0e578c69c use xstrncmpz()
Add and apply a semantic patch for calling xstrncmpz() to compare a
NUL-terminated string with a buffer of a known length instead of using
strncmp() and checking the terminating NUL explicitly.  This simplifies
callers by reducing code duplication.
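
The comparison being factored out can be sketched as below. The name `cmp_str_with_buf()` is hypothetical and stands in for the real helper; the sketch assumes the buffer contains no NUL within its first `len` bytes.

```c
#include <string.h>

/*
 * Compare the NUL-terminated string `s` with the first `len` bytes of
 * `buf`. Returns 0 only if `s` is exactly `len` bytes long and every
 * byte matches; otherwise returns a nonzero value.
 */
static int cmp_str_with_buf(const char *s, const char *buf, size_t len)
{
	int res = strncmp(s, buf, len);

	if (res)
		return res;
	/* The first len bytes match; s must also end right here. */
	return s[len] == '\0' ? 0 : 1;
}
```

Without such a helper, each caller has to repeat both the `strncmp()` call and the explicit check of the terminating NUL.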

I had to adjust remote.c manually because Coccinelle inexplicably
changed the indent of the else branches.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:32:41 -08:00
René Scharfe
020456cb74 receive-pack: use find_commit_header() in check_nonce()
Use the public function find_commit_header() and remove find_header(),
as it becomes unused.  This is safe and appropriate because we pass the
NUL-terminated payload buffer to check_nonce() instead of its start and
length.  The underlying strbuf push_cert cannot contain NULs, as it is
built using strbuf_addstr(), only.

We no longer need to call strlen(), as find_commit_header() returns the
length of nonce already.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:22:20 -08:00
Patrick Steinhardt
c68ca7abd3 reftable/reader: add comments to table_iter_next()
While working on the optimizations in the preceding patches I stumbled
upon `table_iter_next()` multiple times. It is quite easy to miss the
fact that we don't call `table_iter_next_in_block()` twice, but that the
second call is in fact `table_iter_next_block()`.

Add comments to explain what exactly is going on here to make things
more obvious. While at it, touch up the code to conform to our code
style better.

Note that one of the refactorings merges two conditional blocks into
one. Before, we had the following code:

```
err = table_iter_next_block(&next, ti);
if (err != 0) {
	ti->is_finished = 1;
}
table_iter_block_done(ti);
if (err != 0) {
	return err;
}
```

As `table_iter_block_done()` does not care about `is_finished`, the
conditional blocks can be merged into one block:

```
err = table_iter_next_block(&next, ti);
table_iter_block_done(ti);
if (err != 0) {
	ti->is_finished = 1;
	return err;
}
```

This is both easier to reason about and more performant because we have
one branch less.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:19:27 -08:00
Patrick Steinhardt
a418a7abef reftable/record: don't try to reallocate ref record name
When decoding reftable ref records we first release the pointer to the
record passed to us and then use realloc(3P) to allocate the refname
array. This is a bit misleading though as we know at that point that the
refname will always be `NULL`, so we would always end up allocating a
new char array anyway.

Refactor the code to use `REFTABLE_ALLOC_ARRAY()` instead. As the
following benchmark demonstrates this is a tiny bit more efficient. But
the bigger selling point really is the gained clarity.

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     150.1 ms ±   4.1 ms    [User: 146.6 ms, System: 3.3 ms]
    Range (min … max):   144.5 ms … 180.5 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     148.9 ms ±   4.5 ms    [User: 145.2 ms, System: 3.4 ms]
    Range (min … max):   143.0 ms … 185.4 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.01 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Ideally, we should try and reuse the memory of the old record instead of
first freeing and then immediately reallocating it. This requires some
more surgery though and is thus left for a future iteration.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:18:05 -08:00
Patrick Steinhardt
92fa3253c8 reftable/block: swap buffers instead of copying
When iterating towards the next record in a reftable block we need to
keep track of the key that the last record had. This is required because
reftable records use prefix compression, where subsequent records may
reuse parts of their preceding record's key.

This key is stored in the `block_iter::last_key`, which we update after
every call to `block_iter_next()`: we simply reset the buffer and then
add the current key to it.

This is a bit inefficient though because it requires us to copy over the
key on every iteration, which adds up when iterating over many records.
Instead, we can make use of the fact that the `block_iter::key` buffer
is basically only a scratch buffer. So instead of copying over contents,
we can just swap both buffers.
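
The idea can be sketched with a simplified buffer type. `struct buf` and `buf_swap()` are stand-ins invented for this illustration, not the reftable code itself.

```c
#include <stddef.h>

/* Simplified stand-in for a growable byte buffer. */
struct buf {
	char *data;
	size_t len;
	size_t alloc;
};

/*
 * Exchange the contents of two buffers in constant time. Only the
 * pointer and the two sizes are exchanged; no key bytes are copied.
 */
static void buf_swap(struct buf *a, struct buf *b)
{
	struct buf tmp = *a;

	*a = *b;
	*b = tmp;
}
```

After the swap, the buffer that held the current key becomes the last key, and the old last-key storage is free to be overwritten as scratch space on the next iteration.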

The following benchmark prints a single ref matching a specific pattern
out of 1 million refs via git-show-ref(1):

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     155.7 ms ±   5.0 ms    [User: 152.1 ms, System: 3.4 ms]
    Range (min … max):   150.8 ms … 185.7 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     150.8 ms ±   4.2 ms    [User: 147.1 ms, System: 3.5 ms]
    Range (min … max):   145.1 ms … 180.7 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.03 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:18:04 -08:00
Patrick Steinhardt
dbe4e8b3fd reftable/pq: allocation-less comparison of entry keys
The priority queue is used by the merged iterator to iterate over
reftable records from multiple tables in the correct order. The queue
ends up having one record for each table that is being iterated over,
with the record that is supposed to be shown next at the top. For
example, the key of a ref record is equal to its name so that we end up
sorting the priority queue lexicographically by ref name.

To figure out the order we need to compare the reftable record keys with
each other. This comparison is done by formatting them into a `struct
strbuf` and then doing `strbuf_strcmp()` on the result. We then discard
the buffers immediately after the comparison.

This ends up being very expensive. Because the priority queue usually
contains as many records as we have tables, we call the comparison
function `O(log($tablecount))` many times for every record we insert.
Furthermore, when iterating over many refs, we will insert at least one
record for every ref we are iterating over. So ultimately, this ends up
being called `O($refcount * log($tablecount))` many times.

Refactor the code to use the new `reftable_record_cmp()` function that
has been implemented in a preceding commit. This function does not need
to allocate memory and is thus significantly more efficient.

The following benchmark prints a single ref matching a specific pattern
out of 1 million refs via git-show-ref(1), where the reftable stack
consists of three tables:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     224.4 ms ±   6.5 ms    [User: 220.6 ms, System: 3.6 ms]
    Range (min … max):   216.5 ms … 261.1 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     172.9 ms ±   4.4 ms    [User: 169.2 ms, System: 3.6 ms]
    Range (min … max):   166.5 ms … 204.6 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.30 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:18:04 -08:00
Patrick Steinhardt
5730a9dccf reftable/merged: skip comparison for records of the same subiter
When retrieving the next entry of a merged iterator we need to drop all
records of other sub-iterators that would be shadowed by the record that
we are about to return. We do this by comparing record keys, dropping
all keys that are smaller or equal to the key of the record we are about
to return.

There is an edge case here where we can skip that comparison: when the
record in the priority queue comes from the same subiterator as the
record we are about to return then we know that its key must be larger
than the key of the record we are about to return. This property is
guaranteed by the sub-iterators, and if it didn't hold then the whole
merged iterator would return records in the wrong order, too.

While this may seem like a very specific edge case it's in fact quite
likely to happen. For most repositories out there you can assume that we
will end up with one large table and several smaller ones on top of it.
Thus, it is very likely that the next entry will sort towards the top of
the priority queue.

Special case this and break out of the loop in that case. The following
benchmark uses git-show-ref(1) to print a single ref matching a pattern
out of 1 million refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     162.6 ms ±   4.5 ms    [User: 159.0 ms, System: 3.5 ms]
    Range (min … max):   156.6 ms … 188.5 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     156.8 ms ±   4.7 ms    [User: 153.0 ms, System: 3.6 ms]
    Range (min … max):   151.4 ms … 188.4 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.04 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:18:04 -08:00
Patrick Steinhardt
a96e9a20f3 reftable/merged: allocation-less dropping of shadowed records
The purpose of the merged reftable iterator is to iterate through all
entries of a set of tables in the correct order. This is implemented by
using a sub-iterator for each table, where the next entry of each of
these iterators gets put into a priority queue. For each iteration, we
do roughly the following steps:

  1. Retrieve the top record of the priority queue. This is the entry we
     want to return to the caller.

  2. Retrieve the next record of the sub-iterator that this record came
     from. If any, add it to the priority queue at the correct position.
     The position is determined by comparing the record keys, which e.g.
     corresponds to the refname for ref records.

  3. Keep removing the top record of the priority queue until we hit the
     first entry whose key is larger than the returned record's key.
     This is required to drop "shadowed" records.

The last step will lead to at least one comparison to the next entry,
but may lead to many comparisons in case the reftable stack consists of
many tables with shadowed records. It is thus part of the hot code path
when iterating through records.
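
The three steps above can be sketched in a toy form. Everything here is an assumption made for illustration: `struct subiter` and `merged_next()` are hypothetical names, each table is a sorted array of unique names, tables later in the array are treated as newer, and a linear scan stands in for the priority queue.

```c
#include <stddef.h>
#include <string.h>

/* One sorted table of ref names plus a cursor: a toy sub-iterator. */
struct subiter {
	const char **names;
	size_t nr;
	size_t pos;
};

/*
 * Return the next name in merged order, or NULL when all tables are
 * exhausted. Tables later in `subs` are assumed to be newer and to
 * shadow earlier ones.
 */
static const char *merged_next(struct subiter *subs, size_t nr_subs)
{
	const char *best = NULL;
	size_t i;

	/* Step 1: pick the smallest head; on ties prefer the newer table. */
	for (i = 0; i < nr_subs; i++) {
		const char *head;

		if (subs[i].pos == subs[i].nr)
			continue;
		head = subs[i].names[subs[i].pos];
		if (!best || strcmp(head, best) <= 0)
			best = head;
	}
	if (!best)
		return NULL;

	/* Steps 2 and 3: advance past the winner and every shadowed head. */
	for (i = 0; i < nr_subs; i++) {
		if (subs[i].pos < subs[i].nr &&
		    !strcmp(subs[i].names[subs[i].pos], best))
			subs[i].pos++;
	}
	return best;
}
```

Even in this toy version, every returned record costs at least one key comparison per table head, which is why the cost of a single comparison matters so much.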

The code to compare the entries with each other is quite inefficient
though. Instead of comparing record keys with each other directly, we
first format them into `struct strbuf`s and only then compare them with
each other. While we already optimized this code path to reuse buffers
in 829231dc20 (reftable/merged: reuse buffer to compute record keys,
2023-12-11), the cost to format the keys into the buffers still adds up
quite significantly.

Refactor the code to use `reftable_record_cmp()` instead, which has been
introduced in the preceding commit. This function compares records with
each other directly without requiring any memory allocations or copying
and is thus way more efficient.

The following benchmark uses git-show-ref(1) to print a single ref
matching a pattern out of 1 million refs. This is the most direct way to
exercise ref iteration speed as we remove all overhead of having to show
the refs, too.

    Benchmark 1: show-ref: single matching ref (revision = HEAD~)
      Time (mean ± σ):     180.7 ms ±   4.7 ms    [User: 177.1 ms, System: 3.4 ms]
      Range (min … max):   174.9 ms … 211.7 ms    1000 runs

    Benchmark 2: show-ref: single matching ref (revision = HEAD)
      Time (mean ± σ):     162.1 ms ±   4.4 ms    [User: 158.5 ms, System: 3.4 ms]
      Range (min … max):   155.4 ms … 189.3 ms    1000 runs

    Summary
      show-ref: single matching ref (revision = HEAD) ran
        1.11 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:18:04 -08:00
Patrick Steinhardt
adb5d2cbe9 reftable/record: introduce function to compare records by key
In some places we need to sort reftable records by their keys to
determine their ordering. This is done by first formatting the keys into
a `struct strbuf` and then using `strbuf_cmp()` to compare them. This
logic is needlessly roundabout and can end up costing quite a bit of CPU
cycles, both due to the allocation and formatting logic.

Introduce a new `reftable_record_cmp()` function that knows how to
compare two records with each other without requiring allocations.
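
For ref records, where the key is the refname, a direct comparison might look like the sketch below. `struct ref_record` and `ref_record_cmp()` are simplified, hypothetical stand-ins and not the actual reftable types or signatures.

```c
#include <string.h>

/* Simplified stand-in for a ref record; the key is the refname. */
struct ref_record {
	const char *refname;
	unsigned long update_index;
};

/*
 * Compare two records by key directly. Nothing is allocated and no
 * key is formatted into a temporary buffer first.
 */
static int ref_record_cmp(const struct ref_record *a,
			  const struct ref_record *b)
{
	return strcmp(a->refname, b->refname);
}
```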

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:18:04 -08:00
Johannes Schindelin
20e0ff8835 ci(linux32): add a note about Actions that must not be updated
The Docker container used by the `linux32` job comes without Node.js,
and therefore the `actions/checkout` and `actions/upload-artifact`
Actions cannot be upgraded to the latest versions (because they use
Node.js).

One time too many, I accidentally tried to update them, where
`actions/checkout` at least fails immediately, but the
`actions/upload-artifact` step is only used when any test fails, and
therefore the CI run usually passes even though that Action was updated
to a version that is incompatible with the Docker container in which
this job runs.

So let's add a big fat warning, mainly for my own benefit, to avoid
running into the very same issue over and over again.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 08:48:22 -08:00
Johannes Schindelin
820a340085 ci: bump remaining outdated Actions versions
After activating automatic Dependabot updates in the
git-for-windows/git repository, Dependabot noticed a couple
of yet-unaddressed updates.  They avoid "Node.js 16 Actions"
deprecation messages by bumping the following Actions'
versions:

- actions/upload-artifact from 3 to 4
- actions/download-artifact from 3 to 4
- actions/cache from 3 to 4

Helped-by: Matthias Aßhauer <mha1993@live.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 08:47:38 -08:00
Junio C Hamano
f66286364f unit-tests: do show relative file paths on non-Windows, too
There are compilers other than Visual C that want to show absolute
paths.  Generalize the helper introduced by a2c5e294 (unit-tests: do
show relative file paths, 2023-09-25) so that it can also work with
a path that uses slash as the directory separator, and becomes
almost no-op once one-time preparation finds out that we are using a
compiler that already gives relative paths.  Incidentally, this also
should do the right thing on Windows with a compiler that shows
relative paths but with backslash as the directory separator (if
such a thing exists and is used to build git).

Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 08:44:22 -08:00