Commit Graph

22764 Commits

Author SHA1 Message Date
Junio C Hamano
e8a0c243f9 Merge branch 'ps/reftable-exclude' into ps/reftable-alloc-failures
* ps/reftable-exclude:
  refs/reftable: wire up support for exclude patterns
  reftable/reader: make table iterator reseekable
  t/unit-tests: introduce reftable library
  Makefile: stop listing test library objects twice
  builtin/receive-pack: fix exclude patterns when announcing refs
  refs: properly apply exclude patterns to namespaced refs
2024-09-16 14:06:31 -07:00
Junio C Hamano
d29fc595c8 Merge branch 'cp/unit-test-reftable-stack' into ps/reftable-alloc-failures
* cp/unit-test-reftable-stack:
  t-reftable-stack: add test for stack iterators
  t-reftable-stack: add test for non-default compaction factor
  t-reftable-stack: use reftable_ref_record_equal() to compare ref records
  t-reftable-stack: use Git's tempfile API instead of mkstemp()
  t: harmonize t-reftable-stack.c with coding guidelines
  t: move reftable/stack_test.c to the unit testing framework
2024-09-16 14:06:06 -07:00
Patrick Steinhardt
1869525066 refs/reftable: wire up support for exclude patterns
Exclude patterns can be used by reference backends to skip over blocks
of references that are uninteresting to the caller. Reference backends
do not have to wire up support for them, and all callers are expected to
behave as if the backend didn't support them. In fact, the only backend
that supports exclude patterns right now is the "packed" backend.

Exclude patterns can be quite an important performance optimization in
repositories that have loads of references. The patterns are set up in
case "transfer.hideRefs" and friends are configured during a fetch, so
handling these patterns becomes important once there are lots of hidden
refs in a served repository.

Now that we have properly re-seekable reftable iterators we can also
wire up support for these patterns in the "reftable" backend. Doing so
is conceptually simple: once we hit a reference whose prefix matches the
current exclude pattern we re-seek the iterator to the first reference
that doesn't match the pattern anymore. This schema only works for
trivial patterns that do not have any globbing characters in them, but
this restriction also applies do the "packed" backend.

This makes t1419 work with the "reftable" backend with some slight
modifications. Of course it also speeds up listing of references with
hidden refs. The following benchmark prints one reference with 1 million
hidden references:

    Benchmark 1: HEAD~
      Time (mean ± σ):      93.3 ms ±   2.1 ms    [User: 90.3 ms, System: 2.5 ms]
      Range (min … max):    89.8 ms …  97.2 ms    33 runs

    Benchmark 2: HEAD
      Time (mean ± σ):       4.2 ms ±   0.6 ms    [User: 2.2 ms, System: 1.8 ms]
      Range (min … max):     3.1 ms …   8.1 ms    765 runs

    Summary
      HEAD ran
       22.15 ± 3.19 times faster than HEAD~

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-16 13:57:19 -07:00
Patrick Steinhardt
0a148a8eda reftable/reader: make table iterator reseekable
In 67ce50ba26 (Merge branch 'ps/reftable-reusable-iterator', 2024-05-30)
we have refactored the interface of reftable iterators such that they
can be reused in theory. This patch series only landed the required
changes on the interface level, but didn't yet implement the actual
logic to make iterators reusable.

As it turns out almost all of the infrastructure already does support
re-seeking. The only exception is the table iterator, which does not
reset its `is_finished` bit. Do so and add a couple of tests that verify
that we can re-seek iterators.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-16 13:57:19 -07:00
Patrick Steinhardt
a4f50bb1e9 t/unit-tests: introduce reftable library
We have recently migrated all of the reftable unit tests that were part
of the reftable library into our own unit testing framework. As part of
that migration we have duplicated some of the functionality that was
part of the reftable test framework into each of the migrated test
suites. This was a sensible decision to not have all of the migrations
dependent on each other, but now that the migration is done it makes
sense to deduplicate the functionality again.

Introduce a new reftable test library that hosts some shared code and
adapt tests to use it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-16 13:57:18 -07:00
Patrick Steinhardt
d8faf50c36 builtin/receive-pack: fix exclude patterns when announcing refs
In `write_head_info()` we announce references to the remote client. We
need to honor "transfer.hideRefs" here so that we do not announce any
references that the client shouldn't be able to learn about. This is
done via two separate mechanisms:

  - We hand over exclude patterns to the reference backend. We can only
    honor "plain" exclude patterns here that do not have prefixes with
    special meaning such as "^" or "!". Filtering down the references is
    handled by `hidden_refs_to_excludes()`.

  - In `show_ref_cb()` we perform a second check against hidden refs.
    For one this is done such that we can handle those special prefixes.
    And second, handling exclude patterns in ref backends is optional,
    so we also have to handle "normal" patterns.

The special-meaning "^" prefix alters whether a hidden ref applies to
the namespace-stripped reference name or the full name. So while we
would usually call `refs_for_each_namespaced_ref()` to only get those
references in the current namespace, we can't because we'd get the
already-rewritten reference names. Instead, we are forced to use
`refs_for_each_fullref_in()` and then manually strip away the namespace
prefix such that we have access to both names.

But this also means that we do not get namespace handling for exclude
patterns, which `refs_for_each_namespaced_ref()` brings for free. This
results in a bug because we potentially end up hiding away references
based on their namespaced name and not on the stripped name as we really
should be doing.

Fix this by manually rewriting the exclude patterns to their namespaced
variants.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-16 13:57:18 -07:00
Patrick Steinhardt
155dc8447d refs: properly apply exclude patterns to namespaced refs
Reference namespaces allow commands like git-upload-pack(1) to serve
different sets of references to the client depending on which namespace
is enabled, which is for example useful in fork networks. Namespaced
refs are stored with a `refs/namespaces/$namespace` prefix, but all the
user will ultimately see is a stripped version where that prefix is
removed.

The way that this interacts with "transfer.hideRefs" is not immediately
obvious: the hidden refs can either apply to the stripped references, or
to the non-stripped ones that still have the namespace prefix. In fact,
the "transfer.hideRefs" machinery does the former and applies to the
stripped reference by default, but rules can have "^" prefixed to switch
this behaviour to instead match against the full reference name.

Namespaces are exclusively handled at the generic "refs" layer, the
respective backends have no clue that such a thing even exists. This
also has the consequence that they cannot handle hiding references as
soon as reference namespaces come into play because they neither know
whether a namespace is active, nor do they know how to strip references
if they are active.

Handling such exclude patterns in `refs_for_each_namespaced_ref()` and
`refs_for_each_fullref_in_prefixes()` is broken though, as both support
that the user passes both namespaces and exclude patterns. In the case
where both are set we will exclude references with unstripped names,
even though we really wanted to exclude references based on their
stripped names.

This only surfaces when:

  - A repository uses reference namespaces.

  - "transfer.hideRefs" is active.

  - The namespaced references are packed into the "packed-refs" file.

None of our tests exercise this scenario, and thus we haven't ever hit
it. While t5509 exercises both (1) and (2), it does not happen to hit
(3). It is trivial to demonstrate the bug though by explicitly packing
refs in the tests, and then we indeed surface the breakage.

Fix this bug by prefixing exclude patterns with the namespace in the
generic layer. The newly introduced function will be used outside of
"refs.c" in the next patch, so we add a declaration to "refs.h".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-16 13:57:18 -07:00
Junio C Hamano
77cf81e988 Merge branch 'bl/trailers-and-incomplete-last-line-fix'
The interpret-trailers command failed to recognise the end of the
message when the commit log ends in an incomplete line.

* bl/trailers-and-incomplete-last-line-fix:
  interpret-trailers: handle message without trailing newline
2024-09-13 15:27:45 -07:00
Junio C Hamano
41390eb3e6 Merge branch 'rs/diff-exit-code-fix'
In a few corner cases "git diff --exit-code" failed to report
"changes" (e.g., renamed without any content change), which has
been corrected.

* rs/diff-exit-code-fix:
  diff: report dirty submodules as changes in builtin_diff()
  diff: report copies and renames as changes in run_diff_cmd()
2024-09-13 15:27:43 -07:00
Junio C Hamano
0299251319 Merge branch 'ds/scalar-no-tags'
The "scalar clone" command learned the "--no-tags" option.

* ds/scalar-no-tags:
  scalar: add --no-tags option to 'scalar clone'
2024-09-13 15:27:42 -07:00
Junio C Hamano
f286e0a01c Merge branch 'kl/cat-file-on-sparse-index'
"git cat-file" works well with the sparse-index, and gets marked as
such.

* kl/cat-file-on-sparse-index:
  builtin/cat-file: mark 'git cat-file' sparse-index compatible
  t1092: allow run_on_* functions to use standard input
2024-09-12 11:47:24 -07:00
Junio C Hamano
b64f249726 Merge branch 'jk/messages-with-excess-lf-fix'
One-line messages to "die" and other helper functions will get LF
added by these helper functions, but many existing messages had an
unnecessary LF at the end, which have been corrected.

* jk/messages-with-excess-lf-fix:
  drop trailing newline from warning/error/die messages
2024-09-12 11:47:23 -07:00
Junio C Hamano
143682ec43 Merge branch 'ps/pack-refs-auto-heuristics'
"git pack-refs --auto" for the files backend was too aggressive,
which has been a bit tamed.

* ps/pack-refs-auto-heuristics:
  refs/files: use heuristic to decide whether to repack with `--auto`
  t0601: merge tests for auto-packing of refs
  wrapper: introduce `log2u()`
2024-09-12 11:47:23 -07:00
Junio C Hamano
3bf057a0cd Merge branch 'tb/multi-pack-reuse-fix'
A data corruption bug when multi-pack-index is used and the same
objects are stored in multiple packfiles has been corrected.

* tb/multi-pack-reuse-fix:
  builtin/pack-objects.c: do not open-code `MAX_PACK_OBJECT_HEADER`
  pack-bitmap.c: avoid repeated `pack_pos_to_offset()` during reuse
  builtin/pack-objects.c: translate bit positions during pack-reuse
  pack-bitmap: tag bitmapped packs with their corresponding MIDX
  t/t5332-multi-pack-reuse.sh: verify pack generation with --strict
2024-09-12 11:47:23 -07:00
Junio C Hamano
04595eb407 Merge branch 'gt/unit-test-oid-array'
Another unit-test.

* gt/unit-test-oid-array:
  t: port helper/test-oid-array.c to unit-tests/t-oid-array.c
2024-09-12 11:47:23 -07:00
Junio C Hamano
63b5fcdde9 Merge branch 'ps/index-pack-outside-repo-fix'
"git verify-pack" and "git index-pack" started dying outside a
repository, which has been corrected.

* ps/index-pack-outside-repo-fix:
  builtin/index-pack: fix segfaults when running outside of a repo
2024-09-12 11:47:22 -07:00
Junio C Hamano
d70600526e Merge branch 'cp/unit-test-reftable-stack' into ps/reftable-exclude
* cp/unit-test-reftable-stack:
  t-reftable-stack: add test for stack iterators
  t-reftable-stack: add test for non-default compaction factor
  t-reftable-stack: use reftable_ref_record_equal() to compare ref records
  t-reftable-stack: use Git's tempfile API instead of mkstemp()
  t: harmonize t-reftable-stack.c with coding guidelines
  t: move reftable/stack_test.c to the unit testing framework
2024-09-09 10:13:44 -07:00
Chandra Pratap
2b14ced370 t-reftable-stack: add test for stack iterators
reftable_stack_init_ref_iterator and reftable_stack_init_log_iterator
as defined by reftable/stack.{c,h} initialize a stack iterator to
iterate over the ref and log records in a reftable stack respectively.
Since these functions are not exercised by any of the existing tests,
add a test for them.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-09 10:12:56 -07:00
Chandra Pratap
e87952443a t-reftable-stack: add test for non-default compaction factor
In a recent codebase update (commit ae8e378430, merge branch
'ps/reftable-write-options', 2024/05/13) the geometric factor used
in auto-compaction of reftable tables was made configurable. Add
a test to verify the functionality introduced by this update.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-09 10:12:56 -07:00
Chandra Pratap
1052280136 t-reftable-stack: use reftable_ref_record_equal() to compare ref records
In the current stack tests, ref records are compared for equality
by sometimes using the dedicated function for ref-record comparison,
reftable_ref_record_equal(), and sometimes by explicity comparing
contents of the ref records.

The latter method is undesired because there can exist unequal ref
records with some of the contents being equal. Replace the latter
instances of ref-record comparison with the former. This has the
added benefit of preserving uniformity throughout the test file.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-09 10:12:15 -07:00
Chandra Pratap
476abc39ba t-reftable-stack: use Git's tempfile API instead of mkstemp()
Git's tempfile API defined by $GIT_DIR/tempfile.{c,h} provides
a unified interface for tempfile operations. Since reftable/stack.c
uses this API for all its tempfile needs instead of raw functions
like mkstemp(), make the ported stack test strictly use Git's
tempfile API as well.

A bigger benefit is the fact that we know to clean up the tempfile
in case the test fails because it gets registered and pruned via a
signal handler.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-08 13:24:03 -07:00
Chandra Pratap
e4e384f68d t: harmonize t-reftable-stack.c with coding guidelines
Harmonize the newly ported test unit-tests/t-reftable-stack.c
with the following guidelines:
- Single line 'for' statements must omit curly braces.
- Structs must be 0-initialized with '= { 0 }' instead of '= { NULL }'.
- Array sizes and indices should preferably be of type 'size_t' and
  not 'int'.
- Function pointers should be passed as 'func' and not '&func'.

While at it, remove initialization for those variables that are
re-used multiple times, like loop variables.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-08 13:24:03 -07:00
Chandra Pratap
15e29ea1c6 t: move reftable/stack_test.c to the unit testing framework
reftable/stack_test.c exercises the functions defined in
reftable/stack.{c, h}. Migrate reftable/stack_test.c to the
unit testing framework. Migration involves refactoring the tests
to use the unit testing framework instead of reftable's test
framework and renaming the tests to be in-line with unit-tests'
standards.

Since some of the tests use set_test_hash() defined by
reftable/test_framework.{c, h} but these files are not
'#included' in the test file, copy this function in the
ported test file.

With the migration of stack test to the unit-tests framework,
"test-tool reftable" becomes a no-op. Hence, get rid of everything
that uses "test-tool reftable" alongside everything that is used
to implement it.

While at it, alphabetically sort the cmds[] list in
helper/test-tool.c by moving the entry for "dump-reftable".

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-08 13:24:03 -07:00
René Scharfe
11591850dd diff: report dirty submodules as changes in builtin_diff()
The diff machinery has two ways to detect changes to set the exit code:
Just comparing hashes and comparing blob contents.  The latter is needed
if certain changes have to be ignored, e.g. with --ignore-space-change
or --ignore-matching-lines.  It's enabled by the diff_options flag
diff_from_contents.

The slower mode as never considered submodules (and subrepos) as changes
with --submodule=diff or --submodule=log, which is inconsistent with
--submodule=short (the default).  Fix it.

d7b97b7185 (diff: let external diffs report that changes are
uninteresting, 2024-06-09) set diff_from_contents if external diff
programs are allowed.  This is the default e.g. for git diff, and so
that change exposed the inconsistency much more widely.

Reported-by: David Hull <david.hull@friendbuy.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-08 13:21:24 -07:00
René Scharfe
87cf96094a diff: report copies and renames as changes in run_diff_cmd()
The diff machinery has two ways to detect changes to set the exit code:
Just comparing hashes and comparing blob contents.  The latter is needed
if certain changes have to be ignored, e.g. with --ignore-space-change
or --ignore-matching-lines.  It's enabled by the diff_options flag
diff_from_contents.

The slower mode has never considered copies and renames to be changes,
which is inconsistent with the quicker one.  Fix it.  Even if we ignore
the file contents (because it's empty or contains only ignored lines),
there's still the meta data change of adding or changing a filename, so
we need to report it in the exit code.

d7b97b7185 (diff: let external diffs report that changes are
uninteresting, 2024-06-09) set diff_from_contents if external diff
programs are allowed.  This is the default e.g. for git diff, and so
that change exposed the inconsistency much more widely.

Reported-by: Jorge Luis Martinez Gomez <jol@jol.dev>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-08 13:21:23 -07:00
Derrick Stolee
ce31b82ca9 scalar: add --no-tags option to 'scalar clone'
Some large repositories use tags to track a huge list of release
versions. While this choice is costly on the ref advertisement, it is
further wasteful for clients who do not need those tags. Allow clients
to optionally skip the tag advertisement.

This behavior is similar to that of 'git clone --no-tags' implemented in
0dab2468ee (clone: add a --no-tags option to clone without tags,
2017-04-26), including the modification of the remote.origin.tagOpt
config value to include "--no-tags".

One thing that is opposite of the 'git clone' implementation is that
this allows '--tags' as an assumed option, which can be naturally negated
with '--no-tags'. The clone command does not accept '--tags' but allows
"--no-no-tags" as the negation of its '--no-tags' option.

While testing this option, combine the test with the previously untested
'--no-src' option introduced in 4527db8ff8 (scalar: add --[no-]src
option, 2023-08-28).

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-06 14:13:48 -07:00
Junio C Hamano
5ecd5fa58b Merge branch 'jk/unused-parameters'
Make our codebase compilable with the -Werror=unused-parameter
option.

* jk/unused-parameters:
  CodingGuidelines: mention -Wunused-parameter and UNUSED
  config.mak.dev: enable -Wunused-parameter by default
  compat: mark unused parameters in win32/mingw functions
  compat: disable -Wunused-parameter in win32/headless.c
  compat: disable -Wunused-parameter in 3rd-party code
  t-reftable-readwrite: mark unused parameter in callback function
  gc: mark unused config parameter in virtual functions
2024-09-06 10:38:50 -07:00
Junio C Hamano
6dcb2db0fa Merge branch 'jk/send-email-mailmap'
"git send-email" learned "--mailmap" option to allow rewriting the
recipient addresses.

* jk/send-email-mailmap:
  send-email: add mailmap support via sendemail.mailmap and --mailmap
  check-mailmap: add options for additional mailmap sources
  check-mailmap: accept "user@host" contacts
2024-09-06 10:38:49 -07:00
Brian Lyles
c02414a997 interpret-trailers: handle message without trailing newline
When git-interpret-trailers is used to add a trailer to a message that
does not end in a trailing newline, the new trailer is added on the line
immediately following the message instead of as a trailer block
separated from the message by a blank line.

For example, if a message's text was exactly "The subject" with no
trailing newline present, `git interpret-trailers --trailer
my-trailer=true` will result in the following malformed commit message:

    The subject
    my-trailer: true

While it is generally expected that a commit message should end with a
newline character, git-interpret-trailers should not be returning an
invalid message in this case.

Use `strbuf_complete_line` to ensure that the message ends with a
newline character when reading the input.

Signed-off-by: Brian Lyles <brianmlyles@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-06 09:21:44 -07:00
Jeff King
1a60f2066a drop trailing newline from warning/error/die messages
Our error reporting routines append a trailing newline, and the strings
we pass to them should not include them (otherwise we get an extra blank
line after the message).

These cases were all found by looking at the results of:

  git grep -P '[^_](error|error_errno|warning|die|die_errno)\(.*\\n"[,)]' '*.c'

Note that we _do_ sometimes include a newline in the middle of such
messages, to create multiline output (hence our grep matching "," or ")"
after we see the newline, so we know we're at the end of the string).

It's possible that one or more of these cases could intentionally be
including a blank line at the end, but having looked at them all
manually, I think these are all just mistakes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05 09:07:12 -07:00
Kevin Lyles
e65b0c7c36 builtin/cat-file: mark 'git cat-file' sparse-index compatible
This change affects how 'git cat-file' works with the index when
specifying an object with the ":<path>" syntax (which will give file
contents from the index).

'git cat-file' expands a sparse index to a full index any time contents
are requested from the index by specifying an object with the ":<path>"
syntax. This is true even when the requested file is part of the sparse
index, and results in much slower 'git cat-file' operations when working
within the sparse index.

Mark 'git cat-file' as not needing a full index, so that you only pay
the cost of expanding the sparse index to a full index when you request
a file outside of the sparse index.

Add tests to ensure both that:
- 'git cat-file' returns the correct file contents whether or not the
  file is in the sparse index
- 'git cat-file' expands to the full index any time you request
  something outside of the sparse index

Signed-off-by: Kevin Lyles <klyles+github@epic.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-04 09:19:04 -07:00
Kevin Lyles
68c57590d3 t1092: allow run_on_* functions to use standard input
The 'run_on_sparse' and 'run_on_all' functions do not work correctly for
commands accepting standard input, because they run the same command
multiple times and the first instance consumes it. This also indirectly
affects 'test_all_match' and 'test_sparse_match'.

To allow these functions to work with commands accepting standard input,
first slurp standard input to a temporary file, and then run the command
with its standard input redirected from the temporary file. This ensures
that each command sees the same contents from its standard input.

Note that this does not impact commands that do not read from standard
input; they continue to ignore it. Additionally, existing uses of the
run_on_* functions do not need to do anything differently, as the
standard input of the test environment is already connected to
/dev/null.

We do not explicitly clean up the input files because they are cleaned
up with the rest of the test repositories and their contents may be
useful for figuring out which command failed when a test case fails.

Signed-off-by: Kevin Lyles <klyles@epic.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-04 09:19:04 -07:00
Patrick Steinhardt
c3459ae9ef refs/files: use heuristic to decide whether to repack with --auto
The `--auto` flag for git-pack-refs(1) allows the ref backend to decide
whether or not a repack is in order. This switch has been introduced
mostly with the "reftable" backend in mind, which already knows to
auto-compact its tables during normal operations. When the flag is set,
then it will use the same auto-compaction mechanism and thus end up
doing nothing in most cases.

The "files" backend does not have any such heuristic yet and instead
packs any loose references unconditionally. So we rewrite the complete
"packed-refs" file even if there's only a single loose reference to be
packed.

Even worse, starting with 9f6714ab3e (builtin/gc: pack refs when using
`git maintenance run --auto`, 2024-03-25), `git pack-refs --auto` is
unconditionally executed via our auto maintenance, so we end up repacking
references every single time auto maintenance kicks in. And while that
commit already mentioned that the "files" backend unconditionally packs
refs now, the author obviously didn't quite think about the consequences
thereof. So while the idea was sound, we really should have added a
heuristic to the "files" backend before implementing it.

Introduce a heuristic that decides whether or not it is worth to pack
loose references. The important factors to decide here are the number of
loose references in comparison to the overall size of the "packed-refs"
file. The bigger the "packed-refs" file, the longer it takes to rewrite
it and thus we scale up the limit of allowed loose references before we
repack.

As is the nature of heuristics, this mechansim isn't obviously
"correct", but should rather be seen as a tradeoff between how much
resources we spend packing refs and how inefficient the ref store
becomes. For all I can say, we have successfully been using the exact
same heuristic in Gitaly for several years by now.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-04 08:03:24 -07:00
Patrick Steinhardt
bd51dca36e t0601: merge tests for auto-packing of refs
We have two tests in t0601 which exercise the same underlying logic,
once via `git pack-refs --auto` and once via `git maintenance run
--auto`. Merge these two tests into one such that it becomes easier to
extend test coverage for both commands at the same time.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-04 08:03:24 -07:00
Patrick Steinhardt
b2dbf97f47 builtin/index-pack: fix segfaults when running outside of a repo
It was reported that git-verify-pack(1) has started to crash with Git
v2.46.0 when run outside of a repository. This is another fallout from
c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), where we have stopped setting the default hash algorithm
for `the_repository`. Consequently, code that relies on `the_hash_algo`
will now crash when it hasn't explicitly been initialized, which may be
the case when running outside of a Git repository.

The crash is not in git-verify-pack(1) but instead in git-index-pack(1),
which gets called by the former. Ideally, both of these programs should
be able to identify the hash algorithm used by the packfile and index
without having to rely on external information. But unfortunately, the
format for neither of them is completely self-describing, so it is not
possible to derive that information. This is a design issue that we
should address by introducing a new packfile version that encodes its
object hash.

For now though the more important fix is to not make either of these
programs crash anymore, which we do by falling back to SHA1 when the
object hash is unconfigured. This pessimizes reading packfiles which
use a different hash than SHA1, but restores previous behaviour.

Reported-by: Ilya K <me@0upti.me>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-04 07:40:00 -07:00
Junio C Hamano
17636cdf3b Merge branch 'ps/reftable-concurrent-compaction'
The code path for compacting reftable files saw some bugfixes
against concurrent operation.

* ps/reftable-concurrent-compaction:
  reftable/stack: fix segfault when reload with reused readers fails
  reftable/stack: reorder swapping in the reloaded stack contents
  reftable/reader: keep readers alive during iteration
  reftable/reader: introduce refcounting
  reftable/stack: fix broken refnames in `write_n_ref_tables()`
  reftable/reader: inline `reader_close()`
  reftable/reader: inline `init_reader()`
  reftable/reader: rename `reftable_new_reader()`
  reftable/stack: inline `stack_compact_range_stats()`
  reftable/blocksource: drop malloc block source
2024-09-03 09:15:03 -07:00
Junio C Hamano
8c1c63d525 Merge branch 'ps/leakfixes-part-5'
Even more leak fixes.

* ps/leakfixes-part-5:
  transport: fix leaking negotiation tips
  transport: fix leaking arguments when fetching from bundle
  builtin/fetch: fix leaking transaction with `--atomic`
  remote: fix leaking peer ref when expanding refmap
  remote: fix leaks when matching refspecs
  remote: fix leaking config strings
  builtin/fetch-pack: fix leaking refs
  sideband: fix leaks when configuring sideband colors
  builtin/send-pack: fix leaking refspecs
  transport: fix leaking OID arrays in git:// transport data
  t/helper: fix leaking multi-pack-indices in "read-midx"
  builtin/repack: fix leaks when computing packs to repack
  midx-write: fix leaking hashfile on error cases
  builtin/archive: fix leaking `OPT_FILENAME()` value
  builtin/upload-archive: fix leaking args passed to `write_archive()`
  builtin/merge-tree: fix leaking `-X` strategy options
  pretty: fix leaking key/value separator buffer
  pretty: fix memory leaks when parsing pretty formats
  convert: fix leaks when resetting attributes
  mailinfo: fix leaking header data
2024-09-03 09:15:00 -07:00
Ghanshyam Thakkar
a680635e05 t: port helper/test-oid-array.c to unit-tests/t-oid-array.c
helper/test-oid-array.c along with t0064-oid-array.sh test the
oid-array.h API, which provides storage and processing
efficiency over large lists of object identifiers.

Migrate them to the unit testing framework for better runtime
performance and efficiency. As we don't initialize a repository
in these tests, the hash algo that functions like oid_array_lookup()
use is not initialized, therefore call repo_set_hash_algo() to
initialize it. And init_hash_algo():lib-oid.c can aid in this
process, so make it public.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-01 20:43:38 -07:00
Junio C Hamano
11fd53a6c2 Merge branch 'ds/sparse-diff-index'
The underlying machinery for "git diff-index" has long been made to
expand the sparse index as needed, but the command fully expanded
the sparse index upfront, which now has been taught not to do.

* ds/sparse-diff-index:
  diff-index: integrate with the sparse index
2024-08-29 11:08:17 -07:00
Junio C Hamano
839b808325 Merge branch 'cp/unit-test-reftable-block'
Another test for reftable library ported to the unit test framework.

* cp/unit-test-reftable-block:
  t-reftable-block: mark unused argv/argc
  t-reftable-block: add tests for index blocks
  t-reftable-block: add tests for obj blocks
  t-reftable-block: add tests for log blocks
  t-reftable-block: remove unnecessary variable 'j'
  t-reftable-block: use xstrfmt() instead of xstrdup()
  t-reftable-block: use block_iter_reset() instead of block_iter_close()
  t-reftable-block: use reftable_record_key() instead of strbuf_addstr()
  t-reftable-block: use reftable_record_equal() instead of check_str()
  t-reftable-block: release used block reader
  t: harmonize t-reftable-block.c with coding guidelines
  t: move reftable/block_test.c to the unit testing framework
2024-08-29 11:08:16 -07:00
Junio C Hamano
d4d677704d Merge branch 'ps/reftable-drop-generic'
The code in the reftable library has been cleaned up by discarding
unused "generic" interface.

* ps/reftable-drop-generic:
  reftable: mark unused parameters in empty iterator functions
  reftable/generic: drop interface
  t/helper: refactor to not use `struct reftable_table`
  t/helper: use `hash_to_hex_algop()` to print hashes
  t/helper: inline printing of reftable records
  t/helper: inline `reftable_table_print()`
  t/helper: inline `reftable_stack_print_directory()`
  t/helper: inline `reftable_reader_print_file()`
  t/helper: inline `reftable_dump_main()`
  reftable/dump: drop unused `compact_stack()`
  reftable/generic: move generic iterator code into iterator interface
  reftable/iter: drop double-checking logic
  reftable/stack: open-code reading refs
  reftable/merged: stop using generic tables in the merged table
  reftable/merged: rename `reftable_new_merged_table()`
  reftable/merged: expose functions to initialize iterators
2024-08-29 11:08:16 -07:00
Junio C Hamano
a9bc27fb18 Merge branch 'gt/unit-test-urlmatch-normalization'
Another rewrite of test.

* gt/unit-test-urlmatch-normalization:
  t: migrate t0110-urlmatch-normalization to the new framework
2024-08-28 10:31:27 -07:00
Junio C Hamano
029c870ab5 Merge branch 'mt/rebase-x-quiet'
"git rebase -x --quiet" was not quiet, which was corrected.

* mt/rebase-x-quiet:
  rebase --exec: respect --quiet
2024-08-28 10:31:26 -07:00
Jeff King
08e83b5ec5 t-reftable-block: mark unused argv/argc
This is conceptually the same as the cases in df9d638c24 (unit-tests:
ignore unused argc/argv, 2024-08-17), but this unit test was migrated
from the reftable tests in a parallel branch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-28 10:09:32 -07:00
Jeff King
8c90b41f0a t-reftable-readwrite: mark unused parameter in callback function
This spot was originally marked in in 4695c3f3a9 (reftable: mark unused
parameters in virtual functions, 2024-08-17), but was copied in
5b539a5361 (t: move reftable/readwrite_test.c to the unit testing
framework, 2024-08-13).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-28 09:51:17 -07:00
Jacob Keller
241499aba0 send-email: add mailmap support via sendemail.mailmap and --mailmap
In some cases, a user may be generating a patch for an old commit which
now has an out-of-date author or other identity. For example, consider a
team member who contributes to an internal fork of an upstream project,
but leaves before this change is submitted upstream.

In this case, the team members company address may no longer be valid,
and will thus bounce when sending email.

This can be manually avoided by editing the generated patch files, or by
carefully using --suppress-<cc|to> options. This requires a lot of
manual intervention and is easy to forget.

Git has support for mapping old email addresses and names to a canonical
name and address via the .mailmap file (and its associated mailmap.file,
mailmap.blob, and log.mailmap options).

Teach git send-email to enable mailmap support for all addresses. This
ensures that addresses point to the canonical real name and email
address.

Add the sendemail.mailmap configuration option and its associated
--mailmap (and --use-mailmap for compatibility with git log) options.
For now, the default behavior is to disable the mailmap in order to
avoid any surprises or breaking any existing setups.

These options support per-identity configuration via the
sendemail.identity configuration blocks. This enables identity-specific
configuration in cases where users may not want to enable support.

In addition, support send-email specific mailmap data via
sendemail.mailmap.file, sendemail.mailmap.blob and their
identity-specific variants.

The intention of these options is to enable mapping addresses which are
no longer valid to a current project or team maintainer. Such mappings
may change the actual person being referred to, and may not make sense
in a traditional mailmap file which is intended for updating canonical
name and address for the same individual.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-27 14:51:29 -07:00
Jacob Keller
3a27e991f2 check-mailmap: accept "user@host" contacts
git check-mailmap splits each provided contact using split_ident_line.
This function requires that the contact either be of the form "Name
<user@host>" or of the form "<user@host>". In particular, if the mail
portion of the contact is not surrounded by angle brackets,
split_ident_line will reject it.

This results in git check-mailmap rejecting attempts to translate simple
email addresses:

  $ git check-mailmap user@host
  fatal: unable to parse contact: user@host

This limits the usability of check-mailmap as it requires placing angle
brackets around plain email addresses.

In particular, attempting to use git check-mailmap to support mapping
addresses in git send-email is not straight forward. The sanitization
and validation functions in git send-email strip angle brackets from
plain email addresses. It is not trivial to add brackets prior to
invoking git check-mailmap.

Instead, modify check_mailmap() to allow such strings as contacts. In
particular, treat any line which cannot be split by split_ident_line as
a simple email address.

No attempt is made to actually parse the address line, or validate that
it is actually an email address. Implementing such validation is not
trivial. Besides, we weren't validating the address between angle
brackets before anyways.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-27 14:51:28 -07:00
Taylor Blau
125c32605a builtin/pack-objects.c: translate bit positions during pack-reuse
When reusing chunks verbatim from an existing source pack, the function
write_reused_pack() first attempts to reuse whole words (via the
function `write_reused_pack_verbatim()`), and then individual bits (via
`write_reused_pack_one()`).

In the non-MIDX case, all of this code works fine. Likewise, in the MIDX
case, processing bits individually from the first (preferred) pack works
fine. However, processing subsequent packs in the MIDX case is broken
when there are duplicate objects among the set of MIDX'd packs.

This is because we treat the individual bit positions as valid pack
positions within the source pack(s), which does not account for gaps in
the source pack, like we see when the MIDX must break ties between
duplicate objects which appear in multiple packs.

The broken code looks like:

    for (; i < reuse_packfile_bitmap->word_alloc; i++) {
            for (offset = 0; offset < BITS_IN_EWORD, offset++) {
                    /* ... */

                    write_reused_pack_one(reuse_packfile->p,
                                          pos + offset - reuse_packfile->bitmap_pos,
                                          f, pack_start, &w_curs);
            }
    }

, where the second argument is incorrect and does not account for gaps.

Instead, make sure that we translate bit positions in the MIDX's
pseudo-pack order to pack positions in the respective source packs by:

  - Translating the bit position (pseudo-pack order) to a MIDX position
    (lexical order).

  - Use the MIDX position to obtain the offset at which the given object
    occurs in the source pack.

  - Then translate that offset back into a pack relative position within
    the source pack by calling offset_to_pack_pos().

After doing this, then we can safely use the result as a pack position.
Note that when doing single-pack reuse, as well as reusing objects from
the MIDX's preferred pack, such translation is not necessary, since
either ties are broken in favor of the preferred pack, or there are no
ties to break at all (in the case of non-MIDX bitmaps).

Failing to do this can result in strange failure modes. One example that
can occur when misinterpreting bits in the above fashion is that Git
thinks it's supposed to send a delta that the caller does not want.
Under this (incorrect) assumption, we try to look up the delta's base
(so that we can patch any OFS_DELTAs if necessary). We do this using
find_reused_offset().

But if we try and call that function for an offset belonging to an
object we did not send, we'll get back garbage. This can result in us
computing a negative fixup value, which results in memory corruption
when trying to write the (patched) OFS_DELTA header.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-27 14:50:26 -07:00
Taylor Blau
bbc393a9f3 t/t5332-multi-pack-reuse.sh: verify pack generation with --strict
In our tests for multi-pack reuse, we have two helper functions:

  - test_pack_objects_reused_all(), and
  - test_pack_objects_reused()

which invoke pack-objects (either with `--all`, or the supplied tips via
stdin, respectively) and ensure that (a) the number of reused objects,
and (b) the number of packs which those objects were reused from both
match the expected values.

Both functions discard the output of pack-objects and assert only on the
contents of the trace2 stream.

However, if we store the pack and attempt to index it with `--strict`,
we find that a number of our tests are broken, indicating a bug within
multi-pack reuse.

That bug will be addressed in a subsequent commit. But let's first
harden these tests by trying to index the resulting pack, marking the
tests which fail appropriately.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-27 14:50:26 -07:00
Junio C Hamano
3222718ad7 Merge branch 'ds/for-each-ref-is-base'
'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

* ds/for-each-ref-is-base:
  p1500: add is-base performance tests
  for-each-ref: add 'is-base' token
  commit: add gentle reference lookup method
  commit-reach: add get_branch_base_for_tip
2024-08-26 11:32:24 -07:00