75542 Commits

Author SHA1 Message Date
Junio C Hamano
cd0a222f08 Merge branch 'es/oss-fuzz'
Backport oss-fuzz tests for us to our codebase.

* es/oss-fuzz:
  fuzz: port fuzz-url-decode-mem from OSS-Fuzz
  fuzz: port fuzz-parse-attr-line from OSS-Fuzz
  fuzz: port fuzz-credential-from-url-gently from OSS-Fuzz
2024-12-13 07:33:42 -08:00
Junio C Hamano
e56c283c15 Merge branch 'en/fast-import-verify-path'
"git fast-import" learned to reject paths with ".." and "." as
their components to avoid creating invalid tree objects.

* en/fast-import-verify-path:
  t9300: test verification of renamed paths
  fast-import: disallow more path components
  fast-import: disallow "." and ".." path components
2024-12-13 07:33:41 -08:00
Junio C Hamano
90bf05e45a Merge branch 'kh/doc-update-ref-grammofix'
Grammofix.

* kh/doc-update-ref-grammofix:
  Documentation/git-update-ref.txt: add missing word
2024-12-13 07:33:39 -08:00
Junio C Hamano
1ddfe5acde Merge branch 'kh/doc-bundle-typofix'
Typofix.

* kh/doc-bundle-typofix:
  Documentation/git-bundle.txt: fix word join typo
2024-12-13 07:33:38 -08:00
Junio C Hamano
5cbe030c86 Merge branch 'jc/doc-error-message-guidelines'
Developer documentation update.

* jc/doc-error-message-guidelines:
  CodingGuidelines: a handful of error message guidelines
2024-12-13 07:33:37 -08:00
Junio C Hamano
a32668829d Merge branch 'jt/bundle-fsck'
"git bundle --unbundle" and "git clone" running on a bundle file
both learned to trigger fsck over the new objects with configurable
fsck check levels.

* jt/bundle-fsck:
  transport: propagate fsck configuration during bundle fetch
  fetch-pack: split out fsck config parsing
  bundle: support fsck message configuration
  bundle: add bundle verification options type
2024-12-13 07:33:36 -08:00
Junio C Hamano
caacdb5dfd The fifteenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-10 10:04:58 +09:00
Junio C Hamano
7041902dfa Merge branch 'ps/reftable-iterator-reuse'
Optimize reading random references out of the reftable backend by
allowing reuse of iterator objects.

* ps/reftable-iterator-reuse:
  refs/reftable: reuse iterators when reading refs
  reftable/merged: drain priority queue on reseek
  reftable/stack: add mechanism to notify callers on reload
  refs/reftable: refactor reflog expiry to use reftable backend
  refs/reftable: refactor reading symbolic refs to use reftable backend
  refs/reftable: read references via `struct reftable_backend`
  refs/reftable: figure out hash via `reftable_stack`
  reftable/stack: add accessor for the hash ID
  refs/reftable: handle reloading stacks in the reftable backend
  refs/reftable: encapsulate reftable stack
2024-12-10 10:04:58 +09:00
Junio C Hamano
de9278127e Merge branch 'ps/reftable-detach'
Isolates the reftable subsystem from the rest of Git's codebase by
using fewer pieces of Git's infrastructure.

* ps/reftable-detach:
  reftable/system: provide thin wrapper for lockfile subsystem
  reftable/stack: drop only use of `get_locked_file_path()`
  reftable/system: provide thin wrapper for tempfile subsystem
  reftable/stack: stop using `fsync_component()` directly
  reftable/system: stop depending on "hash.h"
  reftable: explicitly handle hash format IDs
  reftable/system: move "dir.h" to its only user
2024-12-10 10:04:56 +09:00
Junio C Hamano
35f40385e4 Merge branch 'bc/allow-upload-pack-from-other-people'
Loosen overly strict ownership check introduced in the recent past,
to keep the promise "cloning a suspicious repository is a safe
first step to inspect it".

* bc/allow-upload-pack-from-other-people:
  Allow cloning from repositories owned by another user
2024-12-10 10:04:55 +09:00
Junio C Hamano
9cd1e2e1a0 Merge branch 'pb/mergetool-errors'
End-user experience of "git mergetool" when the command errors out
has been improved.

* pb/mergetool-errors:
  git-difftool--helper.sh: exit upon initialize_merge_tool errors
  git-mergetool--lib.sh: add error message for unknown tool variant
  git-mergetool--lib.sh: add error message if 'setup_user_tool' fails
  git-mergetool--lib.sh: use TOOL_MODE when erroring about unknown tool
  completion: complete '--tool-help' in 'git mergetool'
2024-12-10 10:04:53 +09:00
Junio C Hamano
bd31944dda Merge branch 'jc/doc-opt-tilde-expand'
Describe a case where an option value needs to be spelled as a
separate argument, i.e. "--opt val", not "--opt=val".

* jc/doc-opt-tilde-expand:
  doc: option value may be separate for valid reasons
2024-12-10 10:04:52 +09:00
Junio C Hamano
8afff26aa0 Merge branch 'bc/ancient-ci'
Drop support for ancient environments in various CI jobs.

* bc/ancient-ci:
  Add additional CI jobs to avoid accidental breakage
  ci: remove clause for Ubuntu 16.04
  gitlab-ci: switch from Ubuntu 16.04 to 20.04
2024-12-10 10:04:51 +09:00
Junio C Hamano
e66fd72e97 The fourteenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 13:23:18 +09:00
Junio C Hamano
0f588c4661 Merge branch 'kh/sequencer-comment-char'
The sequencer failed to honor core.commentString in some places.

* kh/sequencer-comment-char:
  sequencer: comment commit messages properly
  sequencer: comment `--reference` subject line properly
  sequencer: comment checked-out branch properly
2024-12-06 13:23:18 +09:00
Junio C Hamano
b4269ebf35 Merge branch 'sj/refs-symref-referent-fix'
A double-free that may not trigger in practice by luck has been
corrected in the reference resolution code.

* sj/refs-symref-referent-fix:
  ref-cache: fix invalid free operation in `free_ref_entry`
2024-12-06 13:23:16 +09:00
Junio C Hamano
23692e08c6 The thirteenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 10:14:50 +09:00
Junio C Hamano
f334c387f4 Merge branch 'ja/git-diff-doc-markup'
Documentation mark-up updates.

* ja/git-diff-doc-markup:
  doc: git-diff: apply format changes to config part
  doc: git-diff: apply format changes to diff-generate-patch
  doc: git-diff: apply format changes to diff-format
  doc: git-diff: apply format changes to diff-options
  doc: git-diff: apply new documentation guidelines
2024-12-04 10:14:50 +09:00
Junio C Hamano
4c1b7e364e Merge branch 'bc/drop-ancient-libcurl-and-perl'
Drop support for older libcURL and Perl.

* bc/drop-ancient-libcurl-and-perl:
  gitweb: make use of s///r
  Require Perl 5.26.0
  INSTALL: document requirement for libcurl 7.61.0
  git-curl-compat: remove check for curl 7.56.0
  git-curl-compat: remove check for curl 7.53.0
  git-curl-compat: remove check for curl 7.52.0
  git-curl-compat: remove check for curl 7.44.0
  git-curl-compat: remove check for curl 7.43.0
  git-curl-compat: remove check for curl 7.39.0
  git-curl-compat: remove check for curl 7.34.0
  git-curl-compat: remove check for curl 7.25.0
  git-curl-compat: remove check for curl 7.21.5
2024-12-04 10:14:48 +09:00
Junio C Hamano
1e18cf4310 Merge branch 'kn/pass-repo-to-builtin-sub-sub-commands'
Built-in Git subcommands are supplied the repository object to work
with; they learned to do the same when they invoke sub-subcommands.

* kn/pass-repo-to-builtin-sub-sub-commands:
  builtin: pass repository to sub commands
2024-12-04 10:14:47 +09:00
Junio C Hamano
8c917be5d2 Merge branch 'ps/bisect-double-free-fix'
Work around Coverity warning that would not trigger in practice.

* ps/bisect-double-free-fix:
  bisect: address Coverity warning about potential double free
2024-12-04 10:14:46 +09:00
Junio C Hamano
e5b71577a6 Merge branch 'tb/use-test-file-size-more'
Use the right helper program to measure file size in performance tests.

* tb/use-test-file-size-more:
  t/perf: use 'test_file_size' in more places
2024-12-04 10:14:45 +09:00
Junio C Hamano
0a0712e05f Merge branch 'tb/boundary-traversal-fix'
A trivial "correctness" fix that does not yet matter in practice.

* tb/boundary-traversal-fix:
  pack-bitmap.c: typofix in `find_boundary_objects()`
2024-12-04 10:14:44 +09:00
Junio C Hamano
57e81b59f3 Merge branch 'sj/ref-contents-check'
"git fsck" learned to issue warnings on "curiously formatted" ref
contents that have always been accepted as valid but that Git
wouldn't have written itself (e.g., missing terminating end-of-line
after the full object name).

* sj/ref-contents-check:
  ref: add symlink ref content check for files backend
  ref: check whether the target of the symref is a ref
  ref: add basic symref content check for files backend
  ref: add more strict checks for regular refs
  ref: port git-fsck(1) regular refs check for files backend
  ref: support multiple worktrees check for refs
  ref: initialize ref name outside of check functions
  ref: check the full refname instead of basename
  ref: initialize "fsck_ref_report" with zero
2024-12-04 10:14:42 +09:00
Junio C Hamano
7ee055b237 Merge branch 'ps/ref-backend-migration-optim'
The migration procedure between two ref backends has been optimized.

* ps/ref-backend-migration-optim:
  reftable: rename scratch buffer
  refs: adapt `initial_transaction` flag to be unsigned
  reftable/block: optimize allocations by using scratch buffer
  reftable/block: rename `block_writer::buf` variable
  reftable/writer: optimize allocations by using a scratch buffer
  refs: don't normalize log messages with `REF_SKIP_CREATE_REFLOG`
  refs: skip collision checks in initial transactions
  refs: use "initial" transaction semantics to migrate refs
  refs/files: support symbolic and root refs in initial transaction
  refs: introduce "initial" transaction flag
  refs/files: move logic to commit initial transaction
  refs: allow passing flags when setting up a transaction
2024-12-04 10:14:41 +09:00
Junio C Hamano
a5dd262a75 Merge branch 'ps/leakfixes-part-10'
Leakfixes.

* ps/leakfixes-part-10: (27 commits)
  t: remove TEST_PASSES_SANITIZE_LEAK annotations
  test-lib: unconditionally enable leak checking
  t: remove unneeded !SANITIZE_LEAK prerequisites
  t: mark some tests as leak free
  t5601: work around leak sanitizer issue
  git-compat-util: drop now-unused `UNLEAK()` macro
  global: drop `UNLEAK()` annotation
  t/helper: fix leaking commit graph in "read-graph" subcommand
  builtin/branch: fix leaking sorting options
  builtin/init-db: fix leaking directory paths
  builtin/help: fix leaks in `check_git_cmd()`
  help: fix leaking return value from `help_unknown_cmd()`
  help: fix leaking `struct cmdnames`
  help: refactor to not use globals for reading config
  builtin/sparse-checkout: fix leaking sanitized patterns
  split-index: fix memory leak in `move_cache_to_base_index()`
  git: refactor builtin handling to use a `struct strvec`
  git: refactor alias handling to use a `struct strvec`
  strvec: introduce new `strvec_splice()` function
  line-log: fix leak when rewriting commit parents
  ...
2024-12-04 10:14:40 +09:00
Junio C Hamano
2f605347da Merge branch 'ps/gc-stale-lock-warning'
Give a bit of advice/hint message when "git maintenance" stops upon
finding a lock file left by another instance that is potentially still
running.

* ps/gc-stale-lock-warning:
  t7900: fix host-dependent behaviour when testing git-maintenance(1)
  builtin/gc: provide hint when maintenance hits a stale schedule lock
2024-12-04 10:14:37 +09:00
Jeff King
8cb4c6e62f t9300: test verification of renamed paths
Commit da91a90c2f (fast-import: disallow more path components,
2024-11-30) added two separate verify_path() calls (one for
added/modified files, and one for renames/copies). But our tests only
exercise the first one. Let's protect ourselves against regressions by
tweaking one of the tests to rename into the bad path. There are
adjacent tests that will stay as additions, so now both calls are
covered.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 09:12:07 +09:00
Kristoffer Haugsbakk
e2f5d3b491 Documentation/git-update-ref.txt: add missing word
Add missing word “that” in the phrase “after verifying that”, like
what was done in 1b2dfb7050 (Documentation/git-update-ref.txt: drop
“flag”, 2024-10-21)

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-02 10:54:30 +09:00
Kristoffer Haugsbakk
18693d7d65 Documentation/git-bundle.txt: fix word join typo
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-02 10:29:59 +09:00
Elijah Newren
da91a90c2f fast-import: disallow more path components
Instead of just disallowing '.' and '..', make use of verify_path() to
ensure that fast-import will disallow anything we wouldn't allow into
the index, such as anything under .git/, .gitmodules as a symlink, or
a dos drive prefix on Windows.

Since a few fast-export and fast-import tests that tried to stress-test
the correct handling of quoting relied on filenames that fail
is_valid_win32_path(), such as spaces or periods at the end of filenames
or backslashes within the filename, turn off core.protectNTFS for those
tests to ensure they keep passing.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-02 10:09:48 +09:00
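The narrower check that da91a90c2f replaces, rejecting only "." and ".." components, can be sketched roughly as below. `has_dot_component()` is a hypothetical helper written for illustration; it is not Git's `verify_path()`, which according to the message above also rejects paths under .git/, .gitmodules as a symlink, and DOS drive prefixes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Git): return 1 if any
 * '/'-separated component of `path` is exactly "." or "..",
 * and 0 otherwise.
 */
static int has_dot_component(const char *path)
{
    const char *p = path;

    while (*p) {
        /* Length of the current component, up to the next '/' or NUL. */
        size_t len = strcspn(p, "/");

        if ((len == 1 && p[0] == '.') ||
            (len == 2 && p[0] == '.' && p[1] == '.'))
            return 1;

        p += len;
        if (*p == '/')
            p++; /* skip the separator */
    }
    return 0;
}
```

Names such as "..foo" or "b.c" are accepted, because only whole components are compared.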
Junio C Hamano
168ebb7159 CodingGuidelines: a handful of error message guidelines
It is more efficient to have something in the coding guidelines
document to point at, when we want to review and comment on a new
message in the codebase to make sure it "fits" in the set of
existing messages.

Let's write down established best practice we are aware of.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-29 10:36:06 +09:00
Justin Tobler
baa159137b transport: propagate fsck configuration during bundle fetch
When fetching directly from a bundle, fsck message severity
configuration is not propagated to the underlying git-index-pack(1). It
is only capable of enabling or disabling fsck checks entirely. This does
not align with the fsck behavior for fetches through git-fetch-pack(1).

Use the fsck config parsing from fetch-pack to populate fsck message
severity configuration and wire it through to `unbundle()` to enable the
same fsck verification as done through fetch-pack.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-28 12:07:58 +09:00
Justin Tobler
05596e93c5 fetch-pack: split out fsck config parsing
When `fetch_pack_config()` is invoked, fetch-pack configuration is
parsed from the config. As part of this operation, fsck message severity
configuration is assigned to the `fsck_msg_types` global variable. This
is optionally used to configure the downstream git-index-pack(1) when
the `--strict` option is specified.

The same parsed fsck message severity configuration is also needed
outside of fetch-pack. Instead of exposing/relying on the existing
global state, split out the fsck config parsing logic into
`fetch_pack_fsck_config()` and expose it. In a subsequent commit, this
is used to provide fsck configuration when invoking `unbundle()`.

For `fetch_pack_fsck_config()` to discern between errors and unhandled
config variables, the return code when `git_config_path()` errors is
changed to a different value also indicating success. This frees up the
previous return code to now indicate the provided config variable
was unhandled. The behavior remains functionally the same.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-28 12:07:58 +09:00
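The return-code convention described in 05596e93c5 (one value for "handled", another for "not an fsck variable") might look like the sketch below. `parse_fsck_var()` is a hypothetical stand-in, not Git's `fetch_pack_fsck_config()`; only the `fetch.fsck.` key prefix is taken from Git's configuration namespace.

```c
#include <string.h>

/*
 * Hypothetical parser (not Git's fetch_pack_fsck_config()).
 * Returns 0 when `var` is an fsck setting and was consumed,
 * and 1 when `var` is something else the caller must handle.
 */
static int parse_fsck_var(const char *var, const char **msg_id)
{
    static const char prefix[] = "fetch.fsck.";
    size_t n = sizeof(prefix) - 1;

    if (strncmp(var, prefix, n) != 0)
        return 1; /* unhandled: not an fsck variable */

    *msg_id = var + n; /* points at the message ID after the prefix */
    return 0;          /* handled */
}
```

A caller can then fall through to its other config handling whenever the helper returns 1.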
Justin Tobler
187574ce86 bundle: support fsck message configuration
If the `VERIFY_BUNDLE_FSCK` flag is set during `unbundle()`, the
git-index-pack(1) spawned is configured with the `--fsck-objects` flag
to perform fsck verification. With this flag enabled, there is no way
to configure fsck message severity though.

Extend the `unbundle_opts` type to store fsck message severity
configuration and update `unbundle()` to conditionally append it to the
`--fsck-objects` flag if provided. This enables `unbundle()` call sites
to support optionally setting the severity for specific fsck messages.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-28 12:07:58 +09:00
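The idea in 187574ce86 and 87c01003cd, an options struct that wraps the flags and optionally carries severity settings appended to `--fsck-objects`, can be sketched as follows. All names here (`sketch_unbundle_opts`, `SKETCH_VERIFY_FSCK`, `fsck_arg()`) are hypothetical, and the exact string format Git stores internally is an assumption.

```c
#include <stddef.h>
#include <stdio.h>

enum { SKETCH_VERIFY_FSCK = 1 << 0 }; /* hypothetical flag bit */

/* Hypothetical options type wrapping the flags plus fsck settings. */
struct sketch_unbundle_opts {
    unsigned flags;
    const char *fsck_msg_types; /* e.g. "missingEmail=warn", or NULL */
};

/*
 * Write the index-pack argument into `buf`.
 * Returns 1 if an argument was written, 0 if fsck was not requested,
 * and -1 if `buf` is too small.
 */
static int fsck_arg(const struct sketch_unbundle_opts *opts,
                    char *buf, size_t size)
{
    int n;

    if (!opts || !(opts->flags & SKETCH_VERIFY_FSCK))
        return 0;

    if (opts->fsck_msg_types && *opts->fsck_msg_types)
        n = snprintf(buf, size, "--fsck-objects=%s", opts->fsck_msg_types);
    else
        n = snprintf(buf, size, "--fsck-objects");

    return (n < 0 || (size_t)n >= size) ? -1 : 1;
}
```

Passing a struct instead of bare flags means further options can be added later without changing every call site again.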
Justin Tobler
87c01003cd bundle: add bundle verification options type
When `unbundle()` is invoked, fsck verification may be configured by
passing the `VERIFY_BUNDLE_FSCK` flag. This mechanism allows fsck checks
on the bundle to be enabled or disabled entirely. To facilitate more
fine-grained fsck configuration, additional context must be provided to
`unbundle()`.

Introduce the `unbundle_opts` type, which wraps the existing
`verify_bundle_flags`, to facilitate future extension of `unbundle()`
configuration. Also update `unbundle()` and its call sites to accept
this new options type instead of the flags directly. The end behavior is
functionally the same, but allows for the set of configurable options to
be extended. This is leveraged in a subsequent commit to enable fsck
message severity configuration.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-28 12:07:57 +09:00
Junio C Hamano
cc01bad4a9 The twelfth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-27 07:57:10 +09:00
Junio C Hamano
4a611ee7eb Merge branch 'kn/ref-transaction-hook-with-reflog'
The reference-transaction hook was triggered for reflog updates, which
has been corrected.

* kn/ref-transaction-hook-with-reflog:
  refs: don't invoke reference-transaction hook for reflogs
2024-11-27 07:57:10 +09:00
Junio C Hamano
1f3d9b9814 Merge branch 'jt/index-pack-allow-promisor-only-while-fetching'
We now ensure "index-pack" is used with the "--promisor" option
only during a "git fetch".

* jt/index-pack-allow-promisor-only-while-fetching:
  index-pack: teach --promisor to forbid pack name
2024-11-27 07:57:09 +09:00
Junio C Hamano
8eaa06590f Merge branch 'en/fast-import-avoid-self-replace'
"git fast-import" can be tricked into creating a replace ref that maps
an object to itself, which is a useless thing to do.

* en/fast-import-avoid-self-replace:
  fast-import: avoid making replace refs point to themselves
2024-11-27 07:57:08 +09:00
Junio C Hamano
89ceab7b4c Merge branch 'kh/trailer-in-glossary'
Doc updates.

* kh/trailer-in-glossary:
  Documentation/glossary: describe "trailer"
2024-11-27 07:57:07 +09:00
Junio C Hamano
f670d811e2 Merge branch 'jk/gcc15'
GCC 15 compatibility updates.

* jk/gcc15:
  object-file: inline empty tree and blob literals
  object-file: treat cached_object values as const
  object-file: drop oid field from find_cached_object() return value
  object-file: move empty_tree struct into find_cached_object()
  object-file: drop confusing oid initializer of empty_tree struct
  object-file: prefer array-of-bytes initializer for hash literals
2024-11-27 07:57:06 +09:00
Junio C Hamano
93905d3b70 Merge branch 'bc/c23'
C23 compatibility updates.

* bc/c23:
  reflog: rename unreachable
  index-pack: rename struct thread_local
2024-11-27 07:57:05 +09:00
Junio C Hamano
87fc668ce5 Merge branch 'ps/clar-build-improvement'
Fix for clar unit tests to support CMake build.

* ps/clar-build-improvement:
  Makefile: let clar header targets depend on their scripts
  cmake: use verbatim arguments when invoking clar commands
  cmake: use SH_EXE to execute clar scripts
  t/unit-tests: convert "clar-generate.awk" into a shell script
2024-11-27 07:57:04 +09:00
Junio C Hamano
c515230dcf Merge branch 'kh/bundle-docs'
Documentation for "git bundle" saw improvements to more prominently
call out the use of '--all' when creating bundles.

* kh/bundle-docs:
  Documentation/git-bundle.txt: discuss naïve backups
  Documentation/git-bundle.txt: mention --all in spec. refs
  Documentation/git-bundle.txt: remove old `--all` example
  Documentation/git-bundle.txt: mention full backup example
2024-11-27 07:57:03 +09:00
shejialuo
b6318cf23a ref-cache: fix invalid free operation in free_ref_entry
In cfd971520e (refs: keep track of unresolved reference value in
iterators, 2024-08-09), we added a new field "referent" into the "struct
ref" structure. In order to free the "referent", we unconditionally
freed the "referent" by simply adding a "free" statement.

However, this is incorrect usage: whether the ref entry is a directory
or a loose ref, we will always execute the following statement:

  free(entry->u.value.referent);

This does not make sense. We should never access the "entry->u.value"
field when "entry" is a directory. However, the change obviously doesn't
break the tests. Let's analyze why.

The anonymous union in the "ref_entry" has two members: one is "struct
ref_value", another is "struct ref_dir". On a 64-bit machine, the size
of "struct ref_dir" is 32 bytes, which is smaller than the 48-byte size
of "struct ref_value". And the offset of "referent" field in "struct
ref_value" is 40 bytes. So, whenever we create a new "ref_entry" for a
directory, we will leave the offset from 40 bytes to 48 bytes untouched,
which means the value for this memory is zero (NULL). It's OK to free a
NULL pointer, but this is merely a coincidence of memory layout.

To fix this issue, we now ensure that "free(entry->u.value.referent)" is
only called when "entry->flag" indicates that it represents a loose
reference and not a directory to avoid the invalid memory operation.

Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-27 04:34:37 +09:00
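The pattern behind the b6318cf23a fix, consulting the flag before touching a union member, can be sketched with simplified types. The struct and flag names below are hypothetical and much smaller than Git's real `ref_entry`; the sketch only shows the flag-guarded free.

```c
#include <stdlib.h>

#define SKETCH_ENTRY_IS_DIR 0x1u /* hypothetical flag bit */

struct sketch_value { char oid[40]; char *referent; };
struct sketch_dir { int nr; int alloc; void *entries; };

struct sketch_entry {
    unsigned flag;
    union {
        struct sketch_value value; /* valid only for non-directories */
        struct sketch_dir dir;     /* valid only for directories */
    } u;
};

/* Allocate a zeroed entry; returns NULL on allocation failure. */
static struct sketch_entry *new_entry(unsigned flag)
{
    struct sketch_entry *e = calloc(1, sizeof(*e));

    if (e)
        e->flag = flag;
    return e;
}

/* Free an entry, touching only the union member the flag selects. */
static void free_entry(struct sketch_entry *e)
{
    if (!e)
        return;
    if (e->flag & SKETCH_ENTRY_IS_DIR)
        free(e->u.dir.entries);
    else
        free(e->u.value.referent); /* free(NULL) is a no-op */
    free(e);
}
```

With the guard in place, correctness no longer depends on the two union members happening to leave the `referent` bytes zeroed.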
Patrick Steinhardt
7cf65e2660 refs/reftable: reuse iterators when reading refs
When reading references the reftable backend has to:

  1. Create a new ref iterator.

  2. Seek the iterator to the record we're searching for.

  3. Read the record.

We cannot really avoid the last two steps, but re-creating the iterator
every single time we want to read a reference is kind of expensive and a
waste of resources. We couldn't help it in the past though because it
was not possible to reuse iterators. But starting with 5bf96e0c39
(reftable/generic: move seeking of records into the iterator,
2024-05-13) we have split up the iterator lifecycle such that creating
the iterator and seeking are two different concerns.

Refactor the code such that we cache iterators in the reftable backend.
This cache is invalidated whenever the respective stack is reloaded such
that we know to recreate the iterator in that case. This leads to a
sizeable speedup when creating many refs, which requires a lot of random
reference reads:

    Benchmark 1: update-ref: create many refs (refcount = 100000, revision = master)
      Time (mean ± σ):      1.793 s ±  0.010 s    [User: 0.954 s, System: 0.835 s]
      Range (min … max):    1.781 s …  1.811 s    10 runs

    Benchmark 2: update-ref: create many refs (refcount = 100000, revision = HEAD)
      Time (mean ± σ):      1.680 s ±  0.013 s    [User: 0.846 s, System: 0.831 s]
      Range (min … max):    1.664 s …  1.702 s    10 runs

    Summary
      update-ref: create many refs (refcount = 100000, revision = HEAD) ran
        1.07 ± 0.01 times faster than update-ref: create many refs (refcount = 100000, revision = master)

While 7% is not a huge win, you have to consider that the benchmark is
_writing_ data, so _reading_ references is only one part of what we do.
Flame graphs show that we spend around 40% of our time reading refs, so
the speedup when reading refs is approximately 2.5x that. I could not
find better benchmarks where we perform a lot of random ref reads.

You can also see a sizeable impact on memory usage when creating 100k
references. Before this change:

    HEAP SUMMARY:
        in use at exit: 19,112,538 bytes in 200,170 blocks
      total heap usage: 8,400,426 allocs, 8,200,256 frees, 454,367,048 bytes allocated

After this change:

    HEAP SUMMARY:
        in use at exit: 674,416 bytes in 169 blocks
      total heap usage: 7,929,872 allocs, 7,929,703 frees, 281,509,985 bytes allocated

As an additional factor, this refactoring opens up the possibility for
more performance optimizations in how we re-seek iterators. Any change
that allows us to optimize re-seeking by e.g. reusing data structures
would thus also directly speed up random reads.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26 17:18:38 +09:00
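The caching scheme from 7cf65e2660 (create the iterator once, reuse it, recreate it after a stack reload) can be approximated as below. The types are hypothetical stand-ins, and the generation counter is an assumption made for this sketch; the commit message only says the cache is invalidated whenever the stack is reloaded.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the reftable library's types. */
struct sketch_iter { int generation; };
struct sketch_backend {
    int stack_generation;       /* bumped each time the stack reloads */
    struct sketch_iter *cached; /* NULL until first use */
    int creations;              /* how many iterators were created */
};

/*
 * Return an iterator for the current stack state, reusing the cached
 * one unless the stack was reloaded since it was created.
 * Returns NULL on allocation failure.
 */
static struct sketch_iter *backend_get_iter(struct sketch_backend *be)
{
    if (be->cached && be->cached->generation != be->stack_generation) {
        free(be->cached); /* stale: the stack changed underneath it */
        be->cached = NULL;
    }
    if (!be->cached) {
        be->cached = malloc(sizeof(*be->cached));
        if (!be->cached)
            return NULL;
        be->cached->generation = be->stack_generation;
        be->creations++;
    }
    return be->cached;
}
```

Repeated reads between reloads then pay only for the seek and the read, not for setting up a new iterator.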
Patrick Steinhardt
9d471b9dfe reftable/merged: drain priority queue on reseek
In 5bf96e0c39 (reftable/generic: move seeking of records into the
iterator, 2024-05-13) we have refactored the reftable codebase such that
iterators can be initialized once and then re-seeked multiple times.
This feature is used by 1869525066 (refs/reftable: wire up support for
exclude patterns, 2024-09-16) in order to skip records based on exclude
patterns provided by the caller.

The logic to re-seek the merged iterator is insufficient though because
we don't drain the priority queue on a re-seek. This means that the
queue may contain stale entries and thus reading the next record in the
queue will return the wrong entry. While this is an obvious bug, it is
harmless in the context of above exclude patterns:

  - If the queue contained stale entries that match the pattern then the
    caller would already know to filter out such refs. This is because
    our codebase is prepared to handle backends that don't have a way to
    efficiently implement exclude patterns.

  - If the queue contained stale entries that don't match the pattern
    we'd eventually filter out any duplicates. This is because the
    reftable code discards items with the same ref name and sorts any
    remaining entries properly.

So things happen to work in this context regardless of the bug, and
there is no other use case yet where we re-seek iterators. We're about
to introduce a caching mechanism though where iterators are reused by
the reftable backend, and that will expose the bug.

Fix the issue by draining the priority queue when seeking and add a
testcase that surfaces the issue.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26 17:18:38 +09:00
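The fix in 9d471b9dfe can be illustrated with a toy merged iterator over sorted integer arrays. Everything here is hypothetical (a linear-scan queue rather than a real heap, integers rather than ref records); the point is the single line that empties the queue at the start of a seek.

```c
#include <stddef.h>

#define SKETCH_MAX_SUBS 4

/* Hypothetical merged iterator over sorted int arrays (not reftable code). */
struct sketch_sub { const int *vals; size_t len, pos; };
struct sketch_pq_entry { int val; size_t sub; };
struct sketch_merged {
    struct sketch_sub subs[SKETCH_MAX_SUBS];
    size_t nsubs;
    struct sketch_pq_entry pq[SKETCH_MAX_SUBS]; /* one slot per sub */
    size_t pq_len;
};

/* Queue the next value of sub-iterator `i`, if it has one left. */
static void pq_refill(struct sketch_merged *m, size_t i)
{
    struct sketch_sub *s = &m->subs[i];

    if (s->pos < s->len) {
        m->pq[m->pq_len].val = s->vals[s->pos++];
        m->pq[m->pq_len].sub = i;
        m->pq_len++;
    }
}

/* Position every sub-iterator at the first value >= `want`. */
static void merged_seek(struct sketch_merged *m, int want)
{
    size_t i;

    m->pq_len = 0; /* drain: drop entries left over from an earlier seek */
    for (i = 0; i < m->nsubs; i++) {
        struct sketch_sub *s = &m->subs[i];

        s->pos = 0;
        while (s->pos < s->len && s->vals[s->pos] < want)
            s->pos++;
        pq_refill(m, i);
    }
}

/* Pop the smallest queued value into `out`; returns 0 when exhausted. */
static int merged_next(struct sketch_merged *m, int *out)
{
    size_t best = 0, i;
    struct sketch_pq_entry e;

    if (!m->pq_len)
        return 0;
    for (i = 1; i < m->pq_len; i++)
        if (m->pq[i].val < m->pq[best].val)
            best = i;
    e = m->pq[best];
    m->pq_len--;
    m->pq[best] = m->pq[m->pq_len]; /* fill the hole with the last entry */
    pq_refill(m, e.sub);
    *out = e.val;
    return 1;
}
```

Without the drain, a second `merged_seek()` would leave values from the first seek in the queue, and `merged_next()` could return an entry that sorts before the requested position.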
Patrick Steinhardt
eb22c1b46b reftable/stack: add mechanism to notify callers on reload
Reftable stacks are reloaded in two cases:

  - When calling `reftable_stack_reload()`, if the stat-cache tells us
    that the stack has been modified.

  - When committing a reftable addition.

While callers can figure out the second case, they do not have a
mechanism to figure out whether `reftable_stack_reload()` led to an
actual reload of the on-disk data. All they can do is thus to assume
that data is always being reloaded in that case.

Improve the situation by introducing a new `on_reload()` callback to the
reftable options. If provided, the function will be invoked every time
the stack has indeed been reloaded. This allows callers to invalidate
data that depends on the current stack data.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26 17:18:38 +09:00
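A callback of the kind eb22c1b46b describes could be wired up roughly as follows. The option and struct names are hypothetical, and the two version counters merely stand in for the stat-cache comparison; this is not the reftable API.

```c
#include <stddef.h>

/* Hypothetical options and stack types; not the reftable API. */
struct sketch_stack_opts {
    void (*on_reload)(void *payload); /* may be NULL */
    void *on_reload_payload;
};

struct sketch_stack {
    struct sketch_stack_opts opts;
    unsigned long disk_version;   /* stands in for the stat-cache */
    unsigned long loaded_version; /* what is currently in memory */
};

/* Reload only if the on-disk state changed, and notify the caller if so. */
static void stack_reload(struct sketch_stack *st)
{
    if (st->loaded_version == st->disk_version)
        return; /* nothing changed, so no callback */

    st->loaded_version = st->disk_version;
    if (st->opts.on_reload)
        st->opts.on_reload(st->opts.on_reload_payload);
}

/* Example callback: count how many real reloads happened. */
static void count_reload(void *payload)
{
    (*(int *)payload)++;
}
```

A caller that caches data derived from the stack would invalidate that data inside the callback instead of assuming every reload call changed something.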
Patrick Steinhardt
96e7cb83b6 refs/reftable: refactor reflog expiry to use reftable backend
Refactor the callback function that expires reflog entries in the
reftable backend to use `reftable_backend_read_ref()` instead of
accessing the reftable stack directly. This ensures that the function
will benefit from the new caching layer that we're about to introduce.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26 17:18:37 +09:00