The code to accept shallow "git push" has been optimized.
* ps/receive-pack-shallow-optim:
commit: use commit graph in `lookup_commit_reference_gently()`
commit: make `repo_parse_commit_no_graph()` more robust
commit: avoid parsing non-commits in `lookup_commit_reference_gently()`
In the preceding commit we refactored `lookup_commit_reference_gently()`
so that it doesn't parse non-commit objects anymore. This has led to a
speedup when git-receive-pack(1) accepts a shallow push into a repo
with lots of refs that point to blobs or trees.
But while this case is now faster, accepting pushes with lots of
"normal" refs that point to commits is still slow. This is mostly
because we look up the commits via the object database, and that is
rather costly.
Adapt the code to use `repo_parse_commit_gently()` instead of
`parse_object()` to parse the resulting commit object. This function
knows to use the commit-graph to fill in the object, which is far
cheaper.
This leads to another significant speedup when accepting shallow pushes.
The following benchmark pushes a single object from a shallow clone
into a repository with 600,000 references that all point to commits:
Benchmark 1: git-receive-pack (rev = HEAD~)
Time (mean ± σ): 9.179 s ± 0.031 s [User: 8.858 s, System: 0.528 s]
Range (min … max): 9.154 s … 9.213 s 3 runs
Benchmark 2: git-receive-pack (rev = HEAD)
Time (mean ± σ): 2.337 s ± 0.032 s [User: 2.331 s, System: 0.234 s]
Range (min … max): 2.308 s … 2.371 s 3 runs
Summary
git-receive-pack . </tmp/input (rev = HEAD) ran
3.93 ± 0.05 times faster than git-receive-pack (rev = HEAD~)
Also, this again leads to a significant reduction in memory allocations.
Before this change:
HEAP SUMMARY:
in use at exit: 17,524,978 bytes in 22,393 blocks
total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated
And after this change:
HEAP SUMMARY:
in use at exit: 11,534,036 bytes in 12,406 blocks
total heap usage: 13,284 allocs, 878 frees, 15,521,451 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `lookup_commit_reference_gently()` can be used to look up a
committish by object ID. As such, the function knows to peel for example
tag objects so that we eventually end up with the commit.
The function is used quite a lot throughout our tree. One such user is
"shallow.c" via `assign_shallow_commits_to_refs()`. The intent of this
function is to figure out whether a shallow push is missing any objects
that are required to satisfy the ref updates, and if so, which of the
ref updates is missing objects.
This is done by painting the tree with `UNINTERESTING`. We start
painting by calling `refs_for_each_ref()` so that we can mark all
existing referenced objects as the boundary of objects that we already
have, and which are supposed to be fully connected. The reference tips
are then parsed via `lookup_commit_reference_gently()`, and the commit
is then marked as uninteresting.
But references may not necessarily point to a committish, and if a lot
of them aren't then this step takes a lot of time. This is mostly due to
the way that `lookup_commit_reference_gently()` is implemented: before
we learn about the type of the object we already call `parse_object()`
on the object ID. This has two consequences:
- We parse all objects, including trees and blobs, even though we
don't even need the contents of them.
- More importantly though, `parse_object()` will cause us to check
whether the object ID matches its contents.
Combined this means that we deflate and hash every non-committish
object, and that of course ends up being both CPU- and memory-intensive.
Improve the logic so that we first use `peel_object()`. This function
won't parse the object for us, and thus it allows us to learn about the
object's type before we parse and return it.
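The idea of checking the type before parsing can be sketched in isolation.
The following is a minimal illustration, not Git's actual code: the
`toy_*` names, the object model and the byte counter are all hypothetical,
and tag peeling is left out entirely. It only shows why looking at cheap
type information first avoids touching the contents of blobs and trees.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object model; not Git's real data structures. */
enum toy_type { TOY_BLOB, TOY_TREE, TOY_COMMIT };

struct toy_object {
	enum toy_type type;
	const char *payload;
	size_t payload_len;
};

/* Counts how many payload bytes a "full parse" had to touch. */
static size_t toy_bytes_parsed;

/* Stand-in for an expensive parse that inflates and hashes every byte. */
static const struct toy_object *toy_parse(const struct toy_object *obj)
{
	assert(obj != NULL);
	toy_bytes_parsed += obj->payload_len;
	return obj;
}

/* Old shape: parse first, then discover the object is not a commit. */
static const struct toy_object *toy_lookup_commit_eager(const struct toy_object *obj)
{
	const struct toy_object *parsed = toy_parse(obj);
	return parsed->type == TOY_COMMIT ? parsed : NULL;
}

/* New shape: look at the type first and parse only commits. */
static const struct toy_object *toy_lookup_commit_lazy(const struct toy_object *obj)
{
	if (obj->type != TOY_COMMIT)
		return NULL;
	return toy_parse(obj);
}
```

With a ref pointing at a blob, the eager variant pays for the whole
payload before returning NULL, while the lazy variant returns NULL without
parsing anything.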
The following benchmark pushes a single object from a shallow clone into
a repository that has 100,000 refs. These refs were created by listing
all objects via `git rev-list --objects --all` and creating refs for
a subset of them, so lots of those refs will cover non-commit objects.
Benchmark 1: git-receive-pack (rev = HEAD~)
Time (mean ± σ): 62.571 s ± 0.413 s [User: 58.331 s, System: 4.053 s]
Range (min … max): 62.191 s … 63.010 s 3 runs
Benchmark 2: git-receive-pack (rev = HEAD)
Time (mean ± σ): 38.339 s ± 0.192 s [User: 36.220 s, System: 1.992 s]
Range (min … max): 38.176 s … 38.551 s 3 runs
Summary
git-receive-pack . </tmp/input (rev = HEAD) ran
1.63 ± 0.01 times faster than git-receive-pack . </tmp/input (rev = HEAD~)
This leads to a sizeable speedup as we now skip reading and parsing
non-commit objects. Before this change we spent around 40% of the time
in `assign_shallow_commits_to_refs()`; after the change we only spend
around 1.2% of the time in there. Almost the entire remainder of the
time is spent in git-rev-list(1) to perform the connectivity checks.
Besides the speedup, this also leads to a massive reduction in
allocations. Before:
HEAP SUMMARY:
in use at exit: 352,480,441 bytes in 97,185 blocks
total heap usage: 2,793,820 allocs, 2,696,635 frees, 67,271,456,983 bytes allocated
And after:
HEAP SUMMARY:
in use at exit: 17,524,978 bytes in 22,393 blocks
total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated
Note that when all references refer to commits performance stays roughly
the same, as expected. The following benchmark was executed with 600k
commits:
Benchmark 1: git-receive-pack (rev = HEAD~)
Time (mean ± σ): 9.101 s ± 0.006 s [User: 8.800 s, System: 0.520 s]
Range (min … max): 9.095 s … 9.106 s 3 runs
Benchmark 2: git-receive-pack (rev = HEAD)
Time (mean ± σ): 9.128 s ± 0.094 s [User: 8.820 s, System: 0.522 s]
Range (min … max): 9.019 s … 9.188 s 3 runs
Summary
git-receive-pack (rev = HEAD~) ran
1.00 ± 0.01 times faster than git-receive-pack (rev = HEAD)
This will be improved in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename three functions around the commit_list data structure.
* ps/commit-list-functions-renamed:
commit: rename `free_commit_list()` to conform to coding guidelines
commit: rename `reverse_commit_list()` to conform to coding guidelines
commit: rename `copy_commit_list()` to conform to coding guidelines
The following command reports that some header files are included twice:
$ git grep '#include' '*.c' | sort | uniq -cd
Remove the second #include line in each case, as it has no effect.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use commit_stack instead of open-coding it. Also convert the loop
counter i to size_t to match the type of the nr member of struct
commit_stack.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit f406b89552,
reversing changes made to 1627809eef.
It seems to have caused a few regressions; we have proposed solutions
for two of the three known ones. Let's give ourselves a bit
more room to maneuver during the pre-release freeze period and
restart once 2.53 ships.
Our coding guidelines say that:
Functions that operate on `struct S` are named `S_<verb>()` and should
generally receive a pointer to `struct S` as first parameter.
While most of the functions related to `struct commit_list` already
follow that naming schema, `free_commit_list()` doesn't.
Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.
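The rename-plus-wrapper pattern might look roughly like the sketch below.
It is an illustration under assumptions: `struct commit_list` is reduced
to an `int` payload, and `toy_list_prepend()`, `toy_freed_nodes` and
`toy_demo()` are hypothetical helpers that exist only to make the example
self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for `struct commit_list`. */
struct commit_list {
	int item;			/* placeholder for `struct commit *` */
	struct commit_list *next;
};

/* Hypothetical counter so that the example can observe the frees. */
static size_t toy_freed_nodes;

/* Hypothetical helper to build a list for the example. */
static struct commit_list *toy_list_prepend(struct commit_list *list, int item)
{
	struct commit_list *node = malloc(sizeof(*node));
	if (!node)
		abort();
	node->item = item;
	node->next = list;
	return node;
}

/* New name, following the `S_<verb>()` convention. */
static void commit_list_free(struct commit_list *list)
{
	while (list) {
		struct commit_list *next = list->next;
		free(list);
		toy_freed_nodes++;
		list = next;
	}
}

/* Compatibility wrapper so that callers of the old name keep compiling. */
static inline void free_commit_list(struct commit_list *list)
{
	commit_list_free(list);
}

/* Usage: both names free a list; returns the number of nodes freed. */
static size_t toy_demo(void)
{
	struct commit_list *a = toy_list_prepend(toy_list_prepend(NULL, 1), 2);
	struct commit_list *b = toy_list_prepend(NULL, 3);

	toy_freed_nodes = 0;
	commit_list_free(a);	/* new name */
	assert(toy_freed_nodes == 2);
	free_commit_list(b);	/* old name, forwarded by the wrapper */
	return toy_freed_nodes;
}
```

Keeping the wrapper as a thin `static inline` forwarder means in-flight
topics that still use the old name pick up the new implementation without
any source change.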
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our coding guidelines say that:
Functions that operate on `struct S` are named `S_<verb>()` and should
generally receive a pointer to `struct S` as first parameter.
While most of the functions related to `struct commit_list` already
follow that naming schema, `reverse_commit_list()` doesn't.
Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our coding guidelines say that:
Functions that operate on `struct S` are named `S_<verb>()` and should
generally receive a pointer to `struct S` as first parameter.
While most of the functions related to `struct commit_list` already
follow that naming schema, `copy_commit_list()` doesn't.
Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up that unifies various hand-rolled "list of commit
objects" implementations by using the commit_stack API.
* rs/commit-stack:
commit-reach: use commit_stack
commit-graph: use commit_stack
commit: add commit_stack_grow()
shallow: use commit_stack
pack-bitmap-write: use commit_stack
commit: add commit_stack_init()
test-reach: use commit_stack
remote: use commit_stack for src_commits
remote: use commit_stack for sent_tips
remote: use commit_stack for local_commits
name-rev: use commit_stack
midx: use commit_stack
log: use commit_stack
revision: export commit_stack
Use hook API to replace ad-hoc invocation of hook scripts with the
run_command() API.
* ar/run-command-hook:
receive-pack: convert receive hooks to hook API
receive-pack: convert update hooks to new API
hooks: allow callers to capture output
run-command: allow capturing of collated output
hook: allow overriding the ungroup option
reference-transaction: use hook API instead of run-command
transport: convert pre-push to hook API
hook: convert 'post-rewrite' hook in sequencer.c to hook API
hook: provide stdin via callback
run-command: add stdin callback for parallelization
run-command: add first helper for pp child states
When calling run_process_parallel() in run_hooks_opt(), the
ungroup option is currently hardcoded to .ungroup = 1.
This causes problems when ungrouping should be disabled, for
example when sideband-reading collated output from child hooks,
because sideband-reading and ungrouping are mutually exclusive.
Thus a new hook.h option is added to allow overriding.
The existing ungroup=1 behavior is preserved in the run_hooks()
API and the "hook run" command. We could modify these to take
an option if necessary, so I added two code comments there.
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function for increasing the capacity of a commit_stack. It is
useful for reducing reallocations when the target size is known in
advance.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function for initializing a struct commit_stack, for when static
initialization is not possible or impractical.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dynamic arrays of commit pointers are used in several places. Some of
them use a custom struct to hold array, item count and capacity, others
have them as separate variables linked by a common name part.
Pick one succinct, clean implementation -- commit_stack -- and convert
the different variants to it to reduce code duplication.
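As a rough picture of what such a structure bundles together, here is a
minimal sketch of a growable array of commit pointers. It is not the
commit_stack implementation; the `toy_*` names, the growth policy and the
`abort()` on allocation failure are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_commit { int id; };

/* Array, item count and capacity kept together in one struct. */
struct toy_commit_stack {
	struct toy_commit **items;
	size_t nr, alloc;
};

/* Make room for `extra` more items; useful when the size is known upfront. */
static void toy_commit_stack_grow(struct toy_commit_stack *s, size_t extra)
{
	size_t needed = s->nr + extra;

	if (needed > s->alloc) {
		size_t alloc = s->alloc ? s->alloc * 2 : 8;
		struct toy_commit **items;

		if (alloc < needed)
			alloc = needed;
		items = realloc(s->items, alloc * sizeof(*items));
		if (!items)
			abort();
		s->items = items;
		s->alloc = alloc;
	}
}

static void toy_commit_stack_push(struct toy_commit_stack *s, struct toy_commit *c)
{
	toy_commit_stack_grow(s, 1);
	assert(s->nr < s->alloc);
	s->items[s->nr++] = c;
}

/* Returns NULL when the stack is empty. */
static struct toy_commit *toy_commit_stack_pop(struct toy_commit_stack *s)
{
	return s->nr ? s->items[--s->nr] : NULL;
}

static void toy_commit_stack_clear(struct toy_commit_stack *s)
{
	free(s->items);
	s->items = NULL;
	s->nr = s->alloc = 0;
}
```

Callers that iterate over such a stack would use a `size_t` index to match
the type of `nr`.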
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a following commit, we are going to check commit signatures, but we
won't have a commit yet, only a commit buffer, and we are going to
discard this commit buffer if the signature is invalid. So it would be
wasteful to create a commit that we might discard, just to be able to
check a commit signature.
It would be simpler instead to be able to check commit signatures
using only a commit buffer instead of a commit.
To be able to do that, let's extract some code from the
check_commit_signature() function into a new verify_commit_buffer()
function, and then let's make check_commit_signature() call
verify_commit_buffer().
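The shape of this refactoring can be sketched as follows. This is only an
illustration: the `toy_*` functions are hypothetical, and the "check"
merely looks for a `gpgsig ` header line instead of verifying anything
cryptographically.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical commit that only carries its raw buffer. */
struct toy_commit {
	const char *buffer;
};

/*
 * Works on a raw buffer only. Returns 0 if a "gpgsig " header line is
 * present before the blank line that ends the headers, -1 otherwise.
 */
static int toy_verify_commit_buffer(const char *buffer)
{
	const char *line = buffer;

	assert(buffer != NULL);
	while (*line && *line != '\n') {
		if (!strncmp(line, "gpgsig ", 7))
			return 0;
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}
	return -1;
}

/* The commit-based entry point becomes a thin wrapper. */
static int toy_check_commit_signature(const struct toy_commit *commit)
{
	return toy_verify_commit_buffer(commit->buffer);
}
```

A caller that only has a buffer can call the buffer-based function
directly and never needs to create a commit that it might throw away.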
Note that this doesn't fundamentally change how
check_commit_signature() works. It used to call parse_signed_commit()
which calls repo_get_commit_buffer(), parse_buffer_signed_by_header()
and repo_unuse_commit_buffer(). Now these 3 functions are called
directly by verify_commit_buffer().
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git remote rename origin upstream" failed to move origin/HEAD to
upstream/HEAD when origin/HEAD is unborn and performed other
renames extremely inefficiently, which has been corrected.
* ps/remote-rename-fix:
builtin/remote: only iterate through refs that are to be renamed
builtin/remote: rework how remote refs get renamed
builtin/remote: determine whether refs need renaming early on
builtin/remote: fix sign comparison warnings
refs: simplify logic when migrating reflog entries
refs: pass refname when invoking reflog entry callback
With `refs_for_each_reflog_ent()` callers can iterate through all the
reflog entries for a given reference. The callback that is being invoked
for each such entry does not receive the name of the reference that we
are currently iterating through. This isn't really a limiting factor, as
callers can simply pass the name via the callback data.
But this layout sometimes does make for a bit of an awkward calling
pattern. One example: when iterating through all reflogs, and for each
reflog we iterate through all refnames, we have to do some extra book
keeping to track which reference name we are currently yielding reflog
entries for.
Change the signature of the callback function so that the reference name
of the reflog gets passed through to it. Adapt callers accordingly and
start using the new parameter in trivial cases. The next commit will
refactor the reference migration logic to make use of this parameter so
that we can simplify its logic a bit.
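A callback that receives the reference name directly might look like the
sketch below. The types and functions are hypothetical stand-ins and much
simpler than the real reflog iteration interface; the point is only that
the callback no longer has to smuggle the name through its callback data.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified reflog model. */
struct toy_reflog_entry {
	const char *message;
};

struct toy_reflog {
	const char *refname;
	const struct toy_reflog_entry *entries;
	size_t nr;
};

/* The callback now gets the name of the reference being iterated. */
typedef int toy_each_reflog_ent_fn(const char *refname,
				   const struct toy_reflog_entry *entry,
				   void *cb_data);

static int toy_for_each_reflog_ent(const struct toy_reflog *log,
				   toy_each_reflog_ent_fn *fn, void *cb_data)
{
	assert(log && fn);
	for (size_t i = 0; i < log->nr; i++) {
		int ret = fn(log->refname, &log->entries[i], cb_data);
		if (ret)
			return ret;	/* a non-zero return stops the iteration */
	}
	return 0;
}

/* Example callback: counts entries that belong to "refs/heads/main". */
static int toy_count_main(const char *refname,
			  const struct toy_reflog_entry *entry, void *cb_data)
{
	size_t *count = cb_data;

	(void)entry;
	if (!strcmp(refname, "refs/heads/main"))
		(*count)++;
	return 0;
}
```

The same callback can be reused across several reflogs with a single
shared counter, without extra bookkeeping for the current name.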
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce implicit assumption and dependence on the_repository in the
object-file subsystem.
* ps/object-file-wo-the-repository:
object-file: get rid of `the_repository` in index-related functions
object-file: get rid of `the_repository` in `force_object_loose()`
object-file: get rid of `the_repository` in `read_loose_object()`
object-file: get rid of `the_repository` in loose object iterators
object-file: remove declaration for `for_each_file_in_obj_subdir()`
object-file: inline `for_each_loose_file_in_objdir_buf()`
object-file: get rid of `the_repository` when writing objects
odb: introduce `odb_write_object()`
loose: write loose objects map via their source
object-file: get rid of `the_repository` in `finalize_object_file()`
object-file: get rid of `the_repository` in `loose_object_info()`
object-file: get rid of `the_repository` when freshening objects
object-file: inline `check_and_freshen()` functions
object-file: get rid of `the_repository` in `has_loose_object()`
object-file: stop using `the_hash_algo`
object-file: fix -Wsign-compare warnings
The pop_most_recent_commit() function can have quite expensive
worst case performance characteristics, which has been optimized by
using the prio-queue data structure.
* rs/pop-recent-commit-with-prio-queue:
commit: use prio_queue_replace() in pop_most_recent_commit()
prio-queue: add prio_queue_replace()
commit: convert pop_most_recent_commit() to prio_queue
Optimize pop_most_recent_commit() by adding the first parent using the
more efficient prio_queue_peek() and prio_queue_replace() instead of
prio_queue_get() and prio_queue_put().
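Why a replace operation is cheaper than a get followed by a put can be
seen on a plain binary heap. The sketch below is a hypothetical
fixed-size max-heap of `long` values (think commit dates), not the
prio_queue implementation: a get sifts down once and a put sifts up once,
whereas a replace overwrites the top and sifts down once.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HEAP_MAX 64

/* Hypothetical max-heap; the largest value sits at index 0. */
struct toy_heap {
	long items[TOY_HEAP_MAX];
	size_t nr;
};

static void toy_swap(long *a, long *b)
{
	long tmp = *a;
	*a = *b;
	*b = tmp;
}

static void toy_sift_up(struct toy_heap *h, size_t i)
{
	while (i > 0) {
		size_t parent = (i - 1) / 2;

		if (h->items[parent] >= h->items[i])
			break;
		toy_swap(&h->items[parent], &h->items[i]);
		i = parent;
	}
}

static void toy_sift_down(struct toy_heap *h, size_t i)
{
	for (;;) {
		size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;

		if (l < h->nr && h->items[l] > h->items[largest])
			largest = l;
		if (r < h->nr && h->items[r] > h->items[largest])
			largest = r;
		if (largest == i)
			break;
		toy_swap(&h->items[i], &h->items[largest]);
		i = largest;
	}
}

static void toy_heap_put(struct toy_heap *h, long value)
{
	assert(h->nr < TOY_HEAP_MAX);
	h->items[h->nr++] = value;
	toy_sift_up(h, h->nr - 1);
}

static long toy_heap_get(struct toy_heap *h)
{
	long top;

	assert(h->nr > 0);
	top = h->items[0];
	h->items[0] = h->items[--h->nr];
	toy_sift_down(h, 0);
	return top;
}

/* Overwrite the top element and restore heap order in a single pass. */
static void toy_heap_replace(struct toy_heap *h, long value)
{
	assert(h->nr > 0);
	h->items[0] = value;
	toy_sift_down(h, 0);
}
```

In a history walk, the most recent commit is at the top and its first
parent is usually only slightly older, so replacing the top with the
parent often needs little or no sifting.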
On my machine this neutralizes the performance hit it took in Git's own
repository when we converted it to prio_queue two patches ago (git_pq):
$ hyperfine -w3 -L git ./git_2.50.1,./git_pq,./git '{git} rev-parse :/^Initial.revision'
Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision
Time (mean ± σ): 1.073 s ± 0.003 s [User: 1.053 s, System: 0.019 s]
Range (min … max): 1.069 s … 1.078 s 10 runs
Benchmark 2: ./git_pq rev-parse :/^Initial.revision
Time (mean ± σ): 1.077 s ± 0.002 s [User: 1.057 s, System: 0.018 s]
Range (min … max): 1.072 s … 1.079 s 10 runs
Benchmark 3: ./git rev-parse :/^Initial.revision
Time (mean ± σ): 1.069 s ± 0.003 s [User: 1.049 s, System: 0.018 s]
Range (min … max): 1.065 s … 1.074 s 10 runs
Summary
./git rev-parse :/^Initial.revision ran
1.00 ± 0.00 times faster than ./git_2.50.1 rev-parse :/^Initial.revision
1.01 ± 0.00 times faster than ./git_pq rev-parse :/^Initial.revision
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pop_most_recent_commit() calls commit_list_insert_by_date() for parent
commits, which is itself called in a loop. This can lead to quadratic
complexity if there are many merges. Replace the commit_list with a
prio_queue to ensure logarithmic worst case complexity and convert all
three users.
Add a performance test that exercises one of them using a pathological
history that consists of 50% merges and 50% root commits to demonstrate
the speedup:
Test                          v2.50.1           HEAD
----------------------------------------------------------------------
1501.2: rev-parse ':/65535'   2.48(2.47+0.00)   0.20(0.19+0.00) -91.9%
Alas, sane histories don't benefit from the conversion much, and
traversing Git's own history takes a 1% performance hit on my machine:
$ hyperfine -w3 -L git ./git_2.50.1,./git '{git} rev-parse :/^Initial.revision'
Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision
Time (mean ± σ): 1.071 s ± 0.004 s [User: 1.052 s, System: 0.017 s]
Range (min … max): 1.067 s … 1.078 s 10 runs
Benchmark 2: ./git rev-parse :/^Initial.revision
Time (mean ± σ): 1.079 s ± 0.003 s [User: 1.060 s, System: 0.017 s]
Range (min … max): 1.074 s … 1.083 s 10 runs
Summary
./git_2.50.1 rev-parse :/^Initial.revision ran
1.01 ± 0.00 times faster than ./git rev-parse :/^Initial.revision
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not have a backend-agnostic way to write objects into an object
database. While there is `write_object_file()`, this function is rather
specific to the loose object format.
Introduce `odb_write_object()` to plug this gap. For now, this function
is a simple wrapper around `write_object_file()` and doesn't even use
the passed-in object database yet. This will change in subsequent
commits, where `write_object_file()` is converted so that it works on
top of an `odb_source`. `odb_write_object()` will then become
responsible for deciding which source an object shall be written to.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `has_object()` to `odb_has_object()` to match other functions
related to the object database and our modern coding guidelines.
Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `repo_read_object_file()` to `odb_read_object()` to match other
functions related to the object database and our modern coding
guidelines.
Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.
Introduce compatibility wrappers so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Get rid of our dependency on `the_repository` in `assert_oid_type()` by
passing in the object database as a parameter and adjusting all callers.
Rename the function to `odb_assert_oid_type()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* ps/object-file-cleanup:
object-store: merge "object-store-ll.h" and "object-store.h"
object-store: remove global array of cached objects
object: split out functions relating to object store subsystem
object-file: drop `index_blob_stream()`
object-file: split up concerns of `HASH_*` flags
object-file: split out functions relating to object store subsystem
object-file: move `xmmap()` into "wrapper.c"
object-file: move `git_open_cloexec()` to "compat/open.c"
object-file: move `safe_create_leading_directories()` into "path.c"
object-file: move `mkdir_in_gitdir()` into "path.c"
The object layer has been updated to take an explicit repository
instance as a parameter in more code paths.
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
The "object-store-ll.h" header has been introduced to keep transitive
header dependencies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.
Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we have the "object-store.h" header, most of the functionality for
object stores is actually hosted in "object-file.c". This makes it hard
to find relevant functions and causes us to mix up concerns.
Split out functions relating to the object store subsystem into a new
"object-store.c" file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
clear_commit_marks_many() clears multiple commits one by one. Move the
code for handling a single commit to clear_commit_marks() and call it
instead of the other way around, to simplify the code.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple sites in "object-file-convert.c" where we use the
global `the_repository` variable, either explicitly or implicitly by
using `the_hash_algo`. All of these callsites are transitively called
from `convert_object_file()`, which indeed has no repo as input.
Refactor the function so that it receives a repository as a parameter
and pass it through to all internal functions to get rid of the
dependency. Remove the `USE_THE_REPOSITORY_VARIABLE` define.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
clear_commit_marks_1() clears the marks of the first parent and its
first parent and so on, and saves the higher numbered parents in a list
for later. There is no benefit in keeping that list growing with each
handled commit. Clear it after each run to reduce peak memory usage.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the preceding commit, adapt `get_reachable_subset()` so
that it tracks array indices via `size_t` instead of using signed
integers to fix a couple of -Wsign-compare warnings. Adapt callers
accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Start working to make the codebase buildable with -Wsign-compare.
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
The advice messages now tell the newer 'git config set' command to
set the advice.token configuration variable to squelch a message.
* bf/explicit-config-set-in-advice-messages:
advice: suggest using subcommand "git config set"
We have a bunch of loops which iterate up to an unsigned boundary using
a signed index, which generates warnings because we compare a signed and
unsigned value in the loop condition. Address these sites for trivial
cases and enable `-Wsign-compare` warnings for these code units.
This patch only adapts those code units where we can drop the
`DISABLE_SIGN_COMPARE_WARNINGS` macro in the same step.
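The kind of trivial conversion meant here can be illustrated with a small
sketch. The `toy_*` names are hypothetical; the point is that the index
takes the same unsigned type as the boundary, and that a reverse loop has
to be written so that it does not rely on the index going negative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array with an unsigned item count. */
struct toy_array {
	const int *items;
	size_t nr;
};

/*
 * Writing `for (int i = 0; i < arr->nr; i++)` would compare a signed
 * index against an unsigned boundary and trigger -Wsign-compare.
 */
static long toy_sum(const struct toy_array *arr)
{
	long sum = 0;

	assert(arr != NULL);
	for (size_t i = 0; i < arr->nr; i++)
		sum += arr->items[i];
	return sum;
}

/*
 * Reverse iteration with an unsigned index: test and decrement in one
 * step so that the loop ends cleanly instead of checking for `i >= 0`.
 * Returns 1 and stores the position if `value` is found, 0 otherwise.
 */
static int toy_find_last(const struct toy_array *arr, int value, size_t *pos)
{
	for (size_t i = arr->nr; i-- > 0; ) {
		if (arr->items[i] == value) {
			*pos = i;
			return 1;
		}
	}
	return 0;
}
```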
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The advice message currently suggests using "git config advice..." to
disable advice messages, but since
00bbdde141 (builtin/config: introduce "set" subcommand, 2024-05-06)
we have the "set" subcommand for config. Since using the subcommand is
more in-line with the modern interface, any advice should be promoting
its usage. Change the disable advice message to use the subcommand
instead. Change all uses of "git config advice" in the tests to use the
subcommand.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we parse a commit via repo_parse_commit_internal(), if
save_commit_buffer is set we'll stuff the buffer of the object contents
into a cache, overwriting any previous value.
This can result in a leak of that previously cached value, though it's
rare in practice. If we have a value in the cache it would have come
from a previous parse, and during that parse we'd set the object.parsed
flag, causing any subsequent parse attempts to exit without doing any
work.
But it's possible to "unparse" a commit, which we do when registering a
commit graft. And since shallow fetches are implemented using grafts,
the leak is triggered in practice by t5539.
There are a number of possible ways to address this:
1. the unparsing function could clear the cached commit buffer, too. I
think this would work for the case I found, but I'm not sure if
there are other ways to end up in the same state (an unparsed
commit with an entry in the commit buffer cache).
2. when we parse, we could check the buffer cache and prefer it to
reading the contents from the object database. In theory the
contents of a particular sha1 are immutable, but the code in
question is violating the immutability with grafts. So this
approach makes me a bit nervous, although I think it would work in
practice (the grafts are applied to what we parse, but we still
retain the original contents).
3. We could realize the cache is already populated and discard its
contents before overwriting. It's possible some other code could be
holding on to a pointer to the old cache entry (and we'd introduce
a use-after-free), but I think the risk of that is relatively low.
4. The reverse of (3): when the cache is populated, don't bother
saving our new copy. This is perhaps a little weird, since we'll
have just populated the commit struct based on a different buffer.
But the two buffers should be the same, even in the presence of
grafts (as in (2) above).
I went with option 4. It addresses the leak directly and doesn't carry
any risk of breaking other assumptions. And it's the same technique used
by parse_object_buffer() for this situation, though I'm not sure when it
would even come up there. The extra safety has been there since
bd1e17e245 (Make "parse_object()" also fill in commit message buffer
data., 2005-05-25).
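Option 4 can be sketched as follows. This is a simplified model, not the
real parsing code: the `toy_*` names are hypothetical, and the buffer
cache is reduced to a single pointer inside the commit struct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical commit with a one-slot buffer cache. */
struct toy_commit {
	int parsed;
	char *cached_buffer;
};

static char *toy_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	memcpy(copy, s, len);
	return copy;
}

/*
 * Parse the commit from `contents`. Returns 1 if the freshly read
 * buffer was stored in the cache and 0 if it was not (either because
 * the commit was already parsed or because the cache was populated).
 */
static int toy_parse_commit(struct toy_commit *c, const char *contents)
{
	char *buffer;

	assert(c && contents);
	if (c->parsed)
		return 0;
	buffer = toy_strdup(contents);	/* stands in for reading the object */
	c->parsed = 1;
	if (c->cached_buffer) {
		/* Keep the old entry and drop the new copy instead of leaking. */
		free(buffer);
		return 0;
	}
	c->cached_buffer = buffer;
	return 1;
}

/* "Unparsing" resets the flag but leaves the cached buffer in place. */
static void toy_unparse_commit(struct toy_commit *c)
{
	c->parsed = 0;
}
```

Because the old entry is never replaced, any pointer that other code still
holds to it stays valid, which is the concern raised for option 3.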
This lets us mark t5539 as leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* ps/environ-wo-the-repository: (21 commits)
environment: stop storing "core.notesRef" globally
environment: stop storing "core.warnAmbiguousRefs" globally
environment: stop storing "core.preferSymlinkRefs" globally
environment: stop storing "core.logAllRefUpdates" globally
refs: stop modifying global `log_all_ref_updates` variable
branch: stop modifying `log_all_ref_updates` variable
repo-settings: track defaults close to `struct repo_settings`
repo-settings: split out declarations into a standalone header
environment: guard state depending on a repository
environment: reorder header to split out `the_repository`-free section
environment: move `set_git_dir()` and related into setup layer
environment: make `get_git_namespace()` self-contained
environment: move object database functions into object layer
config: make dependency on repo in `read_early_config()` explicit
config: document `read_early_config()` and `read_very_early_config()`
environment: make `get_git_work_tree()` accept a repository
environment: make `get_graft_file()` accept a repository
environment: make `get_index_file()` accept a repository
environment: make `get_object_directory()` accept a repository
environment: make `get_git_common_dir()` accept a repository
...
The `get_graft_file()` function retrieves the path to the graft file of
`the_repository`. Make it accept a `struct repository` such that it can
work on arbitrary repositories and make it part of the repository
subsystem. This reduces our reliance on `the_repository` and clarifies
scope.
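The general pattern of replacing a global with an explicit parameter
might be sketched like this. Everything here is hypothetical: the
`toy_repository` struct, its fields, and the `info/grafts` fallback path
are used purely for illustration and do not mirror the real function's
signature.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical repository handle. */
struct toy_repository {
	const char *gitdir;
	const char *graft_file;	/* optional override, may be NULL */
};

/*
 * Compute the graft file path for the given repository instead of
 * consulting a global. Returns 0 on success, -1 if `out` is too small.
 */
static int toy_repo_get_graft_file(const struct toy_repository *repo,
				   char *out, size_t outlen)
{
	int n;

	assert(repo && repo->gitdir && strlen(repo->gitdir) > 0);
	if (repo->graft_file)
		n = snprintf(out, outlen, "%s", repo->graft_file);
	else
		n = snprintf(out, outlen, "%s/info/grafts", repo->gitdir);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With the repository passed in, the same function can serve several
repositories in one process, which a global-based accessor cannot do.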
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not clear grafts that are part of the parsed object pool when clearing the
pool itself, which can lead to memory leaks when a repository is being
cleared.
Fix this by moving `reset_commit_grafts()` into "object.c" and making it
part of the `struct parsed_object_pool` interface such that we can call
it from `parsed_object_pool_clear()`. Adapt `parsed_object_pool_new()`
to take and store a reference to its owning repository, which is needed
by `unparse_commit()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The interfaces to retrieve signing keys and their IDs are misdesigned as
they return string constants even though they indeed allocate memory,
which leads to memory leaks. Refactor the code to instead always return
allocated strings and let the callers free them accordingly.
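The ownership rule described above could look roughly like the sketch
below. The function and its parameters are hypothetical and do not
reflect the real gpg-interface API; the sketch only shows an accessor
that returns a heap-allocated string on every path, so that callers can
free the result unconditionally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char *toy_xstrdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	memcpy(copy, s, len);
	return copy;
}

/*
 * Returns the configured key if there is one, else a fallback derived
 * from the committer identity. The result is always owned by the
 * caller, which is why the return type is `char *` rather than
 * `const char *`.
 */
static char *toy_get_signing_key(const char *configured, const char *committer)
{
	assert(committer != NULL);
	if (configured && *configured)
		return toy_xstrdup(configured);
	return toy_xstrdup(committer);
}
```

A caller then pairs every call with a `free()`, regardless of which branch
produced the string.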
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>