Various parts of upload-pack has been updated to bound the resource
consumption relative to the size of the repository to protect from
abusive clients.
* jk/upload-pack-bounded-resources:
upload-pack: free tree buffers after parsing
upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places
upload-pack: always turn off save_commit_buffer
upload-pack: disallow object-info capability by default
upload-pack: accept only a single packfile-uri line
upload-pack: use a strmap for want-ref lines
upload-pack: use oidset for deepen_not list
upload-pack: switch deepen-not list to an oid_array
upload-pack: drop separate v2 "haves" array
A custom remote helper no longer cannot access the newly created
repository during "git clone", which is a regression in Git 2.44.
This has been corrected.
* ps/remote-helper-repo-initialization-fix:
builtin/clone: allow remote helpers to detect repo
"git commit -v --cleanup=scissors" used to add the scissors line
twice in the log message buffer, which has been corrected.
* jt/commit-redundant-scissors-fix:
commit: unify logic to avoid multiple scissors lines when merging
commit: avoid redundant scissor line with --cleanup=scissors -v
"git merge-tree" has learned that the three trees involved in the
3-way merge only need to be trees, not necessarily commits.
* js/merge-tree-3-trees:
fill_tree_descriptor(): mark error message for translation
cache-tree: avoid an unnecessary check
Always check `parse_tree*()`'s return value
t4301: verify that merge-tree fails on missing blob objects
merge-ort: do check `parse_tree()`'s return value
merge-tree: fail with a non-zero exit code on missing tree objects
merge-tree: accept 3 trees as arguments
"git rev-list --missing=print" has learned to optionally take
"--allow-missing-tips", which allows the objects at the starting
points to be missing.
* cc/rev-list-allow-missing-tips:
revision: fix --missing=[print|allow*] for annotated tags
rev-list: allow missing tips with --missing=[print|allow*]
t6022: fix 'test' style and 'even though' typo
oidset: refactor oidset_insert_from_set()
revision: clarify a 'return NULL' in get_reference()
"git for-each-ref" learned "--include-root-refs" option to show
even the stuff outside the 'refs/' hierarchy.
* kn/for-all-refs:
for-each-ref: add new option to include root refs
ref-filter: rename 'FILTER_REFS_ALL' to 'FILTER_REFS_REGULAR'
refs: introduce `refs_for_each_include_root_refs()`
refs: extract out `loose_fill_ref_dir_regular_file()`
refs: introduce `is_pseudoref()` and `is_headref()`
Many small allocations "git name-rev" makes have been updated to
allocate from a mem-pool.
* rs/name-rev-with-mempool:
name-rev: use mem_pool_strfmt()
mem-pool: add mem_pool_strfmt()
"git reflog" learned a "list" subcommand that enumerates known reflogs.
* ps/reflog-list:
builtin/reflog: introduce subcommand to list reflogs
refs: stop resolving ref corresponding to reflogs
refs: drop unused params from the reflog iterator callback
refs: always treat iterators as ordered
refs/files: sort merged worktree and common reflogs
refs/files: sort reflogs returned by the reflog iterator
dir-iterator: support iteration in sorted order
dir-iterator: pass name to `prepare_next_entry_data()` directly
The 'refresh' function in 'builtin/add.c' declares 'flags' as
signed, and passes it as an argument to the 'refresh_index'
function, which though expects an unsigned value.
Since in this case 'flags' represents a bag of bits, whose MSB is
not used in special ways, change the type of 'flags' to unsigned.
Signed-off-by: Eugenio Gigante <giganteeugenio2@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the client sends us "want $oid" lines, we call parse_object($oid)
to get an object struct. It's important to parse the commits because we
need to traverse them in the negotiation phase. But of course we don't
need to hold on to the commit messages for each one.
We've turned off the save_commit_buffer flag in get_common_commits() for
a long time, since f0243f26f6 (git-upload-pack: More efficient usage of
the has_sha1 array, 2005-10-28). That helps with the commits we see
while actually traversing. But:
1. That function is only used by the v0 protocol. I think the v2
protocol's code path leaves the flag on (and thus pays the extra
memory penalty), though I didn't measure it specifically.
2. If the client sends us a bunch of "want" lines, that happens before
the negotiation phase. So we'll hold on to all of those commit
messages. Generally the number of "want" lines scales with the
refs, not with the number of objects in the repo. But a malicious
client could send a lot in order to waste memory.
As an example of (2), if I generate a request to fetch all commits in
git.git like this:
pktline() {
local msg="$*"
printf "%04x%s\n" $((1+4+${#msg})) "$msg"
}
want_commits() {
pktline command=fetch
printf 0001
git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
while read oid type; do
test "$type" = "commit" || continue
pktline want $oid
done
pktline done
printf 0000
}
want_commits | GIT_PROTOCOL=version=2 valgrind --tool=massif git-upload-pack . >/dev/null
before this patch upload-pack peaks at ~125MB, and after at ~35MB. The
difference is not coincidentally about the same as the sum of all commit
object sizes as computed by:
git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectsize)' |
perl -alne '$v += $F[1] if $F[0] eq "commit"; END { print $v }'
In a larger repository like linux.git, that number is ~1GB.
In a repository with a full commit-graph file this will have no impact
(and the commit graph would save us from parsing at all, so is a much
better solution!). But it's easy to do, might help a little in
real-world cases (where even if you have a commit graph it might not be
fully up to date), and helps a lot for a worst-case malicious request.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git tag --column" failed to check the exit status of its "git
column" invocation, which has been corrected.
* rj/tag-column-fix:
tag: error when git-column fails
In 18c9cb7524 (builtin/clone: create the refdb with the correct object
format, 2023-12-12), we have changed git-clone(1) so that it delays
creation of the refdb until after it has learned about the remote's
object format. This change was required for the reftable backend, which
encodes the object format into the tables. So if we pre-initialized the
refdb with the default object format, but the remote uses a different
object format than that, then the resulting tables would have encoded
the wrong object format.
This change unfortunately breaks remote helpers which try to access the
repository that is about to be created. Because the refdb has not yet
been initialized at the point where we spawn the remote helper, we also
don't yet have "HEAD" or "refs/". Consequently, any Git commands ran by
the remote helper which try to access the repository would fail because
it cannot be discovered.
This is essentially a chicken-and-egg problem: we cannot initialize the
refdb because we don't know about the object format. But we cannot learn
about the object format because the remote helper may be unable to
access the partially-initialized repository.
Ideally, we would address this issue via capabilities. But the remote
helper protocol is not structured in a way that guarantees that the
capability announcement happens before the remote helper tries to access
the repository.
Instead, fix this issue by partially initializing the refdb up to the
point where it becomes discoverable by Git commands.
Reported-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
prepare_to_commit has some logic to figure out whether merge already
added a scissors line, and therefore it shouldn't add another. Now that
wt_status_add_cut_line has built-in state for whether it has
already added a previous line, just set that state instead, and then
remove that condition from subsequent calls to wt_status_add_cut_line.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git commit --cleanup=scissors -v` prints two scissors lines:
one at the start of the comment lines, and the other right before the
diff. This is redundant, and pushes the diff further down in the user's
editor than it needs to be.
Make wt_status_add_cut_line() remember if it has added a cut line before,
and avoid adding a redundant one.
Add a test for this.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "git checkout -p" and friends that "@" is a synonym for
"HEAD".
* gt/at-is-synonym-for-head-in-add-patch:
add -p tests: remove PERL prerequisites
add-patch: classify '@' as a synonym for 'HEAD'
"git column" has been taught to reject negative padding value, as
it would lead to nonsense behaviour including division by zero.
* kh/column-reject-negative-padding:
column: guard against negative padding
column: disallow negative padding
1c56fc2084 (name-rev: pre-size buffer in get_parent_name(), 2020-02-04)
got a big performance boost in an unusual repository by calculating the
name length in advance. This is a bit awkward, as it references the
name components twice.
Use a memory pool to store the strings for the struct rev_name member
tip_name. Using mem_pool_strfmt() allows efficient allocation without
explicit size calculation. This simplifies the formatting part of the
code without giving up performance:
Benchmark 1: ./git_2.44.0 -C ../chromium/src name-rev --all
Time (mean ± σ): 1.231 s ± 0.013 s [User: 1.082 s, System: 0.136 s]
Range (min … max): 1.214 s … 1.252 s 10 runs
Benchmark 2: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.220 s ± 0.020 s [User: 1.083 s, System: 0.130 s]
Range (min … max): 1.197 s … 1.254 s 10 runs
Don't bother discarding the memory pool just before exiting. The effort
for that would be very low, but actually measurable in the above
example, with no benefit to users. At least UNLEAK it to calm down leak
checkers. This addresses the leaks that 45a14f578e (Revert "name-rev:
release unused name strings", 2022-04-22) brought back.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using strncmp() and strlen() to check whether a string starts with
another one requires repeating the prefix candidate. Use starts_with()
instead, which reduces repetition and is more readable.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-for-each-ref(1) command doesn't provide a way to print root refs
i.e pseudorefs and HEAD with the regular "refs/" prefixed refs.
This commit adds a new option "--include-root-refs" to
git-for-each-ref(1). When used this would also print pseudorefs and HEAD
for the current worktree.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The flag 'FILTER_REFS_ALL' is a bit ambiguous, where ALL doesn't specify
if it means to contain refs from all worktrees or whether all types of
refs (regular, HEAD & pseudorefs) or all of the above.
Since here it is actually referring to all refs with the "refs/" prefix,
let's rename it to 'FILTER_REFS_REGULAR' to indicate that this is
specifically for regular refs.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Otherwise we may easily run into serious crashes: For example, if we run
`init_tree_desc()` directly after a failed `parse_tree()`, we are
accessing uninitialized data or trying to dereference `NULL`.
Note that the `parse_tree()` function already takes care of showing an
error message. The `parse_tree_indirectly()` and
`repo_get_commit_tree()` functions do not, therefore those latter call
sites need to show a useful error message while the former do not.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the git-reflog(1) command has subcommands to show reflog entries
or check for reflog existence, it does not have any subcommands that
would allow the user to enumerate all existing reflogs. This makes it
quite hard to discover which reflogs a repository has. While this can
be worked around with the "files" backend by enumerating files in the
".git/logs" directory, users of the "reftable" backend don't enjoy such
a luxury.
Introduce a new subcommand `git reflog list` that lists all reflogs the
repository knows of to fill this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref and reflog iterators share much of the same underlying code to
iterate over the corresponding entries. This results in some weird code
because the reflog iterator also exposes an object ID as well as a flag
to the callback function. Neither of these fields do refer to the reflog
though -- they refer to the corresponding ref with the same name. This
is quite misleading. In practice at least the object ID cannot really be
implemented in any other way as a reflog does not have a specific object
ID in the first place. This is further stressed by the fact that none of
the callbacks except for our test helper make use of these fields.
Split up the infrastucture so that ref and reflog iterators use separate
callback signatures. This allows us to drop the nonsensical fields from
the reflog iterator.
Note that internally, the backends still use the same shared infra to
iterate over both types. As the backends should never end up being
called directly anyway, this is not much of a problem and thus kept
as-is for simplicity's sake.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you run `git rebase --continue` when no rebase is in progress, git
outputs `fatal: No rebase in progress?` which is not a question but a
statement. Make it appear as a statement, and use lowercase to align
with error message style.
Signed-off-by: Harmen Stoppels <me@harmenstoppels.nl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code paths that call repo_read_object_file() have been
tightened to react to errors.
* js/check-null-from-read-object-file:
Always check the return value of `repo_read_object_file()`
Code simplification.
* rs/receive-pack-remove-find-header:
receive-pack: use find_commit_header() in check_nonce()
receive-pack: use find_commit_header() in check_cert_push_options()
If the user asks for the list of tags to be displayed in columns
("--columns"), a child git-column process is used to format the output
as expected.
In a rare situation where we encounter a problem spawning that child
process, we will work erroneously.
Make noticeable we're having a problem executing git-column, so the user
can act accordingly.
Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 9830926c7d (rev-list: add commit object support in `--missing`
option, 2023-10-27) we fixed the `--missing` option in `git rev-list`
so that it works with with missing commits, not just blobs/trees.
Unfortunately, such a command would still fail with a "fatal: bad
object <oid>" if it is passed a missing commit, blob or tree as an
argument (before the rev walking even begins).
When such a command is used to find the dependencies of some objects,
for example the dependencies of quarantined objects (see the
"QUARANTINE ENVIRONMENT" section in the git-receive-pack(1)
documentation), it would be better if the command would instead
consider such missing objects, especially commits, in the same way as
other missing objects.
If, for example `--missing=print` is used, it would be nice for some
use cases if the missing tips passed as arguments were reported in
the same way as other missing objects instead of the command just
failing.
We could introduce a new option to make it work like this, but most
users are likely to prefer the command to have this behavior as the
default one. Introducing a new option would require another dumb loop
to look for that option early, which isn't nice.
Also we made `git rev-list` work with missing commits very recently
and the command is most often passed commits as arguments. So let's
consider this as a bug fix related to these recent changes.
While at it let's add a NEEDSWORK comment to say that we should get
rid of the existing ugly dumb loops that parse the
`--exclude-promisor-objects` and `--missing=...` options early.
Helped-by: Linus Arver <linusa@google.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git stash" sometimes was silent even when it failed due to
unwritable index file, which has been corrected.
* ps/report-failure-from-git-stash:
builtin/stash: report failure to write to index
A failed "git tag -s" did not necessarily result in an error
depending on the crypto backend, which has been corrected.
* jc/sign-buffer-failure-propagation-fix:
ssh signing: signal an error with a negative return value
tag: fix sign_buffer() call to create a signed tag
Update to a new feature recently added, "git show-ref --exists".
* tc/show-ref-exists-fix:
builtin/show-ref: treat directory as non-existing in --exists
Currently, (restore, checkout, reset) commands correctly take '@' as a
synonym for 'HEAD'. However, in patch mode different prompts/messages
are given on command line due to patch mode machinery not considering
'@' to be a synonym for 'HEAD' due to literal string comparison with
the word 'HEAD', and therefore assigning patch_mode_($command)_nothead
and triggering reverse mode (-R in diff-index). The NEEDSWORK comment
suggested comparing commit objects to get around this. However, doing
so would also take a non-checked out branch pointing to the same commit
as HEAD, as HEAD. This would cause confusion to the user.
Therefore, after parsing '@', replace it with 'HEAD' as reasonably
early as possible. This also solves another problem of disparity
between 'git checkout HEAD' and 'git checkout @' (latter detaches at
the HEAD commit and the former does not).
Trade-offs:
- Some of the errors would show the revision argument as 'HEAD' when
given '@'. This should be fine, as most users who probably use '@'
would be aware that it is a shortcut for 'HEAD' and most probably
used to use 'HEAD'. There is also relevant documentation in
'gitrevisions' manpage about '@' being the shortcut for 'HEAD'. Also,
the simplicity of the solution far outweighs this cost.
- Consider '@' as a shortcut for 'HEAD' even if 'refs/heads/@' exists
at a different commit. Naming a branch '@' is an obvious foot-gun and
many existing commands already take '@' for 'HEAD' even if
'refs/heads/@' exists at a different commit or does not exist at all
(e.g. 'git log @', 'git push origin @' etc.). Therefore this is an
existing assumption and should not be a problem.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A negative padding does not make sense and can cause errors in the
memory allocator since it’s interpreted as an unsigned integer.
Reported-by: Tiago Pascoal <tiago@pascoal.net>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git show-ref --verify" did not show things like "CHERRY_PICK_HEAD",
which has been corrected.
* pw/show-ref-pseudorefs:
t1400: use show-ref to check pseudorefs
show-ref --verify: accept pseudorefs
"git stash" sometimes was silent even when it failed due to
unwritable index file, which has been corrected.
* ps/report-failure-from-git-stash:
builtin/stash: report failure to write to index
A failed "git tag -s" did not necessarily result in an error
depending on the crypto backend, which has been corrected.
* jc/sign-buffer-failure-propagation-fix:
ssh signing: signal an error with a negative return value
tag: fix sign_buffer() call to create a signed tag
Add and apply a semantic patch for calling xstrncmpz() to compare a
NUL-terminated string with a buffer of a known length instead of using
strncmp() and checking the terminating NUL explicitly. This simplifies
callers by reducing code duplication.
I had to adjust remote.c manually because Coccinelle inexplicably
changed the indent of the else branches.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the public function find_commit_header() and remove find_header(),
as it becomes unused. This is safe and appropriate because we pass the
NUL-terminated payload buffer to check_nonce() instead of its start and
length. The underlying strbuf push_cert cannot contain NULs, as it is
built using strbuf_addstr(), only.
We no longer need to call strlen(), as find_commit_header() returns the
length of nonce already.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the public function find_commit_header() instead of find_header() to
simplify the code. This is possible and safe because we're operating on
a strbuf, which is always NUL-terminated, so there is no risk of running
over the end of the buffer. It cannot contain NUL within the buffer, as
it is built using strbuf_addstr(), only.
The string comparison becomes more complicated because we need to check
for NUL explicitly after comparing the length-limited option, but on the
flip side we don't need to clean up allocations or track the remaining
buffer length.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove unused header "#include".
* en/header-cleanup:
treewide: remove unnecessary includes in source files
treewide: add direct includes currently only pulled in transitively
trace2/tr2_tls.h: remove unnecessary include
submodule-config.h: remove unnecessary include
pkt-line.h: remove unnecessary include
line-log.h: remove unnecessary include
http.h: remove unnecessary include
fsmonitor--daemon.h: remove unnecessary includes
blame.h: remove unnecessary includes
archive.h: remove unnecessary include
treewide: remove unnecessary includes in source files
treewide: remove unnecessary includes from header files
Doc updates to clarify what an "unborn branch" means.
* jc/orphan-unborn:
orphan/unborn: fix use of 'orphan' in end-user facing messages
orphan/unborn: add to the glossary and use them consistently
Code clean-up.
* la/trailer-cleanups:
trailer: use offsets for trailer_start/trailer_end
trailer: find the end of the log message
commit: ignore_non_trailer computes number of bytes to ignore