mirror of
https://github.com/git/git.git
synced 2026-02-27 18:29:43 +00:00
pack-check: fix verification of large objects
It was reported [1] that git-fsck(1) may sometimes run into an infinite
loop when processing packfiles. This bug was bisected to c31bad4f7d
(packfile: track packs via the MRU list exclusively, 2025-10-30), which
refactored our lsit of packfiles to only be tracked via an MRU list,
exclusively. This isn't entirely surprising: any caller that iterates
through the list of packfiles and then hits `find_pack_entry()`, for
example because they read an object from it, may cause the MRU list to
be updated. And if the caller is unlucky, this may cause the mentioned
infinite loop.
While this mechanism is somewhat fragile, it is still surprising that we
encounter it when verifying the packfile. We iterate through objects in
a given pack one by one and then read them via their offset, and doing
this shouldn't ever end up in `find_pack_entry()`.
But there is an edge case here: when the object in question is a blob
bigger than "core.largeFileThreshold", then we will be careful to not
read it into memory. Instead, we read it via an object stream by calling
`odb_read_object_stream()`, and that function will perform an object
lookup via `odb_read_object_info()`. So in the case where there are at
least two blobs in two different packfiles, and both of these blobs
exceed "core.largeFileThreshold", then we'll run into an infinite loop
because we'll always update the MRU.
We could fix this by improving `repo_for_each_pack()` to not update the
MRU, and this would address the issue. But the fun part is that using
`odb_read_object_stream()` is the wrong thing to do in the first place:
it may open _any_ instance of this object, so we ultimately cannot be
sure that we even verified the object in our given packfile.
Fix this bug by creating the object stream for the packed object
directly via `packfile_read_object_stream()`. Add a test that would have
caused the infinite loop.
[1]: <20260222183710.2963424-1-sandals@crustytoothpaste.net>
Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
committed by
Junio C Hamano
parent
41b42e3527
commit
13eb65d366
@@ -155,7 +155,7 @@ static int verify_packfile(struct repository *r,
|
||||
err = error("packed %s from %s is corrupt",
|
||||
oid_to_hex(&oid), p->pack_name);
|
||||
else if (!data &&
|
||||
(!(stream = odb_read_stream_open(r->objects, &oid, NULL)) ||
|
||||
(packfile_read_object_stream(&stream, &oid, p, entries[i].offset) < 0 ||
|
||||
stream_object_signature(r, stream, &oid) < 0))
|
||||
err = error("packed %s from %s is corrupt",
|
||||
oid_to_hex(&oid), p->pack_name);
|
||||
|
||||
@@ -852,6 +852,44 @@ test_expect_success 'fsck errors in packed objects' '
|
||||
! grep corrupt out
|
||||
'
|
||||
|
||||
test_expect_success 'fsck handles multiple packfiles with big blobs' '
|
||||
test_when_finished "rm -rf repo" &&
|
||||
git init repo &&
|
||||
(
|
||||
cd repo &&
|
||||
|
||||
# We construct two packfiles with two objects in common and one
|
||||
# object not in common. The objects in common can then be
|
||||
# corrupted in one of the packfiles, respectively. The other
|
||||
# objects that are unique to the packs are merely used to not
|
||||
# have both packs contain the same data.
|
||||
blob_one=$(test-tool genrandom one 200k | git hash-object -t blob -w --stdin) &&
|
||||
blob_two=$(test-tool genrandom two 200k | git hash-object -t blob -w --stdin) &&
|
||||
blob_three=$(test-tool genrandom three 200k | git hash-object -t blob -w --stdin) &&
|
||||
blob_four=$(test-tool genrandom four 200k | git hash-object -t blob -w --stdin) &&
|
||||
pack_one=$(printf "%s\n" "$blob_one" "$blob_two" "$blob_three" | git pack-objects .git/objects/pack/pack) &&
|
||||
pack_two=$(printf "%s\n" "$blob_two" "$blob_three" "$blob_four" | git pack-objects .git/objects/pack/pack) &&
|
||||
chmod a+w .git/objects/pack/pack-*.pack &&
|
||||
|
||||
# Corrupt blob two in the first pack.
|
||||
git verify-pack -v .git/objects/pack/pack-$pack_one >objects &&
|
||||
offset_one=$(sed <objects -n "s/^$blob_two .* \(.*\)$/\1/p") &&
|
||||
printf "\0" | dd of=.git/objects/pack/pack-$pack_one.pack bs=1 conv=notrunc seek=$offset_one &&
|
||||
|
||||
# Corrupt blob three in the second pack.
|
||||
git verify-pack -v .git/objects/pack/pack-$pack_two >objects &&
|
||||
offset_two=$(sed <objects -n "s/^$blob_three .* \(.*\)$/\1/p") &&
|
||||
printf "\0" | dd of=.git/objects/pack/pack-$pack_two.pack bs=1 conv=notrunc seek=$offset_two &&
|
||||
|
||||
# We now expect to see two failures for the corrupted objects,
|
||||
# even though they exist in a non-corrupted form in the
|
||||
# respective other pack.
|
||||
test_must_fail git -c core.bigFileThreshold=100k fsck 2>err &&
|
||||
test_grep "unknown object type 0 at offset $offset_one in .git/objects/pack/pack-$pack_one.pack" err &&
|
||||
test_grep "unknown object type 0 at offset $offset_two in .git/objects/pack/pack-$pack_two.pack" err
|
||||
)
|
||||
'
|
||||
|
||||
test_expect_success 'fsck fails on corrupt packfile' '
|
||||
hsh=$(git commit-tree -m mycommit HEAD^{tree}) &&
|
||||
pack=$(echo $hsh | git pack-objects .git/objects/pack/pack) &&
|
||||
|
||||
Reference in New Issue
Block a user