From 5476fb07ebcf389752acef0a894a1eea1ff13ea9 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Sat, 1 Sep 2018 03:44:48 -0400 Subject: [PATCH 1/4] bitmap_has_sha1_in_uninteresting(): drop BUG check Commit 30cdc33fba (pack-bitmap: save "have" bitmap from walk, 2018-08-21) introduced a new function for looking at the "have" side of a bitmap walk. Because it only makes sense to do so after we've finished the walk, we added an extra safety assertion, making sure that bitmap_git->result is non-NULL. However, this safety is misguided. It was trying to catch the case where we had called prepare_bitmap_walk() to give us a "struct bitmap_index", but had not yet called traverse_bitmap_commit_list() to walk it. But all of the interesting computation (including setting up the result and "have" bitmaps) happens in the first function! The latter function only delivers the result to a callback function. So the case we were worried about is impossible; if you get a non-NULL result from prepare_bitmap_walk(), then its "have" field will be fully formed. But much worse, traverse_bitmap_commit_list() actually frees the result field as it finishes. Which means that this assertion is worse than useless: it's almost guaranteed to trigger! Our test suite didn't catch this because the function isn't actually exercised at all. The only caller comes from 6a1e32d532 (pack-objects: reuse on-disk deltas for thin "have" objects, 2018-08-21), and that's triggered only when you fetch or push history that contains an object with a base that is found deep in history. Our test suite fetches and pushes either don't use bitmaps, or use too-small example repositories. But any reasonably-sized real-world push or fetch (with bitmaps) would trigger this. This patch drops the harmful assertion and tweaks the docstring for the function to make the precondition clear. The tests need to be improved to exercise this new pack-objects feature, but we'll do that in a separate commit. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- pack-bitmap.c | 2 -- pack-bitmap.h | 2 +- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index c3231ef9ef..76fd93a3de 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1128,8 +1128,6 @@ int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git, if (!bitmap_git) return 0; /* no bitmap loaded */ - if (!bitmap_git->result) - BUG("failed to perform bitmap walk before querying"); if (!bitmap_git->haves) return 0; /* walk had no "haves" */ diff --git a/pack-bitmap.h b/pack-bitmap.h index c633bf5238..189dd68ad3 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -54,7 +54,7 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping void free_bitmap_index(struct bitmap_index *); /* - * After a traversal has been performed on the bitmap_index, this can be + * After a traversal has been performed by prepare_bitmap_walk(), this can be * queried to see if a particular object was reachable from any of the * objects flagged as UNINTERESTING. */ From c0d61dfc0b248f100987d92545016bce71f92663 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Sat, 1 Sep 2018 03:48:13 -0400 Subject: [PATCH 2/4] t5310: test delta reuse with bitmaps Commit 6a1e32d532 (pack-objects: reuse on-disk deltas for thin "have" objects, 2018-08-21) taught pack-objects a new optimization trick. Since this wasn't meant to change user-visible behavior, but only produce smaller packs more quickly, testing focused on t/perf/p5311. However, since people don't run perf tests very often, we should make sure that the feature is exercised in the regular test suite. This patch does so. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/t5310-pack-bitmaps.sh | 93 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 93 insertions(+) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 557bd0d0c0..af38776054 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -342,4 +342,97 @@ test_expect_success 'truncated bitmap fails gracefully' ' test_i18ngrep corrupt stderr ' +# have_delta +# +# Note that because this relies on cat-file, it might find _any_ copy of an +# object in the repository. The caller is responsible for making sure +# there's only one (e.g., via "repack -ad", or having just fetched a copy). +have_delta () { + echo $2 >expect && + echo $1 | git cat-file --batch-check="%(deltabase)" >actual && + test_cmp expect actual +} + +# Create a state of history with these properties: +# +# - refs that allow a client to fetch some new history, while sharing some old +# history with the server; we use branches delta-reuse-old and +# delta-reuse-new here +# +# - the new history contains an object that is stored on the server as a delta +# against a base that is in the old history +# +# - the base object is not immediately reachable from the tip of the old +# history; finding it would involve digging down through history we know the +# other side has +# +# This should result in a state where fetching from old->new would not +# traditionally reuse the on-disk delta (because we'd have to dig to realize +# that the client has it), but we will do so if bitmaps can tell us cheaply +# that the other side has it. +test_expect_success 'set up thin delta-reuse parent' ' + # This first commit contains the buried base object. + test-tool genrandom delta 16384 >file && + git add file && + git commit -m "delta base" && + base=$(git rev-parse --verify HEAD:file) && + + # These intermediate commits bury the base back in history. + # This becomes the "old" state. + for i in 1 2 3 4 5 + do + echo $i >file && + git commit -am "intermediate $i" || return 1 + done && + git branch delta-reuse-old && + + # And now our new history has a delta against the buried base. Note + # that this must be smaller than the original file, since pack-objects + # prefers to create deltas from smaller objects to larger. + test-tool genrandom delta 16300 >file && + git commit -am "delta result" && + delta=$(git rev-parse --verify HEAD:file) && + git branch delta-reuse-new && + + # Repack with bitmaps and double check that we have the expected delta + # relationship. + git repack -adb && + have_delta $delta $base +' + +# Now we can sanity-check the non-bitmap behavior (that the server is not able +# to reuse the delta). This isn't strictly something we care about, so this +# test could be scrapped in the future. But it makes sure that the next test is +# actually triggering the feature we want. +# +# Note that our tools for working with on-the-wire "thin" packs are limited. So +# we actually perform the fetch, retain the resulting pack, and inspect the +# result. +test_expect_success 'fetch without bitmaps ignores delta against old base' ' + test_config pack.usebitmaps false && + test_when_finished "rm -rf client.git" && + git init --bare client.git && + ( + cd client.git && + git config transfer.unpackLimit 1 && + git fetch .. delta-reuse-old:delta-reuse-old && + git fetch .. delta-reuse-new:delta-reuse-new && + have_delta $delta $ZERO_OID + ) +' + +# And do the same for the bitmap case, where we do expect to find the delta. +test_expect_success 'fetch with bitmaps can reuse old base' ' + test_config pack.usebitmaps true && + test_when_finished "rm -rf client.git" && + git init --bare client.git && + ( + cd client.git && + git config transfer.unpackLimit 1 && + git fetch .. delta-reuse-old:delta-reuse-old && + git fetch .. delta-reuse-new:delta-reuse-new && + have_delta $delta $base + ) +' + test_done From 715d0c50e1ba479290d9fcb34119d8f9a09bfed3 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Sat, 1 Sep 2018 03:49:48 -0400 Subject: [PATCH 3/4] traverse_bitmap_commit_list(): don't free result Since it was introduced in fff42755ef (pack-bitmap: add support for bitmap indexes, 2013-12-21), this function has freed the result after traversing it. That is an artifact of the early days of the bitmap code, when we had a single static "struct bitmap_index". Back then, it was intended that you would do: prepare_bitmap_walk(&revs); traverse_bitmap_commit_list(&revs); Since the actual bitmap_index struct was totally behind the scenes, it was convenient for traverse_bitmap_commit_list() to clean it up, clearing the way for another traversal. But since 3ae5fa0768 (pack-bitmap: remove bitmap_git global variable, 2018-06-07), the caller explicitly manages the bitmap_index struct itself, like this: b = prepare_bitmap_walk(&revs); traverse_bitmap_commit_list(b, &revs); free_bitmap_index(b); It no longer makes sense to auto-free the result after the traversal. If you want to do another traversal, you'd just create a new bitmap_index. And while nobody tries to call traverse_bitmap_commit_list() twice, the fact that it throws away the result might be surprising, and is better avoided. Note that in the "old" way it was possible for two walks to amortize the cost of opening the on-disk .bitmap file (since it was stored in the global bitmap_index), but we lost that in 3ae5fa0768. However, no code actually does this, so it's not worth addressing now. The solution might involve a new: reset_bitmap_walk(b, &revs); call. Or we might even attach the bitmap data to its matching packed_git struct, so that multiple prepare_bitmap_walk() calls could use it. That can wait until somebody actually has need of the optimization (and until then, we'll do the correct, unsurprising thing). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- pack-bitmap.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 76fd93a3de..8c1af1cca2 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -848,9 +848,6 @@ void traverse_bitmap_commit_list(struct bitmap_index *bitmap_git, OBJ_TAG, show_reachable); show_extended_objects(bitmap_git, show_reachable); - - bitmap_free(bitmap_git->result); - bitmap_git->result = NULL; } static uint32_t count_object_type(struct bitmap_index *bitmap_git, From 199c86be1623ab5053b4ed0ce6ed2ae974e3e859 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Sat, 1 Sep 2018 03:50:57 -0400 Subject: [PATCH 4/4] pack-bitmap: drop "loaded" flag In the early days of the bitmap code, there was a single static bitmap_index struct that was used behind the scenes, and any bitmap-related functions could lazily check bitmap_git.loaded to see if they needed to read the on-disk data. But since 3ae5fa0768 (pack-bitmap: remove bitmap_git global variable, 2018-06-07), the caller is responsible for the lifetime of the bitmap_index struct, and we return it from prepare_bitmap_git() and prepare_bitmap_walk(), both of which load the on-disk data (or return NULL). So outside of these functions, it's not possible to have a bitmap_index for which the loaded flag is not true. Nor is it possible to accidentally pass an already-loaded bitmap_index to the loading function (which is static-local to the file). We can drop this unnecessary and confusing flag. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- pack-bitmap.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 8c1af1cca2..40debd5e20 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -91,8 +91,6 @@ struct bitmap_index { /* Version of the bitmap index */ unsigned int version; - - unsigned loaded : 1; }; static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st) @@ -306,7 +304,7 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git static int load_pack_bitmap(struct bitmap_index *bitmap_git) { - assert(bitmap_git->map && !bitmap_git->loaded); + assert(bitmap_git->map); bitmap_git->bitmaps = kh_init_sha1(); bitmap_git->ext_index.positions = kh_init_sha1_pos(); @@ -321,7 +319,6 @@ static int load_pack_bitmap(struct bitmap_index *bitmap_git) if (load_bitmap_entries_v1(bitmap_git) < 0) goto failed; - bitmap_git->loaded = 1; return 0; failed: @@ -336,7 +333,7 @@ static int open_pack_bitmap(struct bitmap_index *bitmap_git) struct packed_git *p; int ret = -1; - assert(!bitmap_git->map && !bitmap_git->loaded); + assert(!bitmap_git->map); for (p = get_packed_git(the_repository); p; p = p->next) { if (open_pack_bitmap_1(bitmap_git, p) == 0) @@ -738,7 +735,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) * from disk. this is the point of no return; after this the rev_list * becomes invalidated and we must perform the revwalk through bitmaps */ - if (!bitmap_git->loaded && load_pack_bitmap(bitmap_git) < 0) + if (load_pack_bitmap(bitmap_git) < 0) goto cleanup; object_array_clear(&revs->pending);