When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the
"creationToken" heuristic.
To reproduce this issue, consider a repository with a single branch
"main" pointing to commit "A". Firstly, create a base bundle with:

    git bundle create base.bundle main

Then, add a new commit "B" on top of "A", and create an incremental
bundle for "main":

    git bundle create incr.bundle A..main

Now, generate a bundle list with the following content:

    [bundle]
        version = 1
        mode = all
        heuristic = creationToken

    [bundle "base"]
        uri = base.bundle
        creationToken = 1

    [bundle "incr"]
        uri = incr.bundle
        creationToken = 2

A fresh clone with the bundle list above should result in a reference
"refs/bundles/main" pointing to "B" in the new repository. However, git
would still download everything from the server, as if it had fetched
nothing locally.
So why is "refs/bundles/main" not discovered? After some digging, I
found that:
1. Bundles in the bundle list are downloaded to local files via
   `bundle-uri.c:download_bundle_list` or via
   `bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
   heuristic.
2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which
   is called by `bundle-uri.c:unbundle_all_bundles` or called within
   `bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
   heuristic.
3. To get all prerequisites of the bundle, the bundle header is read
   inside `bundle-uri.c:unbundle_from_file` by calling
   `bundle.c:read_bundle_header`.
4. Then it calls `bundle.c:unbundle`, which calls
   `bundle.c:verify_bundle` to ensure the repository contains all the
   prerequisites.
5. `bundle.c:verify_bundle` calls `parse_object`, which eventually
   invokes `packfile.c:prepare_packed_git` or
   `packfile.c:reprepare_packed_git`, filling
   `raw_object_store->packed_git` and setting `packed_git_initialized`.
6. If `bundle.c:unbundle` succeeds, it writes refs via
   `refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here
   bundle refs, which can target arbitrary objects, are written to the
   repository.
7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions
   `fetch-pack.c:mark_complete_and_common_ref` and
   `fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to
   find local tips for negotiation. The `OBJECT_INFO_QUICK` flag
   prevents `packfile.c:reprepare_packed_git` from being called,
   resulting in failures to parse OIDs that reside only in the latest
   bundle.
In the example above, when unbundling "incr.bundle", "base.pack" is
added to `packed_git` due to prerequisite verification. However, "B"
cannot be found for negotiation because it exists in "incr.pack", which
is not included in `packed_git`.
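
The stale pack list described above can be modeled with a toy object
store. This is only a sketch and not Git's code: `toy_store`,
`toy_add_pack`, `toy_reprepare`, `toy_has_object` and `LOOKUP_QUICK` are
hypothetical names, and each pack is reduced to a name plus the single
object it holds. The only behavior assumed is the one described in the
steps above: a quick lookup consults the cached pack list and never
rescans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PACKS 8

/* Hypothetical flag standing in for OBJECT_INFO_QUICK. */
enum lookup_flags { LOOKUP_DEFAULT = 0, LOOKUP_QUICK = 1 };

/* A toy pack: a file name and the one object it contains. */
struct toy_pack {
	const char *name;
	const char *object;
};

struct toy_store {
	struct toy_pack on_disk[MAX_PACKS]; /* packs actually in the repository */
	size_t on_disk_nr;
	struct toy_pack cached[MAX_PACKS];  /* stand-in for the packed_git list */
	size_t cached_nr;
	bool initialized;                   /* stand-in for packed_git_initialized */
};

/* Unbundling writes a new pack to disk but leaves the cached list alone. */
void toy_add_pack(struct toy_store *s, const char *name, const char *object)
{
	assert(s->on_disk_nr < MAX_PACKS);
	s->on_disk[s->on_disk_nr].name = name;
	s->on_disk[s->on_disk_nr].object = object;
	s->on_disk_nr++;
}

/* Rescan the disk and replace the cached list with what is found. */
static void toy_reprepare(struct toy_store *s)
{
	memcpy(s->cached, s->on_disk, sizeof(s->on_disk));
	s->cached_nr = s->on_disk_nr;
	s->initialized = true;
}

static bool toy_find_cached(const struct toy_store *s, const char *object)
{
	for (size_t i = 0; i < s->cached_nr; i++)
		if (!strcmp(s->cached[i].object, object))
			return true;
	return false;
}

bool toy_has_object(struct toy_store *s, const char *object,
		    enum lookup_flags flags)
{
	if (!s->initialized)
		toy_reprepare(s);
	if (toy_find_cached(s, object))
		return true;
	if (flags & LOOKUP_QUICK)
		return false; /* a quick lookup never rescans the disk */
	toy_reprepare(s);
	return toy_find_cached(s, object);
}

/* Replays the example; reports whether a quick lookup can see "B". */
bool toy_replay_example(void)
{
	struct toy_store s = {0};

	toy_add_pack(&s, "base.pack", "A");           /* unbundle base.bundle */
	toy_has_object(&s, "A", LOOKUP_DEFAULT);      /* verify prerequisite "A" */
	toy_add_pack(&s, "incr.pack", "B");           /* unbundle incr.bundle */
	return toy_has_object(&s, "B", LOOKUP_QUICK); /* negotiation-style lookup */
}
```

In this model `toy_replay_example()` returns false: the cached list was
filled while verifying "A", before "incr.pack" existed, and the quick
lookup does not refresh it.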
Fix the bug by removing the `REF_SKIP_OID_VERIFICATION` flag when writing
bundle refs. When `refs.c:refs_update_ref` is called to write the
corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`.
This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls
`transaction_prepare` of the refs storage backend. For the files
backend, it is `files-backend.c:files_transaction_prepare`, and for the
reftable backend, it is
`reftable-backend.c:reftable_be_transaction_prepare`.
Both functions eventually call `object.c:parse_object`, which can invoke
`packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures
that bundle refs point to valid objects and that all tips from bundle
refs are correctly parsed during subsequent negotiations.
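
The effect of dropping the flag can be illustrated with a second toy
model. Again, this is a sketch under stated assumptions rather than
Git's implementation: `toy_repo`, `toy_update_ref`, `toy_tip_visible`,
`TOY_QUICK` and `TOY_REF_SKIP_VERIFY` are hypothetical names, objects
are small integer IDs, and the pack list is reduced to a bitmask. It
assumes only what the paragraph above states: a verifying ref update
performs a full (non-quick) object lookup, which may refresh the cached
pack list.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for OBJECT_INFO_QUICK and REF_SKIP_OID_VERIFICATION. */
enum { TOY_QUICK = 1 << 0 };
enum { TOY_REF_SKIP_VERIFY = 1 << 0 };

struct toy_repo {
	unsigned on_disk; /* bit i set: object i is in some pack on disk */
	unsigned cached;  /* bit i set: object i is in a pack already loaded */
	int ref_target;   /* object the single bundle ref points to, -1 if unset */
};

static bool toy_lookup(struct toy_repo *r, int obj, int flags)
{
	unsigned bit = 1u << obj;

	if (r->cached & bit)
		return true;
	if (flags & TOY_QUICK)
		return false;         /* quick lookups never rescan */
	r->cached = r->on_disk;       /* full lookup: refresh the cached view */
	return (r->cached & bit) != 0;
}

/* Returns 0 on success, -1 if verification rejects the target object. */
int toy_update_ref(struct toy_repo *r, int obj, int flags)
{
	if (!(flags & TOY_REF_SKIP_VERIFY) && !toy_lookup(r, obj, 0))
		return -1;
	r->ref_target = obj;
	return 0;
}

/* Reports whether the ref's target can be used as a negotiation tip. */
bool toy_tip_visible(struct toy_repo *r)
{
	return r->ref_target >= 0 && toy_lookup(r, r->ref_target, TOY_QUICK);
}

/* Replays the clone with the given ref-update flags. */
bool toy_clone_sees_tip(int ref_flags)
{
	enum { A = 0, B = 1 };
	struct toy_repo r = {
		.on_disk = (1u << A) | (1u << B), /* base.pack and incr.pack */
		.cached = 1u << A,                /* only base.pack was loaded */
		.ref_target = -1,
	};

	if (toy_update_ref(&r, B, ref_flags) < 0)
		return false;
	return toy_tip_visible(&r);
}
```

In this model `toy_clone_sees_tip(TOY_REF_SKIP_VERIFY)` returns false and
`toy_clone_sees_tip(0)` returns true. A verifying update that targets an
object absent from disk fails and leaves `ref_target` unset, which
mirrors the case of a bundle whose header names non-existent objects.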
A set of negotiation-related tests for cloning with bundle-uri has been
included to demonstrate that downloaded bundles are utilized to
accelerate fetching.
Additionally, another test has been added to show that bundles with
incorrect headers, where refs point to non-existent objects, do not
result in any bundle refs being created in the repository.
Reviewed-by: Karthik Nayak <karthik.188@gmail.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git - fast, scalable, distributed revision control system
Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.
Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.
Please read the file INSTALL for installation instructions.
Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.
See Documentation/gittutorial.txt to get started, then see
Documentation/giteveryday.txt for a useful minimum set of commands, and
Documentation/git-<commandname>.txt for documentation of each command.
If git has been correctly installed, then the tutorial can also be
read with man gittutorial or git help tutorial, and the
documentation of each command with man git-<commandname> or git help <commandname>.
CVS users may also want to read Documentation/gitcvs-migration.txt
(man gitcvs-migration or git help cvs-migration if git is
installed).
The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission and Documentation/CodingGuidelines).
Those wishing to help with error message, usage and informational message
string translations (localization, l10n) should see po/README.md
(a po file is a Portable Object file that holds the translations).
To subscribe to the list, send an email to git+subscribe@vger.kernel.org (see https://subspace.kernel.org/subscribing.html for details). The mailing list archives are available at https://lore.kernel.org/git/, https://marc.info/?l=git and other archival sites.
Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.
The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.
The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):
- random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
- stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
- "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
- "goddamn idiotic truckload of sh*t": when it breaks