mirror of https://github.com/git/git.git
synced 2026-01-10 10:13:33 +00:00

Merge branch 'ds/backfill'

Lazy-loading missing files in a blobless clone on demand is costly as it
tends to be one-blob-at-a-time. "git backfill" is introduced to help
bulk-download necessary files beforehand.

* ds/backfill:
  backfill: assume --sparse when sparse-checkout is enabled
  backfill: add --sparse option
  backfill: add --min-batch-size=<n> option
  backfill: basic functionality and tests
  backfill: add builtin boilerplate
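The gap this series fills can be seen concretely with a sketch (not part of the commit; repository names, file contents, and identities below are invented): build a throwaway repository, make a blobless clone of it, and count the blobs left missing — the objects `git backfill` is meant to bulk-download. The final step is shown only as a comment, since `git backfill` ships in newer Git.

```shell
#!/bin/sh
# Sketch: how many blobs does a blobless partial clone leave missing?
set -e
tmp=$(mktemp -d) && cd "$tmp"

git init -q src
echo one >src/a.txt
echo two >src/b.txt
git -C src add .
git -C src -c user.name=A -c user.email=a@example.com commit -qm first
echo three >src/a.txt
git -C src add .
git -C src -c user.name=A -c user.email=a@example.com commit -qm second

# Allow partial clones from this repository.
git -C src config uploadpack.allowfilter true

# Blobless clone: only commits and trees are transferred up front.
git clone -q --no-checkout --filter=blob:none "file://$tmp/src" clone1

# Every historical blob is still missing locally (3 here: two versions
# of a.txt plus one of b.txt).
missing=$(git -C clone1 rev-list --objects --missing=print HEAD | grep -c '^?')
echo "missing blobs: $missing"

# With this series, one command batch-downloads them:
#   git -C clone1 backfill
```

Without backfill, each of those blobs would otherwise be fetched one at a time, on demand, by whatever command first needs it.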
.gitignore | 1 +

@@ -19,6 +19,7 @@
 /git-apply
 /git-archimport
 /git-archive
+/git-backfill
 /git-bisect
 /git-blame
 /git-branch
Documentation/git-backfill.adoc | 71 (new file)

git-backfill(1)
===============

NAME
----
git-backfill - Download missing objects in a partial clone

SYNOPSIS
--------
[synopsis]
git backfill [--min-batch-size=<n>] [--[no-]sparse]

DESCRIPTION
-----------

Blobless partial clones are created using `git clone --filter=blob:none`,
which configures the local repository so that the Git client avoids
downloading blob objects unless they are required for a local operation.
This initially means that the clone and later fetches download reachable
commits and trees but no blobs. Later operations that change the `HEAD`
pointer, such as `git checkout` or `git merge`, may need to download
missing blobs in order to complete their operation.

In the worst cases, commands that compute blob diffs, such as `git blame`,
become very slow as they download the missing blobs in single-blob
requests, one for each object the Git command needs. This leads to
multiple download requests and no ability for the Git server to provide
delta compression across those objects.

The `git backfill` command provides a way for the user to request that
Git downloads the missing blobs (with optional filters) such that the
missing blobs representing historical versions of files can be downloaded
in batches. The `backfill` command attempts to optimize the request by
grouping blobs that appear at the same path, hopefully leading to good
delta compression in the packfile sent by the server.

In this way, `git backfill` provides a mechanism to break a large clone
into smaller chunks. Starting with a blobless partial clone with `git
clone --filter=blob:none` and then running `git backfill` in the local
repository provides a way to download all reachable objects in several
smaller network calls rather than downloading the entire repository at
clone time.

By default, `git backfill` downloads all blobs reachable from the `HEAD`
commit. This set can be restricted or expanded using various options.

THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR MAY CHANGE IN THE FUTURE.

OPTIONS
-------

`--min-batch-size=<n>`::
	Specify a minimum size for a batch of missing objects to request
	from the server. This size may be exceeded by the last set of
	blobs seen at a given path. The default minimum batch size is
	50,000.

`--[no-]sparse`::
	Only download objects if they appear at a path that matches the
	current sparse-checkout. If the sparse-checkout feature is enabled,
	then `--sparse` is assumed and can be disabled with `--no-sparse`.

SEE ALSO
--------
linkgit:git-clone[1].

GIT
---
Part of the linkgit:git[1] suite
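The batching rule described under `--min-batch-size` can be modeled in a few lines of shell (a toy simulation, not Git code): per-path lists of missing blobs are appended whole before the size check, so a batch can overshoot the minimum, and whatever remains when the walk ends is fetched as a final, possibly undersized batch. With 24 paths of 2 blobs each and a minimum of 20, this reproduces the 20/20/8 split that the new t5620 test expects in its traces.

```shell
#!/bin/sh
# Toy model of backfill's flush rule (illustrative numbers only).
min=20
batch=0
fetches=""

# Pretend the path-walk emits 24 paths with 2 missing blobs each (48 total).
i=0
while [ $i -lt 24 ]
do
	# A whole per-path list is appended before the size check...
	batch=$((batch + 2))
	# ...then the batch is flushed once it reaches the minimum.
	if [ "$batch" -ge "$min" ]
	then
		fetches="$fetches $batch"
		batch=0
	fi
	i=$((i + 1))
done

# Flush the remainder once the walk is done.
if [ "$batch" -gt 0 ]
then
	fetches="$fetches $batch"
fi

echo "fetch sizes:$fetches"
```

Grouping per path before flushing is what gives the server a chance at good delta compression: each batch tends to hold many versions of the same files.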
@@ -6,6 +6,7 @@ manpages = {
   'git-apply.adoc' : 1,
   'git-archimport.adoc' : 1,
   'git-archive.adoc' : 1,
+  'git-backfill.adoc' : 1,
   'git-bisect.adoc' : 1,
   'git-blame.adoc' : 1,
   'git-branch.adoc' : 1,
@@ -56,8 +56,17 @@ better off using the revision walk API instead.
 	the revision walk so that the walk emits commits marked with the
 	`UNINTERESTING` flag.
 
+`pl`::
+	This pattern list pointer allows focusing the path-walk search to
+	a set of patterns, only emitting paths that match the given
+	patterns. See linkgit:gitignore[5] or
+	linkgit:git-sparse-checkout[1] for details about pattern lists.
+	When the pattern list uses cone-mode patterns, then the path-walk
+	API can prune the set of paths it walks to improve performance.
+
 Examples
 --------
 
 See example usages in:
-	`t/helper/test-path-walk.c`
+	`t/helper/test-path-walk.c`,
+	`builtin/backfill.c`
Makefile | 1 +

@@ -1212,6 +1212,7 @@ BUILTIN_OBJS += builtin/am.o
 BUILTIN_OBJS += builtin/annotate.o
 BUILTIN_OBJS += builtin/apply.o
 BUILTIN_OBJS += builtin/archive.o
+BUILTIN_OBJS += builtin/backfill.o
 BUILTIN_OBJS += builtin/bisect.o
 BUILTIN_OBJS += builtin/blame.o
 BUILTIN_OBJS += builtin/branch.o
@@ -120,6 +120,7 @@ int cmd_am(int argc, const char **argv, const char *prefix, struct repository *r
 int cmd_annotate(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_apply(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_archive(int argc, const char **argv, const char *prefix, struct repository *repo);
+int cmd_backfill(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_bisect(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_blame(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_branch(int argc, const char **argv, const char *prefix, struct repository *repo);
147
builtin/backfill.c
Normal file
147
builtin/backfill.c
Normal file
@@ -0,0 +1,147 @@
|
|||||||
|
/* We need this macro to access core_apply_sparse_checkout */
|
||||||
|
#define USE_THE_REPOSITORY_VARIABLE
|
||||||
|
|
||||||
|
#include "builtin.h"
|
||||||
|
#include "git-compat-util.h"
|
||||||
|
#include "config.h"
|
||||||
|
#include "parse-options.h"
|
||||||
|
#include "repository.h"
|
||||||
|
#include "commit.h"
|
||||||
|
#include "dir.h"
|
||||||
|
#include "environment.h"
|
||||||
|
#include "hex.h"
|
||||||
|
#include "tree.h"
|
||||||
|
#include "tree-walk.h"
|
||||||
|
#include "object.h"
|
||||||
|
#include "object-store-ll.h"
|
||||||
|
#include "oid-array.h"
|
||||||
|
#include "oidset.h"
|
||||||
|
#include "promisor-remote.h"
|
||||||
|
#include "strmap.h"
|
||||||
|
#include "string-list.h"
|
||||||
|
#include "revision.h"
|
||||||
|
#include "trace2.h"
|
||||||
|
#include "progress.h"
|
||||||
|
#include "packfile.h"
|
||||||
|
#include "path-walk.h"
|
||||||
|
|
||||||
|
static const char * const builtin_backfill_usage[] = {
|
||||||
|
N_("git backfill [--min-batch-size=<n>] [--[no-]sparse]"),
|
||||||
|
NULL
|
||||||
|
};
|
||||||
|
|
||||||
|
struct backfill_context {
|
||||||
|
struct repository *repo;
|
||||||
|
struct oid_array current_batch;
|
||||||
|
size_t min_batch_size;
|
||||||
|
int sparse;
|
||||||
|
};
|
||||||
|
|
||||||
|
static void backfill_context_clear(struct backfill_context *ctx)
|
||||||
|
{
|
||||||
|
oid_array_clear(&ctx->current_batch);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void download_batch(struct backfill_context *ctx)
|
||||||
|
{
|
||||||
|
promisor_remote_get_direct(ctx->repo,
|
||||||
|
ctx->current_batch.oid,
|
||||||
|
ctx->current_batch.nr);
|
||||||
|
oid_array_clear(&ctx->current_batch);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* We likely have a new packfile. Add it to the packed list to
|
||||||
|
* avoid possible duplicate downloads of the same objects.
|
||||||
|
*/
|
||||||
|
reprepare_packed_git(ctx->repo);
|
||||||
|
}
|
||||||
|
|
||||||
|
static int fill_missing_blobs(const char *path UNUSED,
|
||||||
|
struct oid_array *list,
|
||||||
|
enum object_type type,
|
||||||
|
void *data)
|
||||||
|
{
|
||||||
|
struct backfill_context *ctx = data;
|
||||||
|
|
||||||
|
if (type != OBJ_BLOB)
|
||||||
|
return 0;
|
||||||
|
|
||||||
|
for (size_t i = 0; i < list->nr; i++) {
|
||||||
|
if (!has_object(ctx->repo, &list->oid[i],
|
||||||
|
OBJECT_INFO_FOR_PREFETCH))
|
||||||
|
oid_array_append(&ctx->current_batch, &list->oid[i]);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (ctx->current_batch.nr >= ctx->min_batch_size)
|
||||||
|
download_batch(ctx);
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int do_backfill(struct backfill_context *ctx)
|
||||||
|
{
|
||||||
|
struct rev_info revs;
|
||||||
|
struct path_walk_info info = PATH_WALK_INFO_INIT;
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
if (ctx->sparse) {
|
||||||
|
CALLOC_ARRAY(info.pl, 1);
|
||||||
|
if (get_sparse_checkout_patterns(info.pl)) {
|
||||||
|
path_walk_info_clear(&info);
|
||||||
|
return error(_("problem loading sparse-checkout"));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
repo_init_revisions(ctx->repo, &revs, "");
|
||||||
|
handle_revision_arg("HEAD", &revs, 0, 0);
|
||||||
|
|
||||||
|
info.blobs = 1;
|
||||||
|
info.tags = info.commits = info.trees = 0;
|
||||||
|
|
||||||
|
info.revs = &revs;
|
||||||
|
info.path_fn = fill_missing_blobs;
|
||||||
|
info.path_fn_data = ctx;
|
||||||
|
|
||||||
|
ret = walk_objects_by_path(&info);
|
||||||
|
|
||||||
|
/* Download the objects that did not fill a batch. */
|
||||||
|
if (!ret)
|
||||||
|
download_batch(ctx);
|
||||||
|
|
||||||
|
path_walk_info_clear(&info);
|
||||||
|
release_revisions(&revs);
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
int cmd_backfill(int argc, const char **argv, const char *prefix, struct repository *repo)
|
||||||
|
{
|
||||||
|
int result;
|
||||||
|
struct backfill_context ctx = {
|
||||||
|
.repo = repo,
|
||||||
|
.current_batch = OID_ARRAY_INIT,
|
||||||
|
.min_batch_size = 50000,
|
||||||
|
.sparse = 0,
|
||||||
|
};
|
||||||
|
struct option options[] = {
|
||||||
|
OPT_INTEGER(0, "min-batch-size", &ctx.min_batch_size,
|
||||||
|
N_("Minimum number of objects to request at a time")),
|
||||||
|
OPT_BOOL(0, "sparse", &ctx.sparse,
|
||||||
|
N_("Restrict the missing objects to the current sparse-checkout")),
|
||||||
|
OPT_END(),
|
||||||
|
};
|
||||||
|
|
||||||
|
show_usage_with_options_if_asked(argc, argv,
|
||||||
|
builtin_backfill_usage, options);
|
||||||
|
|
||||||
|
argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
|
||||||
|
0);
|
||||||
|
|
||||||
|
repo_config(repo, git_default_config, NULL);
|
||||||
|
|
||||||
|
if (ctx.sparse < 0)
|
||||||
|
ctx.sparse = core_apply_sparse_checkout;
|
||||||
|
|
||||||
|
result = do_backfill(&ctx);
|
||||||
|
backfill_context_clear(&ctx);
|
||||||
|
return result;
|
||||||
|
}
|
||||||
@@ -60,6 +60,7 @@ git-annotate                            ancillaryinterrogators
 git-apply                               plumbingmanipulators            complete
 git-archimport                          foreignscminterface
 git-archive                             mainporcelain
+git-backfill                            mainporcelain           history
 git-bisect                              mainporcelain           info
 git-blame                               ancillaryinterrogators          complete
 git-branch                              mainporcelain           history
dir.c | 10 +++-------

@@ -1093,10 +1093,6 @@ static void invalidate_directory(struct untracked_cache *uc,
 		dir->dirs[i]->recurse = 0;
 }
 
-static int add_patterns_from_buffer(char *buf, size_t size,
-				    const char *base, int baselen,
-				    struct pattern_list *pl);
-
 /* Flags for add_patterns() */
 #define PATTERN_NOFOLLOW (1<<0)
 
@@ -1186,9 +1182,9 @@ static int add_patterns(const char *fname, const char *base, int baselen,
 	return 0;
 }
 
-static int add_patterns_from_buffer(char *buf, size_t size,
-				    const char *base, int baselen,
-				    struct pattern_list *pl)
+int add_patterns_from_buffer(char *buf, size_t size,
+			     const char *base, int baselen,
+			     struct pattern_list *pl)
 {
 	char *orig = buf;
 	int i, lineno = 1;
dir.h | 3 +++

@@ -467,6 +467,9 @@ void add_patterns_from_file(struct dir_struct *, const char *fname);
 int add_patterns_from_blob_to_list(struct object_id *oid,
 				   const char *base, int baselen,
 				   struct pattern_list *pl);
+int add_patterns_from_buffer(char *buf, size_t size,
+			     const char *base, int baselen,
+			     struct pattern_list *pl);
 void parse_path_pattern(const char **string, int *patternlen, unsigned *flags, int *nowildcardlen);
 void add_pattern(const char *string, const char *base,
 		 int baselen, struct pattern_list *pl, int srcpos);
git.c | 1 +

@@ -506,6 +506,7 @@ static struct cmd_struct commands[] = {
 	{ "annotate", cmd_annotate, RUN_SETUP },
 	{ "apply", cmd_apply, RUN_SETUP_GENTLY },
 	{ "archive", cmd_archive, RUN_SETUP_GENTLY },
+	{ "backfill", cmd_backfill, RUN_SETUP },
 	{ "bisect", cmd_bisect, RUN_SETUP },
 	{ "blame", cmd_blame, RUN_SETUP },
 	{ "branch", cmd_branch, RUN_SETUP | DELAY_PAGER_CONFIG },
@@ -510,6 +510,7 @@ builtin_sources = [
   'builtin/annotate.c',
   'builtin/apply.c',
   'builtin/archive.c',
+  'builtin/backfill.c',
   'builtin/bisect.c',
   'builtin/blame.c',
   'builtin/branch.c',
path-walk.c | 28 ++++++++++++++++++++++++----

@@ -12,6 +12,7 @@
 #include "object.h"
 #include "oid-array.h"
 #include "prio-queue.h"
+#include "repository.h"
 #include "revision.h"
 #include "string-list.h"
 #include "strmap.h"
@@ -172,6 +173,23 @@ static int add_tree_entries(struct path_walk_context *ctx,
 		if (type == OBJ_TREE)
 			strbuf_addch(&path, '/');
 
+		if (ctx->info->pl) {
+			int dtype;
+			enum pattern_match_result match;
+			match = path_matches_pattern_list(path.buf, path.len,
+							  path.buf + base_len, &dtype,
+							  ctx->info->pl,
+							  ctx->repo->index);
+
+			if (ctx->info->pl->use_cone_patterns &&
+			    match == NOT_MATCHED)
+				continue;
+			else if (!ctx->info->pl->use_cone_patterns &&
+				 type == OBJ_BLOB &&
+				 match != MATCHED)
+				continue;
+		}
+
 		if (!(list = strmap_get(&ctx->paths_to_lists, path.buf))) {
 			CALLOC_ARRAY(list, 1);
 			list->type = type;
@@ -582,10 +600,10 @@ void path_walk_info_init(struct path_walk_info *info)
 	memcpy(info, &empty, sizeof(empty));
 }
 
-void path_walk_info_clear(struct path_walk_info *info UNUSED)
+void path_walk_info_clear(struct path_walk_info *info)
 {
-	/*
-	 * This destructor is empty for now, as info->revs
-	 * is not owned by 'struct path_walk_info'.
-	 */
+	if (info->pl) {
+		clear_pattern_list(info->pl);
+		free(info->pl);
+	}
 }
path-walk.h | 11 +++++++++++

@@ -6,6 +6,7 @@
 
 struct rev_info;
 struct oid_array;
+struct pattern_list;
 
 /**
  * The type of a function pointer for the method that is called on a list of
@@ -48,6 +49,16 @@ struct path_walk_info {
 	 * walk the children of such trees.
 	 */
 	int prune_all_uninteresting;
+
+	/**
+	 * Specify a sparse-checkout definition to match our paths to. Do not
+	 * walk outside of this sparse definition. If the patterns are in
+	 * cone mode, then the search may prune directories that are outside
+	 * of the cone. If not in cone mode, then all tree paths will be
+	 * explored but the path_fn will only be called when the path matches
+	 * the sparse-checkout patterns.
+	 */
+	struct pattern_list *pl;
 };
 
 #define PATH_WALK_INFO_INIT { \
@@ -1,6 +1,7 @@
 #define USE_THE_REPOSITORY_VARIABLE
 
 #include "test-tool.h"
+#include "dir.h"
 #include "environment.h"
 #include "hex.h"
 #include "object-name.h"
@@ -9,6 +10,7 @@
 #include "revision.h"
 #include "setup.h"
 #include "parse-options.h"
+#include "strbuf.h"
 #include "path-walk.h"
 #include "oid-array.h"
 
@@ -65,7 +67,7 @@ static int emit_block(const char *path, struct oid_array *oids,
 
 int cmd__path_walk(int argc, const char **argv)
 {
-	int res;
+	int res, stdin_pl = 0;
 	struct rev_info revs = REV_INFO_INIT;
 	struct path_walk_info info = PATH_WALK_INFO_INIT;
 	struct path_walk_test_data data = { 0 };
@@ -80,6 +82,8 @@ int cmd__path_walk(int argc, const char **argv)
 			N_("toggle inclusion of tree objects")),
 		OPT_BOOL(0, "prune", &info.prune_all_uninteresting,
 			N_("toggle pruning of uninteresting paths")),
+		OPT_BOOL(0, "stdin-pl", &stdin_pl,
+			N_("read a pattern list over stdin")),
 		OPT_END(),
 	};
 
@@ -99,6 +103,17 @@ int cmd__path_walk(int argc, const char **argv)
 	info.path_fn = emit_block;
 	info.path_fn_data = &data;
 
+	if (stdin_pl) {
+		struct strbuf in = STRBUF_INIT;
+		CALLOC_ARRAY(info.pl, 1);
+
+		info.pl->use_cone_patterns = 1;
+
+		strbuf_fread(&in, 2048, stdin);
+		add_patterns_from_buffer(in.buf, in.len, "", 0, info.pl);
+		strbuf_release(&in);
+	}
+
 	res = walk_objects_by_path(&info);
 
 	printf("commits:%" PRIuMAX "\n"
@@ -107,6 +122,11 @@ int cmd__path_walk(int argc, const char **argv)
 		"tags:%" PRIuMAX "\n",
 		data.commit_nr, data.tree_nr, data.blob_nr, data.tag_nr);
 
+	if (info.pl) {
+		clear_pattern_list(info.pl);
+		free(info.pl);
+	}
+
 	release_revisions(&revs);
 	return res;
 }
@@ -721,6 +721,7 @@ integration_tests = [
   't5617-clone-submodules-remote.sh',
   't5618-alternate-refs.sh',
   't5619-clone-local-ambiguous-transport.sh',
+  't5620-backfill.sh',
   't5621-clone-revision.sh',
   't5700-protocol-v1.sh',
   't5701-git-serve.sh',
t/t5620-backfill.sh | 211 (new executable file)

#!/bin/sh

test_description='git backfill on partial clones'

GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME

. ./test-lib.sh

# We create objects in the 'src' repo.
test_expect_success 'setup repo for object creation' '
	echo "{print \$1}" >print_1.awk &&
	echo "{print \$2}" >print_2.awk &&

	git init src &&

	mkdir -p src/a/b/c &&
	mkdir -p src/d/e &&

	for i in 1 2
	do
		for n in 1 2 3 4
		do
			echo "Version $i of file $n" > src/file.$n.txt &&
			echo "Version $i of file a/$n" > src/a/file.$n.txt &&
			echo "Version $i of file a/b/$n" > src/a/b/file.$n.txt &&
			echo "Version $i of file a/b/c/$n" > src/a/b/c/file.$n.txt &&
			echo "Version $i of file d/$n" > src/d/file.$n.txt &&
			echo "Version $i of file d/e/$n" > src/d/e/file.$n.txt &&
			git -C src add . &&
			git -C src commit -m "Iteration $n" || return 1
		done
	done
'

# Clone 'src' into 'srv.bare' so we have a bare repo to be our origin
# server for the partial clone.
test_expect_success 'setup bare clone for server' '
	git clone --bare "file://$(pwd)/src" srv.bare &&
	git -C srv.bare config --local uploadpack.allowfilter 1 &&
	git -C srv.bare config --local uploadpack.allowanysha1inwant 1
'

# do basic partial clone from "srv.bare"
test_expect_success 'do partial clone 1, backfill gets all objects' '
	git clone --no-checkout --filter=blob:none \
		--single-branch --branch=main \
		"file://$(pwd)/srv.bare" backfill1 &&

	# Backfill with no options gets everything reachable from HEAD.
	GIT_TRACE2_EVENT="$(pwd)/backfill-file-trace" git \
		-C backfill1 backfill &&

	# We should have engaged the partial clone machinery
	test_trace2_data promisor fetch_count 48 <backfill-file-trace &&

	# No more missing objects!
	git -C backfill1 rev-list --quiet --objects --missing=print HEAD >revs2 &&
	test_line_count = 0 revs2
'

test_expect_success 'do partial clone 2, backfill min batch size' '
	git clone --no-checkout --filter=blob:none \
		--single-branch --branch=main \
		"file://$(pwd)/srv.bare" backfill2 &&

	GIT_TRACE2_EVENT="$(pwd)/batch-trace" git \
		-C backfill2 backfill --min-batch-size=20 &&

	# Batches were used
	test_trace2_data promisor fetch_count 20 <batch-trace >matches &&
	test_line_count = 2 matches &&
	test_trace2_data promisor fetch_count 8 <batch-trace &&

	# No more missing objects!
	git -C backfill2 rev-list --quiet --objects --missing=print HEAD >revs2 &&
	test_line_count = 0 revs2
'

test_expect_success 'backfill --sparse without sparse-checkout fails' '
	git init not-sparse &&
	test_must_fail git -C not-sparse backfill --sparse 2>err &&
	grep "problem loading sparse-checkout" err
'

test_expect_success 'backfill --sparse' '
	git clone --sparse --filter=blob:none \
		--single-branch --branch=main \
		"file://$(pwd)/srv.bare" backfill3 &&

	# Initial checkout includes four files at root.
	git -C backfill3 rev-list --quiet --objects --missing=print HEAD >missing &&
	test_line_count = 44 missing &&

	# Initial sparse-checkout is just the files at root, so we get the
	# older versions of the four files at tip.
	GIT_TRACE2_EVENT="$(pwd)/sparse-trace1" git \
		-C backfill3 backfill --sparse &&
	test_trace2_data promisor fetch_count 4 <sparse-trace1 &&
	test_trace2_data path-walk paths 5 <sparse-trace1 &&
	git -C backfill3 rev-list --quiet --objects --missing=print HEAD >missing &&
	test_line_count = 40 missing &&

	# Expand the sparse-checkout to include 'd' recursively. This
	# engages the algorithm to skip the trees for 'a'. Note that
	# the "sparse-checkout set" command downloads the objects at tip
	# to satisfy the current checkout.
	git -C backfill3 sparse-checkout set d &&
	GIT_TRACE2_EVENT="$(pwd)/sparse-trace2" git \
		-C backfill3 backfill --sparse &&
	test_trace2_data promisor fetch_count 8 <sparse-trace2 &&
	test_trace2_data path-walk paths 15 <sparse-trace2 &&
	git -C backfill3 rev-list --quiet --objects --missing=print HEAD >missing &&
	test_line_count = 24 missing &&

	# Disabling the --sparse option (on by default) will download everything
	git -C backfill3 backfill --no-sparse &&
	git -C backfill3 rev-list --quiet --objects --missing=print HEAD >missing &&
	test_line_count = 0 missing
'

test_expect_success 'backfill --sparse without cone mode (positive)' '
	git clone --no-checkout --filter=blob:none \
		--single-branch --branch=main \
		"file://$(pwd)/srv.bare" backfill4 &&

	# No blobs yet
	git -C backfill4 rev-list --quiet --objects --missing=print HEAD >missing &&
	test_line_count = 48 missing &&

	# Define sparse-checkout by filename regardless of parent directory.
	# This downloads 6 blobs to satisfy the checkout.
	git -C backfill4 sparse-checkout set --no-cone "**/file.1.txt" &&
	git -C backfill4 checkout main &&

	# Track new blob count
	git -C backfill4 rev-list --quiet --objects --missing=print HEAD >missing &&
	test_line_count = 42 missing &&

	GIT_TRACE2_EVENT="$(pwd)/no-cone-trace1" git \
		-C backfill4 backfill --sparse &&
	test_trace2_data promisor fetch_count 6 <no-cone-trace1 &&

	# This walk needed to visit all directories to search for these paths.
	test_trace2_data path-walk paths 12 <no-cone-trace1 &&
	git -C backfill4 rev-list --quiet --objects --missing=print HEAD >missing &&
	test_line_count = 36 missing
'

test_expect_success 'backfill --sparse without cone mode (negative)' '
	git clone --no-checkout --filter=blob:none \
		--single-branch --branch=main \
		"file://$(pwd)/srv.bare" backfill5 &&

	# No blobs yet
	git -C backfill5 rev-list --quiet --objects --missing=print HEAD >missing &&
	test_line_count = 48 missing &&

	# Define sparse-checkout by filename regardless of parent directory.
	# This downloads 18 blobs to satisfy the checkout
	git -C backfill5 sparse-checkout set --no-cone "**/file*" "!**/file.1.txt" &&
	git -C backfill5 checkout main &&

	# Track new blob count
	git -C backfill5 rev-list --quiet --objects --missing=print HEAD >missing &&
	test_line_count = 30 missing &&

	GIT_TRACE2_EVENT="$(pwd)/no-cone-trace2" git \
		-C backfill5 backfill --sparse &&
	test_trace2_data promisor fetch_count 18 <no-cone-trace2 &&

	# This walk needed to visit all directories to search for these paths,
	# plus 12 more "file.?.txt" paths than the previous test.
	test_trace2_data path-walk paths 24 <no-cone-trace2 &&
	git -C backfill5 rev-list --quiet --objects --missing=print HEAD >missing &&
	test_line_count = 12 missing
'

. "$TEST_DIRECTORY"/lib-httpd.sh
start_httpd

test_expect_success 'create a partial clone over HTTP' '
	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
	rm -rf "$SERVER" repo &&
	git clone --bare "file://$(pwd)/src" "$SERVER" &&
	test_config -C "$SERVER" uploadpack.allowfilter 1 &&
	test_config -C "$SERVER" uploadpack.allowanysha1inwant 1 &&

	git clone --no-checkout --filter=blob:none \
		"$HTTPD_URL/smart/server" backfill-http
'

test_expect_success 'backfilling over HTTP succeeds' '
	GIT_TRACE2_EVENT="$(pwd)/backfill-http-trace" git \
		-C backfill-http backfill &&

	# We should have engaged the partial clone machinery
	test_trace2_data promisor fetch_count 48 <backfill-http-trace &&

	# Confirm all objects are present, none missing.
	git -C backfill-http rev-list --objects --all >rev-list-out &&
	awk "{print \$1;}" <rev-list-out >oids &&
	GIT_TRACE2_EVENT="$(pwd)/walk-trace" git -C backfill-http \
		cat-file --batch-check <oids >batch-out &&
	! grep missing batch-out
'

# DO NOT add non-httpd-specific tests here, because the last part of this
# test script is only executed when httpd is available and enabled.

test_done
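The no-cone tests above lean on gitignore-style pattern matching; the same semantics can be probed outside of backfill with `git check-ignore`, since sparse-checkout patterns and ignore patterns share a syntax. A sketch with made-up paths (not part of the test suite):

```shell
#!/bin/sh
# Sketch: probe "**/file.1.txt" matching with git check-ignore.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
printf '%s\n' '**/file.1.txt' >.gitignore

# "**/" matches at any depth...
git check-ignore -q a/b/file.1.txt
echo "file.1.txt matched"

# ...but other filenames do not match the pattern.
if git check-ignore -q a/b/file.2.txt
then
	echo "file.2.txt matched"
else
	echo "file.2.txt not matched"
fi
```

This is why the negative test's pattern pair (`**/file*` plus `!**/file.1.txt`) leaves exactly the `file.1.txt` blobs missing after a `--sparse` backfill.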
@@ -176,6 +176,38 @@ test_expect_success 'branches and indexed objects mix well' '
 	test_cmp_sorted expect out
 '
 
+test_expect_success 'base & topic, sparse' '
+	cat >patterns <<-EOF &&
+	/*
+	!/*/
+	/left/
+	EOF
+
+	test-tool path-walk --stdin-pl -- base topic <patterns >out &&
+
+	cat >expect <<-EOF &&
+	0:commit::$(git rev-parse topic)
+	0:commit::$(git rev-parse base)
+	0:commit::$(git rev-parse base~1)
+	0:commit::$(git rev-parse base~2)
+	1:tree::$(git rev-parse topic^{tree})
+	1:tree::$(git rev-parse base^{tree})
+	1:tree::$(git rev-parse base~1^{tree})
+	1:tree::$(git rev-parse base~2^{tree})
+	2:blob:a:$(git rev-parse base~2:a)
+	3:tree:left/:$(git rev-parse base:left)
+	3:tree:left/:$(git rev-parse base~2:left)
+	4:blob:left/b:$(git rev-parse base~2:left/b)
+	4:blob:left/b:$(git rev-parse base:left/b)
+	blobs:3
+	commits:4
+	tags:0
+	trees:6
+	EOF
+
+	test_cmp_sorted expect out
+'
+
 test_expect_success 'topic only' '
 	test-tool path-walk -- topic >out &&