mirror of
https://github.com/git/git.git
synced 2026-03-28 01:20:11 +01:00
Since 'git pack-objects' supports a --path-walk option, allow passing it through in 'git repack'. This presents interesting testing opportunities for comparing the different repacking strategies against each other. Add the --path-walk option to the performance tests in p5313. For the microsoft/fluentui repo [1] checked out at a specific commit [2], the --path-walk tests in p5313 look like this: Test this tree ------------------------------------------------------------------------- 5313.18: thin pack with --path-walk 0.08(0.06+0.02) 5313.19: thin pack size with --path-walk 18.4K 5313.20: big pack with --path-walk 2.10(7.80+0.26) 5313.21: big pack size with --path-walk 19.8M 5313.22: shallow fetch pack with --path-walk 1.62(3.38+0.17) 5313.23: shallow pack size with --path-walk 33.6M 5313.24: repack with --path-walk 81.29(96.08+0.71) 5313.25: repack size with --path-walk 142.5M [1] https://github.com/microsoft/fluentui [2] e70848ebac1cd720875bccaa3026f4a9ed700e08 Along with the earlier tests in p5313, I'll instead reformat the comparison as follows: Repack Method Pack Size Time --------------------------------------- Hash v1 439.4M 87.24s Hash v2 161.7M 21.51s Path Walk 142.5M 81.29s There are a few things to notice here: 1. The benefits of --name-hash-version=2 over --name-hash-version=1 are significant, but --path-walk still compresses better than that option. 2. The --path-walk command is still using --name-hash-version=1 for the second pass of delta computation, using the increased name hash collisions as a potential method for opportunistic compression on top of the path-focused compression. 3. The --path-walk algorithm is currently sequential and does not use multiple threads for delta compression. Threading will be implemented in a future change so the computation time will improve to better compete in this metric. There are small benefits in size for my copy of the Git repository: Repack Method Pack Size Time --------------------------------------- Hash v1 248.8M 30.44s Hash v2 249.0M 30.15s Path Walk 213.2M 142.50s As well as in the nodejs/node repository [3]: Repack Method Pack Size Time --------------------------------------- Hash v1 739.9M 71.18s Hash v2 764.6M 67.82s Path Walk 698.1M 208.10s [3] https://github.com/nodejs/node This benefit also repeats in my copy of the Linux kernel repository: Repack Method Pack Size Time --------------------------------------- Hash v1 2.5G 554.41s Hash v2 2.5G 549.62s Path Walk 2.2G 1562.36s It is important to see that even when the repository shape does not have many name-hash collisions, there is a slight space boost to be found using this method. As this repacking strategy was released in Git for Windows 2.47.0, some users have reported cases where the --path-walk compression is slightly worse than the --name-hash-version=2 option. In those cases, it may be beneficial to combine the two options. However, there has not been a released version of Git that has both options and I don't have access to these repos for testing. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git performance tests
=====================
This directory holds performance testing scripts for git tools. The
first part of this document describes the various ways in which you
can run them.
When fixing the tools or adding enhancements, you are strongly
encouraged to add tests in this directory to cover what you are
trying to fix or enhance. The later part of this short document
describes how your test scripts should be organized.
Running Tests
-------------
The easiest way to run tests is to say "make". This runs all
the tests on the current git repository.
=== Running 2 tests in this tree ===
[...]
Test this tree
---------------------------------------------------------
0001.1: rev-list --all 0.54(0.51+0.02)
0001.2: rev-list --all --objects 6.14(5.99+0.11)
7810.1: grep worktree, cheap regex 0.16(0.16+0.35)
7810.2: grep worktree, expensive regex 7.90(29.75+0.37)
7810.3: grep --cached, cheap regex 3.07(3.02+0.25)
7810.4: grep --cached, expensive regex 9.39(30.57+0.24)
Output format is in seconds "Elapsed(User + System)"
You can compare multiple repositories and even git revisions with the
'run' script:
$ ./run . origin/next /path/to/git-tree p0001-rev-list.sh
where . stands for the current git tree. The full invocation is
./run [<revision|directory>...] [--] [<test-script>...]
A '.' argument is implied if you do not pass any other
revisions/directories.
You can also manually test this or another git build tree, and then
call the aggregation script to summarize the results:
$ ./p0001-rev-list.sh
[...]
$ ./run /path/to/other/git -- ./p0001-rev-list.sh
[...]
$ ./aggregate.perl . /path/to/other/git ./p0001-rev-list.sh
aggregate.perl has the same invocation as 'run', it just does not run
anything beforehand.
You can set the following variables (also in your config.mak):
GIT_PERF_REPEAT_COUNT
Number of times a test should be repeated for best-of-N
measurements. Defaults to 3.
GIT_PERF_MAKE_OPTS
Options to use when automatically building a git tree for
performance testing. E.g., -j6 would be useful. Passed
directly to make as "make $GIT_PERF_MAKE_OPTS".
GIT_PERF_MAKE_COMMAND
An arbitrary command that'll be run in place of the make
command, if set the GIT_PERF_MAKE_OPTS variable is
ignored. Useful in cases where source tree changes might
require issuing a different make command to different
revisions.
This can be (ab)used to monkeypatch or otherwise change the
tree about to be built. Note that the build directory can be
re-used for subsequent runs so the make command might get
executed multiple times on the same tree, but don't count on
any of that, that's an implementation detail that might change
in the future.
GIT_PERF_REPO
GIT_PERF_LARGE_REPO
Repositories to copy for the performance tests. The normal
repo should be at least git.git size. The large repo should
probably be about linux.git size for optimal results.
Both default to the git.git you are running from.
GIT_PERF_EXTRA
Boolean to enable additional tests. Most test scripts are
written to detect regressions between two versions of Git, and
the output will compare timings for individual tests between
those versions. Some scripts have additional tests which are not
run by default, that show patterns within a single version of
Git (e.g., performance of index-pack as the number of threads
changes). These can be enabled with GIT_PERF_EXTRA.
GIT_PERF_USE_SCALAR
Boolean indicating whether to register test repo(s) with Scalar
before executing tests.
You can also pass the options taken by ordinary git tests; the most
useful one is:
--root=<directory>::
Create "trash" directories used to store all temporary data during
testing under <directory>, instead of the t/ directory.
Using this option with a RAM-based filesystem (such as tmpfs)
can massively speed up the test suite.
Naming Tests
------------
The performance test files are named as:
pNNNN-commandname-details.sh
where N is a decimal digit. The same conventions for choosing NNNN as
for normal tests apply.
Writing Tests
-------------
The perf script starts much like a normal test script, except it
sources perf-lib.sh:
#!/bin/sh
#
# Copyright (c) 2005 Junio C Hamano
#
test_description='xxx performance test'
. ./perf-lib.sh
After that you will want to use some of the following:
test_perf_fresh_repo # sets up an empty repository
test_perf_default_repo # sets up a "normal" repository
test_perf_large_repo # sets up a "large" repository
test_perf_default_repo sub # ditto, in a subdir "sub"
test_checkout_worktree # if you need the worktree too
At least one of the first two is required!
You can use test_expect_success as usual. In both test_expect_success
and in test_perf, running "git" points to the version that is being
perf-tested. The $MODERN_GIT variable points to the git wrapper for the
currently checked-out version (i.e., the one that matches the t/perf
scripts you are running). This is useful if your setup uses commands
that only work with newer versions of git than what you might want to
test (but obviously your new commands must still create a state that can
be used by the older version of git you are testing).
For actual performance tests, use
test_perf 'descriptive string' '
command1 &&
command2
'
test_perf spawns a subshell, for lack of better options. This means
that
* you _must_ export all variables that you need in the subshell
* you _must_ flag all variables that you want to persist from the
subshell with 'test_export':
test_perf 'descriptive string' '
foo=$(git rev-parse HEAD) &&
test_export foo
'
The so-exported variables are automatically marked for export in the
shell executing the perf test. For your convenience, test_export is
the same as export in the main shell.
This feature relies on a bit of magic using 'set' and 'source'.
While we have tried to make sure that it can cope with embedded
whitespace and other special characters, it will not work with
multi-line data.
Rather than tracking the performance by run-time as `test_perf` does, you
may also track output size by using `test_size`. The stdout of the
function should be a single numeric value, which will be captured and
shown in the aggregated output. For example:
test_perf 'time foo' '
./foo >foo.out
'
test_size 'output size'
wc -c <foo.out
'
might produce output like:
Test origin HEAD
-------------------------------------------------------------
1234.1 time foo 0.37(0.79+0.02) 0.26(0.51+0.02) -29.7%
1234.2 output size 4.3M 3.6M -14.7%
The item being measured (and its units) is up to the test; the context
and the test title should make it clear to the user whether bigger or
smaller numbers are better. Unlike test_perf, the test code will only be
run once, since output sizes tend to be more deterministic than timings.