Commit Graph

17220 Commits

Author SHA1 Message Date
Derrick Stolee
45f4eeb291 pack-bitmap-write: relax unique revwalk condition
The previous commits improved the bitmap computation process for very
long, linear histories with many refs by removing quadratic growth in
how many objects were walked. The strategy of computing "intermediate
commits" using bitmasks for which refs can reach those commits
partitioned the poset of reachable objects so each part could be walked
exactly once. This was effective for linear histories.

However, there was a (significant) drawback: wide histories with many
refs had an explosion of memory costs to compute the commit bitmasks
during the exploration that discovers these intermediate commits. Since
these wide histories are unlikely to repeat walking objects, the benefit
of walking objects multiple times was not expensive before. But now, the
commit walk *before computing bitmaps* is incredibly expensive.

In an effort to discover a happy medium, this change reduces the walk
for intermediate commits to only the first-parent history. This focuses
the walk on how the histories converge, which still has significant
reduction in repeat object walks. It is still possible to create
quadratic behavior in this version, but it is probably less likely in
realistic data shapes.

Here is some data taken on a fresh clone of the kernel:

             |   runtime (sec)    |   peak heap (GB)   |
             |                    |                    |
             |   from  |   with   |   from  |   with   |
             | scratch | existing | scratch | existing |
  -----------+---------+----------+---------+-----------
    original |  64.044 |   83.241 |   2.088 |    2.194 |
  last patch |  45.049 |   37.624 |   2.267 |    2.334 |
  this patch |  88.478 |   53.218 |   2.157 |    2.224 |

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08 14:49:07 -08:00
Derrick Stolee
089f751360 pack-bitmap-write: build fewer intermediate bitmaps
The bitmap_writer_build() method calls bitmap_builder_init() to
construct a list of commits reachable from the selected commits along
with a "reverse graph". This reverse graph has edges pointing from a
commit to other commits that can reach that commit. After computing a
reachability bitmap for a commit, the values in that bitmap are then
copied to the reachability bitmaps across the edges in the reverse
graph.

We can now relax the role of the reverse graph to greatly reduce the
number of intermediate reachability bitmaps we compute during this
reverse walk. The end result is that we walk objects the same number of
times as before when constructing the reachability bitmaps, but we also
spend much less time copying bits between bitmaps and have much lower
memory pressure in the process.

The core idea is to select a set of "important" commits based on
interactions among the sets of commits reachable from each selected commit.

The first technical concept is to create a new 'commit_mask' member in the
bb_commit struct. Note that the selected commits are provided in an
ordered array. The first thing to do is to mark the ith bit in the
commit_mask for the ith selected commit. As we walk the commit-graph, we
copy the bits in a commit's commit_mask to its parents. At the end of
the walk, the ith bit in the commit_mask for a commit C stores a boolean
representing "The ith selected commit can reach C."

As we walk, we will discover non-selected commits that are important. We
will get into this later, but those important commits must also receive
bit positions, growing the width of the bitmasks as we walk. At the true
end of the walk, the ith bit means "the ith _important_ commit can reach
C."

MAXIMAL COMMITS
---------------

We use a new 'maximal' bit in the bb_commit struct to represent whether
a commit is important or not. The term "maximal" comes from the
partially-ordered set of commits in the commit-graph where C >= P if P
is a parent of C, and then extending the relationship transitively.
Instead of taking the maximal commits across the entire commit-graph, we
instead focus on selecting each commit that is maximal among commits
with the same bits on in their commit_mask. This definition is
important, so let's consider an example.

Suppose we have three selected commits A, B, and C. These are assigned
bitmasks 100, 010, and 001 to start. Each of these can be marked as
maximal immediately because they each will be the uniquely maximal
commit that contains their own bit. Keep in mind that that these commits
may have different bitmasks after the walk; for example, if B can reach
C but A cannot, then the final bitmask for C is 011. Even in these
cases, C would still be a maximal commit among all commits with the
third bit on in their masks.

Now define sets X, Y, and Z to be the sets of commits reachable from A,
B, and C, respectively. The intersections of these sets correspond to
different bitmasks:

 * 100: X - (Y union Z)
 * 010: Y - (X union Z)
 * 001: Z - (X union Y)
 * 110: (X intersect Y) - Z
 * 101: (X intersect Z) - Y
 * 011: (Y intersect Z) - X
 * 111: X intersect Y intersect Z

This can be visualized with the following Hasse diagram:

	100    010    001
         | \  /   \  / |
         |  \/     \/  |
         |  /\     /\  |
         | /  \   /  \ |
        110    101    011
          \___  |  ___/
              \ | /
               111

Some of these bitmasks may not be represented, depending on the topology
of the commit-graph. In fact, we are counting on it, since the number of
possible bitmasks is exponential in the number of selected commits, but
is also limited by the total number of commits. In practice, very few
bitmasks are possible because most commits converge on a common "trunk"
in the commit history.

With this three-bit example, we wish to find commits that are maximal
for each bitmask. How can we identify this as we are walking?

As we walk, we visit a commit C. Since we are walking the commits in
topo-order, we know that C is visited after all of its children are
visited. Thus, when we get C from the revision walk we inspect the
'maximal' property of its bb_data and use that to determine if C is truly
important. Its commit_mask is also nearly final. If C is not one of the
originally-selected commits, then assign a bit position to C (by
incrementing num_maximal) and set that bit on in commit_mask. See
"MULTIPLE MAXIMAL COMMITS" below for more detail on this.

Now that the commit C is known to be maximal or not, consider each
parent P of C. Compute two new values:

 * c_not_p : true if and only if the commit_mask for C contains a bit
             that is not contained in the commit_mask for P.

 * p_not_c : true if and only if the commit_mask for P contains a bit
             that is not contained in the commit_mask for P.

If c_not_p is false, then P already has all of the bits that C would
provide to its commit_mask. In this case, move on to other parents as C
has nothing to contribute to P's state that was not already provided by
other children of P.

We continue with the case that c_not_p is true. This means there are
bits in C's commit_mask to copy to P's commit_mask, so use bitmap_or()
to add those bits.

If p_not_c is also true, then set the maximal bit for P to one. This means
that if no other commit has P as a parent, then P is definitely maximal.
This is because no child had the same bitmask. It is important to think
about the maximal bit for P at this point as a temporary state: "P is
maximal based on current information."

In contrast, if p_not_c is false, then set the maximal bit for P to
zero. Further, clear all reverse_edges for P since any edges that were
previously assigned to P are no longer important. P will gain all
reverse edges based on C.

The final thing we need to do is to update the reverse edges for P.
These reverse edges respresent "which closest maximal commits
contributed bits to my commit_mask?" Since C contributed bits to P's
commit_mask in this case, C must add to the reverse edges of P.

If C is maximal, then C is a 'closest' maximal commit that contributed
bits to P. Add C to P's reverse_edges list.

Otherwise, C has a list of maximal commits that contributed bits to its
bitmask (and this list is exactly one element). Add all of these items
to P's reverse_edges list. Be careful to ignore duplicates here.

After inspecting all parents P for a commit C, we can clear the
commit_mask for C. This reduces the memory load to be limited to the
"width" of the commit graph.

Consider our ABC/XYZ example from earlier and let's inspect the state of
the commits for an interesting bitmask, say 011. Suppose that D is the
only maximal commit with this bitmask (in the first three bits). All
other commits with bitmask 011 have D as the only entry in their
reverse_edges list. D's reverse_edges list contains B and C.

COMPUTING REACHABILITY BITMAPS
------------------------------

Now that we have our definition, let's zoom out and consider what
happens with our new reverse graph when computing reachability bitmaps.
We walk the reverse graph in reverse-topo-order, so we visit commits
with largest commit_masks first. After we compute the reachability
bitmap for a commit C, we push the bits in that bitmap to each commit D
in the reverse edge list for C. Then, when we finally visit D we already
have the bits for everything reachable from maximal commits that D can
reach and we only need to walk the objects in the set-difference.

In our ABC/XYZ example, when we finally walk for the commit A we only
need to walk commits with bitmask equal to A's bitmask. If that bitmask
is 100, then we are only walking commits in X - (Y union Z) because the
bitmap already contains the bits for objects reachable from (X intersect
Y) union (X intersect Z) (i.e. the bits from the reachability bitmaps
for the maximal commits with bitmasks 110 and 101).

The behavior is intended to walk each commit (and the trees that commit
introduces) at most once while allocating and copying fewer reachability
bitmaps. There is one caveat: what happens when there are multiple
maximal commits with the same bitmask, with respect to the initial set
of selected commits?

MULTIPLE MAXIMAL COMMITS
------------------------

Earlier, we mentioned that when we discover a new maximal commit, we
assign a new bit position to that commit and set that bit position to
one for that commit. This is absolutely important for interesting
commit-graphs such as git/git and torvalds/linux. The reason is due to
the existence of "butterflies" in the commit-graph partial order.

Here is an example of four commits forming a butterfly:

   I    J
   |\  /|
   | \/ |
   | /\ |
   |/  \|
   M    N
    \  /
     |/
     Q

Here, I and J both have parents M and N. In general, these do not need
to be exact parent relationships, but reachability relationships. The
most important part is that M and N cannot reach each other, so they are
independent in the partial order. If I had commit_mask 10 and J had
commit_mask 01, then M and N would both be assigned commit_mask 11 and
be maximal commits with the bitmask 11. Then, what happens when M and N
can both reach a commit Q? If Q is also assigned the bitmask 11, then it
is not maximal but is reachable from both M and N.

While this is not necessarily a deal-breaker for our abstract definition
of finding maximal commits according to a given bitmask, we have a few
issues that can come up in our larger picture of constructing
reachability bitmaps.

In particular, if we do not also consider Q to be a "maximal" commit,
then we will walk commits reachable from Q twice: once when computing
the reachability bitmap for M and another time when computing the
reachability bitmap for N. This becomes much worse if the topology
continues this pattern with multiple butterflies.

The solution has already been mentioned: each of M and N are assigned
their own bits to the bitmask and hence they become uniquely maximal for
their bitmasks. Finally, Q also becomes maximal and thus we do not need
to walk its commits multiple times. The final bitmasks for these commits
are as follows:

  I:10       J:01
   |\        /|
   | \ _____/ |
   | /\____   |
   |/      \  |
   M:111    N:1101
        \  /
       Q:1111

Further, Q's reverse edge list is { M, N }, while M and N both have
reverse edge list { I, J }.

PERFORMANCE MEASUREMENTS
------------------------

Now that we've spent a LOT of time on the theory of this algorithm,
let's show that this is actually worth all that effort.

To test the performance, use GIT_TRACE2_PERF=1 when running
'git repack -abd' in a repository with no existing reachability bitmaps.
This avoids any issues with keeping existing bitmaps to skew the
numbers.

Inspect the "building_bitmaps_total" region in the trace2 output to
focus on the portion of work that is affected by this change. Here are
the performance comparisons for a few repositories. The timings are for
the following versions of Git: "multi" is the timing from before any
reverse graph is constructed, where we might perform multiple
traversals. "reverse" is for the previous change where the reverse graph
has every reachable commit.  Finally "maximal" is the version introduced
here where the reverse graph only contains the maximal commits.

      Repository: git/git
           multi: 2.628 sec
         reverse: 2.344 sec
         maximal: 2.047 sec

      Repository: torvalds/linux
           multi: 64.7 sec
         reverse: 205.3 sec
         maximal: 44.7 sec

So in all cases we've not only recovered any time lost to switching to
the reverse-edge algorithm, but we come out ahead of "multi" in all
cases. Likewise, peak heap has gone back to something reasonable:

      Repository: torvalds/linux
           multi: 2.087 GB
         reverse: 3.141 GB
         maximal: 2.288 GB

While I do not have access to full fork networks on GitHub, Peff has run
this algorithm on the chromium/chromium fork network and reported a
change from 3 hours to ~233 seconds. That network is particularly
beneficial for this approach because it has a long, linear history along
with many tags. The "multi" approach was obviously quadratic and the new
approach is linear.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08 14:48:17 -08:00
Derrick Stolee
1467b9572a t5310: add branch-based checks
The current rev-list tests that check the bitmap data only work on HEAD
instead of multiple branches. Expand the test cases to handle both
'master' and 'other' branches.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08 14:48:17 -08:00
Jeff King
c5cd749076 t5310: drop size of truncated ewah bitmap
We truncate the .bitmap file to 512 bytes and expect to run into
problems reading an individual ewah file. But this length is somewhat
arbitrary, and just happened to work when the test was added in
9d2e330b17 (ewah_read_mmap: bounds-check mmap reads, 2018-06-14).

An upcoming commit will change the size of the history we create in the
test repo, which will cause this test to fail. We can future-proof it a
bit more by reducing the size of the truncated bitmap file.

Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08 14:48:15 -08:00
Jeff King
ec6c7b4367 pack-bitmap: bounds-check size of cache extension
A .bitmap file may have a "name hash cache" extension, which puts a
sequence of uint32_t values (one per object) at the end of the file.
When we see a flag indicating this extension, we blindly subtract the
appropriate number of bytes from our available length. However, if the
.bitmap file is too short, we'll underflow our length variable and wrap
around, thinking we have a very large length. This can lead to reading
out-of-bounds bytes while loading individual ewah bitmaps.

We can fix this by checking the number of available bytes when we parse
the header. The existing "truncated bitmap" test is now split into two
tests: one where we don't have this extension at all (and hence actually
do try to read a truncated ewah bitmap) and one where we realize
up-front that we can't even fit in the cache structure. We'll check
stderr in each case to make sure we hit the error we're expecting.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08 14:48:15 -08:00
Junio C Hamano
f2061f6982 Merge branch 'js/test-file-size'
Test clean-up.

* js/test-file-size:
  tests: consolidate the `file_size` function into `test-lib-functions.sh`
2020-11-11 13:18:39 -08:00
Junio C Hamano
1e8ed50309 Merge branch 'js/test-whitespace-fixes'
Test code clean-up.

* js/test-whitespace-fixes:
  t9603: use tabs for indentation
  t5570: remove trailing padding
  t5400,t5402: consistently indent with tabs, not with spaces
  t3427: adjust stale comment
  t3406: indent with tabs, not spaces
  t1004: insert missing "branch" in a message
2020-11-11 13:18:38 -08:00
Junio C Hamano
3fc24194c2 Merge branch 'rs/worktree-list-show-locked'
Typofix.

* rs/worktree-list-show-locked:
  t2402: fix typo
2020-11-11 13:18:38 -08:00
Junio C Hamano
8502a5782b Merge branch 'js/default-branch-name-adjust-t5411'
Prepare a test script to transition of the default branch name to
'main'.

* js/default-branch-name-adjust-t5411:
  t5411: finish preparing for `main` being the default branch name
  t5411: adjust the remaining support files for init.defaultBranch=main
  t5411: start adjusting the support files for init.defaultBranch=main
  t5411: start using the default branch name "main"
2020-11-09 14:06:29 -08:00
Junio C Hamano
4560eae44f Merge branch 'fc/zsh-completion'
Zsh autocompletion (in contrib/) update.

* fc/zsh-completion: (29 commits)
  zsh: update copyright notices
  completion: bash: remove old compat wrappers
  completion: bash: cleanup cygwin check
  completion: bash: trivial cleanup
  completion: zsh: add simple version check
  completion: zsh: trivial simplification
  completion: zsh: add alias descriptions
  completion: zsh: improve command tags
  completion: zsh: refactor command completion
  completion: zsh: shuffle functions around
  completion: zsh: simplify file_direct
  completion: zsh: simplify nl_append
  completion: zsh: trivial cleanup
  completion: zsh: simplify direct compadd
  completion: zsh: simplify compadd functions
  completion: zsh: fix splitting of words
  completion: zsh: add missing direct_append
  completion: fix conflict with bashcomp
  completion: zsh: fix completion for --no-.. options
  completion: bash: remove zsh wrapper
  ...
2020-11-09 14:06:29 -08:00
Junio C Hamano
caf3ca7786 Merge branch 'jk/sideband-more-error-checking'
The code to detect premature EOF in the sideband demultiplexer has
been cleaned up.

* jk/sideband-more-error-checking:
  sideband: diagnose more sideband anomalies
2020-11-09 14:06:29 -08:00
Junio C Hamano
ecf95d938b Merge branch 'ab/git-remote-exit-code'
Exit codes from "git remote add" etc. were not usable by scripted
callers.

* ab/git-remote-exit-code:
  remote: add meaningful exit code on missing/existing
2020-11-09 14:06:26 -08:00
Junio C Hamano
4c7eb63d2d Merge branch 'pb/ref-filter-with-crlf'
A commit and tag object may have CR at the end of each and
every line (you can create such an object with hash-object or
using --cleanup=verbatim to decline the default clean-up
action), but it would make it impossible to have a blank line
to separate the title from the body of the message.  Be lenient
and accept a line with lone CR on it as a blank line, too.

* pb/ref-filter-with-crlf:
  log, show: add tests for messages containing CRLF
  ref-filter: handle CRLF at end-of-line more gracefully
2020-11-09 14:06:26 -08:00
Junio C Hamano
92d6bd2e90 Merge branch 'jk/checkout-index-errors'
"git checkout-index" did not consistently signal an error with its
exit status.

* jk/checkout-index-errors:
  checkout-index: propagate errors to exit code
  checkout-index: drop error message from empty --stage=all
2020-11-09 14:06:26 -08:00
Junio C Hamano
65681e75c1 Merge branch 'jk/perl-warning'
Dev support.

* jk/perl-warning:
  perl: check for perl warnings while running tests
2020-11-09 14:06:25 -08:00
Junio C Hamano
bf69da56c9 Merge branch 'nk/diff-files-vs-fsmonitor'
"git diff" and other commands that share the same machinery to
compare with working tree files have been taught to take advantage
of the fsmonitor data when available.

* nk/diff-files-vs-fsmonitor:
  p7519-fsmonitor: add a git add benchmark
  p7519-fsmonitor: refactor to avoid code duplication
  perf lint: add make test-lint to perf tests
  t/perf: add fsmonitor perf test for git diff
  t/perf/p7519-fsmonitor.sh: warm cache on first git status
  t/perf/README: elaborate on output format
  fsmonitor: use fsmonitor data in `git diff`
2020-11-09 14:06:25 -08:00
Junio C Hamano
b3ae46a936 Merge branch 'as/tests-cleanup'
Micro clean-up of a couple of test scripts.

* as/tests-cleanup:
  t2200,t9832: avoid using 'git' upstream in a pipe
2020-11-09 14:06:25 -08:00
Junio C Hamano
0a1cceb9bd Merge branch 'en/dir-rename-tests'
More preliminary tests have been added to document desired outcome
of various "directory rename" situations.

* en/dir-rename-tests:
  t6423: more involved rules for renaming directories into each other
  t6423: update directory rename detection tests with new rule
  t6423: more involved directory rename test
  directory-rename-detection.txt: update references to regression tests
2020-11-09 14:06:25 -08:00
Johannes Schindelin
f6bcd9a8a4 t9603: use tabs for indentation
This patch will let the new `check-whitespace` GitHub workflow be happy
with the upcoming patch series that wants to search-and-replace `master`
with `main` in t9603 and some other test scripts.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 13:07:19 -08:00
Johannes Schindelin
d98f272674 t5570: remove trailing padding
Two blocks in t5570 want to align the closing double quotes, padding
with spaces if needed. Since the maximum length of those lines is
defined by the branch name `master`, the upcoming rename to `main` would
unalign the quotes.

But then, it is unclear how those aligned closing quotes should help
readability anyway, so let's just remove that padding altogether.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 13:07:19 -08:00
Johannes Schindelin
739edb2a73 t5400,t5402: consistently indent with tabs, not with spaces
This patch actually prepares for the upcoming patches to replace
`master` with `main` in these tests: we do not want those changes to be
flagged by the new `check-whitespace` GitHub workflow (even if those
changes do not introduce the whitespace issues, they touch lines
affected by those issues without fixing them).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 13:07:19 -08:00
Johannes Schindelin
adbcf53e3f t3427: adjust stale comment
In b6211b89eb (tests: avoid variations of the `master` branch name,
2020-09-26), the `master[123]` branch names were renamed to
`topic_[123]`. A non-literal mention of the corresponding files was
missed in that commit.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 13:07:19 -08:00
Johannes Schindelin
0f321f95c7 t3406: indent with tabs, not spaces
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 13:07:19 -08:00
Johannes Schindelin
0b746f585e t1004: insert missing "branch" in a message
The message in question reads awkward with the name "master", but will
be even more confusing once that is renamed to "main". Let's adjust it
in advance of said rename.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 13:07:19 -08:00
Johannes Schindelin
53b67a801b tests: consolidate the file_size function into test-lib-functions.sh
In 8de7eeb54b (compression: unify pack.compression configuration
parsing, 2016-11-15), we introduced identical copies of the `file_size`
helper into three test scripts, with the plan to eventually consolidate
them into a single copy.

Let's do that, and adjust the function name to adhere to the `test_*`
naming convention.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-06 22:05:08 -08:00
Johannes Schindelin
8d88931123 t2402: fix typo
In c57b3367be (worktree: teach `list` to annotate locked worktree,
2020-10-11), we introduced a test case that wanted to talk about
"worktrees" but talked about "worktress" instead. Let's fix that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-03 12:15:55 -08:00
Junio C Hamano
596ad33080 Merge branch 'js/default-branch-name-part-4-minus-1'
Adjust tests so that they won't scream when the default initial
branch name is changed to 'main'.

* js/default-branch-name-part-4-minus-1:
  t1400: prepare for `main` being default branch name
  tests: prepare aligned mentions of the default branch name
  t9902: prepare a test for the upcoming default branch name
  t3200: prepare for `main` being shorter than `master`
  t5703: adjust a test case for the upcoming default branch name
  t6200: adjust suppression pattern to also match "main"
  tests: start moving to a different default main branch name
  t9801: use `--` in preparation for default branch rename
  fmt-merge-msg: also suppress "into main" by default
2020-11-02 13:17:46 -08:00
Junio C Hamano
292e53fa9d Merge branch 've/userdiff-bash'
The userdiff pattern learned to identify the function definition in
POSIX shells and bash.

* ve/userdiff-bash:
  userdiff: support Bash
2020-11-02 13:17:46 -08:00
Junio C Hamano
f74f5e71d5 Merge branch 'js/t7006-cleanup'
Code clean-up.

* js/t7006-cleanup:
  t7006: Use test_path_is_* functions in test script
2020-11-02 13:17:45 -08:00
Junio C Hamano
1ae0949a03 Merge branch 'mk/diff-ignore-regex'
"git diff" family of commands learned the "-I<regex>" option to
ignore hunks whose changed lines all match the given pattern.

* mk/diff-ignore-regex:
  diff: add -I<regex> that ignores matching changes
  merge-base, xdiff: zero out xpparam_t structures
2020-11-02 13:17:44 -08:00
Junio C Hamano
c23cd78e81 Merge branch 'jt/apply-reverse-twice'
"git apply -R" did not handle patches that touch the same path
twice correctly, which has been corrected.  This is most relevant
in a patch that changes a path from a regular file to a symbolic
link (and vice versa).

* jt/apply-reverse-twice:
  apply: when -R, also reverse list of sections
2020-11-02 13:17:43 -08:00
Junio C Hamano
73af6a4fab Merge branch 'sc/sequencer-gpg-octopus'
"git rebase --rebase-merges" did not correctly pass --gpg-sign
command line option to underlying "git merge" when replaying a merge
using non-default merge strategy or when replaying an octopus merge
(because replaying a two-head merge with the default strategy was
done in a separate codepath, the problem did not trigger for most
users), which has been corrected.

* sc/sequencer-gpg-octopus:
  t3435: add tests for rebase -r GPG signing
  sequencer: pass explicit --no-gpg-sign to merge
  sequencer: fix gpg option passed to merge subcommand
2020-11-02 13:17:43 -08:00
Junio C Hamano
9879f3b3f6 Merge branch 'en/test-selector'
Our test scripts can be told to run only individual pieces while
skipping others with the "--run=..." option; they were taught to
take a substring of test title, in addition to numbers, to name the
test pieces to run.

* en/test-selector:
  test-lib: reduce verbosity of skipped tests
  t6006, t6012: adjust tests to use 'setup' instead of synonyms
  test-lib: allow selecting tests by substring/glob with --run
2020-11-02 13:17:43 -08:00
Junio C Hamano
e0f6ad2984 Merge branch 'tk/credential-config'
"git credential' didn't honor the core.askPass configuration
variable (among other things), which has been corrected.

* tk/credential-config:
  credential: load default config
2020-11-02 13:17:39 -08:00
Junio C Hamano
b6fb70c985 Merge branch 'dl/diff-merge-base'
"git diff A...B" learned "git diff --merge-base A B", which is a
longer short-hand to say the same thing.

* dl/diff-merge-base:
  contrib/completion: complete `git diff --merge-base`
  builtin/diff-tree: learn --merge-base
  builtin/diff-index: learn --merge-base
  t4068: add --merge-base tests
  diff-lib: define diff_get_merge_base()
  diff-lib: accept option flags in run_diff_index()
  contrib/completion: extract common diff/difftool options
  git-diff.txt: backtick quote command text
  git-diff-index.txt: make --cached description a proper sentence
  t4068: remove unnecessary >tmp
2020-11-02 13:17:39 -08:00
Junio C Hamano
0be2d65132 Merge branch 'ds/maintenance-commit-graph-auto-fix'
Test-coverage enhancement of running commit-graph task "git
maintenance" as needed led to discovery and fix of a bug.

* ds/maintenance-commit-graph-auto-fix:
  maintenance: core.commitGraph=false prevents writes
  maintenance: test commit-graph auto condition
2020-11-02 13:17:39 -08:00
Junio C Hamano
307a53dd99 Merge branch 'ds/commit-graph-merging-fix'
When "git commit-graph" detects the same commit recorded more than
once while it is merging the layers, it used to die.  The code now
ignores all but one of them and continues.

* ds/commit-graph-merging-fix:
  commit-graph: don't write commit-graph when disabled
  commit-graph: ignore duplicates when merging layers
2020-11-02 13:17:39 -08:00
Junio C Hamano
d5c2d1a0aa Merge branch 'es/test-cmp-typocatcher'
A test helper "test_cmp A B" was taught to diagnose missing files A
or B as a bug in test, but some tests legitimately wanted to notice
a failure to even create file B as an error, in addition to leaving
the expected result in it, and were misdiagnosed as a bug.  This
has been corrected.

* es/test-cmp-typocatcher:
  Revert "test_cmp: diagnose incorrect arguments"
2020-11-02 13:17:38 -08:00
Junio C Hamano
cd47bbe164 Merge branch 'jk/fast-import-marks-alloc-fix'
"git fast-import" wasted a lot of memory when many marks were in use.

* jk/fast-import-marks-alloc-fix:
  fast-import: fix over-allocation of marks storage
2020-11-02 13:17:37 -08:00
Junio C Hamano
6b9f5096eb Merge branch 'js/avoid-split-sideband-message'
The side-band status report can be sent at the same time as the
primary payload multiplexed, but the demultiplexer on the receiving
end incorrectly split a single status report into two, which has
been corrected.

* js/avoid-split-sideband-message:
  test-pkt-line: drop colon from sideband identity
  sideband: report unhandled incomplete sideband messages as bugs
  sideband: avoid reporting incomplete sideband messages
2020-11-02 13:17:37 -08:00
Johannes Schindelin
5d5f4ea30d t5411: finish preparing for main being the default branch name
In addition to the trivial search-and-replace performed over the course
of the previous three commits, there is one test in t5411 that depends
on the length of the default branch name.

Adjust it and use `main` as the default branch name in this test.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-31 13:15:17 -07:00
Johannes Schindelin
a9568dba41 t5411: adjust the remaining support files for init.defaultBranch=main
This trick was performed via

	$ sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
		-e 's/Master/Main/g' -- t/t5411/*

In the previous commit, we adjusted roughly half of the support files,
to stay under the 100kB limit (mails larger than that are rejected by
the Git mailing list).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-31 13:15:17 -07:00
Johannes Schindelin
8f0a264524 t5411: start adjusting the support files for init.defaultBranch=main
This trick was performed via

	$ sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
		-e 's/Master/Main/g' -- t/t5411/test-00[3-5]*

We do not convert the files in `t/t5411/` in one go because the patch
would be too big (mails larger than 100kB are rejected by the Git
mailing list). Instead, we start with roughly half of the support files.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-31 13:15:17 -07:00
Johannes Schindelin
f3384e7794 t5411: start using the default branch name "main"
This is a straight-forward search-and-replace in the test script;
However, this is not yet complete because it requires many more
replacements in `t/t5411/`, too many for a single patch (the Git mailing
list rejects mails larger than 100kB). For that reason, we disable this
test script temporarily via the `PREPARE_FOR_MAIN_BRANCH` prereq.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-31 13:15:16 -07:00
Junio C Hamano
c8b7c0272a Merge branch 'cm/t7xxx-cleanup'
Micro clean-up.

* cm/t7xxx-cleanup:
  t7102: prepare expected output inside test_expect_* block
  t7201: put each command on a separate line
  t7201: use 'git -C' to avoid subshell
  t7102,t7201: remove whitespace after redirect operator
  t7102,t7201: remove unnecessary blank spaces in test body
  t7101,t7102,t7201: modernize test formatting
2020-10-30 13:04:24 -07:00
Junio C Hamano
a42035fbe4 Merge branch 'ct/t0000-use-test-path-is-file'
Micro clean-up of a test script.

* ct/t0000-use-test-path-is-file:
  t0000: use test_path_is_file instead of "test -f"
2020-10-30 13:04:24 -07:00
Junio C Hamano
678c787c00 Merge branch 'en/t7518-unflake'
Work around flakiness in a test.

* en/t7518-unflake:
  t7518: fix flaky grep invocation
2020-10-30 13:04:23 -07:00
Junio C Hamano
4f9f7c1442 Merge branch 'jk/committer-date-is-author-date-fix' into maint
In 2.29, "--committer-date-is-author-date" option of "rebase" and
"am" subcommands lost the e-mail address by mistake, which has been
corrected.

* jk/committer-date-is-author-date-fix:
  rebase: fix broken email with --committer-date-is-author-date
  am: fix broken email with --committer-date-is-author-date
  t3436: check --committer-date-is-author-date result more carefully
2020-10-29 14:18:47 -07:00
Philippe Blain
e2f89586fa log, show: add tests for messages containing CRLF
A previous commit adjusted the code in ref-filter.c so that messages
containing CRLF are now correctly parsed and displayed.

Add tests to also check that `git log` and `git show` correctly handle
such messages, to prevent futur regressions if these commands are
refactored to use the ref-filter API.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-29 12:57:45 -07:00
Philippe Blain
9f75ce3d8f ref-filter: handle CRLF at end-of-line more gracefully
The ref-filter code does not correctly handle commit or tag messages
that use CRLF as the line terminator. Such messages can be created with
the `--cleanup=verbatim` option of `git commit` and `git tag`, or by
using `git commit-tree` directly.

The function `find_subpos` in ref-filter.c looks for two consecutive
LFs to find the end of the subject line, a sequence which is absent in
messages using CRLF. This results in the whole message being parsed as
the subject line (`%(contents:subject)`), and the body of the message
(`%(contents:body)`) being empty.

Moreover, in `copy_subject`, which wants to return the subject as a
single line, '\n' is replaced by space, but '\r' is
untouched.

This impacts the output of `git branch`, `git tag` and `git
for-each-ref`.

This behaviour is a regression for `git branch --verbose`, which
bisects down to 949af0684c (branch: use ref-filter printing APIs,
2017-01-10).

Adjust the ref-filter code to be more lenient by hardening the logic in
`copy_subject` and `find_subpos` to correctly parse messages containing
CRLF.

Add a new test script, 't3920-crlf-messages.sh', to test the behaviour
of commands using either the ref-filter or the pretty APIs with messages
using CRLF line endings. The function `test_crlf_subject_body_and_contents`
can be used to test that the `--format` option of `branch`, `tag`,
`for-each-ref`, `log` and `show` correctly displays the subject, body
and raw content of commit and tag messages using CRLF. Test the
output of `branch`, `tag` and `for-each-ref` with such commits.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-29 12:57:45 -07:00