Commit Graph

63619 Commits

Author SHA1 Message Date
nalla
21a0cc0a91 asciidoctor: Fix user-manual to be built by asciidoctor
The `user-manual.txt` ist designed as a `book` but the `Makefile` wants to
build it as an `article`. This seems to be a problem when building the
documentation with `asciidoctor`. Furthermore the parts *Git Glossary*
and *Apendix B* had no subsections which is not allowed when building with
`asciidoctor`. So lets add a *dummy* section.

Signed-off-by: nalla <nalla@hamal.uberspace.de>
2016-08-12 16:04:43 +02:00
Karsten Blees
09e5e875c0 Win32: fix 'lstat("dir/")' with long paths
Use a suffciently large buffer to strip the trailing slash.

Signed-off-by: Karsten Blees <blees@dcon.de>
2016-08-12 16:04:41 +02:00
Karsten Blees
76a09cff53 Win32: support long paths
Windows paths are typically limited to MAX_PATH = 260 characters, even
though the underlying NTFS file system supports paths up to 32,767 chars.
This limitation is also evident in Windows Explorer, cmd.exe and many
other applications (including IDEs).

Particularly annoying is that most Windows APIs return bogus error codes
if a relative path only barely exceeds MAX_PATH in conjunction with the
current directory, e.g. ERROR_PATH_NOT_FOUND / ENOENT instead of the
infinitely more helpful ERROR_FILENAME_EXCED_RANGE / ENAMETOOLONG.

Many Windows wide char APIs support longer than MAX_PATH paths through the
file namespace prefix ('\\?\' or '\\?\UNC\') followed by an absolute path.
Notable exceptions include functions dealing with executables and the
current directory (CreateProcess, LoadLibrary, Get/SetCurrentDirectory) as
well as the entire shell API (ShellExecute, SHGetSpecialFolderPath...).

Introduce a handle_long_path function to check the length of a specified
path properly (and fail with ENAMETOOLONG), and to optionally expand long
paths using the '\\?\' file namespace prefix. Short paths will not be
modified, so we don't need to worry about device names (NUL, CON, AUX).

Contrary to MSDN docs, the GetFullPathNameW function doesn't seem to be
limited to MAX_PATH (at least not on Win7), so we can use it to do the
heavy lifting of the conversion (translate '/' to '\', eliminate '.' and
'..', and make an absolute path).

Add long path error checking to xutftowcs_path for APIs with hard MAX_PATH
limit.

Add a new MAX_LONG_PATH constant and xutftowcs_long_path function for APIs
that support long paths.

While improved error checking is always active, long paths support must be
explicitly enabled via 'core.longpaths' option. This is to prevent end
users to shoot themselves in the foot by checking out files that Windows
Explorer, cmd/bash or their favorite IDE cannot handle.

Test suite:
Test the case is when the full pathname length of a dir is close
to 260 (MAX_PATH).
Bug report and an original reproducer by Andrey Rogozhnikov:
https://github.com/msysgit/git/pull/122#issuecomment-43604199

Note that the test cannot rely on the presence of short names, as they
are not enabled by default except on the system drive.

[jes: adjusted test number to avoid conflicts, reinstated && chain,
adjusted test to work without short names]

Thanks-to: Martin W. Kirst <maki@bitkings.de>
Thanks-to: Doug Kelly <dougk.ff7@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
Original-test-by: Andrey Rogozhnikov <rogozhnikov.andrey@gmail.com>
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2016-08-12 16:04:40 +02:00
Johannes Schindelin
f2284e7efb Win32: support long paths
Windows paths are typically limited to MAX_PATH = 260 characters, even
though the underlying NTFS file system supports paths up to 32,767 chars.
This limitation is also evident in Windows Explorer, cmd.exe and many
other applications (including IDEs).

Particularly annoying is that most Windows APIs return bogus error codes
if a relative path only barely exceeds MAX_PATH in conjunction with the
current directory, e.g. ERROR_PATH_NOT_FOUND / ENOENT instead of the
infinitely more helpful ERROR_FILENAME_EXCED_RANGE / ENAMETOOLONG.

Many Windows wide char APIs support longer than MAX_PATH paths through the
file namespace prefix ('\\?\' or '\\?\UNC\') followed by an absolute path.
Notable exceptions include functions dealing with executables and the
current directory (CreateProcess, LoadLibrary, Get/SetCurrentDirectory) as
well as the entire shell API (ShellExecute, SHGetSpecialFolderPath...).

Introduce a handle_long_path function to check the length of a specified
path properly (and fail with ENAMETOOLONG), and to optionally expand long
paths using the '\\?\' file namespace prefix. Short paths will not be
modified, so we don't need to worry about device names (NUL, CON, AUX).

Contrary to MSDN docs, the GetFullPathNameW function doesn't seem to be
limited to MAX_PATH (at least not on Win7), so we can use it to do the
heavy lifting of the conversion (translate '/' to '\', eliminate '.' and
'..', and make an absolute path).

Add long path error checking to xutftowcs_path for APIs with hard MAX_PATH
limit.

Add a new MAX_LONG_PATH constant and xutftowcs_long_path function for APIs
that support long paths.

While improved error checking is always active, long paths support must be
explicitly enabled via 'core.longpaths' option. This is to prevent end
users to shoot themselves in the foot by checking out files that Windows
Explorer, cmd/bash or their favorite IDE cannot handle.

Test suite:
Test the case is when the full pathname length of a dir is close
to 260 (MAX_PATH).
Bug report and an original reproducer by Andrey Rogozhnikov:
https://github.com/msysgit/git/pull/122#issuecomment-43604199

[jes: adjusted test number to avoid conflicts]

Thanks-to: Martin W. Kirst <maki@bitkings.de>
Thanks-to: Doug Kelly <dougk.ff7@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
Original-test-by: Andrey Rogozhnikov <rogozhnikov.andrey@gmail.com>
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2016-08-12 16:04:40 +02:00
Doug Kelly
a483b49ee8 Add a test demonstrating a problem with long submodule paths
[jes: adusted test number to avoid conflicts, fixed non-portable use of
the 'export' statement]

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2016-08-12 16:04:39 +02:00
Karsten Blees
2bef546fb2 fscache: load directories only once
If multiple threads access a directory that is not yet in the cache, the
directory will be loaded by each thread. Only one of the results is added
to the cache, all others are leaked. This wastes performance and memory.

On cache miss, add a future object to the cache to indicate that the
directory is currently being loaded. Subsequent threads register themselves
with the future object and wait. When the first thread has loaded the
directory, it replaces the future object with the result and notifies
waiting threads.

Signed-off-by: Karsten Blees <blees@dcon.de>
2016-08-12 16:04:37 +02:00
Karsten Blees
3bc251b038 Win32: add a cache below mingw's lstat and dirent implementations
Checking the work tree status is quite slow on Windows, due to slow lstat
emulation (git calls lstat once for each file in the index). Windows
operating system APIs seem to be much better at scanning the status
of entire directories than checking single files.

Add an lstat implementation that uses a cache for lstat data. Cache misses
read the entire parent directory and add it to the cache. Subsequent lstat
calls for the same directory are served directly from the cache.

Also implement opendir / readdir / closedir so that they create and use
directory listings in the cache.

The cache doesn't track file system changes and doesn't plug into any
modifying file APIs, so it has to be explicitly enabled for git functions
that don't modify the working copy.

Note: in an earlier version of this patch, the cache was always active and
tracked file system changes via ReadDirectoryChangesW. However, this was
much more complex and had negative impact on the performance of modifying
git commands such as 'git checkout'.

Signed-off-by: Karsten Blees <blees@dcon.de>
2016-08-12 16:04:36 +02:00
Karsten Blees
52b85760a8 add infrastructure for read-only file system level caches
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.

This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.

Enable read-only sections for 'git status' and preload_index.

Signed-off-by: Karsten Blees <blees@dcon.de>
2016-08-12 16:04:35 +02:00
Karsten Blees
65050d8fd6 Win32: make the lstat implementation pluggable
Emulating the POSIX lstat API on Windows via GetFileAttributes[Ex] is quite
slow. Windows operating system APIs seem to be much better at scanning the
status of entire directories than checking single files. A caching
implementation may improve performance by bulk-reading entire directories
or reusing data obtained via opendir / readdir.

Make the lstat implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.

Signed-off-by: Karsten Blees <blees@dcon.de>
2016-08-12 16:04:35 +02:00
Karsten Blees
dc3bacbd22 Win32: Make the dirent implementation pluggable
Emulating the POSIX dirent API on Windows via FindFirstFile/FindNextFile is
pretty staightforward, however, most of the information provided in the
WIN32_FIND_DATA structure is thrown away in the process. A more
sophisticated implementation may cache this data, e.g. for later reuse in
calls to lstat.

Make the dirent implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.

Define a base DIR structure with pointers to readdir/closedir that match
the opendir implementation (i.e. similar to vtable pointers in OOP).
Define readdir/closedir so that they call the function pointers in the DIR
structure. This allows to choose the opendir implementation on a
call-by-call basis.

Move the fixed sized dirent.d_name buffer to the dirent-specific DIR
structure, as d_name may be implementation specific (e.g. a caching
implementation may just set d_name to point into the cache instead of
copying the entire file name string).

Signed-off-by: Karsten Blees <blees@dcon.de>
2016-08-12 16:04:34 +02:00
Karsten Blees
8ae2a01fe2 Win32: dirent.c: Move opendir down
Move opendir down in preparation for the next patch.

Signed-off-by: Karsten Blees <blees@dcon.de>
2016-08-12 16:04:33 +02:00
Karsten Blees
ab0252274b Win32: make FILETIME conversion functions public
Signed-off-by: Karsten Blees <blees@dcon.de>
2016-08-12 16:04:33 +02:00
Jakub Bereżański
cf62f1228b wincred: handle empty username/password correctly
Empty (length 0) usernames and/or passwords, when saved in the Windows
Credential Manager, come back as null when reading the credential.

One use case for such empty credentials is with NTLM authentication, where
empty username and password instruct libcurl to authenticate using the
credentials of the currently logged-on user (single sign-on).

When locating the relevant credentials, make empty username match null.
When outputting the credentials, handle nulls correctly.

Signed-off-by: Jakub Bereżański <kuba@berezanscy.pl>
2016-08-12 16:04:30 +02:00
Jakub Bereżański
41baea78e2 t0302: check helper can handle empty credentials
Make sure the helper does not crash when blank username and password is
provided. If the helper can save such credentials, it should be able to
read them back.

Signed-off-by: Jakub Bereżański <kuba@berezanscy.pl>
2016-08-12 16:04:30 +02:00
Sebastian Schuberth
962da8e1db gitk: Use an external icon file on Windows
Git for Windows now ships with the new Git icon from git-scm.com. Use that
icon file if it exists instead of the old procedurally drawn one.

This patch was sent upstream but so far no decision on its inclusion was
made, so commit it to our fork.

Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
2016-08-12 16:04:28 +02:00
Chris West (Faux)
a6eae5a66a Fix another invocation of git from gitk with an overly long command-line
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
2016-08-12 16:04:27 +02:00
Johannes Schindelin
a9bcbf61c2 Work around the command line limit on Windows
On Windows, there are dramatic problems when a command line grows
beyond PATH_MAX, which is restricted to 8191 characters on XP and
later (according to http://support.microsoft.com/kb/830473).

Work around this by just cutting off the command line at that length
(actually, at a space boundary) in the hope that only negative
refs are chucked: gitk will then do unnecessary work, but that is
still better than flashing the gitk window and exiting with exit
status 5 (which no Windows user is able to make sense of).

The first fix caused Tcl to fail to compile the regexp, see msysGit issue
427. Here is another fix without using regexp, and using a more relaxed
command line length limit to fix the original issue 387.

Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2016-08-12 16:04:26 +02:00
Johannes Schindelin
c7a5eb0282 git gui: set GIT_ASKPASS=git-gui--askpass if not set yet
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2016-08-12 16:04:24 +02:00
Heiko Voigt
292f3b8465 git-gui: provide question helper for retry fallback on Windows
Make use of the new environment variable GIT_ASK_YESNO to support the
recently implemented fallback in case unlink, rename or rmdir fail for
files in use on Windows. The added dialog will present a yes/no question
to the the user which will currently be used by the windows compat layer
to let the user retry a failed file operation.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
2016-08-12 16:04:23 +02:00
Heiko Voigt
7503aec814 Revert "git-gui: set GIT_DIR and GIT_WORK_TREE after setup"
This reverts commit a9fa11fe5b.
2016-08-12 16:04:23 +02:00
Karsten Blees
9fac19f5d7 git-gui:handle the encoding of Git's output correctly
If we use 'eval exec $opt $cmdp $args' to execute git command,
tcl engine will convert the output of the git comand with the rule
system default code page to unicode.

But cp936 -> unicode conversion implicitly done by exec is not reversible.
So we have to use git_read instead.

Bug report and an original reproducer by Cloud Chou:
https://github.com/msysgit/git/issues/302

Karsten Blees writes this code patch.
Cloud Chou find the reason of the bug.

Thanks-to: dscho
Thanks-to: patthoyts
Signed-off-by: Karsten Blees <blees@dcon.de>
Original-test-by: Cloud Chou <515312382@qq.com>
Signed-off-by: Cloud Chou <515312382@qq.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2016-08-12 16:04:20 +02:00
Karsten Blees
08284c10f6 Unicode file name support (gitk and git-gui)
Assumes file names in git tree objects are UTF-8 encoded.

On most unix systems, the system encoding (and thus the TCL system
encoding) will be UTF-8, so file names will be displayed correctly.

On Windows, it is impossible to set the system encoding to UTF-8. Changing
the TCL system encoding (via 'encoding system ...', e.g. in the startup
code) is explicitly discouraged by the TCL docs.

Change gitk and git-gui functions dealing with file names to always convert
from and to UTF-8.

Signed-off-by: Karsten Blees <blees@dcon.de>
2016-08-12 16:04:20 +02:00
Johannes Schindelin
d45193cfb7 mingw: declare main()'s argv as const
In 84d32bf (sparse: Fix mingw_main() argument number/type errors,
2013-04-27), we addressed problems identified by the 'sparse' tool where
argv was declared inconsistently. The way we addressed it was by casting
from the non-const version to the const-version.

This patch is long overdue, fixing compat/mingw.h's declaration to
make the "argv" parameter const.  This also allows us to lose the
"const" trickery introduced earlier to common-main.c:main().

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-12 16:04:17 +02:00
Jeff King
418c157a24 common-main: call git_setup_gettext()
This should be part of every program, as otherwise users do
not get translated error messages. However, some external
commands forgot to do so (e.g., git-credential-store). This
fixes them, and eliminates the repeated code in programs
that did remember to use it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-12 16:04:17 +02:00
Jeff King
7d3ce1126d common-main: call restore_sigpipe_to_default()
This is another safety/sanity setup that should be in force
everywhere, but which we only applied in git.c. This did
catch most cases, since even external commands are typically
run via "git ..." (and the restoration applies to
sub-processes, too). But there were cases we missed, such as
somebody calling git-upload-pack directly via ssh, or
scripts which use dashed external commands directly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-12 16:04:16 +02:00
Jeff King
2131862a3b common-main: call sanitize_stdfds()
This is setup that should be done in every program for
safety, but we never got around to adding it everywhere (so
builtins benefited from the call in git.c, but any external
commands did not). Putting it in the common main() gives us
this safety everywhere.

Note that the case in daemon.c is a little funny. We wait
until we know whether we want to daemonize, and then either:

 - call daemonize(), which will close stdio and reopen it to
   /dev/null under the hood

 - sanitize_stdfds(), to fix up any odd cases

But that is way too late; the point of sanitizing is to give
us reliable descriptors on 0/1/2, and we will already have
executed code, possibly called die(), etc. The sanitizing
should be the very first thing that happens.

With this patch, git-daemon will sanitize first, and can
remove the call in the non-daemonize case. It does mean that
daemonize() may just end up closing the descriptors we
opened, but that's not a big deal (it's not wrong to do so,
nor is it really less optimal than the case where our parent
process redirected us from /dev/null ahead of time).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-12 16:04:15 +02:00
Jeff King
865812a757 common-main: call git_extract_argv0_path()
Every program which links against libgit.a must call this
function, or risk hitting an assert() in system_path() that
checks whether we have configured argv0_path (though only
when RUNTIME_PREFIX is defined, so essentially only on
Windows).

Looking at the diff, you can see that putting it into the
common main() saves us having to do it individually in each
of the external commands. But what you can't see are the
cases where we _should_ have been doing so, but weren't
(e.g., git-credential-store, and all of the t/helper test
programs).

This has been an accident-waiting-to-happen for a long time,
but wasn't triggered until recently because it involves one
of those programs actually calling system_path(). That
happened with git-credential-store in v2.8.0 with ae5f677
(lazily load core.sharedrepository, 2016-03-11). The
program:

  - takes a lock file, which...

  - opens a tempfile, which...

  - calls adjust_shared_perm to fix permissions, which...

  - lazy-loads the config (as of ae5f677), which...

  - calls system_path() to find the location of
    /etc/gitconfig

On systems with RUNTIME_PREFIX, this means credential-store
reliably hits that assert() and cannot be used.

We never noticed in the test suite, because we set
GIT_CONFIG_NOSYSTEM there, which skips the system_path()
lookup entirely.  But if we were to tweak git_config() to
find /etc/gitconfig even when we aren't going to open it,
then the test suite shows multiple failures (for
credential-store, and for some other test helpers). I didn't
include that tweak here because it's way too specific to
this particular call to be worth carrying around what is
essentially dead code.

The implementation is fairly straightforward, with one
exception: there is exactly one caller (git.c) that actually
cares about the result of the function, and not the
side-effect of setting up argv0_path. We can accommodate
that by simply replacing the value of argv[0] in the array
we hand down to cmd_main().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-12 16:04:14 +02:00
Jeff King
d69f13b38c add an extra level of indirection to main()
There are certain startup tasks that we expect every git
process to do. In some cases this is just to improve the
quality of the program (e.g., setting up gettext()). In
others it is a requirement for using certain functions in
libgit.a (e.g., system_path() expects that you have called
git_extract_argv0_path()).

Most commands are builtins and are covered by the git.c
version of main(). However, there are still a few external
commands that use their own main(). Each of these has to
remember to include the correct startup sequence, and we are
not always consistent.

Rather than just fix the inconsistencies, let's make this
harder to get wrong by providing a common main() that can
run this standard startup.

We basically have two options to do this:

 - the compat/mingw.h file already does something like this by
   adding a #define that replaces the definition of main with a
   wrapper that calls mingw_startup().

   The upside is that the code in each program doesn't need
   to be changed at all; it's rewritten on the fly by the
   preprocessor.

   The downside is that it may make debugging of the startup
   sequence a bit more confusing, as the preprocessor is
   quietly inserting new code.

 - the builtin functions are all of the form cmd_foo(),
   and git.c's main() calls them.

   This is much more explicit, which may make things more
   obvious to somebody reading the code. It's also more
   flexible (because of course we have to figure out _which_
   cmd_foo() to call).

   The downside is that each of the builtins must define
   cmd_foo(), instead of just main().

This patch chooses the latter option, preferring the more
explicit approach, even though it is more invasive. We
introduce a new file common-main.c, with the "real" main. It
expects to call cmd_main() from whatever other objects it is
linked against.

We link common-main.o against anything that links against
libgit.a, since we know that such programs will need to do
this setup. Note that common-main.o can't actually go inside
libgit.a, as the linker would not pick up its main()
function automatically (it has no callers).

The rest of the patch is just adjusting all of the various
external programs (mostly in t/helper) to use cmd_main().
I've provided a global declaration for cmd_main(), which
means that all of the programs also need to match its
signature. In particular, many functions need to switch to
"const char **" instead of "char **" for argv. This effect
ripples out to a few other variables and functions, as well.

This makes the patch even more invasive, but the end result
is much better. We should be treating argv strings as const
anyway, and now all programs conform to the same signature
(which also matches the way builtins are defined).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-12 16:04:14 +02:00
Brendan Forster
c58c31832b Add an issue template
Signed-off-by: Brendan Forster <brendan@github.com>
2016-08-12 16:04:11 +02:00
Johannes Schindelin
3136f2c7f3 README.md: Add a Windows-specific preamble
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2016-08-12 16:04:11 +02:00
Johannes Schindelin
975131d26a Add a Code of Conduct
It is better to state clearly expectations and intentions than to assume
quietly that everybody agrees.

This Code of Conduct is the Open Code of Conduct as per
http://todogroup.org/opencodeofconduct/ (the only modifications are the
adjustments to reflect that there is no "response team" in addition to the
Git for Windows maintainer, and the addition of the link to the Open Code
of Conduct itself).

[Completely revamped, based on the Covenant 1.4 by Brendan Forster]

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2016-08-12 16:04:10 +02:00
Johannes Schindelin
207577383b Start the merging-rebase to junio/maint
This commit starts the rebase of 7b3d14e to 8e4b75a
2016-08-12 16:04:08 +02:00
Johannes Schindelin
cf117a2558 status: offer *not* to lock the index and update it
When a third-party tool periodically runs `git status` in order to keep
track of the state of the working tree, it is a bad idea to lock the
index: it might interfere with interactive commands executed by the
user, e.g. when the user wants to commit files.

Let's introduce the option `--no-lock-index` to prevent such problems.
The idea is that the third-party tool calls `git status` with this
option, preventing it from ever updating the index.

The downside is that the periodic `git status` calls will be a little
bit more wasteful because they may have to refresh the index repeatedly,
only to throw away the updates when it exits. This cannot really be
helped, though, as tools wanting to get a periodic update of the status
have no way to predict when the user may want to lock the index herself.

Note that the regression test added in this commit does not *really*
verify that no index.lock file was written; that test is not possible in
a portable way. Instead, we verify that .git/index is rewritten *only*
when `git status` is run without `--no-lock-index`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2016-08-12 11:10:24 +02:00
Kevin Willford
b3dfeebb92 rebase: avoid computing unnecessary patch IDs
The `rebase` family of Git commands avoid applying patches that were
already integrated upstream. They do that by using the revision walking
option that computes the patch IDs of the two sides of the rebase
(local-only patches vs upstream-only ones) and skipping those local
patches whose patch ID matches one of the upstream ones.

In many cases, this causes unnecessary churn, as already the set of
paths touched by a given commit would suffice to determine that an
upstream patch has no local equivalent.

This hurts performance in particular when there are a lot of upstream
patches, and/or large ones.

Therefore, let's introduce the concept of a "diff-header-only" patch ID,
compare those first, and only evaluate the "full" patch ID lazily.

Please note that in contrast to the "full" patch IDs, those
"diff-header-only" patch IDs are prone to collide with one another, as
adjacent commits frequently touch the very same files. Hence we now
have to be careful to allow multiple hash entries with the same hash.
We accomplish that by using the hashmap_add() function that does not even
test for hash collisions.  This also allows us to evaluate the full patch ID
lazily, i.e. only when we found commits with matching diff-header-only
patch IDs.

We add a performance test that demonstrates ~1-6% improvement.  In
practice this will depend on various factors such as how many upstream
changes and how big those changes are along with whether file system
caches are cold or warm.  As Git's test suite has no way of catching
performance regressions, we also add a regression test that verifies
that the full patch ID computation is skipped when the diff-header-only
computation suffices.

Signed-off-by: Kevin Willford <kcwillford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 14:39:16 -07:00
Ville Skyttä
2e3a16b279 Spelling fixes
<BAD>                     <CORRECTED>
    accidently                accidentally
    commited                  committed
    dependancy                dependency
    emtpy                     empty
    existance                 existence
    explicitely               explicitly
    git-upload-achive         git-upload-archive
    hierachy                  hierarchy
    indegee                   indegree
    intial                    initial
    mulitple                  multiple
    non-existant              non-existent
    precendence.              precedence.
    priviledged               privileged
    programatically           programmatically
    psuedo-binary             pseudo-binary
    soemwhere                 somewhere
    successfull               successful
    transfering               transferring
    uncommited                uncommitted
    unkown                    unknown
    usefull                   useful
    writting                  writing

Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 14:35:42 -07:00
Jeff King
07e7dbf0db gc: default aggressive depth to 50
This commit message is long and has lots of background and
numbers. The summary is: the current default of 250 doesn't
save much space, and costs CPU. It's not a good tradeoff.
Read on for details.

The "--aggressive" flag to git-gc does three things:

  1. use "-f" to throw out existing deltas and recompute from
     scratch

  2. use "--window=250" to look harder for deltas

  3. use "--depth=250" to make longer delta chains

Items (1) and (2) are good matches for an "aggressive"
repack. They ask the repack to do more computation work in
the hopes of getting a better pack. You pay the costs during
the repack, and other operations see only the benefit.

Item (3) is not so clear. Allowing longer chains means fewer
restrictions on the deltas, which means potentially finding
better ones and saving some space. But it also means that
operations which access the deltas have to follow longer
chains, which affects their performance. So it's a tradeoff,
and it's not clear that the tradeoff is even a good one.

The existing "250" numbers for "--aggressive" come
originally from this thread:

  http://public-inbox.org/git/alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org/

where Linus says:

  So when I said "--depth=250 --window=250", I chose those
  numbers more as an example of extremely aggressive
  packing, and I'm not at all sure that the end result is
  necessarily wonderfully usable. It's going to save disk
  space (and network bandwidth - the delta's will be re-used
  for the network protocol too!), but there are definitely
  downsides too, and using long delta chains may
  simply not be worth it in practice.

There are some numbers in that thread, but they're mostly
focused on the improved window size, and measure the
improvement from --depth=250 and --window=250 together.
E.g.:

  http://public-inbox.org/git/9e4733910712062006l651571f3w7f76ce64c6650dff@mail.gmail.com/

talks about the improved run-time of "git-blame", which
comes from the reduced pack size. But most of that reduction
is coming from --window=250, whereas most of the extra costs
come from --depth=250. There's a link in that thread showing
that increasing the depth beyond 50 doesn't seem to help
much with the size:

  https://vcscompare.blogspot.com/2008/06/git-repack-parameters.html

but again, no discussion of the timing impact.

In an earlier thread from Ted Ts'o which discussed setting
the non-aggressive default (from 10 to 50):

  http://public-inbox.org/git/20070509134958.GA21489%40thunk.org/

we have more numbers, with the conclusion that going past 50
does not help size much, and hurts the speed of normal
operations.

So from that, we might guess that 50 is actually a sweet
spot, even for aggressive, if we interpret aggressive to
"spend time now to make a better pack". It is not clear that
"--depth=250" is actually a better pack. It may be slightly
_smaller_, but it carries a run-time penalty.

Here are some more recent timings I did to verify that. They
show three things:

  - the size of the resulting pack (so disk saved to store,
    bandwidth saved on clones/fetches)

  - the cost of "rev-list --objects --all", which shows the
    effect of the delta chains on trees (commits typically
    don't delta, and the command doesn't touch the blobs at
    all)

  - the cost of "log -Sfoo", which will additionally access
    each blob

All cases were repacked with "git repack -adf --depth=$d
--window=250" (so basically, what would happen if we tweaked
the "gc --aggressive" default depth).

The timings are all wall-clock best-of-3. The machine itself
has plenty of RAM compared to the repositories (which is
probably typical of most workstations these days), so we're
really measuring CPU usage, as the whole thing will be in
disk cache after the first run.

The core.deltaBaseCacheLimit is at its default of 96MiB.
It's possible that tweaking it would have some impact on the
tests, as some of them (especially "log -S" on a large repo)
are likely to overflow that. But bumping that carries a
run-time memory cost, so for these tests, I focused on what
we could do just with the on-disk pack tradeoffs.

Each test is done for four depths: 250 (the current value),
50 (the current default that tested well previously), 100
(to show something on the larger side, which previous tests
showed was not a good tradeoff), and 10 (the very old
default, which previous tests showed was worse than 50).

Here are the numbers for linux.git:

   depth |  size |  %    | rev-list |  %     | log -Sfoo |   %
  -------+-------+-------+----------+--------+-----------+-------
    250  | 967MB |  n/a  | 48.159s  |   n/a  | 378.088   |   n/a
    100  | 971MB | +0.4% | 41.471s  | -13.9% | 342.060   |  -9.5%
     50  | 979MB | +1.2% | 37.778s  | -21.6% | 311.040s  | -17.7%
     10  | 1.1GB | +6.6% | 32.518s  | -32.5% | 279.890s  | -25.9%

and for git.git:

   depth |  size |  %    | rev-list |  %     | log -Sfoo |   %
  -------+-------+-------+----------+--------+-----------+-------
    250  |  48MB |  n/a  |  2.215s  |   n/a  |  20.922s  |   n/a
    100  |  49MB | +0.5% |  2.140s  |  -3.4% |  17.736s  | -15.2%
     50  |  49MB | +1.7% |  2.099s  |  -5.2% |  15.418s  | -26.3%
     10  |  53MB | +9.3% |  2.001s  |  -9.7% |  12.677s  | -39.4%

You can see that that the CPU savings for regular operations improves as we
decrease the depth. The savings are less for "rev-list" on a smaller repository
than they are for blob-accessing operations, or even rev-list on a larger
repository. This may mean that a larger delta cache would help (though setting
core.deltaBaseCacheLimit by itself doesn't).

But we can also see that the space savings are not that great as the depth goes
higher. Saving 5-10% between 10 and 50 is probably worth the CPU tradeoff.
Saving 1% to go from 50 to 100, or another 0.5% to go from 100 to 250 is
probably not.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:53:19 -07:00
Jeff Hostetler
b249e39f99 test-lib-functions.sh: add lf_to_nul helper
Add lf_to_nul helper function to test-lib-functions.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:16:13 -07:00
Jeff Hostetler
1cd828ddc8 git-status.txt: describe --porcelain=v2 format
Update status manpage to include information about
porcelain v2 format.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:15:56 -07:00
Jeff Hostetler
d9fc746cd7 status: print branch info with --porcelain=v2 --branch
Expand porcelain v2 output to include branch and tracking
branch information. This includes the commit id, the branch,
the upstream branch, and the ahead and behind counts.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:15:40 -07:00
Jeff Hostetler
24959bad5d status: print per-file porcelain v2 status data
Print per-file information in porcelain v2 format.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:15:22 -07:00
Jeff Hostetler
1ecdecce62 status: collect per-file data for --porcelain=v2
Collect extra per-file data for porcelain V2 format.

The output of `git status --porcelain` leaves out many
details about the current status that clients might like
to have.  This can force them to be less efficient as they
may need to launch secondary commands (and try to match
the logic within git) to accumulate this extra information.
For example, a GUI IDE might want the file mode to display
the correct icon for a changed item (without having to stat
it afterwards).

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:14:43 -07:00
Junio C Hamano
a42d7b6a5b Sync with maint
* maint:
  Yet another batch for 2.9.3
2016-08-10 12:38:02 -07:00
Junio C Hamano
27b0ea4038 Twelfth batch for 2.10
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-10 12:37:53 -07:00
Junio C Hamano
11b53957ac Merge branch 'sb/submodule-update-dot-branch'
A few updates to "git submodule update".

Use of "| wc -l" break with BSD variant of 'wc'.

* sb/submodule-update-dot-branch:
  t7406: fix breakage on OSX
  submodule update: allow '.' for branch value
  submodule--helper: add remote-branch helper
  submodule-config: keep configured branch around
  submodule--helper: fix usage string for relative-path
  submodule update: narrow scope of local variable
  submodule update: respect depth in subsequent fetches
  t7406: future proof tests with hard coded depth
2016-08-10 12:33:20 -07:00
Junio C Hamano
1a5f1a3f25 Merge branch 'js/am-3-merge-recursive-direct'
"git am -3" calls "git merge-recursive" when it needs to fall back
to a three-way merge; this call has been turned into an internal
subroutine call instead of spawning a separate subprocess.

* js/am-3-merge-recursive-direct:
  merge-recursive: flush output buffer even when erroring out
  merge_trees(): ensure that the callers release output buffer
  merge-recursive: offer an option to retain the output in 'obuf'
  merge-recursive: write the commit title in one go
  merge-recursive: flush output buffer before printing error messages
  am -3: use merge_recursive() directly again
  merge-recursive: switch to returning errors instead of dying
  merge-recursive: handle return values indicating errors
  merge-recursive: allow write_tree_from_memory() to error out
  merge-recursive: avoid returning a wholesale struct
  merge_recursive: abort properly upon errors
  prepare the builtins for a libified merge_recursive()
  merge-recursive: clarify code in was_tracked()
  die(_("BUG")): avoid translating bug messages
  die("bug"): report bugs consistently
  t5520: verify that `pull --rebase` shows the helpful advice when failing
2016-08-10 12:33:20 -07:00
Junio C Hamano
7a3ea66633 Merge branch 'js/commit-slab-decl-fix'
* js/commit-slab-decl-fix:
  commit-slab.h: avoid duplicated global static variables
  config.c: avoid duplicated global static variables
2016-08-10 12:33:20 -07:00
Junio C Hamano
483ca933f8 Merge branch 'jk/completion-diff-submodule'
* jk/completion-diff-submodule:
  completion: add completion for --submodule=* diff option
2016-08-10 12:33:19 -07:00
Junio C Hamano
2dceb92231 Merge branch 'cc/mailmap-tuxfamily'
* cc/mailmap-tuxfamily:
  .mailmap: use Christian Couder's Tuxfamily address
2016-08-10 12:33:18 -07:00
Junio C Hamano
db40a62239 Merge branch 'jt/format-patch-from-config'
"git format-patch" learned format.from configuration variable to
specify the default settings for its "--from" option.

* jt/format-patch-from-config:
  format-patch: format.from gives the default for --from
2016-08-10 12:33:18 -07:00
Junio C Hamano
e674762786 Merge branch 'jk/push-force-with-lease-creation'
"git push --force-with-lease" already had enough logic to allow
ensuring that such a push results in creation of a ref (i.e. the
receiving end did not have another push from sideways that would be
discarded by our force-pushing), but didn't expose this possibility
to the users.  It does so now.

* jk/push-force-with-lease-creation:
  t5533: make it pass on case-sensitive filesystems
  push: allow pushing new branches with --force-with-lease
  push: add shorthand for --force-with-lease branch creation
  Documentation/git-push: fix placeholder formatting
2016-08-10 12:33:18 -07:00