Use OpenSSL's SHA-1 routines rather than builtin block-sha1 routines.
This improves performance on SHA1 operations on Intel processors.
OpenSSL 1.0.2 has made considerable performance improvements and
support the Intel hardware acceleration features. See:
https://software.intel.com/en-us/articles/improving-openssl-performancehttps://software.intel.com/en-us/articles/intel-sha-extensions
To test this I added/staged a single file in a gigantic
repository having a 450MB index file. The code in read-cache.c
verifies the header SHA as it reads the index and computes a new
header SHA as it writes out the new index. Therefore, in this test
the SHA code must process 900MB of data. Testing was done on an
Intel I7-4770 CPU @ 3.40GHz (Intel64, Family 6, Model 60) CPU.
The block-sha1 version averaged 5.27 seconds.
The OpenSSL version averaged 4.50 seconds.
================================================================
$ echo xxx >> project.mk
$ time /e/blk_sha/bin/git.exe add project.mk
real 0m5.207s
user 0m0.000s
sys 0m0.250s
$ echo xxx >> project.mk
$ time /e/blk_sha/bin/git.exe add project.mk
real 0m5.362s
user 0m0.015s
sys 0m0.234s
$ echo xxx >> project.mk
$ time /e/blk_sha/bin/git.exe add project.mk
real 0m5.300s
user 0m0.016s
sys 0m0.250s
$ echo xxx >> project.mk
$ time /e/blk_sha/bin/git.exe add project.mk
real 0m5.216s
user 0m0.000s
sys 0m0.250s
================================================================
$ echo xxx >> project.mk
$ time /e/openssl/bin/git.exe add project.mk
real 0m4.431s
user 0m0.000s
sys 0m0.250s
$ echo xxx >> project.mk
$ time /e/openssl/bin/git.exe add project.mk
real 0m4.478s
user 0m0.000s
sys 0m0.265s
$ echo xxx >> project.mk
$ time /e/openssl/bin/git.exe add project.mk
real 0m4.690s
user 0m0.000s
sys 0m0.250s
$ echo xxx >> project.mk
$ time /e/openssl/bin/git.exe add project.mk
real 0m4.420s
user 0m0.000s
sys 0m0.234s
================================================================
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The "giteveryday" document has a callout list that contains a code
block. This is not a problem for AsciiDoc, but AsciiDoctor sadly was
explicitly designed *not* to render this correctly [*1*]. The symptom is
an unhelpful
line 322: callout list item index: expected 1 got 12
line 325: no callouts refer to list item 1
line 325: callout list item index: expected 2 got 13
line 327: no callouts refer to list item 2
In Git for Windows, we rely on the speed improvement of AsciiDoctor (on
this developer's machine, `make -j15 html` takes roughly 30 seconds with
AsciiDoctor, 70 seconds with AsciiDoc), therefore we need a way to
render this correctly.
The easiest way out is to simplify the callout list, as suggested by
AsciiDoctor's author, even while one may very well disagree with him
that a code block hath no place in a callout list.
*1*: https://github.com/asciidoctor/asciidoctor/issues/1478
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
With this topic branch, the PERL5LIB variable is unset to avoid external
settings from interfering with Git's own Perl interpreter.
This branch also cleans up some of our Windows-only config setting code
(and this will need to be rearranged in the next merging rebase so that
the cleanup comes first, and fscache and longPaths support build on
top).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The `user-manual.txt` is designed as a `book` but the `Makefile` wants
to build it as an `article`. This seems to be a problem when building
the documentation with `asciidoctor`. Furthermore the parts *Git
Glossary* and *Appendix B* had no subsections which is not allowed when
building with `asciidoctor`. So lets add a *dummy* section.
Signed-off-by: nalla <nalla@hamal.uberspace.de>
Windows paths are typically limited to MAX_PATH = 260 characters, even
though the underlying NTFS file system supports paths up to 32,767 chars.
This limitation is also evident in Windows Explorer, cmd.exe and many
other applications (including IDEs).
Particularly annoying is that most Windows APIs return bogus error codes
if a relative path only barely exceeds MAX_PATH in conjunction with the
current directory, e.g. ERROR_PATH_NOT_FOUND / ENOENT instead of the
infinitely more helpful ERROR_FILENAME_EXCED_RANGE / ENAMETOOLONG.
Many Windows wide char APIs support longer than MAX_PATH paths through the
file namespace prefix ('\\?\' or '\\?\UNC\') followed by an absolute path.
Notable exceptions include functions dealing with executables and the
current directory (CreateProcess, LoadLibrary, Get/SetCurrentDirectory) as
well as the entire shell API (ShellExecute, SHGetSpecialFolderPath...).
Introduce a handle_long_path function to check the length of a specified
path properly (and fail with ENAMETOOLONG), and to optionally expand long
paths using the '\\?\' file namespace prefix. Short paths will not be
modified, so we don't need to worry about device names (NUL, CON, AUX).
Contrary to MSDN docs, the GetFullPathNameW function doesn't seem to be
limited to MAX_PATH (at least not on Win7), so we can use it to do the
heavy lifting of the conversion (translate '/' to '\', eliminate '.' and
'..', and make an absolute path).
Add long path error checking to xutftowcs_path for APIs with hard MAX_PATH
limit.
Add a new MAX_LONG_PATH constant and xutftowcs_long_path function for APIs
that support long paths.
While improved error checking is always active, long paths support must be
explicitly enabled via 'core.longpaths' option. This is to prevent end
users to shoot themselves in the foot by checking out files that Windows
Explorer, cmd/bash or their favorite IDE cannot handle.
Test suite:
Test the case is when the full pathname length of a dir is close
to 260 (MAX_PATH).
Bug report and an original reproducer by Andrey Rogozhnikov:
https://github.com/msysgit/git/pull/122#issuecomment-43604199
Note that the test cannot rely on the presence of short names, as they
are not enabled by default except on the system drive.
[jes: adjusted test number to avoid conflicts, reinstated && chain,
adjusted test to work without short names]
Thanks-to: Martin W. Kirst <maki@bitkings.de>
Thanks-to: Doug Kelly <dougk.ff7@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
Original-test-by: Andrey Rogozhnikov <rogozhnikov.andrey@gmail.com>
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Windows paths are typically limited to MAX_PATH = 260 characters, even
though the underlying NTFS file system supports paths up to 32,767 chars.
This limitation is also evident in Windows Explorer, cmd.exe and many
other applications (including IDEs).
Particularly annoying is that most Windows APIs return bogus error codes
if a relative path only barely exceeds MAX_PATH in conjunction with the
current directory, e.g. ERROR_PATH_NOT_FOUND / ENOENT instead of the
infinitely more helpful ERROR_FILENAME_EXCED_RANGE / ENAMETOOLONG.
Many Windows wide char APIs support longer than MAX_PATH paths through the
file namespace prefix ('\\?\' or '\\?\UNC\') followed by an absolute path.
Notable exceptions include functions dealing with executables and the
current directory (CreateProcess, LoadLibrary, Get/SetCurrentDirectory) as
well as the entire shell API (ShellExecute, SHGetSpecialFolderPath...).
Introduce a handle_long_path function to check the length of a specified
path properly (and fail with ENAMETOOLONG), and to optionally expand long
paths using the '\\?\' file namespace prefix. Short paths will not be
modified, so we don't need to worry about device names (NUL, CON, AUX).
Contrary to MSDN docs, the GetFullPathNameW function doesn't seem to be
limited to MAX_PATH (at least not on Win7), so we can use it to do the
heavy lifting of the conversion (translate '/' to '\', eliminate '.' and
'..', and make an absolute path).
Add long path error checking to xutftowcs_path for APIs with hard MAX_PATH
limit.
Add a new MAX_LONG_PATH constant and xutftowcs_long_path function for APIs
that support long paths.
While improved error checking is always active, long paths support must be
explicitly enabled via 'core.longpaths' option. This is to prevent end
users to shoot themselves in the foot by checking out files that Windows
Explorer, cmd/bash or their favorite IDE cannot handle.
Test suite:
Test the case is when the full pathname length of a dir is close
to 260 (MAX_PATH).
Bug report and an original reproducer by Andrey Rogozhnikov:
https://github.com/msysgit/git/pull/122#issuecomment-43604199
[jes: adjusted test number to avoid conflicts]
Thanks-to: Martin W. Kirst <maki@bitkings.de>
Thanks-to: Doug Kelly <dougk.ff7@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
Original-test-by: Andrey Rogozhnikov <rogozhnikov.andrey@gmail.com>
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[jes: adusted test number to avoid conflicts, fixed non-portable use of
the 'export' statement]
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
If multiple threads access a directory that is not yet in the cache, the
directory will be loaded by each thread. Only one of the results is added
to the cache, all others are leaked. This wastes performance and memory.
On cache miss, add a future object to the cache to indicate that the
directory is currently being loaded. Subsequent threads register themselves
with the future object and wait. When the first thread has loaded the
directory, it replaces the future object with the result and notifies
waiting threads.
Signed-off-by: Karsten Blees <blees@dcon.de>
Git for Windows ships with its own Perl interpreter, and insists on
using it, so it will most likely wreak havoc if PERL5LIB is set before
launching Git.
Let's just unset that environment variables when spawning processes.
To make this feature extensible (and overrideable), there is a new
config setting `core.unsetenvvars` that allows specifying a
comma-separated list of names to unset before spawning processes.
Reported by Gabriel Fuhrmann.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Checking the work tree status is quite slow on Windows, due to slow lstat
emulation (git calls lstat once for each file in the index). Windows
operating system APIs seem to be much better at scanning the status
of entire directories than checking single files.
Add an lstat implementation that uses a cache for lstat data. Cache misses
read the entire parent directory and add it to the cache. Subsequent lstat
calls for the same directory are served directly from the cache.
Also implement opendir / readdir / closedir so that they create and use
directory listings in the cache.
The cache doesn't track file system changes and doesn't plug into any
modifying file APIs, so it has to be explicitly enabled for git functions
that don't modify the working copy.
Note: in an earlier version of this patch, the cache was always active and
tracked file system changes via ReadDirectoryChangesW. However, this was
much more complex and had negative impact on the performance of modifying
git commands such as 'git checkout'.
Signed-off-by: Karsten Blees <blees@dcon.de>
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.
This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.
Enable read-only sections for 'git status' and preload_index.
Signed-off-by: Karsten Blees <blees@dcon.de>
In the Git for Windows project, we have ample precendent for config
settings that apply to Windows, and to Windows only.
Let's formalize this concept by introducing a platform_core_config()
function that can be #define'd in a platform-specific manner.
This will allow us to contain platform-specific code better, as the
corresponding variables no longer need to be exported so that they can
be defined in environment.c and be set in config.c
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Emulating the POSIX lstat API on Windows via GetFileAttributes[Ex] is quite
slow. Windows operating system APIs seem to be much better at scanning the
status of entire directories than checking single files. A caching
implementation may improve performance by bulk-reading entire directories
or reusing data obtained via opendir / readdir.
Make the lstat implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.
Signed-off-by: Karsten Blees <blees@dcon.de>
Emulating the POSIX dirent API on Windows via FindFirstFile/FindNextFile is
pretty staightforward, however, most of the information provided in the
WIN32_FIND_DATA structure is thrown away in the process. A more
sophisticated implementation may cache this data, e.g. for later reuse in
calls to lstat.
Make the dirent implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.
Define a base DIR structure with pointers to readdir/closedir that match
the opendir implementation (i.e. similar to vtable pointers in OOP).
Define readdir/closedir so that they call the function pointers in the DIR
structure. This allows to choose the opendir implementation on a
call-by-call basis.
Move the fixed sized dirent.d_name buffer to the dirent-specific DIR
structure, as d_name may be implementation specific (e.g. a caching
implementation may just set d_name to point into the cache instead of
copying the entire file name string).
Signed-off-by: Karsten Blees <blees@dcon.de>
l10n-2.11.0-rnd3.1: update ru and ca translations
* tag 'l10n-2.11.0-rnd3.1' of git://github.com/git-l10n/git-po:
l10n: ru.po: update Russian translation
l10n: ca.po: update translation
Since 650c44925 (common-main: call git_extract_argv0_path(),
2016-07-01), the argv[0] that is seen in cmd_main() of
individual programs is always the basename of the
executable, as common-main strips off the full path. This
can produce confusing results for git-daemon, which wants to
re-exec itself.
For instance, if the program was originally run as
"/usr/lib/git/git-daemon", it will try just re-execing
"git-daemon", which will find the first instance in $PATH.
If git's exec-path has not been prepended to $PATH, we may
find the git-daemon from a different version (or no
git-daemon at all).
Normally this isn't a problem. Git commands are run as "git
daemon", the git wrapper puts the exec-path at the front of
$PATH, and argv[0] is already "daemon" anyway. But running
git-daemon via its full exec-path, while not really a
recommended method, did work prior to 650c44925. Let's make
it work again.
The real goal of 650c44925 was not to munge argv[0], but to
reliably set the argv0_path global. The only reason it
munges at all is that one caller, the git.c wrapper,
piggy-backed on that computation to find the command
basename. Instead, let's leave argv[0] untouched in
common-main, and have git.c do its own basename computation.
While we're at it, let's drop the return value from
git_extract_argv0_path(). It was only ever used in this one
callsite, and its dual purposes is what led to this
confusion in the first place.
Note that by changing the interface, the compiler can
confirm for us that there are no other callers storing the
return value. But the compiler can't tell us whether any of
the cmd_main() functions (besides git.c) were relying on the
basename munging. However, we can observe that prior to
650c44925, no other cmd_main() functions did that munging,
and no new cmd_main() functions have been introduced since
then. So we can't be regressing any of those cases.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
l10n-2.11.0-rnd3
* tag 'l10n-2.11.0-rnd3' of git://github.com/git-l10n/git-po:
l10n: de.po: translate 210 new messages
l10n: fix unmatched single quote in error message
Translate one message introduced by commit:
* 358718064b i18n: fix unmatched single quote in error message
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
This topic branch brings the new, experimental builtin version of the
difftool into Git for Windows' master branch.
It still hands off to the legacy Perl script unless the feature flag is
flipped: only when the config setting difftool.useBuiltin is set to true
will `git difftool` actually use the experimental builtin. The idea is to
play it safe for the majority of users, but to allow heavy difftool users
to test early and to help make the builtin robust, before we actually
retire the Perl script.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Teach hash_dir_entry() to remember the previously found dir_entry
during lazy_init_name_hash() iteration. This is a performance
optimization. Since items in the index array are sorted by full
pathname, adjacent items are likely to be in the same directory.
This can save memihash() computations and HashMap lookups.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Specify an initial size for the istate.dir_hash HashMap matching
the size of the istate.name_hash.
Previously hashmap_init() was given 0, causing a 64 bucket
hashmap to be created. When working with very large
repositories, this would cause numerous rehash() calls to
realloc and rebalance the hashmap. This is especially true
when the worktree is deep, with many directories containing
a few files.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Precompute the istate.name_hash and istate.dir_hash values
for each cache-entry during the preload-index phase.
Move the expensive memihash() calculations from lazy_init_name_hash()
to the multi-threaded preload-index phase.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add variant of memihash() to allow the hash computation to
be continued. There are times when we compute the hash on
a full path and then the hash on just the path to the parent
directory. This can be expensive on large repositories.
With this, we can hash the parent directory first. And then
continue the computation to include the "/filename".
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Remove duplicate memihash() call in hash_dir_entry().
The existing code called memihash() to do the find_dir_entry()
and it not found, called memihash() again to do the hashmap_add().
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach preload-index to avoid lstat() calls for index-entries
with skip-worktree bit set. This is a performance optimization.
During a sparse-checkout, the skip-worktree bit is set on items
that were not populated and therefore are not present in the
worktree. The per-thread preload-index loop performs a series
of tests on each index-entry as it attempts to compare the
worktree version with the index and mark them up-to-date.
This patch short-cuts that work.
On a Windows 10 system with a very large repo (450MB index)
and various levels of sparseness, performance was improved
in the {preloadindex=true, fscache=false} case by 80% and
in the {preloadindex=true, fscache=true} case by 20% for various
commands.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
When using cvsnt + msys + git, it seems like the output of cvs status
had \r\n in it, and caused the command to fail.
This fixes that.
Signed-off-by: Dustin Spicuzza <dustin@virtualroadside.com>
This topic branch adds the (experimental) --stdin/-z options to `git
reset`. Those patches are still under review in the upstream Git project,
but are already merged in their experimental form into Git for Windows'
`master` branch, in preparation for a MinGit-only release.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This topic branch works around an out-of-memory bug when the user
specified a format via --date=format:<format> that strftime() does
not like.
Reported by Stefan Naewe.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
With the recent update in efee955 (gpg-interface: check gpg signature
creation status, 2016-06-17), we ask GPG to send all status updates to
stderr, and then catch the stderr in an strbuf.
But GPG might fail, and send error messages to stderr. And we simply
do not show them to the user.
Even worse: this swallows any interactive prompt for a passphrase. And
detaches stderr from the tty so that the passphrase cannot be read.
So while the first problem could be fixed (by printing the captured
stderr upon error), the second problem cannot be easily fixed, and
presents a major regression.
So let's just revert commit efee9553a4.
This fixes https://github.com/git-for-windows/git/issues/871
Cc: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>