This is retry of #1419.
I added flush_fscache macro to flush cached stats after disk writing
with tests for regression reported in #1438 and #1442.
git checkout checks each file path in sorted order, so cache flushing does not
make performance worse unless we have large number of modified files in
a directory containing many files.
Using chromium repository, I tested `git checkout .` performance when I
delete 10 files in different directories.
With this patch:
TotalSeconds: 4.307272
TotalSeconds: 4.4863595
TotalSeconds: 4.2975562
Avg: 4.36372923333333
Without this patch:
TotalSeconds: 20.9705431
TotalSeconds: 22.4867685
TotalSeconds: 18.8968292
Avg: 20.7847136
I confirmed this patch passed all tests in t/ with core_fscache=1.
Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
If multiple threads access a directory that is not yet in the cache, the
directory will be loaded by each thread. Only one of the results is added
to the cache, all others are leaked. This wastes performance and memory.
On cache miss, add a future object to the cache to indicate that the
directory is currently being loaded. Subsequent threads register themselves
with the future object and wait. When the first thread has loaded the
directory, it replaces the future object with the result and notifies
waiting threads.
Signed-off-by: Karsten Blees <blees@dcon.de>
Checking the work tree status is quite slow on Windows, due to slow lstat
emulation (git calls lstat once for each file in the index). Windows
operating system APIs seem to be much better at scanning the status
of entire directories than checking single files.
Add an lstat implementation that uses a cache for lstat data. Cache misses
read the entire parent directory and add it to the cache. Subsequent lstat
calls for the same directory are served directly from the cache.
Also implement opendir / readdir / closedir so that they create and use
directory listings in the cache.
The cache doesn't track file system changes and doesn't plug into any
modifying file APIs, so it has to be explicitly enabled for git functions
that don't modify the working copy.
Note: in an earlier version of this patch, the cache was always active and
tracked file system changes via ReadDirectoryChangesW. However, this was
much more complex and had negative impact on the performance of modifying
git commands such as 'git checkout'.
Signed-off-by: Karsten Blees <blees@dcon.de>
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.
This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.
Enable read-only sections for 'git status' and preload_index.
Signed-off-by: Karsten Blees <blees@dcon.de>
Emulating the POSIX lstat API on Windows via GetFileAttributes[Ex] is quite
slow. Windows operating system APIs seem to be much better at scanning the
status of entire directories than checking single files. A caching
implementation may improve performance by bulk-reading entire directories
or reusing data obtained via opendir / readdir.
Make the lstat implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.
Signed-off-by: Karsten Blees <blees@dcon.de>
Emulating the POSIX dirent API on Windows via FindFirstFile/FindNextFile is
pretty staightforward, however, most of the information provided in the
WIN32_FIND_DATA structure is thrown away in the process. A more
sophisticated implementation may cache this data, e.g. for later reuse in
calls to lstat.
Make the dirent implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.
Define a base DIR structure with pointers to readdir/closedir that match
the opendir implementation (i.e. similar to vtable pointers in OOP).
Define readdir/closedir so that they call the function pointers in the DIR
structure. This allows to choose the opendir implementation on a
call-by-call basis.
Move the fixed sized dirent.d_name buffer to the dirent-specific DIR
structure, as d_name may be implementation specific (e.g. a caching
implementation may just set d_name to point into the cache instead of
copying the entire file name string).
Signed-off-by: Karsten Blees <blees@dcon.de>
In December 2016 and January 2017, we revamped the Windows-specific
`isatty()` handling, replacing a hack by a more robust solution.
This patch is a follow-up we realized was necessary already in March
2017, but forgot to contribute to core Git yet.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This patch has been carried by Git for Windows for almost two years. It
makes loading system libraries safer.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We newly handle isatty() by special-casing the stdin/stdout/stderr file
descriptors, caching the return value. However, we missed the case where
dup2() overrides the respective file descriptor.
That poses a problem e.g. where the `show` builtin asks for a pager very
early, the `setup_pager()` function sets the pager depending on the
return value of `isatty()` and then redirects stdout. Subsequently,
`cmd_log_init_finish()` calls `setup_pager()` *again*. What should
happen now is that `isatty()` reports that stdout is *not* a TTY and
consequently stdout should be left alone.
Let's override dup2() to handle this appropriately.
This fixes https://github.com/git-for-windows/git/issues/1077
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When we access IPv6-related functions, we load the corresponding system
library using the `LoadLibrary()` function, which is not the recommended
way to load system libraries.
In practice, it does not make a difference: the `ws2_32.dll` library
containing the IPv6 functions is already loaded into memory, so
LoadLibrary() simply reuses the already-loaded library.
Still, recommended way is recommended way, so let's use that instead.
While at it, also adjust the code in contrib/ that loads system libraries.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This topic branch addresses a problem identified in
https://github.com/git-for-windows/git/issues/439: while
cloning/fetching/pushing from "POSIX-ified UNC paths" (i.e. UNC paths
whose backslashes have been converted to forward slashes) works for some
time now, true UNC paths (with backslashes left intact) were handled
incorrectly. Example:
git clone //myserver/folder/repo.git
works, but
git clone \\myserver\folder\repo.git
(in CMD; in Git Bash, the backslashes would need to be doubled) used to
fail. The reason was an unexpected difference in command-line handling
between Win32 executables and MSYS2 ones (such as the shell that is used
by git-clone.exe to spawn git-upload-pack.exe).
This topic branch features a workaround *just* for the case where Git
passes stuff through sh.exe (which covers quite a few use cases,
though).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This topic branch allows us to specify absolute paths without the drive
prefix e.g. when cloning.
Example:
C:\Users\me> git clone https://github.com/git/git \upstream-git
This will clone into a new directory C:\upstream-git, in line with how
Windows interprets absolute paths.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The function CreateHardLink is available in all supported Windows
versions (since Windows XP), so there is no more need to resolve it
in runtime.
Helped-by: Max Kirillov <max@max630.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Windows Vista (and later) actually have a working poll(), but we still
cannot use it because it only works on sockets.
So let's detect when we are targeting Windows Vista and undefine those
constants, and define `pollfd` so that we can declare our own pollfd
struct.
We also need to make sure that we override those constants *after*
`winsock2.h` has been `#include`d (otherwise we would not really
override those constants).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This was pull request #1003 from shoelzer/master
poll: Use GetTickCount64 to avoid wraparound issues
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
With this topic branch, the PERL5LIB variable is unset to avoid external
settings from interfering with Git's own Perl interpreter.
This branch also cleans up some of our Windows-only config setting code
(and this will need to be rearranged in the next merging rebase so that
the cleanup comes first, and fscache and longPaths support build on
top).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When specifying an absolute path without a drive prefix, we convert that
path internally. Let's make sure that we handle that case properly, too
;-)
This fixes the command
git clone https://github.com/git-for-windows/git \G4W
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This was pull request #1214 from rongjiecomputer/master
Implement pthread_cond_t with Win32 CONDITION_VARIABLE
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git for Windows ships with its own Perl interpreter, and insists on
using it, so it will most likely wreak havoc if PERL5LIB is set before
launching Git.
Let's just unset that environment variables when spawning processes.
To make this feature extensible (and overrideable), there is a new
config setting `core.unsetenvvars` that allows specifying a
comma-separated list of names to unset before spawning processes.
Reported by Gabriel Fuhrmann.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The MSYS2 runtime does its best to emulate the command-line wildcard
expansion and de-quoting which would be performed by the calling Unix
shell on Unix systems.
Those Unix shell quoting rules differ from the quoting rules applying to
Windows' cmd and Powershell, making it a little awkward to quote
command-line parameters properly when spawning other processes.
In particular, git.exe passes arguments to subprocesses that are *not*
intended to be interpreted as wildcards, and if they contain
backslashes, those are not to be interpreted as escape characters, e.g.
when passing Windows paths.
Note: this is only a problem when calling MSYS2 executables, not when
calling MINGW executables such as git.exe. However, we do call MSYS2
executables frequently, most notably when setting the use_shell flag in
the child_process structure.
There is no elegant way to determine whether the .exe file to be
executed is an MSYS2 program or a MINGW one. But since the use case of
passing a command line through the shell is so prevalent, we need to
work around this issue at least when executing sh.exe.
Let's introduce an ugly, hard-coded test whether argv[0] is "sh", and
whether it refers to the MSYS2 Bash, to determine whether we need to
quote the arguments differently than usual.
That still does not fix the issue completely, but at least it is
something.
Incidentally, this also fixes the problem where `git clone \\server\repo`
failed due to incorrect handling of the backslashes when handing the path
to the git-upload-pack process.
We need to take care to quote not only whitespace, but also curly
brackets. As aliases frequently go through the MSYS2 Bash, and
as aliases frequently get parameters such as HEAD@{yesterday}, let's
make sure that this does not regress by adding a test case for that.
Helped-by: Kim Gybels <kgybels@infogroep.be>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This fixes the compilation, actually, as we still did not make the jump to
post-Windows XP completely: we still compile with _WIN32_WINNT set to
0x0502 (which corresponds to Windows Server 2003 and is technically
greater than Windows XP's 0x0501).
However, GetTickCount64() is only available starting with Windows
Vista/Windows Server 2008.
Let's just lazy-load the function, which should also help Git for Windows
contributors who want to reinstate Windows XP support.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
From Visual Studio 2015 Code Analysis: Warning C28159 Consider using
'GetTickCount64' instead of 'GetTickCount'.
Reason: GetTickCount overflows roughly every 49 days. Code that does not
take that into account can loop indefinitely. GetTickCount64 operates on
64 bit values and does not have that problem.
Signed-off-by: Steve Hoelzer <shoelzer@gmail.com>
`GetLongPathName()` function may fail when it is unable to query
the parent directory of a path component to determine the long name
for that component. It happens, because of it uses `FindFirstFile()`
function for each next short part of path. The `FindFirstFile()`
requires `List Directory` and `Synchronize` desired access for a calling
process.
In case of lacking such permission for some part of path,
the `GetLongPathName()` returns 0 as result and `GetLastError()`
returns ERROR_ACCESS_DENIED.
`GetFinalPathNameByHandle()` function can help in such cases, because
it requires `Read Attributes` and `Synchronize` desired access to the
target path only.
The `GetFinalPathNameByHandle()` function was introduced on
`Windows Server 2008/Windows Vista`. So we need to load it dynamically.
`CreateFile()` parameters:
`lpFileName` = path to the current directory
`dwDesiredAccess` = 0 (it means `Read Attributes` and `Synchronize`)
`dwShareMode` = FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE
(it prevents `Sharing Violation`)
`lpSecurityAttributes` = NULL (default security attributes)
`dwCreationDisposition` = OPEN_EXISTING
(required to obtain a directory handle)
`dwFlagsAndAttributes` = FILE_FLAG_BACKUP_SEMANTICS
(required to obtain a directory handle)
`hTemplateFile` = NULL (when opening an existing file or directory,
`CreateFile` ignores this parameter)
The string that is returned by `GetFinalPathNameByHandle()` function
uses the \\?\ syntax. To skip the prefix and convert backslashes
to slashes, the `normalize_ntpath()` mingw function will be used.
Note: `GetFinalPathNameByHandle()` function returns a final path.
It is the path that is returned when a path is fully resolved.
For example, for a symbolic link named "C:\tmp\mydir" that points to
"D:\yourdir", the final path would be "D:\yourdir".
Signed-off-by: Anton Serbulov <aserbulov@plesk.com>
When switching the current working directory, say, in PowerShell, it is
quite possible to use a different capitalization than the one that is
recorded on disk. While doing the same in `cmd.exe` adjusts the
capitalization magically, that does not happen in PowerShell so that
`getcwd()` returns the current directory in a different way than is
recorded on disk.
Typically this creates no problems except when you call
git log .
in a subdirectory called, say, "GIT/" but you switched to "Git/" and
your `getcwd()` reports the latter, then Git won't understand that you
wanted to see the history as per the `GIT/` subdirectory but it thinks you
wanted to see the history of some directory that may have existed in the
past (but actually never did).
So let's be extra careful to adjust the capitalization of the current
directory before working with it.
Reported by a few PowerShell power users ;-)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When a user is registered in a Windows domain, it is really easy to
obtain the email address. So let's do that.
Suggested by Lutz Roeder.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Win32 CONDITION_VARIABLE has better performance and is easier to
maintain.
Since CONDITION_VARIABLE is not available in Windows XP and below,
old implementation of pthread_cond_t is kept under define guard
'GIT_WIN_XP_SUPPORT'. To enable old implementation, build with
make CFLAGS="-DGIT_WIN_XP_SUPPORT".
Signed-off-by: Loo Rong Jie <loorongjie@gmail.com>
fast-forwarded.
7ebac8cb94 made launching of .exe
externals work when installed in Unicode paths. But it broke launching
of non-.exe externals, no matter where they were installed. We now
correctly maintain the UTF-8 and UTF-16 paths in tandem in lookup_prog.
This fixes t5526, among others.
Signed-off-by: Adam Roben <adam@roben.org>
We do have the excellent GetUserInfoEx() function to obtain more
detailed information of the current user (if the user is part of a
Windows domain); Let's use it.
Suggested by Lutz Roeder.
To avoid the cost of loading Secur32.dll (even lazily, loading DLLs
takes a non-neglibile amount of time), we use the established technique
to load DLLs only when, and if, needed.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
If Git were installed in a path containing non-ASCII characters,
commands such as git-am and git-submodule, which are implemented as
externals, would fail to launch with the following error:
> fatal: 'am' appears to be a git command, but we were not
> able to execute it. Maybe git-am is broken?
This was due to lookup_prog not being Unicode-aware. It was somehow
missed in 2ee5a1a14a.
Note that the only problem in this function was calling
GetFileAttributes instead of GetFileAttributesW. The calls to access()
were fine because access() is a macro which resolves to mingw_access,
which already handles Unicode correctly. But I changed lookup_prog to
use _waccess directly so that we only convert the path to UTF-16 once.
Signed-off-by: Adam Roben <adam@roben.org>
In the Git for Windows project, we have ample precendent for config
settings that apply to Windows, and to Windows only.
Let's formalize this concept by introducing a platform_core_config()
function that can be #define'd in a platform-specific manner.
This will allow us to contain platform-specific code better, as the
corresponding variables no longer need to be exported so that they can
be defined in environment.c and be set in config.c
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Code hygiene improvement for the header files.
* en/incl-forward-decl:
Remove forward declaration of an enum
compat/precompose_utf8.h: use more common include guard style
urlmatch.h: fix include guard
Move definition of enum branch_track from cache.h to branch.h
alloc: make allocate_alloc_state and clear_alloc_state more consistent
Add missing includes and forward declarations
Among the three codepaths we use O_APPEND to open a file for
appending, one used for writing GIT_TRACE output requires O_APPEND
implementation that behaves sensibly when multiple processes are
writing to the same file. POSIX emulation used in the Windows port
has been updated to improve in this area.
* js/mingw-o-append:
mingw: enable atomic O_APPEND
The Windows CRT implements O_APPEND "manually": on write() calls, the
file pointer is set to EOF before the data is written. Clearly, this is
not atomic. And in fact, this is the root cause of failures observed in
t5552-skipping-fetch-negotiator.sh and t5503-tagfollow.sh, where
different processes write to the same trace file simultanously; it also
occurred in t5400-send-pack.sh, but there it was worked around in
71406ed4d6 ("t5400: avoid concurrent writes into a trace file",
2017-05-18).
Fortunately, Windows does support atomic O_APPEND semantics using the
file access mode FILE_APPEND_DATA. Provide an implementation that does.
This implementation is minimal in such a way that it only implements
the open modes that are actually used in the Git code base. Emulation
for other modes can be added as necessary later. To become aware of
the necessity early, the unusal error ENOSYS is reported if an
unsupported mode is encountered.
Diagnosed-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 60f487ac0e (Remove common-cmds.h, 2018-05-10), we forgot to adjust
this README when removing the common-cmds.h file.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Windows, strftime() does not silently ignore invalid formats, but
warns about them and then returns 0 and sets errno to EINVAL.
Unfortunately, Git does not expect such a behavior, as it disagrees
with strftime()'s semantics on Linux. As a consequence, Git
misinterprets the return value 0 as "I need more space" and grows the
buffer. As the larger buffer does not fix the format, the buffer grows
and grows and grows until we are out of memory and abort.
Ideally, we would switch off the parameter validation just for
strftime(), but we cannot even override the invalid parameter handler
via _set_thread_local_invalid_parameter_handler() using MINGW because
that function is not declared. Even _set_invalid_parameter_handler(),
which *is* declared, does not help, as it simply does... nothing.
So let's just bite the bullet and override strftime() for MINGW and
abort on an invalid format string. While this does not provide the
best user experience, it is the best we can do.
See https://msdn.microsoft.com/en-us/library/fe06s4ak.aspx for more
details.
This fixes https://github.com/git-for-windows/git/issues/863
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We learned to talk to watchman to speed up "git status" and other
operations that need to see which paths have been modified.
* bp/fsmonitor:
fsmonitor: preserve utf8 filenames in fsmonitor-watchman log
fsmonitor: read entirety of watchman output
fsmonitor: MINGW support for watchman integration
fsmonitor: add a performance test
fsmonitor: add a sample integration script for Watchman
fsmonitor: add test cases for fsmonitor extension
split-index: disable the fsmonitor extension when running the split index test
fsmonitor: add a test tool to dump the index extension
update-index: add fsmonitor support to update-index
ls-files: Add support in ls-files to display the fsmonitor valid bit
fsmonitor: add documentation for the fsmonitor extension.
fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
update-index: add a new --force-write-index option
preload-index: add override to enable testing preload-index
bswap: add 64 bit endianness helper get_be64