The recent change to make fscache thread specific relied on fscache_enable()
being called first from the primary thread before being called in parallel
from worker threads. Make that more robust and protect it with a critical
section to avoid any issues.
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
The Makefile target `install-mingit-test-artifacts` simply copies stuff
and things directly into a MinGit directory, including an init.bat
script to set everything up so that the tests can be run in a cmd
window.
Sadly, Git's test suite still relies on a Perl interpreter even if
compiled with NO_PERL=YesPlease. We punt for now, installing a small
script into /usr/bin/perl that hands off to an existing Perl of a Git
for Windows SDK.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In the BusyBox-w32 version that is currently under consideration for
MinGit for Windows (to reduce the .zip size, and to avoid problems with
the MSYS2 runtime), the UTF-16 environment present in Windows is
considered to be authoritative, and the 8-bit version is always in UTF-8
encoding.
As a consequence, the ISO-8859-1 test in t9350-fast-export (which tries
to set GIT_AUTHOR_NAME to a ISO-8859-1 encoded value) *must* fail in
that setup.
So let's detect when it would fail (due to an environment being purely
kept UTF-8 encoded), and skip that test in that case.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, the current working directory is pretty much guaranteed to
contain a colon. If we feed that path to CVS, it mistakes it for a
separator between host and port, though.
This has not been a problem so far because Git for Windows uses MSYS2's
Bash using a POSIX emulation layer that also pretends that the current
directory is a Unix path (at least as long as we're in a shell script).
However, that is rather limiting, as Git for Windows also explores other
ports of other Unix shells. One of those is BusyBox-w32's ash, which is
a native port (i.e. *not* using any POSIX emulation layer, and certainly
not emulating Unix paths).
So let's just detect if there is a colon in $PWD and punt in that case.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
BusyBox' find implementation does not understand the -ls option, so
let's not use it when we're running inside BusyBox.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git for Windows uses MSYS2's Bash to run the test suite, which comes
with benefits but also at a heavy price: on the plus side, MSYS2's
POSIX emulation layer allows us to continue pretending that we are on a
Unix system, e.g. use Unix paths instead of Windows ones, yet this is
bought at a rather noticeable performance penalty.
There *are* some more native ports of Unix shells out there, though,
most notably BusyBox-w32's ash. These native ports do not use any POSIX
emulation layer (or at most a *very* thin one, choosing to avoid
features such as fork() that are expensive to emulate on Windows), and
they use native Windows paths (usually with forward slashes instead of
backslashes, which is perfectly legal in almost all use cases).
And here comes the problem: with a $PWD looking like, say,
C:/git-sdk-64/usr/src/git/t/trash directory.t5813-proto-disable-ssh
Git's test scripts get quite a bit confused, as their assumptions have
been shattered. Not only does this path contain a colon (oh no!), it
also does not start with a slash.
This is a problem e.g. when constructing a URL as t5813 does it:
ssh://remote$PWD. Not only is it impossible to separate the "host" from
the path with a $PWD as above, even prefixing $PWD by a slash won't
work, as /C:/git-sdk-64/... is not a valid path.
As a workaround, detect when $PWD does not start with a slash on
Windows, and simply strip the drive prefix, using an obscure feature of
Windows paths: if an absolute Windows path starts with a slash, it is
implicitly prefixed by the drive prefix of the current directory. As we
are talking about the current directory here, anyway, that strategy
works.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When t5605 tries to verify that files are hardlinked (or that they are
not), it uses the `-links` option of the `find` utility.
BusyBox' implementation does not support that option, and BusyBox-w32's
lstat() does not even report the number of hard links correctly (for
performance reasons).
So let's just switch to a different method that actually works on
Windows.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
While it may seem super convenient to some old Unix hands to simpy
require Perl to be available when running the test suite, this is a
major hassle on Windows, where we want to verify that Perl is not,
actually, required in a NO_PERL build.
As a super ugly workaround, we "install" a script into /usr/bin/perl
reading like this:
#!/bin/sh
# We'd much rather avoid requiring Perl altogether when testing
# an installed Git. Oh well, that's why we cannot have nice
# things.
exec c:/git-sdk-64/usr/bin/perl.exe "$@"
The problem with that is that BusyBox assumes that the #! line in a
script refers to an executable, not to a script. So when it encounters
the line #!/usr/bin/perl in t5532's proxy-get-cmd, it barfs.
Let's help this situation by simply executing the Perl script with the
"interpreter" specified explicitly.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
BusyBox' unzip is working pretty well. But Git's tests want to abuse it
to not only extract files, but to convert their line endings on the fly,
too. BusyBox' unzip does not support that, and it would appear that
it would require rather intrusive changes.
So let's just work around this by skipping the test case that uses
`unzip -a` and the subsequent test cases expecting `unzip -a`'s output.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
At some stage, t5003-archive-zip wants to add a file that is not ASCII.
To that end, it uses /bin/sh. But that file may actually not exist (it
is too easy to forget that not all the world is Unix/Linux...)! Besides,
we already have perfectly fine binary files intended for use solely by
the tests. So let's use one of them instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Everybody and their dogs, cats and other pets settled on using unified
diffs. It is a really quaint holdover from a long-gone era that GNU diff
outputs "normal" diff by default.
Yet, t4124 relied on that mode.
This mode is so out of fashion in the meantime, though, that e.g.
BusyBox' diff decided not even to bother to support it. It only supports
unified diffs.
So let's just switch away from "normal" diffs and use unified diffs, as
we really are only interested in the `+` lines.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, it is impossible to create a file whose name contains a
quote character. We already excluded test cases using such files from
running on Windows when git.exe itself was tested.
However, we still had two test cases that try to create such a file, and
redirect stdin from such a file, respectively. This *seems* to work in
Git for Windows' Bash due to an obscure feature inherited from Cygwin:
illegal filename characters are simply mapped into/from a private UTF-8
page. Pure Win32 programs (such as git.exe) *still* cannot work with
those files, of course, but at least Unix shell scripts pretend to be
able to.
This entire strategy breaks down when switching to any Unix shell
lacking support for that private UTF-8 page trick, e.g. BusyBox-w32's
ash. So let's just exclude test cases that test whether the Unix shell
can redirect to/from files with "funny names" those from running on
Windows, too.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Since c6b0831c9c (docs: warn about possible '=' in clean/smudge filter
process values, 2016-12-03), t0021 writes out a file with quotes in its
name, and MSYS2's path conversion heuristics mistakes that to mean that
we are not talking about a path here.
Therefore, we need to use Windows paths, as the test-helper is a Win32
program that would otherwise have no idea where to look for the file.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When running with BusyBox, we will want to avoid calling executables on
the PATH that are implemented in BusyBox itself.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The -W option is only understood by MSYS2 Bash's pwd command. We already
make sure to override `pwd` by `builtin pwd -W` for MINGW, so let's not
double the effort here.
This will also help when switching the shell to another one (such as
BusyBox' ash) whose pwd does *not* understand the -W option.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Traditionally, Git for Windows' SDK uses Bash as its default shell.
However, other Unix shells are available, too. Most notably, the Win32
port of BusyBox comes with `ash` whose `pwd` command already prints
Windows paths as Git for Windows wants them, while there is not even a
`builtin` command.
Therefore, let's be careful not to override `pwd` unless we know that
the `builtin` command is available.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
BusyBox-w32 is a true Win32 application, i.e. it does not come with a
POSIX emulation layer.
That also means that it does *not* use the Unix convention of separating
the entries in the PATH variable using colons, but semicolons.
However, there are also BusyBox ports to Windows which use a POSIX
emulation layer such as Cygwin's or MSYS2's runtime, i.e. using colons
as PATH separators.
As a tell-tale, let's use the presence of semicolons in the PATH
variable: on Unix, it is highly unlikely that it contains semicolons,
and on Windows (without POSIX emulation), it is virtually guaranteed, as
everybody should have both $SYSTEMROOT and $SYSTEMROOT/system32 in their
PATH.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The idea is to allow running the test suite on MinGit with BusyBox
installed in /mingw64/bin/sh.exe. In that case, we will want to exclude
sort & find (and other Unix utilities) from being bundled.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We already have a directory where we store files intended for use by
multiple test scripts. The same directory is a better home for the
test-binary-*.png files than t/.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The idea of copying README and COPYING into t/diff-lib/ was to step away
from using files from outside t/ in tests. Let's really make sure that
we use the files from t/diff-lib/ instead of other versions of those
files.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
It is convenient to assume that everybody who wants to build & test Git
has access to a working `iconv` executable (after all, we already pretty
much require libiconv).
However, that limits esoteric test scenarios such as Git for Windows',
where an end user installation has to ship with `iconv` for the sole
purpose of being testable. That payload serves no other purpose.
So let's just have a test helper (to be able to test Git, the test
helpers have to be available, after all) to act as `iconv` replacement.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This helper is slightly more performant than the script with MSYS2's
Bash. And a lot more readable.
To accommodate t1050, which wants to compare files weighing in with 3MB
(falling outside of t1050's malloc limit of 1.5MB), we simply lift the
allocation limit by setting the environment variable GIT_ALLOC_LIMIT to
zero when calling the helper.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
It is a bit strange, and even undesirable, to require Perl just to run
the test suite even when NO_PERL was set.
This patch does not fix this problem by any stretch of imagination.
However, it fixes *the* Perl invocation that *every single* test script
has to run.
While at it, it makes the source code also more grep'able, as the code
that unsets some, but not all, GIT_* environment variables just became a
*lot* more explicit. And all that while still reducing the total number
of lines.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Instead of relying on the presence of `make`, or `prove`, we might just
as well use our own facilities to run the test suite.
This helps e.g. when trying to verify a Git for Windows installation
without requiring to download a full Git for Windows SDK (which would use
up 600+ megabytes of bandwidth, and over a gigabyte of disk space).
Of course, it still requires the test helpers to be build *somewhere*,
and the Git version should at least roughly match the version from which
the test suite comes.
At the same time, this new way to run the test suite allows to validate
that a BusyBox-backed MinGit works as expected (verifying that BusyBox'
functionality is enough to at least pass the test suite).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
BusyBox comes with a ton of applets ("applet" being the identical
concept to Git's "builtins"). And similar to Git's builtins, the applets
can be called via `busybox <command>`, or the BusyBox executable can be
copied/hard-linked to the command name.
The similarities do not end here. Just as with Git's builtins, it is
problematic that BusyBox' hard-linked applets cannot easily be put into
a .zip file: .zip archives have no concept of hard-links and therefore
would store identical copies (and also extract identical copies,
"inflating" the archive unnecessarily).
To counteract that issue, MinGit already ships without hard-linked
copies of the builtins, and the plan is to do the same with BusyBox'
applets: simply ship busybox.exe as single executable, without
hard-linked applets.
To accommodate that, Git is being taught by this commit a very special
trick, exploiting the fact that it is possible to call an executable
with a command-line whose argv[0] is different from the executable's
name: when `sh` is to be spawned, and no `sh` is found in the PATH, but
busybox.exe is, use that executable (with unchanged argv).
Likewise, if any executable to be spawned is not on the PATH, but
busybox.exe is found, parse the output of `busybox.exe --help` to find
out what applets are included, and if the command matches an included
applet name, use busybox.exe to execute it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The main idea of this patch is that even if we have to look up the
absolute path of the script, if only the basename was specified as
argv[0], then we should use that basename on the command line, too, not
the absolute path.
This patch will also help with the upcoming patch where we automatically
substitute "sh ..." by "busybox sh ..." if "sh" is not in the PATH but
"busybox" is: we will do that by substituting the actual executable, but
still keep prepending "sh" to the command line.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This helps with minimal installations such as MinGit that refuse to
waste .zip real estate by shipping identical copies of builtins (.zip
files do not support hard links).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This branch introduces support for reading the "Windows-wide" Git
configuration from `%PROGRAMDATA%\Git\config`. As these settings are
intended to be shared between *all* Git-related software, that config
file takes an even lower precedence than `$(prefix)/etc/gitconfig`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
By default, CreateProcess() does not inherit any open file handles,
unless the bInheritHandles parameter is set to TRUE. Which we do need to
set because we need to pass in stdin/stdout/stderr to talk to the child
processes. Sadly, this means that all file handles (unless marked via
O_NOINHERIT) are inherited.
This lead to problems in GVFS Git, where a long-running read-object hook
is used to hydrate missing objects, and depending on the circumstances,
might only be called *after* Git opened a file handle.
Ideally, we would not open files without O_NOINHERIT unless *really*
necessary (i.e. when we want to pass the opened file handle as standard
handle into a child process), but apparently it is all-too-easy to
introduce incorrect open() calls: this happened, and prevented updating
a file after the read-object hook was started because the hook still
held a handle on said file.
Happily, there is a solution: as described in the "Old New Thing"
https://blogs.msdn.microsoft.com/oldnewthing/20111216-00/?p=8873 there
is a way, starting with Windows Vista, that lets us define precisely
which handles should be inherited by the child process.
And since we bumped the minimum Windows version for use with Git for
Windows to Vista with v2.10.1 (i.e. a *long* time ago), we can use this
method. So let's do exactly that.
We need to make sure that the list of handles to inherit does not
contain duplicates; Otherwise CreateProcessW() would fail with
ERROR_INVALID_ARGUMENT.
While at it, stop setting errno to ENOENT unless it really is the
correct value.
Also, fall back to not limiting handle inheritance under certain error
conditions (e.g. on Windows 7, which is a lot stricter in what handles
you can specify to limit to).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Now that the fscache is single threaded, take advantage of the mem_pool as
the allocator to significantly reduce the cost of allocations and frees.
With the reduced cost of free, in future patches, we can start freeing the
fscache at the end of commands instead of just leaking it.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
This topic branch conflicts with the next change that will change the
way we call `CreateProcessW()`. So let's merge it early, to avoid merge
conflicts during a merge (because we would have to resolve this with
every single merging-rebase).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The threading model for fscache has been to have a single, global cache.
This puts requirements on it to be thread safe so that callers like
preload-index can call it from multiple threads. This was implemented
with a single mutex and completion events which introduces contention
between the calling threads.
Simplify the threading model by making fscache thread specific. This allows
us to remove the global mutex and synchronization events entirely and instead
associate a fscache with every thread that requests one. This works well with
the current multi-threading which divides the cache entries into blocks with
a separate thread processing each block.
At the end of each worker thread, if there is a fscache on the primary
thread, merge the cached results from the worker into the primary thread
cache. This enables us to reuse the cache later especially when scanning for
untracked files.
In testing, this reduced the time spent in preload_index() by about 25% and
also reduced the CPU utilization significantly. On a repo with ~200K files,
it reduced overall status times by ~12%.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Update enable_fscache() to take an optional initial size parameter which is
used to initialize the hashmap so that it can avoid having to rehash as
additional entries are added.
Add a separate disable_fscache() macro to make the code clearer and easier
to read.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Add tracing around initializing and discarding mempools. In discard report
on the amount of memory unused in the current block to help tune setting
the initial_size.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Track fscache hits and misses for lstat and opendir requests. Reporting of
statistics is done when the cache is disabled for the last time and freed
and is only reported if GIT_TRACE_FSCACHE is set.
Sample output is:
11:33:11.836428 compat/win32/fscache.c:433 fscache: lstat 3775, opendir 263, total requests/misses 4052/269
Signed-off-by: Ben Peart <benpeart@microsoft.com>
... even if they may look like them.
As looking up the target of the "symbolic link" (just to see whether it
starts with `/ContainerMappedDirectories/`) is pretty expensive, we
do it when we can be *really* sure that there is a possibility that this
might be the case.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: JiSeop Moon <zcube@zcube.kr>
In preparation for making this function a bit more complicated (to allow
for special-casing the `ContainerMappedDirectories` in Windows
containers, which look like a symbolic link, but are not), let's move it
out of the header.
Signed-off-by: JiSeop Moon <zcube@zcube.kr>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
At the end of the status command, disable and free the fscache so that we
don't leak the memory and so that we can dump the fscache statistics.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
It is a known issue that a rename() can fail with an "Access denied"
error at times, when copying followed by deleting the original file
works. Let's just fall back to that behavior.
Signed-off-by: JiSeop Moon <zcube@zcube.kr>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Use FindFirstFileExW with FindExInfoBasic to avoid forcing NTFS to look up
the short name. Also switch to a larger (64K vs 4K) buffer using
FIND_FIRST_EX_LARGE_FETCH to minimize round trips to the kernel.
In a repo with ~200K files, this drops warm cache status times from 3.19
seconds to 2.67 seconds for a 16% savings.
Signed-off-by: Ben Peart <benpeart@microsoft.com>