Teach "add" to use preload-index and fscache features
to improve performance on very large repositories.
During an "add", a call is made to run_diff_files()
which calls check_remove() for each index-entry. This
calls lstat(). On Windows, the fscache code intercepts
the lstat() calls and builds a private cache using the
FindFirst/FindNext routines, which are much faster.
Somewhat independent of this, is the preload-index code
which distributes some of the start-up costs across
multiple threads.
We need to keep the call to read_cache() before parsing the
pathspecs (and hence cannot use the pathspecs to limit any preload)
because parse_pathspec() is using the index to determine whether a
pathspec is, in fact, in a submodule. If we would not read the index
first, parse_pathspec() would not error out on a path that is inside
a submodule, and t7400-submodule-basic.sh would fail with
not ok 47 - do not add files from a submodule
We still want the nice preload performance boost, though, so we simply
call read_cache_preload(&pathspecs) after parsing the pathspecs.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The purpose of this function is to stat() the files listed in the index
in a multi-threaded fashion. It is called directly after reading the
index in the read_index_preloaded() function.
However, in some cases we may want to separate the index reading from
the preloading step, e.g. in builtin/add.c, where we need to load the
index before we parse the pathspecs (which needs to error out if one of
the pathspecs refers to a path within a submodule, for which the index
must have been read already), and only then will we want to preload,
possibly limited by the just-parsed pathspecs.
So let's just export that function to allow calling it separately.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
If multiple threads access a directory that is not yet in the cache, the
directory will be loaded by each thread. Only one of the results is added
to the cache, all others are leaked. This wastes performance and memory.
On cache miss, add a future object to the cache to indicate that the
directory is currently being loaded. Subsequent threads register themselves
with the future object and wait. When the first thread has loaded the
directory, it replaces the future object with the result and notifies
waiting threads.
Signed-off-by: Karsten Blees <blees@dcon.de>
Checking the work tree status is quite slow on Windows, due to slow lstat
emulation (git calls lstat once for each file in the index). Windows
operating system APIs seem to be much better at scanning the status
of entire directories than checking single files.
Add an lstat implementation that uses a cache for lstat data. Cache misses
read the entire parent directory and add it to the cache. Subsequent lstat
calls for the same directory are served directly from the cache.
Also implement opendir / readdir / closedir so that they create and use
directory listings in the cache.
The cache doesn't track file system changes and doesn't plug into any
modifying file APIs, so it has to be explicitly enabled for git functions
that don't modify the working copy.
Note: in an earlier version of this patch, the cache was always active and
tracked file system changes via ReadDirectoryChangesW. However, this was
much more complex and had negative impact on the performance of modifying
git commands such as 'git checkout'.
Signed-off-by: Karsten Blees <blees@dcon.de>
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.
This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.
Enable read-only sections for 'git status' and preload_index.
Signed-off-by: Karsten Blees <blees@dcon.de>
Emulating the POSIX lstat API on Windows via GetFileAttributes[Ex] is quite
slow. Windows operating system APIs seem to be much better at scanning the
status of entire directories than checking single files. A caching
implementation may improve performance by bulk-reading entire directories
or reusing data obtained via opendir / readdir.
Make the lstat implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.
Signed-off-by: Karsten Blees <blees@dcon.de>
Emulating the POSIX dirent API on Windows via FindFirstFile/FindNextFile is
pretty staightforward, however, most of the information provided in the
WIN32_FIND_DATA structure is thrown away in the process. A more
sophisticated implementation may cache this data, e.g. for later reuse in
calls to lstat.
Make the dirent implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.
Define a base DIR structure with pointers to readdir/closedir that match
the opendir implementation (i.e. similar to vtable pointers in OOP).
Define readdir/closedir so that they call the function pointers in the DIR
structure. This allows to choose the opendir implementation on a
call-by-call basis.
Move the fixed sized dirent.d_name buffer to the dirent-specific DIR
structure, as d_name may be implementation specific (e.g. a caching
implementation may just set d_name to point into the cache instead of
copying the entire file name string).
Signed-off-by: Karsten Blees <blees@dcon.de>
Git for Windows ships with its own Perl interpreter, and insists on
using it, so it will most likely wreak havoc if PERL5LIB is set before
launching Git.
Let's just unset that environment variables when spawning processes.
To make this feature extensible (and overrideable), there is a new
config setting `core.unsetenvvars` that allows specifying a
comma-separated list of names to unset before spawning processes.
Reported by Gabriel Fuhrmann.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In the Git for Windows project, we have ample precendent for config
settings that apply to Windows, and to Windows only.
Let's formalize this concept by introducing a platform_core_config()
function that can be #define'd in a platform-specific manner.
This will allow us to contain platform-specific code better, as the
corresponding variables no longer need to be exported so that they can
be defined in environment.c and be set in config.c
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This commit starts the rebase of 58e7864ffb to a274e0a036
We use the opportunity to skip earlier iterations of the mingw-isatty and
the interactive-rebase patch series. To make it easier to submit the
patches upstream, we also reorder the patches so that patch series which
are more likely to make it upstream come earlier than the others.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This is an evil merge: it changes more than the merged commits, as the
merged branch replaces part of the MSVC patches.
This mess will need to be cleaned up in the next merging rebase, by
moving the mingw-isatty-fixup patches in front of the MSVC patches.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git for Windows has carried a patch that depended on internals
of MSVC runtime, but it does not work correctly with recent MSVC
runtime. A replacement was written originally for compiling
with VC++. The patch in this message is a backport of that
replacement, and it also fixes the previous attempt to make
isatty() tell that /dev/null is *not* an interactive terminal.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Tested-by: Beat Bolli <dev+git@drbeat.li>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git only colours the output and uses pagination if isatty() returns 1.
MSYS2 and Cygwin emulate pseudo terminals via named pipes, meaning that
isatty() returns 0.
f7f90e0f4f (mingw: make isatty() recognize MSYS2's pseudo terminals
(/dev/pty*), 2016-04-27) fixed this for MSYS2 terminals, but not for
Cygwin.
The named pipes that Cygwin and MSYS2 use are very similar. MSYS2 PTY pipes
are called 'msys-*-pty*' and Cygwin uses 'cygwin-*-pty*'. This commit
modifies the existing check to allow both MSYS2 and Cygwin PTY pipes to be
identified as TTYs.
Note that pagination is still broken when running Git for Windows from
within Cygwin, as MSYS2's less.exe is spawned (and does not like to
interact with Cygwin's PTY).
This partially fixes https://github.com/git-for-windows/git/issues/267
Signed-off-by: Alan Davies <alan.n.davies@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When determining whether a handle corresponds to a *real* Win32 Console
(as opposed to, say, a character device such as /dev/null), we use the
GetConsoleOutputBufferInfo() function as a tell-tale.
However, that does not work for *input* handles associated with a
console. Let's just use the GetConsoleMode() function for input handles,
and since it does not work on output handles fall back to the previous
method for those.
This patch prepares for using is_console() instead of my previous
misguided attempt in cbb3f3c9b1 (mingw: intercept isatty() to handle
/dev/null as Git expects it, 2016-12-11) that broke everything on
Windows.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Prepare to merge the latest iteration of the mingw-isatty patch series.
This reverts commit 4af28e2fb4, reversing
changes made to 6cd98a7c65.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Prepare to merge the most recent iteration of the mingw-isatty patches.
This reverts commit 66d27deb17, reversing
changes made to cf8df3d1b9.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This series of branches introduces the git-rebase--helper, a builtin
helping to accelerate the interactive rebase dramatically.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Prepare for the current interactive-rebase patch thicket, backported to
`maint` (actually, our latest start of a merging rebase).
This reverts commit b358f8dfe0, reversing
changes made to 9e699f5eb6.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This operation has quadratic complexity, which is especially painful
on Windows, where shell scripts are *already* slow (mainly due to the
overhead of the POSIX emulation layer).
Let's reimplement this with linear complexity (using a hash map to
match the commits' subject lines) for the common case; Sadly, the
fixup/squash feature's design neglected performance considerations,
allowing arbitrary prefixes (read: `fixup! hell` will match the
commit subject `hello world`), which means that we are stuck with
quadratic performance in the worst case.
The reimplemented logic also happens to fix a bug where commented-out
lines (representing empty patches) were dropped by the previous code.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The `git commit --fixup` command unwraps wrapped onelines when
constructing the commit message, without wrapping the result.
We need to make sure that `git rebase --autosquash` keeps handling such
cases correctly, in particular since we are about to move the autosquash
handling into the rebase--helper.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In particular on Windows, where shell scripts are even more expensive
than on MacOSX or Linux, it makes sense to move a loop that forks
Git at least once for every line in the todo list into a builtin.
Note: The original code did not try to skip unnecessary picks of root
commits but punts instead (probably --root was not considered common
enough of a use case to bother optimizing). We do the same, for now.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In particular on Windows, where shell scripts are even more expensive
than on MacOSX or Linux, it makes sense to move a loop that forks
Git at least once for every line in the todo list into a builtin.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
These tests were a bit anal about the *exact* warning/error message
printed by git rebase. But those messages are intended for the *end
user*, therefore it does not make sense to test so rigidly for the
*exact* wording.
In the following, we will reimplement the missing commits check in
the sequencer, with slightly different words.
So let's just test for the parts in the warning/error message that
we *really* care about, nothing more, nothing less.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This is crucial to improve performance on Windows, as the speed is now
mostly dominated by the SHA-1 transformation (because it spawns a new
rev-parse process for *every* line, and spawning processes is pretty
slow from Git for Windows' MSYS2 Bash).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
To avoid problems with short SHA-1s that become non-unique during the
rebase, we rewrite the todo script with short/long SHA-1s before and
after letting the user edit the script. Since SHA-1s are not intuitive
for humans, rebase -i also provides the onelines (commit message
subjects) in the script, purely for the user's convenience.
It is very possible to generate a todo script via different means than
rebase -i and then to let rebase -i run with it; In this case, these
onelines are not required.
And this is where the expand/collapse machinery has a bug: it *expects*
that oneline, and failing to find one reuses the previous SHA-1 as
"oneline".
It was most likely an oversight, and made implementation in the (quite
limiting) shell script language less convoluted. However, we are about
to reimplement performance-critical parts in C (and due to spawning a
git.exe process for every single line of the todo script, the
expansion/collapsing of the SHA-1s *is* performance-hampering on
Windows), therefore let's fix this bug to make cross-validation with the
C version of that functionality possible.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The commands used to be indented, and it is nice to look at, but when we
transform the SHA-1s, the indentation is removed. So let's do away with it.
For the moment, at least: when we will use the upcoming rebase--helper
to transform the SHA-1s, we *will* keep the indentation and can
reintroduce it. Yet, to be able to validate the rebase--helper against
the output of the current shell script version, we need to remove the
extra indentation.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The first step of an interactive rebase is to generate the so-called "todo
script", to be stored in the state directory as "git-rebase-todo" and to
be edited by the user.
Originally, we adjusted the output of `git log <options>` using a simple
sed script. Over the course of the years, the code became more
complicated. We now use shell scripting to edit the output of `git log`
conditionally, depending whether to keep "empty" commits (i.e. commits
that do not change any files).
On platforms where shell scripting is not native, this can be a serious
drag. And it opens the door for incompatibilities between platforms when
it comes to shell scripting or to Unix-y commands.
Let's just re-implement the todo script generation in plain C, using the
revision machinery directly.
This is substantially faster, improving the speed relative to the
shell script version of the interactive rebase from 2x to 3x on Windows.
Note that the rearrange_squash() function in git-rebase--interactive
relied on the fact that we set the "format" variable to the config setting
rebase.instructionFormat. Relying on a side effect like this is no good,
hence we explicitly perform that assignment (possibly again) in
rearrange_squash().
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Now that the sequencer learned to process a "normal" interactive rebase,
we use it. The original shell script is still used for "non-normal"
interactive rebases, i.e. when --root or --preserve-merges was passed.
Please note that the --root option (via the $squash_onto variable) needs
special handling only for the very first command, hence it is still okay
to use the helper upon continue/skip.
Also please note that the --no-ff setting is volatile, i.e. when the
interactive rebase is interrupted at any stage, there is no record of
it. Therefore, we have to pass it from the shell script to the
rebase--helper.
Note: the test t3404 had to be adjusted because the the error messages
produced by the sequencer comply with our current convention to start with
a lower-case letter.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git's interactive rebase is still implemented as a shell script, despite
its complexity. This implies that it suffers from the portability point
of view, from lack of expressibility, and of course also from
performance. The latter issue is particularly serious on Windows, where
we pay a hefty price for relying so much on POSIX.
Unfortunately, being such a huge shell script also means that we missed
the train when it would have been relatively easy to port it to C, and
instead piled feature upon feature onto that poor script that originally
never intended to be more than a slightly pimped cherry-pick in a loop.
To open the road toward better performance (in addition to all the other
benefits of C over shell scripts), let's just start *somewhere*.
The approach taken here is to add a builtin helper that at first intends
to take care of the parts of the interactive rebase that are most
affected by the performance penalties mentioned above.
In particular, after we spent all those efforts on preparing the sequencer
to process rebase -i's git-rebase-todo scripts, we implement the `git
rebase -i --continue` functionality as a new builtin, git-rebase--helper.
Once that is in place, we can work gradually on tackling the rest of the
technical debt.
Note that the rebase--helper needs to learn about the transient
--ff/--no-ff options of git-rebase, as the corresponding flag is not
persisted to, and re-read from, the state directory.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The shell script version of the interactive rebase has a very specific
final message. Teach the sequencer to print the same.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
For the benefit of e.g. the shell prompt, the interactive rebase not
only displays the progress for the user to see, but also writes it into
the msgnum/end files in the state directory.
Teach the sequencer this new trick.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The interactive rebase keeps the user informed about its progress.
If the sequencer wants to do the grunt work of the interactive
rebase, it also needs to show that progress.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This is the behavior of the shell script version of the interactive
rebase, by using the `output` function defined in `git-rebase.sh`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This is the behavior of the shell script version of the interactive
rebase, by using the `output` function defined in `git-rebase.sh`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Instead of using the convenience function run_command_v_opt_cd_env(), we
now use the run_command() function. The former function is simply a
wrapper of the latter, trying to make it more convenient to use.
However, we already have to construct the argv and the env parameters,
and we will need even finer control e.g. over the output of the command,
so let's just stop using the convenience function.
Based on patches and suggestions by Johannes Sixt and Jeff King.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Rather than abusing a strbuf to come up with an environment block, let's
just use the argv_array structure which serves the same purpose much
better.
While at it, rename the function to reflect the fact that it does not
really care exactly what environment variables are defined in said file.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In the upcoming patch, we will support rebase -i's progress
reporting. The progress skips comments but counts 'noop's.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The parsing part of a 'drop' command is almost identical to parsing a
'pick', while the operation is the same as that of a 'noop'.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The interactive rebase has the very special magic that a cherry-pick
that exits with a status different from 0 and 1 signifies a failure to
even record that a cherry-pick was started.
This can happen e.g. when a fast-forward fails because it would
overwrite untracked files.
In that case, we must reschedule the command that we thought we already
had at least started successfully.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The sequencer already has an idea about using different merge
strategies. We just piggy-back on top of that, using rebase -i's
own settings, when running the sequencer in interactive rebase mode.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git's `rebase` command inspects the `rebase.autostash` config setting
to determine whether it should stash any uncommitted changes before
rebasing and re-apply them afterwards.
As we introduce more bits and pieces to let the sequencer act as
interactive rebase's backend, here is the part that adds support for
the autostash feature.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>