Windows paths are typically limited to MAX_PATH = 260 characters, even
though the underlying NTFS file system supports paths up to 32,767 chars.
This limitation is also evident in Windows Explorer, cmd.exe and many
other applications (including IDEs).
Particularly annoying is that most Windows APIs return bogus error codes
if a relative path only barely exceeds MAX_PATH in conjunction with the
current directory, e.g. ERROR_PATH_NOT_FOUND / ENOENT instead of the
infinitely more helpful ERROR_FILENAME_EXCED_RANGE / ENAMETOOLONG.
Many Windows wide char APIs support longer than MAX_PATH paths through the
file namespace prefix ('\\?\' or '\\?\UNC\') followed by an absolute path.
Notable exceptions include functions dealing with executables and the
current directory (CreateProcess, LoadLibrary, Get/SetCurrentDirectory) as
well as the entire shell API (ShellExecute, SHGetSpecialFolderPath...).
Introduce a handle_long_path function to check the length of a specified
path properly (and fail with ENAMETOOLONG), and to optionally expand long
paths using the '\\?\' file namespace prefix. Short paths will not be
modified, so we don't need to worry about device names (NUL, CON, AUX).
Contrary to MSDN docs, the GetFullPathNameW function doesn't seem to be
limited to MAX_PATH (at least not on Win7), so we can use it to do the
heavy lifting of the conversion (translate '/' to '\', eliminate '.' and
'..', and make an absolute path).
Add long path error checking to xutftowcs_path for APIs with hard MAX_PATH
limit.
Add a new MAX_LONG_PATH constant and xutftowcs_long_path function for APIs
that support long paths.
While improved error checking is always active, long paths support must be
explicitly enabled via 'core.longpaths' option. This is to prevent end
users to shoot themselves in the foot by checking out files that Windows
Explorer, cmd/bash or their favorite IDE cannot handle.
Thanks-to: Martin W. Kirst <maki@bitkings.de>
Thanks-to: Doug Kelly <dougk.ff7@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
These fixes were necessary for Sverre Rabbelier's remote-hg to work,
but for some magic reason they are not necessary for the current
remote-hg. Makes you wonder how that one gets away with it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Checking the work tree status is quite slow on Windows, due to slow lstat
emulation (git calls lstat once for each file in the index). Windows
operating system APIs seem to be much better at scanning the status
of entire directories than checking single files.
Add an lstat implementation that uses a cache for lstat data. Cache misses
read the entire parent directory and add it to the cache. Subsequent lstat
calls for the same directory are served directly from the cache.
Also implement opendir / readdir / closedir so that they create and use
directory listings in the cache.
The cache doesn't track file system changes and doesn't plug into any
modifying file APIs, so it has to be explicitly enabled for git functions
that don't modify the working copy.
Note: in an earlier version of this patch, the cache was always active and
tracked file system changes via ReadDirectoryChangesW. However, this was
much more complex and had negative impact on the performance of modifying
git commands such as 'git checkout'.
Signed-off-by: Karsten Blees <blees@dcon.de>
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.
This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.
Enable read-only sections for 'git status' and preload_index.
Signed-off-by: Karsten Blees <blees@dcon.de>
Emulating the POSIX lstat API on Windows via GetFileAttributes[Ex] is quite
slow. Windows operating system APIs seem to be much better at scanning the
status of entire directories than checking single files. A caching
implementation may improve performance by bulk-reading entire directories
or reusing data obtained via opendir / readdir.
Make the lstat implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.
Signed-off-by: Karsten Blees <blees@dcon.de>
Emulating the POSIX dirent API on Windows via FindFirstFile/FindNextFile is
pretty staightforward, however, most of the information provided in the
WIN32_FIND_DATA structure is thrown away in the process. A more
sophisticated implementation may cache this data, e.g. for later reuse in
calls to lstat.
Make the dirent implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.
Define a base DIR structure with pointers to readdir/closedir that match
the opendir implementation (i.e. similar to vtable pointers in OOP).
Define readdir/closedir so that they call the function pointers in the DIR
structure. This allows to choose the opendir implementation on a
call-by-call basis.
Move the fixed sized dirent.d_name buffer to the dirent-specific DIR
structure, as d_name may be implementation specific (e.g. a caching
implementation may just set d_name to point into the cache instead of
copying the entire file name string).
Signed-off-by: Karsten Blees <blees@dcon.de>
I played around with this quite a bit. After trying some more complex
schemes, I found that what worked best is to just sleep 1 millisecond
between iterations. Though it's a very short time, it still completely
eliminates the busy wait condition, without hurting perf.
There code uses SleepEx(1, TRUE) to sleep. See this page for a good
discussion of why that is better than calling SwitchToThread, which
is what was used previously:
http://stackoverflow.com/questions/1383943/switchtothread-vs-sleep1
Note that calling SleepEx(0, TRUE) does *not* solve the busy wait.
The most striking case was when testing on a UNC share with a large repo,
on a single CPU machine. Without the fix, it took 4 minutes 15 seconds,
and with the fix it took just 1:08! I think it's because git-upload-pack's
busy wait was eating the CPU away from the git process that's doing the
real work. With multi-proc, the timing is not much different, but tons of
CPU time is still wasted, which can be a killer on a server that needs to
do bunch of other things.
I also tested the very fast local case, and didn't see any measurable
difference. On a big repo with 4500 files, the upload-pack took about 2
seconds with and without the fix.
The existing hashtable implementation (in hash.[ch]) uses open addressing
(i.e. resolve hash collisions by distributing entries across the table).
Thus, removal is difficult to implement with less than O(n) complexity.
Resolving collisions of entries with identical hashes (e.g. via chaining)
is left to the client code.
Add a hashtable implementation that supports O(1) removal and is slightly
easier to use due to builtin entry chaining.
Supports all basic operations init, free, get, add, remove and iteration.
Also includes ready-to-use hash functions based on the public domain FNV-1
algorithm (http://www.isthe.com/chongo/tech/comp/fnv).
The per-entry data structure (hashmap_entry) is piggybacked in front of
the client's data structure to save memory. See test-hashmap.c for usage
examples.
The hashtable is resized by a factor of four when 80% full. With these
settings, average memory consumption is about 2/3 of hash.[ch], and
insertion is about twice as fast due to less frequent resizing.
Lookups are also slightly faster, because entries are strictly confined to
their bucket (i.e. no data of other buckets needs to be traversed).
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 5fee995244 introduced the stage_updated_gitmodules() function to
add submodule configuration updates to the index. It assumed that even
after calling remove_cache_entry_at() the same cache entry would still be
valid. This was true in the old days, as cache entries could never be
freed, but that is not so sure in the present as there is ongoing work to
free removed cache entries, which makes this code segfault.
Fix that by calling add_file_to_cache() instead of open coding it. Also
remove the "could not find .gitmodules in index" warning, as that won't
happen in regular use cases (and by then just silently adding it to the
index we do the right thing).
Thanks-to: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It appears that `pwd` returns the POSIX-style or the DOS-style path
depending which style the previous `cd` used. To normalize, enforce `pwd
-W` in scripts.
From the original e-mail exchange:
On Thu, Mar 22, 2012 at 11:13:37AM +0100, Sebastian Schuberth wrote:
> On Wed, Mar 21, 2012 at 22:21, Johannes Sixt <j6t@kdbg.org> wrote:
>
> > I build git and run its tests outside the msysgit environment. Does that
> > explain the difference? (And I use CMD.)
>
> It does not make a difference for me. I started cmd.exe at
> c:\msysgit\git\t, added c:\msysgit\bin temporarily to PATH, and ran
> "sh t5526-fetch-submodules.sh -i -v", and the test still fails.
Yes it probably does. Johannes said that he runs the tests outside of
the msysgit folder. That way there is only one path the submodule script
gets reported and not two like '/c/msysgit/git' and '/git'.
That would explain to me why it is passing.
I am afraid that the only solution is to patch msys itself to report the
long absolute path when passing window style paths to cd. Currently when
I do
cd c:/msysgit/git
I will end up in '/git' instead of the long path.
I found that there is a -W option to pwd in msys bash which makes it
always return the real windows path. A normalization in that direction
is unique and thus might be more robust. Have a look at the attached
patch. With this at least t5526 passes. I was not able to run the whole
testsuite properly at the moment. I can have a look at that tomorrow.
What do you think?
Cheers Heiko
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Dynamic linking is generally preferred over static linking, and MSVCRT.dll
has been integral part of Windows for a long time.
This also fixes linker warnings for _malloc and _free in zlib.lib, which
seems to be compiled for MSVCRT.dll already.
The DLL version also exports some of the CRT initialization functions,
which are hidden in the static libcmt.lib (e.g. __wgetmainargs, required by
subsequent Unicode patches).
Signed-off-by: Karsten Blees <blees@dcon.de>
The previous implementation said that the filesystem information on
Windows is not reliable to determine whether a file is executable.
To find gather this information it was peeking into the first two bytes
of a file to see whether it looks executable.
Apart from the fact that on Windows executables are usually defined as
such by their extension it lead to slow opening of help file in some
situations.
When you have virus scanner running calling open on an executable file
is a potentially expensive operation. See the following measurements (in
seconds) for example.
With virus scanner running (coldcache):
$ ./a.exe /libexec/git-core/
before open (git-add.exe): 0.000000
after open (git-add.exe): 0.412873
before open (git-annotate.exe): 0.000175
after open (git-annotate.exe): 0.397925
before open (git-apply.exe): 0.000243
after open (git-apply.exe): 0.399996
before open (git-archive.exe): 0.000147
after open (git-archive.exe): 0.397783
before open (git-bisect--helper.exe): 0.000160
after open (git-bisect--helper.exe): 0.397700
before open (git-blame.exe): 0.000160
after open (git-blame.exe): 0.399136
...
With virus scanner running (hotcache):
$ ./a.exe /libexec/git-core/
before open (git-add.exe): 0.000000
after open (git-add.exe): 0.000325
before open (git-annotate.exe): 0.000229
after open (git-annotate.exe): 0.000177
before open (git-apply.exe): 0.000167
after open (git-apply.exe): 0.000150
before open (git-archive.exe): 0.000154
after open (git-archive.exe): 0.000156
before open (git-bisect--helper.exe): 0.000132
after open (git-bisect--helper.exe): 0.000180
before open (git-blame.exe): 0.000718
after open (git-blame.exe): 0.000724
...
This test did just list the given directory and open() each file in it.
With this patch I get:
$ time git help git
Launching default browser to display HTML ...
real 0m8.723s
user 0m0.000s
sys 0m0.000s
and without
$ time git help git
Launching default browser to display HTML ...
real 1m37.734s
user 0m0.000s
sys 0m0.031s
both tests with cold cache and giving the machine some time to settle
down after restart.
Signed-off-by: Heiko Voigt <heiko.voigt@mahr.de>
7ebac8cb94 made launching of .exe
externals work when installed in Unicode paths. But it broke launching
of non-.exe externals, no matter where they were installed. We now
correctly maintain the UTF-8 and UTF-16 paths in tandem in lookup_prog.
This fixes t5526, among others.
Signed-off-by: Adam Roben <adam@roben.org>
If Git were installed in a path containing non-ASCII characters,
commands such as git-am and git-submodule, which are implemented as
externals, would fail to launch with the following error:
> fatal: 'am' appears to be a git command, but we were not
> able to execute it. Maybe git-am is broken?
This was due to lookup_prog not being Unicode-aware. It was somehow
missed in 2ee5a1a14a.
Note that the only problem in this function was calling
GetFileAttributes instead of GetFileAttributesW. The calls to access()
were fine because access() is a macro which resolves to mingw_access,
which already handles Unicode correctly. But I changed lookup_prog to
use _waccess directly so that we only convert the path to UTF-16 once.
Signed-off-by: Adam Roben <adam@roben.org>
On Windows XP3 in git bash
git clone git@github.com:octocat/Spoon-Knife.git
cd Spoon-Knife
git gui
menu Remote\Fetch from\origin
error: cannot spawn git: No such file or directory
error: could not run rev-list
if u run
git fetch --all
it worked normal in git bash or gitgui tools
In second version CreateProcess get 'C:\Git\libexec\git-core/git.exe' in
first version - C:/Git/libexec/git-core/git.exe and not executes (unix
slashes)
after fixing C:\Git\libexec\git-core\git.exe or
C:/Git/libexec/git-core\git.exe it works normal
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
If the user has not activated reflogs, or if nothing has been recorded
yet (as is the case directly after cloning), said directory may not
exist yet.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, we would like to be able to have a default http.sslCAinfo
that points to an MSys path (i.e. relative to the installation root of
Git). As Git is a MinGW program, it has to handle the conversion
of the MSys path into a MinGW32 path itself.
Since system_path() considers paths starting with '/' as absolute, we
have to convince it to make a Windows path by stripping the leading
slash.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
After importing anything with fast-import, we should always let the
garbage collector do its job, since the objects are written to disk
inefficiently.
This brings down an initial import of http://selenic.com/hg from about
230 megabytes to about 14.
In the future, we may want to make this configurable on a per-remote
basis, or maybe teach fast-import about it in the first place.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>