Add Unicode conversion functions to convert between Windows native UTF-16LE
encoding to UTF-8 and back.
To support repositories with legacy-encoded file names, the UTF-8 to UTF-16
conversion function tries to create valid, unique file names even for
invalid UTF-8 byte sequences, so that these repositories can be checked out
without error.
The current implementation leaves invalid UTF-8 bytes in range 0xa0 - 0xff
as is (producing printable Unicode chars \u00a0 - \u00ff, equivalent to
ISO-8859-1), and converts 0x80 - 0x9f to hex-code (\u0080 - \u009f are
control chars).
The Windows MultiByteToWideChar API was not used as it either drops invalid
UTF-8 sequences (on Win2k/XP; producing non-unique or even empty file
names) or converts them to the replacement char \ufffd (Vista/7; causing
ERROR_INVALID_NAME in subsequent calls to file system APIs).
Signed-off-by: Karsten Blees <blees@dcon.de>
Winansi.c has many static variables that are accessed and modified from
the [v][f]printf / fputs functions overridden in the file. This may cause
multi threaded git commands that print to the console to produce corrupted
output or even crash.
Additionally, winansi.c doesn't override all functions that can be used to
print to the console (e.g. fwrite, write, fputc are missing), so that ANSI
escapes don't work properly for some git commands (e.g. git-grep).
Instead of doing ANSI emulation in just a few wrapped functions on top of
the IO API, let's plug into the IO system and take advantage of the thread
safety inherent to the IO system.
Redirect stdout and stderr to a pipe if they point to the console. A
background thread reads from the pipe, handles ANSI escape sequences and
UTF-8 to UTF-16 conversion, then writes to the console.
The pipe-based stdout and stderr replacements must be set to unbuffered, as
MSVCRT doesn't support line buffering and fully buffered streams are
inappropriate for console output.
Due to the byte-oriented pipe, ANSI escape sequences and multi-byte UTF-8
sequences can no longer be expected to arrive in one piece. Replace the
string-based ansi_emulate() with a simple stateful parser (this also fixes
colored diff hunk headers, which were broken as of commit 2efcc977).
Override isatty to return true for the pipes redirecting to the console.
Exec/spawn obtain the original console handle to pass to the next process
via winansi_get_osfhandle().
All other overrides are gone, the default stdio implementations work as
expected with the piped stdout/stderr descriptors.
Global variables are either initialized on startup (single threaded) or
exclusively modified by the background thread. Threads communicate through
the pipe, no further synchronization is necessary.
The background thread is terminated by disonnecting the pipe after flushing
the stdio and pipe buffers. This doesn't work for anonymous pipes (created
via CreatePipe), as DisconnectNamedPipe only works on the read end, which
discards remaining data. Thus we have to setup the pipe manually, with the
write end beeing the server (opened with CreateNamedPipe) and the read end
the client (opened with CreateFile).
Limitations: doesn't track reopened or duped file descriptors, i.e.:
- fdopen(1/2) returns fully buffered streams
- dup(1/2), dup2(1/2) returns normal pipe descriptors (i.e. isatty() =
false, winansi_get_osfhandle won't return the original console handle)
Currently, only the git-format-patch command uses xfdopen(xdup(1)) (see
"realstdout" in builtin/log.c), but works well with these limitations.
Many thanks to Atsushi Nakagawa <atnak@chejz.com> for suggesting and
reviewing the thread-exit-mechanism.
Signed-off-by: Karsten Blees <blees@dcon.de>
Some constants (such as LF_FACESIZE) are undefined with -DNOGDI (set in the
Makefile), and CONSOLE_FONT_INFOEX is available in MSVC, but not in MinGW.
Cast FARPROC to PGETCURRENTCONSOLEFONTEX to suppress MSVC compiler warning.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git requires the TERM environment variable to be set for all color*
settings. Simulate the TERM variable if it is not set (default on Windows).
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This reverts commit df599e9612.
As of 5e9637c6 "i18n: add infrastructure for translating Git with gettext",
eval_gettext uses MinGW envsubst.exe instead of git-sh-i18n--envsubst.exe
for variable substitution. This breaks git-submodule.sh messages and tests,
as envsubst.exe doesn't support case-sensitive environment lookup (the same
is true for almost everything on Windows, including MSys and Cygwin tools).
30a615ac "Windows/i18n: rename $path to prevent clashes with $PATH" renames
the conflicting variable in git-submodule.sh, so that it works on Windows
(i.e. with case-insensitive environment, regardless of the toolset).
Revert to the documented behaviour of case-insensitive environment on
Windows.
Signed-off-by: Karsten Blees <blees@dcon.de>
GetCurrentConsoleFontEx in an atexit routine doesn't work because git
closes stdout before exit (which also closes the console handle). Check
the console font when we first encounter a non-ascii character and only
schedule the warning message to be printed at exit (warnings go to stderr,
which is not closed by git).
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
MingwRT listens to _CRT_glob to decide if __getmainargs should
perform globbing, with the default being that it should.
Unfortunately, __getmainargs globbing is sub-par; for instance
patterns like "*.c" will only match c-sources in the current
directory.
Disable __getmainargs' command line wildcard expansion, so these
patterns will be left untouched, and handled by Git's superior
built-in globbing instead.
MSVC defaults to no globbing, so we don't need to do anything
in that case.
This fixes t5505 and t7810.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
The code in the MinGW main macro is getting more and more complex, move to
a separate initialization function for readabiliy and extensibility.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Improve the dirent implementation by removing the relics that were once
necessary to plug into the now unused MinGW runtime, in preparation for
Unicode file name support.
Move FindFirstFile to opendir, and FindClose to closedir, with the
following implications:
- DIR.dd_name is no longer needed
- chdir(one); opendir(relative); chdir(two); readdir() works as expected
(i.e. lists one/relative instead of two/relative)
- DIR.dd_handle is a valid handle for the entire lifetime of the DIR struct
- thus, all checks for dd_handle == INVALID_HANDLE_VALUE and dd_handle == 0
have been removed
- the special case that the directory has been fully read (which was
previously explicitly tracked with dd_handle == INVALID_HANDLE_VALUE &&
dd_stat != 0) is now handled implicitly by the FindNextFile error
handling code (if a client continues to call readdir after receiving
NULL, FindNextFile will continue to fail with ERROR_NO_MORE_FILES, to
the same effect)
- extracting dirent data from WIN32_FIND_DATA is needed in two places, so
moved to its own method
- GetFileAttributes is no longer needed. The same information can be
obtained from the FindFirstFile error code, which is ERROR_DIRECTORY if
the name is NOT a directory (-> ENOTDIR), otherwise we can use
err_win_to_posix (e.g. ERROR_PATH_NOT_FOUND -> ENOENT). The
ERROR_DIRECTORY case could be fixed in err_win_to_posix, but this
probably breaks other functionality.
Removes the ERROR_NO_MORE_FILES check after FindFirstFile (this was
fortunately a NOOP (searching for '*' always finds '.' and '..'),
otherwise the subsequent code would have copied data from an uninitialized
buffer).
Changes malloc to git support function xmalloc, so opendir will die() if
out of memory, rather than failing with ENOMEM and letting git work on
incomplete directory listings (error handling in dir.c is quite sparse).
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Git-compat-util.h is two dirs up, and already includes <dirent.h> (which
is the same as "dirent.h" due to -Icompat/win32 in the Makefile).
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
FILENAME_MAX and MAX_PATH are both 260 on Windows, however, MAX_PATH is
used throughout the other Win32 code in Git, and also defines the length
of file name buffers in the Win32 API (e.g. WIN32_FIND_DATA.cFileName,
from which we're copying the dirent data).
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Remove the union around dirent.d_type and the unused dirent.d_reclen member
(which was necessary for compatibility with the MinGW dirent runtime, which
is no longer used).
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
There are no proper inodes on Windows, so remove dirent.d_ino and #define
NO_D_INO_IN_DIRENT in the Makefile (this skips e.g. an ineffective qsort in
fsck.c).
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Unicode console output won't display correctly with default settings
because the default console font ("Terminal") only supports the system's
OEM charset. Unfortunately, this is a user specific setting, so it cannot
be easily fixed by e.g. some registry tricks in the setup program.
This change prints a warning on exit if console output contained non-ascii
characters and the console font is supposedly not a TrueType font (which
usually have decent Unicode support).
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
GetStdHandle(STD_OUTPUT_HANDLE) doesn't work for stderr if stdout is
redirected. Use _get_osfhandle of the FILE* instead.
_isatty() is true for all character devices (including parallel and serial
ports). Check return value of GetConsoleScreenBufferInfo instead to
reliably detect console handles (also don't initialize internal state from
an uninitialized CONSOLE_SCREEN_BUFFER_INFO structure if the function
fails).
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
WriteConsoleW seems to be the only way to reliably print unicode to the
console (without weird code page conversions).
Also redirects vfprintf to the winansi.c version.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Empty (length 0) usernames and/or passwords, when saved in the Windows
Credential Manager, come back as null when reading the credential.
One use case for such empty credentials is with NTLM authentication, where
empty username and password instruct libcurl to authenticate using the
credentials of the currently logged-on user (single sign-on).
When locating the relevant credentials, make empty username match null.
When outputting the credentials, handle nulls correctly.
Signed-off-by: Jakub Bereżański <kuba@berezanscy.pl>
Make sure the helper does not crash when blank username and password is
provided. If the helper can save such credentials, it should be able to
read them back.
Signed-off-by: Jakub Bereżański <kuba@berezanscy.pl>
Some of the tests were written with the assumption that the native eol would always be lf. After defining NATIVE_CRLF on MinGW, these tests began failing. This change will update the tests to also handle a native eol of crlf.
Signed-off-by: Brice Lambson <bricelam@live.com>
Commit 95f31e9a correctly points out that the NATIVE_CRLF setting is
incorrectly set on Mingw git. However, the Makefile variable is not
propagated to the C preprocessor and results in no change. This patch
pushes the definition to the C code and adds a test to validate that
when core.eol as native is crlf, we actually normalize text files to this
line ending convention when core.autocrlf is false.
Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net>
The path in a .git platform independent link file needs to be absolute
and under mingw we need it to be a windows type path, not a unix style
path so it should start with a drive letter and not a /.
Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net>
MSys works very hard to convert Unix-style paths into DOS-style ones.
*Very* hard.
So hard, indeed, that
git blame -L/hello/,/green/
is translated into something like
git blame -LC:/msysgit/hello/,C:/msysgit/green/
As seen in msys_p2w in src\msys\msys\rt\src\winsup\cygwin\path.cc, line
3204ff:
case '-':
//
// here we check for POSIX paths as attributes to a POSIX switch.
//
...
seemingly absolute POSIX paths in single-letter options get expanded by
msys.dll unless they contain '=' or ';'.
So a quick and very dirty fix is to use '-L/;*evil/'. (Using an equal sign
works only when it is before a comma, so in the above example, /=*green/
would still be converted to a DOS-style path.)
Commit-message-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The commit_msg function has an assumption that the string is being output
as utf-8. On Windows this is not true so always convert from the system
encoding to the desired encoding.
Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net>
On Windows XP (not Win7), WriteConsoleW and WriteFile seem to raise and
catch SIGSEGV if the lpNumberOfCharsWritten parameter is NULL. This is not
a problem when executed standalone, but gdb stops execution here (unless
disabled via "handle SIGSEGV nostop").
Fix it by passing a dummy variable.
Signed-off-by: Karsten Blees <blees@dcon.de>
On Windows the application command line is provided as unicode and in
mingw-git we convert that to utf-8. So these tests that require a iso-8859-1
input are being subverted by the encoding transformations we perform and
should be skipped.
Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net>
Those tests that generate files using echo can expect crlf issues when run
under windows. For such cases we use 'test_cmp_text'.
Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net>
The test separator char is a colon which means any absolute paths on windows
confuse the tests that use global_excludes.
Suggested-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net>
If the user has not activated reflogs, or if nothing has been recorded
yet (as is the case directly after cloning), said directory may not
exist yet.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git for Windows now ships with the new Git icon from git-scm.com. Use that
icon file if it exists instead of the old procedurally drawn one.
This patch was sent upstream but so far no decision on its inclusion was
made, so commit it to our fork.
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
The previous implementation said that the filesystem information on
Windows is not reliable to determine whether a file is executable.
To find gather this information it was peeking into the first two bytes
of a file to see whether it looks executable.
Apart from the fact that on Windows executables are usually defined as
such by their extension it lead to slow opening of help file in some
situations.
When you have virus scanner running calling open on an executable file
is a potentially expensive operation. See the following measurements (in
seconds) for example.
With virus scanner running (coldcache):
$ ./a.exe /libexec/git-core/
before open (git-add.exe): 0.000000
after open (git-add.exe): 0.412873
before open (git-annotate.exe): 0.000175
after open (git-annotate.exe): 0.397925
before open (git-apply.exe): 0.000243
after open (git-apply.exe): 0.399996
before open (git-archive.exe): 0.000147
after open (git-archive.exe): 0.397783
before open (git-bisect--helper.exe): 0.000160
after open (git-bisect--helper.exe): 0.397700
before open (git-blame.exe): 0.000160
after open (git-blame.exe): 0.399136
...
With virus scanner running (hotcache):
$ ./a.exe /libexec/git-core/
before open (git-add.exe): 0.000000
after open (git-add.exe): 0.000325
before open (git-annotate.exe): 0.000229
after open (git-annotate.exe): 0.000177
before open (git-apply.exe): 0.000167
after open (git-apply.exe): 0.000150
before open (git-archive.exe): 0.000154
after open (git-archive.exe): 0.000156
before open (git-bisect--helper.exe): 0.000132
after open (git-bisect--helper.exe): 0.000180
before open (git-blame.exe): 0.000718
after open (git-blame.exe): 0.000724
...
This test did just list the given directory and open() each file in it.
With this patch I get:
$ time git help git
Launching default browser to display HTML ...
real 0m8.723s
user 0m0.000s
sys 0m0.000s
and without
$ time git help git
Launching default browser to display HTML ...
real 1m37.734s
user 0m0.000s
sys 0m0.031s
both tests with cold cache and giving the machine some time to settle
down after restart.
Signed-off-by: Heiko Voigt <heiko.voigt@mahr.de>
7ebac8cb94 made launching of .exe
externals work when installed in Unicode paths. But it broke launching
of non-.exe externals, no matter where they were installed. We now
correctly maintain the UTF-8 and UTF-16 paths in tandem in lookup_prog.
This fixes t5526, among others.
Signed-off-by: Adam Roben <adam@roben.org>
I played around with this quite a bit. After trying some more complex
schemes, I found that what worked best is to just sleep 1 millisecond
between iterations. Though it's a very short time, it still completely
eliminates the busy wait condition, without hurting perf.
There code uses SleepEx(1, TRUE) to sleep. See this page for a good
discussion of why that is better than calling SwitchToThread, which
is what was used previously:
http://stackoverflow.com/questions/1383943/switchtothread-vs-sleep1
Note that calling SleepEx(0, TRUE) does *not* solve the busy wait.
The most striking case was when testing on a UNC share with a large repo,
on a single CPU machine. Without the fix, it took 4 minutes 15 seconds,
and with the fix it took just 1:08! I think it's because git-upload-pack's
busy wait was eating the CPU away from the git process that's doing the
real work. With multi-proc, the timing is not much different, but tons of
CPU time is still wasted, which can be a killer on a server that needs to
do bunch of other things.
I also tested the very fast local case, and didn't see any measurable
difference. On a big repo with 4500 files, the upload-pack took about 2
seconds with and without the fix.
Apparently the signal handling is not quite correct in the fsckobject
handling (most likely we rely on a side effect that lets us still output
some message after receiving a signal 13 but in the BuildHive setup this
fails intermittently).
As a consequence, the push in t5504 does fail as expected, but fails to
output anything (unexpected). Since this is good enough for now, let's
handle an empty output as success, too.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
After importing anything with fast-import, we should always let the
garbage collector do its job, since the objects are written to disk
inefficiently.
This brings down an initial import of http://selenic.com/hg from about
230 megabytes to about 14.
In the future, we may want to make this configurable on a per-remote
basis, or maybe teach fast-import about it in the first place.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When calling `git fast-export a..a b` when a and b refer to the same
commit, nothing would be exported, and an incorrect reset line would
be printed for b ('from :0').
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Dynamic linking is generally preferred over static linking, and MSVCRT.dll
has been integral part of Windows for a long time.
This also fixes linker warnings for _malloc and _free in zlib.lib, which
seems to be compiled for MSVCRT.dll already.
The DLL version also exports some of the CRT initialization functions,
which are hidden in the static libcmt.lib (e.g. __wgetmainargs, required by
subsequent Unicode patches).
Signed-off-by: Karsten Blees <blees@dcon.de>
As of "Win32: Thread-safe windows console output", git-log no longer
terminates when the pager process dies. This is due to disabling buffering
for the replaced stdout / stderr streams. Git-log will periodically fflush
stdout (see write_or_die.c/mayble_flush_or_die()), but with no buffering,
this is a NOP that always succeeds (so we never detect the EPIPE error).
Exchange the original console handles with our console thread pipe handles
by accessing the internal MSVCRT data structures directly (which are
exposed via __pioinfo for some reason).
Implement this with minimal assumptions about the actual data structure to
make it work with different (hopefully even future) MSVCRT versions.
While messing with internal data structures is ugly, this patch solves the
problem at the source instead of adding more workarounds. We no longer need
the special winansi_isatty override, and the limitations documented in
"Win32: Thread-safe windows console output" are gone (i.e. fdopen(1/2)
returns unbuffered streams now, and isatty() for duped console file
descriptors works as expected).
Signed-off-by: Karsten Blees <blees@dcon.de>
On Windows XP (not Win7), directories cannot be deleted while a find handle
is open, causing "Deletion of directory '...' failed. Should I try again?"
prompts.
Prior to 19d1e75d "Win32: Unicode file name support (except dirent)",
these failures were silently ignored due to strbuf_free in is_dir_empty
resetting GetLastError to ERROR_SUCCESS.
Close the find handle in is_dir_empty so that git doesn't block deletion
of the directory even after all other applications have released it.
Reported-by: John Chen <john0312@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>