Commit Graph

330 Commits

Author SHA1 Message Date
Karsten Blees
d7e5707ffd Revert "Windows: teach getenv to do a case-sensitive search"
This reverts commit df599e9612.

As of 5e9637c6 "i18n: add infrastructure for translating Git with gettext",
eval_gettext uses MinGW envsubst.exe instead of git-sh-i18n--envsubst.exe
for variable substitution. This breaks git-submodule.sh messages and tests,
as envsubst.exe doesn't support case-sensitive environment lookup (the same
is true for almost everything on Windows, including MSys and Cygwin tools).

30a615ac "Windows/i18n: rename $path to prevent clashes with $PATH" renames
the conflicting variable in git-submodule.sh, so that it works on Windows
(i.e. with case-insensitive environment, regardless of the toolset).

Revert to the documented behaviour of case-insensitive environment on
Windows.

Signed-off-by: Karsten Blees <blees@dcon.de>
2013-01-04 14:06:37 +01:00
Karsten Blees
dcf42ce99a Unicode console: fix font warning on Vista and Win7
GetCurrentConsoleFontEx in an atexit routine doesn't work because git
closes stdout before exit (which also closes the console handle). Check
the console font when we first encounter a non-ascii character and only
schedule the warning message to be printed at exit (warnings go to stderr,
which is not closed by git).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
2013-01-04 14:06:35 +01:00
Karsten Blees
5d22fcd775 MinGW: disable CRT command line globbing
MingwRT listens to _CRT_glob to decide if __getmainargs should
perform globbing, with the default being that it should.
Unfortunately, __getmainargs globbing is sub-par; for instance
patterns like "*.c" will only match c-sources in the current
directory.

Disable __getmainargs' command line wildcard expansion, so these
patterns will be left untouched, and handled by Git's superior
built-in globbing instead.

MSVC defaults to no globbing, so we don't need to do anything
in that case.

This fixes t5505 and t7810.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
2013-01-04 14:06:32 +01:00
Karsten Blees
4b2acc0a76 Win32: move main macro to a function
The code in the MinGW main macro is getting more and more complex, move to
a separate initialization function for readabiliy and extensibility.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
2013-01-04 14:06:30 +01:00
Karsten Blees
c4dafa2549 Win32: fix potential multi-threading issue
...by removing a static buffer in do_stat_internal.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
2013-01-04 14:06:28 +01:00
Karsten Blees
9af7e14a00 Win32 dirent: improve dirent implementation
Improve the dirent implementation by removing the relics that were once
necessary to plug into the now unused MinGW runtime, in preparation for
Unicode file name support.

Move FindFirstFile to opendir, and FindClose to closedir, with the
following implications:
- DIR.dd_name is no longer needed
- chdir(one); opendir(relative); chdir(two); readdir() works as expected
  (i.e. lists one/relative instead of two/relative)
- DIR.dd_handle is a valid handle for the entire lifetime of the DIR struct
- thus, all checks for dd_handle == INVALID_HANDLE_VALUE and dd_handle == 0
  have been removed
- the special case that the directory has been fully read (which was
  previously explicitly tracked with dd_handle == INVALID_HANDLE_VALUE &&
  dd_stat != 0) is now handled implicitly by the FindNextFile error
  handling code (if a client continues to call readdir after receiving
  NULL, FindNextFile will continue to fail with ERROR_NO_MORE_FILES, to
  the same effect)
- extracting dirent data from WIN32_FIND_DATA is needed in two places, so
  moved to its own method
- GetFileAttributes is no longer needed. The same information can be
  obtained from the FindFirstFile error code, which is ERROR_DIRECTORY if
  the name is NOT a directory (-> ENOTDIR), otherwise we can use
  err_win_to_posix (e.g. ERROR_PATH_NOT_FOUND -> ENOENT). The
  ERROR_DIRECTORY case could be fixed in err_win_to_posix, but this
  probably breaks other functionality.

Removes the ERROR_NO_MORE_FILES check after FindFirstFile (this was
fortunately a NOOP (searching for '*' always finds '.' and '..'),
otherwise the subsequent code would have copied data from an uninitialized
buffer).

Changes malloc to git support function xmalloc, so opendir will die() if
out of memory, rather than failing with ENOMEM and letting git work on
incomplete directory listings (error handling in dir.c is quite sparse).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
2013-01-04 14:06:25 +01:00
Karsten Blees
5a4f2329eb Win32 dirent: clarify #include directives
Git-compat-util.h is two dirs up, and already includes <dirent.h> (which
is the same as "dirent.h" due to -Icompat/win32 in the Makefile).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
2013-01-04 14:06:23 +01:00
Karsten Blees
57dcb1df48 Win32 dirent: change FILENAME_MAX to MAX_PATH
FILENAME_MAX and MAX_PATH are both 260 on Windows, however, MAX_PATH is
used throughout the other Win32 code in Git, and also defines the length
of file name buffers in the Win32 API (e.g. WIN32_FIND_DATA.cFileName,
from which we're copying the dirent data).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
2013-01-04 14:06:20 +01:00
Karsten Blees
8ec4271793 Win32 dirent: remove unused dirent.d_reclen member
Remove the union around dirent.d_type and the unused dirent.d_reclen member
(which was necessary for compatibility with the MinGW dirent runtime, which
is no longer used).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
2013-01-04 14:06:17 +01:00
Karsten Blees
df3de15c21 Win32 dirent: remove unused dirent.d_ino member
There are no proper inodes on Windows, so remove dirent.d_ino and #define
NO_D_INO_IN_DIRENT in the Makefile (this skips e.g. an ineffective qsort in
fsck.c).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
2013-01-04 14:06:15 +01:00
Karsten Blees
a5dd3e4a01 Warn if the Windows console font doesn't support Unicode
Unicode console output won't display correctly with default settings
because the default console font ("Terminal") only supports the system's
OEM charset. Unfortunately, this is a user specific setting, so it cannot
be easily fixed by e.g. some registry tricks in the setup program.

This change prints a warning on exit if console output contained non-ascii
characters and the console font is supposedly not a TrueType font (which
usually have decent Unicode support).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-01-04 14:06:10 +01:00
Karsten Blees
3e125b1bed Detect console streams more reliably on Windows
GetStdHandle(STD_OUTPUT_HANDLE) doesn't work for stderr if stdout is
redirected. Use _get_osfhandle of the FILE* instead.

_isatty() is true for all character devices (including parallel and serial
ports). Check return value of GetConsoleScreenBufferInfo instead to
reliably detect console handles (also don't initialize internal state from
an uninitialized CONSOLE_SCREEN_BUFFER_INFO structure if the function
fails).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-01-04 14:06:08 +01:00
Karsten Blees
06c13071f9 Support Unicode console output on Windows
WriteConsoleW seems to be the only way to reliably print unicode to the
console (without weird code page conversions).

Also redirects vfprintf to the winansi.c version.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-01-04 14:06:06 +01:00
Karsten Blees
66732aa71e Enable color output in Windows cmd.exe
Git requires the TERM environment variable to be set for all color*
settings. Simulate the TERM variable if it is not set (default on Windows).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-01-04 14:06:01 +01:00
Evgeny Pashkin
536dd27a17 Fixed wrong path delimiter in exe finding
On Windows XP3 in git bash
git clone git@github.com:octocat/Spoon-Knife.git
cd Spoon-Knife
git gui
menu Remote\Fetch from\origin
error: cannot spawn git: No such file or directory
error: could not run rev-list

if u run
git fetch --all
it worked normal in git bash or gitgui tools

In second version CreateProcess get 'C:\Git\libexec\git-core/git.exe' in
first version - C:/Git/libexec/git-core/git.exe and not executes (unix
slashes)

after fixing C:\Git\libexec\git-core\git.exe or
C:/Git/libexec/git-core\git.exe it works normal

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-01-04 14:04:57 +01:00
Eric Sunshine
a0ef7177c6 Make mingw_offset_1st_component() behave consistently for all paths.
mingw_offset_1st_component() returns "foo" for inputs "/foo" and
"c:/foo", but inconsistently returns "/foo" for UNC input
"/machine/share/foo".  Fix it to return "foo" for all cases.

Reference: http://groups.google.com/group/msysgit/browse_thread/thread/c0af578549b5dda0

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-01-04 14:04:53 +01:00
Cezary Zawadka
af1f9c7757 Allow using UNC path for git repository
[efl: moved MinGW-specific part to compat/]

[jes: fixed compilation on non-Windows]

Signed-off-by: Cezary Zawadka <czawadka@gmail.com>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-01-04 14:04:50 +01:00
Johannes Schindelin
42ea04cf22 Add a Windows-specific fallback to getenv("HOME");
This fixes msysGit issue 482 properly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-01-04 14:04:47 +01:00
Erik Faye-Lund
7cee2e1aa0 mingw: do not hide bare repositories
As reported in msysGit issue 450 the recent change to set the windows
hidden attribute on the .git directory is being applied to bare git
directories. This patch excludes bare repositories.

Tested-by: Pat Thoyts <patthoyts@users.sourceforge.net>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
2013-01-04 14:04:10 +01:00
Johannes Schindelin
5c2cba0313 When initializing .git/, record the current setting of core.hideDotFiles
This is on Windows only, of course.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-01-04 14:04:08 +01:00
Erik Faye-Lund
8c90767228 core.hidedotfiles: hide '.git' dir by default
At least for cross-platform projects, it makes sense to hide the
files starting with a dot, as this is the behavior on Unix/MacOSX.

However, at least Eclipse has problems interpreting the hidden flag
correctly, so the default is to hide only the .git/ directory.

The config setting core.hideDotFiles therefore supports not only
'true' and 'false', but also 'dotGitOnly'.

[jes: clarified the commit message, made git init respect the setting
by marking the .git/ directory only after reading the config, and added
documentation, and rebased on top of current junio/next]

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2013-01-04 14:04:06 +01:00
Junio C Hamano
7348159380 Merge branch 'ef/mingw-rmdir'
MinGW has a workaround when rmdir unnecessarily fails to retry with
a prompt, but the logic was kicking in when the rmdir failed with
ENOTEMPTY, i.e. was expected to fail and there is no point retrying.

* ef/mingw-rmdir:
  mingw_rmdir: do not prompt for retry when non-empty
2012-12-11 15:51:14 -08:00
Erik Faye-Lund
a83b2b578c mingw_rmdir: do not prompt for retry when non-empty
in ab1a11be ("mingw_rmdir: set errno=ENOTEMPTY when appropriate"),
a check was added to prevent us from retrying to delete a directory
that is both in use and non-empty.

However, this logic was slightly flawed; since we didn't return
immediately, we end up falling out of the retry-loop, but right into
the prompting-loop.

Fix this by setting errno, and guarding the prompting-loop with an
errno-check.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-10 08:23:53 -08:00
Erik Faye-Lund
f7a4cea25e mingw: get rid of getpass implementation
There's no remaining call-sites, and as pointed out in the
previous commit message, it's not quite ideal. So let's just
lose it.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-04 08:03:42 -08:00
Erik Faye-Lund
afb43561b8 mingw: reuse tty-version of git_terminal_prompt
The getpass-implementation we use on Windows isn't at all ideal;
it works in raw-mode (as opposed to cooked mode), and as a result
does not deal correcly with deletion, arrow-keys etc.

Instead, use cooked mode to read a line at the time, allowing the
C run-time to process the input properly.

Since we set files to be opened in binary-mode by default on
Windows, introduce a FORCE_TEXT macro that expands to the "t"
modifier that forces the terminal to be opened in text-mode so we
do not have to deal with CRLF issues.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-04 08:03:08 -08:00
Erik Faye-Lund
67fe735653 compat/terminal: separate input and output handles
On Windows, the terminal cannot be opened in read-write mode, so
we need distinct pairs for reading and writing. Since this works
fine on other platforms as well, always open them in pairs.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-04 08:02:55 -08:00
Erik Faye-Lund
9df92e6369 compat/terminal: factor out echo-disabling
By moving the echo-disabling code to a separate function, we can
implement OS-specific versions of it for non-POSIX platforms.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-04 08:01:59 -08:00
Erik Faye-Lund
176478a8bd mingw: make fgetc raise SIGINT if apropriate
Set a control-handler to prevent the process from terminating, and
simulate SIGINT so it can be handled by a signal-handler as usual.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-04 08:00:58 -08:00
Erik Faye-Lund
f4f549892a mingw: correct exit-code for SIGALRM's SIG_DFL
Make sure SIG_DFL for SIGALRM exits with 128 + SIGALRM so other
processes can diagnose why it exits.

While we're at it, make sure we only write to stderr if it's a
terminal, and  change the output to match that of Linux.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-04 08:00:29 -08:00
Junio C Hamano
5ab539bb00 Merge branch 'nd/maint-compat-fnmatch-fix'
* nd/maint-compat-fnmatch-fix:
  compat/fnmatch: fix off-by-one character class's length check
2012-11-25 18:44:41 -08:00
Nguyễn Thái Ngọc Duy
f10e3864dc compat/fnmatch: fix off-by-one character class's length check
Character class "xdigit" is the only one that hits 6 character limit
defined by CHAR_CLASS_MAX_LENGTH. All other character classes are 5
character long and therefore never caught by this.

This should make xdigit tests in t3070 pass on Windows.

Reported-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-20 12:13:09 -08:00
Mark Levedahl
9fca6cffc0 USE CGYWIN_V15_WIN32API as macro to select api for cygwin
The previous macro was confusing to some, and did not include "cygwin" in
its name. The updated name more clearly expresses a choice of the
win32api implementation that shipped with version 1.5 of cygwin.

Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-18 20:02:40 -08:00
Mark Levedahl
380a4d927b Update cygwin.c for new mingw-64 win32 api headers
The cygwin project recently switched to a new implementation of the
windows api, now using header files from the mingw-64 project. These
new header files are incompatible with the way cygwin.c included the
old headers: cygwin.c can be compiled using the new or the older (mingw)
headers, but different files must be included in different order for each
to work. The new headers are in use only for the current release series
(based upon the v1.7.x dll version). The previous release series using
the v1.5 dll is kept available but unmaintained for use on older versions
of Windows. So, patch cygwin.c to use the new include ordering only if
the dll version is 1.7 or higher.

Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
2012-11-12 15:47:50 -05:00
Jeff King
bbbd057389 Merge branch 'js/mingw-fflush-errno'
* js/mingw-fflush-errno:
  maybe_flush_or_die: move a too-loose Windows specific error
2012-10-25 06:43:01 -04:00
Johannes Sixt
84adb64154 maybe_flush_or_die: move a too-loose Windows specific error
check to compat

Commit b2f5e268 (Windows: Work around an oddity when a pipe with no reader
is written to) introduced a check for EINVAL after fflush() to fight
spurious "Invalid argument" errors on Windows when a pipe was broken. But
this check may hide real errors on systems that do not have the this odd
behavior. Introduce an fflush wrapper in compat/mingw.* so that the treatment
is only applied on Windows.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-17 00:33:42 -07:00
Joachim Schmitz
a6772946a5 make poll() work on platforms that can't recv() on a non-socket
This way it just got added to gnulib too the other day.

Signed-off-by: Joachim Schmitz <jojo@schmitz-digital.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-17 15:44:20 -07:00
Joachim Schmitz
32fde6575e poll() exits too early with EFAULT if 1st arg is NULL
If poll() is used as a milli-second sleep, like in help.c, by passing a NULL
in the 1st and a 0 in the 2nd arg, it exits with EFAULT.

As per Paolo Bonzini, the original author, this is a bug and to be fixed
Like in this commit, which is not to exit if the 2nd arg is 0. It got fixed
In gnulib in the same manner the other day.

Signed-off-by: Joachim Schmitz <jojo@schmitz-digital.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-17 15:43:44 -07:00
Joachim Schmitz
98c573a902 fix some win32 specific dependencies in poll.c
In order for non-win32 platforms to be able to use poll.c, #ifdef the
inclusion of two header files properly

Signed-off-by: Joachim Schmitz <jojo@schmitz-digital.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-17 15:43:20 -07:00
Joachim Schmitz
6d45eb1720 make poll available for other platforms lacking it
move poll.[ch] out of compat/win32/ into compat/poll/ and adjust
Makefile with the changed paths. Adding comments to Makefile about
how/when to enable it and add logic for this

Signed-off-by: Joachim Schmitz <jojo@schmitz-digital.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-17 15:42:57 -07:00
Junio C Hamano
8bc72fc7fe Merge branch 'rr/precompose-utf8-cleanup' into maint
* rr/precompose-utf8-cleanup:
  precompose-utf8: do not call checks for non-ascii "utf8"
  cleanup precompose_utf8
2012-09-11 11:07:14 -07:00
Junio C Hamano
a795b324b7 Merge branch 'js/compat-mkdir'
Some mkdir(2) implementations do not want to see trailing slash in
its parameter.

* js/compat-mkdir:
  compat: some mkdir() do not like a slash at the end
2012-09-03 15:54:37 -07:00
Junio C Hamano
183154bac8 Merge branch 'rr/precompose-utf8-cleanup'
* rr/precompose-utf8-cleanup:
  precompose-utf8: do not call checks for non-ascii "utf8"
  cleanup precompose_utf8
2012-08-29 14:50:35 -07:00
Junio C Hamano
3f988231ae Merge branch 'bw/maint-1.7.9-solaris-getpass' into maint-1.7.11
* bw/maint-1.7.9-solaris-getpass:
  Enable HAVE_DEV_TTY for Solaris
  terminal: seek when switching between reading and writing
2012-08-24 12:05:11 -07:00
Joachim Schmitz
0539ecfdfc compat: some mkdir() do not like a slash at the end
Introduce a compatibility helper for platforms with such a mkdir().

Signed-off-by: Joachim Schmitz <jojo@schmitz-digital.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-24 09:48:51 -07:00
Junio C Hamano
9a27f96705 precompose-utf8: do not call checks for non-ascii "utf8"
As suggested by Linus, this function is not checking UTF-8-ness of the
string; it only is seeing if it is pure US-ASCII.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-20 11:12:58 -07:00
Robin Rosenberg
5f8580ad47 cleanup precompose_utf8
- Remove extraneous parentheses and braces;

 - Remove redundant NUL-termination before strcpy();

 - Check result of unlink when probing for decomposed file names;

 - Adjust for the coding style by adding missing whitespaces;

 - Move storage class "static" at the beginning of the decl.

Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-17 10:29:56 -07:00
Junio C Hamano
e5acacfb48 Merge branch 'bw/maint-1.7.9-solaris-getpass'
The recent update to terminal I/O interface to get passwords &c
interactively didn't quite work on Solaris.

* bw/maint-1.7.9-solaris-getpass:
  Enable HAVE_DEV_TTY for Solaris
  terminal: seek when switching between reading and writing
2012-08-08 15:14:58 -07:00
Jeff King
67ba123fd1 terminal: seek when switching between reading and writing
When a stdio stream is opened in update mode (e.g., "w+"),
the C standard forbids switching between reading or writing
without an intervening positioning function. Many
implementations are lenient about this, but Solaris libc
will flush the recently-read contents to the output buffer.
In this instance, that meant writing the non-echoed password
that the user just typed to the terminal.

Fix it by inserting a no-op fseek between the read and
write.

The opposite direction (writing followed by reading) is also
disallowed, but our intervening fflush is an acceptable
positioning function for that alternative.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-06 22:11:47 -07:00
Junio C Hamano
b856ad623e Merge branch 'tb/sanitize-decomposed-utf-8-pathname'
Teaches git to normalize pathnames read from readdir(3) and all
arguments from the command line into precomposed UTF-8 (assuming
that they come as decomposed UTF-8) to work around issues on Mac OS.

I think there still are other places that need conversion
(e.g. paths that are read from stdin for some commands), but this
should be a good first step in the right direction.

* tb/sanitize-decomposed-utf-8-pathname:
  git on Mac OS and precomposed unicode
2012-07-13 15:37:51 -07:00
Torsten Bögershausen
76759c7dff git on Mac OS and precomposed unicode
Mac OS X mangles file names containing unicode on file systems HFS+,
VFAT or SAMBA.  When a file using unicode code points outside ASCII
is created on a HFS+ drive, the file name is converted into
decomposed unicode and written to disk. No conversion is done if
the file name is already decomposed unicode.

Calling open("\xc3\x84", ...) with a precomposed "Ä" yields the same
result as open("\x41\xcc\x88",...) with a decomposed "Ä".

As a consequence, readdir() returns the file names in decomposed
unicode, even if the user expects precomposed unicode.  Unlike on
HFS+, Mac OS X stores files on a VFAT drive (e.g. an USB drive) in
precomposed unicode, but readdir() still returns file names in
decomposed unicode.  When a git repository is stored on a network
share using SAMBA, file names are send over the wire and written to
disk on the remote system in precomposed unicode, but Mac OS X
readdir() returns decomposed unicode to be compatible with its
behaviour on HFS+ and VFAT.

The unicode decomposition causes many problems:

- The names "git add" and other commands get from the end user may
  often be precomposed form (the decomposed form is not easily input
  from the keyboard), but when the commands read from the filesystem
  to see what it is going to update the index with already is on the
  filesystem, readdir() will give decomposed form, which is different.

- Similarly "git log", "git mv" and all other commands that need to
  compare pathnames found on the command line (often but not always
  precomposed form; a command line input resulting from globbing may
  be in decomposed) with pathnames found in the tree objects (should
  be precomposed form to be compatible with other systems and for
  consistency in general).

- The same for names stored in the index, which should be
  precomposed, that may need to be compared with the names read from
  readdir().

NFS mounted from Linux is fully transparent and does not suffer from
the above.

As Mac OS X treats precomposed and decomposed file names as equal,
we can

 - wrap readdir() on Mac OS X to return the precomposed form, and

 - normalize decomposed form given from the command line also to the
   precomposed form,

to ensure that all pathnames used in Git are always in the
precomposed form.  This behaviour can be requested by setting
"core.precomposedunicode" configuration variable to true.

The code in compat/precomposed_utf8.c implements basically 4 new
functions: precomposed_utf8_opendir(), precomposed_utf8_readdir(),
precomposed_utf8_closedir() and precompose_argv().  The first three
are to wrap opendir(3), readdir(3), and closedir(3) functions.

The argv[] conversion allows to use the TAB filename completion done
by the shell on command line.  It tolerates other tools which use
readdir() to feed decomposed file names into git.

When creating a new git repository with "git init" or "git clone",
"core.precomposedunicode" will be set "false".

The user needs to activate this feature manually.  She typically
sets core.precomposedunicode to "true" on HFS and VFAT, or file
systems mounted via SAMBA.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-08 22:03:46 -07:00