Commit Graph

693 Commits

Author SHA1 Message Date
Jeff Hostetler
d381fc1d1f fscache: remember not-found directories
Teach FSCACHE to remember "not found" directories.

This is a performance optimization.

FSCACHE is a performance optimization available for Windows.  It
intercepts Posix-style lstat() calls into an in-memory directory
using FindFirst/FindNext.  It improves performance on Windows by
catching the first lstat() call in a directory, using FindFirst/
FindNext to read the list of files (and attribute data) for the
entire directory into the cache, and short-cut subsequent lstat()
calls in the same directory.  This gives a major performance
boost on Windows.

However, it does not remember "not found" directories.  When STATUS
runs and there are missing directories, the lstat() interception
fails to find the parent directory and simply return ENOENT for the
file -- it does not remember that the FindFirst on the directory
failed. Thus subsequent lstat() calls in the same directory, each
re-attempt the FindFirst.  This completely defeats any performance
gains.

This can be seen by doing a sparse-checkout on a large repo and
then doing a read-tree to reset the skip-worktree bits and then
running status.

This change reduced status times for my very large repo by 60%.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 10:16:27 +01:00
Jeff Hostetler
e03aa77afa fscache: add key for GIT_TRACE_FSCACHE
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 10:16:27 +01:00
Karsten Blees
c82eb49e93 Win32: fix 'lstat("dir/")' with long paths
Use a suffciently large buffer to strip the trailing slash.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-02-14 10:16:23 +01:00
Karsten Blees
fc001bda2b Win32: support long paths
Windows paths are typically limited to MAX_PATH = 260 characters, even
though the underlying NTFS file system supports paths up to 32,767 chars.
This limitation is also evident in Windows Explorer, cmd.exe and many
other applications (including IDEs).

Particularly annoying is that most Windows APIs return bogus error codes
if a relative path only barely exceeds MAX_PATH in conjunction with the
current directory, e.g. ERROR_PATH_NOT_FOUND / ENOENT instead of the
infinitely more helpful ERROR_FILENAME_EXCED_RANGE / ENAMETOOLONG.

Many Windows wide char APIs support longer than MAX_PATH paths through the
file namespace prefix ('\\?\' or '\\?\UNC\') followed by an absolute path.
Notable exceptions include functions dealing with executables and the
current directory (CreateProcess, LoadLibrary, Get/SetCurrentDirectory) as
well as the entire shell API (ShellExecute, SHGetSpecialFolderPath...).

Introduce a handle_long_path function to check the length of a specified
path properly (and fail with ENAMETOOLONG), and to optionally expand long
paths using the '\\?\' file namespace prefix. Short paths will not be
modified, so we don't need to worry about device names (NUL, CON, AUX).

Contrary to MSDN docs, the GetFullPathNameW function doesn't seem to be
limited to MAX_PATH (at least not on Win7), so we can use it to do the
heavy lifting of the conversion (translate '/' to '\', eliminate '.' and
'..', and make an absolute path).

Add long path error checking to xutftowcs_path for APIs with hard MAX_PATH
limit.

Add a new MAX_LONG_PATH constant and xutftowcs_long_path function for APIs
that support long paths.

While improved error checking is always active, long paths support must be
explicitly enabled via 'core.longpaths' option. This is to prevent end
users to shoot themselves in the foot by checking out files that Windows
Explorer, cmd/bash or their favorite IDE cannot handle.

Test suite:
Test the case is when the full pathname length of a dir is close
to 260 (MAX_PATH).
Bug report and an original reproducer by Andrey Rogozhnikov:
https://github.com/msysgit/git/pull/122#issuecomment-43604199

[jes: adjusted test number to avoid conflicts, added support for
chdir(), etc]

Thanks-to: Martin W. Kirst <maki@bitkings.de>
Thanks-to: Doug Kelly <dougk.ff7@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
Original-test-by: Andrey Rogozhnikov <rogozhnikov.andrey@gmail.com>
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 10:16:23 +01:00
Karsten Blees
27d2ecf2b0 fscache: load directories only once
If multiple threads access a directory that is not yet in the cache, the
directory will be loaded by each thread. Only one of the results is added
to the cache, all others are leaked. This wastes performance and memory.

On cache miss, add a future object to the cache to indicate that the
directory is currently being loaded. Subsequent threads register themselves
with the future object and wait. When the first thread has loaded the
directory, it replaces the future object with the result and notifies
waiting threads.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-02-14 10:16:23 +01:00
Karsten Blees
44e0964aa7 Win32: add a cache below mingw's lstat and dirent implementations
Checking the work tree status is quite slow on Windows, due to slow lstat
emulation (git calls lstat once for each file in the index). Windows
operating system APIs seem to be much better at scanning the status
of entire directories than checking single files.

Add an lstat implementation that uses a cache for lstat data. Cache misses
read the entire parent directory and add it to the cache. Subsequent lstat
calls for the same directory are served directly from the cache.

Also implement opendir / readdir / closedir so that they create and use
directory listings in the cache.

The cache doesn't track file system changes and doesn't plug into any
modifying file APIs, so it has to be explicitly enabled for git functions
that don't modify the working copy.

Note: in an earlier version of this patch, the cache was always active and
tracked file system changes via ReadDirectoryChangesW. However, this was
much more complex and had negative impact on the performance of modifying
git commands such as 'git checkout'.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-02-14 10:16:23 +01:00
Karsten Blees
f9b5d0e9c1 add infrastructure for read-only file system level caches
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.

This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.

Enable read-only sections for 'git status' and preload_index.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-02-14 10:16:22 +01:00
Karsten Blees
07e0b821d3 Win32: make the lstat implementation pluggable
Emulating the POSIX lstat API on Windows via GetFileAttributes[Ex] is quite
slow. Windows operating system APIs seem to be much better at scanning the
status of entire directories than checking single files. A caching
implementation may improve performance by bulk-reading entire directories
or reusing data obtained via opendir / readdir.

Make the lstat implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 10:16:22 +01:00
Karsten Blees
664677f7f5 Win32: Make the dirent implementation pluggable
Emulating the POSIX dirent API on Windows via FindFirstFile/FindNextFile is
pretty staightforward, however, most of the information provided in the
WIN32_FIND_DATA structure is thrown away in the process. A more
sophisticated implementation may cache this data, e.g. for later reuse in
calls to lstat.

Make the dirent implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.

Define a base DIR structure with pointers to readdir/closedir that match
the opendir implementation (i.e. similar to vtable pointers in OOP).
Define readdir/closedir so that they call the function pointers in the DIR
structure. This allows to choose the opendir implementation on a
call-by-call basis.

Move the fixed sized dirent.d_name buffer to the dirent-specific DIR
structure, as d_name may be implementation specific (e.g. a caching
implementation may just set d_name to point into the cache instead of
copying the entire file name string).

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-02-14 10:16:22 +01:00
Karsten Blees
f3fccb2f48 Win32: dirent.c: Move opendir down
Move opendir down in preparation for the next patch.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-02-14 10:16:22 +01:00
Karsten Blees
44b8736e1a Win32: make FILETIME conversion functions public
We will use them in the upcoming "FSCache" patches (to accelerate
sequential lstat() calls).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 10:16:22 +01:00
Johannes Schindelin
9c1322578b msvc: avoid debug assertion windows in Debug Mode
For regular debugging, it is pretty helpful when a debug assertion in a
running application triggers a window that offers to start the debugger.

However, when running the test suite, it is not so helpful, in
particular when the debug assertions are then suppressed anyway because
we disable the invalid parameter checking (via invalidcontinue.obj, see
the comment in config.mak.uname about that object for more information).

So let's simply disable that window in Debug Mode (it is already
disabled in Release Mode).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 10:16:20 +01:00
Jeff Hostetler
a0426bf03a msvc: support building Git using MS Visual C++
With this patch, Git can be built using the Microsoft toolchain, via:

	make MSVC=1 [DEBUG=1]

Third party libraries are built from source using the open source
"vcpkg" tool set. See https://github.com/Microsoft/vcpkg

On a first build, the vcpkg tools and the third party libraries are
automatically downloaded and built. DLLs for the third party libraries
are copied to the top-level (and t/helper) directory to facilitate
debugging. See compat/vcbuild/README.

A series of .bat files are invoked by the Makefile to find the location
of the installed version of Visual Studio and the associated compiler
tools (essentially replicating the environment setup performed by a
"Developer Command Prompt"). This should find the most recent VS2015 or
VS2017 installation. Output from these scripts are used by the Makefile
to define compiler and linker pathnames and -I and -L arguments.

The build produces .pdb files for both debug and release builds.

Note: This commit was squashed from an organic series of commits
developed between 2016 and 2018 in Git for Windows' `master` branch.
This combined commit eliminates the obsolete commits related to fetching
NuGet packages for third party libraries. It is difficult to use NuGet
packages for C/C++ sources because they may be built by earlier versions
of the MSVC compiler and have CRT version and linking issues.
Additionally, the C/C++ NuGet packages that were using tended to not be
updated concurrently with the sources.  And in the case of cURL and
OpenSSL, this could expose us to security issues.

Helped-by: Yue Lin Ho <b8732003@student.nsysu.edu.tw>
Helped-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 10:16:20 +01:00
Jeff Hostetler
dc57acc6ad msvc: do not pretend to support all signals
This special-cases various signals that are not supported on Windows,
such as SIGPIPE. These cause the UCRT to throw asserts (at least in
debug mode).

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2019-02-14 10:16:20 +01:00
Philip Oakley
7488f39024 msvc: add pragmas for common warnings
MSVC can be overzealous about some warnings. Disable them.

Signed-off-by: Philip Oakley <philipoakley@iee.org>
2019-02-14 10:16:20 +01:00
Jeff Hostetler
708d43753f msvc: fix detect_msys_tty()
The ntstatus.h header is only available in MINGW.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2019-02-14 10:16:20 +01:00
Jeff Hostetler
5848e73882 msvc: define ftello()
It is just called differently in MSVC's headers.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 10:16:19 +01:00
Jeff Hostetler
590244fde7 msvc: do not re-declare the timespec struct
VS2015's headers already declare that struct.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2019-02-14 10:16:19 +01:00
Jeff Hostetler
c3bd6ecc64 msvc: mark a variable as non-const
VS2015 complains when using a const pointer in memcpy()/free().

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2019-02-14 10:16:19 +01:00
Philip Oakley
ba8ca94d52 msvc: define O_ACCMODE
This constant is not defined in MSVC's headers.

In UCRT's fcntl.h, _O_RDONLY, _O_WRONLY and _O_RDWR are defined as 0, 1
and 2, respectively. Yes, that means that UCRT breaks with the tradition
that O_RDWR == O_RDONLY | O_WRONLY.

It is a perfectly legal way to define those constants, though, therefore
we need to take care of defining O_ACCMODE accordingly.

This is particularly important in order to keep our "open() can set
errno to EISDIR" emulation working: it tests that (flags & O_ACCMODE) is
not identical to O_RDONLY before going on to test specifically whether
the file for which open() reported EACCES is, in fact, a directory.

Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 10:16:19 +01:00
Philip Oakley
40c774e339 msvc: include sigset_t definition
On MSVC (VS2008) sigset_t is not defined.

Signed-off-by: Philip Oakley <philipoakley@iee.org>
2019-02-14 10:16:19 +01:00
Johannes Schindelin
e2f209c876 mingw: replace mingw_startup() hack
Git for Windows has special code to retrieve the command-line parameters
(and even the environment) in UTF-16 encoding, so that they can be
converted to UTF-8. This is necessary because Git for Windows wants to
use UTF-8 encoded strings throughout its code, and the main() function
does not get the parameters in that encoding.

To do that, we used the __wgetmainargs() function, which is not even a
Win32 API function, but provided by the MINGW "runtime" instead.

Obviously, this method would not work with any other compiler than GCC,
and in preparation for compiling with Visual C++, we would like to avoid
that.

Lucky us, there is a much more elegant way: we simply implement wmain()
and link with -municode. The command-line parameters are passed to
wmain() encoded in UTF-16, as desired, and this method also works with
Visual C++ after adjusting the MSVC linker flags to force it to use
wmain().

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 10:16:19 +01:00
Johannes Schindelin
291320cca7 obstack: fix compiler warning
MS Visual C suggests that the construct

	condition ? (int) i : (ptrdiff_t) d

is incorrect. Let's fix this by casting to ptrdiff_t also for the
positive arm of the conditional.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 10:16:19 +01:00
Johannes Schindelin
779d16f69e Merge pull request #1915 from dscho/open-in-gdb
Add a helper function to start GDB that was already attached to the current process
2019-02-14 09:40:43 +01:00
Johannes Schindelin
a128276889 Merge pull request #1900 from tanushree27/remove-ipv6-fallback
[Outreachy] Removed ipv6 fallback
2019-02-14 09:40:42 +01:00
Johannes Schindelin
65706ade9b Merge branch 'drive-prefix'
This topic branch allows us to specify absolute paths without the drive
prefix e.g. when cloning.

Example:

	C:\Users\me> git clone https://github.com/git/git \upstream-git

This will clone into a new directory C:\upstream-git, in line with how
Windows interprets absolute paths.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 09:40:41 +01:00
Johannes Schindelin
61571653dc Merge 'mingw-safer-compat-poll'
This was pull request #1003 from shoelzer/master

poll: Use GetTickCount64 to avoid wraparound issues
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 09:40:40 +01:00
Johannes Schindelin
a5073c3da8 mingw: add a helper function to attach GDB to the current process
When debugging Git, the criss-cross spawning of processes can make
things quite a bit difficult, especially when a Unix shell script is
thrown in the mix that calls a `git.exe` that then segfaults.

To help debugging such things, we introduce the `open_in_gdb()` function
which can be called at a code location where the segfault happens (or as
close as one can get); This will open a new MinTTY window with a GDB
that already attached to the current process.

Inspired by Derrick Stolee.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 09:40:38 +01:00
tanushree27
fa2c36bfcd mingw: remove obsolete IPv6-related code
To support IPv6, Git provided fall back functions for Windows versions that
did not support IPv6. However, as Git dropped support for Windows XP and
prior, those functions are not needed anymore.

Removed those fallbacks by reverting commit[1] and using the functions
directly (without 'ipv6_' prefix).

[1] fe3b2b7b82.

Signed-off-by: tanushree27 <tanushreetumane@gmail.com>
2019-02-14 09:40:38 +01:00
Johannes Schindelin
8245c8af63 mingw: allow absolute paths without drive prefix
When specifying an absolute path without a drive prefix, we convert that
path internally. Let's make sure that we handle that case properly, too
;-)

This fixes the command

	git clone https://github.com/git-for-windows/git \G4W

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 09:40:37 +01:00
Johannes Schindelin
fa5a2cec83 poll: lazy-load GetTickCount64()
This fixes the compilation, actually, as we still did not make the jump to
post-Windows XP completely: we still compile with _WIN32_WINNT set to
0x0502 (which corresponds to Windows Server 2003 and is technically
greater than Windows XP's 0x0501).

However, GetTickCount64() is only available starting with Windows
Vista/Windows Server 2008.

Let's just lazy-load the function, which should also help Git for Windows
contributors who want to reinstate Windows XP support.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-02-14 09:40:36 +01:00
Adam Roben
67b990e09f Make non-.exe externals work again
7ebac8cb94 made launching of .exe
externals work when installed in Unicode paths. But it broke launching
of non-.exe externals, no matter where they were installed. We now
correctly maintain the UTF-8 and UTF-16 paths in tandem in lookup_prog.

This fixes t5526, among others.

Signed-off-by: Adam Roben <adam@roben.org>
2019-02-14 09:39:15 +01:00
Adam Roben
3bfb65363d Fix launching of externals from Unicode paths
If Git were installed in a path containing non-ASCII characters,
commands such as git-am and git-submodule, which are implemented as
externals, would fail to launch with the following error:

> fatal: 'am' appears to be a git command, but we were not
> able to execute it. Maybe git-am is broken?

This was due to lookup_prog not being Unicode-aware. It was somehow
missed in 2ee5a1a14a.

Note that the only problem in this function was calling
GetFileAttributes instead of GetFileAttributesW. The calls to access()
were fine because access() is a macro which resolves to mingw_access,
which already handles Unicode correctly. But I changed lookup_prog to
use _waccess directly so that we only convert the path to UTF-16 once.

Signed-off-by: Adam Roben <adam@roben.org>
2019-02-14 09:39:15 +01:00
Junio C Hamano
8593e8a618 Merge branch 'js/mingw-host-cpu'
Windows update.

* js/mingw-host-cpu:
  mingw: use a more canonical method to fix the CPU reporting
2019-02-13 18:18:43 -08:00
Junio C Hamano
1db999ce8d Merge branch 'nd/fileno-may-be-macro'
* nd/fileno-may-be-macro:
  git-compat-util: work around fileno(fp) that is a macro
2019-02-13 18:18:41 -08:00
Johannes Schindelin
bb02e7a560 mingw: use a more canonical method to fix the CPU reporting
In `git version --build-options`, we report also the CPU, but in Git for
Windows we actually cross-compile the 32-bit version in a 64-bit Git for
Windows, so we cannot rely on the auto-detected value.

In 3815f64b0d (mingw: fix CPU reporting in `git version
--build-options`, 2019-02-07), we fixed this by a Windows-only
workaround, making use of magic pre-processor constants, which works in
GCC, but most likely not all C compilers.

As pointed out by Eric Sunshine, there is a better way, anyway: to set
the Makefile variable HOST_CPU explicitly for cross-compiled Git. So
let's do that!

This reverts commit 3815f64b0d partially.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-13 13:46:58 -08:00
Duy Nguyen
18a4f6be6b git-compat-util: work around fileno(fp) that is a macro
On various BSD's, fileno(fp) is implemented as a macro that directly
accesses the fields in the FILE * object, which breaks a function that
accepts a "void *fp" parameter and calls fileno(fp) and expect it to
work.

Work it around by adding a compile-time knob FILENO_IS_A_MACRO that
inserts a real helper function in the middle of the callchain.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-12 10:01:59 -08:00
Junio C Hamano
6951c5fd9f Merge branch 'js/mingw-host-cpu'
Windows update.

* js/mingw-host-cpu:
  mingw: fix CPU reporting in `git version --build-options`
2019-02-08 20:44:53 -08:00
Johannes Schindelin
3815f64b0d mingw: fix CPU reporting in git version --build-options
We cannot rely on `uname -m` in Git for Windows' SDK to tell us what
architecture we are compiling for, as we can compile both 32-bit and
64-bit `git.exe` from a 64-bit SDK, but the `uname -m` in that SDK will
always report `x86_64`.

So let's go back to our original design. And make it explicitly
Windows-specific.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-07 13:05:37 -08:00
Junio C Hamano
57cbc53d3e Merge branch 'js/vsts-ci'
Prepare to run test suite on Azure Pipeline.

* js/vsts-ci: (22 commits)
  test-date: drop unused parameter to getnanos()
  ci: parallelize testing on Windows
  ci: speed up Windows phase
  tests: optionally skip bin-wrappers/
  t0061: workaround issues with --with-dashes and RUNTIME_PREFIX
  tests: add t/helper/ to the PATH with --with-dashes
  mingw: try to work around issues with the test cleanup
  tests: include detailed trace logs with --write-junit-xml upon failure
  tests: avoid calling Perl just to determine file sizes
  README: add a build badge (status of the Azure Pipelines build)
  mingw: be more generous when wrapping up the setitimer() emulation
  ci: use git-sdk-64-minimal build artifact
  ci: add a Windows job to the Azure Pipelines definition
  Add a build definition for Azure DevOps
  ci/lib.sh: add support for Azure Pipelines
  tests: optionally write results as JUnit-style .xml
  test-date: add a subcommand to measure times in shell scripts
  ci: use a junction on Windows instead of a symlink
  ci: inherit --jobs via MAKEFLAGS in run-build-and-tests
  ci/lib.sh: encapsulate Travis-specific things
  ...
2019-02-06 22:05:26 -08:00
Junio C Hamano
0fa3cc77ee Merge branch 'tb/utf-16-le-with-explicit-bom'
A new encoding UTF-16LE-BOM has been invented to force encoding to
UTF-16 with BOM in little endian byte order, which cannot be directly
generated by using iconv.

* tb/utf-16-le-with-explicit-bom:
  Support working-tree-encoding "UTF-16LE-BOM"
2019-02-06 22:05:21 -08:00
Junio C Hamano
f5dd919064 Merge branch 'js/mingw-unc-path-w-backslashes'
In Git for Windows, "git clone \\server\share\path" etc. that uses
UNC paths from command line had bad interaction with its shell
emulation.

* js/mingw-unc-path-w-backslashes:
  mingw: special-case arguments to `sh`
  mingw (t5580): document bug when cloning from backslashed UNC paths
2019-02-05 14:26:13 -08:00
Junio C Hamano
33bea7358f Merge branch 'sg/obstack-cast-function-type-fix'
The compat/obstack code had casts that -Wcast-function-type
compilation option found questionable.

* sg/obstack-cast-function-type-fix:
  compat/obstack: fix -Wcast-function-type warnings
2019-02-05 14:26:11 -08:00
Torsten Bögershausen
aab2a1ae48 Support working-tree-encoding "UTF-16LE-BOM"
Users who want UTF-16 files in the working tree set the .gitattributes
like this:
test.txt working-tree-encoding=UTF-16

The unicode standard itself defines 3 allowed ways how to encode UTF-16.
The following 3 versions convert all back to 'g' 'i' 't' in UTF-8:

a) UTF-16, without BOM, big endian:
$ printf "\000g\000i\000t" | iconv -f UTF-16 -t UTF-8 | od -c
0000000    g   i   t

b) UTF-16, with BOM, little endian:
$ printf "\377\376g\000i\000t\000" | iconv -f UTF-16 -t UTF-8 | od -c
0000000    g   i   t

c) UTF-16, with BOM, big endian:
$ printf "\376\377\000g\000i\000t" | iconv -f UTF-16 -t UTF-8 | od -c
0000000    g   i   t

Git uses libiconv to convert from UTF-8 in the index into ITF-16 in the
working tree.
After a checkout, the resulting file has a BOM and is encoded in "UTF-16",
in the version (c) above.
This is what iconv generates, more details follow below.

iconv (and libiconv) can generate UTF-16, UTF-16LE or UTF-16BE:

d) UTF-16
$ printf 'git' | iconv -f UTF-8 -t UTF-16 | od -c
0000000  376 377  \0   g  \0   i  \0   t

e) UTF-16LE
$ printf 'git' | iconv -f UTF-8 -t UTF-16LE | od -c
0000000    g  \0   i  \0   t  \0

f)  UTF-16BE
$ printf 'git' | iconv -f UTF-8 -t UTF-16BE | od -c
0000000   \0   g  \0   i  \0   t

There is no way to generate version (b) from above in a Git working tree,
but that is what some applications need.
(All fully unicode aware applications should be able to read all 3 variants,
but in practise we are not there yet).

When producing UTF-16 as an output, iconv generates the big endian version
with a BOM. (big endian is probably chosen for historical reasons).

iconv can produce UTF-16 files with little endianess by using "UTF-16LE"
as encoding, and that file does not have a BOM.

Not all users (especially under Windows) are happy with this.
Some tools are not fully unicode aware and can only handle version (b).

Today there is no way to produce version (b) with iconv (or libiconv).
Looking into the history of iconv, it seems as if version (c) will
be used in all future iconv versions (for compatibility reasons).

Solve this dilemma and introduce a Git-specific "UTF-16LE-BOM".
libiconv can not handle the encoding, so Git pick it up, handles the BOM
and uses libiconv to convert the rest of the stream.
(UTF-16BE-BOM is added for consistency)

Rported-by: Adrián Gimeno Balaguer <adrigibal@gmail.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-31 10:27:52 -08:00
Johannes Schindelin
72d63b2f45 mingw: be more generous when wrapping up the setitimer() emulation
Every once in a while, the Azure Pipeline fails with some semi-random

	error: timer thread did not terminate timely

This error message means that the thread that is used to emulate the
setitimer() function did not terminate within 1,000 milliseconds.

The most likely explanation (and therefore the one we should assume to
be true, according to Occam's Razor) is that the timeout of one second
is simply not enough because we try to run so many tasks in parallel.

So let's give it ten seconds instead of only one. That should be enough.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-29 09:26:46 -08:00
Junio C Hamano
9f2eba2b90 Merge branch 'rb/hpe'
Portability updates for the HPE NonStop platform.

* rb/hpe:
  compat/regex/regcomp.c: define intptr_t and uintptr_t on NonStop
  git-compat-util.h: add FLOSS headers for HPE NonStop
  config.mak.uname: support for modern HPE NonStop config.
  transport-helper: drop read/write errno checks
  transport-helper: use xread instead of read
2019-01-18 13:49:54 -08:00
Johannes Schindelin
9e9da23c27 mingw: special-case arguments to sh
The MSYS2 runtime does its best to emulate the command-line wildcard
expansion and de-quoting which would be performed by the calling Unix
shell on Unix systems.

Those Unix shell quoting rules differ from the quoting rules applying to
Windows' cmd and Powershell, making it a little awkward to quote
command-line parameters properly when spawning other processes.

In particular, git.exe passes arguments to subprocesses that are *not*
intended to be interpreted as wildcards, and if they contain
backslashes, those are not to be interpreted as escape characters, e.g.
when passing Windows paths.

Note: this is only a problem when calling MSYS2 executables, not when
calling MINGW executables such as git.exe. However, we do call MSYS2
executables frequently, most notably when setting the use_shell flag in
the child_process structure.

There is no elegant way to determine whether the .exe file to be
executed is an MSYS2 program or a MINGW one. But since the use case of
passing a command line through the shell is so prevalent, we need to
work around this issue at least when executing sh.exe.

Let's introduce an ugly, hard-coded test whether argv[0] is "sh", and
whether it refers to the MSYS2 Bash, to determine whether we need to
quote the arguments differently than usual.

That still does not fix the issue completely, but at least it is
something.

Incidentally, this also fixes the problem where `git clone \\server\repo`
failed due to incorrect handling of the backslashes when handing the path
to the git-upload-pack process.

Further, we need to take care to quote not only whitespace and
backslashes, but also curly brackets. As aliases frequently go through
the MSYS2 Bash, and as aliases frequently get parameters such as
HEAD@{yesterday}, this is really important. As an early version of this
patch broke this, let's make sure that this does not regress by adding a
test case for that.

Helped-by: Kim Gybels <kgybels@infogroep.be>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 13:12:14 -08:00
SZEDER Gábor
764473d257 compat/obstack: fix -Wcast-function-type warnings
GCC 8 introduced the new -Wcast-function-type warning, which is
implied by -Wextra (which, in turn is enabled in our DEVELOPER flags).
When building Git with GCC 8 and this warning enabled on a non-glibc
platform [1], one is greeted with a screenful of compiler
warnings/errors:

  compat/obstack.c: In function '_obstack_begin':
  compat/obstack.c:162:17: error: cast between incompatible function types from 'void * (*)(long int)' to 'struct _obstack_chunk * (*)(void *, long int)' [-Werror=cast-function-type]
     h->chunkfun = (struct _obstack_chunk * (*)(void *, long)) chunkfun;
                   ^
  compat/obstack.c:163:16: error: cast between incompatible function types from 'void (*)(void *)' to 'void (*)(void *, struct _obstack_chunk *)' [-Werror=cast-function-type]
     h->freefun = (void (*) (void *, struct _obstack_chunk *)) freefun;
                  ^
  compat/obstack.c:116:8: error: cast between incompatible function types from 'struct _obstack_chunk * (*)(void *, long int)' to 'struct _obstack_chunk * (*)(long int)' [-Werror=cast-function-type]
      : (*(struct _obstack_chunk *(*) (long)) (h)->chunkfun) ((size)))
          ^
  compat/obstack.c:168:22: note: in expansion of macro 'CALL_CHUNKFUN'
     chunk = h->chunk = CALL_CHUNKFUN (h, h -> chunk_size);
                        ^~~~~~~~~~~~~
  <snip>

'struct obstack' stores pointers to two functions to allocate and free
"chunks", and depending on how obstack is used, these functions take
either one parameter (like standard malloc() and free() do; this is
how we use it in 'kwset.c') or two parameters.  Presumably to reduce
memory footprint, a single field is used to store the function pointer
for both signatures, and then it's casted to the appropriate signature
when the function pointer is accessed.  These casts between function
pointers with different number of parameters are what trigger those
compiler errors.

Modify 'struct obstack' to use unions to store function pointers with
different signatures, and then use the union member with the
appropriate signature when accessing these function pointers.  This
eliminates the need for those casts, and thus avoids this compiler
error.

[1] Compiling 'compat/obstack.c' on a platform with glibc is sort of
    a noop, see the comment before '#  define ELIDE_CODE', so this is
    not an issue on common Linux distros.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 11:13:38 -08:00
Junio C Hamano
25d90d1cb7 Merge branch 'tb/use-common-win32-pathfuncs-on-cygwin'
Cygwin update.

* tb/use-common-win32-pathfuncs-on-cygwin:
  git clone <url> C:\cygwin\home\USER\repo' is working (again)
2019-01-14 15:29:32 -08:00
Randall S. Becker
0bdaacf66f compat/regex/regcomp.c: define intptr_t and uintptr_t on NonStop
The system definition header files on HPE NonStop do not define
intptr_t and uintptr_t as do other platforms. These typedefs
are added specifically wrapped in a __TANDEM ifdef.

Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-03 14:16:20 -08:00