Commit Graph

82994 Commits

Author SHA1 Message Date
Karsten Blees
cb03313c10 Win32: support long paths
Windows paths are typically limited to MAX_PATH = 260 characters, even
though the underlying NTFS file system supports paths up to 32,767 chars.
This limitation is also evident in Windows Explorer, cmd.exe and many
other applications (including IDEs).

Particularly annoying is that most Windows APIs return bogus error codes
if a relative path only barely exceeds MAX_PATH in conjunction with the
current directory, e.g. ERROR_PATH_NOT_FOUND / ENOENT instead of the
infinitely more helpful ERROR_FILENAME_EXCED_RANGE / ENAMETOOLONG.

Many Windows wide char APIs support longer than MAX_PATH paths through the
file namespace prefix ('\\?\' or '\\?\UNC\') followed by an absolute path.
Notable exceptions include functions dealing with executables and the
current directory (CreateProcess, LoadLibrary, Get/SetCurrentDirectory) as
well as the entire shell API (ShellExecute, SHGetSpecialFolderPath...).

Introduce a handle_long_path function to check the length of a specified
path properly (and fail with ENAMETOOLONG), and to optionally expand long
paths using the '\\?\' file namespace prefix. Short paths will not be
modified, so we don't need to worry about device names (NUL, CON, AUX).

Contrary to MSDN docs, the GetFullPathNameW function doesn't seem to be
limited to MAX_PATH (at least not on Win7), so we can use it to do the
heavy lifting of the conversion (translate '/' to '\', eliminate '.' and
'..', and make an absolute path).

Add long path error checking to xutftowcs_path for APIs with hard MAX_PATH
limit.

Add a new MAX_LONG_PATH constant and xutftowcs_long_path function for APIs
that support long paths.

While improved error checking is always active, long paths support must be
explicitly enabled via 'core.longpaths' option. This is to prevent end
users to shoot themselves in the foot by checking out files that Windows
Explorer, cmd/bash or their favorite IDE cannot handle.

Test suite:
Test the case is when the full pathname length of a dir is close
to 260 (MAX_PATH).
Bug report and an original reproducer by Andrey Rogozhnikov:
https://github.com/msysgit/git/pull/122#issuecomment-43604199

Note that the test cannot rely on the presence of short names, as they
are not enabled by default except on the system drive.

[jes: adjusted test number to avoid conflicts, reinstated && chain,
adjusted test to work without short names]

Thanks-to: Martin W. Kirst <maki@bitkings.de>
Thanks-to: Doug Kelly <dougk.ff7@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
Original-test-by: Andrey Rogozhnikov <rogozhnikov.andrey@gmail.com>
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-24 15:00:30 +02:00
Johannes Schindelin
dbdd8969d3 Win32: support long paths
Windows paths are typically limited to MAX_PATH = 260 characters, even
though the underlying NTFS file system supports paths up to 32,767 chars.
This limitation is also evident in Windows Explorer, cmd.exe and many
other applications (including IDEs).

Particularly annoying is that most Windows APIs return bogus error codes
if a relative path only barely exceeds MAX_PATH in conjunction with the
current directory, e.g. ERROR_PATH_NOT_FOUND / ENOENT instead of the
infinitely more helpful ERROR_FILENAME_EXCED_RANGE / ENAMETOOLONG.

Many Windows wide char APIs support longer than MAX_PATH paths through the
file namespace prefix ('\\?\' or '\\?\UNC\') followed by an absolute path.
Notable exceptions include functions dealing with executables and the
current directory (CreateProcess, LoadLibrary, Get/SetCurrentDirectory) as
well as the entire shell API (ShellExecute, SHGetSpecialFolderPath...).

Introduce a handle_long_path function to check the length of a specified
path properly (and fail with ENAMETOOLONG), and to optionally expand long
paths using the '\\?\' file namespace prefix. Short paths will not be
modified, so we don't need to worry about device names (NUL, CON, AUX).

Contrary to MSDN docs, the GetFullPathNameW function doesn't seem to be
limited to MAX_PATH (at least not on Win7), so we can use it to do the
heavy lifting of the conversion (translate '/' to '\', eliminate '.' and
'..', and make an absolute path).

Add long path error checking to xutftowcs_path for APIs with hard MAX_PATH
limit.

Add a new MAX_LONG_PATH constant and xutftowcs_long_path function for APIs
that support long paths.

While improved error checking is always active, long paths support must be
explicitly enabled via 'core.longpaths' option. This is to prevent end
users to shoot themselves in the foot by checking out files that Windows
Explorer, cmd/bash or their favorite IDE cannot handle.

Test suite:
Test the case is when the full pathname length of a dir is close
to 260 (MAX_PATH).
Bug report and an original reproducer by Andrey Rogozhnikov:
https://github.com/msysgit/git/pull/122#issuecomment-43604199

[jes: adjusted test number to avoid conflicts]

Thanks-to: Martin W. Kirst <maki@bitkings.de>
Thanks-to: Doug Kelly <dougk.ff7@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
Original-test-by: Andrey Rogozhnikov <rogozhnikov.andrey@gmail.com>
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-24 15:00:30 +02:00
Doug Kelly
18329fa74d Add a test demonstrating a problem with long submodule paths
[jes: adusted test number to avoid conflicts, fixed non-portable use of
the 'export' statement]

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-24 15:00:30 +02:00
Karsten Blees
1040fdd452 fscache: load directories only once
If multiple threads access a directory that is not yet in the cache, the
directory will be loaded by each thread. Only one of the results is added
to the cache, all others are leaked. This wastes performance and memory.

On cache miss, add a future object to the cache to indicate that the
directory is currently being loaded. Subsequent threads register themselves
with the future object and wait. When the first thread has loaded the
directory, it replaces the future object with the result and notifies
waiting threads.

Signed-off-by: Karsten Blees <blees@dcon.de>
2018-05-24 13:37:17 +02:00
Karsten Blees
9399af3e19 Win32: add a cache below mingw's lstat and dirent implementations
Checking the work tree status is quite slow on Windows, due to slow lstat
emulation (git calls lstat once for each file in the index). Windows
operating system APIs seem to be much better at scanning the status
of entire directories than checking single files.

Add an lstat implementation that uses a cache for lstat data. Cache misses
read the entire parent directory and add it to the cache. Subsequent lstat
calls for the same directory are served directly from the cache.

Also implement opendir / readdir / closedir so that they create and use
directory listings in the cache.

The cache doesn't track file system changes and doesn't plug into any
modifying file APIs, so it has to be explicitly enabled for git functions
that don't modify the working copy.

Note: in an earlier version of this patch, the cache was always active and
tracked file system changes via ReadDirectoryChangesW. However, this was
much more complex and had negative impact on the performance of modifying
git commands such as 'git checkout'.

Signed-off-by: Karsten Blees <blees@dcon.de>
2018-05-24 13:37:17 +02:00
Karsten Blees
0ca2da05d3 add infrastructure for read-only file system level caches
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.

This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.

Enable read-only sections for 'git status' and preload_index.

Signed-off-by: Karsten Blees <blees@dcon.de>
2018-05-24 13:37:17 +02:00
Karsten Blees
9c48bb1ed3 Win32: make the lstat implementation pluggable
Emulating the POSIX lstat API on Windows via GetFileAttributes[Ex] is quite
slow. Windows operating system APIs seem to be much better at scanning the
status of entire directories than checking single files. A caching
implementation may improve performance by bulk-reading entire directories
or reusing data obtained via opendir / readdir.

Make the lstat implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.

Signed-off-by: Karsten Blees <blees@dcon.de>
2018-05-24 13:37:17 +02:00
Karsten Blees
54f5b93df9 Win32: Make the dirent implementation pluggable
Emulating the POSIX dirent API on Windows via FindFirstFile/FindNextFile is
pretty staightforward, however, most of the information provided in the
WIN32_FIND_DATA structure is thrown away in the process. A more
sophisticated implementation may cache this data, e.g. for later reuse in
calls to lstat.

Make the dirent implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.

Define a base DIR structure with pointers to readdir/closedir that match
the opendir implementation (i.e. similar to vtable pointers in OOP).
Define readdir/closedir so that they call the function pointers in the DIR
structure. This allows to choose the opendir implementation on a
call-by-call basis.

Move the fixed sized dirent.d_name buffer to the dirent-specific DIR
structure, as d_name may be implementation specific (e.g. a caching
implementation may just set d_name to point into the cache instead of
copying the entire file name string).

Signed-off-by: Karsten Blees <blees@dcon.de>
2018-05-24 13:37:17 +02:00
Karsten Blees
e77049fed9 Win32: dirent.c: Move opendir down
Move opendir down in preparation for the next patch.

Signed-off-by: Karsten Blees <blees@dcon.de>
2018-05-24 13:37:17 +02:00
Karsten Blees
aefabecfd1 Win32: make FILETIME conversion functions public
Signed-off-by: Karsten Blees <blees@dcon.de>
2018-05-24 13:37:17 +02:00
Johannes Schindelin
674fb4862e mingw: unset PERL5LIB by default
Git for Windows ships with its own Perl interpreter, and insists on
using it, so it will most likely wreak havoc if PERL5LIB is set before
launching Git.

Let's just unset that environment variables when spawning processes.

To make this feature extensible (and overrideable), there is a new
config setting `core.unsetenvvars` that allows specifying a
comma-separated list of names to unset before spawning processes.

Reported by Gabriel Fuhrmann.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-24 13:37:17 +02:00
Johannes Schindelin
dfbcb1e5a7 Move Windows-specific config settings into compat/mingw.c
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-24 13:37:17 +02:00
Johannes Schindelin
a96907096d Allow for platform-specific core.* config settings
In the Git for Windows project, we have ample precendent for config
settings that apply to Windows, and to Windows only.

Let's formalize this concept by introducing a platform_core_config()
function that can be #define'd in a platform-specific manner.

This will allow us to contain platform-specific code better, as the
corresponding variables no longer need to be exported so that they can
be defined in environment.c and be set in config.c

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-24 13:37:17 +02:00
Johannes Schindelin
0f51f6afe3 config: rename dummy parameter to cb in git_default_config()
This is the convention elsewhere (and prepares for the case where we may
need to pass callback data).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-24 13:37:17 +02:00
Johannes Schindelin
8a8c9484c3 Start the merging-rebase to v2.17.1
This commit starts the rebase of 74d8ecfd37 to 6b125952395
2018-05-24 13:37:12 +02:00
Johannes Schindelin
4a158e75a4 fixup! Win32: symlink: add support for symlinks to directories
Fix typo.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-18 13:28:25 +02:00
Johannes Schindelin
915c2bc0ff fixup! mingw: be *very* wary about outside environment changes
Fix typo.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-18 13:28:09 +02:00
Johannes Schindelin
ae246e6327 fixup! Win32: symlink: add support for symlinks to directories
When opening a symbolic link's target, we must take into account that
the symbolic link itself might live in a directory other than the
current one, and that the target may be relative.

Reported by Ricky Roesler.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-18 12:41:10 +02:00
Junio C Hamano
a9693e7806 Git 2.17.1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 13:57:51 +09:00
Junio C Hamano
5a8c71da80 Merge branch 'jk/submodule-name-verify-fsck' into maint
* jk/submodule-name-verify-fsck:
  fsck: complain when .gitignore and .gitattributes are symlinks
  fsck: complain when .gitmodules is a symlink
  index-pack: check .gitmodules files with --strict
  unpack-objects: call fsck_finish() after fscking objects
  fsck: call fsck_finish() after fscking objects
  fsck: check .gitmodules content
  fsck: handle promisor objects in .gitmodules check
  fsck: detect gitmodules files
  fsck: actually fsck blob data
  fsck: simplify ".git" check
  index-pack: make fsck error message more specific
2018-05-18 13:47:28 +09:00
Junio C Hamano
47aa802379 Sync with Git 2.16.4
* maint-2.16:
  Git 2.16.4
  Git 2.15.2
  Git 2.14.4
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules, etc
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-18 13:46:53 +09:00
Junio C Hamano
558c52bf48 Git 2.16.4
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 13:38:19 +09:00
Junio C Hamano
95adc822e6 Sync with Git 2.15.2
* maint-2.15:
  Git 2.15.2
  Git 2.14.4
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules, etc
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-18 13:36:44 +09:00
Junio C Hamano
aabcf7eeb8 Git 2.15.2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 13:29:07 +09:00
Junio C Hamano
4375a6b2d9 Sync with Git 2.14.4
* maint-2.14:
  Git 2.14.4
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules, etc
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-18 13:26:51 +09:00
Junio C Hamano
37ad15fded Git 2.14.4 2018-05-18 13:09:22 +09:00
Junio C Hamano
13f57c5daa Sync with Git 2.13.7
* maint-2.13:
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules, etc
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-18 12:52:09 +09:00
Junio C Hamano
fd5a7c532f Git 2.13.7
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 12:49:35 +09:00
Junio C Hamano
0d084b175e Merge branch 'jk/submodule-name-verify-fix' into maint-2.13
* jk/submodule-name-verify-fix:
  verify_path: disallow symlinks in .gitmodules, etc
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-18 12:35:02 +09:00
Jeff King
2baa638590 fsck: complain when .gitignore and .gitattributes are symlinks
This case is already forbidden by verify_path(), so let's
check it in fsck. It's easier to handle than .gitmodules,
because we don't care about checking the blob content. This
is really just about whether the name and mode for the tree
entry are valid.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-17 18:11:33 -07:00
Jeff King
425db067d2 fsck: complain when .gitmodules is a symlink
We've recently forbidden .gitmodules to be a symlink in
verify_path(). And it's an easy way to circumvent our fsck
checks for .gitmodules content. So let's complain when we
see it.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-17 18:09:36 -07:00
Jeff King
2a22dac050 index-pack: check .gitmodules files with --strict
Now that the internal fsck code has all of the plumbing we
need, we can start checking incoming .gitmodules files.
Naively, it seems like we would just need to add a call to
fsck_finish() after we've processed all of the objects. And
that would be enough to cover the initial test included
here. But there are two extra bits:

  1. We currently don't bother calling fsck_object() at all
     for blobs, since it has traditionally been a noop. We'd
     actually catch these blobs in fsck_finish() at the end,
     but it's more efficient to check them when we already
     have the object loaded in memory.

  2. The second pass done by fsck_finish() needs to access
     the objects, but we're actually indexing the pack in
     this process. In theory we could give the fsck code a
     special callback for accessing the in-pack data, but
     it's actually quite tricky:

       a. We don't have an internal efficient index mapping
	  oids to packfile offsets. We only generate it on
	  the fly as part of writing out the .idx file.

       b. We'd still have to reconstruct deltas, which means
          we'd basically have to replicate all of the
	  reading logic in packfile.c.

     Instead, let's avoid running fsck_finish() until after
     we've written out the .idx file, and then just add it
     to our internal packed_git list.

     This does mean that the objects are "in the repository"
     before we finish our fsck checks. But unpack-objects
     already exhibits this same behavior, and it's an
     acceptable tradeoff here for the same reason: the
     quarantine mechanism means that pushes will be
     fully protected.

In addition to a basic push test in t7415, we add a sneaky
pack that reverses the usual object order in the pack,
requiring that index-pack access the tree and blob during
the "finish" step.

This already works for unpack-objects (since it will have
written out loose objects), but we'll check it with this
sneaky pack for good measure.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-17 18:04:56 -07:00
Jeff King
7fbb4d553f unpack-objects: call fsck_finish() after fscking objects
As with the previous commit, we must call fsck's "finish"
function in order to catch any queued objects for
.gitmodules checks.

This second pass will be able to access any incoming
objects, because we will have exploded them to loose objects
by now.

This isn't quite ideal, because it means that bad objects
may have been written to the object database (and a
subsequent operation could then reference them, even if the
other side doesn't send the objects again). However, this is
sufficient when used with receive.fsckObjects, since those
loose objects will all be placed in a temporary quarantine
area that will get wiped if we find any problems.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-17 18:04:56 -07:00
Jeff King
6db6dbfb66 fsck: call fsck_finish() after fscking objects
Now that the internal fsck code is capable of checking
.gitmodules files, we just need to teach its callers to use
the "finish" function to check any queued objects.

With this, we can now catch the malicious case in t7415 with
git-fsck.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-17 18:04:56 -07:00
Jeff King
d93d55d482 fsck: check .gitmodules content
This patch detects and blocks submodule names which do not
match the policy set forth in submodule-config. These should
already be caught by the submodule code itself, but putting
the check here means that newer versions of Git can protect
older ones from malicious entries (e.g., a server with
receive.fsckObjects will block the objects, protecting
clients which fetch from it).

As a side effect, this means fsck will also complain about
.gitmodules files that cannot be parsed (or were larger than
core.bigFileThreshold).

Signed-off-by: Jeff King <peff@peff.net>
2018-05-17 18:04:50 -07:00
Jeff King
38433731ad fsck: handle promisor objects in .gitmodules check
If we have a tree that points to a .gitmodules blob but
don't have that blob, we can't check its contents. This
produces an fsck error when we encounter it.

But in the case of a promisor object, this absence is
expected, and we must not complain.  Note that this can
technically circumvent our transfer.fsckObjects check.
Imagine a client fetches a tree, but not the matching
.gitmodules blob. An fsck of the incoming objects will show
that we don't have enough information. Later, we do fetch
the actual blob. But we have no idea that it's a .gitmodules
file.

The only ways to get around this would be to re-scan all of
the existing trees whenever new ones enter (which is
expensive), or to somehow persist the gitmodules_found set
between fsck runs (which is complicated).

In practice, it's probably OK to ignore the problem. Any
repository which has all of the objects (including the one
serving the promisor packs) can perform the checks. Since
promisor packs are inherently about a hierarchical topology
in which clients rely on upstream repositories, those
upstream repositories can protect all of their downstream
clients from broken objects.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-17 18:03:21 -07:00
Jeff King
4a9e4c2e71 fsck: detect gitmodules files
In preparation for performing fsck checks on .gitmodules
files, this commit plumbs in the actual detection of the
files. Note that unlike most other fsck checks, this cannot
be a property of a single object: we must know that the
object is found at a ".gitmodules" path at the root tree of
a commit.

Since the fsck code only sees one object at a time, we have
to mark the related objects to fit the puzzle together. When
we see a commit we mark its tree as a root tree, and when
we see a root tree with a .gitmodules file, we mark the
corresponding blob to be checked.

In an ideal world, we'd check the objects in topological
order: commits followed by trees followed by blobs. In that
case we can avoid ever loading an object twice, since all
markings would be complete by the time we get to the marked
objects. And indeed, if we are checking a single packfile,
this is the order in which Git will generally write the
objects. But we can't count on that:

  1. git-fsck may show us the objects in arbitrary order
     (loose objects are fed in sha1 order, but we may also
     have multiple packs, and we process each pack fully in
     sequence).

  2. The type ordering is just what git-pack-objects happens
     to write now. The pack format does not require a
     specific order, and it's possible that future versions
     of Git (or a custom version trying to fool official
     Git's fsck checks!) may order it differently.

  3. We may not even be fscking all of the relevant objects
     at once. Consider pushing with transfer.fsckObjects,
     where one push adds a blob at path "foo", and then a
     second push adds the same blob at path ".gitmodules".
     The blob is not part of the second push at all, but we
     need to mark and check it.

So in the general case, we need to make up to three passes
over the objects: once to make sure we've seen all commits,
then once to cover any trees we might have missed, and then
a final pass to cover any .gitmodules blobs we found in the
second pass.

We can simplify things a bit by loosening the requirement
that we find .gitmodules only at root trees. Technically
a file like "subdir/.gitmodules" is not parsed by Git, but
it's not unreasonable for us to declare that Git is aware of
all ".gitmodules" files and make them eligible for checking.
That lets us drop the root-tree requirement, which
eliminates one pass entirely. And it makes our worst case
much better: instead of potentially queueing every root tree
to be re-examined, the worst case is that we queue each
unique .gitmodules blob for a second look.

This patch just adds the boilerplate to find .gitmodules
files. The actual content checks will come in a subsequent
commit.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-17 18:03:19 -07:00
Jeff King
7075cedbcc fsck: actually fsck blob data
Because fscking a blob has always been a noop, we didn't
bother passing around the blob data. In preparation for
content-level checks, let's fix up a few things:

  1. The fsck_object() function just returns success for any
     blob. Let's a noop fsck_blob(), which we can fill in
     with actual logic later.

  2. The fsck_loose() function in builtin/fsck.c
     just threw away blob content after loading it. Let's
     hold onto it until after we've called fsck_object().

     The easiest way to do this is to just drop the
     parse_loose_object() helper entirely. Incidentally,
     this also fixes a memory leak: if we successfully
     loaded the object data but did not parse it, we would
     have left the function without freeing it.

  3. When fsck_loose() loads the object data, it
     does so with a custom read_loose_object() helper. This
     function streams any blobs, regardless of size, under
     the assumption that we're only checking the sha1.

     Instead, let's actually load blobs smaller than
     big_file_threshold, as the normal object-reading
     code-paths would do. This lets us fsck small files, and
     a NULL return is an indication that the blob was so big
     that it needed to be streamed, and we can pass that
     information along to fsck_blob().

Signed-off-by: Jeff King <peff@peff.net>
2018-05-17 17:59:40 -07:00
Johannes Schindelin
b911995991 Merge branch 'colorize-push-errors'
To help users discern large chunks of white text (when the push
succeeds) from large chunks of white text (when the push fails), let's
add some color to the latter.

This closes https://github.com/git-for-windows/git/pull/1429 and fixes
https://github.com/git-for-windows/git/issues/1422

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-17 23:31:58 +02:00
Johannes Schindelin
e6c415e4e1 Document the new color.* settings to colorize push errors/hints
Let's make it easier for users to find out how to customize these colors.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-17 23:22:37 +02:00
Johannes Schindelin
0a99d05884 Add a test to verify that push errors are colorful
This actually only tests whether the push errors/hints are colored if
the respective color.* config settings are `always`, but in the regular
case they default to `auto` (in which case we color the messages when
stderr is connected to an interactive terminal), therefore these tests
should suffice.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-17 23:22:37 +02:00
Ryan Dammrose
6379dc63ad push: colorize errors
This is an attempt to resolve an issue I experience with people that are
new to Git -- especially colleagues in a team setting -- where they miss
that their push to a remote location failed because the failure and
success both return a block of white text.

An example is if I push something to a remote repository and then a
colleague attempts to push to the same remote repository and the push
fails because it requires them to pull first, but they don't notice
because a success and failure both return a block of white text. They
then continue about their business, thinking it has been successfully
pushed.

This patch colorizes the errors and hints (in red and yellow,
respectively) so whenever there is a failure when pushing to a remote
repository that fails, it is more noticeable.

[jes: fixed a couple bugs, added the color.{advice,push,transport}
settings, refactored to use want_color_stderr().]

Signed-off-by: Ryan Dammrose ryandammrose@gmail.com
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-17 23:21:58 +02:00
Johannes Schindelin
3823662341 color: introduce support for colorizing stderr
So far, we only ever asked whether stdout wants to be colorful. In the
upcoming patches, we will want to make push errors more prominent, which
are printed to stderr, though.

So let's refactor the want_color() function into a want_color_fd()
function (which expects to be called with fd == 1 or fd == 2 for stdout
and stderr, respectively), and then define the macro `want_color()` to
use the want_color_fd() function.

And then also add a macro `want_color_stderr()`, for convenience and
for documentation.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-17 23:11:26 +02:00
Johannes Schindelin
6c1fc6a76b Merge branch 'dj/runtime-prefix'
Two more commits made it into the dj/runtime-prefix branch before being
merged into core Git's `master`. Let's take those two, too.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-05-17 23:04:49 +02:00
Jonathan Nieder
8ead2533e8 Makefile: quote $INSTLIBDIR when passing it to sed
f6a0ad4b (Makefile: generate Perl header from template file,
2018-04-10) moved code for generating the 'use lib' lines at the top
of perl scripts from the $(SCRIPT_PERL_GEN) rule to a separate
GIT-PERL-HEADER rule.

This rule first populates INSTLIBDIR and then substitutes it into the
GIT-PERL-HEADER using sed:

	INSTLIBDIR=... something ...
	sed -e 's=@@INSTLIBDIR@@='$$INSTLIBDIR'=g' $< > $@

Because $INSTLIBDIR is not surrounded by double quotes, the shell
splits it at each space, causing errors if INSTLIBDIR contains an $IFS
character:

 sed: 1: "s=@@INSTLIBDIR@@=/usr/l ...": unescaped newline inside substitute pattern

Add back the missing double-quotes to make it work again.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-17 23:00:45 +02:00
Jonathan Nieder
27b491972e Makefile: remove unused @@PERLLIBDIR@@ substitution variable
Junio noticed that this variable is not quoted correctly when it is
passed to sed.  As a shell-quoted string, it should be inside
single-quotes like $(perllibdir_relative_SQ), not outside them like
$INSTLIBDIR.

In fact, this substitution variable is not used.  Simplify by removing
it.

Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-17 23:00:45 +02:00
Jeff King
60d36c3717 fsck: simplify ".git" check
There's no need for us to manually check for ".git"; it's a
subset of the other filesystem-specific tests. Dropping it
makes our code slightly shorter. More importantly, the
existing code may make a reader wonder why ".GIT" is not
covered here, and whether that is a bug (it isn't, as it's
also covered in the filesystem-specific tests).

Signed-off-by: Jeff King <peff@peff.net>
2018-05-17 12:17:32 -07:00
Jeff King
8ae9fe61a3 index-pack: make fsck error message more specific
If fsck reports an error, we say only "Error in object".
This isn't quite as bad as it might seem, since the fsck
code would have dumped some errors to stderr already. But it
might help to give a little more context. The earlier output
would not have even mentioned "fsck", and that may be a clue
that the "fsck.*" or "*.fsckObjects" config may be relevant.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-17 12:17:32 -07:00
Jeff King
97f0d2bce3 Merge branch 'jk/submodule-name-verify-fix' into jk/submodule-name-verify-fsck
* jk/submodule-name-verify-fix:
  verify_path: disallow symlinks in .gitmodules, etc
  update-index: stat updated files earlier
  verify_path: drop clever fallthrough
  skip_prefix: add icase-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  path: match NTFS short names for more .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths

Note that this includes two bits of evil-merge:

 - there's a new call to verify_path() that doesn't actually
   have a mode available. It should be OK to pass "0" here,
   since we're just manipulating the untracked cache, not an
   actual index entry.

 - the lstat() in builtin/update-index.c:update_one() needs
   to be updated to handle the fsmonitor case (without this
   it still behaves correctly, but does an unnecessary
   lstat).
2018-05-17 12:15:47 -07:00
Jeff King
8423fd8281 verify_path: disallow symlinks in .gitmodules, etc
There are a few reasons it's not a good idea to make
.git* files symlinks, including:

  1. It won't be portable to systems without symlinks.

  2. It may behave inconsistently, since Git internally may
     look at these files from the index or a tree without
     bothering to resolve any symbolic links. So it may work
     in some settings (where we read from the filesystem)
     but not in others).

With some clever code, we could make (2) work. And some
people may not care about (1) if they only work on one
platform. But there are a few security reasons to simply
disallow symlinked meta-files:

  a. A symlinked .gitmodules file may circumvent any fsck
     checks of the content.

  b. Git may read and write from the on-disk file without
     sanity checking the symlink target. So for example, if
     you link ".gitmodules" to "../oops" and run "git
     submodule add", we'll write to the file "oops" outside
     the repository.

Again, both of those are problems that _could_ be solved
with sufficient code, but given the current inconsistent
behavior and unportability, we're better off just outlawing
it explicitly.

We'll give the same treatment to .gitmodules, .gitignore,
and .gitattributes. The latter two cannot be used to write
outside the repository (we write them only as part of a
checkout, where we are careful not to follow any symlinks).
But they can still cause a "git clone && git log"
combination to read arbitrary files outside the filesystem.
There's _probably_ nothing too harmful you can do with that,
but it seems questionable (and anyway, they suffer from the
same portability and consistency problems).

Note the slightly tricky call to verify_path() in
update-index's update_one(). There we may not have a mode if
we're not updating from the filesystem (e.g., we might just
be removing the file). Passing "0" as the mode there works
fine; since it's not a symlink, we'll just skip the extra
checks.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-17 11:22:05 -07:00