Commit Graph

496 Commits

Author SHA1 Message Date
Johannes Schindelin
f879ee60f6 Merge pull request #1468 from atetubou/fscache_checkout_flush
checkout.c: enable fscache for checkout again
2018-09-05 09:25:54 -04:00
Takuto Ikuta
931dedf8aa checkout.c: enable fscache for checkout again
This is retry of #1419.

I added flush_fscache macro to flush cached stats after disk writing
with tests for regression reported in #1438 and #1442.

git checkout checks each file path in sorted order, so cache flushing does not
make performance worse unless we have large number of modified files in
a directory containing many files.

Using chromium repository, I tested `git checkout .` performance when I
delete 10 files in different directories.
With this patch:
TotalSeconds: 4.307272
TotalSeconds: 4.4863595
TotalSeconds: 4.2975562
Avg: 4.36372923333333

Without this patch:
TotalSeconds: 20.9705431
TotalSeconds: 22.4867685
TotalSeconds: 18.8968292
Avg: 20.7847136

I confirmed this patch passed all tests in t/ with core_fscache=1.

Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
2018-09-05 09:25:53 -04:00
Johannes Schindelin
0a39dcbdf9 mingw: bump the minimum Windows version to Vista
Quite some time ago, a last plea to the XP users out there who want to
see Windows XP support in Git for Windows, asking them to get engaged
and help, vanished into the depths of the universe.

It is time to codify the ascent by the "silent majority" of XP users,
and mark the minimum Windows version required for Git for Windows as
Windows Vista.

This, incidentally, lets us use quite a few nice new APIs.

This also means that we no longer need the inet_pton() and inet_ntop()
emulation, and we no longer need to do the PROC_ADDR dance with the
`CreateSymbolicLinkW()` function, either.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-09-05 09:25:50 -04:00
Johannes Schindelin
9007ed8c05 mingw: set _WIN32_WINNT explicitly for Git for Windows
Previously, we only ever declared a target Windows version if compiling
with Visual C.

Which meant that we were relying on the MinGW headers to guess which
Windows version we want to target...

Let's be explicit about it, in particular because we actually want to
bump the target Windows version to Vista (which we will do in the next
commit).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-09-05 09:25:50 -04:00
Jeff Hostetler
33c31ed9c1 fscache: make fscache_enabled() public
Make fscache_enabled() function public rather than static.
Remove unneeded fscache_is_enabled() function.
Change is_fscache_enabled() macro to call fscache_enabled().

is_fscache_enabled() now takes a pathname so that the answer
is more precise and mean "is fscache enabled for this pathname",
since fscache only stores repo-relative paths and not absolute
paths, we can avoid attempting lookups for absolute paths.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2018-09-05 09:25:44 -04:00
Jeff Hostetler
c676316c27 dir.c: make add_excludes aware of fscache during status
Teach read_directory_recursive() and add_excludes() to
be aware of optional fscache and avoid trying to open()
and fstat() non-existant ".gitignore" files in every
directory in the worktree.

The current code in add_excludes() calls open() and then
fstat() for a ".gitignore" file in each directory present
in the worktree.  Change that when fscache is enabled to
call lstat() first and if present, call open().

This seems backwards because both lstat needs to do more
work than fstat.  But when fscache is enabled, fscache will
already know if the .gitignore file exists and can completely
avoid the IO calls.  This works because of the lstat diversion
to mingw_lstat when fscache is enabled.

This reduced status times on a 350K file enlistment of the
Windows repo on a NVMe SSD by 0.25 seconds.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2018-09-05 09:25:40 -04:00
Johannes Schindelin
fb2aa5e522 Merge pull request #773 from jeffhostetler/vs2015
Build with VS2015
2018-09-05 09:25:08 -04:00
Johannes Schindelin
c55ed19342 msvc: work around iconv() not setting errno
When compiling with MSVC, we rely on NuPkgs to provide the binaries of
dependencies such as libiconv. The libiconv 1.14.0.11 package available
from https://www.nuget.org/packages/libiconv seems to have a bug where
it does not set errno (when we would expect it to be E2BIG).

Let's simulate the error condition by taking less than 16 bytes
remaining in the out buffer as an indicator that we ran out of space.
While 16 might seem a bit excessive (when converting from, say, any
encoding to UTF-8, 8 bytes should be fine), it is designed to be a safe
margin.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-09-05 09:24:44 -04:00
Jeff Hostetler
7d1f29e520 msvc: update Makefile and compiler settings for VS2015
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2018-09-05 09:24:42 -04:00
Johannes Schindelin
9eb2234001 Windows: add support for a Windows-wide configuration
Between the libgit2 and the Git for Windows project, there has been a
discussion how we could share Git configuration to avoid duplication (or
worse: skew).

Earlier, libgit2 was nice enough to just re-use Git for Windows'

	C:\Program Files (x86)\Git\etc\gitconfig

but with the upcoming Git for Windows 2.x, there would be more paths to
search, as we will have 64-bit and 32-bit versions, and the
corresponding config files will be in %PROGRAMFILES%\Git\mingw64\etc and
...\mingw32\etc, respectively.

Worse: there are portable Git for Windows versions out there which live
in totally unrelated directories, still.

Therefore we came to a consensus to use `%PROGRAMDATA%\Git\config` as the
location for shared Git settings that are of wider interest than just Git
for Windows.

On XP, there is no %PROGRAMDATA%, therefore we need to use
"%ALLUSERSPROFILE%\Application Data\Git\config" in those setups.

Of course, the configuration in `%PROGRAMDATA%\Git\config` has the
widest reach, therefore it must take the lowest precedence, i.e. Git for
Windows can still override settings in its `etc/gitconfig` file.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-09-05 09:24:39 -04:00
Johannes Schindelin
f4477bc96f Merge 'default-ident' into HEAD 2018-09-05 09:24:39 -04:00
Johannes Schindelin
6fc6b2ce32 mingw: use domain information for default email
When a user is registered in a Windows domain, it is really easy to
obtain the email address. So let's do that.

Suggested by Lutz Roeder.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-09-05 09:24:33 -04:00
Johannes Schindelin
95ebbd5ffc Merge 'default-ident' into HEAD 2018-09-05 09:24:33 -04:00
Karsten Blees
b7dcad9fa8 Win32: add a cache below mingw's lstat and dirent implementations
Checking the work tree status is quite slow on Windows, due to slow lstat
emulation (git calls lstat once for each file in the index). Windows
operating system APIs seem to be much better at scanning the status
of entire directories than checking single files.

Add an lstat implementation that uses a cache for lstat data. Cache misses
read the entire parent directory and add it to the cache. Subsequent lstat
calls for the same directory are served directly from the cache.

Also implement opendir / readdir / closedir so that they create and use
directory listings in the cache.

The cache doesn't track file system changes and doesn't plug into any
modifying file APIs, so it has to be explicitly enabled for git functions
that don't modify the working copy.

Note: in an earlier version of this patch, the cache was always active and
tracked file system changes via ReadDirectoryChangesW. However, this was
much more complex and had negative impact on the performance of modifying
git commands such as 'git checkout'.

Signed-off-by: Karsten Blees <blees@dcon.de>
2018-09-05 09:24:24 -04:00
Karsten Blees
8f155e4a54 add infrastructure for read-only file system level caches
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.

This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.

Enable read-only sections for 'git status' and preload_index.

Signed-off-by: Karsten Blees <blees@dcon.de>
2018-09-05 09:24:24 -04:00
Johannes Schindelin
9379be6dd6 Allow for platform-specific core.* config settings
In the Git for Windows project, we have ample precendent for config
settings that apply to Windows, and to Windows only.

Let's formalize this concept by introducing a platform_core_config()
function that can be #define'd in a platform-specific manner.

This will allow us to contain platform-specific code better, as the
corresponding variables no longer need to be exported so that they can
be defined in environment.c and be set in config.c

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-09-05 09:24:23 -04:00
Junio C Hamano
13bf260ac7 Merge branch 'js/typofixes'
Comment update.

* js/typofixes:
  remote-curl: remove spurious period
  git-compat-util.h: fix typo
2018-08-20 11:33:50 -07:00
Johannes Schindelin
c70e1b04f6 git-compat-util.h: fix typo
The words "save" and "safe" are both very wonderful words, each with
their own set of meanings. Let's not confuse them with one another save
on occasion of a pun.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-08 09:07:16 -07:00
Jeff King
c8af66ab8a automatically ban strcpy()
There are a few standard C functions (like strcpy) which are
easy to misuse. E.g.:

  char path[PATH_MAX];
  strcpy(path, arg);

may overflow the "path" buffer. Sometimes there's an earlier
constraint on the size of "arg", but even in such a case
it's hard to verify that the code is correct. If the size
really is unbounded, you're better off using a dynamic
helper like strbuf:

  struct strbuf path = STRBUF_INIT;
  strbuf_addstr(path, arg);

or if it really is bounded, then use xsnprintf to show your
expectation (and get a run-time assertion):

  char path[PATH_MAX];
  xsnprintf(path, sizeof(path), "%s", arg);

which makes further auditing easier.

We'd usually catch undesirable code like this in a review,
but there's no automated enforcement. Adding that
enforcement can help us be more consistent and save effort
(and a round-trip) during review.

This patch teaches the compiler to report an error when it
sees strcpy (and will become a model for banning a few other
functions). This has a few advantages over a separate
linting tool:

  1. We know it's run as part of a build cycle, so it's
     hard to ignore. Whereas an external linter is an extra
     step the developer needs to remember to do.

  2. Likewise, it's basically free since the compiler is
     parsing the code anyway.

  3. We know it's robust against false positives (unlike a
     grep-based linter).

The two big disadvantages are:

  1. We'll only check code that is actually compiled, so it
     may miss code that isn't triggered on your particular
     system. But since presumably people don't add new code
     without compiling it (and if they do, the banned
     function list is the least of their worries), we really
     only care about failing to clean up old code when
     adding new functions to the list. And that's easy
     enough to address with a manual audit when adding a new
     function (which is what I did for the functions here).

  2. If this ends up generating false positives, it's going
     to be harder to disable (as opposed to a separate
     linter, which may have mechanisms for overriding a
     particular case).

     But the intent is to only ban functions which are
     obviously bad, and for which we accept using an
     alternative even when this particular use isn't buggy
     (e.g., the xsnprintf alternative above).

The implementation here is simple: we'll define a macro for
the banned function which replaces it with a reference to a
descriptively named but undeclared identifier.  Replacing it
with any invalid code would work (since we just want to
break compilation).  But ideally we'd meet these goals:

 - it should be portable; ideally this would trigger
   everywhere, and does not need to be part of a DEVELOPER=1
   setup (because unlike warnings which may depend on the
   compiler or system, this is a clear indicator of
   something wrong in the code).

 - it should generate a readable error that gives the
   developer a clue what happened

 - it should avoid generating too much other cruft that
   makes it hard to see the actual error

 - it should mention the original callsite in the error

The output with this patch looks like this (using gcc 7, on
a checkout with 022d2ac1f3 reverted, which removed the final
strcpy from blame.c):

      CC builtin/blame.o
  In file included from ./git-compat-util.h:1246,
                   from ./cache.h:4,
                   from builtin/blame.c:8:
  builtin/blame.c: In function ‘cmd_blame’:
  ./banned.h:11:22: error: ‘sorry_strcpy_is_a_banned_function’ undeclared (first use in this function)
   #define BANNED(func) sorry_##func##_is_a_banned_function
                        ^~~~~~
  ./banned.h:14:21: note: in expansion of macro ‘BANNED’
   #define strcpy(x,y) BANNED(strcpy)
                       ^~~~~~
  builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’
      strcpy(repeated_meta_color, GIT_COLOR_CYAN);
      ^~~~~~
  ./banned.h:11:22: note: each undeclared identifier is reported only once for each function it appears in
   #define BANNED(func) sorry_##func##_is_a_banned_function
                        ^~~~~~
  ./banned.h:14:21: note: in expansion of macro ‘BANNED’
   #define strcpy(x,y) BANNED(strcpy)
                       ^~~~~~
  builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’
      strcpy(repeated_meta_color, GIT_COLOR_CYAN);
      ^~~~~~

This prominently shows the phrase "strcpy is a banned
function", along with the original callsite in blame.c and
the location of the ban code in banned.h. Which should be
enough to get even a developer seeing this for the first
time pointed in the right direction.

This doesn't match our ideals perfectly, but it's a pretty
good balance. A few alternatives I tried:

  1. Instead of using an undeclared variable, using an
     undeclared function. This shortens the message, because
     the "each undeclared identifier" message is not needed
     (and as you can see above, it triggers a separate
     mention of each of the expansion points).

     But it doesn't actually stop compilation unless you use
     -Werror=implicit-function-declaration in your CFLAGS.
     This is the case for DEVELOPER=1, but not for a default
     build (on the other hand, we'd eventually produce a
     link error pointing to the correct source line with the
     descriptive name).

  2. The linux kernel uses a similar mechanism in its
     BUILD_BUG_ON_MSG(), where they actually declare the
     function but do so with gcc's error attribute. But
     that's not portable to other compilers (and it also
     runs afoul of our error() macro).

     We could make a gcc-specific technique and fallback on
     other compilers, but it's probably not worth the
     complexity. It also isn't significantly shorter than
     the error message shown above.

  3. We could drop the BANNED() macro, which would shorten
     the number of lines in the error. But curiously,
     removing it (and just expanding strcpy directly to the
     bogus identifier) causes gcc _not_ to report the
     original line of code.

So this strategy seems to be an acceptable mix of
information, portability, simplicity, and robustness,
without _too_ much extra clutter. I also tested it with
clang, and it looks as good (actually, slightly less
cluttered than with gcc).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-26 10:12:09 -07:00
Junio C Hamano
50f08db594 Merge branch 'js/use-bug-macro'
Developer support update, by using BUG() macro instead of die() to
mark codepaths that should not happen more clearly.

* js/use-bug-macro:
  BUG_exit_code: fix sparse "symbol not declared" warning
  Convert remaining die*(BUG) messages
  Replace all die("BUG: ...") calls by BUG() ones
  run-command: use BUG() to report bugs, not die()
  test-tool: help verifying BUG() code paths
2018-05-30 14:04:07 +09:00
Junio C Hamano
7913f53b56 Sync with Git 2.17.1
* maint: (25 commits)
  Git 2.17.1
  Git 2.16.4
  Git 2.15.2
  Git 2.14.4
  Git 2.13.7
  fsck: complain when .gitmodules is a symlink
  index-pack: check .gitmodules files with --strict
  unpack-objects: call fsck_finish() after fscking objects
  fsck: call fsck_finish() after fscking objects
  fsck: check .gitmodules content
  fsck: handle promisor objects in .gitmodules check
  fsck: detect gitmodules files
  fsck: actually fsck blob data
  fsck: simplify ".git" check
  index-pack: make fsck error message more specific
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  ...
2018-05-29 17:10:05 +09:00
Junio C Hamano
30b015bffe Merge branch 'nd/repack-keep-pack'
"git gc" in a large repository takes a lot of time as it considers
to repack all objects into one pack by default.  The command has
been taught to pretend as if the largest existing packfile is
marked with ".keep" so that it is left untouched while objects in
other packs and loose ones are repacked.

* nd/repack-keep-pack:
  pack-objects: show some progress when counting kept objects
  gc --auto: exclude base pack if not enough mem to "repack -ad"
  gc: handle a corner case in gc.bigPackThreshold
  gc: add gc.bigPackThreshold config
  gc: add --keep-largest-pack option
  repack: add --keep-pack option
  t7700: have closing quote of a test at the beginning of line
2018-05-23 14:38:14 +09:00
Junio C Hamano
68f95b26e4 Sync with Git 2.16.4
* maint-2.16:
  Git 2.16.4
  Git 2.15.2
  Git 2.14.4
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-22 14:25:26 +09:00
Junio C Hamano
023020401d Sync with Git 2.15.2
* maint-2.15:
  Git 2.15.2
  Git 2.14.4
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-22 14:18:06 +09:00
Junio C Hamano
9e0f06d55d Sync with Git 2.14.4
* maint-2.14:
  Git 2.14.4
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-22 14:15:14 +09:00
Junio C Hamano
7b01c71b64 Sync with Git 2.13.7
* maint-2.13:
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-22 14:10:49 +09:00
Jeff King
41a80924ae skip_prefix: add case-insensitive variant
We have the convenient skip_prefix() helper, but if you want
to do case-insensitive matching, you're stuck doing it by
hand. We could add an extra parameter to the function to
let callers ask for this, but the function is small and
somewhat performance-critical. Let's just re-implement it
for the case-insensitive version.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:50:11 -04:00
Ramsay Jones
746ea4adc6 BUG_exit_code: fix sparse "symbol not declared" warning
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-10 18:23:09 +09:00
Junio C Hamano
1ac0ce4d32 Merge branch 'ls/checkout-encoding'
The new "checkout-encoding" attribute can ask Git to convert the
contents to the specified encoding when checking out to the working
tree (and the other way around when checking in).

* ls/checkout-encoding:
  convert: add round trip check based on 'core.checkRoundtripEncoding'
  convert: add tracing for 'working-tree-encoding' attribute
  convert: check for detectable errors in UTF encodings
  convert: add 'working-tree-encoding' attribute
  utf8: add function to detect a missing UTF-16/32 BOM
  utf8: add function to detect prohibited UTF-16/32 BOM
  utf8: teach same_encoding() alternative UTF encoding names
  strbuf: add a case insensitive starts_with()
  strbuf: add xstrdup_toupper()
  strbuf: remove unnecessary NUL assignment in xstrdup_tolower()
2018-05-08 15:59:22 +09:00
Johannes Schindelin
c3c3486b24 Convert remaining die*(BUG) messages
These were not caught by the previous commit, as they did not match the
regular expression.

While at it, remove the localization from one instance: we never want
BUG() messages to be translated, as they target Git developers, not the
end user (hence it would be quite unhelpful to not only burden the
translators, but then even end up with a bug report in a language that
no core Git contributor understands).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-06 19:06:14 +09:00
Nguyễn Thái Ngọc Duy
9806f5a7bf gc --auto: exclude base pack if not enough mem to "repack -ad"
pack-objects could be a big memory hog especially on large repos,
everybody knows that. The suggestion to stick a .keep file on the
giant base pack to avoid this problem is also known for a long time.

Recent patches add an option to do just this, but it has to be either
configured or activated manually. This patch lets `git gc --auto`
activate this mode automatically when it thinks `repack -ad` will use
a lot of memory and start affecting the system due to swapping or
flushing OS cache.

gc --auto decides to do this based on an estimation of pack-objects
memory usage, which is quite accurate at least for the heap part, and
whether that fits in half of system memory (the assumption here is for
desktop environment where there are many other applications running).

This mechanism only kicks in if gc.bigBasePackThreshold is not configured.
If it is, it is assumed that the user already knows what they want.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 13:52:29 +09:00
Lars Schneider
66b8af3e12 strbuf: add a case insensitive starts_with()
Check in a case insensitive manner if one string is a prefix of another
string.

This function is used in a subsequent commit.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-09 10:17:23 -08:00
Brandon Williams
eb78e23f22 wrapper: rename 'template' variables
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-22 10:08:05 -08:00
Christian Couder
afaef55e23 git-compat-util: introduce skip_to_optional_arg()
We often accept both a "--key" option and a "--key=<val>" option.

These options currently are parsed using something like:

if (!strcmp(arg, "--key")) {
	/* do something */
} else if (skip_prefix(arg, "--key=", &arg)) {
	/* do something with arg */
}

which is a bit cumbersome compared to just:

if (skip_to_optional_arg(arg, "--key", &arg)) {
	/* do something with arg */
}

This also introduces skip_to_optional_arg_default() for the few
cases where something different should be done when the first
argument is exactly "--key" than when it is exactly "--key=".

In general it is better for UI consistency and simplicity if
"--key" and "--key=" do the same thing though, so that using
skip_to_optional_arg() should be encouraged compared to
skip_to_optional_arg_default().

Note that these functions can be used to parse any "key=value"
string where "key" is also considered as valid, not just
command line options.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-11 16:10:12 -08:00
Junio C Hamano
14a8168e2f Merge branch 'rj/no-sign-compare'
Many codepaths have been updated to squelch -Wsign-compare
warnings.

* rj/no-sign-compare:
  ALLOC_GROW: avoid -Wsign-compare warnings
  cache.h: hex2chr() - avoid -Wsign-compare warnings
  commit-slab.h: avoid -Wsign-compare warnings
  git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings
2017-09-29 11:23:42 +09:00
Ramsay Jones
73560c793a git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-22 13:00:33 +09:00
Jonathan Tan
5de3de329a git-compat-util: make UNLEAK less error-prone
Commit 0e5bba5 ("add UNLEAK annotation for reducing leak false
positives", 2017-09-08) introduced an UNLEAK macro to be used as
"UNLEAK(var);", but its existing definitions leave semicolons that act
as empty statements, which will lead to syntax errors, e.g.

	if (condition)
		UNLEAK(var);
	else
		something_else(var);

would be broken with two statements between if (condition) and else.
Lose the excess semicolon from the end of the macro replacement text.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-20 15:00:41 +09:00
Jeff King
0e5bba53af add UNLEAK annotation for reducing leak false positives
It's a common pattern in git commands to allocate some
memory that should last for the lifetime of the program and
then not bother to free it, relying on the OS to throw it
away.

This keeps the code simple, and it's fast (we don't waste
time traversing structures or calling free at the end of the
program). But it also triggers warnings from memory-leak
checkers like valgrind or LSAN. They know that the memory
was still allocated at program exit, but they don't know
_when_ the leaked memory stopped being useful. If it was
early in the program, then it's probably a real and
important leak. But if it was used right up until program
exit, it's not an interesting leak and we'd like to suppress
it so that we can see the real leaks.

This patch introduces an UNLEAK() macro that lets us do so.
To understand its design, let's first look at some of the
alternatives.

Unfortunately the suppression systems offered by
leak-checking tools don't quite do what we want. A
leak-checker basically knows two things:

  1. Which blocks were allocated via malloc, and the
     callstack during the allocation.

  2. Which blocks were left un-freed at the end of the
     program (and which are unreachable, but more on that
     later).

Their suppressions work by mentioning the function or
callstack of a particular allocation, and marking it as OK
to leak.  So imagine you have code like this:

  int cmd_foo(...)
  {
	/* this allocates some memory */
	char *p = some_function();
	printf("%s", p);
	return 0;
  }

You can say "ignore allocations from some_function(),
they're not leaks". But that's not right. That function may
be called elsewhere, too, and we would potentially want to
know about those leaks.

So you can say "ignore the callstack when main calls
some_function".  That works, but your annotations are
brittle. In this case it's only two functions, but you can
imagine that the actual allocation is much deeper. If any of
the intermediate code changes, you have to update the
suppression.

What we _really_ want to say is that "the value assigned to
p at the end of the function is not a real leak". But
leak-checkers can't understand that; they don't know about
"p" in the first place.

However, we can do something a little bit tricky if we make
some assumptions about how leak-checkers work. They
generally don't just report all un-freed blocks. That would
report even globals which are still accessible when the
leak-check is run.  Instead they take some set of memory
(like BSS) as a root and mark it as "reachable". Then they
scan the reachable blocks for anything that looks like a
pointer to a malloc'd block, and consider that block
reachable. And then they scan those blocks, and so on,
transitively marking anything reachable from a global as
"not leaked" (or at least leaked in a different category).

So we can mark the value of "p" as reachable by putting it
into a variable with program lifetime. One way to do that is
to just mark "p" as static. But that actually affects the
run-time behavior if the function is called twice (you
aren't likely to call main() twice, but some of our cmd_*()
functions are called from other commands).

Instead, we can trick the leak-checker by putting the value
into _any_ reachable bytes. This patch keeps a global
linked-list of bytes copied from "unleaked" variables. That
list is reachable even at program exit, which confers
recursive reachability on whatever values we unleak.

In other words, you can do:

  int cmd_foo(...)
  {
	char *p = some_function();
	printf("%s", p);
	UNLEAK(p);
	return 0;
  }

to annotate "p" and suppress the leak report.

But wait, couldn't we just say "free(p)"? In this toy
example, yes. But UNLEAK()'s byte-copying strategy has
several advantages over actually freeing the memory:

  1. It's recursive across structures. In many cases our "p"
     is not just a pointer, but a complex struct whose
     fields may have been allocated by a sub-function. And
     in some cases (e.g., dir_struct) we don't even have a
     function which knows how to free all of the struct
     members.

     By marking the struct itself as reachable, that confers
     reachability on any pointers it contains (including those
     found in embedded structs, or reachable by walking
     heap blocks recursively.

  2. It works on cases where we're not sure if the value is
     allocated or not. For example:

       char *p = argc > 1 ? argv[1] : some_function();

     It's safe to use UNLEAK(p) here, because it's not
     freeing any memory. In the case that we're pointing to
     argv here, the reachability checker will just ignore
     our bytes.

  3. Likewise, it works even if the variable has _already_
     been freed. We're just copying the pointer bytes. If
     the block has been freed, the leak-checker will skip
     over those bytes as uninteresting.

  4. Because it's not actually freeing memory, you can
     UNLEAK() before we are finished accessing the variable.
     This is helpful in cases like this:

       char *p = some_function();
       return another_function(p);

     Writing this with free() requires:

       int ret;
       char *p = some_function();
       ret = another_function(p);
       free(p);
       return ret;

     But with unleak we can just write:

       char *p = some_function();
       UNLEAK(p);
       return another_function(p);

This patch adds the UNLEAK() macro and enables it
automatically when Git is compiled with SANITIZE=leak.  In
normal builds it's a noop, so we pay no runtime cost.

It also adds some UNLEAK() annotations to show off how the
feature works. On top of other recent leak fixes, these are
enough to get t0000 and t0001 to pass when compiled with
LSAN.

Note the case in commit.c which actually converts a
strbuf_release() into an UNLEAK. This code was already
non-leaky, but the free didn't do anything useful, since
we're exiting. Converting it to an annotation means that
non-leak-checking builds pay no runtime cost. The cost is
minimal enough that it's probably not worth going on a
crusade to convert these kinds of frees to UNLEAKS. I did it
here for consistency with the "sb" leak (though it would
have been equally correct to go the other way, and turn them
both into strbuf_release() calls).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-08 15:43:17 +09:00
Jonathan Tan
f0e17e86e1 pack: move release_pack_memory()
The function unuse_one_window() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:06 -07:00
Junio C Hamano
d3b7ee087e Merge branch 'rs/move-array' into maint
Code clean-up.

* rs/move-array:
  ls-files: don't try to prune an empty index
  apply: use COPY_ARRAY and MOVE_ARRAY in update_image()
  use MOVE_ARRAY
  add MOVE_ARRAY
2017-08-23 14:33:46 -07:00
Junio C Hamano
32f90258bd Merge branch 'rs/move-array'
Code clean-up.

* rs/move-array:
  ls-files: don't try to prune an empty index
  apply: use COPY_ARRAY and MOVE_ARRAY in update_image()
  use MOVE_ARRAY
  add MOVE_ARRAY
2017-08-11 13:26:57 -07:00
Junio C Hamano
33400c0e96 Merge branch 'tb/push-to-cygwin-unc-path'
On Cygwin, similar to Windows, "git push //server/share/repository"
ought to mean a repository on a network share that can be accessed
locally, but this did not work correctly due to stripping the double
slashes at the beginning.

This may need to be heavily tested before it gets unleashed to the
wild, as the change is at a fairly low-level code and would affect
not just the code to decide if the push destination is local.  There
may be unexpected fallouts in the path normalization.

* tb/push-to-cygwin-unc-path:
  cygwin: allow pushing to UNC paths
2017-07-18 12:48:09 -07:00
René Scharfe
578398071e add MOVE_ARRAY
Similar to COPY_ARRAY (introduced in 60566cbb58), add a safe and
convenient helper for moving potentially overlapping ranges of array
entries.  It infers the element size, multiplies automatically and
safely to get the size in bytes, does a basic type safety check by
comparing element sizes and unlike memmove(3) it supports NULL
pointers iff 0 elements are to be moved.

Also add a semantic patch to demonstrate the helper's intended usage.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-17 14:54:53 -07:00
Torsten Bögershausen
496f256989 cygwin: allow pushing to UNC paths
cygwin can use an UNC path like //server/share/repo

 $ cd //server/share/dir
 $ mkdir test
 $ cd test
 $ git init --bare

 However, when we try to push from a local Git repository to this repo,
 there is a problem: Git converts the leading "//" into a single "/".

 As cygwin handles an UNC path so well, Git can support them better:

 - Introduce cygwin_offset_1st_component() which keeps the leading "//",
   similar to what Git for Windows does.

 - Move CYGWIN out of the POSIX in the tests for path normalization in t0060

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-05 14:01:03 -07:00
Ævar Arnfjörð Bjarmason
481df65f4f git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
Add a FREE_AND_NULL() wrapper marco for the common pattern of freeing
a pointer and assigning NULL to it right afterwards.

The implementation is similar to the (currently unused) XDL_PTRFREE
macro in xdiff/xmacros.h added in commit 3443546f6e ("Use a *real*
built-in diff generator", 2006-03-24). The only difference is that
free() is called unconditionally, see [1].

See [2] for a suggested alternative which does this via a function
instead of a macro. As covered in replies to that message, while it's
a viable approach, it would introduce caveats which this approach
doesn't have, so that potential change is left to a future follow-up
change.

This merely allows us to translate exactly what we're doing now to a
less verbose & idiomatic form using a macro, while guaranteeing that
we don't introduce any functional changes.

1. <alpine.DEB.2.20.1608301948310.129229@virtualbox>
   (http://public-inbox.org/git/alpine.DEB.2.20.1608301948310.129229@virtualbox/)

2. <20170610032143.GA7880@starla>
   (https://public-inbox.org/git/20170610032143.GA7880@starla/)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 14:56:39 -07:00
Junio C Hamano
b9a7d55d93 Merge branch 'nd/fopen-errors'
We often try to open a file for reading whose existence is
optional, and silently ignore errors from open/fopen; report such
errors if they are not due to missing files.

* nd/fopen-errors:
  mingw_fopen: report ENOENT for invalid file names
  mingw: verify that paths are not mistaken for remote nicknames
  log: fix memory leak in open_next_file()
  rerere.c: move error_errno() closer to the source system call
  print errno when reporting a system call error
  wrapper.c: make warn_on_inaccessible() static
  wrapper.c: add and use fopen_or_warn()
  wrapper.c: add and use warn_on_fopen_errors()
  config.mak.uname: set FREAD_READS_DIRECTORIES for Darwin, too
  config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD
  clone: use xfopen() instead of fopen()
  use xfopen() in more places
  git_fopen: fix a sparse 'not declared' warning
2017-06-13 13:47:09 -07:00
Junio C Hamano
93dd544f54 Merge branch 'jc/noent-notdir'
Our code often opens a path to an optional file, to work on its
contents when we can successfully open it.  We can ignore a failure
to open if such an optional file does not exist, but we do want to
report a failure in opening for other reasons (e.g. we got an I/O
error, or the file is there, but we lack the permission to open).

The exact errors we need to ignore are ENOENT (obviously) and
ENOTDIR (less obvious).  Instead of repeating comparison of errno
with these two constants, introduce a helper function to do so.

* jc/noent-notdir:
  treewide: use is_missing_file_error() where ENOENT and ENOTDIR are checked
  compat-util: is_missing_file_error()
2017-06-13 13:47:07 -07:00
Junio C Hamano
e350625b68 Merge branch 'bw/forking-and-threading' into maint
The "run-command" API implementation has been made more robust
against dead-locking in a threaded environment.

* bw/forking-and-threading:
  usage.c: drop set_error_handle()
  run-command: restrict PATH search to executable files
  run-command: expose is_executable function
  run-command: block signals between fork and execve
  run-command: add note about forking and threading
  run-command: handle dup2 and close errors in child
  run-command: eliminate calls to error handling functions in child
  run-command: don't die in child when duping /dev/null
  run-command: prepare child environment before forking
  string-list: add string_list_remove function
  run-command: use the async-signal-safe execv instead of execvp
  run-command: prepare command before forking
  t0061: run_command executes scripts without a #! line
  t5550: use write_script to generate post-update hook
2017-06-13 13:27:00 -07:00
Junio C Hamano
7d5e13f652 Merge branch 'bw/forking-and-threading'
The "run-command" API implementation has been made more robust
against dead-locking in a threaded environment.

* bw/forking-and-threading:
  usage.c: drop set_error_handle()
  run-command: restrict PATH search to executable files
  run-command: expose is_executable function
  run-command: block signals between fork and execve
  run-command: add note about forking and threading
  run-command: handle dup2 and close errors in child
  run-command: eliminate calls to error handling functions in child
  run-command: don't die in child when duping /dev/null
  run-command: prepare child environment before forking
  string-list: add string_list_remove function
  run-command: use the async-signal-safe execv instead of execvp
  run-command: prepare command before forking
  t0061: run_command executes scripts without a #! line
  t5550: use write_script to generate post-update hook
2017-05-30 11:16:41 +09:00
Junio C Hamano
dc5a18b364 compat-util: is_missing_file_error()
Our code often opens a path to an optional file, to work on its
contents when we can successfully open it.  We can ignore a failure
to open if such an optional file does not exist, but we do want to
report a failure in opening for other reasons (e.g. we got an I/O
error, or the file is there, but we lack the permission to open).

The exact errors we need to ignore are ENOENT (obviously) and
ENOTDIR (less obvious).  Instead of repeating comparison of errno
with these two constants, introduce a helper function to do so.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-30 09:14:39 +09:00