From 65e8141f051401a101567b95860424d94f42ef39 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Tue, 18 Nov 2025 04:11:56 -0500 Subject: [PATCH 1/9] compat/mmap: mark unused argument in git_munmap() Our mmap compat code emulates mapping by using malloc/free. Our git_munmap() must take a "length" parameter to match the interface of munmap(), but we don't use it (it is up to the allocator to know how big the block is in free()). Let's mark it as UNUSED to avoid complaints from -Wunused-parameter. Otherwise you cannot build with "make DEVELOPER=1 NO_MMAP=1". Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- compat/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/compat/mmap.c b/compat/mmap.c index 2fe1c7732e..1a118711f7 100644 --- a/compat/mmap.c +++ b/compat/mmap.c @@ -38,7 +38,7 @@ void *git_mmap(void *start, size_t length, int prot, int flags, int fd, off_t of return start; } -int git_munmap(void *start, size_t length) +int git_munmap(void *start, size_t length UNUSED) { free(start); return 0; From 4deb882e54152c31bef23f8b33ad38b7bc26d398 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Tue, 18 Nov 2025 04:12:06 -0500 Subject: [PATCH 2/9] pack-bitmap: handle name-hash lookups in incremental bitmaps If a bitmap has a name-hash cache, it is an array of 32-bit integers, one per entry in the bitmap, which we've mmap'd from the .bitmap file. We access it directly like this: if (bitmap_git->hashes) hash = get_be32(bitmap_git->hashes + index_pos); That works for both regular pack bitmaps and for non-incremental midx bitmaps. There is one bitmap_index with one "hashes" array, and index_pos is within its bounds (we do the bounds-checking when we load the bitmap). But for an incremental midx bitmap, we have a linked list of bitmap_index structs, and each one has only its own small slice of the name-hash array. If index_pos refers to an object that is not in the first bitmap_git of the chain, then we'll access memory outside of the bounds of its "hashes" array, and often outside of the mmap. Instead, we should walk through the list until we find the bitmap_index which serves our index_pos, and use its hash (after adjusting index_pos to make it relative to the slice we found). This is exactly what we do elsewhere for incremental midx lookups (like the pack_pos_to_midx() call a few lines above). But we can't use existing helpers like midx_for_object() here, because we're walking through the chain of bitmap_index structs (each of which refers to a midx), not the chain of incremental multi_pack_index structs themselves. The problem is triggered in the test suite, but we don't get a segfault because the out-of-bounds index is too small. The OS typically rounds our mmap up to the nearest page size, so we just end up accessing some extra zero'd memory. Nor do we catch it with ASan, since it doesn't seem to instrument mmaps at all. But if we build with NO_MMAP, then our maps are replaced with heap allocations, which ASan does check. And so: make NO_MMAP=1 SANITIZE=address cd t ./t5334-incremental-multi-pack-index.sh does show the problem (and this patch makes it go away). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- pack-bitmap.c | 29 +++++++++++++++++++++++++---- 1 file changed, 25 insertions(+), 4 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index ac6d62b980..9c9aa7b67c 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -212,6 +212,28 @@ static uint32_t bitmap_num_objects(struct bitmap_index *index) return index->pack->num_objects; } +static uint32_t bitmap_name_hash(struct bitmap_index *index, uint32_t pos) +{ + if (bitmap_is_midx(index)) { + while (index && pos < index->midx->num_objects_in_base) { + ASSERT(bitmap_is_midx(index)); + index = index->base; + } + + if (!index) + BUG("NULL base bitmap for object position: %"PRIu32, pos); + + pos -= index->midx->num_objects_in_base; + if (pos >= index->midx->num_objects) + BUG("out-of-bounds midx bitmap object at %"PRIu32, pos); + } + + if (!index->hashes) + return 0; + + return get_be32(index->hashes + pos); +} + static struct repository *bitmap_repo(struct bitmap_index *bitmap_git) { if (bitmap_is_midx(bitmap_git)) @@ -1726,8 +1748,7 @@ static void show_objects_for_type( pack = bitmap_git->pack; } - if (bitmap_git->hashes) - hash = get_be32(bitmap_git->hashes + index_pos); + hash = bitmap_name_hash(bitmap_git, index_pos); show_reach(&oid, object_type, 0, hash, pack, ofs, payload); } @@ -3083,8 +3104,8 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, if (oe) { reposition[i] = oe_in_pack_pos(mapping, oe) + 1; - if (bitmap_git->hashes && !oe->hash) - oe->hash = get_be32(bitmap_git->hashes + index_pos); + if (!oe->hash) + oe->hash = bitmap_name_hash(bitmap_git, index_pos); } } From a9990f8ec0e750a68802f7d79d2aa2293c4811b4 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Tue, 18 Nov 2025 04:12:13 -0500 Subject: [PATCH 3/9] Makefile: turn on NO_MMAP when building with ASan Git often uses mmap() to access on-disk files. This leaves a blind spot in our SANITIZE=address builds, since ASan does not seem to handle mmap at all. Nor does the OS notice most out-of-bounds access, since it tends to round up to the nearest page size (so depending on how big the map is, you might have to overrun it by up to 4095 bytes to trigger a segfault). The previous commit demonstrates a memory bug that we missed. We could have made a new test where the out-of-bounds access was much larger, or where the mapped file ended closer to a page boundary. But the point of running the test suite with sanitizers is to catch these problems without having to construct specific tests. Let's enable NO_MMAP for our ASan builds by default, which should give us better coverage. This does increase the memory usage of Git, since we're copying from the filesystem into heap. But the repositories in the test suite tend to be small, so the overhead isn't really noticeable (and ASan already has quite a performance penalty). There are a few other known bugs that this patch will help flush out. However, they aren't directly triggered in the test suite (yet). So it's safe to turn this on now without breaking the test suite, which will help us add new tests to demonstrate those other bugs as we fix them. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- Makefile | 1 + meson.build | 8 +++++++- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 70d1543b6b..c2d327838a 100644 --- a/Makefile +++ b/Makefile @@ -1518,6 +1518,7 @@ SANITIZE_LEAK = YesCompiledWithIt endif ifneq ($(filter address,$(SANITIZERS)),) NO_REGEX = NeededForASAN +NO_MMAP = NeededForASAN SANITIZE_ADDRESS = YesCompiledWithIt endif endif diff --git a/meson.build b/meson.build index 596f5ac711..269769b166 100644 --- a/meson.build +++ b/meson.build @@ -1369,12 +1369,18 @@ if host_machine.system() == 'windows' libgit_c_args += '-DUSE_WIN32_MMAP' else checkfuncs += { - 'mmap' : ['mmap.c'], # provided by compat/mingw.c. 'unsetenv' : ['unsetenv.c'], # provided by compat/mingw.c. 'getpagesize' : [], } + + if get_option('b_sanitize').contains('address') + libgit_c_args += '-DNO_MMAP' + libgit_sources += 'compat/mmap.c' + else + checkfuncs += { 'mmap': ['mmap.c'] } + endif endif foreach func, impls : checkfuncs From c4c9089584d0ed04978e8d0945b2ba2985e67bd3 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Tue, 18 Nov 2025 04:12:18 -0500 Subject: [PATCH 4/9] cache-tree: avoid strtol() on non-string buffer A cache-tree extension entry in the index looks like this: NUL SPACE NEWLINE where the "_nr" items are human-readable base-10 ASCII. We parse them with strtol(), even though we do not have a NUL-terminated string (we'd generally have an mmap() of the on-disk index file). For a well-formed entry, this is not a problem; strtol() will stop when it sees the newline. But there are two problems: 1. A corrupted entry could omit the newline, causing us to read further. You'd mostly get stopped by seeing non-digits in the oid field (and if it is likewise truncated, there will still be 20 or more bytes of the index checksum). So it's possible, though unlikely, to read off the end of the mmap'd buffer. Of course a malicious index file can fake the oid and the index checksum to all (ASCII) 0's. This is further complicated by the fact that mmap'd buffers tend to be zero-padded up to the page boundary. So to run off the end, the index size also has to be a multiple of the page size. This is also unlikely, though you can construct a malicious index file that matches this. The security implications aren't too interesting. The index file is a local file anyway (so you can't attack somebody by cloning, but only if you convince them to operate in a .git directory you made, at which point attacking .git/config is much easier). And it's just a read overflow via strtol(), which is unlikely to buy you much beyond a crash. 2. ASan has a strict_string_checks option, which tells it to make sure that options to string functions (like strtol) have some eventual NUL, without regard to what the function would actually do (like stopping at a newline here). This option sometimes has false positives, but it can point to sketchy areas (like this one) where the input we use doesn't exhibit a problem, but different input _could_ cause us to misbehave. Let's fix it by just parsing the values ourselves with a helper function that is careful not to go past the end of the buffer. There are a few behavior changes here that should not matter: - We do not consider overflow, as strtol() would. But nor did the original code. However, we don't trust the value we get from the on-disk file, and if it says to read 2^30 entries, we would notice that we do not have that many and bail before reading off the end of the buffer. - Our helper does not skip past extra leading whitespace as strtol() would, but according to gitformat-index(5) there should not be any. - The original quit parsing at a newline or a NUL byte, but now we insist on a newline (which is what the documentation says, and what Git has always produced). Since we are providing our own helper function, we can tweak the interface a bit to make our lives easier. The original code does not use strtol's "end" pointer to find the end of the parsed data, but rather uses a separate loop to advance our "buf" pointer to the trailing newline. We can instead provide a helper that advances "buf" as it parses, letting us read strictly left-to-right through the buffer. I didn't add a new test here. It's surprisingly difficult to construct an index of exactly the right size due to the way we pad entries. But it is easy to trigger the problem in existing tests when using ASan's strict string checking, coupled with a recent change to use NO_MMAP with ASan builds. So: make SANITIZE=address cd t ASAN_OPTIONS=strict_string_checks=1 ./t0090-cache-tree.sh triggers it reliably. Technically it is not deterministic because there is ~8% chance (it's 1-(255/256)^20, or ^32 for sha256) that the trailing checksum hash has a NUL byte in it. But we compute enough cache-trees in the course of that script that we are very likely to hit the problem in one of them. We can look at making strict_string_checks the default for ASan builds, but there are some other cases we'd want to fix first. Reported-by: correctmost Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- cache-tree.c | 50 +++++++++++++++++++++++++++++++++++++------------- 1 file changed, 37 insertions(+), 13 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index fa3858e282..2309911dfa 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -548,12 +548,41 @@ void cache_tree_write(struct strbuf *sb, struct cache_tree *root) trace2_region_leave("cache_tree", "write", the_repository); } +static int parse_int(const char **ptr, unsigned long *len_p, int *out) +{ + const char *s = *ptr; + unsigned long len = *len_p; + int ret = 0; + int sign = 1; + + while (len && *s == '-') { + sign *= -1; + s++; + len--; + } + + while (len) { + if (!isdigit(*s)) + break; + ret *= 10; + ret += *s - '0'; + s++; + len--; + } + + if (s == *ptr) + return -1; + + *ptr = s; + *len_p = len; + *out = sign * ret; + return 0; +} + static struct cache_tree *read_one(const char **buffer, unsigned long *size_p) { const char *buf = *buffer; unsigned long size = *size_p; - const char *cp; - char *ep; struct cache_tree *it; int i, subtree_nr; const unsigned rawsz = the_hash_algo->rawsz; @@ -569,19 +598,14 @@ static struct cache_tree *read_one(const char **buffer, unsigned long *size_p) buf++; size--; it = cache_tree(); - cp = buf; - it->entry_count = strtol(cp, &ep, 10); - if (cp == ep) + if (parse_int(&buf, &size, &it->entry_count) < 0) goto free_return; - cp = ep; - subtree_nr = strtol(cp, &ep, 10); - if (cp == ep) + if (!size || *buf != ' ') goto free_return; - while (size && *buf && *buf != '\n') { - size--; - buf++; - } - if (!size) + buf++; size--; + if (parse_int(&buf, &size, &subtree_nr) < 0) + goto free_return; + if (!size || *buf != '\n') goto free_return; buf++; size--; if (0 <= it->entry_count) { From 0b6ec075df2ac77a4792b8b1a7290a36b636012b Mon Sep 17 00:00:00 2001 From: Jeff King Date: Tue, 18 Nov 2025 04:12:20 -0500 Subject: [PATCH 5/9] fsck: assert newline presence in fsck_ident() The fsck code purports to handle buffers that are not NUL-terminated, but fsck_ident() uses some string functions. This works OK in practice, as explained in 8e4309038f (fsck: do not assume NUL-termination of buffers, 2023-01-19). Before calling fsck_ident() we'll have called verify_headers(), which makes sure we have at least a trailing newline. And none of our string-like functions will walk past that newline. However, that makes this code at the top of fsck_ident() very confusing: *ident = strchrnul(*ident, '\n'); if (**ident == '\n') (*ident)++; We should always see that newline, or our memory safety assumptions have been violated! Further, using strchrnul() is weird, since the whole point is that if the newline is not there, we don't necessarily have a NUL at all, and might read off the end of the buffer. So let's have callers pass in the boundary of our buffer, which lets us safely find the newline with memchr(). And if it is not there, this is a BUG(), because it means our caller did not validate the input with verify_headers() as it was supposed to (and we are better off bailing rather than having memory-safety problems). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- fsck.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/fsck.c b/fsck.c index 8dc8472ceb..3d1027dc87 100644 --- a/fsck.c +++ b/fsck.c @@ -859,16 +859,18 @@ static int verify_headers(const void *data, unsigned long size, FSCK_MSG_UNTERMINATED_HEADER, "unterminated header"); } -static int fsck_ident(const char **ident, +static int fsck_ident(const char **ident, const char *ident_end, const struct object_id *oid, enum object_type type, struct fsck_options *options) { const char *p = *ident; + const char *nl; char *end; - *ident = strchrnul(*ident, '\n'); - if (**ident == '\n') - (*ident)++; + nl = memchr(p, '\n', ident_end - p); + if (!nl) + BUG("verify_headers() should have made sure we have a newline"); + *ident = nl + 1; if (*p == '<') return report(options, oid, type, FSCK_MSG_MISSING_NAME_BEFORE_EMAIL, "invalid author/committer line - missing space before email"); @@ -957,7 +959,7 @@ static int fsck_commit(const struct object_id *oid, author_count = 0; while (buffer < buffer_end && skip_prefix(buffer, "author ", &buffer)) { author_count++; - err = fsck_ident(&buffer, oid, OBJ_COMMIT, options); + err = fsck_ident(&buffer, buffer_end, oid, OBJ_COMMIT, options); if (err) return err; } @@ -969,7 +971,7 @@ static int fsck_commit(const struct object_id *oid, return err; if (buffer >= buffer_end || !skip_prefix(buffer, "committer ", &buffer)) return report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_COMMITTER, "invalid format - expected 'committer' line"); - err = fsck_ident(&buffer, oid, OBJ_COMMIT, options); + err = fsck_ident(&buffer, buffer_end, oid, OBJ_COMMIT, options); if (err) return err; if (memchr(buffer_begin, '\0', size)) { @@ -1064,7 +1066,7 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer, goto done; } else - ret = fsck_ident(&buffer, oid, OBJ_TAG, options); + ret = fsck_ident(&buffer, buffer_end, oid, OBJ_TAG, options); if (buffer < buffer_end && !starts_with(buffer, "\n")) { /* From 830424def4896dbf041d41dad873f5b86fdf9bfa Mon Sep 17 00:00:00 2001 From: Jeff King Date: Tue, 18 Nov 2025 04:12:23 -0500 Subject: [PATCH 6/9] fsck: avoid strcspn() in fsck_ident() We may be operating on a buffer that is not NUL-terminated, but we use strcspn() to parse it. This is OK in practice, as discussed in 8e4309038f (fsck: do not assume NUL-termination of buffers, 2023-01-19), because we know there is at least a trailing newline in our buffer, and we always pass "\n" to strcspn(). So we know it will stop before running off the end of the buffer. But this is a subtle point to hang our memory safety hat on. And it confuses ASan's strict_string_checks mode, even though it is technically a false positive (that mode complains that we have no NUL, which is true, but it does not know that we have verified the presence of the newline already). Let's instead open-code the loop. As a bonus, this makes the logic more obvious (to my mind, anyway). The current code skips forward with strcspn until it hits "<", ">", or "\n". But then it must check which it saw to decide if that was what we expected or not, duplicating some logic between what's in the strcspn() and what's in the domain logic. Instead, we can just check each character as we loop and act on it immediately. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- fsck.c | 32 ++++++++++++++++++++++---------- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/fsck.c b/fsck.c index 3d1027dc87..3696f1b849 100644 --- a/fsck.c +++ b/fsck.c @@ -874,18 +874,30 @@ static int fsck_ident(const char **ident, const char *ident_end, if (*p == '<') return report(options, oid, type, FSCK_MSG_MISSING_NAME_BEFORE_EMAIL, "invalid author/committer line - missing space before email"); - p += strcspn(p, "<>\n"); - if (*p == '>') - return report(options, oid, type, FSCK_MSG_BAD_NAME, "invalid author/committer line - bad name"); - if (*p != '<') - return report(options, oid, type, FSCK_MSG_MISSING_EMAIL, "invalid author/committer line - missing email"); + for (;;) { + if (p >= ident_end || *p == '\n') + return report(options, oid, type, FSCK_MSG_MISSING_EMAIL, "invalid author/committer line - missing email"); + if (*p == '>') + return report(options, oid, type, FSCK_MSG_BAD_NAME, "invalid author/committer line - bad name"); + if (*p == '<') + break; /* end of name, beginning of email */ + + /* otherwise, skip past arbitrary name char */ + p++; + } if (p[-1] != ' ') return report(options, oid, type, FSCK_MSG_MISSING_SPACE_BEFORE_EMAIL, "invalid author/committer line - missing space before email"); - p++; - p += strcspn(p, "<>\n"); - if (*p != '>') - return report(options, oid, type, FSCK_MSG_BAD_EMAIL, "invalid author/committer line - bad email"); - p++; + p++; /* skip past '<' we found */ + for (;;) { + if (p >= ident_end || *p == '<' || *p == '\n') + return report(options, oid, type, FSCK_MSG_BAD_EMAIL, "invalid author/committer line - bad email"); + if (*p == '>') + break; /* end of email */ + + /* otherwise, skip past arbitrary email char */ + p++; + } + p++; /* skip past '>' we found */ if (*p != ' ') return report(options, oid, type, FSCK_MSG_MISSING_SPACE_BEFORE_DATE, "invalid author/committer line - missing space before date"); p++; From f05df7ffca492b37d604ad6beed788055eb56ebd Mon Sep 17 00:00:00 2001 From: Jeff King Date: Tue, 18 Nov 2025 04:12:25 -0500 Subject: [PATCH 7/9] fsck: remove redundant date timestamp check After calling "parse_timestamp(p, &end, 10)", we complain if "p == end", which would imply that we did not see any digits at all. But we know this cannot be the case, since we would have bailed already if we did not see any digits, courtesy of extra checks added by 8e4309038f (fsck: do not assume NUL-termination of buffers, 2023-01-19). Since then, checking "p == end" is redundant and we can drop it. This will make our lives a little easier as we refactor further. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- fsck.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fsck.c b/fsck.c index 3696f1b849..98b16a9e58 100644 --- a/fsck.c +++ b/fsck.c @@ -919,7 +919,7 @@ static int fsck_ident(const char **ident, const char *ident_end, return report(options, oid, type, FSCK_MSG_ZERO_PADDED_DATE, "invalid author/committer line - zero-padded date"); if (date_overflows(parse_timestamp(p, &end, 10))) return report(options, oid, type, FSCK_MSG_BAD_DATE_OVERFLOW, "invalid author/committer line - date causes integer overflow"); - if ((end == p || *end != ' ')) + if (*end != ' ') return report(options, oid, type, FSCK_MSG_BAD_DATE, "invalid author/committer line - bad date"); p = end + 1; if ((*p != '+' && *p != '-') || From 5a993593b24df699f60841296795f9a6ca60d399 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Tue, 18 Nov 2025 04:12:28 -0500 Subject: [PATCH 8/9] fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated In fsck_ident(), we parse the timestamp with parse_timestamp(), which is really an alias for strtoumax(). But since our buffer may not be NUL-terminated, this can trigger a complaint from ASan's strict_string_checks mode. This is a false positive, since we know that the buffer contains a trailing newline (which we checked earlier in the function), and that strtoumax() would stop there. But it is worth working around ASan's complaint. One is because that will let us turn on strict_string_checks by default, which has helped catch other real problems. And two is that the safety of the current code is very hard to reason about (it subtly depends on distant code which could change). One option here is to just parse the number left-to-right ourselves. But we care about the size of a timestamp_t and detecting overflow, since that's part of the point of these checks. And doing that correctly is tricky. So we'll instead just pull the digits into a separate, NUL-terminated buffer, and use that to call parse_timestamp(). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- fsck.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/fsck.c b/fsck.c index 98b16a9e58..ec45f786d6 100644 --- a/fsck.c +++ b/fsck.c @@ -859,13 +859,28 @@ static int verify_headers(const void *data, unsigned long size, FSCK_MSG_UNTERMINATED_HEADER, "unterminated header"); } +static timestamp_t parse_timestamp_from_buf(const char **start, const char *end) +{ + const char *p = *start; + char buf[24]; /* big enough for 2^64 */ + size_t i = 0; + + while (p < end && isdigit(*p)) { + if (i >= ARRAY_SIZE(buf) - 1) + return TIME_MAX; + buf[i++] = *p++; + } + buf[i] = '\0'; + *start = p; + return parse_timestamp(buf, NULL, 10); +} + static int fsck_ident(const char **ident, const char *ident_end, const struct object_id *oid, enum object_type type, struct fsck_options *options) { const char *p = *ident; const char *nl; - char *end; nl = memchr(p, '\n', ident_end - p); if (!nl) @@ -917,11 +932,11 @@ static int fsck_ident(const char **ident, const char *ident_end, "invalid author/committer line - bad date"); if (*p == '0' && p[1] != ' ') return report(options, oid, type, FSCK_MSG_ZERO_PADDED_DATE, "invalid author/committer line - zero-padded date"); - if (date_overflows(parse_timestamp(p, &end, 10))) + if (date_overflows(parse_timestamp_from_buf(&p, ident_end))) return report(options, oid, type, FSCK_MSG_BAD_DATE_OVERFLOW, "invalid author/committer line - date causes integer overflow"); - if (*end != ' ') + if (*p != ' ') return report(options, oid, type, FSCK_MSG_BAD_DATE, "invalid author/committer line - bad date"); - p = end + 1; + p++; if ((*p != '+' && *p != '-') || !isdigit(p[1]) || !isdigit(p[2]) || From a031b6181a1e1ee6768d19d6a03b031b6e9004e9 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Tue, 18 Nov 2025 04:12:30 -0500 Subject: [PATCH 9/9] t: enable ASan's strict_string_checks option ASan has an option to enable strict string checking, where any pointer passed to a function that expects a NUL-terminated string will be checked for that NUL termination. This can sometimes produce false positives. E.g., it is not wrong to pass a buffer with { '1', '2', '\n' } into strtoul(). Even though it is not NUL-terminated, it will stop at the newline. But in trying it out, it identified two problematic spots in our test suite (which have now been adjusted): 1. The strtol() parsing in cache-tree.c was a real potential problem, which would have been very hard to find otherwise (since it required constructing a very specific broken index file). 2. The use of string functions in fsck_ident() were false positives, because we knew that there was always a trailing newline which would stop the functions from reading off the end of the buffer. But the reasoning behind that is somewhat fragile, and silencing those complaints made the code easier to reason about. So even though this did not find any earth-shattering bugs, and even had a few false positives, I'm sufficiently convinced that its complaints are more helpful than hurtful. Let's turn it on by default (since the test suite now runs cleanly with it) and see if it ever turns up any other instances. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/test-lib.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/t/test-lib.sh b/t/test-lib.sh index 92d0db13d7..bbda7abb16 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -77,6 +77,7 @@ prepend_var GIT_SAN_OPTIONS : strip_path_prefix="$GIT_BUILD_DIR/" # want that one to complain to stderr). prepend_var ASAN_OPTIONS : $GIT_SAN_OPTIONS prepend_var ASAN_OPTIONS : detect_leaks=0 +prepend_var ASAN_OPTIONS : strict_string_checks=1 export ASAN_OPTIONS prepend_var LSAN_OPTIONS : $GIT_SAN_OPTIONS