global/git

mirror of https://github.com/git/git.git synced 2026-04-13 02:10:06 +02:00

Author	SHA1	Message	Date
Junio C Hamano	183ea3eabf	Merge branch 'ps/mingw-rename' The MinGW compatibility layer has been taught to support POSIX semantics for atomic renames when other process(es) have a file opened at the destination path. * ps/mingw-rename: compat/mingw: support POSIX semantics for atomic renames compat/mingw: allow deletion of most opened files compat/mingw: share file handles created via `CreateFileW()`	2024-11-13 08:35:34 +09:00
Junio C Hamano	486c9d3995	Merge branch 'jt/commit-graph-missing' A regression where commit objects missing from a commit-graph can cause an infinite loop when doing a fetch in a partial clone has been fixed. * jt/commit-graph-missing: fetch-pack: die if in commit graph but not obj db Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"	2024-11-13 08:35:33 +09:00
Junio C Hamano	51ba601160	Merge branch 'en/shallow-exclude-takes-a-ref-fix' The "--shallow-exclude=<ref>" option to various history transfer commands takes a ref, not an arbitrary revision. * en/shallow-exclude-takes-a-ref-fix: doc: correct misleading descriptions for --shallow-exclude upload-pack: fix ambiguous error message	2024-11-13 08:35:32 +09:00
Junio C Hamano	110c8fe8f5	Merge branch 'ak/t1016-style' Test modernization. * ak/t1016-style: t1016: clean up style	2024-11-13 08:35:32 +09:00
Junio C Hamano	6890c99e38	Merge branch 'ps/leakfixes-part-9' More leakfixes. * ps/leakfixes-part-9: (22 commits) list-objects-filter-options: work around reported leak on error builtin/merge: release output buffer after performing merge dir: fix leak when parsing "status.showUntrackedFiles" t/helper: fix leaking buffer in "dump-untracked-cache" t/helper: stop re-initialization of `the_repository` sparse-index: correctly free EWAH contents dir: release untracked cache data combine-diff: fix leaking lost lines builtin/tag: fix leaking key ID on failure to sign transport-helper: fix leaking import/export marks builtin/commit: fix leaking cleanup config trailer: fix leaking strbufs when formatting trailers trailer: fix leaking trailer values builtin/commit: fix leaking change data contents upload-pack: fix leaking URI protocols pretty: clear signature check diff-lib: fix leaking diffopts in `do_diff_cache()` revision: fix leaking bloom filters builtin/grep: fix leak with `--max-count=0` grep: fix leak in `grep_splice_or()` ...	2024-11-13 08:35:31 +09:00
Junio C Hamano	02a2d5706d	Merge branch 'jk/left-right-bitmap' When called with '--left-right' and '--use-bitmap-index', 'rev-list' will produce output without any left/right markers, which has been corrected. * jk/left-right-bitmap: rev-list: skip bitmap traversal for --left-right	2024-11-08 12:56:28 +09:00
Junio C Hamano	1ee7dbde67	Merge branch 'ps/upgrade-clar' Buildfix and upgrade of Clar to a newer version. * ps/upgrade-clar: cmake: set up proper dependencies for generated clar headers cmake: fix compilation of clar-based unit tests Makefile: extract script to generate clar declarations Makefile: adjust sed command for generating "clar-decls.h" t/unit-tests: update clar to 206accb	2024-11-08 12:56:28 +09:00
Patrick Steinhardt	391bceae43	compat/mingw: support POSIX semantics for atomic renames By default, Windows restricts access to files when those files have been opened by another process. As explained in the preceding commits, these restrictions can be loosened such that reads, writes and/or deletes of files with open handles _are_ allowed. While we set up those sharing flags in most relevant code paths now, we still don't properly handle POSIX-style atomic renames in case the target path is open. This is failure demonstrated by t0610, where one of our tests spawns concurrent writes in a reftable-enabled repository and expects all of them to succeed. This test fails most of the time because the process that has acquired the "tables.list" lock is unable to rename it into place while other processes are busy reading that file. Windows 10 has introduced the `FILE_RENAME_FLAG_POSIX_SEMANTICS` flag that allows us to fix this usecase [1]. When set, it is possible to rename a file over a preexisting file even when the target file still has handles open. Those handles must have been opened with the `FILE_SHARE_DELETE` flag, which we have ensured in the preceding commits. Careful readers might have noticed that [1] does not mention the above flag, but instead mentions `FILE_RENAME_POSIX_SEMANTICS`. This flag is not for use with `SetFileInformationByHandle()` though, which is what we use. And while the `FILE_RENAME_FLAG_POSIX_SEMANTICS` flag exists, it is not documented on [2] or anywhere else as far as I can tell. Unfortunately, we still support Windows systems older than Windows 10 that do not yet have this new flag. Our `_WIN32_WINNT` SDK version still targets 0x0600, which is Windows Vista and later. And even though that Windows version is out-of-support, bumping the SDK version all the way to 0x0A00, which is Windows 10 and later, is not an option as it would make it impossible to compile on Windows 8.1, which is still supported. Instead, we have to manually declare the relevant infrastructure to make this feature available and have fallback logic in place in case we run on a Windows version that does not yet have this flag. On another note: `mingw_rename()` has a retry loop that is used in case deleting a file failed because it's still open in another process. One might be pressed to not use this loop anymore when we can use POSIX semantics. But unfortunately, we have to keep it around due to our dependence on the `FILE_SHARE_DELETE` flag. While we know to set that sharing flag now, other applications may not do so and may thus still cause sharing violations when we try to rename a file. This fixes concurrent writes in the reftable backend as demonstrated in t0610, but may also end up fixing other usecases where Git wants to perform renames. [1]: https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_file_rename_information [2]: https://learn.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_rename_info Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-06 00:15:25 -08:00
Jonathan Tan	5d4cc78f72	fetch-pack: die if in commit graph but not obj db When fetching, there is a step in which sought objects are first checked against the local repository; only objects that are not in the local repository are then fetched. This check first looks up the commit graph file, and returns "present" if the object is in there. However, the action of first looking up the commit graph file is not done everywhere in Git, especially if the type of the object at the time of lookup is not known. This means that in a repo corruption situation, a user may encounter an "object missing" error, attempt to fetch it, and still encounter the same error later when they reattempt their original action, because the object is present in the commit graph file but not in the object DB. Therefore, make it a fatal error when this occurs. (Note that we cannot proceed to include this object in the list of objects to be fetched without changing at least the fetch negotiation code: what would happen is that the client will send "want X" and "have X" and when I tested at $DAYJOB with a work server that uses JGit, the server reasonably returned an empty packfile. And changing the fetch negotiation code to only use the object DB when deciding what to report as "have" would be an unnecessary slowdown, I think.) This was discovered when a lazy fetch of a missing commit completed with nothing actually fetched, and the writing of the commit graph file after every fetch then attempted to read said missing commit, triggering a lazy fetch of said missing commit, resulting in an infinite loop with no user-visible indication (until they check the list of processes running on their computer). With this fix, there is no infinite loop. Note that although the repo corruption we discovered was caused by a bug in GC in a partial clone, the behavior that this patch teaches Git to warn about applies to any repo with commit graph enabled and with a missing commit, whether it is a partial clone or not. t5330, introduced in `3a1ea94a49` (commit-graph.c: no lazy fetch in lookup_commit_in_graph(), 2022-07-01), tests that an interaction between fetch and the commit graph does not cause an infinite loop. This patch changes the exit code in that situation, so that test had to be changed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-05 18:57:22 -08:00
Patrick Steinhardt	c810549be1	list-objects-filter-options: work around reported leak on error This one is a little bit more curious. In t6112, we have a test that exercises the `git rev-list --filter` option with invalid filters. We execute git-rev-list(1) via `test_must_fail`, which means that we check for leaks even though Git exits with an error code. This causes the following leak: Direct leak of 27 byte(s) in 1 object(s) allocated from: #0 0x5555555e6946 in realloc.part.0 lsan_interceptors.cpp.o #1 0x5555558fb4b6 in xrealloc wrapper.c:137:8 #2 0x5555558b6e06 in strbuf_grow strbuf.c:112:2 #3 0x5555558b7550 in strbuf_add strbuf.c:311:2 #4 0x5555557c1a88 in strbuf_addstr strbuf.h:310:2 #5 0x5555557c1d4c in parse_list_objects_filter list-objects-filter-options.c:261:3 #6 0x555555885ead in handle_revision_pseudo_opt revision.c:2899:3 #7 0x555555884e20 in setup_revisions revision.c:3014:11 #8 0x5555556c4b42 in cmd_rev_list builtin/rev-list.c:588:9 #9 0x5555555ec5e3 in run_builtin git.c:483:11 #10 0x5555555eb1e4 in handle_builtin git.c:749:13 #11 0x5555555ec001 in run_argv git.c:819:4 #12 0x5555555eaf94 in cmd_main git.c:954:19 #13 0x5555556fd569 in main common-main.c:64:11 #14 0x7ffff7ca714d in __libc_start_call_main (.../lib/libc.so.6+0x2a14d) #15 0x7ffff7ca7208 in __libc_start_main@GLIBC_2.2.5 (.../libc.so.6+0x2a208) #16 0x5555555ad064 in _start (git+0x59064) This leak is valid, as we call `die()` and do not clean up the memory at all. But what's curious is that this is the only leak reported, because we don't clean up any other allocated memory, either, and I have no idea why the leak sanitizer treats this buffer specially. In any case, we can work around the leak by shuffling things around a bit. Instead of calling `gently_parse_list_objects_filter()` and dying after we have modified the filter spec, we simply do so beforehand. Like this we don't allocate the buffer in the error case, which makes the reported leak go away. It's not pretty, but it manages to make t6112 leak free. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:57 -08:00
Patrick Steinhardt	ff67083ccd	builtin/merge: release output buffer after performing merge The `obuf` member of `struct merge_options` is used to buffer output in some cases. In order to not discard its allocated memory we only release its contents in `merge_finalize()` when we're not currently recursing into a subtree. This results in some situations where we seemingly do not release the buffer reliably. We thus have calls to `strbuf_release()` for this buffer scattered across the codebase. But we're missing one callsite in git-merge(1), which causes a memory leak. We should ideally refactor this interface so that callers don't have to know about any such internals. But for now, paper over the issue by adding one more `strbuf_release()` call. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:57 -08:00
Patrick Steinhardt	813b12b6f7	dir: fix leak when parsing "status.showUntrackedFiles" We use `repo_config_get_string()` to read "status.showUntrackedFiles" from the config subsystem. This function allocates the result, but we never free the result after parsing it. The value never leaves the scope of the calling function, so refactor it to instead use `repo_config_get_string_tmp()`, which does not hand over ownership to the caller. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:56 -08:00
Patrick Steinhardt	0bc0fcf0b2	t/helper: fix leaking buffer in "dump-untracked-cache" We never release the local `struct strbuf base` buffer, thus leaking memory. Fix this leak. This leak is exposed by t7063, but plugging it alone does not make the whole test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:56 -08:00
Patrick Steinhardt	a53144cf1b	t/helper: stop re-initialization of `the_repository` While "common-main.c" already initializes `the_repository` for us, we do so a second time in the "read-cache" test helper. This causes a memory leak because the old repository's contents isn't released. Stop calling `initialize_repository()` to plug this leak. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:56 -08:00
Patrick Steinhardt	1981d1eb3e	combine-diff: fix leaking lost lines The `cnt` variable tracks the number of lines in a patch diff. It can happen though that there are no newlines, in which case we'd still end up allocating our array of `sline`s. In fact, we always allocate it with `cnt + 2` entries: one extra entry for the deletion hunk at the end, and another entry that we don't seem to ever populate at all but acts as a kind of sentinel value. When we loop through the array to clear it at the end of this function we only loop until `lno < cnt`, and thus we may not end up releasing whatever the two extra `sline`s contain. While that shouldn't matter for the sentinel value, it does matter for the extra deletion hunk sline. Regardless of that, plug this memory leak by releasing both extra entries, which makes the logic a bit easier to reason about. While at it, fix the formatting of a local comment, which incidentally also provides the necessary context for why we overallocate the `sline` array. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:55 -08:00
Patrick Steinhardt	d06e3ec858	builtin/tag: fix leaking key ID on failure to sign We do not free the key ID when signing a tag fails. Do so by using the common exit path. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:55 -08:00
Patrick Steinhardt	1a99173de0	transport-helper: fix leaking import/export marks Fix leaking import and export marks for transport helpers. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:54 -08:00
Patrick Steinhardt	6ef9f77a15	builtin/commit: fix leaking cleanup config The cleanup string set by the config is leaking when it is being overridden by an option. Fix this by tracking these via two separate variables such that we can free the old value. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:54 -08:00
Patrick Steinhardt	ff31b7b941	trailer: fix leaking strbufs when formatting trailers When formatting trailer lines we iterate through each of the trailers and munge their respective token/value pairs according to the trailer options. When formatting a trailer that has its `item->token` pointer set we perform the munging in two local buffers. In the case where we figure out that the value is empty and `trim_empty` is set we just skip over the trailer item. But the buffers are local to the loop and we don't release their contents, leading to a memory leak. Plug this leak by lifting the buffers outside of the loop and releasing them on function return. This fixes the memory leaks, but also optimizes the loop as we don't have to reallocate the buffers on every single iteration. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:54 -08:00
Patrick Steinhardt	d34b5cbf02	builtin/commit: fix leaking change data contents While we free the worktree change data, we never free its contents. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:53 -08:00
Patrick Steinhardt	3b373150c8	upload-pack: fix leaking URI protocols We don't clear `struct upload_pack::uri_protocols`, which causes a memory leak. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:53 -08:00
Patrick Steinhardt	0b20a28811	pretty: clear signature check The signature check in the formatting context is never getting released. Fix this to plug the resulting memory leak. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:53 -08:00
Patrick Steinhardt	8dd3cb4b45	diff-lib: fix leaking diffopts in `do_diff_cache()` In `do_diff_cache()` we initialize a new `rev_info` and then overwrite its `diffopt` with a user-provided set of options. This can leak memory because `repo_init_revisions()` may end up allocating memory for the `diffopt` itself depending on the configuration. And since that field is overwritten we won't ever free it. Plug the memory leak by releasing the diffopts before we overwrite them. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:52 -08:00
Patrick Steinhardt	e29ff075e0	revision: fix leaking bloom filters The memory allocated by `prepare_to_use_bloom_filter()` is not released by `release_revisions()`, causing a memory leak. Plug it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:52 -08:00
Patrick Steinhardt	43fedde3df	builtin/grep: fix leak with `--max-count=0` When executing with `--max-count=0` we'll return early from git-grep(1) without performing any cleanup, which causes memory leaks. Plug these. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:52 -08:00
Patrick Steinhardt	ee3e8c3afa	t/helper: fix leaks in "reach" test tool The "reach" test tool doesn't bother to clean up any of its allocated resources, causing various leaks. Plug them. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:51 -08:00
Elijah Newren	5a875ff7fb	upload-pack: fix ambiguous error message upload-pack.c takes any --shallow-exclude argument(s) from clone/fetch/etc. and passes them through expand_ref(). If it does not get back exactly one ref from the call to expand_ref(), it will die with the following error: fatal: git upload-pack: ambiguous deepen-not: %s Given that the documentation suggests to users that --shallow-exclude accepts a revision rather than a ref (which will be corrected in a subsequent commit), users may try to pass a revision. In such a case, expand_ref() will return 0 matches, but the error message we print will be misleading since "ambiguous" suggests there are multiple matches. Provide a clearer error message for such a case. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 17:20:21 -08:00
Andrew Kreimer	a1fb77fcb8	t1016: clean up style Adhere to Documentation/CodingGuidelines: - Whitespace and redirect operator. - Case arms indentation. - Tabs for indentation. Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-03 15:30:04 -08:00
Taylor Blau	1c5a712f26	Merge branch 'jk/dumb-http-finalize' The dumb-http code regressed when the result of re-indexing a pack yielded an .idx file that differs in content from the .idx file it downloaded from the remote. This has been corrected by no longer relying on the .idx file we got from the remote. jk/dumb-http-finalize: packfile: use oidread() instead of hashcpy() to fill object_id packfile: use object_id in find_pack_entry_one() packfile: convert find_sha1_pack() to use object_id http-walker: use object_id instead of bare hash packfile: warn people away from parse_packed_git() packfile: drop sha1_pack_index_name() packfile: drop sha1_pack_name() packfile: drop has_pack_index() dumb-http: store downloaded pack idx as tempfile t5550: count fetches in "previously-fetched .idx" test midx: avoid duplicate packed_git entries	2024-11-01 12:53:32 -04:00
Taylor Blau	aebc4bd8ce	Merge branch 'ak/more-typofixes' More typofixes. * ak/more-typofixes: t: fix typos	2024-11-01 12:53:29 -04:00
Taylor Blau	43ac23945c	Merge branch 'rs/grep-lookahead' Fix 'git grep' regression on macOS by disabling lookahead when encountering invalid UTF-8 byte sequences. * rs/grep-lookahead: grep: disable lookahead on error	2024-11-01 12:53:28 -04:00
Taylor Blau	81a5461518	Merge branch 'ak/t1016-cleanup' Test cleanup. * ak/t1016-cleanup: t1016: clean up style	2024-11-01 12:53:27 -04:00
Taylor Blau	59dc0ab83c	Merge branch 'ua/atoi' Replace various calls to atoi() with strtol_i() and strtoul_ui(), and add improved error handling. * ua/atoi: imap: replace atoi() with strtol_i() for UIDVALIDITY and UIDNEXT parsing merge: replace atoi() with strtol_i() for marker size validation daemon: replace atoi() with strtoul_ui() and strtol_i()	2024-11-01 12:53:26 -04:00
Taylor Blau	20ab7fa3b6	Merge branch 'sa/notes-edit' Teach 'git notes add' and 'git notes append' a new '-e' flag, instructing them to open the note in $GIT_EDITOR before saving. * sa/notes-edit: notes: teach the -e option to edit messages in editor	2024-11-01 12:53:25 -04:00
Taylor Blau	8237b49ade	Merge branch 'sk/t9101-cleanup' Test cleanup. * sk/t9101-cleanup: t9101: ensure no whitespace after redirect	2024-11-01 12:53:24 -04:00
Taylor Blau	07c6066f82	Merge branch 'kh/mv-breakage' Demonstrate an assertion failure in 'git mv'. * kh/mv-breakage: t7001: add failure test which triggers assertion	2024-11-01 12:53:21 -04:00
Taylor Blau	a524cc77ad	Merge branch 'ua/t3404-cleanup' Test update. * ua/t3404-cleanup: t3404: replace test with test_line_count() t3404: avoid losing exit status with focus on `git show` and `git cat-file`	2024-11-01 12:53:18 -04:00
Taylor Blau	268fd2fe58	Merge branch 'ps/platform-compat-fixes' Various platform compatibility fixes split out of the larger effort to use Meson as the primary build tool. * ps/platform-compat-fixes: t6006: fix prereq handling with `test_format ()` http: fix build error on FreeBSD builtin/credential-cache: fix missing parameter for stub function t7300: work around platform-specific behaviour with long paths on MinGW t5500, t5601: skip tests which exercise paths with '[::1]' on Cygwin t3404: work around platform-specific behaviour on macOS 10.15 t1401: make invocation of tar(1) work with Win32-provided one t/lib-gpg: fix setup of GNUPGHOME in MinGW t/lib-gitweb: test against the build version of gitweb t/test-lib: wire up NO_ICONV prerequisite t/test-lib: fix quoting of TEST_RESULTS_SAN_FILE	2024-11-01 12:53:17 -04:00
Jeff King	16a186fede	rev-list: skip bitmap traversal for --left-right Running: git rev-list --left-right --use-bitmap-index one...two will produce output without any left-right markers, since the bitmap traversal returns only a single set of reachable commits. Instead we should refuse to use bitmaps here and produce the correct output using a traditional traversal. This is probably not the only remaining option that misbehaves with bitmaps, but it's particularly egregious in that it feels like it _could_ work. Doing two separate traversals for the left/right sides and then taking the symmetric set differences should yield the correct answer, but our traversal code doesn't know how to do that. It's not clear if naively doing two separate traversals would always be a performance win. A traditional traversal only needs to walk down to the merge base, but bitmaps always fill out the full reachability set. So depending on your bitmap coverage, we could end up walking old bits of history twice to fill out the same uninteresting bits on both sides. We'd also of course end up with a very large --boundary set, if the user asked for that. So this might or might not be something worth implementing later. But for now, let's make sure we don't produce the wrong answer if somebody tries it. The test covers this, but also the same thing with "--count" (which is what I originally tried in a real-world case). Ironically the try_bitmap_count() code already realizes that "--left-right" won't work there. But that just causes us to fall back to the regular bitmap traversal code, which itself doesn't handle counting (we produce a list of objects rather than a count). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-11-01 11:02:27 -04:00
Taylor Blau	6aa984af3b	Merge branch 'sk/t7011-cleanup' Test cleanup. * sk/t7011-cleanup: t7011: ensure no whitespace after redirect	2024-10-30 13:08:07 -04:00
Taylor Blau	8d305fdaac	Merge branch 'co/t6050-pipefix' Avoid losing exit status by having Git command being tested on the upstream side of a pipe. * co/t6050-pipefix: t6050: avoid pipes with upstream Git commands	2024-10-30 13:08:06 -04:00
Taylor Blau	cec4461e3f	Merge branch 'ks/t4205-fixup' Testfix. * ks/t4205-fixup: t4205: fix typo in 'NUL termination with --stat'	2024-10-30 13:08:05 -04:00
Taylor Blau	bc627658b0	Merge branch 'ps/reftable-strbuf' Implements a new reftable-specific strbuf replacement to reduce reftable's dependency on Git-specific data structures. * ps/reftable-strbuf: reftable: handle trivial `reftable_buf` errors reftable/stack: adapt `stack_filename()` to handle allocation failures reftable/record: adapt `reftable_record_key()` to handle allocation failures reftable/stack: adapt `format_name()` to handle allocation failures t/unit-tests: check for `reftable_buf` allocation errors reftable/blocksource: adapt interface name reftable: convert from `strbuf` to `reftable_buf` reftable/basics: provide new `reftable_buf` interface reftable: stop using `strbuf_addf()` reftable: stop using `strbuf_addbuf()`	2024-10-30 13:08:01 -04:00
Patrick Steinhardt	dd6003f200	t6006: fix prereq handling with `test_format ()` In `df383b5842` (t/test-lib: wire up NO_ICONV prerequisite, 2024-10-16) we have introduced a new NO_ICONV prerequisite that makes us skip tests in case Git is not compiled with support for iconv. This change subtly broke t6006: while the test suite still passes, some of its tests won't execute because they run into an error. ./t6006-rev-list-format.sh: line 92: test_expect_%e: command not found The broken tests use `test_format ()`, and the mentioned commit simply prepended the new prerequisite to its arguments. But that does not work, as the function is not aware of prereqs at all and will now treat all of its arguments incorrectly. Fix this by making the function aware of prereqs by accepting an optional fourth argument. Adapt the callsites accordingly. Reported-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-28 13:44:38 -04:00
Jeff King	479ab76c9f	packfile: use object_id in find_pack_entry_one() The main function we use to search a pack index for an object is find_pack_entry_one(). That function still takes a bare pointer to the hash, despite the fact that its underlying bsearch_pack() function needs an object_id struct. And so we end up making an extra copy of the hash into the struct just to do a lookup. As it turns out, all callers but one already have such an object_id. So we can just take a pointer to that struct and use it directly. This avoids the extra copy and provides a more type-safe interface. The one exception is get_delta_base() in packfile.c, when we are chasing a REF_DELTA from inside the pack (and thus we have a pointer directly to the mmap'd pack memory, not a struct). We can just bump the hashcpy() from inside find_pack_entry_one() to this one caller that needs it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-25 17:35:46 -04:00
Jeff King	63aca3f7f1	dumb-http: store downloaded pack idx as tempfile This patch fixes a regression in `b1b8dfde69` (finalize_object_file(): implement collision check, 2024-09-26) where fetching a v1 pack idx file over the dumb-http protocol would cause the fetch to fail. The core of the issue is that dumb-http stores the idx we fetch from the remote at the same path that will eventually hold the idx we generate from "index-pack --stdin". The sequence is something like this: 0. We realize we need some object X, which we don't have locally, and nor does the other side have it as a loose object. 1. We download the list of remote packs from objects/info/packs. 2. For each entry in that file, we download each pack index and store it locally in .git/objects/pack/pack-$hash.idx (the $hash is not something we can verify yet and is given to us by the remote). 3. We check each pack index we got to see if it has object X. When we find a match, we download the matching .pack file from the remote to a tempfile. We feed that to "index-pack --stdin", which reindexes the pack, rather than trusting that it has what the other side claims it does. In most cases, this will end up generating the exact same (byte-for-byte) pack index which we'll store at the same pack-$hash.idx path, because the index generation and $hash id are computed based on what's in the packfile. But: a. The other side might have used other options to generate the index. For instance we use index v2 by default, but long ago it was v1 (and you can still ask for v1 explicitly). b. The other side might even use a different mechanism to determine $hash. E.g., long ago it was based on the sorted list of objects in the packfile, but we switched to using the pack checksum in `1190a1acf8` (pack-objects: name pack files after trailer hash, 2013-12-05). The regression we saw in the real world was (3a). A recent client fetching from a server with a v1 index downloaded that index, then complained about trying to overwrite it with its own v2 index. This collision is otherwise harmless; we know we want to replace the remote version with our local one, but the collision check doesn't realize that. There are a few options to fix it: - we could teach index-pack a command-line option to ignore only pack idx collisions, and use it when the dumb-http code invokes index-pack. This would be an awkward thing to expose users to and would involve a lot of boilerplate to get the option down to the collision code. - we could delete the remote .idx file right before running index-pack. It should be redundant at that point (since we've just downloaded the matching pack). But it feels risky to delete something from our own .git/objects based on what the other side has said. I'm not entirely positive that a malicious server couldn't lie about which pack-$hash.idx it has and get us to delete something precious. - we can stop co-mingling the downloaded idx files in our local objects directory. This is a slightly bigger change but I think fixes the root of the problem more directly. This patch implements the third option. The big design questions are: where do we store the downloaded files, and how do we manage their lifetimes? There are some additional quirks to the dumb-http system we should consider. Remember that in step 2 we downloaded every pack index, but in step 3 we may only download some of the matching packs. What happens to those other idx files now? They sit in the .git/objects/pack directory, possibly waiting to be used at a later date. That may save bandwidth for a subsequent fetch, but it also creates a lot of weird corner cases: - our local object directory now has semi-untrusted .idx files sitting around, without their matching .pack - in case 3b, we noted that we might not generate the same hash as the other side. In that case even if we download the matching pack, our index-pack invocation will store it in a different pack-$hash.idx file. And the unmatched .idx will sit there forever. - if the server repacks, it may delete the old packs. Now we have these orphaned .idx files sitting around locally that will never be used (nor deleted). - if we repack locally we may delete our local version of the server's pack index and not realize we have it. So we'll download it again, even though we have all of the objects it mentions. I think the right solution here is probably some more complex cache management system: download the remote .idx files to their own storage directory, mark them as "seen" when we get their matching pack (to avoid re-downloading even if we repack), and then delete them when the server's objects/info/refs no longer mentions them. But since the dumb http protocol is so ancient and so inferior to the smart http protocol, I don't think it's worth spending a lot of time creating such a system. For this patch I'm just downloading the idx files to .git/objects/tmp_pack_*, and marking them as tempfiles to be deleted when we exit (and due to the name, any we miss due to a crash, etc, should eventually be removed by "git gc" runs based on timestamps). That is slightly worse for one case: if we download an idx but not the matching pack, we won't retain that idx for subsequent runs. But the flip side is that we're making other cases better (we never hold on to useless idx files forever). I suspect that worse case does not even come up often, since it implies that the packs are generated to match distinct parts of history (i.e., in practice even in a repo with many packs you're going to end up grabbing all of those packs to do a clone). If somebody really cares about that, I think the right path forward is a managed cache directory as above, and this patch is providing the first step in that direction anyway (by moving things out of the objects/pack/ directory). There are two test changes. One demonstrates the broken v1 index case (it double-checks the resulting clone with fsck to be careful, but prior to this patch it actually fails at the clone step). The other tweaks the expectation for a test that covers the "slightly worse" case to accommodate the extra index download. The code changes are fairly simple. We stop using finalize_object_file() to copy the remote's index file into place, and leave it as a tempfile. We give the tempfile a real ".idx" name, since the packfile code expects that, and thus we make sure it is out of the usual packs/ directory (so we'd never mistake it for a real local .idx). We also have to change parse_pack_index(), which creates a temporary packed_git to access our index (we need this because all of the pack idx code assumes we have that struct). It reads the index data from the tempfile, but prior to this patch would speculatively write the finalized name into the packed_git struct using the pack-$hash we expect to use. I was mildly surprised that this worked at all, since we call verify_pack_index() on the packed_git which mentions the final name before moving the file into place! But it works because parse_pack_index() leaves the mmap-ed data in the struct, so the lazy-open in verify_pack_index() never triggers, and we read from the tempfile, ignoring the filename in the struct completely. Hacky, but it works. After this patch, parse_pack_index() now uses the index filename we pass in to derive a matching .pack name. This is OK to change because there are only two callers, both in the dumb http code (and the other passes in an existing pack-$hash.idx name, so the derived name is going to be pack-$hash.pack, which is what we were using anyway). I'll follow up with some more cleanups in that area, but this patch is sufficient to fix the regression. Reported-by: fox <fox.gbr@townlong-yak.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-25 17:35:46 -04:00
Jeff King	019b21d402	t5550: count fetches in "previously-fetched .idx" test We have a test in t5550 that looks at index fetching over dumb http. It creates two branches, each of which is completely stored in its own pack, then fetches the branches independently. What should (and does) happen is that the first fetch grabs both .idx files and one .pack file, and then the fetch of the second branch re-uses the previously downloaded .idx files (fetching none) and grabs the now-required .pack file. Since the next few patches will be touching this area of the code, let's beef up the test a little by checking that we're downloading the expected items at each step. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-25 17:35:46 -04:00
Jeff King	8b5763e8fa	midx: avoid duplicate packed_git entries When we scan a pack directory to load the idx entries we find into the packed_git list, we skip any of them that are contained in a midx. We then load them later lazily if we actually need to access the corresponding pack, referencing them both from the midx struct and the packed_git list. The lazy-load in the midx code checks to see if the midx already mentions the pack, but doesn't otherwise check the packed_git list. This makes sense, since we should have added any pack to both lists. But there's a loophole! If we call close_object_store(), that frees the midx entirely, but _not_ the packed_git structs, which we must keep around for Reasons[1]. If we then try to look up more objects, we'll auto-load the midx again, which won't realize that we've already loaded those packs, and will create duplicate entries in the packed_git list. This is possibly inefficient, because it means we may open and map the pack redundantly. But it can also lead to weird user-visible behavior. The case I found is in "git repack", which closes and reopens the midx after repacking and then calls update_server_info(). We end up writing the duplicate entries into objects/info/packs. We could obviously de-dup them while writing that file, but it seems like a violation of more core assumptions that we end up with these duplicate structs at all. We can avoid the duplicates reasonably efficiently by checking their names in the pack_map hash. This annoyingly does require a little more than a straight hash lookup due to the naming conventions, but it should only happen when we are about to actually open a pack. I don't think one extra malloc will be noticeable there. [1] I'm not entirely sure of all the details, except that we generally assume the packed_git structs never go away. We noted this restriction in the comment added by `6f1e9394e2` (object: fix leaking packfiles when closing object store, 2024-08-08), but it's somewhat vague. At any rate, if you try freeing the structs in close_object_store(), you can observe segfaults all over the test suite. So it might be fixable, but it's not trivial. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-25 17:35:46 -04:00
Taylor Blau	55d12c24d7	Merge branch 'wm/shortlog-hash' Teaches 'shortlog' to explicitly use SHA-1 when operating outside of a repository. * wm/shortlog-hash: builtin/shortlog: explicitly set hash algo when there is no repo	2024-10-25 14:02:49 -04:00
Taylor Blau	448022a7fb	Merge branch 'bf/t-readme-mention-reftable' Doc update. * bf/t-readme-mention-reftable: t/README: add missing value for GIT_TEST_DEFAULT_REF_FORMAT	2024-10-25 14:02:21 -04:00

1 2 3 4 5 ...

23050 Commits