global/git

mirror of https://github.com/git/git.git synced 2026-04-12 18:00:07 +02:00

Author	SHA1	Message	Date
Junio C Hamano	ca39da6997	Merge branch 'ps/meson-contrib-bits' Update meson-based build procedure to cover contrib/ and other places as well. * ps/meson-contrib-bits: ci: exercise credential helpers ci: fix propagating UTF-8 test locale in musl-based Meson job meson: wire up static analysis via Coccinelle meson: wire up git-contacts(1) meson: wire up credential helpers contrib/credential: fix compilation of "osxkeychain" helper contrib/credential: fix compiling "libsecret" helper contrib/credential: fix compilation of wincred helper with MSVC contrib/credential: fix "netrc" tests with out-of-tree builds GIT-BUILD-OPTIONS: propagate project's source directory	2025-03-03 08:53:03 -08:00
Junio C Hamano	ab09eddf60	Merge branch 'ps/build-meson-fixes-0130' Assorted fixes and improvements to the build procedure based on meson. * ps/build-meson-fixes-0130: gitlab-ci: restrict maximum number of link jobs on Windows meson: consistently use custom program paths to resolve programs meson: fix overwritten `git` variable meson: prevent finding sed(1) in a loop meson: improve handling of `sane_tool_path` option meson: improve PATH handling meson: drop separate version library meson: stop linking libcurl into all executables meson: introduce `libgit_curl` dependency meson: simplify use of the common-main library meson: inline the static 'git' library meson: fix OpenSSL fallback when not explicitly required meson: fix exec path with enabled runtime prefix	2025-03-03 08:53:02 -08:00
Junio C Hamano	1aabec0b48	Merge branch 'dk/test-aggregate-results-paste-fix' The use of "paste" command for aggregating the test results have been corrected. * dk/test-aggregate-results-paste-fix: t/aggregate-results: fix paste(1) invocation	2025-03-03 08:53:01 -08:00
Junio C Hamano	68c3be61fc	Merge branch 'bc/http-push-auth-netrc-fix' The netrc support (via the cURL library) for the HTTP transport has been re-enabled. * bc/http-push-auth-netrc-fix: http: allow using netrc for WebDAV-based HTTP protocol	2025-02-27 15:23:01 -08:00
Junio C Hamano	c51a0b47c9	Merge branch 'pw/rebase-i-ff-empty-commit' "git rebase -i" failed to allow rewording an empty commit that has been fast-forwarded. * pw/rebase-i-ff-empty-commit: rebase -i: reword empty commit after fast-forward	2025-02-27 15:23:00 -08:00
Junio C Hamano	3c0f4abaf5	Merge branch 'kn/ref-migrate-skip-reflog' "git refs migrate" can optionally be told not to migrate the reflog. * kn/ref-migrate-skip-reflog: builtin/refs: add '--no-reflog' flag to drop reflogs	2025-02-27 15:23:00 -08:00
Junio C Hamano	9d8cce051a	Merge branch 'ua/os-version-capability' The value of "uname -s" is by default sent over the wire as a part of the "version" capability. * ua/os-version-capability: agent: advertise OS name via agent capability t5701: add setup test to remove side-effect dependency version: extend get_uname_info() to hide system details version: refactor get_uname_info() version: refactor redact_non_printables() version: replace manual ASCII checks with isprint() for clarity	2025-02-27 15:23:00 -08:00
Patrick Steinhardt	ebb35369f1	meson: simplify use of the common-main library The "common-main.c" file is used by multiple executables. In order to make it easy to set it up we have created a separate library that these executables can link against. All of these executables also want to link against `libgit.a` though, which makes it necessary to specify both of these as dependencies for every executable. Simplify this a bit by declaring the library as a source dependency: instead of creating a static library, we now instead compile the common set of files into each executable separately. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-26 09:09:35 -08:00
Junio C Hamano	e24570b0a3	Merge branch 'jk/check-mailmap-wo-name-fix' "git check-mailmap" segfault fix. * jk/check-mailmap-wo-name-fix: mailmap: fix check-mailmap with full mailmap line	2025-02-26 08:51:00 -08:00
Junio C Hamano	092180990d	Merge branch 'ad/set-default-target-in-makefiles' Correct the default target in Documentation/Makefile, and future-proof all Makefiles from similar breakages by declaring the default target (which happens to be "all") upfront. * ad/set-default-target-in-makefiles: Makefile: set default goals in makefiles	2025-02-25 14:19:36 -08:00
Junio C Hamano	a8a5bb1f78	Merge branch 'bc/diff-reject-empty-arg-to-pickaxe' The -G/-S options to the "diff" family of commands caused us to hit a BUG() when they get no values; they have been corrected. * bc/diff-reject-empty-arg-to-pickaxe: diff: don't crash with empty argument to -G or -S	2025-02-25 14:19:35 -08:00
D. Ben Knoble	ce98863204	t/aggregate-results: fix paste(1) invocation When running `make test`, when missing prereqs the following is emitted: make aggregate-results usage: paste [-s] [-d delimiters] file ... fixed 1 success 30066 failed 0 broken 218 total 31274 POSIX says that `paste` requires a file operand; stdin was clearly intended by `49da404070` (test-lib: show missing prereq summary, 2021-11-20). Use it. Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-24 12:24:16 -08:00
brian m. carlson	3306edb380	http: allow using netrc for WebDAV-based HTTP protocol For an extended period of time, we've enabled libcurl's netrc functionality, which will read credentials from the netrc file if none are provided. Unfortunately, we have also not documented this fact or written any tests for it, but people have come to rely on it. In `610cbc1dfb` ("http: allow authenticating proactively", 2024-07-10), we accidentally broke the ability of users to use the netrc file for the WebDAV-based HTTP protocol. Notably, it works on the initial request but does not work on subsequent requests, which causes failures because that version of the protocol will necessarily make multiple requests. This happens because curl_empty_auth_enabled never returns -1, only 0 or 1, and so if http.proactiveAuth is not enabled, the username and password are always set to empty credentials, which prevents libcurl's fallback to netrc from working. However, in other cases, the server continues to get a 401 response and the credential helper is invoked, which is the normal behavior, so this was not noticed earlier. To fix this, change the condition to check for enabling empty auth and also not having proactive auth enabled, which should result in the username and password not being set to a single colon in the typical case, and thus the netrc file being used. Reported-by: Peter Georg <peter.georg@physik.uni-regensburg.de> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-24 08:49:10 -08:00
Jacob Keller	bb60c52131	mailmap: fix check-mailmap with full mailmap line I recently had reported to me a crash from a coworker using the recently added sendemail mailmap support: 3724814 Segmentation fault (core dumped) git check-mailmap "bugs@company.xx" This appears to happen because of the NULL pointer name passed into map_user(). Fix this by passing "" instead of NULL so that we have a valid pointer. Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-21 18:27:16 -08:00
Karthik Nayak	89be7d2774	builtin/refs: add '--no-reflog' flag to drop reflogs The "git refs migrate" subcommand converts the backend used for ref storage. It always migrates reflog data as well as refs. Introduce an option to exclude reflogs from migration, allowing them to be discarded when they are unnecessary. This is particularly useful in server-side repositories, where reflogs are typically not expected. However, some repositories may still have them due to historical reasons, such as bugs, misconfigurations, or administrative decisions to enable reflogs for debugging. In such repositories, it would be optimal to drop reflogs during the migration. To address this, introduce the '--no-reflog' flag, which prevents reflog migration. When this flag is used, reflogs from the original reference backend are migrated. Since only the new reference backend remains in the repository, all previous reflogs are permanently discarded. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-21 09:55:02 -08:00
Usman Akinyemi	cf7ee48190	agent: advertise OS name via agent capability As some issues that can happen with a Git client can be operating system specific, it can be useful for a server to know which OS a client is using. In the same way it can be useful for a client to know which OS a server is using. Our current agent capability is in the form of "package/version" (e.g., "git/1.8.3.1"). Let's extend it to include the operating system name (os) i.e in the form "package/version-os" (e.g., "git/1.8.3.1-Linux"). Including OS details in the agent capability simplifies implementation, maintains backward compatibility, avoids introducing a new capability, encourages adoption across Git-compatible software, and enhances debugging by providing complete environment information without affecting functionality. The operating system name is retrieved using the 'sysname' field of the `uname(2)` system call or its equivalent. However, there are differences between `uname(1)` (command-line utility) and `uname(2)` (system call) outputs on Windows. These discrepancies complicate testing on Windows platforms. For example: - `uname(1)` output: MINGW64_NT-10.0-20348.3.4.10-87d57229.x86_64\ .2024-02-14.20:17.UTC.x86_64 - `uname(2)` output: Windows.10.0.20348 On Windows, uname(2) is not actually system-supplied but is instead already faked up by Git itself. We could have overcome the test issue on Windows by implementing a new `uname` subcommand in `test-tool` using uname(2), but except uname(2), which would be tested against itself, there would be nothing platform specific, so it's just simpler to disable the tests on Windows. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-19 09:48:37 -08:00
Junio C Hamano	5dd710cb62	Merge branch 'lo/t7603-path-is-file-update' Test clean-up. * lo/t7603-path-is-file-update: t7603: replace test -f by test_path_is_file	2025-02-18 15:30:33 -08:00
Junio C Hamano	7722b997c6	Merge branch 'jt/rev-list-missing-print-info' "git rev-list --missing=" learned to accept "print-info" that gives known details expected of the missing objects, like path and type. * jt/rev-list-missing-print-info: rev-list: extend print-info to print missing object type rev-list: add print-info action to print missing object path	2025-02-18 15:30:32 -08:00
Junio C Hamano	345aaf3976	Merge branch 'ps/send-pack-unhide-error-in-atomic-push' "git push --atomic --porcelain" used to ignore failures from the other side, losing the error status from the child process, which has been corrected. * ps/send-pack-unhide-error-in-atomic-push: send-pack: gracefully close the connection for atomic push t5543: atomic push reports exit code failure send-pack: new return code "ERROR_SEND_PACK_BAD_REF_STATUS" t5548: add porcelain push test cases for dry-run mode t5548: add new porcelain test cases t5548: refactor test cases by resetting upstream t5548: refactor to reuse setup_upstream() function t5504: modernize test by moving heredocs into test bodies	2025-02-18 15:30:32 -08:00
Junio C Hamano	e565f37553	Merge branch 'ds/backfill' Lazy-loading missing files in a blobless clone on demand is costly as it tends to be one-blob-at-a-time. "git backfill" is introduced to help bulk-download necessary files beforehand. * ds/backfill: backfill: assume --sparse when sparse-checkout is enabled backfill: add --sparse option backfill: add --min-batch-size=<n> option backfill: basic functionality and tests backfill: add builtin boilerplate	2025-02-18 15:30:31 -08:00
Patrick Steinhardt	c5823641a6	GIT-BUILD-OPTIONS: propagate project's source directory A couple of our tests require knowledge around where to find the project's source directory in order to locate files required for the test itself. Until now we have been wiring these up ad-hoc via new, specialized variables catered to the specific usecase. This is quite awkward though, as every test that potentially needs to locate paths relative to the source directory needs to grow another variable. Introduce a new "GIT_SOURCE_DIR" variable into GIT-BUILD-OPTIONS to stop this proliferation. Remove existing variables that can be derived from it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-18 11:40:02 -08:00
brian m. carlson	a620046b29	diff: don't crash with empty argument to -G or -S The pickaxe options, -G and -S, need either a regex or a string to look through the history for. An empty value isn't very useful since it would either match everything or nothing, and what's worse, we presently crash with a BUG like so when the user provides one: BUG: diffcore-pickaxe.c:241: should have needle under -G or -S Since it's not very nice of us to crash and this wouldn't do anything useful anyway, let's simply inform the user that they must provide a non-empty argument and exit with an error if they provide an empty one instead. Reported-by: Jared Van Bortel <cebtenzzre@gmail.com> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-18 10:17:02 -08:00
Usman Akinyemi	15ff206863	t5701: add setup test to remove side-effect dependency Currently, the "test capability advertisement" test creates some files with expected content which are used by other tests below it. To remove that side-effect from this test, let's split up part of it into a "setup"-type test which creates the files with expected content which gets reused by multiple tests. This will be useful in a following commit. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-18 09:05:13 -08:00
Adam Dinwoodie	5309c1e9fb	Makefile: set default goals in makefiles Explicitly set the default goal at the very top of various makefiles. This is already present in some makefiles, but not all of them. In particular, this corrects a regression introduced in `a38edab7c8` (Makefile: generate doc versions via GIT-VERSION-GEN, 2024-12-06). That commit added some config files as build targets for the Documentation directory, and put the target configuration in a sensible place. Unfortunately, that sensible place was above any other build target definitions, meaning the default goal changed to being those configuration files only, rather than the HTML and man page documentation. Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org> Helped-by: Junio C Hamano <gitster@pobox.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-18 09:02:26 -08:00
Junio C Hamano	82522a9e2c	Merge branch 'kn/reflog-migration-fix-followup' Code clean-up. * kn/reflog-migration-fix-followup: reftable: prevent 'update_index' changes after adding records refs: use 'uint64_t' for 'ref_update.index' refs: mark `ref_transaction_update_reflog()` as static	2025-02-14 17:53:48 -08:00
Junio C Hamano	c3fffcfe8e	Merge branch 'bf/fetch-set-head-fix' Fetching into a bare repository incorrectly assumed it always used a mirror layout when deciding to update remote-tracking HEAD, which has been corrected. * bf/fetch-set-head-fix: fetch set_head: fix non-mirror remotes in bare repositories fetch set_head: refactor to use remote directly	2025-02-14 17:53:48 -08:00
Junio C Hamano	09e74b06ea	Merge branch 'op/worktree-is-main-bare-fix' Going into a secondary worktree and asking "is the main worktree bare?" did not work correctly when per-worktree configuration option was in use, which has been corrected. * op/worktree-is-main-bare-fix: worktree: detect from secondary worktree if main worktree is bare	2025-02-14 17:53:48 -08:00
Junio C Hamano	5785d9143b	Merge branch 'tc/clone-single-revision' "git clone" learned to make a shallow clone for a single commit that is not necessarily be at the tip of any branch. * tc/clone-single-revision: builtin/clone: teach git-clone(1) the --revision= option parse-options: introduce die_for_incompatible_opt2() clone: introduce struct clone_opts in builtin/clone.c clone: add tags refspec earlier to fetch refspec clone: refactor wanted_peer_refs() clone: make it possible to specify --tags clone: cut down on global variables in clone.c	2025-02-14 17:53:48 -08:00
Junio C Hamano	2d7a874493	Merge branch 'da/help-autocorrect-one-fix' "git -c help.autocorrect=0 psuh" shows the suggested typofix, unlike the previous attempt in the base topic. * da/help-autocorrect-one-fix: help: add "show" as a valid configuration value help: show the suggested command when help.autocorrect is false	2025-02-12 10:08:55 -08:00
Junio C Hamano	5b9d01bc4d	Merge branch 'zh/gc-expire-to' "git gc" learned the "--expire-to" option and passes it down to underlying "git repack". * zh/gc-expire-to: gc: add `--expire-to` option	2025-02-12 10:08:53 -08:00
Junio C Hamano	a4af0b6288	Merge branch 'js/libgit-rust' Foreign language interface for Rust into our code base has been added. * js/libgit-rust: libgit: add higher-level libgit crate libgit-sys: also export some config_set functions libgit-sys: introduce Rust wrapper for libgit.a common-main: split init and exit code into new files	2025-02-12 10:08:53 -08:00
Junio C Hamano	3f3fd0f346	Merge branch 'ac/t5401-use-test-path-is-file' Test clean-up. * ac/t5401-use-test-path-is-file: t5401: prefer test_path_is_* helper function	2025-02-12 10:08:52 -08:00
Junio C Hamano	9865ef2457	Merge branch 'ac/t6423-unhide-git-exit-status' Test clean-up. * ac/t6423-unhide-git-exit-status: t6423: fix suppression of Git’s exit code in tests	2025-02-12 10:08:52 -08:00
Junio C Hamano	07c401d392	Merge branch 'ps/repack-keep-unreachable-in-unpacked-repo' "git repack --keep-unreachable" to send unreachable objects to the main pack "git repack -ad" produces did not work when there is no existing packs, which has been corrected. * ps/repack-keep-unreachable-in-unpacked-repo: builtin/repack: fix `--keep-unreachable` when there are no packs	2025-02-12 10:08:52 -08:00
Junio C Hamano	aae91a86fb	Merge branch 'ds/name-hash-tweaks' "git pack-objects" and its wrapper "git repack" learned an option to use an alternative path-hash function to improve delta-base selection to produce a packfile with deeper history than window size. * ds/name-hash-tweaks: pack-objects: prevent name hash version change test-tool: add helper for name-hash values p5313: add size comparison test pack-objects: add GIT_TEST_NAME_HASH_VERSION repack: add --name-hash-version option pack-objects: add --name-hash-version option pack-objects: create new name-hash function version	2025-02-12 10:08:51 -08:00
Phillip Wood	af8fc7be10	rebase -i: reword empty commit after fast-forward When rebase rewords a commit it picks the commit and then runs "git commit --amend" to reword it. When the commit is picked the sequencer tries to reuse existing commits by fast-forwarding if the parents are unchanged. Rewording an empty commit that has been fast-forwarded fails because "git commit --amend" is called without "--allow-empty". This happens because when a commit is fast-forwarded the logic that checks whether we should pass "--allow-empty" is skipped. Fix this by always passing "--allow-empty" when rewording a commit. This is safe because we are amending a commit that has already been picked so if it had become empty when it was picked we'd have already returned an error. As "git commit" will happily create empty merge commits without "--allow-empty" we do not need to pass that flag when rewording merge commits. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-11 09:50:53 -08:00
Junio C Hamano	6f0b72205d	Merge branch 'sk/unit-tests-0130' Convert a handful of unit tests to work with the clar framework. * sk/unit-tests-0130: t/unit-tests: convert strcmp-offset test to use clar test framework t/unit-tests: convert strbuf test to use clar test framework t/unit-tests: adapt example decorate test to use clar test framework t/unit-tests: convert hashmap test to use clar test framework	2025-02-10 10:18:31 -08:00
Junio C Hamano	246569bf83	Merge branch 'ps/hash-cleanup' Further code clean-up on the use of hash functions. Now the context object knows what hash function it is working with. * ps/hash-cleanup: global: adapt callers to use generic hash context helpers hash: provide generic wrappers to update hash contexts hash: stop typedeffing the hash context hash: convert hashing context to a structure	2025-02-10 10:18:31 -08:00
Junio C Hamano	34736ff48e	Merge branch 'pw/apply-ulong-overflow-check' "git apply" internally uses unsigned long for line numbers and uses strtoul() to parse numbers on the hunk headers. It however forgot to check parse errors. * pw/apply-ulong-overflow-check: apply: detect overflow when parsing hunk header	2025-02-10 10:18:30 -08:00
Junio C Hamano	442b7e0018	Merge branch 'ps/setup-reinit-fixes' "git init" to reinitialize a repository that already exists cannot change the hash function and ref backends; such a request is silently ignored now. * ps/setup-reinit-fixes: setup: fix reinit of repos with incompatible GIT_DEFAULT_HASH setup: fix reinit of repos with incompatible GIT_DEFAULT_REF_FORMAT t0001: remove duplicate test	2025-02-10 10:18:29 -08:00
Lucas Oshiro	f1cc562b77	t7603: replace test -f by test_path_is_file `test_path_is_file` provides a better output when asserting whether a file exists. Replace the occurrences of `test -f` in t7603 with it, facilitating the trace of possible test failures. Signed-off-by: Lucas Oshiro <lucasseikioshiro@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-10 08:03:30 -08:00
Junio C Hamano	2bf3c7fab1	Merge branch 'ps/ci-misc-updates' CI updates (containerization, dropping stale ones, etc.). * ps/ci-misc-updates: ci: remove stale code for Azure Pipelines ci: use latest Ubuntu release ci: stop special-casing for Ubuntu 16.04 gitlab-ci: add linux32 job testing against i386 gitlab-ci: remove the "linux-old" job github: simplify computation of the job's distro github: convert all Linux jobs to be containerized github: adapt containerized jobs to be rootless t7422: fix flaky test caused by buffered stdout t0060: fix EBUSY in MinGW when setting up runtime prefix	2025-02-06 14:56:44 -08:00
Toon Claes	337855629f	builtin/clone: teach git-clone(1) the --revision= option The git-clone(1) command has the option `--branch` that allows the user to select the branch they want HEAD to point to. In a non-bare repository this also checks out that branch. Option `--branch` also accepts a tag. When a tag name is provided, the commit this tag points to is checked out and HEAD is detached. Thus `--branch` can be used to clone a repository and check out a ref kept under `refs/heads` or `refs/tags`. But some other refs might be in use as well. For example Git forges might use refs like `refs/pull/<id>` and `refs/merge-requests/<id>` to track pull/merge requests. These refs cannot be selected upon git-clone(1). Add option `--revision` to git-clone(1). This option accepts a fully qualified reference, or a hexadecimal commit ID. This enables the user to clone and check out any revision they want. `--revision` can be used in conjunction with `--depth` to do a minimal clone that only contains the blob and tree for a single revision. This can be useful for automated tests running in CI systems. Using option `--branch` and `--single-branch` together is a similar scenario, but serves a different purpose. Using these two options, a singlet remote tracking branch is created and the fetch refspec is set up so git-fetch(1) will receive updates on that branch from the remote. This allows the user work on that single branch. Option `--revision` on contrary detaches HEAD, creates no tracking branches, and writes no fetch refspec. Signed-off-by: Toon Claes <toon@iotcl.com> Acked-by: Patrick Steinhardt <ps@pks.im> [jc: removed unnecessary TEST_PASSES_SANITIZE_LEAK from the test] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 12:26:42 -08:00
Olga Pilipenco	78a95e0d80	worktree: detect from secondary worktree if main worktree is bare When extensions.worktreeConfig is true and the main worktree is bare -- that is, its config.worktree file contains core.bare=true -- commands run from secondary worktrees incorrectly see the main worktree as not bare. As such, those commands incorrectly think that the repository's default branch (typically "main" or "master") is checked out in the bare repository even though it's not. This makes it impossible, for instance, to checkout or delete the default branch from a secondary worktree, among other shortcomings. This problem occurs because, when extensions.worktreeConfig is true, commands run in secondary worktrees only consult $commondir/config and $commondir/worktrees/<id>/config.worktree, thus they never see the main worktree's core.bare=true setting in $commondir/config.worktree. Fix this problem by consulting the main worktree's config.worktree file when checking whether it is bare. (This extra work is performed only when running from a secondary worktree.) Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Olga Pilipenco <olga.pilipenco@shopify.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-05 09:46:23 -08:00
Justin Tobler	3295c35398	rev-list: extend print-info to print missing object type Additional information about missing objects found in git-rev-list(1) can be printed by specifying the `print-info` missing action for the `--missing` option. Extend this action to also print missing object type information inferred from its containing object. This token follows the form `type=<type>` and specifies the expected object type of the missing object. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-05 09:32:01 -08:00
Justin Tobler	c6d896bcfd	rev-list: add print-info action to print missing object path Missing objects identified through git-rev-list(1) can be printed by setting the `--missing=print` option. Additional information about the missing object, such as its path and type, may be present in its containing object. Add the `print-info` missing action for the `--missing` option that, when set, prints additional insight about the missing object inferred from its containing object. Each line of output for a missing object is in the form: `?<oid> [<token>=<value>]...`. The `<token>=<value>` pairs containing additional information are separated from each other by a SP. The value is encoded in a token specific fashion, but SP or LF contained in value are always expected to be represented in such a way that the resulting encoded value does not have either of these two problematic bytes. This format is kept generic so it can be extended in the future to support additional information. For now, only a missing object path info is implemented. It follows the form `path=<path>` and specifies the full path to the object from the top-level tree. A path containing SP or special characters is enclosed in double-quotes in the C style as needed. In a subsequent commit, missing object type info will also be added. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-05 09:32:01 -08:00
Patrick Steinhardt	414c82300a	builtin/repack: fix `--keep-unreachable` when there are no packs The "--keep-unreachable" flag is supposed to append any unreachable objects to the newly written pack. This flag is explicitly documented as appending both packed and loose unreachable objects to the new packfile. And while this works alright when repacking with preexisting packfiles, it stops working when the repository does not have any packfiles at all. The root cause are the conditions used to decide whether or not we want to append "--pack-loose-unreachable" to git-pack-objects(1). There are a couple of conditions here: - `has_existing_non_kept_packs()` checks whether there are existing packfiles. This condition makes sense to guard "--keep-pack=", "--unpack-unreachable" and "--keep-unreachable", because all of these flags only make sense in combination with existing packfiles. But it does not make sense to disable `--pack-loose-unreachable` when there aren't any preexisting packfiles, as loose objects can be packed into the new packfile regardless of that. - `delete_redundant` checks whether we want to delete any objects or packs that are about to become redundant. The documentation of `--keep-unreachable` explicitly says that `git repack -ad` needs to be executed for the flag to have an effect. It is not immediately obvious why such redundant objects need to be deleted in order for "--pack-unreachable-objects" to be effective. But as things are working as documented this is nothing we'll change for now. - `pack_everything & PACK_CRUFT` checks that we're not creating a cruft pack. This condition makes sense in the context of "--pack-loose-unreachable", as unreachable objects would end up in the cruft pack anyway. So while the second and third condition are sensible, it does not make any sense to condition `--pack-loose-unreachable` on the existence of packfiles. Fix the bug by splitting out the "--pack-loose-unreachable" and only making it depend on the second and third condition. Like this, loose unreachable objects will be packed regardless of any preexisting packfiles. Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-04 09:58:02 -08:00
Derrick Stolee	85127bcdea	backfill: assume --sparse when sparse-checkout is enabled The previous change introduced the '--[no-]sparse' option for the 'git backfill' command, but did not assume it as enabled by default. However, this is likely the behavior that users will most often want to happen. Without this default, users with a small sparse-checkout may be confused when 'git backfill' downloads every version of every object in the full history. However, this is left as a separate change so this decision can be reviewed independently of the value of the '--[no-]sparse' option. Add a test of adding the '--sparse' option to a repo without sparse-checkout to make it clear that supplying it without a sparse-checkout is an error. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-03 16:12:42 -08:00
Derrick Stolee	bff4555767	backfill: add --sparse option One way to significantly reduce the cost of a Git clone and later fetches is to use a blobless partial clone and combine that with a sparse-checkout that reduces the paths that need to be populated in the working directory. Not only does this reduce the cost of clones and fetches, the sparse-checkout reduces the number of objects needed to download from a promisor remote. However, history investigations can be expensive as computing blob diffs will trigger promisor remote requests for one object at a time. This can be avoided by downloading the blobs needed for the given sparse-checkout using 'git backfill' and its new '--sparse' mode, at a time that the user is willing to pay that extra cost. Note that this is distinctly different from the '--filter=sparse:<oid>' option, as this assumes that the partial clone has all reachable trees and we are using client-side logic to avoid downloading blobs outside of the sparse-checkout cone. This avoids the server-side cost of walking trees while also achieving a similar goal. It also downloads in batches based on similar path names, presenting a resumable download if things are interrupted. This augments the path-walk API to have a possibly-NULL 'pl' member that may point to a 'struct pattern_list'. This could be more general than the sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently the only consumer. Be sure to test this in both cone mode and not cone mode. Cone mode has the benefit that the path-walk can skip certain paths once they would expand beyond the sparse-checkout. Non-cone mode can describe the included files using both positive and negative patterns, which changes the possible return values of path_matches_pattern_list(). Test both kinds of matches for increased coverage. To test this, we can create a blobless sparse clone, expand the sparse-checkout slightly, and then run 'git backfill --sparse' to see how much data is downloaded. The general steps are 1. git clone --filter=blob:none --sparse <url> 2. git sparse-checkout set <dir1> ... <dirN> 3. git backfill --sparse For the Git repository with the 'builtin' directory in the sparse-checkout, we get these results for various batch sizes: \| Batch Size \| Pack Count \| Pack Size \| Time \| \|-----------------\|------------\|-----------\|-------\| \| (Initial clone) \| 3 \| 110 MB \| \| \| 10K \| 12 \| 192 MB \| 17.2s \| \| 15K \| 9 \| 192 MB \| 15.5s \| \| 20K \| 8 \| 192 MB \| 15.5s \| \| 25K \| 7 \| 192 MB \| 14.7s \| This case matters less because a full clone of the Git repository from GitHub is currently at 277 MB. Using a copy of the Linux repository with the 'kernel/' directory in the sparse-checkout, we get these results: \| Batch Size \| Pack Count \| Pack Size \| Time \| \|-----------------\|------------\|-----------\|------\| \| (Initial clone) \| 2 \| 1,876 MB \| \| \| 10K \| 11 \| 2,187 MB \| 46s \| \| 25K \| 7 \| 2,188 MB \| 43s \| \| 50K \| 5 \| 2,194 MB \| 44s \| \| 100K \| 4 \| 2,194 MB \| 48s \| This case is more meaningful because a full clone of the Linux repository is currently over 6 GB, so this is a valuable way to download a fraction of the repository and no longer need network access for all reachable objects within the sparse-checkout. Choosing a batch size will depend on a lot of factors, including the user's network speed or reliability, the repository's file structure, and how many versions there are of the file within the sparse-checkout scope. There will not be a one-size-fits-all solution. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-03 16:12:42 -08:00
Derrick Stolee	6840fe9ee2	backfill: add --min-batch-size=<n> option Users may want to specify a minimum batch size for their needs. This is only a minimum: the path-walk API provides a list of OIDs that correspond to the same path, and thus it is optimal to allow delta compression across those objects in a single server request. We could consider limiting the request to have a maximum batch size in the future. For now, we let the path-walk API batches determine the boundaries. To get a feeling for the value of specifying the --min-batch-size parameter, I tested a number of open source repositories available on GitHub. The procedure was generally: 1. git clone --filter=blob:none <url> 2. git backfill Checking the number of packfiles and the size of the .git/objects/pack directory helps to identify the effects of different batch sizes. For the Git repository, we get these results: \| Batch Size \| Pack Count \| Pack Size \| Time \| \|-----------------\|------------\|-----------\|-------\| \| (Initial clone) \| 2 \| 119 MB \| \| \| 25K \| 8 \| 290 MB \| 24s \| \| 50K \| 5 \| 290 MB \| 24s \| \| 100K \| 4 \| 290 MB \| 29s \| Other than the packfile counts decreasing as we need fewer batches, the size and time required is not changing much for this small example. For the nodejs/node repository, we see these results: \| Batch Size \| Pack Count \| Pack Size \| Time \| \|-----------------\|------------\|-----------\|--------\| \| (Initial clone) \| 2 \| 330 MB \| \| \| 25K \| 19 \| 1,222 MB \| 1m 22s \| \| 50K \| 11 \| 1,221 MB \| 1m 24s \| \| 100K \| 7 \| 1,223 MB \| 1m 40s \| \| 250K \| 4 \| 1,224 MB \| 2m 23s \| \| 500K \| 3 \| 1,216 MB \| 4m 38s \| Here, we don't have much difference in the size of the repo, though the 500K batch size results in a few MB gained. That comes at a cost of a much longer time. This extra time is due to server-side delta compression happening as the on-disk deltas don't appear to be reusable all the time. But for smaller batch sizes, the server is able to find reasonable deltas partly because we are asking for objects that appear in the same region of the directory tree and include all versions of a file at a specific path. To contrast this example, I tested the microsoft/fluentui repo, which has been known to have inefficient packing due to name hash collisions. These results are found before GitHub had the opportunity to repack the server with more advanced name hash versions: \| Batch Size \| Pack Count \| Pack Size \| Time \| \|-----------------\|------------\|-----------\|--------\| \| (Initial clone) \| 2 \| 105 MB \| \| \| 5K \| 53 \| 348 MB \| 2m 26s \| \| 10K \| 28 \| 365 MB \| 2m 22s \| \| 15K \| 19 \| 407 MB \| 2m 21s \| \| 20K \| 15 \| 393 MB \| 2m 28s \| \| 25K \| 13 \| 417 MB \| 2m 06s \| \| 50K \| 8 \| 509 MB \| 1m 34s \| \| 100K \| 5 \| 535 MB \| 1m 56s \| \| 250K \| 4 \| 698 MB \| 1m 33s \| \| 500K \| 3 \| 696 MB \| 1m 42s \| Here, a larger variety of batch sizes were chosen because of the great variation in results. By asking the server to download small batches corresponding to fewer paths at a time, the server is able to provide better compression for these batches than it would for a regular clone. A typical full clone for this repository would require 738 MB. This example justifies the choice to batch requests by path name, leading to improved communication with a server that is not optimally packed. Finally, the same experiment for the Linux repository had these results: \| Batch Size \| Pack Count \| Pack Size \| Time \| \|-----------------\|------------\|-----------\|---------\| \| (Initial clone) \| 2 \| 2,153 MB \| \| \| 25K \| 63 \| 6,380 MB \| 14m 08s \| \| 50K \| 58 \| 6,126 MB \| 15m 11s \| \| 100K \| 30 \| 6,135 MB \| 18m 11s \| \| 250K \| 14 \| 6,146 MB \| 18m 22s \| \| 500K \| 8 \| 6,143 MB \| 33m 29s \| Even in this example, where the default name hash algorithm leads to decent compression of the Linux kernel repository, there is value for selecting a smaller batch size, to a limit. The 25K batch size has the fastest time, but uses 250 MB more than the 50K batch size. The 500K batch size took much more time due to server compression time and thus we should avoid large batch sizes like this. Based on these experiments, a batch size of 50,000 was chosen as the default value. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-03 16:12:42 -08:00

1 2 3 4 5 ...

23392 Commits