* ps/undecided-is-not-necessarily-sha1:
repository: stop setting SHA1 as the default object hash
oss-fuzz/commit-graph: set up hash algorithm
builtin/shortlog: don't set up revisions without repo
builtin/diff: explicitly set hash algo when there is no repo
builtin/bundle: abort "verify" early when there is no repository
builtin/blame: don't access potentially unitialized `the_hash_algo`
builtin/rev-parse: allow shortening to more than 40 hex characters
remote-curl: fix parsing of detached SHA256 heads
attr: fix BUG() when parsing attrs outside of repo
attr: don't recompute default attribute source
parse-options-cb: only abbreviate hashes when hash algo is known
path: move `validate_headref()` to its only user
path: harden validation of HEAD with non-standard hashes
CI fix.
* jk/ci-macos-gcc13-fix:
ci: stop installing "gcc-13" for osx-gcc
ci: avoid bare "gcc" for osx-gcc job
ci: drop mention of BREW_INSTALL_PACKAGES variable
Git 2.43 started using the tree of HEAD as the source of attributes
in a bare repository, which has severe performance implications.
For now, revert the change, without ripping out a more explicit
support for the attr.tree configuration variable.
* jc/no-default-attr-tree-in-bare:
stop using HEAD for attributes in bare repository by default
Unbreak CI jobs so that we do not attempt to use Python 2 that has
been removed from the platform.
* ps/ci-python-2-deprecation:
ci: fix Python dependency on Ubuntu 24.04
Tests that try to corrupt in-repository files in chunked format did
not work well on macOS due to its broken "mv", which has been
worked around.
* jc/test-workaround-broken-mv:
t/lib-chunk: work around broken "mv" on some vintage of macOS
Our osx-gcc job explicitly asks to install gcc-13. But since the GitHub
runner image already comes with gcc-13 installed, this is mostly doing
nothing (or in some cases it may install an incremental update over the
runner image). But worse, it recently started causing errors like:
==> Fetching gcc@13
==> Downloading https://ghcr.io/v2/homebrew/core/gcc/13/blobs/sha256:fb2403d97e2ce67eb441b54557cfb61980830f3ba26d4c5a1fe5ecd0c9730d1a
==> Pouring gcc@13--13.2.0.ventura.bottle.tar.gz
Error: The `brew link` step did not complete successfully
The formula built, but is not symlinked into /usr/local
Could not symlink bin/c++-13
Target /usr/local/bin/c++-13
is a symlink belonging to gcc. You can unlink it:
brew unlink gcc
which cause the whole CI job to bail.
I didn't track down the root cause, but I suspect it may be related to
homebrew recently switching the "gcc" default to gcc-14. And it may even
be fixed when a new runner image is released. But if we don't need to
run brew at all, it's one less thing for us to worry about.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On macOS, a bare "gcc" (without a version) will invoke a wrapper for
clang, not actual gcc. Even when gcc is installed via homebrew, that
only provides version-specific links in /usr/local/bin (like "gcc-13"),
and never a version-agnostic "gcc" wrapper.
As far as I can tell, this has been the case for a long time, and this
osx-gcc job has largely been doing nothing. We can point it at "gcc-13",
which will pick up the homebrew-installed version.
The fix here is specific to the github workflow file, as the gitlab one
does not have a matching job.
It's a little unfortunate that we cannot just ask for the latest version
of gcc which homebrew provides, but as far as I can tell there is no
easy alias (you'd have to find the highest number gcc-* in
/usr/local/bin yourself).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The last user of this variable went away in 4a6e4b9602 (CI: remove
Travis CI support, 2021-11-23), so it's doing nothing except making it
more confusing to find out which packages _are_ installed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 2d65e5b6a6 (ci: rename "runs_on_pool" to "distro", 2024-04-12)
renamed this variable for the main CI workflow, as well as in the ci/
scripts. Because the coverity workflow also relies on those scripts to
install dependencies, it needs to be updated, too. Without this patch,
the coverity build fails because we lack libcurl.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was a semantic merge conflict between 9cdeb34b96 (ci: merge
scripts which install dependencies, 2024-04-12), which has merged
"ci/install-docker-dependencies.sh" into "ci/install-dependencies.sh"
and c7b228e000 (gitlab-ci: add smoke test for fuzzers, 2024-04-29),
which has added a new fuzz smoke test job that makes use of the
now-removed script.
Adapt the job to instead use the new script to install dependencies.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During "git p4 clone" if p4 process returns an error from the server,
it will store the message in the 'err' variable. Then it will send a
text command "die-now" to git-fast-import. However, git-fast-import
raises an exception: "fatal: Unsupported command: die-now" and err is
never displayed. This patch ensures that err is shown to the end user.
Signed-off-by: Fahad Alrashed <fahad@keylock.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The color parsing code learned to handle 12-bit RGB colors, spelled
as "#RGB" (in addition to "#RRGGBB" that is already supported).
* bb/rgb-12-bit-colors:
color: add support for 12-bit RGB colors
t/t4026-color: add test coverage for invalid RGB colors
t/t4026-color: remove an extra double quote character
Command line completion support for zsh (in contrib/) has been
updated to stop exposing internal state to end-user shell
interaction.
* dk/zsh-git-repo-path-fix:
completion: zsh: stop leaking local cache variable
zsh can pretend to be a normal shell pretty well except for some
glitches that we tickle in some of our scripts. Work them around
so that "vimdiff" and our test suite works well enough with it.
* bc/zsh-compatibility:
vimdiff: make script and tests work with zsh
t4046: avoid continue in &&-chain for zsh
When the user responds to a prompt given by "git add -p" with an
unsupported command, list of available commands were given, which
was too much if the user knew what they wanted to type but merely
made a typo. Now the user gets a much shorter error message.
* rj/add-p-typo-reaction:
add-patch: response to unknown command
add-patch: do not show UI messages on stderr
Command line completion script (in contrib/) learned to complete
"git symbolic-ref" a bit better (you need to enable plumbing
commands to be completed with GIT_COMPLETION_SHOW_ALL_COMMANDS).
* rh/complete-symbolic-ref:
completion: add docs on how to add subcommand completions
completion: improve docs for using __git_complete
completion: add 'symbolic-ref'
The singleton index_state instance "the_index" has been eliminated
by always instantiating "the_repository" and replacing references
to "the_index" with references to its .index member.
* ps/the-index-is-no-more:
repository: drop `initialize_the_repository()`
repository: drop `the_index` variable
builtin/clone: stop using `the_index`
repository: initialize index in `repo_init()`
builtin: stop using `the_index`
t/helper: stop using `the_index`
The credential helper protocol, together with the HTTP layer, have
been enhanced to support authentication schemes different from
username & password pair, like Bearer and NTLM.
* bc/credential-scheme-enhancement:
credential: add method for querying capabilities
credential-cache: implement authtype capability
t: add credential tests for authtype
credential: add support for multistage credential rounds
t5563: refactor for multi-stage authentication
docs: set a limit on credential line length
credential: enable state capability
credential: add an argument to keep state
http: add support for authtype and credential
docs: indicate new credential protocol fields
credential: add a field called "ephemeral"
credential: gate new fields on capability
credential: add a field for pre-encoded credentials
http: use new headers for each object request
remote-curl: reset headers on new request
credential: add an authtype field
Tests to ensure interoperability between reftable written by jgit
and our code have been added and enabled in CI.
* ps/ci-test-with-jgit:
t0612: add tests to exercise Git/JGit reftable compatibility
t0610: fix non-portable variable assignment
t06xx: always execute backend-specific tests
ci: install JGit dependency
ci: make Perforce binaries executable for all users
ci: merge scripts which install dependencies
ci: fix setup of custom path for GitLab CI
ci: merge custom PATH directories
ci: convert "install-dependencies.sh" to use "/bin/sh"
ci: drop duplicate package installation for "linux-gcc-default"
ci: skip sudo when we are already root
ci: expose distro name in dockerized GitHub jobs
ci: rename "runs_on_pool" to "distro"
Code to write out reftable has seen some optimization and
simplification.
* ps/reftable-write-optim:
reftable/block: reuse compressed array
reftable/block: reuse zstream when writing log blocks
reftable/writer: reset `last_key` instead of releasing it
reftable/writer: unify releasing memory
reftable/writer: refactorings for `writer_flush_nonempty_block()`
reftable/writer: refactorings for `writer_add_record()`
refs/reftable: don't recompute committer ident
reftable: remove name checks
refs/reftable: skip duplicate name checks
refs/reftable: perform explicit D/F check when writing symrefs
refs/reftable: fix D/F conflict error message on ref copy
During the startup of Git, we call `initialize_the_repository()` to set
up `the_repository` as well as `the_index`. Part of this setup is also
to set the default object hash of the repository to SHA1. This has the
effect that `the_hash_algo` is getting initialized to SHA1, as well.
This default hash algorithm eventually gets overridden by most Git
commands via `setup_git_directory()`, which also detects the actual hash
algorithm used by the repository.
There are some commands though that don't access a repository at all, or
at a later point only, and thus retain the default hash function for
some amount of time. As some of the the preceding commits demonstrate,
this can lead to subtle issues when we access `the_hash_algo` when no
repository has been set up.
Address this issue by dropping the set up of the default hash algorithm
completely. The effect of this is that `the_hash_algo` will map to a
`NULL` pointer and thus cause Git to crash when something tries to
access the hash algorithm without it being properly initialized. It thus
forces all Git commands to explicitly set up the hash algorithm in case
there is no repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our fuzzing setups don't work in a proper repository, but only use the
in-memory configured `the_repository`. Consequently, we never go through
the full repository setup procedures and thus do not set up the hash
algo used by the repository.
The commit-graph fuzzer does rely on a properly initialized hash algo
though. Initialize it explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is possible to run git-shortlog(1) outside of a repository by passing
it output from git-log(1) via standard input. Obviously, as there is no
repository in that context, it is thus unsupported to pass any revisions
as arguments.
Regardless of that we still end up calling `setup_revisions()`. While
that works alright, it is somewhat strange. Furthermore, this is about
to cause problems when we unset the default object hash.
Refactor the code to only call `setup_revisions()` when we have a
repository. This is safe to do as we already verify that there are no
arguments when running outside of a repository anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-diff(1) command can be used outside repositories to diff two
files with each other. But even if there is no repository we will end up
hashing the files that we are diffing so that we can print the "index"
line:
```
diff --git a/a b/b
index 7898192..6178079 100644
--- a/a
+++ b/b
@@ -1 +1 @@
-a
+b
```
We implicitly use SHA1 to calculate the hash here, which is because
`the_repository` gets initialized with SHA1 during the startup routine.
We are about to stop doing this though such that `the_repository` only
ever has a hash function when it was properly initialized via a repo's
configuration.
To give full control to our users, we would ideally add a new switch to
git-diff(1) that allows them to specify the hash function when executed
outside of a repository. But for now, we only convert the code to make
this explicit such that we can stop setting the default hash algorithm
for `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Verifying a bundle requires us to have a repository. This is encoded in
`verify_bundle()`, which will return an error if there is no repository.
We call `open_bundle()` before we call `verify_bundle()` though, which
already performs some verifications even though we may ultimately abort
due to a missing repository.
This is problematic because `open_bundle()` already reads the bundle
header and verifies that it contains a properly formatted hash. When
there is no repository we have no clue what hash function to expect
though, so we always end up assuming SHA1 here, which may or may not be
correct. Furthermore, we are about to stop initializing `the_hash_algo`
when there is no repository, which will lead to segfaults.
Check early on whether we have a repository to fix this issue.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We access `the_hash_algo` in git-blame(1) before we have executed
`parse_options_start()`, which may not be properly set up in case we
have no repository. This is fine for most of the part because all the
call paths that lead to it (git-blame(1), git-annotate(1) as well as
git-pick-axe(1)) specify `RUN_SETUP` and thus require a repository.
There is one exception though, namely when passing `-h` to print the
help. Here we will access `the_hash_algo` even if there is no repo.
This works fine right now because `the_hash_algo` gets sets up to point
to the SHA1 algorithm via `initialize_repository()`. But we're about to
stop doing this, and thus the code would lead to a `NULL` pointer
exception.
Prepare the code for this and only access `the_hash_algo` after we are
sure that there is a proper repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--short=` option for git-rev-parse(1) allows the user to specify
to how many characters object IDs should be shortened to. The option is
broken though for SHA256 repositories because we set the maximum allowed
hash size to `the_hash_algo->hexsz` before we have even set up the repo.
Consequently, `the_hash_algo` will always be SHA1 and thus we truncate
every hash after at most 40 characters.
Fix this by accessing `the_hash_algo` only after we have set up the
repo.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dumb HTTP transport tries to read the remote HEAD reference by
downloading the "HEAD" file and then parsing it via `http_fetch_ref()`.
This function will either parse the file as an object ID in case it is
exactly `the_hash_algo->hexsz` long, or otherwise it will check whether
the reference starts with "ref :" and parse it as a symbolic ref.
This is broken when parsing detached HEADs of a remote SHA256 repository
because we never update `the_hash_algo` to the discovered remote object
hash. Consequently, `the_hash_algo` will always be the fallback SHA1
hash algorithm, which will cause us to fail parsing HEAD altogteher when
it contains a SHA256 object ID.
Fix this issue by setting up `the_hash_algo` via `repo_set_hash_algo()`.
While at it, let's make the expected SHA1 fallback explicit in our code,
which also addresses an upcoming issue where we are going to remove the
SHA1 fallback for `the_hash_algo`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If either the `--attr-source` option or the `GIT_ATTR_SOURCE` envvar are
set, then `compute_default_attr_source()` will try to look up the value
as a treeish. It is possible to hit that function while outside of a Git
repository though, for example when using `git grep --no-index`. In that
case, Git will hit a bug because we try to look up the main ref store
outside of a repository.
Handle the case gracefully and detect when we try to look up an attr
source without a repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `default_attr_source()` function lazily computes the attr source
supposedly once, only. This is done via a static variable `attr_source`
that contains the resolved object ID of the attr source's tree. If the
variable is the null object ID then we try to look up the attr source,
otherwise we skip over it.
This approach is flawed though: the variable will never be set to
anything else but the null object ID in case there is no attr source.
Consequently, we re-compute the information on every call. And in the
worst case, when we silently ignore bad trees, this will cause us to try
and look up the treeish every single time.
Improve this by introducing a separate variable `has_attr_source` to
track whether we already computed the attr source and, if so, whether we
have an attr source or not.
This also allows us to convert the `ignore_bad_attr_tree` to not be
static anymore as the code will only be executed once anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `OPT__ABBREV()` option can be used to add an option that abbreviates
object IDs. When given a length longer than `the_hash_algo->hexsz`, then
it will instead set the length to that maximum length.
It may not always be guaranteed that we have `the_hash_algo` initialized
properly as the hash algorithm can only be set up after we have set up
`the_repository`. In that case, the hash would always be truncated to
the hex length of SHA1, which may not be what the user desires.
In practice it's not a problem as all commands that use `OPT__ABBREV()`
also have `RUN_SETUP` set and thus cannot work without a repository.
Consequently, both `the_repository` and `the_hash_algo` would be
properly set up.
Regardless of that, harden the code to not truncate the length when we
didn't set up a repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While `validate_headref()` is only called from `is_git_directory()` in
"setup.c", it is currently implemented in "path.c". Move it over such
that it becomes clear that it is only really used during setup in order
to discover repositories.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `validate_headref()` function takes a path to a supposed "HEAD" file
and checks whether its format is something that we understand. It is
used as part of our repository discovery to check whether a specific
directory is a Git directory or not.
Part of the validation is a check for a detached HEAD that contains a
plain object ID. To do this validation we use `get_oid_hex()`, which
relies on `the_hash_algo`. At this point in time the hash algo cannot
yet be initialized though because we didn't yet read the Git config.
Consequently, it will always be the SHA1 hash algorithm.
In practice this works alright because `get_oid_hex()` only ends up
checking whether the prefix of the buffer is a valid object ID. And
because SHA1 is shorter than SHA256, the function will successfully
parse SHA256 object IDs, as well.
It is somewhat fragile though and not really the intent to only check
for SHA1. With this in mind, harden the code to use `get_oid_hex_any()`
to check whether the "HEAD" file parses as any known hash.
One might be hard pressed to tighten the check even further and fully
validate the file contents, not only the prefix. In practice though that
wouldn't make a lot of sense as it could be that the repository uses a
hash function that produces longer hashes than SHA256, but which the
current version of Git doesn't understand yet. We'd still want to detect
the repository as proper Git repository in that case, and we will fail
eventually with a proper error message that the hash isn't understood
when trying to set up the repository format.
It follows that we could just leave the current code intact, as in
practice the code change doesn't have any user visible impact. But it
also prepares us for `the_hash_algo` being unset when there is no
repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/the-index-is-no-more:
repository: drop `initialize_the_repository()`
repository: drop `the_index` variable
builtin/clone: stop using `the_index`
repository: initialize index in `repo_init()`
builtin: stop using `the_index`
t/helper: stop using `the_index`
Newer versions of Ubuntu have dropped Python 2 starting with Ubuntu
23.04. By default though, our CI setups will try to use that Python
version on all Ubuntu-based jobs except for the "linux-gcc" one.
We didn't notice this issue due to two reasons:
- The "ubuntu:latest" tag always points to the latest LTS release.
Until a few weeks ago this was Ubuntu 22.04, which still had Python
2.
- Our Docker-based CI jobs had their own script to install
dependencies until 9cdeb34b96 (ci: merge scripts which install
dependencies, 2024-04-12), where we didn't even try to install
Python at all for many of them.
Since the CI refactorings have originally been implemented, Ubuntu
24.04 was released, and it being an LTS versions means that the "latest"
tag now points to that Python-2-less version. Consequently, those jobs
that use "ubuntu:latest" broke.
Address this by using Python 2 on Ubuntu 20.04, only, whereas we use
Python 3 on all other Ubuntu jobs. Eventually, we should think about
dropping support for Python 2 completely.
Reported-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our GitLab CI setup has a test gap where the fuzzers aren't exercised at
all. Add a smoke test, similar to the one we have in GitHub Workflows.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>