A part of code paths that deals with loose objects has been cleaned
up.
* ps/object-source-loose:
object-file: refactor writing objects via a stream
object-file: rename `write_object_file()`
object-file: refactor freshening of objects
object-file: rename `has_loose_object()`
object-file: read objects via the loose object source
object-file: move loose object map into loose source
object-file: hide internals when we need to reprepare loose sources
object-file: move loose object cache into loose source
object-file: introduce `struct odb_source_loose`
object-file: move `fetch_if_missing`
odb: adjust naming to free object sources
odb: introduce `odb_source_new()`
odb: fix subtle logic to check whether an alternate is usable
"git replay" (experimental) learned to perform ref updates itself
in a transaction by default, instead of emitting where each refs
should point at and leaving the actual update to another command.
* sa/replay-atomic-ref-updates:
replay: add replay.refAction config option
replay: make atomic ref updates the default behavior
replay: use die_for_incompatible_opt2() for option validation
Adding a repository that uses a different hash function is a no-no,
but "git submodule add" did nt prevent it, which has been corrected.
* bc/submodule-force-same-hash:
read-cache: drop submodule check from add_to_cache()
object-file: disallow adding submodules of different hash algo
The code to expand attribute macros has been rewritten to avoid
recursion to avoid running out of stack space in an uncontrolled
way.
* jk/attr-macroexpand-wo-recursion:
attr: avoid recursion when expanding attribute macros
Ever since we added whitespace rules for this project, we misspelt
an entry, which has been corrected.
* jc/gitattributes-whitespace-no-indent-fix:
.gitattributes: remove misspelled no-op whitespace attribute
"git maintenance" command learned "is-needed" subcommand to tell if
it is necessary to perform various maintenance tasks.
* kn/maintenance-is-needed:
maintenance: add 'is-needed' subcommand
maintenance: add checking logic in `pack_refs_condition()`
refs: add a `optimize_required` field to `struct ref_storage_be`
reftable/stack: add function to check if optimization is required
reftable/stack: return stack segments directly
As "git diff --quiet" only cares about the existence of any
changes, disable rename/copy detection to skip more expensive
processing whose result will be discarded anyway.
* rs/diff-quiet-no-rename:
diff: disable rename detection with --quiet
Code clean-up.
* kn/refs-optim-cleanup:
t/pack-refs-tests: move the 'test_done' to callees
refs: rename 'pack_refs_opts' to 'refs_optimize_opts'
refs: move to using the '.optimize' functions
Some ref backend storage can hold not just the object name of an
annotated tag, but the object name of the object the tag points at.
The code to handle this information has been streamlined.
* ps/ref-peeled-tags:
t7004: do not chdir around in the main process
ref-filter: fix stale parsed objects
ref-filter: parse objects on demand
ref-filter: detect broken tags when dereferencing them
refs: don't store peeled object IDs for invalid tags
object: add flag to `peel_object()` to verify object type
refs: drop infrastructure to peel via iterators
refs: drop `current_ref_iter` hack
builtin/show-ref: convert to use `reference_get_peeled_oid()`
ref-filter: propagate peeled object ID
upload-pack: convert to use `reference_get_peeled_oid()`
refs: expose peeled object ID via the iterator
refs: refactor reference status flags
refs: fully reset `struct ref_iterator::ref` on iteration
refs: introduce `.ref` field for the base iterator
refs: introduce wrapper struct for `each_ref_fn`
The list of packfiles used in a running Git process is moved from
the packed_git structure into the packfile store.
* ps/packed-git-in-object-store:
packfile: track packs via the MRU list exclusively
packfile: always add packfiles to MRU when adding a pack
packfile: move list of packs into the packfile store
builtin/pack-objects: simplify logic to find kept or nonlocal objects
packfile: fix approximation of object counts
http: refactor subsystem to use `packfile_list`s
packfile: move the MRU list into the packfile store
packfile: use a `strmap` to store packs by name
We replaced deprecated macos-13 with macos-14 image in GitHub
Actions CI, but we forgot that the image is for arm64. We have
been seeing a lot of test failures ever since. Switch to arm64
binary for Perforce tests.
* jc/ci-use-arm64-p4-on-macos:
Use Perforce arm64 binary on macOS CI jobs
The previous step replaced deprecated macos-13 image with macos-14
image on GitHub Actions CI. While x86-64 binaries can work there,
because macos-14 images are arm64 based (we could replace it with
macos-14-large that is x86-64), it makes more sense to use arm64
binary there. Without this change, we have been getting unusually
higher rate of failures from random macOS CI jobs railing to run
t98xx series of tests.
Helped-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In add_to_cache(), we treat any directories as submodules, and complain
if we can't resolve their HEAD. This call to resolve_gitlink_ref() was
added by f937bc2f86 (add: error appropriately on repository with no
commits, 2019-04-09), with the goal of improving the error message for
empty repositories.
But we already resolve the submodule HEAD in index_path(), which is
where we find the actual oid we're going to use. Resolving it again here
introduces some downsides:
1. It's more work, since we have to open up the submodule repository's
files twice.
2. There are call paths that get to index_path() without going through
add_to_cache(). For instance, we'd want a similar informative
message if "git diff empty" finds that it can't resolve the
submodule's HEAD. (In theory we can also get there through
update-index, but AFAICT it refuses to consider directories as
submodules at all, and just complains about them).
3. The resolution in index_path() catches more errors that we don't
handle here. In particular, it will validate that the object format
for the submodule matches that of the superproject. This isn't a
bug, since our call in add_to_cache() throws away the oid it gets
without looking at it. But it certainly caused confusion for me
when looking at where the object-format check should go.
So instead of resolving the submodule HEAD in add_to_cache(), let's just
teach the call in index_path() to actually produce an error message
(which it already does for other cases). That's probably what f937bc2f86
should have done in the first place, and it gives us a single point of
resolution when adding a submodule to the index.
The resulting output is slightly more verbose, as we propagate the error
up the call stack, but I think that's OK (and again, matches many other
errors we get when indexing fails).
I've left the text of the error message as-is, though it is perhaps
overly specific. There are many reasons that resolving the submodule
HEAD might fail, though outside of corruption or system errors it is
probably most likely that the submodule HEAD is simply on an unborn
branch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The design of the hash algorithm transition plan is that objects stored
must be entirely in one algorithm since we lack any way to indicate a
mix of algorithms. This also includes submodules, but we have
traditionally not enforced this, which leads to various problems when
trying to clone or check out the the submodule from the remote.
Since this cannot work in the general case, restrict adding a submodule
of a different algorithm to the index. Add tests for git add and git
submodule add that these are rejected.
Note that we cannot check this in git fsck because the malformed
submodule is stored in the tree as an object ID which is either
truncated (when a SHA-256 submodule is added to a SHA-1 repository) or
padded with zeros (when a SHA-1 submodule is added to a SHA-256
repository). We cannot detect even the latter case because someone
could have an actual submodule that actually ends in 24 zeros, which
would be a false positive.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git last-modified" was optimized by narrowing the set of paths to
follow as it dug deeper in the history.
* tc/last-modified-active-paths-optimization:
last-modified: implement faster algorithm
Given a set of attribute macros like:
[attr]a1 a2
[attr]a2 a3
...
[attr]a300000 -text
file a1
expanding the attributes for "file" requires expanding "a1" to "a2",
"a2" to "a3", and so on until hitting a non-macro expansion ("-text", in
this case). We implement this via recursion: fill_one() calls
macroexpand_one(), which then recurses back to fill_one(). As a result,
very deep macro chains like the one above can run out of stack space and
cause us to segfault.
The required stack space is fairly small; I needed on the order of
200,000 entries to get a segfault on Linux. So it's unlikely anybody
would hit this accidentally, leaving only malicious inputs. There you
can easily construct a repo which will segfault on clone (we look at
attributes during the checkout step, but you'd see the same trying to do
other operations, like diff in a bare repo). It's mostly harmless, since
anybody constructing such a repo is only preventing victims from cloning
their evil garbage, but it could be a nuisance for hosting sites.
One option to prevent this is to limit the depth of recursion we'll
allow. This is conceptually easy to implement, but it raises other
questions: what should the limit be, and do we need a configuration knob
for it?
The recursion here is simple enough that we can avoid those questions by
just converting it to iteration instead. Rather than iterate over the
states of a match_attr in fill_one(), we'll put them all in a queue, and
the expansion of each can add to the queue rather than recursing. Note
that this is a LIFO queue in order to keep the same depth-first order we
did with the recursive implementation. I've avoided using the word
"stack" in the code because the term is already heavily used to refer to
the stack of .gitattribute files that matches the tree structure of the
repository.
The test uses a limited stack size so we can trigger the problem with a
much smaller input than the one shown above. The value here (3000) is
enough to trigger the issue on my x86_64 Linux machine.
Reported-by: Ben Stav <benstav@miggo.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Building "git contacts" script (in contrib/) left the resulting
file unexecutable, which has been corrected.
* dk/make-git-contacts-executable:
perl: also mark git-contacts executable
The build procedure based on meson learned to allow builders to
specify the directory to install HTML documents.
* dk/meson-html-dir:
meson: make GIT_HTML_PATH configurable
Build procedure for Wincred credential helper has been updated.
* tu/credential-wincred-makefile-update:
wincred: align Makefile with other Makefiles in contrib
Ever since 14f9e128 (Define the project whitespace policy,
2008-02-10) added the whitespace rules to .gitattributes, we spelled
the most general rule like so:
* whitespace=!indent,trail,space
in the top-level .gitattributes file. The intent of this line was
described in the commit log message:
- Unless otherwise specified, indent with SP that could be
replaced with HT are not "bad". But SP before HT in the
indent is "bad", and trailing whitespaces are "bad".
It clearly wanted to disable indent-with-non-tab, so !indent is most
likely a misspelt form of '-indent'. Because indent-with-non-tab
has never been enabled by default, by luck this was not causing any
ill effect.
We could either remove "!indent", or spell it "-indent". The
immediate effect would be the same. It would only start to make a
difference when/if we enable indent-with-non-tab by default in
future versions of Git.
Let's take the former option to remove "!indent" from the list. We
would feel the effect first-hand ourselves before anybody else if we
ever decide to change the built-in default whitespace rules, which
would be hidden from us if we decide to rewrite it to "-indent"
instead.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Detecting renames and copies improves diff's output. This effort is
wasted if we don't show any. Disable detection in that case.
This actually fixes the error code when using the options --cached,
--find-copies-harder, --no-ext-diff and --quiet together:
run_diff_index() indirectly calls diff-lib.c::show_modified(), which
queues even non-modified entries using diff_change() because we need
them for copy detection. diff_change() sets flags.has_changes, though,
which causes diff_can_quit_early() to declare we're done after seeing
only the very first entry -- way too soon.
Using --cached, --find-copies-harder and --quiet together without
--no-ext-diff was not affected even before, as it causes the flag
flags.diff_from_contents to be set, which disables the optimization
in a different way.
Reported-by: D. Ben Knoble <ben.knoble@gmail.com>
Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git-maintenance(1)' command provides tooling to run maintenance
tasks over Git repositories. The 'run' subcommand, as the name suggests,
runs the maintenance tasks. When used with the '--auto' flag, it uses
heuristics to determine if the required thresholds are met for running
said maintenance tasks.
There is however a lack of insight into these heuristics. Meaning, the
checks are linked to the execution.
Add a new 'is-needed' subcommand to 'git-maintenance(1)' which allows
users to simply check if it is needed to run maintenance without
performing it.
This subcommand can check if it is needed to run maintenance without
actually running it. Ideally it should be used with the '--auto' flag,
which would allow users to check if the thresholds required are met. The
subcommand also supports the '--task' flag which can be used to check
specific maintenance tasks.
While adding the respective tests in 't/t7900-maintenance.sh', remove a
duplicate of the test: 'worktree-prune task with --auto honors
maintenance.worktree-prune.auto'.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git-maintenance(1)' command supports an '--auto' flag. Usage of the
flag ensures to run maintenance tasks only if certain thresholds are
met. The heuristic is defined on a task level, wherein each task defines
an 'auto_condition', which states if the task should be run.
The 'pack-refs' task is hard-coded to return 1 as:
1. There was never a way to check if the reference backend needs to be
optimized without actually performing the optimization.
2. We can pass in the '--auto' flag to 'git-pack-refs(1)' which would
optimize based on heuristics.
The previous commit added a `refs_optimize_required()` function, which
can be used to check if a reference backend required optimization. Use
this within `pack_refs_condition()`.
This allows us to add a 'git maintenance is-needed' subcommand which can
notify the user if maintenance is needed without actually performing the
optimization. Without this change, the reference backend would always
state that optimization is needed.
Since we import 'revision.h', we need to remove the definition for
'SEEN' which is duplicated in the included header.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To allow users of the refs namespace to check if the reference backend
requires optimization, add a new field `optimize_required` field to
`struct ref_storage_be`. This field is of type `optimize_required_fn`
which is also introduced in this commit.
Modify the debug, files, packed and reftable backend to implement this
field. A following commit will expose this via 'git pack-refs' and 'git
refs optimize'.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable backend performs auto-compaction as part of its regular
flow, which is required to keep the number of tables part of a stack at
bay. This allows it to stay optimized.
Compaction can also be triggered voluntarily by the user via the 'git
pack-refs' or the 'git refs optimize' command. However, currently there
is no way for the user to check if optimization is required without
actually performing it.
Extract out the heuristics logic from 'reftable_stack_auto_compact()'
into an internal function 'update_segment_if_compaction_required()'.
Then use this to add and expose `reftable_stack_compaction_required()`
which will allow users to check if the reftable backend can be
optimized.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `stack_table_sizes_for_compaction()` function returns individual
sizes of each reftable table. This function is only called by
`reftable_stack_auto_compact()` to decide which tables need to be
compacted, if any.
Modify the function to directly return the segments, which avoids the
extra step of receiving the sizes only to pass it to
`suggest_compaction_segment()`.
A future commit will also add functionality for checking whether
auto-compaction is necessary without performing it. This change allows
code re-usability in that context.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refreshes the Irish translation for Git 2.52, including new strings and
consistency improvements. Verified with `git-po-helper check`.
Signed-off-by: Aindriú Mac Giolla Eoin <aindriu80@gmail.com>