Subsequent commits will expand the `struct odb_source` to become a
generic interface for accessing an object database source. As part of
these refactorings we'll add a set of function pointers that will
significantly expand the structure overall.
Prepare for this by splitting out the `struct odb_source` into a
separate header. This keeps the high-level object database interface
detached from the low-level object database sources.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/object-info-bits-cleanup:
odb: convert `odb_has_object()` flags into an enum
odb: convert object info flags into an enum
odb: drop gaps in object info flag values
builtin/fsck: fix flags passed to `odb_has_object()`
builtin/backfill: fix flags passed to `odb_has_object()`
* ps/odb-for-each-object:
odb: drop unused `for_each_{loose,packed}_object()` functions
reachable: convert to use `odb_for_each_object()`
builtin/pack-objects: use `packfile_store_for_each_object()`
odb: introduce mtime fields for object info requests
treewide: drop uses of `for_each_{loose,packed}_object()`
treewide: enumerate promisor objects via `odb_for_each_object()`
builtin/fsck: refactor to use `odb_for_each_object()`
odb: introduce `odb_for_each_object()`
packfile: introduce function to iterate through objects
packfile: extract function to iterate through objects of a store
object-file: introduce function to iterate through objects
object-file: extract function to read object info from path
odb: fix flags parameter to be unsigned
odb: rename `FOR_EACH_OBJECT_*` flags
Small clean-up of xdiff library to remove unnecessary data
duplication.
* pw/xdiff-cleanups:
xdiff: remove unused data from xdlclass_t
xdiff: remove "line_hash" field from xrecord_t
"git merge-file" can be run outside a repository, but it ignored
all configuration, even the per-user ones. The command now uses
available configuration files to find its customization.
* yt/merge-file-outside-a-repository:
merge-file: honor merge.conflictStyle outside of a repository
A handful of documentation pages have been modernized to use the
"synopsis" style.
* ja/doc-synopsis-style-even-more:
doc: convert git-show to synopsis style
doc: fix some style issues in git-clone and for-each-ref-options
doc: finalize git-clone documentation conversion to synopsis style
doc: convert git-submodule to synopsis style
Allow recording process ID of the process that holds the lock next
to a lockfile for diagnosis.
* pc/lockfile-pid:
lockfile: add PID file for debugging stale locks
"git merge-ours" is taught to work better in a sparse checkout.
* sb/merge-ours-sparse:
merge-ours: integrate with sparse-index
merge-ours: drop USE_THE_REPOSITORY_VARIABLE
Test contrib/ things in CI to catch breakages before they enter the
"next" branch.
* jc/ci-test-contrib-too:
: Some of our downstream folks run more tests than we do and catch
: breakages in them, namely, where contrib/*/Makefile has "test" target.
: Let's make sure we fail upon accepting a new topic that break them in
: 'seen'.
ci: ubuntu: use GNU coreutils for dirname
test: optionally test contrib in CI
Transaction to create objects (or not) is currently tied to the
repository, but in the future a repository can have multiple object
sources, which may have different transaction mechanisms. Make the
odb transaction API per object source.
* jt/odb-transaction-per-source:
odb: transparently handle common transaction behavior
odb: prepare `struct odb_transaction` to become generic
object-file: rename transaction functions
odb: store ODB source in `struct odb_transaction`
Rename three functions around the commit_list data structure.
* ps/commit-list-functions-renamed:
commit: rename `free_commit_list()` to conform to coding guidelines
commit: rename `reverse_commit_list()` to conform to coding guidelines
commit: rename `copy_commit_list()` to conform to coding guidelines
Giving "git last-modified" a tree (not a commit-ish) died an
uncontrolled death, which has been corrected.
* tc/last-modified-not-a-tree:
last-modified: verify revision argument is a commit-ish
last-modified: remove double error message
last-modified: fix memory leak when more than one commit is given
last-modified: rewrite error message when more than one commit given
ISO C23 redefines strchr and friends that tradiotionally took
a const pointer and returned a non-const pointer derived from it to
preserve constness (i.e., if you ask for a substring in a const
string, you get a const pointer to the substring). Update code
paths that used non-const pointer to receive their results that did
not have to be non-const to adjust.
* cf/c23-const-preserving-strchr-updates-0:
gpg-interface: remove an unnecessary NULL initialization
global: constify some pointers that are not written to
Replace assertion-style 'test -f' checks with Git's
test_path_is_file() helper for clearer failures and
consistency.
Signed-off-by: Ashwani Kumar Kamal <ashwanikamal.im421@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Following the reason in the preceding commit, convert the
`odb_has_object()` flags into an enum.
With this change, we would have catched the misuse of `odb_has_object()`
that was fixed in a preceding commit as the compiler would have
generated a warning:
../builtin/backfill.c:71:9: error: implicit conversion from enumeration type 'enum odb_object_info_flag' to different enumeration type 'enum odb_has_object_flag' [-Werror,-Wenum-conversion]
70 | if (!odb_has_object(ctx->repo->objects, &list->oid[i],
| ~~~~~~~~~~~~~~
71 | OBJECT_INFO_FOR_PREFETCH))
| ^~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the object info flags into an enum and adapt all functions that
receive these flags as parameters to use the enum instead of an integer.
This serves two purposes:
- The function signatures become more self-documenting, as callers
don't have to wonder which flags they expect.
- The compiler can warn when a wrong flag type is passed.
Note that the second benefit is somewhat limited. For example, when
or-ing multiple enum flags together the result will be an integer, and
the compiler will not warn about such use cases. But where it does help
is when a single flag of the wrong type is passed, as the compiler would
generate a warning in that case.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The object info flag values have a two gaps in their definitions, where
some bits are skipped over. These gaps don't really hurt, but it makes
one wonder whether anything is going on and whether a subset of flags
might be defined somewhere else.
That's not the case though. Instead, this is a case of flags that have
been dropped in the past:
- The value 4 was used by `OBJECT_INFO_SKIP_CACHED`, removed in
9c8a294a1a (sha1-file: remove OBJECT_INFO_SKIP_CACHED, 2020-01-02).
- The value 8 was used by `OBJECT_INFO_ALLOW_UNKNOWN_TYPE`, removed in
ae24b032a0 (object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag,
2025-05-16).
Close those gaps to avoid any more confusion.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `mark_object()` we invoke `has_object()` with a value of 1. This is
somewhat fishy given that the function expects a bitset of flags, so any
behaviour that this results in is purely coincidental and may break at
any point in time.
The call to `has_object()` was originally introduced in 9eb86f41de
(fsck: do not lazy fetch known non-promisor object, 2020-08-05). The
intent here was to skip lazy fetches of promisor objects: we have
already verified that the object is not a promisor object, so if the
object is missing it indicates a corrupt repository.
The hardcoded value that we pass maps to `HAS_OBJECT_RECHECK_PACKED`,
which is probably the intended behaviour: `odb_has_object()` will not
fetch promisor objects unless `HAS_OBJECT_FETCH_PROMISOR` is passed, but
we may want to verify that no concurrent process has written the object
that we're trying to read.
Convert the code to use the named flag instead of the the hardcoded
value.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `fill_missing_blobs()` receives an array of object IDs and
verifies for each of them whether the corresponding object exists. If it
doesn't exist, we add it to a set of objects and then batch-fetch all of
the objects at once.
The check for whether or not we already have the object is broken
though: we pass `OBJECT_INFO_FOR_PREFETCH`, but `odb_has_object()`
expects us to pass `HAS_OBJECT_*` flags. The flag expands to:
- `OBJECT_INFO_QUICK`, which asks the object database to not reprepare
in case the object wasn't found. This makes sense, as we'd otherwise
reprepare the object database as many times as we have missing
objects.
- `OBJECT_INFO_SKIP_FETCH_OBJECT`, which asks the object database to
not fetch the object in case it's missing. Again, this makes sense,
as we want to batch-fetch the objects.
This shows that we indeed want the equivalent of this flag, but of
course represented as `HAS_OBJECT_*` flags.
Luckily, the code is already working correctly. The `OBJECT_INFO` flag
expands to `(1 << 3) | (1 << 4)`, none of which are valid `HAS_OBJECT`
flags. And if no flags are passed, `odb_has_object()` ends up calling
`odb_read_object_info_extended()` with exactly the above two flags that
we wanted to set in the first place.
Of course, this is pure luck, and this can break any moment. So let's
fix this and correct the code to not pass any flags at all.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For a line to be an anchor it has to appear in each of the files being
diffed exactly once. With that in mind lets delay checking whether
a line is an anchor until we know there is exactly one instance of
the line in each file. As each line is checked at most once, there
is no need to cache the result of is_anchor() and we can drop that
field from the hashmap entries. When diffing 5000 recent commits in
git.git this gives a modest speedup of ~2%. In the (rather extreme)
example below that consists largely of deletions the speedup is ~16%.
seq 0 10000000 >old
printf '%s\n' 300000 100000 200000 >new
git diff --no-index --anchored=300000 old new
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not use // comments in our C code, which is implied by the
description of multi-line comment rule and its examples, but is not
explicitly spelled out. Spell it out.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame --ignore-revs=... --color-lines" did not account for
ignored revisions passing blame to the same commit an adjacent line
gets blamed for.
* rs/blame-ignore-colors-fix:
blame: fix coloring for repeated suspects
GitHub repository banner update.
* am/doc-github-contributiong-link-to-submittingpatches:
.github/CONTRIBUTING.md: link to SubmittingPatches on git-scm.com
When "git show-index" is run outside a repository, it silently
defaults to SHA-1; the tool now warns when this happens.
* sp/show-index-warn-fallback:
show-index: use gettext wrapping in user facing error messages
show-index: warn when falling back to SHA-1 outside a repository
Five commands include these options. Let’s link to the command so that
the curious user can learn more about what “rerere” is about.
It’s also better to consistently refer to things like
e.g. “git-subcommand(1)” over `git subcommand` or `subcommand`.
Also apply the same treatment to git-add(1).
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the algorithm-agnostic is_null_oid() and push the dependency of
read_mmblob() on the_repository->objects to its callers. This allows it
to be used with arbitrary object databases.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>