The logic to close an object database is currently contained in the
packfile subsystem. That choice is somewhat relatable, as most of the
logic really is to close resources associated with the packfile store
itself. But we also end up handling object sources and commit graphs,
which certainly is not related to packfiles.
Move the function into the object database subsystem and rename it to
`odb_close()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git maintenance" command learns the "geometric" strategy where it
avoids doing maintenance tasks that rebuilds everything from
scratch.
* ps/maintenance-geometric:
t7900: fix a flaky test due to git-repack always regenerating MIDX
builtin/maintenance: introduce "geometric" strategy
builtin/maintenance: make "gc" strategy accessible
builtin/maintenance: extend "maintenance.strategy" to manual maintenance
builtin/maintenance: run maintenance tasks depending on type
builtin/maintenance: improve readability of strategies
builtin/maintenance: don't silently ignore invalid strategy
builtin/maintenance: make the geometric factor configurable
builtin/maintenance: introduce "geometric-repack" task
builtin/gc: make `too_many_loose_objects()` reusable without GC config
builtin/gc: remove global `repack` variable
We have two different repacking strategies in Git:
- The "gc" strategy uses git-gc(1).
- The "incremental" strategy uses multi-pack indices and `git
multi-pack-index repack` to merge together smaller packfiles as
determined by a specific batch size.
The former strategy is our old and trusted default, whereas the latter
has historically been used for our scheduled maintenance. But both
strategies have their shortcomings:
- The "gc" strategy performs regular all-into-one repacks. Furthermore
it is rather inflexible, as it is not easily possible for a user to
enable or disable specific subtasks.
- The "incremental" strategy is not a full replacement for the "gc"
strategy as it doesn't know to prune stale data.
So today, we don't have a strategy that is well-suited for large repos
while being a full replacement for the "gc" strategy.
Introduce a new "geometric" strategy that aims to fill this gap. This
strategy invokes all the usual cleanup tasks that git-gc(1) does like
pruning reflogs and rerere caches as well as stale worktrees. But where
it differs from both the "gc" and "incremental" strategy is that it uses
our geometric repacking infrastructure exposed by git-repack(1) to
repack packfiles. The advantage of geometric repacking is that we only
need to perform an all-into-one repack when the object count in a repo
has grown significantly.
One downside of this strategy is that pruning of unreferenced objects is
not going to happen regularly anymore. Every geometric repack knows to
soak up all loose objects regardless of their reachability, and merging
two or more packs doesn't consider reachability, either. Consequently,
the number of unreachable objects will grow over time.
This is remedied by doing an all-into-one repack instead of a geometric
repack whenever we determine that the geometric repack would end up
merging all packfiles anyway. This all-into-one repack then performs our
usual reachability checks and writes unreachable objects into a cruft
pack. As cruft packs won't ever be merged during geometric repacks we
can thus phase out these objects over time.
Of course, this still means that we retain unreachable objects for far
longer than with the "gc" strategy. But the maintenance strategy is
intended especially for large repositories, where the basic assumption
is that the set of unreachable objects will be significantly dwarfed by
the number of reachable objects.
If this assumption is ever proven to be too disadvantageous we could for
example introduce a time-based strategy: if the largest packfile has not
been touched for longer than $T, we perform an all-into-one repack. But
for now, such a mechanism is deferred into the future as it is not clear
yet whether it is needed in the first place.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the user can pick the "incremental" maintenance strategy, it is
not possible to explicitly use the "gc" strategy. This has two
downsides:
- It is impossible to use the default "gc" strategy for a specific
repository when the strategy was globally set to a different strategy.
- It is not possible to use git-gc(1) for scheduled maintenance.
Address these issues by making making the "gc" strategy configurable.
Furthermore, extend the strategy so that git-gc(1) runs for both manual
and scheduled maintenance.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "maintenance.strategy" configuration allows users to configure how
Git is supposed to perform repository maintenance. The idea is that we
provide a set of high-level strategies that may be useful in different
contexts, like for example when handling a large monorepo. Furthermore,
the strategy can be tweaked by the user by overriding specific tasks.
In its current form though, the strategy only applies to scheduled
maintenance. This creates something of a gap, as scheduled and manual
maintenance will now use _different_ strategies as the latter would
continue to use git-gc(1) by default. This makes the strategies way less
useful than they could be on the one hand. But even more importantly,
the two different strategies might clash with one another, where one of
the strategies performs maintenance in such a way that it discards
benefits from the other strategy.
So ideally, it should be possible to pick one strategy that then applies
globally to all the different ways that we perform maintenance. This
doesn't necessarily mean that the strategy always does the _same_ thing
for every maintenance type. But it means that the strategy can configure
the different types to work in tandem with each other.
Change the meaning of "maintenance.strategy" accordingly so that the
strategy is applied to both types, manual and scheduled. As preceding
commits have introduced logic to run maintenance tasks depending on this
type we can tweak strategies so that they perform those tasks depending
on the context.
Note that this raises the question of backwards compatibility: when the
user has configured the "incremental" strategy we would have ignored
that strategy beforehand. Instead, repository maintenance would have
continued to use git-gc(1) by default.
But luckily, we can match that behaviour by:
- Keeping all current tasks of the incremental strategy as
`MAINTENANCE_TYPE_SCHEDULED`. This ensures that those tasks will not
run during manual maintenance.
- Configuring the "gc" task so that it is invoked during manual
maintenance.
Like this, the user shouldn't observe any difference in behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We basically have three different ways to execute repository
maintenance:
1. Manual maintenance via `git maintenance run`.
2. Automatic maintenance via `git maintenance run --auto`.
3. Scheduled maintenance via `git maintenance run --schedule=`.
At the moment, maintenance strategies only have an effect for the last
type of maintenance. This is about to change in subsequent commits, but
to do so we need to be able to skip some tasks depending on how exactly
maintenance was invoked.
Introduce a new maintenance type that discern between manual (1 & 2) and
scheduled (3) maintenance. Convert the `enabled` field into a bitset so
that it becomes possible to specifiy which tasks exactly should run in a
specific context.
The types picked for existing strategies match the status quo:
- The default strategy is only ever executed as part of a manual
maintenance run. It is not possible to use it for scheduled
maintenance.
- The incremental strategy is only ever executed as part of a
scheduled maintenance run. It is not possible to use it for manual
maintenance.
The strategies will be tweaked in subsequent commits to make use of this
new infrastructure.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our maintenance strategies are essentially a large array of structures,
where each of the tasks can be enabled and scheduled individually. With
the current layout though all the configuration sits on the same nesting
layer, which makes it a bit hard to discern which initialized fields
belong to what task.
Improve readability of the individual tasks by using nested designated
initializers instead.
Suggested-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing maintenance strategies we completely ignore the
user-configured value in case it is unknown to us. This makes it
basically undiscoverable to the user that scheduled maintenance is
devolving into a no-op.
Change this to instead die when seeing an unknown maintenance strategy.
While at it, pull out the parsing logic into a separate function so that
we can reuse it in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The geometric repacking task uses a factor of two for its geometric
sequence, meaning that each next pack must contain at least twice as
many objects as the next-smaller one. In some cases it may be helpful to
configure this factor though to reduce the number of packfile merges
even further, e.g. in very big repositories. But while git-repack(1)
itself supports doing this, the maintenance task does not give us a way
to tune it.
Introduce a new "maintenance.geometric-repack.splitFactor" configuration
to plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new "geometric-repack" task. This task uses our geometric
repack infrastructure as provided by git-repack(1) itself, which is a
strategy that especially hosting providers tend to use to amortize the
costs of repacking objects.
There is one issue though with geometric repacks, namely that they
unconditionally pack all loose objects, regardless of whether or not
they are reachable. This is done because it means that we can completely
skip the reachability step, which significantly speeds up the operation.
But it has the big downside that we are unable to expire objects over
time.
To address this issue we thus use a split strategy in this new task:
whenever a geometric repack would merge together all packs, we instead
do an all-into-one repack. By default, these all-into-one repacks have
cruft packs enabled, so unreachable objects would now be written into
their own pack. Consequently, they won't be soaked up during geometric
repacking anymore and can be expired with the next full repack, assuming
that their expiry date has surpassed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To decide whether or not a repository needs to be repacked we estimate
the number of loose objects. If the number exceeds a certain threshold
we perform the repack, otherwise we don't.
This is done via `too_many_loose_objects()`, which takes as parameter
the `struct gc_config`. This configuration is only used to determine the
threshold. In a subsequent commit we'll add another caller of this
function that wants to pass a different limit than the one stored in
that structure.
Refactor the function accordingly so that we only take the limit as
parameter instead of the whole structure.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The global `repack` variable is used to store all command line arguments
that we eventually want to pass to git-repack(1). It is being appended
to from multiple different functions, which makes it hard to follow the
logic. Besides being hard to follow, it also makes it unnecessarily hard
to reuse this infrastructure in new code.
Refactor the code so that we store this variable on the stack and pass
a pointer to it around as needed. This is done so that we can reuse
`add_repack_all_options()` in a subsequent commit.
The refactoring itself is straight-forward. One function that deserves
attention though is `need_to_gc()`: this function determines whether or
not we need to execute garbage collection for `git gc --auto`, but also
for `git maintenance run --auto`. But besides figuring out whether we
have to perform GC, the function also sets up the `repack` arguments.
For `git gc --auto` it's trivial to adapt, as we already have the
on-stack variable at our fingertips. But for the maintenance condition
it's less obvious what to do.
As it turns out, we can just use another temporary variable there that
we then immediately discard. If we need to perform GC we execute a child
git-gc(1) process to repack objects for us, and that process will have
to recompute the arguments anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a bunch of different sites that want to iterate through all
packs of a given `struct packfile_store`. This pattern is somewhat
verbose and repetitive, which makes it somewhat cumbersome.
Introduce a new macro `repo_for_each_pack()` that removes some of the
boilerplate.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running maintenance tasks via git-maintenance(1) we have a couple
of auto-conditions that check whether or not a specific task should be
running. One such check is for incremental repacks, which essentially
use `git multi-pack-index repack` to repack a set of smaller packfiles
into one larger packfile.
The auto-condition for this task checks how many packfiles there are
that aren't indexed by any multi-pack index. If there is a sufficient
number then we execute the above command to combine those into a single
pack and add that pack to the MIDX.
As we don't care about MIDX'd packs we use `packfile_store_get_packs()`,
which knows to not load any packs that are indexed by a MIDX. But as
explained in the preceding commit, we want to get rid of that function.
We already handle packfiles that have a MIDX by the very nature of this
function, as we explicitly count non-MIDX'd packs. As such, we can
trivially switch over to use `packfile_store_get_all_packs()` instead.
Do so.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_all_packs()` function prepares the packfile store and then
returns its packfiles. Refactor it to accept a packfile store instead of
a repository to clarify its scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_packed_git()` function prepares the packfile store and then
returns its packfiles. Refactor it to accept a packfile store instead of
a repository to clarify its scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `reprepare_packed_git()` we perform a couple of operations:
- We reload alternate object directories.
- We clear the loose object cache.
- We reprepare packfiles.
While the logic is hosted in "packfile.c", it clearly reaches into other
subsystems that aren't related to packfiles.
Split up the responsibility and introduce `odb_reprepare()` which now
becomes responsible for repreparing the whole object database. The
existing `reprepare_packed_git()` function is refactored accordingly and
only cares about reloading the packfile store now.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git remote rename origin upstream" failed to move origin/HEAD to
upstream/HEAD when origin/HEAD is unborn and performed other
renames extremely inefficiently, which has been corrected.
* ps/remote-rename-fix:
builtin/remote: only iterate through refs that are to be renamed
builtin/remote: rework how remote refs get renamed
builtin/remote: determine whether refs need renaming early on
builtin/remote: fix sign comparison warnings
refs: simplify logic when migrating reflog entries
refs: pass refname when invoking reflog entry callback
With `refs_for_each_reflog_ent()` callers can iterate through all the
reflog entries for a given reference. The callback that is being invoked
for each such entry does not receive the name of the reference that we
are currently iterating through. This isn't really a limiting factor, as
callers can simply pass the name via the callback data.
But this layout sometimes does make for a bit of an awkward calling
pattern. One example: when iterating through all reflogs, and for each
reflog we iterate through all refnames, we have to do some extra book
keeping to track which reference name we are currently yielding reflog
entries for.
Change the signature of the callback function so that the reference name
of the reflog gets passed through to it. Adapt callers accordingly and
start using the new parameter in trivial cases. The next commit will
refactor the reference migration logic to make use of this parameter so
that we can simplify its logic a bit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce implicit assumption and dependence on the_repository in the
object-file subsystem.
* ps/object-file-wo-the-repository:
object-file: get rid of `the_repository` in index-related functions
object-file: get rid of `the_repository` in `force_object_loose()`
object-file: get rid of `the_repository` in `read_loose_object()`
object-file: get rid of `the_repository` in loose object iterators
object-file: remove declaration for `for_each_file_in_obj_subdir()`
object-file: inline `for_each_loose_file_in_objdir_buf()`
object-file: get rid of `the_repository` when writing objects
odb: introduce `odb_write_object()`
loose: write loose objects map via their source
object-file: get rid of `the_repository` in `finalize_object_file()`
object-file: get rid of `the_repository` in `loose_object_info()`
object-file: get rid of `the_repository` when freshening objects
object-file: inline `check_and_freshen()` functions
object-file: get rid of `the_repository` in `has_loose_object()`
object-file: stop using `the_hash_algo`
object-file: fix -Wsign-compare warnings
The config API had a set of convenience wrapper functions that
implicitly use the_repository instance; they have been removed and
inlined at the calling sites.
* ps/config-wo-the-repository: (21 commits)
config: fix sign comparison warnings
config: move Git config parsing into "environment.c"
config: remove unused `the_repository` wrappers
config: drop `git_config_set_multivar()` wrapper
config: drop `git_config_get_multivar_gently()` wrapper
config: drop `git_config_set_multivar_in_file_gently()` wrapper
config: drop `git_config_set_in_file_gently()` wrapper
config: drop `git_config_set()` wrapper
config: drop `git_config_set_gently()` wrapper
config: drop `git_config_set_in_file()` wrapper
config: drop `git_config_get_bool()` wrapper
config: drop `git_config_get_ulong()` wrapper
config: drop `git_config_get_int()` wrapper
config: drop `git_config_get_string()` wrapper
config: drop `git_config_get_string()` wrapper
config: drop `git_config_get_string_multi()` wrapper
config: drop `git_config_get_value()` wrapper
config: drop `git_config_get_value()` wrapper
config: drop `git_config_get()` wrapper
config: drop `git_config_clear()` wrapper
...
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove
`git_config_set_multivar_in_file_gently()`. All callsites are adjusted
so that they use
`repo_config_set_multivar_in_file_gently(the_repository, ...)` instead.
While some callsites might already have a repository available, this
mechanical conversion is the exact same as the current situation and
thus cannot cause any regression. Those sites should eventually be
cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_set()`. All
callsites are adjusted so that they use `repo_config_set(the_repository,
...)` instead. While some callsites might already have a repository
available, this mechanical conversion is the exact same as the current
situation and thus cannot cause any regression. Those sites should
eventually be cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_bool()`. All
callsites are adjusted so that they use
`repo_config_get_bool(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_ulong()`. All
callsites are adjusted so that they use
`repo_config_get_ulong(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_int()`. All
callsites are adjusted so that they use
`repo_config_get_int(the_repository, ...)` instead. While some callsites
might already have a repository available, this mechanical conversion is
the exact same as the current situation and thus cannot cause any
regression. Those sites should eventually be cleaned up in a later patch
series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_string()`.
All callsites are adjusted so that they use
`repo_config_get_string(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_string()`.
All callsites are adjusted so that they use
`repo_config_get_string(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove
`git_config_get_string_multi()`. All callsites are adjusted so that they
use `repo_config_get_string_multi(the_repository, ...)` instead. While
some callsites might already have a repository available, this
mechanical conversion is the exact same as the current situation and
thus cannot cause any regression. Those sites should eventually be
cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_value()`. All
callsites are adjusted so that they use
`repo_config_get_value(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get()`. All
callsites are adjusted so that they use `repo_config_get(the_repository,
...)` instead. While some callsites might already have a repository
available, this mechanical conversion is the exact same as the current
situation and thus cannot cause any regression. Those sites should
eventually be cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config()`. All callsites
are adjusted so that they use `repo_config(the_repository, ...)`
instead. While some callsites might already have a repository available,
this mechanical conversion is the exact same as the current situation
and thus cannot cause any regression. Those sites should eventually be
cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
find_cfg_ent() allocates a struct reflog_expire_entry_option via
FLEX_ALLOC_MEM and inserts it into a linked list in the
reflog_expire_options structure. The entries in this list are never
freed, resulting in a leak in cmd_reflog_expire and the gc reflog expire
maintenance task:
Direct leak of 39 byte(s) in 1 object(s) allocated from:
#0 0x7ff975ee6883 in calloc (/lib64/libasan.so.8+0xe6883)
#1 0x0000010edada in xcalloc ../wrapper.c:154
#2 0x000000df0898 in find_cfg_ent ../reflog.c:28
#3 0x000000df0898 in reflog_expire_config ../reflog.c:70
#4 0x00000095c451 in configset_iter ../config.c:2116
#5 0x0000006d29e7 in git_config ../config.h:724
#6 0x0000006d29e7 in cmd_reflog_expire ../builtin/reflog.c:205
#7 0x0000006d504c in cmd_reflog ../builtin/reflog.c:419
#8 0x0000007e4054 in run_builtin ../git.c:480
#9 0x0000007e4054 in handle_builtin ../git.c:746
#10 0x0000007e8a35 in run_argv ../git.c:813
#11 0x0000007e8a35 in cmd_main ../git.c:953
#12 0x000000441e8f in main ../common-main.c:9
#13 0x7ff9754115f4 in __libc_start_call_main (/lib64/libc.so.6+0x35f4)
#14 0x7ff9754116a7 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x36a7)
#15 0x000000444184 in _start (/home/jekeller/libexec/git-core/git+0x444184)
Close this leak by adding a reflog_clear_expire_config() function which
iterates the linked list and frees its elements. Call it upon exit of
cmd_reflog_expire() and reflog_expire_condition().
Add a basic test which covers this leak. While at it, cover the
functionality from commit commit 3cb22b8efe (Per-ref reflog expiry
configuration, 2008-06-15). We've had this support for years, but lacked
any tests.
Co-developed-by: Jeff King <peff@peff.net>
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The iterators for loose objects still rely on `the_repository`. Refactor
them:
- `for_each_loose_file_in_objdir()` is refactored so that the caller
is now expected to pass an `odb_source` as parameter instead of the
path to that source. Furthermore, it is renamed accordingly to
`for_each_loose_file_in_source()`.
- `for_each_loose_object()` is refactored to take in an object
database now and calls the above function in a loop.
This allows us to get rid of the global dependency.
Adjust callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up around object access API.
* ps/object-store:
odb: rename `read_object_with_reference()`
odb: rename `pretend_object_file()`
odb: rename `has_object()`
odb: rename `repo_read_object_file()`
odb: rename `oid_object_info()`
odb: trivial refactorings to get rid of `the_repository`
odb: get rid of `the_repository` when handling submodule sources
odb: get rid of `the_repository` when handling the primary source
odb: get rid of `the_repository` in `for_each()` functions
odb: get rid of `the_repository` when handling alternates
odb: get rid of `the_repository` in `odb_mkstemp()`
odb: get rid of `the_repository` in `assert_oid_type()`
odb: get rid of `the_repository` in `find_odb()`
odb: introduce parent pointers
object-store: rename files to "odb.{c,h}"
object-store: rename `object_directory` to `odb_source`
object-store: rename `raw_object_store` to `object_database`
Some code paths in the "git prune" used to ignore passed in
repository object and used the_repository singleton instance
instead, which has been corrected.
* ac/prune-wo-the-repository:
builtin/prune: stop depending on 'the_repository'
repository: move 'repository_format_precious_objects' to repo scope
Use of sysctl() system call to learn the total RAM size used on
BSDs has been corrected.
* cb/total-ram-bsd-fix:
builtin/gc: correct total_ram calculation with HAVE_BSD_SYSCTL
* ps/object-store:
odb: rename `read_object_with_reference()`
odb: rename `pretend_object_file()`
odb: rename `has_object()`
odb: rename `repo_read_object_file()`
odb: rename `oid_object_info()`
odb: trivial refactorings to get rid of `the_repository`
odb: get rid of `the_repository` when handling submodule sources
odb: get rid of `the_repository` when handling the primary source
odb: get rid of `the_repository` in `for_each()` functions
odb: get rid of `the_repository` when handling alternates
odb: get rid of `the_repository` in `odb_mkstemp()`
odb: get rid of `the_repository` in `assert_oid_type()`
odb: get rid of `the_repository` in `find_odb()`
odb: introduce parent pointers
object-store: rename files to "odb.{c,h}"
object-store: rename `object_directory` to `odb_source`
object-store: rename `raw_object_store` to `object_database`
The calls to sysctl() assume a 64-bit memory size for the variable
holding the value, but the actual size depends on the key name and
platform, at least for HW_PHYSMEM.
Detect any mismatched reads, and retry with a shorter variable
when needed.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'extensions.preciousObjects' setting when set true, prevents
operations that might drop objects from the object storage. This setting
is populated in the global variable
'repository_format_precious_objects'.
Move this global variable to repo scope by adding it to 'struct
repository and also refactor all the occurences accordingly.
This change is part of an ongoing effort to eliminate global variables,
improve modularity and help libify the codebase.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.
Introduce compatibility wrappers so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `object_directory` structure is used as an access point for a single
object directory like ".git/objects". While the structure isn't yet
fully self-contained, the intent is for it to eventually contain all
information required to access objects in one specific location.
While the name "object directory" is a good fit for now, this will
change over time as we continue with the agenda to make pluggable object
databases a thing. Eventually, objects may not be accessed via any kind
of directory at all anymore, but they could instead be backed by any
kind of durable storage mechanism. While it seems quite far-fetched for
now, it is thinkable that eventually this might even be some form of a
database, for example.
As such, the current name of this structure will become worse over time
as we evolve into the direction of pluggable ODBs. Immediate next steps
will start to carve out proper self-contained object directories, which
requires us to pass in these object directories as parameters. Based on
our modern naming schema this means that those functions should then be
named after their subsystem, which means that we would start to bake the
current name into the codebase more and more.
Let's preempt this by renaming the structure. There have been a couple
alternatives that were discussed:
- `odb_backend` was discarded because it led to the association that
one object database has a single backend, but the model is that one
alternate has one backend. Furthermore, "backend" is more about the
actual backing implementation and less about the high-level concept.
- `odb_alternate` was discarded because it is a bit of a stretch to
also call the main object directory an "alternate".
Instead, pick `odb_source` as the new name. It makes it sufficiently
clear that there can be multiple sources and does not cause confusion
when mixed with the already-existing "alternate" terminology.
In the future, this change allows us to easily introduce for example a
`odb_files_source` and other format-specific implementations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git maintenance" lacked the care "git gc" had to avoid holding
onto the repository lock for too long during packing refs, which
has been remedied.
* ps/maintenance-ref-lock:
builtin/maintenance: fix locking race when handling "gc" task
builtin/gc: avoid global state in `gc_before_repack()`
usage: allow dying without writing an error message
builtin/maintenance: fix locking race with refs and reflogs tasks
builtin/maintenance: split into foreground and background tasks
builtin/maintenance: fix typedef for function pointers
builtin/maintenance: extract function to run tasks
builtin/maintenance: stop modifying global array of tasks
builtin/maintenance: mark "--task=" and "--schedule=" as incompatible
builtin/maintenance: centralize configuration of explicit tasks
builtin/gc: drop redundant local variable
builtin/gc: use designated field initializers for maintenance tasks
The "gc" task has a similar locking race as the one that we have fixed
for the "pack-refs" and "reflog-expire" tasks in preceding commits. Fix
this by splitting up the logic of the "gc" task:
- We execute `gc_before_repack()` in the foreground, which contains
the logic that git-gc(1) itself would execute in the foreground, as
well.
- We spawn git-gc(1) after detaching, but with a new hidden flag that
suppresses calling `gc_before_repack()`.
Like this we have roughly the same logic as git-gc(1) itself and know to
repack refs and reflogs before detaching, thus fixing the race.
Note that `gc_before_repack()` is renamed to `gc_foreground_tasks()` to
better reflect what this function does.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `gc_before_repack()` should only ever run once in git-gc(1), but we
may end up calling it twice when the "--detach" flag is passed. The
duplicated call is avoided though via a static flag in this function.
This pattern is somewhat unintuitive though. Refactor it to drop the
static flag and instead guard the second call of `gc_before_repack()`
via `opts.detach`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in the preceding commit, git-gc(1) knows to detach only
after it has already packed references and expired reflogs. This is done
to avoid racing around their respective lockfiles.
Adapt git-maintenance(1) accordingly and run the "pack-refs" and
"reflog-expire" tasks in the foreground. Note that the "gc" task has the
same issue, but the fix is a bit more involved there and will thus be
done in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both git-gc(1) and git-maintenance(1) have logic to daemonize so that
the maintenance tasks are performed in the background. git-gc(1) has
some special logic though to not perform _all_ housekeeping tasks in the
background: both references and reflogs are still handled synchronously
in the foreground.
This split exists because otherwise it may easily happen that git-gc(1)
keeps the "packed-refs" file locked for an extended amount of time,
where the next Git command that wants to modify any reference could now
fail. This was especially important in the past, where git-gc(1) was
still executed directly as part of our automatic maintenance: git-gc(1)
was invoked via `git gc --auto --detach`, so we knew to handle most of
the maintenance tasks in the background while doing those parts that may
cause locking issues in the foreground.
We have since moved to git-maintenance(1), which is a more flexible
replacement for git-gc(1). By default this command runs git-gc(1), only,
but it can be configured to run different tasks, as well. This command
does not know about the split between maintenance tasks that should run
before and after detach though, and this has led to several bug reports
about spurious locking errors for the "packed-refs" file.
Prepare for a fix by introducing this split for maintenance tasks. Note
that this commit does not yet change any of the tasks, so there should
not (yet) be a change in behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The typedefs for `maintenance_task_fn` and `maintenance_auto_fn` are
somewhat confusingly not true function pointers. As such, any user of
those typedefs needs to manually add the pointer to make use of them.
Fix this by making these true function pointers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the function to run maintenance tasks. This function will be
reused in a subsequent commit where we introduce a split between
maintenance tasks that run before and after daemonizing the process.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>