global/git

mirror of https://github.com/git/git.git synced 2026-03-04 22:47:35 +01:00

Author	SHA1	Message	Date
Patrick Steinhardt	2c5ee36e47	odb/source: make `read_alternates()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:52:26 -08:00
Patrick Steinhardt	c8c7337c24	odb/source: make `for_each_object()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:52:25 -08:00
Patrick Steinhardt	968592e421	odb/source: make `read_object_info()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Note that this function is a bit less straight-forward to convert compared to the other functions. The reason here is that the logic to read an object is: 1. We try to read the object. If it exists we return it. 2. If the object does not exist we reprepare the object database source. 3. We then try reading the object info a second time in case the reprepare caused it to appear. The second read is only supposed to happen for the packfile store though, as reading loose objects is not impacted by repreparing the object database. Ideally, we'd just move this whole logic into the ODB source. But that's not easily possible because we try to avoid the reprepare unless really required, which is after we have found out that no other ODB source contains the object, either. So the logic spans across multiple ODB sources, and consequently we cannot move it into an individual source. Instead, introduce a new flag `OBJECT_INFO_SECOND_READ` that tells the backend that we already tried to look up the object once, and that this time around the ODB source should try to find any new objects that may have surfaced due to an on-disk change. With this flag, the "files" backend can trivially skip trying to re-read the object as a loose object. Furthermore, as we know that we only try the second read via the packfile store, we can skip repreparing loose objects and only reprepare the packfile store. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:52:25 -08:00
Patrick Steinhardt	cfcfdf6592	odb: split `struct odb_source` into separate header Subsequent commits will expand the `struct odb_source` to become a generic interface for accessing an object database source. As part of these refactorings we'll add a set of function pointers that will significantly expand the structure overall. Prepare for this by splitting out the `struct odb_source` into a separate header. This keeps the high-level object database interface detached from the low-level object database sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:49:59 -08:00
Junio C Hamano	b1af291b4a	Merge branch 'ps/object-info-bits-cleanup' into ps/odb-sources * ps/object-info-bits-cleanup: odb: convert `odb_has_object()` flags into an enum odb: convert object info flags into an enum odb: drop gaps in object info flag values builtin/fsck: fix flags passed to `odb_has_object()` builtin/backfill: fix flags passed to `odb_has_object()`	2026-02-23 13:48:48 -08:00
Junio C Hamano	703c97519d	Merge branch 'ps/odb-for-each-object' into ps/odb-sources * ps/odb-for-each-object: odb: drop unused `for_each_{loose,packed}_object()` functions reachable: convert to use `odb_for_each_object()` builtin/pack-objects: use `packfile_store_for_each_object()` odb: introduce mtime fields for object info requests treewide: drop uses of `for_each_{loose,packed}_object()` treewide: enumerate promisor objects via `odb_for_each_object()` builtin/fsck: refactor to use `odb_for_each_object()` odb: introduce `odb_for_each_object()` packfile: introduce function to iterate through objects packfile: extract function to iterate through objects of a store object-file: introduce function to iterate through objects object-file: extract function to read object info from path odb: fix flags parameter to be unsigned odb: rename `FOR_EACH_OBJECT_*` flags	2026-02-23 13:48:00 -08:00
Patrick Steinhardt	732ec9b17b	odb: convert `odb_has_object()` flags into an enum Following the reason in the preceding commit, convert the `odb_has_object()` flags into an enum. With this change, we would have catched the misuse of `odb_has_object()` that was fixed in a preceding commit as the compiler would have generated a warning: ../builtin/backfill.c:71:9: error: implicit conversion from enumeration type 'enum odb_object_info_flag' to different enumeration type 'enum odb_has_object_flag' [-Werror,-Wenum-conversion] 70 \| if (!odb_has_object(ctx->repo->objects, &list->oid[i], \| ~~~~~~~~~~~~~~ 71 \| OBJECT_INFO_FOR_PREFETCH)) \| ^~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-12 11:05:08 -08:00
Patrick Steinhardt	f6516a5241	odb: convert object info flags into an enum Convert the object info flags into an enum and adapt all functions that receive these flags as parameters to use the enum instead of an integer. This serves two purposes: - The function signatures become more self-documenting, as callers don't have to wonder which flags they expect. - The compiler can warn when a wrong flag type is passed. Note that the second benefit is somewhat limited. For example, when or-ing multiple enum flags together the result will be an integer, and the compiler will not warn about such use cases. But where it does help is when a single flag of the wrong type is passed, as the compiler would generate a warning in that case. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-12 11:05:08 -08:00
Patrick Steinhardt	ae77afc347	odb: drop gaps in object info flag values The object info flag values have a two gaps in their definitions, where some bits are skipped over. These gaps don't really hurt, but it makes one wonder whether anything is going on and whether a subset of flags might be defined somewhere else. That's not the case though. Instead, this is a case of flags that have been dropped in the past: - The value 4 was used by `OBJECT_INFO_SKIP_CACHED`, removed in `9c8a294a1a` (sha1-file: remove OBJECT_INFO_SKIP_CACHED, 2020-01-02). - The value 8 was used by `OBJECT_INFO_ALLOW_UNKNOWN_TYPE`, removed in `ae24b032a0` (object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag, 2025-05-16). Close those gaps to avoid any more confusion. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-12 11:05:08 -08:00
Justin Tobler	fa7d067923	odb: prepare `struct odb_transaction` to become generic An ODB transaction handles how objects are stored temporarily and eventually committed. Due to object storage being implemented differently for a given ODB source, the ODB transactions must be implemented in a manner specific to the source the objects are being written to. To provide generic transactions, `struct odb_transaction` is updated to store a commit callback that can be configured to support a specific ODB source. For now `struct odb_transaction_files` is the only transaction type and what is always returned when starting a transaction. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-02 17:14:03 -08:00
Patrick Steinhardt	7b7cbaef27	odb: introduce mtime fields for object info requests There are some use cases where we need to figure out the mtime for objects. Most importantly, this is the case when we want to prune unreachable objects. But getting at that data requires users to manually derive the info either via the loose object's mtime, the packfiles' mtime or via the ".mtimes" file. Introduce a new `struct object_info::mtimep` pointer that allows callers to request an object's mtime. This new field will be used in a subsequent commit. Note that the concept of "mtime" is ambiguous: given an object, it may be stored multiple times in the object database, and each of these instances may have a different mtime. Disambiguating these mtimes is nothing that can happen on the generic ODB layer: the caller may search for the oldest object, the newest object, or even the relation of object mtimes depending on the specific source they are located in. As such, it is the responsibility of the caller to disambiguate mtimes. A consequence of this is that it's most likely incorrect to look up the mtime via `odb_read_object_info()`, as this interface does not give us enough information to disambiguate the mtime. Document this accordingly and tell users to use `odb_for_each_object()` instead. Even with this gotcha though it's sensible to have this request as part of the object info, as the mtime is a property of the object storage format. If we for example had a "black-box" storage backend, we'd still need to be able to query it for the mtime info in a generic way. We could introduce a safety mechanism that for example calls `BUG()` in case we look up the mtime outside of `odb_for_each_object()`. But that feels somewhat heavy-handed. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:08 -08:00
Patrick Steinhardt	df2fbdfa55	odb: introduce `odb_for_each_object()` Introduce a new function `odb_for_each_object()` that knows to iterate through all objects part of a given object database. This function is essentially a simple wrapper around the object database sources. Subsequent commits will adapt callers to use this new function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:07 -08:00
Patrick Steinhardt	cde615b6f0	object-file: introduce function to iterate through objects We have multiple divergent interfaces to iterate through objects of a specific backend: - `for_each_loose_object()` yields all loose objects. - `for_each_packed_object()` (somewhat obviously) yields all packed objects. These functions have different function signatures, which makes it hard to create a common abstraction layer that covers both of these. Introduce a new function `odb_source_loose_for_each_object()` to plug this gap. This function doesn't take any data specific to loose objects, but instead it accepts a `struct object_info` that will be populated the exact same as if `odb_source_loose_read_object()` was called. The benefit of this new interface is that we can continue to pass backend-specific data, as `struct object_info` contains a union for these exact use cases. This will allow us to unify how we iterate through objects across both loose and packed objects in a subsequent commit. The `for_each_loose_object()` function continues to exist for now, but it will be removed at the end of this patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:07 -08:00
Patrick Steinhardt	bd1855b897	odb: rename `FOR_EACH_OBJECT_` flags Rename the `FOR_EACH_OBJECT_` flags to have an `ODB_` prefix. This prepares us for a new upcoming `odb_for_each_object()` function and ensures that both the function and its flags have the same prefix. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:06 -08:00
Junio C Hamano	bc5cbbe246	Merge branch 'ps/read-object-info-improvements' The object-info API has been cleaned up. * ps/read-object-info-improvements: packfile: drop repository parameter from `packed_object_info()` packfile: skip unpacking object header for disk size requests packfile: disentangle return value of `packed_object_info()` packfile: always populate pack-specific info when reading object info packfile: extend `is_delta` field to allow for "unknown" state packfile: always declare object info to be OI_PACKED object-file: always set OI_LOOSE when reading object info	2026-01-21 08:29:00 -08:00
Junio C Hamano	ec16dde5c8	Merge branch 'ps/packfile-store-in-odb-source' into ps/odb-for-each-object * ps/packfile-store-in-odb-source: packfile: move MIDX into packfile store packfile: refactor `find_pack_entry()` to work on the packfile store packfile: inline `find_kept_pack_entry()` packfile: only prepare owning store in `packfile_store_prepare()` packfile: only prepare owning store in `packfile_store_get_packs()` packfile: move packfile store into object source packfile: refactor misleading code when unusing pack windows packfile: refactor kept-pack cache to work with packfile stores packfile: pass source to `prepare_pack()` packfile: create store via its owning source odb: properly close sources before freeing them builtin/gc: fix condition for whether to write commit graphs	2026-01-15 05:50:16 -08:00
Patrick Steinhardt	27d9486cbc	packfile: extend `is_delta` field to allow for "unknown" state The `struct object_info::u::packed::is_delta` field determines whether or not a specific object is stored as a delta. It only stores whether or not the object is stored as delta, so it is treated as a boolean value. This boolean is insufficient though: when reading a packed object via `packfile_store_read_object_info()` we know to skip parsing the actual object when the user didn't request any object-specific data. In that case we won't read the object itself, but will only look up its position in the packfile. Consequently, we do not know whether it is a delta or not. This isn't really an issue right now, as the check for an empty request is broken. But a subsequent commit will fix it, and once we do we will have the need to also represent an "unknown" delta state. Prepare for this change by introducing a new enum that encodes the object type. We don't use the "unknown" state just yet, but will start to do so in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-12 06:51:14 -08:00
Patrick Steinhardt	7a4bd1b836	packfile: always declare object info to be OI_PACKED When reading object info via a packfile we yield one of two types: - The object can either be OI_PACKED, which is what a caller would typically expect. - Or it can be OI_DBCACHED if it is stored in the delta base cache. The latter really is an implementation detail though, and callers typically don't care at all about the difference. Furthermore, the information whether or not it is part of the delta base cache can already be derived via the `is_delta` field, so the fact that we discern between OI_PACKED and OI_DBCACHED only further complicates the interface. There aren't all that many callers that care about the `whence` field in the first place. In fact, there's only three: - `packfile_store_read_object_info()` checks for `whence == OI_PACKED` and then populates the packfile information of the object info structure. We now start to do this also for deltified objects, which gives its callers strictly more information. - `repack_local_links()` wants to determine whether the object is part of a promisor pack and checks for `whence == OI_PACKED`. If so, it verifies that the packfile is a promisor pack. It's arguably wrong to declare that an object is not part of a promisor pack only because it is stored in the delta base cache. - `is_not_in_promisor_pack_obj()` does the same, but checks that a specific object is _not_ part of a promisor pack. The same reasoning as above applies. Drop the OI_DBCACHED enum completely. None of the callers seem to care about the distinction. Note that this also fixes a segfault introduced in `8c1b84bc97` (streaming: move logic to read packed objects streams into backend, 2025-11-23), which refactors how we stream packed objects. The intent is to only read packed objects in case they are stored non-deltified as we'd otherwise have to deflate them first. But the check for whether or not the object is stored as a delta was unconditionally done via `oi.u.packed.is_delta`, which is only valid in case `oi.whence` is `OI_PACKED`. But under some circumstances we got `OI_DBCACHED` here, which means that none of the `oi.u.packed` fields were initialized at all. Consequently, we assumed the object was not stored as a delta, and then try to read the object from `oi.u.packed.pack`, which is a `NULL` pointer and thus causes a segfault. Add a test case for this issue so that this cannot regress in the future anymore. Reported-by: Matt Smiley <msmiley@gitlab.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-12 06:51:14 -08:00
Patrick Steinhardt	a282a8f163	packfile: move MIDX into packfile store The multi-pack index still is tracked as a member of the object database source, but ultimately the MIDX is always tied to one specific packfile store. Move the structure into `struct packfile_store` accordingly. This ensures that the packfile store now keeps track of all data related to packfiles. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-09 06:40:08 -08:00
Patrick Steinhardt	84f0e60b28	packfile: move packfile store into object source The packfile store is a member of `struct object_database`, which means that we have a single store per database. This doesn't really make much sense though: each source connected to the database has its own set of packfiles, so there is a conceptual mismatch here. This hasn't really caused much of a problem in the past, but with the advent of pluggable object databases this is becoming more of a problem because some of the sources may not even use packfiles in the first place. Move the packfile store down by one level from the object database into the object database source. This ensures that each source now has its own packfile store, and we can eventually start to abstract it away entirely so that the caller doesn't even know what kind of store it uses. Note that we only need to adjust a relatively small number of callers, way less than one might expect. This is because most callers are using `repo_for_each_pack()`, which handles enumeration of all packfiles that exist in the repository. So for now, none of these callers need to be adapted. The remaining callers that iterate through the packfiles directly and that need adjustment are those that are a bit more tangled with packfiles. These will be adjusted over time. Note that this patch only moves the packfile store, and there is still a bunch of functions that seemingly operate on a packfile store but that end up iterating over all sources. These will be adjusted in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-09 06:40:07 -08:00
Patrick Steinhardt	2816b748e5	odb: handle changing a repository's commondir The function `repo_set_gitdir()` is called in two situations: - To initialize the repository with its discovered location. As part of this we also set up the new object database. - To update the repository's discovered location in case the process changes its working directory so that we update relative paths. This means we also have to update any relative paths that are potentially used in the object database. In the context of the object database we ideally wouldn't ever have to worry about the second case: if all paths used by our object database sources were absolute, then we wouldn't have to update them. But unfortunately, the paths aren't only used to locate files owned by the given source, but we also use them for reporting purposes. One such example is `repo_get_object_directory()`, where we cannot just change semantics to always return absolute paths, as that is likely to break tooling out there. One solution to this would be to have both a "display path" and an "internal path". This would allow us to use internal paths for all internal matters, but continue to use the potentially-relative display paths so that we don't break compatibility. But converting the codebase to honor this split is quite a messy endeavour, and it wouldn't even help us with the goal to get rid of the need to update the display path on chdir(3p). Another solution would be to rework "setup.c" so that we never have to update paths in the first place. In that case, we'd only initialize the repository once we have figured out final locations for all directories. This would be a significant simplification of that subsystem indeed, but the current logic is so messy that it would take significant investments to get there. Meanwhile though, while object sources may still use relative paths, the best thing we can do is to handle the reparenting of the object source paths in the object database itself. This can be done by registering one callback for each object database so that we get notified whenever the current working directory changes, and we then perform the reparenting ourselves. Ideally, this wouldn't even happen on the object database level, but instead handled by each object database source. But we don't yet have proper pluggable object database sources, so this will need to be handled at a later point in time. The logic itself is rather simple: - We register the callback when creating the object database. - We unregister the callback when releasing it again. - We split up `set_git_dir_1()` so that it becomes possible to skip recreating the object database. This is required because the function is called both when the current working directory changes, but also when we set up the repository. Calling this function without skipping creation of the ODB will result in a bug in case it's already created. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-25 12:16:00 -08:00
Patrick Steinhardt	35d9fc65ed	odb: handle initialization of sources in `odb_new()` The logic to set up a new object database is currently distributed across two functions in "repository.c": - In `initialize_repository()` we initialize an empty object database. This object database is not fully initialized and doesn't have any sources attached to it. - The primary object database source is then created in `repo_set_gitdir()`. Ideally though, the logic should be entirely self-contained so that we can iterate more readily on how exactly the sources themselves get set up. Refactor `odb_new()` to handle both allocation and setup of the object database. This ensures that the object database is always initialized and ready for use, and it allows us to change how the sources get set up eventually. Note that `repo_set_gitdir()` still reaches into the sources when the function gets called with an already-initialized object database. This will be fixed in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-25 12:16:00 -08:00
Patrick Steinhardt	b67b2d9fb7	odb: move logic to disable ref updates into repo Our object database sources have a field `disable_ref_updates`. This field can obviously be set to disable reference updates, but it is somewhat curious that this logic is hosted by the object database. The reason for this is that it was primarily added to keep us from accidentally updating references while an ODB transaction is ongoing. Any objects part of the transaction have not yet been committed to disk, so new references that point to them might get corrupted in case we never end up committing the transaction. As such, whenever we create a new transaction we set up a new temporary ODB source and mark it as disabling reference updates. This has one (and only one?) upside: once we have committed the transaction, the temporary source will be dropped and thus we clean up the disabled reference updates automatically. But other than that, it's somewhat misdesigned: - We can have multiple ODB sources, but only the currently active source inhibits reference updates. - We're mixing concerns of the refdb with the ODB. Arguably, the decision of whether we can update references or not should be handled by the refdb. But that wouldn't be a great fit either, as there can be one refdb per worktree. So we'd again have the same problem that a "global" intent becomes localized to a specific instance. Instead, move the setting into the repository. While at it, convert it into a boolean. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-25 12:15:59 -08:00
Patrick Steinhardt	f8bdf3127a	odb: refactor `odb_clear()` to `odb_free()` The function `odb_clear()` releases all resources allocated to an object database and ensures that all fields become zero'd out. Despite its naming though it doesn't really clear the object database so that it becomes ready for reuse afterwards again -- the caller would first have to reinitialize it, and that contradicts the terminology of "clearing" as we have defined it in our coding guidelines. There isn't really only a reason to have "clearing" semantics, either. There's only a single caller of `odb_clear()`, and that caller also ends up freeing the object database structure itself. Refactor the function to have "freeing" semantics instead, so that the structure itself is also freed, which allows us to drop some useless boilerplate to zero out the structure's members. This refactoring reveals that we're trying to close the commit graph multiple times: once directly via `free_commit_graph()`, and once via `odb_close()`. Drop the former call. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-19 17:41:03 -08:00
Patrick Steinhardt	9aaba57993	odb: adopt logic to close object databases The logic to close an object database is currently contained in the packfile subsystem. That choice is somewhat relatable, as most of the logic really is to close resources associated with the packfile store itself. But we also end up handling object sources and commit graphs, which certainly is not related to packfiles. Move the function into the object database subsystem and rename it to `odb_close()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-19 17:41:03 -08:00
Patrick Steinhardt	3e5e360888	object-file: refactor writing objects via a stream We have two different ways to write an object into the database: - We either provide the full buffer and write the object all at once. - Or we provide an input stream that has a `read()` function so that we can chunk the object. The latter is especially used for large objects, where it may be too expensive to hold the complete object in memory all at once. While we already have `odb_write_object()` at the ODB-layer, we don't have an equivalent for streaming an object. Introduce a new function `odb_write_object_stream()` to address this gap so that callers don't have to be aware of the inner workings of how to stream an object to disk with a specific object source. Rename `stream_loose_object()` to `odb_source_loose_write_stream()` to clarify its scope. This matches our modern best practices around how to name functions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-03 12:18:48 -08:00
Patrick Steinhardt	f2bd88a308	object-file: refactor freshening of objects When writing an object that already exists in our object database we skip the write and instead only update mtimes of the object, either in its packed or loose object format. This logic is wholly contained in "object-file.c", but that file is really only concerned with loose objects. So it does not really make sense that it also contains the logic to freshen a packed object. Introduce a new `odb_freshen_object()` function that sits on the object database level and two functions `packfile_store_freshen_object()` and `odb_source_loose_freshen_object()`. Like this, the format-specific functions can be part of their respective subsystems, while the backend agnostic function to freshen an object sits at the object database layer. Note that this change also moves the logic that iterates through object sources from the object source layer into the object database layer. This change is intentional: object sources should ideally only have to worry about themselves, and coordination of different sources should be handled on the object database level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-03 12:18:47 -08:00
Patrick Steinhardt	376016ec71	object-file: move loose object map into loose source The loose object map is used to map from the repository's canonical object hash to the compatibility hash. As the name indicates, this map is only used for loose objects, and as such it is tied to a specific loose object source. Same as with preceding commits, move this map into the loose object source accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-03 12:18:47 -08:00
Patrick Steinhardt	90a93f9dea	object-file: move loose object cache into loose source Our loose objects use a cache that (optionally) stores all objects for each of the opened sharding directories. This cache is located in the `struct odb_source`, but now that we have `struct odb_source_loose` it makes sense to move it into the latter structure so that all state that relates to loose objects is entirely self-contained. Do so. While at it, rename corresponding functions to have a prefix that relates to `struct odb_source_loose`. Note that despite this prefix, the functions still accept a `struct odb_source` as input. This is done intentionally: once we introduce pluggable object databases, we will continue to accept this struct but then do a cast inside these functions to `struct odb_source_loose`. This design is similar to how we do it for our ref backends. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-03 12:18:46 -08:00
Patrick Steinhardt	ece43d9dc7	object-file: introduce `struct odb_source_loose` Currently, all state that relates to loose objects is held directly by the `struct odb_source`. Introduce a new `struct odb_source_loose` to hold the state instead so that it is entirely self-contained. This structure will eventually morph into the backend for accessing loose objects. As such, this is part of the refactorings to introduce pluggable object databases. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-03 12:18:46 -08:00
Patrick Steinhardt	0cc12dedef	object-file: move `fetch_if_missing` The `fetch_if_missing` global variable is declared in "object-file.h" but defined in "odb.c". The variable relates to the whole object database instead of only loose objects, so move the declaration into "odb.h" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-03 12:18:46 -08:00
Patrick Steinhardt	0820a4b120	odb: introduce `odb_source_new()` We have three different locations where we create a new ODB source. Deduplicate the logic via a new `odb_source_new()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-03 12:18:45 -08:00
Junio C Hamano	fbd67ab9a4	Merge branch 'ps/odb-clean-stale-wrappers' Code clean-up. * ps/odb-clean-stale-wrappers: odb: drop deprecated wrapper functions	2025-10-07 12:25:28 -07:00
Junio C Hamano	8c13c31404	Merge branch 'ps/packfile-store' Code clean-up around the in-core list of all the pack files and object database(s). * ps/packfile-store: packfile: refactor `get_packed_git_mru()` to work on packfile store packfile: refactor `get_all_packs()` to work on packfile store packfile: refactor `get_packed_git()` to work on packfile store packfile: move `get_multi_pack_index()` into "midx.c" packfile: introduce function to load and add packfiles packfile: refactor `install_packed_git()` to work on packfile store packfile: split up responsibilities of `reprepare_packed_git()` packfile: refactor `prepare_packed_git()` to work on packfile store packfile: reorder functions to avoid function declaration odb: move kept cache into `struct packfile_store` odb: move MRU list of packfiles into `struct packfile_store` odb: move packfile map into `struct packfile_store` odb: move initialization bit into `struct packfile_store` odb: move list of packfiles into `struct packfile_store` packfile: introduce a new `struct packfile_store`	2025-10-07 12:25:27 -07:00
Junio C Hamano	fd13909eb6	Merge branch 'jt/odb-transaction' The work to build on the bulk-checkin infrastructure to create many objects at once in a transaction and to abstract it into the generic object layer continues. * jt/odb-transaction: odb: add transaction interface object-file: update naming from bulk-checkin object-file: relocate ODB transaction code bulk-checkin: drop flush_odb_transaction() builtin/update-index: end ODB transaction when --verbose is specified bulk-checkin: remove ODB transaction nesting	2025-10-02 12:26:11 -07:00
Patrick Steinhardt	78237ea53d	packfile: split up responsibilities of `reprepare_packed_git()` In `reprepare_packed_git()` we perform a couple of operations: - We reload alternate object directories. - We clear the loose object cache. - We reprepare packfiles. While the logic is hosted in "packfile.c", it clearly reaches into other subsystems that aren't related to packfiles. Split up the responsibility and introduce `odb_reprepare()` which now becomes responsible for repreparing the whole object database. The existing `reprepare_packed_git()` function is refactored accordingly and only cares about reloading the packfile store now. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:50 -07:00
Patrick Steinhardt	bd1a521de8	odb: move kept cache into `struct packfile_store` The object database tracks a cache of "kept" packfiles, which is used by git-pack-objects(1) to handle cruft objects. With the introduction of the `struct packfile_store` we have a better place to host this cache though. Move the cache accordingly. This moves the last bit of packfile-related state from the object database into the packfile store. Adapt the comment for the `packfiles` pointer in `struct object_database` to reflect this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:49 -07:00
Patrick Steinhardt	fe835b0ca0	odb: move MRU list of packfiles into `struct packfile_store` The object database tracks the list of packfiles in most-recently-used order, which is mostly used to favor reading from packfiles that contain most of the objects that we're currently accessing. With the introduction of the `struct packfile_store` we have a better place to host this list though. Move the list accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:49 -07:00
Patrick Steinhardt	14aaf5c9d8	odb: move packfile map into `struct packfile_store` The object database tracks a map of packfiles by their respective paths, which is used to figure out whether a given packfile has already been loaded. With the introduction of the `struct packfile_store` we have a better place to host this list though. Move the map accordingly. `pack_map_entry_cmp()` isn't used anywhere but in "packfile.c" anymore after this change, so we convert it to a static function, as well. Note that we also drop the `inline` hint: the function is used as a callback function exclusively, and callbacks cannot be inlined. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:49 -07:00
Patrick Steinhardt	3421cb56a8	odb: move initialization bit into `struct packfile_store` The object database knows to skip re-initializing the list of packfiles in case it's already been initialized. Whether or not that is the case is tracked via a separate `initialized` bit that is stored in the object database. With the introduction of the `struct packfile_store` we have a better place to host this bit though. Move it accordingly. While at it, convert the field into a boolean now that we're allowed to use them in our code base. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:49 -07:00
Patrick Steinhardt	535b7a667a	odb: move list of packfiles into `struct packfile_store` The object database tracks the list of packfiles it currently knows about. With the introduction of the `struct packfile_store` we have a better place to host this list though. Move the list accordingly. Extract the logic from `odb_clear()` that knows to close all such packfiles and move it into the new subsystem, as well. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:48 -07:00
Patrick Steinhardt	b7983adb51	packfile: introduce a new `struct packfile_store` Information about an object database's packfiles is currently distributed across two different structures: - `struct packed_git` contains the `next` pointer as well as the `mru_head`, both of which serve to store the list of packfiles. - `struct object_database` contains several fields that relate to the packfiles. So we don't really have a central data structure that tracks our packfiles, and consequently responsibilities aren't always clear cut. A consequence for the upcoming pluggable object databases is that this makes it very hard to move management of packfiles from the object database level down into the object database source. Introduce a new `struct packfile_store` which is about to become the single source of truth for managing packfiles. Right now this data structure doesn't yet contain anything, but in subsequent patches we will move all data structures that relate to packfiles and that are currently contained in `struct object_database` into this new home. Note that this is only a first step: most importantly, we won't (yet) move the `struct packed_git::next` pointer around. This will happen in a subsequent patch series though so that `struct packed_git` will really only host information about the specific packfile it represents. Further note that the new structure still sits at the wrong level at the end of this patch series: as mentioned, it should eventually sit at the level of the object database source, not at the object database level. But introducing the packfile store now already makes it way easier to eventually push down the now-selfcontained data structure by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:48 -07:00
Justin Tobler	ce1661f9da	odb: add transaction interface Transactions are managed via the {begin,end}_odb_transaction() function in the object-file subsystem and its implementation is specific to the files object source. Introduce odb_transaction_{begin,commit}() in the odb subsystem to provide an eventual object source agnostic means to manage transactions. Update call sites to instead manage transactions through the odb subsystem. Also rename {begin,end}_odb_transaction() functions to object_file_transaction_{begin,commit}() to clarify the object source it supports. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-16 11:37:06 -07:00
Junio C Hamano	7d00521d7b	Merge branch 'jt/de-global-bulk-checkin' The bulk-checkin code used to depend on a file-scope static singleton variable, which has been updated to pass an instance throughout the callchain. * jt/de-global-bulk-checkin: bulk-checkin: use repository variable from transaction bulk-checkin: require transaction for index_blob_bulk_checkin() bulk-checkin: remove global transaction state bulk-checkin: introduce object database transaction structure	2025-09-15 08:52:05 -07:00
Junio C Hamano	c31a276f12	Merge branch 'ps/object-store-midx-dedup-info' Further code clean-up for multi-pack-index code paths. * ps/object-store-midx-dedup-info: midx: compute paths via their source midx: stop duplicating info redundant with its owning source midx: write multi-pack indices via their source midx: load multi-pack indices via their source midx: drop redundant `struct repository` parameter odb: simplify calling `link_alt_odb_entry()` odb: return newly created in-memory sources odb: consistently use "dir" to refer to alternate's directory odb: allow `odb_find_source()` to fail odb: store locality in object database sources	2025-09-12 10:41:18 -07:00
Patrick Steinhardt	e1d062e8ba	odb: drop deprecated wrapper functions In the Git 2.51 release cycle we've refactored the object database layer to access objects via `struct object_database` directly. To make the transition a bit easier we have retained some of the old-style functions in case those were widely used. Now that Git 2.51 has been released it's time to clean up though and drop these old wrappers. Do so and adapt the small number of newly added users to use the new functions instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-11 09:10:28 -07:00
Junio C Hamano	0b71555742	Merge branch 'ps/object-store-midx-dedup-info' into ps/packfile-store * ps/object-store-midx-dedup-info: midx: compute paths via their source midx: stop duplicating info redundant with its owning source midx: write multi-pack indices via their source midx: load multi-pack indices via their source midx: drop redundant `struct repository` parameter odb: simplify calling `link_alt_odb_entry()` odb: return newly created in-memory sources odb: consistently use "dir" to refer to alternate's directory odb: allow `odb_find_source()` to fail odb: store locality in object database sources	2025-09-02 09:38:03 -07:00
Justin Tobler	b336144725	bulk-checkin: remove global transaction state Object database transactions in the bulk-checkin subsystem rely on global state to track transaction status. Stop relying on global state and instead store the transaction in the `struct object_database`. Functions that operate on transactions are updated to now wire transaction state. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-08-25 09:48:13 -07:00
Patrick Steinhardt	a59d44ff3f	odb: return newly created in-memory sources Callers have no trivial way to obtain the newly created object database source when adding it to the in-memory list of alternates. While not yet needed anywhere, a subsequent commit will want to obtain that pointer. Refactor the function to return the source to make it easily accessible. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-08-11 09:22:21 -07:00
Patrick Steinhardt	0d61933b8f	odb: allow `odb_find_source()` to fail When trying to locate a source for an unknown object directory we will die right away. In subsequent patches we will add new callsites though that want to handle this situation gracefully instead. Refactor the function to return a `NULL` pointer if the source could not be found and adapt the callsites to die instead. Introduce a new wrapper `odb_find_source_or_die()` that continues to die in case the source could not be found. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-08-11 09:22:21 -07:00

1 2

69 Commits