global/git

mirror of https://github.com/git/git.git synced 2026-03-04 22:47:35 +01:00

Author	SHA1	Message	Date
Junio C Hamano	bd631b52db	Merge branch 'sk/oidmap-clear-with-custom-free-func' into jch A bit of OIDmap API enhancement and cleanup. Comments? * sk/oidmap-clear-with-custom-free-func: sequencer: use oidmap_clear_with_free() for string_entry cleanup odb: use oidmap_clear_with_free() to release replace_map entries list-objects-filter: use oidmap_clear_with_free() for cleanup builtin/rev-list: migrate missing_objects cleanup to oidmap_clear_with_free() oidmap: make entry cleanup explicit in oidmap_clear	2026-03-03 11:08:34 -08:00
Seyi Kufoiji	bbf4b4b6db	odb: use oidmap_clear_with_free() to release replace_map entries Replace the direct oidmap_clear() call in odb_free() with oidmap_clear_with_free(), and introduce a free_replace_map_entry() helper to properly free each struct replace_object stored in the map. This centralizes cleanup logic and ensures entries are released correctly via a dedicated callback. Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-02 14:12:48 -08:00
Patrick Steinhardt	7ac02d91e7	odb/source: make `write_alternate()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:52:26 -08:00
Patrick Steinhardt	2c5ee36e47	odb/source: make `read_alternates()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:52:26 -08:00
Patrick Steinhardt	15e787477b	odb/source: make `write_object_stream()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:52:26 -08:00
Patrick Steinhardt	bf6737bcd5	odb/source: make `write_object()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:52:26 -08:00
Patrick Steinhardt	941c1fad13	odb/source: make `freshen_object()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:52:26 -08:00
Patrick Steinhardt	c8c7337c24	odb/source: make `for_each_object()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:52:25 -08:00
Patrick Steinhardt	968592e421	odb/source: make `read_object_info()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Note that this function is a bit less straight-forward to convert compared to the other functions. The reason here is that the logic to read an object is: 1. We try to read the object. If it exists we return it. 2. If the object does not exist we reprepare the object database source. 3. We then try reading the object info a second time in case the reprepare caused it to appear. The second read is only supposed to happen for the packfile store though, as reading loose objects is not impacted by repreparing the object database. Ideally, we'd just move this whole logic into the ODB source. But that's not easily possible because we try to avoid the reprepare unless really required, which is after we have found out that no other ODB source contains the object, either. So the logic spans across multiple ODB sources, and consequently we cannot move it into an individual source. Instead, introduce a new flag `OBJECT_INFO_SECOND_READ` that tells the backend that we already tried to look up the object once, and that this time around the ODB source should try to find any new objects that may have surfaced due to an on-disk change. With this flag, the "files" backend can trivially skip trying to re-read the object as a loose object. Furthermore, as we know that we only try the second read via the packfile store, we can skip repreparing loose objects and only reprepare the packfile store. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:52:25 -08:00
Patrick Steinhardt	8759a6c258	odb/source: make `close()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:50:00 -08:00
Patrick Steinhardt	ae48509378	odb/source: make `reprepare()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:50:00 -08:00
Patrick Steinhardt	b79d085ef8	odb: move reparenting logic into respective subsystems The primary object database source may be initialized with a relative path. When reparenting the process to a different working directory we thus have to update this path and have it point to the same path, but relative to the new working directory. This logic is handled in the object database layer. It consists of three steps: 1. We undo any potential temporary object directory, which are used for transactions. This is done so that we don't end up modifying the temporary object database source that got applied for the transaction. 2. We then iterate through the non-transactional sources and reparent their respective paths. 3. We reapply the temporary object directory, but update its path. All of this logic is heavily tied to how the object database source handles paths in the first place. It's an internal implementation detail, and as sources may not even use an on-disk path at all it is not a mechanism that applies to all potential sources. Refactor the code so that the logic to reparent the sources is hosted by the "files" source and the temporary object directory subsystems, respectively. This logic is easier to reason about, but it also ensures that this logic is handled at the correct level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:50:00 -08:00
Patrick Steinhardt	ad35252b5b	odb: embed base source in the "files" backend The "files" backend is implemented as a pointer in the `struct odb_source`. This contradicts our typical pattern for pluggable backends like we use it for example in the ref store or for object database streams, where we typically embed the generic base structure in the specialized implementation. This pattern has a couple of small benefits: - We avoid an extra allocation. - We hide implementation details in the generic structure. - We can easily downcast from a generic backend to the specialized structure and vice versa because the offsets are known at compile time. - It becomes trivial to identify locations where we depend on backend specific logic because the cast needs to be explicit. Refactor our "files" object database source to do the same and embed the `struct odb_source` in the `struct odb_source_files`. There are still a bunch of sites in our code base where we do have to access internals of the "files" backend. The intent is that those will go away over time, but this will certainly take a while. Meanwhile, provide a `odb_source_files_downcast()` function that can convert a generic source into a "files" source. As we only have a single source the downcast succeeds unconditionally for now. Eventually though the intent is to make the cast `BUG()` in case the caller requests to downcast a non-"files" backend to a "files" backend. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:50:00 -08:00
Patrick Steinhardt	60a546bf17	odb: introduce "files" source Introduce a new "files" object database source. This source encapsulates access to both loose object files and the packfile store, similar to how the "files" backend for refs encapsulates access to loose refs and the packed-refs file. Note that for now the "files" source is still a direct member of a `struct odb_source`. This architecture will be reversed in the next commit so that the files source contains a `struct odb_source`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:50:00 -08:00
Patrick Steinhardt	cfcfdf6592	odb: split `struct odb_source` into separate header Subsequent commits will expand the `struct odb_source` to become a generic interface for accessing an object database source. As part of these refactorings we'll add a set of function pointers that will significantly expand the structure overall. Prepare for this by splitting out the `struct odb_source` into a separate header. This keeps the high-level object database interface detached from the low-level object database sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:49:59 -08:00
Junio C Hamano	b1af291b4a	Merge branch 'ps/object-info-bits-cleanup' into ps/odb-sources * ps/object-info-bits-cleanup: odb: convert `odb_has_object()` flags into an enum odb: convert object info flags into an enum odb: drop gaps in object info flag values builtin/fsck: fix flags passed to `odb_has_object()` builtin/backfill: fix flags passed to `odb_has_object()`	2026-02-23 13:48:48 -08:00
Junio C Hamano	703c97519d	Merge branch 'ps/odb-for-each-object' into ps/odb-sources * ps/odb-for-each-object: odb: drop unused `for_each_{loose,packed}_object()` functions reachable: convert to use `odb_for_each_object()` builtin/pack-objects: use `packfile_store_for_each_object()` odb: introduce mtime fields for object info requests treewide: drop uses of `for_each_{loose,packed}_object()` treewide: enumerate promisor objects via `odb_for_each_object()` builtin/fsck: refactor to use `odb_for_each_object()` odb: introduce `odb_for_each_object()` packfile: introduce function to iterate through objects packfile: extract function to iterate through objects of a store object-file: introduce function to iterate through objects object-file: extract function to read object info from path odb: fix flags parameter to be unsigned odb: rename `FOR_EACH_OBJECT_*` flags	2026-02-23 13:48:00 -08:00
Patrick Steinhardt	732ec9b17b	odb: convert `odb_has_object()` flags into an enum Following the reason in the preceding commit, convert the `odb_has_object()` flags into an enum. With this change, we would have catched the misuse of `odb_has_object()` that was fixed in a preceding commit as the compiler would have generated a warning: ../builtin/backfill.c:71:9: error: implicit conversion from enumeration type 'enum odb_object_info_flag' to different enumeration type 'enum odb_has_object_flag' [-Werror,-Wenum-conversion] 70 \| if (!odb_has_object(ctx->repo->objects, &list->oid[i], \| ~~~~~~~~~~~~~~ 71 \| OBJECT_INFO_FOR_PREFETCH)) \| ^~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-12 11:05:08 -08:00
Patrick Steinhardt	f6516a5241	odb: convert object info flags into an enum Convert the object info flags into an enum and adapt all functions that receive these flags as parameters to use the enum instead of an integer. This serves two purposes: - The function signatures become more self-documenting, as callers don't have to wonder which flags they expect. - The compiler can warn when a wrong flag type is passed. Note that the second benefit is somewhat limited. For example, when or-ing multiple enum flags together the result will be an integer, and the compiler will not warn about such use cases. But where it does help is when a single flag of the wrong type is passed, as the compiler would generate a warning in that case. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-12 11:05:08 -08:00
Justin Tobler	3f67e3d021	odb: transparently handle common transaction behavior A new ODB transaction is created and returned via `odb_transaction_begin()` and stored in the ODB. Only a single transaction may be pending at a time. If the ODB already has a transaction, the function is expected to return NULL. Similarly, when committing a transaction via `odb_transaction_commit()` the transaction being committed must match the pending transaction and upon commit reset the ODB transaction to NULL. These behaviors apply regardless of the ODB transaction implementation. Move the corresponding logic into `odb_transaction_{begin,commit}()` accordingly. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-02 17:14:03 -08:00
Justin Tobler	fa7d067923	odb: prepare `struct odb_transaction` to become generic An ODB transaction handles how objects are stored temporarily and eventually committed. Due to object storage being implemented differently for a given ODB source, the ODB transactions must be implemented in a manner specific to the source the objects are being written to. To provide generic transactions, `struct odb_transaction` is updated to store a commit callback that can be configured to support a specific ODB source. For now `struct odb_transaction_files` is the only transaction type and what is always returned when starting a transaction. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-02 17:14:03 -08:00
Justin Tobler	8bf06d05a5	object-file: rename transaction functions In a subsequent commit, ODB transactions are made more generic to facilitate each ODB source providing its own transaction handling. Rename `object_file_transaction_{begin,commit}()` to `odb_transaction_files_{begin,commit}()` to better match the future source specific transaction implementation. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-02 17:14:03 -08:00
Patrick Steinhardt	7b7cbaef27	odb: introduce mtime fields for object info requests There are some use cases where we need to figure out the mtime for objects. Most importantly, this is the case when we want to prune unreachable objects. But getting at that data requires users to manually derive the info either via the loose object's mtime, the packfiles' mtime or via the ".mtimes" file. Introduce a new `struct object_info::mtimep` pointer that allows callers to request an object's mtime. This new field will be used in a subsequent commit. Note that the concept of "mtime" is ambiguous: given an object, it may be stored multiple times in the object database, and each of these instances may have a different mtime. Disambiguating these mtimes is nothing that can happen on the generic ODB layer: the caller may search for the oldest object, the newest object, or even the relation of object mtimes depending on the specific source they are located in. As such, it is the responsibility of the caller to disambiguate mtimes. A consequence of this is that it's most likely incorrect to look up the mtime via `odb_read_object_info()`, as this interface does not give us enough information to disambiguate the mtime. Document this accordingly and tell users to use `odb_for_each_object()` instead. Even with this gotcha though it's sensible to have this request as part of the object info, as the mtime is a property of the object storage format. If we for example had a "black-box" storage backend, we'd still need to be able to query it for the mtime info in a generic way. We could introduce a safety mechanism that for example calls `BUG()` in case we look up the mtime outside of `odb_for_each_object()`. But that feels somewhat heavy-handed. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:08 -08:00
Patrick Steinhardt	df2fbdfa55	odb: introduce `odb_for_each_object()` Introduce a new function `odb_for_each_object()` that knows to iterate through all objects part of a given object database. This function is essentially a simple wrapper around the object database sources. Subsequent commits will adapt callers to use this new function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:07 -08:00
Junio C Hamano	d627023d80	Merge branch 'ps/packfile-store-in-odb-source' The packfile_store data structure is moved from object store to odb source. * ps/packfile-store-in-odb-source: packfile: move MIDX into packfile store packfile: refactor `find_pack_entry()` to work on the packfile store packfile: inline `find_kept_pack_entry()` packfile: only prepare owning store in `packfile_store_prepare()` packfile: only prepare owning store in `packfile_store_get_packs()` packfile: move packfile store into object source packfile: refactor misleading code when unusing pack windows packfile: refactor kept-pack cache to work with packfile stores packfile: pass source to `prepare_pack()` packfile: create store via its owning source	2026-01-21 08:28:59 -08:00
Junio C Hamano	24c43fb10b	Merge branch 'ps/odb-misc-fixes' Miscellaneous fixes on object database layer. * ps/odb-misc-fixes: odb: properly close sources before freeing them builtin/gc: fix condition for whether to write commit graphs	2026-01-15 07:12:41 -08:00
Junio C Hamano	ec16dde5c8	Merge branch 'ps/packfile-store-in-odb-source' into ps/odb-for-each-object * ps/packfile-store-in-odb-source: packfile: move MIDX into packfile store packfile: refactor `find_pack_entry()` to work on the packfile store packfile: inline `find_kept_pack_entry()` packfile: only prepare owning store in `packfile_store_prepare()` packfile: only prepare owning store in `packfile_store_get_packs()` packfile: move packfile store into object source packfile: refactor misleading code when unusing pack windows packfile: refactor kept-pack cache to work with packfile stores packfile: pass source to `prepare_pack()` packfile: create store via its owning source odb: properly close sources before freeing them builtin/gc: fix condition for whether to write commit graphs	2026-01-15 05:50:16 -08:00
Patrick Steinhardt	a282a8f163	packfile: move MIDX into packfile store The multi-pack index still is tracked as a member of the object database source, but ultimately the MIDX is always tied to one specific packfile store. Move the structure into `struct packfile_store` accordingly. This ensures that the packfile store now keeps track of all data related to packfiles. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-09 06:40:08 -08:00
Patrick Steinhardt	84f0e60b28	packfile: move packfile store into object source The packfile store is a member of `struct object_database`, which means that we have a single store per database. This doesn't really make much sense though: each source connected to the database has its own set of packfiles, so there is a conceptual mismatch here. This hasn't really caused much of a problem in the past, but with the advent of pluggable object databases this is becoming more of a problem because some of the sources may not even use packfiles in the first place. Move the packfile store down by one level from the object database into the object database source. This ensures that each source now has its own packfile store, and we can eventually start to abstract it away entirely so that the caller doesn't even know what kind of store it uses. Note that we only need to adjust a relatively small number of callers, way less than one might expect. This is because most callers are using `repo_for_each_pack()`, which handles enumeration of all packfiles that exist in the repository. So for now, none of these callers need to be adapted. The remaining callers that iterate through the packfiles directly and that need adjustment are those that are a bit more tangled with packfiles. These will be adjusted over time. Note that this patch only moves the packfile store, and there is still a bunch of functions that seemingly operate on a packfile store but that end up iterating over all sources. These will be adjusted in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-09 06:40:07 -08:00
Patrick Steinhardt	480336a9ce	packfile: create store via its owning source In subsequent patches we're about to move the packfile store from the object database layer into the object database source layer. Once done, we'll have one packfile store per source, where the source is owning the store. Prepare for this future and refactor `packfile_store_new()` to be initialized via an object database source instead of via the object database itself. This refactoring leads to a weird in-between state where the store is owned by the object database but created via the source. But this makes subsequent refactorings easier because we can now start to access the owning source of a given store. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-09 06:40:06 -08:00
Junio C Hamano	f1ec43d4d2	Merge branch 'ps/odb-misc-fixes' into ps/packfile-store-in-odb-source * ps/odb-misc-fixes: odb: properly close sources before freeing them builtin/gc: fix condition for whether to write commit graphs	2026-01-07 09:37:29 +09:00
Patrick Steinhardt	3d09968656	odb: properly close sources before freeing them It is possible to hit a memory leak when reading data from a submodule via git-grep(1): Direct leak of 192 byte(s) in 1 object(s) allocated from: #0 0x55555562e726 in calloc (git+0xda726) #1 0x555555964734 in xcalloc ../wrapper.c:154:8 #2 0x555555835136 in load_multi_pack_index_one ../midx.c:135:2 #3 0x555555834fd6 in load_multi_pack_index ../midx.c:382:6 #4 0x5555558365b6 in prepare_multi_pack_index_one ../midx.c:716:17 #5 0x55555586c605 in packfile_store_prepare ../packfile.c:1103:3 #6 0x55555586c90c in packfile_store_reprepare ../packfile.c:1118:2 #7 0x5555558546b3 in odb_reprepare ../odb.c:1106:2 #8 0x5555558539e4 in do_oid_object_info_extended ../odb.c:715:4 #9 0x5555558533d1 in odb_read_object_info_extended ../odb.c:862:8 #10 0x5555558540bd in odb_read_object ../odb.c:920:6 #11 0x55555580a330 in grep_source_load_oid ../grep.c:1934:12 #12 0x55555580a13a in grep_source_load ../grep.c:1986:10 #13 0x555555809103 in grep_source_is_binary ../grep.c:2014:7 #14 0x555555807574 in grep_source_1 ../grep.c:1625:8 #15 0x555555807322 in grep_source ../grep.c:1837:10 #16 0x5555556a5c58 in run ../builtin/grep.c:208:10 #17 0x55555562bb42 in void* ThreadStartFunc<false>(void*) lsan_interceptors.cpp.o #18 0x7ffff7a9a979 in start_thread (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x9a979) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab) #19 0x7ffff7b22d2b in __GI___clone3 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x122d2b) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab) The root caues of this leak is the way we set up and release the submodule: 1. We use `repo_submodule_init()` to initialize a new repository. This repository is stored in `repos_to_free`. 2. We now read data from the submodule repository. 3. We then call `repo_clear()` on the submodule repositories. 4. `repo_clear()` calls `odb_free()`. 5. `odb_free()` calls `odb_free_sources()` followed by `odb_close()`. The issue here is the 5th step: we call `odb_free_sources()` _before_ we call `odb_close()`. But `odb_free_sources()` already frees all sources, so the logic that closes them in `odb_close()` now becomes a no-op. As a consequence, we never explicitly close sources at all. Fix the leak by closing the store before we free the sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-07 09:16:51 +09:00
Junio C Hamano	e7b1925381	Merge branch 'jc/object-read-stream-fix' Fix a performance regression in recently graduated topic. * jc/object-read-stream-fix: odb: do not use "blank" substitute for NULL	2025-12-30 12:58:22 +09:00
Junio C Hamano	5a8046ab33	Merge branch 'ps/odb-alternates-object-sources' Code refactoring around alternate object store. * ps/odb-alternates-object-sources: odb: write alternates via sources odb: read alternates via sources odb: drop forward declaration of `read_info_alternates()` odb: remove mutual recursion when parsing alternates odb: stop splitting alternate in `odb_add_to_alternates_file()` odb: move computation of normalized objdir into `alt_odb_usable()` odb: resolve relative alternative paths when parsing odb: refactor parsing of alternates to be self-contained	2025-12-22 14:57:48 +09:00
Junio C Hamano	a650ad996d	odb: do not use "blank" substitute for NULL When various *object_info() functions are given an extended object info structure as NULL by a caller that does not want any details, the code uses a file-scope static blank_oi and passes it down to the helper functions they use, to avoid handling NULL specifically. The ps/object-read-stream topic graduated to 'master' recently however had a bug that assumed that two identically named file-scope static variables in two functions are the same, which of course is not the case. This made "git commit" take 0.38 seconds to 1508 seconds in some case, as reported by Aaron Plattner here: https://lore.kernel.org/git/f4ba7e89-4717-4b36-921f-56537131fd69@nvidia.com/ We _could_ move the blank_oi variable to the global scope in common section to fix this regression, but explicitly handling the NULL is a much safer fix. It would also reduce the chance of errors that somebody accidentally writes into blank_oi, making its contents dirty, which potentially will make subsequent calls into the function misbehave. By explicitly handling NULL input, we no longer have to worry about it. Reported-by: Aaron Plattner <aplattner@nvidia.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-18 20:48:01 +09:00
Junio C Hamano	dbe54273a7	Merge branch 'ps/object-read-stream' The "git_istream" abstraction has been revamped to make it easier to interface with pluggable object database design. * ps/object-read-stream: streaming: drop redundant type and size pointers streaming: move into object database subsystem streaming: refactor interface to be object-database-centric streaming: move logic to read packed objects streams into backend streaming: move logic to read loose objects streams into backend streaming: make the `odb_read_stream` definition public streaming: get rid of `the_repository` streaming: rely on object sources to create object stream packfile: introduce function to read object info from a store streaming: move zlib stream into backends streaming: create structure for filtered object streams streaming: create structure for packed object streams streaming: create structure for loose object streams streaming: create structure for in-core object streams streaming: allocate stream inside the backend-specific logic streaming: explicitly pass packfile info when streaming a packed object streaming: propagate final object type via the stream streaming: drop the `open()` callback function streaming: rename `git_istream` into `odb_read_stream`	2025-12-16 11:08:34 +09:00
Junio C Hamano	f1799202ea	Merge branch 'ps/object-read-stream' into ps/packfile-store-in-odb-source * ps/object-read-stream: streaming: drop redundant type and size pointers streaming: move into object database subsystem streaming: refactor interface to be object-database-centric streaming: move logic to read packed objects streams into backend streaming: move logic to read loose objects streams into backend streaming: make the `odb_read_stream` definition public streaming: get rid of `the_repository` streaming: rely on object sources to create object stream packfile: introduce function to read object info from a store streaming: move zlib stream into backends streaming: create structure for filtered object streams streaming: create structure for packed object streams streaming: create structure for loose object streams streaming: create structure for in-core object streams streaming: allocate stream inside the backend-specific logic streaming: explicitly pass packfile info when streaming a packed object streaming: propagate final object type via the stream streaming: drop the `open()` callback function streaming: rename `git_istream` into `odb_read_stream`	2025-12-15 17:40:31 +09:00
Patrick Steinhardt	221a877d47	odb: write alternates via sources Refactor writing of alternates so that the actual business logic is structured around the object database source we want to write the alternate to. Same as with the preceding commit, this will eventually allow us to have different logic for writing alternates depending on the backend used. Note that after the refactoring we start to call `odb_add_alternate_recursively()` unconditionally. This is fine though as we know to skip adding sources that are tracked already. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-11 18:39:37 +09:00
Patrick Steinhardt	f7dbd9fb2e	odb: read alternates via sources Adapt how we read alternates so that the interface is structured around the object database source we're reading from. This will eventually allow us to abstract away this behaviour with pluggable object databases so that every format can have its own mechanism for listing alternates. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-11 18:39:37 +09:00
Patrick Steinhardt	3f42555322	odb: drop forward declaration of `read_info_alternates()` Now that we have removed the mutual recursion in the preceding commit it is not necessary anymore to have a forward declaration of the `read_info_alternates()` function. Move the function and its dependencies further up so that we can remove it. Note that this commit also removes the function documentation of `read_info_alternates()`. It's unclear what it's documenting, but it for sure isn't documenting the modern behaviour of the function anymore. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-11 18:39:37 +09:00
Patrick Steinhardt	430e0e0f2e	odb: remove mutual recursion when parsing alternates When adding an alternative object database source we not only have to consider the added source itself, but we also have to add _its_ sources to our database. We implement this via mutual recursion: 1. We first call `link_alt_odb_entries()`. 2. `link_alt_odb_entries()` calls `parse_alternates()`. 3. We then add each alternate via `odb_add_alternate_recursively()`. 4. `odb_add_alternate_recursively()` calls `link_alt_odb_entries()` again. This flow is somewhat hard to follow, but more importantly it means that parsing of alternates is somewhat tied to the recursive behaviour. Refactor the function to remove the mutual recursion between adding sources and parsing alternates. The parsing step thus becomes completely oblivious to the fact that there is recursive behaviour going on at all. The recursion is handled by `odb_add_alternate_recursively()` instead, which now recurses with itself. This refactoring allows us to move parsing of alternates into object database sources in a subsequent step. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-11 18:39:36 +09:00
Patrick Steinhardt	dccfb39cdb	odb: stop splitting alternate in `odb_add_to_alternates_file()` When calling `odb_add_to_alternates_file()` we know to add the newly added source to the object database in case we have already loaded alternates. This is done so that we can make its objects accessible immediately without having to fully reload all alternates. The way we do this though is to call `link_alt_odb_entries()`, which adds _multiple_ sources to the object database source in case we have newline-separated entries. This behaviour is not documented in the function documentation of `odb_add_to_alternates_file()`, and all callers only ever pass a single directory to it. It's thus entirely surprising and a conceptual mismatch. Fix this issue by directly calling `odb_add_alternate_recursively()` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-11 18:39:36 +09:00
Patrick Steinhardt	d17673ef42	odb: move computation of normalized objdir into `alt_odb_usable()` The function `alt_odb_usable()` receives as input the object database, the path it's supposed to determine usability for as well as the normalized path of the main object directory of the repository. The last part is derived by the function's caller from the object database. As we already pass the object database to `alt_odb_usable()` it is redundant information. Drop the extra parameter and compute the normalized object directory in the function itself. While at it, rename the function to `odb_is_source_usable()` to align it with modern terminology. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-11 18:39:35 +09:00
Patrick Steinhardt	84cec5276e	odb: resolve relative alternative paths when parsing Parsing alternates and resolving potential relative paths is currently handled in two separate steps. This has the effect that the logic to retrieve alternates is not entirely self-contained. We want it to be just that though so that we can eventually move the logic to list alternates into the `struct odb_source`. Move the logic to resolve relative alternative paths into `parse_alternates()`. Besides bringing us a step closer towards the above goal, it also neatly separates concerns of generating the list of alternatives and linking them into the object database. Note that we ignore any errors when the relative path cannot be resolved. This isn't really a change in behaviour though: if the path cannot be resolved to a directory then `alt_odb_usable()` still knows to bail out. While at it, rename the function to `odb_add_alternate_recursively()` to more clearly indicate what its intent is and to align it with modern terminology. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-11 18:39:35 +09:00
Patrick Steinhardt	1660496fc4	odb: refactor parsing of alternates to be self-contained Parsing of the alternates file and environment variable is currently split up across multiple different functions and is entangled with `link_alt_odb_entries()`, which is responsible for linking the parsed object database sources. This results in two downsides: - We have mutual recursion between parsing alternates and linking them into the object database. This is because we also parse alternates that the newly added sources may have. - We mix up the actual logic to parse the data and to link them into place. Refactor the logic so that parsing of the alternates file is entirely self-contained. Note that this doesn't yet fix the above two issues, but it is a necessary step to get there. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-11 18:39:34 +09:00
Patrick Steinhardt	ac65c70663	odb: handle recreation of quarantine directories In the preceding commit we have moved the logic that reparents object database sources on chdir(3p) from "setup.c" into "odb.c". Let's also do the same for any temporary quarantine directories so that the complete reparenting logic is self-contained in "odb.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-25 12:16:00 -08:00
Patrick Steinhardt	2816b748e5	odb: handle changing a repository's commondir The function `repo_set_gitdir()` is called in two situations: - To initialize the repository with its discovered location. As part of this we also set up the new object database. - To update the repository's discovered location in case the process changes its working directory so that we update relative paths. This means we also have to update any relative paths that are potentially used in the object database. In the context of the object database we ideally wouldn't ever have to worry about the second case: if all paths used by our object database sources were absolute, then we wouldn't have to update them. But unfortunately, the paths aren't only used to locate files owned by the given source, but we also use them for reporting purposes. One such example is `repo_get_object_directory()`, where we cannot just change semantics to always return absolute paths, as that is likely to break tooling out there. One solution to this would be to have both a "display path" and an "internal path". This would allow us to use internal paths for all internal matters, but continue to use the potentially-relative display paths so that we don't break compatibility. But converting the codebase to honor this split is quite a messy endeavour, and it wouldn't even help us with the goal to get rid of the need to update the display path on chdir(3p). Another solution would be to rework "setup.c" so that we never have to update paths in the first place. In that case, we'd only initialize the repository once we have figured out final locations for all directories. This would be a significant simplification of that subsystem indeed, but the current logic is so messy that it would take significant investments to get there. Meanwhile though, while object sources may still use relative paths, the best thing we can do is to handle the reparenting of the object source paths in the object database itself. This can be done by registering one callback for each object database so that we get notified whenever the current working directory changes, and we then perform the reparenting ourselves. Ideally, this wouldn't even happen on the object database level, but instead handled by each object database source. But we don't yet have proper pluggable object database sources, so this will need to be handled at a later point in time. The logic itself is rather simple: - We register the callback when creating the object database. - We unregister the callback when releasing it again. - We split up `set_git_dir_1()` so that it becomes possible to skip recreating the object database. This is required because the function is called both when the current working directory changes, but also when we set up the repository. Calling this function without skipping creation of the ODB will result in a bug in case it's already created. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-25 12:16:00 -08:00
Patrick Steinhardt	35d9fc65ed	odb: handle initialization of sources in `odb_new()` The logic to set up a new object database is currently distributed across two functions in "repository.c": - In `initialize_repository()` we initialize an empty object database. This object database is not fully initialized and doesn't have any sources attached to it. - The primary object database source is then created in `repo_set_gitdir()`. Ideally though, the logic should be entirely self-contained so that we can iterate more readily on how exactly the sources themselves get set up. Refactor `odb_new()` to handle both allocation and setup of the object database. This ensures that the object database is always initialized and ready for use, and it allows us to change how the sources get set up eventually. Note that `repo_set_gitdir()` still reaches into the sources when the function gets called with an already-initialized object database. This will be fixed in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-25 12:16:00 -08:00
Patrick Steinhardt	b67b2d9fb7	odb: move logic to disable ref updates into repo Our object database sources have a field `disable_ref_updates`. This field can obviously be set to disable reference updates, but it is somewhat curious that this logic is hosted by the object database. The reason for this is that it was primarily added to keep us from accidentally updating references while an ODB transaction is ongoing. Any objects part of the transaction have not yet been committed to disk, so new references that point to them might get corrupted in case we never end up committing the transaction. As such, whenever we create a new transaction we set up a new temporary ODB source and mark it as disabling reference updates. This has one (and only one?) upside: once we have committed the transaction, the temporary source will be dropped and thus we clean up the disabled reference updates automatically. But other than that, it's somewhat misdesigned: - We can have multiple ODB sources, but only the currently active source inhibits reference updates. - We're mixing concerns of the refdb with the ODB. Arguably, the decision of whether we can update references or not should be handled by the refdb. But that wouldn't be a great fit either, as there can be one refdb per worktree. So we'd again have the same problem that a "global" intent becomes localized to a specific instance. Instead, move the setting into the repository. While at it, convert it into a boolean. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-25 12:15:59 -08:00
Patrick Steinhardt	385e18810f	packfile: introduce function to read object info from a store Extract the logic to read object info for a packed object from `do_oid_object_into_extended()` into a standalone function that operates on the packfile store. This function will be used in a subsequent commit. Note that this change allows us to make `find_pack_entry()` an internal implementation detail. As a consequence though we have to move around `packfile_store_freshen_object()` so that it is defined after that function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-23 12:56:45 -08:00

1 2

95 Commits