The quality of tests and test suites is most apparent not when
everything passes, but in how quickly bugs can be identified,
analyzed, and resolved after test failures occur.
As such, it is an unfortunate side effect of 2a21098b98 (github: adapt
containerized jobs to be rootless, 2025-01-10) that the output of failed
test cases, which before that change was shown directly in the build
logs, is now no longer shown at all.
The reason is a side effect of running the build and the tests as a
user other than `root`, but without granting that user the permissions
needed to signal which tests failed and whose output hence needs to be
included in the logs.
The way this signaling works is for the workflow to write into a
special-purpose file whose path is specific to the current workflow
step and which can be accessed via the `$GITHUB_ENV` environment
variable, which differs between workflow steps. It is this file that is
missing write permission for the `builder` user that was introduced in
the above-mentioned commit.
The solution is simple: make the file world-writable.
Technically, this write permission should be removed after the step has
completed, if proper security practices were to be upheld, but since
nothing uses that file again, it does not matter, and the fix is more
succinct this way.
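For illustration, the fix boils down to making that file writable in a
workflow step along these lines (a sketch; the exact invocation in the
workflow may differ):

    # let the non-root `builder` user append to the per-step file
    chmod a+w "$GITHUB_ENV"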
This commit is best viewed with `--color-words`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[jc: squashed Elijah's rewrite of the first paragraph of the log message]
[jc: updated chmod to match "world-writable" in the log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'master' of https://github.com/j6t/gitk:
gitk: add external diff file rename detection
gitk: show unescaped file names on 'rename' and 'copy' lines
gitk: fix a 'continue' statement outside a loop to 'return'
gitk: persist position and size of the Tags and Heads window
Revert "gitk: Only restore window size from ~/.gitk, not position"
When "git replay" replays a commit it copies the extended headers
across from the original commit. However, if the original commit
was signed, we do not want to copy the header associated with the
signature is it wont be valid for the new commit. The code already
knows to avoid coping the "gpgsig" header but does not know to avoid
copying the "gpgsig-sha256" header. Add that header to the list of
exclusions to match what "git commit --amend" does.
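For illustration, "git commit --amend" achieves this with an exclusion
list handed to read_commit_extra_headers(); the exact shape of the
change in "git replay" may differ:

    static const char *exclude_gpgsig[] = {
            "gpgsig",
            "gpgsig-sha256",
            NULL
    };

    extra = read_commit_extra_headers(commit, exclude_gpgsig);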
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tools like `git filter-repo`[1] use `git fast-export` and
`git fast-import` to rewrite repository history. When rewriting
history using one such tool though, commit signatures might become
invalid because the commits they sign changed due to the changes
in the repository history made by the tool between the fast-export
and the fast-import steps.
Note that as far as signature handling goes:
* Since fast-export doesn't know what changes filter-repo may make
to the stream, it can't know whether the signatures will still be
valid.
* Since filter-repo doesn't know what history canonicalizations
fast-export performed (and it performs a few), it can't know whether
the signatures will still be valid.
* Therefore, fast-import is the only process in the pipeline that
can know whether a specified signature remains valid.
Having invalid signatures in a rewritten repository could be
confusing, so users rewriting history might prefer to simply
discard signatures that are invalid at the fast-import step.
For example, a common use case is to rewrite only "recent" history.
While specifying commit ranges corresponding to "recent" commits
could work, users worry about getting it wrong and want to just
rewrite everything automatically, expecting older commit signatures
to be left untouched.
To let them do that, let's add a new 'strip-if-invalid' mode to the
`--signed-commits=<mode>` option of `git fast-import`.
It would be interesting for the `--signed-tags=<mode>` option to
have this mode too, but we leave that for a future improvement.
It might also be possible for `git fast-export` to have such a mode
in its `--signed-commits=<mode>` and `--signed-tags=<mode>`
options, but the use cases for it are much less clear, so we also
leave that for possible future improvements.
For now let's just die() if 'strip-if-invalid' is passed to these
options where it hasn't been implemented yet.
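For example, the rejection could look roughly like this (a sketch; the
exact wording of the message is illustrative):

    if (!strcmp(arg, "strip-if-invalid"))
            die(_("--signed-commits=strip-if-invalid is not yet "
                  "supported by this command"));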
[1]: https://github.com/newren/git-filter-repo
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* js/persist-ref-window-geometry:
gitk: persist position and size of the Tags and Heads window
Revert "gitk: Only restore window size from ~/.gitk, not position"
In the preceding commit we have moved the logic that reparents object
database sources on chdir(3p) from "setup.c" into "odb.c". Let's also do
the same for any temporary quarantine directories so that the complete
reparenting logic is self-contained in "odb.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `repo_set_gitdir()` is called in two situations:
- To initialize the repository with its discovered location. As part
of this we also set up the new object database.
- To update the repository's discovered location in case the process
changes its working directory so that we update relative paths. This
means we also have to update any relative paths that are potentially
used in the object database.
In the context of the object database we ideally wouldn't ever have to
worry about the second case: if all paths used by our object database
sources were absolute, then we wouldn't have to update them. But
unfortunately, the paths aren't only used to locate files owned by the
given source, but we also use them for reporting purposes. One such
example is `repo_get_object_directory()`, where we cannot just change
semantics to always return absolute paths, as that is likely to break
tooling out there.
One solution to this would be to have both a "display path" and an
"internal path". This would allow us to use internal paths for all
internal matters, but continue to use the potentially-relative display
paths so that we don't break compatibility. But converting the codebase
to honor this split is quite a messy endeavour, and it wouldn't even
help us with the goal to get rid of the need to update the display path
on chdir(3p).
Another solution would be to rework "setup.c" so that we never have to
update paths in the first place. In that case, we'd only initialize the
repository once we have figured out final locations for all directories.
This would be a significant simplification of that subsystem indeed, but
the current logic is so messy that it would take significant investments
to get there.
In the meantime, while object sources may still use relative paths, the
best we can do is handle the reparenting of the object source paths in
the object database itself. This can be done by registering a callback
for each object database so that we get notified whenever the current
working directory changes, and then performing the reparenting
ourselves.
Ideally, this wouldn't even happen at the object database level, but
would instead be handled by each object database source. But we don't
yet have proper pluggable object database sources, so this will need to
be handled at a later point in time.
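A sketch of what such a callback could look like, built on the
chdir-notify API (the iteration and the field names are illustrative):

    static void odb_reparent_sources(const char *name,
                                     const char *old_cwd,
                                     const char *new_cwd,
                                     void *data)
    {
            struct object_database *odb = data;
            struct odb_source *source;

            for (source = odb->sources; source; source = source->next) {
                    char *reparented;

                    if (is_absolute_path(source->path))
                            continue;
                    reparented = reparent_relative_path(old_cwd, new_cwd,
                                                        source->path);
                    free(source->path);
                    source->path = reparented;
            }
    }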
The logic itself is rather simple:
- We register the callback when creating the object database.
- We unregister the callback when releasing it again.
- We split up `set_git_dir_1()` so that it becomes possible to skip
recreating the object database. This is required because the
function is called both when the current working directory changes,
but also when we set up the repository. Calling this function
without skipping creation of the ODB will result in a bug in case
it's already created.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we (obviously) have a way to register new listeners that get
called whenever we chdir(3p), we don't have an equivalent that can be
used to unregister such a listener again.
Add one, as it will be required in a subsequent commit.
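Such a counterpart could be declared along these lines, mirroring the
existing registration function (a sketch, not necessarily the exact
signature):

    void chdir_notify_unregister(chdir_notify_callback cb, void *data);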
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to set up a new object database is currently distributed
across two functions in "repository.c":
- In `initialize_repository()` we initialize an empty object database.
This object database is not fully initialized and doesn't have any
sources attached to it.
- The primary object database source is then created in
`repo_set_gitdir()`.
Ideally though, the logic should be entirely self-contained so that we
can iterate more readily on how exactly the sources themselves get set
up.
Refactor `odb_new()` to handle both allocation and setup of the object
database. This ensures that the object database is always initialized
and ready for use, and it allows us to change how the sources get set up
eventually.
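The resulting shape is roughly the following (a sketch; the helper and
parameter names are illustrative):

    struct object_database *odb_new(struct repository *repo,
                                    const char *object_dir)
    {
            struct object_database *odb = xcalloc(1, sizeof(*odb));

            odb->repo = repo;
            /* set up the primary object database source right away */
            odb_add_source(odb, object_dir);

            return odb;
    }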
Note that `repo_set_gitdir()` still reaches into the sources when the
function gets called with an already-initialized object database. This
will be fixed in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pushing references via HTTP we call `repo_init_revisions()` in a
loop for each reference that we're about to push. As third argument we
pass the result of `setup_git_directory()`, which causes us to
reinitialize the repository every single time.
This is an obvious waste of compute, as the repository that we're
working in will never change across any of the initializations. The only
reason that we do this is to retrieve the directory of the repository.
Furthermore, this is about to create issues in a subsequent commit,
where reinitializing the repository will cause a `BUG()`.
Address this by storing the Git directory in a variable instead so that
we don't have to call the function repeatedly.
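Schematically, the fix hoists the call out of the loop (a sketch; the
surrounding code and the variable name are illustrative):

    const char *dir = setup_git_directory();

    for (ref = refs; ref; ref = ref->next) {
            struct rev_info revs;

            repo_init_revisions(the_repository, &revs, dir);
            /* ... set up and walk the revisions as before ... */
    }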
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "repository" test helper sets up `the_repository` twice. In fact
though, we don't even have to set it up even once: all we need is to set
up its hash algorithm, because we still depend on some subsystems that
aren't free of `the_repository`.
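Setting up just the hash algorithm boils down to a single call
(assuming SHA-1 as the helper's default):

    repo_set_hash_algo(the_repository, GIT_HASH_SHA1);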
Refactor the code accordingly. This prepares for a subsequent change,
where setting up the repository repeatedly will lead to a `BUG()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When asked to perform object consistency checks via the `--fsck-objects`
flag we verify that each object that is part of the pack is valid. In
general, this check can even be performed outside of a Git repository:
we don't need an initialized object database, as we simply read the
objects from the packfile directly.
But there's one exception: a subset of the object checks may be deferred
to a later point in time. For now, this only concerns ".gitmodules" and
".gitattributes" files: whenever we see a tree referencing these files
we queue them for a deferred check. This is done because we need to do
some extra checks for those files to ensure that they are well-formed,
and these checks need to be done regardless of whether the corresponding
blobs are part of the packfile or not.
This works inside a repository, but unfortunately the logic leads to a
segfault when running outside of one. This is because we eventually call
`odb_read_object()`, which will crash because the object database has
not been initialized.
There are multiple options here:
- We could in theory create a purely in-memory database with only a
packfile store that contains the single packfile. We don't really
have the infrastructure for this yet though, and it would end up
being quite hacky.
- We could refuse to perform consistency checks outside of a
repository. But most of the checks work alright, so this would be a
regression.
- We can skip the finalizing consistency checks when running outside
of a repository. This is not as invasive as skipping all checks,
but it's not great to arbitrarily skip a subset of the checks, either.
None of these options really feel perfect. The first one would be the
obvious choice if easily possible.
There's another option though: instead of skipping the final object
checks, we can die if there are any queued object checks. With this
change we now die if and only if we would have previously
segfaulted. Like this we ensure that objects that _may_ fail the
consistency checks won't be silently skipped, and at the same time we
give users a much better error message.
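The resulting check could look roughly like this (a sketch; the
predicate for queued checks is illustrative):

    if (!startup_info->have_repository &&
        fsck_has_deferred_checks(&fsck_options)) /* illustrative */
            die(_("cannot check .gitmodules or .gitattributes blobs "
                  "outside of a repository"));

    if (fsck_finish(&fsck_options))
            die(_("fsck error in pack objects"));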
Refactor the code accordingly and add a test that would have triggered
the segfault. Note that we also move down the logic to add the packfile
to the store. There is no point doing this any earlier than right before
we execute `fsck_finish()`, and it ensures that the logic to set up and
perform the consistency check is self-contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new function that allows the caller to verify whether two
oidsets contain the exact same object IDs.
Note that this change requires us to change `oidset_iter_init()` to
accept a `const struct oidset`.
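A sketch of such a helper, built on the existing oidset API:

    int oidset_equal(const struct oidset *a, const struct oidset *b)
    {
            struct oidset_iter iter;
            const struct object_id *oid;

            if (oidset_size(a) != oidset_size(b))
                    return 0;

            oidset_iter_init(a, &iter);
            while ((oid = oidset_iter_next(&iter)))
                    if (!oidset_contains(b, oid))
                            return 0;

            return 1;
    }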
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our object database sources have a field `disable_ref_updates`. This
field can obviously be set to disable reference updates, but it is
somewhat curious that this logic is hosted by the object database.
The reason for this is that it was primarily added to keep us from
accidentally updating references while an ODB transaction is ongoing.
Any objects that are part of the transaction have not yet been committed
to disk, so new references that point to them might get corrupted in
case we never end up committing the transaction. As such, whenever we
create a new transaction we set up a new temporary ODB source and mark
it as disabling reference updates.
This has one (and only one?) upside: once we have committed the
transaction, the temporary source will be dropped and thus we clean up
the disabled reference updates automatically. But other than that, it's
somewhat misdesigned:
- We can have multiple ODB sources, but only the currently active
source inhibits reference updates.
- We're mixing concerns of the refdb with the ODB.
Arguably, the decision of whether we can update references or not should
be handled by the refdb. But that wouldn't be a great fit either, as
there can be one refdb per worktree. So we'd again have the same problem
that a "global" intent becomes localized to a specific instance.
Instead, move the setting into the repository. While at it, convert it
into a boolean.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git submodule add" tries to find if a submodule with the same name
already exists at a different path, by looking up an entry in the
.gitmodules file. If the entry in the file is incomplete, e.g.,
when the submodule.<name>.something variable is defined but there is
no definition of submodule.<name>.path variable, it accesses the
missing .path member of the submodule structure and triggers a
segfault.
A brief audit was done to make sure that the code does not assume
members other than those that are absolutely certain to exist: a
submodule obtained by submodule_from_name() should have the .name
member, while a submodule obtained by submodule_from_path() should
have the .path as well as the .name member, and we cannot assume
anything else. Luckily, the module_add() codepath was the only
problematic one. It is fairly recent code that comes from 1fa06ced
(submodule: prevent overwriting .gitmodules on path reuse,
2025-07-24).
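The fix thus boils down to checking the member before dereferencing
it; schematically (the surrounding logic is illustrative):

    const struct submodule *sub = submodule_from_name(repo, treeish, name);

    /* an entry found by name is only guaranteed to have .name */
    if (sub && sub->path && strcmp(sub->path, path))
            die(_("submodule name '%s' already used for path '%s'"),
                name, sub->path);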
A helper used by update_submodule() seems to assume that its call to
submodule_from_path() always yields a submodule object without a
failure, which seems to rely on the caller making sure it is the
case. Leave an assert() with a NEEDSWORK comment there for future
developers to make sure the assumption actually holds.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These callers expect that git_config_pathname() returning 0 is a
signal that the variable they passed has a string they need to act
on. But with the introduction of ":(optional)path" earlier, that is
no longer the case. If the path specified by the configuration
variable is missing, their variable will get a NULL in it, and they
need to act on it (often, just refraining from copying it elsewhere).
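The adjusted caller pattern looks roughly like this (a sketch):

    char *path = NULL;

    if (!git_config_pathname(&path, var, value)) {
            if (!path)
                    return 0; /* missing ":(optional)" path: do nothing */
            /* ... act on path as before ... */
    }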
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier we added support for a value spelled as ":(optional)path"
for configuration variables whose values are of type "path", with
the documented semantics "if the path is missing, behave as if such
a variable definition is not even there."
This has worked OK for code paths that read configuration files and
store the configured value as a string, where NULL in such a string
is treated as if the setting is not there, left as the default.
However, there are other code paths that do not _ignore_ such NULL
values and misbehave. "git config get --path" is one of them.
When the git_config_pathname() helper function finds that the value
of the variable is an optional path *and* the path is missing, it
leaves the destination pointer intact (which usually is left as
NULL) and returns 0 to signal success. The format_config() helper
however assumed that the destination pointer always gets a string,
which no longer is the case, and segfaulted.
Make sure that git_config_pathname() clears the destination pointer
in such a case, and teach format_config() to react to the condition
by returning 1 (which is different from 0, the normal success, and
from negative values, which signal errors) to its callers. Adjust the
callers to react to this new return value that tells them to pretend
as if they did not even see this particular <key, value> pair.
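Schematically, the relevant part of format_config() becomes (a sketch):

    char *path = NULL;

    if (git_config_pathname(&path, key, value) < 0)
            return -1;      /* error */
    if (!path)
            return 1;       /* missing optional path: skip this pair */
    strbuf_addstr(buf, path);
    free(path);
    return 0;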
Reported-by: Han Jiang <jhcarl0814@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git repo structure" subcommand tried to align its output but
mixed up byte count and display column width, which has been
corrected.
* jx/repo-struct-utf8width-fix:
builtin/repo: fix table alignment for UTF-8 characters
t/unit-tests: add UTF-8 width tests for CJK chars
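The underlying fix is to pad by display width rather than byte count;
schematically:

    int width = utf8_strwidth(name);        /* display columns */
    strbuf_addstr(&buf, name);
    strbuf_addchars(&buf, ' ', max_width - width);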
An earlier check added to the osxkeychain credential helper to avoid
storing a credential that the helper itself supplied was overeager
and rejected credential material supplied by other helper backends
that it would have wanted to store, which has been corrected.
* kn/osxkeychain-idempotent-store-fix:
osxkeychain: avoid incorrectly skipping store operation
A part of the code paths that deal with loose objects has been cleaned
up.
* ps/object-source-loose:
object-file: refactor writing objects via a stream
object-file: rename `write_object_file()`
object-file: refactor freshening of objects
object-file: rename `has_loose_object()`
object-file: read objects via the loose object source
object-file: move loose object map into loose source
object-file: hide internals when we need to reprepare loose sources
object-file: move loose object cache into loose source
object-file: introduce `struct odb_source_loose`
object-file: move `fetch_if_missing`
odb: adjust naming to free object sources
odb: introduce `odb_source_new()`
odb: fix subtle logic to check whether an alternate is usable
"git replay" (experimental) learned to perform ref updates itself
in a transaction by default, instead of emitting where each refs
should point at and leaving the actual update to another command.
* sa/replay-atomic-ref-updates:
replay: add replay.refAction config option
replay: make atomic ref updates the default behavior
replay: use die_for_incompatible_opt2() for option validation
Adding a repository that uses a different hash function is a no-no,
but "git submodule add" did not prevent it, which has been corrected.
* bc/submodule-force-same-hash:
read-cache: drop submodule check from add_to_cache()
object-file: disallow adding submodules of different hash algo
The code to expand attribute macros has been rewritten without
recursion, to avoid running out of stack space in an uncontrolled
way.
* jk/attr-macroexpand-wo-recursion:
attr: avoid recursion when expanding attribute macros
The flags --all and --value of "git config unset" don't make the command
"replace" or "show" anything; they are about selecting what to unset.
Change their help text accordingly.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command "git config set <name> <value>" fails for an option that has
multiple values. List the "git config set" flags that can be used,
instead of old-style "git config" actions.
Reported-by: Paul Wintz <pwintz@ucsc.edu>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier patch had a typo discovered after it had been merged to
'next'. Fix it.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have turned `struct odb_read_stream` into a
publicly visible structure. Furthermore, this structure now contains the
type and size of the object that we are about to stream. Consequently,
the out-pointers that we used before to propagate the type and size of
the streamed object are now somewhat redundant with the data contained
in the structure itself.
Drop these out-pointers and adapt callers accordingly.
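The effect on callers is roughly the following (a sketch; the function
name is illustrative):

    struct odb_read_stream *st;

    /* before: type and size are returned via out-pointers */
    st = odb_read_stream_open(odb, oid, &type, &size);

    /* after: both are carried by the stream itself */
    st = odb_read_stream_open(odb, oid);
    /* ... use st->type and st->size directly ... */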
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "streaming" terminology is somewhat generic, so it may not be
immediately obvious that "streaming.{c,h}" is specific to the object
database. Rectify this by moving it into the "odb/" directory so that it
can be immediately attributed to the object subsystem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the streaming interface to be centered around object databases
instead of centered around the repository. Rename the functions
accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the logic to read packed object streams into the respective
subsystem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the logic to read loose object streams into the respective
subsystem. This allows us to make a couple of function declarations
private.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Subsequent commits will move the backend-specific logic of setting up an
object read stream into the specific subsystems. As the backends are now
the ones responsible for allocating the stream, they'll need to have the
stream definition available to them.
Make the stream definition public to prepare for this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Subsequent commits will move the backend-specific logic of object
streaming into their respective subsystems. These subsystems have gotten
rid of `the_repository` already, but we still use it in two locations in
the streaming subsystem.
Prepare for the move by fixing those two cases. Converting the logic in
`open_istream_pack_non_delta()` is trivial as we already got the object
database as input.
But for `stream_blob_to_fd()` we have to add a new parameter to make it
accessible. So, as we already have to adjust all callers anyway, rename
the function to `odb_stream_blob_to_fd()` to indicate it's part of the
object subsystem.
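The renamed function's signature thus becomes something along these
lines (a sketch):

    int odb_stream_blob_to_fd(struct object_database *odb, int fd,
                              const struct object_id *oid,
                              struct stream_filter *filter, int can_seek);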
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating an object stream we first look up the object info and, if
it's present, we call into the respective backend that contains the
object to create a new stream for it.
This has the consequence that, for the loose object source, we basically
iterate through the object sources twice: we first discover that the
object exists as a loose object by iterating through all sources, and,
once we have discovered it, we again walk through all sources to try and
map the object. The same issue will eventually also surface once the
packfile store becomes per-object-source.
Furthermore, it feels rather pointless to first look up the object only
to then try and read it.
Refactor the logic to be centered around sources instead. Instead of
first reading the object, we immediately ask the source to create the
object stream for us. If the object exists we get a stream; otherwise
we'll try the next source.
Like this we only have to iterate through sources once. But even more
importantly, this change also helps us to make the whole logic
pluggable. The object read stream subsystem does not need to be aware of
the different source backends anymore, but eventually it'll only have to
call the source's callback function.
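Schematically, stream creation now iterates like this (a sketch with
illustrative helper names; as noted below, the packfile store is not
yet per-source):

    for (source = odb->sources; source; source = source->next) {
            struct odb_read_stream *st;

            /* within a single source, packed objects are preferred */
            st = open_pack_stream(source, oid);
            if (!st)
                    st = open_loose_stream(source, oid);
            if (st)
                    return st;
    }

    return NULL;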
Note that at the current point in time we aren't fully there yet:
- The packfile store still sits on the object database level and is
thus agnostic of the sources.
- We still have to call into both the packfile store and the loose
object source.
But both of these issues will soon be addressed.
This refactoring results in a slight change to semantics: previously, it
was `odb_read_object_info_extended()` that picked the source for us, and
it would have favored packed (non-deltified) objects over loose objects.
And while we still favor packed over loose objects for a single source
with the new logic, we'll now favor a loose object from an earlier
source over a packed object from a later source.
Ultimately this shouldn't matter though: the stream doesn't indicate to
the caller which source it is from and whether it was created from a
packed or loose object, so such details are opaque to the caller. And
other than that we should be able to assume that two objects with the
same object ID should refer to the same content, so the streamed data
would be the same, too.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the logic to read object info for a packed object from
`do_oid_object_info_extended()` into a standalone function that operates
on the packfile store. This function will be used in a subsequent
commit.
Note that this change allows us to make `find_pack_entry()` an internal
implementation detail. As a consequence though we have to move around
`packfile_store_freshen_object()` so that it is defined after that
function.
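The extracted helper could be declared roughly as follows (a sketch;
the name and parameters are illustrative):

    int packfile_store_object_info(struct packfile_store *store,
                                   const struct object_id *oid,
                                   struct object_info *oi,
                                   unsigned flags);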
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While all backend-specific data is now contained in a backend-specific
structure, we still share the zlib stream across the loose and packed
objects.
Refactor the code and move it into the specific structures so that we
fully detangle the different backends from one another.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in a preceding commit, we want to get rid of the union of
stream-type specific data in `struct odb_read_stream`. Create a new
structure for filtered object streams to move towards this design.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in a preceding commit, we want to get rid of the union of
stream-type specific data in `struct odb_read_stream`. Create a new
structure for packed object streams to move towards this design.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in a preceding commit, we want to get rid of the union of
stream-type specific data in `struct odb_read_stream`. Create a new
structure for loose object streams to move towards this design.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in a preceding commit, we want to get rid of the union of
stream-type specific data in `struct odb_read_stream`. Create a new
structure for in-core object streams to move towards this design.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a new stream we first allocate it and then call into
backend-specific logic to populate the stream. This design requires that
the stream itself contains a `union` with backend-specific members that
then ultimately get populated by the backend-specific logic.
This works, but it's awkward in the context of pluggable object
databases. Each backend will need its own member in that union, and as
the structure itself is completely opaque (it's only defined in
"streaming.c") it also has the consequence that we must have the logic
that is specific to backends in "streaming.c".
Ideally though, the infrastructure would be reversed: we have a generic
`struct odb_read_stream` and some helper functions in "streaming.c",
whereas the backend-specific logic sits in the backend's subsystem
itself.
This can be realized by using a design that is similar to how we handle
reference databases: instead of having a union of members, we instead
have backend-specific structures with a `struct odb_read_stream base`
as its first member. The backends would thus hand out the pointer to the
base, but internally they know to cast back to the backend-specific
type.
This means though that we need to allocate different structures
depending on the backend. To prepare for this, move allocation of the
structure into the backend-specific functions that open a new stream.
Subsequent commits will then create those new backend-specific structs.
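For the loose backend, for example, the pattern would look roughly
like this (a sketch; field and function names are illustrative):

    struct odb_read_stream_loose {
            struct odb_read_stream base; /* must come first */
            git_zstream z;
            void *mapped;
            unsigned long mapsize;
    };

    static struct odb_read_stream *open_stream_loose(struct odb_source *source,
                                                     const struct object_id *oid)
    {
            struct odb_read_stream_loose *st = xcalloc(1, sizeof(*st));

            /* ... map the object and populate st->base ... */
            return &st->base;
    }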
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When streaming a packed object we first populate the stream with
information about the pack that contains the object before calling
`open_istream_pack_non_delta()`. This is done because we have already
looked up both the pack and the object's offset, so it would be a waste
of time to look up this information again.
But the way this is done makes for a somewhat awkward calling interface,
as the caller now needs to be aware of how exactly the function itself
behaves.
Refactor the code so that we instead explicitly pass the packfile info
into `open_istream_pack_non_delta()`. This makes the calling convention
explicit, but more importantly this allows us to refactor the function
so that it becomes its responsibility to allocate the stream itself in a
subsequent patch.
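After the change, the calling convention is explicit (a sketch; the
parameter names are illustrative):

    static struct odb_read_stream *open_istream_pack_non_delta(
            struct object_database *odb, struct packed_git *pack,
            off_t obj_offset, const struct object_id *oid);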
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When opening the read stream for a specific object the caller is also
expected to pass in a pointer to the object type. This type is passed
down via multiple levels and will eventually be populated with the type
of the looked-up object.
The way we propagate down the pointer though is somewhat non-obvious.
While `istream_source()` still expects the pointer and looks it up via
`odb_read_object_info_extended()`, we also pass it down even further
into the format-specific callbacks that perform another lookup. This is
quite confusing overall.
Refactor the code so that the responsibility to populate the object type
rests solely with the format-specific callbacks. This will allow us to
drop the call to `odb_read_object_info_extended()` in `istream_source()`
entirely in a subsequent patch.
Furthermore, instead of propagating the type via an out-pointer, we now
propagate the type via a new field in the object stream. The stream
already has a `size` field, so it's only natural to have a second field
that contains the object type.
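That is, the public part of the stream now carries both pieces of
metadata (a sketch):

    struct odb_read_stream {
            enum object_type type; /* newly added */
            unsigned long size;
            /* ... */
    };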
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a read stream we first populate the structure with the
open callback function and then subsequently call the function. This
layout is somewhat weird though:
- The structure needs to be allocated and partially populated with the
open function before we can properly initialize it.
- We only ever call the `open()` callback function right after having
populated the `struct odb_read_stream::open` member, and it's never
called thereafter again. So it is somewhat pointless to store the
callback in the first place.
Especially the first point creates a problem for us. In subsequent
commits we'll want to fully move construction of the read stream into
the respective object sources. E.g., the loose object source will be the
one that is responsible for creating the structure. But this creates a
problem: if we first need to create the structure so that we can call
the source-specific callback, we cannot fully handle creation of the
structure in the source itself.
We could of course work around that and have the loose object source
create the structure and populate only its `open()` callback. But
this doesn't really buy us anything due to the second bullet point
above.
Instead, drop the callback entirely and refactor `istream_source()` so
that we open the streams immediately. This unblocks a subsequent step,
where we'll also start to allocate the structure in the source-specific
logic.
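Schematically, the change in istream_source() is (a sketch with
illustrative names):

    /* before: stash the callback, then call it through the structure */
    st->open = open_istream_loose;
    if (st->open(st, odb, oid, type))
            return NULL;

    /* after: call the backend function directly */
    if (open_istream_loose(st, odb, oid))
            return NULL;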
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the following patches we are about to make the `git_istream` more
generic so that it becomes fully controlled by the specific object
source that wants to create it. As part of these refactorings we'll
fully move the structure into the object database subsystem.
Prepare for this change by renaming the structure from `git_istream`
to `odb_read_stream`. This mirrors the `odb_write_stream` structure that
we already have.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>