337 Commits

Author SHA1 Message Date
Elijah Newren
59257224a3 merge-ort: prevent the_repository from coming back
Due to the use of DEFAULT_ABBREV, we cannot get rid of our usage of
USE_THE_REPOSITORY_VARIABLE.  However, we have removed all other uses of
the_repository in merge-ort a few times.  But they keep coming back.

Define the_repository to make it a compilation error so that they don't
come back any more.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-21 18:34:07 -08:00
Elijah Newren
5eae39beb1 merge-ort: replace the_hash_algo with opt->repo->hash_algo
We have a perfectly valid repository available and do not need to use
the_hash_algo (a shorthand for the_repository->hash_algo), so use the
known repository instead.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-21 18:34:07 -08:00
Elijah Newren
b54590b463 merge-ort: replace the_repository with opt->repo
We have a perfectly valid repository available and do not need to use
the_repository.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-21 18:34:07 -08:00
Elijah Newren
4f8108b5ff merge-ort: pass repository to write_tree()
In order to get rid of a usage of the_repository, we need to know the
value of opt->repo; pass it along to write_tree().  Once we have the
repository, though, we no longer need to pass
opt->repo->hash_algo->rawsz, we can have write_tree() look up that value
itself.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-21 18:34:06 -08:00
Elijah Newren
84325f0730 merge,diff: remove the_repository check before prefetching blobs
Prefetching of blobs from promisor remotes was added to diff in
7fbbcb21b1 (diff: batch fetching of missing blobs, 2019-04-05).  In
that commit,

  https://lore.kernel.org/git/20190405170934.20441-1-jonathantanmy@google.com/

was squashed into

  https://lore.kernel.org/git/44de02e584f449481e6fb00cf35d74adf0192e9d.1553895166.git.jonathantanmy@google.com/

without the extra explanation about the squashed changes being added to
the commit message; in particular, this explanation from that first link
is absent:

> Also, prefetch only if the repository being diffed is the_repository
> (because we do not support lazy fetching for any other repository
>  anyway).

Then, later, this checking was spread from diff.c to diffcore-rename.c
and diffcore-break.c by 95acf11a3d (diff: restrict when prefetching
occurs, 2020-04-07) and then further split in d331dd3b0c
(diffcore-rename: allow different missing_object_cb functions,
2021-06-22).  I also copied the logic from prefetching blobs from
diff.c to merge-ort.c in 2bff554b23 (merge-ort: add prefetching for
content merges, 2021-06-22).

The reason for all these checks was noted above -- we only supported
lazy fetching for the_repository.  However, that changed with
ef830cc434 (promisor-remote: teach lazy-fetch in any repo,
2021-06-17), so these checks are now unnecessary.  Remove them.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-21 18:34:06 -08:00
Junio C Hamano
d445421a54 Merge branch 'rs/xdiff-wo-the-repository'
Reduce dependency on the_repository of xdiff-interface layer.

* rs/xdiff-wo-the-repository:
  xdiff-interface: stop using the_repository
2026-02-17 13:30:42 -08:00
Junio C Hamano
5288202433 Merge branch 'ps/commit-list-functions-renamed'
Rename three functions around the commit_list data structure.

* ps/commit-list-functions-renamed:
  commit: rename `free_commit_list()` to conform to coding guidelines
  commit: rename `reverse_commit_list()` to conform to coding guidelines
  commit: rename `copy_commit_list()` to conform to coding guidelines
2026-02-13 13:39:25 -08:00
René Scharfe
af5706f033 xdiff-interface: stop using the_repository
Use the algorithm-agnostic is_null_oid() and push the dependency of
read_mmblob() on the_repository->objects to its callers.  This allows it
to be used with arbitrary object databases.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-10 08:16:14 -08:00
Collin Funk
4ac4705afa global: constify some pointers that are not written to
The recent glibc 2.43 release had the following change listed in its
NEWS file:

    For ISO C23, the functions bsearch, memchr, strchr, strpbrk, strrchr,
    strstr, wcschr, wcspbrk, wcsrchr, wcsstr and wmemchr that return
    pointers into their input arrays now have definitions as macros that
    return a pointer to a const-qualified type when the input argument is
    a pointer to a const-qualified type.

When compiling with GCC 15, which defaults to -std=gnu23, this causes
many warnings like this:

    merge-ort.c: In function ‘apply_directory_rename_modifications’:
    merge-ort.c:2734:36: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
     2734 |                 char *last_slash = strrchr(cur_path, '/');
          |                                    ^~~~~~~

This patch fixes the more obvious ones by making them const when we do
not write to the returned pointer.

Signed-off-by: Collin Funk <collin.funk1@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-05 17:52:49 -08:00
Patrick Steinhardt
9f18d089c5 commit: rename free_commit_list() to conform to coding guidelines
Our coding guidelines say that:

  Functions that operate on `struct S` are named `S_<verb>()` and should
  generally receive a pointer to `struct S` as first parameter.

While most of the functions related to `struct commit_list` already
follow that naming schema, `free_commit_list()` doesn't.

Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-15 05:32:31 -08:00
Patrick Steinhardt
a468f3cefa commit: rename reverse_commit_list() to conform to coding guidelines
Our coding guidelines say that:

  Functions that operate on `struct S` are named `S_<verb>()` and should
  generally receive a pointer to `struct S` as first parameter.

While most of the functions related to `struct commit_list` already
follow that naming schema, `reverse_commit_list()` doesn't.

Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-15 05:32:31 -08:00
Patrick Steinhardt
ff9fb2cfe6 commit: rename copy_commit_list() to conform to coding guidelines
Our coding guidelines say that:

  Functions that operate on `struct S` are named `S_<verb>()` and should
  generally receive a pointer to `struct S` as first parameter.

While most of the functions related to `struct commit_list` already
follow that naming schema, `copy_commit_list()` doesn't.

Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-15 05:32:31 -08:00
René Scharfe
ec7a16b145 cocci: convert parse_tree functions to repo_ variants
Add and apply a semantic patch to convert calls to parse_tree() and
friends to the corresponding variant that takes a repository argument,
to allow the functions that implicitly use the_repository to be retired
once all potential in-flight topics are settled and converted as well.

The changes in .c files were generated by Coccinelle, but I fixed a
whitespace bug it would have introduced to builtin/commit.c.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09 18:36:18 -08:00
Junio C Hamano
2db806d817 Merge branch 'en/ort-recursive-d-f-conflict-fix'
The ort merge machinery hit an assertion failure in a history with
criss-cross merges renamed a directory and a non-directory, which
has been corrected.

* en/ort-recursive-d-f-conflict-fix:
  merge-ort: fix corner case recursive submodule/directory conflict handling
2026-01-08 16:40:12 +09:00
Elijah Newren
979ee83e8a merge-ort: fix corner case recursive submodule/directory conflict handling
At GitHub, a few repositories were triggering errors of the form:

    git: merge-ort.c:3037: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed.
    Aborted (core dumped)

While these may look similar to both
    a562d90a35 (merge-ort: fix failing merges in special corner case,
                  2025-11-03)
and
    f6ecb603ff (merge-ort: fix directory rename on top of source of other
                  rename/delete, 2025-08-06)
the cause is different and in this case the problem is not an
over-conservative assertion, but a bug before the assertion where we did
not update all relevant state appropriately.

It sadly took me a really long time to figure out how to get a simple
reproducer for this one.  It doesn't really have that many moving parts,
but there are multiple pieces of background information needed to
understand it.

First of all, when we have two files added at the same path, merge-ort
does a two-way merge of those files.  If we have two directories added
at the same path, we basically do the same thing (taking the union of
files, and two-way merging files with the same name).  But two-way
merging requires components of the same type.  We can't merge the
contents of a regular file with a directory, or with a symlink, or with
a submodule.  Nor can any of those other types be merged with each
other, e.g. merging a submodule with a directory is a bad idea.  When
two paths have the same name but their types do not match, merge-ort is
forced to move one of them to an alternate filename (using the
unique_path() function).

Second, if two commits being merged have more than one merge-base,
merge-ort will merge the merge-bases to create a virtual merge-base, and
use that as the base commit.

Third, one of the really important optimizations in merge-ort is trivial
tree-level resolution (roughly meaning merging trees without recursing
into them).  This optimization has some nuance to it that is important
to the current bug, and to understand it, it helps to first look at the
high-level overview of how merge-ort runs; there are basically three
high-level functions that the work is divided between:
    collect_merge_info() - walks the top-level trees getting individual
                           paths of interest
    detect_renames() - detect renames between paths in order to match up
                       paths for three-way merging
    process_entries() - does a few things of interest:
      * three-way merging of files,
      * other special handling (e.g. adjusting paths with conflicting
        types to avoid path collisions)
      * as it finishes handling all the files within a subdirectory,
        writes out a new tree object for that directory

If it were not for renames, we could just always do tree-level merging
whenever the tree on at least one side was unmodified.  Unfortunately,
we need to recurse into trees to determine whether there are renames.
However, we can also do tree-level merging so long as there aren't any
*relevant* renames (another merge-ort optimization), which we can
determine without recursing into trees.

We would also be able to do tree-level merging if we somehow apriori
knew what renames existed, by only recursing into the trees which we
could otherwise trivially merge if they contained files involved in
renames.  That might not seem useful, because we need to find out the
renames and we have to recurse into trees to do so, but when you find
out that the process_entries() step is more computationally expensive
than the collect_merge_info() step, it yields an interesting strategy:
   * run collect_merge_info()
   * run detect_renames()
   * cache the renames()
   * restart -- rerun collect_merge_info(), using the cached renames to
     only recurse into the needed trees
   * we already have the renames cached so no need to re-detect
   * run process_entries() on the reduced list of paths
which was implemented back in 7bee6c1004 (merge-ort: avoid recursing
into directories when we don't need to, 2021-07-16)  Crucially, this
restarting only occurs if the number of paths we could skip recursing
into exceeds the number we still need to recurse into by some safety
factor (wanted_factor in handle_deferred_entries()); forgetting this
fact is a great way to repeatedly fail to create a minimal testcase for
several days and go down alternate wrong paths).

Now, I earlier summarized this optimization as "merging trees without
recursing into them", but this optimization does not require that all
three sides of history has a directory at a given path.  So long as the
tree on one side matches the tree in the base version, we can decide to
resolve in favor of whatever the other side of history has at that path
-- be it a directory, a file, a submodule, or a symlink.  Unfortunately,
the code in question didn't fully realize this, and was written assuming
the base version and both sides would have a directory at the given
path, as can be seen by the "ci->filemask == 0" comment in
resolve_trivial_directory_merge() that was added as part of 7bee6c1004
(merge-ort: avoid recursing into directories when we don't need to,
2021-07-16).  A few additional lines of code are needed to handle cases
where we have something other than a directory on the other side of
history.

But, knowing that resolve_trivial_directory_merge() doesn't have
sufficient state updating logic doesn't show us how to trigger a bug
without combining with the other bits of information we provided above.
Here's a relevant testcase:
   * branches A & B
   * commit A1: adds "folder" as a directory with files tracked under it
   * commit B1: adds "folder" as a submodule
   * commit A2: merges B1 into A1, keeping "folder" as a directory
     (and in fact, with no changes to "folder" since A1), discarding the
     submodule
   * commit B2: merges A1 into B1, keeping "folder" as a submodule
     (and in fact, with no changes to "folder" since B1), discarding the
     directory
Here, if we try to merge A2 & B2, the logic proceeds as follows:
   * we have multiple merge-bases: A1 & B1.  So we have to merge those
     to get a virtual merge base.
   * due to "folder" as a directory and "folder" as a submodule, the
     path collision logic triggers and renames "folder" as a submodule
     to "folder~Temporary merge branch 2" so we can keep it alongside
     "folder" as a directory.
   * we now have a virtual merge base (containing both "folder"
     directory and a "folder~Temporary merge branch 2" submodule) and
     can now do the outer merge
   * in the first step of the outer merge, we attempt to defer recursing
     into folder/ as a directory, but find we need to for rename
     detection.
   * in rename detection, we note that "folder~Temporary merge branch 2"
     has the same hash as "folder" as a submodule in B2, which means we
     have an exact rename.
   * after rename detection, we discover no path in folder/ is needed
     for renames, and so we can cache renames and restart.
   * after restarting, we avoid recursing into "folder/" and realize we
     can resolve it trivially since it hasn't been modified.  The
     resolution removes "folder/", leaving us only "folder" as a
     submodule from commit B2.
   * After this point, we should have a rename/delete conflict on
     "folder~Temporary merge branch 2" -> "folder", but our marking of
     the merge of "folder" as clean broke our ability to handle that and
     in fact triggers an assertion in process_renames().

When there was a df_conflict (directory/"file" conflict, where "file"
could be submodule or regular file or symlink), ensure
resolve_trivial_directory_merge() handles it properly.  In particular:
  * do not pre-emptively mark the path as cleanly merged if the
    remaining path is a file; allow it to be processed in
    process_entries() later to determine if it was clean
  * clear the parts of dirmask or filemask corresponding to the matching
    sides of history, since we are resolving those away
  * clear the df_conflict bit afterwards; since we cleared away the two
    matching sides and only have one side left, that one side can't
    have a directory/file conflict with itself.

Also add the above minimal testcase showcasing this bug to t6422, **with
a sufficient number of paths under the folder/ directory to actually
trigger it**.  (I wish I could have all those days back from all the
wrong paths I went down due to not having enough files under that
directory...)

I know this commit has a very high ratio of lines in the commit message
to lines of comments, and a relatively high ratio of comments to actual
code, but given how long it took me to track down, on the off chance
that we ever need to further modify this logic, I wanted it thoroughly
documented for future me and for whatever other poor soul might end up
needing to read this commit message.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-12-30 08:59:52 +09:00
Junio C Hamano
1b93acd13a Merge branch 'ad/blame-diff-algorithm'
"git blame" learns "--diff-algorithm=<algo>" option.

* ad/blame-diff-algorithm:
  blame: make diff algorithm configurable
  xdiff: add 'minimal' to XDF_DIFF_ALGORITHM_MASK
2025-11-26 10:32:40 -08:00
Elijah Newren
a562d90a35 merge-ort: fix failing merges in special corner case
At GitHub, we had a repository that was triggering
  git: merge-ort.c:3032: process_renames: Assertion `newinfo && !newinfo->merged.clean` failed.
during git replay.

This sounds similar to the somewhat recent f6ecb603ff (merge-ort: fix
directory rename on top of source of other rename/delete, 2025-08-06),
but the cause is different.  Unlike that case, there are no
rename-to-self situations arising in this case, and new to this case it
can only be triggered during a replay operation on the 2nd or later
commit being replayed, never on the first merge in the sequence.

To trigger, the repository needs:
  * an upstream which:
    * renames a file to a different directory, e.g.
        old/file -> new/file
    * leaves other files remaining in the original directory (so that
      e.g. "old/" still exists upstream even though file has been
      removed from it and placed elsewhere)
  * a topic branch being rebased where:
    * a commit in the sequence:
      * modifies old/file
    * a subsequent commit in the sequence being replayed:
      * does NOT touch *anything* under new/
      * does NOT touch old/file
      * DOES modify other paths under old/
      * does NOT have any relevant renames that we need to detect
        _anywhere_ elsewhere in the tree (meaning this interacts
        interestingly with both directory renames and cached renames)

In such a case, the assertion will trigger.  The fix turns out to be
surprisingly simple.  I have a very vague recollection that I actually
considered whether to add such an if-check years ago when I added the
very similar one for oldinfo in 1b6b902d95 (merge-ort:
process_renames() now needs more defensiveness, 2021-01-19), but I think
I couldn't figure out a possible way to trigger it and was worried at
the time that if I didn't know how to trigger it then I wasn't so sure
that simply skipping it was correct.  Waiting did give me a chance to
put more thorough tests and checks into place for the rename-to-self
cases a few months back, which I might not have found as easily
otherwise.  Anyway, put the check in place now and add a test that
demonstrates the fix.

Note that this bug, as demonstrated by the conditions listed above,
runs at the intersection of relevant renames, trivial directory
resolutions, and cached renames.  All three of those optimizations are
ones that unfortunately make the code (and testcases!) a bit more
complex, and threading all three makes it a bit more so.  However, the
testcase isn't crazy enough that I'd expect no one to ever hit it in
practice, and was confused why we didn't see it before.  After some
digging, I discovered that merge.directoryRenames=false is a workaround
to this bug, and GitHub used that setting until recently (it was a
"temporary" match-what-libgit2-does piece of code that lasted years
longer than intended).  Since the conditions I gave above for triggering
this bug rule out the possibility of there being directory renames, one
might assume that it shouldn't matter whether you try to detect such
renames if there aren't any.  However, due to commit a16e8efe5c
(merge-ort: fix merge.directoryRenames=false, 2025-03-13), the heavy
hammer used there means that merge.directoryRenames=false ALSO turns off
rename caching, which is critical to triggering the bug.  This becomes
a bit more than an aside since...

Re-reading that old commit, a16e8efe5c (merge-ort: fix
merge.directoryRenames=false, 2025-03-13), it appears that the solution
to this latest bug might have been at least a partial alternative
solution to that old commit.  And it may have been an improved
alternative (or at least help implement one), since it may be able to
avoid the heavy-handed disabling of rename cache.  That might be an
interesting future thing to investigate, but is not critical for the
current fix.  However, since I spent time digging it all up, at least
leave a small comment tweak breadcrumb to help some future reader
(myself or others) who wants to dig further to connect the dots a little
quicker.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-17 14:08:09 -08:00
Elijah Newren
d5663a4b05 merge-ort: remove debugging crud
While developing commit a16e8efe5c (merge-ort: fix
merge.directoryRenames=false, 2025-03-13), I was testing things out and
had an extra condition on one of the if-blocks that I occasionally
swapped between '&& 0' and '&& 1' to see the effects of the changes.  I
forgot to remove it before submitting and it wasn't caught in review.
Remove it now.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-17 14:08:08 -08:00
Antonin Delpeuch
881793c4f7 xdiff: add 'minimal' to XDF_DIFF_ALGORITHM_MASK
The XDF_DIFF_ALGORITHM_MASK bit mask only includes bits for the patience
and histogram diffs, not for the minimal one. This means that when
reseting the diff algorithm to the default one, one needs to separately
clear the bit for the minimal diff. There are places in the code that fail
to do that: merge-ort.c and builtin/merge-file.c.

Add the XDF_NEED_MINIMAL bit to the bit mask, and remove the separate
clearing of this bit in the places where it hasn't been forgotten.

Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-17 09:31:59 -08:00
Junio C Hamano
0949f24eb4 Merge branch 'en/ort-rename-fixes' into maint-2.51
Various bugs about rename handling in "ort" merge strategy have
been fixed.

* en/ort-rename-fixes:
  merge-ort: fix directory rename on top of source of other rename/delete
  merge-ort: fix incorrect file handling
  merge-ort: clarify the interning of strings in opt->priv->path
  t6423: fix missed staging of file in testcases 12i,12j,12k
  t6423: document two bugs with rename-to-self testcases
  merge-ort: drop unnecessary temporary in check_for_directory_rename()
  merge-ort: update comments to modern testfile location
2025-10-15 10:29:28 -07:00
Junio C Hamano
d1123cd810 Merge branch 'en/ort-rename-fixes'
Various bugs about rename handling in "ort" merge strategy have
been fixed.

* en/ort-rename-fixes:
  merge-ort: fix directory rename on top of source of other rename/delete
  merge-ort: fix incorrect file handling
  merge-ort: clarify the interning of strings in opt->priv->path
  t6423: fix missed staging of file in testcases 12i,12j,12k
  t6423: document two bugs with rename-to-self testcases
  merge-ort: drop unnecessary temporary in check_for_directory_rename()
  merge-ort: update comments to modern testfile location
2025-08-21 13:47:02 -07:00
Elijah Newren
f6ecb603ff merge-ort: fix directory rename on top of source of other rename/delete
At GitHub, we've got a real-world repository that has been triggering
failures of the form:

    git: merge-ort.c:3007: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed.

which comes from the line:

    VERIFY_CI(newinfo);

Unfortunately, this one has been quite complex to unravel, and is a
bit complex to explain.  So, I'm going to carefully try to explain each
relevant piece needed to understand the fix, then carefully build up
from a simple testcase to some of the relevant testcases.

== New special case we need to consider ==

Rename pairs in the diffcore machinery connect the source path of a
rename with the destination path of a rename.  Since we have rename
pairs to consider on both sides of history since the merge base,
merging has to consider a few special cases of possible overlap:

  A) two rename pairs having the same target path
  B) two rename pairs having the same source path
  C) the source path of one rename pair being the target path of a
     different rename pair

Some of these came up often enough that we gave them names:
  A) a rename/rename(2to1) conflict (looks similar to an add/add conflict)
  B) a rename/rename(1to2) conflict, which represents the same path being
     renamed differently on the two sides of history
  C) not yet named

merge-ort is well-prepared to handle cases (A) and (B), as was
merge-recursive (which was merge-ort's predecessor).  Case (C) was
briefly considered during the years of merge-recursive maintenance,
but the full extent of support it got was a few FIXME/TODO comments
littered around the code highlighting some of the places that would
probably need to be fixed to support it.  When I wrote merge-ort I
ignored case (C) entirely, since I believed that case (C) was only
possible if we were to support break detection during merges.  Not
only had break detection never been supported by any merge algorithm,
I thought break detection wasn't worth the effort to support in a
merge algorithm.  However, it turns out that case (C) can be triggered
without break detection, if there's enough moving pieces.

Before I dive into how to trigger case (C) with directory renames plus
other renames, it might be helpful to use a simpler example with break
detection first.  And before we get to that it may help to explain
some more basics of handling renames in the merge algorithm.  So, let
me first backup and provide a quick refresher on each of

  * handling renames
  * what break detection would mean, if supported in merging
  * handling directory renames

From there, I'll build up from a basic directory rename detection case
to one that triggers a failure currently.

== Handling renames ==

In the merge machinery when we have a rename of a path A -> B,
processing that rename needs to remove path A, and make sure that path B
has the relevant information.  Note that if the content was also
modified on both sides, this may mean that we have 3 different stages
that need to be stored at path B instead of having some stored at path
A.

Having all stages stored at path B makes it much easier for users to
investigate and resolve the content conflict associated with a renamed
path.  For example:
  * "git status" doesn't have to figure out how to list paths A & B and
    attempt to connect them for users; it can just list path B.
  * Users can use "git ls-files -u B" (instead of trying to find the
    previous name of the file so they can list both, i.e. "git ls-files
    -u A B")
  * Users can resolve via "git add B" (without needing to "git rm A")

== What break detection would mean ==

If break detection were supported, we might have cases where A -> B
*and* C -> A, meaning that both rename pairs might believe they need to
update A.  In particular, the processing of A -> B would need to be
careful to not clear out all stages of A and mark it resolved, while
both renames would need to figure out which stages of A belong with A
and which belong with B, so that both paths have the right stages
associated with them.

merge-ort (like merge-recursive before it) makes no attempt to handle
break detection; it runs with break detection turned off.  It would
need to be retrofitted to handle such cases.

== Directory rename detection ==

If one side of history renames directory D/ -> E/, and the other side of
history adds new files to D/, then directory rename detection notices
and suggests moving those new files to E/.  A similar thing is done for
paths renamed into D/, causing them to be transitively renamed into E/.

The default in the merge machinery is to report a conflict whenever a
directory rename might modify the location of a path, so that users can
decide whether they wanted the original path or the
directory-rename-induced location.  However, that means the default
codepath still runs through all the directory rename detection logic, it
just supplements it with providing conflict notices when it is done.

== Building up increasingly complex testcases ==

I'll start with a really simple directory rename example, and then
slowly add twists that explain new pieces until we get to the
problematic cases:

=== Testcase 1 ===

Let's start with a concrete example, where particular files/directories of
interest that exist or are changed on each side are called out:

  Original:   <nothing of note>
  our side:   rename B/file -> C/file
  their side: rename C/     -> A/

For this case, we'd expect to see the original B/file appear not at
C/file but at A/file.

(We would also expect a conflict notice that the user will want to
choose between C/file and A/file, but I'm going to ignore conflict
notices from here on by assuming merge.directoryRenames is set to
`true` rather than `conflict`; the only difference that assumption
makes is whether that makes the merge be considered to be conflicted
and whether it prints a conflict notice; what is written to the index
or working directory is unchanged.)

=== Testcase 2 ===

Modify testcase 1 by having A/file exist from the start:

  Original:   A/file exists
  our side:   rename B/file -> C/file
  their side: rename C/     -> A/

In such a case, to avoid user confusion at what looks kind of like an
add/add conflict (even though the original path at A/file was not added
by either side of the merge), we turn off directory rename detection for
this path and print a "in the way" warning to the user:
    CONFLICT (implicit dir rename): Existing file/dir ... in the way ...
The testcases in section 5 of t6423 explore these in more detail.

=== Testcase 3 ===

Let's modify testcase 1 in a slightly different way: have A/file be
added by their side rather than it already existing.

  Original:   <nothing of note>
  our side:   rename B/file -> C/file
  their side: rename C/     -> A/
              add A/file

In this case, the directory rename detection basically transforms our
side's original B/file -> C/file into a B/file -> A/file, and so we
get a rename/add conflict, with one version of A/file coming from the
renamed file, and another coming from the new A/file, each stored as
stages 2 and 3 in conflicts.  This kind of add/add conflict is perhaps
slightly more complex than a regular add/add conflict, but with the
printed messages it makes sense where it came from and we have
different stages of the file to work with to resolve the conflict.

=== Testcase 4 ===

Let's do something similar to testcase 3, but have the opposite side of
history add A/file:

  Original:   <nothing of note>
  our side:   rename B/file -> C/file
              add    A/file
  their side: rename C/     -> A/

Now if we allow directory rename detection to modify C/file to A/file,
then we also get a rename/add conflict, but in this case we'd need both
higher order stages being recorded on side 2, which makes no sense.  The
index can't store multiple stage 2 entries, and even if we could, it
would probably be confusing for users to work with.  So, similar to what
we do when there was an A/file in the original version, we simply turn
off directory rename detection for cases like this and provide the "in
the way" CONFLICT notice to the user.

=== Testcase 5 ===

We're slowly getting closer.  Let's mix it up by having A/file exist at
the beginning but not exist on their side:

  original:   A/file exists
  our side:   rename B/file -> C/file
  their side: rename C/     -> A/
              rename A/file -> D/file

For this case, you could say that since A/file -> D/file, it's no longer
in the way of C/file being moved by directory rename detection to
A/file.  But that would give us a case where A/file is both the source
and the target of a rename, similar to break detection, which the code
isn't currently equipped to handle.

This is not yet the case that causes current failures; to the current
code, this kind of looks like testcase 4 in that A/file is in the way
on our side (since A/file was in the original and was umodified by our
side).  So, it results in a "in the way" notification with directory
rename detection being turned off for A/file so that B/file ends up at
C/file.

Perhaps the resolution could be improved in the future, but our "in
the way" checks prevented such problems by noticing that A/file exists
on our side and thus turns off directory rename detection from
affecting C/file's location.  So, while the merge result could be
perhaps improved, the fact that this is currently handled by giving
the user an "in the way" message gives the user a chance to resolve
and prevents the code from tripping itself up.

=== Testcase 6 ===

Let's modify testcase 5 a bit more, to also delete A/file on our side:

  original:   A/file exists
  our side:   rename B/file -> C/file
              delete A/file
  their side: rename C/     -> A/
              rename A/file -> D/file

Now the "in the way" logic doesn't detect that there's an A/file in
the way (neither side has an A/file anymore), so it's fine to
transitively rename C/file further to A/file...except that we end up
with A/file being both the source of one rename, and the target of a
different rename.  Each rename pair tries to handle the resolution of
the source and target paths of its own rename.  But when we go to
process the second rename pair in process_renames(), we do not expect
either the source or the destination to be marked as handled already;
so, when we hit the sanity checks that these are not handled:

    VERIFY_CI(oldinfo);
    VERIFY_CI(newinfo);

then one of these is going to throw an assertion failure since the
previous rename pair already marked both of its paths as handled.
This will give us an error of the form:

    git: merge-ort.c:3007: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed.

This is the failure we're currently triggering, and it fundamentally
depends on:
  * a path existing in the original
  * that original path being removed or renamed on *both* sides
  * some kind of directory rename moving some *other* path into that
    original path

This was added as testcase 12q in t6423.

=== Testcase 7 ===

Bonus bug found while investigating!

Let's go back to the comparison between testcases 2 & 3, and set up a
file present on their side that we need to consider:

  Original:   A/file exists
  our side:   rename B/file -> C/file
              rename A/file -> D/file
  their side: rename C/     -> A/

Here, there is no A/file in the way on our side like testcase 4.
There is an A/file present on their side like testcase 3, which was an
add/add conflict, but that's associated with the file be renamed to
D/file.  So, that really shouldn't be an add/add conflict because we
instead want all modes of the original A/file to be transported to
D/file.

Unfortunately, the current code kind of treats it like an add/add
conflict instead...but even worse.  There is also a valid mode for
A/file in the original, which normally goes to stage 1.  However, an
add/add conflict should be represented in the index with no mode at
stage 1 (for the original side), only modes at stages 2 and 3 (for our
and their side), so for an add/add we'd expect that mode for A/file in
the original version to be cleared out (or be transported to D/file).

Unfortunately, the code currently leaves not only the stage 3 entry
for A/file intact, it also leaves the stage 1 entry for A/file.  This
results in `git ls-files -u A/file` output of the form:

    100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 1	A/file
    100644 0cfbf08886fca9a91cb753ec8734c84fcbe52c9f 2	A/file
    100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 3	A/file

This would likely cause users to believe this isn't an add/add
conflict; rather, this would lead them to believe that A/file was only
modified on our side and that therefore it should not have been a
conflict in the first place.  And while resolving the conflict in
favor of our side is the correct resolution (because stages 1 and 3
should have been cleared out in the first place), this is certainly
likely to cause confusion for anyone attempting to investigate why
this path was marked as conflicted.

This was added as testcase 12p in t6423.

== Attempted solutions that I discarded ==

1) For each side of history, create a strset of the sources of each
   rename on the other side of history.  Then when using directory
   renames to modify existing renames, verify that we aren't renaming
   to a source of another rename.

   Unfortunately, the "relevant renames" optimization in merge-ort
   means we often don't detect renames -- we just see a delete and an
   add -- which is easy to forget and makes debugging testcases harder,
   but it also turns out that this solution in insufficient to solve
   the related problems in the area (more on that below).

2) Modify the code to be aware of the possibility of renaming to
   the source of another side's rename, and make all the conflict
   resolution logic for each case (including existing
   rename/rename(2to1) and rename/rename(1to2) cases) handle the
   additional complexity.  It turns out there was much more code to
   audit than I wanted, for a really niche case.  I didn't like how
   many changes were needed, and aborted.

== Solution ==

We do not want the stages of unrelated files appearing at the same path
in the index except when dealing with an add/add conflict.  While we
previously handled this for stages 2 & 3, we also need to worry about
stage 1.  So check for a stage 1 index entry being in the way of a
directory rename.

However, if we can detect that the stage 1 index entry is actually from
a related file due to a directory-rename-causes-rename-to-self
situation, then we can allow the stage 1 entry to remain.

From this wording, you may note that it's not just rename cases that
are a problem; bugs could be triggered with directory renames vs simple
adds.  That leads us to...

== Testcases 8+ ==

Another bonus bug, found via understanding our final solutions (and the
failure of our first attempted solution)!

Let's tweak testcase 7 a bit:

  Original:   A/file exists
  our side:   delete A/file
              add -> C/file
  their side: delete A/file
              rename C/     -> A/

Here, there doesn't seem to be a big problem.  Sure C/file gets modified
via the directory rename of C/ -> A/ so that it becomes A/file, but
there's no file in the way, right?  Actually, here we have a problem
that the stage 1 entry of A/file would be combined with the stage 2
entry of C/file, and make it look like a modify/delete conflict.
Perhaps there is some extra checking that could be added to the code to
make it attempt to clear out the stage 1 entry of A/file, but the
various rename-to-self-via-directory-rename testcases make that a bit
more difficult.  For now, it's easier to just treat this as a
path-in-the-way situation and not allow the directory rename to modify
C/file.

That sounds all well and good, but it does have an interesting side
effect.  Due to the "relevant renames" optimizations in merge-ort (i.e.
only detect the renames you need), 100% renames whose files weren't
modified on the other side often go undetected.  This means that if we
modify this testcase slightly to:

  Original:   A/file exists
  our side:   A/file -> C/file
  their side: rename C/ -> A/

Then although this looks like where the directory rename just moves
C/file back to A/file and there's no problem, we may not detect the
A/file -> C/file rename.  Instead it will look like a deletion of A/file
and an addition of C/file.  The directory rename then appears to be
moving C/file to A/file, which is on top of an "unrelated" file (or at
least a file it doesn't know is related).  So, we will report
path-in-the-way conflicts now in cases where we didn't before.  That's
better than silently and accidentally combining stages of unrelated
files and making them look like a modify/delete; users can investigate
the reported conflict and simply resolve it.

This means we tweak the expected solution for testcases 12i, 12j, and
12k.  (Those three tests are basically the same test repeated three
times, but I was worried when I added those that subtle differences in
parent/child, sibling/sibling, and toplevel directories might mess up
how rename-to-self testcases actually get handled.)

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-07 13:24:00 -07:00
Elijah Newren
885ffe538b merge-ort: fix incorrect file handling
We have multiple bugs here -- accidental silent file deletion,
accidental silent file retention for files that should be deleted,
and incorrect number of entries left in the index.

The series merged at commit d3b88be1b4 (Merge branch
'en/merge-dir-rename-corner-case-fix', 2021-07-16) introduced testcase
12i-12k in t6423 which checked for rename-to-self cases, and fixed bugs
that merge-ort and merge-recursive had with these testcases.  At the
time, I noted that merge-ort had one bug for these cases, while
merge-recursive had two.  It turns out that merge-ort did in fact have
another bug, but the "relevant renames" optimizations were masking it.
If we modify testcase 12i from t6423 to modify the file in the commit
that renames it (but only modify it enough that it can still be detected
as a rename), then we can trigger silent deletion of the file.

Tweak testcase 12i slightly to make the file in question have more than
one line in it.  This leaves the testcase intact other than changing the
initial contents of this one file.  The purpose of this tweak is to
minimize the changes between this testcase and a new one that we want to
add.  Then duplicate testcase 12i as 12i2, changing it so that it adds a
single line to the file in question when it is renamed; testcase 12i2
then serves as a testcase for this merge-ort bug that I previously
overlooked.

Further, commit 98a1a00d53 (t6423: add a testcase causing a failed
assertion in process_renames, 2025-03-06), fixed an issue with
rename-to-self but added a new testcase, 12n, that only checked for
whether the merge ran to completion.  A few commits ago, we modified
this test to check for the number of entries in the index -- but noted
that the number was wrong.  And we also noted a
silently-keep-instead-of-delete bug at the same time in the new testcase
12n2.

In summary, we have the following bugs with rename-to-self cases:
  * silent deletion of file expected to be kept (t6423 testcase 12i2)
  * silent retention of file expected to be removed (t6423 testcase 12n2)
  * wrong number of extries left in the index (t6423 testcase 12n)

All of these bugs arise because in a rename-to-self case, when we have a
rename A->B, both A and B name the same file.  The code in
process_renames() assumes A & B are different, and tries to move the
higher order stages and file contents so that they are associated just
with the new path, but the assumptions of A & B being different can
cause A to be deleted when it's not supposed to be or mark B as resolved
and kept in place when it's supposed to be deleted.  Since A & B are
already the same path in the rename-to-self case, simply skip the steps
in process_renames() for such files to fix these bugs.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-07 13:23:59 -07:00
Elijah Newren
d3de978600 merge-ort: clarify the interning of strings in opt->priv->path
Because merge-ort is dealing with potentially all the pathnames in the
repository, it sometimes needs to do an awful lot of string comparisons.
Because of this, struct merge_options_internal's path member was
envisioned from the beginning to contain an interned value for every
path in order to allow us to compare strings via pointer comparison
instead of using strcmp.  See
  * 5b59c3db05 (merge-ort: setup basic internal data structures,
                  2020-12-13)
  * f591c47246 (merge-ort: copy and adapt merge_3way() from
                  merge-recursive.c, 2021-01-01)
for some of the early comments.

However, the original comment was slightly misleading when it switched
from mentioning paths to only mentioning directories.  Fix that, and
while at it also point to an example in the code which applies the extra
needed care to permit the pointer comparison optimization.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-07 13:23:59 -07:00
Elijah Newren
edbe2abcd8 merge-ort: drop unnecessary temporary in check_for_directory_rename()
check_for_directory_rename() had a weirdly coded check for whether a
strmap contained a certain key.  Replace the temporary variable and call
to strmap_get_entry() with the more natural strmap_contains() call.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-07 13:23:58 -07:00
Elijah Newren
c5a2c765a0 merge-ort: update comments to modern testfile location
In commit 919df31955 (Collect merge-related tests to t64xx,
2020-08-10), merge related tests were moved from t60xx to t64xx.  Some
comments in merge-ort relating to some tricky code referenced specific
testcases within certain testfiles for additional information, but
referred to their historical testfile names; update the testfile names
to mention their modern location.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-07 13:23:58 -07:00
Junio C Hamano
4ce0caa7cc Merge branch 'ps/object-file-wo-the-repository'
Reduce implicit assumption and dependence on the_repository in the
object-file subsystem.

* ps/object-file-wo-the-repository:
  object-file: get rid of `the_repository` in index-related functions
  object-file: get rid of `the_repository` in `force_object_loose()`
  object-file: get rid of `the_repository` in `read_loose_object()`
  object-file: get rid of `the_repository` in loose object iterators
  object-file: remove declaration for `for_each_file_in_obj_subdir()`
  object-file: inline `for_each_loose_file_in_objdir_buf()`
  object-file: get rid of `the_repository` when writing objects
  odb: introduce `odb_write_object()`
  loose: write loose objects map via their source
  object-file: get rid of `the_repository` in `finalize_object_file()`
  object-file: get rid of `the_repository` in `loose_object_info()`
  object-file: get rid of `the_repository` when freshening objects
  object-file: inline `check_and_freshen()` functions
  object-file: get rid of `the_repository` in `has_loose_object()`
  object-file: stop using `the_hash_algo`
  object-file: fix -Wsign-compare warnings
2025-08-05 11:53:55 -07:00
Patrick Steinhardt
5d215a7b3e config: drop git_config_get_bool() wrapper
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.

Follow through with that intent and remove `git_config_get_bool()`. All
callsites are adjusted so that they use
`repo_config_get_bool(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23 08:15:20 -07:00
Patrick Steinhardt
3fda14d86d config: drop git_config_get_int() wrapper
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.

Follow through with that intent and remove `git_config_get_int()`. All
callsites are adjusted so that they use
`repo_config_get_int(the_repository, ...)` instead. While some callsites
might already have a repository available, this mechanical conversion is
the exact same as the current situation and thus cannot cause any
regression. Those sites should eventually be cleaned up in a later patch
series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23 08:15:20 -07:00
Patrick Steinhardt
627d08cca7 config: drop git_config_get_string() wrapper
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.

Follow through with that intent and remove `git_config_get_string()`.
All callsites are adjusted so that they use
`repo_config_get_string(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23 08:15:19 -07:00
Patrick Steinhardt
9ce196e86b config: drop git_config() wrapper
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.

Follow through with that intent and remove `git_config()`. All callsites
are adjusted so that they use `repo_config(the_repository, ...)`
instead. While some callsites might already have a repository available,
this mechanical conversion is the exact same as the current situation
and thus cannot cause any regression. Those sites should eventually be
cleaned up in a later patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23 08:15:18 -07:00
Patrick Steinhardt
ab1c6e1d12 odb: introduce odb_write_object()
We do not have a backend-agnostic way to write objects into an object
database. While there is `write_object_file()`, this function is rather
specific to the loose object format.

Introduce `odb_write_object()` to plug this gap. For now, this function
is a simple wrapper around `write_object_file()` and doesn't even use
the passed-in object database yet. This will change in subsequent
commits, where `write_object_file()` is converted so that it works on
top of an `odb_source`. `odb_write_object()` will then become
responsible for deciding which source an object shall be written to.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16 22:16:15 -07:00
Patrick Steinhardt
d4ff88aee3 odb: rename repo_read_object_file()
Rename `repo_read_object_file()` to `odb_read_object()` to match other
functions related to the object database and our modern coding
guidelines.

Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:38 -07:00
Patrick Steinhardt
e989dd96b8 odb: rename oid_object_info()
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.

Introduce compatibility wrappers so that any in-flight topics will
continue to compile.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:37 -07:00
Patrick Steinhardt
8f49151763 object-store: rename files to "odb.{c,h}"
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:34 -07:00
Elijah Newren
c6d5ca10e3 merge-ort: add a new mergeability_only option
Git Forges may be interested in whether two branches can be merged while
not being interested in what the resulting merge tree is nor which files
conflicted.  For such cases, add a new mergeability_only option.  This
option allows the merge machinery to, in the "outer layer" of the merge:
  * exit upon first[-ish] conflict
  * avoid (not prevent) writing merged blobs/trees to the object store

I have a number of qualifiers there, so let me explain each:

"outer layer":

Note that since the recursive merge of merge bases (corresponding to
call_depth > 0) can conflict without the outer final merge
(corresponding to call_depth == 0) conflicting, we can't short-circuit
nor avoid writing merged blobs/trees to the object store during those
inner merges.

"first-ish conflict":

The current patch only exits early from process_entries() on the first
conflict it detects, but conflicts could have been detected in a
previous function call, namely detect_and_process_renames().  However:
  * conflicts detected by detect_and_process_renames() are quite rare
    conflict types
  * the detection would still come after regular rename detection
    (which is the expensive part of detect_and_process_renames()), so
    it is not saving us much in computation time given that
    process_entries() directly follows detect_and_process_renames()
  * [this overlaps with the next bullet point] process_entries() is the
    place where virtually all object writing occurs (object writing is
    sometimes more of a concern for Forges than computation time), so
    exiting early here isn't saving us much in object writes either
  * the code changes needed to handle an earlier exit are slightly
    more invasive in detect_and_process_renames() than for
    process_entries().
Given the rareness of the even earlier conflicts, the limited savings
we'd get from exiting even earlier, and in an attempt to keep this
patch simpler, we don't guarantee that we actually exit on the first
conflict detected.  We can always revisit this decision later if we
decide that a further micro-optimization to exit slightly earlier in
rare cases is worthwhile.

"avoid (not prevent) writing objects":

The detect_and_process_renames() call can also write objects to the
object store, when rename/rename conflicts involve one (or more) files
that have also been modified on both sides.  Because of this alternate
call path leading to handle_content_merges(), our "early exit" does not
prevent writing objects entirely, even within the "outer layer"
(i.e. even within call_depth == 0).  I figure that's fine though, since
we're already writing objects for the inner merges (i.e. for call_depth
> 0), which are likely going to represent vastly more objects than files
involved in rename/rename+modify/modify cases in the outer merge, on
average.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 15:09:14 -07:00
Junio C Hamano
36d8035d27 Merge branch 'ps/object-file-cleanup'
Code clean-up.

* ps/object-file-cleanup:
  object-store: merge "object-store-ll.h" and "object-store.h"
  object-store: remove global array of cached objects
  object: split out functions relating to object store subsystem
  object-file: drop `index_blob_stream()`
  object-file: split up concerns of `HASH_*` flags
  object-file: split out functions relating to object store subsystem
  object-file: move `xmmap()` into "wrapper.c"
  object-file: move `git_open_cloexec()` to "compat/open.c"
  object-file: move `safe_create_leading_directories()` into "path.c"
  object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-24 17:25:33 -07:00
Junio C Hamano
c3ebf18eb2 Merge branch 'en/merge-recursive-debug'
Remove remnants of the recursive merge strategy backend, which was
superseded by the ort merge strategy.

* en/merge-recursive-debug:
  builtin/{merge,rebase,revert}: remove GIT_TEST_MERGE_ALGORITHM
  tests: remove GIT_TEST_MERGE_ALGORITHM and test_expect_merge_algorithm
  merge-recursive.[ch]: thoroughly debug these
  merge, sequencer: switch recursive merges over to ort
  sequencer: switch non-recursive merges over to ort
  merge-ort: enable diff-algorithms other than histogram
  builtin/merge-recursive: switch to using merge_ort_generic()
  checkout: replace merge_trees() with merge_ort_nonrecursive()
2025-04-17 10:28:18 -07:00
Junio C Hamano
ee847e0034 Merge branch 'ps/object-wo-the-repository'
The object layer has been updated to take an explicit repository
instance as a parameter in more code paths.

* ps/object-wo-the-repository:
  hash: stop depending on `the_repository` in `null_oid()`
  hash: fix "-Wsign-compare" warnings
  object-file: split out logic regarding hash algorithms
  delta-islands: stop depending on `the_repository`
  object-file-convert: stop depending on `the_repository`
  pack-bitmap-write: stop depending on `the_repository`
  pack-revindex: stop depending on `the_repository`
  pack-check: stop depending on `the_repository`
  environment: move access to "core.bigFileThreshold" into repo settings
  pack-write: stop depending on `the_repository` and `the_hash_algo`
  object: stop depending on `the_repository`
  csum-file: stop depending on `the_repository`
2025-04-15 13:50:15 -07:00
Patrick Steinhardt
d9f517d051 object-file: split out functions relating to object store subsystem
While we have the "object-store.h" header, most of the functionality for
object stores is actually hosted in "object-file.c". This makes it hard
to find relevant functions and causes us to mix up concerns.

Split out functions relating to the object store subsystem into a new
"object-store.c" file.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15 08:24:36 -07:00
Junio C Hamano
0dfca98881 Merge branch 'ps/object-wo-the-repository' into ps/object-file-cleanup
* ps/object-wo-the-repository:
  hash: stop depending on `the_repository` in `null_oid()`
  hash: fix "-Wsign-compare" warnings
  object-file: split out logic regarding hash algorithms
  delta-islands: stop depending on `the_repository`
  object-file-convert: stop depending on `the_repository`
  pack-bitmap-write: stop depending on `the_repository`
  pack-revindex: stop depending on `the_repository`
  pack-check: stop depending on `the_repository`
  environment: move access to "core.bigFileThreshold" into repo settings
  pack-write: stop depending on `the_repository` and `the_hash_algo`
  object: stop depending on `the_repository`
  csum-file: stop depending on `the_repository`
2025-04-08 14:28:17 -07:00
Elijah Newren
ad45b327c0 merge-recursive.[ch]: thoroughly debug these
As a wise man once told me, "Deleted code is debugged code!"  So, move
the functions that are shared between merge-recursive and merge-ort from
the former to the latter, and then debug the remainder of
merge-recursive.[ch].

Joking aside, merge-ort was always intended to replace merge-recursive.
It has numerous advantages over merge-recursive (operates much faster,
can operate without a worktree or index, and fixes a number of known
bugs and suboptimal merges).  Since we have now replaced all callers of
merge-recursive with equivalent functions from merge-ort, move the
shared functions from the former to the latter, and delete the remainder
of merge-recursive.[ch].

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 13:59:13 -07:00
Elijah Newren
2e806d8464 merge-ort: enable diff-algorithms other than histogram
The ort merge strategy has always used the histogram diff algorithm.
The recursive merge strategy, in contrast, defaults to the myers
diff algorithm, while allowing it to be changed.

Change the ort merge strategy to allow different diff algorithms, by
removing the hard coded value in merge_start() and instead just making
it a default in init_merge_options().  Technically, this also changes
the default diff algorithm for the recursive backend too, but we're
going to remove the final callers of the recursive backend in the next
two commits.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 13:59:12 -07:00
Junio C Hamano
b97b360c51 Merge branch 'en/assert-wo-side-effects'
Ensure what we write in assert() does not have side effects,
and introduce ASSERT() macro to mark those that cannot be
mechanically checked for lack of side effects.

* en/assert-wo-side-effects:
  treewide: replace assert() with ASSERT() in special cases
  ci: add build checking for side-effects in assert() calls
  git-compat-util: introduce ASSERT() macro
2025-04-08 11:43:12 -07:00
Junio C Hamano
ff926a6d1b Merge branch 'en/random-cleanups'
Miscellaneous code clean-ups.

* en/random-cleanups:
  merge-ort: remove extraneous word in comment
  merge-ort: fix accidental strset<->strintmap
  t7615: be more explicit about diff algorithm used
  t6423: fix a comment that accidentally reversed two commits
  stash: remove merge-recursive.h include
2025-03-29 16:39:10 +09:00
Junio C Hamano
eb7923be1f Merge branch 'en/merge-ort-prepare-to-remove-recursive'
First step of deprecating and removing merge-recursive.

* en/merge-ort-prepare-to-remove-recursive:
  am: switch from merge_recursive_generic() to merge_ort_generic()
  merge-ort: fix merge.directoryRenames=false
  t3650: document bug when directory renames are turned off
  merge-ort: support having merge verbosity be set to 0
  merge-ort: allow rename detection to be disabled
  merge-ort: add new merge_ort_generic() function
2025-03-29 16:39:07 +09:00
Elijah Newren
5633aa3af1 treewide: replace assert() with ASSERT() in special cases
When the compiler/linker cannot verify that an assert() invocation is
free of side effects for us (e.g. because the assertion includes some
kind of function call), replace the use of assert() with ASSERT().

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21 03:32:10 -07:00
Elijah Newren
a16e8efe5c merge-ort: fix merge.directoryRenames=false
There are two issues here.

First, when merge.directoryRenames is set to false, there are a few code
paths that should be turned off.  I missed one; collect_renames() was
still doing some directory rename detection logic unconditionally.  It
ended up not having much effect because
get_provisional_directory_renames() was skipped earlier and not setting
up renames->dir_renames, but the code should still be skipped.

Second, the larger issue is that sometimes we get a cached_pair rename
from a previous commit being replayed mapping A->B, but in a subsequent
commit but collect_merge_info() doesn't even recurse into the
directory containing B because there are no source pairings for that
rename that are relevant; we can merge that commit fine without knowing
the rename.  But since the cached renames are added to the normal
renames, when we go to process it and find that B is not part of
opt->priv->paths, we hit the assertion error
  process_renames: Assertion `newinfo && ~newinfo->merged.clean` failed.
I think we could fix this at the beginning of detect_regular_renames() by
pruning from cached_pairs any entry whose destination isn't in
opt->priv->paths, but it's suboptimal in that we'd kind of like the
cached_pair to be restored afterwards so that it can help the subsequent
commit, but more importantly since it sits at the intersection of
the caching renames optimization and the relevant renames optimization,
and the trivial directory resolution optimization, and I don't currently
have Documentation/technical/remembering-renames.txt fully paged in, I'm
not sure if that's a full solution or a bandaid for the current
testcase.  However, since the remembering renames optimization was the
weakest of the set, and the optimization is far less important when
directory rename detection is off (as that implies far fewer potential
renames), let's just use a bigger hammer to ensure this special case is
fixed: turn off the rename caching.  We do the same thing already when
we encounter rename/rename(1to1) cases (as per `git grep -3
disabling.the.optimization`, though it uses a slightly different
triggering mechanism since it's trying to affect the next time that
merge_check_renames_reusable() is called), and I think it makes sense
to do the same here.

Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-18 09:49:04 -07:00
Elijah Newren
a707d4f941 merge-ort: allow rename detection to be disabled
When merge-ort was written, I did not at first allow rename detection to
be disabled, because I suspected that most folks disabling rename
detection were doing so solely for performance reasons.  Since I put a
lot of working into providing dramatic speedups for rename detection
performance as used by the merge machinery, I wanted to know if there
were still real world repositories where rename detection was
problematic from a performance perspective.  We have had years now to
collect such information, and while we never received one, waiting
longer with the option disabled seems unlikely to help surface such
issues at this point.  Also, there has been at least one request to
allow rename detection to be disabled for behavioral rather than
performance reasons (see the thread including
https://lore.kernel.org/git/CABPp-BG-Nx6SCxxkGXn_Fwd2wseifMFND8eddvWxiZVZk0zRaA@mail.gmail.com/
), so let's start heeding the config and command line settings.

Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-18 09:48:47 -07:00
Elijah Newren
4e5d9de96c merge-ort: add new merge_ort_generic() function
merge-recursive.[ch] have three entry points:
  * merge_trees()
  * merge_recursive()
  * merge_recursive_generic()
merge-ort*.[ch] only has equivalents for the first two.  Add an
equivalent for the final entry point, so we can switch callers to
use it and remove merge-recursive.[ch].

While porting it over, finally fix the issue with the label for the
ancestor (used when merge.conflictStyle=diff3 as a conflict label).
merge-recursive.c has traditionally not allowed callers to set that
label, but I have found that problematic for years.

(Side note: This function was initially part of the merge-ort rewrite,
but reviewers questioned the ancestor label funnyness which I was
never really happy with anyway.  It resulted in me jettisoning it and
hoping at the time that I would eventually be able to force the existing
callers to use some other API.  That worked with `git stash`, as per
874cf2a604 (stash: apply stash using 'merge_ort_nonrecursive()',
2022-05-10), but this API is the most reasonable one for `git am` and
`git merge-recursive`, if we can just allow them some freedom over the
ancestor label.)

The merge_recursive_generic() function did not know whether it was being
invoked by `git stash`, `git merge-recursive`, or `git am`, and the
choice of meaningful ancestor label, when there is a unique ancestor,
varies for these different callers:

  * git am: ancestor is a constructed "fake ancestor" that user knows
            nothing about and has no access to.  (And is different than
            the normal thing we mean by a "virtual merge base" which is
            the merging of merge bases.)
  * git merge-recursive: ancestor might be a tree, but at least it
                         was one specified by the user (if they invoked
                         merge-recursive directly)
  * git stash: ancestor was the commit serving as the stash base

Thus, using a label like "constructed merge base" (as
merge_recursive_generic() does) presupposes that `git am` is the only
caller; it is incorrect for other callers.  This label has thrown me off
more than once.  Allow the caller to override when there is a unique
merge base.

Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-18 09:48:30 -07:00