The --no-index option of git-diff enables using the diff machinery from
git while operating outside of a repository. This mode of git diff is
able to compare directories and produce a diff of their contents.
When operating git diff in a repository, git has the notion of
"pathspecs" which can specify which files to compare. In particular,
when using git to diff two trees, you might invoke:
$ git diff-tree -r <treeish1> <treeish2>.
where the treeish could point to a subdirectory of the repository.
When invoked this way, users can limit the selected paths of the tree by
using a pathspec. Either by providing some list of paths to accept, or
by removing paths via a negative refspec.
The git diff --no-index mode does not support pathspecs, and cannot
limit the diff output in this way. Other diff programs such as GNU
difftools have options for excluding paths based on a pattern match.
However, using git diff as a diff replacement has several advantages
over many popular diff tools, including coloring moved lines, rename
detections, and similar.
Teach git diff --no-index how to handle pathspecs to limit the
comparisons. This will only be supported if both provided paths are
directories.
For comparisons where one path isn't a directory, the --no-index mode
already has some DWIM shortcuts implemented in the fixup_paths()
function.
Modify the fixup_paths function to return 1 if both paths are
directories. If this is the case, interpret any extra arguments to git
diff as pathspecs via parse_pathspec.
Use parse_pathspec to load the remaining arguments (if any) to git diff
--no-index as pathspec items. Disable PATHSPEC_ATTR support since we do
not have a repository to do attribute lookup. Disable PATHSPEC_FROMTOP
since we do not have a repository root. All pathspecs are treated as
rooted at the provided comparison paths.
After loading the pathspec data, calculate skip offsets for skipping
past the root portion of the paths. This is required to ensure that
pathspecs start matching from the provided path, rather than matching
from the absolute path. We could instead pass the paths as prefix values
to parse_pathspec. This is slightly problematic because the paths come
from the command line and don't necessarily have the proper trailing
slash. Additionally, that would require parsing pathspecs multiple
times.
Pass the pathspec object and the skip offsets into queue_diff, which
in-turn must pass them along to read_directory_contents.
Modify read_directory_contents to check against the pathspecs when
scanning the directory. Use the skip offset to skip past the initial
root of the path, and only match against portions that are below the
intended directory structure being compared.
The search algorithm for finding paths is recursive with read_dir. To
make pathspec matching work properly, we must set both
DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC.
Without DO_MATCH_DIRECTORY, paths like "a/b/c/d" will not match against
pathspecs like "a/b/c". This is usually achieved by setting the is_dir
parameter of match_pathspec.
Without DO_MATCH_LEADING_PATHSPEC, paths like "a/b/c" would not match
against pathspecs like "a/b/c/d". This is crucial because we recursively
iterate down the directories. We could simply avoid checking pathspecs
at subdirectories, but this would force recursion down directories
which would simply be skipped.
If we always passed DO_MATCH_LEADING_PATHSPEC, then we will
incorrectly match in certain cases such as matching 'a/c' against
':(glob)**/d'. The match logic will see that a matches the leading part
of the **/ and accept this even tho c doesn't match.
To avoid this, use the match_leading_pathspec() variant recently
introduced. This sets both flags when is_dir is set, but leaves them
both cleared when is_dir is 0.
Add test cases and documentation covering the new functionality. Note
for the documentation I opted not to move the placement of '--' which is
sometimes used to disambiguate arguments. The diff --no-index mode
requires exactly 2 arguments determining what to compare. Any additional
arguments are interpreted as pathspecs and must come afterwards. Use of
'--' would not actually disambiguate anything, since there will never be
ambiguity over which arguments represent paths or pathspecs.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A following change will add support for pathspecs to the git diff
--no-index command. This mode of git diff does not load any repository.
Add a new PATHSPEC_NO_REPOSITORY flag indicating that we're parsing
pathspecs without a repository.
Both PATHSPEC_ATTR and PATHSPEC_FROMTOP require a repository to
function. Thus, verify that both of these are set in magic_mask to
ensure they won't be accepted when PATHSPEC_NO_REPOSITORY is set.
Check PATHSPEC_NO_REPOSITORY when warning about paths outside the
directory tree. When the flag is set, do not look for a git repository
when generating the warning message.
Finally, add a BUG in match_pathspec_item if the istate is NULL but the
pathspec has PATHSPEC_ATTR set. Callers which support PATHSPEC_ATTR
should always pass a valid istate, and callers which don't pass a valid
istate should have set PATHSPEC_ATTR in the magic_mask field to disable
support for attribute-based pathspecs.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The do_match_pathspec() function has the DO_MATCH_LEADING_PATHSPEC
option to allow pathspecs to match when matching "src" against a
pathspec like "src/path/...". This support is not exposed by
match_pathspec, and the internal flags to do_match_pathspec are not
exposed outside of dir.c
The upcoming support for pathspecs in git diff --no-index need the
LEADING matching behavior when iterating down through a directory with
readdir.
We could try to expose the match_pathspec_with_flags to the public API.
However, DO_MATCH_EXCLUDES really shouldn't be public, and its a bit
weird to only have a few of the flags become public.
Instead, add match_leading_pathspec() as a function which sets both
DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC when is_dir is true.
This will be used in a following change to support pathspec matching in
git diff --no-index.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Performance regression in not-yet-released code has been corrected.
* ps/reftable-read-block-perffix:
reftable: fix perf regression when reading blocks of unwanted type
Code cleanup.
* jk/oidmap-cleanup:
raw_object_store: drop extra pointer to replace_map
oidmap: add size function
oidmap: rename oidmap_free() to oidmap_clear()
The `send-email` documentation has been updated with OAuth2.0
related examples.
* ag/doc-send-email:
docs: add credential helper for outlook and gmail in OAuth list of helpers
docs: improve send-email documentation
send-mail: improve checks for valid_fqdn
Bundle-URI feature did not use refs recorded in the bundle other
than normal branches as anchoring points to optimize the follow-up
fetch during "git clone"; now it is told to utilize all.
* sc/bundle-uri-use-all-refs-in-bundle:
bundle-uri: add test for bundle-uri clones with tags
bundle-uri: copy all bundle references ino the refs/bundle space
Make repository clean-up tasks "gc" can do available to "git
maintenance" front-end.
* ps/maintenance-missing-tasks:
builtin/maintenance: introduce "rerere-gc" task
builtin/gc: move rerere garbage collection into separate function
builtin/maintenance: introduce "worktree-prune" task
builtin/gc: move pruning of worktrees into a separate function
builtin/gc: remove global variables where it is trivial to do
builtin/gc: fix indentation of `cmd_gc()` parameters
The fallback implementation of open_nofollow() depended on
open("symlink", O_NOFOLLOW) to set errno to ELOOP, but a few BSD
derived systems use different errno, which has been worked around.
* cf/wrapper-bsd-eloop:
wrapper: NetBSD gives EFTYPE and FreeBSD gives EMFILE where POSIX uses ELOOP
"git add 'f?o'" did not add 'foo' if 'f?o', an unusual pathname,
also existed on the working tree, which has been corrected.
* kj/glob-path-with-special-char:
dir.c: literal match with wildcard in pathspec should still glob
Code clean-up around stale CI elements and building with Visual Studio.
* js/ci-buildsystems-cleanup:
config.mak.uname: drop the `vcxproj` target
contrib/buildsystems: drop support for building . vcproj/.vcxproj files
ci: stop linking the `prove` cache
"git diff --minimal" used to give non-minimal output when its
optimization kicked in, which has been disabled.
* ng/xdiff-truly-minimal:
xdiff: disable cleanup_records heuristic with --minimal
"git index-pack --fix-thin" used to abort to prevent a cycle in
delta chains from forming in a corner case even when there is no
such cycle.
* ds/fix-thin-fix:
index-pack: allow revisiting REF_DELTA chains
t5309: create failing test for 'git index-pack'
test-tool: add pack-deltas helper
Further refinement on CI messages when an optional external
software is unavailable (e.g. due to third-party service outage).
* jc/ci-skip-unavailable-external-software:
ci: download JGit from maven, not eclipse.org
ci: update the message for unavailble third-party software
Further code clean-up in the object-store layer.
* ps/object-store-cleanup:
object-store: drop `repo_has_object_file()`
treewide: convert users of `repo_has_object_file()` to `has_object()`
object-store: allow fetching objects via `has_object()`
object-store: move function declarations to their respective subsystems
object-store: move and rename `odb_pack_keep()`
object-store: drop `loose_object_path()`
object-store: move `struct packed_git` into "packfile.h"
Update send-email to work better with Outlook's smtp server.
* ag/send-email-outlook:
send-email: add --[no-]outlook-id-fix option
send-email: retrieve Message-ID from outlook SMTP server
We store the replacement data in an oidmap, which is itself a pointer in
the raw_object_store struct. But there's no need for an extra pointer
indirection here. It is always allocated and initialized along with the
containing struct, and we never check it for NULL-ness.
Let's embed the map directly in the struct, which is simpler and avoids
extra pointer chasing.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Callers which want to know how many items are in an oidmap have to look
at the underlying hashmap struct, leaking an implementation detail.
Let's provide a type-appropriate wrapper and use it.
Note in the call from lookup_replace_object(), the caller was actually
looking at the hashmap's tablesize parameter (the allocated size of the
table) rather than hashmap_get_size(), the number of items in the table.
This probably should have been checking the number of items all along,
but the two are functionally equivalent here since we only add to the
map and never remove anything. Thus if there was any allocation, it was
because there is at least one item.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function does not free the oidmap struct itself; it just drops all
items from the map (using hashmap_clear_() internally). It should be
called oidmap_clear(), per CodingGuidelines.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In fd888311fb (reftable/table: move reading block into block reader,
2025-04-07), we have refactored how reftable blocks are read so that
most of the logic is contained in the "block.c" subsystem itself. Most
importantly, the whole logic to read the data itself is now contained in
that subsystem.
This change caused a significant performance regression though when
reading blocks that aren't of the specific type one is searching for:
Benchmark 1: update-ref: create 100k refs (revision = fd888311fbc~)
Time (mean ± σ): 2.171 s ± 0.028 s [User: 1.189 s, System: 0.977 s]
Range (min … max): 2.117 s … 2.206 s 10 runs
Benchmark 2: update-ref: create 100k refs (revision = fd888311fb)
Time (mean ± σ): 3.418 s ± 0.030 s [User: 2.371 s, System: 1.037 s]
Range (min … max): 3.377 s … 3.473 s 10 runs
Summary
update-ref: create 100k refs (revision = fd888311fbc~) ran
1.57 ± 0.02 times faster than update-ref: create 100k refs (revision = fd888311fb)
The root caute of the performance regression is that we changed when
exactly blocks of an uninteresting type are being discarded. Previous to
the refactoring in the mentioned commit we'd load the block data, read
its type, notice that it's not the wanted type and discard the block.
After the commit though we don't discard the block immediately, but we
fully decode it only to realize that it's not the desired type. We then
discard the block again, but have already performed a bunch of pointless
work.
Fix the regression by making `reftable_block_init()` return early in
case the block is not of the desired type. This fixes the performance
hit:
Benchmark 1: update-ref: create 100k refs (revision = HEAD~)
Time (mean ± σ): 2.712 s ± 0.018 s [User: 1.990 s, System: 0.716 s]
Range (min … max): 2.682 s … 2.741 s 10 runs
Benchmark 2: update-ref: create 100k refs (revision = HEAD)
Time (mean ± σ): 1.670 s ± 0.012 s [User: 0.991 s, System: 0.676 s]
Range (min … max): 1.652 s … 1.693 s 10 runs
Summary
update-ref: create 100k refs (revision = HEAD) ran
1.62 ± 0.02 times faster than update-ref: create 100k refs (revision = HEAD~)
Note that the baseline performance is lower than in the original due to
a couple of unrelated performance improvements that have landed since
the original commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In builtin/am.c:split_mail_stgit_series, if `fopen` failed,
`series_dir_buf` allocated by `xstrdup` will leak. Add `free` in
`!fp` if branch will prevent the leak.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'test_path_is_file' is a modern path checking method in Git's development.
Replace the basic shell command 'test -f' with this approach.
Signed-off-by: Rodrigo Carvalho <rodrigorsdc@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In reftable/writer.c:writer_index_hash(), if `reftable_buf_add` failed,
key allocated by `reftable_malloc` will not be insert into `obj_index_tree`
thus leaks. Simple add reftable_free(key) will solve this problem.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In reftable/writer.c:padded_write(), if w->writer failed, zeroed
allocated in `reftable_calloc` will leak. w->writer could be
`reftable_write_data` in reftable/stack.c, and could fail due to
some write error. Simply add reftable_free(zeroed) will solve this
problem.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many contributors to software use a Language Server Protocol
implementation to allow their editor to learn structural information
about the code they write and provide additional features, such as
jumping to the declaration or definition of a function or type. In C,
the usual implementation is clangd, which requires compiling with clang.
Because C and C++ projects lack a standard file system layout and build
system, unlike languages such as Rust and Go, clangd requires a
compilation database to be generated by the clang compiler in order to
pass the proper compilation flags and discover all of the files
necessary to make the LSP work. This is done by setting
GENERATE_COMPILATION_DATABASE to "yes".
However, when that's enabled and the user runs "make" a second time,
all of the files are re-compiled, which is inconvenient for contributors
to Git, since it makes small changes or rebases recompile the entirety
of the codebase. This happens because the directory holding the
compilation database is updated anytime an object is built, so its
modification date will always be newer than the first object built.
To solve this, use the same trick we do just above for the .depend
directory and filter the compilation database directory out if it
already exists, which avoids making it a target to be built.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Helped-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It has been reported that "git rebase --rebase-merges" can create
corrupted reflog entries like
e9c962f2ea0 HEAD@{8}: <binary>�: Merged in <branch> (pull request #4441)
This is due to a use-after-free bug that happens because
reflog_message() uses a static `struct strbuf` and is not called to
update the current reflog message stored in `ctx->reflog_message` when
creating the merge. This means `ctx->reflog_message` points to a stale
reflog message that has been freed by subsequent call to
reflog_message() by a command such as `reset` that used the return value
directly rather than storing the result in `ctx->reflog_message`.
Fix this by creating the reflog message nearer to where the commit is
created and storing it in a local variable which is passed as an
additional parameter to run_git_commit() rather than storing the message
in `struct replay_ctx`. This makes it harder to forget to call
`reflog_message()` before creating a commit and using a variable with a
narrower scope means that a stale value cannot carried across a from one
iteration of the loop to the next which should prevent any similar
use-after-free bugs in the future.
A existing test is modified to demonstrate that merges are now created
with the correct reflog message.
Reported-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next commit these functions will be called from pick_one_commit()
so move them above that function to avoid a forward declaration.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'master' of https://github.com/j6t/gitk:
gitk: add Tamil translation
gitk: limit PATH search to bare executable names
gitk: _search_exe is no longer needed
gitk: override $PATH search only on Windows
gitk: adjust indentation to match the style used in this script
* 'master' of https://github.com/j6t/git-gui:
git-gui: treat the message template file as a built file
git-gui: heed core.commentChar/commentString
git-gui: po/README: update repository location and maintainer
* js/po-update-workflow:
git-gui: treat the message template file as a built file
git-gui: po/README: update repository location and maintainer
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
"git mv a a/b dst" would ask to move the directory 'a' itself, as
well as its contents, in a single destination directory, which is
a contradicting request that is impossible to satisfy. This case is
now detected and the command errors out.
* ps/mv-contradiction-fix:
builtin/mv: convert assert(3p) into `BUG()`
builtin/mv: bail out when trying to move child and its parent