Commit Graph

11998 Commits

Author SHA1 Message Date
Junio C Hamano
51ea70c18a Merge branch 'jk/sparse-leakfix'
Many memory leaks in the sparse-checkout code paths have been
plugged.

* jk/sparse-leakfix:
  sparse-checkout: free duplicate hashmap entries
  sparse-checkout: free string list after displaying
  sparse-checkout: free pattern list in sparse_checkout_list()
  sparse-checkout: free sparse_filename after use
  sparse-checkout: refactor temporary sparse_checkout_patterns
  sparse-checkout: always free "line" strbuf after reading input
  sparse-checkout: reuse --stdin buffer when reading patterns
  dir.c: always copy input to add_pattern()
  dir.c: free removed sparse-pattern hashmap entries
  sparse-checkout: clear patterns when init() sees existing sparse file
  dir.c: free strings in sparse cone pattern hashmaps
  sparse-checkout: pass string literals directly to add_pattern()
  sparse-checkout: free string list in write_cone_to_file()
2024-06-12 13:37:17 -07:00
Junio C Hamano
22cf18fd9e Merge branch 'gt/t-hash-unit-test'
A pair of test helpers that essentially are unit tests on hash
algorithms have been rewritten using the unit-tests framework.

* gt/t-hash-unit-test:
  t/: migrate helper/test-{sha1, sha256} to unit-tests/t-hash
  strbuf: introduce strbuf_addstrings() to repeatedly add a string
2024-06-12 13:37:15 -07:00
Junio C Hamano
5235e56ea5 Merge branch 'jk/leakfixes'
Memory leaks in "git mv" has been plugged.

* jk/leakfixes:
  mv: replace src_dir with a strvec
  mv: factor out empty src_dir removal
  mv: move src_dir cleanup to end of cmd_mv()
  t-strvec: mark variable-arg helper with LAST_ARG_MUST_BE_NULL
  t-strvec: use va_end() to match va_start()
2024-06-10 10:30:39 -07:00
Junio C Hamano
df5c2c4962 Merge branch 'rs/difftool-env-simplify'
Code simplification.

* rs/difftool-env-simplify:
  difftool: add env vars directly in run_file_diff()
2024-06-06 12:49:24 -07:00
Junio C Hamano
cf792653ad Merge branch 'ps/leakfixes'
Leakfixes.

* ps/leakfixes:
  builtin/mv: fix leaks for submodule gitfile paths
  builtin/mv: refactor to use `struct strvec`
  builtin/mv duplicate string list memory
  builtin/mv: refactor `add_slash()` to always return allocated strings
  strvec: add functions to replace and remove strings
  submodule: fix leaking memory for submodule entries
  commit-reach: fix memory leak in `ahead_behind()`
  builtin/credential: clear credential before exit
  config: plug various memory leaks
  config: clarify memory ownership in `git_config_string()`
  builtin/log: stop using globals for format config
  builtin/log: stop using globals for log config
  convert: refactor code to clarify ownership of check_roundtrip_encoding
  diff: refactor code to clarify memory ownership of prefixes
  config: clarify memory ownership in `git_config_pathname()`
  http: refactor code to clarify memory ownership
  checkout: clarify memory ownership in `unique_tracking_name()`
  strbuf: fix leak when `appendwholeline()` fails with EOF
  transport-helper: fix leaking helper name
2024-06-06 12:49:23 -07:00
Jeff King
6d107751b2 sparse-checkout: free duplicate hashmap entries
In insert_recursive_pattern(), we create a new pattern_entry to insert
into the parent_hashmap. If we find that the same entry already exists
in the hashmap, we skip adding the new one. But we forget to free the new
one, creating a leak.

We can fix it by cleaning up the discarded entry. It would probably be
possible to avoid creating it in the first place, but it's non-trivial.
We'd have to define a "keydata" struct that lets us compare the existing
entries to the broken-out fields. It's probably not worth the
complexity, so we'll punt on that for now.

There is one subtlety here: our insertion is happening in a loop, with
each iteration looking at the pattern we just inserted (hence the
"recursive" in the name). So if we skip insertion, what do we look at?

The obvious answer is that we should remember the existing duplicate we
found and use that. But I _think_ in that case, we probably already have
all of the recursive bits already (from when the original entry was
added). And so just breaking out of the loop would be correct. But I'm
not 100% sure on that; after all, the original leaky code could have
done the same break, but it didn't.

So I went with the "obvious answer" above, which has no chance of
changing the behavior aside from fixing the leak.

With this patch, t1091 can now be marked leak-free.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-05 09:51:43 -07:00
Jeff King
a544b7da2c sparse-checkout: free string list after displaying
In sparse_checkout_list(), we put the hashmap entries into a string_list
so we can sort them. But after printing, we forget to free the list.

This patch drops 5 leaks from t1091.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-05 09:51:43 -07:00
Jeff King
521e04e6e8 sparse-checkout: free pattern list in sparse_checkout_list()
In sparse_checkout_list(), we create a pattern_list that needs to
eventually be cleared. We remember to do so in the regular code path,
but the cone-mode path does an early return, and forgets to clean up.

We could fix the leak by adding a new call to clear_pattern_list(). But
we can simplify even further by just skipping the early return, pushing
the other code path (which consists now of only one line!) into an else
block. That also matches the same cone/non-cone if/else used in some
other functions.

This fixes 15 leaks found in t1091.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-05 09:51:43 -07:00
Jeff King
008f59d2d6 sparse-checkout: free sparse_filename after use
We allocate a heap buffer via get_sparse_checkout_filename(). Most calls
remember to free it, but sparse_checkout_init() forgets to, causing a
leak. Ironically, it remembers to do so in the error return paths, but
not in the path that makes it all the way to the function end!

Fixing this clears up 6 leaks from t1091.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-05 09:51:43 -07:00
Jeff King
a14d49ca84 sparse-checkout: refactor temporary sparse_checkout_patterns
In update_working_directory(), we take in a pattern_list, attach it to
the repository index by assigning it to index->sparse_checkout_patterns,
and then call unpack_trees. Afterwards, we remove it by setting
index->sparse_checkout_patterns back to NULL.

But there are two possible leaks here:

  1. If the index already had a populated sparse_checkout_patterns,
     we've obliterated it. We can fix this by saving and restoring it,
     rather than always setting it back to NULL.

  2. We may call the function with a NULL pattern_list, expecting it to
     use the on-disk sparse file. In that case, the index routines will
     lazy-load the sparse patterns automatically. But now at the end of
     the function when we restore the patterns, we'll leak those
     lazy-loaded ones!

     We can fix this by freeing the pattern list before overwriting its
     pointer whenever it does not match what was passed in (in practice
     this should only happen when the passed-in list is NULL, but this
     is erring on the defensive side).

Together these remove 48 indirect leaks found in t1091.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-05 09:51:43 -07:00
Jeff King
d765fa0331 sparse-checkout: always free "line" strbuf after reading input
In add_patterns_from_input(), we may read lines from a file with a loop
like this:

  while (!strbuf_getline(&line, file)) {
	...
	strbuf_to_cone_pattern(&line, pl);
  }
  /* we don't strbuf_release(&line) here! */

This generally is OK because strbuf_to_cone_pattern() consumes the
buffer via strbuf_detach(). But we can leak in a few cases:

  1. We don't always consume the buffer! If the line ends up empty after
     trimming, we leave strbuf_to_cone_pattern() without detaching. In
     most cases this is OK, because a subsequent getline() call will use
     the same buffer. But if you had an empty line at the end of file,
     for example, it would leak.

  2. Even if strbuf_to_cone_pattern() always consumed the buffer,
     there's a subtle issue with strbuf_getline(). As we saw in
     94e2aa555e (strbuf: fix leak when `appendwholeline()` fails with
     EOF, 2024-05-27), it's possible for it to return EOF with an
     allocated buffer (e.g., if the underlying getdelim() call saw an
     error). So we should always strbuf_release() after finishing a read
     loop like this.

Note that even the code to read patterns from argv has the same problem.
Because that also uses strbuf_to_cone_pattern(), we stuff each argv
entry into a strbuf. It uses the same "line" strbuf as the getline code,
but we should position the strbuf_release() to cover both code paths.

This fixes at least 9 leaks found in t1091.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-05 09:51:43 -07:00
Jeff King
c3324649ed sparse-checkout: reuse --stdin buffer when reading patterns
When we read patterns from --stdin, we loop on strbuf_getline(), and
detach each line we read to pass into add_pattern(). This used to be
necessary because add_pattern() required that the pattern strings remain
valid while the pattern_list was in use. But it also created a leak,
since we didn't record the detached buffers anywhere else.

Now that add_pattern() has been modified to make its own copy of the
strings, we can stop detaching and fix the leak. This fixes 4 leaks
detected in t1091.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-05 09:51:42 -07:00
Jeff King
db83b64cda sparse-checkout: clear patterns when init() sees existing sparse file
In sparse_checkout_init(), we first try to load patterns from an
existing file. If we found any, we return immediately, but end up
leaking the patterns we parsed. Fixing this reduces the number of leaks
in t7002 from 9 down to 5.

Note that there are two other exits from the function, but they don't
need the same treatment:

  - if we can't resolve HEAD, we write out a hard-coded sparse file and
    return. But we know the pattern list is empty there, since we didn't
    find any in the on-disk file and we haven't yet added any of our
    own.

  - otherwise, we do populate the list and then tail-call into
    write_patterns_and_update(). But that function frees the
    pattern_list itself, so we don't need to.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-04 10:38:23 -07:00
Jeff King
4d7f95ed1f sparse-checkout: pass string literals directly to add_pattern()
The add_pattern() function takes a pattern string, but neither makes a
copy of it nor takes ownership of the memory. So it is the caller's
responsibility to make sure the string hangs around as long as the
pattern_list which references it.

There are a few cases in sparse-checkout where we use string literal
patterns by stuffing them into a strbuf, detaching the buffer, and then
passing the result into add_pattern(). This creates a leak when the
pattern_list is eventually cleared, since we don't retain a copy of the
detached buffer to free.

But we can observe that the whole strbuf dance is unnecessary. The point
was presumably[1] to satisfy the lifetime requirement of the string. But
string literals have static duration; we can count on them lasting for
the whole program.

So we can fix the leak by just passing them directly. And as a bonus,
that simplifies the code. The leaks can be seen in t7002, which drops
from 25 leaks to 22 with this patch. It also makes t3602 and t1090
leak-free.

In the long run, we will also want to clean up this (undocumented!)
memory lifetime requirement of add_pattern(). But that can come in a
later patch; passing the string literals directly will be the right
thing either way.

[1] The code in question comes from 416adc8711 (sparse-checkout: update
    working directory in-process for 'init', 2019-11-21) and 99dfa6f970
    (sparse-checkout: use in-process update for disable subcommand,
    2019-11-21), but I didn't see anything in their commit messages or
    on the list explaining the strbufs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-04 10:38:23 -07:00
Jeff King
2181fe6e46 sparse-checkout: free string list in write_cone_to_file()
We use a string list to hold sorted and de-duped patterns, but don't
free it before leaving the function, causing a leak.

This drops the number of leaks found in t7002 from 27 to 25.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-04 10:38:22 -07:00
Junio C Hamano
f8da12adcf Merge branch 'jc/fix-2.45.1-and-friends-for-maint'
Adjust jc/fix-2.45.1-and-friends-for-2.39 for more recent
maintenance track.

* jc/fix-2.45.1-and-friends-for-maint:
  Revert "fsck: warn about symlink pointing inside a gitdir"
  Revert "Add a helper function to compare file contents"
  clone: drop the protections where hooks aren't run
  tests: verify that `clone -c core.hooksPath=/dev/null` works again
  Revert "core.hooksPath: add some protection while cloning"
  init: use the correct path of the templates directory again
  hook: plug a new memory leak
  ci: stop installing "gcc-13" for osx-gcc
  ci: avoid bare "gcc" for osx-gcc job
  ci: drop mention of BREW_INSTALL_PACKAGES variable
  send-email: avoid creating more than one Term::ReadLine object
  send-email: drop FakeTerm hack
2024-05-30 14:15:17 -07:00
Junio C Hamano
6c5be97e4e Merge branch 'jc/undecided-is-not-necessarily-sha1-fix'
The base topic started to make it an error for a command to leave
the hash algorithm unspecified, which revealed a few commands that
were not ready for the change.  Give users a knob to revert back to
the "default is sha-1" behaviour as an escape hatch, and start
fixing these breakages.

* jc/undecided-is-not-necessarily-sha1-fix:
  apply: fix uninitialized hash function
  builtin/hash-object: fix uninitialized hash function
  builtin/patch-id: fix uninitialized hash function
  t1517: test commands that are designed to be run outside repository
  setup: add an escape hatch for "no more default hash algorithm" change
2024-05-30 14:15:14 -07:00
Junio C Hamano
988499e295 Merge branch 'ps/refs-without-the-repository-updates'
Further clean-up the refs subsystem to stop relying on
the_repository, and instead use the repository associated to the
ref_store object.

* ps/refs-without-the-repository-updates:
  refs/packed: remove references to `the_hash_algo`
  refs/files: remove references to `the_hash_algo`
  refs/files: use correct repository
  refs: remove `dwim_log()`
  refs: drop `git_default_branch_name()`
  refs: pass repo when peeling objects
  refs: move object peeling into "object.c"
  refs: pass ref store when detecting dangling symrefs
  refs: convert iteration over replace refs to accept ref store
  refs: retrieve worktree ref stores via associated repository
  refs: refactor `resolve_gitlink_ref()` to accept a repository
  refs: pass repo when retrieving submodule ref store
  refs: track ref stores via strmap
  refs: implement releasing ref storages
  refs: rename `init_db` callback to avoid confusion
  refs: adjust names for `init` and `init_db` callbacks
2024-05-30 14:15:13 -07:00
Junio C Hamano
a60c21b720 Merge branch 'ps/undecided-is-not-necessarily-sha1'
Before discovering the repository details, We used to assume SHA-1
as the "default" hash function, which has been corrected. Hopefully
this will smoke out codepaths that rely on such an unwarranted
assumptions.

* ps/undecided-is-not-necessarily-sha1:
  repository: stop setting SHA1 as the default object hash
  oss-fuzz/commit-graph: set up hash algorithm
  builtin/shortlog: don't set up revisions without repo
  builtin/diff: explicitly set hash algo when there is no repo
  builtin/bundle: abort "verify" early when there is no repository
  builtin/blame: don't access potentially unitialized `the_hash_algo`
  builtin/rev-parse: allow shortening to more than 40 hex characters
  remote-curl: fix parsing of detached SHA256 heads
  attr: fix BUG() when parsing attrs outside of repo
  attr: don't recompute default attribute source
  parse-options-cb: only abbreviate hashes when hash algo is known
  path: move `validate_headref()` to its only user
  path: harden validation of HEAD with non-standard hashes
2024-05-30 14:15:11 -07:00
Jeff King
64f8502b40 mv: replace src_dir with a strvec
We manually manage the src_dir array with ALLOC_GROW. Using a strvec is
a little more ergonomic, and makes the memory ownership more clear. It
does mean that we copy the strings (which were otherwise just pointers
into the "sources" strvec), but using the same rationale as 9fcd9e4e72
(builtin/mv duplicate string list memory, 2024-05-27), it's just not
enough to be worth worrying about here.

As a bonus, this gets rid of some "int"s used for allocation management
(though in practice these were limited to command-line sizes and thus
not overflowable).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-30 08:55:29 -07:00
Jeff King
d58a687705 mv: factor out empty src_dir removal
This pulls the loop added by b6f51e3db9 (mv: cleanup empty
WORKING_DIRECTORY, 2022-08-09) into a sub-function. That reduces clutter
in cmd_mv() and makes it easier to see that the lifetime of the
a_src_dir strbuf is limited to this code (and thus its cleanup doesn't
need to go after the "out" label).

Another option would be to just declare the strbuf inside the loop,
since it is only used there. But this refactor retains the existing
property that we can reuse the allocated buffer for each iteration of
the loop. That optimization is probably overkill, but I think the
sub-function is more readable anyway, and then keeping the optimization
is basically free.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-30 08:55:29 -07:00
Jeff King
cc65e085e4 mv: move src_dir cleanup to end of cmd_mv()
Commit b6f51e3db9 (mv: cleanup empty WORKING_DIRECTORY, 2022-08-09)
added an auxiliary array where we store directory arguments that we see
while processing the incoming arguments. After actually moving things,
we then use that array to remove now-empty directories, and then
immediately free the array.

But if the actual move queues any errors in only_match_skip_worktree,
that can cause us to jump straight to the "out" label to clean up,
skipping the free() and leaking the array.

Let's push the free() down past the "out" label so that we always clean
up (the array is initialized to NULL, so this is always safe). We'll
hold on to the memory a little longer than necessary, but clarity is
more important than micro-optimizing here.

Note that the adjacent "a_src_dir" strbuf does not suffer the same
problem; it is only allocated during the removal step.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-30 08:55:29 -07:00
Junio C Hamano
a3f0e2a064 Merge branch 'ps/leakfixes' into jk/leakfixes
* ps/leakfixes:
  builtin/mv: fix leaks for submodule gitfile paths
  builtin/mv: refactor to use `struct strvec`
  builtin/mv duplicate string list memory
  builtin/mv: refactor `add_slash()` to always return allocated strings
  strvec: add functions to replace and remove strings
  submodule: fix leaking memory for submodule entries
  commit-reach: fix memory leak in `ahead_behind()`
  builtin/credential: clear credential before exit
  config: plug various memory leaks
  config: clarify memory ownership in `git_config_string()`
  builtin/log: stop using globals for format config
  builtin/log: stop using globals for log config
  convert: refactor code to clarify ownership of check_roundtrip_encoding
  diff: refactor code to clarify memory ownership of prefixes
  config: clarify memory ownership in `git_config_pathname()`
  http: refactor code to clarify memory ownership
  checkout: clarify memory ownership in `unique_tracking_name()`
  strbuf: fix leak when `appendwholeline()` fails with EOF
  transport-helper: fix leaking helper name
2024-05-30 08:54:58 -07:00
Ghanshyam Thakkar
a70f8f19ad strbuf: introduce strbuf_addstrings() to repeatedly add a string
In a following commit we are going to port code from
"t/helper/test-sha256.c", t/helper/test-hash.c and "t/t0015-hash.sh" to
a new "t/unit-tests/t-hash.c" file using the recently added unit test
framework.

To port code like: perl -e "$| = 1; print q{aaaaaaaaaa} for 1..100000;"
we are going to need a new strbuf_addstrings() function that repeatedly
adds the same string a number of times to a buffer.

Such a strbuf_addstrings() function would already be useful in
"json-writer.c" and "builtin/submodule-helper.c" as both of these files
already have code that repeatedly adds the same string. So let's
introduce such a strbuf_addstrings() function in "strbuf.{c,h}" and use
it in both "json-writer.c" and "builtin/submodule-helper.c".

We use the "strbuf_addstrings" name as this way strbuf_addstr() and
strbuf_addstrings() would be similar for strings as strbuf_addch() and
strbuf_addchars() for characters.

Helped-by: Junio C Hamano <gitster@pobox.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-authored-by: Achu Luma <ach.lumap@gmail.com>
Signed-off-by: Achu Luma <ach.lumap@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-29 09:09:39 -07:00
Junio C Hamano
b32f298264 Merge branch 'jc/format-patch-more-aggressive-range-diff'
The default "creation-factor" used by "git format-patch" has been
raised to make it more aggressively find matching commits.

* jc/format-patch-more-aggressive-range-diff:
  format-patch: run range-diff with larger creation-factor
2024-05-28 11:17:10 -07:00
Junio C Hamano
ee8537ebc9 Merge branch 'tb/pack-bitmap-write-cleanups'
The pack bitmap code saw some clean-up to prepare for a follow-up topic.

* tb/pack-bitmap-write-cleanups:
  pack-bitmap: introduce `bitmap_writer_free()`
  pack-bitmap-write.c: avoid uninitialized 'write_as' field
  pack-bitmap: drop unused `max_bitmaps` parameter
  pack-bitmap: avoid use of static `bitmap_writer`
  pack-bitmap-write.c: move commit_positions into commit_pos fields
  object.h: add flags allocated by pack-bitmap.h
2024-05-28 11:17:07 -07:00
Junio C Hamano
00ffa1cb1c Merge branch 'ps/builtin-config-cleanup'
Code clean-up to reduce inter-function communication inside
builtin/config.c done via the use of global variables.

* ps/builtin-config-cleanup: (21 commits)
  builtin/config: pass data between callbacks via local variables
  builtin/config: convert flags to a local variable
  builtin/config: track "fixed value" option via flags only
  builtin/config: convert `key` to a local variable
  builtin/config: convert `key_regexp` to a local variable
  builtin/config: convert `regexp` to a local variable
  builtin/config: convert `value_pattern` to a local variable
  builtin/config: convert `do_not_match` to a local variable
  builtin/config: move `respect_includes_opt` into location options
  builtin/config: move default value into display options
  builtin/config: move type options into display options
  builtin/config: move display options into local variables
  builtin/config: move location options into local variables
  builtin/config: refactor functions to have common exit paths
  config: make the config source const
  builtin/config: check for writeability after source is set up
  builtin/config: move actions into `cmd_config_actions()`
  builtin/config: move legacy options into `cmd_config()`
  builtin/config: move subcommand options into `cmd_config()`
  builtin/config: move legacy mode into its own function
  ...
2024-05-28 11:17:07 -07:00
Junio C Hamano
16a592f132 Merge branch 'ps/pseudo-ref-terminology'
Terminology to call various ref-like things are getting
straightened out.

* ps/pseudo-ref-terminology:
  refs: refuse to write pseudorefs
  ref-filter: properly distinuish pseudo and root refs
  refs: pseudorefs are no refs
  refs: classify HEAD as a root ref
  refs: do not check ref existence in `is_root_ref()`
  refs: rename `is_special_ref()` to `is_pseudo_ref()`
  refs: rename `is_pseudoref()` to `is_root_ref()`
  Documentation/glossary: define root refs as refs
  Documentation/glossary: clarify limitations of pseudorefs
  Documentation/glossary: redefine pseudorefs as special refs
2024-05-28 11:17:06 -07:00
Patrick Steinhardt
ebdbefa4fe builtin/mv: fix leaks for submodule gitfile paths
Similar to the preceding commit, we have effectively given tracking
memory ownership of submodule gitfile paths. Refactor the code to start
tracking allocated strings in a separate `struct strvec` such that we
can easily plug those leaks. Mark now-passing tests as leak free.

Note that ideally, we wouldn't require two separate data structures to
track those paths. But we do need to store `NULL` pointers for the
gitfile paths such that we can indicate that its corresponding entries
in the other arrays do not have such a path at all. And given that
`struct strvec`s cannot store `NULL` pointers we cannot use them to
store this information.

There is another small gotcha that is easy to miss: you may be wondering
why we don't want to store `SUBMODULE_WITH_GITDIR` in the strvec. This
is because this is a mere sentinel value and not actually a string at
all.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-27 11:20:03 -07:00
Patrick Steinhardt
52a7dab439 builtin/mv: refactor to use struct strvec
Memory allocation patterns in git-mv(1) are extremely hard to follow:
We copy around string pointers into manually-managed arrays, some of
which alias each other, but only sometimes, while we also drop some of
those strings at other times without ever daring to free them.

While this may be my own subjective feeling, it seems like others have
given up as the code has multiple calls to `UNLEAK()`. These are not
sufficient though, and git-mv(1) is still leaking all over the place
even with them.

Refactor the code to instead track strings in `struct strvec`. While
this has the effect of effectively duplicating some of the strings
without an actual need, it is way easier to reason about and fixes all
of the aliasing of memory that has been going on. It allows us to get
rid of the `UNLEAK()` calls and also fixes leaks that those calls did
not paper over.

Mark tests which are now leak-free accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-27 11:20:02 -07:00
Patrick Steinhardt
9fcd9e4e72 builtin/mv duplicate string list memory
makes the next patch easier, where we will migrate to the paths being
owned by a strvec. given that we are talking about command line
parameters here it's also not like we have tons of allocations that this
would save

while at it, fix a memory leak

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-27 11:20:02 -07:00
Patrick Steinhardt
3d231f7b82 builtin/mv: refactor add_slash() to always return allocated strings
The `add_slash()` function will only conditionally return an allocated
string when the passed-in string did not yet have a trailing slash. This
makes the memory ownership harder to track than really necessary.

It's dubious whether this optimization really buys us all that much. The
number of times we execute this function is bounded by the number of
arguments to git-mv(1), so in the typical case we may end up saving an
allocation or two.

Simplify the code to unconditionally return allocated strings.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-27 11:20:02 -07:00
Patrick Steinhardt
96c1655095 builtin/credential: clear credential before exit
We never release memory associated with `struct credential`. Fix this
and mark the corresponding test as leak free.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-27 11:20:01 -07:00
Patrick Steinhardt
1b261c20ed config: clarify memory ownership in git_config_string()
The out parameter of `git_config_string()` is a `const char **` even
though we transfer ownership of memory to the caller. This is quite
misleading and has led to many memory leaks all over the place. Adapt
the parameter to instead be `char **`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-27 11:20:00 -07:00
Patrick Steinhardt
83024d98f7 builtin/log: stop using globals for format config
This commit does the exact same as the preceding commit, only for the
format configuration instead of the log configuration.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-27 11:20:00 -07:00
Patrick Steinhardt
106a54aecb builtin/log: stop using globals for log config
We're using global variables to store the log configuration. Many of
these can be set both via the command line and via the config, and
depending on how they are being set, they may contain allocated strings.
This leads to hard-to-track memory ownership and memory leaks.

Refactor the code to instead use a `struct log_config` that is being
allocated on the stack. This allows us to more clearly scope the
variables, track memory ownership and ultimately release the memory.

This also prepares us for a change to `git_config_string()`, which will
be adapted to have a `char **` out parameter instead of `const char **`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-27 11:19:59 -07:00
Patrick Steinhardt
6073b3b5c3 config: clarify memory ownership in git_config_pathname()
The out parameter of `git_config_pathname()` is a `const char **` even
though we transfer ownership of memory to the caller. This is quite
misleading and has led to many memory leaks all over the place. Adapt
the parameter to instead be `char **`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-27 11:19:59 -07:00
Patrick Steinhardt
cc395d6b47 checkout: clarify memory ownership in unique_tracking_name()
The function `unique_tracking_name()` returns an allocated string, but
does not clearly indicate this because its return type is `const char *`
instead of `char *`. This has led to various callsites where we never
free its returned memory at all, which causes memory leaks.

Plug those leaks and mark now-passing tests as leak free.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-27 11:19:58 -07:00
René Scharfe
36d900d2b0 difftool: add env vars directly in run_file_diff()
Add the environment variables of the child process directly using
strvec_push() instead of building an array out of them and then adding
that using strvec_pushv().  The new code is shorter and avoids magic
array index values and fragile array padding.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-27 08:55:59 -07:00
Junio C Hamano
d36cc0d5a4 Merge branch 'fixes/2.45.1/2.44' into jc/fix-2.45.1-and-friends-for-maint
* fixes/2.45.1/2.44:
  Revert "fsck: warn about symlink pointing inside a gitdir"
  Revert "Add a helper function to compare file contents"
  clone: drop the protections where hooks aren't run
  tests: verify that `clone -c core.hooksPath=/dev/null` works again
  Revert "core.hooksPath: add some protection while cloning"
  init: use the correct path of the templates directory again
  hook: plug a new memory leak
  ci: stop installing "gcc-13" for osx-gcc
  ci: avoid bare "gcc" for osx-gcc job
  ci: drop mention of BREW_INSTALL_PACKAGES variable
  send-email: avoid creating more than one Term::ReadLine object
  send-email: drop FakeTerm hack
2024-05-24 16:59:12 -07:00
Junio C Hamano
863c0ed71e Merge branch 'fixes/2.45.1/2.43' into fixes/2.45.1/2.44
* fixes/2.45.1/2.43:
  Revert "fsck: warn about symlink pointing inside a gitdir"
  Revert "Add a helper function to compare file contents"
  clone: drop the protections where hooks aren't run
  tests: verify that `clone -c core.hooksPath=/dev/null` works again
  Revert "core.hooksPath: add some protection while cloning"
  init: use the correct path of the templates directory again
  hook: plug a new memory leak
  ci: stop installing "gcc-13" for osx-gcc
  ci: avoid bare "gcc" for osx-gcc job
  ci: drop mention of BREW_INSTALL_PACKAGES variable
  send-email: avoid creating more than one Term::ReadLine object
  send-email: drop FakeTerm hack
2024-05-24 16:58:35 -07:00
Junio C Hamano
3c562ef2e6 Merge branch 'fixes/2.45.1/2.42' into fixes/2.45.1/2.43
* fixes/2.45.1/2.42:
  Revert "fsck: warn about symlink pointing inside a gitdir"
  Revert "Add a helper function to compare file contents"
  clone: drop the protections where hooks aren't run
  tests: verify that `clone -c core.hooksPath=/dev/null` works again
  Revert "core.hooksPath: add some protection while cloning"
  init: use the correct path of the templates directory again
  hook: plug a new memory leak
  ci: stop installing "gcc-13" for osx-gcc
  ci: avoid bare "gcc" for osx-gcc job
  ci: drop mention of BREW_INSTALL_PACKAGES variable
  send-email: avoid creating more than one Term::ReadLine object
  send-email: drop FakeTerm hack
2024-05-24 16:58:11 -07:00
Junio C Hamano
73339e4dc2 Merge branch 'fixes/2.45.1/2.41' into fixes/2.45.1/2.42
* fixes/2.45.1/2.41:
  Revert "fsck: warn about symlink pointing inside a gitdir"
  Revert "Add a helper function to compare file contents"
  clone: drop the protections where hooks aren't run
  tests: verify that `clone -c core.hooksPath=/dev/null` works again
  Revert "core.hooksPath: add some protection while cloning"
  init: use the correct path of the templates directory again
  hook: plug a new memory leak
  ci: stop installing "gcc-13" for osx-gcc
  ci: avoid bare "gcc" for osx-gcc job
  ci: drop mention of BREW_INSTALL_PACKAGES variable
  send-email: avoid creating more than one Term::ReadLine object
  send-email: drop FakeTerm hack
2024-05-24 16:57:43 -07:00
Junio C Hamano
4f215d214f Merge branch 'fixes/2.45.1/2.40' into fixes/2.45.1/2.41
* fixes/2.45.1/2.40:
  Revert "fsck: warn about symlink pointing inside a gitdir"
  Revert "Add a helper function to compare file contents"
  clone: drop the protections where hooks aren't run
  tests: verify that `clone -c core.hooksPath=/dev/null` works again
  Revert "core.hooksPath: add some protection while cloning"
  init: use the correct path of the templates directory again
  hook: plug a new memory leak
  ci: stop installing "gcc-13" for osx-gcc
  ci: avoid bare "gcc" for osx-gcc job
  ci: drop mention of BREW_INSTALL_PACKAGES variable
  send-email: avoid creating more than one Term::ReadLine object
  send-email: drop FakeTerm hack
2024-05-24 16:57:02 -07:00
Junio C Hamano
48440f60a7 Merge branch 'jc/fix-2.45.1-and-friends-for-2.39' into fixes/2.45.1/2.40
Revert overly aggressive "layered defence" that went into 2.45.1
and friends, which broke "git-lfs", "git-annex", and other use
cases, so that we can rebuild necessary counterparts in the open.

* jc/fix-2.45.1-and-friends-for-2.39:
  Revert "fsck: warn about symlink pointing inside a gitdir"
  Revert "Add a helper function to compare file contents"
  clone: drop the protections where hooks aren't run
  tests: verify that `clone -c core.hooksPath=/dev/null` works again
  Revert "core.hooksPath: add some protection while cloning"
  init: use the correct path of the templates directory again
  hook: plug a new memory leak
  ci: stop installing "gcc-13" for osx-gcc
  ci: avoid bare "gcc" for osx-gcc job
  ci: drop mention of BREW_INSTALL_PACKAGES variable
  send-email: avoid creating more than one Term::ReadLine object
  send-email: drop FakeTerm hack
2024-05-24 12:29:36 -07:00
Junio C Hamano
7593d66928 Merge branch 'la/hide-trailer-info'
The trailer API has been reshuffled a bit.

* la/hide-trailer-info:
  trailer unit tests: inspect iterator contents
  trailer: document parse_trailers() usage
  trailer: retire trailer_info_get() from API
  trailer: make trailer_info struct private
  trailer: make parse_trailers() return trailer_info pointer
  interpret-trailers: access trailer_info with new helpers
  sequencer: use the trailer iterator
  trailer: teach iterator about non-trailer lines
  trailer: add unit tests for trailer iterator
  Makefile: sort UNIT_TEST_PROGRAMS
2024-05-23 11:04:27 -07:00
Johannes Schindelin
873a466ea3 clone: drop the protections where hooks aren't run
As part of the security bug-fix releases v2.39.4, ..., v2.45.1, I
introduced logic to safeguard `git clone` from running hooks that were
installed _during_ the clone operation.

The rationale was that Git's CVE-2024-32002, CVE-2021-21300,
CVE-2019-1354, CVE-2019-1353, CVE-2019-1352, and CVE-2019-1349 should
have been low-severity vulnerabilities but were elevated to
critical/high severity by the attack vector that allows a weakness where
files inside `.git/` can be inadvertently written during a `git clone`
to escalate to a Remote Code Execution attack by virtue of installing a
malicious `post-checkout` hook that Git will then run at the end of the
operation without giving the user a chance to see what code is executed.

Unfortunately, Git LFS uses a similar strategy to install its own
`post-checkout` hook during a `git clone`; In fact, Git LFS is
installing four separate hooks while running the `smudge` filter.

While this pattern is probably in want of being improved by introducing
better support in Git for Git LFS and other tools wishing to register
hooks to be run at various stages of Git's commands, let's undo the
clone protections to unbreak Git LFS-enabled clones.

This reverts commit 8db1e8743c (clone: prevent hooks from running
during a clone, 2024-03-28).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-21 12:33:08 -07:00
Junio C Hamano
4674ab682d apply: fix uninitialized hash function
"git apply" can work outside a repository as a better "GNU patch",
but when it does so, it still assumed that it can access
the_hash_algo, which is no longer true in the new world order.

Make sure we explicitly fall back to SHA-1 algorithm for backward
compatibility.

It is of dubious value to make this configurable to other hash
algorithms, as the code does not use the_hash_algo for hashing
purposes when working outside a repository (which is how
the_hash_algo is left to NULL)---it is only used to learn the max
length of the hash when parsing the object names on the "index"
line, but failing to parse the "index" line is not a hard failure,
and the program does not support operations like applying binary
patches and --3way fallback that requires object access outside a
repository.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-21 09:07:48 -07:00
Patrick Steinhardt
8d058b8024 builtin/hash-object: fix uninitialized hash function
The git-hash-object(1) command allows users to hash an object even
without a repository. Starting with c8aed5e8da (repository: stop setting
SHA1 as the default object hash, 2024-05-07), this will make us hit an
uninitialized hash function, which subsequently leads to a segfault.

Fix this by falling back to SHA-1 explicitly when running outside of
a Git repository. Users can use GIT_DEFAULT_HASH environment to
specify what hash algorithm they want, so arguably this code should
not be needed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-21 09:05:13 -07:00
Patrick Steinhardt
4a1c95931f builtin/patch-id: fix uninitialized hash function
In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), we have adapted `initialize_repository()` to no longer set
up a default hash function. As this function is also used to set up
`the_repository`, the consequence is that `the_hash_algo` will now by
default be a `NULL` pointer unless the hash algorithm was configured
properly. This is done as a mechanism to detect cases where we may be
using the wrong hash function by accident.

This change now causes git-patch-id(1) to segfault when it's run outside
of a repository. As this command can read diffs from stdin, it does not
necessarily need a repository, but then relies on `the_hash_algo` to
compute the patch ID itself.

It is somewhat dubious that git-patch-id(1) relies on `the_hash_algo` in
the first place. Quoting its manpage:

    A "patch ID" is nothing but a sum of SHA-1 of the file diffs
    associated with a patch, with line numbers ignored. As such, it’s
    "reasonably stable", but at the same time also reasonably unique,
    i.e., two patches that have the same "patch ID" are almost
    guaranteed to be the same thing.

We explicitly document patch IDs to be using SHA-1. Furthermore, patch
IDs are supposed to be stable for most of the part. But even with the
same input, the patch IDs will now be different depending on the repo's
configured object hash.

Work around the issue by setting up SHA-1 when there was no startup
repository for now. This is arguably not the correct fix, but for now we
rather want to focus on getting the segfault fixed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-21 09:05:13 -07:00