The fsmonitor daemon has been implemented for Linux.
* pt/fsmonitor-linux:
fsmonitor: convert shown khash to strset in do_handle_client
fsmonitor: add tests for Linux
fsmonitor: add timeout to daemon stop command
fsmonitor: close inherited file descriptors and detach in daemon
run-command: add close_fd_above_stderr option
fsmonitor: implement filesystem change listener for Linux
fsmonitor: rename fsm-settings-darwin.c to fsm-settings-unix.c
fsmonitor: rename fsm-ipc-darwin.c to fsm-ipc-unix.c
fsmonitor: use pthread_cond_timedwait for cookie wait
compat/win32: add pthread_cond_timedwait
fsmonitor: fix hashmap memory leak in fsmonitor_run_daemon
fsmonitor: fix khash memory leak in do_handle_client
Reduce system overhead "git upload-pack" spends relaying "git
pack-objects" output to the "git fetch" running on the other end of
the connection.
Comments?
cf. <xmqqseaf5k5t.fsf@gitster.g>
* ps/upload-pack-buffer-more-writes:
builtin/pack-objects: reduce lock contention when writing packfile data
csum-file: drop `hashfd_throughput()`
csum-file: introduce `hashfd_ext()`
sideband: use writev(3p) to send pktlines
wrapper: introduce writev(3p) wrappers
compat/posix: introduce writev(3p) wrapper
git-compat-util: introduce `cast_size_t_to_ssize_t()`
upload-pack: reduce lock contention when writing packfile data
upload-pack: adapt keepalives based on buffering
upload-pack: fix debug statement when flushing packfile data
A bit of OIDmap API enhancement and cleanup.
Comments?
* sk/oidmap-clear-with-custom-free-func:
builtin/rev-list: migrate missing_objects cleanup to oidmap_clear_with_free()
oidmap: make entry cleanup explicit in oidmap_clear
"git format-patch --cover-letter" learns to use a simpler format
instead of the traditional shortlog format to list its commits with
a new --cover-letter-format option and format.commitListFormat
configuration variable.
* mf/format-patch-cover-letter-format:
docs: add usage for the cover-letter fmt feature
format-patch: add commitListFormat config
format-patch: add ability to use alt cover format
format-patch: move cover letter summary generation
pretty.c: add %(count) and %(total) placeholders
Further work on incremental repacking using MIDX/bitmap
* tb/incremental-midx-part-3.2:
midx: enable reachability bitmaps during MIDX compaction
midx: implement MIDX compaction
t/helper/test-read-midx.c: plug memory leak when selecting layer
midx-write.c: factor fanout layering from `compute_sorted_entries()`
midx-write.c: enumerate `pack_int_id` values directly
midx-write.c: extract `fill_pack_from_midx()`
midx-write.c: introduce `midx_pack_perm()` helper
midx: do not require packs to be sorted in lexicographic order
midx-write.c: introduce `struct write_midx_opts`
midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
git-multi-pack-index(1): remove non-existent incompatibility
builtin/multi-pack-index.c: make '--progress' a common option
midx: introduce `midx_get_checksum_hex()`
midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`
midx: mark `get_midx_checksum()` arguments as const
The object source API is getting restructured to allow plugging new
backends.
Being reviewed.
cf. <aaidbdpkpH7tfn9x@denethor>
* ps/odb-sources:
odb/source: make `begin_transaction()` function pluggable
odb/source: make `write_alternate()` function pluggable
odb/source: make `read_alternates()` function pluggable
odb/source: make `write_object_stream()` function pluggable
odb/source: make `write_object()` function pluggable
odb/source: make `freshen_object()` function pluggable
odb/source: make `for_each_object()` function pluggable
odb/source: make `read_object_stream()` function pluggable
odb/source: make `read_object_info()` function pluggable
odb/source: make `close()` function pluggable
odb/source: make `reprepare()` function pluggable
odb/source: make `free()` function pluggable
odb/source: introduce source type for robustness
odb: move reparenting logic into respective subsystems
odb: embed base source in the "files" backend
odb: introduce "files" source
odb: split `struct odb_source` into separate header
"git repo structure" command learns to report maximum values on
various aspects of objects it inspects.
* jt/repo-structure-extrema:
builtin/repo: find tree with most entries
builtin/repo: find commit with most parents
builtin/repo: add OID annotations to table output
builtin/repo: collect largest inflated objects
builtin/repo: add helper for printing keyvalue output
builtin/repo: update stats for each object
The code in "git help" that shows configuration items in sorted
order was awkwardly organized and prone to bugs.
* ac/help-sort-correctly:
help: cleanup the contruction of keys_uniq
"git replay" (experimental) learns, in addition to "pick" and
"replay", a new operating mode "revert".
* sa/replay-revert:
replay: add --revert mode to reverse commit changes
sequencer: extract revert message formatting into shared function
"git for-each-repo" started from a secondary worktree did not work
as expected, which has been corrected.
* ds/for-each-repo-w-worktree:
for-each-repo: simplify passing of parameters
for-each-repo: work correctly in a worktree
run-command: extract sanitize_repo_env helper
for-each-repo: test outside of repo context
The "files" backend is implemented as a pointer in the `struct
odb_source`. This contradicts our typical pattern for pluggable backends
like we use it for example in the ref store or for object database
streams, where we typically embed the generic base structure in the
specialized implementation. This pattern has a couple of small benefits:
- We avoid an extra allocation.
- We hide implementation details in the generic structure.
- We can easily downcast from a generic backend to the specialized
structure and vice versa because the offsets are known at compile
time.
- It becomes trivial to identify locations where we depend on backend
specific logic because the cast needs to be explicit.
Refactor our "files" object database source to do the same and embed the
`struct odb_source` in the `struct odb_source_files`.
There are still a bunch of sites in our code base where we do have to
access internals of the "files" backend. The intent is that those will
go away over time, but this will certainly take a while. Meanwhile,
provide a `odb_source_files_downcast()` function that can convert a
generic source into a "files" source.
As we only have a single source the downcast succeeds unconditionally
for now. Eventually though the intent is to make the cast `BUG()` in
case the caller requests to downcast a non-"files" backend to a "files"
backend.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new "files" object database source. This source encapsulates
access to both loose object files and the packfile store, similar to how
the "files" backend for refs encapsulates access to loose refs and the
packed-refs file.
Note that for now the "files" source is still a direct member of a
`struct odb_source`. This architecture will be reversed in the next
commit so that the files source contains a `struct odb_source`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of the conversion away from oidmap_clear(), switch the
missing_objects map to use oidmap_clear_with_free().
missing_objects stores struct missing_objects_map_entry instances,
which own an xstrdup()'d path string in addition to the container
struct itself. Previously, rev-list manually freed entry->path
before calling oidmap_clear(&missing_objects, true).
Introduce a dedicated free callback and pass it to
oidmap_clear_with_free(), consolidating entry teardown into a
single place and making cleanup semantics explicit.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the khash-based string set used for deduplicating pathnames
in do_handle_client() with a strset, which provides a cleaner
interface for the same purpose.
Since the paths are interned strings from the batch data, use
strdup_strings=0 to avoid unnecessary copies.
Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Paul Tarjan <github@paulisageek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "fsmonitor--daemon stop" command polls in a loop waiting for the
daemon to exit after sending a "quit" command over IPC. If the daemon
fails to shut down (e.g. it is stuck or wedged), this loop spins
forever.
Add a 30-second timeout so the stop command returns an error instead
of blocking indefinitely.
Signed-off-by: Paul Tarjan <github@paulisageek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the fsmonitor daemon is spawned as a background process, it may
inherit file descriptors from its parent that it does not need. In
particular, when the test harness or a CI system captures output through
pipes, the daemon can inherit duplicated pipe endpoints. If the daemon
holds these open, the parent process never sees EOF and may appear to
hang.
Set close_fd_above_stderr on the child process at both daemon startup
paths: the explicit "fsmonitor--daemon start" command and the implicit
spawn triggered by fsmonitor-ipc when a client finds no running daemon.
Also suppress stdout and stderr on the implicit spawn path to prevent
the background daemon from writing to the client's terminal.
Additionally, call setsid() when the daemon starts with --detach to
create a new session and process group. This prevents the daemon
from being part of the spawning shell's process group, which could
cause the shell's "wait" to block until the daemon exits.
Signed-off-by: Paul Tarjan <github@paulisageek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cookie wait in with_lock__wait_for_cookie() uses an infinite
pthread_cond_wait() loop. The existing comment notes the desire
to switch to pthread_cond_timedwait(), but the routine was not
available in git thread-utils.
On certain container or overlay filesystems, inotify watches may
succeed but events are never delivered. In this case the daemon
would hang indefinitely waiting for the cookie event, which in
turn causes the client to hang.
Replace the infinite wait with a one-second timeout using
pthread_cond_timedwait(). If the timeout fires, report an
error and let the client proceed with a trivial (full-scan)
response rather than blocking forever.
Signed-off-by: Paul Tarjan <github@paulisageek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `state.cookies` hashmap is initialized during daemon startup but
never freed during cleanup in the `done:` label of
fsmonitor_run_daemon(). The cookie entries also have names allocated
via strbuf_detach() that must be freed individually.
Iterate the hashmap to free each cookie name, then call
hashmap_clear_and_free() to release the entries and table.
Signed-off-by: Paul Tarjan <github@paulisageek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `shown` kh_str_t was freed with kh_release_str() at a point in
the code only reachable in the non-trivial response path. When the
client receives a trivial response, the code jumps to the `cleanup`
label, skipping the kh_release_str() call entirely and leaking the
hash table.
Fix this by initializing `shown` to NULL and moving the cleanup to the
`cleanup` label using kh_destroy_str(), which is safe to call on NULL.
This ensures the hash table is freed regardless of which code path is
taken.
Signed-off-by: Paul Tarjan <github@paulisageek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The parse-options API learned to notice an options[] array with
duplicated long options.
* rs/parse-options-duplicated-long-options:
parseopt: check for duplicate long names and numerical options
pack-objects: remove duplicate --stdin-packs definition
The configuration variable format.noprefix did not behave as a
proper boolean variable, which has now been fixed and documented.
* kh/format-patch-noprefix-is-boolean:
doc: diff-options.adoc: make *.noprefix split translatable
doc: diff-options.adoc: show format.noprefix for format-patch
format-patch: make format.noprefix a boolean
"git add <submodule>" has been taught to honor
submodule.<name>.ignore that is set to "all" (and requires "git add
-f" to override it).
* cs/add-skip-submodule-ignore-all:
Documentation: update add --force option + ignore=all config
tests: fix existing tests when add an ignore=all submodule
tests: t2206-add-submodule-ignored: ignore=all and add --force tests
read-cache: submodule add need --force given ignore=all configuration
read-cache: update add_files_to_cache take param ignored_too
* 'ar/config-hooks' (early part):
hook: add -z option to "git hook list"
hook: allow out-of-repo 'git hook' invocations
hook: allow event = "" to overwrite previous values
hook: allow disabling config hooks
hook: include hooks from the config
hook: add "git hook list" command
hook: run a list of hooks to prepare for multihook support
hook: add internal state alloc/free callbacks
Use the hook API to replace ad-hoc invocation of hook scripts via
the run_command() API.
* ar/run-command-hook-take-2:
builtin/receive-pack: avoid spinning no-op sideband async threads
receive-pack: convert receive hooks to hook API
receive-pack: convert update hooks to new API
run-command: poll child input in addition to output
hook: add jobs option
reference-transaction: use hook API instead of run-command
transport: convert pre-push to hook API
hook: allow separate std[out|err] streams
hook: convert 'post-rewrite' hook in sequencer.c to hook API
hook: provide stdin via callback
run-command: add stdin callback for parallelization
run-command: add helper for pp child states
t1800: add hook output stream tests
The core.attributesfile is intended to be set per repository, but
were kept track of by a single global variable in-core, which has
been corrected by moving it to per-repository data structure.
* ob/core-attributesfile-in-repository:
environment: move "branch.autoSetupMerge" into `struct repo_config_values`
environment: stop using core.sparseCheckout globally
environment: stop storing `core.attributesFile` globally
"git config list" is taught to show the values interpreted for
specific type with "--type=<X>" option.
* ds/config-list-with-type:
config: use an enum for type
config: restructure format_config()
config: format colors quietly
color: add color_parse_quietly()
config: format expiry dates quietly
config: format paths gently
config: format bools or strings in helper
config: format bools or ints gently
config: format bools gently
config: format int64s gently
config: make 'git config list --type=<X>' work
config: add 'gently' parameter to format_config()
config: move show_all_config()
Clean-up the code around "git repo info" command.
* lo/repo-leftover-bits:
Documentation/git-repo: capitalize format descriptions
Documentation/git-repo: replace 'NUL' with '_NUL_'
t1901: adjust nul format output instead of expected value
t1900: rename t1900-repo to t1900-repo-info
repo: rename struct field to repo_info_field
repo: replace get_value_fn_for_key by get_repo_info_field
repo: rename repo_info_fields to repo_info_field
CodingGuidelines: instruct to name arrays in singular
"git maintenance" starts using the "geometric" strategy by default.
* ps/maintenance-geometric-default:
builtin/maintenance: use "geometric" strategy by default
t7900: prepare for switch of the default strategy
t6500: explicitly use "gc" strategy
t5510: explicitly use "gc" strategy
t5400: explicitly use "gc" strategy
t34xx: don't expire reflogs where it matters
t: disable maintenance where we verify object database structure
t: fix races caused by background maintenance
API clean-up for the worktree subsystem.
* pw/no-more-NULL-means-current-worktree:
path: remove repository argument from worktree_git_path()
wt-status: avoid passing NULL worktree
Wean the mailmap code off of the_repository dependency.
* bk/mailmap-wo-the-repository:
mailmap: drop global config variables
mailmap: stop using the_repository
Allow the directory in which reference backends store their data to
be specified.
* kn/ref-location:
refs: add GIT_REFERENCE_BACKEND to specify reference backend
refs: allow reference location in refstorage config
refs: receive and use the reference storage payload
refs: move out stub modification to generic layer
refs: extract out `refs_create_refdir_stubs()`
setup: don't modify repo in `create_reference_database()`
"git add -p" learned a new mode that allows the user to revisit a
file that was already dealt with.
* aa/add-p-no-auto-advance:
add-patch: allow interfile navigation when selecting hunks
add-patch: allow all-or-none application of patches
add-patch: modify patch_update_file() signature
interactive -p: add new `--auto-advance` flag
This change simplifies the code somewhat from its original
implementation.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When run in a worktree, the GIT_DIR directory is set in a different way
than in a typical repository. Show this by updating t0068 to include a
worktree and add a test that runs from that worktree. This requires
moving the repo.key config into a global config instead of the base test
repository's local config (demonstrating that it worked with
non-worktree Git repositories).
We need to be careful to unset the local Git environment variables and
let the child process rediscover them, while also reinstating those
variables in the parent process afterwards. Update run_command_on_repo()
to use the new sanitize_repo_env() helper method to erase these
environment variables.
During review of this bug fix, there were several incorrect patches
demonstrating different bad behaviors. Most of these are covered by
tests, when it is not too expensive to set it up. One case that would be
expensive to set up is the GIT_NO_REPLACE_OBJECTS environment variable,
but we trust that using sanitize_repo_env() will be sufficient to
capture these uncovered cases by using the common code for resetting
environment variables.
Reported-by: Matthew Gabeler-Lee <fastcat@gmail.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running `git pack-objects --stdout` we feed the data through
`hashfd_ext()` with a progress meter and a smaller-than-usual buffer
length of 8kB so that we can track throughput more granularly. But as
packfiles tend to be on the larger side, this small buffer size may
cause a ton of write(3p) syscalls.
Originally, the buffer we used in `hashfd()` was 8kB for all use cases.
This was changed though in 2ca245f8be (csum-file.h: increase hashfile
buffer size, 2021-05-18) because we noticed that the number of writes
can have an impact on performance. So the buffer size was increased to
128kB, which improved performance a bit for some use cases.
But the commit didn't touch the buffer size for `hashd_throughput()`.
The reasoning here was that callers expect the progress indicator to
update frequently, and a larger buffer size would of course reduce the
update frequency especially on slow networks.
While that is of course true, there was (and still is, even though it's
now a call to `hashfd_ext()`) only a single caller of this function in
git-pack-objects(1). This command is responsible for writing packfiles,
and those packfiles are often on the bigger side. So arguably:
- The user won't care about increments of 8kB when packfiles tend to
be megabytes or even gigabytes in size.
- Reducing the number of syscalls would be even more valuable here
than it would be for multi-pack indices, which was the benchmark
done in the mentioned commit, as MIDXs are typically significantly
smaller than packfiles.
- Nowadays, many internet connections should be able to transfer data
at a rate significantly higher than 8kB per second.
Update the buffer to instead have a size of `LARGE_PACKET_DATA_MAX - 1`,
which translates to ~64kB. This limit was chosen because `git
pack-objects --stdout` is most often used when sending packfiles via
git-upload-pack(1), where packfile data is chunked into pktlines when
using the sideband. Furthermore, most internet connections should have a
bandwidth signifcantly higher than 64kB/s, so we'd still be able to
observe progress updates at a rate of at least once per second.
This change significantly reduces the number of write(3p) syscalls from
355,000 to 44,000 when packing the Linux repository. While this results
in a small performance improvement on an otherwise-unused system, this
improvement is mostly negligible. More importantly though, it will
reduce lock contention in the kernel on an extremely busy system where
we have many processes writing data at once.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `hashfd_throughput()` function is used by a single callsite in
git-pack-objects(1). In contrast to `hashfd()`, this function uses a
progress meter to measure throughput and a smaller buffer length so that
the progress meter can provide more granular metrics.
We're going to change that caller in the next commit to be a bit more
specific to packing objects. As such, `hashfd_throughput()` will be a
somewhat unfitting mechanism for any potential new callers.
Drop the function and replace it with a call to `hashfd_ext()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A couple of bugs in use of flag bits around odb API has been
corrected, and the flag bits reordered.
* ps/object-info-bits-cleanup:
odb: convert `odb_has_object()` flags into an enum
odb: convert object info flags into an enum
odb: drop gaps in object info flag values
builtin/fsck: fix flags passed to `odb_has_object()`
builtin/backfill: fix flags passed to `odb_has_object()`
Revamp object enumeration API around odb.
* ps/odb-for-each-object:
odb: drop unused `for_each_{loose,packed}_object()` functions
reachable: convert to use `odb_for_each_object()`
builtin/pack-objects: use `packfile_store_for_each_object()`
odb: introduce mtime fields for object info requests
treewide: drop uses of `for_each_{loose,packed}_object()`
treewide: enumerate promisor objects via `odb_for_each_object()`
builtin/fsck: refactor to use `odb_for_each_object()`
odb: introduce `odb_for_each_object()`
packfile: introduce function to iterate through objects
packfile: extract function to iterate through objects of a store
object-file: introduce function to iterate through objects
object-file: extract function to read object info from path
odb: fix flags parameter to be unsigned
odb: rename `FOR_EACH_OBJECT_*` flags
Exit early if the hooks do not exist, to avoid spinning up/down
sideband async threads which no-op.
It is important to call the hook_exists() API provided by hook.[ch]
because it covers both config-defined hooks and the "traditional"
hooks from the hookdir. find_hook() only covers the hookdir hooks.
The regression happened because the no-op async threads add some
additional overhead which can be measured with the receive-refs test
of the benchmarks suite [1].
Reproduced using:
cd benchmarks/receive-refs && \
./run --revisions /path/to/git \
fc148b146ad41be71a7852c4867f0773cbfe1ff9~,fc148b146ad41be71a7852c4867f0773cbfe1ff9 \
--parameter-list refformat reftable --parameter-list refcount 10000
1: https://gitlab.com/gitlab-org/data-access/git/benchmarks
Fixes: fc148b146a ("receive-pack: convert update hooks to new API")
Reported-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
[jc: avoid duplicated hardcoded hook names]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The size of a tree object usually corresponds with the number of entries
it has. While iterating through objects in the repository for
git-repo-structure, identify the tree with the most entries and display
it in the output.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Complex merge events may produce an octopus merge where the resulting
merge commit has more than two parents. While iterating through objects
in the repository for git-repo-structure, identify the commit with the
most parents and display it in the output.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "structure" output for git-repo(1) does not show the corresponding
OIDs for the largest objects in its "table" output. Update the output to
include a list of OID annotations with an index to the corresponding row
in the table.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "structure" output for git-repo(1) shows the total inflated and disk
sizes of reachable objects in the repository, but doesn't show the size
of the largest individual objects. Since an individual object may be a
large contributor to the overall repository size, it is useful for users
to know the maximum size of individual objects.
While interating across objects, record the size and OID of the largest
objects encountered for each object type to provide as output. Note that
the default "table" output format only displays size information and not
the corresponding OID. In a subsequent commit, the table format is
updated to add table annotations that mention the OID.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The machine-parsable formats for the git-repo(1) "structure" subcommand
print output in keyvalue pairs. Introduce the helper function
`print_keyvalue()` to remove some code duplication and improve
readability.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When walking reachable objects in the repository, `count_objects()`
processes a set of objects and updates the `struct object_stats`. In
preparation for more granular statistics being collected, update the
`struct object_stats` for each individual object instead.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>