From b095803c90dac95a3d49af87dea509a38ffaf039 Mon Sep 17 00:00:00 2001
From: Patrick Steinhardt
Date: Tue, 3 Mar 2026 16:00:25 +0100
Subject: [PATCH] builtin/pack-objects: reduce lock contention when writing
 packfile data

When running `git pack-objects --stdout` we feed the data through
`hashfd_ext()` with a progress meter and a smaller-than-usual buffer
length of 8kB so that we can track throughput more granularly. But as
packfiles tend to be on the larger side, this small buffer size may
cause a ton of write(3p) syscalls.

Originally, the buffer we used in `hashfd()` was 8kB for all use cases.
This was changed though in 2ca245f8be (csum-file.h: increase hashfile
buffer size, 2021-05-18) because we noticed that the number of writes
can have an impact on performance. So the buffer size was increased to
128kB, which improved performance a bit for some use cases.

But the commit didn't touch the buffer size for `hashfd_throughput()`.
The reasoning here was that callers expect the progress indicator to
update frequently, and a larger buffer size would of course reduce the
update frequency, especially on slow networks.

While that is of course true, there was (and still is, even though it's
now a call to `hashfd_ext()`) only a single caller of this function in
git-pack-objects(1). This command is responsible for writing packfiles,
and those packfiles are often on the bigger side. So arguably:

  - The user won't care about increments of 8kB when packfiles tend to
    be megabytes or even gigabytes in size.

  - Reducing the number of syscalls would be even more valuable here
    than it would be for multi-pack indices, which was the benchmark
    done in the mentioned commit, as MIDXs are typically significantly
    smaller than packfiles.

  - Nowadays, many internet connections should be able to transfer data
    at a rate significantly higher than 8kB per second.

Update the buffer to instead have a size of `LARGE_PACKET_DATA_MAX - 1`,
which translates to ~64kB. This limit was chosen because `git
pack-objects --stdout` is most often used when sending packfiles via
git-upload-pack(1), where packfile data is chunked into pktlines when
using the sideband. Furthermore, most internet connections should have
a bandwidth significantly higher than 64kB/s, so we'd still be able to
observe progress updates at a rate of at least once per second.

This change significantly reduces the number of write(3p) syscalls from
355,000 to 44,000 when packing the Linux repository. While this results
in a small performance improvement on an otherwise-unused system, this
improvement is mostly negligible. More importantly though, it will
reduce lock contention in the kernel on an extremely busy system where
we have many processes writing data at once.

Suggested-by: Jeff King
Signed-off-by: Patrick Steinhardt
Signed-off-by: Junio C Hamano
---
 builtin/pack-objects.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ba150a80ad..7301ed8c68 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -41,6 +41,7 @@
 #include "promisor-remote.h"
 #include "pack-mtimes.h"
 #include "parse-options.h"
+#include "pkt-line.h"
 #include "blob.h"
 #include "tree.h"
 #include "path-walk.h"
@@ -1333,14 +1334,17 @@ static void write_pack_file(void)

 	if (pack_to_stdout) {
 		/*
-		 * Since we are expecting to report progress of the
-		 * write into this hashfile, use a smaller buffer
-		 * size so the progress indicators arrive at a more
-		 * frequent rate.
+		 * This command is most often invoked via
+		 * git-upload-pack(1), which will typically chunk data
+		 * into pktlines. As such, we use the maximum data
+		 * length of them as buffer length.
+		 *
+		 * Note that we need to subtract one though to
+		 * accommodate for the sideband byte.
 		 */
 		struct hashfd_options opts = {
 			.progress = progress_state,
-			.buffer_len = 8 * 1024,
+			.buffer_len = LARGE_PACKET_DATA_MAX - 1,
 		};
 		f = hashfd_ext(the_repository->hash_algo, 1, "", &opts);