reftable/block: reuse zstream state on inflation

When calling `inflateInit()` and `inflate()`, the zlib library will
allocate several data structures for the underlying `zstream` to keep
track of various information. Thus, when inflating repeatedly, it is
possible to optimize memory allocation patterns by reusing the `zstream`
and then calling `inflateReset()` on it to prepare it for the next chunk
of data to inflate.

This is exactly what the reftable code is doing: when iterating through
reflogs we need to potentially inflate many log blocks, but we discard
the `zstream` every single time. Instead, as we reuse the `block_reader`
for each of the blocks anyway, we can initialize the `zstream` once and
then reuse it for subsequent inflations.

Refactor the code to do so, which leads to a significant reduction in
the number of allocations. The following measurements were done when
iterating through 1 million reflog entries. Before:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 23,028 allocs, 22,906 frees, 162,813,552 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 302 allocs, 180 frees, 88,352 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
Patrick Steinhardt
2024-04-08 14:17:04 +02:00
committed by Junio C Hamano
parent 15a60b747e
commit ce1f213cc9
3 changed files with 19 additions and 10 deletions

View File

@@ -56,6 +56,8 @@ int block_writer_finish(struct block_writer *w);
/* clears out internally allocated block_writer members. */
void block_writer_release(struct block_writer *bw);
struct z_stream;
/* Read a block. */
struct block_reader {
/* offset of the block header; nonzero for the first block in a
@@ -67,6 +69,7 @@ struct block_reader {
int hash_size;
/* Uncompressed data for log entries. */
z_stream *zstream;
unsigned char *uncompressed_data;
size_t uncompressed_cap;