fast-import was relying on the fact that on most systems mmap() and
write() are synchronized by the filesystem's buffer cache. We were
relying on the ability to mmap() 20 bytes beyond the current end
of the file, then later fill in those bytes with a future write()
call, then read them through the previously obtained mmap() address.
This isn't always true with some implementations of NFS, but it is
especially not true with our NO_MMAP=YesPlease build time option used
on some platforms. If fast-import was built with NO_MMAP=YesPlease
we used the malloc()+pread() emulation and the subsequent write()
call does not update the trailing 20 bytes of a previously obtained
"mmap()" (aka malloc'd) address.
Under NO_MMAP that behavior causes unpack_entry() in sha1_file.c to
be unable to read an object header (or data) that has been unlucky
enough to be written to the packfile at a location such that it
is in the trailing 20 bytes of a window previously opened on that
same packfile.
This bug has gone unnoticed for a very long time as it is highly data
dependent. Not only does the object have to be placed at the right
position, but it also needs to be positioned behind some other object
that has been accessed due to a branch cache invalidation. In other
words the stars had to align just right, and if you did run into
this bug you probably should also have purchased a lottery ticket.
Fortunately the workaround is a lot easier than the bug explanation.
Before we allow unpack_entry() to read data from a pack window
that has also (possibly) been modified through write() we force
all existing windows on that packfile to be closed. By closing
the windows we ensure that any new access via the emulated mmap()
will reread the packfile, updating to the current file content.
This comes at a slight performance degredation as we cannot reuse
previously cached windows when we update the packfile. But it
is a fairly minor difference as the window closes happen at only
two points:
- When the packfile is finalized and its .idx is generated:
At this stage we are getting ready to update the refs and any
data access into the packfile is going to be random, and is
going after only the branch tips (to ensure they are valid).
Our existing windows (if any) are not likely to be positioned
at useful locations to access those final tip commits so we
probably were closing them before anyway.
- When the branch cache missed and we need to reload:
At this point fast-import is getting change commands for the next
commit and it needs to go re-read a tree object it previously
had written out to the packfile. What windows we had (if any)
are not likely to cover the tree in question so we probably were
closing them before anyway.
We do try to avoid unnecessarily closing windows in the second case
by checking to see if the packfile size has increased since the
last time we called unpack_entry() on that packfile. If the size
has not changed then we have not written additional data, and any
existing window is still vaild. This nicely handles the cases where
fast-import is going through a branch cache reload and needs to read
many trees at once. During such an event we are not likely to be
updating the packfile so we do not cycle the windows between reads.
With this change in place t9301-fast-export.sh (which was broken
by c3b0dec509) finally works again.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For consistency and technical reasons on Windows (':' is usually part of directory
names), we must use ';' as path separator.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
The old way of fixing warnings did not succeed on MinGW. MinGW
does not support C99 printf format strings for size_t [1]. But
gcc on MinGW issues warnings if C99 printf format is not used.
Hence, the old stragegy to avoid warnings fails.
[1] http://www.mingw.org/MinGWiki/index.php/C99
This commits passes arguments of type size_t through a tiny
helper functions that casts to the type expected by the format
string.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are some places that test for an absolute path. Use the helper
function to ease porting.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
RelNotes-1.5.3.5: describe recent fixes
merge-recursive.c: mrtree in merge() is not used before set
sha1_file.c: avoid gcc signed overflow warnings
Fix a small memory leak in builtin-add
honor the http.sslVerify option in shell scripts
With the recent gcc, we get:
sha1_file.c: In check_packed_git_:
sha1_file.c:527: warning: assuming signed overflow does not
occur when assuming that (X + c) < X is always false
sha1_file.c:527: warning: assuming signed overflow does not
occur when assuming that (X + c) < X is always false
for a piece of code that tries to make sure that off_t is large
enough to hold more than 2^32 offset. The test tried to make
sure these do not wrap-around:
/* make sure we can deal with large pack offsets */
off_t x = 0x7fffffffUL, y = 0xffffffffUL;
if (x > (x + 1) || y > (y + 1)) {
but gcc assumes it can do whatever optimization it wants for a
signed overflow (undefined behaviour) and warns about this
construct.
Follow Linus's suggestion to check sizeof(off_t) instead to work
around the problem.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ph/strbuf: (44 commits)
Make read_patch_file work on a strbuf.
strbuf_read_file enhancement, and use it.
strbuf change: be sure ->buf is never ever NULL.
double free in builtin-update-index.c
Clean up stripspace a bit, use strbuf even more.
Add strbuf_read_file().
rerere: Fix use of an empty strbuf.buf
Small cache_tree_write refactor.
Make builtin-rerere use of strbuf nicer and more efficient.
Add strbuf_cmp.
strbuf_setlen(): do not barf on setting length of an empty buffer to 0
sq_quote_argv and add_to_string rework with strbuf's.
Full rework of quote_c_style and write_name_quoted.
Rework unquote_c_style to work on a strbuf.
strbuf API additions and enhancements.
nfv?asprintf are broken without va_copy, workaround them.
Fix the expansion pattern of the pseudo-static path buffer.
builtin-for-each-ref.c::copy_name() - do not overstep the buffer.
builtin-apply.c: fix a tiny leak introduced during xmemdupz() conversion.
Use xmemdupz() in many places.
...
For that purpose, the ->buf is always initialized with a char * buf living
in the strbuf module. It is made a char * so that we can sloppily accept
things that perform: sb->buf[0] = '\0', and because you can't pass "" as an
initializer for ->buf without making gcc unhappy for very good reasons.
strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf
anymore.
as a consequence strbuf_detach is _mandatory_ to detach a buffer, copying
->buf isn't an option anymore, if ->buf is going to escape from the scope,
and eventually be free'd.
API changes:
* strbuf_setlen now always works, so just make strbuf_reset a convenience
macro.
* strbuf_detatch takes a size_t* optional argument (meaning it can be
NULL) to copy the buffer's len, as it was needed for this refactor to
make the code more readable, and working like the callers.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Now, those functions take an "out" strbuf argument, where they store their
result if any. In that case, it also returns 1, else it returns 0.
* those functions support "in place" editing, in the sense that it's OK to
call them this way:
convert_to_git(path, sb->buf, sb->len, sb);
When doable, conversions are done in place for real, else the strbuf
content is just replaced with the new one, transparentely for the caller.
If you want to create a new filter working this way, being the accumulation
of filter1, filter2, ... filtern, then your meta_filter would be:
int meta_filter(..., const char *src, size_t len, struct strbuf *sb)
{
int ret = 0;
ret |= filter1(...., src, len, sb);
if (ret) {
src = sb->buf;
len = sb->len;
}
ret |= filter2(...., src, len, sb);
if (ret) {
src = sb->buf;
len = sb->len;
}
....
return ret | filtern(..., src, len, sb);
}
That's why subfilters the convert_to_* functions called were also rewritten
to work this way.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This brings builtin-stripspace, builtin-tag and mktag to use strbufs.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Under some types of packfile corruption the zlib stream holding the
data for a delta within a packfile may fail to inflate, due to say
a CRC failure within the compressed data itself. When this occurs
the unpack_compressed_entry function will return NULL as a signal to
the caller that the data is not available. Unfortunately we then
tried to use that NULL as though it referenced a memory location
where a delta was stored and tried to apply it to the delta base.
Loading a byte from the NULL address typically causes a SIGSEGV.
cate on #git noticed this failure in `git fsck --full` where the
call to verify_pack() first noticed that the packfile was corrupt
by finding that the packfile's SHA-1 did not match the raw data of
the file. After finding this fsck went ahead and tried to verify
every object within the packfile, even though the packfile was
already known to be bad. If we are going to shovel bad data at
the delta unpacking code, we better handle it correctly.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Print the index version when an error occurs so the user
knows what type of header (and size) we thought the index
should have had.
Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new name is closer to the purpose of the function.
A NUL-terminated buffer makes things easier when callers need that.
Since the function returns only the memory written with data,
almost always allocating more space than needed because final
size is unknown, an extra NUL terminating the buffer is harmless.
It is not included in the returned size, so the function
remains working as before.
Also, now the function allows the buffer passed to be NULL at first,
and alloc_nr is now used for growing the buffer, instead size=*2.
Signed-off-by: Carlos Rica <jasampler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
Document -<n> for git-format-patch
glossary: add 'reflog'
diff --no-index: fix --name-status with added files
Don't smash stack when $GIT_ALTERNATE_OBJECT_DIRECTORIES is too long
There is no restriction on the length of the name returned by
get_object_directory, other than the fact that it must be a stat'able
git object directory. That means its name may have length up to
PATH_MAX-1 (i.e., often 4095) not counting the trailing NUL.
Combine that with the assumption that the concatenation of that name and
suffixes like "/info/alternates" and "/pack/---long-name---.idx" will fit
in a buffer of length PATH_MAX, and you see the problem. Here's a fix:
sha1_file.c (prepare_packed_git_one): Lengthen "path" buffer
so we are guaranteed to be able to append "/pack/" without checking.
Skip any directory entry that is too long to be appended.
(read_info_alternates): Protect against a similar buffer overrun.
Before this change, using the following admittedly contrived environment
setting would cause many git commands to clobber their stack and segfault
on a system with PATH_MAX == 4096:
t=$(perl -e '$s=".git/objects";$n=(4096-6-length($s))/2;print "./"x$n . $s')
export GIT_ALTERNATE_OBJECT_DIRECTORIES=$t
touch g
./git-update-index --add g
If you run the above commands, you'll soon notice that many
git commands now segfault, so you'll want to do this:
unset GIT_ALTERNATE_OBJECT_DIRECTORIES
Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A pack-file can get created without any objects in it (to transfer "no
data" - which can happen if you use a reference git repo, for example,
or just otherwise just end up transferring only branch head information
and already have all the objects themselves).
And while we probably should never create an index for such a pack, if we
do (and we do), the index file size sanity checking was incorrect.
This fixes it.
Reported-and-tested-by: Jocke Tjernlund <tjernlund@tjernlund.se>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time. There are a few files that need
to have trailing whitespaces (most notably, test vectors). The results
still passes the test, and build result in Documentation/ area is unchanged.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sp/pack:
Style nit - don't put space after function names
Ensure the pack index is opened before access
Simplify index access condition in count-objects, pack-redundant
Test for recent rev-parse $abbrev_sha1 regression
rev-parse: Identify short sha1 sums correctly.
Attempt to delay prepare_alt_odb during get_sha1
Micro-optimize prepare_alt_odb
Lazily open pack index files on demand
* maint:
git-config: Improve documentation of git-config file handling
git-config: Various small fixes to asciidoc documentation
decode_85(): fix missing return.
fix signed range problems with hex conversions
* maint-1.5.1:
git-config: Improve documentation of git-config file handling
git-config: Various small fixes to asciidoc documentation
decode_85(): fix missing return.
fix signed range problems with hex conversions
Jon Smirl said:
| Once an object reference hits a pack file it is very likely that
| following references will hit the same pack file. So first place to
| look for an object is the same place the previous object was found.
This is indeed a good heuristic so here it is. The search always start
with the pack where the last object lookup succeeded. If the wanted
object is not available there then the search continues with the normal
pack ordering.
To test this I split the Linux repository into 66 packs and performed a
"time git-rev-list --objects --all > /dev/null". Best results are as
follows:
Pack Sort w/o this patch w/ this patch
-------------------------------------------------------------
recent objects last 26.4s 20.9s
recent objects first 24.9s 18.4s
This shows that the pack order based on object age has some influence,
but that the last-used-pack heuristic is even more significant in
reducing object lookup.
Signed-off-by: Nicolas Pitre <nico@cam.org> --- Note: the
--max-pack-size to git-repack currently produces packs with old objects
after those containing recent objects. The pack sort based on
filesystem timestamp is therefore backward for those. This needs to be
fixed of course, but at least it made me think about this variable for
the test.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Make hexval_table[] "const". Also make sure that the accessor
function hexval() does not access the table with out-of-range
values by declaring its parameter "unsigned char", instead of
"unsigned int".
With this, gcc can just generate:
movzbl (%rdi), %eax
movsbl hexval_table(%rax),%edx
movzbl 1(%rdi), %eax
movsbl hexval_table(%rax),%eax
sall $4, %edx
orl %eax, %edx
for the code to generate a byte from two hex characters.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Our style is to not put a space after a function name. I did here,
and Junio applied the patch with the incorrect formatting. So I'm
cleaning up after myself since I noticed it upon review.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Calling getenv() is not that expensive, but its also not free,
and its certainly not cheaper than testing to see if alt_odb_tail
is not null.
Because we are calling prepare_alt_odb() from within find_sha1_file
every time we cannot find an object file locally we want to skip out
of prepare_alt_odb() as early as possible once we have initialized
our alternate list.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
In some repository configurations the user may have many packfiles,
but all of the recent commits/trees/tags/blobs are likely to
be in the most recent packfile (the one with the newest mtime).
It is therefore common to be able to complete an entire operation
by accessing only one packfile, even if there are 25 packfiles
available to the repository.
Rather than opening and mmaping the corresponding .idx file for
every pack found, we now only open and map the .idx when we suspect
there might be an object of interest in there.
Of course we cannot known in advance which packfile contains an
object, so we still need to scan the entire packed_git list to
locate anything. But odds are users want to access objects in the
most recently created packfiles first, and that may be all they
ever need for the current operation.
Junio observed in b867092f that placing recent packfiles before
older ones can slightly improve access times for recent objects,
without degrading it for historical object access.
This change improves upon Junio's observations by trying even harder
to avoid the .idx files that we won't need.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>