In some cases, we zero-initialize our object IDs, which sets the algo
member to zero as well, which is not a valid algorithm number. This is
a bad practice, but we typically paper over it by simply
substituting the repository's hash algorithm.
However, our new Rust loose object map code doesn't handle this
gracefully and can't find object IDs when the algorithm is zero because
they don't compare equal to those with the correct algo field. In
addition, the comparison code doesn't have any knowledge of what the
main algorithm is because that's global state, so we can't adjust the
comparison.
To make our code function properly and to avoid propagating these bad
entries, if we get a source object ID with a zero algo, just make a copy
of it with the fixed algorithm. This also has the benefit of fixing the
object IDs when we're in single-algorithm mode.
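As a rough illustration of the copy-and-fix step described above, a minimal
sketch might look like the following. The ObjectID layout, the normalize
helper, and the numeric algorithm values are assumptions made for the
example, not the actual code.

```rust
/// Hypothetical stand-in for the object ID structure; the field names and
/// layout are assumptions for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectID {
    pub hash: [u8; 32],
    pub algo: u32,
}

/// Return a copy of `oid` that uses `repo_algo` if the source was
/// zero-initialized (algo == 0); otherwise return an unchanged copy.
/// `normalize` is a hypothetical helper name.
pub fn normalize(oid: &ObjectID, repo_algo: u32) -> ObjectID {
    if oid.algo == 0 {
        // Copy every other field and substitute the repository's algorithm.
        ObjectID { algo: repo_algo, ..*oid }
    } else {
        *oid
    }
}
```

Because the caller passes the repository's algorithm in explicitly, the
comparison code itself never has to consult global state.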
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our new binary object map code avoids needing to be intimately involved
with file handling by simply writing data to an object implementing Write.
This makes it very easy to test by writing to a Cursor wrapping a Vec
for tests, and thus decouples it from intimate knowledge about how we
handle files.
However, we do want to write our data to an actual file,
since that's the most practical way to persist data. Implement a
wrapper around the hashfile code that implements the Write trait so that
we can write our object map into a file.
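The general shape of such a wrapper can be sketched as below. This is only
an illustration under stated assumptions: FileSink and write_header are
hypothetical names, the magic bytes are made up, and the sketch forwards to
any inner writer rather than to the C hashfile functions.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a wrapper around the hashfile code. A real
/// wrapper would call into C; this one forwards to any inner writer and
/// tracks how many bytes passed through it.
pub struct FileSink<W: Write> {
    inner: W,
    written: u64,
}

impl<W: Write> FileSink<W> {
    pub fn new(inner: W) -> Self {
        FileSink { inner, written: 0 }
    }

    pub fn written(&self) -> u64 {
        self.written
    }

    pub fn into_inner(self) -> W {
        self.inner
    }
}

impl<W: Write> Write for FileSink<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        self.written += n as u64;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Serialization code only requires `Write`, so it works the same with a
/// file-backed sink or an in-memory buffer. The magic value is made up.
pub fn write_header<W: Write>(out: &mut W, version: u32) -> io::Result<()> {
    out.write_all(b"MAP!")?;
    out.write_all(&version.to_be_bytes())
}
```

In tests, the sink can wrap a Cursor over a Vec, which keeps the
serialization code independent of how files are handled.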
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our current loose object map format has a few problems. First, it is
not efficient: the list of object IDs is not sorted and even if it were,
there would not be an efficient way to look up objects in both
algorithms.
Second, we need to store mappings for things which are not technically
loose objects but are not packed objects, either, and so cannot be
stored in a pack index. These kinds of things include shallows, their
parents, and their trees, as well as submodules. Yet we also need to
implement a sensible way to store the kind of object so that we can
prune unneeded entries. For instance, if the user has updated the
shallows, we can remove the old values.
For these reasons, introduce a new binary object map format. The
careful reader will notice that it resembles very closely the pack index
v3 format. Add an in-memory object map as well, and allow writing to a
batched map, which can then be written later as one of the binary object
maps. Include several tests for round tripping and data lookup across
algorithms.
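To make the idea of lookup in both algorithms concrete, here is a minimal
sketch of an in-memory map. It is not the format introduced here: MemoryMap
and its method names are hypothetical, and it keeps two sorted maps rather
than the on-disk tables.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory map between object IDs in the storage algorithm
/// and the compatibility algorithm. Both directions are kept sorted so
/// that lookups are logarithmic either way.
#[derive(Default)]
pub struct MemoryMap {
    to_compat: BTreeMap<Vec<u8>, Vec<u8>>,
    to_storage: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl MemoryMap {
    /// Record that `storage` and `compat` name the same object.
    pub fn insert(&mut self, storage: &[u8], compat: &[u8]) {
        self.to_compat.insert(storage.to_vec(), compat.to_vec());
        self.to_storage.insert(compat.to_vec(), storage.to_vec());
    }

    /// Look up the compatibility ID for a storage ID.
    pub fn compat_for(&self, storage: &[u8]) -> Option<&[u8]> {
        self.to_compat.get(storage).map(|v| v.as_slice())
    }

    /// Look up the storage ID for a compatibility ID.
    pub fn storage_for(&self, compat: &[u8]) -> Option<&[u8]> {
        self.to_storage.get(compat).map(|v| v.as_slice())
    }
}
```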
Note that the use of this code elsewhere in Git will involve some C code
and some C-compatible code in Rust that will be introduced in a future
commit. Thus, for example, we ignore the fact that if there is no
current batch and the caller asks for data to be written, this code does
nothing, mostly because this code also does not involve itself with
opening or manipulating files. The C code that we will add later will
implement this functionality at a higher level and take care of this,
since the code which is necessary for writing to the object store is
deeply involved with our C abstractions and it would require extensive
work (which would not be especially valuable at this point) to port
those to Rust.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future commit, we'll want to hash some data when dealing with an
object map. Let's make this easy by creating a structure to hash
objects and calling into the C functions as necessary to perform the
hashing. For now, we only implement safe hashing, but in the future we
could add unsafe hashing if we want. Implement Clone and Drop to
appropriately manage our memory. Additionally implement Write to make
it easy to use with other formats that implement this trait.
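The trait arrangement can be illustrated with a toy hasher. This sketch
makes several assumptions: the Hasher name and its methods are
hypothetical, and FNV-1a stands in for the real hash so that the example
needs no C code. A real version would hold a C context and free it in Drop.

```rust
use std::io::{self, Write};

/// Toy stand-in for a hashing context. FNV-1a is used only to keep the
/// sketch self-contained; it is not one of Git's hash algorithms.
#[derive(Clone)]
pub struct Hasher {
    state: u64,
}

impl Hasher {
    pub fn new() -> Self {
        // FNV-1a 64-bit offset basis.
        Hasher { state: 0xcbf29ce484222325 }
    }

    /// Feed more data into the hash.
    pub fn update(&mut self, data: &[u8]) {
        for &b in data {
            self.state ^= u64::from(b);
            // FNV-1a 64-bit prime.
            self.state = self.state.wrapping_mul(0x100000001b3);
        }
    }

    /// Consume the hasher and return the final value.
    pub fn finish(self) -> u64 {
        self.state
    }
}

impl Write for Hasher {
    /// Writing to the hasher hashes the bytes, so any code that emits data
    /// through `Write` can be hashed without changes.
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.update(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Cloning lets a caller hash a shared prefix once and then continue from
that state along more than one path.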
While we're at it, add some tests for the various hashing cases.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cargo uses the build.rs script to determine how to compile and link a
binary. The only binary we're generating, however, is for our tests,
but in a future commit, we're going to link against libgit.a for some
functionality and we'll need to make sure the test binaries are
complete.
Add a build.rs file for this case and specify the files we're going to
be linking against. Because we cannot specify different dependencies
when building our static library versus our tests, update the Makefile
to specify these dependencies for our static library to avoid race
conditions during build.
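For readers unfamiliar with build scripts, the directives involved look
roughly like this. It is a sketch, not the file added here: the
link_directives helper and the library directory are assumptions, and only
libgit.a is listed.

```rust
/// Build the directives a build script would print for Cargo. `lib_dir`
/// is an assumed location of libgit.a; the helper name is hypothetical.
pub fn link_directives(lib_dir: &str) -> Vec<String> {
    vec![
        // Tell the linker where to search for native libraries.
        format!("cargo:rustc-link-search=native={}", lib_dir),
        // Link statically against libgit.a.
        "cargo:rustc-link-lib=static=git".to_string(),
        // Re-run the build script when the library changes.
        format!("cargo:rerun-if-changed={}/libgit.a", lib_dir),
    ]
}

// In build.rs, `main` would then print each directive on its own line:
//
//     fn main() {
//         for line in link_directives("..") {
//             println!("{}", line);
//         }
//     }
```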
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When Cargo links binaries with MSVC, it uses the link.exe linker from
PATH to do so. However, when running under a shell from MSYS, such as
when building with the Git for Windows SDK, which we do in CI, the
/mingw64/bin and /usr/bin entries are first in PATH. That means that the
Unix link binary shows up first, which obviously does not work for
linking binaries in any useful way.
To solve this problem, adjust PATH to place those binaries at the end of
the list instead of the beginning. This allows access to the normal
Unix tools, but link.exe will be the compiler's linker. Make sure to
export PATH explicitly: while this should be the default, it's more
robust to not rely on the shell operating in a certain way.
The reason this has not shown up before is that we typically link our
binaries from the C compiler. However, now that we're about to
introduce a Rust build script (build.rs file), Rust will end up linking
that script to further drive Cargo, in which case we'll invoke the
linker from it. There are other solutions, such as using LLD, but this
one is simple and reliable and is most likely to work with existing
systems.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'd like to be able to hash our data in Rust using the same contexts as
in C. However, we need our helper functions to not be inline so they
can be linked into the binary appropriately. In addition, to avoid
managing memory manually and since we don't know the size of the hash
context structure, we want to have simple alloc and free functions we
can use to make sure a context can be easily dynamically created.
Expose the helper functions and create alloc, free, and init functions
we can call.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'll soon be writing out an object map using the hashfile code. Add an
fsync component to allow us to handle fsyncing it correctly.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to call this code from Rust and ensure that the types are the
same for compatibility, which is easiest to do if the type is a fixed
size. Since unsigned int is 32 bits on all the platforms we care about,
define it as a uint32_t instead.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now, users can internally access the contents of the ObjectID
struct, which can lead to data that is not valid, such as invalid
algorithms or non-zero-padded hash values. These can cause problems
down the line as we use them more.
Add a constructor for ObjectID that allows us to set these values and
also provide an accessor for the algorithm so that we can read it. In
addition, provide useful Display and Debug implementations that can
format our data in a useful way.
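A validating constructor along these lines might be sketched as follows.
The signatures, the use of a raw u32 for the algorithm, and the choice to
return an Option are assumptions for illustration and may differ from the
real code.

```rust
use std::fmt;

/// Size of the largest supported raw hash (SHA-256).
pub const MAX_RAWSZ: usize = 32;

/// Sketch of an object ID whose fields are private, so that invalid
/// algorithms or unpadded hashes cannot be constructed from outside.
pub struct ObjectID {
    hash: [u8; MAX_RAWSZ],
    algo: u32,
}

impl ObjectID {
    /// Build an ID from a raw hash, zero-padding it to the full size.
    /// Returns None for an unknown algorithm or a hash of the wrong length.
    pub fn new(algo: u32, hash: &[u8]) -> Option<Self> {
        let len = match algo {
            1 => 20, // SHA-1
            2 => 32, // SHA-256
            _ => return None,
        };
        if hash.len() != len {
            return None;
        }
        let mut padded = [0u8; MAX_RAWSZ];
        padded[..len].copy_from_slice(hash);
        Some(ObjectID { hash: padded, algo })
    }

    /// Accessor for the algorithm number.
    pub fn algo(&self) -> u32 {
        self.algo
    }

    /// Number of meaningful bytes in `hash` for this algorithm.
    fn len(&self) -> usize {
        if self.algo == 1 { 20 } else { 32 }
    }
}

impl fmt::Display for ObjectID {
    /// Format only the meaningful bytes as lowercase hex.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for b in &self.hash[..self.len()] {
            write!(f, "{:02x}", b)?;
        }
        Ok(())
    }
}
```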
Now that we have the ability to work with these various components in a
nice way, add some tests as well to make sure that ObjectID and
HashAlgorithm work together as expected.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In C, it's easy for us to look up a hash algorithm structure by its
offset by simply indexing the hash_algos array. However, in Rust, we
sometimes need a pointer to pass to a C function, but we have our own
hash algorithm abstraction.
To get one from the other, let's provide a simple function that looks up
the C structure from the offset and expose it in Rust.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This works very similarly to the existing one in C except that it
doesn't provide any functionality to hash an object. We don't need that
right now, but the use of those function pointers does make it
substantially more difficult to write a bit-for-bit identical structure
across the C/Rust interface, so omit them for now.
Instead of the more customary "&self", use "self", because the former is
the size of a pointer and the latter is the size of an integer on most
systems. Don't define an unknown value but use an Option for that
instead.
Update the object ID structure to allow slicing the data appropriately
for the algorithm.
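The two design points above (taking "self" by value and using an Option
instead of an unknown variant) can be sketched like this. The variant and
method names are assumptions for the example; the numeric values mirror the
C constants for SHA-1 and SHA-256.

```rust
/// Sketch of a hash algorithm abstraction with no "unknown" variant.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u32)]
pub enum HashAlgorithm {
    SHA1 = 1,
    SHA256 = 2,
}

impl HashAlgorithm {
    /// Convert a raw algorithm number; unknown values become None rather
    /// than a dedicated enum variant.
    pub fn from_u32(algo: u32) -> Option<HashAlgorithm> {
        match algo {
            1 => Some(HashAlgorithm::SHA1),
            2 => Some(HashAlgorithm::SHA256),
            _ => None,
        }
    }

    /// Length of the raw hash in bytes. Takes `self` by value: the enum is
    /// Copy and no larger than an integer, so a reference gains nothing.
    pub fn raw_len(self) -> usize {
        match self {
            HashAlgorithm::SHA1 => 20,
            HashAlgorithm::SHA256 => 32,
        }
    }

    /// Length of the hex representation.
    pub fn hex_len(self) -> usize {
        self.raw_len() * 2
    }
}
```

With raw_len available, an object ID can slice its padded buffer to just
the bytes that belong to its algorithm.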
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'd like to be able to write some Rust code that can work with object
IDs. Add a structure here that's identical to struct object_id in C,
for easy use in sharing across the FFI boundary. We will use this
structure in several places in hot paths, such as index-pack or
pack-objects when converting between algorithms, so prioritize efficient
interchange over a more idiomatic Rust approach.
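A layout-compatible definition could look roughly like the sketch below.
It assumes the C structure is a 32-byte hash buffer followed by a 32-bit
algorithm field; the layout helper is hypothetical and exists only to make
the size and alignment easy to check.

```rust
use std::mem::{align_of, size_of};

/// Size of the largest supported raw hash (SHA-256).
pub const GIT_MAX_RAWSZ: usize = 32;

/// Sketch of a structure meant to match the C definition field for field.
/// `repr(C)` fixes the field order and padding to what a C compiler would
/// use, and `u32` has the same size on every platform.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ObjectID {
    pub hash: [u8; GIT_MAX_RAWSZ],
    pub algo: u32,
}

/// Report (size, alignment) so that the layout can be compared with the
/// values seen on the C side.
pub fn layout() -> (usize, usize) {
    (size_of::<ObjectID>(), align_of::<ObjectID>())
}
```

Since the structure is plain data with a fixed layout, it can be passed
across the FFI boundary by pointer without any conversion.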
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently use an int for this value, but we'll define this structure
from Rust in a future commit and we want to ensure that our data types
are exactly identical. To make that possible, use a uint32_t for the
hash algorithm.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we set up a repository that doesn't have a compatibility hash
algorithm, we set the destination algorithm object to NULL. In such a
case, we want to silently do nothing instead of crashing, so simply
treat the operation as a no-op and copy the object ID.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'll be implementing some of our interoperability code, like the loose
object map, in Rust. While the code currently compiles with the old
loose object map format, which is written entirely in C, we'll soon
replace that with the Rust-based implementation.
Require the use of Rust for compatibility mode and die if it is not
supported. Because the repo argument is not used when Rust is missing,
cast it to void to silence the compiler warning, which we do not care
about.
Add a prerequisite in our tests, RUST, that checks if Rust functionality
is available and use it in the tests that handle interoperability.
This is technically a regression in functionality compared to our
existing state, but pack index v3 is not yet implemented and thus the
functionality is mostly quite broken, which is why we've recently marked
this functionality as experimental. We don't believe anyone is getting
much use out of the interoperability code in its current state, so no
actual users should be negatively impacted by this change.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "-z" and "--max-depth" documentation (and implementation of
"-z") in the "git last-modified" command have been updated.
* tc/last-modified-options-cleanup:
last-modified: change default max-depth to 0
last-modified: document option '--max-depth'
last-modified: document option '-z'
last-modified: clarify in the docs the command takes a pathspec
The computation of column width made by "git diff --stat" was
confused when pathnames contain non-ASCII characters.
* lp/diff-stat-utf8-display-width-fix:
t4073: add test for diffstat paths length when containing UTF-8 chars
diff: improve scaling of filenames in diffstat to handle UTF-8 chars
HTTP transport failed to authenticate in some code paths, which has
been corrected.
* ap/http-probe-rpc-use-auth:
remote-curl: use auth for probe_rpc() requests too
Avoid local submodule repository directory paths overlapping with
each other by encoding submodule names before using them as path
components.
* ar/submodule-gitdir-tweak:
submodule: detect conflicts with existing gitdir configs
submodule: hash the submodule name for the gitdir path
submodule: fix case-folding gitdir filesystem collisions
submodule--helper: fix filesystem collisions by encoding gitdir paths
builtin/credential-store: move is_rfc3986_unreserved to url.[ch]
submodule--helper: add gitdir migration command
submodule: allow runtime enabling extensions.submodulePathConfig
submodule: introduce extensions.submodulePathConfig
builtin/submodule--helper: add gitdir command
submodule: always validate gitdirs inside submodule_name_to_gitdir
submodule--helper: use submodule_name_to_gitdir in add_submodule
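As an illustration of encoding a name into a safe path component, the
sketch below percent-encodes every byte outside the RFC 3986 unreserved
set. It is an assumption-laden example: encode_component is a hypothetical
name, the real encoding may differ, and this alone does not address
case-folding collisions, which the topic handles separately.

```rust
/// True for the RFC 3986 "unreserved" characters:
/// ALPHA / DIGIT / "-" / "." / "_" / "~".
fn is_rfc3986_unreserved(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~')
}

/// Percent-encode every byte that is not unreserved, so the result holds
/// no path separators or other special characters. Hypothetical helper.
pub fn encode_component(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    for &b in name.as_bytes() {
        if is_rfc3986_unreserved(b) {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02x}", b));
        }
    }
    out
}
```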
"git add -p" and friends note what the current status of the hunk
being shown is.
* aa/add-p-previous-decisions:
add -p: show user's hunk decision when selecting hunks
My canonical and old emails were reversed, somehow. Also add
an entry for a new email that may sneak in.
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'jx/zh_CN' of github.com:jiangxin/git:
l10n: zh_CN: standardize glossary terms
l10n: zh_CN: updated translation for 2.53
l10n: zh_CN: fix inconsistent use of standard vs. wide colons
Add preferred Chinese terminology notes and align existing translations
to the updated glossary. AI-assisted review was used to check and
improve legacy translations.
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Replace mixed usage of standard (ASCII) colons ':' with full-width
(wide) colons '：' in Chinese translations to ensure typographic
consistency, as reported by CAESIUS-TIM [1].
Full-width punctuation is preferred in Chinese localization for better
readability and adherence to typesetting conventions.
[1]: https://github.com/git-l10n/git-po/issues/884
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>