Age | Commit message (Collapse) | Author | Files | Lines |
|
A bitmap index is a `.bitmap` file that can be found inside
`$GIT_DIR/objects/pack/`, next to its corresponding packfile, and
contains precalculated reachability information for selected commits.
The full specification of the format for these bitmap indexes can be found
in `Documentation/technical/bitmap-format.txt`.
For a given commit SHA1, if it happens to be available in the bitmap
index, its bitmap will represent every single object that is reachable
from the commit itself. The nth bit in the bitmap is the nth object in
the packfile; if it's set to 1, the object is reachable.
By using the bitmaps available in the index, this commit implements
several new functions:
- `prepare_bitmap_git`
- `prepare_bitmap_walk`
- `traverse_bitmap_commit_list`
- `reuse_partial_packfile_from_bitmap`
The `prepare_bitmap_walk` function tries to build a bitmap of all the
objects that can be reached from the commit roots of a given `rev_info`
struct by using the following algorithm:
- If all the interesting commits for a revision walk are available in
the index, the resulting reachability bitmap is the bitwise OR of all
the individual bitmaps.
- When the full set of WANTs is not available in the index, we perform a
partial revision walk using the commits that don't have bitmaps as
roots, and limiting the revision walk as soon as we reach a commit that
has a corresponding bitmap. The earlier OR'ed bitmap with all the
indexed commits can now be completed as this walk progresses, so the end
result is the full reachability list.
- For revision walks with a HAVEs set (a set of commits that are deemed
uninteresting), first we perform the same method as for the WANTs, but
using our HAVEs as roots, in order to obtain a full reachability bitmap
of all the uninteresting commits. This bitmap then can be used to:
a) limit the subsequent walk when building the WANTs bitmap
b) finding the final set of interesting commits by performing an
AND-NOT of the WANTs and the HAVEs.
If `prepare_bitmap_walk` runs successfully, the resulting bitmap is
stored and the equivalent of a `traverse_commit_list` call can be
performed by using `traverse_bitmap_commit_list`; the bitmap version
of this call yields the objects straight from the packfile index
(without having to look them up or parse them) and hence is several
orders of magnitude faster.
As an extra optimization, when `prepare_bitmap_walk` succeeds, the
`reuse_partial_packfile_from_bitmap` call can be attempted: it will find
the amount of objects at the beginning of the on-disk packfile that can
be reused as-is, and return an offset into the packfile. The source
packfile can then be loaded and the bytes up to `offset` can be written
directly to the result without having to consider the entires inside the
packfile individually.
If the `prepare_bitmap_walk` call fails (e.g. because no bitmap files
are available), the `rev_info` struct is left untouched, and can be used
to perform a manual rev-walk using `traverse_commit_list`.
Hence, this new set of functions are a generic API that allows to
perform the equivalent of
git rev-list --objects [roots...] [^uninteresting...]
for any set of commits, even if they don't have specific bitmaps
generated for them.
In further patches, we'll use this bitmap traversal optimization to
speed up the `pack-objects` and `rev-list` commands.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is the technical documentation for the JGit-compatible Bitmap v1
on-disk format.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
EWAH is a word-aligned compressed variant of a bitset (i.e. a data
structure that acts as a 0-indexed boolean array for many entries).
It uses a 64-bit run-length encoding (RLE) compression scheme,
trading some compression for better processing speed.
The goal of this word-aligned implementation is not to achieve
the best compression, but rather to improve query processing time.
As it stands right now, this EWAH implementation will always be more
efficient storage-wise than its uncompressed alternative.
EWAH arrays will be used as the on-disk format to store reachability
bitmaps for all objects in a repository while keeping reasonable sizes,
in the same way that JGit does.
This EWAH implementation is a mostly straightforward port of the
original `javaewah` library that JGit currently uses. The library is
self-contained and has been embedded whole (4 files) inside the `ewah`
folder to ease redistribution.
The library is re-licensed under the GPLv2 with the permission of Daniel
Lemire, the original author. The source code for the C version can
be found on GitHub:
https://github.com/vmg/libewok
The original Java implementation can also be found on GitHub:
https://github.com/lemire/javaewah
[jc: stripped debug-only code per Peff's $gmane/239768]
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The POSIX standard doesn't currently define a `ntohll`/`htonll`
function pair to perform network-to-host and host-to-network
swaps of 64-bit data. These 64-bit swaps are necessary for the on-disk
storage of EWAH bitmaps if they are not in native byte order.
Many thanks to Ramsay Jones <ramsay@ramsay1.demon.co.uk> and
Torsten Bögershausen <tboegi@web.de> for cygwin/mingw/msvc
portability fixes.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `git_open_noatime` helper can be of general interest for other
consumers of git's different on-disk formats.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This commit enables users of `struct rev_info` to peform custom limiting
during a revision walk (i.e. `get_revision`).
If the field `include_check` has been set to a callback, this callback
will be issued once for each commit before it is added to the "pending"
list of the revwalk. If the include check returns 0, the commit will be
marked as added but won't be pushed to the pending list, effectively
limiting the walk.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As the pack-objects system grows beyond the single
pack-objects.c file, more parts (like the soon-to-exist
bitmap code) will need to compute hashes for matching
deltas. Factor out name_hash to make it available to other
files.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The hash table that stores the packing list for a given `pack-objects`
run was tightly coupled to the pack-objects code.
In this commit, we refactor the hash table and the underlying storage
array into a `packing_data` struct. The functionality for accessing and
adding entries to the packing list is hence accessible from other parts
of Git besides the `pack-objects` builtin.
This refactoring is a requirement for further patches in this series
that will require accessing the commit packing list from outside of
`pack-objects`.
The hash table implementation has been minimally altered: we now
use table sizes which are always a power of two, to ensure a uniform
index distribution in the array.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Allow users to efficiently lookup consecutive entries that are expected
to be found on the same revindex by exporting `find_revindex_position`:
this function takes a pointer to revindex itself, instead of looking up
the proper revindex for a given packfile on each call.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We are passed a "void *" and write it out without ever
touching it; let's indicate that by using "const".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git ls-files -k" needs to crawl only the part of the working tree
that may overlap the paths in the index to find killed files, but
shared code with the logic to find all the untracked files, which
made it unnecessarily inefficient.
* jc/ls-files-killed-optim:
dir.c::test_one_path(): work around directory_exists_in_index_icase() breakage
t3010: update to demonstrate "ls-files -k" optimization pitfalls
ls-files -k: a directory only can be killed if the index has a non-directory
dir.c: use the cache_* macro to access the current index
|
|
"git branch --track" had a minor regression in v1.8.3.2 and later
that made it impossible to base your local work on anything but a
local branch of the upstream repository you are tracking from.
* jh/checkout-auto-tracking:
t3200: fix failure on case-insensitive filesystems
branch.c: Relax unnecessary requirement on upstream's remote ref name
t3200: Add test demonstrating minor regression in 41c21f2
Refer to branch.<name>.remote/merge when documenting --track
t3200: Minor fix when preparing for tracking failure
t2024: Fix &&-chaining and a couple of typos
|
|
When there is no sufficient overlap between old and new history
during a "git fetch" into a shallow repository, objects that the
sending side knows the receiving end has were unnecessarily sent.
* nd/fetch-into-shallow:
Add testcase for needless objects during a shallow fetch
list-objects: mark more commits as edges in mark_edges_uninteresting
list-objects: reduce one argument in mark_edges_uninteresting
upload-pack: delegate rev walking in shallow fetch to pack-objects
shallow: add setup_temporary_shallow()
shallow: only add shallow graft points to new shallow file
move setup_alternate_shallow and write_shallow_commits to shallow.c
|
|
Cleanups and tweaks for credential handling to work with ancient versions
of the gnome-keyring library that are still in use.
* bc/gnome-keyring:
contrib/git-credential-gnome-keyring.c: support really ancient gnome-keyring
contrib/git-credential-gnome-keyring.c: support ancient gnome-keyring
contrib/git-credential-gnome-keyring.c: report failure to store password
contrib/git-credential-gnome-keyring.c: use glib messaging functions
contrib/git-credential-gnome-keyring.c: use glib memory allocation functions
contrib/git-credential-gnome-keyring.c: use secure memory for reading passwords
contrib/git-credential-gnome-keyring.c: use secure memory functions for passwds
contrib/git-credential-gnome-keyring.c: use gnome helpers in keyring_object()
contrib/git-credential-gnome-keyring.c: set Gnome application name
contrib/git-credential-gnome-keyring.c: ensure buffer is non-empty before accessing
contrib/git-credential-gnome-keyring.c: strlen() returns size_t, not ssize_t
contrib/git-credential-gnome-keyring.c: exit non-zero when called incorrectly
contrib/git-credential-gnome-keyring.c: add static where applicable
contrib/git-credential-gnome-keyring.c: *style* use "if ()" not "if()" etc.
contrib/git-credential-gnome-keyring.c: remove unused die() function
contrib/git-credential-gnome-keyring.c: remove unnecessary pre-declarations
|
|
Explain how '.' can be used to refer to the "current repository"
in the documentation.
* po/dot-url:
doc/cli: make "dot repository" an independent bullet point
config doc: update dot-repository notes
doc: command line interface (cli) dot-repository dwimmery
|
|
An enhancement to the GIT_PS1_SHOWUPSTREAM facility.
* jc/prompt-upstream:
git-prompt.sh: optionally show upstream branch name
|
|
"git cherry-pick" without further options would segfault.
Could use a follow-up to handle '-' after argv[1] better.
* hu/cherry-pick-previous-branch:
cherry-pick: handle "-" after parsing options
|
|
Make "git grep" and "git show" pay attention to --textconv when
dealing with blob objects.
* mg/more-textconv:
grep: honor --textconv for the case rev:path
grep: allow to use textconv filters
t7008: demonstrate behavior of grep with textconv
cat-file: do not die on --textconv without textconv filters
show: honor --textconv for blobs
diff_opt: track whether flags have been set explicitly
t4030: demonstrate behavior of show with textconv
|
|
* jc/pack-objects:
pack-objects: shrink struct object_entry
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint:
git-merge: document the -S option
|
|
Document rules to use GIT_REFLOG_ACTION variable in the scripted
Porcelain. git-rebase--interactive locally violates them, but it
is a leaf user that does not call out to or dot-source other
scripts, so it does not urgently need to be fixed.
* jc/reflog-doc:
setup_reflog_action: document the rules for using GIT_REFLOG_ACTION
|
|
Rewrite "git repack" in C.
* sb/repack-in-c:
repack: improve warnings about failure of renaming and removing files
repack: retain the return value of pack-objects
repack: rewrite the shell script in C
|
|
Some progress and diagnostic messages from "git clone" were
incorrectly sent to the standard output stream, not to the standard
error stream.
* jk/clone-progress-to-stderr:
clone: always set transport options
clone: treat "checking connectivity" like other progress
clone: send diagnostic messages to stderr
|
|
* git://github.com/git-l10n/git-po:
l10n: fr.po: 2135/2135 messages translated
|
|
The option to gpg sign a merge commit is available but was not
documented. Use wording from the git-commit(1) manpage.
Signed-off-by: Nicolas Vigier <boklm@mars-attacks.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Sebastien Helleu <flashcode@flashtux.org>
Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
|
|
|
|
"format-patch --from=<whom>" forgot to omit unnecessary in-body
from line, i.e. when <whom> is the same as the real author.
* jk/format-patch-from:
format-patch: print in-body "From" only when needed
|
|
Clean up the internal of the name-hash mechanism used to work
around case insensitivity on some filesystems to cleanly fix a
long-standing API glitch where the caller of cache_name_exists()
that ask about a directory with a counted string was required to
have '/' at one location past the end of the string.
* es/name-hash-no-trailing-slash-in-dirs:
dir: revert work-around for retired dangerous behavior
name-hash: stop storing trailing '/' on paths in index_state.dir_hash
employ new explicit "exists in index?" API
name-hash: refactor polymorphic index_name_exists()
|
|
Code refactoring.
* jk/trailing-slash-in-pathspec:
reset: handle submodule with trailing slash
rm: re-use parse_pathspec's trailing-slash removal
|
|
"git filter-branch" in a repository with many refs blew limit of
command line length.
* lc/filter-branch-too-many-refs:
Allow git-filter-branch to process large repositories with lots of branches.
|
|
"git checkout [--detach] <commit>" was listed poorly in the
synopsis section of its documentation.
* jc/checkout-detach-doc:
checkout: update synopsys and documentation on detaching HEAD
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jk/upload-pack-keepalive:
upload-pack: bump keepalive default to 5 seconds
upload-pack: send keepalive packets during pack computation
|
|
* bc/http-backend-allow-405:
http-backend: provide Allow header for 405
|
|
* jc/cvsserver-perm-bit-fix:
cvsserver: pick up the right mode bits
|
|
* js/add-i-mingw:
add--interactive: fix external command invocation on Windows
|
|
* nd/git-dir-pointing-at-gitfile:
Make setup_git_env() resolve .git file when $GIT_DIR is not specified
|
|
* jk/has-sha1-file-retry-packed:
has_sha1_file: re-check pack directory before giving up
|
|
* ap/commit-author-mailmap:
commit: search author pattern against mailmap
|
|
* es/rebase-i-no-abbrev:
rebase -i: fix short SHA-1 collision
t3404: rebase -i: demonstrate short SHA-1 collision
t3404: make tests more self-contained
Conflicts:
t/t3404-rebase-interactive.sh
|
|
* rt/rebase-p-no-merge-summary:
rebase --preserve-merges: ignore "merge.log" config
|
|
* es/rebase-i-respect-core-commentchar:
rebase -i: fix cases ignoring core.commentchar
|
|
- Don't start tests with 'test $? = 0' to catch preparation done
outside the test_expect_success block.
- Move writing the bogus patch and the expected output into the
appropriate test_expect_success blocks.
- Use the test_must_fail helper instead of manually checking for
non-zero exit code.
- Use the debug-friendly test_path_is_file helper instead of 'test -f'.
- No space after '>'.
Signed-off-by: SZEDER Gábor <szeder@ira.uka.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
List notable topics that graduated during Jonathan's interim
maintainership.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|