diff options
author | Junio C Hamano <gitster@pobox.com> | 2014-02-27 14:01:48 -0800 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2014-02-27 14:01:48 -0800 |
commit | 0f9e62e0847c075678a7a5a748567d1e881d16f8 (patch) | |
tree | 8ef8989069ae40eef891b0964fc2cb8036a74e48 /Documentation | |
parent | Merge branch 'dk/blame-janitorial' (diff) | |
parent | ewah: unconditionally ntohll ewah data (diff) | |
download | tgif-0f9e62e0847c075678a7a5a748567d1e881d16f8.tar.xz |
Merge branch 'jk/pack-bitmap'
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.
* jk/pack-bitmap: (26 commits)
ewah: unconditionally ntohll ewah data
ewah: support platforms that require aligned reads
read-cache: use get_be32 instead of hand-rolled ntoh_l
block-sha1: factor out get_be and put_be wrappers
do not discard revindex when re-preparing packfiles
pack-bitmap: implement optional name_hash cache
t/perf: add tests for pack bitmaps
t: add basic bitmap functionality tests
count-objects: recognize .bitmap in garbage-checking
repack: consider bitmaps when performing repacks
repack: handle optional files created by pack-objects
repack: turn exts array into array-of-struct
repack: stop using magic number for ARRAY_SIZE(exts)
pack-objects: implement bitmap writing
rev-list: add bitmap mode to speed up object lists
pack-objects: use bitmaps when packing objects
pack-objects: split add_object_entry
pack-bitmap: add support for bitmap indexes
documentation: add documentation for the bitmap format
ewah: compressed bitmap implementation
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/config.txt | 25 | ||||
-rw-r--r-- | Documentation/git-repack.txt | 9 | ||||
-rw-r--r-- | Documentation/git-rev-list.txt | 1 | ||||
-rw-r--r-- | Documentation/rev-list-options.txt | 8 | ||||
-rw-r--r-- | Documentation/technical/bitmap-format.txt | 164 |
5 files changed, 206 insertions, 1 deletions
diff --git a/Documentation/config.txt b/Documentation/config.txt index 7eec746950..040197b10e 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1870,6 +1870,31 @@ pack.packSizeLimit:: Common unit suffixes of 'k', 'm', or 'g' are supported. +pack.useBitmaps:: + When true, git will use pack bitmaps (if available) when packing + to stdout (e.g., during the server side of a fetch). Defaults to + true. You should not generally need to turn this off unless + you are debugging pack bitmaps. + +pack.writebitmaps:: + When true, git will write a bitmap index when packing all + objects to disk (e.g., when `git repack -a` is run). This + index can speed up the "counting objects" phase of subsequent + packs created for clones and fetches, at the cost of some disk + space and extra time spent on the initial repack. Defaults to + false. + +pack.writeBitmapHashCache:: + When true, git will include a "hash cache" section in the bitmap + index (if one is written). This cache can be used to feed git's + delta heuristics, potentially leading to better deltas between + bitmapped and non-bitmapped objects (e.g., when serving a fetch + between an older, bitmapped pack and objects that have been + pushed since the last gc). The downside is that it consumes 4 + bytes per object of disk space, and that JGit's bitmap + implementation does not understand it, causing it to complain if + Git and JGit are used on the same repository. Defaults to false. + pager.<cmd>:: If the value is boolean, turns on or off pagination of the output of a particular Git subcommand when writing to a tty. diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index 509cf73e50..002cfd5eb9 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -9,7 +9,7 @@ git-repack - Pack unpacked objects in a repository SYNOPSIS -------- [verse] -'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [--window=<n>] [--depth=<n>] +'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [--window=<n>] [--depth=<n>] DESCRIPTION ----------- @@ -110,6 +110,13 @@ other objects in that pack they already have locally. The default is unlimited, unless the config variable `pack.packSizeLimit` is set. +-b:: +--write-bitmap-index:: + Write a reachability bitmap index as part of the repack. This + only makes sense when used with `-a` or `-A`, as the bitmaps + must be able to refer to all reachable objects. This option + overrides the setting of `pack.writebitmaps`. + Configuration ------------- diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt index 045b37b82e..7a1585def0 100644 --- a/Documentation/git-rev-list.txt +++ b/Documentation/git-rev-list.txt @@ -55,6 +55,7 @@ SYNOPSIS [ \--reverse ] [ \--walk-reflogs ] [ \--no-walk ] [ \--do-walk ] + [ \--use-bitmap-index ] <commit>... [ \-- <paths>... ] DESCRIPTION diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 03533af715..9a3da3646e 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -257,6 +257,14 @@ See also linkgit:git-reflog[1]. Output excluded boundary commits. Boundary commits are prefixed with `-`. +ifdef::git-rev-list[] +--use-bitmap-index:: + + Try to speed up the traversal using the pack bitmap index (if + one is available). Note that when traversing with `--objects`, + trees and blobs will not have their associated path printed. +endif::git-rev-list[] + -- History Simplification diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt new file mode 100644 index 0000000000..f8c18a0f7a --- /dev/null +++ b/Documentation/technical/bitmap-format.txt @@ -0,0 +1,164 @@ +GIT bitmap v1 format +==================== + + - A header appears at the beginning: + + 4-byte signature: {'B', 'I', 'T', 'M'} + + 2-byte version number (network byte order) + The current implementation only supports version 1 + of the bitmap index (the same one as JGit). + + 2-byte flags (network byte order) + + The following flags are supported: + + - BITMAP_OPT_FULL_DAG (0x1) REQUIRED + This flag must always be present. It implies that the bitmap + index has been generated for a packfile with full closure + (i.e. where every single object in the packfile can find + its parent links inside the same packfile). This is a + requirement for the bitmap index format, also present in JGit, + that greatly reduces the complexity of the implementation. + + - BITMAP_OPT_HASH_CACHE (0x4) + If present, the end of the bitmap file contains + `N` 32-bit name-hash values, one per object in the + pack. The format and meaning of the name-hash is + described below. + + 4-byte entry count (network byte order) + + The total count of entries (bitmapped commits) in this bitmap index. + + 20-byte checksum + + The SHA1 checksum of the pack this bitmap index belongs to. + + - 4 EWAH bitmaps that act as type indexes + + Type indexes are serialized after the hash cache in the shape + of four EWAH bitmaps stored consecutively (see Appendix A for + the serialization format of an EWAH bitmap). + + There is a bitmap for each Git object type, stored in the following + order: + + - Commits + - Trees + - Blobs + - Tags + + In each bitmap, the `n`th bit is set to true if the `n`th object + in the packfile is of that type. + + The obvious consequence is that the OR of all 4 bitmaps will result + in a full set (all bits set), and the AND of all 4 bitmaps will + result in an empty bitmap (no bits set). + + - N entries with compressed bitmaps, one for each indexed commit + + Where `N` is the total amount of entries in this bitmap index. + Each entry contains the following: + + - 4-byte object position (network byte order) + The position **in the index for the packfile** where the + bitmap for this commit is found. + + - 1-byte XOR-offset + The xor offset used to compress this bitmap. For an entry + in position `x`, a XOR offset of `y` means that the actual + bitmap representing this commit is composed by XORing the + bitmap for this entry with the bitmap in entry `x-y` (i.e. + the bitmap `y` entries before this one). + + Note that this compression can be recursive. In order to + XOR this entry with a previous one, the previous entry needs + to be decompressed first, and so on. + + The hard-limit for this offset is 160 (an entry can only be + xor'ed against one of the 160 entries preceding it). This + number is always positive, and hence entries are always xor'ed + with **previous** bitmaps, not bitmaps that will come afterwards + in the index. + + - 1-byte flags for this bitmap + At the moment the only available flag is `0x1`, which hints + that this bitmap can be re-used when rebuilding bitmap indexes + for the repository. + + - The compressed bitmap itself, see Appendix A. + +== Appendix A: Serialization format for an EWAH bitmap + +Ewah bitmaps are serialized in the same protocol as the JAVAEWAH +library, making them backwards compatible with the JGit +implementation: + + - 4-byte number of bits of the resulting UNCOMPRESSED bitmap + + - 4-byte number of words of the COMPRESSED bitmap, when stored + + - N x 8-byte words, as specified by the previous field + + This is the actual content of the compressed bitmap. + + - 4-byte position of the current RLW for the compressed + bitmap + +All words are stored in network byte order for their corresponding +sizes. + +The compressed bitmap is stored in a form of run-length encoding, as +follows. It consists of a concatenation of an arbitrary number of +chunks. Each chunk consists of one or more 64-bit words + + H L_1 L_2 L_3 .... L_M + +H is called RLW (run length word). It consists of (from lower to higher +order bits): + + - 1 bit: the repeated bit B + + - 32 bits: repetition count K (unsigned) + + - 31 bits: literal word count M (unsigned) + +The bitstream represented by the above chunk is then: + + - K repetitions of B + + - The bits stored in `L_1` through `L_M`. Within a word, bits at + lower order come earlier in the stream than those at higher + order. + +The next word after `L_M` (if any) must again be a RLW, for the next +chunk. For efficient appending to the bitstream, the EWAH stores a +pointer to the last RLW in the stream. + + +== Appendix B: Optional Bitmap Sections + +These sections may or may not be present in the `.bitmap` file; their +presence is indicated by the header flags section described above. + +Name-hash cache +--------------- + +If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains +a cache of 32-bit values, one per object in the pack. The value at +position `i` is the hash of the pathname at which the `i`th object +(counting in index order) in the pack can be found. This can be fed +into the delta heuristics to compare objects with similar pathnames. + +The hash algorithm used is: + + hash = 0; + while ((c = *name++)) + if (!isspace(c)) + hash = (hash >> 2) + (c << 24); + +Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag. +If implementations want to choose a different hashing scheme, they are +free to do so, but MUST allocate a new header flag (because comparing +hashes made under two different schemes would be pointless). |