summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2014-11-30pack-bitmap: do not use gcc packed attributeLibravatar Karsten Blees4-18/+29
The "__attribute__" flag may be a noop on some compilers. That's OK as long as the code is correct without the attribute, but in this case it is not. We would typically end up with a struct that is 2 bytes too long due to struct padding, breaking both reading and writing of bitmaps. Instead of marshalling the data in a struct, let's just provide helpers for reading and writing the appropriate types. Besides being correct on all platforms, the result is more efficient and simpler to read. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-22ewah_bitmap.c: do not assume size_t and eword_t are the same sizeLibravatar Kyle J. McKay1-1/+1
When buffer_grow changes the size of the buffer using realloc, it first computes and saves the rlw pointer's offset into the buffer using (uint8_t *) math before the realloc but then restores it using (eword_t *) math. In order to do this it's necessary to convert the (uint8_t *) offset into an (eword_t *) offset. It was doing this by dividing by the sizeof(size_t). Unfortunately sizeof(size_t) is not same as sizeof(eword_t) on all platforms. This causes illegal memory accesses and other bad things to happen when attempting to use bitmaps on those platforms. Fix this by dividing by the sizeof(eword_t) instead which will always be correct for all platforms. Signed-off-by: Kyle J. McKay <mackyle@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-04pack-objects: do not reuse packfiles without --delta-base-offsetLibravatar Jeff King1-1/+12
When we are sending a packfile to a remote, we currently try to reuse a whole chunk of packfile without bothering to look at the individual objects. This can make things like initial clones much lighter on the server, as we can just dump the packfile bytes. However, it's possible that the other side cannot read our packfile verbatim. For example, we may have objects stored as OFS_DELTA, but the client is an antique version of git that only understands REF_DELTA. We negotiate this capability over the fetch protocol. A normal pack-objects run will convert OFS_DELTA into REF_DELTA on the fly, but the "reuse pack" code path never even looks at the objects. This patch disables packfile reuse if the other side is missing any capabilities that we might have used in the on-disk pack. Right now the only one is OFS_DELTA, but we may need to expand in the future (e.g., if packv4 introduces new object types). We could be more thorough and only disable reuse in this case when we actually have an OFS_DELTA to send, but: 1. We almost always will have one, since we prefer OFS_DELTA to REF_DELTA when possible. So this case would almost never come up. 2. Looking through the objects defeats the purpose of the optimization, which is to do as little work as possible to get the bytes to the remote. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-04add `ignore_missing_links` mode to revwalkLibravatar Vicent Marti5-5/+44
When pack-objects is computing the reachability bitmap to serve a fetch request, it can erroneously die() if some of the UNINTERESTING objects are not present. Upload-pack throws away HAVE lines from the client for objects we do not have, but we may have a tip object without all of its ancestors (e.g., if the tip is no longer reachable and was new enough to survive a `git prune`, but some of its reachable objects did get pruned). In the non-bitmap case, we do a revision walk with the HAVE objects marked as UNINTERESTING. The revision walker explicitly ignores errors in accessing UNINTERESTING commits to handle this case (and we do not bother looking at UNINTERESTING trees or blobs at all). When we have bitmaps, however, the process is quite different. The bitmap index for a pack-objects run is calculated in two separate steps: First, we perform an extensive walk from all the HAVEs to find the full set of objects reachable from them. This walk is usually optimized away because we are expected to hit an object with a bitmap during the traversal, which allows us to terminate early. Secondly, we perform an extensive walk from all the WANTs, which usually also terminates early because we hit a commit with an existing bitmap. Once we have the resulting bitmaps from the two walks, we AND-NOT them together to obtain the resulting set of objects we need to pack. When we are walking the HAVE objects, the revision walker does not know that we are walking it only to mark the results as uninteresting. We strip out the UNINTERESTING flag, because those objects _are_ interesting to us during the first walk. We want to keep going to get a complete set of reachable objects if we can. We need some way to tell the revision walker that it's OK to silently truncate the HAVE walk, just like it does for the UNINTERESTING case. This patch introduces a new `ignore_missing_links` flag to the `rev_info` struct, which we set only for the HAVE walk. It also adds tests to cover UNINTERESTING objects missing from several positions: a missing blob, a missing tree, and a missing parent commit. The missing blob already worked (as we do not care about its contents at all), but the other two cases caused us to die(). Note that there are a few cases we do not need to test: 1. We do not need to test a missing tree, with the blob still present. Without the tree that refers to it, we would not know that the blob is relevant to our walk. 2. We do not need to test a tip commit that is missing. Upload-pack omits these for us (and in fact, we complain even in the non-bitmap case if it fails to do so). Reported-by: Siddharth Agarwal <sid0@fb.com> Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-17pack-objects: turn off bitmaps when skipping objectsLibravatar Jeff King2-2/+15
The pack bitmap format requires that we have a single bit for each object in the pack, and that each object's bitmap represents its complete set of reachable objects. Therefore we have no way to represent the bitmap of an object which references objects outside the pack. We notice this problem while generating the bitmaps, as we try to find the offset of a particular object and realize that we do not have it. In this case we die, and neither the bitmap nor the pack is generated. This is correct, but perhaps a little unfriendly. If you have bitmaps turned on in the config, many repacks will fail which would otherwise succeed. E.g., incremental repacks, repacks with "-l" when you have alternates, ".keep" files. Instead, this patch notices early that we are omitting some objects from the pack and turns off bitmaps (with a warning). Note that this is not strictly correct, as it's possible that the object being omitted is not reachable from any other object in the pack. In practice, this is almost never the case, and there are two advantages to doing it this way: 1. The code is much simpler, as we do not have to cleanly abort the bitmap-generation process midway through. 2. We do not waste time partially generating bitmaps only to find out that some object deep in the history is not being packed. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-12ewah: unconditionally ntohll ewah dataLibravatar Jeff King1-7/+3
Commit a201c20 tried to optimize out a loop like: for (i = 0; i < len; i++) data[i] = ntohll(data[i]); in the big-endian case, because we know that ntohll is a noop, and we do not need to pay the cost of the loop at all. However, it mistakenly assumed that __BYTE_ORDER was always defined, whereas it may not be on systems which do not define it by default, and where we did not need to define it to set up the ntohll macro. This includes OS X and Windows. We could muck with the ordering in compat/bswap.h to make sure it is defined unconditionally, but it is simpler to still to just execute the loop unconditionally. That avoids the application code knowing anything about these magic macros, and lets it depend only on having ntohll defined. And since the resulting loop looks like (on a big-endian system): for (i = 0; i < len; i++) data[i] = data[i]; any decent compiler can probably optimize it out. Original report and analysis by Brian Gernhardt. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-23ewah: support platforms that require aligned readsLibravatar Vicent Marti1-9/+24
The caller may hand us an unaligned buffer (e.g., because it is an mmap of a file with many ewah bitmaps). On some platforms (like SPARC) this can cause a bus error. We can fix it with a combination of get_be32 and moving the data into an aligned buffer (which we would do anyway, but we can move it before fixing the endianness). Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-23read-cache: use get_be32 instead of hand-rolled ntoh_lLibravatar Jeff King2-32/+16
Commit d60c49c (read-cache.c: allow unaligned mapping of the index file, 2012-04-03) introduced helpers to access unaligned data. However, we already have get_be32, which has a few advantages: 1. It's already written, so we avoid duplication. 2. It's probably faster, since it does the endian conversion and the alignment fix at the same time. 3. The get_be32 code is well-tested, having been in block-sha1 for a long time. By contrast, our custom helpers were probably almost never used, since the user needed to manually define a macro to enable them. We have to add a get_be16 implementation to the existing get_be32, but that is very simple to do. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-23block-sha1: factor out get_be and put_be wrappersLibravatar Jeff King2-32/+32
The BLK_SHA1 code has optimized wrappers for doing endian conversions on memory that may not be aligned. Let's pull them out so that we can use them elsewhere, especially the time-tested list of platforms that prefer each strategy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-16do not discard revindex when re-preparing packfilesLibravatar Jeff King3-13/+0
When an object lookup fails, we re-read the objects/pack directory to pick up any new packfiles that may have been created since our last read. We also discard any pack revindex structs we've allocated. The discarding is a problem for the pack-bitmap code, which keeps a pointer to the revindex for the bitmapped pack. After the discard, the pointer is invalid, and we may read free()d memory. Other revindex users do not keep a bare pointer to the revindex; instead, they always access it through revindex_for_pack(), which lazily builds the revindex. So one solution is to teach the pack-bitmap code a similar trick. It would be slightly less efficient, but probably not all that noticeable. However, it turns out this discarding is not actually necessary. When we call reprepare_packed_git, we do not throw away our old pack list. We keep the existing entries, and only add in new ones. So there is no safety problem; we will still have the pack struct that matches each revindex. The packfile itself may go away, of course, but we are already prepared to handle that, and it may happen outside of reprepare_packed_git anyway. Throwing away the revindex may save some RAM if the pack never gets reused (about 12 bytes per object). But it also wastes some CPU time (to regenerate the index) if the pack does get reused. It's hard to say which is more valuable, but in either case, it happens very rarely (only when we race with a simultaneous repack). Just leaving the revindex in place is simple and safe both for current and future code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30pack-bitmap: implement optional name_hash cacheLibravatar Vicent Marti8-7/+91
When we use pack bitmaps rather than walking the object graph, we end up with the list of objects to include in the packfile, but we do not know the path at which any tree or blob objects would be found. In a recently packed repository, this is fine. A fetch would use the paths only as a heuristic in the delta compression phase, and a fully packed repository should not need to do much delta compression. As time passes, though, we may acquire more objects on top of our large bitmapped pack. If clients fetch frequently, then they never even look at the bitmapped history, and all works as usual. However, a client who has not fetched since the last bitmap repack will have "have" tips in the bitmapped history, but "want" newer objects. The bitmaps themselves degrade gracefully in this circumstance. We manually walk the more recent bits of history, and then use bitmaps when we hit them. But we would also like to perform delta compression between the newer objects and the bitmapped objects (both to delta against what we know the user already has, but also between "new" and "old" objects that the user is fetching). The lack of pathnames makes our delta heuristics much less effective. This patch adds an optional cache of the 32-bit name_hash values to the end of the bitmap file. If present, a reader can use it to match bitmapped and non-bitmapped names during delta compression. Here are perf results for p5310: Test origin/master HEAD^ HEAD ------------------------------------------------------------------------------------------------- 5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7% 5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5% 5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2% 5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9% You can see that the time spent on an incremental fetch goes down, as our delta heuristics are able to do their work. And we save time on the partial bitmap clone for the same reason. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30t/perf: add tests for pack bitmapsLibravatar Jeff King1-0/+56
This adds a few basic perf tests for the pack bitmap code to show off its improvements. The tests are: 1. How long does it take to do a repack (it gets slower with bitmaps, since we have to do extra work)? 2. How long does it take to do a clone (it gets faster with bitmaps)? 3. How does a small fetch perform when we've just repacked? 4. How does a clone perform when we haven't repacked since a week of pushes? Here are results against linux.git: Test origin/master this tree ----------------------------------------------------------------------- 5310.2: repack to disk 33.64(32.64+2.04) 67.67(66.75+1.84) +101.2% 5310.3: simulated clone 30.49(29.47+2.05) 1.20(1.10+0.10) -96.1% 5310.4: simulated fetch 3.49(6.79+0.06) 5.57(22.35+0.07) +59.6% 5310.6: partial bitmap 36.70(43.87+1.81) 8.18(21.92+0.73) -77.7% You can see that we do take longer to repack, but we do way better for further clones. A small fetch performs a bit worse, as we spend way more time on delta compression (note the heavy user CPU time, as we have 8 threads) due to the lack of name hashes for the bitmapped objects. The final test shows how the bitmaps degrade over time between packs. There's still a significant speedup over the non-bitmap case, but we don't do quite as well (we have to spend time accessing the "new" objects the old fashioned way, including delta compression). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30t: add basic bitmap functionality testsLibravatar Jeff King1-0/+138
Now that we can read and write bitmaps, we can exercise them with some basic functionality tests. These tests aren't particularly useful for seeing the benefit, as the test repo is too small for it to make a difference. However, we can at least check that using bitmaps does not break anything. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30count-objects: recognize .bitmap in garbage-checkingLibravatar Nguyễn Thái Ngọc Duy1-0/+1
Count-objects will report any "garbage" files in the packs directory, including files whose extensions it does not know (case 1), and files whose matching ".pack" file is missing (case 2). Without having learned about ".bitmap" files, the current code reports all such files as garbage (case 1), even if their pack exists. Instead, they should be treated as case 2. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30repack: consider bitmaps when performing repacksLibravatar Vicent Marti2-2/+16
Since `pack-objects` will write a `.bitmap` file next to the `.pack` and `.idx` files, this commit teaches `git-repack` to consider the new bitmap indexes (if they exist) when performing repack operations. This implies moving old bitmap indexes out of the way if we are repacking a repository that already has them, and moving the newly generated bitmap indexes into the `objects/pack` directory, next to their corresponding packfiles. Since `git repack` is now capable of handling these `.bitmap` files, a normal `git gc` run on a repository that has `pack.writebitmaps` set to true in its config file will generate bitmap indexes as part of the garbage collection process. Alternatively, `git repack` can be called with the `-b` switch to explicitly generate bitmap indexes if you are experimenting and don't want them on all the time. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30repack: handle optional files created by pack-objectsLibravatar Jeff King1-2/+7
We ask pack-objects to pack to a set of temporary files, and then rename them into place. Some files that pack-objects creates may be optional (like a .bitmap file), in which case we would not want to call rename(). We already call stat() and make the chmod optional if the file cannot be accessed. We could simply skip the rename step in this case, but that would be a minor regression in noticing problems with non-optional files (like the .pack and .idx files). Instead, we can now annotate extensions as optional, and skip them if they don't exist (and otherwise rely on rename() to barf). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30repack: turn exts array into array-of-structLibravatar Jeff King1-6/+11
This is slightly more verbose, but will let us annotate the extensions with further options in future commits. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30repack: stop using magic number for ARRAY_SIZE(exts)Libravatar Jeff King1-4/+4
We have a static array of extensions, but hardcode the size of the array in our loops. Let's pull out this magic number, which will make it easier to change. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30pack-objects: implement bitmap writingLibravatar Vicent Marti8-0/+711
This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30rev-list: add bitmap mode to speed up object listsLibravatar Vicent Marti3-0/+48
The bitmap reachability index used to speed up the counting objects phase during `pack-objects` can also be used to optimize a normal rev-list if the only thing required are the SHA1s of the objects during the list (i.e., not the path names at which trees and blobs were found). Calling `git rev-list --objects --use-bitmap-index [committish]` will perform an object iteration based on a bitmap result instead of actually walking the object graph. These are some example timings for `torvalds/linux` (warm cache, best-of-five): $ time git rev-list --objects master > /dev/null real 0m34.191s user 0m33.904s sys 0m0.268s $ time git rev-list --objects --use-bitmap-index master > /dev/null real 0m1.041s user 0m0.976s sys 0m0.064s Likewise, using `git rev-list --count --use-bitmap-index` will speed up the counting operation by building the resulting bitmap and performing a fast popcount (number of bits set on the bitmap) on the result. Here are some sample timings of different ways to count commits in `torvalds/linux`: $ time git rev-list master | wc -l 399882 real 0m6.524s user 0m6.060s sys 0m3.284s $ time git rev-list --count master 399882 real 0m4.318s user 0m4.236s sys 0m0.076s $ time git rev-list --use-bitmap-index --count master 399882 real 0m0.217s user 0m0.176s sys 0m0.040s This also respects negative refs, so you can use it to count a slice of history: $ time git rev-list --count v3.0..master 144843 real 0m1.971s user 0m1.932s sys 0m0.036s $ time git rev-list --use-bitmap-index --count v3.0..master real 0m0.280s user 0m0.220s sys 0m0.056s Though note that the closer the endpoints, the less it helps. In the traversal case, we have fewer commits to cross, so we take less time. But the bitmap time is dominated by generating the pack revindex, which is constant with respect to the refs given. Note that you cannot yet get a fast --left-right count of a symmetric difference (e.g., "--count --left-right master...topic"). The slow part of that walk actually happens during the merge-base determination when we parse "master...topic". Even though a count does not actually need to know the real merge base (it only needs to take the symmetric difference of the bitmaps), the revision code would require some refactoring to handle this case. Additionally, a `--test-bitmap` flag has been added that will perform the same rev-list manually (i.e. using a normal revwalk) and using bitmaps, and verify that the results are the same. This can be used to exercise the bitmap code, and also to verify that the contents of the .bitmap file are sane. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30pack-objects: use bitmaps when packing objectsLibravatar Vicent Marti2-0/+113
In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30pack-objects: split add_object_entryLibravatar Jeff King1-20/+78
This function actually does three things: 1. Check whether we've already added the object to our packing list. 2. Check whether the object meets our criteria for adding. 3. Actually add the object to our packing list. It's a little hard to see these three phases, because they happen linearly in the rather long function. Instead, this patch breaks them up into three separate helper functions. The result is a little easier to follow, though it unfortunately suffers from some optimization interdependencies between the stages (e.g., during step 3 we use the packing list index from step 1 and the packfile information from step 2). More importantly, though, the various parts can be composed differently, as they will be in the next patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30pack-bitmap: add support for bitmap indexesLibravatar Vicent Marti4-0/+1353
A bitmap index is a `.bitmap` file that can be found inside `$GIT_DIR/objects/pack/`, next to its corresponding packfile, and contains precalculated reachability information for selected commits. The full specification of the format for these bitmap indexes can be found in `Documentation/technical/bitmap-format.txt`. For a given commit SHA1, if it happens to be available in the bitmap index, its bitmap will represent every single object that is reachable from the commit itself. The nth bit in the bitmap is the nth object in the packfile; if it's set to 1, the object is reachable. By using the bitmaps available in the index, this commit implements several new functions: - `prepare_bitmap_git` - `prepare_bitmap_walk` - `traverse_bitmap_commit_list` - `reuse_partial_packfile_from_bitmap` The `prepare_bitmap_walk` function tries to build a bitmap of all the objects that can be reached from the commit roots of a given `rev_info` struct by using the following algorithm: - If all the interesting commits for a revision walk are available in the index, the resulting reachability bitmap is the bitwise OR of all the individual bitmaps. - When the full set of WANTs is not available in the index, we perform a partial revision walk using the commits that don't have bitmaps as roots, and limiting the revision walk as soon as we reach a commit that has a corresponding bitmap. The earlier OR'ed bitmap with all the indexed commits can now be completed as this walk progresses, so the end result is the full reachability list. - For revision walks with a HAVEs set (a set of commits that are deemed uninteresting), first we perform the same method as for the WANTs, but using our HAVEs as roots, in order to obtain a full reachability bitmap of all the uninteresting commits. This bitmap then can be used to: a) limit the subsequent walk when building the WANTs bitmap b) finding the final set of interesting commits by performing an AND-NOT of the WANTs and the HAVEs. If `prepare_bitmap_walk` runs successfully, the resulting bitmap is stored and the equivalent of a `traverse_commit_list` call can be performed by using `traverse_bitmap_commit_list`; the bitmap version of this call yields the objects straight from the packfile index (without having to look them up or parse them) and hence is several orders of magnitude faster. As an extra optimization, when `prepare_bitmap_walk` succeeds, the `reuse_partial_packfile_from_bitmap` call can be attempted: it will find the amount of objects at the beginning of the on-disk packfile that can be reused as-is, and return an offset into the packfile. The source packfile can then be loaded and the bytes up to `offset` can be written directly to the result without having to consider the entires inside the packfile individually. If the `prepare_bitmap_walk` call fails (e.g. because no bitmap files are available), the `rev_info` struct is left untouched, and can be used to perform a manual rev-walk using `traverse_commit_list`. Hence, this new set of functions are a generic API that allows to perform the equivalent of git rev-list --objects [roots...] [^uninteresting...] for any set of commits, even if they don't have specific bitmaps generated for them. In further patches, we'll use this bitmap traversal optimization to speed up the `pack-objects` and `rev-list` commands. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30documentation: add documentation for the bitmap formatLibravatar Vicent Marti1-0/+131
This is the technical documentation for the JGit-compatible Bitmap v1 on-disk format. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30ewah: compressed bitmap implementationLibravatar Vicent Marti7-2/+1599
EWAH is a word-aligned compressed variant of a bitset (i.e. a data structure that acts as a 0-indexed boolean array for many entries). It uses a 64-bit run-length encoding (RLE) compression scheme, trading some compression for better processing speed. The goal of this word-aligned implementation is not to achieve the best compression, but rather to improve query processing time. As it stands right now, this EWAH implementation will always be more efficient storage-wise than its uncompressed alternative. EWAH arrays will be used as the on-disk format to store reachability bitmaps for all objects in a repository while keeping reasonable sizes, in the same way that JGit does. This EWAH implementation is a mostly straightforward port of the original `javaewah` library that JGit currently uses. The library is self-contained and has been embedded whole (4 files) inside the `ewah` folder to ease redistribution. The library is re-licensed under the GPLv2 with the permission of Daniel Lemire, the original author. The source code for the C version can be found on GitHub: https://github.com/vmg/libewok The original Java implementation can also be found on GitHub: https://github.com/lemire/javaewah [jc: stripped debug-only code per Peff's $gmane/239768] Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18compat: add endianness helpersLibravatar Vicent Marti1-1/+75
The POSIX standard doesn't currently define a `ntohll`/`htonll` function pair to perform network-to-host and host-to-network swaps of 64-bit data. These 64-bit swaps are necessary for the on-disk storage of EWAH bitmaps if they are not in native byte order. Many thanks to Ramsay Jones <ramsay@ramsay1.demon.co.uk> and Torsten Bögershausen <tboegi@web.de> for cygwin/mingw/msvc portability fixes. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24sha1_file: export `git_open_noatime`Libravatar Vicent Marti2-3/+2
The `git_open_noatime` helper can be of general interest for other consumers of git's different on-disk formats. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24revision: allow setting custom limiter functionLibravatar Vicent Marti2-0/+6
This commit enables users of `struct rev_info` to peform custom limiting during a revision walk (i.e. `get_revision`). If the field `include_check` has been set to a callback, this callback will be issued once for each commit before it is added to the "pending" list of the revwalk. If the include check returns 0, the commit will be marked as added but won't be pushed to the pending list, effectively limiting the walk. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24pack-objects: factor out name_hashLibravatar Vicent Marti2-22/+22
As the pack-objects system grows beyond the single pack-objects.c file, more parts (like the soon-to-exist bitmap code) will need to compute hashes for matching deltas. Factor out name_hash to make it available to other files. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24pack-objects: refactor the packing listLibravatar Vicent Marti4-135/+200
The hash table that stores the packing list for a given `pack-objects` run was tightly coupled to the pack-objects code. In this commit, we refactor the hash table and the underlying storage array into a `packing_data` struct. The functionality for accessing and adding entries to the packing list is hence accessible from other parts of Git besides the `pack-objects` builtin. This refactoring is a requirement for further patches in this series that will require accessing the commit packing list from outside of `pack-objects`. The hash table implementation has been minimally altered: we now use table sizes which are always a power of two, to ensure a uniform index distribution in the array. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24revindex: export new APIsLibravatar Vicent Marti2-13/+33
Allow users to efficiently lookup consecutive entries that are expected to be found on the same revindex by exporting `find_revindex_position`: this function takes a pointer to revindex itself, instead of looking up the proper revindex for a given packfile on each call. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24sha1write: make buffer const-correctLibravatar Jeff King2-4/+4
We are passed a "void *" and write it out without ever touching it; let's indicate that by using "const". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-23Update draft release notes to 1.8.5Libravatar Junio C Hamano1-0/+4
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-23Sync with 'maint'Libravatar Junio C Hamano1-0/+13
2013-10-23Almost 1.8.4.2 ;-)Libravatar Junio C Hamano1-0/+13
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-23Merge branch 'jc/ls-files-killed-optim' into maintLibravatar Junio C Hamano4-17/+67
"git ls-files -k" needs to crawl only the part of the working tree that may overlap the paths in the index to find killed files, but shared code with the logic to find all the untracked files, which made it unnecessarily inefficient. * jc/ls-files-killed-optim: dir.c::test_one_path(): work around directory_exists_in_index_icase() breakage t3010: update to demonstrate "ls-files -k" optimization pitfalls ls-files -k: a directory only can be killed if the index has a non-directory dir.c: use the cache_* macro to access the current index
2013-10-23Merge branch 'jh/checkout-auto-tracking' into maintLibravatar Junio C Hamano4-8/+45
"git branch --track" had a minor regression in v1.8.3.2 and later that made it impossible to base your local work on anything but a local branch of the upstream repository you are tracking from. * jh/checkout-auto-tracking: t3200: fix failure on case-insensitive filesystems branch.c: Relax unnecessary requirement on upstream's remote ref name t3200: Add test demonstrating minor regression in 41c21f2 Refer to branch.<name>.remote/merge when documenting --track t3200: Minor fix when preparing for tracking failure t2024: Fix &&-chaining and a couple of typos
2013-10-23Merge branch 'nd/fetch-into-shallow' into maintLibravatar Junio C Hamano12-160/+152
When there is no sufficient overlap between old and new history during a "git fetch" into a shallow repository, objects that the sending side knows the receiving end has were unnecessarily sent. * nd/fetch-into-shallow: Add testcase for needless objects during a shallow fetch list-objects: mark more commits as edges in mark_edges_uninteresting list-objects: reduce one argument in mark_edges_uninteresting upload-pack: delegate rev walking in shallow fetch to pack-objects shallow: add setup_temporary_shallow() shallow: only add shallow graft points to new shallow file move setup_alternate_shallow and write_shallow_commits to shallow.c
2013-10-23Merge branch 'bc/gnome-keyring'Libravatar Junio C Hamano2-134/+167
Cleanups and tweaks for credential handling to work with ancient versions of the gnome-keyring library that are still in use. * bc/gnome-keyring: contrib/git-credential-gnome-keyring.c: support really ancient gnome-keyring contrib/git-credential-gnome-keyring.c: support ancient gnome-keyring contrib/git-credential-gnome-keyring.c: report failure to store password contrib/git-credential-gnome-keyring.c: use glib messaging functions contrib/git-credential-gnome-keyring.c: use glib memory allocation functions contrib/git-credential-gnome-keyring.c: use secure memory for reading passwords contrib/git-credential-gnome-keyring.c: use secure memory functions for passwds contrib/git-credential-gnome-keyring.c: use gnome helpers in keyring_object() contrib/git-credential-gnome-keyring.c: set Gnome application name contrib/git-credential-gnome-keyring.c: ensure buffer is non-empty before accessing contrib/git-credential-gnome-keyring.c: strlen() returns size_t, not ssize_t contrib/git-credential-gnome-keyring.c: exit non-zero when called incorrectly contrib/git-credential-gnome-keyring.c: add static where applicable contrib/git-credential-gnome-keyring.c: *style* use "if ()" not "if()" etc. contrib/git-credential-gnome-keyring.c: remove unused die() function contrib/git-credential-gnome-keyring.c: remove unnecessary pre-declarations
2013-10-23Merge branch 'po/dot-url'Libravatar Junio C Hamano2-2/+8
Explain how '.' can be used to refer to the "current repository" in the documentation. * po/dot-url: doc/cli: make "dot repository" an independent bullet point config doc: update dot-repository notes doc: command line interface (cli) dot-repository dwimmery
2013-10-23Merge branch 'jc/prompt-upstream'Libravatar Junio C Hamano1-1/+6
An enhancement to the GIT_PS1_SHOWUPSTREAM facility. * jc/prompt-upstream: git-prompt.sh: optionally show upstream branch name
2013-10-23Merge branch 'hu/cherry-pick-previous-branch'Libravatar Junio C Hamano2-2/+14
"git cherry-pick" without further options would segfault. Could use a follow-up to handle '-' after argv[1] better. * hu/cherry-pick-previous-branch: cherry-pick: handle "-" after parsing options
2013-10-23Merge branch 'mg/more-textconv'Libravatar Junio C Hamano14-57/+237
Make "git grep" and "git show" pay attention to --textconv when dealing with blob objects. * mg/more-textconv: grep: honor --textconv for the case rev:path grep: allow to use textconv filters t7008: demonstrate behavior of grep with textconv cat-file: do not die on --textconv without textconv filters show: honor --textconv for blobs diff_opt: track whether flags have been set explicitly t4030: demonstrate behavior of show with textconv
2013-10-23Merge branch 'jc/pack-objects'Libravatar Junio C Hamano1-11/+12
* jc/pack-objects: pack-objects: shrink struct object_entry
2013-10-18Update draft release notes to 1.8.5Libravatar Junio C Hamano1-0/+7
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-18Merge branch 'maint'Libravatar Junio C Hamano1-1/+5
* maint: git-merge: document the -S option
2013-10-18Merge branch 'jc/reflog-doc'Libravatar Junio C Hamano3-3/+49
Document rules to use GIT_REFLOG_ACTION variable in the scripted Porcelain. git-rebase--interactive locally violates them, but it is a leaf user that does not call out to or dot-source other scripts, so it does not urgently need to be fixed. * jc/reflog-doc: setup_reflog_action: document the rules for using GIT_REFLOG_ACTION
2013-10-18Merge branch 'sb/repack-in-c'Libravatar Junio C Hamano5-1/+391
Rewrite "git repack" in C. * sb/repack-in-c: repack: improve warnings about failure of renaming and removing files repack: retain the return value of pack-objects repack: rewrite the shell script in C
2013-10-18Merge branch 'jk/clone-progress-to-stderr'Libravatar Junio C Hamano4-28/+32
Some progress and diagnostic messages from "git clone" were incorrectly sent to the standard output stream, not to the standard error stream. * jk/clone-progress-to-stderr: clone: always set transport options clone: treat "checking connectivity" like other progress clone: send diagnostic messages to stderr
2013-10-18Merge git://github.com/git-l10n/git-poLibravatar Junio C Hamano1-2367/+3193
* git://github.com/git-l10n/git-po: l10n: fr.po: 2135/2135 messages translated