summaryrefslogtreecommitdiff
path: root/builtin/pack-objects.c
AgeCommit message (Collapse)AuthorFilesLines
2021-01-13check_object(): convert to new revindex APILibravatar Taylor Blau1-4/+4
Replace direct accesses to the revindex with calls to 'offset_to_pack_pos()' and 'pack_pos_to_index()'. Since this caller already had some error checking (it can jump to the 'give_up' label if it encounters an error), we can easily check whether or not the provided offset points to an object in the given pack. This error checking existed prior to this patch, too, since the caller checks whether the return value from 'find_pack_revindex()' was NULL or not. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-13write_reused_pack_verbatim(): convert to new revindex APILibravatar Taylor Blau1-1/+1
Replace a direct access to the revindex array with 'pack_pos_to_offset()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-13write_reused_pack_one(): convert to new revindex APILibravatar Taylor Blau1-4/+10
Replace direct revindex accesses with calls to 'pack_pos_to_offset()' and 'pack_pos_to_index()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-13write_reuse_object(): convert to new revindex APILibravatar Taylor Blau1-4/+9
First replace 'find_pack_revindex()' with its replacement 'offset_to_pack_pos()'. This prevents any bogus OFS_DELTA that may make its way through until 'write_reuse_object()' from causing a bad memory read (if 'revidx' is 'NULL') Next, replace a direct access of '->nr' with the wrapper function 'pack_pos_to_index()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08pack-bitmap-write: ignore BITMAP_FLAG_REUSELibravatar Jeff King1-1/+0
The on-disk bitmap format has a flag to mark a bitmap to be "reused". This is a rather curious feature, and works like this: - a run of pack-objects would decide to mark the last 80% of the bitmaps it generates with the reuse flag - the next time we generate bitmaps, we'd see those reuse flags from the last run, and mark those commits as special: - we'd be more likely to select those commits to get bitmaps in the new output - when generating the bitmap for a selected commit, we'd reuse the old bitmap as-is (rearranging the bits to match the new pack, of course) However, neither of these behaviors particularly makes sense. Just because a commit happened to be bitmapped last time does not make it a good candidate for having a bitmap this time. In particular, we may choose bitmaps based on how recent they are in history, or whether a ref tip points to them, and those things will change. We're better off re-considering fresh which commits are good candidates. Reusing the existing bitmap _is_ a reasonable thing to do to save computation. But only reusing exact bitmaps is a weak form of this. If we have an old bitmap for A and now want a new bitmap for its child, we should be able to compute that only by looking at trees and that are new to the child. But this code would consider only exact reuse (which is perhaps why it was eager to select those commits in the first place). Furthermore, the recent switch to the reverse-edge algorithm for generating bitmaps dropped this optimization entirely (and yet still performs better). So let's do a few cleanups: - drop the whole "reusing bitmaps" phase of generating bitmaps. It's not helping anything, and is mostly unused code (or worse, code that is using CPU but not doing anything useful) - drop the use of the on-disk reuse flag to select commits to bitmap - stop setting the on-disk reuse flag in bitmaps we generate (since nothing respects it anymore) We will keep a few innards of the reuse code, which will help us implement a more capable version of the "reuse" optimization: - simplify rebuild_existing_bitmaps() into a function that only builds the mapping of bits between the old and new orders, but doesn't actually convert any bitmaps - make rebuild_bitmap() public; we'll call it lazily to convert bitmaps as we traverse (using the mapping created above) Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19Merge branch 'jc/object-names-are-not-sha-1'Libravatar Junio C Hamano1-1/+1
A few end-user facing messages have been updated to be hash-algorithm agnostic. * jc/object-names-are-not-sha-1: messages: avoid SHA-1 in end-user facing messages
2020-08-14messages: avoid SHA-1 in end-user facing messagesLibravatar Junio C Hamano1-1/+1
There are still a handful mentions of SHA-1 when we meant the (hexadecimal) object names in end-user facing messages. Rewrite them. I was hoping that this can mostly be s/SHA-1/object name/, but a few messages needed rephrasing to keep the result readable. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-13Merge branch 'jt/has_object'Libravatar Junio C Hamano1-2/+2
A new helper function has_object() has been introduced to make it easier to mark object existence checks that do and don't want to trigger lazy fetches, and a few such checks are converted using it. * jt/has_object: fsck: do not lazy fetch known non-promisor object pack-objects: no fetch when allow-{any,promisor} apply: do not lazy fetch when applying binary sha1-file: introduce no-lazy-fetch has_object()
2020-08-10Merge branch 'jk/strvec'Libravatar Junio C Hamano1-13/+13
The argv_array API is useful for not just managing argv but any "vector" (NULL-terminated array) of strings, and has seen adoption to a certain degree. It has been renamed to "strvec" to reduce the barrier to adoption. * jk/strvec: strvec: rename struct fields strvec: drop argv_array compatibility layer strvec: update documention to avoid argv_array strvec: fix indentation in renamed calls strvec: convert remaining callers away from argv_array name strvec: convert more callers away from argv_array name strvec: convert builtin/ callers away from argv_array name quote: rename sq_dequote_to_argv_array to mention strvec strvec: rename files from argv-array to strvec argv-array: rename to strvec argv-array: use size_t for count and alloc
2020-08-06pack-objects: no fetch when allow-{any,promisor}Libravatar Jonathan Tan1-2/+2
The options --missing=allow-{any,promisor} were introduced in caf3827e2f ("rev-list: add list-objects filtering support", 2017-11-22) with the following note in the commit message: This patch introduces handling of missing objects to help debugging and development of the "partial clone" mechanism, and once the mechanism is implemented, for a power user to perform operations that are missing-object aware without incurring the cost of checking if a missing link is expected. The idea that these options are missing-object aware (and thus do not need to lazily fetch objects, unlike unaware commands that assume that all objects are present) are assumed in later commits such as 07ef3c6604 ("fetch test: use more robust test for filtered objects", 2020-01-15). However, the current implementations of these options use has_object_file(), which indeed lazily fetches missing objects. Teach these implementations not to do so. Also, update the documentation of these options to be clearer. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30strvec: rename struct fieldsLibravatar Jeff King1-1/+1
The "argc" and "argv" names made sense when the struct was argv_array, but now they're just confusing. Let's rename them to "nr" (which we use for counts elsewhere) and "v" (which is rather terse, but reads well when combined with typical variable names like "args.v"). Note that we have to update all of the callers immediately. Playing tricks with the preprocessor is hard here, because we wouldn't want to rewrite unrelated tokens. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: convert builtin/ callers away from argv_array nameLibravatar Jeff King1-11/+11
We eventually want to drop the argv_array name and just use strvec consistently. There's no particular reason we have to do it all at once, or care about interactions between converted and unconverted bits. Because of our preprocessor compat layer, the names are interchangeable to the compiler (so even a definition and declaration using different names is OK). This patch converts all of the files in builtin/ to keep the diff to a manageable size. The conversion was done purely mechanically with: git ls-files '*.c' '*.h' | xargs perl -i -pe ' s/ARGV_ARRAY/STRVEC/g; s/argv_array/strvec/g; ' and then selectively staging files with "git add builtin/". We'll deal with any indentation/style fallouts separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: rename files from argv-array to strvecLibravatar Jeff King1-1/+1
This requires updating #include lines across the code-base, but that's all fairly mechanical, and was done with: git ls-files '*.c' '*.h' | xargs perl -i -pe 's/argv-array.h/strvec.h/' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-21pack-objects: prefetch objects to be packedLibravatar Jonathan Tan1-4/+32
When an object to be packed is noticed to be missing, prefetch all to-be-packed objects in one batch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-21pack-objects: refactor to oid_object_info_extendedLibravatar Jonathan Tan1-2/+6
Use oid_object_info_extended() instead of oid_object_info() because a subsequent commit needs to specify an additional flag here. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10upload-pack: send part of packfile response as uriLibravatar Jonathan Tan1-0/+76
Teach upload-pack to send part of its packfile response as URIs. An administrator may configure a repository with one or more "uploadpack.blobpackfileuri" lines, each line containing an OID, a pack hash, and a URI. A client may configure fetch.uriprotocols to be a comma-separated list of protocols that it is willing to use to fetch additional packfiles - this list will be sent to the server. Whenever an object with one of those OIDs would appear in the packfile transmitted by upload-pack, the server may exclude that object, and instead send the URI. The client will then download the packs referred to by those URIs before performing the connectivity check. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-13Merge branch 'tb/shallow-cleanup'Libravatar Junio C Hamano1-0/+1
Code cleanup. * tb/shallow-cleanup: shallow: use struct 'shallow_lock' for additional safety shallow.h: document '{commit,rollback}_shallow_file' shallow: extract a header file for shallow-related functions commit: make 'commit_graft_pos' non-static
2020-04-30shallow: extract a header file for shallow-related functionsLibravatar Taylor Blau1-0/+1
There are many functions in commit.h that are more related to shallow repositories than they are to any sort of generic commit machinery. Likely this began when there were only a few shallow-related functions, and commit.h seemed a reasonable enough place to put them. But, now there are a good number of shallow-related functions, and placing them all in 'commit.h' doesn't make sense. This patch extracts a 'shallow.h', which takes all of the declarations from 'commit.h' for functions which already exist in 'shallow.c'. We will bring the remaining shallow-related functions defined in 'commit.c' in a subsequent patch. For now, move only the ones that already are implemented in 'shallow.c', and update the necessary includes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-28Use OPT_CALLBACK and OPT_CALLBACK_FLibravatar Denton Liu1-6/+6
In the codebase, there are many options which use OPTION_CALLBACK in a plain ol' struct definition. However, we have the OPT_CALLBACK and OPT_CALLBACK_F macros which are meant to abstract these plain struct definitions away. These macros are useful as they semantically signal to developers that these are just normal callback option with nothing fancy happening. Replace plain struct definitions of OPTION_CALLBACK with OPT_CALLBACK or OPT_CALLBACK_F where applicable. The heavy lifting was done using the following (disgusting) shell script: #!/bin/sh do_replacement () { tr '\n' '\r' | sed -e 's/{\s*OPTION_CALLBACK,\s*\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\s*0,\(\s*[^[:space:]}]*\)\s*}/OPT_CALLBACK(\1,\2,\3,\4,\5,\6)/g' | sed -e 's/{\s*OPTION_CALLBACK,\s*\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\(\s*[^[:space:]}]*\)\s*}/OPT_CALLBACK_F(\1,\2,\3,\4,\5,\6,\7)/g' | tr '\r' '\n' } for f in $(git ls-files \*.c) do do_replacement <"$f" >"$f.tmp" mv "$f.tmp" "$f" done The result was manually inspected and then reformatted to match the style of the surrounding code. Finally, using `git grep OPTION_CALLBACK \*.c`, leftover results which were not handled by the script were manually transformed. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-22Merge branch 'jk/oid-array-cleanups'Libravatar Junio C Hamano1-1/+1
Code cleanup. * jk/oid-array-cleanups: oidset: stop referring to sha1-array ref-filter: stop referring to "sha1 array" bisect: stop referring to sha1_array test-tool: rename sha1-array to oid-array oid_array: rename source file from sha1-array oid_array: use size_t for iteration oid_array: use size_t for count and allocation
2020-03-30oid_array: rename source file from sha1-arrayLibravatar Jeff King1-1/+1
We renamed the actual data structure in 910650d2f8 (Rename sha1_array to oid_array, 2017-03-31), but the file is still called sha1-array. Besides being slightly confusing, it makes it more annoying to grep for leftover occurrences of "sha1" in various files, because the header is included in so many places. Let's complete the transition by renaming the source and header files (and fixing up a few comment references). I kept the "-" in the name, as that seems to be our style; cf. fc1395f4a4 (sha1_file.c: rename to use dash in file name, 2018-04-10). We also have oidmap.h and oidset.h without any punctuation, but those are "struct oidmap" and "struct oidset" in the code. We _could_ make this "oidarray" to match, but somehow it looks uglier to me because of the length of "array" (plus it would be a very invasive patch for little gain). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-29Merge branch 'ds/default-pack-use-sparse-to-true'Libravatar Junio C Hamano1-2/+2
The 'pack.useSparse' configuration variable now defaults to 'true', enabling an optimization that has been experimental since Git 2.21. * ds/default-pack-use-sparse-to-true: pack-objects: flip the use of GIT_TEST_PACK_SPARSE config: set pack.useSparse=true by default
2020-03-26Merge branch 'bc/sha-256-part-1-of-4'Libravatar Junio C Hamano1-1/+1
SHA-256 transition continues. * bc/sha-256-part-1-of-4: (22 commits) fast-import: add options for rewriting submodules fast-import: add a generic function to iterate over marks fast-import: make find_marks work on any mark set fast-import: add helper function for inserting mark object entries fast-import: permit reading multiple marks files commit: use expected signature header for SHA-256 worktree: allow repository version 1 init-db: move writing repo version into a function builtin/init-db: add environment variable for new repo hash builtin/init-db: allow specifying hash algorithm on command line setup: allow check_repository_format to read repository format t/helper: make repository tests hash independent t/helper: initialize repository if necessary t/helper/test-dump-split-index: initialize git repository t6300: make hash algorithm independent t6300: abstract away SHA-1-specific constants t: use hash-specific lookup tables to define test constants repository: require a build flag to use SHA-256 hex: add functions to parse hex object IDs in any algorithm hex: introduce parsing variants taking hash algorithms ...
2020-03-20pack-objects: flip the use of GIT_TEST_PACK_SPARSELibravatar Derrick Stolee1-2/+2
The environment variable GIT_TEST_PACK_SPARSE was previously used to allow testing the --sparse option for "git pack-objects" in the test suite. This allowed interesting cases of "git push" to also test this algorithm. Since pack.useSparse is now true by default, we do not need this variable to _enable_ the --sparse option, but instead to _disable_ it. This flips how we work with the variable a bit. When checking for the variable, default to a value of -1 for "unset". If unset, then take the default from the repo settings, which is currently 1. Then, the --[no-]sparse command-line option will override either of these settings. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-05Merge branch 'jk/nth-packed-object-id'Libravatar Junio C Hamano1-23/+25
Code cleanup to use "struct object_id" more by replacing use of "char *sha1" * jk/nth-packed-object-id: packfile: drop nth_packed_object_sha1() packed_object_info(): use object_id internally for delta base packed_object_info(): use object_id for returning delta base pack-check: push oid lookup into loop pack-check: convert "internal error" die to a BUG() pack-bitmap: use object_id when loading on-disk bitmaps pack-objects: use object_id struct in pack-reuse code pack-objects: convert oe_set_delta_ext() to use object_id pack-objects: read delta base oid into object_id struct nth_packed_object_oid(): use customary integer return
2020-03-02Merge branch 'jk/object-filter-with-bitmap'Libravatar Junio C Hamano1-3/+3
The object reachability bitmap machinery and the partial cloning machinery were not prepared to work well together, because some object-filtering criteria that partial clones use inherently rely on object traversal, but the bitmap machinery is an optimization to bypass that object traversal. There however are some cases where they can work together, and they were taught about them. * jk/object-filter-with-bitmap: rev-list --count: comment on the use of count_right++ pack-objects: support filters with bitmaps pack-bitmap: implement BLOB_LIMIT filtering pack-bitmap: implement BLOB_NONE filtering bitmap: add bitmap_unset() function rev-list: use bitmap filters for traversal pack-bitmap: basic noop bitmap filter infrastructure rev-list: allow commit-only bitmap traversals t5310: factor out bitmap traversal comparison rev-list: allow bitmaps when counting objects rev-list: make --count work with --objects rev-list: factor out bitmap-optimized routines pack-bitmap: refuse to do a bitmap traversal with pathspecs rev-list: fallback to non-bitmap traversal when filtering pack-bitmap: fix leak of haves/wants object lists pack-bitmap: factor out type iterator initialization
2020-02-24pack-objects: use object_id struct in pack-reuse codeLibravatar Jeff King1-4/+5
When the pack-reuse code is dumping an OFS_DELTA entry to a client that doesn't support it, we re-write it as a REF_DELTA. To do so, we use nth_packed_object_sha1() to get the oid, but that function is soon going away in favor of the more type-safe nth_packed_object_id(). Let's switch now in preparation. Note that this does incur an extra hash copy (from the pack idx mmap to the object_id and then to the output, rather than straight from mmap to the output). But this is not worth worrying about. It's probably not measurable even when it triggers, and this is fallback code that we expect to trigger very rarely (since everybody supports OFS_DELTA these days anyway). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24pack-objects: convert oe_set_delta_ext() to use object_idLibravatar Jeff King1-1/+1
We already store an object_id internally, and now our sole caller also has one. Let's stop passing around the internal hash array, which adds a bit of type safety. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24pack-objects: read delta base oid into object_id structLibravatar Jeff King1-17/+18
When we're considering reusing an on-disk delta, we get the oid of the base as a pointer to unsigned char bytes of the hash, either into the packfile itself (for REF_DELTA) or into the pack idx (using the revindex to convert the offset into an index entry). Instead, we'd prefer to use a more type-safe object_id as much as possible. We can get the pack idx using nth_packed_object_id() instead. For the packfile bytes, we can copy them out using oidread(). This doesn't even incur an extra copy overall, since the next thing we'd always do with that pointer is pass it to can_reuse_delta(), which needs an object_id anyway (and called oidread() itself). So this patch also converts that function to take the object_id directly. Note that we did previously use NULL as a sentinel value when the object isn't a delta. We could probably get away with using the null oid for this, but instead we'll use an explicit boolean flag, which should make things more obvious for people reading the code later. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24nth_packed_object_oid(): use customary integer returnLibravatar Jeff King1-2/+2
Our nth_packed_object_sha1() function returns NULL for error. So when we wrapped it with nth_packed_object_oid(), we kept the same semantics. But it's a bit funny, because the caller actually passes in an out parameter, and the pointer we return is just that same struct they passed to us (or NULL). It's not too terrible, but it does make the interface a little non-idiomatic. Let's switch to our usual "0 for success, negative for error" return value. Most callers either don't check it, or are trivially converted. The one that requires the biggest change is actually improved, as we can ditch an extra aliased pointer variable. Since we are changing the interface in a subtle way that the compiler wouldn't catch, let's also change the name to catch any topics in flight. We can drop the 'o' and make it nth_packed_object_id(). That's slightly shorter, but also less redundant since the 'o' stands for "object" already. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24builtin/pack-objects: make hash agnosticLibravatar brian m. carlson1-1/+1
Avoid hard-coding a hash size, instead preferring to use the_hash_algo. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14Merge branch 'mt/use-passed-repo-more-in-funcs'Libravatar Junio C Hamano1-1/+2
Some codepaths were given a repository instance as a parameter to work in the repository, but passed the_repository instance to its callees, which has been cleaned up (somewhat). * mt/use-passed-repo-more-in-funcs: sha1-file: allow check_object_signature() to handle any repo sha1-file: pass git_hash_algo to hash_object_file() sha1-file: pass git_hash_algo to write_object_file_prepare() streaming: allow open_istream() to handle any repo pack-check: use given repo's hash_algo at verify_packfile() cache-tree: use given repo's hash_algo at verify_one() diff: make diff_populate_filespec() honor its repo argument
2020-02-14Merge branch 'jk/packfile-reuse-cleanup'Libravatar Junio C Hamano1-49/+194
The way "git pack-objects" reuses objects stored in existing pack to generate its result has been improved. * jk/packfile-reuse-cleanup: pack-bitmap: don't rely on bitmap_git->reuse_objects pack-objects: add checks for duplicate objects pack-objects: improve partial packfile reuse builtin/pack-objects: introduce obj_is_packed() pack-objects: introduce pack.allowPackReuse csum-file: introduce hashfile_total() pack-bitmap: simplify bitmap_has_oid_in_uninteresting() pack-bitmap: uninteresting oid can be outside bitmapped packfile pack-bitmap: introduce bitmap_walk_contains() ewah/bitmap: introduce bitmap_word_alloc() packfile: expose get_delta_base() builtin/pack-objects: report reused packfile objects
2020-02-14pack-objects: support filters with bitmapsLibravatar Jeff King1-2/+1
Just as rev-list recently learned to combine filters and bitmaps, let's do the same for pack-objects. The infrastructure is all there; we just need to pass along our filter options, and the pack-bitmap code will decide to use bitmaps or not. This unsurprisingly makes things faster for partial clones of large repositories (here we're cloning linux.git): Test HEAD^ HEAD ------------------------------------------------------------------------------ 5310.11: simulated partial clone 38.94(37.28+5.87) 11.06(11.27+4.07) -71.6% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14pack-bitmap: basic noop bitmap filter infrastructureLibravatar Jeff King1-1/+1
Currently you can't use object filters with bitmaps, but we plan to support at least some filters with bitmaps. Let's introduce some infrastructure that will help us do that: - prepare_bitmap_walk() now accepts a list_objects_filter_options parameter (which can be NULL for no filtering; all the current callers pass this) - we'll bail early if the filter is incompatible with bitmaps (just as we would if there were no bitmaps at all). Currently all filters are incompatible. - we'll filter the resulting bitmap; since there are no supported filters yet, this is always a noop. There should be no behavior change yet, but we'll support some actual filters in a future patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14rev-list: allow commit-only bitmap traversalsLibravatar Jeff King1-1/+2
Ever since we added reachability bitmap support, we've been able to use it with rev-list to get the full list of objects, like: git rev-list --objects --use-bitmap-index --all But you can't do so without --objects, since we weren't ready to just show the commits. However, the internals of the bitmap code are mostly ready for this: they avoid opening up trees when walking to fill in the bitmaps. We just need to actually pass in the rev_info to traverse_bitmap_commit_list() so it knows which types to bother triggering our callback for. For completeness, the perf test now covers both the existing --objects case, as well as the new commits-only behavior (the objects one got way faster when we introduced bitmaps, but obviously isn't improved now). Here are numbers for linux.git: Test HEAD^ HEAD ------------------------------------------------------------------------ 5310.7: rev-list (commits) 8.29(8.10+0.19) 1.76(1.72+0.04) -78.8% 5310.8: rev-list (objects) 8.06(7.94+0.12) 8.14(7.94+0.13) +1.0% That run was cheating a little, as I didn't have any commit-graph in the repository, and we'd built it by default these days when running git-gc. Here are numbers with a commit-graph: Test HEAD^ HEAD ------------------------------------------------------------------------ 5310.7: rev-list (commits) 0.70(0.58+0.12) 0.51(0.46+0.04) -27.1% 5310.8: rev-list (objects) 6.20(6.09+0.10) 6.27(6.16+0.11) +1.1% Still an improvement, but a lot less impressive. We could have the perf script remove any commit-graph to show the out-sized effect, but it probably makes sense to leave it in what would be a more typical setup. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31streaming: allow open_istream() to handle any repoLibravatar Matheus Tavares1-1/+2
Some callers of open_istream() at archive-tar.c and archive-zip.c are capable of working on arbitrary repositories but the repo struct is not passed down to open_istream(), which uses the_repository internally. For now, that's not a problem since the said callers are only being called with the_repository. But to be consistent and avoid future problems, let's allow open_istream() to receive a struct repository and use that instead of the_repository. This parameter addition will also be used in a future patch to make sha1-file.c:check_object_signature() be able to work on arbitrary repos. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-23pack-objects: add checks for duplicate objectsLibravatar Jeff King1-1/+7
Additional checks are added in have_duplicate_entry() and obj_is_packed() to avoid duplicate objects in the reuse bitmap. It was probably buggy to not have such a check before. Git as a client would never both asks for a tag by sha1 and specify "include-tag", but libgit2 will, so a libgit2 client cloning from a Git server would trigger the bug. If a client both asks for a tag by sha1 and specifies "include-tag", we may end up including the tag in the reuse bitmap (due to the first thing), and then later adding it to the packlist (due to the second). This results in duplicate objects in the pack, which git chokes on. We should notice that we are already including it when doing the include-tag portion, and avoid adding it to the packlist. The simplest place to fix this is right in add_ref_tag(), where we could avoid peeling the tag at all if we know that we are already including it. However, this pushes the check instead into have_duplicate_entry(). This fixes not only this case, but also means that we cannot have any similar problems lurking in other code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-23pack-objects: improve partial packfile reuseLibravatar Jeff King1-44/+170
The old code to reuse deltas from an existing packfile just tried to dump a whole segment of the pack verbatim. That's faster than the traditional way of actually adding objects to the packing list, but it didn't kick in very often. This new code is really going for a middle ground: do _some_ per-object work, but way less than we'd traditionally do. The general strategy of the new code is to make a bitmap of objects from the packfile we'll include, and then iterate over it, writing out each object exactly as it is in our on-disk pack, but _not_ adding it to our packlist (which costs memory, and increases the search space for deltas). One complication is that if we're omitting some objects, we can't set a delta against a base that we're not sending. So we have to check each object in try_partial_reuse() to make sure we have its delta. About performance, in the worst case we might have interleaved objects that we are sending or not sending, and we'd have as many chunks as objects. But in practice we send big chunks. For instance, packing torvalds/linux on GitHub servers now reused 6.5M objects, but only needed ~50k chunks. Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-23builtin/pack-objects: introduce obj_is_packed()Libravatar Jeff King1-2/+7
Let's refactor the way we check if an object is packed by introducing obj_is_packed(). This function is now a simple wrapper around packlist_find(), but it will evolve in a following commit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-23pack-objects: introduce pack.allowPackReuseLibravatar Jeff King1-1/+7
Let's make it possible to configure if we want pack reuse or not. The main reason it might not be wanted is probably debugging and performance testing, though pack reuse _might_ cause larger packs, because we wouldn't consider the reused objects as bases for finding new deltas. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-10Fix spelling errors in code commentsLibravatar Elijah Newren1-1/+1
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-30Merge branch 'jk/misc-uninitialized-fixes'Libravatar Junio C Hamano1-19/+14
Various fixes to codepaths gcc 9 had trouble following dataflow. * jk/misc-uninitialized-fixes: pack-objects: drop packlist index_pos optimization test-read-cache: drop namelen variable diff-delta: set size out-parameter to 0 for NULL delta bulk-checkin: zero-initialize hashfile_checkpoint pack-objects: use object_id in packlist_alloc() git-am: handle missing "author" when parsing commit
2019-09-18Merge branch 'jk/drop-release-pack-memory'Libravatar Junio C Hamano1-11/+0
xmalloc() used to have a mechanism to ditch memory and address space resources as the last resort upon seeing an allocation failure from the underlying malloc(), which made the code complex and thread-unsafe with dubious benefit, as major memory resource users already do limit their uses with various other mechanisms. It has been simplified away. * jk/drop-release-pack-memory: packfile: drop release_pack_memory()
2019-09-13builtin/pack-objects: report reused packfile objectsLibravatar Jeff King1-2/+4
To see when packfile reuse kicks in or not, it is useful to show reused packfile objects statistics in the output of upload-pack. Helped-by: James Ramsay <james@jramsay.com.au> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-09Merge branch 'ds/feature-macros'Libravatar Junio C Hamano1-4/+4
A mechanism to affect the default setting for a (related) group of configuration variables is introduced. * ds/feature-macros: repo-settings: create feature.experimental setting repo-settings: create feature.manyFiles setting repo-settings: parse core.untrackedCache commit-graph: turn on commit-graph by default t6501: use 'git gc' in quiet mode repo-settings: consolidate some config settings
2019-09-06pack-objects: drop packlist index_pos optimizationLibravatar Jeff King1-19/+14
Once upon a time, the code to add an object to our packing list in pack-objects all lived in a single function. It computed the position within the hash table once, then used it to check if the object was already present, and if not, to add it. Later, in 2834bc27c1 (pack-objects: refactor the packing list, 2013-10-24), this was split into two functions: packlist_find() and packlist_alloc(). We ended up with an "index_pos" variable that gets passed through several functions to make it from one to the other. The resulting code is rather confusing to follow. The "index_pos" variable is sometimes undefined, if we don't yet have a hash table. This works out in practice because in that case packlist_alloc() won't use it at all, since it will have to create/grow the hash table. But it's hard to verify that, and it does cause gcc 9.2.1's -Wmaybe-uninitialized to complain when compiled with "-flto -O3" (rightfully, since we do pass the uninitialized value as a function parameter, even if nobody ends up using it). All of this is to save computing the hash index again when we're inserting into the hash table, which I found doesn't make a measurable difference in the program runtime (which is not surprising, since we're doing all kinds of other heavyweight things for each object). Let's just drop this index_pos variable entirely, simplifying the code (and pleasing the compiler). We might be better still refactoring this custom hash table to use one of our existing implementations (an oidmap, or a kh_oid_map). I stopped short of that here, but this would be the likely first step towards that anyway. Reported-by: Stephan Beyer <s-beyer@gmx.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-06pack-objects: use object_id in packlist_alloc()Libravatar Jeff King1-1/+1
The only caller of packlist_alloc() already has a "struct object_id", and we immediately copy the hash they pass us into our own object_id. Let's avoid the unnecessary round-trip to a raw sha1 pointer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-13repo-settings: consolidate some config settingsLibravatar Derrick Stolee1-4/+4
There are a few important config settings that are not loaded during git_default_config. These are instead loaded on-demand. Centralize these config options to a single scan, and store all of the values in a repo_settings struct. The values for each setting are initialized as negative to indicate "unset". This centralization will be particularly important in a later change to introduce "meta" config settings that change the defaults for these config settings. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-13packfile: drop release_pack_memory()Libravatar Jeff King1-11/+0
Long ago, in 97bfeb34df (Release pack windows before reporting out of memory., 2006-12-24), we taught xmalloc() and friends to try unmapping pack windows when malloc() failed. It's unlikely that his helps a lot in practice, and it has some downsides. First, the downsides: 1. It makes xmalloc() not thread-safe. We've worked around this in pack-objects.c, which installs its own locking version of the try_to_free_routine(). But other threaded code doesn't. 2. It makes the system as a whole harder to reason about. Functions which allocate heap memory under the hood may have farther-reaching effects than expected. That might be worth the tradeoff if there's a benefit. But in practice, it seems unlikely. We're generally dealing with mmap'd files, so the OS is going to do a much better job at responding to memory pressure by dropping individual pages (the exception is systems with NO_MMAP, but even there the OS can probably respond just as well with swapping). So the only thing we're really freeing is address space. On 64-bit systems, we have plenty of that to go around. On 32-bit systems, it could possibly help. But around the same time we made two other changes: 77ccc5bbd1 (Introduce new config option for mmap limit., 2006-12-23) and 60bb8b1453 (Fully activate the sliding window pack access., 2006-12-23). Together that means that a 32-bit system should have no more than 256MB total of packed-git mmaps at one time, split between a few 32MB windows. It's unlikely we have any address space problems since then, but we don't have any data since the features were all added at the same time. Likewise, xmmap() will try to free memory. At first glance, it seems like we'd need this (when we try to mmap a new window, we might need to close an old one to save address space on a 32-bit system). But we're saved again by core.packedGitLimit: if we're going to exceed our 256MB limit, we'll close an existing window before we even call mmap(). So it seems unlikely that this feature is actually doing anything useful. And while we don't have reports of it harming anything (probably because it rarely if ever kicks in), it would be nice to simplify the system overall. This patch drops the whole try_to_free system from xmalloc(), as well as the manual pack memory release in xmmap(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>