summaryrefslogtreecommitdiff
path: root/packfile.c
AgeCommit message (Collapse)AuthorFilesLines
2019-09-30Merge branch 'rs/get-tagged-oid'Libravatar Junio C Hamano1-1/+1
Code cleanup. * rs/get-tagged-oid: use get_tagged_oid() tag: factor out get_tagged_oid()
2019-09-18Merge branch 'cc/multi-promisor'Libravatar Junio C Hamano1-1/+2
Teach the lazy clone machinery that there can be more than one promisor remote and consult them in order when downloading missing objects on demand. * cc/multi-promisor: Move core_partial_clone_filter_default to promisor-remote.c Move repository_format_partial_clone to promisor-remote.c Remove fetch-object.{c,h} in favor of promisor-remote.{c,h} remote: add promisor and partial clone config to the doc partial-clone: add multiple remotes in the doc t0410: test fetching from many promisor remotes builtin/fetch: remove unique promisor remote limitation promisor-remote: parse remote.*.partialclonefilter Use promisor_remote_get_direct() and has_promisor_remote() promisor-remote: use repository_format_partial_clone promisor-remote: add promisor_remote_reinit() promisor-remote: implement promisor_remote_get_direct() Add initial support for many promisor remotes fetch-object: make functions return an error code t0410: remove pipes after git commands
2019-09-05use get_tagged_oid()Libravatar René Scharfe1-1/+1
Avoid derefencing ->tagged without checking for NULL by using the convenience wrapper for getting the ID of the tagged object. It die()s when encountering a broken tag instead of segfaulting. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-13packfile: drop release_pack_memory()Libravatar Jeff King1-18/+0
Long ago, in 97bfeb34df (Release pack windows before reporting out of memory., 2006-12-24), we taught xmalloc() and friends to try unmapping pack windows when malloc() failed. It's unlikely that his helps a lot in practice, and it has some downsides. First, the downsides: 1. It makes xmalloc() not thread-safe. We've worked around this in pack-objects.c, which installs its own locking version of the try_to_free_routine(). But other threaded code doesn't. 2. It makes the system as a whole harder to reason about. Functions which allocate heap memory under the hood may have farther-reaching effects than expected. That might be worth the tradeoff if there's a benefit. But in practice, it seems unlikely. We're generally dealing with mmap'd files, so the OS is going to do a much better job at responding to memory pressure by dropping individual pages (the exception is systems with NO_MMAP, but even there the OS can probably respond just as well with swapping). So the only thing we're really freeing is address space. On 64-bit systems, we have plenty of that to go around. On 32-bit systems, it could possibly help. But around the same time we made two other changes: 77ccc5bbd1 (Introduce new config option for mmap limit., 2006-12-23) and 60bb8b1453 (Fully activate the sliding window pack access., 2006-12-23). Together that means that a 32-bit system should have no more than 256MB total of packed-git mmaps at one time, split between a few 32MB windows. It's unlikely we have any address space problems since then, but we don't have any data since the features were all added at the same time. Likewise, xmmap() will try to free memory. At first glance, it seems like we'd need this (when we try to mmap a new window, we might need to close an old one to save address space on a 32-bit system). But we're saved again by core.packedGitLimit: if we're going to exceed our 256MB limit, we'll close an existing window before we even call mmap(). So it seems unlikely that this feature is actually doing anything useful. And while we don't have reports of it harming anything (probably because it rarely if ever kicks in), it would be nice to simplify the system overall. This patch drops the whole try_to_free system from xmalloc(), as well as the manual pack memory release in xmmap(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-19Merge branch 'ds/midx-expire-repack'Libravatar Junio C Hamano1-0/+28
"git multi-pack-index" learned expire and repack subcommands. * ds/midx-expire-repack: t5319: use 'test-tool path-utils' instead of 'ls -l' t5319-multi-pack-index.sh: test batch size zero midx: add test that 'expire' respects .keep files multi-pack-index: test expire while adding packs midx: implement midx_repack() multi-pack-index: prepare 'repack' subcommand multi-pack-index: implement 'expire' subcommand midx: refactor permutation logic and pack sorting midx: simplify computation of pack name lengths multi-pack-index: prepare for 'expire' subcommand Docs: rearrange subcommands for multi-pack-index repack: refactor pack deletion for future use
2019-07-09Merge branch 'rs/copy-array'Libravatar Junio C Hamano1-3/+3
Code clean-up. * rs/copy-array: use COPY_ARRAY for copying arrays coccinelle: use COPY_ARRAY for copying arrays
2019-07-09Merge branch 'ds/close-object-store'Libravatar Junio C Hamano1-1/+4
The commit-graph file is now part of the "files that the runtime may keep open file descriptors on, all of which would need to be closed when done with the object store", and the file descriptor to an existing commit-graph file now is closed before "gc" finalizes a new instance to replace it. * ds/close-object-store: packfile: rename close_all_packs to close_object_store packfile: close commit-graph in close_all_packs commit-graph: use raw_object_store when closing
2019-06-25Use promisor_remote_get_direct() and has_promisor_remote()Libravatar Christian Couder1-1/+2
Instead of using the repository_format_partial_clone global and fetch_objects() directly, let's use has_promisor_remote() and promisor_remote_get_direct(). This way all the configured promisor remotes will be taken into account, not only the one specified by extensions.partialClone. Also when cloning or fetching using a partial clone filter, remote.origin.promisor will be set to "true" instead of setting extensions.partialClone to "origin". This makes it possible to use many promisor remote just by fetching from them. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-17use COPY_ARRAY for copying arraysLibravatar René Scharfe1-3/+3
Convert calls of memcpy(3) to use COPY_ARRAY, which shortens and simplifies the code a bit. Patch generated by Coccinelle and contrib/coccinelle/array.cocci. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-13Merge branch 'mh/import-transport-fd-fix'Libravatar Junio C Hamano1-1/+1
The ownership rule for the file descriptor to fast-import remote backend was mixed up, leading to unrelated file descriptor getting closed, which has been fixed. * mh/import-transport-fd-fix: Use xmmap_gently instead of xmmap in use_pack dup() the input fd for fast-import used for remote helpers
2019-06-12packfile: rename close_all_packs to close_object_storeLibravatar Derrick Stolee1-1/+1
The close_all_packs() method is now responsible for more than just pack-files. It also closes the commit-graph and the multi-pack-index. Rename the function to be more descriptive of its larger role. The name also fits because the input parameter is a raw_object_store. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12packfile: close commit-graph in close_all_packsLibravatar Derrick Stolee1-0/+3
The close_all_packs() method is used to close all read handles to pack-files and the multi-pack-index before running 'git gc --auto'. This is particularly important on the Windows platform, where read handles block any writes to those files. Replacing one of these files with a rename() will fail in this situation. The commit-graph also performs a rename, so is susceptable to this problem. We are careful to close the commit-graph before writing, but that doesn't work when a 'git fetch' (or similar) process runs 'git gc --auto' which may write a commit-graph. Here, close the commit-graph as part of close_all_packs(). Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11repack: refactor pack deletion for future useLibravatar Derrick Stolee1-0/+28
The repack builtin deletes redundant pack-files and their associated .idx, .promisor, .bitmap, and .keep files. We will want to re-use this logic in the future for other types of repack, so pull the logic into 'unlink_pack_path()' in packfile.c. The 'ignore_keep' parameter is enabled for the use in repack, but will be important for a future caller. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-19Merge branch 'ds/midx-too-many-packs'Libravatar Junio C Hamano1-21/+9
The code to generate the multi-pack idx file was not prepared to see too many packfiles and ran out of open file descriptor, which has been corrected. * ds/midx-too-many-packs: midx: add packs to packed_git linked list midx: pass a repository pointer
2019-05-16Use xmmap_gently instead of xmmap in use_packLibravatar Mike Hommey1-1/+1
use_pack has its own error message on mmap error, but it can't be reached when using xmmap, which dies with its own error. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-09Merge branch 'nd/sha1-name-c-wo-the-repository'Libravatar Junio C Hamano1-7/+7
Further code clean-up to allow the lowest level of name-to-object mapping layer to work with a passed-in repository other than the default one. * nd/sha1-name-c-wo-the-repository: (34 commits) sha1-name.c: remove the_repo from get_oid_mb() sha1-name.c: remove the_repo from other get_oid_* sha1-name.c: remove the_repo from maybe_die_on_misspelt_object_name submodule-config.c: use repo_get_oid for reading .gitmodules sha1-name.c: add repo_get_oid() sha1-name.c: remove the_repo from get_oid_with_context_1() sha1-name.c: remove the_repo from resolve_relative_path() sha1-name.c: remove the_repo from diagnose_invalid_index_path() sha1-name.c: remove the_repo from handle_one_ref() sha1-name.c: remove the_repo from get_oid_1() sha1-name.c: remove the_repo from get_oid_basic() sha1-name.c: remove the_repo from get_describe_name() sha1-name.c: remove the_repo from get_oid_oneline() sha1-name.c: add repo_interpret_branch_name() sha1-name.c: remove the_repo from interpret_branch_mark() sha1-name.c: remove the_repo from interpret_nth_prior_checkout() sha1-name.c: remove the_repo from get_short_oid() sha1-name.c: add repo_for_each_abbrev() sha1-name.c: store and use repo in struct disambiguate_state sha1-name.c: add repo_find_unique_abbrev_r() ...
2019-05-07midx: add packs to packed_git linked listLibravatar Derrick Stolee1-20/+8
The multi-pack-index allows searching for objects across multiple packs using one object list. The original design gains many of these performance benefits by keeping the packs in the multi-pack-index out of the packed_git list. Unfortunately, this has one major drawback. If the multi-pack-index covers thousands of packs, and a command loads many of those packs, then we can hit the limit for open file descriptors. The close_one_pack() method is used to limit this resource, but it only looks at the packed_git list, and uses an LRU cache to prevent thrashing. Instead of complicating this close_one_pack() logic to include direct references to the multi-pack-index, simply add the packs opened by the multi-pack-index to the packed_git list. This immediately solves the file-descriptor limit problem, but requires some extra steps to avoid performance issues or other problems: 1. Create a multi_pack_index bit in the packed_git struct that is one if and only if the pack was loaded from a multi-pack-index. 2. Skip packs with the multi_pack_index bit when doing object lookups and abbreviations. These algorithms already check the multi-pack-index before the packed_git struct. This has a very small performance hit, as we need to walk more packed_git structs. This is acceptable, since these operations run binary search on the other packs, so this walk-and-ignore logic is very fast by comparison. 3. When closing a multi-pack-index file, do not close its packs, as those packs will be closed using close_all_packs(). In some cases, such as 'git repack', we run 'close_midx()' without also closing the packs, so we need to un-set the multi_pack_index bit in those packs. This is necessary, and caught by running t6501-freshen-objects.sh with GIT_TEST_MULTI_PACK_INDEX=1. To manually test this change, I inserted trace2 logging into close_pack_fd() and set pack_max_fds to 10, then ran 'git rev-list --all --objects' on a copy of the Git repo with 300+ pack-files and a multi-pack-index. The logs verified the packs are closed as we read them beyond the file descriptor limit. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-07midx: pass a repository pointerLibravatar Derrick Stolee1-2/+2
Much of the multi-pack-index code focuses on the multi_pack_index struct, and so we only pass a pointer to the current one. However, we will insert a dependency on the packed_git linked list in a future change, so we will need a repository reference. Inserting these parameters is a significant enough change to split out. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-25Merge branch 'bc/hash-transition-16'Libravatar Junio C Hamano1-3/+3
Conversion from unsigned char[20] to struct object_id continues. * bc/hash-transition-16: (35 commits) gitweb: make hash size independent Git.pm: make hash size independent read-cache: read data in a hash-independent way dir: make untracked cache extension hash size independent builtin/difftool: use parse_oid_hex refspec: make hash size independent archive: convert struct archiver_args to object_id builtin/get-tar-commit-id: make hash size independent get-tar-commit-id: parse comment record hash: add a function to lookup hash algorithm by length remote-curl: make hash size independent http: replace sha1_to_hex http: compute hash of downloaded objects using the_hash_algo http: replace hard-coded constant with the_hash_algo http-walker: replace sha1_to_hex http-push: remove remaining uses of sha1_to_hex http-backend: allow 64-character hex names http-push: convert to use the_hash_algo builtin/pull: make hash-size independent builtin/am: make hash size independent ...
2019-04-25Merge branch 'jk/server-info-rabbit-hole'Libravatar Junio C Hamano1-3/+15
Code clean-up around a much-less-important-than-it-used-to-be update_server_info() funtion. * jk/server-info-rabbit-hole: update_info_refs(): drop unused force parameter server-info: drop objdirlen pointer arithmetic server-info: drop nr_alloc struct member server-info: use strbuf to read old info/packs file server-info: simplify cleanup in parse_pack_def() server-info: fix blind pointer arithmetic http: simplify parsing of remote objects/info/packs packfile: fix pack basename computation midx: check both pack and index names for containment t5319: drop useless --buffer from cat-file t5319: fix bogus cat-file argument pack-revindex: open index if necessary packfile.h: drop extern from function declarations
2019-04-16packfile: fix pack basename computationLibravatar Jeff King1-1/+11
When we have a multi-pack-index that covers many packfiles, we try to avoid opening the .idx for those packfiles. To do that we feed the pack name to midx_contains_pack(). But that function wants to see only the basename, which we compute using strrchr() to find the final slash. But that leaves an extra "/" at the start of our string. We can fix this by incrementing the pointer. That also raises the question of what to do when the name does not have a '/' at all. This should generally not happen (we always find files in "pack/"), but it doesn't hurt to be defensive here. Let's wrap all of that up in a helper function and make it publicly available, since a later patch will need to use it, too. The tests don't notice because there's nothing about opening those .idx files that would cause us to give incorrect output. It's just a little slower. The new test checks this case by corrupting the covered .idx, and then making sure we don't complain about it. We also have to tweak t5570, which intentionally corrupts a .idx file and expects us to notice it. When run with GIT_TEST_MULTI_PACK_INDEX, this will fail since we now will (correctly) not bother opening the .idx at all. We can fix that by unconditionally dropping any midx that's there, which ensures we'll have to read the .idx. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16pack-revindex: open index if necessaryLibravatar Jeff King1-2/+4
We can't create a pack revindex if we haven't actually looked at the index. Normally we would never get as far as creating a revindex without having already been looking in the pack, so this code never bothered to double-check that pack->index_data had been loaded. But with the new multi-pack-index feature, many code paths might not load the individual pack .idx at all (they'd find objects via the midx and then open the .pack, but not its index). This can't yet be triggered in practice, because a bug in the midx code means we accidentally open up the individual .idx files anyway. But in preparation for fixing that, let's have the revindex code check that everything it needs has been loaded. In most cases this will just be a quick noop. But note that this does introduce a possibility of error (if we have to open the index and it's corrupt), so load_pack_revindex() now returns a result code, and callers need to handle the error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-08packfile.c: add repo_approximate_object_count()Libravatar Nguyễn Thái Ngọc Duy1-7/+7
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01object-store: rename and expand packed_git's sha1 memberLibravatar brian m. carlson1-3/+3
This member is used to represent the pack checksum of the pack in question. Expand this member to be GIT_MAX_RAWSZ bytes in length so it works with longer hashes and rename it to be "hash" instead of "sha1". This transformation was made with a change to the definition and the following semantic patch: @@ struct packed_git *E1; @@ - E1->sha1 + E1->hash @@ struct packed_git E1; @@ - E1.sha1 + E1.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-22midx: during verify group objects by packfile to speed verificationLibravatar Jeff Hostetler1-1/+1
Teach `multi-pack-index verify` to sort the set of object by packfile so that only one packfile needs to be open at a time. This is a performance improvement. Previously, objects were verified in OID order. This essentially requires all packfiles to be open at the same time. If the number of packfiles exceeds the open file limit, packfiles would be LRU-closed and re-opened many times. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-05Merge branch 'sb/more-repo-in-api'Libravatar Junio C Hamano1-2/+3
The in-core repository instances are passed through more codepaths. * sb/more-repo-in-api: (23 commits) t/helper/test-repository: celebrate independence from the_repository path.h: make REPO_GIT_PATH_FUNC repository agnostic commit: prepare free_commit_buffer and release_commit_memory for any repo commit-graph: convert remaining functions to handle any repo submodule: don't add submodule as odb for push submodule: use submodule repos for object lookup pretty: prepare format_commit_message to handle arbitrary repositories commit: prepare logmsg_reencode to handle arbitrary repositories commit: prepare repo_unuse_commit_buffer to handle any repo commit: prepare get_commit_buffer to handle any repo commit-reach: prepare in_merge_bases[_many] to handle any repo commit-reach: prepare get_merge_bases to handle any repo commit-reach.c: allow get_merge_bases_many_0 to handle any repo commit-reach.c: allow remove_redundant to handle any repo commit-reach.c: allow merge_bases_many to handle any repo commit-reach.c: allow paint_down_to_common to handle any repo commit: allow parse_commit* to handle any repo object: parse_object to honor its repository argument object-store: prepare has_{sha1, object}_file to handle any repo object-store: prepare read_object_file to deal with any repo ...
2019-01-29Merge branch 'bc/tree-walk-oid'Libravatar Junio C Hamano1-1/+1
The code to walk tree objects has been taught that we may be working with object names that are not computed with SHA-1. * bc/tree-walk-oid: cache: make oidcpy always copy GIT_MAX_RAWSZ bytes tree-walk: store object_id in a separate member match-trees: use hashcpy to splice trees match-trees: compute buffer offset correctly when splicing tree-walk: copy object ID before use
2019-01-15tree-walk: store object_id in a separate memberLibravatar brian m. carlson1-1/+1
When parsing a tree, we read the object ID directly out of the tree buffer. This is normally fine, but such an object ID cannot be used with oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1, there may not be that many bytes to copy. Instead, store the object ID in a separate struct member. Since we can no longer efficiently compute the path length, store that information as well in struct name_entry. Ensure we only copy the object ID into the new buffer if the path length is nonzero, as some callers will pass us an empty path with no object ID following it, and we will not want to read past the end of the buffer. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08object-store: factor out odb_clear_loose_cache()Libravatar René Scharfe1-5/+2
Add and use a function for emptying the loose object cache, so callers don't have to know any of its implementation details. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-04Merge branch 'jk/loose-object-cache'Libravatar Junio C Hamano1-6/+14
Code clean-up with optimization for the codepath that checks (non-)existence of loose objects. * jk/loose-object-cache: odb_load_loose_cache: fix strbuf leak fetch-pack: drop custom loose object cache sha1-file: use loose object cache for quick existence check object-store: provide helpers for loose_objects_cache sha1-file: use an object_directory for the main object dir handle alternates paths the same as the main object dir sha1_file_name(): overwrite buffer instead of appending rename "alternate_object_database" to "object_directory" submodule--helper: prefer strip_suffix() to ends_with() fsck: do not reuse child_process structs
2018-11-13Merge branch 'ds/test-multi-pack-index'Libravatar Junio C Hamano1-0/+5
Tests for the recently introduced multi-pack index machinery. * ds/test-multi-pack-index: packfile: close multi-pack-index in close_all_packs multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX midx: close multi-pack-index on repack midx: fix broken free() in close_midx()
2018-11-13object-store: provide helpers for loose_objects_cacheLibravatar Jeff King1-0/+8
Our object_directory struct has a loose objects cache that all users of the struct can see. But the only one that knows how to load the cache is find_short_object_filename(). Let's extract that logic in to a reusable function. While we're at it, let's also reset the cache when we re-read the object directories. This shouldn't have an impact on performance, as re-reads are meant to be rare (and are already expensive, so we avoid them with things like OBJECT_INFO_QUICK). Since the cache is already meant to be an approximation, it's tempting to skip even this bit of safety. But it's necessary to allow more code to use it. For instance, fetch-pack explicitly re-reads the object directory after performing its fetch, and would be confused if we didn't clear the cache. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-13sha1-file: use an object_directory for the main object dirLibravatar Jeff King1-5/+5
Our handling of alternate object directories is needlessly different from the main object directory. As a result, many places in the code basically look like this: do_something(r->objects->objdir); for (odb = r->objects->alt_odb_list; odb; odb = odb->next) do_something(odb->path); That gets annoying when do_something() is non-trivial, and we've resorted to gross hacks like creating fake alternates (see find_short_object_filename()). Instead, let's give each raw_object_store a unified list of object_directory structs. The first will be the main store, and everything after is an alternate. Very few callers even care about the distinction, and can just loop over the whole list (and those who care can just treat the first element differently). A few observations: - we don't need r->objects->objectdir anymore, and can just mechanically convert that to r->objects->odb->path - object_directory's path field needs to become a real pointer rather than a FLEX_ARRAY, in order to fill it with expand_base_dir() - we'll call prepare_alt_odb() earlier in many functions (i.e., outside of the loop). This may result in us calling it even when our function would be satisfied looking only at the main odb. But this doesn't matter in practice. It's not a very expensive operation in the first place, and in the majority of cases it will be a noop. We call it already (and cache its results) in prepare_packed_git(), and we'll generally check packs before loose objects. So essentially every program is going to call it immediately once per program. Arguably we should just prepare_alt_odb() immediately upon setting up the repository's object directory, which would save us sprinkling calls throughout the code base (and forgetting to do so has been a source of subtle bugs in the past). But I've stopped short of that here, since there are already a lot of other moving parts in this patch. - Most call sites just get shorter. The check_and_freshen() functions are an exception, because they have entry points to handle local and nonlocal directories separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-13rename "alternate_object_database" to "object_directory"Libravatar Jeff King1-4/+4
In preparation for unifying the handling of alt odb's and the normal repo object directory, let's use a more neutral name. This patch is purely mechanical, swapping the type name, and converting any variables named "alt" to "odb". There should be no functional change, but it will reduce the noise in subsequent diffs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-30Merge branch 'bc/hash-transition-part-15'Libravatar Junio C Hamano1-2/+3
More codepaths are moving away from hardcoded hash sizes. * bc/hash-transition-part-15: rerere: convert to use the_hash_algo submodule: make zero-oid comparison hash function agnostic apply: rename new_sha1_prefix and old_sha1_prefix apply: replace hard-coded constants tag: express constant in terms of the_hash_algo transport: use parse_oid_hex instead of a constant upload-pack: express constants in terms of the_hash_algo refs/packed-backend: express constants using the_hash_algo packfile: express constants in terms of the_hash_algo pack-revindex: express constants in terms of the_hash_algo builtin/fetch-pack: remove constants with parse_oid_hex builtin/mktree: remove hard-coded constant builtin/repack: replace hard-coded constants pack-bitmap-write: use GIT_MAX_RAWSZ for allocation object_id.cocci: match only expressions of type 'struct object_id'
2018-10-26packfile: close multi-pack-index in close_all_packsLibravatar Derrick Stolee1-0/+5
Whenever we delete pack-files from the object directory, we need to also delete the multi-pack-index that may refer to those objects. Sometimes, this connection is obvious, like during a repack. Other times, this is less obvious, like when gc calls a repack command and then does other actions on the objects, like write a commit-graph file. The pattern we use to avoid out-of-date in-memory packed_git structs is to call close_all_packs(). This should also call close_midx(). Since we already pass an object store to close_all_packs(), this is a nicely scoped operation. This fixes a test failure when running t6500-gc.sh with GIT_TEST_MULTI_PACK_INDEX=1. Reported-by: Szeder Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19packfile: allow has_packed_and_bad to handle arbitrary repositoriesLibravatar Stefan Beller1-2/+3
has_packed_and_bad is not widely used, so just migrate it all at once. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-15fuzz: add fuzz testing for packfile indices.Libravatar Josh Steadmon1-19/+25
Breaks the majority of check_packed_git_idx() into a separate function, load_idx(). The latter function operates on arbitrary buffers, which makes it suitable as a fuzzing test target. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-15packfile: express constants in terms of the_hash_algoLibravatar brian m. carlson1-2/+3
Replace uses of GIT_SHA1_RAWSZ with references to the_hash_algo to avoid dependence on a particular hash length. It's likely that in the future, we'll update the pack format to indicate what hash algorithm it uses, and then this code will change. However, at least on an interim basis, make it easier to develop on a pure SHA-256 Git by using the_hash_algo here. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17Merge branch 'jk/cocci'Libravatar Junio C Hamano1-6/+6
spatch transformation to replace boolean uses of !hashcmp() to newly introduced oideq() is added, and applied, to regain performance lost due to support of multiple hash algorithms. * jk/cocci: show_dirstat: simplify same-content check read-cache: use oideq() in ce_compare functions convert hashmap comparison functions to oideq() convert "hashcmp() != 0" to "!hasheq()" convert "oidcmp() != 0" to "!oideq()" convert "hashcmp() == 0" to hasheq() convert "oidcmp() == 0" to oideq() introduce hasheq() and oideq() coccinelle: use <...> for function exclusion
2018-08-29convert "hashcmp() != 0" to "!hasheq()"Libravatar Jeff King1-1/+1
This rounds out the previous three patches, covering the inequality logic for the "hash" variant of the functions. As with the previous three, the accompanying code changes are the mechanical result of applying the coccinelle patch; see those patches for more discussion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29convert "hashcmp() == 0" to hasheq()Libravatar Jeff King1-5/+5
This is the partner patch to the previous one, but covering the "hash" variants instead of "oid". Note that our coccinelle rule is slightly more complex to avoid triggering the call in hasheq(). I didn't bother to add a new rule to convert: - hasheq(E1->hash, E2->hash) + oideq(E1, E2) Since these are new functions, there won't be any such existing callers. And since most of the code is already using oideq, we're not likely to introduce new ones. We might still see "!hashcmp(E1->hash, E2->hash)" from topics in flight. But because our new rule comes after the existing ones, that should first get converted to "!oidcmp(E1, E2)" and then to "oideq(E1, E2)". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20treewide: use get_all_packsLibravatar Derrick Stolee1-1/+1
There are many places in the codebase that want to iterate over all packfiles known to Git. The purposes are wide-ranging, and those that can take advantage of the multi-pack-index already do. So, use get_all_packs() instead of get_packed_git() to be sure we are iterating over all packfiles. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20packfile: add all_packs listLibravatar Derrick Stolee1-0/+27
If a repo contains a multi-pack-index, then the packed_git list does not contain the packfiles that are covered by the multi-pack-index. This is important for doing object lookups, abbreviations, and approximating object count. However, there are many operations that really want to iterate over all packfiles. Create a new 'all_packs' linked list that contains this list, starting with the packfiles in the multi-pack-index and then continuing along the packed_git linked list. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20midx: stop reporting garbageLibravatar Derrick Stolee1-3/+4
When prepare_packed_git is called with the report_garbage method initialized, we report unexpected files in the objects directory as garbage. Stop reporting the multi-pack-index and the pack-files it covers as garbage. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20multi-pack-index: store local propertyLibravatar Derrick Stolee1-2/+2
A pack-file is 'local' if it is stored within the usual object directory. If it is stored in an alternate, it is non-local. Pack-files are stored using a 'pack_local' member in the packed_git struct. Add a similar 'local' member to the multi_pack_index struct and 'local' parameters to the methods that load and prepare multi- pack-indexes. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20Sync 'ds/multi-pack-index' to v2.19.0-rc0Libravatar Junio C Hamano1-42/+127
* ds/multi-pack-index: (23 commits) midx: clear midx on repack packfile: skip loading index if in multi-pack-index midx: prevent duplicate packfile loads midx: use midx in approximate_object_count midx: use existing midx when writing new one midx: use midx in abbreviation calculations midx: read objects from multi-pack-index config: create core.multiPackIndex setting midx: write object offsets midx: write object id fanout chunk midx: write object ids in a chunk midx: sort and deduplicate objects from packfiles midx: read pack names into array multi-pack-index: write pack names in chunk multi-pack-index: read packfile list packfile: generalize pack directory list t5319: expand test data multi-pack-index: load into memory midx: write header information to lockfile multi-pack-index: add 'write' verb ...
2018-08-13for_each_packed_object: support iterating in pack-orderLibravatar Jeff King1-5/+16
We currently iterate over objects within a pack in .idx order, which uses the object hashes. That means that it is effectively random with respect to the location of the object within the pack. If you're going to access the actual object data, there are two reasons to move linearly through the pack itself: 1. It improves the locality of access in the packfile. In the cold-cache case, this may mean fewer disk seeks, or better usage of disk cache. 2. We store related deltas together in the packfile. Which means that the delta base cache can operate much more efficiently if we visit all of those related deltas in sequence, as the earlier items are likely to still be in the cache. Whereas if we visit the objects in random order, our cache entries are much more likely to have been evicted by unrelated deltas in the meantime. So in general, if you're going to access the object contents pack order is generally going to end up more efficient. But if you're simply generating a list of object names, or if you're going to end up sorting the result anyway, you're better off just using the .idx order, as finding the pack order means generating the in-memory pack-revindex. According to the numbers in 8b8dfd5132 (pack-revindex: radix-sort the revindex, 2013-07-11), that takes about 200ms for linux.git, and 20ms for git.git (those numbers are a few years old but are still a good ballpark). That makes it a good optimization for some cases (we can save tens of seconds in git.git by having good locality of delta access, for a 20ms cost), but a bad one for others (e.g., right now "cat-file --batch-all-objects --batch-check="%(objectname)" is 170ms in git.git, so adding 20ms to that is noticeable). Hence this patch makes it an optional flag. You can't actually do any interesting timings yet, as it's not plumbed through to any user-facing tools like cat-file. That will come in a later patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13for_each_*_object: take flag arguments as enumLibravatar Jeff King1-1/+2
It's not wrong to pass our flags in an "unsigned", as we know it will be at least as large as the enum. However, using the enum in the declaration makes it more obvious where to find the list of flags. While we're here, let's also drop the "extern" noise-words from the declarations, per our modern coding style. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20packfile: skip loading index if in multi-pack-indexLibravatar Derrick Stolee1-2/+17
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>