summaryrefslogtreecommitdiff
path: root/list-objects.c
AgeCommit message (Collapse)AuthorFilesLines
2019-02-06Merge branch 'ds/push-sparse-tree-walk'Libravatar Junio C Hamano1-11/+59
"git pack-objects" learned another algorithm to compute the set of objects to send, that trades the resulting packfile off to save traversal cost to favor small pushes. * ds/push-sparse-tree-walk: pack-objects: create GIT_TEST_PACK_SPARSE pack-objects: create pack.useSparse setting revision: implement sparse algorithm list-objects: consume sparse tree walk revision: add mark_tree_uninteresting_sparse
2019-01-29Merge branch 'bc/tree-walk-oid'Libravatar Junio C Hamano1-3/+3
The code to walk tree objects has been taught that we may be working with object names that are not computed with SHA-1. * bc/tree-walk-oid: cache: make oidcpy always copy GIT_MAX_RAWSZ bytes tree-walk: store object_id in a separate member match-trees: use hashcpy to splice trees match-trees: compute buffer offset correctly when splicing tree-walk: copy object ID before use
2019-01-17list-objects: consume sparse tree walkLibravatar Derrick Stolee1-11/+59
When creating a pack-file using 'git pack-objects --revs' we provide a list of interesting and uninteresting commits. For example, a push operation would make the local topic branch be interesting and the known remote refs as uninteresting. We want to discover the set of new objects to send to the server as a thin pack. We walk these commits until we discover a frontier of commits such that every commit walk starting at interesting commits ends in a root commit or unintersting commit. We then need to discover which non-commit objects are reachable from uninteresting commits. This commit walk is not changing during this series. The mark_edges_uninteresting() method in list-objects.c iterates on the commit list and does the following: * If the commit is UNINTERSTING, then mark its root tree and every object it can reach as UNINTERESTING. * If the commit is interesting, then mark the root tree of every UNINTERSTING parent (and all objects that tree can reach) as UNINTERSTING. At the very end, we repeat the process on every commit directly given to the revision walk from stdin. This helps ensure we properly cover shallow commits that otherwise were not included in the frontier. The logic to recursively follow trees is in the mark_tree_uninteresting() method in revision.c. The algorithm avoids duplicate work by not recursing into trees that are already marked UNINTERSTING. Add a new 'sparse' option to the mark_edges_uninteresting() method that performs this logic in a slightly different way. As we iterate over the commits, we add all of the root trees to an oidset. Then, call mark_trees_uninteresting_sparse() on that oidset. Note that we include interesting trees in this process. The current implementation of mark_trees_unintersting_sparse() will walk the same trees as the old logic, but this will be replaced in a later change. Add a '--sparse' flag in 'git pack-objects' to call this new logic. Add a new test script t/t5322-pack-objects-sparse.sh that tests this option. The tests currently demonstrate that the resulting object list is the same as the old algorithm. This includes a case where both algorithms pack an object that is not needed by a remote due to limits on the explored set of trees. When the sparse algorithm is changed in a later commit, we will add a test that demonstrates a change of behavior in some cases. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15tree-walk: store object_id in a separate memberLibravatar brian m. carlson1-3/+3
When parsing a tree, we read the object ID directly out of the tree buffer. This is normally fine, but such an object ID cannot be used with oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1, there may not be that many bytes to copy. Instead, store the object ID in a separate struct member. Since we can no longer efficiently compute the path length, store that information as well in struct name_entry. Ensure we only copy the object ID into the new buffer if the path length is nonzero, as some callers will pass us an empty path with no object ID following it, and we will not want to read past the end of the buffer. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14Merge branch 'nd/attr-pathspec-in-tree-walk'Libravatar Junio C Hamano1-1/+2
The traversal over tree objects has learned to honor ":(attr:label)" pathspec match, which has been implemented only for enumerating paths on the filesystem. * nd/attr-pathspec-in-tree-walk: tree-walk: support :(attr) matching dir.c: move, rename and export match_attrs() pathspec.h: clean up "extern" in function declarations tree-walk.c: make tree_entry_interesting() take an index tree.c: make read_tree*() take 'struct repository *'
2018-11-19tree-walk.c: make tree_entry_interesting() take an indexLibravatar Nguyễn Thái Ngọc Duy1-1/+2
In order to support :(attr) when matching pathspec on a tree, tree_entry_interesting() needs to take an index (because git_check_attr() needs it). This is the preparation step for it. This also makes it clearer what index we fall back to when looking up attributes during an unpack-trees operation: the source index. This also fixes revs->pruning.repo initialization that should have been done in 2abf350385 (revision.c: remove implicit dependency on the_index - 2018-09-21). Without it, skip_uninteresting() will dereference a NULL pointer through this call chain get_revision(revs) get_revision_internal get_revision_1 try_to_simplify_commit rev_compare_tree diff_tree_oid(..., &revs->pruning) ll_diff_tree_oid diff_tree_paths ll_diff_tree skip_uninteresting Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12list-objects.c: reduce the_repository referencesLibravatar Nguyễn Thái Ngọc Duy1-2/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12list-objects-filter.c: remove implicit dependency on the_indexLibravatar Nguyễn Thái Ngọc Duy1-3/+6
While at there, since we have access to struct repository now, eliminate the only the_repository reference in this file. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-30Merge branch 'md/filter-trees'Libravatar Junio C Hamano1-110/+125
The "rev-list --filter" feature learned to exclude all trees via "tree:0" filter. * md/filter-trees: list-objects: support for skipping tree traversal filter-trees: code clean-up of tests list-objects-filter: implement filter tree:0 list-objects-filter-options: do not over-strbuf_init list-objects-filter: use BUG rather than die revision: mark non-user-given objects instead rev-list: handle missing tree objects properly list-objects: always parse trees gently list-objects: refactor to process_tree_contents list-objects: store common func args in struct
2018-10-18list-objects: support for skipping tree traversalLibravatar Matthew DeVore1-1/+4
The tree:0 filter does not need to traverse the trees that it has filtered out, so optimize list-objects and list-objects-filter to skip traversing the trees entirely. Before this patch, we iterated over all children of the tree, and did nothing for all of them, which was wasteful. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07revision: mark non-user-given objects insteadLibravatar Matthew DeVore1-13/+18
Currently, list-objects.c incorrectly treats all root trees of commits as USER_GIVEN. Also, it would be easier to mark objects that are non-user-given instead of user-given, since the places in the code where we access an object through a reference are more obvious than the places where we access an object that was given by the user. Resolve these two problems by introducing a flag NOT_USER_GIVEN that marks blobs and trees that are non-user-given, replacing USER_GIVEN. (Only blobs and trees are marked because this mark is only used when filtering objects, and filtering of other types of objects is not supported yet.) This fixes a bug in that git rev-list behaved differently from git pack-objects. pack-objects would *not* filter objects given explicitly on the command line and rev-list would filter. This was because the two commands used a different function to add objects to the rev_info struct. This seems to have been an oversight, and pack-objects has the correct behavior, so I added a test to make sure that rev-list now behaves properly. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07rev-list: handle missing tree objects properlyLibravatar Matthew DeVore1-3/+8
Previously, we assumed only blob objects could be missing. This patch makes rev-list handle missing trees like missing blobs. The --missing=* and --exclude-promisor-objects flags now work for trees as they already do for blobs. This is demonstrated in t6112. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21revision.c: reduce implicit dependency the_repositoryLibravatar Nguyễn Thái Ngọc Duy1-3/+5
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-15list-objects: always parse trees gentlyLibravatar Matthew DeVore1-3/+1
If parsing fails when revs->ignore_missing_links and revs->exclude_promisor_objects are both false, we print the OID anyway in the die("bad tree object...") call, so any message printed by parse_tree_gently() is superfluous. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13list-objects: refactor to process_tree_contentsLibravatar Matthew DeVore1-27/+41
This will be used in a follow-up patch to reduce indentation needed when invoking the logic conditionally. i.e. rather than: if (foo) { while (...) { /* this is very indented */ } } we will have: if (foo) process_tree_contents(...); Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13list-objects: store common func args in structLibravatar Matthew DeVore1-84/+74
This will make utility functions easier to create, as done by the next patch. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-02Merge branch 'sb/object-store-lookup'Libravatar Junio C Hamano1-2/+2
lookup_commit_reference() and friends have been updated to find in-core object for a specific in-core repository instance. * sb/object-store-lookup: (32 commits) commit.c: allow lookup_commit_reference to handle arbitrary repositories commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories tag.c: allow deref_tag to handle arbitrary repositories object.c: allow parse_object to handle arbitrary repositories object.c: allow parse_object_buffer to handle arbitrary repositories commit.c: allow get_cached_commit_buffer to handle arbitrary repositories commit.c: allow set_commit_buffer to handle arbitrary repositories commit.c: migrate the commit buffer to the parsed object store commit-slabs: remove realloc counter outside of slab struct commit.c: allow parse_commit_buffer to handle arbitrary repositories tag: allow parse_tag_buffer to handle arbitrary repositories tag: allow lookup_tag to handle arbitrary repositories commit: allow lookup_commit to handle arbitrary repositories tree: allow lookup_tree to handle arbitrary repositories blob: allow lookup_blob to handle arbitrary repositories object: allow lookup_object to handle arbitrary repositories object: allow object_as_type to handle arbitrary repositories tag: add repository argument to deref_tag tag: add repository argument to parse_tag_buffer tag: add repository argument to lookup_tag ...
2018-07-24Merge branch 'jt/partial-clone-fsck-connectivity'Libravatar Junio C Hamano1-3/+3
Partial clone support of "git clone" has been updated to correctly validate the objects it receives from the other side. The server side has been corrected to send objects that are directly requested, even if they may match the filtering criteria (e.g. when doing a "lazy blob" partial clone). * jt/partial-clone-fsck-connectivity: clone: check connectivity even if clone is partial upload-pack: send refs' objects despite "filter"
2018-07-18Merge branch 'sb/object-store-grafts'Libravatar Junio C Hamano1-0/+1
The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h
2018-07-09upload-pack: send refs' objects despite "filter"Libravatar Jonathan Tan1-3/+3
A filter line in a request to upload-pack filters out objects regardless of whether they are directly referenced by a "want" line or not. This means that cloning with "--filter=blob:none" (or another filter that excludes blobs) from a repository with at least one ref pointing to a blob (for example, the Git repository itself) results in output like the following: error: missing object referenced by 'refs/tags/junio-gpg-pub' and if that particular blob is not referenced by a fetched tree, the resulting clone fails fsck because there is no object from the remote to vouch that the missing object is a promisor object. Update both the protocol and the upload-pack implementation to include all explicitly specified "want" objects in the packfile regardless of the filter specification. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29tree: add repository argument to lookup_treeLibravatar Stefan Beller1-1/+1
Add a repository argument to allow the callers of lookup_tree to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29blob: add repository argument to lookup_blobLibravatar Stefan Beller1-1/+1
Add a repository argument to allow the callers of lookup_blob to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29Merge branch 'sb/object-store-grafts' into sb/object-store-lookupLibravatar Junio C Hamano1-0/+1
* sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h
2018-05-16object-store: move object access functions to object-store.hLibravatar Stefan Beller1-0/+1
This should make these functions easier to find and cache.h less overwhelming to read. In particular, this moves: - read_object_file - oid_object_info - write_object_file As a result, most of the codebase needs to #include object-store.h. In this patch the #include is only added to files that would fail to compile otherwise. It would be better to #include wherever identifiers from the header are used. That can happen later when we have better tooling for it. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11treewide: replace maybe_tree with accessor methodsLibravatar Derrick Stolee1-5/+5
In anticipation of making trees load lazily, create a Coccinelle script (contrib/coccinelle/commit.cocci) to ensure that all references to the 'maybe_tree' member of struct commit are either mutations or accesses through get_commit_tree() or get_commit_tree_oid(). Apply the Coccinelle script to create the rest of the patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11treewide: rename tree to maybe_treeLibravatar Derrick Stolee1-5/+5
Using the commit-graph file to walk commit history removes the large cost of parsing commits during the walk. This exposes a performance issue: lookup_tree() takes a large portion of the computation time, even when Git never uses those trees. In anticipation of lazy-loading these trees, rename the 'tree' member of struct commit to 'maybe_tree'. This serves two purposes: it hints at the future role of possibly being NULL even if the commit has a valid tree, and it allows for unambiguous transformation from simple member access (i.e. commit->maybe_tree) to method access. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-13Merge branch 'jh/fsck-promisors'Libravatar Junio C Hamano1-1/+28
In preparation for implementing narrow/partial clone, the machinery for checking object connectivity used by gc and fsck has been taught that a missing object is OK when it is referenced by a packfile specially marked as coming from trusted repository that promises to make them available on-demand and lazily. * jh/fsck-promisors: gc: do not repack promisor packfiles rev-list: support termination at promisor objects sha1_file: support lazily fetching missing objects introduce fetch-object: fetch one promisor object index-pack: refactor writing of .keep files fsck: support promisor objects as CLI argument fsck: support referenced promisor objects fsck: support refs pointing to promisor objects fsck: introduce partialclone extension extension.partialclone: introduce partial clone extension
2017-12-28Merge branch 'sb/describe-blob'Libravatar Junio C Hamano1-21/+46
"git describe" was taught to dig trees deeper to find a <commit-ish>:<path> that refers to a given blob object. * sb/describe-blob: builtin/describe.c: describe a blob builtin/describe.c: factor out describe_commit builtin/describe.c: print debug statements earlier builtin/describe.c: rename `oid` to avoid variable shadowing revision.h: introduce blob/tree walking in order of the commits list-objects.c: factor out traverse_trees_and_blobs t6120: fix typo in test name
2017-12-08rev-list: support termination at promisor objectsLibravatar Jonathan Tan1-1/+28
Teach rev-list to support termination of an object traversal at any object from a promisor remote (whether one that the local repo also has, or one that the local repo knows about because it has another promisor object that references it). This will be used subsequently in gc and in the connectivity check used by fetch. For efficiency, if an object is referenced by a promisor object, and is in the local repo only as a non-promisor object, object traversal will not stop there. This is to avoid building the list of promisor object references. (In list-objects.c, the case where obj is NULL in process_blob() and process_tree() do not need to be changed because those happen only when there is a conflict between the expected type and the existing object. If the object doesn't exist, an object will be synthesized, which is fine.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-22list-objects: filter objects in traverse_commit_listLibravatar Jeff Hostetler1-16/+79
Create traverse_commit_list_filtered() and add filtering interface to allow certain objects to be omitted from the traversal. Update traverse_commit_list() to be a wrapper for the above with a null filter to minimize the number of callers that needed to be changed. Object filtering will be used in a future commit by rev-list and pack-objects for partial clone and fetch to omit unwanted objects from the result. traverse_bitmap_commit_list() does not work with filtering. If a packfile bitmap is present, it will not be used. It should be possible to extend such support in the future (at least to simple filters that do not require object pathnames), but that is beyond the scope of this patch series. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-16revision.h: introduce blob/tree walking in order of the commitsLibravatar Stefan Beller1-0/+8
The functionality to list tree objects in the order they were seen while traversing the commits will be used in one of the next commits, where we teach `git describe` to describe not only commits, but blobs, too. The change in list-objects.c is rather minimal as we'll be re-using the infrastructure put in place of the revision walking machinery. For example one could expect that add_pending_tree is not called, but rather commit->tree is directly passed to the tree traversal function. This however requires a lot more code than just emptying the queue containing trees after each commit. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-03list-objects.c: factor out traverse_trees_and_blobsLibravatar Stefan Beller1-19/+31
With traverse_trees_and_blobs factored out of the main traverse function, the next patch can introduce an in-order revision walking with ease. In the next patch we'll call `traverse_trees_and_blobs` from within the loop walking the commits, such that we'll have one invocation of that function per commit. That is why we do not want to have memory allocations in that function, such as we'd have if we were to use a strbuf locally. Pass a strbuf from traverse_commit_list into the blob and tree traversing function as a scratch pad that only needs to be allocated once. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08Convert lookup_tree to struct object_idLibravatar brian m. carlson1-1/+1
Convert the lookup_tree function to take a pointer to struct object_id. The commit was created with manual changes to tree.c, tree.h, and object.c, plus the following semantic patch: @@ @@ - lookup_tree(EMPTY_TREE_SHA1_BIN) + lookup_tree(&empty_tree_oid) @@ expression E1; @@ - lookup_tree(E1.hash) + lookup_tree(&E1) @@ expression E1; @@ - lookup_tree(E1->hash) + lookup_tree(E1) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08Convert lookup_blob to struct object_idLibravatar brian m. carlson1-1/+1
Convert lookup_blob to take a pointer to struct object_id. The commit was created with manual changes to blob.c and blob.h, plus the following semantic patch: @@ expression E1; @@ - lookup_blob(E1.hash) + lookup_blob(&E1) @@ expression E1; @@ - lookup_blob(E1->hash) + lookup_blob(E1) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25struct name_entry: use struct object_id instead of unsigned char sha1[20]Libravatar brian m. carlson1-3/+3
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12list-objects: pass full pathname to callbacksLibravatar Jeff King1-5/+9
When we find a blob at "a/b/c", we currently pass this to our show_object_fn callbacks as two components: "a/b/" and "c". Callbacks which want the full value then call path_name(), which concatenates the two. But this is an inefficient interface; the path is a strbuf, and we could simply append "c" to it temporarily, then roll back the length, without creating a new copy. So we could improve this by teaching the callsites of path_name() this trick (and there are only 3). But we can also notice that no callback actually cares about the broken-down representation, and simply pass each callback the full path "a/b/c" as a string. The callback code becomes even simpler, then, as we do not have to worry about freeing an allocated buffer, nor rolling back our modification to the strbuf. This is theoretically less efficient, as some callbacks would not bother to format the final path component. But in practice this is not measurable. Since we use the same strbuf over and over, our work to grow it is amortized, and we really only pay to memcpy a few bytes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12list-objects: drop name_path entirelyLibravatar Jeff King1-7/+5
In the previous commit, we left name_path as a thin wrapper around a strbuf. This patch drops it entirely. As a result, every show_object_fn callback needs to be adjusted. However, none of their code needs to be changed at all, because the only use was to pass it to path_name(), which now handles the bare strbuf. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12list-objects: convert name_path to a strbufLibravatar Jeff King1-13/+9
The "struct name_path" data is examined in only two places: we generate it in process_tree(), and we convert it to a single string in path_name(). Everyone else just passes it through to those functions. We can further note that process_tree() already keeps a single strbuf with the leading tree path, for use with tree_entry_interesting(). Instead of building a separate name_path linked list, let's just use the one we already build in "base". This reduces the amount of code (especially tricky code in path_name() which did not check for integer overflows caused by deep or large pathnames). It is also more efficient in some instances. Any time we were using tree_entry_interesting, we were building up the strbuf anyway, so this is an immediate and obvious win there. In cases where we were not, we trade off storing "pathname/" in a strbuf on the heap for each level of the path, instead of two pointers and an int on the stack (with one pointer into the tree object). On a 64-bit system, the latter is 20 bytes; so if path components are less than that on average, this has lower peak memory usage. In practice it probably doesn't matter either way; we are already holding in memory all of the tree objects leading up to each pathname, and for normal-depth pathnames, we are only talking about hundreds of bytes. This patch leaves "struct name_path" as a thin wrapper around the strbuf, to avoid disrupting callbacks. We should fix them, but leaving it out makes this diff easier to view. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-11-20Convert struct object to object_idLibravatar brian m. carlson1-2/+2
struct object is one of the major data structures dealing with object IDs. Convert it to use struct object_id instead of an unsigned char array. Convert get_object_hash to refer to the new member as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>
2015-06-11Merge branch 'jk/squelch-missing-link-warning-for-unreachable'Libravatar Junio C Hamano1-1/+1
Recent "git prune" traverses young unreachable objects to safekeep old objects in the reachability chain from them, which sometimes caused error messages that are unnecessarily alarming. * jk/squelch-missing-link-warning-for-unreachable: suppress errors on missing UNINTERESTING links silence broken link warnings with revs->ignore_missing_links add quieter versions of parse_{tree,commit}
2015-06-01silence broken link warnings with revs->ignore_missing_linksLibravatar Jeff King1-1/+1
We set revs->ignore_missing_links to instruct the revision-walking machinery that we know the history graph may be incomplete. For example, we use it when walking unreachable but recent objects; we want to add what we can, but it's OK if the history is incomplete. However, we still print error messages for the missing objects, which can be confusing. This is not an error, but just a normal situation when transitioning from a repository last pruned by an older git (which can leave broken segments of history) to a more recent one (where we try to preserve whole reachable segments). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-29rev-list: add an option to mark fewer edges as uninterestingLibravatar brian m. carlson1-2/+2
In commit fbd4a70 (list-objects: mark more commits as edges in mark_edges_uninteresting - 2013-08-16), we marked an increasing number of edges uninteresting. This change, and the subsequent change to make this conditional on --objects-edge, are used by --thin to make much smaller packs for shallow clones. Unfortunately, they cause a significant performance regression when pushing non-shallow clones with lots of refs (23.322 seconds vs. 4.785 seconds with 22400 refs). Add an option to git rev-list, --objects-edge-aggressive, that preserves this more aggressive behavior, while leaving --objects-edge to provide more performant behavior. Preserve the current behavior for the moment by using the aggressive option. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19traverse_commit_list: support pending blobs/trees with pathsLibravatar Jeff King1-2/+5
When we call traverse_commit_list, we may have trees and blobs in the pending array. As we process these, we pass the "name" field from the pending entry as the path of the object within the tree (which then becomes the root path if we recurse into subtrees). When we set up the traversal in prepare_revision_walk, though, the "name" field of any pending trees and blobs is likely to be the ref at which we found the object. We would not want to make this part of the path (e.g., doing so would make "git rev-list --objects v2.6.11-tree" in linux.git show paths like "v2.6.11-tree/Makefile", which is nonsensical). Therefore prepare_revision_walk sets the name field of each pending tree and blobs to the empty string. However, this leaves no room for a caller who does know the correct path of a pending object to propagate that information to the revision walker. We can fix this by making two related changes: 1. Use the "path" field as the path instead of the "name" field in traverse_commit_list. If the path is not set, default to "" (which is what we always ended up with in the current code, because of prepare_revision_walk). 2. In prepare_revision_walk, make a complete copy of the entry. This makes the path field available to the walker (if there is one), solving our problem. Leaving the name field intact is now OK, as we do not use it as a path due to point (1) above (and we can use it to make more meaningful error messages if we want). We also make the original "mode" field available to the walker, though it does not actually use it. Note that we still re-add the pending objects and free the old ones (so we may strdup the path and name only to free the old ones). This could be made more efficient by simply copying the object_array entries that we are keeping. However, that would require more restructuring of the code, and is not done here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16object_array: add a "clear" functionLibravatar Jeff King1-6/+1
There's currently no easy way to free the memory associated with an object_array (and in most cases, we simply leak the memory in a rev_info's pending array). Let's provide a helper to make this easier to handle. We can make use of it in list-objects.c, which does the same thing by hand (but fails to free the "name" field of each entry, potentially leaking memory). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-08Merge branch 'jk/pack-bitmap'Libravatar Junio C Hamano1-1/+4
* jk/pack-bitmap: pack-objects: do not reuse packfiles without --delta-base-offset add `ignore_missing_links` mode to revwalk
2014-04-04add `ignore_missing_links` mode to revwalkLibravatar Vicent Marti1-1/+4
When pack-objects is computing the reachability bitmap to serve a fetch request, it can erroneously die() if some of the UNINTERESTING objects are not present. Upload-pack throws away HAVE lines from the client for objects we do not have, but we may have a tip object without all of its ancestors (e.g., if the tip is no longer reachable and was new enough to survive a `git prune`, but some of its reachable objects did get pruned). In the non-bitmap case, we do a revision walk with the HAVE objects marked as UNINTERESTING. The revision walker explicitly ignores errors in accessing UNINTERESTING commits to handle this case (and we do not bother looking at UNINTERESTING trees or blobs at all). When we have bitmaps, however, the process is quite different. The bitmap index for a pack-objects run is calculated in two separate steps: First, we perform an extensive walk from all the HAVEs to find the full set of objects reachable from them. This walk is usually optimized away because we are expected to hit an object with a bitmap during the traversal, which allows us to terminate early. Secondly, we perform an extensive walk from all the WANTs, which usually also terminates early because we hit a commit with an existing bitmap. Once we have the resulting bitmaps from the two walks, we AND-NOT them together to obtain the resulting set of objects we need to pack. When we are walking the HAVE objects, the revision walker does not know that we are walking it only to mark the results as uninteresting. We strip out the UNINTERESTING flag, because those objects _are_ interesting to us during the first walk. We want to keep going to get a complete set of reachable objects if we can. We need some way to tell the revision walker that it's OK to silently truncate the HAVE walk, just like it does for the UNINTERESTING case. This patch introduces a new `ignore_missing_links` flag to the `rev_info` struct, which we set only for the HAVE walk. It also adds tests to cover UNINTERESTING objects missing from several positions: a missing blob, a missing tree, and a missing parent commit. The missing blob already worked (as we do not care about its contents at all), but the other two cases caused us to die(). Note that there are a few cases we do not need to test: 1. We do not need to test a missing tree, with the blob still present. Without the tree that refers to it, we would not know that the blob is relevant to our walk. 2. We do not need to test a tip commit that is missing. Upload-pack omits these for us (and in fact, we complain even in the non-bitmap case if it fails to do so). Reported-by: Siddharth Agarwal <sid0@fb.com> Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-27Merge branch 'jk/mark-edges-uninteresting'Libravatar Junio C Hamano1-9/+11
Fix performance regression in v1.8.4.x and later. * jk/mark-edges-uninteresting: list-objects: only look at cmdline trees with edge_hint t/perf: time rev-list with UNINTERESTING commits
2014-01-21list-objects: only look at cmdline trees with edge_hintLibravatar Jeff King1-9/+11
When rev-list is given a command-line like: git rev-list --objects $commit --not --all the most accurate answer is the difference between the set of objects reachable from $commit and the set reachable from all of the existing refs. However, we have not historically provided that answer, because it is very expensive to calculate. We would have to open every tree of every commit in the entire history. Instead, we find the accurate set difference of the reachable commits, and then mark the trees at the boundaries as uninteresting. This misses objects which appear in the trees of both the interesting commits and deep within the uninteresting history. Commit fbd4a70 (list-objects: mark more commits as edges in mark_edges_uninteresting, 2013-08-16) noticed that we miss those objects during pack-objects, and added code to examine the trees of all of the "--not" refs given on the command-line. Note that this is still not the complete set difference, because we look only at the tips of the command-line arguments, not all of their reachable commits. But it increases the set of boundary objects we consider, which is especially important for shallow fetches. So we are trading extra CPU time for a larger set of boundary objects, which can improve the resulting pack size for a --thin pack. This tradeoff probably makes sense in the context of pack-objects, where we have set revs->edge_hint to have the traversal feed us the set of boundary objects. For a regular rev-list, though, it is probably not a good tradeoff. It is true that it makes our list slightly closer to a true set difference, but it is a rare case where this is important. And because we do not have revs->edge_hint set, we do nothing useful with the larger set of boundary objects. This patch therefore ties the extra tree examination to the revs->edge_hint flag; it is the presence of that flag that makes the tradeoff worthwhile. Here is output from the p0001-rev-list showing the improvement in performance: Test HEAD^ HEAD ----------------------------------------------------------------------------------------- 0001.1: rev-list --all 0.69(0.65+0.02) 0.69(0.66+0.02) +0.0% 0001.2: rev-list --all --objects 3.22(3.19+0.03) 3.23(3.20+0.03) +0.3% 0001.4: rev-list $commit --not --all 0.04(0.04+0.00) 0.04(0.04+0.00) +0.0% 0001.5: rev-list --objects $commit --not --all 0.27(0.26+0.01) 0.04(0.04+0.00) -85.2% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-20Merge branch 'nd/fetch-into-shallow'Libravatar Junio C Hamano1-4/+20
When there is no sufficient overlap between old and new history during a fetch into a shallow repository, we unnecessarily sent objects the sending side knows the receiving end has. * nd/fetch-into-shallow: Add testcase for needless objects during a shallow fetch list-objects: mark more commits as edges in mark_edges_uninteresting list-objects: reduce one argument in mark_edges_uninteresting upload-pack: delegate rev walking in shallow fetch to pack-objects shallow: add setup_temporary_shallow() shallow: only add shallow graft points to new shallow file move setup_alternate_shallow and write_shallow_commits to shallow.c
2013-08-28list-objects: mark more commits as edges in mark_edges_uninterestingLibravatar Nguyễn Thái Ngọc Duy1-0/+17
The purpose of edge commits is to let pack-objects know what objects it can use as base, but does not need to include in the thin pack because the other side is supposed to already have them. So far we mark uninteresting parents of interesting commits as edges. But even an unrelated uninteresting commit (that the other side has) may become a good base for pack-objects and help produce more efficient packs. This is especially true for shallow clone, when the client issues a fetch with a depth smaller or equal to the number of commits the server is ahead of the client. For example, in this commit history the client has up to "A" and the server has up to "B": -------A---B have--^ ^ / want--+ If depth 1 is requested, the commit list to send to the client includes only B. The way m_e_u is working, it checks if parent commits of B are uninteresting, if so mark them as edges. Due to shallow effect, commit B is grafted to have no parents and the revision walker never sees A as the parent of B. In fact it marks no edges at all in this simple case and sends everything B has to the client even if it could have excluded what A and also the client already have. In a slightly different case where A is not a direct parent of B (iow there are commits in between A and B), marking A as an edge can still save some because B may still have stuff from the far ancestor A. There is another case from the earlier patch, when we deepen a ref from C->E to A->E: ---A---B C---D---E want--^ ^ ^ shallow-+ / have-------+ In this case we need to send A and B to the client, and C (i.e. the current shallow point that the client informs the server) is a very good base because it's closet to A and B. Normal m_e_u won't recognize C as an edge because it only looks back to parents (i.e. A<-B) not the opposite way B->C even if C is already marked as uninteresting commit by the previous patch. This patch includes all uninteresting commits from command line as edges and lets pack-objects decide what's best to do. The upside is we have better chance of producing better packs in certain cases. The downside is we may need to process some extra objects on the server side. For the shallow case on git.git, when the client is 5 commits behind and does "fetch --depth=3", the result pack is 99.26 KiB instead of 4.92 MiB. Reported-and-analyzed-by: Matthijs Kooijman <matthijs@stdin.nl> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>