summaryrefslogtreecommitdiff
path: root/fast-import.c
AgeCommit message (Collapse)AuthorFilesLines
2019-10-15Merge branch 'en/fast-imexport-nested-tags'Libravatar Junio C Hamano1-11/+81
Updates to fast-import/export. * en/fast-imexport-nested-tags: fast-export: handle nested tags t9350: add tests for tags of things other than a commit fast-export: allow user to request tags be marked with --mark-tags fast-export: add support for --import-marks-if-exists fast-import: add support for new 'alias' command fast-import: allow tags to be identified by mark labels fast-import: fix handling of deleted tags fast-export: fix exporting a tag and nothing else
2019-10-04fast-import: add support for new 'alias' commandLibravatar Elijah Newren1-10/+52
fast-export and fast-import have nice --import-marks flags which allow for incremental migrations. However, if there is a mark in fast-export's file of marks without a corresponding mark in the one for fast-import, then we run the risk that fast-export tries to send new objects relative to the mark it knows which fast-import does not, causing fast-import to fail. This arises in practice when there is a filter of some sort running between the fast-export and fast-import processes which prunes some commits programmatically. Provide such a filter with the ability to alias pruned commits to their most recent non-pruned ancestor. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04fast-import: allow tags to be identified by mark labelsLibravatar Elijah Newren1-1/+2
Mark identifiers are used in fast-export and fast-import to provide a label to refer to earlier content. Blobs are given labels because they need to be referenced in the commits where they first appear with a given filename, and commits are given labels because they can be the parents of other commits. Tags were never given labels, probably because they were viewed as unnecessary, but that presents two problems: 1. It leaves us without a way of referring to previous tags if we want to create a tag of a tag (or higher nestings). 2. It leaves us with no way of recording that a tag has already been imported when using --export-marks and --import-marks. Fix these problems by allowing an optional mark label for tags. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04fast-import: fix handling of deleted tagsLibravatar Elijah Newren1-0/+27
If our input stream includes a tag which is later deleted, we were not properly deleting it. We did have a step which would delete it, but we left a tag in the tag list noting that it needed to be updated, and the updating of annotated tags occurred AFTER ref deletion. So, when we record that a tag needs to be deleted, also remove it from the list of annotated tags to update. While this has likely been something that has not happened in practice, it will come up more in order to support nested tags. For nested tags, we either need to give temporary names to the intermediate tags and then delete them, or else we need to use the final name for the intermediate tags. If we use the final name for the intermediate tags, then in order to keep the sanity check that someone doesn't try to update the same tag twice, we need to delete the ref after creating the intermediate tag. So, either way nested tags imply the need to delete temporary inner tag references. Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-27fast-import: duplicate into history rather than passing ownershipLibravatar Jeff King1-3/+1
Fast-import's read_next_command() has somewhat odd memory ownership semantics for the command_buf strbuf. After reading a command, we copy the strbuf's pointer (without duplicating the string) into our cmd_hist array of recent commands. And then when we're about to read a new command, we clear the strbuf by calling strbuf_detach(), dropping ownership from the strbuf (leaving the cmd_hist reference as the remaining owner). This has a few surprising implications: - if the strbuf hasn't been copied into cmd_hist (e.g., because we haven't ready any commands yet), then the strbuf_detach() will leak the resulting string - any modification to command_buf risks invalidating the pointer held by cmd_hist. There doesn't seem to be any way to trigger this currently (since we tend to modify it only by detaching and reading in a new value), but it's subtly dangerous. - any pointers into an input string will remain valid as long as cmd_hist points to them. So in general, you can point into command_buf.buf and call read_next_command() up to 100 times before your string is cycled out and freed, leaving you with a dangling pointer. This makes it easy to miss bugs during testing, as they might trigger only for a sufficiently large commit (e.g., the bug fixed in the previous commit). Instead, let's make a new string to copy the command into the history array, rather than having dual ownership with the old. Then we can drop the strbuf_detach() calls entirely, and just reuse the same buffer within command_buf over and over. We'd normally have to strbuf_reset() it before using it again, but in both cases here we're using strbuf_getline(), which does it automatically for us. This fixes the leak, and it means that even a single call to read_next_command() will invalidate any held pointers, making it easier to find bugs. In fact, we can drop the extra input lines added to the test case by the previous commit, as the unfixed bug would now trigger just from reading the commit message, even without any modified files in the commit. Reported-by: Mike Hommey <mh@glandium.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-27fast-import: duplicate parsed encoding stringLibravatar Jeff King1-2/+5
We read each line of the fast-import stream into the command_buf strbuf. When reading a commit, we parse a line like "encoding foo" by storing a pointer to "foo", but not making a copy. We may then read an unbounded number of other lines (e.g., one for each modified file in the commit), each of which writes into command_buf. This works out in practice for small cases, because we hand off ownership of the heap buffer from command_buf to the cmd_hist array, and read new commands into a fresh heap buffer. And thus the pointer to "foo" remains valid as long as there aren't so many intermediate lines that we end up dropping the original "encoding" line from the history. But as the test modification shows, if we go over our default of 100 lines, we end up with our encoding string pointing into freed heap memory. This seems to fail reliably by writing garbage into the output, but running under ASan definitely detects this as a use-after-free. We can fix it by duplicating the encoding value, just as we do for other parsed lines (e.g., an author line ends up in parse_ident, which copies it to a new string). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-19Merge branch 'nd/tree-walk-with-repo'Libravatar Junio C Hamano1-3/+6
The tree-walk API learned to pass an in-core repository instance throughout more codepaths. * nd/tree-walk-with-repo: t7814: do not generate same commits in different repos Use the right 'struct repository' instead of the_repository match-trees.c: remove the_repo from shift_tree*() tree-walk.c: remove the_repo from get_tree_entry_follow_symlinks() tree-walk.c: remove the_repo from get_tree_entry() tree-walk.c: remove the_repo from fill_tree_descriptor() sha1-file.c: remove the_repo from read_object_with_reference()
2019-07-09Merge branch 'rs/copy-array'Libravatar Junio C Hamano1-1/+1
Code clean-up. * rs/copy-array: use COPY_ARRAY for copying arrays coccinelle: use COPY_ARRAY for copying arrays
2019-06-27sha1-file.c: remove the_repo from read_object_with_reference()Libravatar Nguyễn Thái Ngọc Duy1-3/+6
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-17use COPY_ARRAY for copying arraysLibravatar René Scharfe1-1/+1
Convert calls of memcpy(3) to use COPY_ARRAY, which shortens and simplifies the code a bit. Patch generated by Coccinelle and contrib/coccinelle/array.cocci. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-14fast-import: support 'encoding' commit headerLibravatar Elijah Newren1-2/+9
Since git supports commit messages with an encoding other than UTF-8, allow fast-import to import such commits. This may be useful for folks who do not want to reencode commit messages from an external system, and may also be useful to achieve reversible history rewrites (e.g. sha1sum <-> sha256sum transitions or subtree work) with git repositories that have used specialized encodings in their commit history. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-25Merge branch 'bc/hash-transition-16'Libravatar Junio C Hamano1-26/+40
Conversion from unsigned char[20] to struct object_id continues. * bc/hash-transition-16: (35 commits) gitweb: make hash size independent Git.pm: make hash size independent read-cache: read data in a hash-independent way dir: make untracked cache extension hash size independent builtin/difftool: use parse_oid_hex refspec: make hash size independent archive: convert struct archiver_args to object_id builtin/get-tar-commit-id: make hash size independent get-tar-commit-id: parse comment record hash: add a function to lookup hash algorithm by length remote-curl: make hash size independent http: replace sha1_to_hex http: compute hash of downloaded objects using the_hash_algo http: replace hard-coded constant with the_hash_algo http-walker: replace sha1_to_hex http-push: remove remaining uses of sha1_to_hex http-backend: allow 64-character hex names http-push: convert to use the_hash_algo builtin/pull: make hash-size independent builtin/am: make hash size independent ...
2019-04-01fast-import: fix erroneous handling of get-mark with empty orphan commitsLibravatar Elijah Newren1-6/+2
When get-mark was introduced in commit 28c7b1f7b7b7 ("fast-import: add a get-mark command", 2015-07-01), it followed the precedent of the cat-blob command to be allowed on any line other than in the middle of a data directive; see commit 777f80d7429b ("fast-import: Allow cat-blob requests at arbitrary points in stream", 2010-11-28). It was useful to allow cat-blob directives in the middle of a commit to get more data that would be used in writing the current commit object. get-mark is not similarly useful since fast-import can already use either object id or mark. Further, trying to allow this command anywhere caused parsing bugs. Fix the parsing problems by only allowing get-mark commands to appear when other commands have completed. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01fast-import: only allow cat-blob requests where it makes senseLibravatar Elijah Newren1-6/+13
In commit 777f80d7429b ("fast-import: Allow cat-blob requests at arbitrary points in stream", 2010-11-28), fast-import started allowing cat-blob commands to appear on the start of any line except in the middle of a "data" command. It could be in the middle of various directives that were part of a tag command, or in the middle of checkpoints or progresses (each of which allow an optional second empty newline), or even immediately after the mark command of a blob before the data directive appeared (raising the question of what if it used the mark for the blob that just barely appeared in the stream that we do not yet have the data for). None of these locations make any sense as places to put cat-blob requests. The purpose of this change as stated in that commit message was to [save] frontends from having to loop over everything they want to commit in the next commit and cat-ing the necessary objects in advance. However, that can be achieved by simply allowing cat-blob requests to appear whenever a filemodify directive is allowed. Further, it avoids setting a bad precedent for other commands to follow (e.g. get-mark); a precedent which caused parsing problems in corner cases. Technically, inline filemodify directives add a slight wrinkle in that frontends might want to have cat-blob directives appear after the start of the filemodify and before the data directive contained within it. I think it would have been better to disallow such a case (it would be trivial to use cat-blob before the filemodify instead), but since there is evidence this was used, for backwards compatibility let's support that case too. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01fast-import: check most prominent commands firstLibravatar Elijah Newren1-2/+2
This is not a very important change, and one that I expect to have no performance impact whatsoever, but reading the code bothered me. The parsing of command types in cmd_main() mostly runs in order of most common to least common commands; sure, it's hard to say for sure what the most common are without some type of study, but it seems fairly clear to mark the original four ("blob", "commit", "tag", "reset") as the most prominent. Indeed, the parsing for most other commands were added to later in the list. However, when "ls" was added, it was stuck near the top of the list, with no rationale for that particular location. Move it down to later to appease my Tourette's-like internal twitching that its former location was causing. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01fast-import: replace sha1_to_hexLibravatar brian m. carlson1-2/+2
Replace the uses of sha1_to_hex in this function with hash_to_hex to allow the use of SHA-256 as well. Rename a variable since it is no longer limited to SHA-1. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01fast-import: make hash-size independentLibravatar brian m. carlson1-16/+29
Replace several uses of GIT_SHA1_HEXSZ and 40-based constants with references to the_hash_algo. Update the note handling code here to compute path sizes based on GIT_MAX_RAWSZ as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01object-store: rename and expand packed_git's sha1 memberLibravatar brian m. carlson1-8/+9
This member is used to represent the pack checksum of the pack in question. Expand this member to be GIT_MAX_RAWSZ bytes in length so it works with longer hashes and rename it to be "hash" instead of "sha1". This transformation was made with a change to the definition and the following semantic patch: @@ struct packed_git *E1; @@ - E1->sha1 + E1->hash @@ struct packed_git E1; @@ - E1.sha1 + E1.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-04Merge branch 'en/fast-export-import'Libravatar Junio C Hamano1-154/+12
Small fixes and features for fast-export and fast-import, mostly on the fast-export side. * en/fast-export-import: fast-export: add a --show-original-ids option to show original names fast-import: remove unmaintained duplicate documentation fast-export: add --reference-excluded-parents option fast-export: ensure we export requested refs fast-export: when using paths, avoid corrupt stream with non-existent mark fast-export: move commit rewriting logic into a function for reuse fast-export: avoid dying when filtering by paths and old tags exist fast-export: use value from correct enum git-fast-export.txt: clarify misleading documentation about rev-list args git-fast-import.txt: fix documentation for --quiet option fast-export: convert sha1 to oid
2018-11-17fast-export: add a --show-original-ids option to show original namesLibravatar Elijah Newren1-0/+12
Knowing the original names (hashes) of commits can sometimes enable post-filtering that would otherwise be difficult or impossible. In particular, the desire to rewrite commit messages which refer to other prior commits (on top of whatever other filtering is being done) is very difficult without knowing the original names of each commit. In addition, knowing the original names (hashes) of blobs can allow filtering by blob-id without requiring re-hashing the content of the blob, and is thus useful as a small optimization. Once we add original ids for both commits and blobs, we may as well add them for tags too for completeness. Perhaps someone will have a use for them. This commit teaches a new --show-original-ids option to fast-export which will make it add a 'original-oid <hash>' line to blob, commits, and tags. It also teaches fast-import to parse (and ignore) such lines. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-17fast-import: remove unmaintained duplicate documentationLibravatar Elijah Newren1-154/+0
fast-import.c has started with a comment for nine and a half years re-directing the reader to Documentation/git-fast-import.txt for maintained documentation. Instead of leaving the unmaintained documentation in place, just excise it. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12Upcast size_t variables to uintmax_t when printingLibravatar Torsten Bögershausen1-2/+2
When printing variables which contain a size, today "unsigned long" is used at many places. In order to be able to change the type from "unsigned long" into size_t some day in the future, we need to have a way to print 64 bit variables on a system that has "unsigned long" defined to be 32 bit, like Win64. Upcast all those variables into uintmax_t before they are printed. This is to prepare for a bigger change, when "unsigned long" will be converted into size_t for variables which may be > 4Gib. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17Merge branch 'jk/cocci'Libravatar Junio C Hamano1-5/+5
spatch transformation to replace boolean uses of !hashcmp() to newly introduced oideq() is added, and applied, to regain performance lost due to support of multiple hash algorithms. * jk/cocci: show_dirstat: simplify same-content check read-cache: use oideq() in ce_compare functions convert hashmap comparison functions to oideq() convert "hashcmp() != 0" to "!hasheq()" convert "oidcmp() != 0" to "!oideq()" convert "hashcmp() == 0" to hasheq() convert "oidcmp() == 0" to oideq() introduce hasheq() and oideq() coccinelle: use <...> for function exclusion
2018-09-17Merge branch 'ds/reachable'Libravatar Junio C Hamano1-0/+1
The code for computing history reachability has been shuffled, obtained a bunch of new tests to cover them, and then being improved. * ds/reachable: commit-reach: correct accidental #include of C file commit-reach: use can_all_from_reach commit-reach: make can_all_from_reach... linear commit-reach: replace ref_newer logic test-reach: test commit_contains test-reach: test can_all_from_reach_with_flags test-reach: test reduce_heads test-reach: test get_merge_bases_many test-reach: test is_descendant_of test-reach: test in_merge_bases test-reach: create new test tool for ref_newer commit-reach: move can_all_from_reach_with_flags upload-pack: generalize commit date cutoff upload-pack: refactor ok_to_give_up() upload-pack: make reachable() more generic commit-reach: move commit_contains from ref-filter commit-reach: move ref_newer from remote.c commit.h: remove method declarations commit-reach: move walk methods from commit.c
2018-08-29convert "oidcmp() != 0" to "!oideq()"Libravatar Jeff King1-2/+2
This is the flip side of the previous two patches: checking for a non-zero oidcmp() can be more strictly expressed as inequality. Like those patches, we write "!= 0" in the coccinelle transformation, which covers by isomorphism the more common: if (oidcmp(E1, E2)) As with the previous two patches, this patch can be achieved almost entirely by running "make coccicheck"; the only differences are manual line-wrap fixes to match the original code. There is one thing to note for anybody replicating this, though: coccinelle 1.0.4 seems to miss the case in builtin/tag.c, even though it's basically the same as all the others. Running with 1.0.7 does catch this, so presumably it's just a coccinelle bug that was fixed in the interim. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29convert "oidcmp() == 0" to oideq()Libravatar Jeff King1-3/+3
Using the more restrictive oideq() should, in the long run, give the compiler more opportunities to optimize these callsites. For now, this conversion should be a complete noop with respect to the generated code. The result is also perhaps a little more readable, as it avoids the "zero is equal" idiom. Since it's so prevalent in C, I think seasoned programmers tend not to even notice it anymore, but it can sometimes make for awkward double negations (e.g., we can drop a few !!oidcmp() instances here). This patch was generated almost entirely by the included coccinelle patch. This mechanical conversion should be completely safe, because we check explicitly for cases where oidcmp() is compared to 0, which is what oideq() is doing under the hood. Note that we don't have to catch "!oidcmp()" separately; coccinelle's standard isomorphisms make sure the two are treated equivalently. I say "almost" because I did hand-edit the coccinelle output to fix up a few style violations (it mostly keeps the original formatting, but sometimes unwraps long lines). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20treewide: use get_all_packsLibravatar Derrick Stolee1-2/+2
There are many places in the codebase that want to iterate over all packfiles known to Git. The purposes are wide-ranging, and those that can take advantage of the multi-pack-index already do. So, use get_all_packs() instead of get_packed_git() to be sure we are iterating over all packfiles. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-02Merge branch 'sb/object-store-lookup'Libravatar Junio C Hamano1-2/+4
lookup_commit_reference() and friends have been updated to find in-core object for a specific in-core repository instance. * sb/object-store-lookup: (32 commits) commit.c: allow lookup_commit_reference to handle arbitrary repositories commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories tag.c: allow deref_tag to handle arbitrary repositories object.c: allow parse_object to handle arbitrary repositories object.c: allow parse_object_buffer to handle arbitrary repositories commit.c: allow get_cached_commit_buffer to handle arbitrary repositories commit.c: allow set_commit_buffer to handle arbitrary repositories commit.c: migrate the commit buffer to the parsed object store commit-slabs: remove realloc counter outside of slab struct commit.c: allow parse_commit_buffer to handle arbitrary repositories tag: allow parse_tag_buffer to handle arbitrary repositories tag: allow lookup_tag to handle arbitrary repositories commit: allow lookup_commit to handle arbitrary repositories tree: allow lookup_tree to handle arbitrary repositories blob: allow lookup_blob to handle arbitrary repositories object: allow lookup_object to handle arbitrary repositories object: allow object_as_type to handle arbitrary repositories tag: add repository argument to deref_tag tag: add repository argument to parse_tag_buffer tag: add repository argument to lookup_tag ...
2018-07-20commit.h: remove method declarationsLibravatar Derrick Stolee1-0/+1
These methods are now declared in commit-reach.h. Remove them from commit.h and add new include statements in all files that require these declarations. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-06fast-import: do not call diff_delta() with empty bufferLibravatar Mike Hommey1-1/+1
We know diff_delta() returns NULL, saying "no good delta exists for it", when fed an empty data. Check the length of the data in the caller to avoid such a call. This incidentally reduces the number of attempted deltification we see in the final statistics. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29commit: add repository argument to lookup_commit_reference_gentlyLibravatar Stefan Beller1-2/+4
Add a repository argument to allow callers of lookup_commit_reference_gently to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-30Merge branch 'ma/lockfile-cleanup'Libravatar Junio C Hamano1-1/+1
Code clean-up to adjust to a more recent lockfile API convention that allows lockfile instances kept on the stack. * ma/lockfile-cleanup: lock_file: move static locks into functions lock_file: make function-local locks non-static refs.c: do not die if locking fails in `delete_pseudoref()` refs.c: do not die if locking fails in `write_pseudoref()` t/helper/test-write-cache: clean up lock-handling
2018-05-23Merge branch 'sb/oid-object-info'Libravatar Junio C Hamano1-6/+10
The codepath around object-info API has been taught to take the repository object (which in turn tells the API which object store the objects are to be located). * sb/oid-object-info: cache.h: allow oid_object_info to handle arbitrary repositories packfile: add repository argument to cache_or_unpack_entry packfile: add repository argument to unpack_entry packfile: add repository argument to read_object packfile: add repository argument to packed_object_info packfile: add repository argument to packed_to_object_type packfile: add repository argument to retry_bad_packed_offset cache.h: add repository argument to oid_object_info cache.h: add repository argument to oid_object_info_extended
2018-05-10lock_file: make function-local locks non-staticLibravatar Martin Ågren1-1/+1
Placing `struct lock_file`s on the stack used to be a bad idea, because the temp- and lockfile-machinery would keep a pointer into the struct. But after 076aa2cbd (tempfile: auto-allocate tempfiles on heap, 2017-09-05), we can safely have lockfiles on the stack. (This applies even if a user returns early, leaving a locked lock behind.) These `struct lock_file`s are local to their respective functions and we can drop their staticness. For good measure, I have inspected these sites and come to believe that they always release the lock, with the possible exception of bailing out using `die()` or `exit()` or by returning from a `cmd_foo()`. As pointed out by Jeff King, it would be bad if someone held on to a `struct lock_file *` for some reason. After some grepping, I agree with his findings: no-one appears to be doing that. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-08Merge branch 'ds/commit-graph'Libravatar Junio C Hamano1-1/+1
Precompute and store information necessary for ancestry traversal in a separate file to optimize graph walking. * ds/commit-graph: commit-graph: implement "--append" option commit-graph: build graph from starting commits commit-graph: read only from specific pack-indexes commit: integrate commit graph with commit parsing commit-graph: close under reachability commit-graph: add core.commitGraph setting commit-graph: implement git commit-graph read commit-graph: implement git-commit-graph write commit-graph: implement write_commit_graph() commit-graph: create git-commit-graph builtin graph: add commit graph design document commit-graph: add format document csum-file: refactor finalize_hashfile() method csum-file: rename hashclose() to finalize_hashfile()
2018-04-26packfile: add repository argument to unpack_entryLibravatar Stefan Beller1-1/+1
Add a repository argument to allow the callers of unpack_entry to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-26cache.h: add repository argument to oid_object_infoLibravatar Stefan Beller1-5/+9
Add a repository argument to allow the callers of oid_object_info to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-25Merge branch 'jm/mem-pool'Libravatar Junio C Hamano1-61/+16
An reusable "memory pool" implementation has been extracted from fast-import.c, which in turn has become the first user of the mem-pool API. * jm/mem-pool: mem-pool: move reusable parts of memory pool into its own file fast-import: introduce mem_pool type fast-import: rename mem_pool type to mp_block
2018-04-12mem-pool: move reusable parts of memory pool into its own fileLibravatar Jameson Miller1-69/+1
This moves the reusable parts of the memory pool logic used by fast-import.c into its own file for use by other components. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-12fast-import: introduce mem_pool typeLibravatar Jameson Miller1-29/+52
Introduce the mem_pool type which encapsulates all the information necessary to manage a pool of memory. This change moves the existing variables in fast-import used to support the global memory pool to use this structure. It also renames variables that are no longer used by memory pools to reflect their more scoped usage. These changes allow for the multiple instances of a memory pool to exist and be reused outside of fast-import. In a future commit the mem_pool type will be moved to its own file. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-12fast-import: rename mem_pool type to mp_blockLibravatar Jameson Miller1-10/+10
This is part of a patch series to extract the memory pool logic in fast-import into a more generalized version. The existing mem_pool type maps more closely to a "block of memory" (mp_block) in the more generalized memory pool. This commit renames the mem_pool to mp_block to reduce churn in future patches. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11Merge branch 'sb/packfiles-in-repository'Libravatar Junio C Hamano1-2/+1
Refactoring of the internal global data structure continues. * sb/packfiles-in-repository: packfile: keep prepare_packed_git() private packfile: allow find_pack_entry to handle arbitrary repositories packfile: add repository argument to find_pack_entry packfile: allow reprepare_packed_git to handle arbitrary repositories packfile: allow prepare_packed_git to handle arbitrary repositories packfile: allow prepare_packed_git_one to handle arbitrary repositories packfile: add repository argument to reprepare_packed_git packfile: add repository argument to prepare_packed_git packfile: add repository argument to prepare_packed_git_one packfile: allow install_packed_git to handle arbitrary repositories packfile: allow rearrange_packed_git to handle arbitrary repositories packfile: allow prepare_packed_git_mru to handle arbitrary repositories
2018-04-11Merge branch 'sb/object-store'Libravatar Junio C Hamano1-2/+6
Refactoring the internal global data structure to make it possible to open multiple repositories, work with and then close them. Rerolled by Duy on top of a separate preliminary clean-up topic. The resulting structure of the topics looked very sensible. * sb/object-store: (27 commits) sha1_file: allow sha1_loose_object_info to handle arbitrary repositories sha1_file: allow map_sha1_file to handle arbitrary repositories sha1_file: allow map_sha1_file_1 to handle arbitrary repositories sha1_file: allow open_sha1_file to handle arbitrary repositories sha1_file: allow stat_sha1_file to handle arbitrary repositories sha1_file: allow sha1_file_name to handle arbitrary repositories sha1_file: add repository argument to sha1_loose_object_info sha1_file: add repository argument to map_sha1_file sha1_file: add repository argument to map_sha1_file_1 sha1_file: add repository argument to open_sha1_file sha1_file: add repository argument to stat_sha1_file sha1_file: add repository argument to sha1_file_name sha1_file: allow prepare_alt_odb to handle arbitrary repositories sha1_file: allow link_alt_odb_entries to handle arbitrary repositories sha1_file: add repository argument to prepare_alt_odb sha1_file: add repository argument to link_alt_odb_entries sha1_file: add repository argument to read_info_alternates sha1_file: add repository argument to link_alt_odb_entry sha1_file: add raw_object_store argument to alt_odb_usable pack: move approximate object count to object store ...
2018-04-10Merge branch 'bc/object-id'Libravatar Junio C Hamano1-15/+16
Conversion from uchar[20] to struct object_id continues. * bc/object-id: (36 commits) convert: convert to struct object_id sha1_file: introduce a constant for max header length Convert lookup_replace_object to struct object_id sha1_file: convert read_sha1_file to struct object_id sha1_file: convert read_object_with_reference to object_id tree-walk: convert tree entry functions to object_id streaming: convert istream internals to struct object_id tree-walk: convert get_tree_entry_follow_symlinks internals to object_id builtin/notes: convert static functions to object_id builtin/fmt-merge-msg: convert remaining code to object_id sha1_file: convert sha1_object_info* to object_id Convert remaining callers of sha1_object_info_extended to object_id packfile: convert unpack_entry to struct object_id sha1_file: convert retry_bad_packed_offset to struct object_id sha1_file: convert assert_sha1_type to object_id builtin/mktree: convert to struct object_id streaming: convert open_istream to use struct object_id sha1_file: convert check_sha1_signature to struct object_id sha1_file: convert read_loose_object to use struct object_id builtin/index-pack: convert struct ref_delta_entry to object_id ...
2018-04-02csum-file: rename hashclose() to finalize_hashfile()Libravatar Derrick Stolee1-1/+1
The hashclose() method behaves very differently depending on the flags parameter. In particular, the file descriptor is not always closed. Perform a simple rename of "hashclose()" to "finalize_hashfile()" in preparation for functional changes. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26packfile: keep prepare_packed_git() privateLibravatar Nguyễn Thái Ngọc Duy1-1/+0
The reason callers have to call this is to make sure either packed_git or packed_git_mru pointers are initialized since we don't do that by default. Sometimes it's hard to see this connection between where the function is called and where packed_git pointer is used (sometimes in separate functions). Keep this dependency internal because now all access to packed_git and packed_git_mru must go through get_xxx() wrappers. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26packfile: add repository argument to prepare_packed_gitLibravatar Stefan Beller1-1/+1
Add a repository argument to allow prepare_packed_git callers to be more specific about which repository to handle. See commit "sha1_file: add repository argument to link_alt_odb_entry" for an explanation of the #define trick. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26packfile: allow install_packed_git to handle arbitrary repositoriesLibravatar Stefan Beller1-1/+1
This conversion was done without the #define trick used in the earlier series refactoring to have better repository access, because this function is easy to review, as it only has one caller and all lines but the first two are converted. We must not convert 'pack_open_fds' to be a repository specific variable, as it is used to monitor resource usage of the machine that Git executes on. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26object-store: move packed_git and packed_git_mru to object storeLibravatar Stefan Beller1-2/+6
In a process with multiple repositories open, packfile accessors should be associated to a single repository and not shared globally. Move packed_git and packed_git_mru into the_repository and adjust callers to reflect this. [nd: while at there, wrap access to these two fields in get_packed_git() and get_packed_git_mru(). This allows us to lazily initialize these fields without caller doing that explicitly] Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-21Merge branch 'rj/warning-uninitialized-fix'Libravatar Junio C Hamano1-2/+2
Compilation fix. * rj/warning-uninitialized-fix: read-cache: fix an -Wmaybe-uninitialized warning -Wuninitialized: remove some 'init-self' workarounds