summaryrefslogtreecommitdiff
path: root/commit.c
AgeCommit message (Collapse)AuthorFilesLines
2014-07-10Merge branch 'mg/verify-commit'Libravatar Junio C Hamano1-0/+1
Add 'verify-commit' to be used in a way similar to 'verify-tag' is used. Further work on verifying the mergetags might be needed. * mg/verify-commit: t7510: test verify-commit t7510: exit for loop with test result verify-commit: scriptable commit signature verification gpg-interface: provide access to the payload gpg-interface: provide clear helper for struct signature_check
2014-07-09Merge branch 'jk/skip-prefix'Libravatar Junio C Hamano1-4/+2
* jk/skip-prefix: http-push: refactor parsing of remote object names imap-send: use skip_prefix instead of using magic numbers use skip_prefix to avoid repeated calculations git: avoid magic number with skip_prefix fetch-pack: refactor parsing in get_ack fast-import: refactor parsing of spaces stat_opt: check extra strlen call daemon: use skip_prefix to avoid magic numbers fast-import: use skip_prefix for parsing input use skip_prefix to avoid repeating strings use skip_prefix to avoid magic numbers transport-helper: avoid reading past end-of-string fast-import: fix read of uninitialized argv memory apply: use skip_prefix instead of raw addition refactor skip_prefix to return a boolean avoid using skip_prefix as a boolean daemon: mark some strings as const parse_diff_color_slot: drop ofs parameter
2014-07-02Merge branch 'jk/commit-buffer-length'Libravatar Junio C Hamano1-38/+91
Move "commit->buffer" out of the in-core commit object and keep track of their lengths. Use this to optimize the code paths to validate GPG signatures in commit objects. * jk/commit-buffer-length: reuse cached commit buffer when parsing signatures commit: record buffer length in cache commit: convert commit->buffer to a slab commit-slab: provide a static initializer use get_commit_buffer everywhere convert logmsg_reencode to get_commit_buffer use get_commit_buffer to avoid duplicate code use get_cached_commit_buffer where appropriate provide helpers to access the commit buffer provide a helper to set the commit buffer provide a helper to free commit buffer sequencer: use logmsg_reencode in get_message logmsg_reencode: return const buffer do not create "struct commit" with xcalloc commit: push commit_index update into alloc_commit_node alloc: include any-object allocations in alloc_report replace dangerous uses of strbuf_attach commit_tree: take a pointer/len pair rather than a const strbuf
2014-06-23gpg-interface: provide access to the payloadLibravatar Michael J Gruber1-0/+1
In contrast to tag signatures, commit signatures are put into the header, that is between the other header parts and commit messages. Provide access to the commit content sans the signature, which is the payload that is actually signed. Commit signature verification does the parsing anyways, and callers may wish to act on or display the commit object sans the signature. Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20refactor skip_prefix to return a booleanLibravatar Jeff King1-4/+2
The skip_prefix() function returns a pointer to the content past the prefix, or NULL if the prefix was not found. While this is nice and simple, in practice it makes it hard to use for two reasons: 1. When you want to conditionally skip or keep the string as-is, you have to introduce a temporary variable. For example: tmp = skip_prefix(buf, "foo"); if (tmp) buf = tmp; 2. It is verbose to check the outcome in a conditional, as you need extra parentheses to silence compiler warnings. For example: if ((cp = skip_prefix(buf, "foo")) /* do something with cp */ Both of these make it harder to use for long if-chains, and we tend to use starts_with() instead. However, the first line of "do something" is often to then skip forward in buf past the prefix, either using a magic constant or with an extra strlen(3) (which is generally computed at compile time, but means we are repeating ourselves). This patch refactors skip_prefix() to return a simple boolean, and to provide the pointer value as an out-parameter. If the prefix is not found, the out-parameter is untouched. This lets you write: if (skip_prefix(arg, "foo ", &arg)) do_foo(arg); else if (skip_prefix(arg, "bar ", &arg)) do_bar(arg); Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13reuse cached commit buffer when parsing signaturesLibravatar Jeff King1-17/+10
When we call show_signature or show_mergetag, we read the commit object fresh via read_sha1_file and reparse its headers. However, in most cases we already have the object data available, attached to the "struct commit". This is partially laziness in dealing with the memory allocation issues, but partially defensive programming, in that we would always want to verify a clean version of the buffer (not one that might have been munged by other users of the commit). However, we do not currently ever munge the commit buffer, and not using the already-available buffer carries a fairly big performance penalty when we are looking at a large number of commits. Here are timings on linux.git: [baseline, no signatures] $ time git log >/dev/null real 0m4.902s user 0m4.784s sys 0m0.120s [before] $ time git log --show-signature >/dev/null real 0m14.735s user 0m9.964s sys 0m0.944s [after] $ time git log --show-signature >/dev/null real 0m9.981s user 0m5.260s sys 0m0.936s Note that our user CPU time drops almost in half, close to the non-signature case, but we do still spend more wall-clock and system time, presumably from dealing with gpg. An alternative to this is to note that most commits do not have signatures (less than 1% in this repo), yet we pay the re-parsing cost for every commit just to find out if it has a mergetag or signature. If we checked that when parsing the commit initially, we could avoid re-examining most commits later on. Even if we did pursue that direction, however, this would still speed up the cases where we _do_ have signatures. So it's probably worth doing either way. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13commit: record buffer length in cacheLibravatar Jeff King1-18/+36
Most callsites which use the commit buffer try to use the cached version attached to the commit, rather than re-reading from disk. Unfortunately, that interface provides only a pointer to the NUL-terminated buffer, with no indication of the original length. For the most part, this doesn't matter. People do not put NULs in their commit messages, and the log code is happy to treat it all as a NUL-terminated string. However, some code paths do care. For example, when checking signatures, we want to be very careful that we verify all the bytes to avoid malicious trickery. This patch just adds an optional "size" out-pointer to get_commit_buffer and friends. The existing callers all pass NULL (there did not seem to be any obvious sites where we could avoid an immediate strlen() call, though perhaps with some further refactoring we could). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13commit: convert commit->buffer to a slabLibravatar Jeff King1-7/+13
This will make it easier to manage the buffer cache independently of the "struct commit" objects. It also shrinks "struct commit" by one pointer, which may be helpful. Unfortunately it does not reduce the max memory size of something like "rev-list", because rev-list uses get_cached_commit_buffer() to decide not to show each commit's output (and due to the design of slab_at, accessing the slab requires us to extend it, allocating exactly the same number of buffer pointers we dropped from the commit structs). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13use get_commit_buffer to avoid duplicate codeLibravatar Jeff King1-13/+3
For both of these sites, we already do the "fallback to read_sha1_file" trick. But we can shorten the code by just using get_commit_buffer. Note that the error cases are slightly different when read_sha1_file fails. get_commit_buffer will die() if the object cannot be loaded, or is a non-commit. For get_sha1_oneline, this will almost certainly never happen, as we will have just called parse_object (and if it does, it's probably worth complaining about). For record_author_date, the new behavior is probably better; we notify the user of the error instead of silently ignoring it. And because it's used only for sorting by author-date, somebody examining a corrupt repo can fallback to the regular traversal order. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13provide helpers to access the commit bufferLibravatar Jeff King1-0/+28
Many sites look at commit->buffer to get more detailed information than what is in the parsed commit struct. However, we sometimes drop commit->buffer to save memory, in which case the caller would need to read the object afresh. Some callers do this (leading to duplicated code), and others do not (which opens the possibility of a segfault if somebody else frees the buffer). Let's provide a pair of helpers, "get" and "unuse", that let callers easily get the buffer. They will use the cached buffer when possible, and otherwise load from disk using read_sha1_file. Note that we also need to add a "get_cached" variant which returns NULL when we do not have a cached buffer. At first glance this seems to defeat the purpose of "get", which is to always provide a return value. However, some log code paths actually use the NULL-ness of commit->buffer as a boolean flag to decide whether to try printing the commit. At least for now, we want to continue supporting that use. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13provide a helper to set the commit bufferLibravatar Jeff King1-1/+6
Right now this is just a one-liner, but abstracting it will make it easier to change later. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13provide a helper to free commit bufferLibravatar Jeff King1-0/+13
This converts two lines into one at each caller. But more importantly, it abstracts the concept of freeing the buffer, which will make it easier to change later. Note that we also need to provide a "detach" mechanism for a tricky case in index-pack. We are passed a buffer for the object generated by processing the incoming pack. If we are not using --strict, we just calculate the sha1 on that buffer and return, leaving the caller to free it. But if we are using --strict, we actually attach that buffer to an object, pass the object to the fsck functions, and then detach the buffer from the object again (so that the caller can free it as usual). In this case, we don't want to free the buffer ourselves, but just make sure it is no longer associated with the commit. Note that we are making the assumption here that the attach/detach process does not impact the buffer at all (e.g., it is never reallocated or modified). That holds true now, and we have no plans to change that. However, as we abstract the commit_buffer code, this dependency becomes less obvious. So when we detach, let's also make sure that we get back the same buffer that we gave to the commit_buffer code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-12commit: push commit_index update into alloc_commit_nodeLibravatar Jeff King1-2/+0
Whenever we create a commit object via lookup_commit, we give it a unique index to be used with the commit-slab API. The theory is that any "struct commit" we create would follow this code path, so any such struct would get an index. However, callers could use alloc_commit_node() directly (and get multiple commits with index 0). Let's push the indexing into alloc_commit_node so that it's hard for callers to get it wrong. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-12commit_tree: take a pointer/len pair rather than a const strbufLibravatar Jeff King1-5/+7
While strbufs are pretty common throughout our code, it is more flexible for functions to take a pointer/len pair than a strbuf. It's easy to turn a strbuf into such a pair (by dereferencing its members), but less easy to go the other way (you can strbuf_attach, but that has implications about memory ownership). This patch teaches commit_tree (and its associated callers and sub-functions) to take such a pair for the commit message rather than a strbuf. This makes passing the buffer around slightly more verbose, but means we can get rid of some dangerous strbuf_attach calls in the next patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-27commit.c: rearrange xcalloc argumentsLibravatar Brian Gesiak1-1/+1
xcalloc() takes two arguments: the number of elements and their size. reduce_heads() passes the arguments in reverse order, passing the size of a commit*, followed by the number of commit* to be allocated. Rearrange them so they are in the correct order. Signed-off-by: Brian Gesiak <modocache@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-03Merge branch 'nd/log-show-linear-break'Libravatar Junio C Hamano1-1/+1
Attempts to show where a single-strand-of-pearls break in "git log" output. * nd/log-show-linear-break: log: add --show-linear-break to help see non-linear history object.h: centralize object flag allocation
2014-03-25object.h: centralize object flag allocationLibravatar Nguyễn Thái Ngọc Duy1-1/+1
While the field "flags" is mainly used by the revision walker, it is also used in many other places. Centralize the whole flag allocation to one place for a better overview (and easier to move flags if we have too). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-18Merge branch 'dd/use-alloc-grow'Libravatar Junio C Hamano1-6/+2
Replace open-coded reallocation with ALLOC_GROW() macro. * dd/use-alloc-grow: sha1_file.c: use ALLOC_GROW() in pretend_sha1_file() read-cache.c: use ALLOC_GROW() in add_index_entry() builtin/mktree.c: use ALLOC_GROW() in append_to_tree() attr.c: use ALLOC_GROW() in handle_attr_line() dir.c: use ALLOC_GROW() in create_simplify() reflog-walk.c: use ALLOC_GROW() replace_object.c: use ALLOC_GROW() in register_replace_object() patch-ids.c: use ALLOC_GROW() in add_commit() diffcore-rename.c: use ALLOC_GROW() diff.c: use ALLOC_GROW() commit.c: use ALLOC_GROW() in register_commit_graft() cache-tree.c: use ALLOC_GROW() in find_subtree() bundle.c: use ALLOC_GROW() in add_to_ref_list() builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()
2014-03-18Merge branch 'dd/find-graft-with-sha1-pos'Libravatar Junio C Hamano1-15/+9
Replace a hand-rolled binary search with a call to our generic binary search helper function. * dd/find-graft-with-sha1-pos: commit.c: use the generic "sha1_pos" function for lookup
2014-03-04commit.c: use skip_prefix() instead of starts_with()Libravatar Tanay Abhra1-8/+6
In record_author_date() & parse_gpg_output(), the callers of starts_with() not just want to know if the string starts with the prefix, but also can benefit from knowing the string that follows the prefix. By using skip_prefix(), we can do both at the same time. Helped-by: Max Horn <max@quendi.de> Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Tanay Abhra <tanayabh@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-03commit.c: use ALLOC_GROW() in register_commit_graft()Libravatar Dmitry S. Dolzhenko1-6/+2
Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-27commit.c: use the generic "sha1_pos" function for lookupLibravatar Dmitry S. Dolzhenko1-15/+9
Refactor binary search in "commit_graft_pos" function: use generic "sha1_pos" function. Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-10Merge branch 'vm/octopus-merge-bases-simplify'Libravatar Junio C Hamano1-18/+18
* vm/octopus-merge-bases-simplify: get_octopus_merge_bases(): cleanup redundant variable
2014-01-10Merge branch 'js/lift-parent-count-limit'Libravatar Junio C Hamano1-5/+5
There is no reason to have a hardcoded upper limit of the number of parents for an octopus merge, created via the graft mechanism. * js/lift-parent-count-limit: Remove the line length limit for graft files
2014-01-10Merge branch 'nd/commit-tree-constness'Libravatar Junio C Hamano1-2/+2
Code clean-up. * nd/commit-tree-constness: commit.c: make "tree" a const pointer in commit_tree*()
2014-01-03get_octopus_merge_bases(): cleanup redundant variableLibravatar Vasily Makarov1-18/+18
pptr is needless. Some related code got cleaned as well. Signed-off-by: Vasily Makarov <einmalfel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-27Remove the line length limit for graft filesLibravatar Johannes Schindelin1-5/+5
Support for grafts predates Git's strbuf, and hence it is understandable that there was a hard-coded line length limit of 1023 characters (which was chosen a bit awkwardly, given that it is *exactly* one byte short of aligning with the 41 bytes occupied by a commit name and the following space or new-line character). While regular commit histories hardly win comprehensibility in general if they merge more than twenty-two branches in one go, it is not Git's business to limit grafts in such a way. In this particular developer's case, the use case that requires substantially longer graft lines to be supported is the visualization of the commits' order implied by their changes: commits are considered to have an implicit relationship iff exchanging them in an interactive rebase would result in merge conflicts. Thusly implied branches tend to be very shallow in general, and the resulting thicket of implied branches is usually very wide; It is actually quite common that *most* of the commits in a topic branch have not even one implied parent, so that a final merge commit has about as many implied parents as there are commits in said branch. [jc: squashed in tests by Jonathan] Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-26commit.c: make "tree" a const pointer in commit_tree*()Libravatar Nguyễn Thái Ngọc Duy1-2/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-17Merge branch 'cc/starts-n-ends-with'Libravatar Junio C Hamano1-3/+3
Remove a few duplicate implementations of prefix/suffix comparison functions, and rename them to starts_with and ends_with. * cc/starts-n-ends-with: replace {pre,suf}fixcmp() with {starts,ends}_with() strbuf: introduce starts_with() and ends_with() builtin/remote: remove postfixcmp() and use suffixcmp() instead environment: normalize use of prefixcmp() by removing " != 0"
2013-12-05replace {pre,suf}fixcmp() with {starts,ends}_with()Libravatar Christian Couder1-3/+3
Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c | grep -v strbuf\\.c | xargs perl -pi -e ' s|!prefixcmp\(|starts_with\(|g; s|prefixcmp\(|!starts_with\(|g; s|!suffixcmp\(|ends_with\(|g; s|suffixcmp\(|!ends_with\(|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05Merge branch 'jk/robustify-parse-commit'Libravatar Junio C Hamano1-1/+8
* jk/robustify-parse-commit: checkout: do not die when leaving broken detached HEAD use parse_commit_or_die instead of custom message use parse_commit_or_die instead of segfaulting assume parse_commit checks for NULL commit assume parse_commit checks commit->object.parsed log_tree_diff: die when we fail to parse a commit
2013-10-24assume parse_commit checks for NULL commitLibravatar Jeff King1-1/+1
The parse_commit function will check whether it was passed a NULL commit pointer, and if so, return an error. There is no need for callers to check this separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24log_tree_diff: die when we fail to parse a commitLibravatar Jeff King1-0/+7
We currently call parse_commit and then assume we can dereference the resulting "tree" struct field. If parsing failed, however, that field is NULL and we end up segfaulting. Instead of a segfault, let's print an error message and die a little more gracefully. Note that this should never happen in practice, but may happen in a corrupt repository. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-09Merge branch 'tr/log-full-diff-keep-true-parents'Libravatar Junio C Hamano1-0/+16
Output from "git log --full-diff -- <pathspec>" looked strange, because comparison was done with the previous ancestor that touched the specified <pathspec>, causing the patches for paths outside the pathspec to show more than the single commit has changed. Tweak "git reflog -p" for the same reason using the same mechanism. * tr/log-full-diff-keep-true-parents: log: use true parents for diff when walking reflogs log: use true parents for diff even when rewriting
2013-08-05Merge branch 'bc/commit-invalid-utf8'Libravatar Junio C Hamano1-1/+1
* bc/commit-invalid-utf8: commit: typofix for xxFFF[EF] check
2013-08-05commit: typofix for xxFFF[EF] checkLibravatar Junio C Hamano1-1/+1
We wanted to catch all codepoints that ends with FFFE and FFFF, not with 0FFFE and 0FFFF. Noticed and corrected by Peter Krefting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01log: use true parents for diff even when rewritingLibravatar Thomas Rast1-0/+16
When using pathspec filtering in combination with diff-based log output, parent simplification happens before the diff is computed. The diff is therefore against the *simplified* parents. This works okay, arguably by accident, in the normal case: simplification reduces to one parent as long as the commit is TREESAME to it. So the simplified parent of any given commit must have the same tree contents on the filtered paths as its true (unfiltered) parent. However, --full-diff breaks this guarantee, and indeed gives pretty spectacular results when comparing the output of git log --graph --stat ... git log --graph --full-diff --stat ... (--graph internally kicks in parent simplification, much like --parents). To fix it, store a copy of the parent list before simplification (in a slab) whenever --full-diff is in effect. Then use the stored parents instead of the simplified ones in the commit display code paths. The latter do not actually check for --full-diff to avoid duplicated code; they just grab the original parents if save_parents() has not been called for this revision walk. For ordinary commits it should be obvious that this is the right thing to do. Merge commits are a bit subtle. Observe that with default simplification, merge simplification is an all-or-nothing decision: either the merge is TREESAME to one parent and disappears, or it is different from all parents and the parent list remains intact. Redundant parents are not pruned, so the existing code also shows them as a merge. So if we do show a merge commit, the parent list just consists of the rewrite result on each parent. Running, e.g., --cc on this in --full-diff mode is not very useful: if any commits were skipped, some hunks will disagree with all sides of the merge (with one side, because commits were skipped; with the others, because they didn't have those changes in the first place). This triggers --cc showing these hunks spuriously. Therefore I believe that even for merge commits it is better to show the diffs wrt. the original parents. Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-18Merge branch 'bc/commit-invalid-utf8'Libravatar Junio C Hamano1-6/+32
Logic to auto-detect character encodings in the commit log message did not reject overlong and invalid UTF-8 characters. * bc/commit-invalid-utf8: commit: reject non-characters commit: reject overlong UTF-8 sequences commit: reject invalid UTF-8 codepoints
2013-07-09commit: reject non-charactersLibravatar Peter Krefting1-2/+5
Unicode clause D14 defines all characters U+nFFFE and U+nFFFF (where 0 <= n <= 10h) as well as the range U+FDD0..U+FDEF as non-characters, reserved for internal use only. Disallow these characters in commit messages as they are normally not recommended for interchange. Signed-off-by: Peter Krefting <peter@softwolves.pp.se> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-04commit: reject overlong UTF-8 sequencesLibravatar brian m. carlson1-6/+12
The commit code accepts pseudo-UTF-8 sequences that encode a character with more bytes than necessary. Reject such sequences, since they are not valid UTF-8. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-04commit: reject invalid UTF-8 codepointsLibravatar brian m. carlson1-5/+22
The commit code already contains code for validating UTF-8, but it does not check for invalid values, such as guaranteed non-characters and surrogates. Fix this by explicitly checking for and rejecting such characters. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-02commit.c: make compare_commits_by_commit_date globalLibravatar Jeff King1-1/+1
This helper function was introduced as a prio_queue comparator to help topological sorting. However, other users of prio_queue who want to replace commit_list_insert_by_date will want to use it, too. So let's make it public. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-01Merge branch 'jc/topo-author-date-sort'Libravatar Junio C Hamano1-27/+118
"git log" learned the "--author-date-order" option, with which the output is topologically sorted and commits in parallel histories are shown intermixed together based on the author timestamp. * jc/topo-author-date-sort: t6003: add --author-date-order test topology tests: teach a helper to set author dates as well t6003: add --date-order test topology tests: teach a helper to take abbreviated timestamps t/lib-t6000: style fixes log: --author-date-order sort-in-topological-order: use prio-queue prio-queue: priority queue of pointers to structs toposort: rename "lifo" field
2013-07-01Merge branch 'jk/commit-info-slab'Libravatar Junio C Hamano1-9/+28
Allow adding custom information to commit objects in order to represent unbound number of flag bits etc. * jk/commit-info-slab: commit-slab: introduce a macro to define a slab for new type commit-slab: avoid large realloc commit: allow associating auxiliary info on-demand
2013-06-11log: --author-date-orderLibravatar Junio C Hamano1-0/+74
Sometimes people would want to view the commits in parallel histories in the order of author dates, not committer dates. Teach "topo-order" sort machinery to do so, using a commit-info slab to record the author dates of each commit, and prio-queue to sort them. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-11sort-in-topological-order: use prio-queueLibravatar Junio C Hamano1-31/+44
Use the prio-queue data structure to implement a priority queue of commits sorted by committer date, when handling --date-order. The structure can also be used as a simple LIFO stack, which is a good match for --topo-order processing. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-11toposort: rename "lifo" fieldLibravatar Junio C Hamano1-4/+8
The primary invariant of sort_in_topological_order() is that a parent commit is not emitted until all children of it are. When traversing a forked history like this with "git log C E": A----B----C \ D----E we ensure that A is emitted after all of B, C, D, and E are done, B has to wait until C is done, and D has to wait until E is done. In some applications, however, we would further want to control how these child commits B, C, D and E on two parallel ancestry chains are shown. Most of the time, we would want to see C and B emitted together, and then E and D, and finally A (i.e. the --topo-order output). The "lifo" parameter of the sort_in_topological_order() function is used to control this behaviour. We start the traversal by knowing two commits, C and E. While keeping in mind that we also need to inspect E later, we pick C first to inspect, and we notice and record that B needs to be inspected. By structuring the "work to be done" set as a LIFO stack, we ensure that B is inspected next, before other in-flight commits we had known that we will need to inspect, e.g. E. When showing in --date-order, we would want to see commits ordered by timestamps, i.e. show C, E, B and D in this order before showing A, possibly mixing commits from two parallel histories together. When "lifo" parameter is set to false, the function keeps the "work to be done" set sorted in the date order to realize this semantics. After inspecting C, we add B to the "work to be done" set, but the next commit we inspect from the set is E which is newer than B. The name "lifo", however, is too strongly tied to the way how the function implements its behaviour, and does not describe what the behaviour _means_. Replace this field with an enum rev_sort_order, with two possible values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and update the existing code. The mechanical replacement rule is: "lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE" "lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER" Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-07commit-slab: introduce a macro to define a slab for new typeLibravatar Junio C Hamano1-58/+14
Introduce a header file to define a macro that can define the struct type, initializer, accessor and cleanup functions to manage a commit slab. Update the "indegree" topological sort facility using it. To associate 32 flag bits with each commit, you can write: define_commit_slab(flag32, uint32); to declare "struct flag32" type, define an instance of it with struct flag32 flags; and initialize it by calling init_flag32(&flags); After that, a call to flag32_at() function uint32 *fp = flag32_at(&flags, commit); will return a pointer pointing at a uint32 for that commit. Once you are done with these flags, clean them up with clear_flag32(&flags); Callers that cannot hard-code how wide the data to be associated with the commit be at compile time can use the "_with_stride" variant to initialize the slab. Suppose you want to give one bit per existing ref, and paint commits down to find which refs are descendants of each commit. Saying typedef uint32 bits320[5]; define_commit_slab(flagbits, bits320); at compile time will still limit your code with hard-coded limit, because you may find that you have more than 320 refs at runtime. The code can declare a commit slab "struct flagbits" like this instead: define_commit_slab(flagbits, unsigned char); struct flagbits flags; and initialize it by: nrefs = ... count number of refs ... init_flagbits_with_stride(&flags, (nrefs + 7) / 8); so that unsigned char *fp = flagbits_at(&flags, commit); will return a pointer pointing at an array of 40 "unsigned char"s associated with the commit, once you figure out nrefs is 320 at runtime. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-13commit-slab: avoid large reallocLibravatar Junio C Hamano1-20/+42
Instead of using a single "slab" and keep reallocating it as we find that we need to deal with commits with larger values of commit->index, make a "slab" an array of many "slab_piece"s. Each access may need two levels of indirections, but we only need to reallocate the first level array of pointers when we have to grow the table this way. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-13commit: allow associating auxiliary info on-demandLibravatar Jeff King1-9/+50
The "indegree" field in the commit object is only used while sorting a list of commits in topological order, and wasting memory otherwise. We would prefer to shrink the size of individual commit objects, which we may have to hold thousands of in-core. We could eject "indegree" field out from the commit object and represent it as a dynamic table based on the decoration infrastructure, but the decoration is meant for sparse annotation and is not a good match. Instead, let's try a different approach. - Assign an integer (commit->index) to each commit we keep in-core (reuse the space of "indegree" field for it); - When running the topological sort, allocate an array of integers in bulk (called "slab"), use the commit->index as an index into this array, and store the "indegree" information there. This does _not_ reduce the memory footprint of a commit object, but the commit->index can be used as the index to dynamically associate commits with other kinds of information as needed. Signed-off-by: Junio C Hamano <gitster@pobox.com>