path: root/builtin-rev-list.c
Age | Commit message | Author | Files | Lines
2007-04-05 | Document --left-right option to rev-list. | Brian Gernhardt | 1 | -0/+1
Explanation is paraphrased from "577ed5c... rev-list --left-right" Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-23 | Fix path-limited "rev-list --bisect" termination condition. | Junio C Hamano | 1 | -1/+1
In a path-limited bisection, when the $bad commit is not changing the limited path, and the number of suspects is 1, the code miscounted and returned $bad from find_bisection(), which is not marked with TREECHANGE. This is of course filtered by the output routine, resulting in an empty output, in turn causing git-bisect driver to say "$bad was both good and bad".

Illustration. Suppose you have these four commits, and only C changes path P. You know D is bad and A is good.

    A---B---C*--D

git-bisect driver runs this to find a bisection point:

    $ git rev-list --bisect A..D -- P

which calls find_bisection() with B, C and D. The set of commits that is given to this function is the same set of commits as rev-list without --bisect option and pathspec returns. Among them, only C is marked with TREECHANGE. Let's call the set of commits given to find_bisection() that are marked with TREECHANGE (or all of them if no path limiter is in effect) "the bisect set". In the above example, the size of the bisect set is 1 (contains only "C"). For each commit in its input, find_bisection() computes the number of commits it can reach in the bisect set. For a commit in the bisect set, this number includes itself, so the number is 1 or more. This number is called "depth", and computed by count_distance() function. When you have a bisect set of N commits, and a commit has depth D, how good is your bisection if you returned that commit? How good this bisection is can be measured by how many commits are effectively tested "together" by testing one commit. Currently you have (N-1) untested commits (the tip of the bisect set, although it is included in the bisect set, is already known to be bad). If the commit with depth D turns out to be bad, then your next bisect set will have D commits and you will have (D-1) untested commits left, which means you tested (N-1)-(D-1) = (N-D) commits with this bisection. 
If it turns out to be good, then your next bisect set will have (N-D) commits, and you will have (N-D-1) untested commits left, which means you tested (N-1)-(N-D-1) = D commits with this bisection. Therefore, the goodness of this bisection is min(N-D, D), and find_bisection() function tries to find a commit that maximizes this, by initializing "closest" variable to 0 and whenever a commit with the goodness that is larger than the current "closest" is found, that commit and its goodness are remembered by updating "closest" variable. The commit with the best goodness so far is kept in "best" variable, and is initialized to a commit that happens to be at the beginning of the list of commits given to this function (which may or may not be in the bisect set when path-limit is in use). However, when N is 1, then the sole tree-changing commit has depth of 1, and min(N-D, D) evaluates to 0. This is not larger than the initial value of "closest", and the "so far the best one" commit is never replaced in the loop. When path-limit is not in use, this is not a problem, as any commit in the input set is tree-changing. But when path-limit is in use, and when the starting "bad" commit does not change the specified path, it is not correct to return it. Signed-off-by: Junio C Hamano <junkio@cox.net>
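Below is a minimal C sketch of the selection loop described above. It uses a flattened array of candidates in place of git's commit list. The names struct candidate and pick_bisection() are hypothetical, and the depths are assumed to be precomputed; the real code computes them with count_distance(). Starting "closest" at -1 is one way to let the sole tree-changing commit win when N is 1. It is not necessarily the exact one-line change that was committed.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one commit handed to the bisection step. */
struct candidate {
    const char *name;
    int treechange; /* nonzero if the commit changes the limited path */
    int depth;      /* commits of the bisect set it can reach, itself included */
};

/*
 * Return the index of the tree-changing candidate that maximizes the
 * goodness min(N - D, D), where n_set is N, the size of the bisect set.
 * Because "closest" starts at -1 rather than 0, the first tree-changing
 * candidate always replaces the initial guess, even when its goodness is
 * 0 (as happens when N == 1).  Returns -1 if no candidate qualifies.
 */
int pick_bisection(const struct candidate *c, size_t count, int n_set)
{
    int best = -1;
    int closest = -1;

    for (size_t i = 0; i < count; i++) {
        if (!c[i].treechange)
            continue; /* not in the bisect set */
        int d = c[i].depth;
        int goodness = (n_set - d < d) ? n_set - d : d;
        if (goodness > closest) {
            best = (int)i;
            closest = goodness;
        }
    }
    return best;
}
```

In the A---B---C*--D example the candidates are D, C and B with N = 1. Only C is tree-changing, so pick_bisection() returns C's index and does not fall back to D, the first entry in the list.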
2007-02-18 | Read the config in rev-list | Fredrik Kuivinen | 1 | -0/+1
Otherwise "git rev-list --header HEAD" will not do the right thing if i18n.commitencoding is set. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-17 | Teach all of log family --left-right output. | Junio C Hamano | 1 | -6/+1
This makes reviewing git log --left-right --merge --no-merges -p a lot more pleasant. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-17 | rev-list --left-right | Junio C Hamano | 1 | -0/+11
The output from "symmetric diff", i.e. A...B, does not distinguish between commits that are reachable from A and the ones that are reachable from B. In this picture, such a symmetric diff includes commits marked with a and b.

        x---b---b  branch B
       / \ /
      /   .
     /   / \
    o---x---a---a  branch A

However, you cannot tell which ones are 'a' and which ones are 'b' from the output. Sometimes this is frustrating. This adds an output option, --left-right, to rev-list.

    rev-list --left-right A...B

would show ones reachable from A prefixed with '<' and the ones reachable from B prefixed with '>'. When combined with --boundary, boundary commits (the ones marked with 'x' in the above picture) are shown with prefix '-', so you would see a list that looks like this:

    git rev-list --left-right --boundary --pretty=oneline A...B

    >bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb 3rd on b
    >bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb 2nd on b
    <aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 3rd on a
    <aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 2nd on a
    -xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 1st on b
    -xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 1st on a

Signed-off-by: Junio C Hamano <junkio@cox.net>
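Here is a small C sketch of how the prefix character could be chosen. The rule is the one the commit message states: '-' for boundary commits, '<' for the A side, '>' for the B side. The flag names SKETCH_BOUNDARY and SKETCH_LEFT and their bit values are hypothetical stand-ins for whatever marks rev-list actually keeps on each commit.

```c
#include <stdio.h>

/* Hypothetical flag bits; rev-list's real flag names and values may differ. */
#define SKETCH_BOUNDARY (1u << 0) /* a boundary commit ('x' in the picture) */
#define SKETCH_LEFT     (1u << 1) /* reachable from the left side, A */

/* Boundary wins over side: a boundary commit prints '-' whichever side it is on. */
char left_right_prefix(unsigned flags)
{
    if (flags & SKETCH_BOUNDARY)
        return '-';
    return (flags & SKETCH_LEFT) ? '<' : '>';
}

/* Print one line in the --pretty=oneline style shown in the example output. */
void show_line(FILE *out, unsigned flags, const char *sha1_hex, const char *subject)
{
    fprintf(out, "%c%s %s\n", left_right_prefix(flags), sha1_hex, subject);
}
```

For instance, show_line(stdout, SKETCH_LEFT, hex, "2nd on a") prints a line starting with '<', as in the sample output.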
2006-09-20 | git log: Unify header_filter and message_filter into one. | Junio C Hamano | 1 | -3/+1
Now we can tell the built-in grep to grep only in the header or in the body, and use that to update --author, --committer, and --grep. Unfortunately, to make --and, --not and other grep boolean expressions useful, as in:

    # Things written by Junio and committed by Linus, and whose log
    # does not talk about diff.
    git log --author=Junio --and --committer=Linus \
        --grep-not --grep=diff

we will need to do another round of built-in grep core enhancement, because grep boolean expressions are designed to work on one line at a time. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-20 | rev-list: fix segfault with --{author,committer,grep} | Jeff King | 1 | -1/+3
We need to save the commit buffer if we're going to match against it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-07 | pack-objects: further work on internal rev-list logic. | Junio C Hamano | 1 | -30/+6
This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-07 | Separate object listing routines out of rev-list | Junio C Hamano | 1 | -97/+13
Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-05 | Teach rev-list an option to read revs from the standard input. | Junio C Hamano | 1 | -0/+25
When --stdin option is given, in addition to the <rev>s listed on the command line, the command can read one rev parameter per line from the standard input. The list of revs ends at the first empty line or EOF. Note that you still have to give all the flags from the command line; only rev arguments (including A..B, A...B, and A^@ notations) can be given from the standard input. Signed-off-by: Junio C Hamano <junkio@cox.net>
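A minimal C sketch of the reading rule described above: one rev per line, stopping at the first empty line or at EOF. read_revs() and REV_MAX are hypothetical. Instead of feeding each line to git's revision parser, the sketch copies the revs into a caller-supplied array. It also rejects lines that begin with '-', on the assumption that flags must still come from the command line.

```c
#include <stdio.h>
#include <string.h>

#define REV_MAX 256 /* hypothetical limit on the length of one rev argument */

/*
 * Read rev arguments, one per line, from "in" into revs[0..max_revs-1].
 * Reading stops at the first empty line, at EOF, or when the array is full.
 * Returns the number of revs stored, or -1 if a line looks like an option.
 * A line of REV_MAX - 1 or more characters (excluding the newline) gets
 * split by fgets(); this sketch does not try to detect that.
 */
int read_revs(FILE *in, char revs[][REV_MAX], int max_revs)
{
    char line[REV_MAX];
    int count = 0;

    while (count < max_revs && fgets(line, sizeof(line), in)) {
        size_t len = strlen(line);

        if (len && line[len - 1] == '\n')
            line[--len] = '\0';  /* drop the trailing newline */
        if (len == 0)
            break;               /* an empty line ends the list */
        if (line[0] == '-')
            return -1;           /* flags belong on the command line */
        memcpy(revs[count], line, len + 1);
        count++;
    }
    return count;
}
```

Given the input "A..B", "C^@", an empty line and then "D", the sketch stores the first two revs and never reads "D".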
2006-09-02 | Replace uses of strdup with xstrdup. | Shawn Pearce | 1 | -2/+2
Like xmalloc and xrealloc, xstrdup dies with a useful message if the native strdup() implementation returns NULL rather than a valid pointer. I just tried to use xstrdup in new code and found it to be missing. However, I expected it to be present, as xmalloc and xrealloc are already commonly used throughout the code. [jc: removed the part that deals with last_XXX, which I am finding more and more dubious these days.] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
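Here is a sketch of the wrapper pattern the commit describes, in the spirit of xmalloc() and xrealloc(). Callers never check for NULL, because an allocation failure ends the program. This version uses malloc() and memcpy() so that it needs only standard C. The message text and the exit status 128 are modeled on git's die(), not copied from git's source.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Like strdup(), but never returns NULL: on allocation failure it prints a message and exits. */
char *xstrdup(const char *str)
{
    size_t len = strlen(str) + 1; /* include the terminating NUL */
    char *ret = malloc(len);

    if (!ret) {
        fprintf(stderr, "fatal: Out of memory, strdup failed\n");
        exit(128);
    }
    memcpy(ret, str, len);
    return ret;
}
```

free(NULL) is a no-op in standard C, as the 2006-08-27 entry notes. Code that holds an optional copy can therefore release it without an "if (X) free(X)" guard.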
2006-08-28 | Add --relative-date option to the revision interface | Jonas Fonseca | 1 | -1/+1
Exposes the infrastructure from 9a8e35e98793af086f05d1ca9643052df9b44a74. Signed-off-by: Jonas Fonseca <fonseca@diku.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-27 | free(NULL) is perfectly valid. | Junio C Hamano | 1 | -4/+2
Jonas noticed some places say "if (X) free(X)" which is totally unnecessary. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-15 | remove unnecessary initializations | David Rientjes | 1 | -3/+3
[jc: I needed to hand merge the changes to the updated codebase, so the result needs to be checked.] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-29 | Call setup_git_directory() much earlier | Linus Torvalds | 1 | -2/+2
This changes the calling convention of built-in commands and passes the "prefix" (i.e. pathname of $PWD relative to the project root level) down to them. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-28 | Call setup_git_directory() early | Linus Torvalds | 1 | -1/+1
Any git command that expects to work in a subdirectory of a project, and that reads the git config files (which is just about all of them) needs to make sure that it does the "setup_git_directory()" call before it tries to read the config file. This means, among other things, that we need to move the call out of "init_revisions()", and into the caller. This does the mostly trivial conversion to do that. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-12 | Remove TYPE_* constant macros and use object_type enums consistently. | Linus Torvalds | 1 | -3/+3
This updates the type-enumeration constants introduced to reduce the memory footprint of "struct object" to match the type bits already used in the packfile format, by removing the former (i.e. TYPE_* constant macros) and using the latter (i.e. enum object_type) throughout the code for consistency. Eventually we can stop passing around the "type strings" entirely, and this will help - no confusion about two different integer enumerations. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-19 | Add "named object array" concept | Linus Torvalds | 1 | -31/+33
We've had this notion of an "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
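A minimal C sketch of the idea: a growable array whose entries pair an object pointer with the name it was given, appended in command-line order. struct named_array and named_array_add() are hypothetical names modeled on the description, not git's actual object_array API. The doubling growth policy is also an assumption.

```c
#include <stdlib.h>

/* One element: an object plus the name it was given (may be NULL). */
struct named_entry {
    void *item;        /* the object itself; opaque in this sketch */
    const char *name;  /* e.g. "HEAD" or "A..B" as typed by the user */
};

/* A dense, growable array of named entries. */
struct named_array {
    unsigned int nr;              /* entries in use */
    unsigned int alloc;           /* entries allocated */
    struct named_entry *entries;
};

/* Append one entry, keeping insertion order. Returns 0, or -1 if realloc() fails. */
int named_array_add(struct named_array *a, void *item, const char *name)
{
    if (a->nr == a->alloc) {
        /* Grow geometrically so appends are amortized O(1); overflow checks omitted. */
        unsigned int alloc = a->alloc ? a->alloc * 2 : 16;
        struct named_entry *p = realloc(a->entries, alloc * sizeof(*p));

        if (!p)
            return -1;
        a->entries = p;
        a->alloc = alloc;
    }
    a->entries[a->nr].item = item;
    a->entries[a->nr].name = name;
    a->nr++;
    return 0;
}

/* Release the storage (not the objects) and reset to empty. */
void named_array_clear(struct named_array *a)
{
    free(a->entries);
    a->entries = NULL;
    a->nr = 0;
    a->alloc = 0;
}
```

A linked list calls malloc() once per node. Here one contiguous block is reallocated only occasionally, which is the kind of overhead reduction the commit message describes.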
2006-06-17 | Some more memory leak avoidance | Linus Torvalds | 1 | -0/+8
This is really the dregs of my effort to not waste memory in git-rev-list, and makes barely one percent of a difference in the memory footprint, but hey, it's also a pretty small patch. It discards the parent lists and the commit buffer after the commit has been shown by git-rev-list (and "git log" - which already did the commit buffer part), and frees the commit list entry that was used by the revision walker. The big win would be to get rid of the "refs" pointer in the object structure (another 5%), because it's only used by fsck. That would require some pretty major surgery to fsck, though, so I'm timid and did the less interesting but much easier part instead. This (proportionally) makes a bigger difference to "git log" and friends, since those are walking _just_ commits, and thus the list entries tend to be a bigger percentage of the memory use. But the "list all objects" case does improve too. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-17 | Shrink "struct object" a bit | Linus Torvalds | 1 | -3/+3
This shrinks "struct object" by a small amount, by getting rid of the "struct type *" pointer and replacing it with a 3-bit bitfield instead. In addition, we merge the bitfields and the "flags" field, which incidentally should also remove a useless 4-byte padding from the object when in 64-bit mode. Now, our "struct object" is still too damn large, but it's now less obviously bloated, and of the remaining fields, only the "util" (which is not used by most things) is clearly something that should be eventually discarded. This shrinks the "git-rev-list --all" memory use by about 2.5% on the kernel archive (and, perhaps more importantly, on the larger mozilla archive). That may not sound like much, but I suspect it's more on a 64-bit platform. There are other remaining inefficiencies (the parent lists, for example, probably have horrible malloc overhead), but this was pretty obvious. Most of the patch is just changing the comparison of the "type" pointer from one of the constant string pointers to the appropriate new TYPE_xxx small integer constant. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
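Here is a sketch of the layout change described above, under stated assumptions. The field names mirror the description: parsed, used, a 3-bit type, and the remaining flag bits. struct object_sketch itself is hypothetical. The type numbers are the pack-file object types (commit 1, tree 2, blob 3, tag 4), which the 2006-07-12 entry made the code use throughout. The "before" struct only approximates a layout that stores the type as a string pointer, and exact sizes depend on the ABI.

```c
/* Pack-file object type numbers; every value fits in 3 bits. */
enum sketch_object_type {
    SKETCH_OBJ_NONE = 0,
    SKETCH_OBJ_COMMIT = 1,
    SKETCH_OBJ_TREE = 2,
    SKETCH_OBJ_BLOB = 3,
    SKETCH_OBJ_TAG = 4
};

/* Approximation of the older layout: the type is a pointer to a string. */
struct object_before_sketch {
    unsigned parsed : 1;
    unsigned used : 1;
    unsigned int flags;
    const char *type;            /* e.g. points at "commit" */
    unsigned char sha1[20];
};

/* Type and flags folded into one 32-bit word of bitfields. */
struct object_sketch {
    unsigned parsed : 1;
    unsigned used : 1;
    unsigned type : 3;           /* holds an enum sketch_object_type value */
    unsigned flags : 27;         /* 1 + 1 + 3 + 27 = 32 bits */
    unsigned char sha1[20];
};
```

On common ABIs the new struct needs one 4-byte word plus the 20-byte name. The old one also carries a separate flags word and a pointer, and on 64-bit platforms alignment padding as well.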
2006-06-05 | rev-list: fix process_tree() conversion. | Linus Torvalds | 1 | -2/+2
The tree-walking conversion of the "process_tree()" function broke packing by using an unrelated variable from outer scope. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30 | tree_entry(): new tree-walking helper function | Linus Torvalds | 1 | -11/+5
This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with

    struct tree_desc desc;
    struct name_entry entry;

    desc.buf = tree->buffer;
    desc.size = tree->size;
    while (tree_entry(&desc, &entry)) {
        ... use "entry.{path, sha1, mode, pathlen}" ...
    }

which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
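Below is a self-contained C sketch of a tree_entry()-style helper over the raw tree format. In that format each entry is an octal mode, a space, a NUL-terminated name and a 20-byte binary object name. struct tree_desc_sketch, struct name_entry_sketch and tree_entry_sketch() are hypothetical stand-ins, not git's tree-walk declarations. Like the helper described above, the sketch extracts an entry and advances the descriptor in one call, and computes the path length only once.

```c
#include <stddef.h>
#include <string.h>

struct tree_desc_sketch {
    const unsigned char *buf;  /* remaining raw tree bytes */
    size_t size;
};

struct name_entry_sketch {
    const char *path;           /* points into the tree buffer, NUL-terminated */
    const unsigned char *sha1;  /* 20 raw bytes, also inside the buffer */
    unsigned int mode;
    size_t pathlen;             /* computed once here, so callers need no strlen() */
};

/*
 * Extract the next entry and advance the descriptor past it.
 * Returns 1 if an entry was produced, 0 at the end of the tree or if the
 * remaining bytes are malformed (this sketch does not distinguish the two).
 */
int tree_entry_sketch(struct tree_desc_sketch *desc, struct name_entry_sketch *entry)
{
    const unsigned char *p = desc->buf;
    const unsigned char *end = desc->buf + desc->size;
    unsigned int mode = 0;

    if (p >= end)
        return 0;
    /* "<octal mode> " */
    while (p < end && *p >= '0' && *p <= '7')
        mode = (mode << 3) | (unsigned)(*p++ - '0');
    if (p >= end || *p != ' ')
        return 0;
    p++;
    /* "<name>\0" followed by the 20-byte object name */
    const unsigned char *nul = memchr(p, '\0', (size_t)(end - p));
    if (!nul || (size_t)(end - nul) < 1 + 20)
        return 0;
    entry->mode = mode;
    entry->path = (const char *)p;
    entry->pathlen = (size_t)(nul - p);
    entry->sha1 = nul + 1;
    /* advance past this whole entry */
    desc->size -= (size_t)(nul + 1 + 20 - desc->buf);
    desc->buf = nul + 1 + 20;
    return 1;
}
```

A traversal has the same shape as the loop in the commit message: point the descriptor at the tree buffer, then loop while tree_entry_sketch() returns 1.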
2006-05-29 | Remove "tree->entries" tree-entry list from tree parser | Linus Torvalds | 1 | -10/+16
Instead, just use the tree buffer directly, and use the tree-walk infrastructure to walk the buffers instead of the tree-entry list. The tree-entry list is inefficient, and generates tons of small allocations for no good reason. The tree-walk infrastructure is generally no harder to use than following a linked list, and allows us to do most tree parsing in-place. Some programs still use the old tree-entry lists, and are a bit painful to convert without major surgery. For them we have a helper function that creates a temporary tree-entry list on demand. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 | Make "tree_entry" have a SHA1 instead of a union of object pointers | Linus Torvalds | 1 | -2/+2
This is preparatory work for further cleanups, where we try to make tree_entry look more like the more efficient tree-walk descriptor. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 | Make "struct tree" contain the pointer to the tree buffer | Linus Torvalds | 1 | -1/+2
This allows us to avoid allocating information for names etc, because we can just use the information from the tree buffer directly. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-28 | Fix memory leak in "git rev-list --objects" | Linus Torvalds | 1 | -0/+3
Martin Langhoff points out that "git repack -a" ends up using up a lot of memory for big archives, and that git cvsimport probably should do only incremental repacks in order to avoid having repacking flush all the caches. The big majority of the memory usage of repacking is from git rev-list tracking all objects, and this patch should go a long way in avoiding the excessive memory usage: the bulk of it was due to the object names being leaked from the tree parser. For the historic Linux kernel archive, this simple patch does:

 Before:
    /usr/bin/time git-rev-list --all --objects > /dev/null
    72.45user 0.82system 1:13.55elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+125376minor)pagefaults 0swaps

 After:
    /usr/bin/time git-rev-list --all --objects > /dev/null
    75.22user 0.48system 1:16.34elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+43921minor)pagefaults 0swaps

where we do end up wasting a bit of time on some extra strdup()s (which could be avoided, but that would require tracking where the pathnames came from), but we avoid a lot of memory usage. Minor page faults track maximum RSS very closely (each page fault maps in one page into memory), so the reduction from 125376 page faults to 43921 means a rough reduction of VM footprint from almost half a gigabyte to about a third of that. Those numbers were also double-checked by looking at "top" while the process was running. (Side note: at least part of the remaining VM footprint is the mapping of the 177MB pack-file, so the remaining memory use is at least partly "well behaved" from a project caching perspective). For the current git archive itself, the memory usage for a "--all --objects" rev-list invocation dropped from 7128 pages to 2318 (27MB to 9MB), so the reduction seems to hold for much smaller projects too. For regular "git-rev-list" usage (i.e. without the "--objects" flag) this patch has no impact. 
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
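The page-fault arithmetic above can be checked with a one-line conversion. The commit message does not state the page size, so this assumes 4 KiB pages. pages_to_mib() is a hypothetical helper that rounds down to whole MiB.

```c
/* Convert a page count to whole MiB (rounded down) for a given page size in bytes. */
unsigned long long pages_to_mib(unsigned long long pages, unsigned long long page_size)
{
    return pages * page_size / (1024ULL * 1024ULL);
}
```

With 4096-byte pages, 125376 pages come to 489 MiB ("almost half a gigabyte"). 43921 pages come to 171 MiB, about 35% of that. For the git repository, 7128 and 2318 pages come to 27 MiB and 9 MiB, which matches the figures quoted.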
2006-05-21 | fmt-patch: Support --attach | Johannes Schindelin | 1 | -1/+1
This patch touches a couple of files, because it adds options to print a custom text just after the subject of a commit, and just after the diffstat. [jc: made "many dashes" used as the boundary leader into a single variable, to reduce the possibility of later tweaks to miscount the number of dashes to break it.] Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-21 | Merge branch 'master' into js/fmt-patch | Junio C Hamano | 1 | -0/+358
* master: (119 commits)
  diff family: add --check option
  Document that "git add" only adds non-ignored files.
  Add a conversion tool to migrate remote information into the config
  fetch, pull: ask config for remote information
  Fix build procedure for builtin-init-db
  read-tree -m -u: do not overwrite or remove untracked working tree files.
  apply --cached: do not check newly added file in the working tree
  Implement a --dry-run option to git-quiltimport
  Implement git-quiltimport
  Revert "builtin-grep: workaround for non GNU grep."
  builtin-grep: workaround for non GNU grep.
  builtin-grep: workaround for non GNU grep.
  git-am: use apply --cached
  apply --cached: apply a patch without using working tree.
  apply --numstat: show new name, not old name.
  Documentation/Makefile: create tarballs for the man pages and html files
  Allow pickaxe and diff-filter options to be used by git log.
  Libify the index refresh logic
  Builtin git-init-db
  Remove unnecessary local in get_ref_sha1.
  ...
2006-05-18 | Make "git rev-list" be a builtin | Linus Torvalds | 1 | -0/+358
This was surprisingly easy. The diff is truly minimal: rename "main()" to "cmd_rev_list()" in rev-list.c, and rename the whole file to reflect its new built-in status. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
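Here is a sketch of what built-in status makes possible. Once main() becomes cmd_rev_list(), a single git binary can look a command name up in a table and call the function directly. The table, run_builtin() and the stub are hypothetical. The three-argument signature with prefix is the one introduced by the later 2006-07-29 entry. The fallback to a separate git-<name> program for commands that are not built in is only described in a comment.

```c
#include <stddef.h>
#include <string.h>

/* Built-in signature once the "prefix" argument was added (2006-07-29 entry). */
typedef int (*builtin_fn)(int argc, const char **argv, const char *prefix);

/* Hypothetical stand-in for cmd_rev_list(); it returns argc so dispatch can be observed. */
static int cmd_rev_list_stub(int argc, const char **argv, const char *prefix)
{
    (void)argv;
    (void)prefix;
    return argc;
}

struct builtin_cmd {
    const char *name;
    builtin_fn fn;
};

static const struct builtin_cmd builtins[] = {
    { "rev-list", cmd_rev_list_stub },
};

/* Look "name" up and run it; returns -1 if it is not a built-in. */
int run_builtin(const char *name, int argc, const char **argv, const char *prefix)
{
    for (size_t i = 0; i < sizeof(builtins) / sizeof(builtins[0]); i++) {
        if (strcmp(builtins[i].name, name) == 0)
            return builtins[i].fn(argc, argv, prefix);
    }
    return -1; /* not a built-in: a caller would fall back to running git-<name> */
}
```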