Merge branch 'es/walken-tutorial'

A tutorial on object enumeration. * es/walken-tutorial: documentation: add tutorial for object walking
author: Junio C Hamano <gitster@pobox.com> 2019-11-10 18:02:11 +0900
committer: Junio C Hamano <gitster@pobox.com> 2019-11-10 18:02:11 +0900
commit: 15d9f3dc6642497b1186e943cdccba3a8f9f0b0e (patch)
tree: 8b63c1d71896398ec3f4d2816037756a7f338ed8 /Documentation/MyFirstObjectWalk.txt
parent: Merge branch 'jt/fetch-pack-record-refs-in-the-dot-promisor' (diff)
parent: documentation: add tutorial for object walking (diff)
download: tgif-15d9f3dc6642497b1186e943cdccba3a8f9f0b0e.tar.xz
1 files changed, 906 insertions, 0 deletions
diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
new file mode 100644
index 0000000000..4d24daeb9f
--- /dev/null
+++ b/Documentation/MyFirstObjectWalk.txt
@@ -0,0 +1,906 @@
+= My First Object Walk
+
+== What's an Object Walk?
+
+The object walk is a key concept in Git - this is the process that underpins
+operations like object transfer and fsck. Beginning from a given commit, the
+list of objects is found by walking parent relationships between commits (commit
+X based on commit W) and containment relationships between objects (tree Y is
+contained within commit X, and blob Z is located within tree Y, giving our
+working tree for commit X something like `y/z.txt`).
+
+A related concept is the revision walk, which is focused on commit objects and
+their parent relationships and does not delve into other object types. The
+revision walk is used for operations like `git log`.
+
+=== Related Reading
+
+- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
+  the revision walker in its various incarnations.
+- `Documentation/technical/api-revision-walking.txt`
+- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
+  gives a good overview of the types of objects in Git and what your object
+  walk is really describing.
+
+== Setting Up
+
+Create a new branch from `master`.
+
+----
+git checkout -b revwalk origin/master
+----
+
+We'll put our fiddling into a new command. For fun, let's name it `git walken`.
+Open up a new file `builtin/walken.c` and set up the command handler:
+
+----
+/*
+ * "git walken"
+ *
+ * Part of the "My First Object Walk" tutorial.
+ */
+
+#include "builtin.h"
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	trace_printf(_("cmd_walken incoming...\n"));
+	return 0;
+}
+----
+
+NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
+off at runtime. For the purposes of this tutorial, we will write `walken` as
+though it is intended for use as a "plumbing" command: that is, a command which
+is used primarily in scripts, rather than interactively by humans (a "porcelain"
+command). So we will send our debug output to `trace_printf()` instead. When
+running, enable trace output by setting the environment variable `GIT_TRACE`.
+
+Add usage text and `-h` handling, like all subcommands should consistently do
+(our test suite will notice and complain if you fail to do so).
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	const char * const walken_usage[] = {
+		N_("git walken"),
+		NULL,
+	}
+	struct option options[] = {
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
+
+	...
+}
+----
+
+Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix);
+----
+
+Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
+maintaining alphabetical ordering:
+
+----
+{ "walken", cmd_walken, RUN_SETUP },
+----
+
+Add it to the `Makefile` near the line for `builtin/worktree.o`:
+
+----
+BUILTIN_OBJS += builtin/walken.o
+----
+
+Build and test out your command, without forgetting to ensure the `DEVELOPER`
+flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:
+
+----
+$ echo DEVELOPER=1 >>config.mak
+$ make
+$ GIT_TRACE=1 ./bin-wrappers/git walken
+----
+
+NOTE: For a more exhaustive overview of the new command process, take a look at
+`Documentation/MyFirstContribution.txt`.
+
+NOTE: A reference implementation can be found at
+https://github.com/nasamuffin/git/tree/revwalk.
+
+=== `struct rev_cmdline_info`
+
+The definition of `struct rev_cmdline_info` can be found in `revision.h`.
+
+This struct is contained within the `rev_info` struct and is used to reflect
+parameters provided by the user over the CLI.
+
+`nr` represents the number of `rev_cmdline_entry` present in the array.
+
+`alloc` is used by the `ALLOC_GROW` macro. Check
+`Documentation/technical/api-allocation-growing.txt` - this variable is used to
+track the allocated size of the list.
+
+Per entry, we find:
+
+`item` is the object provided upon which to base the object walk. Items in Git
+can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
+
+`name` is the object ID (OID) of the object - a hex string you may be familiar
+with from using Git to organize your source in the past. Check the tutorial
+mentioned above towards the top for a discussion of where the OID can come
+from.
+
+`whence` indicates some information about what to do with the parents of the
+specified object. We'll explore this flag more later on; take a look at
+`Documentation/revisions.txt` to get an idea of what could set the `whence`
+value.
+
+`flags` are used to hint the beginning of the revision walk and are the first
+block under the `#include`s in `revision.h`. The most likely ones to be set in
+the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
+can be used during the walk, as well.
+
+=== `struct rev_info`
+
+This one is quite a bit longer, and many fields are only used during the walk
+by `revision.c` - not configuration options. Most of the configurable flags in
+`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
+good idea to take some time and read through that document.
+
+== Basic Commit Walk
+
+First, let's see if we can replicate the output of `git log --oneline`. We'll
+refer back to the implementation frequently to discover norms when performing
+an object walk of our own.
+
+To do so, we'll first find all the commits, in order, which preceded the current
+commit. We'll extract the name and subject of the commit from each.
+
+Ideally, we will also be able to find out which ones are currently at the tip of
+various branches.
+
+=== Setting Up
+
+Preparing for your object walk has some distinct stages.
+
+1. Perform default setup for this mode, and others which may be invoked.
+2. Check configuration files for relevant settings.
+3. Set up the `rev_info` struct.
+4. Tweak the initialized `rev_info` to suit the current walk.
+5. Prepare the `rev_info` for the walk.
+6. Iterate over the objects, processing each one.
+
+==== Default Setups
+
+Before examining configuration files which may modify command behavior, set up
+default state for switches or options your command may have. If your command
+utilizes other Git components, ask them to set up their default states as well.
+For instance, `git log` takes advantage of `grep` and `diff` functionality, so
+its `init_log_defaults()` sets its own state (`decoration_style`) and asks
+`grep` and `diff` to initialize themselves by calling each of their
+initialization functions.
+
+For our first example within `git walken`, we don't intend to use any other
+components within Git, and we don't have any configuration to do.  However, we
+may want to add some later, so for now, we can add an empty placeholder. Create
+a new function in `builtin/walken.c`:
+
+----
+static void init_walken_defaults(void)
+{
+	/*
+	 * We don't actually need the same components `git log` does; leave this
+	 * empty for now.
+	 */
+}
+----
+
+Make sure to add a line invoking it inside of `cmd_walken()`.
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	init_walken_defaults();
+}
+----
+
+==== Configuring From `.gitconfig`
+
+Next, we should have a look at any relevant configuration settings (i.e.,
+settings readable and settable from `git config`). This is done by providing a
+callback to `git_config()`; within that callback, you can also invoke methods
+from other components you may need that need to intercept these options. Your
+callback will be invoked once per each configuration value which Git knows about
+(global, local, worktree, etc.).
+
+Similarly to the default values, we don't have anything to do here yet
+ourselves; however, we should call `git_default_config()` if we aren't calling
+any other existing config callbacks.
+
+Add a new function to `builtin/walken.c`:
+
+----
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	/*
+	 * For now, we don't have any custom configuration, so fall back to
+	 * the default config.
+	 */
+	return git_default_config(var, value, cb);
+}
+----
+
+Make sure to invoke `git_config()` with it in your `cmd_walken()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	...
+
+	git_config(git_walken_config, NULL);
+
+	...
+}
+----
+
+==== Setting Up `rev_info`
+
+Now that we've gathered external configuration and options, it's time to
+initialize the `rev_info` object which we will use to perform the walk. This is
+typically done by calling `repo_init_revisions()` with the repository you intend
+to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
+struct.
+
+Add the `struct rev_info` and the `repo_init_revisions()` call:
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	/* This can go wherever you like in your declarations.*/
+	struct rev_info rev;
+	...
+
+	/* This should go after the git_config() call. */
+	repo_init_revisions(the_repository, &rev, prefix);
+
+	...
+}
+----
+
+==== Tweaking `rev_info` For the Walk
+
+We're getting close, but we're still not quite ready to go. Now that `rev` is
+initialized, we can modify it to fit our needs. This is usually done within a
+helper for clarity, so let's add one:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	/*
+	 * We want to mimic the appearance of `git log --oneline`, so let's
+	 * force oneline format.
+	 */
+	get_commit_format("oneline", rev);
+
+	/* Start our object walk at HEAD. */
+	add_head_to_pending(rev);
+}
+----
+
+[NOTE]
+====
+Instead of using the shorthand `add_head_to_pending()`, you could do
+something like this:
+----
+	struct setup_revision_opt opt;
+
+	memset(&opt, 0, sizeof(opt));
+	opt.def = "HEAD";
+	opt.revarg_opt = REVARG_COMMITTISH;
+	setup_revisions(argc, argv, rev, &opt);
+----
+Using a `setup_revision_opt` gives you finer control over your walk's starting
+point.
+====
+
+Then let's invoke `final_rev_info_setup()` after the call to
+`repo_init_revisions()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	...
+
+	final_rev_info_setup(&rev);
+
+	...
+}
+----
+
+Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
+now, this is all we need.
+
+==== Preparing `rev_info` For the Walk
+
+Now that `rev` is all initialized and configured, we've got one more setup step
+before we get rolling. We can do this in a helper, which will both prepare the
+`rev_info` for the walk, and perform the walk itself. Let's start the helper
+with the call to `prepare_revision_walk()`, which can return an error without
+dying on its own:
+
+----
+static void walken_commit_walk(struct rev_info *rev)
+{
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+}
+----
+
+NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
+`stderr` it's likely to be seen by a human, so we will localize it.
+
+==== Performing the Walk!
+
+Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
+can also be used as an iterator; we move to the next item in the walk by using
+`get_revision()` repeatedly. Add the listed variable declarations at the top and
+the walk loop below the `prepare_revision_walk()` call within your
+`walken_commit_walk()`:
+
+----
+static void walken_commit_walk(struct rev_info *rev)
+{
+	struct commit *commit;
+	struct strbuf prettybuf = STRBUF_INIT;
+
+	...
+
+	while ((commit = get_revision(rev))) {
+		if (!commit)
+			continue;
+
+		strbuf_reset(&prettybuf);
+		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
+		puts(prettybuf.buf);
+	}
+	strbuf_release(&prettybuf);
+}
+----
+
+NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
+command we expect to be machine-parsed, we're sending it directly to stdout.
+
+Give it a shot.
+
+----
+$ make
+$ ./bin-wrappers/git walken
+----
+
+You should see all of the subject lines of all the commits in
+your tree's history, in order, ending with the initial commit, "Initial revision
+of "git", the information manager from hell". Congratulations! You've written
+your first revision walk. You can play with printing some additional fields
+from each commit if you're curious; have a look at the functions available in
+`commit.h`.
+
+=== Adding a Filter
+
+Next, let's try to filter the commits we see based on their author. This is
+equivalent to running `git log --author=<pattern>`. We can add a filter by
+modifying `rev_info.grep_filter`, which is a `struct grep_opt`.
+
+First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add
+`grep_config()` to `git_walken_config()`:
+
+----
+static void init_walken_defaults(void)
+{
+	init_grep_defaults(the_repository);
+}
+
+...
+
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	grep_config(var, value, cb);
+	return git_default_config(var, value, cb);
+}
+----
+
+Next, we can modify the `grep_filter`. This is done with convenience functions
+found in `grep.h`. For fun, we're filtering to only commits from folks using a
+`gmail.com` email address - a not-very-precise guess at who may be working on
+Git as a hobby. Since we're checking the author, which is a specific line in the
+header, we'll use the `append_header_grep_pattern()` helper. We can use
+the `enum grep_header_field` to indicate which part of the commit header we want
+to search.
+
+In `final_rev_info_setup()`, add your filter line:
+
+----
+static void final_rev_info_setup(int argc, const char **argv,
+		const char *prefix, struct rev_info *rev)
+{
+	...
+
+	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
+		"gmail");
+	compile_grep_patterns(&rev->grep_filter);
+
+	...
+}
+----
+
+`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
+it won't work unless we compile it with `compile_grep_patterns()`.
+
+NOTE: If you are using `setup_revisions()` (for example, if you are passing a
+`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
+to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
+
+NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
+wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
+`enum grep_pat_token` for us.
+
+=== Changing the Order
+
+There are a few ways that we can change the order of the commits during a
+revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
+typical orderings.
+
+`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
+before all of its children have been shown, and we avoid mixing commits which
+are in different lines of history. (`git help log`'s section on `--topo-order`
+has a very nice diagram to illustrate this.)
+
+Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
+`REV_SORT_BY_AUTHOR_DATE`. Add the following:
+
+----
+static void final_rev_info_setup(int argc, const char **argv,
+		const char *prefix, struct rev_info *rev)
+{
+	...
+
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_COMMIT_DATE;
+
+	...
+}
+----
+
+Let's output this into a file so we can easily diff it with the walk sorted by
+author date.
+
+----
+$ make
+$ ./bin-wrappers/git walken > commit-date.txt
+----
+
+Then, let's sort by author date and run it again.
+
+----
+static void final_rev_info_setup(int argc, const char **argv,
+		const char *prefix, struct rev_info *rev)
+{
+	...
+
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
+
+	...
+}
+----
+
+----
+$ make
+$ ./bin-wrappers/git walken > author-date.txt
+----
+
+Finally, compare the two. This is a little less helpful without object names or
+dates, but hopefully we get the idea.
+
+----
+$ diff -u commit-date.txt author-date.txt
+----
+
+This display indicates that commits can be reordered after they're written, for
+example with `git rebase`.
+
+Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
+Set that flag somewhere inside of `final_rev_info_setup()`:
+
+----
+static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
+		struct rev_info *rev)
+{
+	...
+
+	rev->reverse = 1;
+
+	...
+}
+----
+
+Run your walk again and note the difference in order. (If you remove the grep
+pattern, you should see the last commit this call gives you as your current
+HEAD.)
+
+== Basic Object Walk
+
+So far we've been walking only commits. But Git has more types of objects than
+that! Let's see if we can walk _all_ objects, and find out some information
+about each one.
+
+We can base our work on an example. `git pack-objects` prepares all kinds of
+objects for packing into a bitmap or packfile. The work we are interested in
+resides in `builtins/pack-objects.c:get_object_list()`; examination of that
+function shows that the all-object walk is being performed by
+`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
+functions reside in `list-objects.c`; examining the source shows that, despite
+the name, these functions traverse all kinds of objects. Let's have a look at
+the arguments to `traverse_commit_list_filtered()`, which are a superset of the
+arguments to the unfiltered version.
+
+- `struct list_objects_filter_options *filter_options`: This is a struct which
+  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
+- `struct rev_info *revs`: This is the `rev_info` used for the walk.
+- `show_commit_fn show_commit`: A callback which will be used to handle each
+  individual commit object.
+- `show_object_fn show_object`: A callback which will be used to handle each
+  non-commit object (so each blob, tree, or tag).
+- `void *show_data`: A context buffer which is passed in turn to `show_commit`
+  and `show_object`.
+- `struct oidset *omitted`: A linked-list of object IDs which the provided
+  filter caused to be omitted.
+
+It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
+instead of needing us to call it repeatedly ourselves. Cool! Let's add the
+callbacks first.
+
+For the sake of this tutorial, we'll simply keep track of how many of each kind
+of object we find. At file scope in `builtin/walken.c` add the following
+tracking variables:
+
+----
+static int commit_count;
+static int tag_count;
+static int blob_count;
+static int tree_count;
+----
+
+Commits are handled by a different callback than other objects; let's do that
+one first:
+
+----
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	commit_count++;
+}
+----
+
+The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
+the `buf` argument is actually the context buffer that we can provide to the
+traversal calls - `show_data`, which we mentioned a moment ago.
+
+Since we have the `struct commit` object, we can look at all the same parts that
+we looked at in our earlier commit-only walk. For the sake of this tutorial,
+though, we'll just increment the commit counter and move on.
+
+The callback for non-commits is a little different, as we'll need to check
+which kind of object we're dealing with:
+
+----
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	switch (obj->type) {
+	case OBJ_TREE:
+		tree_count++;
+		break;
+	case OBJ_BLOB:
+		blob_count++;
+		break;
+	case OBJ_TAG:
+		tag_count++;
+		break;
+	case OBJ_COMMIT:
+		BUG("unexpected commit object in walken_show_object\n");
+	default:
+		BUG("unexpected object type %s in walken_show_object\n",
+			type_name(obj->type));
+	}
+}
+----
+
+Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
+context pointer that `walken_show_commit()` receives: the `show_data` argument
+to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
+`str` contains the name of the object, which ends up being something like
+`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).
+
+To help assure us that we aren't double-counting commits, we'll include some
+complaining if a commit object is routed through our non-commit callback; we'll
+also complain if we see an invalid object type. Since those two cases should be
+unreachable, and would only change in the event of a semantic change to the Git
+codebase, we complain by using `BUG()` - which is a signal to a developer that
+the change they made caused unintended consequences, and the rest of the
+codebase needs to be updated to understand that change. `BUG()` is not intended
+to be seen by the public, so it is not localized.
+
+Our main object walk implementation is substantially different from our commit
+walk implementation, so let's make a new function to perform the object walk. We
+can perform setup which is applicable to all objects here, too, to keep separate
+from setup which is applicable to commit-only walks.
+
+We'll start by enabling all types of objects in the `struct rev_info`.  We'll
+also turn on `tree_blobs_in_commit_order`, which means that we will walk a
+commit's tree and everything it points to immediately after we find each commit,
+as opposed to waiting for the end and walking through all trees after the commit
+history has been discovered. With the appropriate settings configured, we are
+ready to call `prepare_revision_walk()`.
+
+----
+static void walken_object_walk(struct rev_info *rev)
+{
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+
+	commit_count = 0;
+	tag_count = 0;
+	blob_count = 0;
+	tree_count = 0;
+----
+
+Let's start by calling just the unfiltered walk and reporting our counts.
+Complete your implementation of `walken_object_walk()`:
+
+----
+	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+
+	printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
+		blob_count, tag_count, tree_count);
+}
+----
+
+NOTE: This output is intended to be machine-parsed. Therefore, we are not
+sending it to `trace_printf()`, and we are not localizing it - we need scripts
+to be able to count on the formatting to be exactly the way it is shown here.
+If we were intending this output to be read by humans, we would need to localize
+it with `_()`.
+
+Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
+command line options is out of scope for this tutorial, so we'll just hardcode
+a branch we can change at compile time. Where you call `final_rev_info_setup()`
+and `walken_commit_walk()`, instead branch like so:
+
+----
+	if (1) {
+		add_head_to_pending(&rev);
+		walken_object_walk(&rev);
+	} else {
+		final_rev_info_setup(argc, argv, prefix, &rev);
+		walken_commit_walk(&rev);
+	}
+----
+
+NOTE: For simplicity, we've avoided all the filters and sorts we applied in
+`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
+want, you can certainly use the filters we added before by moving
+`final_rev_info_setup()` out of the conditional and removing the call to
+`add_head_to_pending()`.
+
+Now we can try to run our command! It should take noticeably longer than the
+commit walk, but an examination of the output will give you an idea why. Your
+output should look similar to this example, but with different counts:
+
+----
+Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees.
+----
+
+This makes sense. We have more trees than commits because the Git project has
+lots of subdirectories which can change, plus at least one tree per commit. We
+have no tags because we started on a commit (`HEAD`) and while tags can point to
+commits, commits can't point to tags.
+
+NOTE: You will have different counts when you run this yourself! The number of
+objects grows along with the Git project.
+
+=== Adding a Filter
+
+There are a handful of filters that we can apply to the object walk laid out in
+`Documentation/rev-list-options.txt`. These filters are typically useful for
+operations such as creating packfiles or performing a partial clone. They are
+defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
+will use the "tree:1" filter, which causes the walk to omit all trees and blobs
+which are not directly referenced by commits reachable from the commit in
+`pending` when the walk begins. (`pending` is the list of objects which need to
+be traversed during a walk; you can imagine a breadth-first tree traversal to
+help understand. In our case, that means we omit trees and blobs not directly
+referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
+`HEAD` in the `pending` list.)
+
+First, we'll need to `#include "list-objects-filter-options.h`" and set up the
+`struct list_objects_filter_options` at the top of the function.
+
+----
+static void walken_object_walk(struct rev_info *rev)
+{
+	struct list_objects_filter_options filter_options = {};
+
+	...
+----
+
+For now, we are not going to track the omitted objects, so we'll replace those
+parameters with `NULL`. For the sake of simplicity, we'll add a simple
+build-time branch to use our filter or not. Replace the line calling
+`traverse_commit_list()` with the following, which will remind us which kind of
+walk we've just performed:
+
+----
+	if (0) {
+		/* Unfiltered: */
+		trace_printf(_("Unfiltered object walk.\n"));
+		traverse_commit_list(rev, walken_show_commit,
+				walken_show_object, NULL);
+	} else {
+		trace_printf(
+			_("Filtered object walk with filterspec 'tree:1'.\n"));
+		parse_list_objects_filter(&filter_options, "tree:1");
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, NULL);
+	}
+----
+
+`struct list_objects_filter_options` is usually built directly from a command
+line argument, so the module provides an easy way to build one from a string.
+Even though we aren't taking user input right now, we can still build one with
+a hardcoded string using `parse_list_objects_filter()`.
+
+With the filter spec "tree:1", we are expecting to see _only_ the root tree for
+each commit; therefore, the tree object count should be less than or equal to
+the number of commits. (For an example of why that's true: `git commit --revert`
+points to the same tree object as its grandparent.)
+
+=== Counting Omitted Objects
+
+We also have the capability to enumerate all objects which were omitted by a
+filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking
+`traverse_commit_list_filtered()` to populate the `omitted` list means that our
+object walk does not perform any better than an unfiltered object walk; all
+reachable objects are walked in order to populate the list.
+
+First, add the `struct oidset` and related items we will use to iterate it:
+
+----
+static void walken_object_walk(
+	...
+
+	struct oidset omitted;
+	struct oidset_iter oit;
+	struct object_id *oid = NULL;
+	int omitted_count = 0;
+	oidset_init(&omitted, 0);
+
+	...
+----
+
+Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
+object:
+
+----
+	...
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, &omitted);
+
+	...
+----
+
+Then, after your traversal, the `oidset` traversal is pretty straightforward.
+Count all the objects within and modify the print statement:
+
+----
+	/* Count the omitted objects. */
+	oidset_iter_init(&omitted, &oit);
+
+	while ((oid = oidset_iter_next(&oit)))
+		omitted_count++;
+
+	printf("commits %d\nblobs %d\ntags %d\ntrees%d\nomitted %d\n",
+		commit_count, blob_count, tag_count, tree_count, omitted_count);
+----
+
+By running your walk with and without the filter, you should find that the total
+object count in each case is identical. You can also time each invocation of
+the `walken` subcommand, with and without `omitted` being passed in, to confirm
+to yourself the runtime impact of tracking all omitted objects.
+
+=== Changing the Order
+
+Finally, let's demonstrate that you can also reorder walks of all objects, not
+just walks of commits. First, we'll make our handlers chattier - modify
+`walken_show_commit()` and `walken_show_object()` to print the object as they
+go:
+
+----
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
+	commit_count++;
+}
+
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
+
+	...
+}
+----
+
+NOTE: Since we will be examining this output directly as humans, we'll use
+`trace_printf()` here. Additionally, since this change introduces a significant
+number of printed lines, using `trace_printf()` will allow us to easily silence
+those lines without having to recompile.
+
+(Leave the counter increment logic in place.)
+
+With only that change, run again (but save yourself some scrollback):
+
+----
+$ GIT_TRACE=1 ./bin-wrappers/git walken | head -n 10
+----
+
+Take a look at the top commit with `git show` and the object ID you printed; it
+should be the same as the output of `git show HEAD`.
+
+Next, let's change a setting on our `struct rev_info` within
+`walken_object_walk()`. Find where you're changing the other settings on `rev`,
+such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
+`reverse` setting at the bottom:
+
+----
+	...
+
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+	rev->reverse = 1;
+
+	...
+----
+
+Now, run again, but this time, let's grab the last handful of objects instead
+of the first handful:
+
+----
+$ make
+$ GIT_TRACE=1 ./bin-wrappers git walken | tail -n 10
+----
+
+The last commit object given should have the same OID as the one we saw at the
+top before, and running `git show <oid>` with that OID should give you again
+the same results as `git show HEAD`. Furthermore, if you run and examine the
+first ten lines again (with `head` instead of `tail` like we did before applying
+the `reverse` setting), you should see that now the first commit printed is the
+initial commit, `e83c5163`.
+
+== Wrapping Up
+
+Let's review. In this tutorial, we:
+
+- Built a commit walk from the ground up
+- Enabled a grep filter for that commit walk
+- Changed the sort order of that filtered commit walk
+- Built an object walk (tags, commits, trees, and blobs) from the ground up
+- Learned how to add a filter-spec to an object walk
+- Changed the display order of the filtered object walk
author	Junio C Hamano <gitster@pobox.com>	2019-11-10 18:02:11 +0900
committer	Junio C Hamano <gitster@pobox.com>	2019-11-10 18:02:11 +0900
commit	15d9f3dc6642497b1186e943cdccba3a8f9f0b0e (patch)
tree	8b63c1d71896398ec3f4d2816037756a7f338ed8 /Documentation/MyFirstObjectWalk.txt
parent	Merge branch 'jt/fetch-pack-record-refs-in-the-dot-promisor' (diff)
parent	documentation: add tutorial for object walking (diff)
download	tgif-15d9f3dc6642497b1186e943cdccba3a8f9f0b0e.tar.xz