summaryrefslogtreecommitdiff
path: root/Documentation/technical/commit-graph.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/technical/commit-graph.txt')
-rw-r--r--Documentation/technical/commit-graph.txt69
1 files changed, 39 insertions, 30 deletions
diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
index 0550c6d0dc..001395e950 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -77,6 +77,29 @@ in the commit graph. We can treat these commits as having "infinite"
generation number and walk until reaching commits with known generation
number.
+We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not
+in the commit-graph file. If a commit-graph file was written by a version
+of Git that did not compute generation numbers, then those commits will
+have generation number represented by the macro GENERATION_NUMBER_ZERO = 0.
+
+Since the commit-graph file is closed under reachability, we can guarantee
+the following weaker condition on all commits:
+
+ If A and B are commits with generation numbers N amd M, respectively,
+ and N < M, then A cannot reach B.
+
+Note how the strict inequality differs from the inequality when we have
+fully-computed generation numbers. Using strict inequality may result in
+walking a few extra commits, but the simplicity in dealing with commits
+with generation number *_INFINITY or *_ZERO is valuable.
+
+We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose
+generation numbers are computed to be at least this value. We limit at
+this value since it is the largest value that can be stored in the
+commit-graph file using the 30 bits available to generation numbers. This
+presents another case where a commit can have generation number equal to
+that of a parent.
+
Design Details
--------------
@@ -89,46 +112,32 @@ Design Details
- The file format includes parameters for the object ID hash function,
so a future change of hash algorithm does not require a change in format.
+- Commit grafts and replace objects can change the shape of the commit
+ history. The latter can also be enabled/disabled on the fly using
+ `--no-replace-objects`. This leads to difficultly storing both possible
+ interpretations of a commit id, especially when computing generation
+ numbers. The commit-graph will not be read or written when
+ replace-objects or grafts are present.
+
+- Shallow clones create grafts of commits by dropping their parents. This
+ leads the commit-graph to think those commits have generation number 1.
+ If and when those commits are made unshallow, those generation numbers
+ become invalid. Since shallow clones are intended to restrict the commit
+ history to a very small set of commits, the commit-graph feature is less
+ helpful for these clones, anyway. The commit-graph will not be read or
+ written when shallow commits are present.
+
Future Work
-----------
-- The commit graph feature currently does not honor commit grafts. This can
- be remedied by duplicating or refactoring the current graft logic.
-
-- The 'commit-graph' subcommand does not have a "verify" mode that is
- necessary for integration with fsck.
-
-- The file format includes room for precomputed generation numbers. These
- are not currently computed, so all generation numbers will be marked as
- 0 (or "uncomputed"). A later patch will include this calculation.
-
- After computing and storing generation numbers, we must make graph
walks aware of generation numbers to gain the performance benefits they
enable. This will mostly be accomplished by swapping a commit-date-ordered
priority queue with one ordered by generation number. The following
operations are important candidates:
- - paint_down_to_common()
- 'log --topo-order'
-
-- Currently, parse_commit_gently() requires filling in the root tree
- object for a commit. This passes through lookup_tree() and consequently
- lookup_object(). Also, it calls lookup_commit() when loading the parents.
- These method calls check the ODB for object existence, even if the
- consumer does not need the content. For example, we do not need the
- tree contents when computing merge bases. Now that commit parsing is
- removed from the computation time, these lookup operations are the
- slowest operations keeping graph walks from being fast. Consider
- loading these objects without verifying their existence in the ODB and
- only loading them fully when consumers need them. Consider a method
- such as "ensure_tree_loaded(commit)" that fully loads a tree before
- using commit->tree.
-
-- The current design uses the 'commit-graph' subcommand to generate the graph.
- When this feature stabilizes enough to recommend to most users, we should
- add automatic graph writes to common operations that create many commits.
- For example, one could compute a graph on 'clone', 'fetch', or 'repack'
- commands.
+ - 'tag --merged'
- A server could provide a commit graph file as part of the network protocol
to avoid extra calculations by clients. This feature is only of benefit if