From 0fabafd0b948bfc1c07db73494863d581c7f5d04 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Mon, 22 Feb 2021 21:25:27 -0500 Subject: builtin/repack.c: add '--geometric' option Often it is useful to both: - have relatively few packfiles in a repository, and - avoid having so few packfiles in a repository that we repack its entire contents regularly This patch implements a '--geometric=' option in 'git repack'. This allows the caller to specify that they would like each pack to be at least a factor times as large as the previous largest pack (by object count). Concretely, say that a repository has 'n' packfiles, labeled P1, P2, ..., up to Pn. Each packfile has an object count equal to 'objects(Pn)'. With a geometric factor of 'r', it should be that: objects(Pi) > r*objects(P(i-1)) for all i in [1, n], where the packs are sorted by objects(P1) <= objects(P2) <= ... <= objects(Pn). Since finding a true optimal repacking is NP-hard, we approximate it along two directions: 1. We assume that there is a cutoff of packs _before starting the repack_ where everything to the right of that cut-off already forms a geometric progression (or no cutoff exists and everything must be repacked). 2. We assume that everything smaller than the cutoff count must be repacked. This forms our base assumption, but it can also cause even the "heavy" packs to get repacked, for e.g., if we have 6 packs containing the following number of objects: 1, 1, 1, 2, 4, 32 then we would place the cutoff between '1, 1' and '1, 2, 4, 32', rolling up the first two packs into a pack with 2 objects. That breaks our progression and leaves us: 2, 1, 2, 4, 32 ^ (where the '^' indicates the position of our split). To restore a progression, we move the split forward (towards larger packs) joining each pack into our new pack until a geometric progression is restored. Here, that looks like: 2, 1, 2, 4, 32 ~> 3, 2, 4, 32 ~> 5, 4, 32 ~> ... ~> 9, 32 ^ ^ ^ ^ This has the advantage of not repacking the heavy-side of packs too often while also only creating one new pack at a time. Another wrinkle is that we assume that loose, indexed, and reflog'd objects are insignificant, and lump them into any new pack that we create. This can lead to non-idempotent results. Suggested-by: Derrick Stolee Signed-off-by: Taylor Blau Reviewed-by: Jeff King Signed-off-by: Junio C Hamano --- Documentation/git-repack.txt | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) (limited to 'Documentation') diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index 92f146d27d..136da9fa0b 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -165,6 +165,29 @@ depth is 4095. Pass the `--delta-islands` option to `git-pack-objects`, see linkgit:git-pack-objects[1]. +-g=:: +--geometric=:: + Arrange resulting pack structure so that each successive pack + contains at least `` times the number of objects as the + next-largest pack. ++ +`git repack` ensures this by determining a "cut" of packfiles that need +to be repacked into one in order to ensure a geometric progression. It +picks the smallest set of packfiles such that as many of the larger +packfiles (by count of objects contained in that pack) may be left +intact. ++ +Unlike other repack modes, the set of objects to pack is determined +uniquely by the set of packs being "rolled-up"; in other words, the +packs determined to need to be combined in order to restore a geometric +progression. ++ +When `--unpacked` is specified, loose objects are implicitly included in +this "roll-up", without respect to their reachability. This is subject +to change in the future. This option (implying a drastically different +repack mode) is not guaranteed to work with all other combinations of +option to `git repack`). + Configuration ------------- -- cgit v1.2.3