diff options
Diffstat (limited to 'Documentation/git-pack-objects.txt')
-rw-r--r-- | Documentation/git-pack-objects.txt | 239 |
1 files changed, 218 insertions, 21 deletions
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index 20c8551d6a..dbfd1f9017 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -12,14 +12,16 @@ SYNOPSIS 'git pack-objects' [-q | --progress | --all-progress] [--all-progress-implied] [--no-reuse-delta] [--delta-base-offset] [--non-empty] [--local] [--incremental] [--window=<n>] [--depth=<n>] - [--revs [--unpacked | --all]] [--stdout | base-name] - [--keep-true-parents] < object-list + [--revs [--unpacked | --all]] [--keep-pack=<pack-name>] + [--stdout [--filter=<filter-spec>] | base-name] + [--shallow] [--keep-true-parents] [--[no-]sparse] < object-list DESCRIPTION ----------- -Reads list of objects from the standard input, and writes a packed -archive with specified base-name, or to the standard output. +Reads list of objects from the standard input, and writes either one or +more packed archives with the specified base-name to disk, or a packed +archive to the standard output. A packed archive is an efficient way to transfer a set of objects between two repositories as well as an access efficient archival @@ -35,7 +37,7 @@ A pack index file (.idx) is generated for fast, random access to the objects in the pack. Placing both the index file (.idx) and the packed archive (.pack) in the pack/ subdirectory of $GIT_OBJECT_DIRECTORY (or any of the directories on $GIT_ALTERNATE_OBJECT_DIRECTORIES) -enables git to read from the pack archive. +enables Git to read from the pack archive. The 'git unpack-objects' command can read the packed archive and expand the objects contained in the pack into "one-file @@ -47,12 +49,11 @@ transport by their peers. OPTIONS ------- base-name:: - Write into a pair of files (.pack and .idx), using + Write into pairs of files (.pack and .idx), using <base-name> to determine the name of the created file. - When this option is used, the two files are written in - <base-name>-<SHA1>.{pack,idx} files. <SHA1> is a hash - of the sorted object names to make the resulting filename - based on the pack content, and written to the standard + When this option is used, the two files in a pair are written in + <base-name>-<SHA-1>.{pack,idx} files. <SHA-1> is a hash + based on the pack content and is written to the standard output of the command. --stdout:: @@ -65,6 +66,8 @@ base-name:: the same way as 'git rev-list' with the `--objects` flag uses its `commit` arguments to build the list of objects it outputs. The objects on the resulting list are packed. + Besides revisions, `--not` or `--shallow <SHA-1>` lines are + also accepted. --unpacked:: This implies `--revs`. When processing the list of @@ -80,7 +83,17 @@ base-name:: --include-tag:: Include unasked-for annotated tags if the object they reference was included in the resulting packfile. This - can be useful to send new tags to native git clients. + can be useful to send new tags to native Git clients. + +--stdin-packs:: + Read the basenames of packfiles (e.g., `pack-1234abcd.pack`) + from the standard input, instead of object names or revision + arguments. The resulting pack contains all objects listed in the + included packs (those not beginning with `^`), excluding any + objects listed in the excluded packs (beginning with `^`). ++ +Incompatible with `--revs`, or options that imply `--revs` (such as +`--all`), with the exception of `--unpacked`, which is compatible. --window=<n>:: --depth=<n>:: @@ -93,7 +106,9 @@ base-name:: it too deep affects the performance on the unpacker side, because delta data needs to be applied that many times to get to the necessary object. - The default value for --window is 10 and --depth is 50. ++ +The default value for --window is 10 and --depth is 50. The maximum +depth is 4095. --window-memory=<n>:: This option provides an additional limit on top of `--window`; @@ -103,21 +118,33 @@ base-name:: out of memory with a large window, but still be able to take advantage of the large window for the smaller objects. The size can be suffixed with "k", "m", or "g". - `--window-memory=0` makes memory usage unlimited, which is the - default. + `--window-memory=0` makes memory usage unlimited. The default + is taken from the `pack.windowMemory` configuration variable. --max-pack-size=<n>:: - Maximum size of each output pack file. The size can be suffixed with + In unusual scenarios, you may not be able to create files + larger than a certain size on your filesystem, and this option + can be used to tell the command to split the output packfile + into multiple independent packfiles, each not larger than the + given size. The size can be suffixed with "k", "m", or "g". The minimum size allowed is limited to 1 MiB. - If specified, multiple packfiles may be created. The default is unlimited, unless the config variable - `pack.packSizeLimit` is set. + `pack.packSizeLimit` is set. Note that this option may result in + a larger and slower repository; see the discussion in + `pack.packSizeLimit`. --honor-pack-keep:: This flag causes an object already in a local pack that has a .keep file to be ignored, even if it would have otherwise been packed. +--keep-pack=<pack-name>:: + This flag causes an object already in the given pack to be + ignored, even if it would have otherwise been + packed. `<pack-name>` is the pack file name without + leading directory (e.g. `pack-123.pack`). The option could be + specified multiple times to keep multiple packs. + --incremental:: This flag causes an object already in a pack to be ignored even if it would have otherwise been packed. @@ -179,20 +206,36 @@ base-name:: Add --no-reuse-object if you want to force a uniform compression level on all data no matter the source. +--[no-]sparse:: + Toggle the "sparse" algorithm to determine which objects to include in + the pack, when combined with the "--revs" option. This algorithm + only walks trees that appear in paths that introduce new objects. + This can have significant performance benefits when computing + a pack to send a small change. However, it is possible that extra + objects are added to the pack-file if the included commits contain + certain types of direct renames. If this option is not included, + it defaults to the value of `pack.useSparse`, which is true unless + otherwise specified. + --thin:: Create a "thin" pack by omitting the common objects between a sender and a receiver in order to reduce network transfer. This option only makes sense in conjunction with --stdout. + Note: A thin pack violates the packed archive format by omitting -required objects and is thus unusable by git without making it +required objects and is thus unusable by Git without making it self-contained. Use `git index-pack --fix-thin` (see linkgit:git-index-pack[1]) to restore the self-contained property. +--shallow:: + Optimize a pack that will be provided to a client with a shallow + repository. This option, combined with --thin, can result in a + smaller pack at the cost of speed. + --delta-base-offset:: A packed archive can express the base object of a delta as either a 20-byte object name or as an offset in the - stream, but ancient versions of git don't understand the + stream, but ancient versions of Git don't understand the latter. By default, 'git pack-objects' only uses the former format for better compatibility. This option allows the command to use the latter format for @@ -202,7 +245,7 @@ self-contained. Use `git index-pack --fix-thin` + Note: Porcelain commands such as `git gc` (see linkgit:git-gc[1]), `git repack` (see linkgit:git-repack[1]) pass this option by default -in modern git when they put objects in your repository into pack files. +in modern Git when they put objects in your repository into pack files. So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle. --threads=<n>:: @@ -212,7 +255,7 @@ So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle. This is meant to reduce packing time on multiprocessor machines. The required amount of memory for the delta search window is however multiplied by the number of threads. - Specifying 0 will cause git to auto-detect the number of CPU's + Specifying 0 will cause Git to auto-detect the number of CPU's and set the number of threads accordingly. --index-version=<version>[,<offset>]:: @@ -224,6 +267,160 @@ So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle. With this option, parents that are hidden by grafts are packed nevertheless. +--filter=<filter-spec>:: + Requires `--stdout`. Omits certain objects (usually blobs) from + the resulting packfile. See linkgit:git-rev-list[1] for valid + `<filter-spec>` forms. + +--no-filter:: + Turns off any previous `--filter=` argument. + +--missing=<missing-action>:: + A debug option to help with future "partial clone" development. + This option specifies how missing objects are handled. ++ +The form '--missing=error' requests that pack-objects stop with an error if +a missing object is encountered. If the repository is a partial clone, an +attempt to fetch missing objects will be made before declaring them missing. +This is the default action. ++ +The form '--missing=allow-any' will allow object traversal to continue +if a missing object is encountered. No fetch of a missing object will occur. +Missing objects will silently be omitted from the results. ++ +The form '--missing=allow-promisor' is like 'allow-any', but will only +allow object traversal to continue for EXPECTED promisor missing objects. +No fetch of a missing object will occur. An unexpected missing object will +raise an error. + +--exclude-promisor-objects:: + Omit objects that are known to be in the promisor remote. (This + option has the purpose of operating only on locally created objects, + so that when we repack, we still maintain a distinction between + locally created objects [without .promisor] and objects from the + promisor remote [with .promisor].) This is used with partial clone. + +--keep-unreachable:: + Objects unreachable from the refs in packs named with + --unpacked= option are added to the resulting pack, in + addition to the reachable objects that are not in packs marked + with *.keep files. This implies `--revs`. + +--pack-loose-unreachable:: + Pack unreachable loose objects (and their loose counterparts + removed). This implies `--revs`. + +--unpack-unreachable:: + Keep unreachable objects in loose form. This implies `--revs`. + +--delta-islands:: + Restrict delta matches based on "islands". See DELTA ISLANDS + below. + + +DELTA ISLANDS +------------- + +When possible, `pack-objects` tries to reuse existing on-disk deltas to +avoid having to search for new ones on the fly. This is an important +optimization for serving fetches, because it means the server can avoid +inflating most objects at all and just send the bytes directly from +disk. This optimization can't work when an object is stored as a delta +against a base which the receiver does not have (and which we are not +already sending). In that case the server "breaks" the delta and has to +find a new one, which has a high CPU cost. Therefore it's important for +performance that the set of objects in on-disk delta relationships match +what a client would fetch. + +In a normal repository, this tends to work automatically. The objects +are mostly reachable from the branches and tags, and that's what clients +fetch. Any deltas we find on the server are likely to be between objects +the client has or will have. + +But in some repository setups, you may have several related but separate +groups of ref tips, with clients tending to fetch those groups +independently. For example, imagine that you are hosting several "forks" +of a repository in a single shared object store, and letting clients +view them as separate repositories through `GIT_NAMESPACE` or separate +repos using the alternates mechanism. A naive repack may find that the +optimal delta for an object is against a base that is only found in +another fork. But when a client fetches, they will not have the base +object, and we'll have to find a new delta on the fly. + +A similar situation may exist if you have many refs outside of +`refs/heads/` and `refs/tags/` that point to related objects (e.g., +`refs/pull` or `refs/changes` used by some hosting providers). By +default, clients fetch only heads and tags, and deltas against objects +found only in those other groups cannot be sent as-is. + +Delta islands solve this problem by allowing you to group your refs into +distinct "islands". Pack-objects computes which objects are reachable +from which islands, and refuses to make a delta from an object `A` +against a base which is not present in all of `A`'s islands. This +results in slightly larger packs (because we miss some delta +opportunities), but guarantees that a fetch of one island will not have +to recompute deltas on the fly due to crossing island boundaries. + +When repacking with delta islands the delta window tends to get +clogged with candidates that are forbidden by the config. Repacking +with a big --window helps (and doesn't take as long as it otherwise +might because we can reject some object pairs based on islands before +doing any computation on the content). + +Islands are configured via the `pack.island` option, which can be +specified multiple times. Each value is a left-anchored regular +expressions matching refnames. For example: + +------------------------------------------- +[pack] +island = refs/heads/ +island = refs/tags/ +------------------------------------------- + +puts heads and tags into an island (whose name is the empty string; see +below for more on naming). Any refs which do not match those regular +expressions (e.g., `refs/pull/123`) is not in any island. Any object +which is reachable only from `refs/pull/` (but not heads or tags) is +therefore not a candidate to be used as a base for `refs/heads/`. + +Refs are grouped into islands based on their "names", and two regexes +that produce the same name are considered to be in the same +island. The names are computed from the regexes by concatenating any +capture groups from the regex, with a '-' dash in between. (And if +there are no capture groups, then the name is the empty string, as in +the above example.) This allows you to create arbitrary numbers of +islands. Only up to 14 such capture groups are supported though. + +For example, imagine you store the refs for each fork in +`refs/virtual/ID`, where `ID` is a numeric identifier. You might then +configure: + +------------------------------------------- +[pack] +island = refs/virtual/([0-9]+)/heads/ +island = refs/virtual/([0-9]+)/tags/ +island = refs/virtual/([0-9]+)/(pull)/ +------------------------------------------- + +That puts the heads and tags for each fork in their own island (named +"1234" or similar), and the pull refs for each go into their own +"1234-pull". + +Note that we pick a single island for each regex to go into, using "last +one wins" ordering (which allows repo-specific config to take precedence +over user-wide config, and so forth). + + +CONFIGURATION +------------- + +Various configuration variables affect packing, see +linkgit:git-config[1] (search for "pack" and "delta"). + +Notably, delta compression is not used on objects larger than the +`core.bigFileThreshold` configuration variable and on files with the +attribute `delta` set to false. + SEE ALSO -------- linkgit:git-rev-list[1] |