diff options
Diffstat (limited to 'Documentation/git-pack-objects.txt')
-rw-r--r-- | Documentation/git-pack-objects.txt | 422 |
1 files changed, 422 insertions, 0 deletions
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt new file mode 100644 index 0000000000..f85cb7ea93 --- /dev/null +++ b/Documentation/git-pack-objects.txt @@ -0,0 +1,422 @@ +git-pack-objects(1) +=================== + +NAME +---- +git-pack-objects - Create a packed archive of objects + + +SYNOPSIS +-------- +[verse] +'git pack-objects' [-q | --progress | --all-progress] [--all-progress-implied] + [--no-reuse-delta] [--delta-base-offset] [--non-empty] + [--local] [--incremental] [--window=<n>] [--depth=<n>] + [--revs [--unpacked | --all]] [--keep-pack=<pack-name>] + [--stdout [--filter=<filter-spec>] | base-name] + [--shallow] [--keep-true-parents] [--[no-]sparse] < object-list + + +DESCRIPTION +----------- +Reads list of objects from the standard input, and writes either one or +more packed archives with the specified base-name to disk, or a packed +archive to the standard output. + +A packed archive is an efficient way to transfer a set of objects +between two repositories as well as an access efficient archival +format. In a packed archive, an object is either stored as a +compressed whole or as a difference from some other object. +The latter is often called a delta. + +The packed archive format (.pack) is designed to be self-contained +so that it can be unpacked without any further information. Therefore, +each object that a delta depends upon must be present within the pack. + +A pack index file (.idx) is generated for fast, random access to the +objects in the pack. Placing both the index file (.idx) and the packed +archive (.pack) in the pack/ subdirectory of $GIT_OBJECT_DIRECTORY (or +any of the directories on $GIT_ALTERNATE_OBJECT_DIRECTORIES) +enables Git to read from the pack archive. + +The 'git unpack-objects' command can read the packed archive and +expand the objects contained in the pack into "one-file +one-object" format; this is typically done by the smart-pull +commands when a pack is created on-the-fly for efficient network +transport by their peers. + + +OPTIONS +------- +base-name:: + Write into pairs of files (.pack and .idx), using + <base-name> to determine the name of the created file. + When this option is used, the two files in a pair are written in + <base-name>-<SHA-1>.{pack,idx} files. <SHA-1> is a hash + based on the pack content and is written to the standard + output of the command. + +--stdout:: + Write the pack contents (what would have been written to + .pack file) out to the standard output. + +--revs:: + Read the revision arguments from the standard input, instead of + individual object names. The revision arguments are processed + the same way as 'git rev-list' with the `--objects` flag + uses its `commit` arguments to build the list of objects it + outputs. The objects on the resulting list are packed. + Besides revisions, `--not` or `--shallow <SHA-1>` lines are + also accepted. + +--unpacked:: + This implies `--revs`. When processing the list of + revision arguments read from the standard input, limit + the objects packed to those that are not already packed. + +--all:: + This implies `--revs`. In addition to the list of + revision arguments read from the standard input, pretend + as if all refs under `refs/` are specified to be + included. + +--include-tag:: + Include unasked-for annotated tags if the object they + reference was included in the resulting packfile. This + can be useful to send new tags to native Git clients. + +--window=<n>:: +--depth=<n>:: + These two options affect how the objects contained in + the pack are stored using delta compression. The + objects are first internally sorted by type, size and + optionally names and compared against the other objects + within --window to see if using delta compression saves + space. --depth limits the maximum delta depth; making + it too deep affects the performance on the unpacker + side, because delta data needs to be applied that many + times to get to the necessary object. ++ +The default value for --window is 10 and --depth is 50. The maximum +depth is 4095. + +--window-memory=<n>:: + This option provides an additional limit on top of `--window`; + the window size will dynamically scale down so as to not take + up more than '<n>' bytes in memory. This is useful in + repositories with a mix of large and small objects to not run + out of memory with a large window, but still be able to take + advantage of the large window for the smaller objects. The + size can be suffixed with "k", "m", or "g". + `--window-memory=0` makes memory usage unlimited. The default + is taken from the `pack.windowMemory` configuration variable. + +--max-pack-size=<n>:: + In unusual scenarios, you may not be able to create files + larger than a certain size on your filesystem, and this option + can be used to tell the command to split the output packfile + into multiple independent packfiles, each not larger than the + given size. The size can be suffixed with + "k", "m", or "g". The minimum size allowed is limited to 1 MiB. + This option + prevents the creation of a bitmap index. + The default is unlimited, unless the config variable + `pack.packSizeLimit` is set. + +--honor-pack-keep:: + This flag causes an object already in a local pack that + has a .keep file to be ignored, even if it would have + otherwise been packed. + +--keep-pack=<pack-name>:: + This flag causes an object already in the given pack to be + ignored, even if it would have otherwise been + packed. `<pack-name>` is the pack file name without + leading directory (e.g. `pack-123.pack`). The option could be + specified multiple times to keep multiple packs. + +--incremental:: + This flag causes an object already in a pack to be ignored + even if it would have otherwise been packed. + +--local:: + This flag causes an object that is borrowed from an alternate + object store to be ignored even if it would have otherwise been + packed. + +--non-empty:: + Only create a packed archive if it would contain at + least one object. + +--progress:: + Progress status is reported on the standard error stream + by default when it is attached to a terminal, unless -q + is specified. This flag forces progress status even if + the standard error stream is not directed to a terminal. + +--all-progress:: + When --stdout is specified then progress report is + displayed during the object count and compression phases + but inhibited during the write-out phase. The reason is + that in some cases the output stream is directly linked + to another command which may wish to display progress + status of its own as it processes incoming pack data. + This flag is like --progress except that it forces progress + report for the write-out phase as well even if --stdout is + used. + +--all-progress-implied:: + This is used to imply --all-progress whenever progress display + is activated. Unlike --all-progress this flag doesn't actually + force any progress display by itself. + +-q:: + This flag makes the command not to report its progress + on the standard error stream. + +--no-reuse-delta:: + When creating a packed archive in a repository that + has existing packs, the command reuses existing deltas. + This sometimes results in a slightly suboptimal pack. + This flag tells the command not to reuse existing deltas + but compute them from scratch. + +--no-reuse-object:: + This flag tells the command not to reuse existing object data at all, + including non deltified object, forcing recompression of everything. + This implies --no-reuse-delta. Useful only in the obscure case where + wholesale enforcement of a different compression level on the + packed data is desired. + +--compression=<n>:: + Specifies compression level for newly-compressed data in the + generated pack. If not specified, pack compression level is + determined first by pack.compression, then by core.compression, + and defaults to -1, the zlib default, if neither is set. + Add --no-reuse-object if you want to force a uniform compression + level on all data no matter the source. + +--[no-]sparse:: + Toggle the "sparse" algorithm to determine which objects to include in + the pack, when combined with the "--revs" option. This algorithm + only walks trees that appear in paths that introduce new objects. + This can have significant performance benefits when computing + a pack to send a small change. However, it is possible that extra + objects are added to the pack-file if the included commits contain + certain types of direct renames. If this option is not included, + it defaults to the value of `pack.useSparse`, which is true unless + otherwise specified. + +--thin:: + Create a "thin" pack by omitting the common objects between a + sender and a receiver in order to reduce network transfer. This + option only makes sense in conjunction with --stdout. ++ +Note: A thin pack violates the packed archive format by omitting +required objects and is thus unusable by Git without making it +self-contained. Use `git index-pack --fix-thin` +(see linkgit:git-index-pack[1]) to restore the self-contained property. + +--shallow:: + Optimize a pack that will be provided to a client with a shallow + repository. This option, combined with --thin, can result in a + smaller pack at the cost of speed. + +--delta-base-offset:: + A packed archive can express the base object of a delta as + either a 20-byte object name or as an offset in the + stream, but ancient versions of Git don't understand the + latter. By default, 'git pack-objects' only uses the + former format for better compatibility. This option + allows the command to use the latter format for + compactness. Depending on the average delta chain + length, this option typically shrinks the resulting + packfile by 3-5 per-cent. ++ +Note: Porcelain commands such as `git gc` (see linkgit:git-gc[1]), +`git repack` (see linkgit:git-repack[1]) pass this option by default +in modern Git when they put objects in your repository into pack files. +So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle. + +--threads=<n>:: + Specifies the number of threads to spawn when searching for best + delta matches. This requires that pack-objects be compiled with + pthreads otherwise this option is ignored with a warning. + This is meant to reduce packing time on multiprocessor machines. + The required amount of memory for the delta search window is + however multiplied by the number of threads. + Specifying 0 will cause Git to auto-detect the number of CPU's + and set the number of threads accordingly. + +--index-version=<version>[,<offset>]:: + This is intended to be used by the test suite only. It allows + to force the version for the generated pack index, and to force + 64-bit index entries on objects located above the given offset. + +--keep-true-parents:: + With this option, parents that are hidden by grafts are packed + nevertheless. + +--filter=<filter-spec>:: + Requires `--stdout`. Omits certain objects (usually blobs) from + the resulting packfile. See linkgit:git-rev-list[1] for valid + `<filter-spec>` forms. + +--no-filter:: + Turns off any previous `--filter=` argument. + +--missing=<missing-action>:: + A debug option to help with future "partial clone" development. + This option specifies how missing objects are handled. ++ +The form '--missing=error' requests that pack-objects stop with an error if +a missing object is encountered. If the repository is a partial clone, an +attempt to fetch missing objects will be made before declaring them missing. +This is the default action. ++ +The form '--missing=allow-any' will allow object traversal to continue +if a missing object is encountered. No fetch of a missing object will occur. +Missing objects will silently be omitted from the results. ++ +The form '--missing=allow-promisor' is like 'allow-any', but will only +allow object traversal to continue for EXPECTED promisor missing objects. +No fetch of a missing object will occur. An unexpected missing object will +raise an error. + +--exclude-promisor-objects:: + Omit objects that are known to be in the promisor remote. (This + option has the purpose of operating only on locally created objects, + so that when we repack, we still maintain a distinction between + locally created objects [without .promisor] and objects from the + promisor remote [with .promisor].) This is used with partial clone. + +--keep-unreachable:: + Objects unreachable from the refs in packs named with + --unpacked= option are added to the resulting pack, in + addition to the reachable objects that are not in packs marked + with *.keep files. This implies `--revs`. + +--pack-loose-unreachable:: + Pack unreachable loose objects (and their loose counterparts + removed). This implies `--revs`. + +--unpack-unreachable:: + Keep unreachable objects in loose form. This implies `--revs`. + +--delta-islands:: + Restrict delta matches based on "islands". See DELTA ISLANDS + below. + + +DELTA ISLANDS +------------- + +When possible, `pack-objects` tries to reuse existing on-disk deltas to +avoid having to search for new ones on the fly. This is an important +optimization for serving fetches, because it means the server can avoid +inflating most objects at all and just send the bytes directly from +disk. This optimization can't work when an object is stored as a delta +against a base which the receiver does not have (and which we are not +already sending). In that case the server "breaks" the delta and has to +find a new one, which has a high CPU cost. Therefore it's important for +performance that the set of objects in on-disk delta relationships match +what a client would fetch. + +In a normal repository, this tends to work automatically. The objects +are mostly reachable from the branches and tags, and that's what clients +fetch. Any deltas we find on the server are likely to be between objects +the client has or will have. + +But in some repository setups, you may have several related but separate +groups of ref tips, with clients tending to fetch those groups +independently. For example, imagine that you are hosting several "forks" +of a repository in a single shared object store, and letting clients +view them as separate repositories through `GIT_NAMESPACE` or separate +repos using the alternates mechanism. A naive repack may find that the +optimal delta for an object is against a base that is only found in +another fork. But when a client fetches, they will not have the base +object, and we'll have to find a new delta on the fly. + +A similar situation may exist if you have many refs outside of +`refs/heads/` and `refs/tags/` that point to related objects (e.g., +`refs/pull` or `refs/changes` used by some hosting providers). By +default, clients fetch only heads and tags, and deltas against objects +found only in those other groups cannot be sent as-is. + +Delta islands solve this problem by allowing you to group your refs into +distinct "islands". Pack-objects computes which objects are reachable +from which islands, and refuses to make a delta from an object `A` +against a base which is not present in all of `A`'s islands. This +results in slightly larger packs (because we miss some delta +opportunities), but guarantees that a fetch of one island will not have +to recompute deltas on the fly due to crossing island boundaries. + +When repacking with delta islands the delta window tends to get +clogged with candidates that are forbidden by the config. Repacking +with a big --window helps (and doesn't take as long as it otherwise +might because we can reject some object pairs based on islands before +doing any computation on the content). + +Islands are configured via the `pack.island` option, which can be +specified multiple times. Each value is a left-anchored regular +expressions matching refnames. For example: + +------------------------------------------- +[pack] +island = refs/heads/ +island = refs/tags/ +------------------------------------------- + +puts heads and tags into an island (whose name is the empty string; see +below for more on naming). Any refs which do not match those regular +expressions (e.g., `refs/pull/123`) is not in any island. Any object +which is reachable only from `refs/pull/` (but not heads or tags) is +therefore not a candidate to be used as a base for `refs/heads/`. + +Refs are grouped into islands based on their "names", and two regexes +that produce the same name are considered to be in the same +island. The names are computed from the regexes by concatenating any +capture groups from the regex, with a '-' dash in between. (And if +there are no capture groups, then the name is the empty string, as in +the above example.) This allows you to create arbitrary numbers of +islands. Only up to 14 such capture groups are supported though. + +For example, imagine you store the refs for each fork in +`refs/virtual/ID`, where `ID` is a numeric identifier. You might then +configure: + +------------------------------------------- +[pack] +island = refs/virtual/([0-9]+)/heads/ +island = refs/virtual/([0-9]+)/tags/ +island = refs/virtual/([0-9]+)/(pull)/ +------------------------------------------- + +That puts the heads and tags for each fork in their own island (named +"1234" or similar), and the pull refs for each go into their own +"1234-pull". + +Note that we pick a single island for each regex to go into, using "last +one wins" ordering (which allows repo-specific config to take precedence +over user-wide config, and so forth). + + +CONFIGURATION +------------- + +Various configuration variables affect packing, see +linkgit:git-config[1] (search for "pack" and "delta"). + +Notably, delta compression is not used on objects larger than the +`core.bigFileThreshold` configuration variable and on files with the +attribute `delta` set to false. + +SEE ALSO +-------- +linkgit:git-rev-list[1] +linkgit:git-repack[1] +linkgit:git-prune-packed[1] + +GIT +--- +Part of the linkgit:git[1] suite |