author Junio C Hamano <gitster@pobox.com> 2010-01-30 16:03:10 -0800
committer Junio C Hamano <gitster@pobox.com> 2010-01-30 16:03:10 -0800
commit 00d3278c8534a8244ae3447189401111e017fd5d
tree f1c19903bc10ffe4816642040080fb6cfd5da376 /Documentation/technical/pack-format.txt
parent t6000lib: Fix permission
parent Add a small patch-mode testing library
Merge commit 'b319ef7' into jc/maint-fix-test-perm

* commit 'b319ef7': (8132 commits)
  Add a small patch-mode testing library
  git-apply--interactive: Refactor patch mode code
  t8005: Nobody writes Russian in shift_jis
  Fix severe breakage in "git-apply --whitespace=fix"
  Update release notes for 1.6.4
  After renaming a section, print any trailing variable definitions
  Make section_name_match start on '[', and return the length on success
  send-email: detect cycles in alias expansion
  Show the presence of untracked files in the bash prompt.
  SunOS grep does not understand -C<n> nor -e
  Fix export_marks() error handling.
  git repack: keep commits hidden by a graft
  Add a test showing that 'git repack' throws away grafted-away parents
  git branch: clean up detached branch handling
  git branch: avoid unnecessary object lookups
  git branch: fix performance problem
  git svn: fix shallow clone when upstream revision is too new
  do_one_ref(): null_sha1 check is not about broken ref
  configure.ac: properly unset NEEDS_SSL_WITH_CRYPTO when sha1 func is missing
  janitor: useless checks before free
  ...

Diffstat (limited to 'Documentation/technical/pack-format.txt')
-rw-r--r-- Documentation/technical/pack-format.txt 116
1 files changed, 79 insertions, 37 deletions
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 9ce3c473ae..1803e64e46 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -1,9 +1,9 @@
GIT pack format
===============
-= pack-*.pack file has the following format:
+= pack-*.pack files have the following format:
- - The header appears at the beginning and consists of the following:
+ - A header appears at the beginning and consists of the following:
4-byte signature:
The signature is: {'P', 'A', 'C', 'K'}
@@ -34,18 +34,14 @@ GIT pack format
- The trailer records 20-byte SHA1 checksum of all of the above.
-= pack-*.idx file has the following format:
+= Original (version 1) pack-*.idx files have the following format:
- The header consists of 256 4-byte network byte order
integers. N-th entry of this table records the number of
objects in the corresponding pack, the first byte of whose
- object name are smaller than N. This is called the
+ object name is less than or equal to N. This is called the
'first-level fan-out' table.
- Observation: we would need to extend this to an array of
- 8-byte integers to go beyond 4G objects per pack, but it is
- not strictly necessary.
-
- The header is followed by sorted 24-byte entries, one entry
per object in the pack. Each entry is:
@@ -55,10 +51,6 @@ GIT pack format
20-byte object name.
- Observation: we would definitely need to extend this to
- 8-byte integer plus 20-byte object name to handle a packfile
- that is larger than 4GB.
-
- The file is concluded with a trailer:
A copy of the 20-byte SHA1 checksum at the end of
@@ -68,43 +60,42 @@ GIT pack format
Pack Idx file:
- idx
- +--------------------------------+
- | fanout[0] = 2 |-.
- +--------------------------------+ |
+ -- +--------------------------------+
+fanout | fanout[0] = 2 (for example) |-.
+table +--------------------------------+ |
| fanout[1] | |
+--------------------------------+ |
| fanout[2] | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
- | fanout[255] | |
- +--------------------------------+ |
-main | offset | |
-index | object name 00XXXXXXXXXXXXXXXX | |
-table +--------------------------------+ |
- | offset | |
- | object name 00XXXXXXXXXXXXXXXX | |
- +--------------------------------+ |
- .-| offset |<+
- | | object name 01XXXXXXXXXXXXXXXX |
- | +--------------------------------+
- | | offset |
- | | object name 01XXXXXXXXXXXXXXXX |
- | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- | | offset |
- | | object name FFXXXXXXXXXXXXXXXX |
- | +--------------------------------+
+ | fanout[255] = total objects |---.
+ -- +--------------------------------+ | |
+main | offset | | |
+index | object name 00XXXXXXXXXXXXXXXX | | |
+table +--------------------------------+ | |
+ | offset | | |
+ | object name 00XXXXXXXXXXXXXXXX | | |
+ +--------------------------------+<+ |
+ .-| offset | |
+ | | object name 01XXXXXXXXXXXXXXXX | |
+ | +--------------------------------+ |
+ | | offset | |
+ | | object name 01XXXXXXXXXXXXXXXX | |
+ | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
+ | | offset | |
+ | | object name FFXXXXXXXXXXXXXXXX | |
+ --| +--------------------------------+<--+
trailer | | packfile checksum |
| +--------------------------------+
| | idxfile checksum |
| +--------------------------------+
- .-------.
+ .-------.
|
Pack file entry: <+
packed object header:
1-byte size extension bit (MSB)
type (next 3 bit)
- size0 (lower 4-bit)
+ size0 (lower 4-bit)
n-byte sizeN (as long as MSB is set, each 7-bit)
size0..sizeN form 4+7+7+..+7 bit integer, size0
is the least significant part, and sizeN is the
@@ -112,7 +103,58 @@ Pack file entry: <+
packed object data:
If it is not DELTA, then deflated bytes (the size above
is the size before compression).
- If it is DELTA, then
+ If it is REF_DELTA, then
20-byte base object name SHA1 (the size above is the
- size of the delta data that follows).
+ size of the delta data that follows).
delta data, deflated.
+ If it is OFS_DELTA, then
+ n-byte offset (see below) interpreted as a negative
+ offset from the type-byte of the header of the
+ ofs-delta entry (the size above is the size of
+ the delta data that follows).
+ delta data, deflated.
+
+ offset encoding:
+ n bytes with MSB set in all but the last one.
+ The offset is then the number constructed by
+ concatenating the lower 7 bit of each byte, and
+ for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
+ to the result.
+
+
+
+= Version 2 pack-*.idx files support packs larger than 4 GiB, and
+ have some other reorganizations. They have the format:
+
+ - A 4-byte magic number '\377tOc' which is an unreasonable
+ fanout[0] value.
+
+ - A 4-byte version number (= 2)
+
+ - A 256-entry fan-out table just like v1.
+
+ - A table of sorted 20-byte SHA1 object names. These are
+ packed together without offset values to reduce the cache
+ footprint of the binary search for a specific object name.
+
+ - A table of 4-byte CRC32 values of the packed object data.
+ This is new in v2 so compressed data can be copied directly
+ from pack to pack during repacking without undetected
+ data corruption.
+
+ - A table of 4-byte offset values (in network byte order).
+ These are usually 31-bit pack file offsets, but large
+ offsets are encoded as an index into the next table with
+ the msbit set.
+
+ - A table of 8-byte offset entries (empty for pack files less
+ than 2 GiB). Pack files are organized with heavily used
+ objects toward the front, so most object references should
+ not need to refer to this table.
+
+ - The same trailer as a v1 pack file:
+
+ A copy of the 20-byte SHA1 checksum at the end of
+ corresponding packfile.
+
+ 20-byte SHA1-checksum of all of the above.
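
Note on the two variable-length encodings described in the patch above: the packed object header (type plus size0..sizeN, least significant part first) and the OFS_DELTA "offset encoding" (most significant part first, with the 2^7 + 2^14 + ... correction). The following is a minimal Python sketch of both decoders, written only from the text of the document. The function names, the `buf`/`pos` parameters, and the return shapes are hypothetical and chosen for illustration; this is not Git's own code.

```python
def decode_object_header(buf, pos=0):
    """Decode a packed object header starting at buf[pos].

    Returns (type_number, size, next_pos). The first byte carries the
    extension bit (MSB), a 3-bit type, and the low 4 bits of the size;
    each following byte contributes 7 more bits, least significant first.
    """
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    while byte & 0x80:  # MSB set: another size byte follows
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos


def decode_ofs_delta_offset(buf, pos=0):
    """Decode the n-byte OFS_DELTA offset starting at buf[pos].

    Returns (offset, next_pos). All bytes but the last have the MSB set.
    Adding 1 before each shift is equivalent to concatenating the 7-bit
    groups and then adding 2^7 + 2^14 + ... + 2^(7*(n-1)) for n >= 2.
    """
    byte = buf[pos]
    pos += 1
    offset = byte & 0x7F
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        offset = ((offset + 1) << 7) | (byte & 0x7F)
    return offset, pos
```

The decoded OFS_DELTA value is meant to be subtracted from the position of the delta entry's own type byte to locate the base object. Because of the added bias, a two-byte encoding starts at 128 rather than 0, so no offset has more than one valid encoding.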