GIT pack format
===============
= pack-*.pack file has the following format:

   - The header appears at the beginning and consists of the following:

     4-byte signature:
         The signature is: {'P', 'A', 'C', 'K'}

     4-byte version number (network byte order):
         GIT currently accepts version number 2 or 3 but
         generates version 2 only.

     4-byte number of objects contained in the pack (network byte order)

     Observation: we cannot have more than 4G versions ;-) and
     more than 4G objects in a pack.

   - The header is followed by that many object entries, each of
     which looks like this:

     (undeltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     compressed data

     (deltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     20-byte base object name
     compressed delta data

     Observation: the length of each object is encoded in a
     variable-length format and is not constrained to 32 bits.

   - The trailer records a 20-byte SHA1 checksum of all of the above.
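
The header and trailer layout above can be sketched in Python as
follows.  This is a minimal illustration, not GIT's own code: the
helper names parse_pack_header and trailer_is_valid are made up for
this example, and the sample pack is a synthetic one with zero
objects.

```python
import hashlib
import struct

def parse_pack_header(data):
    """Return (version, object_count) from the first 12 bytes of a pack."""
    # ">4sII": big-endian (network order) 4-byte signature + two 4-byte ints.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a pack file: bad signature")
    if version not in (2, 3):
        raise ValueError("unsupported pack version %d" % version)
    return version, count

def trailer_is_valid(data):
    """Check that the last 20 bytes are the SHA-1 of everything before them."""
    return hashlib.sha1(data[:-20]).digest() == data[-20:]

# Synthetic pack (hypothetical test data): header with 0 objects + trailer.
body = struct.pack(">4sII", b"PACK", 2, 0)
pack = body + hashlib.sha1(body).digest()
version, count = parse_pack_header(pack)
```

A reader would normally check the trailer before trusting the object
count, since a truncated pack still carries a plausible header.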

= pack-*.idx file has the following format:

   - The header consists of 256 4-byte network byte order
     integers.  The N-th entry of this table records the number of
     objects in the corresponding pack whose object name has a
     first byte that is not greater than N.  This is called the
     'first-level fan-out' table.

     Observation: we would need to extend this to an array of
     8-byte integers to go beyond 4G objects per pack, but it is
     not strictly necessary.

   - The header is followed by 24-byte entries sorted by object
     name, one entry per object in the pack.  Each entry is:

     4-byte network byte order integer, recording where the
     object is stored in the packfile as the offset from the
     beginning.

     20-byte object name.

     Observation: we would definitely need to extend this to
     an 8-byte integer plus 20-byte object name to handle a
     packfile that is larger than 4GB.

   - The file is concluded with a trailer:

     A copy of the 20-byte SHA1 checksum at the end of the
     corresponding packfile.

     20-byte SHA1 checksum of all of the above.
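
A lookup through the fan-out table might look like the sketch below.
It assumes fanout[N] counts the objects whose first name byte is N or
lower (which matches the diagram, where fanout[0] = 2 and two names
start with 00), so the entries for first byte B occupy positions
fanout[B-1] up to fanout[B].  The names build_idx and find_offset are
hypothetical helpers for this example only.

```python
import hashlib
import struct

FANOUT_SIZE = 256 * 4   # 256 four-byte counters
ENTRY_SIZE = 24         # 4-byte offset + 20-byte object name

def build_idx(entries, pack_checksum):
    """Serialize {object_name: offset} into the idx layout described above."""
    names = sorted(entries)
    fanout = [0] * 256
    for name in names:
        fanout[name[0]] += 1
    for i in range(1, 256):          # turn per-byte counts into running totals
        fanout[i] += fanout[i - 1]
    body = struct.pack(">256I", *fanout)
    body += b"".join(struct.pack(">I", entries[n]) + n for n in names)
    body += pack_checksum
    return body + hashlib.sha1(body).digest()

def find_offset(idx, name):
    """Return the pack offset recorded for name, or None if it is absent."""
    fanout = struct.unpack(">256I", idx[:FANOUT_SIZE])
    first = name[0]
    lo = fanout[first - 1] if first else 0
    hi = fanout[first]
    while lo < hi:                   # binary search within the fan-out bucket
        mid = (lo + hi) // 2
        start = FANOUT_SIZE + mid * ENTRY_SIZE
        candidate = idx[start + 4:start + ENTRY_SIZE]
        if candidate == name:
            return struct.unpack(">I", idx[start:start + 4])[0]
        if candidate < name:
            lo = mid + 1
        else:
            hi = mid
    return None

# Made-up object names and offsets, mirroring the diagram: two 00.. names, one 01..
name_a = b"\x00" + b"\x11" * 19
name_b = b"\x00" + b"\x22" * 19
name_c = b"\x01" + b"\x33" * 19
idx = build_idx({name_a: 12, name_b: 40, name_c: 97}, b"\0" * 20)
```

The fan-out table only narrows the search range; the binary search
within the bucket still compares full 20-byte names.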

Pack Idx file:

        idx
            +--------------------------------+
            | fanout[0] = 2                  |-.
            +--------------------------------+ |
            | fanout[1]                      | |
            +--------------------------------+ |
            | fanout[2]                      | |
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
            | fanout[255]                    | |
            +--------------------------------+ |
main        | offset                         | |
index       | object name 00XXXXXXXXXXXXXXXX | |
table       +--------------------------------+ |
            | offset                         | |
            | object name 00XXXXXXXXXXXXXXXX | |
            +--------------------------------+ |
          .-| offset                         |<+
          | | object name 01XXXXXXXXXXXXXXXX |
          | +--------------------------------+
          | | offset                         |
          | | object name 01XXXXXXXXXXXXXXXX |
          | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          | | offset                         |
          | | object name FFXXXXXXXXXXXXXXXX |
          | +--------------------------------+
trailer   | | packfile checksum              |
          | +--------------------------------+
          | | idxfile checksum               |
          | +--------------------------------+
          .-------.
                  |
Pack file entry: <+

     packed object header:
        1-byte size extension bit (MSB)
               type (next 3-bit)
               size0 (lower 4-bit)
        n-byte sizeN (as long as MSB is set, each 7-bit)
                size0..sizeN form 4+7+7+..+7 bit integer, size0
                is the least significant part, and sizeN is the
                most significant part.
     packed object data:
        If it is not DELTA, then deflated bytes (the size above
                is the size before compression).
        If it is DELTA, then
          20-byte base object name SHA1 (the size above is the
                size of the delta data that follows).
          delta data, deflated.
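
The packed object header can be decoded with a short loop.  The
sketch below assumes the first byte carries a continuation flag in
its top bit, the type in the next three bits and the lowest four bits
of the size, and that each following byte adds seven more significant
size bits.  decode_entry_header and encode_entry_header are
hypothetical names used only for this illustration.

```python
def decode_entry_header(buf, pos=0):
    """Return (type, size, next_pos) for the entry header starting at pos."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07    # bits 6..4 hold the type
    size = byte & 0x0F               # size0: lowest 4 bits of the size
    shift = 4
    while byte & 0x80:               # MSB set: another size byte follows
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos

def encode_entry_header(obj_type, size):
    """Inverse of decode_entry_header, used here to build test input."""
    current = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(current | 0x80)   # more size bits remain: set the MSB
        current = size & 0x7F
        size >>= 7
    out.append(current)
    return bytes(out)

# Type 1 with size 300 needs two bytes; type 3 with size 10 fits in one.
two_bytes = encode_entry_header(1, 300)
one_byte = encode_entry_header(3, 10)
```

The returned next_pos points at the deflated data, or at the 20-byte
base object name when the entry is a delta.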