summaryrefslogtreecommitdiff
path: root/Documentation/technical/pack-format.txt
AgeCommit message (Collapse)AuthorFilesLines
2020-08-19Merge branch 'ds/sha256-leftover-bits'Libravatar Junio C Hamano1-1/+6
midx and commit-graph files now use the byte defined in their file format specification for identifying the hash function used for object names. * ds/sha256-leftover-bits: multi-pack-index: use hash version byte commit-graph: use the "hash version" byte t/README: document GIT_TEST_DEFAULT_HASH
2020-08-17multi-pack-index: use hash version byteLibravatar Derrick Stolee1-1/+6
Similar to the commit-graph format, the multi-pack-index format has a byte in the header intended to track the hash version used to write the file. This allows one to interpret the hash length without having the context of the repository config specifying the hash length. This was not modified as part of the SHA-256 work because the hash length was automatically up-shifted due to that config. Since we have this byte available, we can make the file formats more obviously incompatible instead of relying on other context from the repository. Add a new oid_version() method in midx.c similar to the one in commit-graph.c. This is specifically made separate from that implementation to avoid artificially linking the formats. The test impact requires a few more things than the corresponding change in the commit-graph format. Specifically, 'test-tool read-midx' was not writing anything about this header value to output. Since the value available in 'struct multi_pack_index' is hash_len instead of a version value, we output "20" or "32" instead of "1" or "2". Since we want a user to not have their Git commands fail if their multi-pack-index has the incorrect hash version compared to the repository's hash version, we relax the die() to an error() in load_multi_pack_index(). This has some effect on 'git multi-pack-index verify' as we need to check that a failed parse of a file that exists is actually a verify error. For that test that checks the hash version matches, we change the corrupted byte from "2" to "3" to ensure the test fails for both hash algorithms. Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-13docs: document SHA-256 pack and indicesLibravatar brian m. carlson1-15/+21
Now that we have SHA-256 support for packs and indices, let's document that in SHA-256 repositories, we use SHA-256 instead of SHA-1 for object names and checksums. Instead of duplicating this information throughout the document, let's just document that in SHA-1 repositories, we use SHA-1 for these purposes, and in SHA-256 repositories, we use SHA-256. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10pack-format: correct multi-pack-index descriptionLibravatar Johannes Berg1-2/+3
The description of the multi-pack-index contains a small bug, if all offsets are < 2^32 then there will be no LOFF chunk, not only if they're all < 2^31 (since the highest bit is only needed as the "LOFF-escape" when that's actually needed.) Correct this, and clarify that in that case only offsets up to 2^31-1 can be stored in the OOFF chunk. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: write object offsetsLibravatar Derrick Stolee1-1/+14
The final pair of chunks for the multi-pack-index file stores the object offsets. We default to using 32-bit offsets as in the pack-index version 1 format, but if there exists an offset larger than 32-bits, we use a trick similar to the pack-index version 2 format by storing all offsets at least 2^31 in a 64-bit table; we use the 32-bit table to point into that 64-bit table as necessary. We only store these 64-bit offsets if necessary, so create a test that manipulates a version 2 pack-index to fake a large offset. This allows us to test that the large offset table is created, but the data does not match the actual packfile offsets. The multi-pack-index offset does match the (corrupted) pack-index offset, so a future feature will compare these offsets during a 'verify' step. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: write object id fanout chunkLibravatar Derrick Stolee1-0/+5
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: write object ids in a chunkLibravatar Derrick Stolee1-0/+4
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20multi-pack-index: write pack names in chunkLibravatar Derrick Stolee1-0/+6
The multi-pack-index needs to track which packfiles it indexes. Store these in our first required chunk. Since filenames are not well structured, add padding to keep good alignment in later chunks. Modify the 'git multi-pack-index read' subcommand to output the existence of the pack-file name chunk. Modify t5319-multi-pack-index.sh to reflect this new output and the new expected number of chunks. Defense in depth: A pattern we are using in the multi-pack-index feature is to verify the data as we write it. We want to ensure we never write invalid data to the multi-pack-index. There are many checks that verify that the values we are writing fit the format definitions. This mainly helps developers while working on the feature, but it can also identify issues that only appear when dealing with very large data sets. These large sets are hard to encode into test cases. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-12multi-pack-index: add format detailsLibravatar Derrick Stolee1-0/+49
The multi-pack-index feature generalizes the existing pack-index feature by indexing objects across multiple pack-files. Describe the basic file format, using a 12-byte header followed by a lookup table for a list of "chunks" which will be described later. The file ends with a footer containing a checksum using the hash algorithm. The header allows later versions to create breaking changes by advancing the version number. We can also change the hash algorithm using a different version value. We will add the individual chunk format information as we introduce the code that writes that information. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13pack-format.txt: more details on pack file formatLibravatar Nguyễn Thái Ngọc Duy1-0/+92
The current document mentions OBJ_* constants without their actual values. A git developer would know these are from cache.h but that's not very friendly to a person who wants to read this file to implement a pack file parser. Similarly, the deltified representation is not documented at all (the "document" is basically patch-delta.c). Translate that C code to English with a bit more about what ofs-delta and ref-delta mean. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15The name of the hash function is "SHA-1", not "SHA1"Libravatar Thomas Ackermann1-7/+7
Use "SHA-1" instead of "SHA1" whenever we talk about the hash function. When used as a programming symbol, we keep "SHA1". Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-12Merge branch 'maint-1.8.1' into maintLibravatar Junio C Hamano1-1/+3
* maint-1.8.1: fast-export: fix argument name in error messages Documentation: distinguish between ref and offset deltas in pack-format
2013-04-12Documentation: distinguish between ref and offset deltas in pack-formatLibravatar Stefan Saasen1-1/+3
eb32d236 introduced the OBJ_OFS_DELTA object that uses a relative offset to identify the base object instead of the 20-byte SHA1 reference. The pack file documentation only mentions the SHA1 based reference in its description of the deltified object entry. Update the pack format documentation to clarify that the deltified object representation refers to its base using either a relative negative offset or the absolute SHA1 identifier. Signed-off-by: Stefan Saasen <ssaasen@atlassian.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-01Documentation: avoid poor-man's small caps GITLibravatar Thomas Ackermann1-2/+2
In the earlier days, we used to spell the name of the system as GIT, to simulate as if it were typeset with capital G and IT in small caps. Later we stopped doing so at around 1.6.5 days. Let's stop doing so throughout the documentation. The name to refer to the whole system (and the concept it embodies) is "Git"; the command end-users type is "git". And document this in the coding guideline. Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-16Documentation/technical: convert plain text files to asciidocLibravatar Thomas Ackermann1-4/+4
These were not originally meant for asciidoc, but they are already so close. Mark them up in asciidoc. Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-04-06Add description of OFS_DELTA to the pack format descriptionLibravatar Peter Eriksen1-1/+15
Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Acked-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-07Documentation: typofixLibravatar Ralf Wildenhues1-1/+1
Signed-off-by: Ralf Wildenhues <Ralf.Wildenhues@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-14Documentation: describe pack idx v2Libravatar linux@horizon.com1-33/+61
Lifted from the log message of c553ca25bd60dc9fd50b8bc7bd329601b81cee66 (pack-objects: learn about pack index version 2). Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-07War on whitespaceLibravatar Junio C Hamano1-4/+4
This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept in to our source files over time. There are a few files that need to have trailing whitespaces (most notably, test vectors). The results still passes the test, and build result in Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-03-22Documentation/pack-format.txt: Clear up description of types.Libravatar Peter Eriksen1-4/+6
Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30Improved pack format documentation.Libravatar Shawn Pearce1-3/+8
While trying to implement a pack reader in Java I was mislead by some facts listed in this documentation as well as found a few details to be missing about the pack header. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-07Add Documentation/technical/pack-format.txtLibravatar Junio C Hamano1-0/+111
... along with the previous one, pack-heuristics, by popular demand. Signed-off-by: Junio C Hamano <junkio@cox.net>