From cb9daf16db0e9935179fd995785e94c996267268 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= Date: Fri, 5 Jun 2020 13:00:24 +0000 Subject: commit-graph: fix parsing the Chunk Lookup table MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The commit-graph file format specifies that the chunks may be in any order. However, if the OID Lookup chunk happens to be the last one in the file, then any command attempting to access the commit-graph data will fail with: fatal: invalid commit position. commit-graph is likely corrupt In this case the error is wrong, the commit-graph file does conform to the specification, but the parsing of the Chunk Lookup table is a bit buggy, and leaves the field holding the number of commits in the commit-graph zero-initialized. The number of commits in the commit-graph is determined while parsing the Chunk Lookup table, by dividing the size of the OID Lookup chunk with the hash size. However, the Chunk Lookup table doesn't actually store the size of the chunks, but it stores their starting offset. Consequently, the size of a chunk can only be calculated by subtracting the starting offsets of that chunk from the offset of the subsequent chunk, or in case of the last chunk from the offset recorded in the terminating label. This is currenly implemented in a bit complicated way: as we iterate over the entries of the Chunk Lookup table, we check the ID of each chunk and store its starting offset, then we check the ID of the last seen chunk and calculate its size using its previously saved offset if necessary (at the moment it's only necessary for the OID Lookup chunk). Alas, while parsing the Chunk Lookup table we only interate through the "real" chunks, but never look at the terminating label, thus don't even check whether it's necessary to calulate the size of the last chunk. Consequently, if the OID Lookup chunk is the last one, then we don't calculate its size and turn don't run the piece of code determining the number of commits in the commit graph, leaving the field holding that number unchanged (i.e. zero-initialized), eventually triggering the sanity check in load_oid_from_graph(). Fix this by iterating through all entries in the Chunk Lookup table, including the terminating label. Note that this is the minimal fix, suitable for the maintenance track. A better fix would be to simplify how the chunk sizes are calculated, but that is a more invasive change, less suitable for 'maint', so that will be done in later patches. This additional flexibility of scanning more chunks breaks a test for "git commit-graph verify" so alter that test to mutate the commit-graph to have an even lower chunk count. Signed-off-by: SZEDER Gábor Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- commit-graph.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'commit-graph.c') diff --git a/commit-graph.c b/commit-graph.c index 196e817a84..7807d94562 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -277,7 +277,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, last_chunk_id = 0; last_chunk_offset = 8; chunk_lookup = data + 8; - for (i = 0; i < graph->num_chunks; i++) { + for (i = 0; i <= graph->num_chunks; i++) { uint32_t chunk_id; uint64_t chunk_offset; int chunk_repeated = 0; -- cgit v1.2.3 From fa7965309e709e6dcd59c38fc6743d17d82d538b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= Date: Fri, 5 Jun 2020 13:00:28 +0000 Subject: commit-graph: clean up #includes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Our CodingGuidelines says that it's sufficient to include one of 'git-compat-util.h' and 'cache.h', but both 'commit-graph.c' and 'commit-graph.h' include both. Let's include only 'git-compat-util.h' to loose a bunch of unnecessary dependencies; but include 'hash.h', because 'commit-graph.h' does require the definition of 'struct object_id'. 'commit-graph.h' explicitly includes 'repository.h' and 'string-list.h', but only needs the declaration of a few structs from them. Drop these includes and forward-declare the necessary structs instead. 'commit-graph.c' includes 'dir.h', but doesn't actually use anything from there, so let's drop that #include as well. Signed-off-by: SZEDER Gábor Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- commit-graph.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'commit-graph.c') diff --git a/commit-graph.c b/commit-graph.c index 7807d94562..6ed649388d 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1,7 +1,5 @@ -#include "cache.h" -#include "config.h" -#include "dir.h" #include "git-compat-util.h" +#include "config.h" #include "lockfile.h" #include "pack.h" #include "packfile.h" -- cgit v1.2.3 From 2ad4f1a7c43cac4ba52cff56f9090d3322c67224 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= Date: Fri, 5 Jun 2020 13:00:29 +0000 Subject: commit-graph: simplify parse_commit_graph() #1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit While we iterate over all entries of the Chunk Lookup table we make sure that we don't attempt to read past the end of the mmap-ed commit-graph file, and check in each iteration that the chunk ID and offset we are about to read is still within the mmap-ed memory region. However, these checks in each iteration are not really necessary, because the number of chunks in the commit-graph file is already known before this loop from the just parsed commit-graph header. So let's check that the commit-graph file is large enough for all entries in the Chunk Lookup table before we start iterating over those entries, and drop those per-iteration checks. While at it, take into account the size of everything that is necessary to have a valid commit-graph file, i.e. the size of the header, the size of the mandatory OID Fanout chunk, and the size of the signature in the trailer as well. Note that this necessitates the change of the error message as well, and, consequently, have to update the 'detect incorrect chunk count' test in 't5318-commit-graph.sh' as well. Signed-off-by: SZEDER Gábor Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- commit-graph.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) (limited to 'commit-graph.c') diff --git a/commit-graph.c b/commit-graph.c index 6ed649388d..9927762f18 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -272,6 +272,15 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, graph->data = graph_map; graph->data_len = graph_size; + if (graph_size < GRAPH_HEADER_SIZE + + (graph->num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH + + GRAPH_FANOUT_SIZE + the_hash_algo->rawsz) { + error(_("commit-graph file is too small to hold %u chunks"), + graph->num_chunks); + free(graph); + return NULL; + } + last_chunk_id = 0; last_chunk_offset = 8; chunk_lookup = data + 8; @@ -280,13 +289,6 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, uint64_t chunk_offset; int chunk_repeated = 0; - if (data + graph_size - chunk_lookup < - GRAPH_CHUNKLOOKUP_WIDTH) { - error(_("commit-graph chunk lookup table entry missing; file may be incomplete")); - free(graph); - return NULL; - } - chunk_id = get_be32(chunk_lookup + 0); chunk_offset = get_be64(chunk_lookup + 4); -- cgit v1.2.3 From 5cfa438a76f7f67f34f6f758b62943261ded1fb5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= Date: Fri, 5 Jun 2020 13:00:30 +0000 Subject: commit-graph: simplify parse_commit_graph() #2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The Chunk Lookup table stores the chunks' starting offset in the commit-graph file, not their sizes. Consequently, the size of a chunk can only be calculated by subtracting its offset from the offset of the subsequent chunk (or that of the terminating label). This is currenly implemented in a bit complicated way: as we iterate over the entries of the Chunk Lookup table, we check the id of each chunk and store its starting offset, then we check the id of the last seen chunk and calculate its size using its previously saved offset. At the moment there is only one chunk for which we calculate its size, but this patch series will add more, and the repeated chunk id checks are not that pretty. Instead let's read ahead the offset of the next chunk on each iteration, so we can calculate the size of each chunk right away, right where we store its starting offset. Signed-off-by: SZEDER Gábor Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- commit-graph.c | 26 +++++++++----------------- 1 file changed, 9 insertions(+), 17 deletions(-) (limited to 'commit-graph.c') diff --git a/commit-graph.c b/commit-graph.c index 9927762f18..84206f0f51 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -230,8 +230,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, const unsigned char *data, *chunk_lookup; uint32_t i; struct commit_graph *graph; - uint64_t last_chunk_offset; - uint32_t last_chunk_id; + uint64_t next_chunk_offset; uint32_t graph_signature; unsigned char graph_version, hash_version; @@ -281,18 +280,17 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, return NULL; } - last_chunk_id = 0; - last_chunk_offset = 8; chunk_lookup = data + 8; - for (i = 0; i <= graph->num_chunks; i++) { + next_chunk_offset = get_be64(chunk_lookup + 4); + for (i = 0; i < graph->num_chunks; i++) { uint32_t chunk_id; - uint64_t chunk_offset; + uint64_t chunk_offset = next_chunk_offset; int chunk_repeated = 0; chunk_id = get_be32(chunk_lookup + 0); - chunk_offset = get_be64(chunk_lookup + 4); chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH; + next_chunk_offset = get_be64(chunk_lookup + 4); if (chunk_offset > graph_size - the_hash_algo->rawsz) { error(_("commit-graph improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32), @@ -312,8 +310,11 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, case GRAPH_CHUNKID_OIDLOOKUP: if (graph->chunk_oid_lookup) chunk_repeated = 1; - else + else { graph->chunk_oid_lookup = data + chunk_offset; + graph->num_commits = (next_chunk_offset - chunk_offset) + / graph->hash_len; + } break; case GRAPH_CHUNKID_DATA: @@ -368,15 +369,6 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, free(graph); return NULL; } - - if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP) - { - graph->num_commits = (chunk_offset - last_chunk_offset) - / graph->hash_len; - } - - last_chunk_id = chunk_id; - last_chunk_offset = chunk_offset; } if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) { -- cgit v1.2.3 From bb4d60e5d536009e0c97bfa7f464416c29c04c81 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= Date: Fri, 5 Jun 2020 13:00:31 +0000 Subject: commit-graph: simplify write_commit_graph_file() #1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In write_commit_graph_file() one block of code fills the array of chunk IDs, another block of code fills the array of chunk offsets, then the chunk IDs and offsets are written to the Chunk Lookup table, and finally a third block of code writes the actual chunks. In case of optional chunks like Extra Edge List and Base Graphs List there is also a condition checking whether that chunk is necessary/desired, and that same condition is repeated in all those three blocks of code. This patch series is about to add more optional chunks, so there would be even more repeated conditions. Those chunk offsets are relative to the beginning of the file, so they inherently depend on the size of the Chunk Lookup table, which in turn depends on the number of chunks that are to be written to the commit-graph file. IOW at the time we set the first chunk's ID we can't yet know its offset, because we don't yet know how many chunks there are. Simplify this by initially filling an array of chunk sizes, not offsets, and calculate the offsets based on the chunk sizes only later, while we are writing the Chunk Lookup table. This way we can fill the arrays of chunk IDs and sizes in one go, eliminating one set of repeated conditions. Signed-off-by: SZEDER Gábor Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- commit-graph.c | 46 +++++++++++++++++----------------------------- 1 file changed, 17 insertions(+), 29 deletions(-) (limited to 'commit-graph.c') diff --git a/commit-graph.c b/commit-graph.c index 84206f0f51..79cddabcd1 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1529,10 +1529,11 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) struct hashfile *f; struct lock_file lk = LOCK_INIT; uint32_t chunk_ids[MAX_NUM_CHUNKS + 1]; - uint64_t chunk_offsets[MAX_NUM_CHUNKS + 1]; + uint64_t chunk_sizes[MAX_NUM_CHUNKS + 1]; const unsigned hashsz = the_hash_algo->rawsz; struct strbuf progress_title = STRBUF_INIT; int num_chunks = 3; + uint64_t chunk_offset; struct object_id file_hash; const struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; @@ -1573,50 +1574,34 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) } chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT; + chunk_sizes[0] = GRAPH_FANOUT_SIZE; chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP; + chunk_sizes[1] = hashsz * ctx->commits.nr; chunk_ids[2] = GRAPH_CHUNKID_DATA; + chunk_sizes[2] = (hashsz + 16) * ctx->commits.nr; + if (ctx->num_extra_edges) { chunk_ids[num_chunks] = GRAPH_CHUNKID_EXTRAEDGES; + chunk_sizes[num_chunks] = 4 * ctx->num_extra_edges; num_chunks++; } if (ctx->changed_paths) { chunk_ids[num_chunks] = GRAPH_CHUNKID_BLOOMINDEXES; + chunk_sizes[num_chunks] = sizeof(uint32_t) * ctx->commits.nr; num_chunks++; chunk_ids[num_chunks] = GRAPH_CHUNKID_BLOOMDATA; + chunk_sizes[num_chunks] = sizeof(uint32_t) * 3 + + ctx->total_bloom_filter_data_size; num_chunks++; } if (ctx->num_commit_graphs_after > 1) { chunk_ids[num_chunks] = GRAPH_CHUNKID_BASE; + chunk_sizes[num_chunks] = hashsz * (ctx->num_commit_graphs_after - 1); num_chunks++; } chunk_ids[num_chunks] = 0; - - chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH; - chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE; - chunk_offsets[2] = chunk_offsets[1] + hashsz * ctx->commits.nr; - chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * ctx->commits.nr; - - num_chunks = 3; - if (ctx->num_extra_edges) { - chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] + - 4 * ctx->num_extra_edges; - num_chunks++; - } - if (ctx->changed_paths) { - chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] + - sizeof(uint32_t) * ctx->commits.nr; - num_chunks++; - - chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] + - sizeof(uint32_t) * 3 + ctx->total_bloom_filter_data_size; - num_chunks++; - } - if (ctx->num_commit_graphs_after > 1) { - chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] + - hashsz * (ctx->num_commit_graphs_after - 1); - num_chunks++; - } + chunk_sizes[num_chunks] = 0; hashwrite_be32(f, GRAPH_SIGNATURE); @@ -1625,13 +1610,16 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) hashwrite_u8(f, num_chunks); hashwrite_u8(f, ctx->num_commit_graphs_after - 1); + chunk_offset = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH; for (i = 0; i <= num_chunks; i++) { uint32_t chunk_write[3]; chunk_write[0] = htonl(chunk_ids[i]); - chunk_write[1] = htonl(chunk_offsets[i] >> 32); - chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff); + chunk_write[1] = htonl(chunk_offset >> 32); + chunk_write[2] = htonl(chunk_offset & 0xffffffff); hashwrite(f, chunk_write, 12); + + chunk_offset += chunk_sizes[i]; } if (ctx->report_progress) { -- cgit v1.2.3 From 7fbfe07ab4d4e58c0971dac73001b89f180a0af3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= Date: Fri, 5 Jun 2020 13:00:32 +0000 Subject: commit-graph: simplify write_commit_graph_file() #2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Unify the 'chunk_ids' and 'chunk_sizes' arrays into an array of 'struct chunk_info'. This will allow more cleanups in the following patches. Signed-off-by: SZEDER Gábor Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- commit-graph.c | 45 ++++++++++++++++++++++++--------------------- 1 file changed, 24 insertions(+), 21 deletions(-) (limited to 'commit-graph.c') diff --git a/commit-graph.c b/commit-graph.c index 79cddabcd1..887837e882 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1522,14 +1522,18 @@ static int write_graph_chunk_base(struct hashfile *f, return 0; } +struct chunk_info { + uint32_t id; + uint64_t size; +}; + static int write_commit_graph_file(struct write_commit_graph_context *ctx) { uint32_t i; int fd; struct hashfile *f; struct lock_file lk = LOCK_INIT; - uint32_t chunk_ids[MAX_NUM_CHUNKS + 1]; - uint64_t chunk_sizes[MAX_NUM_CHUNKS + 1]; + struct chunk_info chunks[MAX_NUM_CHUNKS + 1]; const unsigned hashsz = the_hash_algo->rawsz; struct strbuf progress_title = STRBUF_INIT; int num_chunks = 3; @@ -1573,35 +1577,34 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); } - chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT; - chunk_sizes[0] = GRAPH_FANOUT_SIZE; - chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP; - chunk_sizes[1] = hashsz * ctx->commits.nr; - chunk_ids[2] = GRAPH_CHUNKID_DATA; - chunk_sizes[2] = (hashsz + 16) * ctx->commits.nr; - + chunks[0].id = GRAPH_CHUNKID_OIDFANOUT; + chunks[0].size = GRAPH_FANOUT_SIZE; + chunks[1].id = GRAPH_CHUNKID_OIDLOOKUP; + chunks[1].size = hashsz * ctx->commits.nr; + chunks[2].id = GRAPH_CHUNKID_DATA; + chunks[2].size = (hashsz + 16) * ctx->commits.nr; if (ctx->num_extra_edges) { - chunk_ids[num_chunks] = GRAPH_CHUNKID_EXTRAEDGES; - chunk_sizes[num_chunks] = 4 * ctx->num_extra_edges; + chunks[num_chunks].id = GRAPH_CHUNKID_EXTRAEDGES; + chunks[num_chunks].size = 4 * ctx->num_extra_edges; num_chunks++; } if (ctx->changed_paths) { - chunk_ids[num_chunks] = GRAPH_CHUNKID_BLOOMINDEXES; - chunk_sizes[num_chunks] = sizeof(uint32_t) * ctx->commits.nr; + chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMINDEXES; + chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr; num_chunks++; - chunk_ids[num_chunks] = GRAPH_CHUNKID_BLOOMDATA; - chunk_sizes[num_chunks] = sizeof(uint32_t) * 3 + chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMDATA; + chunks[num_chunks].size = sizeof(uint32_t) * 3 + ctx->total_bloom_filter_data_size; num_chunks++; } if (ctx->num_commit_graphs_after > 1) { - chunk_ids[num_chunks] = GRAPH_CHUNKID_BASE; - chunk_sizes[num_chunks] = hashsz * (ctx->num_commit_graphs_after - 1); + chunks[num_chunks].id = GRAPH_CHUNKID_BASE; + chunks[num_chunks].size = hashsz * (ctx->num_commit_graphs_after - 1); num_chunks++; } - chunk_ids[num_chunks] = 0; - chunk_sizes[num_chunks] = 0; + chunks[num_chunks].id = 0; + chunks[num_chunks].size = 0; hashwrite_be32(f, GRAPH_SIGNATURE); @@ -1614,12 +1617,12 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) for (i = 0; i <= num_chunks; i++) { uint32_t chunk_write[3]; - chunk_write[0] = htonl(chunk_ids[i]); + chunk_write[0] = htonl(chunks[i].id); chunk_write[1] = htonl(chunk_offset >> 32); chunk_write[2] = htonl(chunk_offset & 0xffffffff); hashwrite(f, chunk_write, 12); - chunk_offset += chunk_sizes[i]; + chunk_offset += chunks[i].size; } if (ctx->report_progress) { -- cgit v1.2.3