From 58dc440b3cd34db29b54e87ef80e0da8cd507445 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Date: Mon, 3 Sep 2018 14:49:21 +0000 Subject: fsck: document and test sorted skipList input MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Ever since the skipList support was first added in cd94c6f91 ("fsck: git receive-pack: support excluding objects from fsck'ing", 2015-06-22) the documentation for the format has that the file is a sorted list of object names. Thus, anyone using the feature would have thought the list needed to be sorted. E.g. I recently in conjunction with my fetch.fsck.* implementation in 1362df0d41 ("fetch: implement fetch.fsck.*", 2018-07-27) wrote some code to ship a skipList, and went out of my way to sort it. Doing so seems intuitive, since it contains fixed-width records, and has no support for comments, so one might expect it to be binary searched in-place on-disk. However, as documented here this was never a requirement, so let's change the documentation. Since this is a file format change let's also document what was said about this in the past, so e.g. someone like myself reading the new docs can see this never needed to be sorted ("why do I have all this code to sort this thing..."). Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Junio C Hamano --- Documentation/config.txt | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/config.txt b/Documentation/config.txt index eb66a11975..fd1b5837d0 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1710,7 +1710,7 @@ doing the same for `receive.fsck.` and `fetch.fsck.` will only cause git to warn. fsck.skipList:: - The path to a sorted list of object names (i.e. one SHA-1 per + The path to a list of object names (i.e. one SHA-1 per line) that are known to be broken in a non-fatal way and should be ignored. This feature is useful when an established project should be accepted despite early commits containing errors that @@ -1725,6 +1725,14 @@ Unlike variables like `color.ui` and `core.editor` the fall back on the `fsck.skipList` configuration if they aren't set. To uniformly configure the same fsck settings in different circumstances all three of them they must all set to the same values. ++ +Older versions of Git (before 2.20) documented that the object names +list should be sorted. This was never a requirement, the object names +can appear in any order, but when reading the list we track whether +the list is sorted for the purposes of an internal binary search +implementation, which can save itself some work with an already sorted +list. Unless you have a humongous list there's no reason to go out of +your way to pre-sort the list. gc.aggressiveDepth:: The depth parameter used in the delta compression -- cgit v1.2.3 From f706c42bab5a2ac4af45e0df7853af1fc5346c45 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Date: Mon, 3 Sep 2018 14:49:22 +0000 Subject: fsck: document and test commented & empty line skipList input MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There is currently no comment syntax for the fsck.skipList, this isn't really by design, and it would be nice to have support for comments. Document that this doesn't work, and test for how this errors out. These tests reveal a current bug, if there's invalid input the output will emit some of the next line, and then go into uninitialized memory. This is fixed in a subsequent change. Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Junio C Hamano --- Documentation/config.txt | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/config.txt b/Documentation/config.txt index fd1b5837d0..0e1ce7de8b 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1712,10 +1712,13 @@ will only cause git to warn. fsck.skipList:: The path to a list of object names (i.e. one SHA-1 per line) that are known to be broken in a non-fatal way and should - be ignored. This feature is useful when an established project - should be accepted despite early commits containing errors that - can be safely ignored such as invalid committer email addresses. - Note: corrupt objects cannot be skipped with this setting. + be ignored. Comments ('#') and empty lines are not supported, and + will error out. ++ +This feature is useful when an established project should be accepted +despite early commits containing errors that can be safely ignored +such as invalid committer email addresses. Note: corrupt objects +cannot be skipped with this setting. + Like `fsck.` this variable has corresponding `receive.fsck.skipList` and `fetch.fsck.skipList` variants. -- cgit v1.2.3 From 12b1c50a426d84aadd9909ed42b1ed0b2d12d7e9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Date: Mon, 3 Sep 2018 14:49:23 +0000 Subject: fsck: document that skipList input must be unabbreviated MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Abbreviating the SHA-1s in the skipList input has never worked, but the documentation hasn't unambiguously stated that this is an error, and there was no test for it. Let's fix both since it would be easy for some later refactoring e.g. switch to accidentally switch to a looser OID parsing function, causing the tests before this change to pass, but for older versions of git to be incompatible with the new skipList format. Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Junio C Hamano --- Documentation/config.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/config.txt b/Documentation/config.txt index 0e1ce7de8b..3287c7ef8a 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1710,7 +1710,7 @@ doing the same for `receive.fsck.` and `fetch.fsck.` will only cause git to warn. fsck.skipList:: - The path to a list of object names (i.e. one SHA-1 per + The path to a list of object names (i.e. one unabbreviated SHA-1 per line) that are known to be broken in a non-fatal way and should be ignored. Comments ('#') and empty lines are not supported, and will error out. -- cgit v1.2.3 From 3b41fb0cb217f4b4491f2e67ce4183e5d2a5d873 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= Date: Mon, 3 Sep 2018 14:49:27 +0000 Subject: fsck: use oidset instead of oid_array for skipList MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Change the implementation of the skipList feature to use oidset instead of oid_array to store SHA-1s for later lookup. This list is parsed once on startup by fsck, fetch-pack or receive-pack depending on the *.skipList config in use. I.e. only once per invocation, but note that for "clone --recurse-submodules" each submodule will re-parse the list, in addition to the main project, and it will be re-parsed when checking .gitmodules blobs, see fb16287719 ("fsck: check skiplist for object in fsck_blob()", 2018-06-27). Memory usage is a bit higher, but we don't need to keep track of the sort order anymore. Embed the oidset into struct fsck_options to make its ownership clear (no hidden sharing) and avoid unnecessary pointer indirection. The cumulative impact on performance of this & the preceding change, using the test setup described in the previous commit: Test HEAD~2 HEAD~ HEAD ---------------------------------------------------------------------------------------------------------------- 1450.3: fsck with 0 skipped bad commits 7.70(7.31+0.38) 7.72(7.33+0.38) +0.3% 7.70(7.30+0.40) +0.0% 1450.5: fsck with 1 skipped bad commits 7.84(7.47+0.37) 7.69(7.32+0.36) -1.9% 7.71(7.29+0.41) -1.7% 1450.7: fsck with 10 skipped bad commits 7.81(7.40+0.40) 7.94(7.57+0.36) +1.7% 7.92(7.55+0.37) +1.4% 1450.9: fsck with 100 skipped bad commits 7.81(7.42+0.38) 7.95(7.53+0.41) +1.8% 7.83(7.42+0.41) +0.3% 1450.11: fsck with 1000 skipped bad commits 7.99(7.62+0.36) 7.90(7.50+0.40) -1.1% 7.86(7.49+0.37) -1.6% 1450.13: fsck with 10000 skipped bad commits 7.98(7.57+0.40) 7.94(7.53+0.40) -0.5% 7.90(7.45+0.44) -1.0% 1450.15: fsck with 100000 skipped bad commits 7.97(7.57+0.39) 8.03(7.67+0.36) +0.8% 7.84(7.43+0.41) -1.6% 1450.17: fsck with 1000000 skipped bad commits 7.72(7.22+0.50) 7.28(7.07+0.20) -5.7% 7.13(6.87+0.25) -7.6% Helped-by: Ævar Arnfjörð Bjarmason Signed-off-by: Rene Scharfe Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Junio C Hamano --- Documentation/config.txt | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/config.txt b/Documentation/config.txt index 3287c7ef8a..161ffe259e 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1731,11 +1731,12 @@ all three of them they must all set to the same values. + Older versions of Git (before 2.20) documented that the object names list should be sorted. This was never a requirement, the object names -can appear in any order, but when reading the list we track whether -the list is sorted for the purposes of an internal binary search -implementation, which can save itself some work with an already sorted -list. Unless you have a humongous list there's no reason to go out of -your way to pre-sort the list. +could appear in any order, but when reading the list we tracked whether +the list was sorted for the purposes of an internal binary search +implementation, which could save itself some work with an already sorted +list. Unless you had a humongous list there was no reason to go out of +your way to pre-sort the list. After Git version 2.20 a hash implementation +is used instead, so there's now no reason to pre-sort the list. gc.aggressiveDepth:: The depth parameter used in the delta compression -- cgit v1.2.3 From 371a6550741e38abe6162d891e8d4102dd20f5a6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Date: Mon, 3 Sep 2018 14:49:28 +0000 Subject: fsck: support comments & empty lines in skipList MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit It's annoying not to be able to put comments and empty lines in the skipList, when e.g. keeping a big central list of commits to skip in /etc/gitconfig, which was my motivation for 1362df0d41 ("fetch: implement fetch.fsck.*", 2018-07-27). Implement that, and document what version of Git this was changed in, since this on-disk format can be expected to be used by multiple versions of git. There is no notable performance impact from this change, using the test setup described a couple of commits back: Test HEAD~ HEAD ---------------------------------------------------------------------------------------- 1450.3: fsck with 0 skipped bad commits 7.69(7.27+0.42) 7.86(7.48+0.37) +2.2% 1450.5: fsck with 1 skipped bad commits 7.69(7.30+0.38) 7.83(7.47+0.36) +1.8% 1450.7: fsck with 10 skipped bad commits 7.76(7.38+0.38) 7.79(7.38+0.41) +0.4% 1450.9: fsck with 100 skipped bad commits 7.76(7.38+0.38) 7.74(7.36+0.38) -0.3% 1450.11: fsck with 1000 skipped bad commits 7.71(7.30+0.41) 7.72(7.34+0.38) +0.1% 1450.13: fsck with 10000 skipped bad commits 7.74(7.34+0.40) 7.72(7.34+0.38) -0.3% 1450.15: fsck with 100000 skipped bad commits 7.75(7.40+0.35) 7.70(7.29+0.40) -0.6% 1450.17: fsck with 1000000 skipped bad commits 7.12(6.86+0.26) 7.13(6.87+0.26) +0.1% Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Junio C Hamano --- Documentation/config.txt | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/config.txt b/Documentation/config.txt index 161ffe259e..0906db3a99 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1712,8 +1712,9 @@ will only cause git to warn. fsck.skipList:: The path to a list of object names (i.e. one unabbreviated SHA-1 per line) that are known to be broken in a non-fatal way and should - be ignored. Comments ('#') and empty lines are not supported, and - will error out. + be ignored. On versions of Git 2.20 and later comments ('#'), empty + lines, and any leading and trailing whitespace is ignored. Everything + but a SHA-1 per line will error out on older versions. + This feature is useful when an established project should be accepted despite early commits containing errors that can be safely ignored -- cgit v1.2.3