path: root/t/t7508-status.sh
author: Derrick Stolee <dstolee@microsoft.com> 2020-05-11 11:56:11 +0000
committer: Junio C Hamano <gitster@pobox.com> 2020-05-11 09:33:56 -0700
commit: 88093289cdcfe99c5a3d7ef6f36ee45aa3018824
tree: de313c91da6df5a188929293717ac501a10cf229 /t/t7508-status.sh
parent: bloom: parse commit before computing filters
Documentation: changed-path Bloom filters use byte words
In Documentation/technical/commit-graph-format.txt, the definition of the BIDX chunk specifies the length is a number of 8-byte words.

During development we discovered that using 8-byte words in the Murmur3 hash algorithm causes issues with big-endian versus little-endian machines. Thus, the hash algorithm was adapted to work on a byte-by-byte basis. However, this caused a change in the definition of a "word" in bloom.h. Now, a "word" is a single byte, which allows filters to be as small as two bytes. These length-two filters are demonstrated in t0095-bloom.sh, and a larger filter of length 25 is demonstrated as well.

The original point of using 8-byte words was for alignment reasons. It also presented opportunities for extremely sparse Bloom filters when there were a small number of changes at a commit, creating a very low false-positive rate. However, modifying the format at this point is unlikely to be a valuable exercise. Also, this use of single-byte granularity does present opportunities to save space. It is unclear if 8-byte alignment of the filters would present any meaningful performance benefits.

Modify the format document to reflect reality.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
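
The two points in the message above (endianness of word reads, and filter size with byte-sized words) can be illustrated with a rough Python sketch. It is not Git's code: the helper names `filter_len` and `word_from_bytes` are hypothetical, and the default of 10 bits per changed path is an assumption that the commit message does not state.

```python
import struct


def filter_len(num_paths, bits_per_entry=10, bits_per_word=8):
    """Words needed for num_paths entries, rounded up.

    bits_per_entry=10 is an assumed setting, not taken from the commit message.
    """
    return (num_paths * bits_per_entry + bits_per_word - 1) // bits_per_word


def word_from_bytes(chunk):
    """Build a 32-bit value from four bytes, one byte at a time.

    The byte order is fixed by the shifts, so the result does not depend
    on the endianness of the machine running the code.
    """
    return chunk[0] | (chunk[1] << 8) | (chunk[2] << 16) | (chunk[3] << 24)


chunk = b"\x01\x02\x03\x04"

# Reading the same four bytes as one whole word gives different values
# depending on the byte order the reader uses.
as_little = struct.unpack("<I", chunk)[0]
as_big = struct.unpack(">I", chunk)[0]

print(hex(as_little), hex(as_big), hex(word_from_bytes(chunk)))

# Filter sizes in bytes: single-byte words versus 8-byte (64-bit) words.
for paths in (1, 20):
    byte_words = filter_len(paths)
    wide_words = filter_len(paths, bits_per_word=64) * 8
    print(paths, "paths:", byte_words, "bytes vs", wide_words, "bytes")
```

Under the assumed 10 bits per entry, one changed path needs two bytes with byte-sized words but eight bytes with 8-byte words, which matches the space saving the message describes. Whether the length-25 filter in t0095-bloom.sh comes from 20 paths is not stated here, so treat that input as illustrative only.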
Diffstat (limited to 't/t7508-status.sh')
0 files changed, 0 insertions, 0 deletions