diff options
author | Junio C Hamano <gitster@pobox.com> | 2021-12-10 14:35:06 -0800 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2021-12-10 14:35:06 -0800 |
commit | bd16b3c39f373944d5ae0c8c9bb93d5d4a5a9524 (patch) | |
tree | 0045810462c348517e2885818aa8e7d4d1c1af45 /ci/check-directional-formatting.bash | |
parent | Merge branch 'jc/fix-first-object-walk' (diff) | |
parent | ci: disallow directional formatting (diff) | |
download | tgif-bd16b3c39f373944d5ae0c8c9bb93d5d4a5a9524.tar.xz |
Merge branch 'js/ci-no-directional-formatting'
CI has been taught to catch some Unicode directional formatting
sequence that can be used in certain mischief.
* js/ci-no-directional-formatting:
ci: disallow directional formatting
Diffstat (limited to 'ci/check-directional-formatting.bash')
-rwxr-xr-x | ci/check-directional-formatting.bash | 27 |
1 files changed, 27 insertions, 0 deletions
diff --git a/ci/check-directional-formatting.bash b/ci/check-directional-formatting.bash new file mode 100755 index 0000000000..e6211b141a --- /dev/null +++ b/ci/check-directional-formatting.bash @@ -0,0 +1,27 @@ +#!/bin/bash + +# This script verifies that the non-binary files tracked in the Git index do +# not contain any Unicode directional formatting: such formatting could be used +# to deceive reviewers into interpreting code differently from the compiler. +# This is intended to run on an Ubuntu agent in a GitHub workflow. +# +# To allow translated messages to introduce such directional formatting in the +# future, we exclude the `.po` files from this validation. +# +# Neither GNU grep nor `git grep` (not even with `-P`) handle `\u` as a way to +# specify UTF-8. +# +# To work around that, we use `printf` to produce the pattern as a byte +# sequence, and then feed that to `git grep` as a byte sequence (setting +# `LC_CTYPE` to make sure that the arguments are interpreted as intended). +# +# Note: we need to use Bash here because its `printf` interprets `\uNNNN` as +# UTF-8 code points, as desired. Running this script through Ubuntu's `dash`, +# for example, would use a `printf` that does not understand that syntax. + +# U+202a..U+2a2e: LRE, RLE, PDF, LRO and RLO +# U+2066..U+2069: LRI, RLI, FSI and PDI +regex='(\u202a|\u202b|\u202c|\u202d|\u202e|\u2066|\u2067|\u2068|\u2069)' + +! LC_CTYPE=C git grep -El "$(LC_CTYPE=C.UTF-8 printf "$regex")" \ + -- ':(exclude,attr:binary)' ':(exclude)*.po' |