author | Joel Holdsworth <jholdsworth@nvidia.com> | 2021-12-16 13:46:19 +0000 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2021-12-16 14:06:36 -0800 |
commit | 70c0d55349a50707166f9fb9a9720ac1c0530217 (patch) | |
tree | a014fa9b5833bb0f6639117ce1ed57ed501f0e8b /t/t1304-default-acl.sh | |
parent | git-p4: open temporary patch file for write only (diff) | |
download | tgif-70c0d55349a50707166f9fb9a9720ac1c0530217.tar.xz |
git-p4: resolve RCS keywords in bytes not utf-8
RCS keywords are strings that are replaced with information from
Perforce. Examples include $Date$, $Author$, $File$, $Change$ etc.
Perforce expands these keywords when files are synced, but Git's data
model requires the expanded values to be converted back into their
unexpanded form.
Previously, git-p4.py implemented this behaviour using regular
expressions. However, the regular expression substitution was applied
to decoded strings, i.e. the content of incoming commit diffs was first
decoded from UTF-8 bytes into strings, processed with regular
expressions, then encoded back to bytes.
Not only is this behaviour inefficient, but it also causes a common
failure with text files containing invalid UTF-8 data. In files
created on Windows, CP1252 Smart Quote Characters (0x93 and 0x94)
are seen fairly frequently. These codes are invalid in UTF-8, so if the
script encounters any file containing them, on Python 2 the symbols
will be corrupted, and on Python 3 the script will fail with an
exception.
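The Python 3 failure described above can be reproduced in isolation. The
following is a minimal sketch, not code from git-p4.py: the sample bytes
and the helper name try_decode_utf8 are assumptions made for
illustration.

```python
# Sample content as it might appear in a file saved on Windows:
# CP1252 smart quotes (0x93, 0x94) around a word, then an expanded keyword.
data = b"\x93quoted\x94 $Date: 2021/12/16 $\n"


def try_decode_utf8(raw):
    """Return the decoded text, or None if raw is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # 0x93 cannot start a UTF-8 sequence, so decoding is rejected.
        return None


print(try_decode_utf8(data))  # None

# Decoding with the encoding the file was actually written in succeeds.
print(data.decode("cp1252"))
```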
This patch replaces this decoding/encoding with bytes object regular
expressions, so that the substitution is performed directly upon the
source data with no conversions.
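As a rough illustration of the approach, the sketch below collapses
expanded keywords using a bytes pattern, so the input is never decoded.
The keyword list, the pattern, and the name collapse_keywords are
assumptions for this example and are not taken from the patch itself.

```python
import re

# Hypothetical keyword list, for illustration only.
KEYWORDS = [b"Id", b"Header", b"Author", b"Date", b"DateTime",
            b"Change", b"File", b"Revision"]

# Bytes pattern: matches e.g. b"$Date: 2021/12/16 $" and captures the
# keyword name so the expansion can be dropped.
KEYWORD_RE = re.compile(rb"\$(" + b"|".join(KEYWORDS) + rb"):[^$\n]*\$")


def collapse_keywords(raw):
    """Replace expanded keywords such as b"$Date: ... $" with b"$Date$"."""
    # Both the pattern and the replacement are bytes, so no decode or
    # encode step is involved and non-UTF-8 bytes pass through untouched.
    return KEYWORD_RE.sub(rb"$\1$", raw)


line = b"\x93quoted\x94 $Date: 2021/12/16 $\n"
print(collapse_keywords(line))
```

Because the substitution works on the raw bytes, the smart quote bytes
0x93 and 0x94 are carried through unchanged.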
A test for smart quote handling has been added to the
t9810-git-p4-rcs.sh test suite.
Signed-off-by: Joel Holdsworth <jholdsworth@nvidia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>