author | Karsten Blees <karsten.blees@gmail.com> | 2015-07-01 21:10:47 +0200 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2015-07-01 14:55:53 -0700 |
commit | 3a59e5954ef19ac94522219c2f29d49a187d31d8 (patch) | |
tree | b79952098c087d313e92af57fb8fe0f87bb7a696 /t/t4000-diff-format.sh | |
parent | Second half of seventh batch (diff) | |
download | tgif-3a59e5954ef19ac94522219c2f29d49a187d31d8.tar.xz |
Documentation/i18n.txt: clarify character encoding support
As a "distributed" VCS, git should better define the encodings of its core
textual data structures, in particular those that are part of the network
protocol.
That git is encoding agnostic is only really true for blob objects. E.g.
the 'non-NUL bytes' requirement of tree and commit objects excludes
UTF-16/32, and the special meaning of '/' in the index file as well as
space and linefeed in commit objects eliminates EBCDIC and other non-ASCII
encodings.
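
The byte-level constraints above can be checked with a short sketch. It is only an illustration in Python, not code from git; the helper name `contains_nul` and the sample string are assumptions, and `cp037` is used as one representative EBCDIC code page.

```python
def contains_nul(text: str, encoding: str) -> bool:
    """Return True if encoding `text` produces at least one NUL byte."""
    return b"\x00" in text.encode(encoding)


sample = "Fix typo in README"

# UTF-8 keeps ASCII text free of NUL bytes; UTF-16 and UTF-32 pad every
# ASCII character with zero bytes, which tree and commit objects forbid.
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, "contains NUL:", contains_nul(sample, enc))

# In EBCDIC the structural characters are not at their ASCII byte values,
# so a parser looking for ASCII '/', space or linefeed would not find them.
for ch in ("/", " ", "\n"):
    print(repr(ch), "ascii:", ch.encode("ascii").hex(),
          "cp037:", ch.encode("cp037").hex())
```
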
Git expects bytes < 0x80 to be pure ASCII, thus CJK encodings that partly
overlap with the ASCII range are problematic as well. E.g. fmt_ident()
removes trailing 0x5C from user names on the assumption that it is ASCII
'\'. However, there are over 200 GBK double byte codes that end in 0x5C.
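
To make the 0x5C problem concrete, the following Python sketch looks for GBK double-byte codes whose second byte is 0x5C and shows what a byte-wise cleanup does to them. The function `strip_trailing_backslash` is a hypothetical stand-in for this kind of cleanup, not git's actual fmt_ident() code, and the set of codes found depends on Python's `gbk` codec.

```python
def gbk_chars_ending_in_backslash() -> list:
    """Collect characters whose two-byte GBK encoding ends in 0x5C."""
    found = []
    for lead in range(0x81, 0xFF):  # GBK lead bytes are 0x81..0xFE
        try:
            ch = bytes([lead, 0x5C]).decode("gbk")
        except UnicodeDecodeError:
            continue  # this byte pair is not assigned in Python's codec
        found.append(ch)
    return found


def strip_trailing_backslash(raw: bytes) -> bytes:
    """Hypothetical cleanup that assumes a trailing 0x5C is ASCII '\\'."""
    return raw.rstrip(b"\x5c")


chars = gbk_chars_ending_in_backslash()
name = ("A" + chars[0]).encode("gbk")    # a user name ending in such a character
damaged = strip_trailing_backslash(name)  # removes half of the last character

print("original bytes:", name.hex())
print("after cleanup: ", damaged.hex())
try:
    damaged.decode("gbk")
except UnicodeDecodeError as err:
    print("no longer valid GBK:", err.reason)
```

The cleanup leaves a dangling lead byte, so the name can no longer be decoded in its original encoding.
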
UTF-8 as the default encoding on Linux and the respective path translations
in the Mac and Windows versions have established UTF-8 NFC as the de facto
standard for path names.
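
As a small illustration of why the normalization form matters for path names, the Python sketch below encodes the same visible file name in NFC and NFD. The file name is a made-up example, and the sketch only uses the standard `unicodedata` module; it does not reproduce the path translations done by any git port.

```python
import unicodedata

filename = "r\u00e9sum\u00e9.txt"  # hypothetical name, written with precomposed "é"

nfc = unicodedata.normalize("NFC", filename)  # precomposed: é is U+00E9
nfd = unicodedata.normalize("NFD", filename)  # decomposed: e + U+0301

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

# Both forms render identically, but the byte sequences differ, so a
# byte-wise comparison of path names treats them as different files.
print("NFC:", nfc_bytes.hex())
print("NFD:", nfd_bytes.hex())
print("byte-identical:", nfc_bytes == nfd_bytes)

# Normalizing to NFC before comparing makes the two names match again.
print("equal after NFC:", unicodedata.normalize("NFC", nfd) == nfc)
```
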
Update the documentation in i18n.txt to reflect the current status quo.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 't/t4000-diff-format.sh')
0 files changed, 0 insertions, 0 deletions