diff options
author | Junio C Hamano <gitster@pobox.com> | 2018-05-08 15:59:22 +0900 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2018-05-08 15:59:22 +0900 |
commit | 1ac0ce4d32ab7a3546e7e84a562625576208c7db (patch) | |
tree | 23ebffb680b2fe7b69453bb55d8170f69d767a69 /Documentation | |
parent | Merge branch 'ab/nuke-emacs-contrib' (diff) | |
parent | convert: add round trip check based on 'core.checkRoundtripEncoding' (diff) | |
download | tgif-1ac0ce4d32ab7a3546e7e84a562625576208c7db.tar.xz |
Merge branch 'ls/checkout-encoding'
The new "checkout-encoding" attribute can ask Git to convert the
contents to the specified encoding when checking out to the working
tree (and the other way around when checking in).
* ls/checkout-encoding:
convert: add round trip check based on 'core.checkRoundtripEncoding'
convert: add tracing for 'working-tree-encoding' attribute
convert: check for detectable errors in UTF encodings
convert: add 'working-tree-encoding' attribute
utf8: add function to detect a missing UTF-16/32 BOM
utf8: add function to detect prohibited UTF-16/32 BOM
utf8: teach same_encoding() alternative UTF encoding names
strbuf: add a case insensitive starts_with()
strbuf: add xstrdup_toupper()
strbuf: remove unnecessary NUL assignment in xstrdup_tolower()
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/config.txt | 6 | ||||
-rw-r--r-- | Documentation/gitattributes.txt | 88 |
2 files changed, 94 insertions, 0 deletions
diff --git a/Documentation/config.txt b/Documentation/config.txt index cb4c93f74a..8213c1236b 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -530,6 +530,12 @@ core.autocrlf:: This variable can be set to 'input', in which case no output conversion is performed. +core.checkRoundtripEncoding:: + A comma and/or whitespace separated list of encodings that Git + performs UTF-8 round trip checks on if they are used in an + `working-tree-encoding` attribute (see linkgit:gitattributes[5]). + The default value is `SHIFT-JIS`. + core.symlinks:: If false, symbolic links are checked out as small plain files that contain the link text. linkgit:git-update-index[1] and diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 1094fe2b5b..ee210be3ec 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -279,6 +279,94 @@ few exceptions. Even though... catch potential problems early, safety triggers. +`working-tree-encoding` +^^^^^^^^^^^^^^^^^^^^^^^ + +Git recognizes files encoded in ASCII or one of its supersets (e.g. +UTF-8, ISO-8859-1, ...) as text files. Files encoded in certain other +encodings (e.g. UTF-16) are interpreted as binary and consequently +built-in Git text processing tools (e.g. 'git diff') as well as most Git +web front ends do not visualize the contents of these files by default. + +In these cases you can tell Git the encoding of a file in the working +directory with the `working-tree-encoding` attribute. If a file with this +attribute is added to Git, then Git reencodes the content from the +specified encoding to UTF-8. Finally, Git stores the UTF-8 encoded +content in its internal data structure (called "the index"). On checkout +the content is reencoded back to the specified encoding. + +Please note that using the `working-tree-encoding` attribute may have a +number of pitfalls: + +- Alternative Git implementations (e.g. JGit or libgit2) and older Git + versions (as of March 2018) do not support the `working-tree-encoding` + attribute. If you decide to use the `working-tree-encoding` attribute + in your repository, then it is strongly recommended to ensure that all + clients working with the repository support it. + + For example, Microsoft Visual Studio resources files (`*.rc`) or + PowerShell script files (`*.ps1`) are sometimes encoded in UTF-16. + If you declare `*.ps1` as files as UTF-16 and you add `foo.ps1` with + a `working-tree-encoding` enabled Git client, then `foo.ps1` will be + stored as UTF-8 internally. A client without `working-tree-encoding` + support will checkout `foo.ps1` as UTF-8 encoded file. This will + typically cause trouble for the users of this file. + + If a Git client, that does not support the `working-tree-encoding` + attribute, adds a new file `bar.ps1`, then `bar.ps1` will be + stored "as-is" internally (in this example probably as UTF-16). + A client with `working-tree-encoding` support will interpret the + internal contents as UTF-8 and try to convert it to UTF-16 on checkout. + That operation will fail and cause an error. + +- Reencoding content to non-UTF encodings can cause errors as the + conversion might not be UTF-8 round trip safe. If you suspect your + encoding to not be round trip safe, then add it to + `core.checkRoundtripEncoding` to make Git check the round trip + encoding (see linkgit:git-config[1]). SHIFT-JIS (Japanese character + set) is known to have round trip issues with UTF-8 and is checked by + default. + +- Reencoding content requires resources that might slow down certain + Git operations (e.g 'git checkout' or 'git add'). + +Use the `working-tree-encoding` attribute only if you cannot store a file +in UTF-8 encoding and if you want Git to be able to process the content +as text. + +As an example, use the following attributes if your '*.ps1' files are +UTF-16 encoded with byte order mark (BOM) and you want Git to perform +automatic line ending conversion based on your platform. + +------------------------ +*.ps1 text working-tree-encoding=UTF-16 +------------------------ + +Use the following attributes if your '*.ps1' files are UTF-16 little +endian encoded without BOM and you want Git to use Windows line endings +in the working directory. Please note, it is highly recommended to +explicitly define the line endings with `eol` if the `working-tree-encoding` +attribute is used to avoid ambiguity. + +------------------------ +*.ps1 text working-tree-encoding=UTF-16LE eol=CRLF +------------------------ + +You can get a list of all available encodings on your platform with the +following command: + +------------------------ +iconv --list +------------------------ + +If you do not know the encoding of a file, then you can use the `file` +command to guess the encoding: + +------------------------ +file foo.ps1 +------------------------ + + `ident` ^^^^^^^ |