summaryrefslogtreecommitdiff
path: root/compat/precompose_utf8.h
AgeCommit message (Collapse)AuthorFilesLines
2021-04-05precompose_utf8: make precompose_string_if_needed() publicLibravatar Torsten Bögershausen1-0/+1
commit 5c327502 (MacOS: precompose_argv_prefix(), 2021-02-03) uses the function precompose_string_if_needed() internally. It is only used from precompose_argv_prefix() and therefore static in compat/precompose_utf8.c Expose this function, it will be used in the next commit. While there, allow passing a NULL pointer, which will return NULL. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-03MacOS: precompose_argv_prefix()Libravatar Torsten Bögershausen1-1/+1
The following sequence leads to a "BUG" assertion running under MacOS: DIR=git-test-restore-p Adiarnfd=$(printf 'A\314\210') DIRNAME=xx${Adiarnfd}yy mkdir $DIR && cd $DIR && git init && mkdir $DIRNAME && cd $DIRNAME && echo "Initial" >file && git add file && echo "One more line" >>file && echo y | git restore -p . Initialized empty Git repository in /tmp/git-test-restore-p/.git/ BUG: pathspec.c:495: error initializing pathspec_item Cannot close git diff-index --cached --numstat [snip] The command `git restore` is run from a directory inside a Git repo. Git needs to split the $CWD into 2 parts: The path to the repo and "the rest", if any. "The rest" becomes a "prefix" later used inside the pathspec code. As an example, "/path/to/repo/dir-inside-repå" would determine "/path/to/repo" as the root of the repo, the place where the configuration file .git/config is found. The rest becomes the prefix ("dir-inside-repå"), from where the pathspec machinery expands the ".", more about this later. If there is a decomposed form, (making the decomposing visible like this), "dir-inside-rep°a" doesn't match "dir-inside-repå". Git commands need to: (a) read the configuration variable "core.precomposeunicode" (b) precocompose argv[] (c) precompose the prefix, if there was any The first commit, 76759c7dff53 "git on Mac OS and precomposed unicode" addressed (a) and (b). The call to precompose_argv() was added into parse-options.c, because that seemed to be a good place when the patch was written. Commands that don't use parse-options need to do (a) and (b) themselfs. The commands `diff-files`, `diff-index`, `diff-tree` and `diff` learned (a) and (b) in commit 90a78b83e0b8 "diff: run arguments through precompose_argv" Branch names (or refs in general) using decomposed code points resulting in decomposed file names had been fixed in commit 8e712ef6fc97 "Honor core.precomposeUnicode in more places" The bug report from above shows 2 things: - more commands need to handle precomposed unicode - (c) should be implemented for all commands using pathspecs Solution: precompose_argv() now handles the prefix (if needed), and is renamed into precompose_argv_prefix(). Inside this function the config variable core.precomposeunicode is read into the global variable precomposed_unicode, as before. This reading is skipped if precomposed_unicode had been read before. The original patch for preocomposed unicode, 76759c7dff53, placed precompose_argv() into parse-options.c Now add it into git.c::run_builtin() as well. Existing precompose calls in diff-files.c and others may become redundant, and if we audit the callflows that reach these places to make sure that they can never be reached without going through the new call added to run_builtin(), we might be able to remove these existing ones. But in this commit, we do not bother to do so and leave these precompose callsites as they are. Because precompose() is idempotent and can be called on an already precomposed string safely, this is safer than removing existing calls without fully vetting the callflows. There is certainly room for cleanups - this change intends to be a bug fix. Cleanups needs more tests in e.g. t/t3910-mac-os-precompose.sh, and should be done in future commits. [1] git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) [2] https://lore.kernel.org/git/A102844A-9501-4A86-854D-E3B387D378AA@icloud.com/ Reported-by: Daniel Troger <random_n0body@icloud.com> Helped-By: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-15compat/precompose_utf8.h: use more common include guard styleLibravatar Elijah Newren1-1/+2
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-05probe_utf8_pathname_composition: use internal strbufLibravatar Jeff King1-1/+1
When we are initializing a .git directory, we may call probe_utf8_pathname_composition to detect utf8 mangling. We pass in a path buffer for it to use, and it blindly strcpy()s into it, not knowing whether the buffer is large enough to hold the result or not. In practice this isn't a big deal, because the buffer we pass in already contains "$GIT_DIR/config", and we append only a few extra bytes to it. But we can easily do the right thing just by calling git_path_buf ourselves. Technically this results in a different pathname (before we appended our utf8 characters to the "config" path, and now they get their own files in $GIT_DIR), but that should not matter for our purposes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-08git on Mac OS and precomposed unicodeLibravatar Torsten Bögershausen1-0/+45
Mac OS X mangles file names containing unicode on file systems HFS+, VFAT or SAMBA. When a file using unicode code points outside ASCII is created on a HFS+ drive, the file name is converted into decomposed unicode and written to disk. No conversion is done if the file name is already decomposed unicode. Calling open("\xc3\x84", ...) with a precomposed "Ä" yields the same result as open("\x41\xcc\x88",...) with a decomposed "Ä". As a consequence, readdir() returns the file names in decomposed unicode, even if the user expects precomposed unicode. Unlike on HFS+, Mac OS X stores files on a VFAT drive (e.g. an USB drive) in precomposed unicode, but readdir() still returns file names in decomposed unicode. When a git repository is stored on a network share using SAMBA, file names are send over the wire and written to disk on the remote system in precomposed unicode, but Mac OS X readdir() returns decomposed unicode to be compatible with its behaviour on HFS+ and VFAT. The unicode decomposition causes many problems: - The names "git add" and other commands get from the end user may often be precomposed form (the decomposed form is not easily input from the keyboard), but when the commands read from the filesystem to see what it is going to update the index with already is on the filesystem, readdir() will give decomposed form, which is different. - Similarly "git log", "git mv" and all other commands that need to compare pathnames found on the command line (often but not always precomposed form; a command line input resulting from globbing may be in decomposed) with pathnames found in the tree objects (should be precomposed form to be compatible with other systems and for consistency in general). - The same for names stored in the index, which should be precomposed, that may need to be compared with the names read from readdir(). NFS mounted from Linux is fully transparent and does not suffer from the above. As Mac OS X treats precomposed and decomposed file names as equal, we can - wrap readdir() on Mac OS X to return the precomposed form, and - normalize decomposed form given from the command line also to the precomposed form, to ensure that all pathnames used in Git are always in the precomposed form. This behaviour can be requested by setting "core.precomposedunicode" configuration variable to true. The code in compat/precomposed_utf8.c implements basically 4 new functions: precomposed_utf8_opendir(), precomposed_utf8_readdir(), precomposed_utf8_closedir() and precompose_argv(). The first three are to wrap opendir(3), readdir(3), and closedir(3) functions. The argv[] conversion allows to use the TAB filename completion done by the shell on command line. It tolerates other tools which use readdir() to feed decomposed file names into git. When creating a new git repository with "git init" or "git clone", "core.precomposedunicode" will be set "false". The user needs to activate this feature manually. She typically sets core.precomposedunicode to "true" on HFS and VFAT, or file systems mounted via SAMBA. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>