summaryrefslogtreecommitdiff
path: root/t/t3910-mac-os-precompose.sh
AgeCommit message (Collapse)AuthorFilesLines
2021-02-12Merge branch 'tb/precompose-prefix-too'Libravatar Junio C Hamano1-0/+16
When commands are started from a subdirectory, they may have to compare the path to the subdirectory (called prefix and found out from $(pwd)) with the tracked paths. On macOS, $(pwd) and readdir() yield decomposed path, while the tracked paths are usually normalized to the precomposed form, causing mismatch. This has been fixed by taking the same approach used to normalize the command line arguments. * tb/precompose-prefix-too: MacOS: precompose_argv_prefix()
2021-02-03MacOS: precompose_argv_prefix()Libravatar Torsten Bögershausen1-0/+16
The following sequence leads to a "BUG" assertion running under MacOS: DIR=git-test-restore-p Adiarnfd=$(printf 'A\314\210') DIRNAME=xx${Adiarnfd}yy mkdir $DIR && cd $DIR && git init && mkdir $DIRNAME && cd $DIRNAME && echo "Initial" >file && git add file && echo "One more line" >>file && echo y | git restore -p . Initialized empty Git repository in /tmp/git-test-restore-p/.git/ BUG: pathspec.c:495: error initializing pathspec_item Cannot close git diff-index --cached --numstat [snip] The command `git restore` is run from a directory inside a Git repo. Git needs to split the $CWD into 2 parts: The path to the repo and "the rest", if any. "The rest" becomes a "prefix" later used inside the pathspec code. As an example, "/path/to/repo/dir-inside-repå" would determine "/path/to/repo" as the root of the repo, the place where the configuration file .git/config is found. The rest becomes the prefix ("dir-inside-repå"), from where the pathspec machinery expands the ".", more about this later. If there is a decomposed form, (making the decomposing visible like this), "dir-inside-rep°a" doesn't match "dir-inside-repå". Git commands need to: (a) read the configuration variable "core.precomposeunicode" (b) precocompose argv[] (c) precompose the prefix, if there was any The first commit, 76759c7dff53 "git on Mac OS and precomposed unicode" addressed (a) and (b). The call to precompose_argv() was added into parse-options.c, because that seemed to be a good place when the patch was written. Commands that don't use parse-options need to do (a) and (b) themselfs. The commands `diff-files`, `diff-index`, `diff-tree` and `diff` learned (a) and (b) in commit 90a78b83e0b8 "diff: run arguments through precompose_argv" Branch names (or refs in general) using decomposed code points resulting in decomposed file names had been fixed in commit 8e712ef6fc97 "Honor core.precomposeUnicode in more places" The bug report from above shows 2 things: - more commands need to handle precomposed unicode - (c) should be implemented for all commands using pathspecs Solution: precompose_argv() now handles the prefix (if needed), and is renamed into precompose_argv_prefix(). Inside this function the config variable core.precomposeunicode is read into the global variable precomposed_unicode, as before. This reading is skipped if precomposed_unicode had been read before. The original patch for preocomposed unicode, 76759c7dff53, placed precompose_argv() into parse-options.c Now add it into git.c::run_builtin() as well. Existing precompose calls in diff-files.c and others may become redundant, and if we audit the callflows that reach these places to make sure that they can never be reached without going through the new call added to run_builtin(), we might be able to remove these existing ones. But in this commit, we do not bother to do so and leave these precompose callsites as they are. Because precompose() is idempotent and can be called on an already precomposed string safely, this is safer than removing existing calls without fully vetting the callflows. There is certainly room for cleanups - this change intends to be a bug fix. Cleanups needs more tests in e.g. t/t3910-mac-os-precompose.sh, and should be done in future commits. [1] git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) [2] https://lore.kernel.org/git/A102844A-9501-4A86-854D-E3B387D378AA@icloud.com/ Reported-by: Daniel Troger <random_n0body@icloud.com> Helped-By: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19t3[5-9]*: adjust the references to the default branch name "main"Libravatar Johannes Schindelin1-4/+4
This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t3[5-9]*.sh) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19tests: mark tests relying on the current default for `init.defaultBranch`Libravatar Johannes Schindelin1-0/+3
In addition to the manual adjustment to let the `linux-gcc` CI job run the test suite with `master` and then with `main`, this patch makes sure that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts that currently rely on the initial branch name being `master by default. To determine which test scripts to mark up, the first step was to force-set the default branch name to `master` in - all test scripts that contain the keyword `master`, - t4211, which expects `t/t4211/history.export` with a hard-coded ref to initialize the default branch, - t5560 because it sources `t/t556x_common` which uses `master`, - t8002 and t8012 because both source `t/annotate-tests.sh` which also uses `master`) This trick was performed by this command: $ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' $(git grep -l master t/t[0-9]*.sh) \ t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh After that, careful, manual inspection revealed that some of the test scripts containing the needle `master` do not actually rely on a specific default branch name: either they mention `master` only in a comment, or they initialize that branch specificially, or they do not actually refer to the current default branch. Therefore, the aforementioned modification was undone in those test scripts thusly: $ git checkout HEAD -- \ t/t0027-auto-crlf.sh t/t0060-path-utils.sh \ t/t1011-read-tree-sparse-checkout.sh \ t/t1305-config-include.sh t/t1309-early-config.sh \ t/t1402-check-ref-format.sh t/t1450-fsck.sh \ t/t2024-checkout-dwim.sh \ t/t2106-update-index-assume-unchanged.sh \ t/t3040-subprojects-basic.sh t/t3301-notes.sh \ t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \ t/t3436-rebase-more-options.sh \ t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \ t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \ t/t5511-refspec.sh t/t5526-fetch-submodules.sh \ t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \ t/t5548-push-porcelain.sh \ t/t5552-skipping-fetch-negotiator.sh \ t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \ t/t5614-clone-submodules-shallow.sh \ t/t7508-status.sh t/t7606-merge-custom.sh \ t/t9302-fast-import-unpack-limit.sh We excluded one set of test scripts in these commands, though: the range of `git p4` tests. The reason? `git p4` stores the (foreign) remote branch in the branch called `p4/master`, which is obviously not the default branch. Manual analysis revealed that only five of these tests actually require a specific default branch name to pass; They were modified thusly: $ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' t/t980[0167]*.sh t/t9811*.sh Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-30tests: make use of the test_must_be_empty functionLibravatar Ævar Arnfjörð Bjarmason1-2/+1
Change various tests that use an idiom of the form: >expect && test_cmp expect actual To instead use: test_must_be_empty actual The test_must_be_empty() wrapper was introduced in ca8d148daf ("test: test_must_be_empty helper", 2013-06-09). Many of these tests have been added after that time. This was mostly found with, and manually pruned from: git grep '^\s+>.*expect.* &&$' t Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-13diff: run arguments through precompose_argvLibravatar Alexander Rinass1-0/+42
When running diff commands, a pathspec containing decomposed unicode code points is not converted to precomposed unicode form under Mac OS X, but we normalize the paths in the index and the history to precomposed form on that platform. As a result, the pathspec would not match and no diff is shown. Unlike many builtin commands, the "diff" family of commands do not use parse_options(), which is how other builtin commands indirectly call precompose_argv() to normalize argv[] into precomposed form on Mac OSX. Teach these commands to call precompose_argv() themselves. Note that precomopose_argv() normalizes not just paths but all command line arguments, so things like "git diff -G $string" when $string has the decomposed form would first be normalized into the precomposed form and would stop hitting the same string in the decomposed form in the diff output with this change. It is not a problem per-se, as "log" family of commands already use parse_options() and call precompose_argv()--we can think of this change as making the "diff" family of commands behave in a similar way as the commands in the "log" family. Signed-off-by: Alexander Rinass <alex@fournova.com> Helped-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-06Merge branch 'jk/utf8-switch-between-nfd-and-nfc'Libravatar Junio C Hamano1-0/+10
Document a known breakage with a test. * jk/utf8-switch-between-nfd-and-nfc: t3910: show failure of core.precomposeunicode with decomposed filenames
2014-04-30t3910-mac-os-precompose.sh: use the $( ... ) construct for command substitutionLibravatar Elia Pinto1-8/+8
The Git CodingGuidelines prefer the $(...) construct for command substitution instead of using the backquotes `...`. The backquoted form is the traditional method for command substitution, and is supported by POSIX. However, all but the simplest uses become complicated quickly. In particular, embedded command substitutions and/or the use of double quotes require careful escaping with the backslash character. The patch was generated by: for _f in $(find . -name "*.sh") do sed -i 's@`\(.*\)`@$(\1)@g' ${_f} done and then carefully proof-read. Signed-off-by: Elia Pinto <gitter.spiros@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-29t3910: show failure of core.precomposeunicode with decomposed filenamesLibravatar Jeff King1-0/+10
If you have existing decomposed filenames in your git repository (e.g., that were created with older versions of git that did not precompose unicode), a modern git with core.precomposeunicode set does not handle them well. The problem is that we normalize the paths coming from the disk into their precomposed form, and then compare them against the literal bytes in the index. This makes things better if you have the precomposed form in the index. It makes things worse if you actually have the decomposed form in the index. As a result, paths with decomposed filenames may have their precomposed variants listed as untracked files (even though the precomposed variants do not exist on-disk at all). This patch just adds a test to demonstrate the breakage. Some possible fixes are: 1. Tell everyone that NFD in the git repo is wrong, and they should make a new commit to normalize all their in-repo files to be precomposed. This is probably not the right thing to do, because it still doesn't fix checkouts of old history. And it spreads the problem to people on byte-preserving filesystems (like ext4), because now they have to start precomposing their filenames as they are adde to git. 2. Do all index filename comparisons using a UTF-8 aware comparison function when core.precomposeunicode is set. This would probably have bad performance, and somewhat defeats the point of converting the filenames at the readdir level in the first place. 3. Convert index filenames to their precomposed form when we read the index from disk. This would be efficient, but we would have to be careful not to write the precomposed forms back out to disk. 4. Introduce some infrastructure to efficiently match up the precomposed/decomposed forms. We already do something similar for case-insensitive files using name-hash.c. We might be able to adapt that strategy here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-27Set core.precomposeunicode to true on e.g. HFS+Libravatar Torsten Bögershausen1-1/+1
When core.precomposeunicode was introduced in 76759c7d, it was set to false on a unicode decomposing file system like HFS+ to be compatible with older versions of Git. The Mac OS users need to find out that this configuration exist and change it manually from false to true. A smoother workflow can be achieved, so set core.precomposeunicode to true on a decomposing file system. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-22t3910: use the UTF8_NFD_TO_NFC test prereqLibravatar Michael J Gruber1-146/+135
Besides reusing the new test prerequisite, this fixes also the issue that the current output is not TAP compliant and produces the output "no reason given" [for skipping]. Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-08git on Mac OS and precomposed unicodeLibravatar Torsten Bögershausen1-0/+164
Mac OS X mangles file names containing unicode on file systems HFS+, VFAT or SAMBA. When a file using unicode code points outside ASCII is created on a HFS+ drive, the file name is converted into decomposed unicode and written to disk. No conversion is done if the file name is already decomposed unicode. Calling open("\xc3\x84", ...) with a precomposed "Ä" yields the same result as open("\x41\xcc\x88",...) with a decomposed "Ä". As a consequence, readdir() returns the file names in decomposed unicode, even if the user expects precomposed unicode. Unlike on HFS+, Mac OS X stores files on a VFAT drive (e.g. an USB drive) in precomposed unicode, but readdir() still returns file names in decomposed unicode. When a git repository is stored on a network share using SAMBA, file names are send over the wire and written to disk on the remote system in precomposed unicode, but Mac OS X readdir() returns decomposed unicode to be compatible with its behaviour on HFS+ and VFAT. The unicode decomposition causes many problems: - The names "git add" and other commands get from the end user may often be precomposed form (the decomposed form is not easily input from the keyboard), but when the commands read from the filesystem to see what it is going to update the index with already is on the filesystem, readdir() will give decomposed form, which is different. - Similarly "git log", "git mv" and all other commands that need to compare pathnames found on the command line (often but not always precomposed form; a command line input resulting from globbing may be in decomposed) with pathnames found in the tree objects (should be precomposed form to be compatible with other systems and for consistency in general). - The same for names stored in the index, which should be precomposed, that may need to be compared with the names read from readdir(). NFS mounted from Linux is fully transparent and does not suffer from the above. As Mac OS X treats precomposed and decomposed file names as equal, we can - wrap readdir() on Mac OS X to return the precomposed form, and - normalize decomposed form given from the command line also to the precomposed form, to ensure that all pathnames used in Git are always in the precomposed form. This behaviour can be requested by setting "core.precomposedunicode" configuration variable to true. The code in compat/precomposed_utf8.c implements basically 4 new functions: precomposed_utf8_opendir(), precomposed_utf8_readdir(), precomposed_utf8_closedir() and precompose_argv(). The first three are to wrap opendir(3), readdir(3), and closedir(3) functions. The argv[] conversion allows to use the TAB filename completion done by the shell on command line. It tolerates other tools which use readdir() to feed decomposed file names into git. When creating a new git repository with "git init" or "git clone", "core.precomposedunicode" will be set "false". The user needs to activate this feature manually. She typically sets core.precomposedunicode to "true" on HFS and VFAT, or file systems mounted via SAMBA. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>