git-p4: handle utf16 filetype properly

One of the filetypes that p4 supports is utf16.  Its behavior is odd in this case.  The data delivered through "p4 -G print" is not encoded in utf16, although "p4 print -o" will produce the proper utf16-encoded file.

When dealing with this filetype, discard the data from -G, and instead read the contents directly.

An alternate approach would be to try to encode the data in python.  That worked for true utf16 files, but for other files marked as utf16, p4 delivers mangled text in no recognizable encoding.

Add a test case to check utf16 handling, and +k and +ko handling.

Reported-by: Chris Li <git@chrisli.org>
Acked-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
author: Pete Wyckoff <pw@padd.com> 2011-09-17 19:16:14 -0400
committer: Junio C Hamano <gitster@pobox.com> 2011-10-17 20:45:28 -0700
commit: 55aa5714afc1310fd94936a8cd922e868d694bd2 (patch)
tree: 37f8c46abdbf700cfb2bb2d5812c6b3bc2de51a9 /contrib/fast-import
parent: git-p4 tests: refactor and cleanup (diff)
download: tgif-55aa5714afc1310fd94936a8cd922e868d694bd2.tar.xz
1 files changed, 11 insertions, 0 deletions
diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
index 2f7b270566..e69caf368d 100755
--- a/contrib/fast-import/git-p4
+++ b/contrib/fast-import/git-p4
@@ -1238,6 +1238,15 @@ class P4Sync(Command, P4UserMap):
             data = ''.join(contents)
             contents = [data[:-1]]
 
+        if file['type'].startswith("utf16"):
+            # p4 delivers different text in the python output to -G
+            # than it does when using "print -o", or normal p4 client
+            # operations.  utf16 is converted to ascii or utf8, perhaps.
+            # But ascii text saved as -t utf16 is completely mangled.
+            # Invoke print -o to get the real contents.
+            text = p4_read_pipe('print -q -o - "%s"' % file['depotFile'])
+            contents = [ text ]
+
         if self.isWindows and file["type"].endswith("text"):
             mangled = []
             for data in contents:
@@ -1245,6 +1254,8 @@ class P4Sync(Command, P4UserMap):
                 mangled.append(data)
             contents = mangled
 
+        # Note that we do not try to de-mangle keywords on utf16 files,
+        # even though in theory somebody may want that.
         if file['type'] in ('text+ko', 'unicode+ko', 'binary+ko'):
             contents = map(lambda text: re.sub(r'(?i)\$(Id|Header):[^$]*\$',r'$\1$', text), contents)
         elif file['type'] in ('text+k', 'ktext', 'kxtext', 'unicode+k', 'binary+k'):
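
For reference, the +ko substitution in the context lines above can be tried in isolation. This is a minimal sketch, not part of the patch: the regular expression is copied from the diff, while the helper name collapse_ko_keywords and the sample text are made up for illustration.

```python
import re

# Pattern copied from the +ko branch in the diff: it matches an
# expanded $Id: ...$ or $Header: ...$ keyword, case-insensitively.
KO_PATTERN = re.compile(r'(?i)\$(Id|Header):[^$]*\$')


def collapse_ko_keywords(text):
    """Reduce expanded Id/Header keywords to their bare $Id$ / $Header$ form.

    Hypothetical helper for illustration; git-p4 applies the same
    substitution inline to each chunk of file contents.
    """
    return KO_PATTERN.sub(r'$\1$', text)


# Made-up sample: only Id and Header are touched, other keywords stay.
sample = "/* $Id: //depot/main/foo.c#3 $ */\n/* $Revision: #3 $ */\n"
print(collapse_ko_keywords(sample))
```

Because the type check in the diff is an exact match against 'text+ko', 'unicode+ko' and 'binary+ko', a file whose type starts with utf16 never reaches this substitution, which is what the added comment in the second hunk notes.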
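
The decision made in the first hunk (throw away the -G data for utf16 files and re-read the file) can be sketched on its own. This is an illustration under stated assumptions, not the patch itself: select_contents and fake_reader are hypothetical names, and the reader is injected as a callable so the sketch runs without a Perforce server. In git-p4 the re-read is done with p4_read_pipe, as shown in the diff.

```python
def select_contents(file_info, contents, read_depot_file):
    """Return the list of data chunks to import for one depot file.

    file_info        dict with 'type' and 'depotFile' keys, as in the diff
    contents         chunks already collected from "p4 -G print"
    read_depot_file  hypothetical callable: depot path -> raw file bytes
    """
    if file_info['type'].startswith("utf16"):
        # The -G chunks cannot be trusted for this type, so drop them
        # and use the re-read file as a single chunk.
        return [read_depot_file(file_info['depotFile'])]
    return contents


def fake_reader(depot_path):
    # Stand-in for illustration only; a real reader would run
    # 'p4 print -q -o - <depotFile>' and return its output.
    return "hello\n".encode("utf-16")


utf16_file = {'type': 'utf16', 'depotFile': '//depot/a.txt'}
text_file = {'type': 'text', 'depotFile': '//depot/b.txt'}

print(select_contents(utf16_file, [b"mangled"], fake_reader))
print(select_contents(text_file, [b"plain\n"], fake_reader))
```

Using startswith rather than an exact comparison means types with modifiers appended, such as utf16+x, take the same path, which matches the check in the patch.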