diff options
author | Jeff King <peff@peff.net> | 2014-08-28 08:32:58 -0400 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2014-08-28 10:32:52 -0700 |
commit | 75d3d6573ee809f35dd87ee4569e49b068ab4731 (patch) | |
tree | cbfd2347e1553b074749cd1f0e4ff25750e6231c /Documentation | |
parent | teach fast-export an --anonymize option (diff) | |
download | tgif-75d3d6573ee809f35dd87ee4569e49b068ab4731.tar.xz |
docs/fast-export: explain --anonymize more completely
The original commit made mention of this option, but not why
one might want it or how they might use it. Let's try to be
a little more thorough, and also explain how to confirm that
the output really is anonymous.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/git-fast-export.txt | 63 |
1 files changed, 59 insertions, 4 deletions
diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt index 52831faca9..dbe9a46833 100644 --- a/Documentation/git-fast-export.txt +++ b/Documentation/git-fast-export.txt @@ -106,10 +106,9 @@ marks the same across runs. different from the commit's first parent). --anonymize:: - Replace all refnames, paths, blob contents, commit and tag - messages, names, and email addresses in the output with - anonymized data, while still retaining the shape of history and - of the stored tree. + Anonymize the contents of the repository while still retaining + the shape of the history and stored tree. See the section on + `ANONYMIZING` below. --refspec:: Apply the specified refspec to each ref exported. Multiple of them can @@ -147,6 +146,62 @@ referenced by that revision range contains the string 'refs/heads/master'. +ANONYMIZING +----------- + +If the `--anonymize` option is given, git will attempt to remove all +identifying information from the repository while still retaining enough +of the original tree and history patterns to reproduce some bugs. The +goal is that a git bug which is found on a private repository will +persist in the anonymized repository, and the latter can be shared with +git developers to help solve the bug. + +With this option, git will replace all refnames, paths, blob contents, +commit and tag messages, names, and email addresses in the output with +anonymized data. Two instances of the same string will be replaced +equivalently (e.g., two commits with the same author will have the same +anonymized author in the output, but bear no resemblance to the original +author string). The relationship between commits, branches, and tags is +retained, as well as the commit timestamps (but the commit messages and +refnames bear no resemblance to the originals). The relative makeup of +the tree is retained (e.g., if you have a root tree with 10 files and 3 +trees, so will the output), but their names and the contents of the +files will be replaced. + +If you think you have found a git bug, you can start by exporting an +anonymized stream of the whole repository: + +--------------------------------------------------- +$ git fast-export --anonymize --all >anon-stream +--------------------------------------------------- + +Then confirm that the bug persists in a repository created from that +stream (many bugs will not, as they really do depend on the exact +repository contents): + +--------------------------------------------------- +$ git init anon-repo +$ cd anon-repo +$ git fast-import <../anon-stream +$ ... test your bug ... +--------------------------------------------------- + +If the anonymized repository shows the bug, it may be worth sharing +`anon-stream` along with a regular bug report. Note that the anonymized +stream compresses very well, so gzipping it is encouraged. If you want +to examine the stream to see that it does not contain any private data, +you can peruse it directly before sending. You may also want to try: + +--------------------------------------------------- +$ perl -pe 's/\d+/X/g' <anon-stream | sort -u | less +--------------------------------------------------- + +which shows all of the unique lines (with numbers converted to "X", to +collapse "User 0", "User 1", etc into "User X"). This produces a much +smaller output, and it is usually easy to quickly confirm that there is +no private data in the stream. + + Limitations ----------- |