diff options
Diffstat (limited to 'Documentation/howto')
-rw-r--r-- | Documentation/howto/keep-canonical-history-correct.txt | 216 | ||||
-rw-r--r-- | Documentation/howto/recover-corrupted-object-harder.txt | 241 | ||||
-rw-r--r-- | Documentation/howto/setup-git-server-over-http.txt | 2 |
3 files changed, 456 insertions, 3 deletions
diff --git a/Documentation/howto/keep-canonical-history-correct.txt b/Documentation/howto/keep-canonical-history-correct.txt new file mode 100644 index 0000000000..35d48ef714 --- /dev/null +++ b/Documentation/howto/keep-canonical-history-correct.txt @@ -0,0 +1,216 @@ +From: Junio C Hamano <gitster@pobox.com> +Date: Wed, 07 May 2014 13:15:39 -0700 +Subject: Beginner question on "Pull is mostly evil" +Abstract: This how-to explains a method for keeping a + project's history correct when using git pull. +Content-type: text/asciidoc + +Keep authoritative canonical history correct with git pull +========================================================== + +Sometimes a new project integrator will end up with project history +that appears to be "backwards" from what other project developers +expect. This howto presents a suggested integration workflow for +maintaining a central repository. + +Suppose that that central repository has this history: + +------------ + ---o---o---A +------------ + +which ends at commit `A` (time flows from left to right and each node +in the graph is a commit, lines between them indicating parent-child +relationship). + +Then you clone it and work on your own commits, which leads you to +have this history in *your* repository: + +------------ + ---o---o---A---B---C +------------ + +Imagine your coworker did the same and built on top of `A` in *his* +repository in the meantime, and then pushed it to the +central repository: + +------------ + ---o---o---A---X---Y---Z +------------ + +Now, if you `git push` at this point, because your history that leads +to `C` lacks `X`, `Y` and `Z`, it will fail. You need to somehow make +the tip of your history a descendant of `Z`. + +One suggested way to solve the problem is "fetch and then merge", aka +`git pull`. When you fetch, your repository will have a history like +this: + +------------ + ---o---o---A---B---C + \ + X---Y---Z +------------ + +Once you run merge after that, while still on *your* branch, i.e. `C`, +you will create a merge `M` and make the history look like this: + +------------ + ---o---o---A---B---C---M + \ / + X---Y---Z +------------ + +`M` is a descendant of `Z`, so you can push to update the central +repository. Such a merge `M` does not lose any commit in both +histories, so in that sense it may not be wrong, but when people want +to talk about "the authoritative canonical history that is shared +among the project participants", i.e. "the trunk", they often view +it as "commits you see by following the first-parent chain", and use +this command to view it: + +------------ + $ git log --first-parent +------------ + +For all other people who observed the central repository after your +coworker pushed `Z` but before you pushed `M`, the commit on the trunk +used to be `o-o-A-X-Y-Z`. But because you made `M` while you were on +`C`, `M`'s first parent is `C`, so by pushing `M` to advance the +central repository, you made `X-Y-Z` a side branch, not on the trunk. + +You would rather want to have a history of this shape: + +------------ + ---o---o---A---X---Y---Z---M' + \ / + B-----------C +------------ + +so that in the first-parent chain, it is clear that the project first +did `X` and then `Y` and then `Z` and merged a change that consists of +two commits `B` and `C` that achieves a single goal. You may have +worked on fixing the bug #12345 with these two patches, and the merge +`M'` with swapped parents can say in its log message "Merge +fix-bug-12345". Having a way to tell `git pull` to create a merge +but record the parents in reverse order may be a way to do so. + +Note that I said "achieves a single goal" above, because this is +important. "Swapping the merge order" only covers a special case +where the project does not care too much about having unrelated +things done on a single merge but cares a lot about first-parent +chain. + +There are multiple schools of thought about the "trunk" management. + + 1. Some projects want to keep a completely linear history without any + merges. Obviously, swapping the merge order would not match their + taste. You would need to flatten your history on top of the + updated upstream to result in a history of this shape instead: ++ +------------ + ---o---o---A---X---Y---Z---B---C +------------ ++ +with `git pull --rebase` or something. + + 2. Some projects tolerate merges in their history, but do not worry + too much about the first-parent order, and allow fast-forward + merges. To them, swapping the merge order does not hurt, but + it is unnecessary. + + 3. Some projects want each commit on the "trunk" to do one single + thing. The output of `git log --first-parent` in such a project + would show either a merge of a side branch that completes a single + theme, or a single commit that completes a single theme by itself. + If your two commits `B` and `C` (or they may even be two groups of + commits) were solving two independent issues, then the merge `M'` + we made in the earlier example by swapping the merge order is + still not up to the project standard. It merges two unrelated + efforts `B` and `C` at the same time. + +For projects in the last category (Git itself is one of them), +individual developers would want to prepare a history more like +this: + +------------ + C0--C1--C2 topic-c + / + ---o---o---A master + \ + B0--B1--B2 topic-b +------------ + +That is, keeping separate topics on separate branches, perhaps like +so: + +------------ + $ git clone $URL work && cd work + $ git checkout -b topic-b master + $ ... work to create B0, B1 and B2 to complete one theme + $ git checkout -b topic-c master + $ ... same for the theme of topic-c +------------ + +And then + +------------ + $ git checkout master + $ git pull --ff-only +------------ + +would grab `X`, `Y` and `Z` from the upstream and advance your master +branch: + +------------ + C0--C1--C2 topic-c + / + ---o---o---A---X---Y---Z master + \ + B0--B1--B2 topic-b +------------ + +And then you would merge these two branches separately: + +------------ + $ git merge topic-b + $ git merge topic-c +------------ + +to result in + +------------ + C0--C1---------C2 + / \ + ---o---o---A---X---Y---Z---M---N + \ / + B0--B1-----B2 +------------ + +and push it back to the central repository. + +It is very much possible that while you are merging topic-b and +topic-c, somebody again advanced the history in the central repository +to put `W` on top of `Z`, and make your `git push` fail. + +In such a case, you would rewind to discard `M` and `N`, update the +tip of your 'master' again and redo the two merges: + +------------ + $ git reset --hard origin/master + $ git pull --ff-only + $ git merge topic-b + $ git merge topic-c +------------ + +The procedure will result in a history that looks like this: + +------------ + C0--C1--------------C2 + / \ + ---o---o---A---X---Y---Z---W---M'--N' + \ / + B0--B1---------B2 +------------ + +See also http://git-blame.blogspot.com/2013/09/fun-with-first-parent-history.html diff --git a/Documentation/howto/recover-corrupted-object-harder.txt b/Documentation/howto/recover-corrupted-object-harder.txt index 6f33dac0e0..9c4cd0915f 100644 --- a/Documentation/howto/recover-corrupted-object-harder.txt +++ b/Documentation/howto/recover-corrupted-object-harder.txt @@ -38,7 +38,7 @@ zlib were failing). Reading the zlib source code, I found that "incorrect data check" means that the adler-32 checksum at the end of the zlib data did not match the inflated data. So stepping the data through zlib would not help, as it -did not fail until the very end, when we realize the crc does not match. +did not fail until the very end, when we realize the CRC does not match. The problematic bytes could be anywhere in the object data. The first thing I did was pull the broken data out of the packfile. I @@ -195,7 +195,7 @@ halfway through: ------- I let it run to completion, and got a few more hits at the end (where it -was munging the crc to match our broken data). So there was a good +was munging the CRC to match our broken data). So there was a good chance this middle hit was the source of the problem. I confirmed by tweaking the byte in a hex editor, zlib inflating the @@ -240,3 +240,240 @@ But more importantly, git's hashing and checksumming noticed a problem that easily could have gone undetected in another system. The result still compiled, but would have caused an interesting bug (that would have been blamed on some random commit). + + +The adventure continues... +-------------------------- + +I ended up doing this again! Same entity, new hardware. The assumption +at this point is that the old disk corrupted the packfile, and then the +corruption was migrated to the new hardware (because it was done by +rsync or similar, and no fsck was done at the time of migration). + +This time, the affected blob was over 20 megabytes, which was far too +large to do a brute-force on. I followed the instructions above to +create the `zlib` file. I then used the `inflate` program below to pull +the corrupted data from that. Examining that output gave me a hint about +where in the file the corruption was. But now I was working with the +file itself, not the zlib contents. So knowing the sha1 of the object +and the approximate area of the corruption, I used the `sha1-munge` +program below to brute-force the correct byte. + +Here's the inflate program (it's essentially `gunzip` but without the +`.gz` header processing): + +-------------------------- +#include <stdio.h> +#include <string.h> +#include <zlib.h> +#include <stdlib.h> + +int main(int argc, char **argv) +{ + /* + * oversized so we can read the whole buffer in; + * this could actually be switched to streaming + * to avoid any memory limitations + */ + static unsigned char buf[25 * 1024 * 1024]; + static unsigned char out[25 * 1024 * 1024]; + int len; + z_stream z; + int ret; + + len = read(0, buf, sizeof(buf)); + memset(&z, 0, sizeof(z)); + inflateInit(&z); + + z.next_in = buf; + z.avail_in = len; + z.next_out = out; + z.avail_out = sizeof(out); + + ret = inflate(&z, 0); + if (ret != Z_OK && ret != Z_STREAM_END) + fprintf(stderr, "initial inflate failed (%d)\n", ret); + + fprintf(stderr, "outputting %lu bytes", z.total_out); + fwrite(out, 1, z.total_out, stdout); + return 0; +} +-------------------------- + +And here is the `sha1-munge` program: + +-------------------------- +#include <stdio.h> +#include <unistd.h> +#include <string.h> +#include <signal.h> +#include <openssl/sha.h> +#include <stdlib.h> + +/* eye candy */ +static int counter = 0; +static void progress(int sig) +{ + fprintf(stderr, "\r%d", counter); + alarm(1); +} + +static const signed char hexval_table[256] = { + -1, -1, -1, -1, -1, -1, -1, -1, /* 00-07 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 08-0f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 10-17 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 18-1f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 20-27 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 28-2f */ + 0, 1, 2, 3, 4, 5, 6, 7, /* 30-37 */ + 8, 9, -1, -1, -1, -1, -1, -1, /* 38-3f */ + -1, 10, 11, 12, 13, 14, 15, -1, /* 40-47 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 48-4f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 50-57 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 58-5f */ + -1, 10, 11, 12, 13, 14, 15, -1, /* 60-67 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 68-67 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 70-77 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 78-7f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 80-87 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 88-8f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 90-97 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* 98-9f */ + -1, -1, -1, -1, -1, -1, -1, -1, /* a0-a7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* a8-af */ + -1, -1, -1, -1, -1, -1, -1, -1, /* b0-b7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* b8-bf */ + -1, -1, -1, -1, -1, -1, -1, -1, /* c0-c7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* c8-cf */ + -1, -1, -1, -1, -1, -1, -1, -1, /* d0-d7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* d8-df */ + -1, -1, -1, -1, -1, -1, -1, -1, /* e0-e7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* e8-ef */ + -1, -1, -1, -1, -1, -1, -1, -1, /* f0-f7 */ + -1, -1, -1, -1, -1, -1, -1, -1, /* f8-ff */ +}; + +static inline unsigned int hexval(unsigned char c) +{ +return hexval_table[c]; +} + +static int get_sha1_hex(const char *hex, unsigned char *sha1) +{ + int i; + for (i = 0; i < 20; i++) { + unsigned int val; + /* + * hex[1]=='\0' is caught when val is checked below, + * but if hex[0] is NUL we have to avoid reading + * past the end of the string: + */ + if (!hex[0]) + return -1; + val = (hexval(hex[0]) << 4) | hexval(hex[1]); + if (val & ~0xff) + return -1; + *sha1++ = val; + hex += 2; + } + return 0; +} + +int main(int argc, char **argv) +{ + /* oversized so we can read the whole buffer in */ + static unsigned char buf[25 * 1024 * 1024]; + char header[32]; + int header_len; + unsigned char have[20], want[20]; + int start, len; + SHA_CTX orig; + unsigned i, j; + + if (!argv[1] || get_sha1_hex(argv[1], want)) { + fprintf(stderr, "usage: sha1-munge <sha1> [start] <file.in\n"); + return 1; + } + + if (argv[2]) + start = atoi(argv[2]); + else + start = 0; + + len = read(0, buf, sizeof(buf)); + header_len = sprintf(header, "blob %d", len) + 1; + fprintf(stderr, "using header: %s\n", header); + + /* + * We keep a running sha1 so that if you are munging + * near the end of the file, we do not have to re-sha1 + * the unchanged earlier bytes + */ + SHA1_Init(&orig); + SHA1_Update(&orig, header, header_len); + if (start) + SHA1_Update(&orig, buf, start); + + signal(SIGALRM, progress); + alarm(1); + + for (i = start; i < len; i++) { + unsigned char c; + SHA_CTX x; + +#if 0 + /* + * deletion -- this would not actually work in practice, + * I think, because we've already committed to a + * particular size in the header. Ditto for addition + * below. In those cases, you'd have to do the whole + * sha1 from scratch, or possibly keep three running + * "orig" sha1 computations going. + */ + memcpy(&x, &orig, sizeof(x)); + SHA1_Update(&x, buf + i + 1, len - i - 1); + SHA1_Final(have, &x); + if (!memcmp(have, want, 20)) + printf("i=%d, deletion\n", i); +#endif + + /* + * replacement -- note that this tries each of the 256 + * possible bytes. If you suspect a single-bit flip, + * it would be much shorter to just try the 8 + * bit-flipped variants. + */ + c = buf[i]; + for (j = 0; j <= 0xff; j++) { + buf[i] = j; + + memcpy(&x, &orig, sizeof(x)); + SHA1_Update(&x, buf + i, len - i); + SHA1_Final(have, &x); + if (!memcmp(have, want, 20)) + printf("i=%d, j=%02x\n", i, j); + } + buf[i] = c; + +#if 0 + /* addition */ + for (j = 0; j <= 0xff; j++) { + unsigned char extra = j; + memcpy(&x, &orig, sizeof(x)); + SHA1_Update(&x, &extra, 1); + SHA1_Update(&x, buf + i, len - i); + SHA1_Final(have, &x); + if (!memcmp(have, want, 20)) + printf("i=%d, addition=%02x", i, j); + } +#endif + + SHA1_Update(&orig, buf + i, 1); + counter++; + } + + alarm(0); + fprintf(stderr, "\r%d\n", counter); + return 0; +} +-------------------------- diff --git a/Documentation/howto/setup-git-server-over-http.txt b/Documentation/howto/setup-git-server-over-http.txt index 6de4f3c487..f44e5e9458 100644 --- a/Documentation/howto/setup-git-server-over-http.txt +++ b/Documentation/howto/setup-git-server-over-http.txt @@ -181,7 +181,7 @@ On Debian: Most tests should pass. -A command line tool to test WebDAV is cadaver. If you prefer GUIs, for +A command-line tool to test WebDAV is cadaver. If you prefer GUIs, for example, konqueror can open WebDAV URLs as "webdav://..." or "webdavs://...". |