Diffstat (limited to 'Documentation/tutorial.txt')
-rw-r--r-- | Documentation/tutorial.txt | 1109 |
1 files changed, 1109 insertions, 0 deletions
diff --git a/Documentation/tutorial.txt b/Documentation/tutorial.txt
new file mode 100644
index 0000000000..a2a7b7cd41
--- /dev/null
+++ b/Documentation/tutorial.txt
@@ -0,0 +1,1109 @@
+A short git tutorial
+====================
+May 2005
+
+
+Introduction
+------------
+
+This is trying to be a short tutorial on setting up and using a git
+archive, mainly because being hands-on and using explicit examples is
+often the best way of explaining what is going on.
+
+In normal life, most people wouldn't use the "core" git programs
+directly, but rather script around them to make them more palatable.
+Understanding the core git stuff may help some people get those scripts
+done, though, and it may also be instructive in helping people
+understand what it is that the higher-level helper scripts are actually
+doing.
+
+The core git is often called "plumbing", with the prettier user
+interfaces on top of it called "porcelain". You may not want to use the
+plumbing directly very often, but it can be good to know what the
+plumbing does for when the porcelain isn't flushing...
+
+
+Creating a git archive
+----------------------
+
+Creating a new git archive couldn't be easier: all git archives start
+out empty, and the only thing you need to do is find yourself a
+subdirectory that you want to use as a working tree - either an empty
+one for a totally new project, or an existing working tree that you want
+to import into git.
+
+For our first example, we're going to start a totally new archive from
+scratch, with no pre-existing files, and we'll call it "git-tutorial".
+To start up, create a subdirectory for it, change into that
+subdirectory, and initialize the git infrastructure with "git-init-db":
+
+	mkdir git-tutorial
+	cd git-tutorial
+	git-init-db
+
+to which git will reply
+
+	defaulting to local storage area
+
+which is just git's way of saying that you haven't been doing anything
+strange, and that it will have created a local .git directory setup for
+your new project. You will now have a ".git" directory, and you can
+inspect that with "ls". For your new empty project, ls should show you
+three entries:
+
+ - a symlink called HEAD, pointing to "refs/heads/master"
+
+   Don't worry about the fact that the file that the HEAD link points to
+   doesn't even exist yet - you haven't created the commit that will
+   start your HEAD development branch yet.
+
+ - a subdirectory called "objects", which will contain all the git SHA1
+   objects of your project. You should never have any real reason to
+   look at the objects directly, but you might want to know that these
+   objects are what contains all the real _data_ in your repository.
+
+ - a subdirectory called "refs", which contains references to objects.
+
+   In particular, the "refs" subdirectory will contain two other
+   subdirectories, named "heads" and "tags" respectively. They do
+   exactly what their names imply: they contain references to any number
+   of different "heads" of development (aka "branches"), and to any
+   "tags" that you have created to name specific versions of your
+   repository.
+
+   One note: the special "master" head is the default branch, which is
+   why the .git/HEAD file was created as a symlink to it even if it
+   doesn't yet exist. Basically, the HEAD link is supposed to always
+   point to the branch you are working on right now, and you always
+   start out expecting to work on the "master" branch.
+
+   However, this is only a convention, and you can name your branches
+   anything you want, and don't have to ever even _have_ a "master"
+   branch.
+   A number of the git tools will assume that .git/HEAD is
+   valid, though.
+
+   [ Implementation note: an "object" is identified by its 160-bit SHA1
+   hash, aka "name", and a reference to an object is always the 40-byte
+   hex representation of that SHA1 name. The files in the "refs"
+   subdirectory are expected to contain these hex references (usually
+   with a final '\n' at the end), and you should thus expect to see a
+   number of 41-byte files containing these references in these refs
+   subdirectories when you actually start populating your tree ]
+
+You have now created your first git archive. Of course, since it's
+empty, that's not very useful, so let's start populating it with data.
+
+
+	Populating a git archive
+	------------------------
+
+We'll keep this simple and stupid, so we'll start off with populating a
+few trivial files just to get a feel for it.
+
+Start off with just creating any random files that you want to maintain
+in your git archive. We'll start off with a few bad examples, just to
+get a feel for how this works:
+
+	echo "Hello World" >hello
+	echo "Silly example" >example
+
+you have now created two files in your working directory, but to
+actually check in your hard work, you will have to go through two steps:
+
+ - fill in the "cache" aka "index" file with the information about your
+   working directory state
+
+ - commit that index file as an object.
+
+The first step is trivial: when you want to tell git about any changes
+to your working directory, you use the "git-update-cache" program. That
+program normally just takes a list of filenames you want to update, but
+to avoid trivial mistakes, it refuses to add new entries to the cache
+(or remove existing ones) unless you explicitly tell it that you're
+adding a new entry with the "--add" flag (or removing an entry with the
+"--remove" flag).
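The refusal described above can be tried in a throwaway directory. This is only a sketch under assumptions that are not in the tutorial: it uses the spellings a present-day git install provides ("git init" for "git-init-db", "git update-index" for "git-update-cache"), and the scratch directory from mktemp is hypothetical.

```shell
# Sketch only: modern command spellings, hypothetical scratch directory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "Hello World" >hello

# Without --add, a path the index has never seen is refused.
if git update-index hello 2>/dev/null; then
	refused=no
else
	refused=yes
fi

# With --add, the entry is created.
git update-index --add hello
tracked=$(git ls-files)
echo "refused=$refused tracked=$tracked"
```

The first call fails on purpose, which is why it sits inside an "if" instead of being run bare under "set -e".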
+
+So to populate the index with the two files you just created, you can do
+
+	git-update-cache --add hello example
+
+and you have now told git to track those two files.
+
+In fact, as you did that, if you now look into your object directory,
+you'll notice that git will have added two new objects to the object
+store. If you did exactly the steps above, you should now be able to do
+
+	ls .git/objects/??/*
+
+and see two files:
+
+	.git/objects/55/7db03de997c86a4a028e1ebd3a1ceb225be238
+	.git/objects/f2/4c74a2e500f5ee1332c86b94199f52b1d1d962
+
+which correspond to the objects with SHA1 names 557db... and f24c7...
+respectively.
+
+If you want to, you can use "git-cat-file" to look at those objects, but
+you'll have to use the object name, not the filename of the object:
+
+	git-cat-file -t 557db03de997c86a4a028e1ebd3a1ceb225be238
+
+where the "-t" tells git-cat-file to tell you what the "type" of the
+object is. Git will tell you that you have a "blob" object (ie just a
+regular file), and you can see the contents with
+
+	git-cat-file "blob" 557db03de997c86a4a028e1ebd3a1ceb225be238
+
+which will print out "Hello World". The object 557db... is nothing
+more than the contents of your file "hello".
+
+[ Digression: don't confuse that object with the file "hello" itself. The
+  object is literally just those specific _contents_ of the file, and
+  however much you later change the contents in file "hello", the object we
+  just looked at will never change. Objects are immutable. ]
+
+Anyway, as we mentioned previously, you normally never actually take a
+look at the objects themselves, and typing long 40-character hex SHA1
+names is not something you'd normally want to do. The above digression
+was just to show that "git-update-cache" did something magical, and
+actually saved away the contents of your files into the git content
+store.
+
+Updating the cache did something else too: it created a ".git/index"
+file.
+This is the index that describes your current working tree, and
+something you should be very aware of. Again, you normally never worry
+about the index file itself, but you should be aware of the fact that
+you have not actually really "checked in" your files into git so far,
+you've only _told_ git about them.
+
+However, since git knows about them, you can now start using some of the
+most basic git commands to manipulate the files or look at their status.
+
+In particular, let's not even check in the two files into git yet, we'll
+start off by adding another line to "hello" first:
+
+	echo "It's a new day for git" >>hello
+
+and you can now, since you told git about the previous state of "hello", ask
+git what has changed in the tree compared to your old index, using the
+"git-diff-files" command:
+
+	git-diff-files
+
+oops. That wasn't very readable. It just spit out its own internal
+version of a "diff", but that internal version really just tells you
+that it has noticed that "hello" has been modified, and that the old object
+contents it had have been replaced with something else.
+
+To make it readable, we can tell git-diff-files to output the
+differences as a patch, using the "-p" flag:
+
+	git-diff-files -p
+
+which will spit out
+
+	diff --git a/hello b/hello
+	--- a/hello
+	+++ b/hello
+	@@ -1 +1,2 @@
+	 Hello World
+	+It's a new day for git
+
+ie the diff of the change we caused by adding another line to "hello".
+
+In other words, git-diff-files always shows us the difference between
+what is recorded in the index, and what is currently in the working
+tree. That's very useful.
+
+A common shorthand for "git-diff-files -p" is to just write
+
+	git diff
+
+which will do the same thing.
+
+
+	Committing git state
+	--------------------
+
+Now, we want to go to the next stage in git, which is to take the files
+that git knows about in the index, and commit them as a real tree.
+We do that in two phases: creating a "tree" object, and committing that
+"tree" object as a "commit" object together with an explanation of what
+the tree was all about, along with information of how we came to that
+state.
+
+Creating a tree object is trivial, and is done with "git-write-tree".
+There are no options or other input: git-write-tree will take the
+current index state, and write an object that describes that whole
+index. In other words, we're now tying together all the different
+filenames with their contents (and their permissions), and we're
+creating the equivalent of a git "directory" object:
+
+	git-write-tree
+
+and this will just output the name of the resulting tree, in this case
+(if you have done exactly as I've described) it should be
+
+	8988da15d077d4829fc51d8544c097def6644dbb
+
+which is another incomprehensible object name. Again, if you want to,
+you can use "git-cat-file -t 8988d.." to see that this time the object
+is not a "blob" object, but a "tree" object (you can also use
+git-cat-file to actually output the raw object contents, but you'll see
+mainly a binary mess, so that's less interesting).
+
+However - normally you'd never use "git-write-tree" on its own, because
+normally you always commit a tree into a commit object using the
+"git-commit-tree" command. In fact, it's easier to not actually use
+git-write-tree on its own at all, but to just pass its result in as an
+argument to "git-commit-tree".
+
+"git-commit-tree" normally takes several arguments - it wants to know
+what the _parent_ of a commit was, but since this is the first commit
+ever in this new archive, and it has no parents, we only need to pass in
+the tree ID. However, git-commit-tree also wants to get a commit message
+on its standard input, and it will write out the resulting ID for the
+commit to its standard output.
+
+And this is where we start using the .git/HEAD file.
+The HEAD file is supposed to contain the reference to the top-of-tree,
+and since that's exactly what git-commit-tree spits out, we can do this
+all with a simple shell pipeline:
+
+	echo "Initial commit" | git-commit-tree $(git-write-tree) > .git/HEAD
+
+which will say:
+
+	Committing initial tree 8988da15d077d4829fc51d8544c097def6644dbb
+
+just to warn you about the fact that it created a totally new commit
+that is not related to anything else. Normally you do this only _once_
+for a project ever, and all later commits will be parented on top of an
+earlier commit, and you'll never see this "Committing initial tree"
+message ever again.
+
+Again, normally you'd never actually do this by hand. There is a
+helpful script called "git commit" that will do all of this for you. So
+you could have just written
+
+	git commit
+
+instead, and it would have done the above magic scripting for you.
+
+
+	Making a change
+	---------------
+
+Remember how we did the "git-update-cache" on file "hello" and then we
+changed "hello" afterward, and could compare the new state of "hello" with the
+state we saved in the index file?
+
+Further, remember how I said that "git-write-tree" writes the contents
+of the _index_ file to the tree, and thus what we just committed was in
+fact the _original_ contents of the file "hello", not the new ones. We did
+that on purpose, to show the difference between the index state, and the
+state in the working directory, and how they don't have to match, even
+when we commit things.
+
+As before, if we do "git-diff-files -p" in our git-tutorial project,
+we'll still see the same difference we saw last time: the index file
+hasn't changed by the act of committing anything. However, now that we
+have committed something, we can also learn to use a new command:
+"git-diff-cache".
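The earlier point in this section, that the commit records the index and not the working directory, can be checked with a short script. This is a sketch under assumptions: it uses present-day spellings ("git update-index", "git write-tree", "git commit-tree"), it records the commit with "git update-ref" because a current git keeps HEAD as a symbolic ref instead of a file you can redirect into, and the identity values and scratch directory are made up.

```shell
# Sketch only: modern command spellings; identity and directory are hypothetical.
set -e
export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.invalid
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "Hello World" >hello
git update-index --add hello
echo "It's a new day for git" >>hello   # changed after the index update

# Write the index out as a tree and wrap it in a parentless commit.
tree=$(git write-tree)
commit=$(echo "Initial commit" | git commit-tree "$tree")
git update-ref HEAD "$commit"           # stands in for "> .git/HEAD"

committed=$(git cat-file -p "$commit:hello")
working=$(cat hello)
printf 'committed: %s\n' "$committed"
printf 'working:   %s\n' "$working"
```

The committed copy holds only the first line, while the working file already has two.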
+
+Unlike "git-diff-files", which showed the difference between the index
+file and the working directory, "git-diff-cache" shows the differences
+between a committed _tree_ and either the index file or the working
+directory. In other words, git-diff-cache wants a tree to be diffed
+against, and before we did the commit, we couldn't do that, because we
+didn't have anything to diff against.
+
+But now we can do
+
+	git-diff-cache -p HEAD
+
+(where "-p" has the same meaning as it did in git-diff-files), and it
+will show us the same difference, but for a totally different reason.
+Now we're comparing the working directory not against the index file,
+but against the tree we just wrote. It just so happens that those two
+are obviously the same, so we get the same result.
+
+Again, because this is a common operation, you can also just shorthand
+it with
+
+	git diff HEAD
+
+which ends up doing the above for you.
+
+In other words, "git-diff-cache" normally compares a tree against the
+working directory, but when given the "--cached" flag, it is told to
+instead compare against just the index cache contents, and ignore the
+current working directory state entirely. Since we just wrote the index
+file to HEAD, doing "git-diff-cache --cached -p HEAD" should thus return
+an empty set of differences, and that's exactly what it does.
+
+[ Digression: "git-diff-cache" really always uses the index for its
+  comparisons, and saying that it compares a tree against the working
+  directory is thus not strictly accurate. In particular, the list of
+  files to compare (the "meta-data") _always_ comes from the index file,
+  regardless of whether the --cached flag is used or not. The --cached
+  flag really only determines whether the file _contents_ to be compared
+  come from the working directory or not.
+
+  This is not hard to understand, as soon as you realize that git simply
+  never knows (or cares) about files that it is not told about
+  explicitly.
+  Git will never go _looking_ for files to compare, it
+  expects you to tell it what the files are, and that's what the index
+  is there for. ]
+
+However, our next step is to commit the _change_ we did, and again, to
+understand what's going on, keep in mind the difference between "working
+directory contents", "index file" and "committed tree". We have changes
+in the working directory that we want to commit, and we always have to
+work through the index file, so the first thing we need to do is to
+update the index cache:
+
+	git-update-cache hello
+
+(note how we didn't need the "--add" flag this time, since git knew
+about the file already).
+
+Note what happens to the different git-diff-xxx versions here. After
+we've updated "hello" in the index, "git-diff-files -p" now shows no
+differences, but "git-diff-cache -p HEAD" still _does_ show that the
+current state is different from the state we committed. In fact, now
+"git-diff-cache" shows the same difference whether we use the "--cached"
+flag or not, since now the index is coherent with the working directory.
+
+Now, since we've updated "hello" in the index, we can commit the new
+version. We could do it by writing the tree by hand again, and
+committing the tree (this time we'd have to use the "-p HEAD" flag to
+tell commit that the HEAD was the _parent_ of the new commit, and that
+this wasn't an initial commit any more), but you've done that once
+already, so let's just use the helpful script this time:
+
+	git commit
+
+which starts an editor for you to write the commit message and tells you
+a bit about what you're doing.
+
+Write whatever message you want, and all the lines that start with '#'
+will be pruned out, and the rest will be used as the commit message for
+the change. If you decide you don't want to commit anything after all at
+this point (you can continue to edit things and update the cache), you
+can just leave an empty message. Otherwise git-commit-script will commit
+the change for you.
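For readers who want to see the by-hand route that the text skips, here is a sketch of the second commit written with plumbing only, including the "-p HEAD" parent. It assumes present-day spellings ("git update-index", "git diff-index" for "git-diff-cache", "git update-ref" to move HEAD); the identity values and scratch directory are hypothetical.

```shell
# Sketch only: modern command spellings; identity and directory are hypothetical.
set -e
export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.invalid
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "Hello World" >hello
git update-index --add hello
first=$(echo "Initial commit" | git commit-tree "$(git write-tree)")
git update-ref HEAD "$first"

# Change the file, then tell the index about it (no --add needed now).
echo "It's a new day for git" >>hello
git update-index hello

# Index and working directory agree again ...
if git diff-files --quiet; then files_clean=yes; else files_clean=no; fi
# ... but the index still differs from the committed tree.
if git diff-index --cached --quiet HEAD; then index_clean=yes; else index_clean=no; fi

# Commit by hand, naming HEAD as the parent with -p.
second=$(echo "Add a second line" | git commit-tree "$(git write-tree)" -p HEAD)
git update-ref HEAD "$second"
parent=$(git rev-parse HEAD^)
echo "files_clean=$files_clean index_clean=$index_clean"
```

After the last step the parent of HEAD is the initial commit, which is exactly what "-p HEAD" asked for.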
+
+You've now made your first real git commit. And if you're interested in
+looking at what git-commit-script really does, feel free to investigate:
+it's a few very simple shell scripts to generate the helpful (?) commit
+message headers, and a few one-liners that actually do the commit itself.
+
+
+	Checking it out
+	---------------
+
+While creating changes is useful, it's even more useful if you can tell
+later what changed. The most useful command for this is another of the
+"diff" family, namely "git-diff-tree".
+
+git-diff-tree can be given two arbitrary trees, and it will tell you the
+differences between them. Perhaps even more commonly, though, you can
+give it just a single commit object, and it will figure out the parent
+of that commit itself, and show the difference directly. Thus, to get
+the same diff that we've already seen several times, we can now do
+
+	git-diff-tree -p HEAD
+
+(again, "-p" means to show the difference as a human-readable patch),
+and it will show what the last commit (in HEAD) actually changed.
+
+More interestingly, you can also give git-diff-tree the "-v" flag, which
+tells it to also show the commit message and author and date of the
+commit, and you can tell it to show a whole series of diffs.
+Alternatively, you can tell it to be "silent", and not show the diffs at
+all, but just show the actual commit message.
+
+In fact, together with the "git-rev-list" program (which generates a
+list of revisions), git-diff-tree ends up being a veritable fount of
+changes. A trivial (but very useful) script called "git-whatchanged" is
+included with git which does exactly this, and shows a log of recent
+activity.
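The pairing of the two programs can be sketched as a one-line pipeline. This is not the actual git-whatchanged script, only an illustration assuming a present-day git, where the commands are spelled "git rev-list" and "git diff-tree" and the latter reads commit names with "--stdin"; the identity values, commit messages and scratch directory are hypothetical.

```shell
# Sketch only: not the real git-whatchanged; names below are hypothetical.
set -e
export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.invalid
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "Hello World" >hello
git update-index --add hello
git commit -q -m "Initial commit"
echo "It's a new day for git" >>hello
git update-index hello
git commit -q -m "Add a second line"

# rev-list names every commit reachable from HEAD, newest first;
# diff-tree turns each name into a message (-v) plus a patch (-p),
# and --root makes it show the parentless first commit too.
count=$(git rev-list --count HEAD)
log=$(git rev-list HEAD | git diff-tree --stdin --root -v -p)
printf '%s\n' "$log"
```

Dropping "-p" and adding "-s" should give the "silent" form mentioned above, with messages but no diffs.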
+
+To see the whole history of our pitiful little git-tutorial project, you
+can do
+
+	git log
+
+which shows just the log messages, or if we want to see the log together
+with the associated patches use the more complex (and much more
+powerful)
+
+	git-whatchanged -p --root
+
+and you will see exactly what has changed in the repository over its
+short history.
+
+[ Side note: the "--root" flag is a flag to git-diff-tree to tell it to
+  show the initial aka "root" commit too. Normally you'd probably not
+  want to see the initial import diff, but since the tutorial project
+  was started from scratch and is so small, we use it to make the result
+  a bit more interesting ]
+
+With that, you should now be having some inkling of what git does, and
+can explore on your own.
+
+
+[ Side note: most likely, you are not directly using the core
+  git Plumbing commands, but using Porcelain like Cogito on top
+  of it. Cogito works a bit differently and you usually do not
+  have to run "git-update-cache" yourself for changed files (you
+  do tell underlying git about additions and removals via
+  "cg-add" and "cg-rm" commands). Just before you make a commit
+  with "cg-commit", Cogito figures out which files you modified,
+  and runs "git-update-cache" on them for you. ]
+
+
+	Tagging a version
+	-----------------
+
+In git, there are two kinds of tags, a "light" one, and a "signed tag".
+
+A "light" tag is technically nothing more than a branch, except we put
+it in the ".git/refs/tags/" subdirectory instead of calling it a "head".
+So the simplest form of tag involves nothing more than
+
+	git tag my-first-tag
+
+which just writes the current HEAD into the .git/refs/tags/my-first-tag
+file, after which point you can then use this symbolic name for that
+particular state.
+You can, for example, do
+
+	git diff my-first-tag
+
+to diff your current state against that tag (which at this point will
+obviously be an empty diff, but if you continue to develop and commit
+stuff, you can use your tag as an "anchor-point" to see what has changed
+since you tagged it).
+
+A "signed tag" is actually a real git object, and contains not only a
+pointer to the state you want to tag, but also a small tag name and
+message, along with a PGP signature that says that yes, you really did
+that tag. You create these signed tags with the "-s" flag to "git tag":
+
+	git tag -s <tagname>
+
+which will sign the current HEAD (but you can also give it another
+argument that specifies the thing to tag, ie you could have tagged the
+current "mybranch" point by using "git tag <tagname> mybranch").
+
+You normally only do signed tags for major releases or things
+like that, while the light-weight tags are useful for any marking you
+want to do - any time you decide that you want to remember a certain
+point, just create a private tag for it, and you have a nice symbolic
+name for the state at that point.
+
+
+	Copying archives
+	-----------------
+
+Git archives are normally totally self-sufficient, and it's worth noting
+that unlike CVS, for example, there is no separate notion of
+"repository" and "working tree". A git repository normally _is_ the
+working tree, with the local git information hidden in the ".git"
+subdirectory. There is nothing else. What you see is what you got.
+
+[ Side note: you can tell git to split the git internal information from
+  the directory that it tracks, but we'll ignore that for now: it's not
+  how normal projects work, and it's really only meant for special uses.
+  So the mental model of "the git information is always tied directly to
+  the working directory that it describes" may not be technically 100%
+  accurate, but it's a good model for all normal use ]
+
+This has two implications:
+
+ - if you grow bored with the tutorial archive you created (or you've
+   made a mistake and want to start all over), you can just do a simple
+
+	rm -rf git-tutorial
+
+   and it will be gone. There's no external repository, and there's no
+   history outside of the project you created.
+
+ - if you want to move or duplicate a git archive, you can do so. There
+   is a "git clone" command, but if all you want to do is just to
+   create a copy of your archive (with all the full history that
+   went along with it), you can do so with a regular
+   "cp -a git-tutorial new-git-tutorial".
+
+   Note that when you've moved or copied a git archive, your git index
+   file (which caches various information, notably some of the "stat"
+   information for the files involved) will likely need to be refreshed.
+   So after you do a "cp -a" to create a new copy, you'll want to do
+
+	git-update-cache --refresh
+
+   to make sure that the index file is up-to-date in the new one.
+
+Note that the second point is true even across machines. You can
+duplicate a remote git archive with _any_ regular copy mechanism, be it
+"scp", "rsync" or "wget".
+
+When copying a remote repository, you'll want to at a minimum update the
+index cache when you do this, and especially with other people's
+repositories you often want to make sure that the index cache is in some
+known state (you don't know _what_ they've done and not yet checked in),
+so usually you'll precede the "git-update-cache" with a
+
+	git-read-tree --reset HEAD
+	git-update-cache --refresh
+
+which will force a total index re-build from the tree pointed to by HEAD
+(it resets the index contents to HEAD, and then the git-update-cache
+makes sure to match up all index entries with the checked-out files).
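The copy-then-rebuild sequence can be exercised locally. The sketch below assumes present-day spellings ("git read-tree", "git update-index"), uses a local "cp -a" as the copy mechanism, and invents an "extra" file to play the part of something the other side staged but never committed; the directory names and identity values are hypothetical.

```shell
# Sketch only: modern command spellings; "extra" and the paths are hypothetical.
set -e
export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.invalid
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.invalid

work=$(mktemp -d)
cd "$work"
mkdir git-tutorial
cd git-tutorial
git init -q .
echo "Hello World" >hello
git update-index --add hello
git commit -q -m "Initial commit"

# Something "they" staged but never checked in.
echo "scratch" >extra
git update-index --add extra

cd "$work"
cp -a git-tutorial new-git-tutorial
cd new-git-tutorial
git read-tree --reset HEAD      # index now lists exactly what HEAD has
git update-index --refresh      # re-match stat info with the copied files
entries=$(git ls-files)
echo "index entries: $entries"
```

The staged-only "extra" entry is gone from the copy's index, although the file itself is still sitting in the directory as an untracked file.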
+
+The above can also be written as simply
+
+	git reset
+
+and in fact a lot of the common git command combinations can be scripted
+with the "git xyz" interfaces, and you can learn things by just looking
+at what the git-*-script scripts do ("git reset" is the above two lines
+implemented in "git-reset-script", but some things like "git status" and
+"git commit" are slightly more complex scripts around the basic git
+commands).
+
+NOTE! Many (most?) public remote repositories will not contain any of
+the checked out files or even an index file, and will _only_ contain the
+actual core git files. Such a repository usually doesn't even have the
+".git" subdirectory, but has all the git files directly in the
+repository.
+
+To create your own local live copy of such a "raw" git repository, you'd
+first create your own subdirectory for the project, and then copy the
+raw repository contents into the ".git" directory. For example, to
+create your own copy of the git repository, you'd do the following
+
+	mkdir my-git
+	cd my-git
+	rsync -rL rsync://rsync.kernel.org/pub/scm/git/git.git/ .git
+
+followed by
+
+	git-read-tree HEAD
+
+to populate the index. However, now you have populated the index, and
+you have all the git internal files, but you will notice that you don't
+actually have any of the _working_directory_ files to work on. To get
+those, you'd check them out with
+
+	git-checkout-cache -u -a
+
+where the "-u" flag means that you want the checkout to keep the index
+up-to-date (so that you don't have to refresh it afterward), and the
+"-a" flag means "check out all files" (if you have a stale copy or an
+older version of a checked out tree you may also need to add the "-f"
+flag first, to tell git-checkout-cache to _force_ overwriting of any old
+files).
+
+Again, this can all be simplified with
+
+	git clone rsync://rsync.kernel.org/pub/scm/git/git.git/ my-git
+	cd my-git
+	git checkout
+
+which will end up doing all of the above for you.
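The manual steps can be rehearsed without a network. In this sketch a local directory called "upstream" (a made-up name) stands in for the remote raw repository and "cp -R" stands in for rsync; the commands use present-day spellings ("git read-tree", "git checkout-index" for "git-checkout-cache"), and the identity values are hypothetical.

```shell
# Sketch only: local stand-in for the rsync copy; all names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.invalid
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.invalid

work=$(mktemp -d)
cd "$work"
mkdir upstream
cd upstream
git init -q .
echo "Hello World" >hello
git update-index --add hello
git commit -q -m "Initial commit"

# Copy only the git files into ".git" of a fresh directory, and drop the
# index so the copy looks like a "raw" repository.
cd "$work"
mkdir my-copy
cp -R upstream/.git my-copy/.git
rm -f my-copy/.git/index

cd my-copy
git read-tree HEAD              # populate the index from the HEAD tree
git checkout-index -u -a        # write all files, keep the index fresh
checked_out=$(cat hello)
echo "checked out: $checked_out"
```

Before the last two commands the new directory holds nothing but ".git"; afterwards "hello" is back.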
+
+You have now successfully copied somebody else's (mine) remote
+repository, and checked it out.
+
+
+	Creating a new branch
+	---------------------
+
+Branches in git are really nothing more than pointers into the git
+object space from within the ".git/refs/" subdirectory, and as we
+already discussed, the HEAD branch is nothing but a symlink to one of
+these object pointers.
+
+You can at any time create a new branch by just picking an arbitrary
+point in the project history, and just writing the SHA1 name of that
+object into a file under .git/refs/heads/. You can use any filename you
+want (and indeed, subdirectories), but the convention is that the
+"normal" branch is called "master". That's just a convention, though,
+and nothing enforces it.
+
+To show that as an example, let's go back to the git-tutorial archive we
+used earlier, and create a branch in it. You do that by simply just
+saying that you want to check out a new branch:
+
+	git checkout -b mybranch
+
+will create a new branch based at the current HEAD position, and switch
+to it.
+
+[ Side note: if you make the decision to start your new branch at some
+  other point in the history than the current HEAD, you can do so by
+  just telling "git checkout" what the base of the checkout would be.
+  In other words, if you have an earlier tag or branch, you'd just do
+
+	git checkout -b mybranch earlier-branch
+
+  and it would create the new branch "mybranch" at the earlier point,
+  and check out the state at that time. ]
+
+You can always just jump back to your original "master" branch by doing
+
+	git checkout master
+
+(or any other branch-name, for that matter) and if you forget which
+branch you happen to be on, a simple
+
+	ls -l .git/HEAD
+
+will tell you where it's pointing.
+
+NOTE! Sometimes you may wish to create a new branch _without_ actually
+checking it out and switching to it.
+If so, just use the command
+
+	git branch <branchname> [startingpoint]
+
+which will simply _create_ the branch, but will not do anything further.
+You can then later - once you decide that you want to actually develop
+on that branch - switch to that branch with a regular "git checkout"
+with the branchname as the argument.
+
+
+	Merging two branches
+	--------------------
+
+One of the ideas of having a branch is that you do some (possibly
+experimental) work in it, and eventually merge it back to the main
+branch. So assuming you created the above "mybranch" that started out
+being the same as the original "master" branch, let's make sure we're in
+that branch, and do some work there.
+
+	git checkout mybranch
+	echo "Work, work, work" >>hello
+	git commit hello
+
+Here, we just added another line to "hello", and we used a shorthand for
+both doing a "git-update-cache hello" and "git commit" by just giving the
+filename directly to "git commit".
+
+Now, to make it a bit more interesting, let's assume that somebody else
+does some work in the original branch, and simulate that by going back
+to the master branch, and editing the same file differently there:
+
+	git checkout master
+
+Here, take a moment to look at the contents of "hello", and notice how they
+don't contain the work we just did in "mybranch" - because that work
+hasn't happened in the "master" branch at all. Then do
+
+	echo "Play, play, play" >>hello
+	echo "Lots of fun" >>example
+	git commit hello example
+
+since the master branch is obviously in a much better mood.
+
+Now, you've got two branches, and you decide that you want to merge the
+work done. Before we do that, let's introduce a cool graphical tool that
+helps you view what's going on:
+
+	gitk --all
+
+will show you graphically both of your branches (that's what the "--all"
+means: normally it will just show you your current HEAD) and their
+histories. You can also see exactly how they came to be from a common
+source.
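The "common source" that gitk draws can also be asked for on the command line. This sketch rebuilds the two diverging branches and then uses "git merge-base", a command the tutorial text above does not mention, to name the commit both branches grew from. It assumes a present-day git, pins the first branch name to "master" with "git symbolic-ref" because newer installs may default to something else, and passes commit messages with "-m" so no editor opens; identity values and the scratch directory are hypothetical.

```shell
# Sketch only: "git merge-base" is not part of the text above; names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.invalid
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the branch name used in the text
echo "Hello World" >hello
echo "Silly example" >example
git update-index --add hello example
git commit -q -m "Initial commit"
first=$(git rev-parse HEAD)

# Work on the side branch ...
git checkout -q -b mybranch
echo "Work, work, work" >>hello
git commit -q -m "Work on mybranch" hello

# ... and different work on master.
git checkout -q master
echo "Play, play, play" >>hello
echo "Lots of fun" >>example
git commit -q -m "Play on master" hello example

base=$(git merge-base master mybranch)
echo "common source: $base"
```

Here the common source is simply the initial commit, since both branches were cut from it.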
+
+Anyway, let's exit gitk (^Q or the File menu), and decide that we want
+to merge the work we did on the "mybranch" branch into the "master"
+branch (which is currently our HEAD too). To do that, there's a nice
+script called "git resolve", which wants to know which branches you want
+to resolve and what the merge is all about:
+
+	git resolve HEAD mybranch "Merge work in mybranch"
+
+where the third argument is going to be used as the commit message if
+the merge can be resolved automatically.
+
+Now, in this case we've intentionally created a situation where the
+merge will need to be fixed up by hand, though, so git will do as much
+of it as it can automatically (which in this case is just merge the
+"example" file, which had no differences in the "mybranch" branch), and
+say:
+
+	Simple merge failed, trying Automatic merge
+	Auto-merging hello.
+	merge: warning: conflicts during merge
+	ERROR: Merge conflict in hello.
+	fatal: merge program failed
+	Automatic merge failed, fix up by hand
+
+which is way too verbose, but it basically tells you that it failed the
+really trivial merge ("Simple merge") and did an "Automatic merge"
+instead, but that too failed due to conflicts in "hello".
+
+Not to worry. It left the (trivial) conflict in "hello" in the same form you
+should already be well used to if you've ever used CVS, so let's just
+open "hello" in our editor (whatever that may be), and fix it up somehow.
+I'd suggest just making it so that "hello" contains all four lines:
+
+	Hello World
+	It's a new day for git
+	Play, play, play
+	Work, work, work
+
+and once you're happy with your manual merge, just do a
+
+	git commit hello
+
+which will very loudly warn you that you're now committing a merge
+(which is correct, so never mind), and you can write a small merge
+message about your adventures in git-merge-land.
+
+After you're done, start up "gitk --all" to see graphically what the
+history looks like.
+Notice that "mybranch" still exists, and you can
+switch to it, and continue to work with it if you want to. The
+"mybranch" branch will not contain the merge, but next time you merge it
+from the "master" branch, git will know how you merged it, so you'll not
+have to do _that_ merge again.
+
+
+    Merging external work
+    ---------------------
+
+It's usually much more common that you merge with somebody else than
+merging with your own branches, so it's worth pointing out that git
+makes that very easy too, and in fact, it's not that different from
+doing a "git resolve". In fact, a remote merge ends up being nothing
+more than "fetch the work from a remote repository into a temporary tag"
+followed by a "git resolve".
+
+It's such a common thing to do that it's called "git pull", and you can
+simply do
+
+    git pull <remote-repository>
+
+and optionally give a branch-name for the remote end as a second
+argument.
+
+The "remote" repository can even be on the same machine. One of
+the following notations can be used to name the repository to
+pull from:
+
+    Rsync URL
+        rsync://remote.machine/path/to/repo.git/
+
+    HTTP(s) URL
+        http://remote.machine/path/to/repo.git/
+
+    GIT URL
+        git://remote.machine/path/to/repo.git/
+
+    SSH URL
+        remote.machine:/path/to/repo.git/
+
+    Local directory
+        /path/to/repo.git/
+
+[ Digression: you could do without using any branches at all, by
+  keeping as many local repositories as you would like to have
+  branches, and merging between them with "git pull", just like
+  you merge between branches. The advantage of this approach is
+  that it lets you keep a set of files for each "branch" checked
+  out and you may find it easier to switch back and forth if you
+  juggle multiple lines of development simultaneously. Of
+  course, you will pay the price of more disk usage to hold
+  multiple working trees, but disk space is cheap these days. ]
+
+It is likely that you will be pulling from the same remote
+repository from time to time.
+As a shorthand, you can store
+the remote repository URL in a file under .git/branches/
+directory, like this:
+
+    mkdir -p .git/branches
+    echo rsync://kernel.org/pub/scm/git/git.git/ \
+        >.git/branches/linus
+
+and use the filename to "git pull" instead of the full URL.
+The contents of a file under .git/branches can even be a prefix
+of a full URL, like this:
+
+    echo rsync://kernel.org/pub/.../jgarzik/ \
+        >.git/branches/jgarzik
+
+Examples.
+
+    (1) git pull linus
+    (2) git pull linus tag v0.99.1
+    (3) git pull jgarzik/netdev-2.6.git/ e100
+
+the above are equivalent to:
+
+    (1) git pull rsync://kernel.org/pub/scm/git/git.git/ HEAD
+    (2) git pull rsync://kernel.org/pub/scm/git/git.git/ tag v0.99.1
+    (3) git pull rsync://kernel.org/pub/.../jgarzik/netdev-2.6.git/ e100
+
+
+    Publishing your work
+    --------------------
+
+So we can use somebody else's work from a remote repository; but
+how can _you_ prepare a repository to let other people pull from
+it?
+
+You do your real work in your working directory that has your
+primary repository hanging under it as its ".git" subdirectory.
+You _could_ make that repository accessible remotely and ask
+people to pull from it, but in practice that is not the way
+things are usually done. A recommended way is to have a public
+repository, make it reachable by other people, and when the
+changes you made in your primary working directory are in good
+shape, update the public repository from it. This is often
+called "pushing".
+
+[ Side note: this public repository could further be mirrored,
+  and that is how kernel.org git repositories are done. ]
+
+Publishing the changes from your local (private) repository to
+your remote (public) repository requires a write privilege on
+the remote machine. You need to have an SSH account there to
+run a single command, "git-receive-pack".
+
+First, you need to create an empty repository on the remote
+machine that will house your public repository.
+This empty repository will be populated and be kept up-to-date by
+pushing into it later. Obviously, this repository creation needs to be
+done only once.
+
+[ Digression: "git push" uses a pair of programs,
+  "git-send-pack" on your local machine, and "git-receive-pack"
+  on the remote machine. The communication between the two over
+  the network internally uses an SSH connection. ]
+
+Your private repository's GIT directory is usually .git, but
+your public repository is often named after the project name,
+i.e. "<project>.git". Let's create such a public repository for
+project "my-git". After logging into the remote machine, create
+an empty directory:
+
+    mkdir my-git.git
+
+Then, make that directory into a GIT repository by running
+git-init-db, but this time, since its name is not the usual
+".git", we do things slightly differently:
+
+    GIT_DIR=my-git.git git-init-db
+
+Make sure this directory is reachable, via the transport of your
+choice, by the people you want to pull your changes. Also, you
+need to make sure that you have the "git-receive-pack" program
+on the $PATH.
+
+[ Side note: many installations of sshd do not invoke your shell
+  as the login shell when you directly run programs; what this
+  means is that if your login shell is bash, only .bashrc is
+  read and not .bash_profile. As a workaround, make sure
+  .bashrc sets up $PATH so that you can run 'git-receive-pack'
+  program. ]
+
+Your "public repository" is now ready to accept your changes.
+Come back to the machine where you have your private repository.
+From there, run this command:
+
+    git push <public-host>:/path/to/my-git.git master
+
+This synchronizes your public repository to match the named
+branch head (i.e. "master" in this case) and objects reachable
+from it in your current repository.
+
+As a real example, this is how I update my public git
+repository.
+The kernel.org mirror network takes care of the
+propagation to other publicly visible machines:
+
+    git push master.kernel.org:/pub/scm/git/git.git/
+
+
+[ Digression: your GIT "public" repository people can pull from
+  is different from a public CVS repository that gives read-write
+  access to multiple developers. It is a copy of _your_ primary
+  repository published for others to use, and you should not
+  push into it from more than one repository (this means, not
+  just disallowing other developers to push into it, but also
+  you should push into it from a single repository of yours).
+  Sharing the result of work done by multiple people is always
+  done by pulling (i.e. fetching and merging) from public
+  repositories of those people. Typically this is done by the
+  "project lead" person, and the resulting repository is
+  published as the public repository of the "project lead" for
+  everybody to base further changes on. ]
+
+
+    Packing your repository
+    -----------------------
+
+Earlier, we saw that one file under .git/objects/??/ directory
+is stored for each git object you create. This representation
+is convenient and efficient to create atomically and safely, but
+not so to transport over the network. Since git objects are
+immutable once they are created, there is a way to optimize the
+storage by "packing them together". The command
+
+    git repack
+
+will do it for you. If you followed the tutorial examples, you
+would have accumulated about 17 objects in .git/objects/??/
+directories by now. "git repack" tells you how many objects it
+packed, and stores the packed file in .git/objects/pack
+directory.
+
+[ Side Note: you will see two files, pack-*.pack and pack-*.idx,
+  in .git/objects/pack directory. They are closely related to
+  each other, and if you ever copy them by hand to a different
+  repository for whatever reason, you should make sure you copy
+  them together.
+  The former holds all the data from the objects
+  in the pack, and the latter holds the index for random
+  access. ]
+
+If you are paranoid, running "git-verify-pack" command would
+detect if you have a corrupt pack, but do not worry too much.
+Our programs are always perfect ;-).
+
+Once you have packed objects, you do not need to leave the
+unpacked objects that are contained in the pack file anymore.
+
+    git prune-packed
+
+would remove them for you.
+
+You can try running "find .git/objects -type f" before and after
+you run "git prune-packed" if you are curious.
+
+[ Side Note: "git pull" is slightly cumbersome for HTTP transport,
+  as a packed repository may contain relatively few objects in a
+  relatively large pack. If you expect many HTTP pulls from your
+  public repository you might want to repack & prune often, or
+  never. ]
+
+If you run "git repack" again at this point, it will say
+"Nothing to pack". Once you continue your development and
+accumulate the changes, running "git repack" again will create a
+new pack, that contains objects created since you packed your
+archive the last time. We recommend that you pack your project
+soon after the initial import (unless you are starting your
+project from scratch), and then run "git repack" every once in a
+while, depending on how active your project is.
+
+When a repository is synchronized via "git push" and "git pull",
+objects packed in the source repository are usually stored
+unpacked in the destination, unless rsync transport is used.
+
+
+    Working with Others
+    -------------------
+
+Although git is a truly distributed system, it is often
+convenient to organize your project with an informal hierarchy
+of developers. Linux kernel development is run this way. There
+is a nice illustration (page 17, "Merges to Mainline") in Randy
+Dunlap's presentation (http://tinyurl.com/a2jdg).
+
+It should be stressed that this hierarchy is purely "informal".
+There is nothing fundamental in git that enforces the "chain of
+patch flow" this hierarchy implies. You do not have to pull
+from only one remote repository.
+
+
+A recommended workflow for a "project lead" goes like this:
+
+ (1) Prepare your primary repository on your local machine. Your
+     work is done there.
+
+ (2) Prepare a public repository accessible to others.
+
+ (3) Push into the public repository from your primary
+     repository.
+
+ (4) "git repack" the public repository. This establishes a big
+     pack that contains the initial set of objects as the
+     baseline, and possibly "git prune-packed" if the transport
+     used for pulling from your repository supports packed
+     repositories.
+
+ (5) Keep working in your primary repository. Your changes
+     include modifications of your own, patches you receive via
+     e-mails, and merges resulting from pulling the "public"
+     repositories of your "subsystem maintainers".
+
+     You can repack this private repository whenever you feel
+     like.
+
+ (6) Push your changes to the public repository, and announce it
+     to the public.
+
+ (7) Every once in a while, "git repack" the public repository.
+     Go back to step (5) and continue working.
+
+
+A recommended work cycle for a "subsystem maintainer" who works
+on that project and has their own "public repository" goes like this:
+
+ (1) Prepare your work repository, by running "git clone" on the
+     public repository of the "project lead". The URL used for
+     the initial cloning is stored in .git/branches/origin.
+
+ (2) Prepare a public repository accessible to others.
+
+ (3) Copy over the packed files from the "project lead" public
+     repository to your public repository by hand; preferably
+     use rsync for that task.
+
+ (4) Push into the public repository from your primary
+     repository. Run "git repack", and possibly "git
+     prune-packed" if the transport used for pulling from your
+     repository supports packed repositories.
+
+ (5) Keep working in your primary repository.
+     Your changes include modifications of your own, patches you
+     receive via e-mails, and merges resulting from pulling the
+     "public" repositories of your "project lead" and possibly
+     your "sub-subsystem maintainers".
+
+     You can repack this private repository whenever you feel
+     like.
+
+ (6) Push your changes to your public repository, and ask your
+     "project lead" and possibly your "sub-subsystem
+     maintainers" to pull from it.
+
+ (7) Every once in a while, "git repack" the public repository.
+     Go back to step (5) and continue working.
+
+
+A recommended work cycle for an "individual developer" who does
+not have a "public" repository is somewhat different. It goes
+like this:
+
+ (1) Prepare your work repository, by running "git clone" on the
+     public repository of the "project lead" (or a "subsystem
+     maintainer", if you work on a subsystem). The URL used for
+     the initial cloning is stored in .git/branches/origin.
+
+ (2) Do your work there. Make commits.
+
+ (3) Run "git fetch origin" from the public repository of your
+     upstream every once in a while. This does only the first
+     half of "git pull" but does not merge. The head of the
+     public repository is stored in .git/refs/heads/origin.
+
+ (4) Use "git cherry origin" to see which ones of your patches
+     were accepted, and/or use "git rebase origin" to port your
+     unmerged changes forward to the updated upstream.
+
+ (5) Use "git format-patch origin" to prepare patches for e-mail
+     submission to your upstream and send them out. Go back to
+     step (2) and continue.
+
+
+[ to be continued.. cvsimports ]
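
The .git/branches shorthand described under "Merging external work"
(and used again for .git/branches/origin in the work cycles above) can
be pictured as a small piece of shell. This is only a sketch of the
idea, not how git itself implements it: "expand_remote" is a
hypothetical helper name, and it assumes the rule is simply "take the
part before the first slash as a file name under .git/branches, and
append the rest to that file's contents".

```shell
#!/bin/sh
# Sketch only: expand_remote is a hypothetical helper, not a git command.
# It turns a shorthand such as "linus" or "<name>/<rest>" into a full URL
# by looking up <name> under $GIT_DIR/branches (default: .git/branches).

expand_remote() {
    spec=$1
    name=${spec%%/*}                        # part before the first slash
    file="${GIT_DIR:-.git}/branches/$name"
    if [ -f "$file" ]; then
        case $spec in
        */*) rest=${spec#*/} ;;             # part after the first slash
        *)   rest= ;;
        esac
        # The stored prefix followed by whatever came after the name.
        printf '%s%s\n' "$(cat "$file")" "$rest"
    else
        # No shorthand file with that name: treat the argument as a
        # full URL or path and pass it through unchanged.
        printf '%s\n' "$spec"
    fi
}
```

With the "linus" file from the earlier example in place, running
"expand_remote linus" would print the stored rsync URL, while a full
URL or a local path such as /path/to/repo.git/ comes back untouched.
The default branch ("HEAD" in example (1)) is a separate matter that
this sketch does not cover.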