Age | Commit message | Author | Files | Lines
2007-01-15 | Optimize index creation on large object sets in fast-import. | Shawn O. Pearce | 1 | -3/+8
When we are generating multiple packfiles at once we only need to scan the blocks of object_entry structs which contain objects for the current packfile. Because the most recent blocks are at the front of the linked list, and because all new objects going into the current file are allocated from the front of that list, we can stop scanning for objects as soon as we identify one which doesn't belong to the current packfile. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
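The early-exit scan described above can be sketched as follows. This is only an illustration in Python, not the fast-import C code: the `Block` and `ObjectEntry` classes, the `pack_id` field, and the assumption that each block holds its newest entries at the end are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObjectEntry:
    name: str
    pack_id: int  # hypothetical: which packfile the object was written to


@dataclass
class Block:
    entries: List[ObjectEntry] = field(default_factory=list)
    next: Optional["Block"] = None  # older blocks follow newer ones


def objects_for_pack(head: Optional[Block], pack_id: int) -> List[ObjectEntry]:
    """Collect entries for pack_id, stopping at the first foreign entry."""
    found = []
    block = head
    while block is not None:
        # Assumed layout: the newest entries sit at the end of each block.
        for entry in reversed(block.entries):
            if entry.pack_id != pack_id:
                return found  # everything older belongs to earlier packs
            found.append(entry)
        block = block.next
    return found
```

Because the newest blocks are visited first, the cost of the scan is proportional to the number of objects in the current packfile rather than to every object allocated so far.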
2007-01-15 | Don't create a final empty packfile in fast-import. | Shawn O. Pearce | 1 | -11/+17
If the last packfile is going to be empty (has 0 objects) then it shouldn't be kept after the import has terminated, as there is no point to the packfile. So rather than hashing it and making the index file, just delete the packfile. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
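A minimal sketch of that decision, assuming a hypothetical `finish_pack` helper and a caller-supplied `write_index` callback (neither name comes from fast-import itself):

```python
import os


def finish_pack(pack_path, object_count, write_index):
    """Finalize a pack. Return True if it was kept, False if discarded."""
    if object_count == 0:
        # Nothing was stored, so skip hashing and indexing and drop the file.
        os.unlink(pack_path)
        return False
    write_index(pack_path)
    return True
```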
2007-01-15 | Implemented manual packfile switching in fast-import. | Shawn O. Pearce | 1 | -24/+65
To help importers which are dealing with massive amounts of data, fast-import needs to be able to close the packfile it is currently writing to and open a new packfile for any additional data that will be received. A new 'checkpoint' command has been introduced which can be used by the frontend import process to force this to occur at any time. This may be useful to ensure a very long running import doesn't lose any work due to unexpected failures. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15 | Remove unnecessary duplicate_count in fast-import. | Shawn O. Pearce | 1 | -3/+4
There is little reason to be keeping a global duplicate_count value when we also keep it per object type. The global counter can easily be computed at the end, once all processing has completed. This saves us a couple of machine instructions in an unimportant part of code. But it looks slightly better to me to not keep two counters around. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15 | Restructure fast-import to support creating multiple packfiles. | Shawn O. Pearce | 1 | -121/+124
Now that we are starting to see some really large projects (such as KDE or a fork of FreeBSD) get imported into Git we're running into the upper limit on packfile object count as well as overall byte length. The KDE and FreeBSD projects are both likely to require more than 4 GiB to store their current history, which means we really need multiple packfiles to handle their content. This is a fairly simple restructuring of the internal code to help us support creating multiple packfiles from within fast-import. We are now adding a 5 digit incrementing suffix to the end of the basename supplied to us by the caller, permitting up to 99,999 packs to be generated in a single fast-import run. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
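The naming scheme can be illustrated with a short sketch. The `pack_name` function is hypothetical, and it assumes the suffix is appended with no separator and that numbering starts at 1; the commit message only states that the suffix has 5 digits and allows up to 99,999 packs.

```python
def pack_name(base_name: str, pack_number: int) -> str:
    """Append a zero-padded 5 digit suffix to the caller's basename."""
    if not 1 <= pack_number <= 99999:
        raise ValueError("a 5 digit suffix allows at most 99,999 packs")
    return f"{base_name}{pack_number:05d}"
```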
2007-01-15 | Misc. type cleanups within fast-import. | Shawn O. Pearce | 1 | -5/+5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Improve reuse of sha1_file library within fast-import. | Shawn O. Pearce | 1 | -144/+31
Now that the sha1_file.c library routines use the sliding mmap routines to perform efficient access to portions of a packfile I can remove that code from fast-import.c and just invoke it. One benefit is we now have reloading support for any packfile which uses OBJ_OFS_DELTA. Another is we have significantly less code to maintain. This code reuse change *requires* that fast-import generate only an OBJ_OFS_DELTA format packfile, as there is absolutely no index available to perform OBJ_REF_DELTA lookup in while unpacking an object. This is probably reasonable to require as the delta offsets result in smaller packfiles and are faster to unpack, as no index searching is required. It's also only a temporary requirement as users could always repack without offsets before making the import available to older versions of Git. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Merge branch 'master' into sp/fast-import | Shawn O. Pearce | 404 | -15208/+36654
I'm bringing master in early so that the OBJ_OFS_DELTA implementation is available as part of the topic. This way git-fast-import can learn about this new slightly smaller and faster packfile format, and can generate them directly rather than needing to have them be repacked with git-pack-objects. Due to the API changes in master during the period of development of git-fast-import, a few minor tweaks to fast-import.c are needed to produce a working merge. I've done them here as part of the merge to ensure bisection always works. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Allow creating branches without committing in fast-import. | Shawn O. Pearce | 1 | -1/+7
Some importers may want to create a branch long before they actually commit to it, or in some cases they may never commit to the branch but they still need the ref to be created in the repository after the import is complete. This extends the 'reset ' command to automatically create a new branch if the supplied reference isn't already known as a branch. While I'm at it I also modified the syntax of the reset command to terminate with an empty line, like commit and tag operate. This just makes the command set more consistent. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Support creation of merge commits in fast-import. | Shawn O. Pearce | 1 | -0/+58
Some importers are able to determine when branch merges occurred within their source data. In these cases they will want to supply the correct commits to fast-import so that a proper merge commit will exist in Git. This is now supported by supplying a 'merge ' command after the commit message and optional from command. A merge is not actually performed by fast-import; it's assumed that the frontend performed any sort of merging activity already and that fast-import should simply be storing its result. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Fix repository corruption when using marks for modified blobs. | Shawn O. Pearce | 1 | -0/+1
Apparently we did not copy the blob SHA1 into the stack variable 'sha1' when a mark is used to refer to a prior blob. This code was not previously tested as the Mozilla CVS -> git-fast-import program always fed us full SHA1s for modified blobs and did not use the mark feature there. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Additional fast-import tree delta corruption cleanups. | Shawn O. Pearce | 1 | -11/+11
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Correct tree corruption problems in fast-import. | Shawn O. Pearce | 1 | -5/+11
The new tree delta implementation caused blob SHA1s to be used instead of a tree SHA1 when a tree was written out. This really only appeared to happen when converting an existing file to a tree, but may have been possible in some other situations. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Replace ywrite in fast-import with the standard write_or_die. | Shawn O. Pearce | 1 | -20/+7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Reuse the same buffer for all commits/tags in fast-import. | Shawn O. Pearce | 1 | -20/+23
Since most commits and tag objects are around the same size and we only generate one at a time we can reuse the same buffer rather than xmalloc'ing and free'ing the buffer every time we generate a commit. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Recycle data buffers for tree generation in fast-import. | Shawn O. Pearce | 1 | -14/+32
We only ever generate at most two tree streams at a time. Since most trees are around the same size we can simply recycle the buffers from one tree generation to the next rather than constantly xmalloc'ing and free'ing them. This should perform slightly better when handling a large number of trees as malloc has less work to do. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Implemented tree delta compression in fast-import. | Shawn O. Pearce | 1 | -65/+155
We now store for every tree entry two modes and two sha1 values; the base (aka "version 0") and the current/new (aka "version 1"). When we generate a tree object we also regenerate the prior version object and use that as our base object for a delta. This strategy saves a significant amount of memory as we can continue to use the atom pool for file/directory names and only increases each tree entry by an additional 24 bytes of memory. Branches should automatically delta against their ancestor tree, unless the ancestor tree is already at the delta chain limit. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
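As a rough illustration of the two-version scheme, the sketch below keeps a base and a current (mode, SHA-1) pair per entry and serializes either version in Git's tree object layout (`<octal mode> <name>\0<20-byte SHA-1>`). The class and function names are hypothetical. Treating mode 0 as "absent in this version" and sorting by plain byte order (Git's real ordering treats directories specially) are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Version:
    mode: int    # assumption: 0 means the entry is absent in this version
    sha1: bytes  # 20 raw bytes


@dataclass
class TreeEntry:
    name: bytes
    versions: Tuple[Version, Version]  # [0] = base, [1] = current/new


def serialize_tree(entries: List[TreeEntry], v: int) -> bytes:
    """Serialize version v (0 or 1) of every entry into one tree buffer."""
    out = bytearray()
    for entry in sorted(entries, key=lambda e: e.name):
        version = entry.versions[v]
        if version.mode == 0:
            continue  # entry did not exist in this version
        out += f"{version.mode:o} ".encode("ascii")
        out += entry.name + b"\0" + version.sha1
    return bytes(out)
```

Serializing version 0 regenerates the prior tree to use as the delta base, and serializing version 1 produces the new tree to delta against it.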
2007-01-14 | Converted hash memcpy/memcmp to new hashcpy/hashcmp/hashclr. | Shawn O. Pearce | 1 | -26/+26
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Don't crash fast-import if no branch log was requested. | Shawn O. Pearce | 1 | -1/+2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Added 'reset' command to clear a branch's tree. | Shawn O. Pearce | 1 | -0/+32
Sometimes an import frontend may need to work with a temporary branch which will actually contain many different branches over the life of the import. This is especially useful when the frontend needs to create a tag from a set of file versions which are otherwise never a commit. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Map only part of the generated pack file at any point in time. | Shawn O. Pearce | 1 | -20/+24
When generating a very large pack file (for example close to 1 GB in size) it may be impossible for the kernel to find a contiguous free range within a 32 bit address space for the mapping to be located at. This is especially problematic on large imports where there is a lot of malloc activity occurring within the same process and the malloc'd regions may straddle the previously mapped regions, thereby creating large holes in the address space. So instead we map only 128 MB of the pack at any given time. This will likely increase the number of times the file gets mapped (with additional system time required to update the page tables more frequently) but will allow the program to handle packs up to 4 GB in size. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
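The windowing arithmetic can be sketched as below. The 128 MB figure comes from the commit message; the `window_for` helper and the choice to align windows on multiples of the window size are assumptions made for illustration, and the real code may position its window differently.

```python
WINDOW_SIZE = 128 * 1024 * 1024  # 128 MB, as stated in the commit message


def window_for(offset: int, file_size: int, window_size: int = WINDOW_SIZE):
    """Return (start, length) of the window that contains offset."""
    if not 0 <= offset < file_size:
        raise ValueError("offset lies outside the pack")
    start = (offset // window_size) * window_size
    length = min(window_size, file_size - start)  # last window may be short
    return start, length
```

A caller would map only `length` bytes at `start` and remap whenever a requested offset falls outside the current window.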
2007-01-14 | Fixed compile error in fast-import. | Shawn O. Pearce | 1 | -1/+1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Fixed GPF in fast-import caused by unterminated linked list. | Shawn O. Pearce | 1 | -1/+2
fast-import was encountering a GPF when it ran out of free tree_entry objects but didn't know this was the cause because the last tree_entry wasn't terminated with a NULL pointer. The missing NULL pointer occurred when we allocated additional entries via xmalloc but didn't set the last tree_entry's "next" pointer to NULL. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Added --branch-log option to fast-import. | Shawn O. Pearce | 1 | -5/+37
This option can be used to have a record of every commit, the mark (if supplied) and branch name of the commit recorded into a log file when the commit is generated. This log can be useful to verify the results of an import as the commits can be compared to some source repository matching commits through the mark value. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Added option to export the marks table when fast-import terminates. | Shawn O. Pearce | 1 | -1/+35
The marks table can be used by the frontend to load any commit after the import and compare it to whatever data the frontend knows about that commit. If the mark idnums can be easily correlated to some reference source then it's relatively trivial to compare the GIT tree to the reference to verify the accuracy of the import. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Account for tree entry memory costs in fast-import. | Shawn O. Pearce | 1 | -0/+1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Moved from command to after data to help cvs2svn. | Shawn O. Pearce | 1 | -3/+4
cvs2svn has three phases: begin_commit, middle_commit, end_commit. The ancestor is computed in the middle_commit phase. So it's easier to generate a stream if the from command appears after the commit message itself but before the file change commands. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Remove branch creation command from fast-import. | Shawn O. Pearce | 1 | -98/+71
Jon Smirl was finding it difficult to alter cvs2svn to generate branch commands prior to the first commit of the same branch. This change moves the 'from' command to be an optional parameter of the 'commit' command, thereby allowing a new branch to be defined at the moment it gets used to create the first commit on that branch. This change makes it impossible to create a branch with no commits on it as at least one commit is needed to register the branch. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Round out memory pool allocations in fast-import to pointer sizes. | Shawn O. Pearce | 1 | -0/+3
Some architectures (e.g. SPARC) would require that we access pointers only on pointer-sized alignments. So ensure the pool allocator rounds out non-pointer sized allocations to the next pointer so we don't generate bad memory addresses. This could have occurred if we had previously allocated an atom whose string was not a whole multiple of the pointer size, for example. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
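The rounding step amounts to the usual round-up-to-a-multiple arithmetic. A small sketch, where `round_up` is a hypothetical name and the pointer size is taken from the running platform:

```python
import struct

POINTER_SIZE = struct.calcsize("P")  # size of a native pointer in bytes


def round_up(length: int, alignment: int = POINTER_SIZE) -> int:
    """Round length up to the next multiple of alignment."""
    return (length + alignment - 1) // alignment * alignment
```

With an 8-byte pointer, a 5-byte atom string would consume 8 bytes of the pool, so the next allocation again starts on a pointer-sized boundary.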
2007-01-14 | Implemented tree reloading in fast-import. | Shawn O. Pearce | 1 | -13/+149
Tree reloading allows fast-import to swap out the least-recently used branch by simply deallocating the data structures from memory that were associated with that branch. Later if the branch becomes active again it can lazily recreate those structures on demand by reloading the necessary trees from the pack file it originally wrote them to. The reloading process is implemented by mmap'ing the pack into memory and using a much tighter variant of the pack reading code contained in sha1_file.c. This was a blatant copy from sha1_file.c but the unpacking functions were significantly simplified and are actually now in a form that should make it easier to map only the necessary regions of a pack rather than the entire file. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
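The swap-out and lazy-reload behaviour resembles a least-recently-used cache. The sketch below is an analogy in Python, not the C implementation: `BranchCache`, `max_active`, and the `load_tree` callback are hypothetical names, and the load counter is included only to show how repeated reloads would become visible.

```python
from collections import OrderedDict


class BranchCache:
    def __init__(self, max_active, load_tree):
        self.max_active = max_active
        self.load_tree = load_tree   # callback that rebuilds a branch's tree
        self.active = OrderedDict()  # least recently used branch comes first
        self.load_count = 0

    def get(self, name):
        if name in self.active:
            self.active.move_to_end(name)  # mark as most recently used
            return self.active[name]
        if len(self.active) >= self.max_active:
            self.active.popitem(last=False)  # swap out the LRU branch
        self.load_count += 1
        tree = self.load_tree(name)  # lazily recreate the structures
        self.active[name] = tree
        return tree
```

If a frontend alternates between more branches than fit in memory, `load_count` grows past the number of distinct branches, which signals that branches are being paged in and out.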
2007-01-14 | Implemented 'tag' command in fast-import. | Shawn O. Pearce | 1 | -0/+125
Tags received from the frontend are generated in memory in a simple linked list in the order that the tag commands were sent by the frontend. If multiple different tag objects for the same tag name get generated the last one sent by the frontend will be the one that gets written out at termination. Multiple tag objects for the same name will cause all older tags of the same name to be lost. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Added branch load counter to fast-import. | Shawn O. Pearce | 1 | -2/+4
If the branch load count exceeds the number of branches created then the frontend is causing fast-import to page branches into and out of memory due to the way it is ordering its commits. Performance can likely be increased if the frontend were to alter its commit sequence such that it stays on one branch before switching to another branch, then never returns to the prior branch. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Added mark store/find to fast-import. | Shawn O. Pearce | 1 | -17/+87
Marks are now saved when the mark directive gets used by the frontend and may be used in place of a SHA1 expression to locate a previous SHA1 which fast-import may have generated. This is particularly useful with commits where the frontend does not (easily) have the ability to compute the SHA1 for an arbitrary commit but needs it to generate a branch or tag from that commit. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
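Conceptually the mark table is a mapping from a frontend-chosen number to a SHA1 that fast-import generated. A minimal sketch, assuming the `:<idnum>` reference form documented for later versions of git-fast-import; the `MarkTable` class and its methods are hypothetical:

```python
class MarkTable:
    def __init__(self):
        self._marks = {}

    def store(self, idnum: int, sha1: str) -> None:
        """Remember the SHA1 generated for the object carrying this mark."""
        self._marks[idnum] = sha1

    def resolve(self, ref: str) -> str:
        """Return the SHA1 for a ':<idnum>' reference, or ref itself."""
        if ref.startswith(":"):
            return self._marks[int(ref[1:])]
        return ref  # assumed to already be a full SHA1
```

This lets a frontend name a commit it has just sent without having to compute that commit's SHA1 itself.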
2007-01-14 | Converted fast-import to accept standard command line parameters. | Shawn O. Pearce | 1 | -6/+28
The following command line options are now accepted before the pack name:
  --objects=n          # replaces the object count after the pack name
  --depth=n            # delta chain depth to use (default is 10)
  --active-branches=n  # maximum number of branches to keep in memory
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
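For illustration, the same option set can be modelled with Python's `argparse`. Only the three option names and the default depth of 10 come from the commit message; the program name, the `None` defaults, and the `pack_name` positional are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="fast-import-sketch")
    # Defaults of None are placeholders; the commit states no default here.
    parser.add_argument("--objects", type=int, default=None,
                        help="expected object count")
    parser.add_argument("--depth", type=int, default=10,
                        help="delta chain depth to use")
    parser.add_argument("--active-branches", type=int, default=None,
                        help="maximum number of branches to keep in memory")
    parser.add_argument("pack_name", help="basename of the pack to write")
    return parser
```

For example, `build_parser().parse_args(["--depth=5", "out"])` yields a depth of 5 and a pack name of `out`.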
2007-01-14 | Fixed segfault in fast-import after growing a tree. | Shawn O. Pearce | 1 | -5/+10
Growing a tree caused all subtrees to be deallocated and put back into the free list yet those subtree's contents were still actively in use. Consequently they were doled out again and got stomped on elsewhere. Releasing a tree is now performed in two parts, either releasing only the content array or releasing the content array and recursively releasing the subtree(s). Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Allow symlink blobs in trees during fast-import. | Shawn O. Pearce | 1 | -0/+1
If a frontend is smart enough to import a symlink then we should let them do so. We'll assume that they were smart enough to first generate a blob to hold the link target, as that's how symlinks get represented in GIT. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Changed fast-import's pack header creation to use pack.h | Shawn O. Pearce | 1 | -9/+8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Converted fast-import to a text based protocol. | Shawn O. Pearce | 1 | -160/+326
Frontend clients can now send a text stream to fast-import rather than a binary stream. This should facilitate developing frontend software as the data stream is easier to view, manipulate and debug by hand and Mark-I eyeball. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Implement blob ID validation in fast-import. | Shawn O. Pearce | 1 | -4/+24
When accepting revision SHA1 IDs from the frontend verify the SHA1 actually refers to a blob and is known to exist. It's an error to use a SHA1 in a tree if the blob doesn't exist as this would cause git-fsck-objects to report a missing blob should the pack get closed without the blob being appended into it or a subsequent pack. So right now we'll just ask that the frontend "pre-declare" any blobs it wants to use in a tree before it can use them. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Added tree and commit writing to fast-import. | Shawn O. Pearce | 1 | -177/+729
The tree of the current commit can be altered by file_change commands before the commit gets written to the pack. The file changes are rather primitive as they simply allow removal of a tree entry or setting/adding a tree entry. Currently trees and commits aren't being deltafied when written to the pack and branch reloading from the current pack doesn't work, so at most 5 branches can be worked with at any one time. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Implemented branch handling and basic tree support in fast-import. | Shawn O. Pearce | 1 | -5/+165
This provides the basic data structures needed to store trees in memory while we are processing them for a branch. What we are attempting to do is track one complete tree for each branch that the frontend has registered with us through the 'newb' (new_branch) command. When the frontend edits that tree through 'updf' or 'delf' commands we'll mark the affected tree(s) as being dirty and recompute their objects during 'comt' (commit). Currently the protocol is decidedly _not_ user friendly. I crashed fast-import by giving it bad input data from Perl. I may try to improve upon it, or at least upon its error handling. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Added basic command handler to fast-import. | Shawn O. Pearce | 1 | -14/+46
Moved the new_blob logic off into a new subroutine and invoked it when getting the 'blob' command. Added statistics dump to STDERR when the program terminates listing what it did at a high level. This is somewhat interesting. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Refactored fast-import's internals for future additions. | Shawn O. Pearce | 1 | -66/+83
Too many global variables were being used and not enough code was reusable to process trees and commits, so this is a simple refactoring of the existing blob processing code to get it into a state in which trees and commits will be easier to handle. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Cleaned up memory allocation for object_entry structs. | Shawn O. Pearce | 1 | -47/+46
Although it's easy to ask the user to tell us how many objects they will need, it's probably better to dynamically grow the object table in large units. But if the user can give us a hint as to roughly how many objects then we can still use it during startup. Also stopped printing the SHA1 strings to stdout as no user is currently making use of that facility. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Added automatic index generation to fast-import. | Shawn O. Pearce | 1 | -19/+163
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-14 | Created fast-import, a tool to quickly generate a pack from blobs. | Shawn O. Pearce | 3 | -0/+216
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-13 | git-commit documentation: -a adds and also removes | Junio C Hamano | 1 | -1/+2
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-13 | git-remote: no longer silent on unknown commands. | Quy Tonthat | 1 | -1/+6
Signed-off-by: Quy Tonthat <qtonthat@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-13 | git-svn: fix tests to work with older svn | Eric Wong | 2 | -2/+4
Some of the recent changes and shortcuts to the tests broke things for people using older versions of svn: t9104-git-svn-follow-parent.sh: v1.2.3 (from SuSE 10.0 as reported by riddochc on #git (thanks!)) required an extra 'svn up'. I was also able to reproduce this with v1.1.4 (Debian Sarge). lib-git-svn.sh: SVN::Repos bindings in versions up to and including 1.1.4 (Sarge again) do not pass fs-config options to the underlying library. BerkeleyDB repositories also seem completely broken on all my Sarge machines; so not using FSFS does not seem to be an option for most people. Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-12 | Make git-prune-packed a bit more chatty. | Junio C Hamano | 3 | -9/+22
Steven Grimm noticed that git-repack's verbosity is inconsistent because pack-objects is chatty and prune-packed is not. This makes the latter a bit more chatty and gives -q option to squelch it. Signed-off-by: Junio C Hamano <junkio@cox.net>