summaryrefslogtreecommitdiff
path: root/vcs-svn/fast_export.c
AgeCommit message (Collapse)AuthorFilesLines
2012-02-02vcs-svn: allow import of > 4GiB filesLibravatar Jonathan Nieder1-6/+9
There is no reason in principle that an svn-format dump would not be able to represent a file whose length does not fit in a 32-bit integer. Use off_t consistently to represent file lengths (in place of using uint32_t in some contexts) so we can handle that. Most svn-fe code is already ready to do that without this patch and passes values of type off_t around. The type mismatch from stragglers was noticed with gcc -Wtype-limits. While at it, tighten the parsing of the Text-content-length field to make sure it is a number and does not overflow, and tighten other overflow checks as that value is passed around and manipulated. Inspired-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-23vcs-svn: reset first_commit_done in fast_export_initLibravatar Dmitry Ivankov1-0/+1
first_commit_done has zero as a default value, but it is not reset back to zero in fast_export_init. Reset it back to zero so that each export will have proper initial state. Signed-off-by: Dmitry Ivankov <divanorama@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-06-21vcs-svn: do not initialize report_buffer twiceLibravatar Dmitry Ivankov1-12/+0
When importing from a dump with deltas, first fast_export_init calls buffer_fdinit, and then init_report_buffer calls fdopen once again when processing the first delta. The second initialization is redundant and leaks a FILE *. Remove the redundant on-demand initialization to fix this. Initializing directly in fast_export_init is simpler and lets the caller pass an int specifying which fd to use instead of hard-coding REPORT_FILENO. Signed-off-by: Dmitry Ivankov <divanorama@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-06-15vcs-svn: avoid hangs from corrupt deltasLibravatar Jonathan Nieder1-6/+9
A corrupt Subversion-format delta can request reads past the end of the preimage. Set sliding_view::max_off so such corruption is caught when it appears rather than blocking in an impossible-to-fulfill read() when input is coming from a socket or pipe. Inspired-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-06-15vcs-svn: guard against overflow when computing preimage lengthLibravatar Jonathan Nieder1-1/+14
Signed integer overflow produces undefined behavior in C and off_t is a signed type. For predictable behavior, add some checks to protect in advance against overflow. On 32-bit systems ftell as called by buffer_tmpfile_prepare_to_read is likely to fail with EOVERFLOW when reading the corresponding postimage, and this patch does not fix that. So it's more of a futureproofing measure than a complete fix. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-06-15Merge branch 'db/delta-applier' into db/text-deltaLibravatar Jonathan Nieder1-1/+1
* db/delta-applier: vcs-svn: cap number of bytes read from sliding view test-svn-fe: split off "test-svn-fe -d" into a separate function
2011-05-26vcs-svn: implement text-delta handlingLibravatar David Barr1-1/+108
Handle input in Subversion's dumpfile format, version 3. This is the format produced by "svnrdump dump" and "svnadmin dump --deltas", and the main difference between v3 dumpfiles and the dumpfiles already handled is that these can include nodes whose properties and text are expressed relative to some other node. To handle such nodes, we find which node the text and properties are based on, handle its property changes, use the cat-blob command to request the basis blob from the fast-import backend, use the svndiff0_apply() helper to apply the text delta on the fly, writing output to a temporary file, and then measure that postimage file's length and write its content to the fast-import stream. The temporary postimage file is shared between delta-using nodes to avoid some file system overhead. The svn-fe interface needs to be more complicated to accomodate the backward flow of information from the fast-import backend to svn-fe. The backflow fd is not needed when parsing streams without deltas, though, so existing scripts using svn-fe on v2 dumps should continue to work. NEEDSWORK: generalize interface so caller sets the backflow fd, close temporary file before exiting Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-05-26Merge branch 'db/svn-fe-code-purge' into svn-feLibravatar Jonathan Nieder1-24/+24
* db/svn-fe-code-purge: vcs-svn: drop obj_pool vcs-svn: drop treap vcs-svn: drop string_pool vcs-svn: pass paths through to fast-import Conflicts: vcs-svn/fast_export.c vcs-svn/fast_export.h vcs-svn/repo_tree.c vcs-svn/repo_tree.h vcs-svn/string_pool.c vcs-svn/svndump.c vcs-svn/trp.txt
2011-05-26Merge branch 'db/vcs-svn-incremental' into svn-feLibravatar Jonathan Nieder1-16/+129
This teaches svn-fe to incrementally import into an existing repository (at last!) at the expense of less convenient UI. Think of it as growing pains. This opens the door to many excellent things, and it would be a bad idea to discourage people from building on it for much longer. * db/vcs-svn-incremental: vcs-svn: avoid using ls command twice vcs-svn: use mark from previous import for parent commit vcs-svn: handle filenames with dq correctly vcs-svn: quote paths correctly for ls command vcs-svn: eliminate repo_tree structure vcs-svn: add a comment before each commit vcs-svn: save marks for imported commits vcs-svn: use higher mark numbers for blobs vcs-svn: set up channel to read fast-import cat-blob response Conflicts: t/t9010-svn-fe.sh vcs-svn/fast_export.c vcs-svn/fast_export.h vcs-svn/repo_tree.c vcs-svn/svndump.c
2011-03-27vcs-svn: add missing cast to printf argumentLibravatar Jonathan Nieder1-1/+2
gcc -m32 correctly warns: vcs-svn/fast_export.c: In function 'fast_export_commit': vcs-svn/fast_export.c:54:2: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 2 has type 'unsigned int' [-Wformat] Fix it. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26vcs-svn: handle log message with embedded NULLibravatar Jonathan Nieder1-5/+7
Pass the log message by strbuf instead of as a C-style string and use fwrite instead of printf to write it to fast-import so embedded '\0' bytes can be preserved. Currently "git log" doesn't show the embedded NULs but "git cat-file commit" can. While at it, stop including system headers from repo_tree.h. git source files need to include git-compat-util.h (or cache.h or builtin.h) sooner to ensure the appropriate feature test macros are defined. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22vcs-svn: pass paths through to fast-importLibravatar David Barr1-24/+24
Now that there is no internal representation of the repo, it is not necessary to tokenise paths. Use strbuf instead and bypass string_pool. This means svn-fe can handle arbitrarily long paths (as long as a strbuf can fit them), with arbitrarily many path components. While at it, since we now treat paths in their entirety, only quote when necessary. Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22Merge branch 'db/strbufs-for-metadata' into db/svn-fe-code-purgeLibravatar Jonathan Nieder1-7/+7
* db/strbufs-for-metadata: vcs-svn: use strbuf for author, UUID, and URL vcs-svn: use strbuf for revision log Conflicts: vcs-svn/fast_export.c vcs-svn/fast_export.h vcs-svn/repo_tree.c vcs-svn/svndump.c
2011-03-22Merge branch 'db/length-as-hash' (early part) into db/svn-fe-code-purgeLibravatar Jonathan Nieder1-2/+11
* 'db/length-as-hash' (early part): vcs-svn: implement perfect hash for top-level keys vcs-svn: implement perfect hash for node-prop keys vcs-svn: improve reporting of input errors vcs-svn: make buffer_copy_bytes return length read vcs-svn: make buffer_skip_bytes return length read vcs-svn: improve support for reading large files Conflicts: vcs-svn/fast_export.c vcs-svn/svndump.c
2011-03-22vcs-svn: use strbuf for author, UUID, and URLLibravatar David Barr1-7/+7
Use strbufs and strings instead of interned strings for values of rev, dump, and node fields that happen to be strings. After this change, the only remaining string_pool use is for paths in the repo_tree API and internals. Functional change: treat an empty author, UUID, or URL as none at all. So for example, in repos where the first revision has an empty svn:author property, the first rev will be treated as by "nobody" rather than by a person with empty name and email address created by prepending an @ sign to the repository UUID. Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22vcs-svn: improve reporting of input errorsLibravatar Jonathan Nieder1-2/+11
Catch input errors and exit early enough to print a reasonable diagnosis based on errno. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07vcs-svn: use mark from previous import for parent commitLibravatar David Barr1-1/+1
With this patch, overlapping incremental imports work. Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07vcs-svn: handle filenames with dq correctlyLibravatar Jonathan Nieder1-10/+9
Quote paths passed to fast-import so filenames with double quotes are not misinterpreted. One might imagine this could help with filenames with newlines, too, but svn does not allow those. Helped-by: David Barr <daivd.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07vcs-svn: quote paths correctly for ls commandLibravatar David Barr1-1/+1
This bug was found while importing rev 601865 of ASF. [jn: with test] Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07vcs-svn: eliminate repo_tree structureLibravatar Jonathan Nieder1-14/+94
Rely on fast-import for information about previous revs. This requires always setting up backward flow of information, even for v2 dumps. On the plus side, it simplifies the code by quite a bit and opens the door to further simplifications. [db: adjusted to support final version of the cat-blob patch] [jn: avoiding hard-coding git's name for the empty tree for portability to other backends] Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07vcs-svn: add a comment before each commitLibravatar Jonathan Nieder1-0/+5
Current svn-fe produces output like this: blob mark :7382321 data 5 hello blob mark :7382322 data 5 Hello commit mark :3 [...] M 100644 :7382321 hello.c M 100644 :7382322 hello2.c This means svn-fe has to keep track of the paths modified in each commit and the corresponding marks, instead of dealing with each file as it arrives in input and then forgetting about it. A better strategy would be to use inline blobs: commit mark :3 [...] M 100644 inline hello.c data 5 hello [...] As a first step towards that, teach svn-fe to notice when the collection of blobs for each commit starts and write a comment ("# commit 3.") there. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07vcs-svn: save marks for imported commitsLibravatar Jonathan Nieder1-0/+1
This way, a person can use svnadmin dump $path | svn-fe | git fast-import --relative-marks --export-marks=svn-revs to get a list of what commit corresponds to each svn revision (plus some irrelevant blob names) in .git/info/fast-import/svn-revs. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07vcs-svn: set up channel to read fast-import cat-blob responseLibravatar David Barr1-0/+28
Set up some plumbing: teach the svndump lib to pass a file descriptor number to the fast_export lib, representing where cat-blob/ls responses can be read from, and add a get_response_line helper function to the fast_export lib to read a line from that file. Unfortunately this means that svn-fe needs file descriptor 3 to be redirected from somewhere (preferrably the cat-blob stream of a fast-import backend); otherwise it will fail: $ svndump <path> | svn-fe fatal: cannot read from file descriptor 3: Bad file descriptor For the moment, "svn-fe 3</dev/null" works as a workaround but it will not work for very long. A fast-import backend that can retrieve old commits is needed in order to be able to fulfill svn "Node-copyfrom-rev" requests that refer to revs from a previous run. [jn: with new change description] Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26vcs-svn: teach line_buffer to handle multiple input filesLibravatar Jonathan Nieder1-3/+3
Collect the line_buffer state in a newly public line_buffer struct. Callers can use multiple line_buffers to manage input from multiple files at a time. svn-fe's delta applier will use this to stream a delta from svnrdump and the preimage it applies to from fast-import at the same time. The tests don't take advantage of the new features, but I think that's okay. It is easier to find lingering examples of nonreentrant code by searching for "static" in line_buffer.c. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2010-09-12vcs-svn: Fix some printf format compiler warningsLibravatar Ramsay Jones1-4/+5
In particular, on systems that define uint32_t as an unsigned long, gcc complains as follows: CC vcs-svn/fast_export.o vcs-svn/fast_export.c: In function `fast_export_modify': vcs-svn/fast_export.c:28: warning: unsigned int format, uint32_t arg (arg 2) vcs-svn/fast_export.c:28: warning: int format, uint32_t arg (arg 3) vcs-svn/fast_export.c: In function `fast_export_commit': vcs-svn/fast_export.c:42: warning: int format, uint32_t arg (arg 5) vcs-svn/fast_export.c:62: warning: int format, uint32_t arg (arg 2) vcs-svn/fast_export.c: In function `fast_export_blob': vcs-svn/fast_export.c:72: warning: int format, uint32_t arg (arg 2) vcs-svn/fast_export.c:72: warning: int format, uint32_t arg (arg 3) CC vcs-svn/svndump.o vcs-svn/svndump.c: In function `svndump_read': vcs-svn/svndump.c:260: warning: int format, uint32_t arg (arg 3) In order to suppress the warnings we use the C99 format specifier macros PRIo32 and PRIu32 from <inttypes.h>. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14vcs-svn: Avoid %z in format stringLibravatar Jonathan Nieder1-2/+3
In the spirit of v1.6.4-rc0~124 (MinGW: Fix compiler warning in merge-recursive, 2009-05-23), use a 32-bit integer instead; the dump file parser does not support any better, anyway. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14Infrastructure to write revisions in fast-export formatLibravatar David Barr1-0/+74
repo_tree maintains the exporter's state and provides a facility to to call fast_export, which writes objects to stdout suitable for consumption by fast-import. The exported functions roughly correspond to Subversion FS operations. . repo_add, repo_modify, repo_copy, repo_replace, and repo_delete update the current commit, based roughly on the corresponding Subversion FS operation. . repo_commit calls out to fast_export to write the current commit to the fast-import stream in stdout. . repo_diff is used by the fast_export module to write the changes for a commit. . repo_reset erases the exporter's state, so valgrind can be happy. [rr: squelched compiler warnings] [jn: removed support for maintaining state on-disk, though we may want to add it back later] Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>