svndumpfilter3

Description

A rewrite of Subversion's svndumpfilter in pure Python that lets you untangle move/copy operations between excluded and included sets of files and directories by converting them into additions. If you use the --untangle option, it fetches the original files from a given repository.

Important

Some people have reported a bug in this script: it creates an empty file when run on a large repository. It worked great for the split that I had to do on my repository, but I have no time to fix the problem that occurs for some other people's repositories (I really, really do not have the time to work on this). If you find the glitch, please send me a patch. I think the problem is likely to be a minor one. If you need this for your business and you're willing to pay hourly rates, I might be able to find someone to work on it (perhaps me (http://furius.ca/home/consulting.html), depending on schedule).

The <path> arguments are the repository paths to filter. Pipe the dump file in through stdin. To untangle the copy operations, you need a live repository and the --untangle=REPOS_PATH option, like this:

cat dumpfile | svndumpfilter3 --untangle=/my/svnroot project1 project2

The paths can include wildcards, and can consist of multiple parts, like:

cat dumpfile | svndumpfilter3 tags/proj.*/subproj trunk/subproj

Each component of the path is split out and matched separately (hence the above would match, for instance, tags/proj-1.2/subproj but not tags/proj-1.2/a/subproj).
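The component-wise matching described above can be sketched in Python as follows. This is only an illustration, not the script's actual code: it assumes that each component is a regular expression that must match a whole path component, and that anything below a matched directory is selected as well. The helper names compile_pattern and path_matches are hypothetical.

```python
import re


def compile_pattern(pattern):
    """Split a filter pattern on '/' and compile each component on its own."""
    return [re.compile(part) for part in pattern.strip("/").split("/")]


def path_matches(path, compiled):
    """Return True if the leading components of 'path' match the pattern.

    Assumption: every pattern component must match one whole path component,
    and paths below a matched directory are considered matched too.
    """
    components = path.strip("/").split("/")
    if len(components) < len(compiled):
        return False
    return all(rx.fullmatch(comp) for rx, comp in zip(compiled, components))


if __name__ == "__main__":
    pattern = compile_pattern("tags/proj.*/subproj")
    for candidate in ("tags/proj-1.2/subproj", "tags/proj-1.2/a/subproj"):
        print(candidate, path_matches(candidate, pattern))
```

Because the pattern is split before matching, the wildcard in proj.* cannot swallow a path separator, which is why the extra a/ level prevents a match.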

Note

This script's interface differs only slightly from that of Subversion's svndumpfilter: it does not take subcommands, and its default behaviour is that of svndumpfilter's 'include' subcommand. If you need 'exclude' behaviour, invoke it with the --exclude option.
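If you prefer to drive the script from Python rather than from a shell pipeline, a minimal sketch could look like the one below. It uses only the --untangle and --exclude options mentioned above, and it assumes that the filtered dump is written to stdout, as with Subversion's svndumpfilter. The function names build_command and filter_dump are hypothetical.

```python
import subprocess


def build_command(paths, untangle=None, exclude=False, program="svndumpfilter3"):
    """Build the argument list for one svndumpfilter3 invocation."""
    cmd = [program]
    if untangle is not None:
        # Path to the live repository used to fetch the original files.
        cmd.append("--untangle=" + untangle)
    if exclude:
        # Default behaviour is 'include'; this switches to 'exclude'.
        cmd.append("--exclude")
    cmd.extend(paths)
    return cmd


def filter_dump(dump_in, dump_out, paths, **options):
    """Feed a dump file through stdin and save what comes out on stdout.

    Assumption: the filtered dump is emitted on stdout.
    """
    with open(dump_in, "rb") as src, open(dump_out, "wb") as dst:
        subprocess.run(build_command(paths, **options),
                       stdin=src, stdout=dst, check=True)


if __name__ == "__main__":
    print(build_command(["project1", "project2"], untangle="/my/svnroot"))
```

Passing the dump file as stdin avoids the extra cat process and keeps the whole dump out of Python's memory.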

This is useful if you want to split a repository for which files have been copied or moved between filtered and non-filtered locations. The resulting dump would be illegal if we just ignored these, because Subversion records the copy/move operations only.

Chapter 5 hints at this problem; see there for more details:

Also, copied paths can give you some trouble. Subversion supports copy operations in the repository, where a new path is created by copying some already existing path. It is possible that at some point in the lifetime of your repository, you might have copied a file or directory from some location that svndumpfilter is excluding, to a location that it is including. In order to make the dump data self-sufficient, svndumpfilter needs to still show the addition of the new path, including the contents of any files created by the copy, and not represent that addition as a copy from a source that won't exist in your filtered dump data stream. But because the Subversion repository dump format only shows what was changed in each revision, the contents of the copy source might not be readily available. If you suspect that you have any copies of this sort in your repository, you might want to rethink your set of included/excluded paths.
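To make the idea of converting a copy into an addition concrete, here is a simplified Python sketch. It is not the script's implementation: it treats one file node as a plain dictionary of dump headers, handles files only (a copied directory would need every file below it), and ignores checksum and Content-length headers. The function untangle_copy and its is_included and fetch_text callbacks are hypothetical.

```python
def untangle_copy(headers, is_included, fetch_text):
    """Turn a copy from an excluded source into a plain addition.

    headers     -- dict of node headers from the dump (e.g. 'Node-path')
    is_included -- callable telling whether a path survives the filter
    fetch_text  -- callable (path, rev) -> bytes, reading the live repository
    Returns (new_headers, text); text is None when nothing had to change.
    """
    source = headers.get("Node-copyfrom-path")
    if source is None or is_included(source):
        # Not a copy, or the copy source is still present in the output.
        return dict(headers), None

    # Drop the copy-source headers so the node no longer refers to a
    # path that will not exist in the filtered dump.
    new_headers = {name: value for name, value in headers.items()
                   if not name.startswith("Node-copyfrom-")}
    new_headers["Node-action"] = "add"

    # Supply the full file content that the copy used to imply.
    text = fetch_text(source, int(headers["Node-copyfrom-rev"]))
    new_headers["Text-content-length"] = str(len(text))
    return new_headers, text
```

The key point is that the content has to come from somewhere other than the dump being filtered, which is why a live repository is needed for untangling.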

Future Work

  • We still need to implement the per-subcommand options of svndumpfilter. Feel free to do so if you need them, or contact Martin Blais for subcontracting (I will do this for money; right now I have no time).

Credits

This code is originally based on Simon Tatham's svndumpfilter2, but we significantly changed the main loop and are using 'svnadmin dump' to fetch old revisions rather than working recursively with 'svnlook cat'. The problem I was having was that svndumpfilter2 was running out of memory so I had to rewrite it.

svndumpfilter2 tracks all files itself in order to replicate the required revisions, and it uses svnlook cat to fetch them, which is fast. This consumes a lot of memory (I could not run it on a 126MB repository with 2800 revisions on my P4 1GB RAM server). svndumpfilter3 does not track the revisions itself; instead, it uses svnadmin dump with the original svndumpfilter to produce the necessary lumps and insert them in the output. This operation is much slower, but that does not matter if you have relatively few move/copy operations between excluded directories, which I think is by far the common case for multiple project roots (it was my case).
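The approach described above, piping svnadmin dump through the original svndumpfilter, can be sketched in Python as follows. The exact arguments the script passes are not shown here, so the command lines below are an assumption. The dump is taken without --incremental because the first revision of a non-incremental dump presents the whole tree as additions, so the filtered output carries full file contents. The function names dump_commands and fetch_lumps are hypothetical.

```python
import subprocess


def dump_commands(repos_path, revision, path):
    """Return the two command lines of the pipeline (assumed arguments)."""
    dump = ["svnadmin", "dump", "--quiet", "-r", str(revision), repos_path]
    filt = ["svndumpfilter", "include", path]
    return dump, filt


def fetch_lumps(repos_path, revision, path):
    """Run 'svnadmin dump | svndumpfilter include' and return the raw output."""
    dump_cmd, filter_cmd = dump_commands(repos_path, revision, path)
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    filt = subprocess.Popen(filter_cmd, stdin=dump.stdout,
                            stdout=subprocess.PIPE)
    # Close our copy of the pipe so svnadmin sees a broken pipe if the
    # filter exits early.
    dump.stdout.close()
    output, _ = filt.communicate()
    dump.wait()
    return output
```

Each call starts two processes and dumps a whole revision, which is why this is slow compared with svnlook cat but needs no bookkeeping in memory.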

[2009-01-08] A bugfix patch was provided by Jamie Townsend <jtownsen at progress dot com>.

[2009-04-05] Minor path matching improvements by Matthias Troffaes <matthias dot troffaes at gmail dot com>

[2012-02-21] A few fixes from "Bernhard M. Wiedemann" <bwiedemann dot suse at de> were applied.

[2012-06-16] More fixes from "Bernhard M. Wiedemann" <bwiedemann dot suse at de> were applied; Bernhard mentions: "this helped produce 100% identical results for our split of a 4GB dump." so I applied them confidently without testing. More details at his copy: https://github.com/bmwiedemann/svndumpfilter3

[2012-12-22] Bug fix submitted by Jeppe Oeland on explicit usage of 'md5' for hashlib.

Important Note

I cannot guarantee anything about your data (see the legal terms above). If you lose data by using this program, THAT IS YOUR OWN PROBLEM. Do not forget to MAKE BACKUPS in case something goes wrong. This is your own responsibility. Always make backups.

[2009-09-18] Here is a note from a user about a potential problem with the preservation of properties. With a >100 hr/week workload, I have no time to look into it at the moment:

From: "Fldvri Gyrgy"
To: blais@furius.ca
Subject: Critical bug in svndumpfilter3?

Hello Martin,

First of all, your tool helped me a lot in my task. But I think I have found a critical bug in svndumpfilter3 which can cause loss of revisioned properties. Please check it and distribute a fix if you have time.

I experienced that some files and folders lost their revisioned properties after filtering them by svndumpfilter3. It was not some of the properties, but all of them, and I did not find any pattern at first. By comparing the input and output dumps I realized that the problem occurs with files/folders which have modifications committed.

The root cause is that if a modification does not touch the properties, the lump of that change will not contain a properties section at all, but svndumpfilter3 will add an empty one anyway. But this empty properties section means that the properties have been modified, and after the modification there are no properties anymore.

I propose something like this: while reading a lump, check whether a properties section was read at all, and use this information to decide whether one should be written.
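The proposal in the note above can be sketched in Python as follows. This is not a patch against the script: it only illustrates the distinction between "no properties section" (represented here as None) and "an empty properties section" (an empty dict), using the K/V/PROPS-END layout of the Subversion dump format. The function name serialize_props is hypothetical.

```python
def serialize_props(props):
    """Serialize a node's properties for a dump lump.

    props is None -> the lump had no properties section; write nothing.
    props == {}   -> the lump had an empty section; this deletes all
                     properties and must only be written if it was read.
    """
    if props is None:
        return None
    chunks = []
    for key, value in props.items():
        key_bytes = key.encode("utf-8")
        value_bytes = value.encode("utf-8")
        chunks.append(b"K %d\n%s\nV %d\n%s\n"
                      % (len(key_bytes), key_bytes,
                         len(value_bytes), value_bytes))
    chunks.append(b"PROPS-END\n")
    return b"".join(chunks)


if __name__ == "__main__":
    print(serialize_props(None))   # no section read: nothing to write
    print(serialize_props({}))     # empty section read: written as-is
```

A writer built on this would emit the Prop-content-length header and the section only when the return value is not None.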

Usage

svndumpfilter3 [<options>] [<path> ...]

Please Donate!

Important

This computer program or library is provided for free. I am aware that some of the programs that I provide for free allow people to get their work done faster or better, save them time and money. If you are using this program for benefit, especially if you are using it within a commercial environment and it saves you time or work, please consider making a donation by sending me a book from my Amazon Wishlist or by a direct donation to my company's PayPal account by clicking on the link below.

Download

Download program here.