Current version: 1.3 (2007.04.09)
sync2cd is an incremental archiving tool. It allows backing up complete filesystem hierarchies to multiple backup media (e.g. CD-R). Files are archived incrementally, i.e. only new or changed files are stored during an archive operation.
All entity types are supported: directories, files, symlinks, named pipes, sockets, block and character devices.
This version includes the following improvements:
- Discarded() configuration function.
- Compress() configuration function, and compress archive descriptors by default.

For information about previous versions, see the ChangeLog file included in the distribution.
sync2cd is released under the GNU General Public License version 2. The current version can be downloaded here.
sync2cd requires at least Python 2.4, and provides an installer based on distutils. This means that installation to the default location (/usr) is as simple as:
python setup.py install
To install sync2cd to a specific location, e.g. /usr/local, enter:
python setup.py install --prefix=/usr/local
sync2cd has been tested with Python 2.4.3 on Linux. It might work with other version and platform combinations, but I'm pretty sure a POSIX platform is needed, so if you are trying to run sync2cd on Windows, you will probably need to install cygwin. If you were able to run sync2cd with another configuration, please drop me a line (and, if necessary, send me a patch ;-)
Basic usage information is output with sync2cd --help:

    $ sync2cd --help
    sync2cd 1.3 Incremental archiving tool to CD/DVD
    Copyright (C) 2007 Remy Blank

    Usage: sync2cd [commands] [options] config_file

    Commands:
      -c, --create                Create a new archive descriptor
      -g, --graft-list            Output a graft list for an archive
      -h, --help                  Show this text
      -p, --print                 Print archive information
      -r, --restore               Restore from archives
      -s, --status                Print current synchronization status
      -y, --copy                  Copy files of an archive to destination

    Options:
      -a N, --archive N           Operate on archive number N
      -b GLOB, --glob GLOB        Add glob GLOB to pattern list
      -d DIR, --destination DIR   Copy or restore into directory DIR
      -m N, --medium-size N       Set archive medium size to N
      -n CMD, --mounter CMD       Mount media using CMD for restore
      --sort HOW                  File sorting key (time or alpha)
      -v, --verbose               Be more verbose
      -x EXP, --regexp EXP        Add regular expression EXP to pattern list
Commands define what will be done and what will be output to stdout. Several commands can be specified at the same time, and will be executed in a sensible order (e.g. --create before --graft-list).
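-c, --create: Create a new archive descriptor. If compression is enabled, a .gz extension is added.
-g, --graft-list: Output a graft list for an archive, suitable for mkisofs with the -graft-points option.
-p, --print: Print archive information. If --verbose is also specified, the list of files contained in the archive is also output.
-r, --restore: Restore from archives the items matching the patterns given with --glob and --regexp, or all items if no pattern was specified.
-s, --status: Print the current synchronization status. If --verbose is also specified, the list of files that need to be archived is also output.
-y, --copy: Copy the files of an archive to the directory given with --destination. The folder structure of the source is kept in the destination folder. File attributes (ownership, permissions, times) are not preserved. This command allows e.g. backing up onto a hard drive or a DVD-RAM, or splitting a folder structure into smaller chunks.

Options allow passing arguments to the selected commands. If the same option is specified on the command line and in the configuration file, the command line takes precedence.

-a N, --archive N: Operate on archive number N. Note that this option will have no effect with --create and will be overridden by the newly created archive.
-b GLOB, --glob GLOB: Add glob GLOB to the pattern list. For the pattern syntax, see the Exclude() configuration file function description. If the pattern matches a directory, all items below it will be matched as well.
-d DIR, --destination DIR: Copy or restore into directory DIR.
-m N, --medium-size N: Set the archive medium size to N. Corresponds to the function MediumSize() in the configuration file.
-n CMD, --mounter CMD: Mount media using CMD for restore. See the sync2cd_mounter.sh script provided with sync2cd.
--sort HOW: File sorting key (time or alpha). Corresponds to the function Sort() in the configuration file.
-v, --verbose: Be more verbose.
-x EXP, --regexp EXP: Add regular expression EXP to the pattern list.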
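The precedence rule between command line and configuration file can be pictured with a small sketch. This is not sync2cd's code: the function name `effective_settings` and the dictionary keys are hypothetical, and the sketch is written for a current Python 3 rather than the Python 2.4 that sync2cd targets.

```python
def effective_settings(config_file_settings, command_line_settings):
    """Merge two settings dicts; a value given on the command line wins."""
    merged = dict(config_file_settings)
    # Only options actually given on the command line (not None) override.
    given = {key: value for key, value in command_line_settings.items()
             if value is not None}
    merged.update(given)
    return merged


# Hypothetical values: the file sets MediumSize("690M") and Sort("time"),
# the command line passes -m 4.2G and no --sort.
config = {"medium_size": "690M", "sort": "time"}
cli = {"medium_size": "4.2G", "sort": None}
settings = effective_settings(config, cli)
print(settings)
```

Here the medium size comes from the command line, while the sort key falls back to the configuration file.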
Here are a few basic examples:
Create a new archive and burn a CD-R (that's basically what the script sync2cd_mkcd.sh included in the distribution executes):
sync2cd -c -g pictures | mkisofs -J -r -graft-points -path-list - -quiet | cdrecord -v -waiti -data -
Create a new archive and burn a DVD (that's what the script sync2cd_mkdvd.sh included in the distribution executes):
sync2cd -c -g pictures | growisofs -Z /dev/dvd -J -r -graft-points -path-list - -quiet
Create a new archive and copy the files contained in this archive to the specified destination folder (which could be a DVD-RAM disk):
sync2cd -c -v -y -d /mnt/dvdram pictures
Check how much data remains to be archived:
sync2cd -s pictures
Print the contents of archive number 2:
sync2cd -p -a 2 -v pictures
Restore all music files with genre "Rock" and "Pop" to ~/tmp:
sync2cd -r -v -n "sync2cd_mounter.sh /mnt/cdrom" -d ~/tmp -b "Music/Rock/**.mp3" -b "Music/Pop/**.mp3" Music
The configuration file is actually Python code calling functions defined in sync2cd.py and passing configuration information. The functions available are described below.
BaseDir()
Set the current working directory to path before starting. All paths specified with Input() are relative to this directory. This option corresponds to the -C or --directory option of tar.
Default: . (current directory)
Example: BaseDir("/home")
Compress()
Specify if archive descriptors should be compressed, and which compressor to use. If arg=False, descriptors will not be compressed. If arg=True, a default compressor will be selected (currently bz2). arg can also be the name of the compressor to use (gz, bz2).
Default: bz2
Example: Compress("gz")
Discarded()
Mark one or more archives as discarded. Files that were contained in the given archives will be included in the next archive creation operation, as if the archives had never existed.
Default: none
Example: Discarded(2, 4, 5)
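The effect of discarding archives can be sketched as follows. How sync2cd actually records which archive holds which file is not described here, so the `archived_in` mapping (relative path to archive number) and the function name are assumptions made purely for illustration, in current Python 3.

```python
def files_to_rearchive(archived_in, discarded):
    """Return the paths whose recorded archive has been marked as discarded.

    archived_in is a hypothetical mapping of relative path -> archive number.
    """
    return sorted(path for path, number in archived_in.items()
                  if number in discarded)


archived_in = {
    "Music/a.mp3": 1,
    "Music/b.mp3": 2,
    "Music/c.mp3": 4,
    "Music/d.mp3": 3,
}
# Same archive numbers as in the Discarded(2, 4, 5) example.
pending = files_to_rearchive(archived_in, discarded={2, 4, 5})
print(pending)
```

Files from archives 2 and 4 become candidates for the next creation run; archive 5 holds nothing in this toy data, so it contributes no files.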
Exclude()
Exclude files matching the shell pattern pattern from the archive. Several exclude patterns can be specified. The pattern matching is done against the path relative to BaseDir(). If a directory matches an exclude pattern, it is not recursed into.
As usual with shell patterns, a * wildcard matches zero or more characters except path separators (e.g. "/" on *nix). A new wildcard, **, matches zero or more characters, including path separators.
Example: Exclude("Music/Country/**.mp3")
This excludes all mp3 files in Music/Country and in all subdirectories.
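The difference between the two wildcards can be made concrete by translating a pattern into a regular expression. The sketch below is an illustration in current Python 3, not sync2cd's own matcher: `glob_to_regex` is a hypothetical helper, it assumes "/" as the path separator, and it only handles `*` and `**` (other shell wildcards such as `?` are treated as literal characters).

```python
import re


def glob_to_regex(pattern):
    """Translate a pattern using * and ** into an anchored regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")       # ** may cross path separators
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")    # * stops at a path separator
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


deep = glob_to_regex("Music/Country/**.mp3")
flat = glob_to_regex("Music/Playlists/*.m3u")

print(bool(deep.match("Music/Country/Album/a.mp3")))
print(bool(flat.match("Music/Playlists/old/best.m3u")))
```

With these rules, the `**` pattern reaches files in subdirectories of Music/Country, while the single `*` pattern only matches files directly inside Music/Playlists.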
ExcludeRegexp()
Exclude files matching the regular expression pattern from the archive. Several exclude patterns can be specified. The pattern matching is done against the path relative to BaseDir(). If a directory matches an exclude pattern, it is not recursed into.
For more information about regular expression syntax in Python, see this page.
Example: ExcludeRegexp("Music/Country/.*\\.mp3")
This excludes all mp3 files in Music/Country and in all subdirectories (note escaping of "\").
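The backslash escaping can be checked directly in Python. The sketch below, in current Python 3, shows that the doubled backslash and a raw string give the same pattern, and then tries the pattern on a few made-up paths. Whether sync2cd anchors the expression at the start of the path is not stated here; using `re.match` is an assumption.

```python
import re

escaped = "Music/Country/.*\\.mp3"   # backslash doubled in a normal string
raw = r"Music/Country/.*\.mp3"       # the same pattern written as a raw string
same_pattern = escaped == raw

rx = re.compile(escaped)
candidates = [
    "Music/Country/a.mp3",
    "Music/Country/Live/b.mp3",
    "Music/Country/notes.txt",
    "Music/Rock/c.mp3",
]
# Assumption: the pattern is matched from the start of the relative path.
excluded = [path for path in candidates if rx.match(path)]
print(same_pattern, excluded)
```

Because `.` in a regular expression also matches "/", the single pattern covers mp3 files in subdirectories as well.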
HashFunction()
Specify the hash function to be used to check files for content modification. Currently supported: md5 (128 bits), sha1 (160 bits).
Default: md5
Example: HashFunction("sha1")
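As an illustration of what checking file content with a hash involves, here is a minimal sketch using Python's standard hashlib module in current Python 3. The helper name `file_digest` and the chunk size are choices made for this example; sync2cd's own implementation may differ.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Demonstrate on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello sync2cd")
sample_path = tmp.name

md5_hex = file_digest(sample_path, "md5")
sha1_hex = file_digest(sample_path, "sha1")
os.remove(sample_path)

# Each hex character encodes 4 bits.
print(len(md5_hex) * 4, len(sha1_hex) * 4)   # prints "128 160"
```

Reading in chunks keeps memory use flat even for large files such as videos.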
Input()
Add a file or directory to be archived. Several inputs can be specified. The use of a directory name always implies that the subdirectories below should be included in the archive. path must be a relative path specification, and is interpreted relative to BaseDir().
Example: Input("Music")
MediumSize()
Set the maximum size of an archive to size. This is typically used to span a backup over multiple media. size is an integer giving the size in bytes, or a string containing a floating-point value optionally followed by one of the suffixes k, M, G, T, P, E.
Default: 0 (no limit)
Example: MediumSize("4.2G")
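A size specification of this form could be parsed roughly as below. This is a sketch in current Python 3, not sync2cd's parser: `parse_size` is a hypothetical name, and the sketch assumes that each suffix is a power of 1024 and that fractional bytes are truncated, neither of which is stated explicitly in this document.

```python
SUFFIXES = "kMGTPE"   # k = 1024**1, M = 1024**2, ... (assumed binary multiples)


def parse_size(size):
    """Convert an integer or a string such as "4.2G" into a number of bytes."""
    if isinstance(size, int):
        return size
    text = size.strip()
    if text and text[-1] in SUFFIXES:
        exponent = SUFFIXES.index(text[-1]) + 1
        return int(float(text[:-1]) * 1024 ** exponent)
    return int(float(text))


cd_bytes = parse_size("690M")
dvd_bytes = parse_size("4.2G")
print(cd_bytes, dvd_bytes)
```

An integer argument is returned unchanged, so 0 keeps its meaning of "no limit".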
Sort()
Specify how files should be selected for inclusion when creating an archive. sort can be either time (the default) or alpha.
When creating a new archive, the list of files to be included is sorted according to this criterion, either by modification time if time was specified, or by path name if alpha was specified. Then, files are selected for inclusion starting at the top of the list, until they fill one medium. The remaining files are left for a subsequent creation run.
In other words, time stores the oldest files first, and alpha keeps files more or less together (by directory).
Default: time
Example: Sort("alpha")
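The selection procedure described above can be sketched in a few lines of current Python 3. The function name, the `(path, mtime, size)` tuples and the sample data are made up for illustration. Two simplifications are assumptions: selection stops at the first file that no longer fits (sync2cd might instead skip it and continue), and `alpha` is approximated by a plain sort on the path string.

```python
def select_for_medium(files, medium_size, sort="time"):
    """Pick files for one medium.

    files is a list of (path, mtime, size) tuples; medium_size is in bytes,
    with 0 meaning no limit. Returns (selected_paths, remaining_paths).
    """
    if sort == "time":
        ordered = sorted(files, key=lambda item: item[1])   # oldest first
    elif sort == "alpha":
        ordered = sorted(files, key=lambda item: item[0])   # by path name
    else:
        raise ValueError("sort must be 'time' or 'alpha'")

    selected, used = [], 0
    for index, (path, _mtime, size) in enumerate(ordered):
        # Assumption: stop at the first file that no longer fits.
        if medium_size and used + size > medium_size:
            return selected, [item[0] for item in ordered[index:]]
        selected.append(path)
        used += size
    return selected, []


files = [
    ("b/new.jpg", 300, 40),
    ("a/old.jpg", 100, 50),
    ("a/mid.jpg", 200, 30),
]
by_time = select_for_medium(files, 90, "time")
by_alpha = select_for_medium(files, 90, "alpha")
print(by_time)
print(by_alpha)
```

With a 90-byte medium, both orders fill the medium with the two files from directory a and leave b/new.jpg for the next run; only the order in which they are considered differs.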
Here are a few examples of configuration files.
Archive an mp3 collection:
    MediumSize("690M")                  # Fit archives on CD-R
    HashFunction("sha1")                # Use a good hash
    BaseDir("/home")                    # cd to this directory
    Input("Music")                      # Archive this tree
    Exclude("Music/atrontc.vtc")        # Exclude generated files
    Exclude("Music/Playlists/*.m3u")
Archive a digital photo and video collection. Note how backslashes are escaped in Python strings. For more information about regular expressions and Python strings, see this page.
    MediumSize("690M")                  # Fit archives on CD-R
    HashFunction("sha1")                # Use a good hash
    Compress("gz")                      # Use a fast compressor
    BaseDir("/home/mirror/hobbes")      # cd to this directory
    Input("pictures")                   # Archive photos
    Input("videos")                     # and videos
    Discarded(2, 3)                     # Archives 2 and 3 have been discarded

    # Exclude thumbnails and small versions that are generated
    Exclude("pictures**/thumb/t_*.jpg")
    ExcludeRegexp("pictures/([^/]+/)*small/s_[^/]+\\.jpg")
I lost a backup medium. How will my incremental backup remain consistent?
Just mark the descriptor corresponding to the lost medium as discarded in the configuration file, and make a new backup. The files that were stored on the lost medium and still exist will be put on the new medium.
Only a subset of the files that need to be archived fit on a medium. How does sync2cd select which files are to be stored on the next archive?
In versions up to 0.9, the oldest files were stored on the next archive (based on the file modification time). Starting with version 1.0, the selection criterion can be set with the --sort option and the Sort() configuration function. Currently, the only alternative is to select files by depth-first alphabetical traversal order.
Where are the archive descriptors created?
They are created in the same directory as the configuration file. The file name of the descriptors is the concatenation of the configuration file name, a dot, the padded archive number, and a .gz extension if compression is enabled. For example, if the configuration file is Music, the descriptors will be named Music.0001.gz, Music.0002.gz, and so on.
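The naming rule can be written down as a small helper. This is a sketch in current Python 3 with a hypothetical function name; the four-digit zero padding is inferred from the Music.0001.gz example, and POSIX-style paths are assumed.

```python
import posixpath


def descriptor_path(config_file, number, compressed=True):
    """Build the descriptor file name next to the configuration file."""
    name = "%s.%04d" % (posixpath.basename(config_file), number)
    if compressed:
        name += ".gz"   # extension used in the example above
    return posixpath.join(posixpath.dirname(config_file), name)


first = descriptor_path("Music", 1)
second = descriptor_path("/home/joe/sync2cd/Music", 2)
plain = descriptor_path("Music", 3, compressed=False)
print(first, second, plain)
```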
Do I have to keep all created archive descriptors?
No, you only need to keep the last one. sync2cd determines which files have to be archived during a creation operation by comparing the live filesystem with the contents of the last archive descriptor it can find.
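The comparison between the live filesystem and the last descriptor can be pictured as below. The real descriptor format is not described here, so this current Python 3 sketch simply assumes that both sides can be reduced to a mapping of relative path to content hash; the names and hash values are made up.

```python
def files_to_archive(live, last_descriptor):
    """Return paths that are new, or whose hash differs from the descriptor.

    Both arguments are hypothetical mappings of relative path -> content hash.
    """
    return sorted(path for path, digest in live.items()
                  if last_descriptor.get(path) != digest)


last_descriptor = {"Music/a.mp3": "1f3a", "Music/b.mp3": "9c0e"}
live = {
    "Music/a.mp3": "1f3a",   # unchanged
    "Music/b.mp3": "77d2",   # content changed
    "Music/c.mp3": "e410",   # new file
}
todo = files_to_archive(live, last_descriptor)
print(todo)
```

Only the changed and the new file are selected, which is what makes the backup incremental.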
What medium size should I specify to record a DVD?
The growisofs manpage specifies the capacity of a DVD as 4'700'000'000 bytes, or 4.377GB. So 4.2GB should be a reasonable value (considering that the archive descriptor, as well as the directory structure, will also take some space).
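The arithmetic behind these figures can be reproduced as follows, in current Python 3. The 4.377 value works out when "GB" is read as 2**30 bytes; applying the same reading to the 4.2GB medium size is an assumption.

```python
DVD_BYTES = 4_700_000_000      # capacity quoted from the growisofs manpage
GIB = 1024 ** 3                # "GB" read as 2**30 bytes

capacity_gb = DVD_BYTES / GIB
medium_bytes = int(4.2 * GIB)  # a 4.2G medium size, fractional byte dropped
headroom_bytes = DVD_BYTES - medium_bytes

print("%.3f" % capacity_gb)    # prints 4.377
print(headroom_bytes)
```

The headroom is the space left over for the archive descriptor and the directory structure.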
Backup's good. Now, where's restore?
Well, the whole point of a backup is never to use it, right? ;-)
Seriously, restore functionality has been added as of version 0.9. I would be happy to receive some feedback if it works or doesn't work for you.
What is the parameter that is passed to the mounter script on restore?
It is the base name of the current archive descriptor, without .gz extension (even if compression is enabled). For example, if the configuration file is /home/joe/sync2cd/Music and restoring needs archives 7 and 12, the script will be called first with Music.0007, then with Music.0012, and finally with Music.0000 (the 0th descriptor indicates the end of the restore operation, and allows e.g. ejecting the last medium).
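The sequence of parameters can be sketched as below. This current Python 3 snippet only reproduces the naming from the example above; the function name is hypothetical, and processing the archives in ascending order is an assumption based on that single example.

```python
import posixpath


def mounter_arguments(config_file, needed_archives):
    """List the parameters passed to the mounter script, in call order."""
    base = posixpath.basename(config_file)
    names = ["%s.%04d" % (base, number) for number in sorted(needed_archives)]
    names.append("%s.%04d" % (base, 0))   # 0th descriptor marks the end
    return names


calls = mounter_arguments("/home/joe/sync2cd/Music", [12, 7])
print(calls)
```

A mounter script can therefore treat a name ending in .0000 as the signal to eject the last medium.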
I accidentally deleted one of the numbered archive descriptors. Can I re-create it from the backup medium?
sync2cd adds a hidden directory at the root of every backup medium, named .sync2cd, that contains the configuration file, the archive descriptor for the medium, as well as the sync2cd.py script itself. So you can get the descriptor back from there. The hidden directory is added with both --graft-list and --copy.
The following features will be added to sync2cd as time permits.
If you are using or trying to use sync2cd, I would be happy to hear from you! I'm especially interested in the following:
In any case, just drop me an e-mail.
Copyright (C) 2007 Remy Blank