Wednesday, August 21, 2013

 

rsync Overwrote a Newer File!

Moral: read the man pages.

Confusing archive with merge

One might naïvely suppose that the -a (--archive) option replaces a file only with a newer file of the same name, or asks before replacing anything. However, rsync is meant to run without interaction, so prompting would be inappropriate. Why does it replace files regardless of age? The command is to archive, not to merge, so copying each and every source file is correct, whether or not the source is older than an existing file of the same name. (By default rsync skips only those files whose size and modification time already match at the destination.)

I noticed this when I tried to use the -a option for merging two directories. Fortunately, I ran a test beforehand, as one should do if one doesn't want to risk losing files. In this instance, I had two versions of a file, one in the current directory and one in a subdirectory. My terminal session (Linux) went as follows (I was working with rsync with both source and destination on my local host):

$ mkdir joe
$ rsync -av parents/parents.org~ joe/
$ ll joe
 ... 20166 2012-05-10 13:24 parents.org~*
$ rsync -av parents.org~ joe
$ ll joe
 ... 11758 2012-01-20 18:33 parents.org~

Clearly, the second rsync had overwritten a newer (and larger) file with an older (and smaller) file of the same name. Studying the man page, one sees that the archive option comprises the following options, but not the -u | --update option:
-r  recurse into directories, copying subdirectory contents
-l  copy symlinks as symlinks
-p  preserve permissions
-t  preserve modification times
-g  preserve group
-o  preserve owner (super-user only)
-D  same as --devices --specials: preserve device files (super-user only) and special files
The -u | --update option skips files that are newer on the receiver.

To merge with rsync

For the merge operation, I switched to

$ rsync -avu parents.org~ joe
$ rsync -avu parents/parents.org~ joe

This works as long as one specifies files in such a way as to avoid the recursive application to subdirectories. To merge just the files of the current directory with just the files of a subdirectory so as to have all the files and the most recent versions in one place, I must not do this:

$ mkdir joy
$ rsync -avu * joy

Because the shell expands * to include joy itself, that would also create joy/joy (and fill it), joy/joy/joy (and fill it), and so on recursively, I presume. I have not actually tried it, and I do hope rsync would detect the circularity of the self-reference and throw an error without executing.

This could work if the target were not inside the current (source) directory; the problem arises when writing to the destination adds files to the source!

One Solution

To merge just the files and not the subdirectories, one solution is to call for all the options composing -a except -r.

$ mkdir jay
$ rsync -lptgoDvu * jay

The next step (in this scheme; one could equally have copied the sources in the opposite order) is to copy the files of the source subdirectory into the new destination directory. The same options alone will not work: the source directory will be skipped. One must add the -d option to process the directory non-recursively; this copies the names of its subdirectories (as empty directories) along with the directory's files, but not the contents of those subdirectories.

$ rsync -dlptgoDvu parents/ jay

A Better Solution

A better solution is to cd parents (where parents is the directory which is the second source of files) so that no descent into a subdirectory is needed when rsync is called. The target destination must then be adjusted, of course.

$ mkdir jay
$ rsync -lptgoDvu * jay
$ cd parents
$ rsync -lptgoDvu * ../jay
~/jay now contains all the files, but none of the directories, that were in ~/, ~/parents or both; where file names were duplicated, it holds the latest version.
