Copy Files in Linux Using rsync
When copying or moving large numbers of files, the generic UNIX utilities cp and mv can be dangerous. Because these operations can take a long time, there is a fair chance that something will interrupt the copy or move. When that happens during a move, your data is left in an inconsistent state, with part of it still in the original location and part of it at the target destination. Even for a plain copy, restarting is less than ideal because everything that was already copied gets copied again. The same issue applies to the SSH copy utility scp.
The rsync utility is an advanced file transfer tool that solves these issues. It is installed on both macOS and Linux. If you look at the man page for rsync you will see it has a large number of options. Don’t let that apparent complexity scare you. Using it for most copy or move jobs is very simple.
Simple Copy/Move Example
Take a look at this example of copying /source/dir/to/copy into /target/dir:
rsync -avP /source/dir/to/copy /target/dir/
The end result will be a copy of /source/dir/to/copy located at /target/dir/copy. At this point you can run exactly the same command again. In fact, you should do this to verify the copy: rsync looks at each file and only copies what is missing or different at the target destination. The -v option lists each file as it is copied, and -P shows progress for each file, which is helpful when you have very large files. -P also keeps partially transferred files so that an interrupted transfer can pick up where it left off.
If your intention was to move the data rather than just copy it, you would then run:
rm -r /source/dir/to/copy
GOTCHA WARNING: Be careful with trailing slashes. Normally you NEVER want a trailing slash on the source directory, but you DO want one on the target directory. A trailing slash on the source tells rsync to copy the contents of the directory rather than the directory itself. See the man page for more info.
Group Permissions
With the -a option, rsync tries to preserve both the exact permissions and the group of the source. When you are copying data into one of your shared group areas, this can be problematic because it ignores the group sticky (setgid) bit, as discussed in Understanding Group Permissions in UNIX. Instead, run rsync with the following options:
rsync -rltP --chmod=ugo=rwX ..
On all Martinos CentOS7 machines, we have defined a global option alias -Z that does the above. So on these machines you can just run:
rsync -aZP ..
The key rule to remember here: use the -Z option when you are rsyncing files INTO your group storage areas.
Other common options you may wish to use:
-H (preserve hard links)
File Syncing and Mirror Backup
In the simple example above, if there are files in the target destination that are not present at the source, they are left alone. Sometimes you want the target destination to become an exact copy of the source, also known as a “mirror”. For that, files on the target side must be deleted if they do not exist at the source. To do this, simply add the --delete option to rsync.
rsync -aZP --delete /source/dir/to/copy /target/dir/
Now any files under /target/dir/copy that are not also present under /source/dir/to/copy will be deleted.
File Transfer over the Network
If you want to copy or move files to a directory on another computer over the network, simply prefix the destination directory with the hostname of the remote computer followed by a colon.
rsync -aZP /source/dir/to/copy remotehost:/target/dir/
You will be prompted for your password on the remote host before the copy starts. If your user name on the remote host is different from your user name on the computer where you are running rsync, specify username@remotehost rather than just remotehost.
If you are using rsync from a remote site outside MGH, please use the gateway server door.nmr.mgh.harvard.edu for your data transfer purposes.