rsync notes

What...

rsync is quite possibly one of the most useful tools available to the Unix administrator. If you aren't using it, learn it. If you are, you can probably always use it better.

The rsync man page is required reading, or at the very least a required reference. MOST of what you need is there.

This page is my usage notes, things I learned the hard way, and will probably forget if I don't put them some place where I can find them. Maybe it will be useful to you.


--filter Include/Exclude statements

This looks so easy. And for very simple cases, it can be. But for more complex cases, it gets really weird really fast. Personally, I like to put the include/exclude statements into a separate file, then pull them in with this rsync option:
    --filter="merge file.txt"
This can simplify your command line to something like this:
    rsync $OPTS --filter="merge file.txt" $SOURCE $DEST
But here's an example...

I have a system that has ten copies of the same kind of data. One directory holds the production data, the other nine are development copies -- here, the data isn't interesting, it's the programs and scripts to process the data that are important.

Here's an rsync filter file that will grab just individual directories from certain subdirectories:

+ /
- /mnt
- /proc
- /sql_extract
- /tmp
- /dev
- /LDU1
- /opt/ATMDIALOG_BACKUP
- /lpp
- /usr/java*/
- /usr/lpp/
+ /APP
+ /APP/APP551
+ /APP/APP623
+ /APP/APP623/BATCH/
+ /APP/APP623/SPECFILES/
+ /APP/APP623/DATAFILES/
+ /APP/APP623/HELPFILES/
+ /APP/APP623/REPORTSPECS/
- /APP/APP623/*
+ /APP/APP650
+ /APP/APP650/BATCH/
+ /APP/APP650/SPECFILES/
+ /APP/APP650/DATAFILES/
+ /APP/APP650/HELPFILES/
+ /APP/APP650/REPORTSPECS/
- /APP/APP650/*
+ /APP/APP777
+ /APP/APP777/BATCH/
+ /APP/APP777/SPECFILES/
+ /APP/APP777/DATAFILES/
+ /APP/APP777/HELPFILES/
+ /APP/APP777/REPORTSPECS/
- /APP/APP777/*
+ /APP/APP789
+ /APP/APP789/BATCH/
+ /APP/APP789/SPECFILES/
+ /APP/APP789/DATAFILES/
+ /APP/APP789/HELPFILES/
+ /APP/APP789/REPORTSPECS/
- /APP/APP789/*
+ /APP/APP810
+ /APP/APP810/BATCH/
+ /APP/APP810/SPECFILES/
+ /APP/APP810/DATAFILES/
+ /APP/APP810/HELPFILES/
+ /APP/APP810/REPORTSPECS/
- /APP/APP810/*
+ /APP/APP952
+ /APP/APP952/BATCH/
+ /APP/APP952/SPECFILES/
+ /APP/APP952/DATAFILES/
+ /APP/APP952/HELPFILES/
+ /APP/APP952/REPORTSPECS/
- /APP/APP952/*
+ /APP/APP953
+ /APP/APP953/BATCH/
+ /APP/APP953/SPECFILES/
+ /APP/APP953/DATAFILES/
+ /APP/APP953/HELPFILES/
+ /APP/APP953/REPORTSPECS/
- /APP/APP953/*
- /APP/*
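Here's the magic -- everything UP TO the desired directory must first be marked with a "+". rsync acts on the first rule that matches a name, so the "+" lines also have to come before the "-" lines that would otherwise catch them. So, if you want /APP/APP952/BATCH but not the rest of /APP/APP952/, you need: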
+ /
+ /APP
+ /APP/APP952
+ /APP/APP952/BATCH
- /APP/APP952/*
- /APP/*
to make it work. This will NOT work:
- /			# BROKE!
+ /APP/APP952/BATCH	# BROKE!
as you are excluding everything...and it will never look in /APP/APP952 to find the BATCH directory.

Here's an example of grabbing just one directory:

+ /
+ /etc
- /*
You can read that as "get the root directory, get the /etc directory, then skip everything you haven't already got."

Hopefully, this saves some hair pulling.


rsync delta-transfer may not be your friend

rsync's big claim to fame has always been its ability to find what needs to be sent and copy over just the parts of files that changed. The man page calls this the "delta-transfer algorithm" (shortened to "delta-xfer" in the option summary).

Newsflash: this is something you very possibly want to disable.

In real life, I've seen systems spend far more time trying to figure out how little to transfer than simply transferring entire files if a change has been made. In fact, we saw HUGE problems with this at an employer of mine -- sometimes the transfers would go fast, other times slow. But overall, things were ALWAYS faster if we used the -W option.

--whole-file, -W         copy files whole (w/o delta-xfer algorithm)
I'd suggest benchmarking before making a decision, but I have yet to find a case that delta-transfer makes better. Most likely, it was a bigger win back in the dial-up and slow Internet days, but when moving big files with lots of little changes over fat (but high-latency) pipes, or over fast local links, I've found that disabling it with -W always seems to make things faster and more consistent. (When both the source and destination are local paths, rsync already defaults to whole-file copies.)

Compression may not be your friend

rsync supports compression with the -z option. SSH also supports compression.

You definitely don't want both rsync and SSH doing compression. And you may not want either doing it. As usual, benchmark, don't speculate.

Compression trades CPU time for network bandwidth, and these days, you may be short of neither, or both.
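
To benchmark this, the three combinations worth comparing look something like the sketch below. The source path and the host name backuphost are hypothetical, and by default the script only prints the commands so it is safe to run anywhere. The ssh Compression option is set explicitly in each case so that a setting in your ssh config doesn't quietly change what is being measured.

```shell
#!/bin/sh
# Sketch only: SRC and DEST are hypothetical; replace them with your own.
SRC=${SRC:-/srv/data/}
DEST=${DEST:-backuphost:/srv/data/}
# RUN defaults to "echo", so the commands are printed, not executed
# (echo drops the quotes around the -e argument when printing).
# Run the script with RUN= (empty) to execute them for real.
RUN=${RUN-echo}

compression_variants() {
    # 1. No compression at either layer.
    $RUN rsync -a  -e "ssh -o Compression=no"  "$SRC" "$DEST"
    # 2. rsync compresses, ssh does not.
    $RUN rsync -az -e "ssh -o Compression=no"  "$SRC" "$DEST"
    # 3. ssh compresses, rsync does not.
    $RUN rsync -a  -e "ssh -o Compression=yes" "$SRC" "$DEST"
}

compression_variants
```

Time each variant against the same data and the same starting state on the destination, or the comparison won't mean much.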


Hard link copies seem to be restartable!

One nifty Unix concept that is beyond the scope of this page is the "hard link". If copied sloppily, multiple hard links to one file can turn into multiple copies of the same file.

rsync supports preserving hard links by using the -H option. However, I've always been nervous about this -- copying a hard linked file properly is complicated, and I figured that if an rsync task got interrupted, the hard links might get messed up and turned into multiple copies of the same file.

I'm happy to report that I've had a few opportunities to test this (not intentionally!), and it seems under at least some circumstances, rsync will happily resume a transfer and copy hard links properly if interrupted. This was tested and verified on OpenBSD, but I suspect this means it will work on most major Unix OSs.



 


(C)opyright  2017, 2023 Nick Holland, Holland Consulting

Published: 11/10/2017
Revised: 6/15/2023