The code to IBS can be downloaded here: ibs.txt. You will want to rename it to drop the .txt extension when you install it.

IBS code

First of all, I'm claiming no particular skill at coding. I'm better than the vast majority of the human population (which doesn't code at all, so not that hard to beat them!), but I'm probably pretty lame when it comes to people who really know how to do this. So if you think something I say here is better than how you would have done it, maybe it is. If you think you would have done it differently, maybe your way is better. Or maybe it's just two different ways to accomplish the same task.

Earlier versions of this "application" were multiple scripts -- one to prep new directories, one to actually do the backup, one to control the rate of launch of the backups and gather the output to a script, etc. The problem I ran into with these was trying to keep them all in sync. So, I decided to pull them all into one bigger script. Probably not a good idea while developing an idea, but once it is demonstrated, I think it will work better.

Error handling

I consider this more an administrative script, not a full blown commercial grade "application", so code is more focused on readability than handling every imaginable error. That, and I'm lazy.

Errors that this system will run into boil down to two big types:

Fatal errors that require the stopping of any attempts to do backups.
Individual system errors -- a particular backup fails, but doesn't mean all should be terminated.

This leads to two big blocks of error checking -- before any job runs, and for each individual job. Once a job starts, the only remaining task is to check the return code of rsync, as pretty much everything is handled by rsync at that point. Yeah, this is basically 500+ lines of code to set up an rsync command.
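
The two tiers described above could be sketched roughly like this. The names die, require_vars and run_all are made up for illustration and are not taken from ibs; the real script may well structure this differently.

```shell
#!/bin/sh
# Sketch only: "die", "require_vars" and "run_all" are hypothetical names.

# Fatal error: print a message and stop everything.
die() {
    echo "FATAL: $*" >&2
    exit 1
}

# Check that every named config variable is set and non-empty.
require_vars() {
    for v in "$@"; do
        eval "val=\${$v:-}"
        [ -n "$val" ] || die "config variable $v is not set"
    done
}

# Per-host error: note the failure, then carry on with the next host.
# $1 is the command to run for each host; the rest are host names.
# Prints the number of hosts that failed.
run_all() {
    cmd=$1; shift
    failed=0
    for host in "$@"; do
        if ! "$cmd" "$host"; then
            echo "ERROR: $cmd failed for $host, continuing" >&2
            failed=$((failed + 1))
        fi
    done
    echo "$failed"
}
```

The point of the split is that require_vars runs once before any job starts and is allowed to kill the whole run, while a failure inside run_all only costs you that one host.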

Overall outline of the script

set some variables
Test if it was a request to generate a config template file
check and source the config file
check that all variables are set and maybe valid before going further

function preflightcheck -- test individual node for likely failure

function prep -- build and link a directory structure for a new backup client

function buhost -- backup one host
function reportheader -- print header for the backup report
function bureport -- print backup summary info for ONE host

Main body:
Look for "dash options", both "additive" options like -v and commands to execute
expand commandline host lists into a list of individual machines
for all machines
Run the intended command against the list of individual machines -- backgrounded.

Code discussion

preflightcheck

If you find any new ways for a system being backed up to fail, put a check for it in preflightcheck. Right now, it verifies that the target directory exists and that it can invoke rsync on the remote machine, which in the process tests name resolution and login. (This is a weak point in the error checking; testing each issue individually might be more user friendly, but usually the problem isn't hard to spot.)

It also tests for a minimum number of daily and month-end backups. Early versions sometimes managed to rm all their old backups in the event of a serious problem that wasn't fixed, and administrators will sometimes "pull" too many backups out of the rotation without putting new ones in. The current setup assumes that all systems have the same minimum number of ME and daily backups. This may not be a good decision in your case, and maybe a per-host config file would be good.

Another benefit of checking for a minimum number of daily and ME backups is that if the file system the backups are stored on isn't mounted, this test will fail. A good application would give explicit error messages that were different between "insufficient number of backups" and "whoah! Where's the data?".
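
As a rough idea of what the more explicit messages could look like, here is a sketch of a count check that tells "too few backups" apart from "no data at all". The function name, the return codes and the message wording are all assumptions, not what ibs currently does.

```shell
#!/bin/sh
# Sketch only: check_backup_count and its return codes are hypothetical.
# Usage: check_backup_count DIR MIN_DAILY MIN_ME
# Returns 0 if OK, 1 if too few backups, 2 if there is no data at all.
check_backup_count() (
    dir=$1 min_daily=$2 min_me=$3
    if [ ! -d "$dir" ]; then
        echo "$dir: backup directory not found (storage not mounted?)" >&2
        exit 2
    fi
    daily=0 me=0
    for d in "$dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-*; do
        [ -d "$d" ] || continue
        case $d in
            *-ME) me=$((me + 1)) ;;
            *)    daily=$((daily + 1)) ;;
        esac
    done
    if [ $((daily + me)) -eq 0 ]; then
        echo "$dir: no backups at all (storage not mounted?)" >&2
        exit 2
    fi
    if [ "$daily" -lt "$min_daily" ] || [ "$me" -lt "$min_me" ]; then
        echo "$dir: only $daily daily and $me ME backups" >&2
        exit 1
    fi
    exit 0
)
```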

prep

Setup of a backup directory isn't normally hard: just create an appropriate number of appropriately named directories. But as that might be ten or more directories, plus a symlink between IBSBASE and the physical storage, prep does it for you.

I use 2000-00-xx and 2000-00-xx-ME for empty template directories, because they are obviously just place holders, but will be rotated out like any "real" backup directory. Maybe you want to vary the number of retained backups by system. You can do that simply by adding (or removing) daily or ME backups -- the number of backups before a run and after the run should be the same.

Most of the IBS operations use just a system name, and any path info is stripped off. "prep" needs a complete file name, so it is invoked differently by the main code of the script, and does not expand the *.list files. "prep" can also prepare multiple IBS systems in one invocation, but list expansion doesn't make sense, so I don't see this as a limitation. But being this was kinda bolted on late in the coding process, additional systems to prep are handled by shifting the parameters and recursively invoking prep again. I initially considered it a lazy hack, but I'm kinda liking how it doesn't impact readability of the code much. Opinions may vary.
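
To make the shift-and-recurse idea concrete, here is a minimal sketch. IBSBASE is from the text; NDAILY, NME, the numbered placeholder names (2000-00-01 and so on) and the function name prep_sketch are assumptions for illustration, and the real prep does more checking than this.

```shell
#!/bin/sh
# Sketch only. IBSBASE comes from the text; NDAILY, NME, the numbered
# placeholder names and prep_sketch itself are assumptions.
# Usage: prep_sketch /storage/path/host1 [/storage/path/host2 ...]
prep_sketch() {
    [ $# -gt 0 ] || return 0           # no systems left: recursion ends
    target=${1%/}                      # full path to the real storage
    host=${target##*/}                 # system name is the last component
    i=1
    while [ "$i" -le "${NDAILY:-7}" ]; do
        mkdir -p "$target/2000-00-$(printf '%02d' "$i")" || return 1
        i=$((i + 1))
    done
    i=1
    while [ "$i" -le "${NME:-3}" ]; do
        mkdir -p "$target/2000-00-$(printf '%02d' "$i")-ME" || return 1
        i=$((i + 1))
    done
    # IBSBASE/host is a symlink to the physical storage location
    [ -L "$IBSBASE/$host" ] || ln -s "$target" "$IBSBASE/$host" || return 1
    shift
    prep_sketch "$@"                   # handle any remaining systems
}
```

Because the recursive call is the last thing the function does, the plain global variables don't get in each other's way.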

Possible Improvements

prep assumes that there will be a backup directory, IBSBASE, where it looks like all the backups are based, and symlinks to the "real" storage location. On a small configuration, you might want to have the storage and IBSBASE be one and the same. This is not supported by my code here and I don't plan on testing it, so it is left up to you.

A "prep" tends to be done once per system, but if you want a "per host" configuration of retention policy, maybe the right answer is for IBS to "normalize" the number of backups every time it runs. That way, if a backup is "pulled" for any reason, a new replacement will be added automatically.

buhost

Backs up one host. The actual backup is a one-line call to rsync, but everything else here is about building that one line and making sure it is correct. First trick is to decide if this is the first time the backup has been run today. If so, we want to delete the oldest backup, and the most recent backup is the target for --link-dest. If this is NOT the first time a backup was run against this system today, we assume there was a backup failure, so we will use the PREVIOUS backup for --link-dest target, and not delete the oldest.

Perhaps an example might help about now. Let's say this is our backup directory for a system named "node1":

/bu/node1 $ ls -1
2020-00-01-ME/
2022-00-01-ME/
2022-00-02-ME/
2022-06-01-ME/
2022-07-01-ME/
2022-07-30/
2022-07-31/
2022-08-01-ME/
2022-08-02/
2022-08-03/
2022-08-04/
2022-08-05/
2022-08-06/

IF today is 2022-08-07, the backup has not been run yet today. So, 2022-07-30 will be deleted, and a new 2022-08-07 will be created, and 2022-08-06 will be the source of links from files that haven't changed.

So the command will be something like:

# rsync $OPTIONS --link-dest 2022-08-06 $HOST $IBSBASE/$HOST/2022-08-07

However, if today is 2022-08-06, a backup has already been attempted today, so nothing will be deleted, 2022-08-05 will be the source for links of unchanged files, and the backup 2022-08-06 will be resumed into 2022-08-06.

So in this case, the command will look something like:

# rsync $OPTIONS --link-dest 2022-08-05 $HOST $IBSBASE/$HOST/2022-08-06
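
The first-run versus re-run decision in the two cases above can be sketched like this. pick_linkdest is a hypothetical helper, not a function from ibs, and it assumes nothing in the directory is dated later than today.

```shell
#!/bin/sh
# Sketch only: pick_linkdest is a hypothetical helper, not part of ibs.
# Usage: pick_linkdest BACKUPDIR TARGET
#   TARGET is the name of today's backup directory, e.g. 2022-08-07.
# Prints "first" or "rerun", then the directory to use for --link-dest.
pick_linkdest() (
    LC_ALL=C; export LC_ALL             # plain byte-order sorting of names
    cd "$1" || exit 1
    target=$2
    linkdest=
    for d in [0-9][0-9][0-9][0-9]-[0-9][0-9]-*; do   # globs expand sorted
        [ -d "$d" ] || continue
        [ "$d" = "$target" ] && continue
        linkdest=$d                     # last one seen is the newest
    done
    if [ -d "$target" ]; then echo rerun; else echo first; fi
    echo "$linkdest"
)
```

Against the listing above, a target of 2022-08-07 gives "first" and 2022-08-06, while a target of 2022-08-06 gives "rerun" and 2022-08-05.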

Month-end backups

Month-end backups are handled by tacking a -ME on the end of the current directory name, and otherwise, the exact same logic is used.

The --link-dest of the current backup is the previous (hopefully good) backup, whether that is an ME or not. The only place where the ME backups are handled significantly differently is during the deletion step -- -ME backups are only deleted when new -ME backups are being made, non-ME backups are only deleted when non-ME backups are being made.
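
The "delete only your own kind" rule might look something like the following. oldest_of_kind is a made-up name for illustration; it only picks the candidate, it doesn't delete anything.

```shell
#!/bin/sh
# Sketch only: oldest_of_kind is a hypothetical helper, not part of ibs.
# Usage: oldest_of_kind BACKUPDIR TARGET
# Prints the oldest backup of the same kind (ME or daily) as TARGET.
oldest_of_kind() (
    LC_ALL=C; export LC_ALL             # plain byte-order sorting of names
    cd "$1" || exit 1
    case $2 in *-ME) want=me ;; *) want=daily ;; esac
    for d in [0-9][0-9][0-9][0-9]-[0-9][0-9]-*; do   # oldest name first
        [ -d "$d" ] || continue
        case $d in *-ME) kind=me ;; *) kind=daily ;; esac
        if [ "$kind" = "$want" ]; then
            echo "$d"
            exit 0
        fi
    done
    exit 1                              # nothing of that kind found
)
```

With the node1 listing from the example, a daily target such as 2022-08-07 picks 2022-07-30, and an ME target such as 2022-09-01-ME picks 2020-00-01-ME.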

Filter handling

rsync has a nifty, though perhaps not totally intuitive, filtering system for which files to include and ignore. IBS as presented here has two filters, a default and a per-machine filter. If no per-machine filter is used, then there is a fallback to the default filter; if the machine specific filter exists, then it is used and the default is ignored. Early versions of IBS had the filter in the $IBSBASE/$HOST directory, but this meant they didn't normally get backed up, and that could be kinda important, so they got moved to the config directory, typically /etc/ibs/.

The syntax of the --filter=filter.file files is discussed here.
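
The fallback between the two filters is simple enough to sketch. The file names used here ($HOST.filter and default.filter under $CONFIGDIR) are assumptions for illustration; check the script for the names it really uses.

```shell
#!/bin/sh
# Sketch only: pick_filter and both file names are assumptions.
# Usage: pick_filter HOST
# Prints the per-machine filter if it exists, otherwise the default.
pick_filter() {
    if [ -f "$CONFIGDIR/$1.filter" ]; then
        echo "$CONFIGDIR/$1.filter"
    else
        echo "$CONFIGDIR/default.filter"
    fi
}

# The result could then be handed to rsync as a merge rule, e.g.:
#   rsync ... --filter="merge $(pick_filter node1)" ...
```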

However, at one installation I did, the actual data that needed to be backed up consisted of a big data directory and a bunch of development directories, of which only tiny amounts needed to be backed up (the development work, not the data, which was just a copy of the old data from some time back and of zero value). The production directory was not the same on each system, and the number of development directories could change from day to day without the backup administrators having any idea it was happening. It begged for automated generation of filters on a per-system basis, every night. This was entirely doable, but did take a bit of testing. But the result was fantastic -- every backup had all the production data, plus just the development reports and applications, not the test data. The disk savings on the system was huge, maybe 1/20th of what "grab everything" would have taken.

So this is definitely something that could be customized for individual sites.

Log files

An important part of the IBS system is the fairly verbose logging provided by rsync, but some additional information is stuffed into the log files before rsync is run. I have always stuffed the log files into $IBSBASE/z-logs but that could be a bad idea for many environments. I figured I was unlikely to create systems to back up starting with a 'z', so this puts them at the very end of a directory listing, but in many cases, this is not a valid assumption. Feel free to stick your logs into /var/log or some other good location for your needs. These files will need to be cleaned out once in a while.

bureport and reportheader

Together, "reportheader" and "bureport" functions create a report that should be run and sent out to all administrators daily (or as often as you run IBS backups). "reportheader" displays everything before the individual system output, "bureport" displays one line of output for each system, and is invoked with the name of the system to be reported upon. So for a full report, it looks like:

reportheader
for host in $HOSTS; do    # HOSTS: placeholder for the list of systems
    bureport "$host"
done

In this script, the reportheader function checks and displays the status of the disk array, the space used and available on all the storage arrays, and the time the last backup was completed at. How these things are done will vary between platforms and even configurations. But all are recommended to be done somewhere, somehow.

There's a bit of math and digging through the report files looking for the important numbers. I've had past issues with rsync changing the way some of the data is displayed, so I'd expect at times in the future, you will have to change the strings grep'ed for in the report files as rsync gets updated. You may even have to get fancy if rsync changes the way info is presented and you can't update all your rsync's at the same time.

I'm very fond of putting the EOT (End of Text) line at the end of reports and other output. Somewhere, somewhen I saw "End of Text" used as a signal that "all output you expect has been delivered", and it stuck with me. I may be the only person that uses it like that in the 21st century, so feel free to change the message, but I think all reports and programs should end with a message saying what program/script generated the report, what the command line was, who ran it, and what system it came from. It has proven useful many times.
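
A footer along those lines takes very little code. This is only a sketch: report_footer is a hypothetical name and the exact wording of the line is up to you.

```shell
#!/bin/sh
# Sketch only: report_footer is a hypothetical name.
# Call it as: report_footer "$0" "$@"
# Save the command line early if option parsing shifts arguments away,
# e.g. CMDLINE="$0 $*" at the top of the script.
report_footer() {
    echo "EOT: $* (run by $(id -un) on $(uname -n))"
}
```

Called as report_footer "$0" "$@", it prints the script name and arguments, then the user and the machine name.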

Main body

First we look for "dash" options, and set the default CMD and other options based on those. The parser is very primitive and it is probably fairly easy to craft a command line it will do the wrong thing with. Hopefully not a destructive wrong thing.

Note that the "prep" function is handled a bit differently. The function directly gets the list of machines to prepare for backups, no lists are expanded, no leading and trailing stuff is trimmed out.

Then, we fluff out the list of machines that are to be processed. Names are normalized -- leading directories and trailing slashes are removed. IF the target ends in a .list, it is assumed to be a list of machine names in the $CONFIGDIR directory, in which case, the comments and blank lines are stripped out of that file and the list of machines in that file are added to the list of things to process.
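
The normalizing and list expansion could be sketched as below. CONFIGDIR is from the text; expand_hosts is a made-up name, and the sketch assumes comments in a .list file start with '#'.

```shell
#!/bin/sh
# Sketch only: expand_hosts is a hypothetical name; '#' comments assumed.
# Usage: expand_hosts NAME_OR_LIST...
# Prints one machine name per line.
expand_hosts() {
    for arg in "$@"; do
        arg=${arg%/}            # drop one trailing slash
        arg=${arg##*/}          # drop any leading directories
        case $arg in
            *.list)
                # strip comments, surrounding whitespace and blank lines
                sed -e 's/#.*//' \
                    -e 's/^[[:space:]]*//' \
                    -e 's/[[:space:]]*$//' \
                    -e '/^$/d' "$CONFIGDIR/$arg"
                ;;
            *)
                echo "$arg"
                ;;
        esac
    done
}
```

So "expand_hosts /bu/node1/ web.list" would print node1 followed by whatever machines are named in $CONFIGDIR/web.list.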

Finally, for each machine, the requested task is run. Time-consuming tasks are backgrounded; fast tasks are left in the foreground. The main loop keeps an eye on how many rsync processes are running: if the number passes $MAXRSYNCS, the new task is held off until the number of rsyncs drops below $MAXRSYNCS.
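
A throttle like that might be sketched as follows. MAXRSYNCS is from the text; count_rsyncs, wait_for_slot and POLL are hypothetical names, and using pgrep to count processes is an assumption about what is available on your platform. This version simply counts every process named rsync and waits while the count is at or above the limit.

```shell
#!/bin/sh
# Sketch only: count_rsyncs, wait_for_slot and POLL are hypothetical.

# Count running processes named exactly "rsync" (assumes pgrep exists).
count_rsyncs() {
    echo $(( $(pgrep -x rsync | wc -l) ))
}

# Block until the number of rsyncs is below MAXRSYNCS.
wait_for_slot() {
    while [ "$(count_rsyncs)" -ge "$MAXRSYNCS" ]; do
        sleep "${POLL:-10}"
    done
}

# The main loop would then look something like:
#   for host in $HOSTS; do
#       wait_for_slot
#       buhost "$host" &
#   done
#   wait
```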

Possible improvements:

Found a situation where a file system had an error on it that prevented the removal of a subdirectory, and thus of the entire backup directory. This resulted in a build-up of extra backup directories, since IBS always tried (and failed) to delete the oldest. A possible fix would be to rename the failed backup directory to another name, say, rm-YYYY-MM-DD, and then rm that. bureport should then check for any stray rm-* directories and report them. On the other hand, in almost two decades of doing this kind of backup, I've only seen this problem once, so not sure it's worth the slight additional complexity of the code. (btw: the fix was to dismount the volume in question, fsck it, remount it, and then rm the extra directories)
I'd like to compress and tar the existing log files when they get old. These log files compress well, tar'ing them together into one file would help them be easier to manage. I'm thinking gzip the file, then tar -r it to an annual archive.
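
Neither of these is in ibs today, but both ideas could be sketched roughly as follows. All three function names and the logs-YEAR.tar archive name are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: retire_backup, list_stray, archive_log and the archive
# name logs-YEAR.tar are all hypothetical.

# Rename a backup directory to rm-NAME first, then remove it, so a
# failed removal no longer looks like a valid backup.
retire_backup() (
    dir=${1%/}
    doomed=$(dirname "$dir")/rm-$(basename "$dir")
    mv "$dir" "$doomed" || exit 1
    rm -rf "$doomed"
)

# For the report: print any rm-* leftovers under a host's directory.
list_stray() {
    for d in "$1"/rm-*; do
        [ -e "$d" ] && echo "$d"
    done
    return 0
}

# gzip one log file, then append it to that year's tar archive,
# which lives in the same directory as the log.
archive_log() (
    log=$1 year=$2
    cd "$(dirname "$log")" || exit 1
    name=$(basename "$log")
    gzip "$name" || exit 1
    if [ -f "logs-$year.tar" ]; then
        tar -rf "logs-$year.tar" "$name.gz" || exit 1
    else
        tar -cf "logs-$year.tar" "$name.gz" || exit 1
    fi
    rm "$name.gz"
)
```

The tar archive itself stays uncompressed, since tar -r can only append to a plain archive; the compression happens per file before it goes in.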

Contact Holland Consulting

since August 8, 2022