The code for IBS can be downloaded here: ibs.txt. You will want to rename it to drop the .txt extension when you install it.
Earlier versions of this "application" were multiple scripts -- one to prep new directories, one to actually do the backup, one to control the rate of launch of the backups and gather the output to a script, etc. The problem I ran into with these was trying to keep them all in sync. So, I decided to pull them all into one bigger script. Probably not a good idea while developing an idea, but once it is demonstrated, I think it will work better.
Errors that this system will run into boil down to two big types:
set some variables
Test if it was a request to generate a config template file
check and source the config file
check that all variables are set and maybe valid before going further
function preflightcheck -- test individual node for likely failure
function prep -- build and link a directory structure for a new backup client
function buhost -- backup one host
function reportheader -- print header for the backup report
function bureport -- print backup summary info for ONE host
Main body:
Look for "dash options", both "additive" options like -v and commands to execute
expand command line host lists into a list of individual machines
for all machines
Run the intended command against the list of individual machines -- backgrounded.
It also tests for a minimum number of daily and month-end backups. Early versions sometimes managed to rm all their old backups in the event of a serious problem that wasn't fixed, and administrators will sometimes "pull" too many backups out of the rotation without putting new ones in. The current setup assumes that all systems have the same minimum number of ME and daily backups. This may not be a good decision in your case, and maybe a per-host config file would be good.
Another benefit of checking for a minimum number of daily and ME backups is that if the file system the backups are stored on isn't mounted, this test will fail. A good application would give explicit error messages that were different between "insufficient number of backups" and "whoah! Where's the data?".
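As a rough sketch of what "explicit and different" error messages could look like, the helper below counts daily and ME directories and returns a different status for "directory missing" than for "too few backups". The name check_backup_counts, its arguments, and the return codes are made up for illustration; this is not how ibs itself does it.

```shell
# Hypothetical helper, not taken from ibs.
# Usage: check_backup_counts DIR MIN_DAILY MIN_ME
# Returns 0 if OK, 1 if too few backups, 2 if DIR does not exist.
check_backup_counts() (
    dir=$1 min_daily=$2 min_me=$3

    if [ ! -d "$dir" ]; then
        echo "ERROR: $dir not found; is the backup file system mounted?" >&2
        return 2
    fi

    daily=0 me=0
    # The trailing slash makes the glob match directories only.
    for d in "$dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*/; do
        [ -d "$d" ] || continue      # glob matched nothing
        case ${d%/} in
            *-ME) me=$((me + 1)) ;;
            *)    daily=$((daily + 1)) ;;
        esac
    done

    if [ "$daily" -lt "$min_daily" ] || [ "$me" -lt "$min_me" ]; then
        echo "ERROR: $dir has $daily daily and $me ME backups;" \
             "need at least $min_daily and $min_me" >&2
        return 1
    fi
    return 0
)
```

The function body runs in a subshell, so its variables do not leak into the caller.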
I use 2000-00-xx and 2000-00-xx-ME for empty template directories, because they are obviously just place holders, but will be rotated out like any "real" backup directory. Maybe you want to vary the number of retained backups by system. You can do that simply by adding (or removing) daily or ME backups -- the number of backups before a run and after the run should be the same.
Most of the IBS operations use just a system name, and any path info is stripped off. "prep" needs a complete file name, so it is invoked differently by the main code of the script, and does not expand the *.list files. "prep" can also prepare multiple IBS systems in one invocation, but list expansion doesn't make sense, so I don't see this as a limitation. But being this was kinda bolted on late in the coding process, additional systems to prep are handled by shifting the parameters and recursively invoking prep again. I initially considered it a lazy hack, but I'm kinda liking how it doesn't impact readability of the code much. Opinions may vary.
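The shift-and-recurse pattern described above can be sketched in a few lines. This is only an illustration of the pattern: the echo stands in for the real directory-building work, and the body is an assumption, not the prep function from ibs.

```shell
# Illustration only: handle the first argument, then recurse on the rest.
prep() {
    [ $# -gt 0 ] || return 0      # nothing left to prepare
    echo "prep: would build directory structure for $1"
    shift
    prep "$@"
}
```

Calling prep with three arguments would handle them one at a time, in order, and stop when the argument list is empty.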
A "prep" tends to be done once per system, but if you want a "per host" configuration of retention policy, maybe the right answer is for IBS to "normalize" the number of backups every time it runs. That way, if a backup is "pulled" for any reason, a new replacement will be added automatically.
Perhaps an example might help about now. Let's say this is our backup directory for a system named "node1":
/bu/node1 $ ls -1
2020-00-01-ME/
2022-00-01-ME/
2022-00-02-ME/
2022-06-01-ME/
2022-07-01-ME/
2022-07-30/
2022-07-31/
2022-08-01-ME/
2022-08-02/
2022-08-03/
2022-08-04/
2022-08-05/
2022-08-06/
IF today is 2022-08-07, the backup has not been run yet today. So, 2022-07-30 will be deleted, and a new 2022-08-07 will be created, and 2022-08-06 will be the source of links from files that haven't changed.
So the command will be something like:
# rsync $OPTIONS --link-dest 2022-08-06 $HOST $IBSBASE/$HOST/2022-08-07
However, if today is 2022-08-06, a backup has already been attempted today, so nothing will be deleted, 2022-08-05 will be the source for links of unchanged files, and the 2022-08-06 backup will be resumed into the existing 2022-08-06 directory.
So in this case, the command will look something like:
# rsync $OPTIONS --link-dest 2022-08-05 $HOST $IBSBASE/$HOST/2022-08-06
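Both cases come down to "use the newest backup that isn't today's". A minimal sketch of that selection follows. The name pick_link_dest is invented for this example, and it assumes the directory names sort chronologically, which the YYYY-MM-DD naming gives you.

```shell
# Hypothetical helper, not taken from ibs.
# Usage: pick_link_dest DIR TODAY
# Prints the newest backup directory name under DIR that is not TODAY
# (or TODAY-ME). Returns non-zero if there is no earlier backup.
pick_link_dest() (
    dir=$1 today=$2
    LC_ALL=C                      # plain byte-order sorting of the glob
    prev=
    for d in "$dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*/; do
        [ -d "$d" ] || continue   # glob matched nothing
        name=${d%/}               # drop trailing slash
        name=${name##*/}          # drop leading path
        case $name in
            "$today"|"$today"-ME) continue ;;
        esac
        prev=$name                # later names overwrite earlier ones
    done
    [ -n "$prev" ] && echo "$prev"
)
```

With the example directory above, a TODAY of 2022-08-07 would give 2022-08-06, and a TODAY of 2022-08-06 would give 2022-08-05.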
The --link-dest of the current backup is the previous (hopefully good) backup, whether that is an ME or not. The only place where the ME backups are handled significantly differently is during the deletion step: -ME backups are only deleted when new -ME backups are being made, and non-ME backups are only deleted when non-ME backups are being made.
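The "delete only your own kind" rule could be sketched like this. The function name oldest_of_kind and the "ME"/"daily" keywords are assumptions made for the example. It only prints the candidate; actually removing it is left to the caller.

```shell
# Hypothetical helper, not taken from ibs.
# Usage: oldest_of_kind DIR KIND     (KIND is "ME" or "daily")
# Prints the oldest backup directory name of that kind under DIR.
oldest_of_kind() (
    dir=$1 kind=$2
    LC_ALL=C                      # plain byte-order sorting of the glob
    for d in "$dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*/; do
        [ -d "$d" ] || continue   # glob matched nothing
        name=${d%/}
        name=${name##*/}
        case $name in
            *-ME) k=ME ;;
            *)    k=daily ;;
        esac
        if [ "$k" = "$kind" ]; then
            echo "$name"          # first match is the oldest
            return 0
        fi
    done
    return 1                      # no backup of that kind found
)
```

Because the names sort by date, the first match of the requested kind is the deletion candidate, and a daily run never sees an -ME directory as a candidate.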
The syntax of the --filter=filter.file files is discussed here.
However, at one installation I did, the data that needed to be backed up consisted of a big production data directory and a bunch of development directories, of which only a tiny amount needed to be backed up (the development work itself, not the data, which was just a copy of some old production data and of zero value). The production directory was not the same on each system, the number of development directories could change from day to day, and the backup administrators might have no idea it was happening. It begged for automated generation of filters on a per-system basis, every night. This was entirely doable, but did take a bit of testing. But the result was fantastic: every backup had all the production data, plus just the development reports and applications, not the test data. The disk savings were huge; the backups took maybe 1/20th of the space "grab everything" would have taken.
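To give a feel for what nightly filter generation might involve, here is a very small sketch. Everything about the layout is assumed for the example: it supposes each development directory keeps its throwaway data in a subdirectory called data, and that the list of development directories arrives one per line on standard input, with paths relative to the top of the transfer. The real installation's rules were certainly more involved.

```shell
# Hypothetical generator, not taken from ibs.
# Reads development directory paths on stdin (one per line) and prints
# one rsync exclude rule per directory for its assumed "data" subdirectory.
gen_excludes() {
    while IFS= read -r devdir; do
        [ -n "$devdir" ] || continue          # skip blank lines
        # "- pattern" is an rsync exclude rule; the trailing slash
        # restricts the match to directories.
        printf '%s %s/data/\n' - "${devdir%/}"
    done
}
```

The output would be written to the per-system filter file before that night's rsync run.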
So this is definitely something that could be customized for individual sites.
In this script, the reportheader function checks and displays the status of the disk array, the space used and available on all the storage arrays, and the time the last backup was completed at. How these things are done will vary between platforms and even configurations. But all are recommended to be done somewhere, somehow.
reportheader
for each host; do
    bureport host
done
There's a bit of math and digging through the report files looking for the important numbers. I've had past issues with rsync changing the way some of the data is displayed, so I'd expect at times in the future, you will have to change the strings grep'ed for in the report files as rsync gets updated. You may even have to get fancy if rsync changes the way info is presented and you can't update all your rsync's at the same time.
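As an example of the kind of digging involved, the sketch below pulls the byte count out of the "Total transferred file size" line that rsync prints with --stats. The function name xfer_bytes is made up, and the exact string grep'ed for is the sort of thing that may need adjusting for your rsync version. Commas are stripped because some rsync versions print thousands separators and others do not.

```shell
# Hypothetical helper, not taken from ibs.
# Usage: xfer_bytes REPORTFILE
# Prints the transferred byte count as a plain number.
xfer_bytes() {
    grep '^Total transferred file size:' "$1" |
        sed -e 's/^[^:]*: *//' \
            -e 's/ .*$//' \
            -e 's/,//g'
}
```

This assumes the number is printed as plain digits with optional commas; a human-readable value with a unit suffix would need further handling.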
I'm very fond of putting the EOT (End of Text) line at the end of reports and other output. Somewhere, somewhen, I saw "End of Text" used as a signal that all the output you expect has been delivered, and it stuck with me. I may be the only person that uses it like that in the 21st century, so feel free to change the message, but I think all reports and programs should end with a message saying what program/script generated the report, what the command line was, who ran it, and what system it came from. It has proven useful many times.
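A footer along those lines could be sketched as below. The function name report_footer and the exact wording of the line are invented for this example; the format ibs actually prints may differ.

```shell
# Hypothetical helper, not taken from ibs.
# Usage: report_footer PROGRAM [ARGS...]
# Prints one line naming the program, its command line, the user who
# ran it, and the system it ran on.
report_footer() (
    prog=$1
    shift
    echo "EOT: $prog $* (run by $(id -un) on $(uname -n))"
)
```

In a script it would typically be called as report_footer "$0" "$@", with the original arguments saved before any option parsing shifts them away.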
Note that the "prep" function is handled a bit differently. The function directly gets the list of machines to prepare for backups, no lists are expanded, no leading and trailing stuff is trimmed out.
Then, we fluff out the list of machines that are to be processed. Names are normalized -- leading directories and trailing slashes are removed. IF the target ends in .list, it is assumed to be a list of machine names in the $CONFIGDIR directory, in which case the comments and blank lines are stripped out of that file and the machines listed in it are added to the list of things to process.
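The expansion step might look something like the following sketch. The function name expand_targets is invented, and it assumes the .list files hold one machine name per line with # starting a comment; only $CONFIGDIR comes from the description above.

```shell
# Hypothetical helper, not taken from ibs.
# Usage: expand_targets TARGET...
# Prints one machine name per line.
expand_targets() (
    for t in "$@"; do
        t=${t%/}              # drop a trailing slash
        t=${t##*/}            # drop leading directories
        case $t in
            *.list)
                # Strip comments, surrounding whitespace, and blank lines.
                sed -e 's/#.*//' \
                    -e 's/^[[:space:]]*//' \
                    -e 's/[[:space:]]*$//' \
                    -e '/^$/d' "$CONFIGDIR/$t"
                ;;
            *)
                echo "$t"
                ;;
        esac
    done
)
```

Tab completion tends to produce arguments like /bu/node1/, which is why the trimming comes first.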
Finally, for each machine, the requested task is run. Time-consuming tasks are backgrounded; fast tasks are left in the foreground. The main loop keeps an eye on how many rsync processes are running: if the number passes $MAXRSYNCS, the new task is held off until the number of rsyncs drops below $MAXRSYNCS.
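The throttling idea can be sketched as a small polling loop. This is an assumption about one way to do it, not the loop from ibs: the names count_rsyncs, wait_for_slot, and POLL are made up, the default of 4 is arbitrary, and it relies on pgrep being available on the backup server.

```shell
# Hypothetical sketch, not taken from ibs.
MAXRSYNCS=${MAXRSYNCS:-4}         # arbitrary default for the example

# Count running rsync processes. The arithmetic expansion strips the
# padding some versions of wc put around the number.
count_rsyncs() {
    echo $(( $(pgrep -x rsync | wc -l) ))
}

# Block until fewer than $MAXRSYNCS rsync processes are running.
wait_for_slot() {
    while [ "$(count_rsyncs)" -ge "$MAXRSYNCS" ]; do
        sleep "${POLL:-10}"       # POLL: seconds between checks
    done
}
```

The main loop would call wait_for_slot before launching each backgrounded backup.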
since August 8, 2022
Page Copyright 2022, Nick Holland, Holland Consulting.