Incremental Backup System
In the never-ending quest for a Good Backup System, I present
this bit of "crap code" -- Incremental Backup System, or IBS.
TL;DR version
IBS provides a very useful, very capable disk-to-disk backup system. After the
first backup run, all following backups are incremental based on the
previous day's backups, and yet through the magic of Unix hard links,
each individual backup LOOKS like a "full" backup, but only takes up
disk space incrementally.
Restoring from one of these incremental backups is trivial -- absolutely
everything you backed up is in one place, so there's no need to restore
the "full backup" and then the ten or so "incrementals".
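As a rough illustration of the hard-link trick described above, here is a minimal sketch built on rsync's --link-dest option. The function name, the directory layout and the date-style backup names are assumptions made for this example; this is not the actual ibs script.

```shell
#!/bin/sh
# Sketch only: backup_once and its directory layout are hypothetical,
# not taken from the real ibs script.
#
# backup_once SRC DEST_ROOT NAME [PREVIOUS_NAME]
#   SRC           directory to back up (may be a remote such as user@host:/etc)
#   DEST_ROOT     absolute path holding all backups of this system
#   NAME          directory name for this backup, e.g. 2022-02-21
#   PREVIOUS_NAME name of the backup to hard-link unchanged files against
backup_once() {
    src=$1
    root=$2
    name=$3
    prev=${4:-}
    mkdir -p "$root" || return 1
    if [ -n "$prev" ] && [ -d "$root/$prev" ]; then
        # Files unchanged since PREVIOUS_NAME are stored as hard links,
        # so only changed files take up new disk space.
        rsync -a --link-dest="$root/$prev" "$src/" "$root/$name/"
    else
        # First run: nothing to link against, so this is a real full copy.
        rsync -a "$src/" "$root/$name/"
    fi
}
```

A call might look like `backup_once root@web1:/etc /backups/web1 2022-02-21 2022-02-20` (host and paths are made up). Each resulting directory can be browsed or restored from as if it were a full backup.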
The real magic in IBS backups is in the usefulness of the backup. A
traditional backup tends to be useful for only one thing -- restoring
data. IBS backups can be used to quickly find answers to common
administrative questions, such as, "I see this user 'bob' -- when did it
get added to this system? How many systems does 'bob' exist on? What
DNS resolvers are my systems using?" If you can ask a question in a way
that the existence or the contents of a file can answer it,
you can query an entire corporation of servers in moments.
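To make the 'bob' questions concrete, here is a small sketch. It assumes a hypothetical layout of ROOT/HOST/YYYY-MM-DD/etc/passwd, where the date-named directories sort in chronological order; the real IBS layout and naming may differ, and both function names are made up for this example.

```shell
#!/bin/sh
# Sketch only: assumes backups live in ROOT/HOST/YYYY-MM-DD/ (hypothetical).

# passwd_has_user FILE USER: true if USER is a login name in FILE.
passwd_has_user() {
    [ -f "$1" ] || return 1
    cut -d: -f1 "$1" | grep -qxF -- "$2"
}

# first_backup_with_user ROOT HOST USER
# Prints the oldest backup of HOST in which USER exists.
first_backup_with_user() {
    # Shell globs expand in sorted order, so ISO dates come out oldest first.
    for dir in "$1/$2"/*/; do
        if passwd_has_user "${dir}etc/passwd" "$3"; then
            basename "$dir"
            return 0
        fi
    done
    return 1
}

# hosts_with_user ROOT USER
# Lists every host whose newest backup contains USER.
hosts_with_user() {
    for hostdir in "$1"/*/; do
        latest=
        for dir in "$hostdir"*/; do
            if [ -d "$dir" ]; then latest=$dir; fi
        done
        [ -n "$latest" ] || continue
        if passwd_has_user "${latest}etc/passwd" "$2"; then
            basename "$hostdir"
        fi
    done
}
```

With that layout, `first_backup_with_user /backups web1 bob` answers "when did bob show up?" and `hosts_with_user /backups bob` answers "where does bob exist?". The same pattern works for etc/resolv.conf or any other file.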
Dovetailing nicely into this is the File Alteration Reporting Tool,
which looks for files that have been changed that you didn't expect
changes on. Using FART is not required for IBS to be very useful,
but at least as implemented here, FART is dependent upon IBS.
Note that what I'm providing here is a set of sample scripts based on
lessons learned over almost 20 years of using this process. This is
not being presented as a "finished product". No apology is made for
this -- every environment is different in some ways, and at some point
it is easier to just implement a system that does exactly what you
want than to crow-bar an off-the-shelf solution into your needs.
If you are after a turn-key solution, you should probably look elsewhere.
I'm interested. Tell me more
What this system does well:
- Incremental backups: After the first backup, all backups should be
purely incrementals from the backup before.
- "rsync" and "ssh" are the backup client on the remote system being
backed up. Most Unix-like OSs either include these tools or have them
readily available for installation.
- Disk-based backups. Cheap media, cheap systems they attach to.
- Minimal network traffic
- Efficient use of disk space -- store many backups in a modest amount of
space.
- Incredibly useful backups. Most backups are pretty much useless until
you need to restore data; you will find these IBS backups are something you
refer to regularly. This has potential security issues, but also means
that your IBS system will be maintained and used -- you are unlikely to go looking
for a backup and find out the system died a month ago and no one noticed.
I've also found that while these systems often start out being wedged into
a corner of a machine with some extra disk space, after a while, the utility
of this system ends up justifying the maintenance and upgrade of the IBS
system.
- Good protection from ransomware attacks -- no code from the machines
being backed up is run on the server, so a ransomware attack on the machine
being backed up should not be a problem on the backup server.
In fact, if your backup size and time suddenly explode on you, that may
be an indication that a ransomware attack just took place, changing
huge quantities of what should be "stable" files.
If the backup server is a different (and incompatible) Unix version than
the backed up systems, this probably becomes "very good protection".
However, it is still possible to infect the backup server separately.
- Saving a backup for more than the standard rotation is trivial --
just rename the backup directory to something that is non-standard and it
is effectively pulled out of the rotation.
Don't forget to put a replacement in, though!
- Can make backups of very old systems. If the system to be backed up
has or can have any version of rsync installed, this system can work.
This is not a justification for running obsolete systems, but ... I know
it happens. And backups should be made. I used this to make backups of
Solaris 6 systems in the Solaris 10 days.
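The "backup suddenly explodes" warning sign mentioned in the list above can be checked mechanically. This is a sketch under stated assumptions, not part of IBS: it assumes the backup was made with rsync --link-dest and without preserving the source's own hard links, so that in the newest backup any regular file with a link count of 1 was stored fresh rather than linked from an earlier run. The function names and the threshold are invented for the example.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical, not taken from ibs.

# count_new_files BACKUP_DIR
# Counts regular files with a link count of 1, i.e. files that were new or
# changed in this backup. Only meaningful for the newest backup, because
# later backups add links to its files. File names containing newlines
# would be over-counted.
count_new_files() {
    find "$1" -type f -links 1 | wc -l | tr -d ' '
}

# check_backup_growth BACKUP_DIR THRESHOLD
# Returns non-zero and prints a warning if more than THRESHOLD files changed.
check_backup_growth() {
    n=$(count_new_files "$1")
    if [ "$n" -gt "$2" ]; then
        echo "WARNING: $1 has $n new or changed files (threshold $2)"
        return 1
    fi
    echo "ok: $1 has $n new or changed files"
}
```

What counts as "too many" depends entirely on the system being backed up, so the threshold is something you would tune by watching a few normal days first.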
What this system DOESN'T do so well:
- Geographic diversity in the backup storage. All your backups are
in one basket. That being said -- it is very easy to set up multiple IBS
systems at different locations.
- Long-term archival storage: If you back something up on an IBS
system and put it away for ten years without any maintenance, odds are,
you will have trouble getting it back. The hard drive may not work, you
may be unable to find hardware that can read it, the OS that it was
built on may not run on newer hardware, there may be no OS that can read the
old data from the disks.
However, moving old backups to new hardware and disks is very definitely
possible. Managing ten years of monthly backups might not be fun, though.
- Segmented security: Administrators of the IBS system will have full
visibility of everything on every system that IBS backs up, and full
access to the systems that are being backed up.
You will not be keeping secrets from your administrators.
- Encryption of data: while full disk encryption is an option, the files
will be visible in unencrypted form when the storage volumes are unlocked.
But that's where the "incredibly useful backups" comes in: to be able to
search for data, you need to be able to read the data on the backup system.
You need to decide what is right for your application and risks.
- Bare metal restoration: If you are hoping to tap a couple keys and
have a failed machine restore completely from a backup to "bare metal"
(empty machine), this isn't it. IBS is much more about restoring
individual files. You can do a full restore by reloading the OS, the
apps, then use IBS to restore the config files and data files. But you
will have to have enough understanding of your application to know where
those data and config files are.
- Permission and ownership restoration: This is primarily a file
restoration system. Simple Unix permissions are generally ok. File and
group ownership are a bit more tricky. Fancy stuff like SELinux or
other extensions beyond basic Unix permissions probably will not be
recovered well, especially if copied between different Unix platforms.
- Unix-centric: Unfortunately, this does not do a very good job at
non-Unix backups. This may be changing, now that Windows
supports ssh.
- Supervision: Tomorrow's backup efficiency is based on today's
backup going well. If today's backup fails for some reason, tomorrow's
should succeed, but it may be slow and network and disk space intensive.
The reporting application will spot failures for you to fix, but you
will need to do it. And that "bump" in backup disk space can last as
long as the backups are retained. A human will need to keep an eye on
the backups.
- Turnkey: This is not a "ready to roll" application, and much of the
configuration consists of rewriting scripts. Honestly, if you are a
skilled Unix administrator, you may well find rewriting my scripts to
your needs easier than learning the configuration adjustments of a
commercial product.
- Politically correct name: yep, the name was chosen to offend (in
a light-hearted silly sort of way). My conscience is clear, however:
the person who came up with the name "IBS" had plenty of digestive system
problems, has had them for decades, and chooses to laugh at them.
If this offends you, you have three options, in roughly the order of
recommendation:
- Get over it.
- Steal the idea and give it a more PC name.
- Reinvent the wheel.
Still interested? Ready for Special High Intensity Training!
The Script
- ibs: Most of the IBS functionality is
embedded in this one file. This has comments about how the code works.
Might be accurate.
Soon(?) to come:
- File Alteration Reporting Tool:
Monitor for changes in "interesting" files, ignore changes in files you
expect changes from. Intended as a security tool, looking for
unexpected changes in files, it has proven to be an invaluable system
monitoring tool.
Sounds complicated, can you help us?
Maybe. Contact me and let us see what we can
do together.
Who's the competition?
Well, I'm not anticipating making any money on the sale of this program, and can
only dream of maybe making some money on consulting and setup, so "competition"
isn't the right word. But, hey, the idea isn't mine, and while I think I've been
doing it longer than most, other people have come out with rsync --link-dest projects
as well. I'm just late to documenting it. If you have any other rsync --link-dest
projects that I should list here, let me know.
- Dirvish: This is where I
originally got the idea of using the magic of hardlinks and rsync.
Dirvish predates the rsync --link-dest option -- it basically
made a complete copy of the entire previous backup (via hard links) to
another directory, then rsync'd over it, grabbing the files that
changed. Brilliant idea, but really obsoleted by rsync
--link-dest. Kinda abandonware, it seems.
- rsnapshot: If you are looking for a
finished product rather than a starting point for your own uses, this
might be it. However, rsnapshot is 7000+ lines of Perl vs. less than
600 of ksh for IBS. You decide which you would rather adjust to your
needs (hey, download 'em both, try it! As I said, this isn't hostile
competition). Also -- their docs indicate they don't handle the "remote
system down" error condition well, which IBS does fine (now, IBS might
have 20 failure conditions rsnapshot handles better, just that one leapt
out at me).
- Tarsnap: OK, this is almost
completely different in every way from IBS, so it probably shouldn't be
here. However, Tarsnap's and IBS's strengths and weaknesses complement
each other, so IBS + Tarsnap might be a fantastic combination (and in
fact, that's how I've used IBS in corporate environments -- as an
additional backup system, not an ONLY backup system).
Page Copyright 2022, Nick Holland, Holland Consulting.