Incremental Backup System
In the never-ending quest for a Good Backup System, I present
this bit of "crap code" -- Incremental Backup System, or IBS.
TL;DR version
IBS provides a very useful, very capable disk-to-disk backup system. After the
first backup run, all following backups are incremental based on the
previous backup, and yet through the magic of Unix hard links,
each individual backup LOOKS like a "full" backup, but only takes up
disk space incrementally.
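As a minimal sketch of the mechanism described above (this is not the actual IBS script), a single backup run can be built around rsync's --link-dest option. The function name ibs_sketch_run, the argument order, and the directory layout are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch only: one backup run using rsync --link-dest.
# Usage: ibs_sketch_run SOURCE DEST_ROOT NEW_NAME [PREVIOUS_NAME]
# SOURCE may be a local path or an rsync-over-ssh source (host:/path/).
# DEST_ROOT should be an absolute path: rsync resolves a relative
# --link-dest against the destination directory, not the current one.
ibs_sketch_run() {
    src=$1 root=$2 new=$3 prev=${4:-}
    mkdir -p "$root" || return 1
    if [ -n "$prev" ] && [ -d "$root/$prev" ]; then
        # Files unchanged since $prev become hard links into $prev;
        # only new or changed files are transferred and stored.
        rsync -a --delete --link-dest="$root/$prev" "$src" "$root/$new/"
    else
        # First run: nothing to link against, so this is a full copy.
        rsync -a --delete "$src" "$root/$new/"
    fi
}
```

A call might look like: ibs_sketch_run root@web1:/etc/ /backups/web1 2024-05-02 2024-05-01 -- the host name, paths, and date-based backup names there are all hypothetical.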
The only dependencies are rsync (any modern version on the backup
server; incredibly old versions are supported on the client), and a POSIX
shell (ksh, sh or bash) on the backup server.
Restoring from one of these incremental backups is trivial -- absolutely
everything you backed up is in one place, so there's no need to restore the
"full backup" and then the ten or so "incrementals". You can directly
access all the backed up files of any historic backup; no reconstruction
is needed. You can directly inspect and compare any versions of files
between any backups.
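Because unchanged files are hard links to the same data, the inode number of a file tells you in which backups it actually changed. The following is a rough sketch of that idea; the function name file_versions and the layout ROOT/backup-name/path are assumptions, not something taken from the IBS scripts.

```shell
#!/bin/sh
# Sketch: show in which backups a given file changed.
# Assumed (hypothetical) layout: ROOT/<backup-name>/<relative path>,
# with backup names that sort chronologically (for example, dates).
# Usage: file_versions ROOT RELATIVE_PATH
file_versions() {
    root=$1 rel=$2 last=
    for dir in "$root"/*/; do
        f=$dir$rel
        [ -f "$f" ] || continue
        # Hard-linked (unchanged) copies share one inode number.
        inode=$(ls -i "$f" | awk '{print $1}')
        if [ "$inode" != "$last" ]; then
            echo "new version in: $(basename "$dir")"
            last=$inode
        fi
    done
}
```

Once you know which backups differ, an ordinary diff between the two copies shows the change, and an ordinary cp restores the one you want. Note that a new inode only means the file was stored again; a backup that ran without its predecessor to link against will show "new versions" whose contents are identical.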
The real magic in IBS backups is in the usefulness of the backup. A
traditional backup tends to be useful for only one thing -- restoring
data. IBS backups can be used to quickly find answers to common
administrative questions, such as, "I see this user 'bob' -- when did he
get added to this system? How many systems does 'bob' exist on? What
DNS resolvers are my systems using?" If you can ask a question in a way
that the existence or the contents of a file can answer it,
you can query an entire corporation of servers in moments.
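As an illustration of the 'bob' question, here is a small sketch that walks every host's backups and reports the first one in which the user appears. The layout ROOT/host/backup-name/etc/passwd and the function name user_first_seen are assumptions for the example; adjust them to however your backups are actually organized.

```shell
#!/bin/sh
# Sketch: answer "which systems have this user, and since when?"
# Assumed (hypothetical) layout: ROOT/<host>/<backup-name>/etc/passwd,
# with backup names that sort chronologically.
# Usage: user_first_seen ROOT USERNAME
user_first_seen() {
    root=$1 user=$2
    for host in "$root"/*/; do
        for backup in "$host"*/; do
            # passwd lines start with "name:", so anchor the match.
            if grep -q "^$user:" "${backup}etc/passwd" 2>/dev/null; then
                echo "$(basename "$host"): first seen in $(basename "$backup")"
                break
            fi
        done
    done
}
```

The answer is only as deep as your retention: if the user is already present in the oldest backup you kept, all you know is that the account was added at or before that point.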
I have implemented this system on everything from an Atom-powered netbook
doing "self-backups" of its hard disk to an SD card, to installations of
over 400 servers and over 20TB of data.
I've implemented it on hardware ranging from a Pentium II with a 250GB
hard disk to small AIX systems with SANs for storage.
Dovetailing nicely into this is the File Alteration Reporting Tool,
which looks for files that have been changed that you didn't expect
changes on. Using FART is not required for IBS to be very useful,
but at least as implemented here, FART is dependent upon IBS.
Note that what I'm providing here is a set of sample scripts based on
lessons learned over almost 20 years of using this process. This is
not being presented as a "finished product". No apology is made for
this -- every environment is different in some ways, and at some point
it is easier to just implement a system that does exactly what you
want than to crow-bar an off-the-shelf solution into your needs.
If you are after a turn-key solution, you should probably look elsewhere.
I'm interested. Tell me more
What this system does well:
- Incremental backups: After the first backup, all backups should be
purely incrementals from the backup before.
- "rsync" and "ssh" are the backup client on the remote system being
backed up. Most Unix-like OSs either include these tools or have them
readily available for installation. Almost any version of rsync will
work on the client.
- "rsync", "ssh" and a sh/ksh/bash compatible shell on the backup server
are all the dependencies.
- Disk-based backups. Cheap media, cheap systems they attach to.
- Minimal network traffic
- Efficient use of disk space -- store many backups in a modest amount of
space.
- Incredibly useful backups! Most backups are pretty much
useless until you need to restore data. You will find these IBS backups
are something you refer to regularly. This has potential security
issues, but also means that your IBS system will be maintained and used
-- you are unlikely to go looking for a backup and find out the system died a
month ago and no one noticed. I've also found that while these systems
often start out being wedged into a corner of a machine with some extra
disk space, after a while, the utility of this system ends up justifying
the maintenance and upgrade of the IBS system.
- Educational and informative backups: I have often found things were
broken when I found the IBS backup report showed abnormalities that other
monitoring systems missed.
- Good protection from ransomware attacks -- no code from the machines
being backed up is run on the server, so a ransomware attack on the machine
being backed up should not be a problem on the backup server.
In fact, if your backup size and time suddenly explode on you, that may
be an indication that a ransomware attack just took place, changing
huge quantities of what should be "stable" files.
If the backup server is a different (and incompatible) Unix version than
the backed up systems, this probably advances from "good" to "very
good protection".
However, it is still possible to infect the backup server separately.
- Saving a backup for more than the standard rotation is trivial --
just rename the backup directory to something that is non-standard and it
is effectively pulled out of the rotation.
Don't forget to put a replacement in, though!
- Can make backups of very old systems. If the system to be backed up
has or can have any version of rsync installed and can be reached
via ssh, this system can work.
This is not a justification for running obsolete systems, but ... I know
it happens. And backups should be made. I used this to make backups of
Solaris 6 systems in the Solaris 10 days.
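The ransomware point above suggests watching how much each backup changes. One rough way to measure that, sketched here under the assumption that backups are made with hard links to the previous run and that hard links within the source itself are not preserved, is to count the files in the newest backup whose link count is 1. The function name backup_churn is made up for this example, and what counts as "too much" change is for you to decide.

```shell
#!/bin/sh
# Sketch: estimate how much of the newest backup is new data.
# In a hard-linked backup, a regular file with link count 1 exists in
# this backup only, so it was new or changed in this run.
# (On the very first backup, every file will have link count 1.)
# Usage: backup_churn BACKUP_DIR
backup_churn() {
    dir=$1
    # $(( ... )) also strips the padding some wc implementations add.
    total=$(( $(find "$dir" -type f | wc -l) ))
    fresh=$(( $(find "$dir" -type f -links 1 | wc -l) ))
    echo "$fresh of $total files are new or changed"
}
```

Run against the most recent backup only; in older backups a link count of 1 can also mean the file changed again in the following run.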
What this system DOESN'T do so well:
- Geographic diversity in the backup storage. All your backups are
in one basket. That being said -- it is very easy to set up multiple IBS
systems at different locations.
- Long-term archival storage: If you back something up on an IBS
system and put it away for ten years without any maintenance, odds are,
you will have trouble getting it back. The hard drive may not work, you
may be unable to find hardware that can read it, the OS that it was
built on may not run on newer HW, there may be no OS that can read the
old data from the disks.
However, moving old backups to new hardware and disks is very definitely
possible. Managing ten years of monthly backups might not be fun, though.
- Segmented security: Administrators of the IBS system will have full
visibility of everything on every system that IBS backs up, and full
access to the systems that are being backed up.
You will not be keeping secrets from your administrators.
- Encryption of data: while full disk encryption is an option, the files
will be visible in unencrypted form when the storage volumes are unlocked.
But that's where the "incredibly useful backups" comes in: to be able to
search for data, you need to be able to read the data on the backup system.
You need to decide what is right for your application and risks.
- Bare metal restoration: If you are hoping to tap a couple keys and
have a failed machine restore completely from a backup to "bare metal"
(empty machine), this isn't it. IBS is much more about restoring
individual files. You can do a full restore by reloading the OS, the
apps, then use IBS to restore the config files and data files. But you
will have to have enough understanding of your application to know where
those data and config files are.
- Permission and ownership restoration: This is primarily a file
restoration system. Simple Unix permissions are generally OK. File and
group ownership are a bit more tricky. Fancy stuff like SELinux or
other extensions beyond basic Unix permissions probably will not be
recovered well, especially if copied between different Unix platforms.
- Unix-centric: Unfortunately, this does not do a very good job at
non-Unix backups. This may be changing, now that Windows
supports ssh.
- Supervision is required: Tomorrow's backup efficiency is based on today's
backup going well. If today's backup fails for some reason, tomorrow's
should succeed, but it may be slow and network and disk space intensive.
The reporting application will spot failures for you to fix, but you
will need to do it. And that "bump" in backup disk space can last as
long as the backups are retained. A human will need to keep an eye on
the backups, and will probably need to clean up ones that failed.
- Turnkey: This is not a "ready to roll" application, and much of the
configuration consists of rewriting scripts. Honestly, if you are a
skilled Unix administrator, you may well find rewriting my scripts to
your needs easier than learning the configuration adjustments of a
commercial product.
- Politically correct name: yep, the name was chosen to offend (in
a light-hearted silly sort of way). My conscience is clear, however:
the person who came up with the name "IBS" has had plenty of digestive
system problems for decades, and chooses to laugh at them.
If this offends you, you have three options, in roughly the order of
recommendation:
- Get over it.
- Steal the idea and give it a more PC name.
- Reinvent the wheel.
Still interested? Ready for Special High Intensity Training!
The Script
- IBS: Most of the IBS functionality is
embedded in this one file. This has comments about how the code works.
Might be accurate.
- File Alteration Reporting Tool:
Monitor for changes in "interesting" files, ignore changes in files you
expect changes from. Intended as a security tool looking for
unexpected changes in files, it has proven to be an invaluable system
monitoring tool.
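To give a feel for the file-alteration idea, here is a very small sketch that compares two backups and reports files that were added or changed, skipping paths you expect to change. This is not the FART code; the function name changed_files and the ignore patterns are hypothetical, and it does not report deleted files.

```shell
#!/bin/sh
# Sketch of the idea behind a file-alteration report; NOT the FART code.
# Reports files that are new or different in NEW compared with OLD,
# skipping paths that match an "expected to change" pattern.
# Usage: changed_files OLD_BACKUP NEW_BACKUP
changed_files() {
    old=$1 new=$2
    ( cd "$new" && find . -type f ) | sort | while IFS= read -r rel; do
        rel=${rel#./}
        case $rel in
            # Hypothetical ignore list: files expected to change daily.
            var/log/*|etc/adjtime) continue ;;
        esac
        if [ ! -e "$old/$rel" ]; then
            echo "added:   $rel"
        elif ! cmp -s "$old/$rel" "$new/$rel"; then
            echo "changed: $rel"
        fi
    done
}
```

Comparing contents with cmp is the simple approach; since unchanged files share an inode between backups, a faster variant could skip any pair with matching inode numbers before reading the data.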
Sounds complicated, can you help us?
Maybe. Contact me and let us see what we can
do together.
Who's the competition?
Well, I'm not anticipating making any money on the sale of this program,
and can only dream of maybe making some money on consulting and setup,
so "competition" isn't the right word. But, hey, the idea isn't mine,
and while I think I've been doing it longer than most, other people have
come out with rsync --link-dest projects as well. I'm just late to
documenting it. If you have any other rsync --link-dest projects that I
should list here, let me know.
- Dirvish: This is where I
originally got the idea of using the magic of hardlinks and rsync.
Dirvish predates the rsync --link-dest option -- it basically
made a complete copy of the entire previous backup (via hard links) to
another directory, then rsync'd over it, grabbing the files that
changed. Brilliant idea, but really obsoleted by rsync
--link-dest. Kinda abandonware, it seems.
- dump and restore: the classic Unix backup and restore.
Good for near bare-metal restoration of
systems back to exactly what they were when backed up, but otherwise,
near useless until you need them. And odds are, you haven't tested
a restore, so you have no idea if you can get your data back.
- rsnapshot: If you are looking for a
finished product rather than a starting point for your own uses, this
might be it. However, rsnapshot is 7000+ lines of Perl vs. less than
600 of ksh for IBS. You decide which you would rather adjust to your
needs (hey, download 'em both, try it! As I said, this isn't hostile
competition). Also -- their docs indicate they don't handle the "remote
system down" error condition well, which IBS does fine (now, IBS might
have 20 failure conditions rsnapshot handles better, just that one leapt
out at me).
- Tarsnap: OK, this is almost
completely different in every way from IBS, so it probably shouldn't be
here. However, Tarsnap's and IBS's strengths and weaknesses complement
each other, so IBS + Tarsnap might be a fantastic combination (and in
fact, that's how I've used IBS in corporate environments -- as an
additional backup system, not an ONLY backup system).
- rdiff-backup: Stores your
most current backup in easily restored form, older backups are stored
as "reverse diffs" of the more recent backups. Requires Python.
since February 20, 2022
Page Copyright 2022, 2023 Nick Holland, Holland Consulting.