I've been wanting to write something up on workflows for quite some time. My career and daily activities centre around information. I generate and collect information, parse it, process it, repackage it and, eventually, disseminate it. This takes time and energy, and finding new ways of making everything more efficient is, I think, generally positive.
I subscribe to the notion that data need to exist in several physical places, one of which should be offsite. Following this, my backup strategy more or less looks as follows:
There are several obvious components. The multiple hard drives, all named after Belgian cities or towns, form the physical backbone of my local backup procedures. In addition, I make use of two cloud services for off-site storage, syncing and archival purposes, as I'll explain below.
The heart of my backup is a series of clones that are updated using SuperDuper! each night as I sleep. Using a simple shell script (powered by Lingon 3), three USB hard drives (Desselgem, Klinge and Lobbes) are mounted at 12:55am every night. Each of these drives is a 2.5" drive at least 500GB in size. (Klinge is actually a two-bay enclosure holding two WD Caviar Black 7200RPM drives in a RAID 1 configuration.) Beginning at 1:00am, SuperDuper! launches and clones my primary computer (a late-2008 15" MacBook Pro) to each drive. I allow 1.5 hours for this to happen, after which Lingon 3 launches another shell script that unmounts the drives. The benefit of having a shell script doing the mounting and unmounting is that these drives are only used once a day (in the middle of the night), so there is little rational reason to have them spinning when not needed.
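The scripts themselves are trivial. A minimal sketch of the idea (the volume names are mine, but the script is illustrative rather than a copy of what Lingon 3 actually runs):

    #!/bin/sh
    # Mount (or unmount) the three backup volumes by name.
    # Scheduled via Lingon 3: "mount" at 12:55am, then "unmount"
    # once the clones are done, around 2:30am.
    ACTION="${1:-mount}"
    for VOLUME in Desselgem Klinge Lobbes; do
        diskutil "$ACTION" "$VOLUME"
    done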
As mentioned, another key part of my backup regime is off-site storage. This is managed in two ways. The first is an "occasional" clone (a 2.5" drive) that is kept at my university office. I say "occasional" because, truthfully, I'm not that diligent in keeping this off-site drive up-to-date. Much of the reason for this is the ubiquity and ease-of-use of the numerous cloud services available. I use two: Amazon's remarkable S3 service and the excellent SugarSync. I'll describe how I use each in turn.
S3 is great because it offers excellent reliability (although I've not really tested this extensively) at a reasonable price (US$0.125 per GB of storage per month). I use S3 to back up my key work and personal files. For my work files, I use Arq to poll my MBP every hour for any and all file changes, which subsequently get uploaded to S3. I keep all of my personal data in a large sparsebundle with heavy encryption. This is duplicated on S3, and I use Panic's Transmit (on an ad hoc basis) to mirror the contents to a separate S3 bucket. I've quite a few other S3 buckets. One is little more than a place to put large files that don't fit anywhere else but that I think I might eventually need. A good example is miscellaneous Skype recordings that I make when discussing research with overseas colleagues. I record the conversation, but have little interest in keeping the large file on my hard drive at all times. Thus, I've set up DropZone to allow for a drag-and-drop to this S3 bucket.
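Transmit and DropZone handle all of this through a GUI, but the same one-way mirror is easy to express on the command line. A sketch using s3cmd (the bucket name is hypothetical, not one of mine):

    # One-way mirror of the encrypted sparsebundle to its S3 bucket,
    # the same job Transmit does for me on an ad hoc basis. A
    # sparsebundle is just a directory of band files, so a directory
    # sync works fine.
    s3cmd sync --delete-removed \
        ~/Personal.sparsebundle/ s3://personal-mirror/Personal.sparsebundle/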
I also have several other S3 buckets that have been useful for longer-term storage (yes, I am aware of Glacier, but am waiting for apps to leverage the API). One functions as archival storage for my lecture recordings. I've recorded every lecture and public event I've spoken at since 2004. Narcissistic, perhaps, but having access to these comes in handy every now and then. All of these recordings clock in at nearly 10GB, and I don't need to keep anything prior to, say, 2009 on my MBP or any of my local backups. With S3, storing about 7GB of my old lectures costs a paltry US$10 or so per year (7GB × US$0.125 per GB per month × 12 months ≈ US$10.50).
Another S3 bucket I use for archival purposes contains old CVs. In academia, one's CV is a record of one's life. We use it to demonstrate our activities and, ultimately, remind ourselves of what we've done (and at what pace). There is really no rhyme or reason to when I elect to upload a CV to this bucket; it seems to be more or less random these days, thus begging for some kind of automation script, such as the sketch below.
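Something as small as this, scheduled weekly through Lingon 3, would do it (the path and bucket name here are hypothetical):

    #!/bin/sh
    # Archive the current CV to S3 under a date-stamped name,
    # e.g. cv-2012-11-04.pdf, so every upload is kept.
    CV="$HOME/Documents/cv.pdf"
    s3cmd put "$CV" "s3://cv-archive/cv-$(date +%Y-%m-%d).pdf"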
I have another S3 bucket that houses various iterations of my many DevonThink Pro Office (DTPO) databases. These can be quite large, like my lecture recordings, but having older copies may one day come in handy in case (knock on wood) something drastic happens to a particular database. DTPO is, however, a solid application, and I've never had a serious problem. Still, always better to be safe than sorry. Finally, my entire Dropbox folder is copied and updated to S3. This would seem a bit superfluous (given Dropbox itself uses S3), but it's peace of mind. As it turns out, this is not the only place where my Dropbox is backed up (see below).
One good thing about the S3 service is that you can choose the geographic location of your buckets. For instance, my work and personal buckets are located in Singapore, which geographically balances SugarSync's servers in the United States.
One of the challenges of working with two computers is keeping things synchronised. I find SugarSync manages this beautifully. My other computer is a 2011 11" MacBook Air. It is what I use when I travel or head to the office, and it drives all my Keynote presentations. Despite its importance in my workflow, however, the MBA is little more than a modern equivalent of a dumb terminal. It doesn't get backed up because everything on it already exists in several other places. With SugarSync, if I'm in my office (or overseas), the minute I save a file it is automatically pushed up to SugarSync's servers. Because the client is also running on my MBP at home, SugarSync senses a change in the file in the cloud and immediately pulls it down. Seamless. This can be pretty magical when, for instance, I'm in New Zealand working and my main computer in Winnipeg gets updated within seconds.
I've read about how important versioning is for coders and programmers, but I think it can be equally important for writers. I've tried git, but I didn't really take to it. Maybe in the future, but not right now. I wanted a cleaner way of managing versions of working files, and I think I found one. On the MBP, I use ChronoSync to copy new versions of files to an external hard drive (called Malderen in the figure above). It does this through the application's built-in scheduler, which launches every five minutes and pushes any new versions of files to that drive. So, while SugarSync pushes up the most recent version (and overwrites the previous one), ChronoSync is instructed to copy over the new file without erasing the previous one, thus creating a local versioning system. I can go back to any version that is at least five minutes old, and I can keep as many as I wish. This has come in handy a few times. As a backup to this versioning process, I have Malderen copy itself to another hard drive (Latinne) whenever the latter is mounted. The obvious weak link is that Latinne needs to be mounted manually, and I can easily forget to do this for, say, weeks on end. In my defense, Latinne is a large 3.5" drive inside a rather noisy enclosure, so I don't want the enclosure running all the time.
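ChronoSync does this through its GUI, but the same keep-everything pattern can be sketched on the command line with rsync (the source path here is hypothetical):

    #!/bin/sh
    # Copy changed files to the mirror on Malderen; anything about to
    # be overwritten is first moved into a time-stamped versions
    # directory, so no previous version is ever erased.
    STAMP=$(date +%Y-%m-%d-%H%M)
    rsync -a --backup --backup-dir="/Volumes/Malderen/versions/$STAMP" \
        "$HOME/Work/" /Volumes/Malderen/current/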
I mentioned Dropbox earlier. I use this service to work with colleagues all over the world, and it is extremely efficient. As noted above, I mirror the entire contents of my Dropbox (under 2GB) to Amazon's S3. I also apply version control to my entire Dropbox: every 10 minutes, ChronoSync checks my Dropbox folder for any updated files, which subsequently get copied to Malderen.
Wrap-up, and some caveats
There are a few things in my backup workflow that get triggered manually. First, keeping my numerous and large DTPO databases in sync between two computers is not feasible using SugarSync. When I am actively using multiple databases, the constant changes would trigger endless uploads (and downloads) to SugarSync's servers. This seems pointless, so I resort to a manually triggered ChronoSync script (they call it a synchronizer) that copies new databases from one computer to the other (wirelessly). This can take time, and I have to remember to do it, but it works. The same holds true for my personal sparsebundle, which contains pertinent family information and documents. Keeping this on two computers without using the cloud means another manual ChronoSync script/synchronizer. Some might call it cumbersome, but it is manageable.
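For what it's worth, the idea behind that synchronizer can be expressed in a single line of rsync over the local network (the hostname and paths are hypothetical, and this assumes Remote Login is enabled on the receiving Mac):

    # Push the DTPO databases from the MBP to the MBA over the LAN;
    # run by hand, like the ChronoSync synchronizer it stands in for.
    rsync -a --delete "$HOME/Databases/DTPO/" mba.local:Databases/DTPO/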
This entire backup system has been years in the making, and is constantly being refined. That said, I've finally got it to the point where I don't worry too much about data loss. Years ago, while a grad student in Canada, the 486 computer that my Master's advisor let me use for my own thesis work succumbed to a then-common boot sector virus. I didn't lose anything, but I thought I had. Since then, I've practiced what some might call overkill when it comes to backups, but I've never lost anything of importance for more than two minutes. Knock on wood.