Thursday, January 14, 2010

A Strategy for Data Backup

Backing up your data is a must. Since image files are the lifeblood of any photographer's business, it only makes sense to invest in a backup solution that protects the business' most important assets.

Q: What kind of solution is the best?

A: It depends!

For a commercial photographer who maintains a library of images that can be sold for stock, licensed, etc., a backup solution has to be about as bulletproof as possible. On the other hand, a wedding or portrait photographer is unlikely to do any work on a finished session from two years ago and would only need to keep the finished JPEGs and album design files from a client job. If they want to be a little more thorough, they could keep the RAW files and Lightroom catalogs.

Keep in mind that the cost of any solution climbs steeply as complexity grows. So if you want a mirrored system that includes offsite backup, RAID servers (never just one), and the ability to restore in a few hours -- you're talking serious dough.

Over this post and a follow-up, I'll outline what I think is a solid, cost-effective, and reliable solution for wedding and portrait photographers.

First I'd like to offer the tenets of my backup strategy:

  • Figure out what you need
  • Redundancy is not a bad word
  • Storage is cheap
  • Duplication is key
  • Complexity is the enemy
  • Keep your data moving
  • Take it off site
  • It's cheaper to buy a lot of drives than to recover one
  • Test your backups

Figure out what you need

Answering some basic questions about your business will shape your backup strategy and help you prioritize your efforts:

What kind of photographer are you?

How likely are you to use the images again for licensing or sale?

What is the data retention policy for your business?

What is the retention policy in your contract?

What is your shooting format (RAW, JPEG, DNG)?

What is your output format (JPEG, PSD, TIFF)?

Redundancy is not a bad word

The goal of any good backup system is redundancy. You want to have your data in a pristine state in multiple locations, preferably some on site and some off site.

Storage is cheap

Hard drives are ridiculously large, very reliable, and incredibly cheap. There is no excuse for not having backups. Hard drives and physical media (DVD, Blu-ray, flash drives) are getting cheaper, smaller, and denser all the time. As of January 2010 it's possible to get a 2 terabyte (2 x 1000 gigabytes) drive for less than $200. Blu-ray discs are about $30 for a 50 GB disc, which is easily enough space to back up a single job of RAW files. This low cost leads to my next tenet...
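To put those January 2010 prices side by side, here is a minimal Python sketch that works out the cost per gigabyte. The function name cost_per_gb is made up for illustration, and the $200 drive price is treated as an upper bound since the drive sells for "less than" that.

```python
def cost_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Return the price of one gigabyte of storage in US dollars."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return price_usd / capacity_gb


# January 2010 prices quoted in the post.
hard_drive = cost_per_gb(200, 2000)  # 2 TB drive at (just under) $200
blu_ray = cost_per_gb(30, 50)        # 50 GB Blu-ray disc at about $30

print(f"Hard drive: ${hard_drive:.2f} per GB")
print(f"Blu-ray:    ${blu_ray:.2f} per GB")
```

At these prices the hard drive works out to roughly a sixth of the per-gigabyte cost of Blu-ray, which is why drives carry most of the load and discs serve as a second medium.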

Duplication is key

I don't think I know anyone who hasn't had a hard drive fail or accidentally deleted an important file, so having a *relatively current* backup of important things is a must. These aren't the old days, where you tucked the film negatives into protective sleeves and put them in a file cabinet. In the digital age there is no quality loss from making copies of your files, so buy multiple hard drives and make copies of those important files! It's a great way to guard against catastrophic data loss. Adding in Blu-ray discs or traditional DVDs is a great way to diversify the duplication process for even more redundancy.
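For anyone comfortable with a little scripting, duplication can be automated. What follows is only a rough sketch in Python, not a finished tool: the function names and folder layout are assumptions for illustration. It copies one job folder to several drives and checks each copy against the original with a checksum.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large RAW files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_job(source: Path, destinations: list[Path]) -> None:
    """Copy every file under `source` to each destination and verify each copy."""
    for destination in destinations:
        for original in sorted(p for p in source.rglob("*") if p.is_file()):
            copy = destination / source.name / original.relative_to(source)
            copy.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(original, copy)  # copy2 also preserves timestamps
            if sha256_of(original) != sha256_of(copy):
                raise IOError(f"Copy does not match original: {copy}")
```

A call such as duplicate_job(Path("/Volumes/Work/smith_wedding"), [Path("/Volumes/BackupA"), Path("/Volumes/BackupB")]) would leave a verified copy on each drive; those paths are hypothetical and would need to match your own setup.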

Complexity is the enemy

You may or may not have heard the phrase "RAID is NOT a backup" (it's worth a Google search). That argument about RAID notwithstanding, the important point is that most photographers are not IT people with years of experience working in data centers, troubleshooting hardware issues, and restoring mission-critical data from failures. In general, dealing with proprietary RAID controller cards, striping, configurations, and the like is almost guaranteed to end badly. Software RAID is also a bad idea from a performance and reliability standpoint.

If you're a wedding or portrait photographer it's unlikely that you're dealing with large databases or massive single files. Image files just need to be catalogued decently (so you can find the images you're looking for if you have to restore) and stored in multiple places (redundancy). Keeping it simple may mean having more drives but it also means you'll have a much better chance of having the files you need when that day comes.

Keep your data moving

When new technology comes out and hits a reasonable price, like the newer hard drives or Blu-ray discs, it's a good idea to move your old backup data into the new format. That way you've moved your data to a newer location, effectively increasing redundancy and the life expectancy of that data.

Take it off site

Off site backup just means your data is not all in one physical location. It doesn't need to be at some data center or in "the cloud" necessarily. If you can afford to do enterprise-class off site backup and your needs are heavy duty, then by all means go for it. At the same time, the combination of large RAW file sizes and the bandwidth limitations of even broadband internet limits the effectiveness of going off site over the Internet.
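To get a feel for the bandwidth problem, here is a back-of-the-envelope estimate in Python. The 50 GB job size matches the single-disc figure mentioned earlier; the upload speeds are assumptions chosen for illustration, so substitute what your own connection actually delivers.

```python
def upload_hours(size_gb: float, upload_mbps: float) -> float:
    """Estimate hours needed to upload `size_gb` at a sustained `upload_mbps`.

    Uses decimal units: 1 GB = 8,000 megabits.
    """
    if upload_mbps <= 0:
        raise ValueError("upload_mbps must be positive")
    seconds = size_gb * 8000 / upload_mbps
    return seconds / 3600


job_size_gb = 50  # one job of RAW files
for speed_mbps in (1, 5):  # assumed sustained upload speeds
    hours = upload_hours(job_size_gb, speed_mbps)
    print(f"{job_size_gb} GB at {speed_mbps} Mbps: about {hours:.0f} hours")
```

The estimate ignores protocol overhead and any throttling by the ISP, so real uploads will usually take longer. Restoring over the same connection has the same problem in reverse.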

If you have a studio separate from your home then it can be as simple as just having copies of your data at your home. If you run a home-based studio then simply storing a stack of hard drives in a safe location, say your in-laws' or a friend's house, can be an effective solution. There's also the option of a safe deposit box at your bank.

It's cheaper to buy a lot of drives than to recover one

The question is: Pay now or pay later?

If you don't have a backup strategy it's likely you'll pay quite a bit at one time when things do go wrong. The high cost of data recovery services is a reality you should be able to avoid (most of the time) with a good backup strategy. I've seen the cost of restoring one drive of client files be over $2,000 -- and that was for a drive that the company couldn't fully restore.

And then there's the awful possibility of litigation for losing important client files if they haven't yet been delivered.

For $2,000 a photographer could easily purchase a bunch of large hard drives, protective cases, a Blu-ray burner, a stack of Blu-ray discs, and DVDs. Considering the value of your data it's foolish to trust a single hard drive or single disc when there is no downside to having additional copies of your work saved in multiple locations.

Test your backups

A backup is only good if it can actually be used. Even though it's a boring exercise it's absolutely crucial to test your backups to make sure they actually work.

It's worth it to simulate a disaster scenario every few months (or at least once a year) to ensure all that backing up is actually effective.
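One way to take some of the boredom out of testing is to script part of it. The Python sketch below is an assumption about how such a check might look, not a complete disaster drill: it compares a backup folder against the original, file by file, and lists anything missing or different. The function names are invented for this example.

```python
import hashlib
from pathlib import Path


def file_hashes(root: Path) -> dict[str, str]:
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    hashes = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        hashes[path.relative_to(root).as_posix()] = digest.hexdigest()
    return hashes


def check_backup(original: Path, backup: Path) -> list[str]:
    """Return a list of problems; an empty list means the backup matches."""
    source, copy = file_hashes(original), file_hashes(backup)
    problems = [f"missing from backup: {name}" for name in source if name not in copy]
    problems += [
        f"contents differ: {name}"
        for name in source
        if name in copy and source[name] != copy[name]
    ]
    return problems
```

This only works while the original still exists. A fuller test would restore a job onto a scratch drive and actually open the images and catalogs, since a file can copy perfectly and still be useless if the wrong things were backed up.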


Damon is the technical (and bag-carrying) assistant to Agnes Lopez, a commercial and wedding photographer who works primarily in Ponte Vedra Beach and Amelia Island, Florida. When he isn't standing still as a lighting test dummy, setting up a c-stand, or holding a reflector, Damon works as an IT Business Analyst, where he gets to solve technical problems on a daily basis.

12 comments:

  1. Ahhh this is such a great post as I am currently wondering how to set everything up with my mac. I have tons of discs and externals but I am really looking to set up a RAID system as well.

    What is your opinion about Mozy?

  2. My next post will outline a cost-effective but still robust system that will minimize the chance of data loss. It's a little under $2000 total and should serve the needs of most wedding and portrait photographers.

    I think the cloud services like Mozy, BackBlaze, etc. are a good option for storing finished jobs, but keep in mind their Terms of Service actually prohibit "professional" use, so the $5/month level is not really a completely legitimate option. There's also the issue of upload bandwidth, with most ISPs throttling users that are too active.

  3. Oh, and *avoid* RAIDs.

    A good conversation is here at this link.

    Here are a few reasons I liked:

    RAID doesn't protect you against a file being deleted.

    RAID doesn't protect you against a file being overwritten.

    RAID doesn't protect you from your system being compromised and all of your data being overwritten, deleted, or corrupted.

    RAID doesn't protect you if the building burns down.

    It should also be mentioned that a hardware fault in the raid controller can easily corrupt the data on all attached disks. So while you reduce the danger from disk failures you add the danger of raid controller failures.

  4. I saw this link on Agnes' FB page and thought "wow, Damon and I were just talking about this yesterday...". Then I clicked the link and see it's you. Good job!

    Jimmy Saal

  5. RAIDs are great for local storage/backup, but they are not a complete backup solution in and of themselves (though they are almost always an important component of an onsite/offsite solution). There is a great discussion out on the interwebs about the fact that MTBF (mean time between failures) has held roughly constant as drive sizes have grown, meaning that it is increasingly likely that a parity RAID rebuild will fail. On a large, mission critical array (several TB), I'd only trust RAID6.

    Re: accidental data loss, for a fairly technical user you can run rsync as a cron job to create daily/hourly snapshots like Time Machine. Linux soft raid (mdadm) largely eliminates controller issues (at least from the RAID perspective, you could still have to replace your SATA controller).

    Also, I like CrashPlan for offsite, they have a ridiculously affordable package that allows you to back up every machine you own and are cross-platform (mac/windows/linux). Another fairly cheap offsite solution is Amazon S3 if you want to pay by the byte. Many FTP clients now support this, and as such it would also be easy to phase out old work by date as it becomes unnecessary to retain.

  6. As a digital archivist, I think you've outlined the problem and solutions very well here. The need for data backup is not just an issue for professionals, however, people also need to back up their personal files! But I digress...

    I would caution that the long-term reliability of technology like Blu-Ray is not yet tested or known (DVDs are actually terrible - do not use them for backing up your data, and if you do, make at least 5 copies), so any backup strategy that involves Blu-Ray should also include another storage media backup, such as a hard drive. Actually, digital preservationists recommend that in addition to making multiple copies (one copy is no copy), you should store those on different types of storage media.

    Also, you allude to what we call "refreshing" which is moving data to new storage media periodically. This is not only because better and bigger media are coming out all the time, but also because the lifespan of any digital storage media is very short - only 3-5 years according to professional IT estimates. So refreshing your storage media in these intervals is really important.

    I was thinking a cloud-based solution might actually work well for photographers (in combination with another backup media). Even RAW image files aren't that big. But maybe I'm a little biased, since I work with video...

  7. "Data expands to fill the space available for storage."

    @Adam: It's pretty scary to think that RAIDs are so likely to fail, but it's almost a given really.

    Good point on the rsync/Time Machine-type applications. We use the Mac application ChronoSync to do nightly backups of various drives and folders (in various states of work). I like the idea of capturing work at various stages.

    The numbers on Amazon S3 haven't come down enough to cheaply store RAW files yet except for the highest-end shooters. I'm excited for the future when cloud storage + FIOS offerings mean the end of the local hard drive.

    @Kara: Thank you so much for your comments. Since you are an actual Digital Archivist and this is one of the major topics of discussion for your field, I appreciate your input.

    I agree that DVDs are notorious and Blu-ray is untested, but I figure a little variety probably never hurt anyone on the backup side. If it's as simple as popping a $20 Blu-ray disc into a computer and backing up 50 GB of RAW files onto another medium that's certainly better than trusting one or two hard drives alone.

    To your point about the cloud-based storage, I see a time coming where it becomes an easy part of the workflow. Unfortunately bandwidth limitations make it difficult to store and -- perhaps more importantly -- restore files in a timely manner.

  8. Thanks Damon! I think a frighteningly large amount of people are highly unaware of the dangers of not backing up their files/data, and those who are are often highly mistaken on what is effective/efficient. I'm looking forward to your next post!

  9. Mozy makes my MacBookPro lock up like crazy. I would find everything had come to a slooooow crawl, and every time it was Mozy causing it. Nice knowing my Mac files are backed up though - mostly personal things, as I don't store client's work on it.

    MobileMe was a lot less annoying as far as not bogging things down.

    Thanks for sharing this! I look forward to part 2!

  10. We use Backblaze and it doesn't slow down the machines noticeably - the bandwidth from Comcast allows about 10GB to 15GB a day to upload (this is from actual experience).

    We use both Super Duper and ChronoSync to clone every hard drive every night. Super Duper is super simple, ChronoSync is more complex but more flexible - it can back up across the network and even merge data bidirectionally, not something I'd recommend for hard drives but we use that for our iDisk on the MobileMe servers since I put data there from several machines including my phone and want it all backed up to the desktop machine in the office.

    Smugmug has replaced DVDs for us, all of our finished client files, and family pictures for that matter, get uploaded to Smugmug as soon as they're edited.

  11. I back up every photo in 6 places, I am totally neurotic about it and don't really understand those who are not. Great article BTW!

  12. Thanks Damon! With my 1st season finished and 20+ jobs sitting on 1 GDrive external with no backup :0 I figured for this season I need a plan. I just yesterday went out & bought a LaCie 2Big Quadra for its RAID feature because it's what I thought I "needed" but really had no clue what RAID meant. After speaking with a Serv Rep @ Micro Center and calling a LaCie Tech about their drive I realized it's not what I needed. I returned it today and felt lost again about what to do.

    I then tuned into my buddies @ DWF Forum and they were talking about this same topic with a link to this article.

    I'm glad I found it. You make it quite easy to understand & offer a solution to implement! I will go about starting on a smaller scale than yours but starting nonetheless.

    Muchas Gracias man!

    JA
