CrashPlan and OS X: Love in the Metaverse

I love CrashPlan. I love Mac OS X. And I love metadata.

All three are related, and if you're running CrashPlan on Mac OS X, you may want to read this.

What is metadata?

You probably know what CrashPlan and OS X are, but do you know what metadata is? Metadata is data about data. The earliest (and best) example of metadata I can think of is “File Creation Date.”

When saving a file on your hard drive, your computer automatically records information about when the file was created, right? You don't manually say, “I created this on 3/19/99 at 12:00pm.” Your computer does it for you.

iTunes Movie Rental Backup Problem

I back up my MacBook with CrashPlan. My files are backed up continuously to two destinations, within a few minutes of being changed. Because I back up my iTunes folder, any changes in that folder are backed up too, including my favorites, playlists, and movie rentals!

A movie rental is a 1-2 GB file that is automatically deleted after 24 hours. Even if you back it up, it won't play after 24 hours anyway, so backing it up is a complete waste of time and space. How do we tell CrashPlan to skip unnecessary backups like this?

Metadata to the Rescue

Well, it turns out this isn't really a problem at all! iTunes attaches metadata that says, “Don't back me up” to each movie rental file. CrashPlan is smart enough to notice the request and ignore the file.

Any program on OS X can tag any file with the “Don't back this up” metadata. VMware, a product that lets you run Windows on a Mac, is another example: its virtual disk file (.vmdk) is tagged with “Don't back this up,” and we honor that request.
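If you ever want to tag a file yourself, here is one way to do it from Terminal. This is a sketch that assumes OS X 10.7 (Lion) or later, where Apple's “tmutil” command is installed; by default “tmutil addexclusion” adds a “sticky” exclusion that, as far as I know, is stored as this same com_apple_backup_excludeItem metadata on the file. The function name exclude_from_backup is made up for illustration.

```shell
# Tag a file with the "Don't back this up" metadata.
# Assumes OS X 10.7 or later, where Apple's tmutil command is available.
# exclude_from_backup is a hypothetical helper name, not an Apple tool.
exclude_from_backup() {
  local path="$1"
  if [ ! -e "$path" ]; then
    echo "No such file: $path" >&2
    return 1
  fi
  # Without -p, tmutil adds a "sticky" exclusion that travels with the file
  # itself (stored as its com_apple_backup_excludeItem metadata).
  tmutil addexclusion "$path"
}
```

After pasting it into Terminal, running exclude_from_backup on a file should make that file show up in the mdfind query described below.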

Generally this approach is the best one. Let the developers agree on what is important for backup and what isn't. Let customers worry about more important things.

But what if you don't agree with VMware and you want to back up that VMware image?

What is Being Excluded?

Finding out what is excluded from backup due to metadata is easy, but it requires the OS X command line. Enter the following command in Terminal to find out which files are being excluded on your system:

sudo mdfind "com_apple_backup_excludeItem = 'com.apple.backupd'"

The command above asks Spotlight for every file whose “com_apple_backup_excludeItem” metadata is set to “com.apple.backupd,” which means “Don't back this up.”
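If the list is long, it helps to see which excluded files actually take up space. Here is a small bash function (the name list_backup_exclusions is my own invention) that runs the same Spotlight query and prints each result's size in kilobytes, largest first. It calls mdfind without sudo; add sudo in front of mdfind inside the function if you want to match the command above exactly.

```shell
# List files Spotlight reports as tagged "Don't back this up", largest first.
# list_backup_exclusions is a hypothetical helper name.
list_backup_exclusions() {
  # Same query as above; mdfind prints one path per line.
  mdfind "com_apple_backup_excludeItem = 'com.apple.backupd'" |
    while IFS= read -r path; do
      # du -sk prints "<size in KB><tab><path>" and also works on folders.
      du -sk "$path" 2>/dev/null
    done |
    sort -rn
}
```

Paste it into Terminal, then run list_backup_exclusions. Reading one line at a time keeps paths with spaces intact.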

Removing Exclusion Metadata

I agreed with the developers on every file but one: my Windows Vista virtual disk, “windowsvista.vmdk,” located in the “windowsvista” folder. Here is how to remove the metadata from this file so that CrashPlan will back it up (run these commands from inside the “windowsvista” folder):

xattr -d com.apple.metadata:com_apple_backup_excludeItem windowsvista.vmdk  # Remove the exclusion metadata from the vmdk file
sudo mdfind "com_apple_backup_excludeItem = 'com.apple.backupd'"            # Check again to make sure the file is no longer listed
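If you have more than one file to fix, a slightly more careful version can help. This bash sketch (include_in_backup is a hypothetical name) checks the file for the attribute before deleting it, then re-checks the file directly with xattr, so the result does not depend on Spotlight's index.

```shell
# Remove the "Don't back this up" metadata from a file, then verify it's gone.
# include_in_backup is a hypothetical helper name.
include_in_backup() {
  local path="$1"
  local attr="com.apple.metadata:com_apple_backup_excludeItem"
  # xattr -p fails when the attribute is missing, so it doubles as a test.
  if ! xattr -p "$attr" "$path" >/dev/null 2>&1; then
    echo "Not excluded: $path"
    return 0
  fi
  xattr -d "$attr" "$path" || return 1
  # Re-check the file directly instead of relying on Spotlight's index.
  if xattr -p "$attr" "$path" >/dev/null 2>&1; then
    echo "Still excluded: $path" >&2
    return 1
  fi
  echo "Removed exclusion: $path"
}
```

For example, from inside the “windowsvista” folder: include_in_backup windowsvista.vmdk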

Now CrashPlan is backing up my windowsvista disk. If I were running Time Machine, it would copy the whole vmdk file over and over again, every time it changed, quickly filling up my destination drive. CrashPlan doesn't have this problem, because it detects which parts of the file changed and sends only those. This explains why VMware excludes its disk image from backup, and I suspect other programs out there do the same.
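To make the idea of “send only the parts that changed” concrete, here is a toy bash function (changed_blocks is a hypothetical name) that splits two versions of a file into fixed-size blocks, checksums each block with cksum, and prints the numbers of the blocks that differ. This is only an illustration of block-level change detection under simple assumptions; it is not how CrashPlan actually works internally.

```shell
# Toy illustration of block-level change detection (not CrashPlan's algorithm).
# Prints the index of every fixed-size block that differs between two files.
changed_blocks() {
  local old="$1" new="$2" block_size="${3:-1048576}"  # default: 1 MiB blocks
  local old_size new_size max_size blocks i a b
  old_size=$(wc -c < "$old")
  new_size=$(wc -c < "$new")
  max_size=$(( old_size > new_size ? old_size : new_size ))
  # Round up so a partial final block is still compared.
  blocks=$(( (max_size + block_size - 1) / block_size ))
  for (( i = 0; i < blocks; i++ )); do
    # Checksum block i of each version; past end-of-file dd outputs nothing,
    # so a block that exists in only one version counts as changed.
    a=$(dd if="$old" bs="$block_size" skip="$i" count=1 2>/dev/null | cksum)
    b=$(dd if="$new" bs="$block_size" skip="$i" count=1 2>/dev/null | cksum)
    if [ "$a" != "$b" ]; then
      echo "$i"
    fi
  done
}
```

For example, changed_blocks old.vmdk new.vmdk (made-up file names) lists only the blocks that would need to be sent. Because the blocks are fixed-size, inserting data near the start of a file shifts every later block and makes them all look changed; rsync, for example, uses a rolling checksum to handle that case, which this sketch does not attempt.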

It is wise to make sure your critical files are not marked with metadata that says, “Don't back me up!”