ARCHIVAL DIGITAL PHOTO BACKUP
Backing up your photos so that they last for 100 years is no longer as simple as having an archival print made and stored in a safe frame. Modern digital images and scans require an understanding of file formats, data degradation, media types and ever-changing storage technologies. This tutorial summarizes the best strategies in three stages (what to store, how and where to back it up, and what to do once everything is archived) so that you can be confident your photos will stand the test of time.
ARCHIVAL FILE FORMATS FOR PHOTO STORAGE
Here's a topic that keeps many photographers up at night: how can you be truly sure that the photos you are saving will be readable on computers 10, 50 or 100 years down the road, with vastly different technology? Will Canon, Nikon, Sony or another camera manufacturer's proprietary RAW format still have full software support, and will the images be reproduced exactly as before when loaded?
Unfortunately, the photo on the right will not necessarily last as long as the one on the left did.
However, if the necessary precautions are taken, not only will the photo on the right be preserved, but it also won't be subject to the gradual fading and deterioration of the photo from 1890.
The chosen file type is therefore an important first consideration when backing up archives of your photos. The table below compares the most common file formats:
|Archival File Format||Size||Quality||Software Compatibility|
|TIFF (8 bit)||Medium||Medium||Excellent|
|TIFF (16 bit)||Largest||High||Excellent|
|RAW files: CR2, NEF, etc||Large||Highest||Good now; questionable years later|
|DNG||Similar to RAW||Similar to RAW||Excellent years later (in theory)|
JPEG files are by far the most likely to be widely supported many years down the road; after all, JPEG has become a near standard for images on the internet. If you already have a lot of photos taken in JPEG, then the choice of what format to store them in is easy: leave them as JPEG files. However, for future photos, it's highly advised that you shoot in RAW if your camera supports it, as discussed later.
TIFF files are a close second to JPEG when it comes to compatibility, but are much higher quality because they do not use JPEG's lossy image compression. For many, TIFF achieves an optimal balance. However, TIFF files either preserve much less about the original photo (if the bit depth is 8-bit), or are even larger than RAW files despite preserving a little less of the original image (if the bit depth is 16-bit).
RAW files are the best at preserving what was originally captured, while still being smaller than 16-bit TIFF files. However, nearly every camera model has a slightly different RAW format, so it's unlikely that general-purpose software 10-20 years from now will open every one of these file types correctly. RAW file backup therefore leaves two options: (i) convert the files to some other format, or (ii) back up the RAW files in their native format until you start to notice compatibility issues and a suitable replacement format exists.
Many feel that a suitable format already exists: the Digital Negative (DNG) file format, which was created by Adobe to address many of the problems associated with longer term archival storage. It is an open standard and royalty free, so you can be sure the files can be more easily and universally opened in the future. DNG aims to combine the compatibility advantages of TIFF and JPEG file formats with the quality and efficiency advantages of your camera's original RAW files.
However, even DNG is not future-proof. With the exception of Adobe software, support is still not as universal as one would like it to be for a format that aims to be archival (although this is rapidly changing). Further, companies go in and out of existence (remember the once dominant Kodak?), DNG itself has version numbers, and DNG is helpless if sensor technologies change dramatically.
Another consideration is how to store various edited versions of your files, which is something that DNG does not address. Multiple 16-bit TIFF, PSD or other files can quickly become extremely large and unmanageable. The best way to conserve storage space is to save the editing steps themselves, rather than applying them and saving the result as an additional TIFF file. RAW conversion software can often store the settings you used to convert the RAW file, for example in XMP sidecar files (for Adobe Camera Raw), catalog files (for Lightroom) and library files (for Aperture), amongst others. In Photoshop, using and storing adjustment layers is also a great way to avoid multiple intermediate files for each edit.
Unfortunately, many of the formats used for storing edited photos are also subject to future compatibility issues. Fortunately, this is one area where changing technology can mean you'd like to rework certain images using the latest software and techniques. Just make sure that you also have an archived version of the unaltered original photo.
Overall, the only fail-proof solution is to keep your data up-to-date. Every few years it's a good idea to convert file types that are in danger of becoming obsolete.
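One way to make this periodic review less tedious is to list which file formats an archive actually contains. The following is a minimal Python sketch rather than a finished tool; the function names and the AT_RISK watch list are assumptions made for illustration, and you would substitute whichever extensions you consider endangered.

```python
from collections import Counter
from pathlib import Path

# Hypothetical watch list: extensions you believe may lose software support.
AT_RISK = {".cr2", ".nef", ".arw", ".psd"}

def format_inventory(archive_root):
    """Count the files below archive_root by lowercase extension."""
    return Counter(
        path.suffix.lower()
        for path in Path(archive_root).rglob("*")
        if path.is_file()
    )

def at_risk_files(archive_root, at_risk=AT_RISK):
    """Return the sorted paths whose extension is on the watch list."""
    return sorted(
        path
        for path in Path(archive_root).rglob("*")
        if path.is_file() and path.suffix.lower() in at_risk
    )
```

Calling format_inventory on the top folder of an archive gives a quick count per format, and at_risk_files lists the candidates for conversion.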
CHOOSING PHOTO BACKUP MEDIA
Even if we use a compatible file format, how can we be sure that these files will later be accessible on our chosen backup device or media? Remember 5.25 inch floppy disks? In fact, the US Federal Government is so concerned about this topic that they house and maintain computers at various stages of advancement — just in case a file can only be loaded on one of these older computer setups.
CDs, DVDs, Blu-ray discs and other removable media have been the primary method of consumer backup for quite some time. They have the advantage of being reasonably inexpensive and broadly compatible. Probably their biggest drawback is inconsistency: some removable media last only 5-10 years, while others claim a lifetime of 50-100 years. It can often be difficult to tell which longevity category your media purchase falls under.
Do not assume that all writable media is created equal. There's often a dramatic difference in longevity between one brand and another. Pay attention to the dye and reflective layer used (often described by color: blue, gold, silver, etc), to published accelerated aging tests, and to reports of issues with a particular model or batch.
External hard drives, while a relative newcomer on the backup scene, have become much more attractive as their price has dropped over the past several years. They can store a tremendous amount of information in a small space, are quite fast, and keep the backed-up data immediately accessible and modifiable. Over time they can gradually demagnetize, but the bigger concern is that a drive may not spin up because its internal motor has failed (the data is usually still intact, but it can be expensive to recover). Another concern is that the eSATA, USB or FireWire connector may become obsolete.
Tape backup, while once the "go to" method of archiving data, is becoming increasingly marginalized, and today is only really used for large corporate backups. Consumer models are less common, and they haven't quite kept pace with the storage density progress that's been made with hard drives. Further, some tapes are much more vulnerable to humidity, water damage and other external factors than are external hard drives or other removable media. Their biggest advantages are that (i) they are very inexpensive for high volume backups, and (ii) the tape cartridge itself contains no motor, so there is no risk of it failing to spin up (unlike a hard drive).
Unfortunately, the only future-proof solution is to migrate your data over to the latest technology every 3-5 years. Fortunately, storage capacities have been increasing exponentially, so your 10 old photo DVDs can be combined into just one Blu-ray disc or a fraction of an external hard drive, and one would expect that 10 or more of these could in turn be combined into just one unit of the next storage technology, and so on. This means that even if you continue to accrue photos, the amount of work required to transfer them will not necessarily increase each time you need to do so.
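As a rough worked example of this consolidation, the arithmetic below uses nominal capacities of 4.7 GB for a single-layer DVD and 25 GB or 50 GB for a single-layer or dual-layer Blu-ray disc. It assumes every DVD is completely full, which is the worst case, and it ignores file system overhead; the function name is purely illustrative.

```python
import math

DVD_GB = 4.7          # nominal single-layer DVD capacity
BLU_RAY_SL_GB = 25.0  # nominal single-layer Blu-ray capacity
BLU_RAY_DL_GB = 50.0  # nominal dual-layer Blu-ray capacity

def discs_needed(total_gb, disc_capacity_gb):
    """Smallest number of discs of the given capacity that can hold total_gb."""
    return math.ceil(total_gb / disc_capacity_gb)

archive_gb = 10 * DVD_GB  # ten completely full DVDs

print(discs_needed(archive_gb, BLU_RAY_SL_GB))  # 2
print(discs_needed(archive_gb, BLU_RAY_DL_GB))  # 1
```

Under these assumptions, ten full DVDs fit on one dual-layer Blu-ray disc, and on a single-layer disc only if the DVDs were about half full.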
PRESERVING IMAGE INTEGRITY
No matter what the backup media, all data degrades over time, and errors can occur each time you copy your images from one location to another. The dyes in a writable DVD gradually decompose, tapes and hard drives eventually become demagnetized, and flash memory can lose its charge. All of these processes are inevitable. The following is a real-world example of what it can look like when a photograph becomes corrupted:
Noticing the above flaw requires zooming in to 100% and inspecting specific regions of the photograph, even though it would be easily visible as a colored streak in a print. This is a little unsettling considering that most people have hundreds if not many thousands of photographs; checking each and every image for corruption by eye would clearly be unrealistic.
Further, image corruption is replicated in each subsequent backup copy, and could go unnoticed until a print is made many years later.
A storage technique that employs parity, checksum or other data verification files is the only way to systematically spot these problems before they permanently alter your photo archives. That is the only reason the corrupt photo in the above example (and others like it) was identified before it became a problem. The following chart outlines some of the most common techniques for preventing, verifying and repairing corrupt photographs:
|Type||Primary Use||How It Works|
|RAID 1, 5, 10||PREVENTION||A RAID 1, 5 or 10 is an array of disk drives with fault protection in case one of your drives fails. These can continue to operate even if a drive fails, without losing any information. However, they can also substantially increase costs since they require additional disk drives and a RAID controller.|
|SFV or MD5 Checksum Files||VERIFICATION||Checksum files verify that a file or copy is identical to its original. They are effectively digital fingerprints, which are created based on every 0 and 1 in a digital file. When even one bit of the file changes, it's almost guaranteed that the fingerprint won't match. However, that's all they do: inform you when there was an error.|
|Parity or Recovery Files||REPAIR||Parity files can be used to repair minor damage without requiring a full duplicate of the original. They store carefully chosen redundant information about a file; if part of the original becomes corrupt, the parity file can be used along with the surviving portions of the corrupt file to re-create the original data. However, parity files take up increasingly more space if you want to recover files which are more badly damaged.|
Technical Notes: Although the details are beyond the scope of this article, RAID comes in many varieties. RAID 1 is effectively two disks containing identical data at all times. RAID 5 uses three or more disks, with parity data distributed across all of them so that any single drive can fail without data loss. RAID 10 requires at least four drives, and is similar to RAID 1 except that it improves performance by simultaneously reading from and writing to multiple drives. RAID 0 should not be used with critical data, since it increases the risk of data loss in exchange for better performance.
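The idea behind parity can be illustrated with a simple XOR scheme, which is the principle RAID 5 relies on. This is a simplified sketch only: real recovery files typically use more elaborate error-correcting codes (such as the Reed-Solomon codes used by the PAR2 format) that can repair more than one damaged block, and the function names and sample data below are made up for illustration.

```python
from functools import reduce
from operator import xor

def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(xor, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Recreate the single missing block from the survivors plus the parity."""
    return xor_parity(list(surviving_blocks) + [parity])

blocks = [b"photo-1!", b"photo-2!", b"photo-3!"]  # stand-ins for file data
parity = xor_parity(blocks)

# Pretend the second block was lost, then rebuild it from the other two.
rebuilt = rebuild_missing([blocks[0], blocks[2]], parity)
print(rebuilt == blocks[1])  # True
```

The parity block is the same size as one data block, which is why this scheme can only recover one lost block; protecting against more damage requires storing more redundant data.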
If you routinely work with very important photographs, the best protection is achieved by using RAID while editing on your computer and between backups, and storing MD5/SFV checksum and parity files along with your archived photographs. Alternatively, using an enterprise-class solid state drive (SSD) can improve both performance and reliability.
A simpler solution would be to store two backup copies immediately after the photo is captured. This way you do not need to worry about complicated RAID or parity files, but you will still need to store SFV or MD5 checksum files** along with each archived photo. There are far too many programs that can read or create SFV and MD5 files to list here; a quick search engine query will yield several free options. If you ever identify a corrupt file, then the other backup copy can be used as a replacement. Not having RAID means that there's no protection against losing intermediate edited files on your computer, but these are usually much less important than the unaltered originals.
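The two-copy approach can be sketched as follows. This is an illustrative Python example, not a replacement for an established verification program; the function names are hypothetical, and it assumes the expected MD5 value was recorded when the photo was first archived.

```python
import hashlib
import shutil

def md5_of(path, chunk_size=1024 * 1024):
    """Compute a file's MD5 digest without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_and_repair(copy_a, copy_b, expected_md5):
    """Verify both copies and overwrite a corrupt one with the intact one.

    Returns "ok", "repaired a" or "repaired b"; raises ValueError if
    neither copy matches the recorded checksum.
    """
    a_ok = md5_of(copy_a) == expected_md5
    b_ok = md5_of(copy_b) == expected_md5
    if a_ok and b_ok:
        return "ok"
    if b_ok:
        shutil.copyfile(copy_b, copy_a)
        return "repaired a"
    if a_ok:
        shutil.copyfile(copy_a, copy_b)
        return "repaired b"
    raise ValueError("both copies fail verification; no clean source left")
```

The checksum is what tells you which of the two copies to trust; without it, two differing copies only show that something went wrong, not where.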
**Technical Notes: A checksum is a digital fingerprint that verifies the integrity of a file. SFV stands for "Simple File Verification"; an SFV file contains a list of CRC32 checksums corresponding to a list of files. MD5 was designed as a cryptographic hash, intended to verify not just the integrity of a file but also its authenticity (that no person had intentionally modified it), although it is no longer considered secure against deliberate tampering and remains useful mainly for detecting accidental corruption. CRC checksums are quicker to calculate than MD5 checksums, but MD5 checksums are much longer (128 bits versus 32), so an undetected change is far less likely. There are other checksum file types available, but SFV and MD5 are currently the most widely supported.
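To make the two checksum types concrete, the short Python sketch below computes both for the same data using the standard zlib and hashlib modules. The entry layouts follow the conventions commonly used in SFV and md5sum-style files; the helper names and the sample file name are assumptions for illustration, and in practice an existing checksum program would do this for you.

```python
import hashlib
import zlib

def sfv_line(name, data):
    """One SFV entry: the file name, a space, then the CRC32 as 8 hex digits."""
    return f"{name} {zlib.crc32(data) & 0xFFFFFFFF:08X}"

def md5_line(name, data):
    """One md5sum-style entry: the 32-digit digest, then '*' and the file name."""
    return f"{hashlib.md5(data).hexdigest()} *{name}"

original = b"hello"  # stand-in for the bytes of a photo
flipped = bytes([original[0] ^ 0x01]) + original[1:]  # flip a single bit

print(sfv_line("photo.jpg", original))
print(md5_line("photo.jpg", original))

# A one-bit change produces a different CRC32, so the damage is detected.
print(sfv_line("photo.jpg", original) != sfv_line("photo.jpg", flipped))  # True
```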
Regardless, it's important to keep your data "fresh" by copying it to some other media after 5-10 years — even if the file format or media isn't in danger of becoming obsolete.
WHERE TO STORE YOUR PHOTO ARCHIVES
The best location to store your archival photo backups is in a cool, dry place with a reasonably constant environment and minimal need for movement. If there's a chance of humidity, be sure to seal the media in a plastic bag prior to storage.
However, unforeseen accidents such as theft and fire can occur, so any fail-proof backup strategy should make use of multiple backup locations. This could mean keeping a duplicate archive in a safe deposit box, at a friend's or family member's home, or on a remote online server. If your internet connection is fast, backups can even be transferred regularly and systematically via FTP. Depending on the size and quantity of your photos, some even treat online photo sharing sites as backup locations. However, this is not an option for true digital negatives, such as RAW files, since these sites cannot display them as is.
Try to stick to a regular backup schedule with an easy-to-follow naming convention. After all, if you cannot find a photo once it's been archived then it's as good as lost.
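As one possible naming convention (an assumption for illustration, not a recommendation from any standard), each backup batch could be named by its ISO date, a short description and a sequence number, so that names sort chronologically in any file browser. A minimal Python sketch with a hypothetical helper:

```python
import re
from datetime import date

def archive_name(shoot_date, description, sequence=1):
    """Build a sortable batch name such as 2009-06-14_yosemite-trip_001."""
    # Reduce the description to lowercase letters and digits joined by hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{shoot_date.isoformat()}_{slug}_{sequence:03d}"

print(archive_name(date(2009, 6, 14), "Yosemite Trip!"))
# 2009-06-14_yosemite-trip_001
```

Putting the year first is the design choice that matters here: it makes alphabetical order match the order in which the photos were taken.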
MINIMIZING RISK OF ACCIDENTAL DELETION
OK, so we've now gone to great lengths to ensure that (1) the file format will be readable, (2) the backup media will be loadable and (3) each photo will be preserved exactly. What's preventing someone from mistakenly deleting or overwriting some of your photo archive? Of course, clearly labeling your media is a must, but it might also be a good idea to make the archived photos read-only, and to password-protect the photos folder and/or media. However, adding a password is a double-edged sword, because there's always the possibility of forgetting it. If this is a concern, then simply use a password of "password", since the purpose is to add another barrier to inadvertent deletion as opposed to preventing unauthorized access.
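Marking archived files as read-only can be automated. The following Python sketch clears the write permission on every file below a folder; the function name is hypothetical, and the exact effect depends on the operating system (on Windows only the read-only flag is changed, and on Unix-like systems deleting a file is governed by the permissions of its folder rather than of the file itself).

```python
import stat
from pathlib import Path

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH

def make_read_only(archive_root):
    """Clear the write permission on every file below archive_root.

    Returns the number of files whose permissions were updated.
    """
    changed = 0
    for path in Path(archive_root).rglob("*"):
        if path.is_file():
            current = stat.S_IMODE(path.stat().st_mode)
            path.chmod(current & ~WRITE_BITS)
            changed += 1
    return changed
```

This is a guard against accidental overwriting rather than a security measure, since the owner of the files can always restore the write permission.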
SUMMARY OF ARCHIVAL PHOTO BACKUP OPTIONS
Photographers can be loosely grouped into one of two categories, casual and discerning:
Backup Strategy: Casual photographers should try to save their JPEG files using the highest quality "superfine" (or similar) setting to minimize image compression artifacts. Each batch of photos should be backed up in two copies on removable media, ideally with SFV or MD5 checksum files to identify if any image later becomes corrupt. Archived photos should be transferred over to new media every 5 years to keep the storage technology up to date, and to prevent corrupt images by keeping the data fresh.
Backup Strategy: Discerning photographers should always save their photos using their camera's RAW file format. Any photo editing should ideally occur on a computer with duplicate hard drives in RAID 1 or an SSD; otherwise, unaltered photos should be backed up immediately after capture. RAW files should either be converted to the DNG format prior to archiving or saved in their native format. When possible, edited versions of photos should be stored as processing steps (such as in XMP sidecar or catalog files) as opposed to separate TIFF files. Each batch of photo backups should be written to at least two media, and all images should be stored along with SFV or MD5 checksum files and parity information, just in case a repair is needed. Each set of backups should be stored in a different physical building. Archived RAW or DNG files should be reviewed every 3-5 years and converted to a current format if software support is waning; each of these backups should be written to new media using the latest storage technology to keep the data fresh.