On Monday, I was thrilled to attend a workshop entitled Digital Preservation for Video, presented by Linda Tadic for Independent Media Art Preservation (IMAP). The workshop was held in San Francisco at the Bay Area Video Coalition (BAVC). The event covered some of the key considerations in digitizing video and creating a digital preservation program at the DIY level (i.e. without a huge IT department backing you up). A few of the institutions represented by attendees included BAVC, the Pacific Film Archive, the California Institute of the Arts, the California Academy of Sciences, the Sierra Club, the San Francisco Symphony, and the California Film Institute.
Prior to this workshop, I hadn’t had a great deal of exposure to the digital preservation challenges of moving visual materials. In fact, I confess that I hardly knew anything about the current physical formats used for video storage, nor much about the hard work that is necessary for digitizing them. Most of the attendees have done their share of digitizing moving images (or of outsourcing the digitization), and I think that most of us were there to explore the answers to the question of “now what?”
The Move to File-Based Video Storage
Physical moving image storage formats are on death row. We spent the bulk of the morning going over the characteristics of different physical media and their expiration dates, which served as an effective motivator for digitization and instilled something just short of panic among the attendees.
Unlike paper, the magnetic tapes, reels, and discs that moving images are physically stored on are on a very tight deadline; aside from succumbing to format obsolescence, most of the media is reaching the end of its life expectancy, after which the images on them will simply cease to exist. To give some examples from formats I was more familiar with: the life span of VHS is approximately 15 years, while MiniDV, DVCam, and Video are 5-10 years. This illustrates that, in some cases, it isn't necessary to digitize the oldest things first.
Digitization is increasingly the best preservation option for some of these formats, and it is important that the road to digitization doesn't lead to a dead end. That is why we need to ensure that once digitization has occurred, a digital preservation plan is in place so that the video content will continue to survive, especially since the original physical sources of the content will be dead in a short while.
Indeed, we are observing a shift from format-based physical video storage to the file-based storage of digital video content. Preservation will no longer be about making the tapes last as long as possible, but about caring for the digital files representing the content that the tapes once held.
Preservation Concerns for Digital Video
I appreciate how Linda was adamant in reminding us that digital preservation is not a one-time fix for digital video longevity. She was very clear in telling us that it requires a constant guardianship consisting of a deliberate, scheduled management of the digital files. To use her phrasing, there is no “store-and-ignore” solution. Preservation activities involve keeping file formats current so that they remain accessible to current software. It also involves exercising the hard drives that your files may be stored on and not letting them sit idle for more than 6 months. It requires diligent updating of the files’ accompanying preservation metadata so that changes to the files can be tracked and managed.
Linda also stated what nobody likes to hear about digital preservation: that there is no one way to do things, and that there is no one set of instructions to follow that will help you save your content. As with all file types, the preservation decisions you make will depend on your content, your file types, your storage, and your intended access methods. So, in the case of making storage selections and creating a plan, knowledge is power. I’ll try to summarize some of the key points covered.
Before committing to a single file format for all of your files, you need to consider something truly important for maintaining the integrity of your master files: extra copies! These come in a couple of flavors. You’ll want a copy of your master file that can be used to make other copies (referred to as a mezzanine-level copy), because you really don’t want to touch your master files (preservation-level copies). Depending on how you are providing access or distributing your content, you may also want to create compressed copies of your files that aren’t quite so large (distribution copies). Distribution copies could very well be in a completely different file format than your preservation copies.
In any preservation activity, there is a clear preference for storing your content in open file formats. This generally results in easier migrations down the road and decreases your dependence on proprietary vendors. It is prudent, however, to be aware of file formats that are “too open,” in the sense that everyone uses the format in a different way. This seems to be the case with MXF wrappers.
One open file format discussed was JPEG2000, which, surprisingly, seems like a pretty good contender for the preferred file format for video. Most people in the US aren’t aware that JPEG2000 can be used for video (I sure wasn’t), but talk to a European involved in storing and preserving digital video content, and they’ll wonder why you aren’t already using it. It has also been adopted by the Digital Cinema Initiatives. JPEG2000 offers approximately 3:1 lossless compression, is good for storage, and plays back uncompressed (making it unsuitable as a file format for most types of distribution copies). Data can also be stored in the wrapper in an XML stream. A potential downfall is that use of JPEG2000 in the US is currently largely led by the adoption of the SAMMA hardware system, which seems to be necessary in order to make the format compatible with anything else. (Update: It was also pointed out to me in an email that educational institutions using the SAMMA system tend to use JPEG2000, and that there are production implementations of JPEG2000 as well.)
Other open file formats discussed were Uncompressed 8-bit and 10-bit, and DV25 and DV50, which are better candidates for mezzanine-level files. Proprietary formats include Apple’s ProRes and a wide range of wrappers such as AVI (Microsoft), Quicktime, and Windows Media.
Once you have your files, you’ll want a solid file-naming scheme. The important points when creating a naming convention are to be consistent, to avoid punctuation, and to let file names carry enough context that a name still makes sense outside the hierarchy of its intended storage collection. Basically, don’t use file names like 0001.mov. Try to include acronyms for institutions and subcollections along with an object ID: ab_cd_12345_20100602.mpg
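A convention like this is easy to enforce with a small script. Here is a minimal sketch in Python; the exact pattern, helper names, and allowed extensions are my own illustrative assumptions rather than anything prescribed at the workshop:

```python
import re
from datetime import date

# Hypothetical convention: institution acronym, subcollection acronym,
# object ID, and digitization date, joined by underscores -- no other
# punctuation, and context encoded directly in the name.
NAME_PATTERN = re.compile(r"^[a-z]+_[a-z]+_\d+_\d{8}\.(mov|mpg|mxf)$")

def build_filename(institution, subcollection, object_id, digitized, ext):
    """Assemble a name like ab_cd_12345_20100602.mpg."""
    return f"{institution}_{subcollection}_{object_id}_{digitized:%Y%m%d}.{ext}"

def is_valid(filename):
    """Check that a name follows the convention before ingest."""
    return NAME_PATTERN.match(filename) is not None

name = build_filename("ab", "cd", "12345", date(2010, 6, 2), "mpg")
# name == "ab_cd_12345_20100602.mpg"; is_valid("0001.mov") is False
```

Running a check like this at ingest time catches context-free names such as 0001.mov before they ever land in storage.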
A good thing to know about your collection of digital video is how much of it you have, so that you can evaluate your options for storing it. Depending on size, you’ll want to choose different physical storage carriers. Options discussed included optical media like discs, external hard drives, RAID systems, digital linear tapes (i.e. LTO), and cloud storage. Optical media and linear tapes will face the same life-expectancy problems that the original sources of digitized video are currently facing, so they are a relatively short-term solution; at least they have known lifespans that you can pretty much count on.
Some notes about external hard drives: the smaller the drive, the more prone it will be to failure, but very large drives should also be avoided so that if one fails, you won’t lose too much. Also, don’t purchase the least expensive drives out there; brands like Western Digital and Samsung were noted to be more reliable. The hard drive option to watch for in the future is the solid-state drive, with no moving parts – which at this point is still pretty new, expensive, and prone to heat problems. Current hard drives need to be exercised every 6 months or so, and you’ll want to replace them every 3 to 5 years.
Cloud storage wasn’t recommended as a realistic option due to security issues, considerations related to downloads of large files, and the fact that in terms of digital preservation, cloud storage takes you right out of the Trusted Digital Repository realm.
The take-home message from this section of the workshop is that redundancy is critical: you will have failures. It’s a good idea to make copies of all of your files in all their formats (preservation, mezzanine, and access levels), and to store your copies in different geographic locations. It is also wise to diversify your storage media.
When librarians and archivists think of metadata, they are probably first thinking about descriptive metadata: what is the content of the item, what subjects are covered? Digital preservation unleashes a whole new type of technical metadata that must be kept for each item, in addition to any descriptive metadata. It’s aptly referred to as Preservation Metadata. It will include information about the original source, what software the file was created with, when it was ingested into your storage system, how big it is, its checksum (and the algorithm used to compute it), and each and every change made to the file after ingestion.
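As a rough illustration, a minimal preservation-metadata record covering those fields could be assembled like this. The field names and the choice of SHA-256 are my own assumptions for the sketch; the schemes discussed at the workshop define their own structures:

```python
import hashlib
import os
from datetime import datetime, timezone

def make_preservation_record(path, original_source, created_with):
    """Build a minimal preservation-metadata record for one file:
    original source, creating software, ingest time, size, checksum."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MB chunks so large video files don't exhaust memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha256.update(chunk)
    return {
        "file": os.path.basename(path),
        "original_source": original_source,  # e.g. "VHS", "MiniDV"
        "created_with": created_with,        # digitizing software (assumed field)
        "ingested": datetime.now(timezone.utc).isoformat(),
        "size_bytes": os.path.getsize(path),
        "checksum_algorithm": "SHA-256",
        "checksum": sha256.hexdigest(),
        "events": [],  # append one entry per change made after ingest
    }
```

The `events` list is where the “each and every change” requirement lives: every migration, repair, or format conversion appends an entry, building the file’s history.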
Linda says that “you can do anything, as long as your metadata tracks it.” The preservation metadata will become the history of your files, and the breadcrumb trail to follow if you need to go to Plan B. There are many schemes that can be used for deciding what is important to include in your preservation metadata, but Linda pointed out that you’ll probably end up making your own scheme anyway, likely drawing from some of the pre-existing schemes to suit your specific needs. Some of the preservation metadata schemes discussed included PREMIS, SMPTE, and PBCore.
More Preservation Actions
The other parts of the workshop that discussed digital preservation procedures were more general, and applicable to most types of digital content. We went over the merits of Trusted Digital Repositories (TDRs), and basic procedures to run through when ingesting and migrating files (file type characterization and validation, checksums, etc).
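One of those migration procedures, checksum verification, fits in a few lines. Below is a hedged sketch of copying a file to new storage and confirming that the copy is bit-for-bit identical; the function names are mine, and real repository software wraps this in far more process, but the core idea is the same:

```python
import hashlib
import shutil

def checksum(path):
    """SHA-256 of a file, computed in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def migrate(src, dst):
    """Copy a file to new storage, then verify the copy by comparing
    checksums. A mismatch means the copy is corrupt and must be redone."""
    before = checksum(src)
    shutil.copyfile(src, dst)
    after = checksum(dst)
    if before != after:
        raise IOError(f"fixity check failed migrating {src} -> {dst}")
    return after  # worth recording in the preservation metadata
```

The same `checksum` routine doubles as a periodic fixity check: recompute a stored file’s hash and compare it against the value recorded at ingest.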
The OAIS model was breezed over, which is understandable due to its complexity and given the overall scope of the workshop. But it made me think about how scalable the OAIS model actually is: even if you are dealing with a smaller collection, you can still implement the basic workflows and concepts. I acknowledge that there are certainly situations where the model may be overkill depending on the size and type of the collection being managed, but there are still some incredibly valuable components that could be pulled from it: the digital preservation vocabulary and information packages concepts in particular.
I think it is worth noting some of the concerns expressed by the attendees, since they are probably common to many people’s reactions when faced with initiating a digital preservation program. One attendee acknowledged that “ten years suddenly seems like a long time” in regards to format life expectancy. Other shared concerns related to the sheer size of the task, the need to prioritize because of the cost-prohibitive activities involved, a lack of personal technical expertise, the challenges of creating or locating a TDR, and a pressing feeling that action needed to be taken immediately.
It may also have felt discouraging to the attendees that putting together a digital preservation program isn’t a final fix, because it will require constant staff monitoring and action – a particular challenge for smaller organizations. It is worth noting that a feasible solution to at least this final concern may be around the corner. Linda is heading a project in development called the Audiovisual Archive Network (AVAN), which aims to provide hosted digital video preservation services as a non-profit organization.