
Digital Preservation for Beginners

Ideas behind the field

Metadata

METS for Transferable Metadata

June 30, 2010, M. Amaral, 5 Comments

METS is the Metadata Encoding and Transmission Standard, a standardized XML schema for encoding metadata.  METS handles all types of metadata that are relevant to preservation: descriptive, administrative, and technical/structural metadata are all included in the schema, and a METS document serves as the container for all of this information about a digital object.

The schema was initially developed for the digital library community, and has since extended to the digital repository and preservation communities.  The fact that METS brings the various types of an object’s metadata together in one standard XML-based file type is excellent news for sharing and preserving resources.

Boat Transfer
Photo by Flickr user SyN+H, CC license

As the experience of many current digital preservation programs shows, collaboration among multiple institutions is a strategic move for a successful digital preservation program.  Using METS as a guideline for creating readable and transferable metadata makes sharing more seamless.  It also aids in escape strategies should the repository or its host institution fail and the digital objects need to be transferred to someone else’s care.

History

The beginnings of a standardized metadata scheme for collections of digital objects can be traced back to 1997, when UC Berkeley and the Digital Library Federation (DLF) initiated a project to further the concept of digital libraries sharing resources.  By 2001, the DLF-sponsored METS schema had emerged.  It is supported by the Library of Congress, was made a NISO standard in 2004, and was renewed in 2006.

By 2006, it had become clear that METS not only answered the interoperability needs associated with sharing digital objects, but was also valuable for preservation purposes.  Jerome McDonough (2006) states that “the METS standard can be considered one of many efforts to try to determine…how complex sets of data and metadata might best be encoded to support both information exchange and information longevity.”

How it’s Used

The OAIS reference model considers an acceptable digital object as one that includes the original content as well as the metadata required to understand the content, its structure, its rendering needs, and its preservation history.  This information plus the actual content forms a complete “information package,” which comes in the flavors of SIPs, AIPs, and DIPs, depending on the object’s role in a repository, as discussed in the OAIS reference model.  The metadata that comes in each of these flavors is referred to as the Preservation Description Information (PDI). (Note: DIPs do not always have PDIs since they are the distribution versions.)

We know from the OAIS model that a PDI categorizes a digital object’s metadata into reference, provenance, context, and fixity categories.  METS is capable of fulfilling these metadata requirements with corresponding sections in each METS document:

  • Descriptive <dmdSec>
  • Administrative <amdSec> (covers provenance and rights)
  • File Section <fileSec> (its <fileGrp> file groups list any and all files that comprise the digital object)
  • Structural Map <structMap>
  • Structural Links <structLink>
  • Behavior <behaviorSec>

It is important to realize, however, that according to the METS standard, the only required part of a METS document is the Structural Map.  So in order for METS to be effective when applied to preservation, there must be information in each of these sections (FYI – a truly complete METS file will also include a header <metsHdr>).

Figure: The Seven METS sections
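To make the "only the Structural Map is required" point concrete, here is a minimal sketch in Python using only the standard library.  It builds a METS document containing nothing but a <structMap> and then reports which of the seven sections are still absent.  The function names and the label value are hypothetical, and the file listing is checked under <fileSec>, the element that wraps the <fileGrp> file groups.  This is an illustration rather than a schema-validated record.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# The seven top-level METS sections (the header plus the six listed above).
SECTIONS = ["metsHdr", "dmdSec", "amdSec", "fileSec",
            "structMap", "structLink", "behaviorSec"]


def q(name):
    """Return the namespace-qualified tag for a METS element name."""
    return f"{{{METS_NS}}}{name}"


def build_minimal_mets(label):
    """Build a METS document that holds only the required structMap."""
    root = ET.Element(q("mets"))
    struct_map = ET.SubElement(root, q("structMap"))
    # A structMap holds a single root <div> describing the object.
    ET.SubElement(struct_map, q("div"), {"LABEL": label})
    return root


def missing_sections(root):
    """List the METS sections that are not present in the document."""
    present = {child.tag for child in root}
    return [name for name in SECTIONS if q(name) not in present]


doc = build_minimal_mets("Example object")  # hypothetical label
print(ET.tostring(doc, encoding="unicode"))
print("Still missing for preservation use:", missing_sections(doc))
```

A check like missing_sections could be run before a package is accepted into a repository, so that structurally acceptable but nearly empty METS files are flagged early.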

So where do we get this information to fill up a METS file?  The answer is PREMIS.

METS and PREMIS – A Perpetual Preservation Honeymoon

You may recall that PREMIS is also an XML schema that has been developed for preservation metadata.  The PREMIS structure is based on entities and semantic units that will harbor information about a digital object that is necessary for supporting and recording digital preservation actions.

What’s important here is that PREMIS will sit inside the METS document.  You can see an example of this here.  All of the preservation information will be present in the PREMIS file, and by nesting the PREMIS data into the METS file, the metadata becomes transferable to other repositories.
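As a rough sketch of what "PREMIS sitting inside METS" can look like, the Python below wraps an abbreviated PREMIS Object in a METS <amdSec> using <techMD>, <mdWrap>, and <xmlData>.  It assumes the PREMIS 2.x namespace; the identifier, format name, ID value, and function names are hypothetical, and the PREMIS fragment is deliberately incomplete, so it would not validate against the schema as written.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumption: PREMIS 2.x namespace
ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)


def m(name):
    """Namespace-qualified METS tag."""
    return f"{{{METS_NS}}}{name}"


def p(name):
    """Namespace-qualified PREMIS tag."""
    return f"{{{PREMIS_NS}}}{name}"


def premis_object(identifier, format_name):
    """Build an abbreviated PREMIS Object (not schema-complete)."""
    obj = ET.Element(p("object"))
    ident = ET.SubElement(obj, p("objectIdentifier"))
    ET.SubElement(ident, p("objectIdentifierType")).text = "local"
    ET.SubElement(ident, p("objectIdentifierValue")).text = identifier
    chars = ET.SubElement(obj, p("objectCharacteristics"))
    fmt = ET.SubElement(chars, p("format"))
    desig = ET.SubElement(fmt, p("formatDesignation"))
    ET.SubElement(desig, p("formatName")).text = format_name
    return obj


def wrap_in_mets(premis_obj):
    """Nest the PREMIS Object inside a METS administrative section."""
    root = ET.Element(m("mets"))
    amd = ET.SubElement(root, m("amdSec"))
    tech = ET.SubElement(amd, m("techMD"), {"ID": "TMD1"})
    wrap = ET.SubElement(tech, m("mdWrap"), {"MDTYPE": "PREMIS:OBJECT"})
    ET.SubElement(wrap, m("xmlData")).append(premis_obj)
    # The required structMap points back at the metadata through ADMID.
    struct_map = ET.SubElement(root, m("structMap"))
    ET.SubElement(struct_map, m("div"), {"ADMID": "TMD1"})
    return root


mets_doc = wrap_in_mets(premis_object("obj-001", "TIFF"))  # hypothetical values
print(ET.tostring(mets_doc, encoding="unicode"))
```

Because the PREMIS data travels inside the METS file, a receiving repository needs only the one document to recover both the structure and the preservation metadata.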

The flexibility of both of these schemas implies that there are variations and complications with integrating PREMIS and METS.  The Library of Congress created a working draft of guidelines for this process, which is viewable here (PDF, 25K).

Helpful METS Resources

  • METS Primer (Revised 4/2010) (PDF, 1.53MB) – Readable, and has color images and examples.
  • PREMIS in METS toolbox, information about the project here.
  • METS Creation Tools.
McDonough, J. (2006). METS: Standardized Encoding for Digital Library Objects. International Journal on Digital Libraries, 6(2), 148-158.
Categories: Metadata, Standards. Tags: METS, OAIS, PREMIS

Video Digital Preservation Workshop

June 9, 2010, M. Amaral, 3 Comments

On Monday, I was thrilled to attend a workshop entitled Digital Preservation for Video, presented by Linda Tadic for Independent Media Art Preservation (IMAP).  The workshop was held in San Francisco at the Bay Area Video Coalition (BAVC).  The scope of the event was to cover some of the key considerations in digitizing video and creating a digital preservation program at the DIY level (i.e. without a huge IT department backing you up).  A few of the institutions represented by attendees included BAVC, the Pacific Film Archive, the California Institute of the Arts, the California Academy of Sciences, the Sierra Club, the San Francisco Symphony, and the California Film Institute.

Prior to this workshop, I hadn’t had a great deal of exposure to the digital preservation challenges of moving visual materials.  In fact, I confess that I hardly knew anything about the current physical formats used for video storage, nor much about the hard work that is necessary for digitizing them.  Most of the attendees have done their share of digitizing moving images (or of outsourcing the digitization), and I think that most of us were there to explore the answers to the question of “now what?”

The Move to File-Based Video Storage

Physical moving image storage formats are on death row.  We spent the bulk of the morning going over the characteristics of different physical media and their expiration dates, which served as an effective motivator for digitization and instilled something close to panic among the attendees.

film reel
Photo by Serena Epstein, CC Attribution-Noncommercial-Share Alike 2.0 Generic license

Unlike paper, the magnetic tapes, reels, and discs that moving images are physically stored on are on a very tight deadline; aside from succumbing to format obsolescence, most of the media are reaching the end of their life expectancy, after which the images on them will simply not exist anymore.  To give some examples of formats that I was more familiar with, the life span of VHS is approximately 15 years, while MiniDV, DVCam, and Video are 5-10 years.  This illustrates the point that in some cases, it isn’t necessary to digitize the oldest things first.

Digitization is arguably becoming the best preservation option for some of these formats, and it is important that the road to digitization doesn’t result in a dead end.  That is why, once digitization has occurred, we need a digital preservation plan in place to ensure that the video content will continue to survive, especially since the original physical sources of the content will be dead in a short while.

Indeed, we are observing a shift from format-based physical video storage to the file-based storage of digital video content.  Preservation will no longer be about making the tapes last as long as possible, but about caring for the digital files representing the content that the tapes once held.

Preservation Concerns for Digital Video

I appreciate how Linda was adamant in reminding us that digital preservation is not a one-time fix for digital video longevity.  She was very clear in telling us that it requires a constant guardianship consisting of a deliberate, scheduled management of the digital files.  To use her phrasing, there is no “store-and-ignore” solution.  Preservation activities involve keeping file formats current so that they can be accessed by the software of the now.  It also involves exercising the hard drives that your files may be stored on and not letting them sit idle for more than 6 months.  It requires diligent updating of the files’ accompanying preservation metadata so that changes to the files can be tracked and managed.

Linda also stated what nobody likes to hear about digital preservation: that there is no one way to do things, and that there is no one set of instructions to follow that will help you save your content.  As with all file types, the preservation decisions you make will depend on your content, your file types, your storage, and your intended access methods.  So, in the case of making storage selections and creating a plan, knowledge is power.  I’ll try to summarize some of the key points covered.


Categories: Conferences, Digitization, Metadata, Standards. Tags: OAIS, video

PREMIS for Preservation Metadata

October 30, 2009 (updated July 7, 2010), M. Amaral, 2 Comments

The OAIS reference model calls for digital files to have preservation metadata, or “Preservation Description Information.”  This preservation metadata would outline the significant technical and historical (think format migrations, etc.) information about a given digital file that will be useful for the effective preservation of that file.
PREMIS is intended to help institutions produce this metadata.

PREMIS is several things:  First, it’s an acronym for Preservation Metadata: Implementation Strategies.  Second, it’s a very active working group with members from many countries and institutions who are trying to answer how to best describe digital files for preservation purposes.  They are often involved in periodic informational workshops.  Third, it is a very detailed description of a metadata scheme that is available in the PREMIS Data Dictionary (PDF), which is what I’m going to be referring to for the rest of this post.  Version 2.0 of the Data Dictionary was released in 2008.

Put simply, I think the PREMIS Data Dictionary provides a clear guide to what specific information needs to be known about a digital collection and its individual objects in order to best support any digital preservation activities.  Following the PREMIS guidelines would result in a specific and formulaic set of metadata intended for preservation purposes.  PREMIS also attempts to create a standardized set of preservation metadata, which would strengthen communication between management teams of different repositories, and also allow for the easy sharing and interoperability of this metadata with other PREMIS-conformant repositories.

The PREMIS metadata structure is based on filling in a lot of blanks which are specific to each file to be preserved.  There are five categories (called entities) of blanks to complete:

  • Intellectual Entities
  • Objects
  • Rights
  • Agents
  • Events

The blanks to be filled in under each of these entities are referred to as semantic units, and they were identified by the PREMIS team.  The semantic unit entries for each digital file are very specific pieces of information that are important to the preservation process.  The PREMIS Data Dictionary presents and describes all of these semantic units for each entity.

Some examples of a digital file’s potential semantic units would include:

  • the program on which the file was created
  • the version of that program
  • the operating system on which that program ran
  • who created the file
  • the rights associated with the file
  • when the file was ingested into the preservation system
  • dates the file was validated
  • and so on.
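To show how those blanks might look once filled in, here is a small Python sketch that holds the example semantic units above in a plain data structure.  The class names, field names, and every value are hypothetical; the comments note the PREMIS semantic unit each field is loosely modeled on, and the Data Dictionary remains the authority for the real names and nesting.

```python
from dataclasses import dataclass, field


@dataclass
class PremisEvent:
    event_type: str       # loosely: eventType
    event_date_time: str  # loosely: eventDateTime


@dataclass
class PremisSketch:
    creating_application_name: str     # the program the file was created on
    creating_application_version: str  # the version of that program
    operating_system: str              # environment software of type "operating system"
    creator_agent_name: str            # an Agent: who created the file
    rights_basis: str                  # a Rights statement for the file
    events: list = field(default_factory=list)

    def dates_for(self, event_type):
        """Return the dates of all recorded events of one type."""
        return [e.event_date_time for e in self.events
                if e.event_type == event_type]


# Every value below is invented for illustration.
record = PremisSketch(
    creating_application_name="ExampleScan",
    creating_application_version="2.1",
    operating_system="ExampleOS 10",
    creator_agent_name="Example Archive Imaging Lab",
    rights_basis="copyright",
    events=[
        PremisEvent("ingestion", "2010-06-01"),
        PremisEvent("validation", "2010-06-02"),
        PremisEvent("validation", "2010-07-02"),
    ],
)

print("Ingested:", record.dates_for("ingestion"))
print("Validated:", record.dates_for("validation"))
```

Keeping events as a growing list rather than a single field reflects the idea that a file's preservation history accumulates over time.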

The Data Dictionary is very detailed, and I’d really just recommend that you flip through it, unless you are the one responsible for implementing it.  It is so detailed, in fact, that it might even be a good place to start when you are in the beginning stages of developing a digital preservation program, as it will tell you the kind of preservation information that is important to collect.

I can definitely understand that there is quite a large learning curve for using PREMIS at your institution.  There is a lot of information and implementation training to go through, and it is likely that such training would be unprecedented.  (Not to mention that your whole digital preservation program is likely unprecedented at your institution!)  Perhaps, though, learning PREMIS can be approached just as the learning of other new metadata schemas has been approached in the past.

Suitcases

To wrap up, I’d like to say that the good news is that PREMIS is designed so that much of this preservation metadata can be collected automatically!  It is also highly important to know that the PREMIS Data Dictionary is supported by an XML structure.  This is relevant because it allows PREMIS records to be shared or transferred between preservation systems…which has excellent implications for cross-institutional cooperation and collaboration.  I hope that once my understanding of these processes grows, I will be able to share it in a future post (Update: See my post on METS).

For further exploration, the Library of Congress has a non-intimidating page full of resources for everything PREMIS.  This includes an overview by Priscilla Caplan (PDF) and a tutorial that is much more in-depth than this post.  And finally, an open PREMIS implementation fair was held earlier this month, and the presentation slides are posted here.

Sample PREMIS records (Updated July 2010)

I’ve noticed that many people arrive at this post by searching for an example of a PREMIS record.  I originally didn’t include one, but I want to do so now.  The two links below represent segments of a single PREMIS record provided by the Library of Congress.  You’ll see the semantic units affiliated with each entity (i.e. the “blanks” that need to be filled in for each category of metadata within the PREMIS record, as defined by the Data Dictionary).  The two examples pertain to the same digital object, which is a portrait of Louis Armstrong, viewable here.

  • This first link will take you to the PREMIS information for the Object entity.  The semantic units are listed, and the “Value” column contains the information that has been entered about the characteristics of this specific digital object.
  • This second link will take you to the metadata for the Events entity of this same digital object.  The “Value” column will again contain the institutionally-added information, this time about actions and preservation events related to this specific digital object.

To come back to the bigger picture, keep in mind that these two links only represent segments of what would be a single PREMIS record.  As listed above, there are 5 separate entities, each with a slew of semantic units to hold specific information about the object.
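If you would rather generate that kind of "semantic unit / Value" listing yourself, the Python sketch below flattens a PREMIS XML fragment into rows.  The inline sample is an invented, abbreviated Object record in the PREMIS 2.x namespace (not the Library of Congress record described above), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumption: PREMIS 2.x namespace

# Invented, abbreviated sample; a real record carries many more units.
SAMPLE = f"""
<premis:object xmlns:premis="{PREMIS_NS}">
  <premis:objectIdentifier>
    <premis:objectIdentifierType>local</premis:objectIdentifierType>
    <premis:objectIdentifierValue>example-0001</premis:objectIdentifierValue>
  </premis:objectIdentifier>
  <premis:objectCharacteristics>
    <premis:size>1024</premis:size>
  </premis:objectCharacteristics>
</premis:object>
""".strip()


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.split("}", 1)[-1]


def flatten(element, prefix=""):
    """Yield (semantic unit path, value) for every leaf element."""
    name = local_name(element.tag)
    path = f"{prefix}/{name}" if prefix else name
    children = list(element)
    if not children:
        yield path, (element.text or "").strip()
    for child in children:
        yield from flatten(child, path)


rows = list(flatten(ET.fromstring(SAMPLE)))
for unit, value in rows:
    print(f"{unit:<55} {value}")
```

The same walk works on an Events fragment, since it only relies on the nesting of elements and not on any particular entity.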

Suitcase photo by masochismtango on Flickr, Creative Commons Attribution-Share Alike 2.0 Generic license.
Categories: Metadata. Tags: PREMIS

Thinking about Metadata

July 29, 2009 (updated January 6, 2010), M. Amaral

Effective retrieval of documents designated for digital preservation will rely on assigning good metadata to each document/data set/image/audio file/etc that is to be preserved in a given repository. There are a few issues related to this process.

First, we must ask what the standards are for assigning the metadata. There are many existing models for metadata standards that are currently used for different file types, but they are generally file type-specific. Examples include Dublin Core for web pages and MARC for bibliographic records.

Secondly, will the kind of file-specific metadata that is assigned independently of a storage system hold up in a mixed storage preservation environment? Or should a new model be applied on top of the existing metadata? Additionally, the trouble with file-specific metadata is that unless you are creating library records, nobody is required to do it. In many cases, people create as much or as little as they want, and that sometimes means none at all. So even if there are standards in place, following them is optional, subjective, and splotchy.

In digital preservation, there are new considerations for the information that needs to be included in metadata. This includes the digital history of each item: What software was used, and which version of that software? What platform was used? What other formats has this item existed in before its current state? Has it been migrated from different formats? How many times, and by whom?

So now, not only is there the underlying metadata that clarifies the “aboutness” of the item, but new metadata related to digital preservation practices will need to be created. I’d like to assume that this second layer of metadata, unlike the file-specific metadata, will face some stricter regulations and will be required and consistently updated. Keep an eye out because I will discuss some of the existing metadata models for digital repositories and preservation practices in later, more specific posts.

But at the moment, we are left with two levels of metadata, and no real standards for either level. We have models to follow, but each owner of a digital repository can make their own decisions about which to use and how much they want to stick to it.

This could be disastrous for federated digital repository searching in the future.

Image by Flickr user Jametiks, under a Creative Commons Attribution 2.0 Generic license.

Original publication date: 7/20/09

Categories: Metadata, Standards. Tags: Standards
