
Digital Preservation for Beginners

Ideas behind the field

PREMIS

METS for Transferable Metadata

June 30, 2010 | M.Amaral | 5 Comments

METS is the Metadata Encoding and Transmission Standard, a standardized XML schema for encoding metadata.  METS handles all types of metadata that are relevant to preservation: descriptive, administrative, and technical/structural metadata are all included in the schema, and a METS document serves as the container for all of this information about a digital object.

The schema was initially developed for the digital library community, and has since extended to the digital repository and preservation communities.  The fact that METS consolidates the varying types of an object’s metadata into one standard XML-based file type is excellent news for sharing and preserving resources.

Boat Transfer
Photo by Flickr user SyN+H, CC license

As the experience of many current digital preservation programs shows, collaboration among multiple institutions is a strategic move for a successful digital preservation program.  Using METS as a guideline for creating readable and transferable metadata makes sharing more seamless.  It also aids in escape strategies should the repository, or the institution hosting it, fail and the digital objects need to be transferred to someone else’s care.

History

The beginnings of a standardized metadata scheme for collections of digital objects can be traced back to 1997, when UC Berkeley and the Digital Library Federation (DLF) initiated a project to further the concept of digital libraries sharing resources.  By 2001, the DLF-sponsored METS schema had emerged.  It is supported by the Library of Congress, was made a NISO standard in 2004, and was renewed in 2006.

By 2006, it had become clear that METS could not only answer the interoperability needs associated with sharing digital objects, but was also valuable for preservation purposes.  Jerome McDonough (2006) states that “the METS standard can be considered one of many efforts to try to determine…how complex sets of data and metadata might best be encoded to support both information exchange and information longevity.”

How it’s Used

The OAIS reference model considers an acceptable digital object to be one that includes the original content as well as the metadata required to understand the content, its structure, its rendering needs, and its preservation history.  This information plus the actual content forms a complete “information package,” which comes in the flavors of SIPs, AIPs, and DIPs, depending on the object’s role in a repository, as discussed in the OAIS reference model.  The metadata that comes in each of these flavors is referred to as the Preservation Description Information (PDI). (Note: DIPs do not always have PDIs since they are the distribution versions.)

We know from the OAIS model that a PDI categorizes a digital object’s metadata into reference, provenance, context, and fixity categories.  METS is capable of fulfilling these metadata requirements with corresponding sections in each METS document:

  • Descriptive <dmdSec>
  • Administrative <amdSec> (covers provenance and rights)
  • File Section <fileSec> (its <fileGrp> file groups list any and all files that comprise the digital object)
  • Structural Map <structMap>
  • Structural Links <structLink>
  • Behavior <behaviorSec>

It is important to realize, however, that according to the METS standard, the only required part of a METS document is the Structural Map.  So in order for METS to be effective when applied to preservation, there must be information in each of these sections (FYI – a truly complete METS file will also include a header <metsHdr>).

The Seven METS sections
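To make the section list concrete, here is a minimal sketch in Python that builds an empty METS skeleton and reports which sections are missing.  It uses only the standard library.  The helper names (build_skeleton, missing_sections, has_required_structmap) are my own and not part of any METS tooling, and the output is only an outline, not a schema-valid METS file (a real structMap needs at least one <div>, for example).  Note that in the schema the file groups sit inside a <fileSec> wrapper.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# The seven top-level METS sections, in schema order.
SECTIONS = ["metsHdr", "dmdSec", "amdSec", "fileSec",
            "structMap", "structLink", "behaviorSec"]


def q(name):
    """Return the namespace-qualified tag for a METS element name."""
    return f"{{{METS_NS}}}{name}"


def build_skeleton():
    """Build a <mets> root containing one empty element per section."""
    root = ET.Element(q("mets"))
    for name in SECTIONS:
        ET.SubElement(root, q(name))
    return root


def missing_sections(root):
    """List the section names that have no element under the root."""
    present = {child.tag for child in root}
    return [name for name in SECTIONS if q(name) not in present]


def has_required_structmap(root):
    """The structural map is the only section the standard requires."""
    return root.find(q("structMap")) is not None


skeleton = build_skeleton()
print(ET.tostring(skeleton, encoding="unicode"))
```

A document with only a <structMap> passes has_required_structmap but still shows six missing sections, which is the gap between "valid METS" and "METS that is useful for preservation."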

So where do we get this information to fill up a METS file?  The answer is PREMIS.

METS and PREMIS – A Perpetual Preservation Honeymoon

You may recall that PREMIS also has an XML schema, developed for preservation metadata.  The PREMIS structure is based on entities and semantic units that hold the information about a digital object that is necessary for supporting and recording digital preservation actions.

What’s important here is that PREMIS will sit inside the METS document.  You can see an example of this here.  All of the preservation information will be present in the PREMIS file, and by nesting the PREMIS data into the METS file, the metadata becomes transferable to other repositories.
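As a rough illustration of that nesting, the sketch below wraps a tiny PREMIS Object fragment inside a METS <amdSec>, using the <techMD>/<mdWrap>/<xmlData> wrappers that METS provides for embedded metadata.  The function name wrap_premis_object, the ID value, and the file name are hypothetical; I am assuming the PREMIS 2.x namespace; and the PREMIS fragment is deliberately incomplete (a real Object needs an identifier and more), so treat this as a picture of the shape rather than a valid record.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumption: PREMIS 2.x namespace
ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)


def wrap_premis_object(premis_object, tech_id="TECH1"):
    """Nest a PREMIS <object> element inside METS administrative metadata."""
    amd = ET.Element(f"{{{METS_NS}}}amdSec")
    # Technical metadata about the object goes in <techMD>.
    tech = ET.SubElement(amd, f"{{{METS_NS}}}techMD", ID=tech_id)
    # <mdWrap> declares what kind of metadata is embedded.
    wrap = ET.SubElement(tech, f"{{{METS_NS}}}mdWrap", MDTYPE="PREMIS:OBJECT")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(premis_object)
    return amd


# A deliberately tiny PREMIS fragment with one semantic unit.
obj = ET.Element(f"{{{PREMIS_NS}}}object")
ET.SubElement(obj, f"{{{PREMIS_NS}}}originalName").text = "portrait.tif"

amd_sec = wrap_premis_object(obj)
print(ET.tostring(amd_sec, encoding="unicode"))
```

Event metadata would typically be wrapped the same way under <digiprovMD>, and rights under <rightsMD>; the Library of Congress guidelines cover the finer points.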

The flexibility of both of these schemas means that there are variations and complications in integrating PREMIS and METS.  The Library of Congress created a working draft of guidelines for this process, which is viewable here (PDF, 25K).

Helpful METS Resources

  • METS Primer (Revised 4/2010) (PDF, 1.53MB) – Readable, and has color images and examples.
  • PREMIS in METS toolbox, information about the project here.
  • METS Creation Tools.
McDonough, J. (2006). METS: Standardized Encoding for Digital Library Objects. International Journal on Digital Libraries, 6(2), 148-158.

PREMIS for Preservation Metadata

October 30, 2009 | July 7, 2010 | M.Amaral | 2 Comments

The OAIS reference model calls for digital files to have preservation metadata, or “Preservation Description Information.”  This preservation metadata outlines the significant technical and historical (think format migrations, etc.) information about a given digital file that will be useful for the effective preservation of that file.
PREMIS is intended to help institutions produce this metadata.

PREMIS is several things:  First, it’s an acronym for Preservation Metadata: Implementation Strategies.  Second, it’s a very active working group with members from many countries and institutions who are trying to answer how to best describe digital files for preservation purposes.  They are often involved in periodic informational workshops.  Third, it is a very detailed description of a metadata scheme that is available in the PREMIS Data Dictionary (PDF), which is what I’m going to be referring to for the rest of this post.  Version 2.0 of the Data Dictionary was released in 2008.

Put simply, I think the PREMIS Data Dictionary provides a clear guide to what specific information needs to be known about a digital collection and its individual objects in order to best support any digital preservation activities.  Following the PREMIS guidelines results in a specific and formulaic set of metadata for preservation purposes.  PREMIS also attempts to create a standardized set of preservation metadata, which would strengthen communication between management teams of different repositories, and also allow for the easy sharing and interoperability of this metadata with other PREMIS-conformant repositories.

The PREMIS metadata structure is based on filling in a lot of blanks which are specific to each file to be preserved.  There are five categories (called entities) of blanks to complete:

  • Intellectual Entities
  • Objects
  • Rights
  • Agents
  • Events

The blanks to be filled in under each of these entities are referred to as semantic units, and they were identified by the PREMIS team.  The semantic unit entries for each digital file are very specific pieces of information that are important to the preservation process.  The PREMIS Data Dictionary presents and describes all of these semantic units for each entity.

Some examples of a digital file’s potential semantic units would include:
  • the program on which the file was created
  • the version of that program
  • the operating system on which that program ran
  • who created the file
  • the rights associated with the file
  • when the file was ingested into the preservation system
  • dates the file was validated
  • and so on.
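To show how those examples line up with the entities, here is a small Python sketch that files each one under the entity it would belong to.  The semantic unit names are my reading of the PREMIS 2.0 Data Dictionary, so check them against the dictionary itself; every value is invented for illustration, and the helper group_by_entity is hypothetical rather than part of any PREMIS tool.

```python
from collections import defaultdict

# The entity names used as keys in this sketch.
ENTITIES = {"intellectualEntity", "object", "rights", "agent", "event"}

# (entity, semantic unit, value) rows; all values are made-up examples.
EXAMPLES = [
    ("object", "creatingApplicationName", "Adobe Photoshop"),
    ("object", "creatingApplicationVersion", "CS4"),
    ("object", "swName", "Windows XP"),          # software environment
    ("agent", "agentName", "Jane Archivist"),
    ("rights", "rightsBasis", "copyright"),
    ("event", "eventType", "ingestion"),
    ("event", "eventDateTime", "2010-06-30T12:00:00"),
    ("event", "eventType", "validation"),
    ("event", "eventDateTime", "2010-07-07T09:30:00"),
]


def group_by_entity(rows):
    """Group (entity, unit, value) rows into a dict keyed by entity."""
    record = defaultdict(list)
    for entity, unit, value in rows:
        if entity not in ENTITIES:
            raise ValueError(f"unknown PREMIS entity: {entity}")
        record[entity].append((unit, value))
    return dict(record)


record = group_by_entity(EXAMPLES)
for entity, units in record.items():
    print(entity, units)
```

In a real record each event would be its own Event entry linked to the object, rather than a flat list as shown here.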

The Data Dictionary is very detailed, and I’d really just recommend that you flip through it, unless you are the one responsible for implementing it.  It is so detailed, in fact, that it might even be a good place to start when you are in the beginning stages of developing a digital preservation program, as it will tell you the kind of preservation information that is important to collect.

I can definitely understand that there is quite a large learning curve for using PREMIS at your institution.  There is a lot of information and implementation training to go through, and it is likely that such training would be unprecedented.  (Not to mention that your whole digital preservation program is likely unprecedented at your institution!)  Perhaps, though, learning PREMIS can be approached just as the learning of other new metadata schemas has been approached in the past.

Suitcases

To wrap up, I’d like to say that the good news is that much of this preservation metadata is designed to be collected automatically by the repository system!  It is also highly important to know that the PREMIS Data Dictionary is supported by an XML schema.  This is relevant because it allows PREMIS records to be shared or transferred between preservation systems…which has excellent implications for cross-institutional cooperation and collaboration.  I hope that once my understanding of these processes grows, I will be able to share it in a future post (Update: See my post on METS).
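As a small taste of what automatic collection can look like, the Python sketch below derives a few Object-entity values straight from a file: its name, its size, and a fixity checksum.  The function name collect_object_units is hypothetical, SHA-256 is just my choice of algorithm, and the dictionary keys borrow PREMIS semantic unit names as I understand them; a real repository would record far more than this.

```python
import hashlib
import os
import tempfile


def collect_object_units(path, algorithm="sha256"):
    """Gather a few PREMIS-style Object values directly from a file."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return {
        "originalName": os.path.basename(path),
        "size": os.path.getsize(path),           # bytes
        "messageDigestAlgorithm": algorithm,
        "messageDigest": digest.hexdigest(),     # fixity value
    }


# Demonstrate on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "wb") as out:
        out.write(b"hello")
    units = collect_object_units(sample_path)
    print(units)
```

Re-running the function later and comparing messageDigest values is the basic idea behind a fixity check.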

For further exploration, the Library of Congress has a non-intimidating page full of resources for everything PREMIS.  This includes an overview by Priscilla Caplan (PDF) and a tutorial that is much more in-depth than this post.  And finally, an open PREMIS implementation fair was held earlier this month, and the presentation slides are posted here.

Sample PREMIS records (Updated July 2010)

I’ve noticed that many people arrive at this post by searching for an example of a PREMIS record.  I originally didn’t include one, but I want to do so now.  The two links below represent segments of a single PREMIS record provided by the Library of Congress.  You’ll see the semantic units affiliated with each entity (i.e. the “blanks” that need to be filled in for each category of metadata within the PREMIS record, as defined by the Data Dictionary).  The two examples pertain to the same digital object, which is a portrait of Louis Armstrong, viewable here.

  • This first link will take you to the PREMIS information for the Object entity.  The semantic units are listed, and the “Value” column contains the information that has been entered about the characteristics of this specific digital object.
  • This second link will take you to the metadata for the Events entity of this same digital object.  The “Value” column will again contain the institutionally-added information, this time about actions and preservation events related to this specific digital object.

To come back to the bigger picture, keep in mind that these two links only represent segments of what would be a single PREMIS record.  As listed above, there are 5 separate entities, each with a slew of semantic units to hold specific information about the object.

Suitcase photo by masochismtango on Flickr, Creative Commons Attribution-Share Alike 2.0 Generic license.