The OAIS reference model calls for digital files to have preservation metadata, or “preservation descriptive information.” This preservation metadata would outline the significant technical and historical (think format migrations, etc.) information about a given digital file that will be useful for the effective preservation of that file.
PREMIS is intended to help institutions produce this metadata.
PREMIS is several things. First, it’s an acronym for Preservation Metadata: Implementation Strategies. Second, it’s a very active working group with members from many countries and institutions who are trying to work out how best to describe digital files for preservation purposes; its members are often involved in periodic informational workshops. Third, it is a very detailed description of a metadata scheme, available in the PREMIS Data Dictionary (PDF), which is what I’m going to be referring to for the rest of this post. Version 2.0 of the Data Dictionary was released in 2008.
Put simply, I think of the PREMIS Data Dictionary as a clear guide to the specific information that needs to be known about a digital collection and its individual objects in order to support digital preservation activities. Following the PREMIS guidelines results in a specific, consistently structured set of metadata intended for preservation purposes. PREMIS also attempts to standardize this preservation metadata, which would strengthen communication between the management teams of different repositories and make it easier to share the metadata with other PREMIS-conformant repositories.
The PREMIS metadata structure is based on filling in a lot of blanks that are specific to each file to be preserved. There are five categories (called entities) of blanks to complete:
- Intellectual Entities
- Objects
- Events
- Agents
- Rights
The blanks to be filled in under each of these entities are referred to as semantic units, and they were identified by the PREMIS team. The semantic unit entries for each digital file are very specific pieces of information that are important to the preservation process. The PREMIS Data Dictionary presents and describes all of these semantic units for each entity.
Some examples of a digital file’s potential semantic units would include:
- the program on which the file was created
- the version of that program
- the operating system on which that program ran
- who created the file
- the rights associated with the file
- when the file was ingested into the preservation system
- dates the file was validated
- and so on.
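To make the idea of "blanks" a little more concrete, here is a minimal Python sketch of how the examples above might be grouped by entity. Everything in it is made up for illustration: the values, the dictionary layout, and the helper function `event_dates` are my own assumptions. Some key names loosely follow semantic units from the Data Dictionary (such as creatingApplicationName), but this is not a schema-valid PREMIS record.

```python
# Hypothetical values for one digital file, grouped by PREMIS entity.
# The layout is an illustration only, not the Data Dictionary's own structure.
record = {
    "object": {
        "creatingApplicationName": "ExampleEditor",  # program that created the file
        "creatingApplicationVersion": "3.1",         # version of that program
        "operatingSystem": "ExampleOS 10",           # simplified key, invented for this sketch
    },
    "agents": [{"agentName": "Jane Doe", "agentType": "person"}],  # who created the file
    "rights": [{"rightsBasis": "copyright"}],                      # rights associated with the file
    "events": [
        {"eventType": "ingestion", "eventDateTime": "2009-06-01T10:00:00"},
        {"eventType": "validation", "eventDateTime": "2010-06-01T09:00:00"},
        {"eventType": "validation", "eventDateTime": "2009-06-01T10:05:00"},
    ],
}


def event_dates(record, event_type):
    """Return the dates of all events of one type, oldest first."""
    return sorted(
        event["eventDateTime"]
        for event in record["events"]
        if event["eventType"] == event_type
    )


print(event_dates(record, "validation"))
```

The point of the grouping is that dated actions such as ingest and validation live under events rather than on the file itself, so the same file can accumulate a history over time.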
The Data Dictionary is very detailed, and unless you are the one responsible for implementing it, I’d really just recommend that you flip through it. It is so detailed, in fact, that it might be a good place to start when you are in the beginning stages of developing a digital preservation program, as it will tell you the kind of preservation information that is important to collect.
I can definitely understand that there is quite a large learning curve for using PREMIS at your institution. There is a lot of information and implementation training to go through, and it is likely that such training would be unprecedented. (Not to mention that your whole digital preservation program is likely unprecedented at your institution!) Perhaps, though, learning PREMIS can be approached just as the learning of other new metadata schemas has been approached in the past.
To wrap up, I’d like to say that the good news is that many PREMIS semantic units are designed so that a repository system can record them automatically, rather than someone filling them in by hand! It is also important to know that the PREMIS Data Dictionary is supported by an XML schema. This matters because it allows PREMIS records to be shared or transferred between preservation systems, which has excellent implications for cross-institutional cooperation and collaboration. I hope that once my understanding of these processes grows, I will be able to share it in a future post (Update: See my post on METS).
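As a rough illustration of both points (automatic capture and XML output), here is a small Python sketch that derives a file's size and checksum from its bytes and writes them into a simplified, PREMIS-like object element. The function name `describe_object`, the identifier "example-0001", and the sample bytes are all invented for this example. The element names follow the Data Dictionary's Object semantic units as I understand them, but the sketch leaves out the PREMIS namespace and several other elements, so treat it as a starting point and not as a schema-valid record.

```python
import hashlib
import xml.etree.ElementTree as ET


def describe_object(identifier: str, data: bytes) -> ET.Element:
    """Build a simplified, PREMIS-like <object> element from raw file bytes."""
    obj = ET.Element("object")

    # Identifier block: the type "local" is an assumption for this sketch.
    ident = ET.SubElement(obj, "objectIdentifier")
    ET.SubElement(ident, "objectIdentifierType").text = "local"
    ET.SubElement(ident, "objectIdentifierValue").text = identifier

    # Characteristics that can be computed without any human input.
    chars = ET.SubElement(obj, "objectCharacteristics")
    fixity = ET.SubElement(chars, "fixity")
    ET.SubElement(fixity, "messageDigestAlgorithm").text = "SHA-256"
    ET.SubElement(fixity, "messageDigest").text = hashlib.sha256(data).hexdigest()
    ET.SubElement(chars, "size").text = str(len(data))  # size in bytes
    return obj


element = describe_object("example-0001", b"hello")
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

Because the result is plain XML, another system can read the values back with any XML parser, which is the property that makes sharing records between repositories practical.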
For further exploration, the Library of Congress has a non-intimidating page full of resources for everything PREMIS. This includes an overview by Priscilla Caplan (PDF) and a tutorial that is much more in-depth than this post. And finally, an open PREMIS implementation fair was held earlier this month, and the presentation slides are posted here.
Sample PREMIS records (Updated July 2010)
I’ve noticed that many people arrive at this post by searching for an example of a PREMIS record. I originally didn’t include one, but I want to do so now. The two links below represent segments of a single PREMIS record provided by the Library of Congress. You’ll see the semantic units affiliated with each entity (i.e. the “blanks” that need to be filled in for each category of metadata within the PREMIS record, as defined by the Data Dictionary). The two examples pertain to the same digital object, which is a portrait of Louis Armstrong, viewable here.
- This first link will take you to the PREMIS information for the Object entity. The semantic units are listed, and the “Value” column contains the information that has been entered about the characteristics of this specific digital object.
- This second link will take you to the metadata for the Events entity of this same digital object. The “Value” column will again contain the institutionally-added information, this time about actions and preservation events related to this specific digital object.
To come back to the bigger picture, keep in mind that these two links only represent segments of what would be a single PREMIS record. As listed above, there are 5 separate entities, each with a slew of semantic units to hold specific information about the object.