Effective retrieval of documents designated for digital preservation will rely on assigning good metadata to each item (document, data set, image, audio file, and so on) that is to be preserved in a given repository. There are a few issues related to this process.
First, we must ask what the standards are for assigning the metadata. Many metadata standards are already in use, but they are generally specific to one file type. Examples include Dublin Core for web pages and MARC for bibliographic records.
Second, will the kind of file-specific metadata that is assigned independently of a storage system hold up in a mixed storage preservation environment? Or should a new model be applied on top of the existing metadata? Additionally, the trouble with file-specific metadata is that unless you are creating library records, nobody is required to create it. In many cases, people create as much or as little as they want, and that sometimes means none at all. So even if there are standards in place, following them is optional, subjective, and spotty.
In digital preservation, there are new considerations for the information that needs to be included in metadata. This includes the digital history of each item: What software was used, and which version of that software? What platform was used? What other formats has this item existed in before its current state? Has it been migrated from different formats? How many times, and by whom?
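To make those questions concrete, here is a minimal sketch of what a digital history record might look like in Python. It is not taken from any existing preservation standard; the class names, field names, and sample values (PreservationRecord, MigrationEvent, the WordPerfect example) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MigrationEvent:
    """One format migration: what changed, who did it, and when (hypothetical model)."""
    source_format: str
    target_format: str
    performed_by: str
    date: str  # ISO 8601 date string, e.g. "2009-07-20"


@dataclass
class PreservationRecord:
    """Digital history of a single preserved item (hypothetical model)."""
    creating_software: str
    software_version: str
    platform: str
    current_format: str
    migrations: List[MigrationEvent] = field(default_factory=list)

    def migrate(self, target_format: str, performed_by: str, date: str) -> None:
        # Log the migration first, then update the item's current format.
        self.migrations.append(
            MigrationEvent(self.current_format, target_format, performed_by, date)
        )
        self.current_format = target_format

    def previous_formats(self) -> List[str]:
        # Every format the item existed in before its current state, oldest first.
        return [event.source_format for event in self.migrations]

    def migration_count(self) -> int:
        return len(self.migrations)


# Illustrative values only.
record = PreservationRecord("WordPerfect", "5.1", "MS-DOS", "WPD")
record.migrate("DOC", performed_by="archivist_a", date="2001-03-15")
record.migrate("PDF/A", performed_by="archivist_b", date="2009-07-20")
```

After the two migrations, the record can answer each question in the paragraph above: which formats came before (previous_formats), how many migrations took place (migration_count), and who performed each one (the performed_by field on each event).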
So now, not only is there the underlying metadata that clarifies the “aboutness” of the item, but new metadata related to digital preservation practices will need to be created. I’d like to assume that this second layer of metadata, unlike the file-specific metadata, will face some stricter regulations and will be required and consistently updated. Keep an eye out because I will discuss some of the existing metadata models for digital repositories and preservation practices in later, more specific posts.
But at the moment, we are left with two levels of metadata, and no real standards for either level. We have models to follow, but each owner of a digital repository can make their own decisions about which model to use and how closely to stick to it.
This could be disastrous for federated digital repository searching in the future.
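Here is a small sketch of how that failure might play out, assuming two repositories that describe similar items but picked different field names. The repositories, records, and the federated_search function are all made up for illustration; real federated search systems are far more involved.

```python
# Two hypothetical repositories that describe similar items
# but chose different field names for the same concepts.
repo_a = [{"title": "Harbor survey, 1998", "creator": "J. Smith"}]
repo_b = [{"name": "Harbor survey, 2003", "author": "J. Smith"}]


def federated_search(repositories, field_name, term):
    """Naive search: look for `term` in `field_name` across every repository."""
    hits = []
    for records in repositories:
        for record in records:
            # A record that lacks the field is silently skipped.
            value = record.get(field_name, "")
            if term.lower() in value.lower():
                hits.append(record)
    return hits


hits = federated_search([repo_a, repo_b], "creator", "smith")
```

Both repositories hold an item by J. Smith, but the search on "creator" only finds the one in repo_a. The item in repo_b is stored under "author" and never comes back, and nothing tells the searcher that it was missed.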
Image by Flickr user Jametiks, under a Creative Commons Attribution 2.0 Generic license.
Original publication date: 7/20/09