Digital Preservation for Beginners

June 30, 2010

June 9, 2010

January 6, 2010

Copyright and Digital Preservation

Filed under: Standards — M.Amaral @ 9:41 am

Before you read any further, please note that I am not an expert in copyright law…by a long shot. What you'll find below is a discussion of how copyright law affects digital preservation as I understand it. Copyright law is very complex, especially when it comes to the "new" issues presented by the digital environment. Hopefully you will find the references I have listed below, and the items on the Resources page, useful in getting started.

The Problem with Copyright:

The absolute biggest barrier that copyright presents to preserving digital materials is the copyright owner’s exclusive right to reproduce and adapt a work.  Making copies of digital items and adapting them in various ways are generally the first steps of preservation — think of making copies to back things up, and the act of making changes to digital objects during digital format migrations.

The dissemination restrictions that copyright law upholds are another impediment to digital preservation efforts. Digital preservation is closely tied to access, yet access, the main goal of any preservation effort, is restricted by current copyright law. The glory of digital items is that they can theoretically be accessed from anywhere, and by multiple simultaneous users. But copyright law hasn't quite caught up with the digital environment, and it doesn't yet allow us to (legally) use and preserve digital items to the full capacity that the medium allows.

Determining the duration of copyright is somewhat confusing, since it depends on when the work was created (or in some cases, when it was published versus when it was created). Various acts of legislation over many years have complicated the law because they resulted in different copyright durations and renewal lengths. Bitlaw provides a concise write-up for the summary-inclined among us.

Exceptions to Copyright Law:

Libraries and archives follow the copyright provisions laid out in Section 108 of Title 17 (The Copyright Law) of the US Code (available here). Libraries and archives are strong candidates for hosting digital preservation initiatives, which is why I'm focusing on them. If the library or archive making the copies is open to the public or allows access to researchers from non-affiliated institutions, then making copies for preservation or replacement purposes is not an infringement, provided that:

  • the item is already currently held in the collections
  • the item "is damaged, deteriorating, lost, or stolen [not good for digital items, as it will be too late once damage has occurred] or if the existing format in which the work is stored has become obsolete"
  • the copy is not distributed in a digital format outside the walls of the library (a highly impractical rule for digital material)

Additionally, libraries and archives are allowed to make up to three copies of unpublished works for preservation purposes, and up to three copies of published works for replacement purposes.  So, even with the compromises made for libraries in Section 108, there are problematic implications for digital preservation.  Since digital preservation is so closely tied to accessibility, libraries would be extremely limited in how they can preserve – and then share – digital material.

There is hope; people are aware of these limitations. In March 2008, the Section 108 Study Group released a report with recommendations for improving Section 108 and bringing it into a more digitally oriented mindset. These recommendations include allowing copies of works to be made before damage or loss occurs; allowing copies of publicly accessible websites to be made, with an opt-out option (see the Internet Archive in the following section); and lifting the three-copy limit on preservation and replacement copies.

And let's not forget about Fair Use. I won't get into it deeply here, but it's a doctrine within Title 17 (Section 107) that limits the copyright holder's exclusive rights. It allows people to reproduce parts of copyrighted works for purposes such as criticism, comment, news reporting, teaching, scholarship, or research. It is a totally vague and subjective doctrine, and seems to be more of a defense against infringement lawsuits than a right.

An Aside about Copyright and the Web:

There are many web-archiving projects, and the files they produce will need to be included in preservation processes. Like content created off the web, web-based content is protected under copyright law unless stated otherwise. The Internet Archive's approach to harvesting web content is to collect everything from which its crawlers are not excluded, and to provide an opt-out policy for anyone who specifically does not want to be included. While the legality of this method is up for debate, the Internet Archive has avoided many infringement suits through its "willingness to respect the wishes of those copyright owners who want to limit and control the reproduction of their copyrighted works" (Hirtle, 2003).
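The post doesn't say how site owners exclude crawlers, but the usual mechanism on the web is the Robots Exclusion Protocol: a robots.txt file at the root of a site. Here is a minimal sketch, using Python's standard urllib.robotparser module, of how a harvester might check whether it is allowed to collect a page before archiving it. The robots.txt content, the crawler name ExampleArchiveBot, and the URLs are all hypothetical; this is not the Internet Archive's actual crawler code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one archiving crawler is shut out entirely
# (the site owner has "opted out"), and every crawler is kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_harvest(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text instead of fetching them
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_harvest("ExampleArchiveBot", "https://example.org/index.html"))  # False
    print(may_harvest("OtherBot/1.0", "https://example.org/index.html"))       # True
    print(may_harvest("OtherBot/1.0", "https://example.org/private/a.html"))   # False
```

Keep in mind that robots.txt is a convention that well-behaved crawlers honor voluntarily, not an access control, so an archive has to choose to check it.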

Since the rise of Web 2.0, a single web page may also have more than one creator, so obtaining copyright permission for preservation purposes may involve more than contacting one person. In the case of blogs, for example, blog writers do not own the copyright to comments other people have left (Biederman & Andrews, 2008). To take this one step further up the difficulty scale, think of the challenges introduced by anonymous comments with no clear author.

Additional Restrictions:

Beyond the general US copyright law that applies to a work, we must also consider the licensing restrictions that may be associated with subscription materials. These will likely have their own rules and implications for preservation, especially given the restrictions spelled out in the Digital Millennium Copyright Act (DMCA). The DMCA prohibits "circumventing technological access controls to obtain access to copyrighted works," meaning that if access to a work is password-protected, you cannot create a workaround that lets others get to it (Besek, 2003).

No Real Precedents:

Finally, I think another basic challenge with copyright is that there are no precedents for many of the issues that digital preservation activities bring to the surface. This is especially true with regard to Fair Use, which is judged on a case-by-case, subjective basis. Fair Use could be a saving grace for preservation activities, but until it has proven to be so in an infringement challenge or lawsuit, it is a very big risk to assume that it will apply in every instance. Still, waiting until copyright law catches up with the needs of our digital environment is probably not the right preservation decision. So…who wants to try first?

Besek, J. (2003, January). Copyright issues relevant to the creation of a digital archive: A preliminary assessment. CLIR. Retrieved January 5, 2010, from http://www.clir.org/pubs/reports/pub112/contents.html
Biederman, C. J., & Andrews, D. (2008, May 1). Applying copyright law to user-generated content. Los Angeles Lawyer, 12.
Hirtle, P. B. (2003). Digital preservation and copyright. Retrieved January 5, 2010, from http://fairuse.stanford.edu/commentary_and_analysis/2003_11_hirtle.html

October 1, 2009

Why There is No Single Preservation Strategy

Filed under: Standards — M.Amaral @ 10:26 am

The following are some thoughts that I had about why there isn’t one digital preservation strategy that can be applied to all digital preservation programs.  As wonderful as it would be to find one standardized solution that fits everyone’s needs, it’s essentially an impossibility.  What’s below is something I wrote up for some coursework, but I thought I’d share it here, too.

There are two ways to answer the question of why there is no universally applicable digital preservation strategy.  The first is at the institutional level, and the second is at the level of the digital objects intended for preservation.

Digital preservation efforts so far have been tied to institutions interested in maintaining access to digital objects over time. Being tied to an institution for funding and support comes with governance, policies, administrations, departmental units, stakeholders to please, and service missions, all of which will be very specific to that institution. These factors will all guide the way a digital preservation strategy is created at a given institution. And this is fine; Pennock (2006) even goes so far as to state that "digital preservation policies are most effective when integrated into the overall organisational policy framework." But it also rules out a universal digital preservation model, simply because of all the "personalizations" needed to meet the needs of the institution and to fit the capabilities allowed by whatever funding is available.

The second way to answer the question of why there is not a single digital preservation method that can be applied to everything is at the level of the digital objects.  When it comes to determining the actual preservation method, some ways work better than others depending on the type of file at hand, and the needs associated with that individual file.  For example, van der Hoeven (2004) points out that migration is an effective preservation method for widely supported file formats, but it might not be good for files that must maintain high levels of authenticity.  He even goes on to state that “…no one size fits all solution is possible.  Digital documents differ from each other in too many ways and are used for many different purposes by many different users.”
If we are to come up with an effective digital preservation strategy (at both the institutional and document levels), we must remain aware of the options, and expect to employ more than one method, strategy, and tool set.
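To make van der Hoeven's point concrete, here is a toy Python sketch that assigns a candidate preservation method to each object in a small collection. The two flags, the file names, and the rules are hypothetical simplifications of the paragraph above, not a real selection procedure; the only point is that even a three-item collection ends up needing more than one method.

```python
from dataclasses import dataclass


@dataclass
class DigitalObject:
    name: str
    widely_supported_format: bool  # hypothetical flag: many tools can still read the format
    needs_high_authenticity: bool  # hypothetical flag: original look and behaviour must survive


def candidate_method(obj: DigitalObject) -> str:
    """Toy rule set loosely following van der Hoeven (2004)."""
    if obj.needs_high_authenticity:
        # Migration alters the object; emulation keeps the original bits
        # and recreates the environment that renders them.
        return "emulation"
    if obj.widely_supported_format:
        # Widely supported formats are good candidates for migration.
        return "migration"
    # Anything else needs a human decision rather than a default.
    return "assess case by case"


collection = [
    DigitalObject("annual-report.pdf", widely_supported_format=True, needs_high_authenticity=False),
    DigitalObject("interactive-artwork.swf", widely_supported_format=False, needs_high_authenticity=True),
    DigitalObject("lab-data.xyz", widely_supported_format=False, needs_high_authenticity=False),
]

plan = {obj.name: candidate_method(obj) for obj in collection}
print(plan)
# {'annual-report.pdf': 'migration', 'interactive-artwork.swf': 'emulation',
#  'lab-data.xyz': 'assess case by case'}
```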

Pennock, M. (2006).  “JISC Briefing paper: digital preservation, continued access to authentic digital assets.”  Retrieved September 30, 2009 from
van der Hoeven, J. R. (2004). “Permanent Access Technology for the virtual heritage.”  Retrieved September 30, 2009 from http://jeffrey.famvdhoeven.nl/dd/Researchtask%20IBM%20TU%20Delft%20-%20J.R.%20van%20der%20Hoeven.pdf
Photo by Tambako the Jaguar under a Creative Commons Attribution-No Derivative Works 2.0 Generic license.

August 3, 2009

OAIS Reference Model Part II: The Model

Filed under: Repositories,Standards — M.Amaral @ 2:28 pm

Welcome to Part II of my OAIS Reference Model crash course! By now you have probably noticed that I have not included any of the many diagrams from the OAIS reference model document. This is because, before I had a basic understanding of the model, these diagrams seemed supremely complicated and confusing…kind of like PowerPoint slides with too many words. I hope that what I provide here gives you enough of an understanding of the OAIS model to make the diagrams less frightening when you do eventually encounter them.

Model Roles:

To start, it is important to recognize the three types of people that will be affiliated with a repository within the OAIS framework: the Producers of the repository's content, the Managers of the content and repository, and the Consumers who use the content stored in the repository. Each phase of the preservation process (the ingest, the processing and storage, and the accessing of digital objects) affects these three roles.
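Here is a minimal Python sketch of how the three roles touch the three phases. The class, its method names, the item identifier, and the in-memory dictionary are hypothetical stand-ins; real repository software does far more at every step. The OAIS names for the package at each phase (SIP, AIP, DIP) appear in the comments.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyRepository:
    archive: dict = field(default_factory=dict)  # stands in for archival storage
    events: list = field(default_factory=list)   # (role, phase, item_id) audit trail

    def ingest(self, item_id: str, content: bytes, metadata: dict) -> None:
        # Ingest: a Producer submits content plus metadata
        # (OAIS: a Submission Information Package, or SIP).
        self.events.append(("Producer", "ingest", item_id))
        self.archive[item_id] = {"content": content, "metadata": copy.deepcopy(metadata)}

    def manage(self, item_id: str, note: str) -> None:
        # Processing and storage: Management acts on the stored item
        # (OAIS: an Archival Information Package, or AIP) and records what it did.
        self.events.append(("Management", "store", item_id))
        self.archive[item_id]["metadata"].setdefault("history", []).append(note)

    def access(self, item_id: str) -> dict:
        # Access: a Consumer receives a copy of the stored item
        # (OAIS: a Dissemination Information Package, or DIP).
        self.events.append(("Consumer", "access", item_id))
        return copy.deepcopy(self.archive[item_id])


repo = ToyRepository()
repo.ingest("thesis-001", b"placeholder file content", {"title": "Sample thesis"})
repo.manage("thesis-001", "checksum verified")
dip = repo.access("thesis-001")
print([role for role, _, _ in repo.events])  # ['Producer', 'Management', 'Consumer']
print(dip["metadata"])  # {'title': 'Sample thesis', 'history': ['checksum verified']}
```

Handing the Consumer a deep copy is a deliberate choice in this sketch: whatever happens to the disseminated copy, the stored version stays untouched.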

The Model in Brief:

The document for the OAIS reference model has several key areas of content:

  • Terminology: An awesome vocabulary and glossary for the operations and information structures of repositories is located in Section 1.
  • Mandatory responsibilities: Section 3 lists the things that a repository must do in order to be considered an OAIS-type repository. One particular action that this section calls for is identifying a Designated Community of potential Consumers and ensuring that the information within the repository (metadata, etc.) is independently understandable (and accessible) by this community. This means that "the community should be able to understand the information without needing the assistance of the experts who produced the information." Read this for more detail about the other mandatory responsibilities.
  • A model for ingesting, storing, and providing access to stored items, including a very smart model for capturing each item's content along with the information needed to interpret it (Content Information) and its preservation metadata (Preservation Description Information). Together these make up an "information package," with Packaging Information binding the pieces together. The preservation metadata includes information about an item's context in order to fulfill one of an OAIS-type repository's mandatory responsibilities. This is all discussed in Section 2.
  • An outline for administrative management of the repository and the OAIS functions is presented in Section 4. This discusses working with the creators of the digital objects and the objectives behind the day-to-day management of the repository. The administrative role also oversees the general planning and governance of the repository, including policy and preservation decisions.
  • Actual preservation methods: Preservation processes such as digital migration and emulation are examined in Section 5.  Preservation Planning is obviously a central part of any repository’s role.
  • Archive and repository interoperability: concepts behind repository interoperability and federation are discussed and explained in Section 6. Making this possible requires heavy cooperation between repositories to develop shared standards.
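The packaging idea from the Section 2 bullet can be sketched as a small data structure. In OAIS terms, Content Information pairs the data object with the Representation Information needed to interpret it, and Preservation Description Information (PDI) covers reference, provenance, context, and fixity. The field names, the SHA-256 checksum choice, and the identifier below are assumptions for illustration; OAIS itself does not prescribe a metadata schema or a checksum algorithm.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContentInformation:
    data: bytes               # the digital object itself
    representation_info: str  # what is needed to interpret the bytes, e.g. a format description


@dataclass
class PreservationDescriptionInformation:
    reference: str                                  # identifier for the item
    provenance: list = field(default_factory=list)  # custody and change history
    context: str = ""                               # relationships to other items
    fixity: str = ""                                # checksum of the data, filled in below


@dataclass
class InformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation

    def record_fixity(self) -> None:
        # Store a SHA-256 digest so later copies can be checked bit for bit.
        self.pdi.fixity = hashlib.sha256(self.content.data).hexdigest()
        self.pdi.provenance.append("fixity recorded (SHA-256)")

    def verify_fixity(self) -> bool:
        # True only if the data still hashes to the recorded digest.
        return hashlib.sha256(self.content.data).hexdigest() == self.pdi.fixity


package = InformationPackage(
    ContentInformation(b"Minutes of the 1998 board meeting", "plain text, US-ASCII"),
    PreservationDescriptionInformation(reference="example:item-42"),
)
package.record_fixity()
print(package.verify_fixity())  # True
package.content.data = b"Minutes of the 1999 board meeting"  # simulate silent corruption
print(package.verify_fixity())  # False
```

Recording fixity at ingest is what lets a repository show, after every copy or migration, whether the bits have changed.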

By following the OAIS model and the mandatory responsibilities it entails, a repository will gain recognition as an OAIS-type archive or repository. It is beneficial for a repository to be recognized as such because it means that the well-documented archival standards of the OAIS model will have been applied to help ensure the effective long-term storage, retrieval, and preservation of digital documents. Another benefit is that communication with similarly purposed OAIS repositories should be easier, since they share the same terminology and concepts.

OAIS in Action:

DSpace and Fedora are two repository software platforms that have included OAIS-compliance capabilities in their products. This helps pave the road for any repository built on either of these open source systems to follow procedures from the OAIS model.

What I would love to find or collect is a list of actual digital archives and repositories that are following the OAIS model either by the book or in some variation.  If anyone has a suggestion, please post a comment!

July 29, 2009

OAIS Reference Model Part I: Background and Influence

Filed under: Repositories,Standards — M.Amaral @ 7:00 pm

The OAIS model is an international standard that has been adopted for guiding the long-term preservation of digital data and documents. In fact, the OAIS model is an ISO standard (ISO 14721:2003): it was developed by the Consultative Committee for Space Data Systems (CCSDS) in 2002, and was adopted as an ISO standard in 2003. The CCSDS version of the document is freely available, even though ISO standards are usually sold. It's a hefty 148 pages, available in PDF form here.

Photo by OliBac licensed under Creative Commons

The OAIS model is a standardized model describing a way that digital repositories intended for preservation purposes can be run.  Within this model, you will not find a standard for metadata.  It also does not endorse any particular repository platform, software, protocols or implementation procedure.  The OAIS model is simply a set of standardized guidelines intended to aid the people and systems behind a repository that has been designated with the responsibility of maintaining documents for archival purposes over a long period of time.

OAIS stands for Open Archival Information System, the word open referring to the open and public process under which this model was developed. Participation in its initial development was encouraged by the CCSDS, and as an ISO standard, it undergoes review every five years.

Because the OAIS model is a recognized standard, its users have formed a default sub-community within the digital preservation community.  But it has also been very beneficial to the digital preservation community at large and has helped promote progressive thinking and discussion.  Here are some key reasons why the OAIS model is so helpful to the digital preservation process and community:

  • It has standardized the terminology associated with digital preservation
  • It has outlined the duties and services of a preservation repository
  • It has outlined a way that information should be attributed and managed within a repository
  • It has mobilized community discussions about repository standards and certification
  • It has included preservation metadata as an important part of the preservation process
  • It focuses on long-term preservation, but lets “long-term” be defined by the repository managers
  • OAIS-type archives are committed to a set of defined responsibilities

As a final note, it is important to make clear that the OAIS model is by no means a requirement for a digital repository; while it is a recognized way of running a repository, it is not the only way. It may not suit some repositories, depending on their intended size, resources, and designated communities. But admittedly, a repository that chooses not to follow the OAIS recommendations cannot fall under the umbrella of the most widely used and understood digital archive standard.

————

Here are some resources that were incredibly useful for me while writing this post and the one to follow:

  • I really benefited from reading this post by John Mark Ockerbloom, the editor of the blog Everybody's Libraries. I almost considered skipping my own entry and simply sending readers to his!
  • And then I found this post and was blown away by how thorough it is.  It’s really well done and I’d encourage you to check it out.
  • This page is a brief run-down of OAIS from the JISC Standards Catalogue.

Continue on to Part II

ISO Standards

Filed under: Standards — M.Amaral @ 11:55 am

ISO is the commonly used name for the International Organization for Standardization. This is an international, non-governmental organization that creates standards based on a consensus of international committee members.

One ISO standard that is relevant to digital preservation practices is the OAIS model.

Additionally, there is a working group attempting to create an ISO standard for digital repository certification, which I think is an excellent idea. A wiki is maintained here with information related to their regular remote meetups and the documentation they are creating and collecting to assist in the process of writing a standard. A useful glossary of digital preservation terms can also be found on their wiki.

Original publication date: 7/20/09
