Copyright and Digital Preservation

Before you read any further, please note that I am not an expert in copyright law…by a long shot.  What you’ll find below is a discussion about how copyright law affects digital preservation as I understand it.  Copyright law is very complex, especially with regard to the “new” issues presented by the digital environment.  Hopefully you will find the references I have listed below, and the items on the Resources page, useful in getting started.

The Problem with Copyright:

The absolute biggest barrier that copyright presents to preserving digital materials is the copyright owner’s exclusive right to reproduce and adapt a work.  Making copies of digital items and adapting them in various ways are generally the first steps of preservation — think of making copies to back things up, and the act of making changes to digital objects during digital format migrations.
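
To make the "copies for backup" step concrete, here is a minimal sketch of copying a file and confirming that the copy is bit-identical.  The function names (sha256_of, backup_with_fixity) and the folder layout are my own assumptions for illustration, not part of any particular preservation tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_with_fixity(source: Path, backup_dir: Path) -> Path:
    """Copy `source` into `backup_dir` and confirm the copy is bit-identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also tries to preserve timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Fixity check failed for {target}")
    return target
```

Every call to backup_with_fixity produces exactly the kind of reproduction that the copyright owner's exclusive right covers, which is why even routine backups raise the question.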

Another impediment to digital preservation efforts is the set of dissemination restrictions that copyright law upholds.  Digital preservation is closely tied to access, yet this main goal of any preservation effort is restricted by current copyright law.  The glory of digital items is that they can theoretically be accessed from anywhere, and by multiple simultaneous users.  But copyright law hasn’t quite caught up to accommodate the digital environment and allow us to (legally) use and preserve digital items in the full capacity that the medium allows.

Determining the duration of copyright is somewhat confusing since it depends on when the work was created (or in some cases, when it was published versus when it was created).  Various acts of legislation over many years complicate the law because they have resulted in different copyright durations and renewal lengths.  Bitlaw provides a concise write up for the summary-inclined among us.

Exceptions to Copyright Law:

Libraries and archives follow the copyright provisions laid out by Section 108 of Title 17 (The Copyright Law) of the US Code (available here).  Libraries and archives are strong candidates for hosting digital preservation initiatives, so that’s why I’m focusing on them.  If the library or archive making the copies is open to the public or allows access to researchers from non-affiliated institutions, then it is not an infringement to make copies for preservation or replacement purposes under the conditions that:

  • the item is already currently held in the collections
  • the item “is damaged, deteriorating, lost, or stolen [not good for digital items, as it will be too late once damage has occurred] or if the existing format in which the work is stored has become obsolete.”
  • the copy is not distributed in a digital format outside the walls of the library (italics added to emphasize the impracticality of this rule)

Additionally, libraries and archives are allowed to make up to three copies of unpublished works for preservation purposes, and up to three copies of published works for replacement purposes.  So, even with the compromises made for libraries in Section 108, there are problematic implications for digital preservation.  Since digital preservation is so closely tied to accessibility, libraries would be extremely limited in how they can preserve – and then share – digital material.

There is hope; people are aware of these limitations.  In March 2008, the Section 108 Study Group released a report of suggestions to improve Section 108 and advance it into a more digitally-oriented mindset.  These suggestions include allowing copies of works to be made prior to damage or loss; allowing copies of publicly accessible websites to be made, with an opt-out option (see the Internet Archive in the following section); and lifting the three-copy preservation or replacement limit.

And let’s not forget about Fair Use.  I won’t get into it deeply here, but it’s a doctrine within Title 17 (Section 107) that actually reduces the copyright holder’s exclusive rights.  It allows people to reproduce parts of copyrighted works.  It is a totally vague and subjective doctrine, and seems to be more of a defense against infringement lawsuits than a right.

An Aside about Copyright and the Web:

There are many web-archiving projects, the resulting files of which will need to be included in preservation processes.  Like content that is created off the web, web-based content is also protected under copyright law unless it is stated otherwise.  The Internet Archive’s approach to harvesting web content for archiving is to collect everything from which its crawlers are not excluded, and to provide an opt-out policy for anyone who specifically does not want to be included.  While the legality of this method is up for debate, the Internet Archive has avoided many infringement suits via their “willingness to respect the wishes of those copyright owners who want to limit and control the reproduction of their copyrighted works” (Hirtle, 2003).
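
One common way for a site owner to exclude crawlers is a robots.txt file.  The sketch below uses Python's standard urllib.robotparser to show how a crawler could check for that kind of opt-out before harvesting a page.  The crawler name "example-archive-bot" and the URLs are hypothetical, and I am only assuming that an archive's crawler would consult the file in this way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in which a site owner opts out of one archiving crawler
# entirely and keeps every other crawler out of /private/.
ROBOTS_TXT = """\
User-agent: example-archive-bot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_archive(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules do not exclude this crawler from the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network request
    return parser.can_fetch(user_agent, url)
```

With these rules, may_archive(ROBOTS_TXT, "example-archive-bot", "https://example.org/page.html") is False, while a differently named crawler is allowed to fetch the same page.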

Since the introduction of Web 2.0, web content on a given web page may also have more than one creator.  So, obtaining copyright permission for preservation purposes may be more challenging than contacting one person.  In the case of blogs, for example, blog writers do not own the copyright to comments other people have left (Biederman & Andrews, 2008).  To take this one step further up the difficulty scale, think of the challenges introduced by anonymous comments with no clear author.

Additional Restrictions:

Outside of the general US copyright law that is applied to a work, we must also take into consideration the licensing restrictions that may be associated with subscription materials.  These will likely have their own rules and implications for preservation, especially in light of the Digital Millennium Copyright Act (DMCA).  The DMCA prohibits “circumventing technological access controls to obtain access to copyrighted works,” meaning if access to the work is password-protected, you cannot create a work-around to allow others to get to it (Besek, 2003).

No Real Precedents:

Finally, I think another basic challenge with copyright is that there are no precedents for many of the issues that digital preservation activities bring to the surface.  This is especially true with regard to the Fair Use exemptions, which are judged on a subjective basis.  The Fair Use exemption could be a saving grace for preservation activities, but until it has proven to be so in an infringement challenge or lawsuit, it is a very big risk to assume that this will hold in every instance.  It’s likely not the right preservation decision to wait until copyright law catches up with the needs of our digital environment.  So…who wants to try first?

Biederman, C. J., & Andrews, D. (2008, May 1). Applying copyright law to user-generated content. Los Angeles Lawyer, 12.
Besek, J. (2003, January). Copyright issues relevant to the creation of a digital archive: A preliminary assessment.  CLIR.  Retrieved Jan 5, 2010 from http://www.clir.org/pubs/reports/pub112/contents.html
Hirtle, P.B. (2003).  Digital preservation and copyright.  Retrieved Jan 5, 2010 from http://fairuse.stanford.edu/commentary_and_analysis/2003_11_hirtle.html

A Budding Branch?

This evening I sat in on a lecture given by John Phillips, a Management Consultant at Information Technology Decisions.  John was giving an overview of what he saw as the similarities and differences between the three main branches of information management professionals: librarians, archivists, and records managers.  What was not included in this list were digital preservationists.

Now, as someone who is not actually working in the field, I may be remiss in assuming that digital preservation has yet earned thusly titled professionals.  But I think if this is not yet the case, then it certainly will be in the future…once it becomes clear that professionals from the other three branches of information management cannot all be expected to have expert-level knowledge of digital preservation practices…which will become clear because everyone in information management really needs to start thinking about technological obsolescence.

The point of this post, though, is to point out a major correlation between records managers and what I would be inclined to think of as digital record preservationists.  As John pointed out, records managers differ from librarians and archivists because, 1) they tend to work in business or corporate environments, and 2) they are OK with – and are expected to – throw things away after they are no longer of value to the owning organization.

photo by Sebastiano Pitruzzello

Upon an item’s accession into a repository, records managers will assess the value of an object, and then revisit that assessment later on in the course of retention decisions.  If the item is no longer worth keeping, it is discarded.  This is also the (theoretical) case with digital preservationists.  In digital preservation, OAIS-type repositories are intended to preserve digital items for as long as those items are of value to their designated communities.  This implies that at some point, a digital item may no longer have value, and therefore continued preservation efforts for that item are not economically justified.  Throwing things away is a dirty job, but just as we can’t possibly collect everything out there, perhaps we can’t keep it all, either.

But let’s not discredit those clingy librarians.  John gave an interesting guesstimate regarding the types of respective repositories information professionals work with.  Among records managers, archivists, and librarians, librarians deal with the highest proportion of electronic to physical records out of all three professions.  (John’s guesstimate was 40% electronic / 60% physical, in comparison to IT professionals, who are 100% electronic by nature.)  The numbers for records managers were 30% electronic / 70% physical, which is still quite a lot of paper to be dealing with.
So if librarians are handling the highest proportions of electronic items out of these three groups, we can make a big case for libraries to be the battle grounds for creating leaders in digital preservation.  Technological and file format obsolescence will hit libraries the hardest if these numbers are accurate.  As contenders with the most to lose, libraries are poised to harbor the most institutional support for digital preservation initiatives…and perhaps spawn the fourth major branch of information professionals.

Why There is No Single Preservation Strategy

The following are some thoughts that I had about why there isn’t one digital preservation strategy that can be applied to all digital preservation programs.  As wonderful as it would be to find one standardized solution that fits everyone’s needs, it’s essentially an impossibility.  What’s below is something I wrote up for some coursework, but I thought I’d share it here, too.

There are two ways to answer the question of why there is no universally applicable digital preservation strategy.  The first is at the institutional level, and the second is at the level of the digital objects intended for preservation.

Digital preservation efforts so far have been tied to institutions interested in maintaining access to digital objects over time.  Being tied to an institution for funding and support will come with governance, policies, administrations, departmental units, stakeholders to please, and service missions which will all be very specific to the institution.  These factors will all be guiding principles in the way a digital preservation strategy will be created at a given institution.  And this is fine; Pennock (2006) even goes so far as to state that “digital preservation policies are most effective when integrated into the overall organisational policy framework.”  But this would prevent a universal digital preservation model from being possible simply due to all the “personalizations” that would need to take place in order to meet the needs of the institution as well as the capabilities allowed by whatever funding is available.

The second way to answer the question of why there is not a single digital preservation method that can be applied to everything is at the level of the digital objects.  When it comes to determining the actual preservation method, some ways work better than others depending on the type of file at hand, and the needs associated with that individual file.  For example, van der Hoeven (2004) points out that migration is an effective preservation method for widely supported file formats, but it might not be good for files that must maintain high levels of authenticity.  He even goes on to state that “…no one size fits all solution is possible.  Digital documents differ from each other in too many ways and are used for many different purposes by many different users.”
If we are to come up with an effective digital preservation strategy (at both the institutional and document levels), we must remain aware of the options, and expect to employ more than one method, strategy, and tool set.

Pennock, M. (2006).  “JISC Briefing paper: digital preservation, continued access to authentic digital assets.”  Retrieved September 30, 2009 from
van der Hoeven, J. R. (2004). “Permanent Access Technology for the virtual heritage.”  Retrieved September 30, 2009 from http://jeffrey.famvdhoeven.nl/dd/Researchtask%20IBM%20TU%20Delft%20-%20J.R.%20van%20der%20Hoeven.pdf
Photo by Tambako the Jaguar under a Creative Commons Attribution-No Derivative Works 2.0 Generic license.

JHOVE and JHOVE2

When curating digital files for storage in a digital repository, being certain of an object’s file type format is very important for preservation purposes and for future accessibility.  JHOVE is an open-source, Java-based framework that will identify, validate, and characterize the formats of digital objects.  This tool can be integrated into an institution’s workflow associated with populating a digital repository.  If the repository is OAIS-compliant, the workflow integration would occur during the creation and validation phase of an information package in the digital object’s ingestion.

JHOVE’s three steps of identification, validation, and characterization will result in us knowing a great deal about a digital object’s technical properties.  We’ll know the object’s format (identification), we’ll know that it is what it says it is (validation), and we’ll know about the significant format-specific properties of the object (characterization).
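
To illustrate the difference between the three steps, here is a very simplified sketch.  It is not JHOVE's code or API; the function names and the bare-bones PDF checks are my own assumptions, and real validation looks at far more than a header and an end-of-file marker.

```python
import re
from typing import Optional

PDF_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def identify(data: bytes) -> Optional[str]:
    """Identification: guess the format from its leading signature bytes."""
    if data.startswith(b"%PDF-"):
        return "PDF"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    return None


def validate_pdf(data: bytes) -> bool:
    """Validation (greatly simplified): a version header plus an %%EOF marker near the end."""
    return PDF_HEADER.match(data) is not None and b"%%EOF" in data[-1024:]


def characterize_pdf(data: bytes) -> dict:
    """Characterization: report format-specific properties, here only version and size."""
    match = PDF_HEADER.match(data)
    return {
        "version": match.group(1).decode("ascii") if match else None,
        "size_bytes": len(data),
    }
```

A file can pass identification and still fail validation: a PDF that was cut off mid-download still starts with the right signature, but it no longer ends the way a well-formed PDF should.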

JHOVE’s name comes from the partnership formed between JSTOR and the Harvard University Library in 2003 to create the software, and stands for JSTOR/Harvard Object Validation Environment.

The Digital Curation Centre did a case study in 2006 which provides a concise background of the JHOVE project.

JHOVE2

The September 2008 Library of Congress Digital Preservation Newsletter reports on the development of JHOVE2.  The users and creators of JHOVE decided to address what they saw as shortcomings and improve the tool for JHOVE2.  However, the project has moved to the guidance of the California Digital Library, Portico, and Stanford University.

The most notable change to come in JHOVE2 is the shuffling around of the original  three-step process outlined by JHOVE.  For JHOVE2, the whole process is now considered to characterize a digital object by identifying, validating, and reporting the inherent properties of the object that would be significant to its preservation.  Added to this process is an assessment feature that determines the digital object’s acceptability for an institution’s repository, based on locally-defined policies.
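
The assessment idea can be sketched as a comparison between an object's reported properties and a locally defined policy.  This is only an illustration under my own assumptions: the policy keys, the property names, and the assess function are hypothetical and are not taken from JHOVE2.

```python
from typing import Dict, List

# Hypothetical local policy: accepted formats, a size ceiling, and a validity rule.
LOCAL_POLICY = {
    "accepted_formats": {"PDF", "TIFF", "XML"},
    "max_size_bytes": 50_000_000,
    "require_valid": True,
}


def assess(properties: Dict[str, object], policy: Dict[str, object]) -> List[str]:
    """Return a list of policy violations; an empty list means the object is acceptable."""
    problems = []
    if properties.get("format") not in policy["accepted_formats"]:
        problems.append(f"format {properties.get('format')!r} is not accepted")
    if properties.get("size_bytes", 0) > policy["max_size_bytes"]:
        problems.append("object exceeds the size limit")
    if policy["require_valid"] and not properties.get("valid", False):
        problems.append("object failed validation")
    return problems
```

Keeping the policy in a separate structure from the checking logic reflects the "locally-defined" part: two institutions could run the same tool and reach different decisions about the same object.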

A really exciting and valuable improvement in JHOVE2 will be the ability to address the characterization of digital objects that are comprised of more than one type of format.

One of the developers of JHOVE and JHOVE2 is Stephen Abrams of the California Digital Library, who has been designated as a Digital Preservation Pioneer by the Library of Congress.

JHOVE2 is expected to be released early 2010, but the prototype code was made available last month for viewing and comments.  Here are the FAQ from the project’s site.  The completed product, like its predecessor, will also be available under an open source license.

There will be a JHOVE2 Workshop following iPRES 2009.

iPRES

iPRES is an annual international conference on the Preservation of Digital Objects.  Current research and projects are presented by authors of papers that have been selected by a comprehensive review process.  The papers tend to focus on technological research and on authors’ experiences in implementing and practicing different preservation strategies.  iPRES 2009 marks the sixth year the conference has been happening, and it is taking place October 5-6 at the Mission Bay Conference Center in San Francisco, CA.  The California Digital Library is acting as this year’s host and is thus leading the internal conference planning and local preparations.

Last year’s conference was hosted by the British Library and was held in London.  Previous to that, iPRES 2007 was organized by the National Science and Technology Library of China and was held in Beijing.  More information about previous conferences can be found here.

iPRES 2009 has posted a two-track draft program, which reveals that David Kirsh and a panel of members of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access will give the keynote addresses.

Also of interest to this year’s conference is the string of related events that follow it.  These events are taking place in San Francisco as well, and might make for exciting ways for iPRES attendees to tack on a couple of extra days to their stay in the city.

Digital Preservation Coalition (DPC)

The Digital Preservation Coalition was established in 2001.  It is a UK-based non-profit whose members share the goal of raising awareness and sharing knowledge about digital preservation.  I think their first success in achieving this goal was to create an international organization of members.

Membership is open to all parties, given that they are non-profit or collective.  There are different tiers of involvement in which members can participate, from being a funding supporter of a specific project to full membership, which costs 10,000 GB pounds/year.  A list of members can be found here.

Mission

Reading through the mission of the DPC is like looking at a hit-list of many of the key issues of digital preservation efforts.  Primarily, it is easy to appreciate that the DPC recognizes the necessity of collaboration in an effective digital preservation strategy by openly stating the very harrowing admission that no organization can “address all the challenges alone.”  Sharing progress and ideas is fundamental to this effort.  But the DPC also encourages individual projects done by members in order to promote more homegrown institutional and sector-level preservation practices and policies.

My favorite part of their mission is this:
“Instituting a concerted and co-ordinated effort to get digital preservation on the agenda of key stakeholders in terms that they will understand and find persuasive.”

I’m glad that someone has taken this part of the digital preservation process to the battlegrounds.  No matter how well-planned and coordinated any digital preservation project may be, they all need funding.  And funding will probably have to come from parties that have not thought of or even necessarily heard of digital preservation and its importance.  Explaining the process and need is really Step 1 in any successful attempt to secure support and funding.

In general, I would think that digital preservation efforts are at an advantage for getting funding because once their goals are understood, it would be difficult for a truly invested stakeholder to overlook their relevance.  In this part of their mission statement, the DPC has really addressed connecting the dots between the people involved in digital preservation projects and the people who need to support these efforts.

What the DPC Does

The DPC produces and shares information regarding research and practice within the digital preservation community.  They also work on promoting technology and standards, including coordinating recommendations for the 5-year review of the OAIS standard.  There is a clearly defined list of other goals and objectives here.

Their website is a comprehensive hub for their reports and activities, and also lists the projects of its members – arranged by type.  You will also find various training opportunities and a quarterly newsletter produced in concert with PADI.

The DPC also administers the international Digital Preservation Award.

What I think is probably their magnum opus up to this point is their Handbook.

The Handbook

This incredibly useful handbook is maintained by the DPC.  It goes far beyond the OAIS model guidelines by including more information and concepts, as well as information about selecting materials.  The handbook is meant to be “of interest to all those involved in the creation and management of digital materials,” and I think it really is.  A brief look at it will show you:

  • A who, what, why, how overview of digital preservation
  • A glossary of definitions and concepts
  • A run-down of media storage formats
  • Preservation strategies at the institutional level
  • Organizational, workflow, and institutional collaboration strategies
  • Acquisition and selection guidelines with an incredible supplementary flow chart for selecting materials

One final note I’d like to make regards the UK-centric view that the DPC’s mission proclaims.  This is an organization composed of international members who are all making strides together in preserving global digital artifacts.  Even though the DPC is based in the UK and aims to place UK digital preservation strategies into an international context, I think we all stand to benefit from it as a resource and organization.  One shouldn’t be deterred from participating or from using what the DPC has to offer for this reason!