When curating digital files for storage in a digital repository, being certain of an object’s file format is essential for preservation and for future accessibility. JHOVE is an open-source, Java-based framework that identifies, validates, and characterizes the formats of digital objects. The tool can be integrated into an institution’s workflow for populating a digital repository. If the repository is OAIS-compliant, that integration would occur during ingest, in the phase where an information package is created and validated.
JHOVE’s three steps of identification, validation, and characterization tell us a great deal about a digital object’s technical properties. We’ll know the object’s format (identification), we’ll know that it is what it says it is (validation), and we’ll know the significant format-specific properties of the object (characterization).
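To make the three steps concrete, here is a minimal Python sketch that applies them to a PNG file. It is not JHOVE's code or API: the function names are hypothetical, identification only recognizes two signatures, and validation checks only the first chunk (IHDR) rather than the whole file, which a real validator would have to do.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def identify(data: bytes) -> str:
    """Identification: guess the format from the leading signature bytes."""
    if data.startswith(PNG_SIGNATURE):
        return "PNG"
    if data.startswith(b"%PDF-"):
        return "PDF"
    return "unknown"


def _read_ihdr(data: bytes):
    """Return the 13-byte IHDR payload if the first chunk is well formed, else None."""
    start = len(PNG_SIGNATURE)
    header = data[start:start + 8]
    if len(header) < 8:
        return None
    length, chunk_type = struct.unpack(">I4s", header)
    if chunk_type != b"IHDR" or length != 13:
        return None
    payload = data[start + 8:start + 21]
    crc_bytes = data[start + 21:start + 25]
    if len(payload) < 13 or len(crc_bytes) < 4:
        return None
    (stored_crc,) = struct.unpack(">I", crc_bytes)
    # The PNG chunk CRC covers the chunk type and the chunk data.
    if zlib.crc32(chunk_type + payload) != stored_crc:
        return None
    return payload


def validate(data: bytes) -> bool:
    """Validation (partial): is this really a PNG with an intact IHDR chunk?"""
    return identify(data) == "PNG" and _read_ihdr(data) is not None


def characterize(data: bytes) -> dict:
    """Characterization: report format-specific properties from IHDR."""
    payload = _read_ihdr(data)
    if payload is None:
        raise ValueError("not a PNG with a readable IHDR chunk")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", payload[:10])
    return {
        "format": "PNG",
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
    }
```

Reading a file with `open(path, "rb").read()` and passing the bytes to each function in turn mirrors the identify, validate, characterize order described above.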
JHOVE’s name comes from the partnership formed in 2003 between JSTOR and the Harvard University Library to create the software; it stands for JSTOR/Harvard Object Validation Environment.
The Digital Curation Centre published a case study in 2006 that provides a concise background of the JHOVE project.
The September 2008 Library of Congress Digital Preservation Newsletter reports on the development of JHOVE2. The users and creators of JHOVE decided to address what they saw as its shortcomings and improve the tool in JHOVE2. The project has, however, moved to the guidance of the California Digital Library, Portico, and Stanford University.
The most notable change to come in JHOVE2 is the reorganization of the original three-step process used by JHOVE. In JHOVE2, the whole process is considered characterization: identifying, validating, and reporting the inherent properties of a digital object that are significant to its preservation. Added to this process is an assessment feature that determines the digital object’s acceptability for an institution’s repository, based on locally defined policies.
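As a rough illustration of what assessment against locally defined policies could look like, here is a small Python sketch. Everything in it is an assumption made for illustration: the `LocalPolicy` fields, the report keys (`format`, `valid`, `size_bytes`), and the rules themselves are hypothetical and do not reflect how JHOVE2 actually expresses or evaluates policies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalPolicy:
    """Hypothetical local acceptance policy; field names are illustrative."""
    accepted_formats: frozenset
    require_valid: bool = True
    max_size_bytes: Optional[int] = None


def assess(report: dict, policy: LocalPolicy) -> list:
    """Return the reasons an object fails the policy; an empty list means acceptable."""
    problems = []
    if report.get("format") not in policy.accepted_formats:
        problems.append(f"format {report.get('format')!r} is not accepted")
    if policy.require_valid and not report.get("valid", False):
        problems.append("object did not validate against its format")
    size = report.get("size_bytes")
    if (
        policy.max_size_bytes is not None
        and size is not None
        and size > policy.max_size_bytes
    ):
        problems.append(f"size {size} exceeds limit {policy.max_size_bytes}")
    return problems


# Example: a repository that accepts only valid PNG and PDF files under 50 MB.
policy = LocalPolicy(
    accepted_formats=frozenset({"PNG", "PDF"}),
    max_size_bytes=50_000_000,
)
report = {"format": "PNG", "valid": True, "size_bytes": 120_000}
problems = assess(report, policy)
```

Returning a list of reasons rather than a single yes or no lets the ingest workflow record why an object was rejected, which seems useful when policies differ from one institution to the next.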
An exciting and valuable improvement in JHOVE2 will be the ability to characterize digital objects that are composed of more than one format.
One of the developers of JHOVE and JHOVE2 is Stephen Abrams of the California Digital Library, who has been designated as a Digital Preservation Pioneer by the Library of Congress.
JHOVE2 is expected to be released in early 2010, but the prototype code was made available last month for viewing and comments. The project’s site also has an FAQ. The completed product, like its predecessor, will be available under an open source license.