JHOVE2: Next-Generation Architecture for Format-Aware Characterization

Assessment Module Specification

Version: 2.0.0

Issued: 4 August 2010

Status: Draft


Copyright © 2010 by The Regents of the University of California, Ithaka Harbors, Inc., and The Board of Trustees of Leland Stanford Junior University. All rights reserved.

This work is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

JHOVE2 Project Team

California Digital Library
Stephen Abrams
Patricia Cruse
John Kunze
Isaac Rabinovitch
Marisa Strong
Perry Willett
Portico
John Meyer
Sheila Morrissey
Stanford University
Richard Anderson
Tom Cramer
Hannah Frost
Library of Congress
Martha Anderson
Justin Littman
With help from
Walter Henry
Nancy Hoebelheinrich
Keith Johnson
Evan Owens


Preface

JHOVE2 is a Java framework and application for next-generation format-aware characterization. Characterization is the process of examining a formatted digital object and automatically extracting or deriving representation information about that object that is indicative of its significant nature and useful for purposes of classification, analysis, and use. For more information, visit http://jhove.org.

This document covers the specification, design, implementation, and configuration of the JHOVE2 Assessment module.

Acknowledgments

The JHOVE2 project is funded by the Library of Congress as part of its National Digital Information Infrastructure and Preservation Program (NDIIPP).

Version History

Version / Date / Notes
2.0.0 / 4 Aug 2010 / Draft

Contents

Preface

Acknowledgments

Version History

Contents

1 Introduction

2 Identification

3 References

4 Terminology and conventions

5 Reportable Properties

6 Configuration

7 Implementation Notes

1 Introduction

This is the specification for the JHOVE2 Assessment Module.

Assessment can be used to perform policy-based evaluations of a digital object. Assessment outcomes that are included in the JHOVE2 output can support decision-making processes related to determination of preservation risk, level of service, or actions to be taken. The assessment module does not initiate any actions on its own, although the surrounding workflows may utilize assessment results to guide control mechanisms.

The Assessment Module performs rule-based analysis of the reportable properties previously generated during characterization of a source unit. Property values that can be analyzed include format identity, extracted features, and validation status.

Assessment rules can be modified, extended, or replaced through local configuration without requiring programming changes to the JHOVE2 core framework or the Assessment module. Because the rules express locally defined policy rather than the normative requirements of a format, assessment is considered a subjective determination. It differs from a determination of validity, which is more objective, being based on the degree of conformance of a digital object to the normative syntactic and semantic requirements defined by the authoritative specification of the object's format.

Note that assessment refers to the appraisal of an instance of a format, not of the format itself. For example, it would be a specific WAVE file that is assessed, not the WAVE format.

2 Identification

Assessment Module
JHOVE2 module name / AssessmentModule
JHOVE2 module identifier / http://jhove2.org/terms/reportable/org/jhove2/module/assess/AssessmentModule
JHOVE2 module class / org.jhove2.module.assess.AssessmentModule

3 References

Recent efforts to develop and apply assessment methodologies in digital object workflows and repository operations include:

AONS II (Automated Obsolescence Notification System), National Library of Australia and APSR3

CIV (Configurable Image Validator), Library of Congress (Michael Stelmach)

Institutional Technology Profiles, National Library of New Zealand

The following are considered informative with respect to assessment objectives:

Planets (2009). Survey Analysis Report, IST-2006-033789, DT11-D1.
http://www.planets-project.eu/market-survey/reports/docs/Planets_DT11-D1_SurveyReport.pdf

Rog, J. and van Wijk, C. (2008). Evaluating File Formats for Long-term Preservation. National Library of the Netherlands; The Hague, The Netherlands. http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf

Pearson, D. and Webb, C. (2008). Defining File Format Obsolescence: A Risky Journey. International Journal of Digital Curation. Vol. 3, No. 1. http://www.ijdc.net/index.php/ijdc/article/view/76

De Vorsey, K. and McKinney, P. (2009). One Man’s Obsoleteness is Another Man’s Innovation: A Risk Analysis Methodology for Digital Collections. Presented at Archiving 2009, Arlington, Virginia, May 2009.

Predicate logic

Conditional (programming) - Wikipedia
http://en.wikipedia.org/wiki/Conditional_(programming)

The IF-THEN-ELSE Construct
http://www.cs.ucla.edu/ldl/tutorial/node22.html

Predicates and Quantifiers
http://www.csm.ornl.gov/~sheldon/ds/sec1.6.html

MVEL expression language software used for rule evaluation

MVEL - Home
http://mvel.codehaus.org/

MVEL Language Guide
http://mvel.codehaus.org/Language+Guide+for+2.0

Expression Languages (OGNL and MVEL)
http://javaforu.blogspot.com/2007/10/expression-languages-ognl-and-mvel.html

Why the MVEL scripting language for JBoss Rules
http://blog.athico.com/2007/05/why-mvel-scipting-language-for-jboss.html

Pluggable dialects for a rule engine - Patent application
http://www.faqs.org/patents/app/20090063382

Open Source Expression Languages in Java
http://java-source.net/open-source/expression-languages

Java Open Source Expression Languages
http://www.java-opensource.com/open-source/expression-languages.html

Rule Language Standards (why have just one standard when you can have many?)

RuleML Homepage
http://www.ruleml.org/

Production Rule Representation (PRR)
http://www.omg.org/spec/PRR/

Rule Interchange Format (RIF) Working Group
http://www.w3.org/2005/rules/wiki/RIF_Working_Group

Rules Engine technology

Open standards related to Rule Engines
http://stackoverflow.com/questions/1016139/what-are-all-the-open-standards-related-to-rule-engine

JSR 94: Java™ Rule Engine API
http://jcp.org/en/jsr/detail?id=94

Rule Engines Overview
http://jadex-rules.informatik.uni-hamburg.de/xwiki/bin/view/Resources/Rule+Engines

Top 10 Java Business Rule Engines
http://blog.taragana.com/index.php/archive/top-10-java-business-rule-engines/

JSR-94 (Java Rule Engine API) ill-designed?
http://www.mhaller.de/archives/81-Why-is-JSR-94-Java-Rule-Engine-API-ill-designed.html

Introduction to Drools
http://www.intltechventures.com/presentations/2008-01-26-Introduction-to-Drools.pdf

Drools - 1 Minute DRL Tutorial
http://legacy.drools.codehaus.org/1+Minute+DRL+Tutorial

Drools JBoss Rules 5.0 Developer's Guide
http://book.pdfchm.net/drools-jboss-rules-5-0-developers-guide/9781847195647/

4 Terminology and conventions

Source Unit (org.jhove2.core.source.Source)

A source unit is a JHOVE2 abstraction that represents a digital object to be characterized. A source unit is usually a single file, but it might be a bytestream within a file or an aggregation of other source units.

Format Module (org.jhove2.module.format.FormatModule)

A format module does the parsing, feature extraction, and validation of a source unit. The pairing of a source unit with a format module is performed by the JHOVE2 dispatcher based on the format identification. Depending on the precision of the identification and the availability of format modules, it is possible for 0, 1, or multiple format modules to be executed against a source unit. Each format module instance that has performed feature extraction is added to the source unit's list of modules.

Reportable Property (org.jhove2.annotation.ReportableProperty)

A reportable property is a field of a JHOVE2 Java object that has been annotated so that the JHOVE2 displayer will include that property value in the JHOVE2 output. A property may be a field of a format module, or may be nested more deeply within the Reportable object hierarchy that a module uses to store the data extracted during characterization.
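The annotation mechanism can be illustrated with a short reflection sketch that collects only the annotated fields of an object. The annotation `ReportableSketch`, the class `WaveFeaturesSketch`, and the helper `ReportableCollector` are hypothetical stand-ins; they are not the real `org.jhove2.annotation.ReportableProperty` definition or any JHOVE2 class, and the actual displayer may work differently (for example, by inspecting annotated methods rather than fields).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical annotation standing in for org.jhove2.annotation.ReportableProperty.
// RUNTIME retention is needed so that reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ReportableSketch {
    String value(); // human-readable description of the property
}

// Hypothetical holder for data a format module might extract.
class WaveFeaturesSketch {
    @ReportableSketch("Number of audio channels")
    private int channels = 2;

    @ReportableSketch("Sample rate, in Hz")
    private long sampleRate = 44100L;

    // Internal bookkeeping: not annotated, so it is never reported.
    private int parseErrors = 0;
}

class ReportableCollector {
    // Returns the annotated fields of an object as name -> value, sorted by name.
    static Map<String, Object> collect(Object reportable) {
        Map<String, Object> properties = new TreeMap<>();
        for (Field field : reportable.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(ReportableSketch.class)) {
                field.setAccessible(true); // the fields are private
                try {
                    properties.put(field.getName(), field.get(reportable));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read " + field.getName(), e);
                }
            }
        }
        return properties;
    }
}
```

Calling `ReportableCollector.collect(new WaveFeaturesSketch())` yields the `channels` and `sampleRate` entries and skips `parseErrors`, mirroring how only annotated properties reach the output.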

RuleSet (org.jhove2.module.assess.RuleSet)

For each type of format module, one or more rules can be defined to be evaluated against the reportable property values that the module discovered during feature extraction. A collection of rules targeted at a specific module is called a RuleSet. The module's object type (objectFilter) is specified using the Java class name of the module. Note that assessment is not, in principle, limited to the analysis of format module objects. Any type of object that is referenced by the source unit (or even the source unit itself) might contain reportable property data worth analyzing. At present, however, only format modules and the source unit itself are considered.
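The role of objectFilter can be sketched as a simple selection step: given the configured rule sets and an object, keep the rule sets whose filter names the object's class. This is a minimal sketch, not the JHOVE2 implementation; the classes `WaveModuleSketch`, `TiffModuleSketch`, and `RuleSetFilterSketch` are hypothetical, and the exact class-name match (which ignores subclasses) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for two format module classes (not real JHOVE2 classes).
class WaveModuleSketch {}
class TiffModuleSketch {}

// Simplified RuleSet: only the fields needed to show how objectFilter selects targets.
class RuleSetFilterSketch {
    final String name;
    final String objectFilter; // Java class name of the object type the rules target

    RuleSetFilterSketch(String name, String objectFilter) {
        this.name = name;
        this.objectFilter = objectFilter;
    }

    // Assumption: an exact class-name match; subclasses are not matched.
    boolean appliesTo(Object candidate) {
        return candidate.getClass().getName().equals(objectFilter);
    }

    // Returns the rule sets that apply to an object, preserving configured order.
    static List<RuleSetFilterSketch> select(List<RuleSetFilterSketch> ruleSets, Object candidate) {
        List<RuleSetFilterSketch> matches = new ArrayList<>();
        for (RuleSetFilterSketch ruleSet : ruleSets) {
            if (ruleSet.appliesTo(candidate)) {
                matches.add(ruleSet);
            }
        }
        return matches;
    }
}
```

In a real configuration the filter string would name an actual JHOVE2 module class; here `WaveModuleSketch.class.getName()` plays that role.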

Rule (org.jhove2.module.assess.Rule)

Each Rule in a RuleSet defines the assertion tests to be applied against the reportable property values stored in instances of a format module (or other supported object type). These properties include the format identification, features extracted from the source unit, and validity.

The specification of a rule uses an if-then-else construct whose basic structure (in pseudocode) is:

    If (condition) Then
        (consequent)
    Else
        (alternative)
    End If

The condition portion of the rule consists of a quantifier followed by a list of predicate expressions:

    {ALL_OF | ANY_OF | NONE_OF}
        (predicate)
        (predicate)
        ...

Each predicate is a string containing a boolean expression written in a domain-specific language. Currently only MVEL is supported. The simplest form of expression is:

{reportableProperty} {logical operator} {value}

Assertions that can be evaluated include testing for the presence or absence of a property, testing constraints on property values, testing whether a property value contains a substring, testing whether a specified value is present in a collection of values, and combinations of these. The predicate syntax is described in more detail in the Configuration section.
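The kinds of assertions listed above can be illustrated in plain Java. JHOVE2 writes predicates as MVEL expression strings; in this sketch each predicate is instead a `java.util.function.Predicate` over a map of property names to values, so that it runs without MVEL. The class `PredicateExamples` and the property names used with it (`validity`, `bitDepth`, `formatName`, `compression`) are hypothetical and do not correspond to actual JHOVE2 reportable properties or MVEL syntax.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java stand-ins for the predicate forms; not MVEL and not the JHOVE2 API.
class PredicateExamples {
    // Presence: the property exists and has a non-null value.
    static Predicate<Map<String, Object>> present(String name) {
        return props -> props.get(name) != null;
    }

    // Value constraint: a numeric property is at least the given minimum.
    static Predicate<Map<String, Object>> atLeast(String name, long min) {
        return props -> {
            Object value = props.get(name);
            return value instanceof Number && ((Number) value).longValue() >= min;
        };
    }

    // Substring: a string property contains the given text.
    static Predicate<Map<String, Object>> contains(String name, String text) {
        return props -> {
            Object value = props.get(name);
            return value instanceof String && ((String) value).contains(text);
        };
    }

    // Membership: the property value is one of an allowed collection of values.
    static Predicate<Map<String, Object>> oneOf(String name, Collection<?> allowed) {
        return props -> allowed.contains(props.get(name));
    }
}
```

Absence is the negation of presence (`present("x").negate()`), and combinations can be built with `Predicate.and` and `Predicate.or`.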

The quantifier specifies the situation in which the rule as a whole will evaluate to true:

·  ALL_OF = all of the predicates are true.

·  ANY_OF = at least one of the predicates is true.

·  NONE_OF = all of the predicates are false.
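The three quantifiers correspond directly to the `allMatch`, `anyMatch`, and `noneMatch` operations of Java streams. The sketch below combines already-evaluated predicate outcomes; the enum name `QuantifierSketch` is hypothetical. It is two-valued, whereas the BooleanResult properties listed in section 5 are typed as `Validator.Validity`, so the actual implementation may also track an undetermined outcome. The behavior for an empty predicate list (ALL_OF and NONE_OF true, ANY_OF false) follows the stream conventions and is an assumption, not something this specification states.

```java
import java.util.Arrays;
import java.util.List;

// The three quantifiers described above, applied to predicate outcomes.
enum QuantifierSketch {
    ALL_OF, ANY_OF, NONE_OF;

    // Combines the boolean outcomes of a rule's predicates into the rule's condition.
    boolean evaluate(List<Boolean> predicateResults) {
        switch (this) {
            case ALL_OF:
                return predicateResults.stream().allMatch(Boolean::booleanValue);
            case ANY_OF:
                return predicateResults.stream().anyMatch(Boolean::booleanValue);
            case NONE_OF:
                return predicateResults.stream().noneMatch(Boolean::booleanValue);
            default:
                throw new IllegalStateException("Unknown quantifier: " + this);
        }
    }
}
```

For the outcomes `[true, false, true]`, ALL_OF yields false, ANY_OF yields true, and NONE_OF yields false.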

The specification of a rule also includes:

·  the rule name (unique in the rule set)

·  the description (to document the reason for the rule)

·  the consequent (a string value to report if the rule evaluates to true)

·  the alternative (a string value to report if the rule evaluates to false)

·  the enable flag (a boolean that controls whether the rule is active or dormant)

Each RuleSet also has a name, description, consequent, alternative, and enable flag. For a RuleSet, the consequent string is reported if all the rules in the set evaluate to true; otherwise, the alternative is reported.
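Putting the pieces together, a rule applies its quantifier to its predicates and reports either its consequent or its alternative, and a rule set reports its consequent only when all of its rules are true. This is a minimal sketch under stated assumptions: `RuleSketch` and `RuleSetSketch` are hypothetical simplifications rather than `org.jhove2.module.assess.Rule` and `RuleSet`, predicates are Java lambdas instead of MVEL strings, and skipping dormant rules when evaluating a rule set is an assumed behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Simplified Rule: a quantifier over predicates, plus the strings to report.
class RuleSketch<T> {
    enum Quantifier { ALL_OF, ANY_OF, NONE_OF }

    final String name;
    final Quantifier quantifier;
    final List<Predicate<T>> predicates;
    final String consequent;  // reported when the condition is true
    final String alternative; // reported when the condition is false
    final boolean enabled;    // false means the rule is dormant

    RuleSketch(String name, Quantifier quantifier, List<Predicate<T>> predicates,
               String consequent, String alternative, boolean enabled) {
        this.name = name;
        this.quantifier = quantifier;
        this.predicates = predicates;
        this.consequent = consequent;
        this.alternative = alternative;
        this.enabled = enabled;
    }

    // The condition: the quantifier applied to the predicate outcomes.
    boolean evaluate(T target) {
        long trueCount = predicates.stream().filter(p -> p.test(target)).count();
        switch (quantifier) {
            case ALL_OF:  return trueCount == predicates.size();
            case ANY_OF:  return trueCount > 0;
            case NONE_OF: return trueCount == 0;
            default: throw new IllegalStateException("Unknown quantifier " + quantifier);
        }
    }

    // If (condition) Then (consequent) Else (alternative) End If
    String narrative(T target) {
        return evaluate(target) ? consequent : alternative;
    }
}

// Simplified RuleSet: true only if every enabled rule evaluates to true.
class RuleSetSketch<T> {
    final String consequent;
    final String alternative;
    final List<RuleSketch<T>> rules;

    RuleSetSketch(String consequent, String alternative, List<RuleSketch<T>> rules) {
        this.consequent = consequent;
        this.alternative = alternative;
        this.rules = new ArrayList<>(rules);
    }

    boolean evaluate(T target) {
        for (RuleSketch<T> rule : rules) {
            // Assumption: dormant rules are skipped rather than counted as false.
            if (rule.enabled && !rule.evaluate(target)) {
                return false;
            }
        }
        return true;
    }

    String narrative(T target) {
        return evaluate(target) ? consequent : alternative;
    }
}
```

For instance, a rule with quantifier ALL_OF and the predicates "bit depth at least 16" and "bit depth at most 24" reports its consequent for a value of 16 and its alternative for a value of 32.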

Examples of actual Rules and RuleSets can be found in the Configuration section of this document.

AssessmentResult (org.jhove2.module.assess.AssessmentResult)

For each combination of a Rule and an applicable object instance, a corresponding AssessmentResult is created that contains the outcome of evaluating that rule against the object. This outcome includes a reference to the rule being invoked and the overall boolean evaluation of the rule's condition, along with methods that report the string value of the rule's consequent or alternative and the details of the boolean evaluation of each predicate in the rule.
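The following sketch shows one way to capture an outcome together with per-predicate details. `AssessmentResultSketch` is a hypothetical simplification of `org.jhove2.module.assess.AssessmentResult`, and it supports only ALL_OF to stay short. One design choice worth noting: every predicate is evaluated, with no short-circuiting, so that the details record each individual outcome even after the overall result is already known to be false.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Simplified outcome of applying one rule to one object.
class AssessmentResultSketch {
    final String ruleName;
    final boolean booleanResult;
    final String narrativeResult;                // the consequent or the alternative
    final Map<String, Boolean> predicateDetails; // predicate label -> outcome, in rule order

    AssessmentResultSketch(String ruleName, boolean booleanResult,
                           String narrativeResult, Map<String, Boolean> predicateDetails) {
        this.ruleName = ruleName;
        this.booleanResult = booleanResult;
        this.narrativeResult = narrativeResult;
        this.predicateDetails = predicateDetails;
    }

    // Applies labelled predicates with ALL_OF semantics, recording every outcome.
    static <T> AssessmentResultSketch assessAllOf(String ruleName,
            Map<String, Predicate<T>> predicates, T target,
            String consequent, String alternative) {
        Map<String, Boolean> details = new LinkedHashMap<>();
        boolean allTrue = true;
        for (Map.Entry<String, Predicate<T>> entry : predicates.entrySet()) {
            boolean outcome = entry.getValue().test(target);
            details.put(entry.getKey(), outcome);
            allTrue = allTrue && outcome;
        }
        return new AssessmentResultSketch(ruleName, allTrue,
                allTrue ? consequent : alternative, details);
    }
}
```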

AssessmentResultSet (org.jhove2.module.assess.AssessmentResultSet)

The AssessmentResults for an object's assessment are collected in an AssessmentResultSet, which also contains the summary boolean value for the collection of rules.

Assessment module results are tied to source units. An assessment module instance, containing all the AssessmentResultSets applicable to a source unit, is appended to that source unit’s module list, after the identification and format modules that were applied to the source unit.
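The relationship between result sets, the assessment module, and the source unit's module list can be sketched as follows. All three classes are hypothetical simplifications, not the JHOVE2 `Source`, `AssessmentModule`, or `AssessmentResultSet` types. The summary value is modeled as "every rule in the set is true", which matches the RuleSet reporting behavior described earlier but is an assumption about how the summary is computed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified AssessmentResultSet: one rule set's per-rule outcomes and their summary.
class ResultSetSketch {
    final String ruleSetName;
    final List<Boolean> ruleResults;

    ResultSetSketch(String ruleSetName, List<Boolean> ruleResults) {
        this.ruleSetName = ruleSetName;
        this.ruleResults = new ArrayList<>(ruleResults);
    }

    // Summary value: true only when every rule in the set evaluated to true.
    boolean summary() {
        for (boolean result : ruleResults) {
            if (!result) {
                return false;
            }
        }
        return true;
    }
}

// Simplified assessment module: holds every result set for one source unit.
class AssessmentModuleSketch {
    final List<ResultSetSketch> resultSets = new ArrayList<>();
}

// Simplified source unit whose module list records modules in the order they ran.
class SourceUnitSketch {
    final List<Object> modules = new ArrayList<>();

    // Identification and format modules are added first; assessment runs last,
    // so its module ends up at the end of the list.
    void addModule(Object module) {
        modules.add(module);
    }
}
```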

5 Reportable Properties

AssessmentModule [class/interface]

AssessmentResultSets Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/assess/AssessmentModule/AssessmentResultSets
Type / java.util.List<org.jhove2.module.assess.AssessmentResultSet>
Description / Assessment Results
Reference
Version Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/Module/Version
Type / java.lang.String
Description / Module version identifier.
Reference
ReportableName Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/reportable/AbstractReportable/ReportableName
Type / java.lang.String
Description / Reportable name
Reference
ReleaseDate Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/Module/ReleaseDate
Type / java.lang.String
Description / Module release date.
Reference
RightsStatement Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/Module/RightsStatement
Type / java.lang.String
Description / Module rights statement.
Reference
Developers Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/Module/Developers
Type / java.util.List<org.jhove2.core.Agent>
Description / Module developers.
Reference
AssessmentMessages Property
Identifier / http://jhove2.org/terms/message/org/jhove2/module/assess/AssessmentModule/AssessmentMessages
Type / java.util.List<org.jhove2.core.Message>
Description / Assessment Messages.
Reference / Message
Scope Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/Module/Scope
Type / org.jhove2.module.Module.Scope
Description / Module scope: generic or specific (to a source unit).
Reference
WrappedProduct Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/Module/WrappedProduct
Type / org.jhove2.core.WrappedProduct
Description / External product wrapped by the module.
Reference
Note Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/Module/Note
Type / java.lang.String
Description / Module informative note.
Reference
TimerInfo Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/Module/TimerInfo
Type / org.jhove2.core.TimerInfo
Description / Timer info for this module.
Reference

AssessmentResultSet [class/interface]

RuleSetName Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/assess/AssessmentResultSet/RuleSetName
Type / java.lang.String
Description / Rule Name
Reference
ReportableName Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/reportable/AbstractReportable/ReportableName
Type / java.lang.String
Description / Reportable name
Reference
RuleSetDescription Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/assess/AssessmentResultSet/RuleSetDescription
Type / java.lang.String
Description / RuleSet Description
Reference
ObjectFilter Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/assess/AssessmentResultSet/ObjectFilter
Type / java.lang.String
Description / Object Filter
Reference
BooleanResult Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/assess/AssessmentResultSet/BooleanResult
Type / org.jhove2.module.format.Validator.Validity
Description / RuleSet Boolean Result
Reference
NarrativeResult Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/assess/AssessmentResultSet/NarrativeResult
Type / java.lang.String
Description / Narrative Result
Reference
AssessmentResults Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/assess/AssessmentResultSet/AssessmentResults
Type / java.util.List<org.jhove2.module.assess.AssessmentResult>
Description / Assessment Results
Reference

AssessmentResult [class/interface]

RuleName Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/assess/AssessmentResult/RuleName
Type / java.lang.String
Description / Rule Name
Reference
ReportableName Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/reportable/AbstractReportable/ReportableName
Type / java.lang.String
Description / Reportable name
Reference
RuleDescription Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/assess/AssessmentResult/RuleDescription
Type / java.lang.String
Description / Rule Description
Reference
BooleanResult Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/assess/AssessmentResult/BooleanResult
Type / org.jhove2.module.format.Validator.Validity
Description / Boolean Result
Reference
NarrativeResult Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/assess/AssessmentResult/NarrativeResult
Type / java.lang.String
Description / Narrative Result
Reference
AssessmentMessages Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/assess/AssessmentResult/AssessmentMessages
Type / java.util.List<java.lang.String>
Description / Assessment Messages.
Reference
AssessmentDetails Property
Identifier / http://jhove2.org/terms/property/org/jhove2/module/assess/AssessmentResult/AssessmentDetails
Type / java.lang.String
Description / Conditional Details
Reference

Agent [class/interface]

Name Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/Agent/Name
Type / java.lang.String
Description / Agent name.
Reference
ReportableName Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/reportable/AbstractReportable/ReportableName
Type / java.lang.String
Description / Reportable name
Reference
Type Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/Agent/Type
Type / org.jhove2.core.Agent.Type
Description / Agent scope.
Reference
Affiliation Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/Agent/Affiliation
Type / org.jhove2.core.Agent
Description / Personal agent corporate affiliation.
Reference
Address Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/Agent/Address
Type / java.lang.String
Description / Agent postal address.
Reference
Telephone Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/Agent/Telephone
Type / java.lang.String
Description / Agent telephone number.
Reference
Fax Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/Agent/Fax
Type / java.lang.String
Description / Agent fax number.
Reference
Email Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/Agent/Email
Type / java.lang.String
Description / Agent email address.
Reference
URI Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/Agent/URI
Type / java.lang.String
Description / Agent URI.
Reference
Note Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/Agent/Note
Type / java.lang.String
Description / Agent informative note.
Reference

TimerInfo [class/interface]

ElapsedTime Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/TimerInfo/ElapsedTime
Type / org.jhove2.core.Duration
Description / Elapsed time, milliseconds.
Reference
ReportableName Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/reportable/AbstractReportable/ReportableName
Type / java.lang.String
Description / Reportable name
Reference

WrappedProduct [class/interface]

Name Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/WrappedProduct/Name
Type / java.lang.String
Description / Product informative name.
Reference
ReportableName Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/reportable/AbstractReportable/ReportableName
Type / java.lang.String
Description / Reportable name
Reference
Version Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/WrappedProduct/Version
Type / java.lang.String
Description / Product version.
Reference
ReleaseDate Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/WrappedProduct/ReleaseDate
Type / java.lang.String
Description / Product release date.
Reference
RightsStatement Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/WrappedProduct/RightsStatement
Type / java.lang.String
Description / Product rights statement.
Reference
Developers Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/WrappedProduct/Developers
Type / java.lang.String
Description / Product developers.
Reference
Authority Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/WrappedProduct/Authority
Type / java.lang.String
Description / Product maintenance authority.
Reference
Environments Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/WrappedProduct/Environments
Type / java.lang.String
Description / Product environments, i.e. operating systems.
Reference
Languages Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/WrappedProduct/Languages
Type / java.lang.String
Description / Product source code languages.
Reference
Constraints Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/WrappedProduct/Constraints
Type / java.lang.String
Description / Product constraints.
Reference
isOpenSource Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/WrappedProduct/isOpenSource
Type / boolean
Description / Product open source status
Reference
Note Property
Identifier / http://jhove2.org/terms/property/org/jhove2/core/WrappedProduct/Note
Type / java.lang.String
Description / Product note.
Reference

6 Configuration