National Science, Mathematics, Engineering, And

DANA

(Digital Archive Network for Anthropology)

A Model For Digital Archiving[1]

Jeffrey T. Clark, Brian M. Slator, Aaron Bergstrom, Francis Larson, Richard Frovarp, James E. Landrum III, William Perrizo, William Jockheck

Departments of Sociology/Anthropology, Computer Science, and Information Technology Services, North Dakota State University

Abstract

This is a report of work on an Internet-based digital library called the Digital Archive Network for Anthropology (DANA). DANA provides a model for a generalized method for implementing digital archives. This federation of databases will link researchers, students, and the general public to distributed databases that include realistic, accurate, three-dimensional (3D) visual representations of artifacts, fossils, and other objects, along with 2D digitized documents (e.g., maps, plan views, excavation profiles, and photographs) and various associated data. DANA will have a strong data mining component that will allow users to find relationships in the data that correspond to facts about the actual artifacts and fossils. The data mining techniques involve a new spatial data mining structure called the Ptree. DANA is being created through development and implementation of cross-platform, open standards that will facilitate interoperability and exchange of information between remote systems. DANA enables dynamic use of digital models, virtual measurement tools, and an array of data supplied by contributing content providers (collaborators) for education and scholarly research.

1. Introduction

As an extension of the digital library concept, researchers at NDSU are working to share access to research materials, including physical artifacts held in collections around the world. Sharing “virtual” artifacts in this way has the potential to make artifacts available to a much wider audience, with a corresponding increase in research. For many academic areas physical artifacts remain central. Access to, and preservation of, those artifacts are essential. Unfortunately, access and preservation are subject to several potential problems [1]. These problems include what some have called a “Crisis in Curation” [2,3,4] resulting from the combined effects of increased data recovery over the last few decades and a critical shortage of curation space. Even if adequate space exists for collections, access to those materials by scholars, and especially the general public, is severely restricted by very limited exhibit space. Moreover, repositories of human heritage materials are relatively few in number and unevenly distributed around the globe, resulting in an overwhelming inequality of access. Travel to repositories can be costly and may be difficult, especially for those with limited financial resources. Yet another problem is that many antiquities are fragile and should not be subject to repeated exposure and handling. Also, handling of culturally sensitive materials may not be appropriate or permissible. In short, scholars as well as the general public too often lack the opportunity to explore visually the vast majority of objects that tell the story of what it means to be human.

Technology now presents us with the opportunity to provide some correction to those problems. Laser scanners, or digitizers, and sophisticated software can be used to create three-dimensional models of the external shape and surface characteristics of nearly any object. These models can be digitally stored and retrieved for viewing on a computer screen, or sent to others via a portable storage medium (CD-ROM, DVD, Zip disk, etc.) or Internet connection. Moreover, model data can be archived, along with associated contextual data, in an openly available, Internet-accessible, relational database. Such a database will significantly improve access to, and the analysis, modeling, and preservation of, the material remains of human heritage. As digital representations, these materials can be used repeatedly for future studies and teaching without damage to the rare and sometimes delicate objects. Researchers will be able to store their findings, compare them with those of others, and carry out analyses on large material collections that are otherwise not available.

Faculty and staff at North Dakota State University in Fargo, North Dakota, are engaged in a long-term project to create such a database. We are not seeking, however, to create a single, enormous database for cultural heritage. Instead, we are engaged in establishing an architecture that will provide a seamless linking of multiple databases, forming a Digital Archive Network for Anthropology (DANA). DANA will constitute a federation of distributed, interoperable databases, each with specific content of value to archaeology, anthropology, and related fields in cultural heritage. Thus, an Archive search can be initiated from any of the established entry points and will access the entire network of databases.

A unique feature of DANA is the inclusion of accurate, measurable, three-dimensional (3D) models of material objects. These models can be variously manipulated to be viewed from all angles, and they are sufficiently precise to allow for a wide range of detailed measurements. Virtual calipers are available within the 3D viewer for basic measurements (length, width, etc.), and more advanced morphometric tools will be made available in the future. While the 3D visualizations are central to DANA, the Archive is not restricted to 3D models. Objects can be represented by sets of digital photographs, CAD drawings, and contextual data in relational databases.
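The basic caliper measurement described above can be illustrated with a minimal sketch. It assumes that the user picks two points on the model surface and that the measurement is the straight-line distance between them. The class and method names (`VirtualCaliper`, `Point3D`, `distance`) and the choice of millimetres are hypothetical and are not taken from the DANA viewer itself.

```java
// Illustrative sketch only; not the DANA viewer's actual code.
final class VirtualCaliper {

    /** A picked point on the model surface (units assumed to be millimetres). */
    static final class Point3D {
        final double x, y, z;

        Point3D(double x, double y, double z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    /** Straight-line (Euclidean) distance between two picked points. */
    static double distance(Point3D a, Point3D b) {
        double dx = a.x - b.x;
        double dy = a.y - b.y;
        double dz = a.z - b.z;
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    public static void main(String[] args) {
        Point3D first = new Point3D(0.0, 0.0, 0.0);
        Point3D second = new Point3D(3.0, 4.0, 12.0);
        System.out.println(distance(first, second)); // prints 13.0
    }
}
```

A measurement of this kind is only as accurate as the scanned geometry and the precision with which the two points are picked.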

This online network of anthropology resources will allow reliable, “anytime, anywhere” access to content and services. By enabling dynamic use of digital models, virtual tools, and an array of data supplied by contributing content providers, DANA will be an enormously valuable resource for education and research. It can be accessed and used at all levels, from K-12, to undergraduate, to post-graduate, to professional, to lifelong learning. Users can take advantage of the Archive for formal research (e.g., sophisticated shape analysis, comparative studies, etc.) or simply to satisfy curiosity. Some users will also be contributors to DANA, as we will provide a mechanism for qualified professionals to add new data. As an open resource, DANA will promote sharing of information while still protecting pre-published data as well as culturally sensitive objects and associated information. In short, the Archive will provide a unique and innovative set of products and services.

DANA will include a data mining component based on the SMILEY suite of web-available data mining tools [8]. SMILEY will include data mining tools based on Fourier and wavelet transformation, kriging, association rule mining, classification, and clustering. The tabular data in DANA will be distributed across the entire network, and the subsequent mining will be performed in a highly parallel fashion involving all the servers in the network.

In this paper, we outline our work, to date, in developing DANA, and our vision of how DANA will operate in the future. It must be kept in mind that this is an evolving project that will undoubtedly undergo many shifts in direction as new ideas develop and are put into play. We seek collaborative relationships with colleagues in anthropology, computer science, and related fields in the development and implementation of DANA.

2. The DANA system

The Archaeology Technologies Lab (ATL) at North Dakota State University (NDSU) was founded in 1999 for the purpose of developing a 3D artifact modeling procedure and a digital artifact database. The initial goal of creating a digital database of artifacts in the NDSU archaeology collection soon grew into the concept of a database network for anthropological materials, or DANA. A research team was assembled with participants bringing varied interests and expertise (archaeology, computer science, and information technology services), and a common goal of expanding the digital modeling capability of the ATL to create a global, digital archiving system. We have made significant progress in developing databases for prehistoric stone tools from the Samoan Islands (in Polynesia), hominid cranial endocasts (casts made from the inside of skulls of humans and closely related species), and Native-American and historic European-American artifacts from North Dakota. These materials provided test cases for working out the scanning and modeling procedures, generalized database format, user interface, system architecture, navigation system, and other features of a digital archive.

The ATL now has two non-contact laser scanners and several well-provisioned workstations and laptop computers for creating accurate 3D models of artifacts. We also have an array of software for modeling, creating virtual environments, and related activities. The computing infrastructure for the NDSU database employs a multi-tiered, multi-server system architecture. This system consists of a database server, a server for the 3D and other visualization data, the NDSU web server, and other servers for the development and testing of the application and servlets. DANA’s multi-tiered architecture allows load balancing through the use of different servers to fulfill specific functions, thereby significantly enhancing system performance and responsiveness for end users. Other institutions may employ other infrastructural setups without jeopardizing interoperability, but our experimental configuration has worked with impressive efficiency and speed. We are currently using Oracle 8i for our database, but other participants will be able to use different relational database management systems (RDBMSs). Tests with these other systems will soon be carried out.

DANA will also include a data mining component so that researchers can examine relationships in the data across data types and locations. This mining capability can be user driven or automatic. The SMILEY suite of web-available data mining tools [8], which includes Fourier and wavelet transformations, kriging, association rule miners, classifiers, and clustering components, will be the basis of this component of DANA.

The SMILEY architecture uses the Java-enabled classic client/server paradigm. SMILEY takes full advantage of distributed computing by using the client machine's processing power to achieve good response times. When a user accesses the SMILEY-enabled web site using a Java-capable web browser (like Netscape or Microsoft's Internet Explorer), the browser will download a number of SMILEY applets onto the user's client machine. The applets then contact the SMILEY server for a specific remote sensing imagery data set. The SMILEY server retrieves the required data set either from local disk or remotely from a SMILEY data server through a dedicated network. Before transferring imagery data to the client, the SMILEY server will cut out the snapshot needed by the client and pre-format the imagery data for easy client handling. The applets perform image analysis and data mining functions on a user-selected view of the image on the client host. Performance therefore depends directly on the processing power of the client machine. Currently, most PCs running in typical research locations and at individuals’ homes have the power and memory needed to run all functions that SMILEY provides. The structure of SMILEY is shown below.

Figure 3. SMILEY (V1.1) system architecture
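The server-side step in which a snapshot is cut out of a larger image before transfer can be sketched as follows. This is a minimal illustration, assuming a single-band image held in memory as a rectangular array of pixel values. The class and method names (`SnapshotCropper`, `crop`) are hypothetical, and the actual SMILEY server may store and format imagery quite differently.

```java
import java.util.Arrays;

// Illustrative sketch only; not SMILEY's actual server code.
final class SnapshotCropper {

    /**
     * Returns a copy of the window that starts at (row0, col0) and spans
     * the given height and width. The image is assumed to be rectangular,
     * i.e. every row has the same length.
     */
    static int[][] crop(int[][] image, int row0, int col0, int height, int width) {
        if (row0 < 0 || col0 < 0 || height < 0 || width < 0
                || row0 + height > image.length
                || (height > 0 && col0 + width > image[row0].length)) {
            throw new IllegalArgumentException("window lies outside the image");
        }
        int[][] window = new int[height][];
        for (int r = 0; r < height; r++) {
            // Copy only the requested columns of each requested row.
            window[r] = Arrays.copyOfRange(image[row0 + r], col0, col0 + width);
        }
        return window;
    }

    public static void main(String[] args) {
        int[][] image = {
            {1, 2, 3, 4},
            {5, 6, 7, 8},
            {9, 10, 11, 12},
            {13, 14, 15, 16}
        };
        // prints [[6, 7], [10, 11]]
        System.out.println(Arrays.deepToString(crop(image, 1, 1, 2, 2)));
    }
}
```

Sending only the requested window keeps the transfer small, which is consistent with leaving the analysis itself to the client applets.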

There are two sets of components in SMILEY. One set is passive in nature (stores of data and procedures), while the other is active.

The passive components are:

·  Data sets of imagery (data store).

This data can consist of any type of imagery, stored both locally (with respect to other components) and remotely (accessible through the network). Because the volume of the data store can be massive, it is likely that most data will be stored away from the active components. The data is stored in a hierarchical fashion, in descending order of popularity. This allows hot (i.e., heavily used) data to be stored relatively close to the active components for quick access.

·  Set of stored procedures (proc store).

The main functionality of SMILEY is implemented as a set of procedures written in Java. These, along with miscellaneous documents (like on-line help pages) and other procedures, are stored in the proc store.

The active components are:

·  Remote WWW browser (or web browser).

Most graphical web browsers have an embedded Java interpreter to allow seamless integration of applet execution with normal web browsing.

·  WWW server (or Web server).

A WWW server is mainly used for transferring hypertext documents (written in HTML) from a central store to a WWW browser. The server may also be used to transfer other types of documents (such as graphics) and for execution of procedures to accomplish simple tasks (such as authentication, or dynamic file creation). SMILEY uses the web server for transferring data from the procedure store to the web browser at the remote site. When a remote user invokes SMILEY, the functional applets will be downloaded to the browser via the web server.

·  SMILEY server

The purpose of the SMILEY server is to catalog the items in the data store and to transfer the requested data to the remotely executing applets.

2.1. Operation and Participation

In order to establish interoperability of the participating network databases, we are using open standards such as SQL and XML (eXtensible Markup Language) combined with Java technologies [6]. XML, which is a platform-independent and license-free text format, provides a mark-up language to contain data in a way that facilitates data sharing between computer processes. As such, it will provide a means of standardization for Internet distribution of data. With the use of XML Schema or Document Type Definitions (DTDs), the data in an XML document are well described and easily shared. Our goal is to enable DANA participants to use whichever RDBMS they choose. This is more easily accomplished when vendor-specific commands are avoided, so we use generic SQL statements for table creation and data insertion and retrieval. Our current operational prototype works with Oracle and PostgreSQL, and we anticipate that it will work with Informix, DB2, SQL Server, and other fully functional RDBMSs for which Java Database Connectivity (JDBC) drivers are available.
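To illustrate how an XML record can carry data between processes independently of any particular RDBMS, the following minimal sketch reads one field from a small record using only the standard Java XML parser. The record layout, the element and attribute names, and the sample values are invented for illustration and do not represent a DANA or AnthML format.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch only; the record layout is hypothetical.
final class ArtifactRecordReader {

    // Made-up sample record, used only to exercise the parser.
    static final String SAMPLE =
          "<artifact id=\"A-001\">"
        + "<type>adze</type>"
        + "<material>basalt</material>"
        + "<lengthMm>112.5</lengthMm>"
        + "</artifact>";

    /** Returns the text of the first element with the given name, or null if absent. */
    static String field(String xml, String name) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName(name);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        } catch (Exception e) {
            // Malformed XML or parser configuration problems end up here.
            throw new IllegalArgumentException("could not read XML record", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(field(SAMPLE, "material")); // prints basalt
    }
}
```

Because the record is plain text with named fields, the receiving side can map those fields onto whatever table layout its own database uses.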

We will be defining an Anthropology Markup Language (AnthML), specified using XML Schema, to aid in the interchange of anthropological data. This requires close collaboration with numerous other anthropologists. We will also create a glossary of terms along with a thesaurus that will list synonyms for the terms used in AnthML.

The DANA network will use servlets to handle database queries and other communications with the remote servers. This allows us to load the JDBC drivers at the servlet level for the appropriate database systems under that servlet's control. The connection to the RDBMS is implemented through a connection pool that can be set to open connections only as needed. As a result, the pool will not use more connections than are required at any given time, which will maximize efficiency and cost effectiveness by not tying up more licenses than needed.
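The idea of a pool that opens connections only as needed can be sketched generically. The following minimal example creates a resource only when no idle one is available and never exceeds a fixed maximum. The class name `LazyPool` and its methods are hypothetical. In a servlet the factory would wrap the code that opens a JDBC connection, but a plain object stands in for the connection here so that the sketch runs without a database.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Illustrative sketch only; not the pool used in the DANA prototype.
final class LazyPool<T> {
    private final Supplier<T> factory;  // creates a new resource on demand
    private final int maxSize;          // upper bound, e.g. the number of licenses
    private final Deque<T> idle = new ArrayDeque<>();
    private int created = 0;

    LazyPool(Supplier<T> factory, int maxSize) {
        this.factory = factory;
        this.maxSize = maxSize;
    }

    /** Reuses an idle resource if one exists; otherwise creates one, up to maxSize. */
    synchronized T acquire() {
        if (!idle.isEmpty()) {
            return idle.pop();
        }
        if (created >= maxSize) {
            throw new IllegalStateException("pool exhausted");
        }
        created++;
        return factory.get();
    }

    /** Returns a resource to the pool so that a later caller can reuse it. */
    synchronized void release(T resource) {
        idle.push(resource);
    }

    /** Number of resources opened so far. */
    synchronized int created() {
        return created;
    }

    public static void main(String[] args) {
        LazyPool<Object> pool = new LazyPool<>(Object::new, 2);
        Object first = pool.acquire();
        pool.release(first);
        pool.acquire();                      // reuses the released resource
        System.out.println(pool.created());  // prints 1
    }
}
```

A production pool would typically also block or queue callers when the limit is reached and validate connections before reuse. This sketch simply throws an exception so that the limit is visible.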