On the Creation of XML Data of Historical Earthquake Materials

GIS and Historical Documents

- Application of GIS to historical earthquakes documents in Japan -

Shoichiro HARA (NIJL), Masatoshi ISHIKAWA (U. Shimane), Masato KOYAMA (Shizuoka U.),

Kenji SATAKE (AIST), Yoshinobu TSUJI (U. Tokyo), Yukio HAYAKAWA (Gunma U.),

Masaharu EBARA (U. Tokyo), Shoji SASAMOTO (Shinshu U.), Masaaki TAKAHASHI (Kobe U.),

Satoshi TARASHIMA (TNM), Akiyoshi FUJITA (Tenri U.), Toshifumi YATA (Niigata U.),

Katsuhiko ISHIBASHI (Kobe U.)

KEY WORDS: historical seismology, historical records, historical earthquake, Japan, digitization, database, XML, GIS

ABSTRACT: A full-text database of historical earthquake documents from the ancient and medieval ages in Japan has been created. The aim is to digitize all historical documents on earthquake activity and disasters in the Japanese Islands before the 17th century, and to use them to compile a seismic intensity database. This paper describes the digital transcription procedure and its application to geo-temporal systems.

1. INTRODUCTION

The two editions, “Collection of Historical Documents on Earthquakes in Japan, Enlarged and Revised Edition (Zotei Dai-Nippon Jishin Shiryou)” edited by Musha (1941, 1943a, b, 1951) and “Historical Documents on Earthquakes in Japan, New Collection (Shinshu Nippon Jishin Shiryou)” edited by Usami and his colleagues (1981-1994), are regarded as the fundamental research data on past earthquake activity and disasters in the Japanese Islands (hereinafter SOURCES). However, some researchers have pointed out their low reliability, because these documents were collected without careful examination or textual criticism, and are concerned that this low reliability will lead to wrong conclusions. Furthermore, because these editions are very large printed works, it is very difficult for researchers to retrieve information from them.

A new research project to construct a full-text database, “Historical Earthquake Documents in the Ancient and Medieval Ages in Japan,” has been started to solve the above problems. The project is an interdisciplinary collaboration among seismologists, volcanologists, historians and information scientists. It aims to critically examine all the collected documents, select the appropriate ones, correct them, and reorganize the resulting documents in digital form. The project is also intended to compile a seismic intensity database compatible with the Intensity Data Points (IDP) widely used in Europe.

This paper describes the digitization of the documents in XML, the construction of the full-text database, and the application of the geo-temporal data to “TimeMap.”

2. CREATION of XML DATA

“Historical Earthquake Documents in the Ancient and Medieval Ages in Japan” is a digital collection of records on earthquake activity and disasters in the Japanese Islands, based on the SOURCES. The project restricts its target to the ancient and medieval ages (before the 17th century) for two reasons: historical sources from these ages have somewhat different characteristics from those of the early modern age (mostly the Edo Period), and the historical records of the early modern age are too voluminous for a small research group to handle within a short research term.

The records of the SOURCES are compiled earthquake by earthquake, and each record is arranged in the order of a general description of the earthquake, information on an original material, and descriptions of events extracted from that material (Fig.1 and 2). Because the SOURCES are collections of passages from various original materials, they preserve the original descriptions only partially. Keeping the precise layout information of the SOURCES is therefore not significant. On the other hand, the SOURCES contain many editorial notes, which should be kept in the digital versions.

Fig.1 Structure of the Source Documents

Furthermore, the digitized documents must serve many purposes, such as DTP, databases, and Web pages. Revision work must also proceed in parallel with digitization, because extraction and transcription errors must be corrected, and many annotations and notes will be added during revision. XML was therefore introduced to mark up the digital documents. Considering the above purposes of the reorganized digital documents, the following markup rules were adopted in defining the XML DTD:

1) Logical elements such as titles, records, and sections are marked up.

2) Layout information, such as the number of columns per page, the number of lines per page, and the number of characters per line, is not marked up.

3) Descriptive information such as rubi, reading aids (Kaeri-ten), and nonstandard coded characters (Gaiji) is kept as far as possible.

4) Editorial notes are kept, and notes added during the revision work are kept as well.
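
The four rules above can be illustrated with a small record. The following Python sketch is only an illustration: the element and attribute names (earthquake, description, material, record, ruby, rt, gaiji, note) and all text values are assumptions made for this example and are not the project's actual DTD. The sample carries logical structure, reading aids, a Gaiji code and two kinds of notes, but no layout information.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag and attribute names are illustrative only,
# not the DTD defined by the project.
SAMPLE = """\
<earthquake id="E-EXAMPLE">
  <title>Example earthquake title</title>
  <description>General description of the earthquake.</description>
  <material id="M1">
    <title>Example original material</title>
    <record id="R1">Text with <ruby>base<rt>reading</rt></ruby> and a
      <gaiji code="DICT-0000"/> character.<note type="editorial">Note kept
      from the printed edition.</note></record>
    <record id="R2">Second event.<note type="revision">Note added during
      revision.</note></record>
  </material>
</earthquake>
"""


def summarize(xml_text):
    """Collect the logical units of one earthquake entry."""
    root = ET.fromstring(xml_text)
    return {
        "earthquake_id": root.get("id"),
        "materials": len(root.findall("material")),
        # iter() walks the tree in document order.
        "records": [r.get("id") for r in root.iter("record")],
        "gaiji_codes": [g.get("code") for g in root.iter("gaiji")],
        "note_types": [n.get("type") for n in root.iter("note")],
    }


summary = summarize(SAMPLE)
print(summary)
```

Keeping editorial notes and revision notes as separately typed elements, as assumed here, would let later processing show or hide each kind independently.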

Historical researchers read the SOURCES and mark the pages to tell the data input personnel where to insert the appropriate XML tags. Examples of such marks are: the title of an earthquake (which is also the starting position of the information on that earthquake), the extent of the general description of the earthquake, the title of a material (which is also the starting position of a material containing information on the earthquake), a record that mentions events of the earthquake, the extent of a note, the extent of an annotation, and a forced line break. When the data input personnel find nonstandard coded characters (Gaiji), they consult the Kanji-Character-Dictionary and assign the appropriate dictionary code. Each earthquake is identified by an “Earthquake ID” that is assigned on the basis of the date of the earthquake in the Julian calendar. Dates are given in the Japanese lunar calendar, the Julian calendar and the Gregorian calendar.
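
The relation between the Julian and Gregorian dates mentioned above can be sketched with the standard Julian Day Number arithmetic. This is a minimal Python illustration, not the project's procedure. The ID format produced by earthquake_id is a hypothetical one chosen for this example, and conversion from the Japanese lunar calendar is not covered because it requires calendar tables.

```python
def julian_to_jdn(year, month, day):
    """Julian Day Number of a date in the (proleptic) Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn):
    """Gregorian (year, month, day) for a Julian Day Number."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def julian_to_gregorian(year, month, day):
    """Convert a Julian calendar date to the Gregorian calendar."""
    return jdn_to_gregorian(julian_to_jdn(year, month, day))


def earthquake_id(year, month, day):
    """Hypothetical ID format: 'E' plus the zero-padded Julian date."""
    return f"E{year:04d}{month:02d}{day:02d}"


# 1498-09-11 is treated as a Julian date here for illustration only.
print(julian_to_gregorian(1498, 9, 11))  # (1498, 9, 20)
print(earthquake_id(1498, 9, 11))        # E14980911
```

Going through the Julian Day Number keeps the two calendars independent of each other, so a lunar calendar table could later be attached to the same day count.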

Digitization is almost finished, and historians and seismologists have begun revising the SOURCES documents. Fig.3 shows an example of the marked-up SOURCES, and Fig.4 shows its DTP-style rendering on a Web page.

3. CREATION of DATABASE

The tentative version of the full-text database of “The Historical Earthquake Documents in the Ancient and Medieval Ages in Japan” is derived from the reorganized XML documents described above. The system has an engine that converts the XML documents into HTML documents, so users can access them with general-purpose browsers, which can display the documents in vertical writing and give users a natural look and feel (Fig.4). The full-text database is built on PostgreSQL and PHP and provides general user functions such as sorting and searching.
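
The XML-to-HTML conversion step can be sketched as follows. The project's engine is built with PHP and PostgreSQL; this Python fragment is not that engine, only a minimal illustration of the idea. The tag names (material, record, ruby, rt, note) are hypothetical, and the use of the CSS property writing-mode: vertical-rl for vertical writing is an assumption about one possible way to obtain the vertical display.

```python
import html
import xml.etree.ElementTree as ET


def render(elem):
    """Recursively convert an element's mixed content to an HTML fragment."""
    parts = [html.escape(elem.text or "")]
    for child in elem:
        if child.tag in ("ruby", "rt"):
            # Reading aids map directly onto HTML ruby markup.
            parts.append("<{0}>{1}</{0}>".format(child.tag, render(child)))
        elif child.tag == "note":
            parts.append('<span class="note">{}</span>'.format(render(child)))
        else:
            # Unknown tags: keep their text content only.
            parts.append(render(child))
        parts.append(html.escape(child.tail or ""))
    return "".join(parts)


def to_html_page(xml_text):
    """Render every <record> as a paragraph in a vertically written page."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for rec in root.iter("record"):
        rec_id = html.escape(rec.get("id", ""), quote=True)
        paragraphs.append('<p id="{}">{}</p>'.format(rec_id, render(rec)))
    return (
        '<html><body style="writing-mode: vertical-rl">'
        + "".join(paragraphs)
        + "</body></html>"
    )


# Hypothetical input; tag names and text are placeholders.
SAMPLE = (
    '<material><record id="R1">A<ruby>B<rt>b</rt></ruby>'
    "C &amp; D<note>n</note></record></material>"
)
page = to_html_page(SAMPLE)
print(page)
```

Carrying the record identifier into the id attribute of each paragraph, as assumed here, would allow other tools to link directly to a single record in the page.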

3. CREATION of GEO-TEMPORAL DATA

The primary contents of the SOURCES concern the time, the location and the events related to an earthquake, and these will be important viewpoints from which researchers retrieve and reorganize earthquake data. A preliminary examination has been carried out on using the SOURCES data for geo-temporal studies. The following earthquakes are used for this examination: 976/7/17, 1096/12/11, 1185/8/6, 1325/11/27, 1361/7/26, 1498/9/11, 1586/1/18, 1596/9/1, 1596/9/5.

In the examination, every word related to a location, such as a place name, house, area, city, town, village or ruin, is extracted from the SOURCES documents. These locations are assigned to present-day areas or points, and their longitudes and latitudes are then estimated. Every word related to time is also extracted and converted to a Gregorian calendar date. These data are organized as Serial Number (SID), Earthquake Name (ENAME), Earthquake ID (EID), Resource Name (RNAME), Resource ID (RID), Record ID (RECID), Japanese Lunar Calendar Date (JDATE), Gregorian Calendar Date (NDATE), Place Name in the SOURCES (ONAME), Present Place or Location Name (PNAME), Latitude (DLAT), Longitude (DLON) and Note, as shown in Fig.6. Record IDs are used to link a point on TimeMap to the related XML record.
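
The record layout listed above can be sketched as a simple data structure. In the following Python illustration the field names are taken from the list above, with NOTE used as the name of the Note field. The field types, the CSV serialization, the lookup by Record ID and every value in the sample row are assumptions or placeholders for this example, not project data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class EventPoint:
    """One geo-temporal event row; types are assumed for illustration."""
    SID: int      # Serial Number
    ENAME: str    # Earthquake Name
    EID: str      # Earthquake ID
    RNAME: str    # Resource Name
    RID: str      # Resource ID
    RECID: str    # Record ID (link to the XML record)
    JDATE: str    # Japanese Lunar Calendar Date
    NDATE: str    # Gregorian Calendar Date
    ONAME: str    # Place Name in the SOURCES
    PNAME: str    # Present Place or Location Name
    DLAT: float   # Latitude
    DLON: float   # Longitude
    NOTE: str = ""


# Placeholder row: none of these values come from the SOURCES.
rows = [
    EventPoint(1, "example earthquake", "E-EXAMPLE", "example material",
               "M1", "R1", "placeholder lunar date",
               "placeholder Gregorian date", "old place name",
               "present place name", 35.0, 135.0, "placeholder values"),
]


def to_csv(points):
    """Serialize event rows to CSV text with the field names as header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(EventPoint)])
    writer.writeheader()
    for point in points:
        writer.writerow(asdict(point))
    return buf.getvalue()


def index_by_record(points):
    """Map each Record ID to its points, so a map point can find its record."""
    index = {}
    for point in points:
        index.setdefault(point.RECID, []).append(point)
    return index


csv_text = to_csv(rows)
record_index = index_by_record(rows)
print(csv_text)
```

Several points may share one Record ID when a single record names several places, which is why the index in this sketch maps each Record ID to a list.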

TimeMap is used to show the geo-temporal relationships graphically. In this examination, the event data created by the above method, 14 shape files of active faults in the Japanese Islands (Active Fault Shape File from Tanaka and Imaizumi eds., 2002, Digital Active Fault Map of Japan, Univ. Tokyo Press, DAFM2086), a shape file of the map of Japan, and a shape file of the countries of the world are superimposed on TimeMap (Fig.7).

4. CONCLUSION

We have just finished digitizing the SOURCES and creating the first version of the database. Revision work is under way to produce highly reliable documents. In the process, various problems with the existing collections of historical earthquake documents have become clear.

We are trying to construct a new user interface to facilitate revision work on the XML documents; further studies on database system configuration, data conversion, and effective user interfaces are necessary. The goal of the database system development is to provide effective data handling functions for researchers and automatic linkage with the seismic intensity database. Verification of these applications is a subject for future research.