An Initial Pilot Experience on Generating Complex Ontology Instances from Scientific Bibliographies on Real Biological Domains

José F. Aldana Montes(1), Rafael Berlanga-Llavorí(2), Roxana Danger(3), Raul Montañés-Martínez(4), Mª del Mar Rojano-Muñoz(1), Francisca Sánchez-Jiménez(4)

(1) Khaos Research Group. Department of Computer Languages and Computing Science. Higher Technical School of Computer Science Engineering. University of Málaga. Campus de Teatinos. 29071 Malaga. Spain.


(2) University of Oriente, Santiago de Cuba, Cuba; now at TKBG

(3) TKBG. Department of Computer Languages and Computer Systems. Universitat Jaume I, Castellón, Spain


(4) ProCel Lab. Department of Molecular Biology and Biochemistry. Faculty of Sciences. University of Málaga. Campus de Teatinos. 29071 Malaga. Spain.

Abstract

In this paper we present a first approach to generating complex ontology instances from scientific bibliographic databases such as PubMed/PubChem, applied to a real biological domain: polyamines and histamine. There is evidence that both are involved in cancer and in other inflammation- and/or angiogenesis-dependent diseases, but many questions about the molecular processes behind these effects remain unanswered.

Histamine and polyamines have similar chemical structures and metabolic pathways. Furthermore, in several relevant physiological and pathological situations both are present and, indeed, there is some degree of cross-talk between them. Unfortunately, the available data are widely dispersed throughout the specialized literature of very different areas of biomedicine.

We address the problem of automatically generating ontology instances from a collection of PDF documents stored in a bibliographic database. Given a domain ontology that models and describes what we are searching for, we extract the structure of each document in order to generate a mapping between the ontology and the document text. This mapping is then used to populate the ontology with the extracted knowledge.
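The mapping-and-population step can be illustrated with a minimal sketch. It is a hypothetical simplification, not the method used in this work: it assumes the PDF text has already been extracted and split into (section title, text) pairs, that each ontology class is described by a list of lexical labels, and that a "mapping" is simply a case-insensitive label match within a sentence. The names `Mention`, `map_document` and `populate`, and the toy ontology, are invented for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical sketch: label matching stands in for the paper's
# structure-based ontology/text mapping.


@dataclass(frozen=True)
class Mention:
    concept: str   # ontology class the text fragment is mapped to
    label: str     # ontology label that matched the text
    section: str   # document section where the match was found
    sentence: str  # sentence kept as provenance for curation


def split_sentences(text):
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def map_document(ontology, sections):
    """Map sentences of each (title, text) section to ontology classes."""
    mentions = []
    for title, text in sections:
        for sentence in split_sentences(text):
            for concept, labels in ontology.items():
                for label in labels:
                    # Whole-word, case-insensitive match of the label.
                    pattern = r"\b" + re.escape(label) + r"\b"
                    if re.search(pattern, sentence, flags=re.IGNORECASE):
                        mentions.append(Mention(concept, label, title, sentence))
    return mentions


def populate(mentions):
    """Group mentions into candidate instances (labels) per ontology class."""
    instances = defaultdict(set)
    for m in mentions:
        instances[m.concept].add(m.label)
    return dict(instances)


if __name__ == "__main__":
    # Toy ontology and document (hypothetical data, not from the corpus).
    ontology = {
        "Enzyme": ["histidine decarboxylase", "HDC"],
        "Molecule": ["histamine", "polyamines"],
    }
    sections = [
        ("Introduction", "Histamine is synthesized by HDC. Polyamines share similar pathways."),
        ("Results", "HDC expression was measured."),
    ]
    mentions = map_document(ontology, sections)
    print({k: sorted(v) for k, v in populate(mentions).items()})
    # → {'Enzyme': ['HDC'], 'Molecule': ['histamine', 'polyamines']}
```

Keeping the section and sentence in each `Mention` preserves provenance, so every candidate instance can be traced back to its source text; a realistic system would replace plain label matching with document-structure analysis and a richer ontology/text mapping.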

We adopted histidine decarboxylase (HDC), the enzyme responsible for histamine synthesis, as the pilot molecule for validating our knowledge extraction efforts, because we have worked extensively on its expression, turnover and structure-function relationships, and have developed both the first three-dimensional model of this enzyme and the first review on it. Nevertheless, given the nature of protein metabolism, once the instance generation process has been validated on this enzyme, it could easily be scaled up to any other enzyme, whether or not it is related to amine metabolism.

Keywords

Semantic Web, Ontology Population, Instance Extraction, Biogenic Amines, Histidine Decarboxylase