LinKFactory® : an Advanced Formal Ontology Management System

Werner Ceusters

Language and Computing (L&C)

Het Moorhof, Hazenakkerstraat 20 A

9520 Zonnegem, Belgium

Peter Martens

Language and Computing (L&C)

Het Moorhof, Hazenakkerstraat 20 A

9520 Zonnegem, Belgium

Abstract

As the web becomes more and more a worldwide platform for e-commerce, the creation of formal ontologies in all business sectors becomes crucial. It will become increasingly important for computers to understand the real meaning of the content of web pages and of the data in the databases behind them. The real challenge will be to create formal ontologies that can be processed and exchanged by machines.

This paper describes LinKFactory®, a formal terminology and ontology management system that makes the creation and management of large scale, complex, multilingual and formal ontologies possible. We will explain the possibilities of LinKFactory® based on our experiences in creating a formal representation of the medical world, named LinKBase®, and linking it to several third party ontologies.

Keywords

Formal ontologies, semantic/linguistic knowledge base, ontology management system, terminology management system

INTRODUCTION

For many years, numerous teams, mainly of academic origin, have been working on systems that can handle terminology and the complex relations between individual terms. All of those systems suffer from at least one of the following major setbacks:

·  Insufficiently formalised (designed for human use, not machine use)

·  Not capable of handling the required large numbers of knowledge objects that form an adequate ontology

·  Not designed to handle linguistic aspects, sometimes not even multiple language entries

To make the semantic web a success, these three setbacks will have to be overcome. Knowledge will have to be formalized so that machines worldwide have a shared understanding of the information provided. The systems developed will have to handle enormous amounts of information very quickly. As the web is a universal system, different languages will have to be supported, i.e. the system and the ontologies developed will have to be language-independent, yet linkable to all languages.

The problem that researchers and industry face is to build and maintain an environment in which the required large formal ontologies can be created while keeping processing time to a minimum. Formal ontology has recently been defined as the systematic, formal, axiomatic development of the logic of all forms and modes of being [3]. Management systems for smaller ontologies have been developed (ODE [1], WebOnto [4], Ontolingua [5], HoZo [6], JOE [7], Protégé [8], OntoSaurus [9], …), but none of these is capable of dealing with the enormous and complex ontologies that will be needed to support the semantic web.

To resolve these problems (initially for the medical environment) L&C worked on creating a formal Ontology Management System, called LinKFactory®. The intent of the project was to implement a knowledge representation and compatible reasoning mechanism in a database structure. Among the objectives set for developing the data-model were:

·  The ability to fully model a classification (ontology) of concepts with all their relevant relationships and definitions.

·  The ability to connect languages with this conceptual model and use it for natural language understanding.

·  The ability to connect the resulting association of terms and concepts with third party terminology systems such as SNOMED or ICD-9.

·  All entities in the database should be versioned so that references can be made to older versions of objects without losing that information.

During the course of the project several extra capabilities were added to these base requirements that served to enrich the structure and allow for even more sophisticated features.

All this had to be modelled as efficiently as possible, and in such a way that it would allow easy manipulation from an application layer.

THE TOOL : LinKFactory®

General Description

LinKFactory® is the formal ontology management system developed by L&C and used to build and manage the medical linguistic knowledge base LinKBase®. It is a tool for building large and complex language-independent formal ontologies. “Language-independent” has to be understood as independence from any specific language (such as English, French, Dutch, …), but not from language as a medium of communication. Unlike most existing ontology editors, LinKFactory® is not limited to small ontologies.

The fact that the ontologies are language-independent has major consequences for the type of applications that can run on top of them. It will, for example, be much easier to search for relevant information on the web (or in a thesaurus): the search can be done in one language in free text. This free-text search is linked to language-independent concepts (based on the semantics), which form the basis for the information retrieval. Since terms in several languages are attached to the concepts using a linguistic ontology [2], relevant information in other languages can also be retrieved, while semantically irrelevant information will not appear in the list of results.
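The retrieval idea described above can be sketched in a few lines of Java. This is only an illustration under our own assumptions: the class MultilingualIndex, its methods, the concept identifier and the example terms are hypothetical and are not part of LinKFactory®. A term in one language is resolved to a language-independent concept, from which the terms in another language are then read.

```java
import java.util.*;

/** Hypothetical in-memory model: terms in several languages attached to language-independent concepts. */
final class MultilingualIndex {
    // key: language + ":" + lower-cased term, value: concept identifier
    private final Map<String, String> termToConcept = new HashMap<>();
    // concept identifier -> (language -> terms)
    private final Map<String, Map<String, List<String>>> conceptTerms = new HashMap<>();

    void addTerm(String conceptId, String language, String term) {
        termToConcept.put(key(language, term), conceptId);
        conceptTerms.computeIfAbsent(conceptId, k -> new TreeMap<>())
                    .computeIfAbsent(language, k -> new ArrayList<>())
                    .add(term);
    }

    /** Resolves a term in a given language to its concept, if the term is known. */
    Optional<String> conceptFor(String language, String term) {
        return Optional.ofNullable(termToConcept.get(key(language, term)));
    }

    /** Returns the terms attached to a concept in the requested language. */
    List<String> termsFor(String conceptId, String language) {
        Map<String, List<String>> byLanguage = conceptTerms.get(conceptId);
        if (byLanguage == null) return Collections.emptyList();
        return byLanguage.getOrDefault(language, Collections.emptyList());
    }

    private static String key(String language, String term) {
        return language + ":" + term.toLowerCase(Locale.ROOT);
    }
}

final class CrossLanguageDemo {
    public static void main(String[] args) {
        MultilingualIndex index = new MultilingualIndex();
        // Illustrative data only; identifier and terms are made up for this sketch.
        index.addTerm("C_HEART_ATTACK", "en", "heart attack");
        index.addTerm("C_HEART_ATTACK", "nl", "hartaanval");
        index.addTerm("C_HEART_ATTACK", "fr", "crise cardiaque");

        // Search in English, retrieve the Dutch terms through the shared concept.
        index.conceptFor("en", "Heart Attack")
             .ifPresent(c -> System.out.println(index.termsFor(c, "nl")));
    }
}
```

A real system would of course map free text to concepts with linguistic analysis rather than exact string lookup; the sketch only shows why the concept layer makes the result language-independent.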

System Architecture of LinKFactory®

LinKFactory® stores the data in a relational database (we currently use Oracle). Access to the database is abstracted away by a set of functions that are “natural” when dealing with ontologies: get-children, find-path, join concepts, get terms for concept X, …
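As an illustration of what such an abstraction layer might look like, the following Java sketch defines a small facade and an in-memory stand-in for the database. The interface OntologyAccess, its method signatures and the interpretation of find-path as a downward search along child links are assumptions made for this example; the actual LinKFactory® API is not shown here.

```java
import java.util.*;

/** Hypothetical facade; method names loosely mirror the operations named in the text. */
interface OntologyAccess {
    List<String> getChildren(String conceptId);
    List<String> findPath(String fromConcept, String toConcept);
    List<String> getTerms(String conceptId, String language);
}

/** In-memory stand-in for the relational back end, for illustration only. */
final class InMemoryOntology implements OntologyAccess {
    private final Map<String, List<String>> children = new HashMap<>();
    private final Map<String, List<String>> terms = new HashMap<>();

    void addChild(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    void addTerm(String conceptId, String language, String term) {
        terms.computeIfAbsent(conceptId + "@" + language, k -> new ArrayList<>()).add(term);
    }

    @Override public List<String> getChildren(String conceptId) {
        return children.getOrDefault(conceptId, Collections.emptyList());
    }

    /** Breadth-first search downwards along child links; empty list when no path exists. */
    @Override public List<String> findPath(String from, String to) {
        Map<String, String> cameFrom = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(from);
        cameFrom.put(from, null);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                // Walk back from the target to the start to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String c = to; c != null; c = cameFrom.get(c)) path.addFirst(c);
                return path;
            }
            for (String child : getChildren(current)) {
                if (!cameFrom.containsKey(child)) {
                    cameFrom.put(child, current);
                    queue.add(child);
                }
            }
        }
        return Collections.emptyList();
    }

    @Override public List<String> getTerms(String conceptId, String language) {
        return terms.getOrDefault(conceptId + "@" + language, Collections.emptyList());
    }
}
```

Client code written against OntologyAccess never sees tables or SQL, which is the point of the abstraction: the storage behind it can change without affecting applications.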

One of the main requirements of the project was that a server-side component should be developed that would allow developers to use a standardized API to program applications on top of the semantic database without requiring intimate knowledge of the internal structure of the database.

This component also had to be database-independent (Oracle, Sybase and SQL Server have been tested), capable of dealing with multiple concurrent users, and stable. LinKFactory® is also platform-independent (tested on Windows, Solaris, Unix and Linux). Combining all these requirements made it clear that Java was the platform of choice, as it supports all of the above and has become a stable and mature technology in the last year.

We finally settled on RMI (Remote Method Invocation) as our technology of choice because of its simplicity and proven robustness. This means that our server-side component is a Java application whose remote interface extends java.rmi.Remote. The application requires an RMI registry (a sort of Domain Name Server for RMI servers) to be running so that it can register itself and so that clients can connect to the RMI server.
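The general RMI pattern can be sketched as follows. This is a minimal example under stated assumptions, not the LinKFactory® server: the interface ConceptService, its single method, the binding name and the sample data are hypothetical, and for the sake of a self-contained example the registry is started inside the same process on port 1099 (assumed to be free) instead of running separately.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.*;

/** Hypothetical remote interface: every remote method must declare RemoteException. */
interface ConceptService extends Remote {
    List<String> getChildren(String conceptId) throws RemoteException;
}

/** Server-side implementation backed by a tiny in-memory map (illustrative data only). */
final class ConceptServiceImpl implements ConceptService {
    private final Map<String, List<String>> children = new HashMap<>();

    ConceptServiceImpl() {
        children.put("DISEASE", Arrays.asList("HEART_DISEASE", "LUNG_DISEASE"));
    }

    @Override public List<String> getChildren(String conceptId) {
        // Return a serializable copy, since the result travels over the wire.
        return new ArrayList<>(children.getOrDefault(conceptId, Collections.emptyList()));
    }
}

final class RmiDemo {
    public static void main(String[] args) throws Exception {
        // Server side: export the object and register its stub under a name.
        ConceptServiceImpl impl = new ConceptServiceImpl();
        ConceptService stub = (ConceptService) UnicastRemoteObject.exportObject(impl, 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("ConceptService", stub);

        // Client side: look the service up in the registry and call it remotely.
        Registry clientView = LocateRegistry.getRegistry("localhost", 1099);
        ConceptService client = (ConceptService) clientView.lookup("ConceptService");
        System.out.println(client.getChildren("DISEASE"));

        // Clean up so that the JVM can exit.
        registry.unbind("ConceptService");
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
    }
}
```

In a deployment such as the one described in the text, the registry would typically run as its own process and the client would run on another machine; only the host name passed to the registry lookup would change.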

The LinKFactory® system consists of two major components (see figure 1): the LinKFactory® Server and the LinKFactory® Workbench (the client-side component). The LinKFactory® Workbench allows the user to browse and model the LinKBase® data.

Figure 1 : LinKFactory® components

The workbench is a dynamic framework for the LinKFactory® Beans. Each bean has its own specific functionality and a limited view of the underlying formal ontology, but combining a set of beans in the workbench provides the user with a powerful tool to view and manage the data stored in the semantic database. The workbench thus gives the user the flexibility to create a customized tool to view and manage the data in the ontology.

Different views on the semantic network are implemented as Java beans. Examples are: Concept tree, Concept criteria and full definitions, Linktype tree, Criteria list, Term list, Search pane, Properties panel, Reverse relations, … The LinKFactory® framework is implemented in 100% pure Java code. The modular design is done using Java beans organized and linked in a freely configurable workspace.

Each user can create multiple views on the semantic network using the beans available. The beans are organized in several workspaces designed by the user. Each workspace can contain multiple frames upon which the beans are laid out. Once the layout work is finished, links can be established between the beans used.

Each of the layouts defined can be saved as Java code and stored in the database. Layouts can be defined on different levels: Organization, Group, User.

Each bean can have multiple incoming and outgoing links where appropriate. Beans can also be linked across frames. Each bean has specific properties, which can be set at runtime. This approach allows different types of tasks to be performed using the optimal layout for the task at hand.

Several quality assurance mechanisms are built in: versioning, user tracking, user hierarchies, formal sanctioning with possibility to overrule, sibling-detection, linktype hierarchy, etc.

Specifications of the Available Beans :

General

The different beans provide information on, and a view of, different parts of the ontologies built. Two beans can be linked to each other when an outgoing link from bean 1 matches an incoming link from bean 2. A bar on top of each bean shows the other beans it has been linked to, as well as the direction of each link. Other items in the bean bar are the bean label, the button to display or edit the bean properties, and the option to refresh the bean contents. Optional items (dependent on the kind of bean) are a shortcut to the linktype filter property and a draggable item.
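The matching of outgoing and incoming links can be pictured with the following sketch. It is our own simplification, not the workbench code: the class WorkbenchBean, the idea of identifying links by name, and the use of plain strings as the transferred value are assumptions made for illustration.

```java
import java.util.*;
import java.util.function.Consumer;

/** Hypothetical bean with named outgoing and incoming links. */
class WorkbenchBean {
    final String label;
    // Entries a bean bar could display: linked beans and the direction of each link.
    final List<String> linkedTo = new ArrayList<>();
    private final Map<String, Consumer<String>> incoming = new HashMap<>();
    private final Map<String, List<Consumer<String>>> outgoing = new HashMap<>();

    WorkbenchBean(String label) { this.label = label; }

    void declareOutgoing(String linkName) {
        outgoing.putIfAbsent(linkName, new ArrayList<>());
    }

    void declareIncoming(String linkName, Consumer<String> handler) {
        incoming.put(linkName, handler);
    }

    /** Links this bean to the target only when the outgoing link matches an incoming link. */
    boolean linkTo(WorkbenchBean target, String linkName) {
        List<Consumer<String>> listeners = outgoing.get(linkName);
        Consumer<String> handler = target.incoming.get(linkName);
        if (listeners == null || handler == null) return false;
        listeners.add(handler);
        linkedTo.add("-> " + target.label);
        target.linkedTo.add("<- " + label);
        return true;
    }

    /** Sends a value over an outgoing link to every bean linked to it. */
    void fire(String linkName, String value) {
        for (Consumer<String> listener : outgoing.getOrDefault(linkName, Collections.emptyList())) {
            listener.accept(value);
        }
    }
}
```

For instance, a tree bean that declares an outgoing link "selectedConcept" could be linked to a term list bean that declares an incoming link of the same name; selecting a concept in the tree would then call fire and update the term list.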

Most important beans

The ConceptTree bean (see figure 2) provides the user with a view of the hierarchical relations in the semantic network of concepts. As concepts can have multiple parents (network structure) and the representation is a tree-view, the network structure is split up into the matching tree representation. Modifications to the structure can be made by means of drag and drop.
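Splitting a network into a tree means that a concept with several parents is shown once under each parent. A minimal sketch of this unfolding, assuming an acyclic hierarchy and using hypothetical class names and example concepts, could look like this:

```java
import java.util.*;

/** One node of the displayed tree; the same concept may occur in several nodes. */
final class ConceptTreeNode {
    final String concept;
    final List<ConceptTreeNode> children = new ArrayList<>();
    ConceptTreeNode(String concept) { this.concept = concept; }
}

/** Hypothetical parent-child network in which a concept may have multiple parents. */
final class ConceptNetwork {
    private final Map<String, List<String>> childrenOf = new LinkedHashMap<>();

    void addParentChild(String parent, String child) {
        childrenOf.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    /** Unfolds the network below root into a tree, descending at most maxDepth levels. */
    ConceptTreeNode unfold(String root, int maxDepth) {
        ConceptTreeNode node = new ConceptTreeNode(root);
        if (maxDepth > 0) {
            for (String child : childrenOf.getOrDefault(root, Collections.emptyList())) {
                node.children.add(unfold(child, maxDepth - 1));
            }
        }
        return node;
    }

    /** Counts how often a concept occurs in an unfolded tree. */
    static int occurrences(ConceptTreeNode node, String concept) {
        int n = node.concept.equals(concept) ? 1 : 0;
        for (ConceptTreeNode child : node.children) n += occurrences(child, concept);
        return n;
    }
}
```

With DISEASE as parent of HEART_DISEASE and INFECTION, and ENDOCARDITIS as a child of both, the unfolded tree below DISEASE contains ENDOCARDITIS twice. The maxDepth parameter plays a role similar to a child-depth setting: it bounds how much of the network is expanded at once.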

The functionalities of the ConceptTree bean include search by knowledge name, search by terms, modification of the hierarchy and a history of searches. The bean properties provide a way to specify the number of siblings to display, the font, the child depth, the number of children to display, the preferred language, the parent depth and the leaf-node child depth.

Figure 2 : the ConceptTree bean

A second important bean is the Full Definition bean (see figure 3). This bean shows the user the hierarchical and non-hierarchical relations a concept has with other concepts. These relations are divided into the relations explicitly specified for this concept (beneath the node labeled CRITERIA) and the implicit relations (beneath the node labeled INHERITED CRITERIA) (figure 2). It also shows the full definitions of this concept, i.e. the sets of criteria that are sufficient to uniquely identify it. Explicit relations and full definitions can be added, removed or modified (by drag and drop).

We introduced the notion of concept-definition and concept-criteria, which allows us to group a number of concept-criteria (essentially relationships) to form a full definition. In this way a concept could not only have multiple full definitions and loose concept-criteria, but also the definitions could overlap.

Concept-criteria are the equivalent of what used to be complex-concepts; they represent a relationship between two concepts by use of a linktype.

This diagram illustrates how full definitions could be constructed:


The hypothetical concept C has 5 relationships (CONCEPT_CRITERIA) and 2 full definitions (FD1 and FD2). FD1 consists of 3 concept-criteria, L1, L2 and L3, and FD2 consists of L3 and L4. L5 is simply a loose concept-criterion that does not belong to any full definition.
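The example of concept C can also be written down as a small data structure. The Java classes below are a sketch under our own assumptions (class names, fields and the placeholder linktypes and targets are hypothetical); they only show that full definitions are named groups of concept-criteria, that such groups may overlap, and that criteria may remain loose.

```java
import java.util.*;

/** A concept-criterion: a relationship to a target concept through a linktype. */
final class ConceptCriterion {
    final String id;
    final String linktype;
    final String target;

    ConceptCriterion(String id, String linktype, String target) {
        this.id = id;
        this.linktype = linktype;
        this.target = target;
    }
}

/** A concept with its criteria and the full definitions grouping them. */
final class DefinedConcept {
    private final Map<String, ConceptCriterion> criteria = new LinkedHashMap<>();
    private final Map<String, Set<String>> fullDefinitions = new LinkedHashMap<>();

    void addCriterion(ConceptCriterion criterion) {
        criteria.put(criterion.id, criterion);
    }

    /** Groups existing criteria into a named full definition. */
    void addFullDefinition(String name, String... criterionIds) {
        Set<String> ids = new LinkedHashSet<>(Arrays.asList(criterionIds));
        if (!criteria.keySet().containsAll(ids)) {
            throw new IllegalArgumentException("unknown criterion in " + name);
        }
        fullDefinitions.put(name, ids);
    }

    /** Criteria that belong to no full definition. */
    Set<String> looseCriteria() { return withUsage(0, 0); }

    /** Criteria shared by two or more full definitions. */
    Set<String> sharedCriteria() { return withUsage(2, Integer.MAX_VALUE); }

    private Set<String> withUsage(int min, int max) {
        // Count in how many full definitions each criterion takes part.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String id : criteria.keySet()) counts.put(id, 0);
        for (Set<String> ids : fullDefinitions.values()) {
            for (String id : ids) counts.merge(id, 1, Integer::sum);
        }
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= min && e.getValue() <= max) result.add(e.getKey());
        }
        return result;
    }
}
```

Building concept C with criteria L1 to L5 and the two full definitions from the text, looseCriteria() yields L5 and sharedCriteria() yields L3, the criterion in which the two definitions overlap.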

Figure 3 : the Full Definition bean

The ReverseConcept bean (see figure 4) shows the relations other concepts have with the selected concept. The node labeled Reverse ConceptCriteria shows the explicit relations other concepts have with the selected concept. The node labeled Inherited Reverse ConceptCriteria shows the implicit relations other concepts have with this concept, i.e. the explicit relations other concepts have with a concept that is an explicit child of the selected concept; those concepts therefore have an implicit relation with the selected concept. The inherited reverse relations are not shown by default.
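The two kinds of reverse relations can be computed as in the sketch below. It follows the description literally and looks only at explicit (direct) children; the class names, the linktype HAS-LOCATION and the example concepts are hypothetical and serve only as illustration.

```java
import java.util.*;

/** An explicit relation: source concept, linktype, target concept. */
final class Relation {
    final String source;
    final String linktype;
    final String target;

    Relation(String source, String linktype, String target) {
        this.source = source;
        this.linktype = linktype;
        this.target = target;
    }

    @Override public String toString() {
        return source + " " + linktype + " " + target;
    }
}

final class ReverseRelations {
    private final List<Relation> relations = new ArrayList<>();
    private final Map<String, Set<String>> explicitChildren = new HashMap<>();

    void addChild(String parent, String child) {
        explicitChildren.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(child);
    }

    void addRelation(String source, String linktype, String target) {
        relations.add(new Relation(source, linktype, target));
    }

    /** Explicit relations that other concepts have with the selected concept. */
    List<String> reverse(String selected) {
        List<String> result = new ArrayList<>();
        for (Relation r : relations) {
            if (r.target.equals(selected)) result.add(r.toString());
        }
        return result;
    }

    /** Explicit relations that other concepts have with an explicit child of the selected concept. */
    List<String> inheritedReverse(String selected) {
        Set<String> children = explicitChildren.getOrDefault(selected, Collections.emptySet());
        List<String> result = new ArrayList<>();
        for (Relation r : relations) {
            if (children.contains(r.target)) result.add(r.toString());
        }
        return result;
    }
}
```

If MITRAL_VALVE is an explicit child of HEART_VALVE and MITRAL_STENOSIS has a HAS-LOCATION relation to MITRAL_VALVE, then selecting HEART_VALVE lists that relation among the inherited reverse relations, while relations that point directly at HEART_VALVE appear among the explicit ones.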

Figure 4 : the Reverse Concept bean

The LinkType bean (see figure 5) provides the user with a view of the hierarchical relations in the semantic network of linktypes.

As linktypes can have multiple parents (network structure) and the representation is a tree-view, the network structure is split up into the matching tree representation. Modifications to the structure can be made by means of drag and drop.

Linktypes were deemed to have a hierarchy just like concepts so we added the LINKTYPE_TREE to represent this; this simple construct suffices because there is only a hierarchical parent-child relationship between linktypes.

This hierarchy will have an effect on the constraints (see below), because when a linktype is used in a concept-criterion it automatically implies that all its parent linktypes are used as well.

For example, when there is a linktype HAS-BONAFIDE-BOUNDARY with HAS-BOUNDARY as its parent, that parent is also implied whenever the child is used in a relationship.
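The implication along the linktype hierarchy amounts to collecting all ancestors of a linktype. The following sketch shows one way to do this; the class LinktypeTree is hypothetical, and the grandparent linktype HAS-RELATION used in the usage note is invented for the example (only HAS-BONAFIDE-BOUNDARY and HAS-BOUNDARY come from the text).

```java
import java.util.*;

/** Hypothetical linktype hierarchy with parent-child relations only. */
final class LinktypeTree {
    private final Map<String, Set<String>> parentsOf = new HashMap<>();

    void addParent(String linktype, String parent) {
        parentsOf.computeIfAbsent(linktype, k -> new LinkedHashSet<>()).add(parent);
    }

    /** The linktype itself plus all its ancestors, i.e. everything implied by using it. */
    Set<String> implied(String linktype) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(linktype);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            // Visit each linktype once, even when it is reachable through several parents.
            if (result.add(current)) {
                for (String parent : parentsOf.getOrDefault(current, Collections.emptySet())) {
                    todo.push(parent);
                }
            }
        }
        return result;
    }
}
```

With HAS-BOUNDARY as parent of HAS-BONAFIDE-BOUNDARY and the invented HAS-RELATION as parent of HAS-BOUNDARY, implied("HAS-BONAFIDE-BOUNDARY") contains all three linktypes, so a constraint stated on HAS-BOUNDARY also applies to a criterion that uses the more specific linktype.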

The Translate bean (see figure 6) shows the list of terms related to the selected concept, the selected linktype or the selected criterion in a certain language. Terms can be added, modified or removed. Several Translate beans can be viewed simultaneously, giving terms in different languages, all linked to the language-independent concept. This construction makes it possible, for example, to place an application on top of the ontology

Figure 5 : LinkType Tree bean

Figure 6 : Translate bean

Other available beans include Concept Properties bean, LinkType Properties bean, Criteria bean, Bookmark bean and others.

All of these beans can be selected by the user and linked to each other, thus creating a powerful environment for browsing and editing large ontologies. An ontology that has been built with LinKFactory® is LinKBase®.

LinKBase® : A LARGE FORMAL ONTOLOGY BUILT WITH LinKFactory®

Since the initial focus of L&C was the medical world, we started by constructing a formal representation of the medical world using LinKFactory®. LinKBase® is a large multi-lingual medical formal terminology system covering most parts of healthcare. The fact that LinKBase® currently contains over 1,000,000 medical concepts and over 350 linktypes (with over 3,000,000 linktype instantiations) gives a good indication of the size of the ontologies LinKFactory® can handle.