
Abstract

More and more people use the Web every day, accessing it from multiple devices, with the ability not only to read content but also to contribute information. This is one of the reasons the Web has become so successful and so large: it has a decentralised design in which multiple systems host various data.

Humans possess the ability to find exactly what they require from the Web, whereas computers base searches on keyword matching, without understanding the content within. The reason is that traditional methods for developing web pages use mark-up languages that specify how to display the content but provide minimal or no semantics.

The Semantic Web aims to solve this problem by developing machine-accessible code, to provide a better user experience and structure to the World Wide Web. This research considers the technology involved and how an application may be developed using semantic technology.

Acknowledgements

The three years at University have been demanding not only for me but also for my wife and family. I would like to thank them, to whom I owe everything, for their continued support.

I would also like to thank my independent study supervisor, Dr Alan Phelan, not only for his guidance but also for his support throughout my final year. The Semantic Web was a subject that I found very stimulating, and without Alan’s encouragement it may have been impossible to comprehend.

Finally, I would like to thank the University of Worcester for all the experiences I have had while studying. I have met many interesting people and made friends among both students and lecturers, to whom I also owe thanks for their support and continued motivation.

Abbreviations

AI Artificial Intelligence

API Application Programming Interface

DCMI Dublin Core Metadata Initiative

GUI Graphical User Interface

HTML Hypertext Mark-up Language

MDA Model Driven Architecture

OWL Web Ontology Language

PIM Platform-Independent Model

PSM Platform-Specific Model

RDF Resource Description Framework

RDFS Resource Description Framework Schema

SPARQL SPARQL Protocol and RDF Query Language

SQL Structured Query Language

UML Unified Modelling Language

URI Uniform Resource Identifier

W3C World Wide Web Consortium

XML Extensible Mark-up Language

Table of Contents

Abstract

Acknowledgements

Abbreviations

1. Introduction

1.1. Background

1.2. Why the web is not enough

1.3. What are Semantics?

2. Literature Review

2.1. Knowledge representation

2.2. First Order Logic

2.3. Description Logic and Graph Theory

2.4. Ontology

2.5. Artificial Intelligence and Agents

2.6. Summary

3. The Layers of the Semantic Web

3.1. A Layered Architecture

3.2. Uniform Resource Identifiers (URIs)

3.3. XML (Extensible Mark-up Language)

3.4. Resource Description Framework (RDF)

3.5. Resource Description Framework Schema (RDFS)

3.6. Web Ontology Language (OWL)

3.7. Summary

4. Application Outline

4.1. Project

4.2. The Current System

4.3. Evaluation

5. Design

5.1. Data Design

5.2. Proposed Solution

5.3. Methodology

5.4. Unified Modelling Language (UML)

5.5. Framework

5.6. Model Design

5.7. Ontology Design

5.8. Implementation

6. Evaluation

6.1. Lessons Learnt

6.2. Conclusion

References

Bibliography

APPENDICES A – E

APPENDICES F – I

1. Introduction

1.1. Background

The Web’s success is mainly due to its decentralised design: web pages are hosted by multiple computers and are linked to other documents stored on either the same or different computers. This design has made the World Wide Web the greatest repository of information ever assembled by man; the data is instantaneously available to anyone with an internet connection. Media resources, documents, people and products can be displayed and/or provided by individuals at any time, which has caused an exponential growth of the internet (Heflin, 2001).

The growth of the internet has also become its weakness: the volume of available information has become so large that it is becoming more difficult to locate useful data. Various search engines have tried to improve the way that a user may search for information; however, the process still involves multiple searches and clicks. Search engines are still only search engines, in that they do what they are told, whereas a user may expect them to perform tasks, for example finding a hotel in France and locating all the English restaurants nearby, or finding and booking the best flight to the least expensive holiday location. To fulfil either of these tasks would involve a long search process on multiple sites; once the sites are found, the content would need to be read, integrated and then decided upon. The human interaction in the search process is still very high, which raises the question of who is actually doing the searching.

The web has been developed for user interaction and is not designed to be processed by machines. Web pages are built using languages that tell the machine what to display or where to go, but the machine does not understand what the text means, so each link or document simply leads to another and another. The meaning behind each piece of text is what the computer needs to understand to be able to process web pages intelligently. Tim Berners-Lee (2001), the inventor of the Web, has described a way of doing this by adding languages to the web that express the meaning of web pages.

“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

(Berners-Lee et al., 2001)

This paper discusses the technologies and underlying concepts of the Semantic Web, giving an overview of how it is slowly becoming a reality. A review of various semantic languages and their origins is given, with the aim of implementing the languages in a semantic application. As there is a close link between philosophy, Artificial Intelligence and the Semantic Web, all three areas are discussed and definitions are provided in terms of computer science. To access the application, please refer to Appendix A, ‘Application Guidelines’.

1.2. Why the web is not enough

Currently the web is a vast repository of data and a powerful tool for locating information; however, having so much data has also become an issue. According to Lacy (2005, pp.4):

“Although the current web is truly incredible, it does not provide enough structure to support advance computer processing content.”

The web’s simplicity enabled its growth: documents created using HTML (Hypertext Mark-up Language) are easily developed, with very few constraints on their syntax. This simplicity has allowed variation on a wide scale; information now comes from many sources and can be developed using different types of syntax. As discussed by Allemang and Hendler (2008, pp.1):

“Essential to notion of the web is the idea of an open community: Anyone can contribute their ideas to the whole, for anyone to see.”

A user is able to find a topic that has been analysed, summarised and presented in many different forms, and the web page may have been developed by anyone, not necessarily an expert in that field. However, this does not prevent the user from sorting through the information sources and finding what they require. One of the main reasons for the success of the web is the development of search engines, which are usually keyword based and search millions of documents for the content the user has requested. However, there are problems associated with their use:

·  Low accuracy/high return

·  No relevancy

·  Results will depend on vocabulary

·  Results are often distributed

(Antoniou & Harmelen, 2008)

It is often the case that a search will return many results and the user will have to sift through the pages to find exactly what they need. The problem is that the meaning of web content is not machine-accessible: a page can be located, but the machine is not able to interpret its meaning. For example, the user may search for ‘clothes made by snoop dog’, meaning the rapper Snoop Dogg; the search engine will look for the words of the query and match them against HTML documents. The results may be ordered so that the page showing the most matches comes first; however, this may be a page about fashion for dogs. Such a result is not related to what the user was looking for, so the search will have to be continued or amended. The route to a more satisfying experience is to make web content machine-accessible, by providing structure to the semi-structured HTML documents.
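The keyword-matching problem described above can be illustrated with a minimal sketch. The page names, page texts and the scoring function below are hypothetical and invented for illustration only; real search engines use far more elaborate ranking, but the failure mode is similar whenever only words, and not meanings, are compared.

```python
# Hypothetical page texts, invented for this illustration.
pages = {
    "dog-fashion": "clothes for your dog dog coats dog jumpers and more clothes made for every dog",
    "snoop-dogg-clothing": "official clothing line by the rapper snoop dogg",
    "celebrity-news": "snoop spotted at an awards show",
}


def keyword_score(query, text):
    """Count how often each query word occurs in the text (exact word match only)."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def rank(query, documents):
    """Order documents by descending keyword score, with no notion of meaning."""
    scored = [(name, keyword_score(query, text)) for name, text in documents.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


results = rank("clothes made by snoop dog", pages)
for name, score in results:
    print(name, score)
```

In this sketch the page about fashion for dogs is ranked first because it repeats the words ‘clothes’ and ‘dog’, while the page that is actually about the rapper’s clothing scores lower: ‘clothing’ and ‘dogg’ are different strings from ‘clothes’ and ‘dog’, even though a human reader would recognise them as the relevant match.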

1.3. What are Semantics?

Semantics are information (meta-data) about the meaning of represented concepts (Lacy & Gerber, 2004, pp.265). This definition differs slightly from the one given by Hebeler et al. (2009, pp.4):

“Semantic simply means meaning”.

Although the two definitions differ slightly, both focus on the same thing: data, or the meaning of data. Another rendering of the term Semantic Web could be the ‘web of meaning’; this is the underlying concept of the technology, to give meaning to the data on the web. As humans we are able to reason with data and deduce meaning from text, so why provide meaning for something a user can already understand? As outlined above, the idea is to provide meaning for machines to understand, and thereby to provide structure to the web. As discussed by Hebeler et al. (2009, pp.5):

“The Semantic Web is a web of data described and linked in ways to establish context or semantics that adhere to defined grammar and language constructs.”

To succeed in this, data will need to be modelled using domain-specific vocabulary. The vocabulary will need to be well defined, forming a knowledge representation of a given domain. Alesso and Smith (2006, pp.6) claim that the Semantic Web is a knowledge representation of linked data, allowing machine processing on a global scale.

The term knowledge representation is closely linked to Artificial Intelligence (AI) and will be discussed further in the following sections. Both descriptions indicate that data will need to be linked and accompanied by some means of describing its meaning. Taking into consideration the quote from Berners-Lee et al. (2001) given at the beginning of this paper, one can say that the Semantic Web is not a new technology but an extension of an existing one. The concept is to build a data structure for the web to work on, translating the meaning of existing data into machine-accessible code.

2. Literature Review

2.1. Knowledge representation

The term knowledge representation has already been mentioned on several occasions; it is the area of computer science in which scientists try to model human behaviour. It is closely linked with Artificial Intelligence (AI); the concept is to capture a subject area, including all its facts and concepts, the relations among them and the mechanisms for combining them to solve problems within that area (Gasevic et al., 2009).

“Since no organism can cope with infinite diversity, one of the basic functions of all organisms is the cutting up of the environment into classifications by which non-identical stimuli can be treated as equivalent....”

(Rosch, 1978 cited in Luger, 2002 pp.197)

The goal of AI is to design computer programs that can do things that humans call ‘intelligent’. Although the study of AI is relatively new, its foundation is logic, which can be traced back to ancient Greece; Aristotle is considered to be the father of logic (Antoniou & Harmelen, 2008). The Semantic Web depends on the ability to associate formal meaning with content. The field of knowledge representation provides a good starting point for the design of a Semantic Web language because it offers insight into the design and use of languages that attempt to formalise meaning. Knowledge representation is vital in the development of the Semantic Web: computers will need access to structured collections of information and inference rules that they can use to conduct automated reasoning (Berners-Lee et al., 2001).

Research in the field has spawned a number of knowledge representation languages, each with its own set of features. The following sections will consider a few of the languages that are relevant to this research.

2.2. First Order Logic

Logic is the foundation of knowledge representation, particularly in the form of predicate logic, which is also known as first-order logic. Logic offers formal languages for expressing knowledge and provides well-understood formal semantics: in most logics, the meaning of sentences is defined without the need to make the knowledge operational. An important element of logic is the ability to have automated reasoners that can infer conclusions from the given knowledge:

“Note that knowledge representation and intelligent reasoning are always intertwined, both in the human mind and in AI”

(Gasevic et al., 2009, pp.5)

Inference enables the discovery of new facts from existing facts. As an example, suppose that all Doctors are department members, that all department members are staff members, and that Chris is a Doctor. In predicate logic the information is expressed as follows:

Dr(X) → Department(X)

Department(X) → staff(X)

Dr(Chris)

Then it can be inferred that:

Department(Chris)

staff(Chris)

Dr(X) → staff(X)

So from the first three statements it is concluded that Chris is a department member and therefore also a member of staff, and more generally that every Doctor is a member of staff. This form of reasoning and inference is an area that AI has focused on for many years, and similar concepts are needed for the success of the Semantic Web. However, inference introduces scalability issues because of the complexity of performing reasoning over huge amounts of distributed facts (Lacy, 2005). Logic is important in the field of semantics; it provides a high-level language in which knowledge can be expressed in a transparent way.
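The inference in the Doctor example can be reproduced mechanically. The following is a minimal sketch of forward chaining over single-argument rules, written for illustration only; the function name forward_chain and the tuple encoding of rules and facts are assumptions made here and are not taken from any Semantic Web toolkit.

```python
# Rules of the form premise(X) -> conclusion(X), taken from the example above.
rules = [("Dr", "Department"), ("Department", "staff")]

# Known facts as (predicate, individual) pairs: Dr(Chris).
facts = {("Dr", "Chris")}


def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new fact is produced."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            # Iterate over a snapshot, because new facts are added during the loop.
            for predicate, individual in list(known):
                new_fact = (conclusion, individual)
                if predicate == premise and new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known


# Facts that were derived rather than stated.
inferred = forward_chain(facts, rules) - facts
for predicate, individual in sorted(inferred):
    print(f"{predicate}({individual})")
```

The sketch derives Department(Chris) and staff(Chris) from the single stated fact. It does not derive the general rule Dr(X) → staff(X); that conclusion follows from chaining the two implications together and would need a reasoner that manipulates rules as well as facts. The repeated passes over all known facts also hint at the scalability concern noted above, since the work grows with the number of facts and rules.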

A first-order logic language consists of logical and non-logical symbols. The logical symbols represent quantification, implication, conjunction and disjunction, while the non-logical symbols are constants, predicates, functions and variables. Constants are symbols that begin with a lowercase letter and variables are symbols that begin with an uppercase letter. Propositions are represented using arguments, which are the objects of propositions, and predicates, which are assertions about objects. Knowledge represented with first-order logic is written in the form of predicates and rules (Gasevic et al., 2009). A predicate can be defined as: