Geographical information systems

Paper on:

Geographical Database

By

Myriam Benkirane

Supervised by: Dr. D. Kettani


Outline:

Introduction

I-  What is a Geographical Database?

II-  Data types.

III-  Spatial Models.

IV-  Spatial database types.

V-  Spatial relationships.

VI-  Spatial Database Elements.

VII-  Querying.

VIII-  Data structure and algorithms.

IX-  Spatial indexing.

X-  System design and architecture.

XI-  Tools and standards.

XII-  Geo-database in Morocco

Summary


Introduction:

A geographic information system (GIS) is a computer-based information system that enables the capture, modeling, manipulation, retrieval, analysis and presentation of geographically referenced data. Data is therefore one of the fundamental components of GIS technology, and data acquisition has been one of its major problems: some data still exists only on paper (maps), and some is stored in files that are unstructured and poorly presented. Relational databases were used at first, but they required a large amount of storage space and processing time. Spatial databases solved this problem and allowed data to be integrated into a single, uniform and efficient data store.

What is a spatial database?

A standard database management system was, and still is, used to store, update and retrieve standard data. With the emergence of GIS technology, the storage and analysis of spatial data became a necessity. Relational database systems were used to implement and manage spatial data; however, storing spatial data in a standard database required an excessive amount of space and made retrieval and analysis slow. A spatial database addresses these needs and provides the features required by GIS and other spatial applications.

In the 1990s, pictorial (also called image) databases were used to support spatial data; they store spatial data in the form of raster images (images, pictures, etc.). Still, the growing need to deal with data as objects rather than as images gave rise to spatial databases. A spatial database is a database system that includes spatial data types (for example point, line, region) and their relationships in its data model and query language. It also provides efficient algorithms to implement spatial indexing and spatial join methods. Spatial databases are the fundamental technology for GIS and other applications.

In this paper, I present the main concepts of spatial databases. I start by defining the data types, elements, relationships and models; I then consider how querying, indexing and spatial join are implemented in a spatial database; finally, I describe data structures and algorithms that can be used as tools or building blocks within different system architectures.

Data Types:

Data can be classified into two types of data models:

Vector model: It represents graphical data as points, lines or curves, and areas, each with attributes. Points are defined by Cartesian coordinates. Lines or arcs are represented as ordered series of points, whereas areas or polygons are stored as closed, ordered lists of points. Vector data requires less storage space, and topological relationships are easier to maintain in this model (see figure 1).

Raster model: A raster-based system displays, locates and stores graphical data by using a matrix or grid of cells. These data are two-dimensional; a GIS stores information such as forest cover, soil type or land use in different layers using the raster model. Raster data requires less processing than vector data, but it consumes more storage space (see figure 2).
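To make the contrast between the two models concrete, the following minimal Python sketch stores the same rectangular parcel in both forms. The names `parcel_vector` and `rasterize`, the 6 x 4 grid and the cell size of one unit are assumptions made for this illustration only; they do not come from any GIS product.

```python
# Vector model: an area is an ordered list of (x, y) vertices.
parcel_vector = [(1.0, 1.0), (4.0, 1.0), (4.0, 3.0), (1.0, 3.0)]


def rasterize(polygon, width, height):
    """Raster model: return a grid (list of rows) in which a cell is 1 when
    its centre lies inside the axis-aligned rectangle spanned by `polygon`.

    Simplification for this sketch: the example parcel is itself a
    rectangle, so testing against its bounding box is exact here.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    grid = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            cx, cy = col + 0.5, row + 0.5  # centre of a 1 x 1 cell
            if min(xs) <= cx <= max(xs) and min(ys) <= cy <= max(ys):
                grid[row][col] = 1
    return grid


parcel_raster = rasterize(parcel_vector, width=6, height=4)
covered_cells = sum(sum(row) for row in parcel_raster)
```

In this toy case the vector form needs four vertices, while the raster form stores all 24 cells of the grid, 6 of which are covered by the parcel.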

Spatial Models:

Modeling in GIS involves joining the spatial database to a computer-driven model of some process or procedure. A model is a representation of the real world using a series of mathematical formulas. Modeling allows either two dimensional or three dimensional spatial data to be easily manipulated and processed. Two views can be presented, either objects in space such as cities, rivers, buildings or space itself such as land partition of a country, etc.


Two modeling concepts were derived from these views and are supported by the spatial DBMS (database management system); the first one is related to single object and the second one is related to spatially related collections of objects.

Single objects are represented by a point, a line or a region. A point represents an object for which only its location in space is relevant (for instance, a city). A line represents the concept of moving through space or a connection in space (for example roads, rivers, etc.). A region represents a two-dimensional object in space (a country, a lake). Figure 1 shows the three types of single object modeling.

Partitions and networks belong to the spatially related collections of objects. A partition can be considered as a set of disjoint region objects for which the adjacency relationship is often of concern; partitions are used to represent thematic maps. A network is a graph embedded in the plane, consisting of a set of point objects (vertices) and line objects (edges); it is used to represent highways, power supply lines, rivers, etc.

Euclidean geometry is used to represent the various abstractions described above. For instance, a point is given a pair of real-number coordinates. This can lead to errors because computers cannot represent real numbers exactly; they use finite-precision approximations. A query that computes the coordinates of a point of intersection may therefore return a slightly inexact result. Approximations are often used to address this problem, and they depend on how indexing is implemented in the spatial database.
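The precision problem can be reproduced in a few lines. The sketch below is only an illustration: the helper names `intersection` and `on_line` and the two example lines are assumptions, and the exact rational arithmetic of Python's `fractions.Fraction` is used purely to show the contrast with floating-point numbers; it is not the approximation technique mentioned above.

```python
from fractions import Fraction


def intersection(a1, b1, c1, a2, b2, c2):
    """Intersection of the lines a1*x + b1*y = c1 and a2*x + b2*y = c2,
    computed with Cramer's rule. Assumes the lines are not parallel."""
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


def on_line(point, a, b, c):
    """Exact equality test: does `point` satisfy a*x + b*y = c?"""
    x, y = point
    return a * x + b * y == c


# Line 1: x + 3y = 1, line 2: 3x - y = 0. They meet at (1/10, 3/10).
p_float = intersection(1, 3, 1, 3, -1, 0)
p_exact = intersection(*map(Fraction, (1, 3, 1, 3, -1, 0)))

float_hit = on_line(p_float, 1, 3, 1)   # False: rounding error
exact_hit = on_line(p_exact, 1, 3, 1)   # True: exact arithmetic
```

With floating-point coordinates the computed intersection point fails the test of lying on the very line that produced it, which is why geometric predicates in a spatial DBMS need tolerances, approximations or exact arithmetic.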

Spatial Database types

Spatial data types, or a spatial algebra, define the abstractions point, line and region together with the relationships between them. The ROSE algebra is an example of a spatial algebra and includes the three data types points, lines and regions. Two sets of types are defined: EXT = {lines, regions} and GEO = {points, lines, regions}.

There are four classes of operations on spatial data types:

1. Spatial predicates expressing topological relationships:

for each geo in GEO. for each ext1, ext2 in EXT. for each area in area-disjoint regions.

geo * regions -> bool inside

ext1 * ext2 -> bool intersects, meets

area * area -> bool adjacent, encloses

The first operator, “inside”, checks whether a point, line or region is inside a region and returns a Boolean value. The “intersects” and “meets” operators test whether two values of the same or of different types within the set EXT intersect or touch. Finally, the “adjacent” and “encloses” operators are applicable to regions belonging to a partition.

2. Operations returning atomic spatial data type values:

for each geo in GEO.

lines * lines -> points intersection

regions * regions -> regions intersection

geo * geo -> geo plus, minus

regions -> lines contour

3. Spatial operators returning numbers:

for each geo1, geo2 in GEO.

geo1 * geo2 -> real dist

regions ->real perimeter, area

4. Spatial operations on sets of objects:

for each obj in OBJ. for each geo, geo1 , geo2 in GEO.

set(obj) * (obj -> geo) -> geo sum

set(obj) * (obj -> geo1) * geo2 -> set(obj) closest

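“Sum” is a spatial aggregate function: it combines the spatial attribute values of a set of objects into a single value. The “closest” operator returns, from a set of objects, those whose spatial attribute value has the minimum distance from a given query object.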
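As an illustration of what such operators compute, the Python sketch below implements one operator from three of the four classes: the predicate `inside`, the numeric operators `dist`, `perimeter` and `area`, and the set operator `closest`. It is a simplified sketch under stated assumptions, not the ROSE algebra implementation: regions are simple polygons given as vertex lists, `inside` and `dist` handle points only, and the sample objects are invented.

```python
import math


def inside(point, region):
    """Ray-casting test: True if `point` lies inside the simple polygon
    `region` (list of (x, y) vertices). Points exactly on the boundary
    are not treated specially in this sketch."""
    x, y = point
    result = False
    n = len(region)
    for i in range(n):
        x1, y1 = region[i]
        x2, y2 = region[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result


def dist(p, q):
    """Euclidean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def perimeter(region):
    n = len(region)
    return sum(dist(region[i], region[(i + 1) % n]) for i in range(n))


def area(region):
    """Area of a simple polygon (shoelace formula)."""
    n = len(region)
    twice = sum(region[i][0] * region[(i + 1) % n][1]
                - region[(i + 1) % n][0] * region[i][1] for i in range(n))
    return abs(twice) / 2.0


def closest(objects, attr, query_point):
    """Objects whose point attribute, extracted by `attr`, has the minimum
    distance from `query_point`."""
    if not objects:
        return []
    best = min(dist(attr(o), query_point) for o in objects)
    return [o for o in objects if dist(attr(o), query_point) == best]


rect = [(0, 0), (4, 0), (4, 3), (0, 3)]
places = [{"name": "A", "center": (0, 0)},
          {"name": "B", "center": (3, 4)},
          {"name": "C", "center": (10, 0)}]
nearest = closest(places, lambda o: o["center"], (4, 4))
```

Here `closest` takes the set of objects, a function that extracts the spatial attribute, and the query value, mirroring the signature set(obj) * (obj -> geo1) * geo2 -> set(obj).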

Spatial Relationships

A major objective of a GIS is to develop spatial relationships between mapped geographic features. Possible relationships between objects are defined by the intersections between point-sets representing the geometric objects. Three classes can be defined. They are mutually exclusive and cover all possible cases.

Topological relationships: such as adjacent, inside, disjoint.

Direction relationships: for example, above, below, north_of, southwest_of, etc.

Metric relationships: such as “distance < 100”.

The spatial database is a relational DBMS extended by spatial data types and spatial relationships. It can represent spatial objects such as city, road, river in addition to the usual data types (integer, string, etc).

Example of relations:

relation states (sname: STRING; area: REGION; spop: INTEGER)

relation cities (cname: STRING; center: POINT; ext: REGION; cpop: INTEGER)

relation rivers (rname: STRING; route: LINE)

How does it work?

Figure 4

Spatial data is stored using the coordinate system of a particular projection. That projection is referenced by a Spatial Reference Identification Number (SRID), which corresponds to an entry in another table in the database listing all of the spatial reference systems in use (see figure 4). This allows the database to know which projection each table uses and to re-project tables for calculations. A GIS links spatial data (a point on a map) to non-spatial data (tabular data) through a unique identifier: every geographic feature has at least one unique means of identification, such as a name or an ID number. In other words, locational information is linked to specific information in a database.
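The role of the SRID as a key into a table of reference systems can be sketched as follows. This is a minimal illustration in Python rather than a real database schema: the dictionary `spatial_ref_sys`, the catalogue `layers` and the helper functions are hypothetical names, and only two well-known EPSG codes are listed.

```python
# Spatial reference system table: SRID -> description (small subset).
spatial_ref_sys = {
    4326: {"name": "WGS 84", "unit": "degree"},
    3857: {"name": "WGS 84 / Pseudo-Mercator", "unit": "metre"},
}

# Hypothetical catalogue: each spatial table records the SRID of its geometry.
layers = {"cities": 4326, "rivers": 4326, "parcels": 3857}


def describe(layer):
    """Look up the reference system of a layer through its SRID."""
    return spatial_ref_sys[layers[layer]]["name"]


def needs_reprojection(layer_a, layer_b):
    """True if the two layers use different reference systems, so that one
    of them must be re-projected before their geometries are compared."""
    return layers[layer_a] != layers[layer_b]
```

For example, `needs_reprojection("cities", "parcels")` signals that the two tables cannot be compared coordinate by coordinate until one is converted.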

Spatial database elements:

Entity: a phenomenon of interest in reality that is not further subdivided into phenomena of the same kind for instance a city.

Object: a digital representation of all or part of an entity. For example, a city may be represented by a point or by a region.

Entity types: similar phenomena to be stored in a database are identified as entity types. (road, river…)

Attribute: an attribute is a characteristic of an entity selected for representation.

Layers: spatial objects can be grouped into layers, also called overlays, coverages or themes.

Metadata: Metadata is a summary document providing content, quality, type, creation, and spatial information about a data set. It can be stored in any format such as a text file, Extensible Markup Language (XML), or database record.

Spatial Reference System table: the table in the spatial database where all Spatial Reference Identification Numbers are stored.

Querying:

Querying a spatial database is similar to querying a standard database: it involves connecting the operations of the spatial algebra to the facilities of a DBMS query language. The fundamental operations are spatial selection and spatial join. A spatial database should also provide graphical presentation of spatial data and query results, and graphical input of the spatial data type values used in queries.

Spatial selection: an operation that returns the objects that satisfy a spatial predicate with respect to a query object.

Examples:

“All cities in Morocco”

SELECT cname FROM cities c WHERE c.center inside Morocco.area

“All big cities no more than 100 km from Hagen”

SELECT cname FROM cities c WHERE dist(c.center, Hagen.center) < 100 and c.cpop > 500k
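The distance query above can be mimicked in plain Python as a filter over a list of records. This is a sketch only: the coordinates (planar, in kilometres) and the population figures are invented toy values, and the names `city_a`, `city_b` and `city_c` are placeholders rather than real cities.

```python
import math

# Toy data: planar coordinates in km and made-up populations.
cities = [
    {"cname": "Hagen",  "center": (0.0, 0.0),     "cpop": 100_000},
    {"cname": "city_a", "center": (30.0, 40.0),   "cpop": 600_000},
    {"cname": "city_b", "center": (60.0, 80.0),   "cpop": 400_000},
    {"cname": "city_c", "center": (300.0, 400.0), "cpop": 900_000},
]


def dist(p, q):
    """Euclidean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# The query object: the tuple for Hagen.
hagen = next(c for c in cities if c["cname"] == "Hagen")

# Spatial selection: a spatial predicate (distance to the query object)
# combined with an ordinary predicate on a standard attribute.
result = [c["cname"] for c in cities
          if dist(c["center"], hagen["center"]) < 100 and c["cpop"] > 500_000]
```

A real spatial DBMS would not scan every tuple as this list comprehension does; it would use a spatial index to restrict the candidates first.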

Spatial join: a join which compares any two objects through a predicate on their spatial attribute values.

Example:

“For each river, find all cities within less than 50 km.”

cities rivers join[dist(center, route) < 50]
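A naive way to evaluate such a join is a nested loop that tests the predicate for every pair of objects. The Python sketch below assumes that a river route is a polyline (a list of points) and that the distance from a city to a river is the distance from the city centre to the nearest segment of the route; the sample rivers and cities are invented.

```python
import math


def point_segment_dist(p, a, b):
    """Distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:          # degenerate segment: a single point
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def dist(center, route):
    """Distance from a point to a polyline with at least two vertices."""
    return min(point_segment_dist(center, a, b)
               for a, b in zip(route, route[1:]))


def spatial_join(cities, rivers, limit):
    """Nested-loop join: all (river, city) pairs closer than `limit`."""
    return [(r["rname"], c["cname"])
            for r in rivers for c in cities
            if dist(c["center"], r["route"]) < limit]


rivers = [{"rname": "r1", "route": [(0, 0), (100, 0)]},
          {"rname": "r2", "route": [(0, 200), (100, 200)]}]
cities = [{"cname": "c1", "center": (50, 30)},
          {"cname": "c2", "center": (50, 180)},
          {"cname": "c3", "center": (50, 100)}]
pairs = spatial_join(cities, rivers, limit=50)
```

The nested loop compares every river with every city, which is exactly the cost that spatial indexing and the filter-and-refine strategy are meant to reduce.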

There are also operations for the manipulation of partitions (thematic maps):

Overlay: Computes the elementary regions resulting from overlaying two partitions.

Fusion: Objects are grouped by some attribute value, and the regions within each group are merged.

Voronoi: Computes the Voronoi diagram of a set of points. For each point p, the region consists of the points of the plane that are closer to p than to any other point of the set.

Graphical representation of the spatial data type values returned by a query is of great concern in a spatial database. In addition, a spatial DBMS should be able to display the results of several queries together.

The following requirements for spatial querying are generally agreed upon:

- Spatial data types.

- Graphical display of query results.

- Graphical combination (overlay) of several query results.

- Display of context (e.g., show a background such as a satellite raster image or boundaries).

- Legend should clarify the assignment of graphical representations to object classes.

Data Structure and algorithms:

An important issue in spatial database systems is the integration of the spatial algebra with the query facilities of the database management system. For example, the representation of a spatial data type should be compatible with both the DBMS view and the spatial algebra view.

The representation from a DBMS view:

- is the same as that of attribute values of other types with respect to generic operations.

- can have varying and possibly very large size.

- resides on disk and is stored in one page or a set of pages.

- can efficiently be loaded into main memory.

The representation from a spatial algebra view:

- is a value of some programming language data type, e.g. region.

- is some arbitrary data structure which is possibly quite complex.

- supports efficient computational geometry algorithms for spatial algebra operations.

In addition to the points discussed above, the spatial DBMS should support:

Approximations: an approximation of each object, such as its minimum bounding rectangle (MBR), is stored to speed up operations.

Stored unary function values: values such as perimeter or area can be computed once, when the object is constructed, and stored to avoid expensive recomputation.
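Both ideas can be sketched with a small Python class. The class name `Region`, its attributes and the helper `mbrs_intersect` are hypothetical; the sketch assumes that a region is a simple polygon given as a list of vertices.

```python
class Region:
    """Polygon that stores its MBR and its area at construction time."""

    def __init__(self, vertices):
        self.vertices = list(vertices)
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        # Approximation: minimum bounding rectangle (xmin, ymin, xmax, ymax).
        self.mbr = (min(xs), min(ys), max(xs), max(ys))
        # Stored unary function value: computed once, reused by every query.
        self.area = self._shoelace_area()

    def _shoelace_area(self):
        v, n = self.vertices, len(self.vertices)
        twice = sum(v[i][0] * v[(i + 1) % n][1] - v[(i + 1) % n][0] * v[i][1]
                    for i in range(n))
        return abs(twice) / 2.0


def mbrs_intersect(a, b):
    """Cheap test on two MBRs. If it fails, the objects cannot intersect;
    if it succeeds, the exact geometries still have to be checked."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


triangle = Region([(0, 0), (4, 0), (0, 3)])
far_away = Region([(5, 5), (6, 5), (6, 6)])
```

The MBR test costs four comparisons regardless of how many vertices the regions have, which is why it is used as a first check before any exact geometric computation.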

Spatial Indexing:

The next issue is that, in addition to computational geometry algorithms for the individual operations, query processing needs access methods (spatial indexes) that support spatial selection and spatial join.

Spatial indexing supports all kinds of spatial queries, especially spatial selection and spatial join. It organizes objects and space so that only part of the space has to be considered to answer a query. There are two approaches to spatial indexing: either dedicated external spatial data structures are added to the spatial DBMS, or the spatial objects are mapped into one-dimensional space so that a standard index can be used. Approximation is the fundamental idea behind spatial indexing. There are two kinds of approximation:

-  Continuous approximations: the object is represented by its minimum bounding rectangle, which is derived from the coordinates of the object it encloses.

-  Grid approximations: space is divided into cells, and the object is represented by the set of cells that it intersects.
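The two kinds of approximation can be combined in a short Python sketch. It makes two simplifying assumptions: the grid is uniform with square cells of side `cell_size`, and the cells are computed from the object's MBR rather than from its exact shape, which can include a few cells that the object itself does not touch. The function name `grid_cells` is hypothetical.

```python
import math


def grid_cells(mbr, cell_size):
    """Set of (column, row) grid cells intersected by the rectangle
    mbr = (xmin, ymin, xmax, ymax) on a uniform grid of square cells."""
    xmin, ymin, xmax, ymax = mbr
    c0, c1 = math.floor(xmin / cell_size), math.floor(xmax / cell_size)
    r0, r1 = math.floor(ymin / cell_size), math.floor(ymax / cell_size)
    return {(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)}


cells = grid_cells((0.5, 0.5, 2.5, 1.5), cell_size=1.0)
```

Two objects whose cell sets are disjoint cannot intersect, so comparing cell sets is a cheap first test before the exact geometries are examined.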

A query that uses a spatial index is processed in two steps:

Filtering step: Find all the MBRs that satisfy the query.

Refinement step: For each qualified MBR, check the original object against the query.
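The two steps can be sketched in Python for a point query ("which regions contain the point p?"). This is an illustration under stated assumptions: the filter scans all MBRs linearly instead of using an index structure, the regions are convex polygons with counter-clockwise vertices, and all names and sample objects are hypothetical.

```python
def mbr(vertices):
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_contains(box, p):
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]


def convex_contains(vertices, p):
    """Exact test for a convex polygon with counter-clockwise vertices:
    p is inside (or on the boundary) if it is never to the right of an edge."""
    px, py = p
    n = len(vertices)
    for i in range(n):
        ax, ay = vertices[i]
        bx, by = vertices[(i + 1) % n]
        if (bx - ax) * (py - ay) - (by - ay) * (px - ax) < 0:
            return False
    return True


def point_query(objects, p):
    # Filtering step: keep the objects whose MBR contains the point.
    candidates = [name for name, verts in objects.items()
                  if mbr_contains(mbr(verts), p)]
    # Refinement step: check the original geometry of each candidate.
    answers = [name for name in candidates
               if convex_contains(objects[name], p)]
    return candidates, answers


objects = {"triangle": [(0, 0), (4, 0), (0, 4)],
           "square": [(10, 10), (12, 10), (12, 12), (10, 12)]}
candidates, answers = point_query(objects, (3, 3))
```

For the point (3, 3), the triangle passes the filtering step because its MBR contains the point, but it is rejected in the refinement step, which shows why the second step cannot be skipped.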

We can conclude that spatial index structures store either sets of points (for point values) or sets of rectangles (for line and region values).

These are some queries supported by points: