1. Query language overview

1.1 What is a query language?

A query is an interesting question about data. A query language is a language that specifies a query to a database in declarative manner.

1.2 Natural language and formal language

Natural language: human language like English, German, or Japanese. Its usage may change (disappear or appear, or modified) with time. Computer cannot interpret it directly

Formal language: language that computer can interpret. There is no ambiguity in meaning (semantics). The grammar (syntax) does not change. programming and query languages are formal language.

1.2 Structured Query Language (SQL)

SQL is a standard query language that supports the relational logical model such as primary, foreign key.

Majority of DBMS support SQL (MS Access, Oracle, MySQL)

GIS software uses SQL to interact with DBMS behind the screen when doing a query.

2. Relational algebra

2.1 RA properties

Algebra: mathematical expressions consisting of two distinct sets - operands and operations.

Relational algebra: a formal query language associated with relational data model. Relations are placed in operands instead of numbers, or symbols used in normal algebra, and the results are also relations (closure). Closure: the result of operations is the same type of objects. This allows to create nested operations that consist of a small and simple sets of operations.

2.2 RA operations

a) Select σ: Retrieve a subset of tuples (rows) of a relation. A single relation is used.

b) Project π: Retrieve a subset of field (columns) for all rows in a relation. A single relation is involved.

Set operation

c) Union: Requires two relations that are union compatible. A U B returns all tuples that are in either of relations

d) Difference: Requires two relations that are union compatible. A – B returns all tuples in relation A that are not in the relation B

e) Intersection: Requires two relations that are union compatible. A ∩ B returns all tuples that are in both relations.

f) Cross-product: does not require union compatible and many relations can be involved. If you have n rows in relation A, and m rows in relation B, you will get n x m rows in resultant relation

Union compatible: two relations are union compatible if they have the same number of columns, and the same domain and the columns are in the same order from left to right.

g) Join

Logical linking tuples across multiple relations. This is not primitive RA operation. Combine selection, projection, and cross product operation.

Conditional joins: Conduct cross product to form all the possible combinations of tuples from tables and then selection of tuples that meets conditions.

Natural joins (equi- joins): use only condition of equality in selection.

3. SQL

Declarative query language, meaning that user tells the DBMS what you want to know, rather than how it is calculated. The algorithms to answer the question are not required.

c) SQL components

Data define language (DDL)

To create, update (Alter) and delete (Drop) the relation or other elements of database such as views.

Data manipulation Language (DML): To populate and edit table content, and to query the content of a database

Data control language (DCL): to control the access control, permission

3.2. SQL DDL

a) Create Table

CREATE TABLE <table name>
("column 1" "data_type_for_column_1",
"column 2" "data_type_for_column_2",
...

Primary Key <attribute>, ... <attribute>

Foreign Key <attribute>)

So, if we are to create the customer table specified as above, we may type in

CREATE TABLE customer
(First_Name char(50),
Last_Name char(50),
Address char(50),
City char(50),
Country char(25),
Birth_Date date

Primary Key (First_Name, Last_Name))

Create view (View simplify complicated nested queries)

CREATE VIEW <Name> AS
SELECT
FROM
WHERE

b) Alter Table: Change relation schema

c) Drop Table: Delete an empty table

DROP TABLE <table name>

So, if we wanted to drop the table called customer that we created

DROP TABLE customer

DML:

a) DML commands for populating and editing

- Insert: Populate the tables

INSERT INTO <table>(<attribute>, ...<attribute>)

VALUES(<value>, ....<value>)

- Update: Change values within a specified row

- Delete: delete rows

DELETE FROM <table>

WHERE<condition>

b) Basic Select Format: to query data in one or more tables

Feature: consists of many clause. return a table. Allow nested query (difficult to understand).

SELECT <columns>

FROM <Relations>

WHERE <condition>

Order By <columns> : to specify the sorting columns for result

Group By <columns> : to group together rows based on the column for statistical calculation

Having <condition> : limit the result of GROUP BY result

Note: If the limitation is based on individual attribute value, the condition is specified in WHERE clause. If the limitation is based on the group statistics calculated in Group By clause, condition is written in Having clause.

3. SQL

3.1. General information

a) SQL & relational databases

b) SQL3 & abstract data types

- Operators and functions

Arithmetic

Comparison

= (equal), < (less than), > (greater than), >= (greater than or equal), <= (greater than or equal), > (not equal),

Logical

AND

OR

Combination of AND and OR (AND is computed first then OR, or use parent hes)

BETWEEN ... AND... (specify the range of numeric values)

LIKE

IN (matches any values in the given list)

Set

ALL

Tests whether all rows in the result set of a subquery meet the specified condition.

ANY

Tests whether one or more rows in the result set of a subquery meet the specified condition.

Statistical

SUM, COUNT, MIN, MAX, AVE

Aggregate statistics

Count:

Count (*): count the number of the rows

Count (distinct <column>): evaluate how many different attributes there are.

Join (Equi-join, Multi-table join):

Order of join optimizes the query but not affect the result

Rule 1: Joined tables are listed in FROM clause.

Rule 2: Attributes used to join tables are declared and matched in WHERE clause.

4. Spatial queries

4.1. Strategies for storing spatial data in relational tables

Background: SQL language is limited to atomic data type

a) Fully normalized

Maintain the functionality and properties of traditional RDBM and fit the spatial data into relational schema. The problem is that data redundancy occurs.

b) Georelational model:

Separate spatial data and attribute data into different data model, such as Arc/Info

No ability to use RDBM functionality such as query, concurrency control etc.

c) BLOB:

Binary large object is a data type used in a RDBMS that consists of binary data type. Spatial data type could use this data type; however, spatial operation is difficult to perform since tedious decoding is necessary to convert back to spatial data, in another word, SQL cannot handle blob formed data.

d) Abstract data types:

This can be integrated into ORDBMS. SQL3 can handle this data type.

4.2. Extending SQL for spatial data

a) OGIS data model: industry-wide standard of geometry data types, spatial operation, currently supported by major DBMS vendors.

- Basic geometry

4 sub-classes: Point, curve, surface and geometryCollection

Conceptual versus substantiated entities

- Operators

Three categories

1. Basic operation that is applied to all geometry types

Envelope(),

2. Test regarding topological relationships, return true or false

3. Spatial data analysis (area, distance, buffer, convexHull, etc)

Spatial analysis queries

Area (geometry)

Distance (geometry1, geometry2)

Topological queries

Touch

Cross

Combined spatial analysis & topological operations

Buffer & overlap

Aggregate spatial queries

Nested spatial query

Exercise 1. Express the following query in RA

a) Find all countries whose GDP is greater than $500 but less than $1000.

π Name (σ GDP > 500 ∩ GDP < 1000 (Country))

b)List the life expectancy in countries that have rivers originating in them

C = Country

R = River

π Name, Life-Exp (σ Co. Name = R. Origin (C x R))

c)Find all cities that are either in South America or whose population is less than two million

Ci = π Name (σ Pop < 2 (City))

Co = π Name (σ Country. Cont = SAM (Country x City))

U = Ci U Co

d)List all cities which are not in South America

Co = π Name (σ Con ≠ SAM (Country))

Ci = π Name, Country (City)

U = π Ci. Name (σ Ci. Country = Co. Name (Co x Ci))

Exercise 2 Express in SQL

a) Find all countries whose GDP is greater than $500 but less than $1000.

SELECT Name

FROM Country

WHERE GEP Between 500 AND 1000

b) List the life expectancy in countries that have rivers originating in them

SELECT C. Life-Exp

FROM Country C, River R

WHERE C. Name = R. Origin

c) Find all cities that are either in South America or whose population is less than two million

SELECT Ci. Name

FROM Country Co, City Ci

WHERE Co. Name = Ci. Country AND (Co. Cont = SAM OR Ci. Pop < 2)

d) List all cities which are not in South America

SELECT Ci. Name

FROM Country Co, City Ci

WHERE Co. Name = Ci. Country AND Co. Cont > SAM

Exercise 3 Express in SQL

a) Count the number of countries whose population is less than 100 million

SELECT Count (*)

FROM Country

WHERE Pop < 100

b) Find the country in North America with the smallest GDP. Use nested query

SELECT Name

FROM Country

WHERE Count = NAM AND GDP =< ALL (SELECT GDP

FROM Country

WHERE Count = NAM)

c) List all countries that are in North America or whose capital cities have a population of less than 5 millions.

SELECT Name

FROM Country Co, City Ci

WHERE Co. Name = Ci. Country

AND (CO. Count = NAM

OR (CI. Capital = ‘Y’ AND Ci. Pop < 5)

d) Find the country with the second highest GDP.

SELECT Name

FROM Country

WHERE GDP > Max (GDP)

AND GDP >= ALL (SELECT GDP

FROM Country

WHERE GDP > Max (GDP))

Exercise 9. Express the following query in SQL, using OGIS extended data type and functions

a) List all cities in the City table which are within five thousand miles of Washington D.C

SELECT Ci. Name

FROM City Ci, City Ci1

WHERE Overlap (Ci. Point, Buffer (Ci1. Point, 5000)) = 1

AND Ci1. Name = 'Washington D.C.'

b) What is the length of Rio Paranas in Argentina and Brazil?

SELECT Co. Name, Length (intersection (R. Shape, Co. Shape)) As “Length”

FROM River R, Country Co

WHERE Cross (R. Shape, Co. Shape) = 1

AND (Co. Name = 'Argentina' OR Co. Name = 'Brazil')

AND R. Name = 'Rio Paranas'

c) Do Argentina and Brazil share a border?

SELECT Count (Co1. Name)

FROM Country Co1, Country Co2

WHERE = Touch (Co1. Shape, Co2. Shape) = 1

AND Co2. Name = 'Brazil' AND Co1. Name = ‘Argentina’

d) List the countries that lie completely south of the equator.

SELECT Co. Name

FROM Country Co

WHERE =

Exercise 10. Transform the following query into SQL, using OGIS specified data type and operations

Give the schema.

RIVER (NAME: char, FLOOD-PLAIN: polygon, GEOMETRY: line string)

ROAD (ID: char, NAME: char, TYPE: char, GEOMETRY: line string)

FOREST (NAME: char, GEOMETRY: polygon)

LAND-PARCELS (ID: integer, GEOMETRY: polygon, county: char)

a) Name all the rivers that cross Itasca State Forest

SELECT R. Name

FROM River R, Forest F

WHERE Cross (R. Geometry, F. Geometry) = 1

AND F. Name = Itasca State Forest'

b) Name all the tar roads that intersect Itasca State Forest

SELECT F. Name

FROM Forest F, Road R

WHERE Cross (R. Geometry, F. Geometry) = 1

AND F. Name = Itasca State Forest' AND R. Type = ‘tar’

c) All roads which stretches within the floodplain of the river Montana are susceptible to flooding. Identify all these roads

SELECT Ro. Name

FROM River Ri, Road Ro

WHERE Cross (Ro. Geometry, Ri. FLOOD-PLAIN) = 1

AND Ri. Name = ‘Montana'

d) No urban development is allowed within two miles of the Red River and five miles of the Big Tree Park. Identify the land parcels and the county they are in that cannot be developed.

SELECT LP. County, LP. Geometry

FROM Land-Parcels LP, River R

WHERE (Within (LP. Geometry, Buffer (R. Geometry, 2)) = 1

AND R. Name = ‘Red River’)

OR (Within (LP. Geometry, Buffer (R. Geometry, 5)) = 1

AND R. Name = ‘Big Tree Park’)

e) A river defines part of boundary of a county

CREATE VIEW County AS

SELECT *

FROM Land-parcels

GROUP BY County

SELECT R. Name

FROM River R, County C

WHERE touch (R. Geometry ,C. Geometry)= 1

Exercise 18. For state park example, outline SQL DDL statements to create relevant tables using OGIS spatial data type.

CREATE TABLE Forest(

Name Varchar (35)

Shape Polygon)

CREATE TABLE River(

Name Varchar (35)

Shape LineString)

CREATE TABLE Road(

Name Varchar (35)

Num_of_Lane Integer

Shape LineString)

CREATE TABLE Facility(

Name Varchar (35)

Shape Point)

CREATE TABLE Fire-Station(

Name Varchar (35)

Shape Point)

CREATE TABLE Forest-Stand(

Stand-Id Integer

Species Varchar (35)

Shape Polygon)

CREATE TABLE Manager(

Name Varchar (35)

Gender Varchar (5)

Age Integer)