1. Query language overview
1.1 What is a query language?
A query is an interesting question about data. A query language is a language that specifies a query to a database in declarative manner.
1.2 Natural language and formal language
Natural language: human language like English, German, or Japanese. Its usage may change (disappear or appear, or modified) with time. Computer cannot interpret it directly
Formal language: language that computer can interpret. There is no ambiguity in meaning (semantics). The grammar (syntax) does not change. programming and query languages are formal language.
1.2 Structured Query Language (SQL)
SQL is a standard query language that supports the relational logical model such as primary, foreign key.
Majority of DBMS support SQL (MS Access, Oracle, MySQL)
GIS software uses SQL to interact with DBMS behind the screen when doing a query.
2. Relational algebra
2.1 RA properties
Algebra: mathematical expressions consisting of two distinct sets - operands and operations.
Relational algebra: a formal query language associated with relational data model. Relations are placed in operands instead of numbers, or symbols used in normal algebra, and the results are also relations (closure). Closure: the result of operations is the same type of objects. This allows to create nested operations that consist of a small and simple sets of operations.
2.2 RA operations
a) Select σ: Retrieve a subset of tuples (rows) of a relation. A single relation is used.
b) Project π: Retrieve a subset of field (columns) for all rows in a relation. A single relation is involved.
Set operation
c) Union: Requires two relations that are union compatible. A U B returns all tuples that are in either of relations
d) Difference: Requires two relations that are union compatible. A – B returns all tuples in relation A that are not in the relation B
e) Intersection: Requires two relations that are union compatible. A ∩ B returns all tuples that are in both relations.
f) Cross-product: does not require union compatible and many relations can be involved. If you have n rows in relation A, and m rows in relation B, you will get n x m rows in resultant relation
Union compatible: two relations are union compatible if they have the same number of columns, and the same domain and the columns are in the same order from left to right.
g) Join
Logical linking tuples across multiple relations. This is not primitive RA operation. Combine selection, projection, and cross product operation.
Conditional joins: Conduct cross product to form all the possible combinations of tuples from tables and then selection of tuples that meets conditions.
Natural joins (equi- joins): use only condition of equality in selection.
3. SQL
Declarative query language, meaning that user tells the DBMS what you want to know, rather than how it is calculated. The algorithms to answer the question are not required.
c) SQL components
Data define language (DDL)
To create, update (Alter) and delete (Drop) the relation or other elements of database such as views.
Data manipulation Language (DML): To populate and edit table content, and to query the content of a database
Data control language (DCL): to control the access control, permission
3.2. SQL DDL
a) Create Table
CREATE TABLE <table name>
("column 1" "data_type_for_column_1",
"column 2" "data_type_for_column_2",
...
Primary Key <attribute>, ... <attribute>
Foreign Key <attribute>)
So, if we are to create the customer table specified as above, we may type in
CREATE TABLE customer
(First_Name char(50),
Last_Name char(50),
Address char(50),
City char(50),
Country char(25),
Birth_Date date
Primary Key (First_Name, Last_Name))
Create view (View simplify complicated nested queries)
CREATE VIEW <Name> AS
SELECT
FROM
WHERE
b) Alter Table: Change relation schema
c) Drop Table: Delete an empty table
DROP TABLE <table name>
So, if we wanted to drop the table called customer that we created
DROP TABLE customer
DML:
a) DML commands for populating and editing
- Insert: Populate the tables
INSERT INTO <table>(<attribute>, ...<attribute>)
VALUES(<value>, ....<value>)
- Update: Change values within a specified row
- Delete: delete rows
DELETE FROM <table>
WHERE<condition>
b) Basic Select Format: to query data in one or more tables
Feature: consists of many clause. return a table. Allow nested query (difficult to understand).
SELECT <columns>
FROM <Relations>
WHERE <condition>
Order By <columns> : to specify the sorting columns for result
Group By <columns> : to group together rows based on the column for statistical calculation
Having <condition> : limit the result of GROUP BY result
Note: If the limitation is based on individual attribute value, the condition is specified in WHERE clause. If the limitation is based on the group statistics calculated in Group By clause, condition is written in Having clause.
3. SQL
3.1. General information
a) SQL & relational databases
b) SQL3 & abstract data types
- Operators and functions
Arithmetic
Comparison
= (equal), < (less than), > (greater than), >= (greater than or equal), <= (greater than or equal), > (not equal),
Logical
AND
OR
Combination of AND and OR (AND is computed first then OR, or use parent hes)
BETWEEN ... AND... (specify the range of numeric values)
LIKE
IN (matches any values in the given list)
Set
ALL
Tests whether all rows in the result set of a subquery meet the specified condition.
ANY
Tests whether one or more rows in the result set of a subquery meet the specified condition.
Statistical
SUM, COUNT, MIN, MAX, AVE
Aggregate statistics
Count:
Count (*): count the number of the rows
Count (distinct <column>): evaluate how many different attributes there are.
Join (Equi-join, Multi-table join):
Order of join optimizes the query but not affect the result
Rule 1: Joined tables are listed in FROM clause.
Rule 2: Attributes used to join tables are declared and matched in WHERE clause.
4. Spatial queries
4.1. Strategies for storing spatial data in relational tables
Background: SQL language is limited to atomic data type
a) Fully normalized
Maintain the functionality and properties of traditional RDBM and fit the spatial data into relational schema. The problem is that data redundancy occurs.
b) Georelational model:
Separate spatial data and attribute data into different data model, such as Arc/Info
No ability to use RDBM functionality such as query, concurrency control etc.
c) BLOB:
Binary large object is a data type used in a RDBMS that consists of binary data type. Spatial data type could use this data type; however, spatial operation is difficult to perform since tedious decoding is necessary to convert back to spatial data, in another word, SQL cannot handle blob formed data.
d) Abstract data types:
This can be integrated into ORDBMS. SQL3 can handle this data type.
4.2. Extending SQL for spatial data
a) OGIS data model: industry-wide standard of geometry data types, spatial operation, currently supported by major DBMS vendors.
- Basic geometry
4 sub-classes: Point, curve, surface and geometryCollection
Conceptual versus substantiated entities
- Operators
Three categories
1. Basic operation that is applied to all geometry types
Envelope(),
2. Test regarding topological relationships, return true or false
3. Spatial data analysis (area, distance, buffer, convexHull, etc)
Spatial analysis queries
Area (geometry)
Distance (geometry1, geometry2)
Topological queries
Touch
Cross
Combined spatial analysis & topological operations
Buffer & overlap
Aggregate spatial queries
Nested spatial query
Exercise 1. Express the following query in RA
a) Find all countries whose GDP is greater than $500 but less than $1000.
π Name (σ GDP > 500 ∩ GDP < 1000 (Country))
b)List the life expectancy in countries that have rivers originating in them
C = Country
R = River
π Name, Life-Exp (σ Co. Name = R. Origin (C x R))
c)Find all cities that are either in South America or whose population is less than two million
Ci = π Name (σ Pop < 2 (City))
Co = π Name (σ Country. Cont = SAM (Country x City))
U = Ci U Co
d)List all cities which are not in South America
Co = π Name (σ Con ≠ SAM (Country))
Ci = π Name, Country (City)
U = π Ci. Name (σ Ci. Country = Co. Name (Co x Ci))
Exercise 2 Express in SQL
a) Find all countries whose GDP is greater than $500 but less than $1000.
SELECT Name
FROM Country
WHERE GEP Between 500 AND 1000
b) List the life expectancy in countries that have rivers originating in them
SELECT C. Life-Exp
FROM Country C, River R
WHERE C. Name = R. Origin
c) Find all cities that are either in South America or whose population is less than two million
SELECT Ci. Name
FROM Country Co, City Ci
WHERE Co. Name = Ci. Country AND (Co. Cont = SAM OR Ci. Pop < 2)
d) List all cities which are not in South America
SELECT Ci. Name
FROM Country Co, City Ci
WHERE Co. Name = Ci. Country AND Co. Cont > SAM
Exercise 3 Express in SQL
a) Count the number of countries whose population is less than 100 million
SELECT Count (*)
FROM Country
WHERE Pop < 100
b) Find the country in North America with the smallest GDP. Use nested query
SELECT Name
FROM Country
WHERE Count = NAM AND GDP =< ALL (SELECT GDP
FROM Country
WHERE Count = NAM)
c) List all countries that are in North America or whose capital cities have a population of less than 5 millions.
SELECT Name
FROM Country Co, City Ci
WHERE Co. Name = Ci. Country
AND (CO. Count = NAM
OR (CI. Capital = ‘Y’ AND Ci. Pop < 5)
d) Find the country with the second highest GDP.
SELECT Name
FROM Country
WHERE GDP > Max (GDP)
AND GDP >= ALL (SELECT GDP
FROM Country
WHERE GDP > Max (GDP))
Exercise 9. Express the following query in SQL, using OGIS extended data type and functions
a) List all cities in the City table which are within five thousand miles of Washington D.C
SELECT Ci. Name
FROM City Ci, City Ci1
WHERE Overlap (Ci. Point, Buffer (Ci1. Point, 5000)) = 1
AND Ci1. Name = 'Washington D.C.'
b) What is the length of Rio Paranas in Argentina and Brazil?
SELECT Co. Name, Length (intersection (R. Shape, Co. Shape)) As “Length”
FROM River R, Country Co
WHERE Cross (R. Shape, Co. Shape) = 1
AND (Co. Name = 'Argentina' OR Co. Name = 'Brazil')
AND R. Name = 'Rio Paranas'
c) Do Argentina and Brazil share a border?
SELECT Count (Co1. Name)
FROM Country Co1, Country Co2
WHERE = Touch (Co1. Shape, Co2. Shape) = 1
AND Co2. Name = 'Brazil' AND Co1. Name = ‘Argentina’
d) List the countries that lie completely south of the equator.
SELECT Co. Name
FROM Country Co
WHERE =
Exercise 10. Transform the following query into SQL, using OGIS specified data type and operations
Give the schema.
RIVER (NAME: char, FLOOD-PLAIN: polygon, GEOMETRY: line string)
ROAD (ID: char, NAME: char, TYPE: char, GEOMETRY: line string)
FOREST (NAME: char, GEOMETRY: polygon)
LAND-PARCELS (ID: integer, GEOMETRY: polygon, county: char)
a) Name all the rivers that cross Itasca State Forest
SELECT R. Name
FROM River R, Forest F
WHERE Cross (R. Geometry, F. Geometry) = 1
AND F. Name = Itasca State Forest'
b) Name all the tar roads that intersect Itasca State Forest
SELECT F. Name
FROM Forest F, Road R
WHERE Cross (R. Geometry, F. Geometry) = 1
AND F. Name = Itasca State Forest' AND R. Type = ‘tar’
c) All roads which stretches within the floodplain of the river Montana are susceptible to flooding. Identify all these roads
SELECT Ro. Name
FROM River Ri, Road Ro
WHERE Cross (Ro. Geometry, Ri. FLOOD-PLAIN) = 1
AND Ri. Name = ‘Montana'
d) No urban development is allowed within two miles of the Red River and five miles of the Big Tree Park. Identify the land parcels and the county they are in that cannot be developed.
SELECT LP. County, LP. Geometry
FROM Land-Parcels LP, River R
WHERE (Within (LP. Geometry, Buffer (R. Geometry, 2)) = 1
AND R. Name = ‘Red River’)
OR (Within (LP. Geometry, Buffer (R. Geometry, 5)) = 1
AND R. Name = ‘Big Tree Park’)
e) A river defines part of boundary of a county
CREATE VIEW County AS
SELECT *
FROM Land-parcels
GROUP BY County
SELECT R. Name
FROM River R, County C
WHERE touch (R. Geometry ,C. Geometry)= 1
Exercise 18. For state park example, outline SQL DDL statements to create relevant tables using OGIS spatial data type.
CREATE TABLE Forest(
Name Varchar (35)
Shape Polygon)
CREATE TABLE River(
Name Varchar (35)
Shape LineString)
CREATE TABLE Road(
Name Varchar (35)
Num_of_Lane Integer
Shape LineString)
CREATE TABLE Facility(
Name Varchar (35)
Shape Point)
CREATE TABLE Fire-Station(
Name Varchar (35)
Shape Point)
CREATE TABLE Forest-Stand(
Stand-Id Integer
Species Varchar (35)
Shape Polygon)
CREATE TABLE Manager(
Name Varchar (35)
Gender Varchar (5)
Age Integer)