[Slide – Title..]
Introduction to Geographic Information Systems – Distance Learning Version
Lecture 4 Script
Introduction: Hello and welcome to Lecture four of the distance learning version of “Introduction to Geographic Information Systems.” By this time, you should be so excited about coming to our regularly scheduled sessions that you can scarcely contain yourselves. I’m going to work hard to channel this exuberance and excitement. There are some advantages to living in an enchanted virtual world. In any event, this lecture is important because you will use some of the data analysis operations discussed here to complete your semester project. (Next Slide Please)
[Slide – Intro...]
This lecture will deal with turning data into information. Up to this point we’ve been obsessed with the data, both spatial and attribute. This is the first time we are going to talk about actually performing operations on the data to solve problems and answer “what if” questions. To do that, we are going to have to get into some detail and actually use some mathematical expressions.
The first item on the agenda is how to measure distances in the vector and raster worlds. (Next Slide Please)
Then, we’ll look at attribute queries, which we’ve seen before, but now we’ll see how to form multiple questions in one query. (Next Slide Please)
Next, we’ll see how to answer the question, “How close is it?” by performing a proximity analysis. (Next Slide Please)
If we actually overlay one layer or map on another mathematically, we can answer questions and produce new maps that combine the characteristics of the original maps. For instance, in our Ski Resort example, if we overlay the layer containing the meteorological stations on a land use map of forest and non-forest areas, we produce a map that tells us which meteorological stations are located in forest areas and which are in non-forest areas. (Next Slide Please)
And finally, we will discuss the analysis of surfaces, such as digital terrain models (DTMs), and the analysis of networks. As usual, we start out with definitions. (Next Slide Please)
[Slide – Database Terminology …]
Take a moment to look at the table. We’ve seen the top six before, but it doesn’t hurt to review them. (Pause) Let’s pay particular attention to the terms in the last two rows. A function or operation is a data analysis procedure performed by a GIS but, most importantly, formulated by YOU. Again, let me remind you that you will need to use data analysis techniques to complete your semester project.
And lastly, an algorithm is a series of steps to solve a problem. It’s a road map or flow chart that shows how the problem will be solved. Here again, this has important implications related to your project. Let’s look at several GIS operations, starting with the most fundamental one – measurements. (Next Slide Please)
[Slide – Measurements...]
We need to be able to measure lengths of lines, and perimeters and areas of regions. The methods we use will be dependent on whether the layer is a vector or raster layer. (Next Slide Please)
[Slide – Vector GIS Measurements...]
Here we see that in vector space, we use the Pythagorean Theorem to calculate the length of a line: the square root of the sum of the squared differences in x and y between its end points. Remember when a topology table was added to vector data? Because the topology table tells us which lines make up each polygon, we can calculate the perimeter of a polygon simply by adding up the lengths of those lines. To calculate areas, the areas of triangles are either added or subtracted to arrive at the area of the polygon. I’m sure you’ve seen this stuff before in math or physics. (Next Slide Please)
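Here is a minimal Python sketch of these vector measurements, assuming planar (projected) coordinates. The function names are made up for illustration and do not come from ArcView or any other GIS package. The area calculation uses the shoelace formula, which is one way of adding and subtracting triangles: each term is twice the signed area of the triangle formed by the origin and one edge of the polygon.

```python
import math


def segment_length(p, q):
    """Length of the straight line from p to q by the Pythagorean Theorem."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def polyline_length(points):
    """Total length of a line made up of several straight segments."""
    return sum(segment_length(points[i], points[i + 1]) for i in range(len(points) - 1))


def polygon_perimeter(ring):
    """Perimeter: add up the lengths of the lines around the polygon (ring closed automatically)."""
    return polyline_length(list(ring) + [ring[0]])


def polygon_area(ring):
    """Shoelace formula: signed triangle areas are added or subtracted,
    leaving only the area enclosed by the polygon."""
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1  # twice the signed area of triangle (origin, p_i, p_i+1)
    return abs(twice_area) / 2.0


square = [(0, 0), (3, 0), (3, 4), (0, 4)]  # hypothetical 3 x 4 parcel
print(segment_length((0, 0), (3, 4)))  # 5.0
print(polygon_perimeter(square))       # 14.0
print(polygon_area(square))            # 12.0
```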
[Slide – Raster GIS Measurements …]
In raster space, the unit of measure is the length of the side of a cell. In the case of Landsat satellite images, a cell is 30 m on a side. The length of a straight line from the corner of one cell to another is then given by the Pythagorean Theorem, with each distance calculated as the number of cells times the dimension of a cell. There is another way to define distance, called Manhattan distance, so named because of the way you have to get from point A to point B in a city: by walking along the block boundaries. There’s still another way to measure distance, called proximity distance. In this scheme, concentric circles of equal distance radiate from point A, in this case at intervals of one cell. The distance to any other point, such as point B, can then be interpolated from these contours of equal distance.
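To compare the straight-line and Manhattan definitions, here is a small Python sketch. It assumes cells are addressed by (row, column) and uses the 30 m Landsat cell size mentioned above; the function and constant names are hypothetical.

```python
import math

CELL_SIZE_M = 30.0  # assumed cell side length, as for a 30 m Landsat cell


def euclidean_distance(a, b, cell_size=CELL_SIZE_M):
    """Straight-line distance between cells a and b via the Pythagorean Theorem."""
    d_rows = abs(a[0] - b[0])
    d_cols = abs(a[1] - b[1])
    return math.hypot(d_rows, d_cols) * cell_size


def manhattan_distance(a, b, cell_size=CELL_SIZE_M):
    """City-block distance: move only along rows and columns, like walking city blocks."""
    return (abs(a[0] - b[0]) + abs(a[1] - b[1])) * cell_size


a, b = (0, 0), (3, 4)
print(euclidean_distance(a, b))  # 150.0
print(manhattan_distance(a, b))  # 210.0
```

For the same pair of cells, the Manhattan distance is never shorter than the straight-line distance; the two agree only when the cells share a row or a column.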
As for perimeter and area: for perimeter, we just count the number of cell sides along the boundary of the area and multiply by the length of a cell side. For area, we count the number of cells and multiply by the area of a cell, which is the length of the side squared. That’s about it for measurements, pretty straightforward. (Next Slide Please)
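These counting rules can be sketched in Python as follows. The sketch assumes the area of interest is coded 1 in a Boolean grid stored as nested lists, and the function name is hypothetical.

```python
def raster_area_and_perimeter(grid, cell_size):
    """Count the cells and the exposed cell sides of the region coded 1."""
    rows, cols = len(grid), len(grid[0])
    cell_count = 0
    edge_count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            cell_count += 1
            # A cell side is on the perimeter if the neighbour across it is outside the region.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] != 1:
                    edge_count += 1
    area = cell_count * cell_size ** 2   # number of cells times the area of one cell
    perimeter = edge_count * cell_size   # number of boundary sides times the side length
    return area, perimeter


region = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(raster_area_and_perimeter(region, 30.0))  # (3600.0, 240.0)
```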
[Slide – Queries....]
We’ve seen queries before. Remember SQL, the Structured Query Language? We saw that we could use SQL to search the database and retrieve data. Queries can also answer the questions “How many?” and “Where are they?” We’ll extend the power of SQL by combining several questions in one expression. To do that, we need to look at Boolean Operators. Boolean Operators are most applicable to vector models, because there the database stores many attributes in tables, which is not the case in a raster model. (Next Slide Please)
[Slide – Boolean Operators...]
To study Boolean Operators, let’s return to our Ski Resort example. Let’s let A be the set of Luxury Hotels and B be the set of all hotels with more than 20 rooms. Using Boolean Operators, four questions can be posed. (Next Slide Please)
[Slide – Boolean Operators Continued...]
Here are the four questions that can be asked and answered using Boolean Operators, which appear in red font. Let’s read these then diagram them to see them better.
- Which Hotels are Luxury and have more than 20 bedrooms?
- Which Hotels are Luxury or have more than 20 bedrooms?
- Which Hotels are Luxury but do not have more than 20 bedrooms?
- Which Hotels are either Luxury or have more than 20 bedrooms, but not both?
So we see that there are four Boolean Operators: AND, OR, NOT and XOR.
(Next Slide Please)
[Slide – Boolean Operators – Venn Diagrams ...]
These are the Venn diagrams from the four statements on the previous slide.
1. Which Hotels are Luxury and have more than 20 bedrooms?
Looking at the AND operator, the cross-hatched area represents the subset that satisfies both criteria. The expression written in SQL is "Hotel" = 'Luxury' AND "Bedrooms" > 20.
2. Which Hotels are Luxury or have more than 20 bedrooms?
If we look at the OR operator, the cross-hatched area covers every Hotel that meets either criterion, or both: it is Luxury, or it has more than 20 bedrooms.
3. Which Hotels are Luxury but do not have more than 20 bedrooms?
For the NOT operator, the cross-hatched area represents all Hotels that are Luxury but do not have more than 20 bedrooms.
4. Which Hotels are either Luxury or have more than 20 bedrooms, but not both?
The XOR operator, or exclusive OR, selects everything the OR operator selects except the overlap that the AND operator selects. The cross-hatched area represents Hotels that are either Luxury with 20 or fewer bedrooms, or non-Luxury Hotels with more than 20 bedrooms. At first glance this stuff looks complicated, but it’s really very logical and helpful in forming complicated queries. But we have to be careful using Boolean Operators, because we can get drastically different results if we use an AND operator instead of an OR. Queries in raster space can take on different forms. (Next Slide Please)
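To try the four compound queries yourself, here is a sketch using Python’s built-in sqlite3 module with a small, made-up hotel table; the hotel names and room counts are invented, and the field names Hotel and Bedrooms follow the SQL expression above. SQLite has no XOR keyword, so the exclusive OR is spelled out as (A OR B) AND NOT (A AND B).

```python
import sqlite3

# Hypothetical hotel table for the Ski Resort example; names and room counts are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotels (name TEXT, Hotel TEXT, Bedrooms INTEGER)")
conn.executemany(
    "INSERT INTO hotels VALUES (?, ?, ?)",
    [
        ("Alpine Lodge", "Luxury", 35),
        ("Snow Palace", "Luxury", 12),
        ("Ski Inn", "Budget", 40),
        ("Hut Hostel", "Budget", 8),
    ],
)

A = "Hotel = 'Luxury'"  # set A: Luxury Hotels
B = "Bedrooms > 20"     # set B: Hotels with more than 20 bedrooms

where_clauses = {
    "AND": f"{A} AND {B}",
    "OR": f"{A} OR {B}",
    "NOT": f"{A} AND NOT ({B})",
    # SQLite has no XOR keyword, so XOR is written as (A OR B) AND NOT (A AND B).
    "XOR": f"({A} OR {B}) AND NOT ({A} AND {B})",
}

results = {}
for operator, where in where_clauses.items():
    rows = conn.execute(f"SELECT name FROM hotels WHERE {where} ORDER BY name").fetchall()
    results[operator] = [name for (name,) in rows]
    print(operator, results[operator])
```

With this made-up table, AND returns only Alpine Lodge; OR returns Alpine Lodge, Ski Inn and Snow Palace; NOT returns Snow Palace; and XOR returns Ski Inn and Snow Palace. Hut Hostel appears in none of them, a reminder that OR does not simply select every hotel.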
[Slide – Queries...]
We can perform a query in a raster domain using a process called reclassification. Reclassification can produce a Boolean image (basically ones and zeros, or black and white). Let’s look at an example: in a land use image, one of the classes of land use is forest. If we wish to produce a layer showing forest areas only, we can reclassify the land use image. Keep in mind that we could also do this by writing a query. (Next Slide Please)
[Slide – Bloomfield Land Use..]
This is a raster land use map of Bloomfield, CT. Note the large patches of green, which denote the forests. We can reclassify all the cell values that represent forest to one, and all the non-forest cells to zero. When this is done, we obtain a Boolean map, that is, a map with only two values: ones and zeros. (Next Slide Please)
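Reclassification amounts to applying a lookup table to every cell. Here is a Python sketch that assumes hypothetical land use codes in which 4 stands for forest; a real classification scheme, such as the one behind the Bloomfield map, will use its own codes.

```python
FOREST = 4  # hypothetical land use code for forest

land_use = [  # hypothetical land use raster
    [1, 4, 4, 2],
    [4, 4, 3, 2],
    [1, 1, 4, 3],
]


def reclassify(grid, mapping, default=0):
    """Replace every cell value via a lookup table; values not listed get `default`."""
    return [[mapping.get(value, default) for value in row] for row in grid]


forest_only = reclassify(land_use, {FOREST: 1})  # forest -> 1, everything else -> 0
for row in forest_only:
    print(row)
```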
[Slide – Bloomfield Land Use Only Forest...]
This is the Boolean image formed by the reclassification. We can now easily see where the forested areas are. Reclassification produces the same final product as writing a query, except that a query may take several steps to produce this Boolean image, whereas reclassification accomplishes it in one step. It is usual practice to show a Boolean image in black and white. (Next Slide Please)
[Slide – Proximity Analysis ...]
Now we get to ask the question “How close are things?” The most common procedure used to answer this question is Buffering. Buffering is a GIS operation that creates a zone of interest around an entity or set of entities.
(Next Slide Please)
[Slide – Buffer Zones …]
Buffering is an easy process to visualize, but a lot more difficult for the software to execute. We can see possible buffer zones around points, lines and areas. Buffers range from simple zones around a point or line, or a common border around an area, as seen on the left, to more complicated buffer zones, as seen in the figures on the right. Buffers by themselves can be interesting, but their real value comes from combining them with other layers in a process called overlaying, which we’ll talk about in a few minutes. As an example of the use of buffering, suppose that we are studying the location of radioactive waste disposal sites. Two of the criteria are that the sites must be located more than 3 km from a railway, and be situated on clay soil, because clay resists liquid penetration and can act as a natural barrier in case of a leak. The first step is to create a buffer zone around the railways, which is easily done in a GIS. (Next Slide Please)
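A full buffer operation builds new polygons and merges their overlaps, which is the hard part for the software. The question underneath it is simpler: is a given location within a certain distance of the railway? Here is a Python sketch of that test, assuming planar coordinates in km and a railway stored as a list of vertices; the function names and coordinates are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping the projection to the segment's end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_line_buffer(p, line, buffer_distance):
    """True if p lies within buffer_distance of any segment of the line."""
    return any(
        point_segment_distance(p, line[i], line[i + 1]) <= buffer_distance
        for i in range(len(line) - 1)
    )


railway = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]  # hypothetical rail line, km
print(in_line_buffer((5.0, 2.0), railway, 3.0))  # True: 2 km from the line
print(in_line_buffer((5.0, 5.0), railway, 3.0))  # False: 5 km from both segments
```

For the waste disposal criterion, a candidate site would be rejected whenever this test returns True for the 3 km distance.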
[Slide – Buffer Zones – 3 km...]
Here the buffer zones are drawn in blue around the rail lines in red. Note how the buffers blend together where they overlap; the GIS software takes care of this blending as well. We’ll see these buffer zones used later in the waste disposal example. In raster images, buffers are created by the proximity process presented in the measurement section. Recall that proximity distances radiate from a point and produce contours of constant distance from the point in question. If there are several points to buffer, the software again blends the proximity zones where they overlap. (Next Slide Please)
[Slide – Proximity Map for Hotels in Ski Resort. …]
If we want to see how close each location in the Ski Resort is to the Hotels, we can perform a proximity analysis and create a surface called a distance surface. In this image, the darker the color, the closer the cell is to a Hotel, which is represented by a white cell. Note how the software blended the image where the distance surfaces of the individual Hotels overlap. If we want to see a particular buffer zone of, say, 125 km, we can reclassify the distance surface image, giving all cells less than 125 km from a Hotel the value 1 (shown in white) and all other cells the value 0 (shown in black), as in a conventional Boolean image. (Next Slide Please)
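A distance surface followed by reclassification can be sketched in Python as below. This is a brute-force version that measures the straight-line distance from every cell to every Hotel cell and keeps the smallest; GIS software typically uses faster algorithms. The Hotel cell positions, grid size and threshold are hypothetical.

```python
import math


def distance_surface(rows, cols, sources, cell_size):
    """Distance from every cell to the nearest source cell (brute force)."""
    return [
        [min(math.hypot(r - sr, c - sc) for sr, sc in sources) * cell_size
         for c in range(cols)]
        for r in range(rows)
    ]


def within(surface, threshold):
    """Reclassify a distance surface into a Boolean image: 1 if closer than threshold, else 0."""
    return [[1 if d < threshold else 0 for d in row] for row in surface]


hotels = [(0, 0), (4, 5)]  # hypothetical Hotel cells as (row, column)
surface = distance_surface(5, 6, hotels, cell_size=1.0)
zone = within(surface, 1.5)
for row in zone:
    print(row)
```

Taking the minimum over all Hotels is what produces the blending where the distance surfaces overlap: each cell simply takes its distance to the nearest Hotel.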
[Slide – Overlay Operations …]
Perhaps one of the most important data analysis techniques is overlaying. In GIS, overlaying has two meanings. The first you’ve already seen in ArcView, when you drew one layer on top of another. In this lecture, we will use the second definition, which is related to data analysis: a GIS operation that combines information from two layers into a new layer. Again, we’ll look at vector and raster overlay operations separately.
(Next Slide Please)
[Slide – Vector Overlay Operations..]
An important condition for the overlay process to work is that all layers must be topologically correct. In other words, all polygons must close, all lines must meet at points, and so forth.
When the lines and polygons from the original layers are overlaid, new lines and polygons are formed. The software must first form these new entities and then assign attributes to them. This process is governed by the rules of geometry, such as finding where lines intersect, and takes considerable computational power. There are three types of overlay operations possible with vector layers. (Next Slide Please)
[Slide – Vector Overlay Types…]
The three types are point in polygon, line in polygon and polygon in polygon. We’ll look at each one in detail, again using our Ski Resort example. (Next Slide Please)
[Slide – Point in Polygon ..]
If we want to find out which meteorological stations are located in the forest and which are not, we can overlay the point layer of meteorological stations on the forest polygon layer. This produces a new point layer in which the points have new attributes that describe whether or not they are in the forest. The line in polygon case is a little more complicated. (Next Slide Please)
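One common way to decide whether a point lies inside a polygon is the ray casting (even-odd) test, sketched here in Python. The forest polygon and station coordinates are hypothetical, and points lying exactly on the polygon boundary are not treated specially in this sketch.

```python
def point_in_polygon(point, ring):
    """Ray casting: count how many polygon edges a ray running from the point
    towards +x crosses. An odd count means the point is inside."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


forest = [(0, 0), (6, 0), (6, 4), (3, 6), (0, 4)]  # hypothetical forest polygon
stations = {"MS1": (2, 2), "MS2": (7, 1), "MS3": (3, 5)}  # hypothetical stations
labels = {name: ("forest" if point_in_polygon(p, forest) else "non-forest")
          for name, p in stations.items()}
print(labels)
```

The labels dictionary plays the role of the new attribute attached to each point in the output layer.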
[Slide – Line in Polygon..]
Now, if we want to know which roads, or parts of roads, are in the forest, we can overlay a line layer on the forest polygon layer. The result is a new line layer whose line segments are labeled with whether or not they are in the forest. Note the complication here: the overlay process creates new line segments. The original line 1 is now split into two segments, numbered 1 and 2, with segment 1 in the forest and segment 2 in the non-forest area. When we get to polygon in polygon, things get interesting. (Next Slide Please)
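The splitting of a line at a polygon boundary can be illustrated with a deliberately simplified case: a rectangular forest, clipped with the Liang-Barsky line clipping algorithm. Real forest polygons are irregular and need a more general algorithm, so treat this Python sketch as an illustration only; the road and forest coordinates are hypothetical.

```python
def clip_parameters(p, q, xmin, ymin, xmax, ymax):
    """Liang-Barsky: return (t0, t1), the fractions of segment p->q where it
    enters and leaves the rectangle, or None if it misses the rectangle."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    t0, t1 = 0.0, 1.0
    # One (denominator, numerator) pair per rectangle side: left, right, bottom, top.
    for denom, num in ((-dx, p[0] - xmin), (dx, xmax - p[0]),
                       (-dy, p[1] - ymin), (dy, ymax - p[1])):
        if denom == 0:
            if num < 0:  # parallel to this side and outside it
                return None
            continue
        t = num / denom
        if denom < 0:
            t0 = max(t0, t)  # segment is entering the rectangle
        else:
            t1 = min(t1, t)  # segment is leaving the rectangle
        if t0 > t1:
            return None
    return t0, t1


def split_segment(p, q, rect):
    """Cut segment p->q where it crosses the rectangle's boundary and label each piece."""
    def at(t):
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    clip = clip_parameters(p, q, *rect)
    if clip is None or clip[0] >= clip[1]:  # misses, or only touches a corner
        return [((p, q), "non-forest")]
    t0, t1 = clip
    pieces = []
    if t0 > 0:
        pieces.append(((p, at(t0)), "non-forest"))
    pieces.append(((at(t0), at(t1)), "forest"))
    if t1 < 1:
        pieces.append(((at(t1), q), "non-forest"))
    return pieces


forest_rect = (4, 0, 8, 5)  # hypothetical forest: xmin, ymin, xmax, ymax
road = ((5, 2), (10, 2))    # hypothetical line 1, starting inside the forest
pieces = split_segment(*road, forest_rect)
for (start, end), label in pieces:
    print(start, end, label)
```

As in the slide, the single input line comes back as two segments, the first in the forest and the second outside it.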
[Slide – Polygon in Polygon..]
The polygon in polygon process resembles the Boolean operator method used in the query procedure.
Staying with our resort example, if we want to know which new areas are within the forest/non-forest map or within the resort map, we use the OR Boolean operation. In this case, all of the new polygons fall into these categories. In GIS operations, this is called a UNION.
Next we look at the question “Where in the resort are the forest and non-forest areas?” In Boolean terms, this would be resort NOT forest/non-forest. In GIS talk, this is an IDENTITY process.
And finally, we would use the AND Boolean operator to answer the question “Within the resort boundary, where is the forested area?” In GIS language, this is called an INTERSECT process. (Next Slide Please)
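The INTERSECT (Boolean AND) case can be sketched with the Sutherland-Hodgman polygon clipping algorithm, which works when the clipping polygon is convex and its vertices are listed counter-clockwise. GIS overlay also handles concave polygons, holes and attribute transfer, and UNION and IDENTITY need more general algorithms, so this Python sketch is only an illustration; the resort and forest coordinates are hypothetical.

```python
def clip_polygon(subject, clip):
    """Sutherland-Hodgman: the part of polygon `subject` that lies inside the
    convex, counter-clockwise polygon `clip`."""
    def inside(p, a, b):
        # True if p is on the left of (or on) the directed edge a->b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def crossing(p, q, a, b):
        # Point where segment p-q meets the infinite line through a and b.
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
        denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        ring, output = output, []
        for j in range(len(ring)):
            current, previous = ring[j], ring[j - 1]  # ring[-1] closes the ring
            if inside(current, a, b):
                if not inside(previous, a, b):
                    output.append(crossing(previous, current, a, b))
                output.append(current)
            elif inside(previous, a, b):
                output.append(crossing(previous, current, a, b))
    return output


def polygon_area(ring):
    """Shoelace formula; returns 0.0 for an empty ring."""
    n = len(ring)
    return abs(sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
                   for i in range(n))) / 2.0


resort = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical resort boundary
forest = [(5, 5), (15, 5), (15, 15), (5, 15)]  # hypothetical convex forest, counter-clockwise
forest_in_resort = clip_polygon(resort, forest)  # INTERSECT
print(forest_in_resort)
print(polygon_area(forest_in_resort))
```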
[Slide – Vector Overlay Rail Buffer Zone and Clay Geology..]
Remember the buffer zone example for the radioactive waste problem? We saw the buffer zones drawn around the rail lines. Now if we overlay the rail buffer zone with the clay geology layer using an INTERSECT process, we get, in yellow, the clay areas that lie within the rail line buffers. Any site within the yellow areas is not acceptable, because it is less than 3 km from a railway. Conversely, clay areas outside the yellow areas are acceptable. (Next Slide Please)
[Slide – Little Grey Cells Quiz]
OK, here’s our “Little Grey Cells” Quiz. I may ask some of you for your answers in the chat session, so you might want to jot down some notes. I’ll give you a few moments now to come up with some answers. (Pause here)
Do Not Read
A raster image is made up of cells. T or F
Which Boolean operator will allow both input layers to exist simultaneously?
The creation of a zone of interest around an entity, or set of entities, is called an overlay. T or F
(Next Slide Please)
[Slide – Break]
OK - Let’s take a short break to stretch, wake up, or whatever.
(30 to 60 second break) (Next Slide Please)
[Slide – Raster Overlay Operations…]
In many ways, overlay operations in raster space are much easier to understand than their vector counterparts. We have to keep a couple of things in mind: in raster images, points, lines and areas are all represented by groups of cells; and it is also important to understand the kind of data in the cells, for example, whether the data are ratio, interval, and so forth.
To overlay raster layers, we can use the math operators plus, minus, multiply and divide. This process has a name: map algebra. To study go/no-go problems, Boolean images are sometimes used. Let’s go through the raster equivalents of the vector overlay operations, such as point in polygon. We’ll see that in rasters it doesn’t make much difference whether we overlay points, lines or polygons on polygons, because the points, lines and polygons are all just groups of cells. (Next Slide Please)
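Here is a minimal map algebra sketch in Python, assuming two hypothetical Boolean layers of the same size stored as nested lists. Multiplying them cell by cell acts as a raster AND (here, stations that fall in forest cells, the raster version of point in polygon), while adding them counts how many of the conditions each cell meets. The function name is hypothetical.

```python
def map_algebra(op, *layers):
    """Apply `op` cell by cell to raster layers that share the same rows and columns."""
    rows, cols = len(layers[0]), len(layers[0][0])
    if any(len(layer) != rows or len(layer[0]) != cols for layer in layers):
        raise ValueError("layers must have identical dimensions")
    return [[op(*(layer[r][c] for layer in layers)) for c in range(cols)]
            for r in range(rows)]


# Hypothetical Boolean layers: 1 = condition met, 0 = not met.
forest = [[1, 1, 0],
          [0, 1, 0]]
stations = [[0, 1, 0],
            [1, 0, 0]]

# Multiplying two Boolean layers is a raster AND: 1 only where both layers are 1.
stations_in_forest = map_algebra(lambda a, b: a * b, forest, stations)
# Adding them gives 0, 1 or 2: how many of the two conditions each cell meets.
overlap_count = map_algebra(lambda a, b: a + b, forest, stations)
print(stations_in_forest)  # [[0, 1, 0], [0, 0, 0]]
print(overlap_count)       # [[1, 2, 0], [1, 1, 0]]
```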