Descriptive Statistics (Point Pattern Analysis)

Chris Schulze

Problem Set #8

1)Definitions:

Point Pattern Analysis – This method describes the pattern formed by the locations of point events and tests whether the points cluster significantly in a particular area.

Quadrant Count Method (also called quadrat count) - This method records and counts the number of events that occur in each quadrant of the study area.

Kernel Density (K means) - Kernel density estimation counts the incidents in an area centered at the location where the estimate is made. K-means, which this problem set groups with it, is a partitioning technique in which the points are partitioned into a chosen number of clusters.

Nearest Neighbor Distance - This method measures the distance from one point to the nearest neighbor point. There are three commonly used functions that use Nearest Neighbor Analyses:

G Function: The simplest measure, closely related to the mean nearest neighbor distance. Instead of summarizing the distances with a single mean, the G function examines the cumulative frequency distribution of the nearest neighbor distances. The shape of this function can tell us about the way the events are clustered in a point pattern.

F Function: This measure selects point locations anywhere in the study region at random and then determines the minimum distance from each of them to any event in the pattern.

K Function: Imagine placing a circle of a defined radius centered on each event. The number of other events inside each circle is counted, and the mean count over all of the events is calculated. The mean count is then divided by the overall event density of the study area. Typically, the K function provides more information about patterns and clusters than either G or F.
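The G and K definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: it applies no edge correction, the function names are made up for this example, and the four-point data set and its study area of 4 square units are hypothetical.

```python
import math

def nearest_neighbor_distances(points):
    """Distance from each event to its nearest other event."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def g_function(points, d):
    """Fraction of events whose nearest neighbor lies within distance d."""
    nn = nearest_neighbor_distances(points)
    return sum(1 for dist in nn if dist <= d) / len(nn)

def k_function(points, d, area):
    """Mean count of other events within d of each event, divided by
    the overall event density (no edge correction is applied)."""
    n = len(points)
    counts = [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= d)
        for i, p in enumerate(points)
    ]
    mean_count = sum(counts) / n
    density = n / area
    return mean_count / density

# Hypothetical example: four events on the corners of a unit square,
# inside a study area assumed to cover 4 square units.
corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(g_function(corners, 0.5))            # no nearest neighbor within 0.5
print(g_function(corners, 1.0))            # every nearest neighbor within 1.0
print(k_function(corners, 1.0, area=4.0))
```

Evaluating g_function over a range of distances traces out the cumulative curve whose shape the G function definition refers to.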

2)Name, define, and provide a graphic example of the three basic types of spatial point patterns.

1.Clustered – Point features are concentrated in one or a few relatively small areas and form groups.

2.Scattered or Uniform – Point features are regularly spaced.

3.Random – The arrangement is neither clustered nor uniform.

3)Why would we want to use Point Pattern Analysis?

1.To determine whether events exhibit a specific pattern over the study area or are randomly distributed.

2.To estimate how the density of the point pattern is distributed over the study area.

3.To determine if there is spatial dependence among events and create models to explain observed patterns.

4)Determine the dispersal classification for the following data set using the Quadrant Count Analysis method.

First divide the data set into quadrants (4 quadrants would be appropriate).

Next, calculate the Mean of the sample:

Mean = Number of points in the region / Number of quadrants = 20 / 4 = 5

Calculate the Variance, where xi is the count in quadrant i and n is the number of quadrants:

Variance = [∑xi² – (∑xi)²/n] / (n – 1)

= [(2² + 5² + 6² + 7²) – (20²/4)] / (4 – 1) = (114 – 100) / 3 ≈ 4.67

Calculate the Variance to Mean Ratio (VTMR):

VTMR = Variance / Mean = 4.67 / 5 ≈ 0.93

Using the Quadrant Count Analysis method, if VTMR < 1 the pattern is considered regularly dispersed (a VTMR near 1 indicates a random pattern, and a VTMR above 1 a clustered one). Since our VTMR is less than 1, the example is classified as regularly dispersed.
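The calculation can be checked with a short Python sketch. It assumes the quadrant counts 2, 5, 6 and 7 used in the variance step and uses the sample variance (divisor n – 1); the function name is made up for this example.

```python
from statistics import mean, variance

# Number of points counted in each of the four quadrants.
quadrant_counts = [2, 5, 6, 7]

def variance_to_mean_ratio(counts):
    """Return the mean, the sample variance (n - 1 divisor), and their ratio."""
    m = mean(counts)
    s2 = variance(counts)
    return m, s2, s2 / m

m, s2, vtmr = variance_to_mean_ratio(quadrant_counts)
print(f"mean = {m}, variance = {s2:.2f}, VTMR = {vtmr:.2f}")

# Classification rule from the text: VTMR < 1 is regularly dispersed.
print("regularly dispersed" if vtmr < 1 else "not regularly dispersed")
```

The ratio comes out below 1, which is the condition for a regularly dispersed classification.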

5)Given the following dataset, use the Kernel Density method to describe the patterns of the given point events.

Kernel Density (K-Means)

Sample Data set.

Point # / X-ray value / Yellow value
Point #1 / 1.1 / 60
Point #2 / 8.2 / 20
Point #3 / 4.2 / 35
Point #4 / 1.5 / 21
Point #5 / 7.6 / 15
Point #6 / 2 / 55
Point #7 / 3.9 / 39

A graph of the points provides a visualization of potential groups.

[Figure: scatter plot of the seven sample points, with X-ray Value (0 to 10) on the horizontal axis and Yellow Value (0 to 60) on the vertical axis.]

Next, determine the maximum distance between points to decide how many clusters will be used.

Based on the graph, it appears that 4 clusters would be appropriate.

Thus, the four initial cluster centers chosen are:

Cluster # / X-ray Value / Yellow Value
C1 / 1.1 / 60
C2 / 8.2 / 20
C3 / 4.2 / 35
C4 / 1.5 / 21

Point 1 is close to point 6. Both can be taken as one cluster (called C1|6).

X for C1|6 is (1.1 + 2.0)/2 = 1.55 and Y for C1|6 is (60 + 55)/2 = 57.50

Point 2 is close to point 5. Both can be taken as one cluster (called C2|5).

X for C2|5 is (8.2 + 7.6)/2 = 7.9 and Y for C2|5 is (20 + 15)/2 = 17.50

Point 3 is close to point 7. Both can be taken as one cluster (called C3|7).

X for C3|7 is (4.2 + 3.9)/2 = 4.05 and Y for C3|7 is (35 + 39)/2 = 37

Point 4 is not close to any point, and is called C4.

X for C4 is 1.5 and Y for C4 is 21

Four clusters have been obtained.

Cluster # / X-ray Value / Yellow Value
C1|6 / 1.55 / 57.5
C2|5 / 7.9 / 17.5
C3|7 / 4.05 / 37
C4 / 1.5 / 21
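The grouping above can be reproduced with one assignment-and-update step of a K-means-style procedure. The Python sketch below is only an illustration under stated assumptions: it treats points 1 to 4 as the starting cluster centers, measures closeness with plain Euclidean distance on the unscaled values (so the larger Yellow values dominate), and uses function names made up for this example.

```python
import math

# Sample data: point number -> (X-ray value, Yellow value).
points = {
    1: (1.1, 60), 2: (8.2, 20), 3: (4.2, 35), 4: (1.5, 21),
    5: (7.6, 15), 6: (2.0, 55), 7: (3.9, 39),
}

# Starting centers C1..C4 are taken to be points 1 to 4 (an assumption).
centers = {"C1": points[1], "C2": points[2], "C3": points[3], "C4": points[4]}

def assign_to_nearest(points, centers):
    """Assign each point to the center with the smallest Euclidean distance."""
    clusters = {name: [] for name in centers}
    for pid, p in points.items():
        nearest = min(centers, key=lambda name: math.dist(p, centers[name]))
        clusters[nearest].append(pid)
    return clusters

def centroids(points, clusters):
    """Mean X and mean Y of each cluster (every cluster must be non-empty)."""
    result = {}
    for name, members in clusters.items():
        xs = [points[pid][0] for pid in members]
        ys = [points[pid][1] for pid in members]
        result[name] = (sum(xs) / len(xs), sum(ys) / len(ys))
    return result

clusters = assign_to_nearest(points, centers)
new_centers = centroids(points, clusters)
for name in centers:
    x, y = new_centers[name]
    print(f"{name}: members {clusters[name]}, center ({x:.2f}, {y:.2f})")
```

Repeating the two steps with the new centers leaves the assignments unchanged for this data set, so the procedure stops after one pass.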

6)Describe the process for conducting a Nearest Neighbor Analysis on a data set.

The spatial distribution of many features can be seen as a pattern of points on a map. Such a distribution of points can be classified into one of three types of distribution: Random, Clustered, or Regular. Nearest neighbor analysis provides a way to identify spatial distributions by comparing the average observed distance between features and their nearest neighbor to an expected or theoretical average distance between nearest-neighbor points in a distribution generated by a random process.

Using maps or points, define the distribution of the features of interest as a pattern of points.

Select a study area within the data set.

Measure the distance to each feature’s nearest neighbor.

Calculate the mean of the nearest-neighbor distances (dm).

Calculate the density (p = N / A) of points within the study area – the number of points (N) divided by the area (A).

Compare the observed mean nearest-neighbor distance (dm) to the expected values for the various types of distributions. The expected values for the various types of distribution are dependent upon the density of points (p) within the study area.

Random - the expected mean nearest-neighbor distance is given by:

dr = 1 / (2 • p½)

Clustered - The expected mean nearest-neighbor distance will be 0.

Uniform - The expected mean nearest-neighbor distance is given by:

du = 1.0745 / p½

The nearest-neighbor index (R) is calculated by using:

R = dm / dr

If R ≈ 0, then distribution is clustered

If R ≈ 1, then distribution is random

If R ≈ 2.15, then distribution is uniform
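The steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: it ignores edge effects, the function name is made up, and the 5 by 5 grid of points and its study area of 25 square units are hypothetical.

```python
import math

def nearest_neighbor_index(points, area):
    """Return dm, dr, du and R = dm / dr for a set of (x, y) points."""
    n = len(points)
    # Distance from each feature to its nearest neighbor.
    nn = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    dm = sum(nn) / n                  # observed mean nearest-neighbor distance
    p = n / area                      # density of points in the study area
    dr = 1 / (2 * math.sqrt(p))       # expected mean distance, random pattern
    du = 1.0745 / math.sqrt(p)        # expected mean distance, uniform pattern
    return dm, dr, du, dm / dr

# Hypothetical example: 25 points on a square grid with unit spacing,
# in a study area assumed to be 25 square units (density p = 1).
grid = [(x, y) for x in range(5) for y in range(5)]
dm, dr, du, r = nearest_neighbor_index(grid, area=25.0)
print(f"dm = {dm:.3f}, dr = {dr:.3f}, du = {du:.3f}, R = {r:.3f}")
```

A square grid gives an R close to 2, slightly below the 2.15 reached by the most evenly spaced arrangement, so this example would be read as uniform.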
