A Stratified Traffic Accident Analysis

Case Study: City of Richardson

Tope Bello

December 2005

Abstract

This research performs a stratified analysis to test whether school age kids are involved in traffic accidents more often around schools than elsewhere. Data on accident locations, school locations, accident victims, and roads are needed to either support or reject this statement. Geographic Information Science and spatial statistics techniques were used to analyze the data, and conclusions were drawn using visualization and exploratory statistical techniques. The following techniques were used: geocoding, to transform existing tabular data into spatial data; the bivariate K-function, to identify clusters of traffic accidents around schools; kernel densities, to check the density distribution of traffic accidents; and spatial queries, to select events based on specific criteria.

Keywords: Bivariate K-Function, kernel densities, spatial queries, geocoding

Introduction

It is easy to imagine a society without crime, disasters, accidents, and violence, in which there would be no need to worry about our safety. In reality, however, safety is a critical part of our day-to-day lives. Most people are conscious of their environment and can prepare for and respond quickly to hazardous events. The ability of kids to respond to and evade these hazards is limited because of their limited mental and physical abilities (Markus M, Schopf J 2001).

Traffic accidents are rare events that appear random in space and time. Geographers perform research to identify patterns in such events based on a specific hypothesis; this work can also reveal previously unsuspected patterns that lead to the formulation of other theories. The bivariate K-function, a method used in examining spatial patterns of disease, is used for this research.

This research paper focuses on identifying the spatial patterns of traffic accidents to school age kids compared with other traffic accidents in the city, using kernel densities. Spatial queries were used to test whether slower speeds prevented accidents. Finally, the bivariate K-function was used to test for clustering of traffic accidents around schools. It is expected that traffic accidents to school age kids will occur more often around schools than anywhere else in the city.

Problem Statement

Currently, traffic accident data for the City of Richardson is managed by two departments: the police department and the traffic and transportation department. The two departments have different policies and different goals for the accident data, which contributes to the variation in the data available for this research. The data from the police department has been geocoded and is ready for analysis, but some information that is very useful to this research, such as the age of victims, cannot be released by the police department. The data from the traffic and transportation department is in Microsoft Access format. It is not spatial data, but it has variables that were concatenated to produce a spatial dataset. It also contains the variable most important for this research, the age of the victims. This research paper answers the following questions:

Do traffic accidents happen to school age kids around schools more than elsewhere?

Do the slower speeds in school speed zones prevent accidents to school age children?

Does the temporal and spatial pattern of accidents to school age children differ from patterns for all accidents?

An assumption can be made that accidents to kids happen more around schools than at other locations in a city. Conclusions cannot be drawn until the available data is fully explored.

Research Methodology

The processes were divided into different steps and are structured in the flow chart below. There were three main categories: data preparation, analysis controls, and data analysis.

Data preparation involved geocoding, SQL queries, and point editing. The analysis controls were used for visualizing and exploring the events. The data was split into two categories of victims, accidents to school age kids and other accidents, and these were used to check for variations in the occurrence of accidents within the city.

The ability of ArcGIS to mine and manipulate data was valuable in exploring the available data for answers to the research questions. Spatial queries were used to check whether slower speeds in school speed zones reduce accidents to school age children, and to check the occurrence of traffic accidents to children at varying distances from schools.

The ArcGIS Spatial Analyst extension was used to generate kernel densities. The kernels were used to explore whether the temporal and spatial patterns of accidents to school age children differ from the patterns for all accidents.

Kernel density can be defined as an estimator of the density of events across a study area based on a point pattern (White et al. 2000).

Bivariate K-function analysis tests whether traffic accidents to school age kids occur more around schools than elsewhere.

The bivariate K-function analysis was done using R, a free statistical software environment that can perform spatial statistics. R requires specific packages to be downloaded for specific tasks; for spatial statistics, the SPLANCS (Spatial Point Pattern Analysis Code) package was used. The bivariate K-function is defined as the expected number of points of pattern 1 within a distance D of an arbitrary point of pattern 2, divided by the overall density of the points in pattern 1.
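The definition above can be illustrated with a minimal Python sketch. This is not the SPLANCS implementation used in this research: it omits edge correction, and the function name, coordinates, and study area below are hypothetical values chosen only for illustration.

```python
import math

def bivariate_k(pattern1, pattern2, distances, area):
    """Naive bivariate K estimate (no edge correction).

    For each distance d: the mean number of pattern-1 points within d of a
    pattern-2 point, divided by the overall density of pattern 1.
    """
    n1, n2 = len(pattern1), len(pattern2)
    intensity1 = n1 / area  # overall density of pattern 1
    estimates = []
    for d in distances:
        pairs = sum(
            1
            for (x2, y2) in pattern2
            for (x1, y1) in pattern1
            if math.hypot(x1 - x2, y1 - y2) <= d
        )
        mean_count = pairs / n2  # mean count around an arbitrary pattern-2 point
        estimates.append(mean_count / intensity1)
    return estimates

# Hypothetical coordinates in feet (not data from this study).
accidents = [(100.0, 100.0), (120.0, 90.0), (900.0, 900.0), (500.0, 480.0)]
schools = [(110.0, 100.0), (500.0, 500.0)]
k_values = bivariate_k(accidents, schools, [50.0, 2000.0], area=1000.0 * 1000.0)
```

Under complete spatial randomness and independence between the two patterns, the expected value at distance d is pi * d^2, so estimates well above that reference at short distances suggest clustering of pattern 1 around pattern 2.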

Literature Review

A major concern for police departments is the concentration of traffic accidents around a particular location, which implies a higher risk of being involved in a traffic accident at that location. Identifying these locations provides very valuable information for the police department. Traffic accident events can be represented as points in a geographic region, making them analogous to disease occurrences. Various techniques can be used for detecting clusters of events, ranging from visual inspection of maps to full Bayesian methods (Gomez-Rubio et al. 2005).

John Snow’s 1854 study of the cholera outbreak in London was the beginning of spatial epidemiology. The technique used was simply visual inspection of the distribution of the events on a map. He was able to conclude that the source of the outbreak was the public water pump situated on Broad Street (Snow J 1854). This technique is used in this research to identify patterns of traffic accidents, and to examine schools as a source of traffic accidents to school age kids. Over the years, different techniques have been developed to achieve more sophisticated results.

Scan statistics for identifying the concentration of events in space have been modified over time by Openshaw et al. 1987, Besag and Newell 1991, and Kulldorff and Nagarwalla 1995. Among the various scan statistics, kernel density is used to generate kernel maps that show the density of point or line features using a fixed radius that moves over the entire study area. This technique was used in this research to explore the variation in patterns between traffic accidents to school kids and other accidents.

Bryn Austin et al. 2005 studied the clustering of fast food restaurants around schools. The study used buffers of 400 m and 800 m radius around schools to assess proximity to schools. The bivariate K-function was used to quantify the degree of clustering of fast food restaurants around schools. In this research, the same technique is used to confirm the findings from the initial visual inspection of the maps for cluster patterns of traffic accidents around schools.

Bailey and Gatrell 1995 explained the applicability of K-function analysis to identifying a point source for larynx cancer and lung cancer. A case of elevated risk of respiratory disease along busy main roads was also described.

Andrew Jones et al. 1996 also used the K-function analysis for the geographic distribution of road accidents in Norfolk, England.

Peter Spooner et al. 2004 developed a new technique for analyzing the distribution of points on a network, called the network K-function. It explores the interaction between points on a network.

We have moved from traffic accident research to disease mapping, and finally to proximity to fast food. What these all have in common is that these events are spatial. Therefore they can be modeled for spatial analysis.

Geographic Information Science is very useful in analyzing spatial patterns. GIS is defined as a set of tools for collecting, storing, retrieving at will, transforming, and displaying spatial data from the real world (Burrough, P. 1986). Rowlingson and Diggle noted that GIS lacks basic statistical functionality, meaning there is a limit to the statistical analysis that can be performed by a GIS. However, other software has been developed to allow geographers to perform these complex statistical functions in a GIS-like environment. This research uses a combination of Geographic Information Systems and spatial statistics to answer the research questions.

Data

For the purpose of this research, data on accident locations, school locations, accident victims, and roads are vital. Base map data for this research was provided by the GIS department at the City of Richardson. The traffic accident data was provided by the police department and the traffic department.

The data layers used as the base map are roads, schools, and city limits. The traffic accident data from the police department is used by the crime analyst at the City of Richardson.

The accident data from the traffic department is used for annual traffic accident reports. Each dataset had its peculiarities, which were examined to arrive at appropriate data for analysis.

Police Department Accident Data
  • Data was geocoded
  • Data was updated weekly
  • Restrictions apply to usage of the data
  • Data was not detailed
  • Data was spatially accurate

Traffic Department Accident Data
  • Data was not geocoded
  • Data was not in a spatial format
  • Data was in a structured database format
  • Data was detailed (it contained age of victims)
  • No restrictions applied to the usage of the data for this research
  • Data was not up to date
  • Accuracy depended on the final preparation of the data

Hence, the traffic accident data from the traffic department was used for this research, because it contained information on the age of victims. The age of the victims is critical information needed to answer the research questions.

Geocoding

The data was converted from a standard database format to the dBASE (.dbf) file format supported by the GIS software used for the processes. The location of each accident was recorded using streets and intersections. To perform spatial analysis, a physical location has to be established for each accident. An SQL query was used to concatenate the street and intersection fields to generate a field called address.
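The concatenation step can be sketched with Python's built-in sqlite3 module. The table name, column names, sample rows, and the " & " intersection connector are all assumptions made for illustration; the actual Access schema and query are not given in this paper.

```python
import sqlite3

# In-memory database with a hypothetical accidents table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accidents (id INTEGER, street TEXT, cross_street TEXT)")
conn.executemany(
    "INSERT INTO accidents VALUES (?, ?, ?)",
    [
        (1, "MAIN ST", "FIRST ST"),  # made-up sample rows
        (2, "OAK AVE ", None),       # no intersecting street recorded
    ],
)

# Build the address field by concatenating the street and intersection fields.
conn.execute("ALTER TABLE accidents ADD COLUMN address TEXT")
conn.execute(
    """
    UPDATE accidents
    SET address = CASE
        WHEN cross_street IS NULL OR TRIM(cross_street) = '' THEN TRIM(street)
        ELSE TRIM(street) || ' & ' || TRIM(cross_street)
    END
    """
)
addresses = dict(conn.execute("SELECT id, address FROM accidents"))
conn.close()
```

Records with no intersecting street keep only the street name, which corresponds to the kind of incomplete address that a geocoder is unlikely to match.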

An address locator was created in ArcCatalog using the roads file provided by the GIS department. The dBASE file was added in ArcMap for geocoding. The geocoding statistics showed the following:

Matched with score 80 – 100: 49 (0%)

Matched with score <80: 1236 (11%)

Unmatched: 9598 (88%)

Matched with candidates tie: 875 (8%)

Most of the unmatched records resulted from misspelled and abbreviated names, so the unmatched records were matched interactively. The final geocoding statistics showed the following:

Matched with score 80 – 100: 53 (0%)

Matched with score <80: 9460 (87%)

Unmatched: 1370 (13%)

Matched with candidates tie: 875 (8%)

The unmatched records were mostly incomplete address fields without an intersection. The records that were not matched are not included in this research.

Queries and Data Editing

Traffic accidents to school age kids (TASK) are defined as accidents that involved children between 5 and 15 years of age and that happened during the daytime on weekdays. The break point is set at 15 years because this research focuses on victims and not drivers; by law a 16-year-old can hold a driver’s license, which excludes that age group from being potential victims in this analysis.

SQL queries were used to extract TASK.
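The TASK definition can be expressed as a simple filter. The sketch below is written in Python rather than SQL, and the daytime hours (06:00 to 18:00), field names, and sample records are assumptions; the paper does not state the hours used to define daytime.

```python
from datetime import datetime

DAY_START_HOUR = 6   # assumed start of "daytime"; not specified in the source
DAY_END_HOUR = 18    # assumed end of "daytime"; not specified in the source

def is_task(victim_age, occurred_at):
    """Return True if a record meets the TASK definition."""
    is_school_age = 5 <= victim_age <= 15
    is_weekday = occurred_at.weekday() < 5  # Monday=0 ... Friday=4
    is_daytime = DAY_START_HOUR <= occurred_at.hour < DAY_END_HOUR
    return is_school_age and is_weekday and is_daytime

# Hypothetical records (not data from this study).
records = [
    {"id": 1, "age": 9, "when": datetime(2003, 3, 4, 8, 15)},   # Tuesday morning
    {"id": 2, "age": 9, "when": datetime(2003, 3, 8, 8, 15)},   # Saturday
    {"id": 3, "age": 16, "when": datetime(2003, 3, 4, 8, 15)},  # above the age break point
    {"id": 4, "age": 12, "when": datetime(2003, 3, 4, 22, 0)},  # night time
]
task_ids = [r["id"] for r in records if is_task(r["age"], r["when"])]
```

Only the first record satisfies all three conditions; every other record falls into the other-accidents category.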

Not all accidents happened at intersections, and this information is also available in the data provided. The direction and the distance in miles and feet from the intersection were recorded during data entry. This information was used to move each accident point to its exact location using the Editor tool in ArcGIS.

A total of 602 TASK and 8402 other accidents between 2000 and 2004 were used for further analysis.
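The arithmetic behind moving a point away from its intersection can be sketched as follows. This is a simplification: it assumes projected coordinates in feet and a straight offset along a cardinal direction, whereas the points in this research were moved manually in an edit session. The function name and coordinates are hypothetical.

```python
FEET_PER_MILE = 5280.0

# Unit vectors for the four cardinal directions (x = easting, y = northing).
UNIT_VECTORS = {"N": (0.0, 1.0), "S": (0.0, -1.0), "E": (1.0, 0.0), "W": (-1.0, 0.0)}

def offset_from_intersection(x_ft, y_ft, direction, miles=0.0, feet=0.0):
    """Shift an intersection coordinate by the recorded direction and distance."""
    dx, dy = UNIT_VECTORS[direction.upper()]
    distance_ft = miles * FEET_PER_MILE + feet
    return (x_ft + dx * distance_ft, y_ft + dy * distance_ft)

# Hypothetical intersection coordinate; accident recorded 0.25 mile and 50 feet north.
moved = offset_from_intersection(2500000.0, 7000000.0, "n", miles=0.25, feet=50.0)
```

A road that does not run exactly north-south or east-west would need the offset to follow the road geometry instead, which this sketch does not attempt.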

School Speed Zones

Data on the school speed zones was required to identify accidents that happened in these zones. The data was not available, so it was created for this research. There were two options for preparing this data: using a GPS unit or using digital orthogonal photos. The digital orthogonal photos used by the City of Richardson for pictometry have a 6-inch resolution and were taken in 2005. Using the photos saved a lot of time and energy and provided the same results that would have been generated using a GPS unit. The locations identified on the photos were exported as points and used to create line features that represent the speed zones.

Results

Spatial Queries

Spatial queries were used to identify traffic accidents involving school age kids that happened within the speed zones. Thirty TASK fell within the school speed zones. The exact times at which these accidents occurred helped in identifying the accidents that happened during speed zone flash times. A total of 18 accidents happened in the speed zones during these flash times. Richardson Heights Elementary School, Classical Magnet School, Math/Science/Technology Magnet School, and Richardson High School all account for a total of five accidents in their speed zones during flash times.

These results show that accidents happen within school speed zones. The capability of GIS techniques to check whether these events were random or whether there was a trend in their occurrence is limited. However, the occurrence of TASK was checked at varying distances from schools using ArcGIS.
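A spatial query of this kind can be sketched in Python as a two-step selection: first by distance to the speed zone line features, then by time of day. The tolerance distance, flash windows, and coordinates below are hypothetical; the actual values used in ArcGIS are not stated in this paper.

```python
import math
from datetime import time

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def in_speed_zone(point, zone_segments, tolerance_ft):
    """True if the point lies within tolerance_ft of any zone segment."""
    return any(distance_to_segment(point, a, b) <= tolerance_ft for a, b in zone_segments)

def during_flash(t, flash_windows):
    """True if time t falls inside any flash window."""
    return any(start <= t <= end for start, end in flash_windows)

# Hypothetical speed zone (one 600 ft segment) and flash windows.
zone = [((0.0, 0.0), (600.0, 0.0))]
flash_windows = [(time(7, 30), time(8, 30)), (time(14, 45), time(15, 45))]
accidents = [
    ((300.0, 20.0), time(8, 0)),   # in zone, during flash time
    ((300.0, 20.0), time(12, 0)),  # in zone, outside flash time
    ((900.0, 0.0), time(8, 0)),    # outside zone
]
in_zone = [a for a in accidents if in_speed_zone(a[0], zone, 40.0)]
in_zone_flashing = [a for a in in_zone if during_flash(a[1], flash_windows)]
```

The two filters mirror the two counts reported in this section: accidents inside the speed zones, and the subset that occurred while the zone signals were flashing.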

Distance       150 ft    250 ft     500 ft      1000 ft      1500 ft
No. of TASK    0 (0%)    8 (1.3%)   22 (3.6%)   61 (10.1%)   150 (24.9%)

This revealed a pattern in which TASK clustered at larger distances around schools, but it did not address the issue of spatial randomness.
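The distance tabulation can be sketched as a cumulative count of events whose nearest school lies within each threshold. The function name and coordinates are hypothetical, and straight-line distance is assumed.

```python
import math

def count_within(events, schools, thresholds_ft):
    """Count events whose nearest school is within each threshold distance.

    Returns {threshold: (count, percent of all events)}.
    """
    nearest = [
        min(math.hypot(ex - sx, ey - sy) for sx, sy in schools)
        for ex, ey in events
    ]
    total = len(events)
    return {
        d: (
            sum(1 for n in nearest if n <= d),
            100.0 * sum(1 for n in nearest if n <= d) / total,
        )
        for d in thresholds_ft
    }

# Hypothetical coordinates in feet (not data from this study).
events = [(0.0, 100.0), (0.0, 400.0), (0.0, 1200.0), (5000.0, 5000.0)]
schools = [(0.0, 0.0)]
summary = count_within(events, schools, [150.0, 500.0, 1500.0])
```

The percentages reported in the table are consistent with the 602 TASK as the denominator (for example, 150 / 602 is about 24.9%). Because the counts are cumulative, they necessarily grow with distance, which is one reason this tabulation alone cannot separate clustering from spatial randomness.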

Kernel Densities

Kernel density was used to identify the distribution of TASK and other accidents to see if there is variation in their patterns. The kernels were generated using the ArcGIS Spatial Analyst extension. A uniform cell size of 100 feet was used to generate all kernels, giving a basis for comparison. The selection of bandwidths for the kernels posed a challenge, so various bandwidths were used for this analysis: kernel densities of the accidents were generated using radii of 2000 feet, 3000 feet, and 5000 feet. The legend values show the densities of the events per cell. It is important to note that manual classification was used for the kernels. Each was classified into seven classes with the following distribution in each class: 10%, 20%, 30%, 40%, 50%, 75%, and 100%. This enabled comparison of the kernels generated.

The kernels showed varying patterns of occurrence between the two categories. TASK clustered more along the Belt Line Road corridor (east to west), while other accidents clustered along North Central Expressway (north to south).
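The kernel density calculation can be sketched on a regular grid as follows. A quartic kernel is assumed here for illustration; the exact kernel function applied by the software is not stated in this paper. The grid extent and event coordinates are hypothetical, while the 100 foot cell size and 2000 foot radius follow the values given above.

```python
import math

def kernel_density(events, x_min, y_min, n_cols, n_rows, cell_size, bandwidth):
    """Quartic-kernel density evaluated at the centre of each grid cell.

    Returns a list of rows; units are events per square unit of distance.
    """
    norm = 3.0 / (math.pi * bandwidth ** 2)  # makes each kernel integrate to 1
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        cy = y_min + (row + 0.5) * cell_size
        for col in range(n_cols):
            cx = x_min + (col + 0.5) * cell_size
            total = 0.0
            for ex, ey in events:
                u_sq = ((ex - cx) ** 2 + (ey - cy) ** 2) / bandwidth ** 2
                if u_sq < 1.0:  # events beyond the bandwidth contribute nothing
                    total += norm * (1.0 - u_sq) ** 2
            grid[row][col] = total
    return grid

# Hypothetical events in feet: two close together, one isolated.
events = [(1000.0, 1000.0), (1200.0, 1100.0), (4000.0, 4000.0)]
density = kernel_density(events, 0.0, 0.0, 50, 50, cell_size=100.0, bandwidth=2000.0)
```

Increasing the bandwidth spreads each event over a wider area and smooths local peaks, which is why the choice between the 2000, 3000, and 5000 foot radii changes how heterogeneous the resulting map appears.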

After exploring the maps, the most appropriate bandwidth for representing a heterogeneous TASK distribution was found to be 3000 feet. The result generated from the kernels for TASK was normalized using other accidents as a control: the 3000-foot TASK kernel was normalized using the 5000-foot kernel for other accidents. A kernel ratio was generated to establish areas within the city where TASK happen more often. This was done using the raster calculator of ArcGIS Spatial Analyst. The equation (TASK - Other Accidents) / (TASK + Other Accidents) was used to generate the normalized map.

No global clustering pattern of TASK around schools was discovered, but the map shows localized clustering of TASK around Aldridge Elementary School. The clustering around Richardson Terrace Elementary and Mark Twain Elementary cannot be taken to indicate danger spots for TASK, because the overall pattern of all TASK shows a concentration along Belt Line Road, the corridor where these schools are located. Further research can explore these obscure patterns.
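The normalization equation can be applied cell by cell to two density grids of the same dimensions. The sketch below uses plain Python lists with hypothetical values, and marks cells where both densities are zero as None, which is an assumption about how empty cells should be handled.

```python
def normalized_difference(task_grid, other_grid):
    """Cell-by-cell (TASK - Other) / (TASK + Other) for two equal-sized grids."""
    result = []
    for task_row, other_row in zip(task_grid, other_grid):
        out_row = []
        for task, other in zip(task_row, other_row):
            denom = task + other
            # No events of either kind nearby: the ratio is undefined.
            out_row.append((task - other) / denom if denom > 0 else None)
        result.append(out_row)
    return result

# Hypothetical density values (not results from this study).
task_density = [[2.0, 0.0], [1.0, 0.0]]
other_density = [[1.0, 0.0], [3.0, 4.0]]
ratio = normalized_difference(task_density, other_density)
```

For non-negative densities the result ranges from -1 (only other accidents) to +1 (only TASK), with values near 0 where the two densities are similar.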

Bivariate K-Function

To validate the pattern and check for any significant clustering of TASK around schools, bivariate K-function analysis was performed using R. R is a free statistical package that can be used for spatial statistics. The SPLANCS (Spatial Point Pattern Analysis Code) package in R performs many sophisticated spatial statistical analyses. The bivariate K-function models spatial relationships between two point patterns. In this case, we are trying to validate whether accidents happen to school age kids more around schools than elsewhere, with schools being point pattern A and TASK being point pattern B. If clustering of TASK around schools exists, then we expect a plot showing high incidence at short distances and a decline in occurrences at larger distances.

Fig 1