An Evaluation of NIJ’s Evaluation Methodology for Geographic Profiling Software
D. Kim Rossmo
Research Professor and Director
Center for Geospatial Intelligence and Investigation
Department of Criminal Justice
Texas State University
March 9, 2005
Executive Summary
This is a response to the National Institute of Justice’s A Methodology for Evaluating Geographic Profiling Software: Final Report (Rich & Shively, 2004). The report contains certain errors, the most critical of which involves suggested performance measurements. Output accuracy is the single most important criterion for evaluating geographic profiling software. The report discusses various performance measures; unfortunately, only one of these (hit score percentage/search cost) accurately captures how police investigations actually use geographic profiling. This response addresses the various problems associated with the other measures.
Geographic profiling evaluation methodologies must respect the limitations and assumptions underlying geographic profiling, and accurately measure the actual function of a geographic profile. Geographic profiling assumes: (1) the case involves a series of at least five crimes; (2) the offender has a single stable anchor point; (3) the offender is using an appropriate hunting method; and (4) the target backcloth is reasonably uniform. Additionally, for various theoretical and methodological reasons, not all crime locations in a given series can be used in the analysis.
The most appropriate measure of geographic profiling performance is “hit score percentage/search cost.” It is the ratio of the area searched (following the geographic profiling prioritization) before the offender’s base is found, to the total hunting area; the smaller this ratio, the better the geoprofile’s focus. There are no intrinsic disadvantages to this measure.
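The ratio described above can be sketched in a few lines of Python. This is a minimal illustration only, under stated assumptions: the geoprofile is represented as a grid of equal-area cells holding scores, cells are searched from the highest score downward, and cells tied with the base cell are counted as searched. The function name and the toy grid are hypothetical and are not drawn from any geographic profiling software.

```python
from typing import List, Tuple


def hit_score_percentage(surface: List[List[float]], base: Tuple[int, int]) -> float:
    """Fraction of the hunting area searched before the offender's base is found.

    Assumptions (illustrative, not from the NIJ report): `surface` is a grid of
    equal-area cells, higher scores are searched first, and cells tied with the
    base cell are treated as searched.
    """
    row, col = base
    base_score = surface[row][col]
    cells = [score for grid_row in surface for score in grid_row]
    # Count every cell that would be searched up to and including the base cell.
    searched = sum(1 for score in cells if score >= base_score)
    return searched / len(cells)


# Hypothetical 3 x 3 score surface; the offender's base lies in cell (2, 2).
toy_surface = [
    [0.1, 0.2, 0.3],
    [0.4, 0.9, 0.5],
    [0.6, 0.7, 0.8],
]
ratio = hit_score_percentage(toy_surface, (2, 2))
```

In this toy case only two of the nine cells score at least as high as the base cell, so the ratio is 2/9, or roughly 22% of the hunting area.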
The other evaluation measures discussed in the NIJ report are all linked to the problematic “error distance.” “Top profile area” is the ratio of the total area of the top profile region (which is not defined) to the total search area. It is not a measure by itself. “Profile error distance” is the distance from the offender’s base to the nearest point in the top profile region (undefined). “Profile accuracy” indicates whether the offender’s base is within the “top profile area” (undefined); it fundamentally misrepresents the prioritization nature of geographic profiling.
“Error distance” is the distance from the offender’s actual to predicted base of operations. While it is easily applied to centrographic measures, the complex probability surfaces produced by geographic profiling software must be reduced to a single (usually the highest) point. Several researchers have unfortunately adopted this technique because of its simplicity. There are three major problems with error distance. First, it is linear, while the actual error is nonlinear. Area, rather than distance, is the relevant and required measure. Population (and therefore suspects) increases with area size, which is a function of the square of the radius (error distance). The second problem with error distance is that it is not a standardized measure because of its sensitivity to scale. The third and most serious analytic problem with error distance is that it fails to capture how geographic profiling software actually works. Criminal hunting algorithms produce probability surfaces that outline an optimal search strategy. As an offender’s search is rarely uniformly concentric, simplifying a geoprofile to a single point from which to base an error distance is invalid. The use of error distance ignores most of the output from geographic profiling software and undermines the very mechanics of how the process functions.
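The first problem can be made concrete with a short worked relation. Assume, purely for illustration, a circular search of radius d (the error distance) centered on the predicted point, and a uniform population density ρ; both are simplifications introduced here, not quantities defined in the NIJ report. The area A and the number of residents N to be searched are then:

```latex
A(d) = \pi d^{2}, \qquad
N(d) = \rho \, \pi d^{2}, \qquad
\frac{A(2d)}{A(d)} = \frac{\pi (2d)^{2}}{\pi d^{2}} = 4
```

Under these assumptions, doubling the error distance quadruples both the area and the number of potential suspects, which is why a linear distance understates the real investigative cost.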
A more comprehensive approach to evaluating geographic profiling as an investigative methodology needs to consider applicability and utility, as well as performance. Applicability refers to how often geographic profiling is an appropriate investigative methodology. Utility refers to how useful or helpful geographic profiling is in a police investigation.
To evaluate geographic profiling properly requires analyzing only those cases and crimes appropriate for the technique, and measuring performance by mathematically sound methods. Hit score percentage/search cost is the only measure that meets NIJ’s standard of a “fair and rigorous methodology for evaluating geographic profiling software.”
Introduction
In January 2005, the National Institute of Justice (NIJ) released A Methodology for Evaluating Geographic Profiling Software: Final Report (Rich & Shively, 2004). While the intent of this document is laudable, it is necessary to respond to certain significant errors that are contained in the report. Some of these may be the result of the advisory expert panel not including professional geographic profilers (defined as police personnel whose full-time function involves geographic profiling), “customers” of geographic profiling (police investigators), or developers of geographic profiling software. A crime analyst (trained in geographic profiling analysis for property crime) was the sole law enforcement practitioner on the advisory panel.
The most critical error in the NIJ report involves suggested performance measurements. The expert panel correctly concluded that output accuracy – “the extent to which each software application accurately predicts the offender’s ‘base of operations’” (p. 14) – is the single most important criterion for evaluating geographic profiling software (p. 15). The report discusses various performance measures, providing short definitions, advantages, and disadvantages (p. 16). Only one of these measures (hit score percentage/search cost), however, accurately captures how police investigations actually use geographic profiling. This response addresses the various problems associated with the other measures.
Background
Geographic profiling is a criminal investigative methodology that analyzes the locations of a connected crime series to determine the most probable area of offender residence. It is primarily used as a suspect prioritization and information management tool (Rossmo, 1992a, 2000). Geographic profiling was developed at Simon Fraser University’s School of Criminology, and first implemented in a law enforcement agency, the Vancouver Police Department, in 1995.[1]
Geographic profiling embraces a theory-based framework rooted in environmental criminology. Crime pattern (Brantingham & Brantingham, 1981, 1984, 1993), routine activity (Cohen & Felson, 1979; Felson, 2002), and rational choice (Clarke & Felson, 1993; Cornish & Clarke, 1986) theories provide the major foundations. While there are several techniques used by geographic profilers, the main tool is the Rigel software program, built in 1996 around the Criminal Geographic Targeting (CGT) algorithm developed at SFU in 1991.
After discussions in the mid-1990s with senior police executives and managers of the Vancouver Police Department (VPD) and the Royal Canadian Mounted Police (RCMP), it was concluded that several components would be necessary for the successful implementation of geographic profiling within the policing profession. These included:
- creating personnel selection, training, and testing standards;
- following mentoring and monitoring practices;
- developing usable and functional software;
- establishing case policies and procedures;
- identifying supporting investigative strategies;
- building awareness and knowledge in the customer (police investigator) community; and
- committing to evaluation, research, and improvement.
Over the course of the next few years these components were developed, first for major crime investigation, and then for property crime investigation. Personnel from various international police agencies were trained in geographic profiling. Their agencies signed memoranda of understanding agreeing to follow the established protocols, and to assist other police agencies needing investigative support. Training standards for geographic profilers were eventually adopted by the International Criminal Investigative Analysis Fellowship (ICIAF), an independent professional organization first started by the Federal Bureau of Investigation (FBI) in the 1980s.
For a geoprofile to be more than just a map, it must be integrated with specific strategies investigators can use. Examples of strategies identified for geographic profiling include: (1) suspect and tip prioritization; (2) database searches (e.g., police information systems, sex offender registries, motor vehicle registrations, etc.); (3) patrol saturation and surveillance; (4) neighborhood canvasses; (5) information mail outs; and (6) DNA dragnets. The level of resources required by these strategies is directly related to the size of the geographic area in which they are conducted.
While not used by professional geographic profilers, there are two derivative geographic profiling tools also mentioned in the NIJ report: NIJ’s own CrimeStat JTC (journey-to-crime) module; and The University of Liverpool’s Dragnet. Both of these systems were developed in 1998. Neither is a commercial product, and training in their use, beyond software instruction manuals, is currently unavailable. Little is known about a fourth geographic profiling software program, Predator, first mentioned in 1998 on Maurice Godwin’s investigative psychology website.
Geographic Profiling Evaluation
There are three methods for testing the efficacy of geographic profiling software. The first uses Monte Carlo simulation techniques. These test the expected performance of the software on various point patterns representative of serial crime sites. The major advantage of this approach is the ability to generate large numbers of data cases (e.g., 10,000). The major disadvantage is the likelihood that the site generation algorithm’s underlying assumptions do not accurately reflect the geographic patterns of all serial crime cases. Simulated cases also lack the supplementary information associated with an actual case that can help refine a geoprofile.
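The structure of such a Monte Carlo test can be sketched as follows. This is a rough illustration under loudly stated assumptions, not a reproduction of any published test: the site generator (Gaussian scatter around the anchor point), the scoring function (a summed negative exponential distance decay), the square grid treated as the hunting area, and all parameter values are hypothetical stand-ins. In particular, the scoring function is not the CGT algorithm used by Rigel.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def simulate_series(anchor: Point, n_crimes: int, spread: float,
                    rng: random.Random) -> List[Point]:
    """Hypothetical site generator: crime sites scattered around the anchor
    with Gaussian noise. Real crime series need not follow this pattern."""
    return [(rng.gauss(anchor[0], spread), rng.gauss(anchor[1], spread))
            for _ in range(n_crimes)]


def decay_score(cell: Point, crimes: List[Point], scale: float) -> float:
    """Illustrative stand-in score: summed negative exponential distance decay.
    This is NOT the CGT algorithm."""
    return sum(math.exp(-math.dist(cell, crime) / scale) for crime in crimes)


def hit_score(anchor_cell: Tuple[int, int], crimes: List[Point],
              size: int, scale: float) -> float:
    """Share of grid cells scoring at least as high as the anchor's cell."""
    scores = {(x, y): decay_score((x + 0.5, y + 0.5), crimes, scale)
              for x in range(size) for y in range(size)}
    base = scores[anchor_cell]
    return sum(1 for s in scores.values() if s >= base) / len(scores)


def mean_hit_score(n_crimes: int, trials: int = 200, size: int = 20,
                   spread: float = 3.0, scale: float = 2.0, seed: int = 0) -> float:
    """Average hit score over many simulated series of a given length."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Keep the simulated anchor away from the grid edge (assumes size > 10).
        ax, ay = rng.randrange(5, size - 5), rng.randrange(5, size - 5)
        crimes = simulate_series((ax + 0.5, ay + 0.5), n_crimes, spread, rng)
        total += hit_score((ax, ay), crimes, size, scale)
    return total / trials
```

Calling `mean_hit_score(3)` and `mean_hit_score(10)` compares expected performance for short and long series under this toy model. The values obtained depend entirely on the assumed generator and scoring function, which is precisely the limitation of simulation noted above.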
The second and most common method of evaluating geographic profiling software performance involves examining solved cases. This technique has been used by Rossmo (1995a, 2000), Canter, Coffey, Huntley, and Missen (2000), Levine (2002), Snook, Taylor, and Bennell (2004), and Paulsen (2004). The major advantage of research using historical (cold) cases is that with sufficient effort a reasonably sized sample of cases can be collected. Disadvantages include sampling bias problems and the need for extensive data review.
The third method tracks geographic profiling performance in unsolved criminal investigations. This approach is the best of the three as it measures actual – not simulated – performance under field conditions (Rossmo, 2001). It also serves as a blind test as the “answer” is not known at the time of the analysis. Monitoring actual case performance is slow, however, as it is necessary for a case to be solved before it can be included in the data sample.
Every trained geographic profiler is required to keep a case file that records the details of their work. The log includes fields for case number, sequential number, date, crime type, city, region, law enforcement agency, investigator, number of crimes, number of locations, type of analysis, report file name, case status, and result (when solved). This file has both administrative and research purposes. It was encouraging to see the NIJ report recommend the use of logs and journals by individuals involved in geographic profiling. However, considering how much there is to learn with any new police technology (especially with regard to investigator utility versus software performance), it seems more prudent for all users, and not just a sample, to keep detailed records.
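For agencies that keep such a log electronically, the fields listed above could be captured in a simple record structure. The sketch below is one possible layout only: the field names, types, and the helper method are assumptions made for illustration, and the example entry is entirely hypothetical.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaseLogEntry:
    """One row of a geographic profiler's case log (illustrative layout)."""
    case_number: str
    sequential_number: int
    entry_date: datetime.date
    crime_type: str
    city: str
    region: str
    agency: str
    investigator: str
    number_of_crimes: int
    number_of_locations: int
    analysis_type: str
    report_file_name: str
    case_status: str
    result: Optional[str] = None  # filled in only once the case is solved

    def is_solved(self) -> bool:
        """A case can enter a performance sample only after a result is recorded."""
        return self.result is not None


# Entirely hypothetical entry, shown only to illustrate usage.
example_entry = CaseLogEntry(
    case_number="EXAMPLE-001", sequential_number=1,
    entry_date=datetime.date(2005, 1, 31), crime_type="burglary",
    city="Example City", region="Example Region", agency="Example PD",
    investigator="Det. Example", number_of_crimes=14, number_of_locations=14,
    analysis_type="geoprofile", report_file_name="example.doc",
    case_status="open",
)
```

Keeping the result field empty until a case is solved mirrors the point made above: unsolved cases are logged at the time of analysis but can be scored only later.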
Geographic Profiling Evaluation Methodology
Evaluation Premises
NIJ’s purpose was “to develop a fair and rigorous methodology for evaluating geographic profiling software” (p. 4), and their report identifies law enforcement officials as the key audience for the evaluation. With this in mind, the following premises are used as the basis for the discussion in this response.
Any geographic profiling evaluation methodology should:
- follow the limitations and assumptions underlying geographic profiling;
- analyze exactly what the geographic profiling software produces, and not a simplification or generalization of its output;
- measure, as accurately as reasonably possible, the actual function of a geographic profile;
- use the highest level measurements possible (i.e., ratio/interval/ordinal/nominal); and
- be based on validity and reliability concerns (and not on tangential factors such as “it is easier,” “it has been done that way before,” or “the software has limitations”).
It is tempting, in the effort to increase a study’s sample size, to collect cases from large databases derived from records management systems (RMS). However, if the details of the crimes are overlooked, inappropriate series will be included: GIGO – garbage in, garbage out. Wilpen Gorr, Michael Maltz, and John Markovic have warned us of the importance of data integrity and specificity issues. “You really need to know the capacities and limitations of this less then perfect [crime] data before you dump it into a model” (John Markovic, International Association of Chiefs of Police, NIJ CrimeMap listserve, January 31, 2005).
To prepare a geographic profile properly involves first making sure the case does not violate any underlying assumptions. Furthermore, only those crime locations in the series that meet certain criteria can be used in the analysis. This is one of the reasons why a geoprofile requires anywhere from half a day for a property crime case to up to two weeks for a serial murder case. A significant portion of the geographic profiling training program is spent learning to understand these issues so the methodology is not improperly applied. These complexities are why testing, monitoring and mentoring, and review exist.
Geographic Profiling Assumptions
Any algorithm or mathematical function is only a model of the real world. The appropriateness and applicability of weather forecasting techniques, multiple linear regression, the spatial mean, or horserace odds are all premised on various assumptions. If those assumptions are violated, or if the processes of interest are not accurately replicated, the model has little value. Using atheoretical algorithms for police problems is tantamount to fast food crime analysis.
There are four major theoretical and methodological assumptions required for geographic profiling (Rossmo, 2000):
- The case involves a series of at least five crimes, committed by the same offender. The series should be relatively complete, and any missing crimes should not be spatially biased (such as might occur with a non-reporting police jurisdiction).
- The offender has a single stable anchor point[2] over the time period of the crimes.
- The offender is using an appropriate hunting method.
- The target backcloth is reasonably uniform.
Geographic profiling is fundamentally a probabilistic form of point pattern analysis. Every additional point (i.e., offense location) in a crime series adds information, and results in greater precision. A minimum of five crime locations is necessary for stable pattern detection and an acceptable level of investigative focus; the mean in operational cases has been 14 (Rossmo, 2000, 2001). Monte Carlo testing shows with only three crimes the expected hit score percentage (defined below) is approximately 25%. By comparison, the expected search area drops to 5% with 10 crimes.[3] The resolution of any method will be poor if tested on series of only a few crimes.
The NIJ evaluation methodology recommends analyzing cases with as few as three crimes in the series. While there may be some research interest in studying performance for small-number crime series, the report is supposed to lay out guidelines for evaluation methodologies. The document seems at times to be confused as to its role. Research and evaluation are separate processes. At a minimum, any research results should be reported separately, and it should be made clear they do not represent operational geographic profiling performance.
For evaluation purposes, it is inappropriate to include cases that fall outside the recommended operational parameters. As small-number crime series are easier to obtain than large-number crime series, there is a risk they will inappropriately drive the findings. For example, the distribution for the number of crimes in Paulsen’s (2004) analysis was heavily skewed to small-number series. Of 150 cases, only 37 (25%) met the minimum specified requirement in geographic profiling of 5 crime locations – and 22 of those were on the borderline (6-7 crimes). Only 15 cases (10%) involved more than 7 crimes.
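The separation of evaluation cases from research cases can be illustrated with a short sketch. The helper names and the list of series sizes are hypothetical; only the counts 37, 15, and 150 are taken from the figures quoted above for Paulsen (2004).

```python
from typing import List, Tuple

MIN_CRIMES = 5  # minimum series length assumed by geographic profiling


def split_by_minimum(series_sizes: List[int],
                     minimum: int = MIN_CRIMES) -> Tuple[List[int], List[int]]:
    """Separate series that meet the operational minimum (evaluation set)
    from those that do not (research set, to be reported separately)."""
    evaluation = [n for n in series_sizes if n >= minimum]
    research = [n for n in series_sizes if n < minimum]
    return evaluation, research


def share(count: int, total: int) -> float:
    """Percentage of the sample represented by `count` cases."""
    return 100.0 * count / total


# Hypothetical series sizes, for illustration only.
evaluation_set, research_set = split_by_minimum([3, 3, 4, 5, 6, 9, 14])

# Shares implied by the counts quoted above.
meets_minimum = share(37, 150)    # about 24.7, reported as 25%
more_than_seven = share(15, 150)  # exactly 10.0
```

Reporting results for the two sets separately keeps short series from driving conclusions about operational performance.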
If the offender is nomadic or transient then there may not be a residence to locate. If the offender is constantly moving residence, then multiple anchor points could be involved in a single crime series, confusing the analysis, and possibly resulting in a violation of the first assumption. It should also be remembered that what constitutes a residence for a street criminal might vary from middle-class expectations. Two geoprofiled burglary cases illustrate this point. In the first, the “home” for a group of transient gypsies was a motel where they temporarily stayed while they committed their crimes. In the second, the homeless offender’s base was a bush in a vacant lot where he slept at night. It is important in geographic profiling to consider the details of the case, the timing of the crimes, and the nature of the area where the peak geoprofile is located. Like all investigative tools, it should be used intelligently. See the discussion below regarding applicability, performance, and utility, in the section General Methodological Comments.
Offender hunting method is defined as the search for, and attack on, a victim or target (Rossmo, 1997, 2000). Geographic profiling is inappropriate for certain search and attack methods. The residence of a poacher (an offender commuting into an area to commit crimes) by definition will not be within the hunting area of the crimes (though he may be using some other anchor point, such as his workplace or a “fishing hole”). The NIJ methodology suggests that “commuters” be included in any evaluations. How that is to be done, however, is not made clear, as the report acknowledges CrimeStat and Dragnet are unable to handle this type of offender (nor can Rigel, as this is an assumption violation). Gorr (2004) presents some interesting and useful ideas for expanding geographic profiling systems to such cases and these should be explored (perhaps with the addition of a directionality component). As discussed above, it is important to distinguish research from evaluation, and report the results of each separately.
While most burglars identify targets during their routine activities in areas of familiarity, others watch for news of estate sales or use accomplices who read luggage nametags at airports. Stalkers (offenders who do not attack victims upon encounter) are also problematic. In one case example, the offenders in a series of armed robberies of elderly victims in Los Angeles went to hospitals and shopping malls, selected suitable victims, and then followed them home where the robbery occurred. In this situation, the victims were choosing the crime sites – not the offenders. A geographic profile based on the robbery locations would therefore be wrong. Instead, the victim encounter sites (the shopping malls and hospitals) should be used because these are the locations the robbers had control over.