
Training Effectiveness Readiness Levels (TERLs)

Abstract

Understanding the maturity of training technologies is challenging because the outcome of such technologies is measured not in technical capabilities but in learning outcomes (i.e., their training effectiveness). This means that regardless of the technical prowess of a training technology, it remains relatively unproven until it is shown to produce the desired learning results. It is difficult to objectively gauge and compare the training effectiveness claims of training technologies because there are many different approaches to assessing training effectiveness, and the most robust measures are often not used due to their resource and cost requirements. Training Effectiveness Readiness Levels (TERLs) help eliminate this ambiguity by providing a standard benchmark of the level of scrutiny by which a training system has been evaluated and demonstrated to produce effective training. TERLs comprise a standardized nine-level scale that defines levels of progressive scrutiny by which a training system can be assessed.

INTRODUCTION

Training expenditures are cyclical in nature and follow general patterns in the economy, yet the use and adoption of training technologies has been reaching new heights, accounting for over $1.5 billion globally in 2012 (Ambient Insight, 2013). At the same time, domains that traditionally had not benefited from simulation training (e.g., medicine, ground military forces) have increased their use of simulation-based training. This has occurred in particular as a reaction to emerging challenges, such as economic pressures and reductions in resources, that demand greater flexibility and efficiency from training technology (e.g., Bell, Kanar & Kozlowski, 2008). Simulation technologies in particular have presented themselves as a capable means to address the flexibility and experiential learning needs of emerging training challenges (Bell & Kozlowski, 2007).

While the adoption of different types of training technologies continues to increase, a major challenge faced by all these expenditures is the lack of training impact assessments, in particular the limited ability to quantify the benefits of such training (e.g., Government Accountability Office, 2013). Assessment of a training system is paramount given that the value added by such a system lies in its ability to produce learning that an individual can then utilize in an operational environment. Without such assessment, this value, and its risk, remain unknown; in the same manner in which a system promises positive training results, it may unknowingly produce negative training results, which could be catastrophic once a student returns to the operational environment. Unfortunately, assessing and quantifying the impact of any training is not trivial due to a variety of challenges that range from the technical (e.g., variety of theories, limited skill sets in evaluation methodology) to the logistical (e.g., lack of support from stakeholders, cost and complexity of evaluations) (Phillips, 2010). Often for these reasons, training assessment is relegated to an afterthought or conducted with the least resource-consuming methods (Champney et al., 2008; Carnevale & Shultz, 1990; Eseryel, 2002; Bassi & van Buren, 1999; Thompson, Koon, Woodwell, & Beauvais, 2002).

In addition, given the nature of the training construct (i.e., something that is learned, retained, and applied later in an operational environment; Pennington, Nicolich, & Rahm, 1995; Thorndike & Woodworth, 1901), it is possible to assess different elements of this process, such as a student's perceptions (e.g., self-efficacy), learned content, transfer of learning, and so forth, all of which may be labeled under training effectiveness evaluation (TEE). In some instances a system's technical capabilities are used as proof of its training adequacy or effectiveness. The result is that systems are evaluated using a wide range of methods and levels of scrutiny, such that results are neither comparable across systems nor meaningful unless one understands the method and criteria used to conduct the evaluation.

In order to address this challenge, it is necessary to have a framework that objectively defines the parameters that govern the level of scrutiny and validity of different approaches to assessing training effectiveness. The Training Effectiveness Readiness Levels (TERLs) seek to address this by defining a progressive scale of training assessment scrutiny.

BACKGROUND

Readiness Level Scales

The use of Readiness Level (RL) scales is not a new concept; multiple variants are currently in use, which attests to the value of the approach. RL scales have proven to help demonstrate the maturity of scientific research, products, and ideas to consumers, sponsors, and industry as a whole. Groups using their own RL systems include NASA (Mankins, 1995), the Department of Defense (2010), and the Federal Aviation Administration (Krois & Rehmann, 2005). Each RL scale has specific definitions relevant to the unique needs of its respective field, but all are based on, and are modifications of, NASA's Technology Readiness Level (TRL) scale. NASA TRLs can be described as a systematic measurement system to assess a technology's maturity and to serve as a point of comparison between the maturity of different types of technology (Mankins, 1995). Its measures are focused on specific kinds of technologies, which limits it from being directly applied to other industries, but it does serve as a foundation for defining an adequate RL scale (Table 1).

Table 1. NASA Technology Readiness Levels (Mankins, 1995)

TRL / Description
9 / Actual system “flight proven” through successful mission operations.
8 / Actual system completed and “flight qualified” through test and demonstration (ground or space).
7 / System prototype demonstration in a space environment.
6 / System/subsystem model or prototype demonstration in a relevant environment (ground or space).
5 / Component and/or breadboard validation in relevant environment.
4 / Component and/or breadboard validation in laboratory environment.
3 / Analytical and experimental critical function and/or characteristic proof-of-concept.
2 / Technology concept and/or application formulated.
1 / Basic principles observed and reported.
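
To make the structure of such a scale concrete, the following sketch (illustrative only; the class and function names are hypothetical and not part of any published TRL or TERL tooling) shows how a nine-level readiness scale such as the one in Table 1 might be represented programmatically so that the maturity of different systems can be compared by level.

# Illustrative sketch only: names and structure are hypothetical assumptions.
# It shows how a nine-level readiness scale (cf. Table 1) could be represented
# so that assessed systems can be compared by maturity level.
from dataclasses import dataclass

# Abbreviated level descriptions paraphrased from Table 1 (Mankins, 1995).
NASA_TRL = {
    1: "Basic principles observed and reported",
    2: "Technology concept and/or application formulated",
    3: "Analytical and experimental proof-of-concept",
    4: "Component/breadboard validation in laboratory environment",
    5: "Component/breadboard validation in relevant environment",
    6: "System/subsystem prototype demonstration in relevant environment",
    7: "System prototype demonstration in a space environment",
    8: "Actual system completed and 'flight qualified'",
    9: "Actual system 'flight proven' through successful mission operations",
}

@dataclass
class AssessedSystem:
    name: str
    level: int  # 1 (least mature) through 9 (most mature)

    def description(self) -> str:
        return NASA_TRL[self.level]

def more_mature(a: AssessedSystem, b: AssessedSystem) -> AssessedSystem:
    """Return the system that has demonstrated the higher readiness level."""
    return a if a.level >= b.level else b

if __name__ == "__main__":
    sim_a = AssessedSystem("Trainer A", level=4)
    sim_b = AssessedSystem("Trainer B", level=7)
    winner = more_mature(sim_a, sim_b)
    print(f"{winner.name} is more mature: TRL {winner.level} - {winner.description()}")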

Other groups have modified NASA's TRL scale to address their individual needs. For instance, the Federal Aviation Administration's (FAA) TRLs provide a model for research, development, and implementation of flight technology that defines a phased approach outlining what is required of both the Research and Development (R&D) organizations and the FAA (Krois & Rehmann, 2005; cf. Free Flight Research Program Plan [FAA, 2000] and Air Traffic Management Research and Technology Development [FAA, 2002]). This RL model is distinguished by its focus on exit criteria and acquisition, so that the research organization can be phased out and the FAA can focus on internal refinements at the higher RL levels (e.g., TRLs 7 through 9, where integration issues and adaptation requirements are pursued). Outside of the more technology-focused areas, other approaches have applied the RL concept to specific issues such as Human Factors. This is because, while a technology may have matured within a TRL scale, it may still prove to be unusable by a human operator. The Human Factors Readiness Level (HFRL) scale (Hale, Fuchs, Carpenter, & Stanney, 2011) provides a means to standardize a Human Factors (HF) readiness assessment that can be used by decision makers in a wide range of positions. The primary goal of the HFRL is to provide a way of assessing the quality of a project's HF aspects and of helping to solve related issues by improving HF considerations, optimizing HF R&D resources, and ensuring there are no gaps in the HF R&D process. HFRLs are measured with respect to 24 HF study areas identified in Krois and Rehmann (2005), and the HFRL stages are based on a modified aggregate of several other TRL systems.

Using a similar approach to the HFRLs, the TERL scale identifies the training maturity of a training system based on the level of scrutiny applied in assessing its training efficacy. In order to understand the makeup of the TERL scale, it is necessary to review the elements that impact the training effectiveness of training simulation technology and of training in general. These are discussed next.

Simulation Training

Simulations, in general terms, are representations of an environment, which may be real or fictional, for the purpose of recreating an experience (Bell, Kanar & Kozlowski, 2008). Simulation training derives its capabilities from situated learning theory (Bossard & Kermarrec, 2006), which proposes that learning occurs as a function of three components: activity, context, and culture (Lave & Wenger, 1990). This implies that its training efficacy stems from its ability to recreate experiences similar to those in the real operational environment that are critical for learning (Stanney, Hale, & Cohn, 2012). In training, simulation technology serves as a surrogate environment intended to replace a specific operational environment in order to minimize risks, costs, and resource use, and to maximize opportunities for learning and rehearsal. Simulation training encompasses the use of virtual reality, serious games, interactive media, predictive models, and imitation activities (Page & Smith, 1998). A simulation used for training models a task's critical sensory information over a duration of time to replicate the instructionally relevant aspects of the task being trained. This allows experience to be gained by applying critical task skills and reacting to relevant cues, and it allows a student to both practice and be tested on the proper application of procedural, psychomotor, knowledge-based, and other tasks. The Federal Aviation Administration (2008) has stringent guidelines about the use of simulation for a task, requiring the capability to replicate the sensory cues relevant to the task being trained.
Simulation is used to train novices and experts alike in mastering a wide range of skills (Alessi, 1988). In the context of training, simulation allows repeated practice in new scenarios where conditions can be controlled and situations created that allow students to apply their knowledge. This experience allows students to gain competency prior to performing the tasks in the real world (Aldrich, 2009). Simulation is also used to track performance, helping instructors predict the capabilities of a student and gauge areas where further instruction or practice may be needed. Simulation also gives instructors the ability to add new levels of difficulty to tasks students already know well, or to provide practice at tasks they do not experience very often.
The use of simulation has many benefits. It can save money by allowing novices to practice prior to performing the real tasks, which in many cases are much more expensive than the simulation (Aldrich, 2009). For example, commercial motor vehicle operators may use simple and relatively inexpensive desktop simulators to get a feel for how to make turns in a truck. This allows them to practice backing into loading docks and making turns within a controlled environment where they can view the truck from any angle. This is much safer and more cost effective than actually driving a large vehicle around for practice.
Another major benefit of simulation training is the ability to train for hazardous or emergency situations that would be too dangerous and costly to train in the real environment. For example, having an aviation student practice the emergency water landing of a fully loaded passenger airliner is dangerous and will not be done in a real aircraft. In a simulation, however, the risk to life and property is eliminated, and the remaining risks are such things as simulator sickness and potential negative training; the former can be addressed by simply ending the simulation and the latter by careful design and evaluation of the trainer. With these risks eliminated, students can be presented with difficult and unusual circumstances and learn to mitigate a disaster. Students can repeat the emergency simulations, and instructors can replay scenes to point out mistakes, until the student masters the skills necessary to handle these situations. For instance, simulation training for emergency procedures has been shown to improve risk mitigation when emergencies occur in the real tasks (DeVita, Schaefer, Wang, & Dongilli, 2005). Since emergency situations cannot be readily experienced, simulation plays a critical role in making their mastery part of standard training.

Simulation Fidelity

As discussed above, a key element that makes simulation training effective is its ability to replicate the necessary experiential cues available in an operational environment. The ability to repeatedly reconstruct conditions similar to those in the operational environment is at the core of a simulation's value proposition and face validity. Thus, how well a training system replicates the operational experience is believed to be a good precursor to enabling experiential learning. The level of realism of a simulation is referred to as its fidelity. Nonetheless, absolute and exact replication of an operational environment is not entirely necessary. Early efforts to achieve higher training performance relied on the "identical elements" principle, which stated that the simulated task environment should be designed to have as many elements in common with the operational task environment as possible (Thorndike, 1906). This principle alluded to the necessity for physical fidelity, a notion still in use today, yet if followed blindly it may result in overly complex and expensive simulations (McCauley, 2006; Singley & Anderson, 1989). As a result, a "deep structure" principle was adopted, which sought to replicate the environmental behaviors (i.e., functional fidelity; Allen, Hays, & Buffordi, 1986; Gick & Holyoak, 1987; Lehman, Lempert, & Nisbett, 1988). This in turn served as a precursor to psychological fidelity, where the focus is to ensure that students perceive and act within the training environment as they would in the operational environment (Kozlowski, DeShon, Schifflet, Elliot, Salas & Coover, 2004).

The different types of fidelity may be thought of as being composed of a number of 'cues' that make up the experience (i.e., experiential cues). These cues may be organized into sensory, functional, and psychological cues. Sensory cues are part of the physical fidelity of a system and are those experienced via sensations from the human body's multiple sensory modalities (e.g., visual, auditory, haptic, proprioceptive, and olfactory); they reflect the ability of a simulation to reproduce the physical features of the imitated system. An example would be a driving simulator whose interior cabin, seat, dashboard, and controls are shaped and provide feedback like those of the real vehicle. Functional cues are part of the functional, or operational, fidelity of a system, which is the ability of a simulation to reproduce the behavior of the environment. They define how the environment reacts and behaves in response to inputs from the student or other entities in the environment (i.e., these are cues a student experiences through the reactions of the environment as s/he observes or interacts with it). For example, in a driving simulator this would mean that the controls realistically replicate the type of car being simulated, so that the simulated vehicle reacts to the student's input and to the environment as a real vehicle would in the real world. Psychological cues are part of the psychological fidelity of a system and represent the cognitive (e.g., workload, attention) and affective (e.g., emotional) conditions experienced by an individual while in the operational environment (i.e., what the individual is experiencing internally).
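
As an illustrative sketch only (the class names, fields, and example cues below are hypothetical assumptions, not an established model), the following Python fragment encodes this cue taxonomy so that a training system's coverage of experiential cues could be tallied by fidelity type.

# Illustrative sketch only: names and example cues are hypothetical assumptions.
# It encodes the taxonomy described above (sensory, functional, psychological cues)
# and tallies what fraction of listed cues a training system replicates, per fidelity type.
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List

class CueType(Enum):
    SENSORY = "physical fidelity"            # visual, auditory, haptic, olfactory, ...
    FUNCTIONAL = "functional fidelity"       # how the environment reacts to inputs
    PSYCHOLOGICAL = "psychological fidelity" # workload, attention, emotion

@dataclass
class ExperientialCue:
    name: str
    cue_type: CueType
    replicated: bool  # does the training system reproduce this cue?

def coverage_by_fidelity(cues: List[ExperientialCue]) -> Dict[str, float]:
    """Fraction of cues replicated, grouped by fidelity type."""
    result: Dict[str, float] = {}
    for ct in CueType:
        subset = [c for c in cues if c.cue_type == ct]
        if subset:
            result[ct.value] = sum(c.replicated for c in subset) / len(subset)
    return result

if __name__ == "__main__":
    driving_sim = [
        ExperientialCue("dashboard and control layout", CueType.SENSORY, True),
        ExperientialCue("steering-wheel force feedback", CueType.SENSORY, True),
        ExperientialCue("vehicle response to steering input", CueType.FUNCTIONAL, True),
        ExperientialCue("time pressure under heavy traffic", CueType.PSYCHOLOGICAL, False),
    ]
    print(coverage_by_fidelity(driving_sim))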