© Kindunos Safety Consultancy Ltd, Gorinchem, the Netherlands

Contents

1.  Introduction

2.  History of investigations

a.  Four schools of thought

b.  Elaboration on system safety frameworks

c.  A role for accident investigations

3.  System change management

a.  safety interventions

b.  the use of communication metaphors

4.  Confusion and notions

a.  dealing with complexity as a mathematical issue

b.  dealing with complexity as a social issue

c.  dealing with uncertainty

d.  interdisciplinary or problem oriented

e.  knowledge management

f.  knowledge innovation

5.  About models and methods

a.  paradigmatic shift in modeling

b.  from model to method

6.  Unraveling complexity in practice: tools and techniques

a.  the event

b.  the usefulness of forensics

c.  applying time as a diagnostic dimension

d.  the system

e.  system change and adaptation

7.  Summary

References

Correspondence address:

Gorinchem, the Netherlands

June 2013

During his presentation on precaution and accident investigation at a Conference on Ethical Aspects of Risk at Delft University of Technology (14-16 June 2006), an aerospace engineer and safety investigator overheard two social scientists working on ethics in the audience saying:

“There is still one out there who has the nerve to defend himself”

Preface

After publishing the book Shaping public safety investigations of accidents in Europe in 2005 and the Guidelines for safety investigations of accidents in 2009, ESReDA now intends to present its achievements on learning from safety investigations at the ESReDA Seminar in Porto, Portugal, in autumn 2013.

Simultaneously, the Resilience Engineering Association organizes its 5th conference in the Summer of 2013 in Soesterberg, the Netherlands, focusing on managing trade-offs. The conference will address current challenges organizations face, why resilience is needed to address those challenges, and how Resilience Engineering can be put into practice.

During the process of preparing the products of the working group and the conference, discussions have focused on notions of resilience engineering, organizational learning, complexity, case-based analysis and investigation models. Such discussions have been guided by the intention to provide a practical approach to safety investigations that could be applied across various domains and industrial sectors to minor as well as major events. It became clear that several schools of thought exist, related to specific domains and scientific disciplines, creating confusion, differences in interpretation and controversies between theoretical approaches and best practices.

This essay elaborates on several basic notions, scientific principles and theories that, mostly implicitly, lie behind the conduct of safety investigations, in order to clarify and resolve the ambiguities that presently exist. Researchers from various scientific disciplines and application domains present alternatives for safety investigations and safety and risk management. They challenge current retrospective investigation approaches by advocating new approaches that should make investigations obsolete, such as:

-  Design of socio-technical systems: Resilience engineering and safety management

-  Sophistication of modeling: Bayesian belief networks, functional resonance.

-  Introduction of sociological issues: Safety Culture, Leadership and Governance.

This essay, however, focuses on enhancing the diagnostic capability of safety investigations themselves, addressing several critical notions for academics and practitioners in the investigation community:

-  there are several dilemmas and pitfalls in dealing with current investigation notions and perspectives. Normative constructs are applied without being made explicit or clarified as valid and applicable to the investigation process. Examples are the phenomenon of complexity and the unavoidability of emergent system behavior as unknown-unknown properties in operations; the value of a pro-active over a reactive approach as a feasible replacement of feedback learning; and the rejection of a technical perspective in favor of social and organizational perspectives, where social scientists mistakenly bark up the Newtonian design and engineering tree instead of the Tayloristic tree of scientific management and organizational failure.

-  staying away from scientific debates on the validity of notions and concepts across disciplines, on the use and abuse of metaphors, methods and models, on the validity of abolishing event analysis in favor of modeling and substantiation of systems behavior without the burden of proof of collecting evidence on a case study basis, and on a refocusing from accident investigation towards optimization of primary production and control processes. Coexistence between the various schools of thinking is advocated by senior scientists from almost every discipline. Such coexistence could favor a theoretical and methodological basis for the emergence of an investigation science, which by definition represents the ability to conduct investigations based on a familiarity with a broad range of disciplines and the ability to pursue several lines of investigation simultaneously.

-  the emergence of IT and software engineering has made new scientific notions and technological expertise available for application to the notion of safety. Instead of the conventional mechanical notions of ‘failure’, ‘load concept’ and ‘kinetic energy’, notions of ‘complexity’ and ‘emergent properties’ appear. Such notions shift the focus from the structure and architecture of systems to process control and performance. This shift can be traced back to inherent problems with software reliability, certification and testability of embedded IT software and system designs, as demonstrated by the Y2K bug, viruses, hacking and cybercrime.

-  the use of communication metaphors as analytic and solution-generating tools in applying constraints on operator behavior from a managerial perspective, while such metaphors are based solely on a classic concept of energy transfer, which is proven invalid in the domains of cognition, decision making and governance. In order to communicate manipulating solutions for modern software interactions and decision making algorithms, the metaphor of the Rubik's Cube is introduced.

-  new developments and concepts, such as resilience engineering and forensic engineering, could be mutually supportive in enhancing safety and managing complex and dynamic systems. In addition to the existing technological diagnostic potential, an organizational and social diagnostic framework and toolkit will be welcomed in the community of safety investigations. Such cooperation across disciplinary boundaries is in an early phase of development.

Finally, cooperating with operational experts and incorporating operational experience in the investigation process are indispensable in identifying systemic and knowledge deficiencies. All parties and perspectives should be incorporated in the investigation.

Involving operational expertise and experience:

In an article on High-Altitude Upset Recovery in Aviation Week, Captain C. B. "Sully" Sullenberger described his ditching in the Hudson as a seminal accident. "We need to look at it from a systems approach, a human/technology system that has to work together. This involves aircraft design and certification, training and human factors. If you look at the human factors alone, then you're missing half or two-thirds of the total system failure..."

He also emphasizes the importance of the availability of primary flight parameters: “accurate airspeed indications alone aren't the best data the crew needs to recover from an upset. That requires knowing the wing's angle of attack (AoA). We have to infer angle of attack indirectly by referencing speed. That makes stall recognition and recovery that much more difficult. For more than half a century, we've had the capability to display AoA (in the cockpits of most jet transports), one of the most critical parameters, yet we choose not to do it.”

Involving scientific modeling and human behavior theories:

On the one hand, the chief investigator of the BEA concludes in “The final word: Air France flight 447” (Troadec 2013) that the combination of the ergonomics of warning design, training conditions and the recurrent training process did not generate the expected behaviour and showed the limits of current safety models.

On the other hand, the vigilance, proficiency and flexibility of qualified air crews proved of paramount importance in the ability to recover from unforeseen and unprecedented events in the case of Qantas Flight QF32. The Airbus A380 loss of containment event was brought to a successful closure by deviating from procedural flight performance and rule-based responses in a potentially unrecoverable failure (ATSB 2010).

Involving certification and inspection authorities:

In the AA Flight 587 case, it took investigators almost 2 years to identify a flaw in the hydraulic system that had been certified as failsafe in the 1960s. Since the FAA and aviation authorities cannot match the resources of companies such as Boeing and Airbus, manufacturers' engineers signed off most of the elements of the Dreamliner battery pack. With final approval left to the governmental agencies, subtle conflicts of interest could influence the assumptions of manufacturers' engineers in earlier phases of certification. As stated by Tom Haueter, the NTSB chief investigator on AA flight 587: “It’s the assumptions that kill you, and if things do not work out the way you planned, things can go very bad, very fast”.

1. Introduction

Major transitions in socio-technical system developments have been argued for on the basis of internal and external conditions. Internal factors focus on performance pressure within a system in order to control required growth, modal shift demands, intelligent operation and expansion of transport services. External factors should be integrated in future systems development, dealing with land use, detrimental environmental effects, sustainability and safety. Such transitions have changed the operating environment of safety investigation theory and practices.

Technological innovation

Major issues in several modes of transportation have led to such system pressure that major changes should be introduced. A ‘system leap’ forward may be inevitable. Rather than applying proven technology and pragmatic improvements on a detailing level, technological innovation may be necessary to overcome constraints in system development.

Conceptual change

A ‘system leap’ approach may also introduce conceptual changes of a non-technological nature in the architecture of complex systems, dealing with business modeling, market development and globalization. External pressure exists with respect to spatial planning, land use and urban development, coping with congestion, external safety and environmental constraints in densely populated areas and compact city concepts.

Integrating safety

Historically, safety has been subjected to a fragmented approach. To fulfill internal and external demands, a ‘conceptual leap’ in safety notions may be required. It may become necessary to transform safety from an operational cost into a strategic issue, integrated in each phase of the life cycle of such systems.

The question is how safety should be integrated in these developments in order to guarantee a pro-active and sustainable safe operation throughout their life cycles.

Reading advice

In this publication, some background information is provided on the use of the ESReDA safety investigation method. After this short introduction, chapter 2 provides a scoping and embedding of investigation in the various schools of safety thinking. Such scoping indicates the success rate for learning potential by the feedback loops that are available.

Chapter 3 introduces the scope of intervention potential, dealing either with systems optimization within the operating envelope or systems adaptation by design.

Chapter 4 elaborates on a series of notions, such as complexity, uncertainty and problem orientation and knowledge management.

Chapter 5 discusses the claims for a paradigm shift and the transition from model to method.

Chapter 6 elaborates on the various tools and techniques that are available to operate the ESReDA method, comprising forensic sciences, time as a diagnostic dimension and the separation between recomposing the event and modeling the system.

2. History of investigations

Accident investigation as an analytic tool has its origin foremost in all modes of the transportation industry and expanded rapidly to the process industry and the energy sector. Gradually, investigation principles have been applied to sectors with a less prominent technological signature, such as medicine, firefighting, rescue and emergency. This long and dedicated history has caused a wide variety and divergence of notions, fundamentals, methods and perspectives.

Therefore, generalizing investigative notions have to be put in their historical perspective, tracing back their origins and motives, their ability to enhance safety by embracing a socio-technical systems perspective and the specific role of investigations in this approach (Lees 1960, Edwards 1972, Hendrickx 1991, Benner 1996, Rimson and Benner 1996, ETSC 2001, Leveson 2002, ESReDA 2005, Young et al. 2005, Katsakiori 2008, Benner 2009, ESReDA 2009).

Over the decades, four safety Schools of Thought in high technology complex systems have emerged. At the same time, new technological developments in high technology sectors such as aviation have forced the investigation community to reflect on the next century of investigation theory and practices, which create major challenges for the profession (Hersman 2012, McIntosh 2012).

Moreover, challenges are to be met in a competitive and open market on a global scale (EU 2011):

In 2050, the European air transport system is integrated in a complete logistical transport chain and part of a fully interconnected, global aviation system that is based on a multilateral regime rather than on a series of bilateral agreements. Interoperability between Europe and the other regional components of the global network is complete. Commercial air transport services are provided mainly by airlines organized as a few global alliances. Thanks to tight links between technological and regulatory approach, Europe has a global lead in the implementation of international standards covering all aviation issues, including interoperability, the environment, energy, security and safety. This leadership ensures that the global regulatory system enables market access and free, fair and open competition.

Within Europe, the number of commercial flights is projected to reach up to 25 million in 2050, compared to 9.4 million in 2011. At the same time, these plans put high demands on maintaining the present safety level in the aviation sector (EU 2011):

Overall, the European aviation transport system has less than one accident per ten million commercial aircraft flights. For specific operations, such as search and rescue, the aim is to reduce the number of accidents by 80% compared to 2000 taking into account increasing traffic.

2.1 Four schools of thought

Safety in modern transportation systems has been an issue for about 150 years. It evolved as a discipline from several different domains and disciplines and has a strong practical bias.

Consequently, various ‘schools of thought’ have been emerging, of which the most important can be categorized as ‘Tort Law School’, ‘Reliability Engineering School’ and ‘System Safety Engineering School’ (McIntyre 2000). In addition, a fourth school is defined as ‘System Deficiency and Change’ (Stoop 2002).