Technology Final Report
Secure/Resilient Systems and
Data Dissemination/Provenance
September 2017
Prepared for
The Northrop Grumman Cyber Research Consortium
As part of IS Sector Investment Program
Prepared by
Bharat Bhargava
CERIAS, Purdue University
Table of Contents
1 Executive Summary
1.1 Statement of Problem
1.2 Current State of Technology
1.3 Proposed Solution
1.4 Technical Activities, Progress, Findings and Accomplishments
1.5 Distinctive Attributes, Advantages and Discriminators
1.6 Tangible Assets Created by Project
1.7 Outreach Activities and Conferences
1.8 Intellectual Property Accomplishments
2 General Comments and Suggestions for Next Year
List of Figures
Figure 1. High-level view of proposed resiliency framework
Figure 2. Service acceptance test
Figure 3. View of space and time of MTD-based resiliency solution
Figure 4. Moving target defense application example
Figure 5. High-level resiliency framework architecture
Figure 6. System states of the framework
Figure 7. Data d1 leakage from Service X to Service Y
Figure 8. Data Sensitivity Probability Functions
Figure 9. Encrypted search over database of active bundles (by Leon Li, NG “WAXEDPRUNE” project)
Figure 10. Experiment Setup for Moving Target Defense (MTD)
Figure 11. EHR dissemination in cloud (created by Dr. Leon Li, NGC)
Figure 12. AB performance overhead with browser's crypto capabilities on/off
Figure 13. Encrypted Search over Encrypted Database
List of Tables
Table 1. Executive Summary
Table 2. Operations supported by different crypto systems
Table 3. Moving Target Defense (MTD) Measurements
Table 4. Encrypted Database of Active Bundles. Table ‘EHR_DB’
1 Executive Summary
Title: Secure/Resilient Systems and Data Dissemination/Provenance
Author(s): Bharat Bhargava
Principal Investigator: Bharat Bhargava
Funding Amount: $200,000
Period of Performance: September 1, 2016 - August 31, 2017
Was this a continuation of Investment Project?: Yes
TRL Level: 3
Key Words: data provenance, data leakage, resiliency, adaptability, security, self-healing, MTD
Key Partners & Vendors:
Table 1: Executive Summary
1.1 Statement of Problem
In a cloud-based environment, the enlarged attack surface and the constant use of zero-day exploits hamper attack mitigation, especially when attacks originate at the kernel level. In a virtualized environment, an adversary who has fully compromised a virtual machine (VM) and holds system privileges (at the kernel level, not the hypervisor) without being detected by traditional security mechanisms exposes cloud processes and cloud-resident data to attacks that may compromise their integrity and privacy, jeopardizing mission-critical functions. The main shortcoming of traditional defense solutions is that they are tailored to specific threats and are therefore limited in their ability to cope with attacks originating outside their scope. There is a need to develop resilient, adaptable, and reconfigurable infrastructure that can incorporate emerging defensive strategies and tools. These architectures have to provide resiliency (withstanding cyber-attacks while sustaining and recovering critical functions) and antifragility (increasing in capability, resilience, or robustness as a result of mistakes, faults, attacks, or failures).
The volume of information and the real-time requirements on it have increased with the advent of multiple input channels such as emails, texts, voice, and tweets. These inputs arrive at government agencies such as the U.S. State Department for dissemination to many stakeholders. Classified information (cyber data, user data, attack event data) must be identified as classified (secret) and disseminated, based on access privileges, only to the right user in a specific location on a specific device. For forensics and provenance, it must be possible to determine the identity of everyone who has accessed, updated, or disseminated sensitive cyber data, including attack event data. There is a need to build systems capable of collecting, analyzing, and reacting to dynamic cyber events across all domains, while also ensuring that cyber threats do not propagate across security domain boundaries and compromise the operation of the system. Solutions are needed that develop a science of cyber security applicable to all systems, infrastructure, and applications. Current resilience schemes based on replication increase the number of ways an attacker can exploit or penetrate a system. It is critical to design a vertical resiliency solution from the application layer down to the physical infrastructure, in which protection against attacks is integrated across all layers of the system (i.e., application, runtime, network) at all times, allowing the system to start secure, stay secure, and return secure+ (i.e., return with greater security than before) [13] after performing its function.
1.2 Current State of Technology
Current industry-standard cloud systems such as Amazon EC2 provide coarse-grained monitoring capabilities (e.g., CloudWatch) for various performance parameters of services deployed in the cloud. Although such monitors are useful for handling issues such as load distribution and elasticity, they do not provide information about potentially malicious activity in the domain. Log management and analysis tools such as Splunk [1], Graylog [2], and Kibana [3] provide capabilities to store, search, and analyze big data gathered from various types of logs on enterprise systems, enabling organizations to detect security threats through examination by system administrators. Such tools mostly require human intelligence for threat detection; they need to be complemented with automated analysis and accurate threat detection capabilities in order to respond quickly to possibly malicious activity in the enterprise and to provide increased resiliency through automated response actions. In addition, Splunk is expensive.
There are well-established moving target defense (MTD) solutions designed to combat specific threats, but they are of limited use against exploits beyond their boundaries. For instance, application-level redundancy and replication schemes prevent exploits that target the application code base, but fail against code injection attacks that target runtime execution (e.g., buffer and heap overflows) and the control flow of the application.
Instruction set randomization [51], address space randomization [4], runtime randomization [5], and system call randomization [6] have been used to effectively combat system-level (i.e., return-oriented programming and code injection) attacks. System-level diversification and randomization are considered mature and are tightly integrated into some operating systems. Most of these defensive security mechanisms (i.e., instruction/memory address randomization) are effective against their target attacks; however, modern sophisticated attacks require defensive approaches that are deeply integrated into the architecture, from the application level down to the infrastructure, simultaneously and at all times.
Several general approaches have been proposed for controlling access to shared data and protecting its privacy. DataSafe is a software-hardware architecture that supports data confidentiality throughout the data lifecycle [7]. It relies on additional hardware and uses a trusted hypervisor to enforce policies, track data flow, and prevent data leakage. Applications running on the host are not required to be aware of DataSafe and can operate unmodified and access data transparently. Hosts without DataSafe can only access encrypted data, and DataSafe is unable to track data once they are disclosed to non-DataSafe hosts. The use of a special architecture limits the solution to well-known hosts that already have the required setup; it is not practical to assume that all hosts will have the required hardware and software components in a cross-domain service environment.
A privacy-preserving information brokering (PPIB) system has been proposed for secure information access and sharing via an overlay network of brokers, coordinators, and a central authority (CA) [8]. The approach does not consider the heterogeneity of components, such as the different security levels of clients' browsers, different user authentication schemes, and different trust levels of services. The use of a trusted third party (TTP) creates a single point of trust and failure.
Other solutions address secure data dissemination in untrusted environments. Pearson et al. present a case study of the EnCoRe project, which uses sticky policies to manage the privacy of shared data across different domains [9]. In the EnCoRe project, the sticky policies are enforced by a TTP and allow tracking of data dissemination, which makes the approach prone to TTP-related issues. The sticky policies are also vulnerable to attacks from malicious recipients.
1.3 Proposed Solution
We propose an approach for enterprise system and data resiliency that is capable of dynamically adapting to attack and failure conditions through performance/cost-aware process and data replication, data provenance tracking and automated software-based monitoring & reconfiguration of cloud processes (see Figure 1). The main components of the proposed solution and the challenges involved in their implementation are described below.
Figure 1. High-level view of proposed resiliency framework
1.3.1 Software-Defined Agility & Adaptability
Adaptability to adverse situations and restoration of services are essential for high performance and security in a distributed environment. Changes in both service context and user context can affect service compositions, requiring dynamic reconfiguration. Changes in user context can result in updated priorities, such as trading accuracy for shorter response time in an emergency, as well as updated constraints, such as requiring the trust levels of all services in a composition to exceed a particular threshold in a critical mission. Changes in service context can result in failures requiring the restart of a whole service composition. Advances in virtualization have enabled rapid provisioning of resources, tools, and techniques to build agile systems that adapt to changing runtime conditions. In this project, we will build upon our previous work in adaptive network computing [10] and end-to-end security in SOA [11], and on advances in software-defined networking (SDN), to create a dynamically reconfigurable processing environment that can incorporate a variety of cyber defense tools and techniques. Our enterprise resiliency solution is based on two main industry-standard components: the OpenStack cloud management software [12] – Nova, which provides virtual machines on demand – and the Software-Defined Networking (SDN) solution – Neutron, which provides networking as a service and runs on top of OpenStack.
The solution that we developed for monitoring cloud processes and dynamic reconfiguration of service compositions as described in [10] involved a distributed set of monitors in every service domain for tracking service/domain-level performance and security parameters and a central monitor to keep track of the health of various cloud services. Even though the solution enables dynamic reconfiguration of entire service compositions in the cloud, it requires replication, registration and tracking of services at multiple sites, which could have performance and cost implications for the enterprise. In order to overcome these challenges, the proposed framework utilizes live monitoring of cloud resources to dynamically detect deviations from normal service behavior and integrity violations, and self-heal by reconfiguring service compositions through software-defined networking of automatically migrated service instances. A component of this software-defined agility and adaptability solution is live monitoring of services as described below.
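The self-healing step described above can be illustrated with a minimal sketch using the OpenStack Python SDK (openstacksdk): launch a replacement instance of a misbehaving service from a known-good image, repoint traffic to it, and quarantine the suspect instance. This is an illustrative sketch under stated assumptions, not the framework's implementation; the cloud, image, flavor, and network names are placeholders, and reconfigure_route stands in for the Neutron/SDN update.

    # Illustrative sketch (not the framework's code): heal a service by
    # launching a replacement VM and repointing traffic, using openstacksdk.
    import openstack

    def reconfigure_route(conn, old_name, new_server):
        # Placeholder for the SDN step: in practice this would move the
        # service's floating IP or update Neutron ports/flow rules.
        pass

    def heal_service(conn, bad_server_name, image_name, flavor_name, network_name):
        image = conn.compute.find_image(image_name)
        flavor = conn.compute.find_flavor(flavor_name)
        network = conn.network.find_network(network_name)
        # Launch a fresh instance of the service from a known-good image.
        replacement = conn.compute.create_server(
            name=bad_server_name + "-replacement",
            image_id=image.id, flavor_id=flavor.id,
            networks=[{"uuid": network.id}])
        replacement = conn.compute.wait_for_server(replacement)
        reconfigure_route(conn, bad_server_name, replacement)
        # Quarantine (stop, do not delete) the suspect instance so it can be
        # inspected for provenance/forensics before removal.
        suspect = conn.compute.find_server(bad_server_name)
        if suspect is not None:
            conn.compute.stop_server(suspect)
        return replacement

    conn = openstack.connect(cloud="resiliency-testbed")  # hypothetical cloud entry in clouds.yaml
    heal_service(conn, "analytics-svc-1", "analytics-golden-image", "m1.small", "service-net")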
1.3.1.1 Live Monitoring
Cyber-resiliency is the ability of a system to continue degraded operations, self-heal, or otherwise deal with the present situation when attacked [13]. We may need to shut down less critical computations and communications and allow for weaker consistency as long as the mission requirements are satisfied. For this we need to measure the assurance level (integrity/accuracy/trust) of the system from Quality of Service (QoS) parameters such as response time, throughput, packet loss, delays, consistency, acceptance test success, etc.
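As an illustration (not the project's actual model), the assurance level can be approximated as a weighted score over normalized QoS observations; the metrics, weights, and tolerable limits below are assumed values.

    # Illustrative sketch: a weighted assurance score in [0, 1] computed from
    # QoS observations. Weights and tolerable limits are assumed values.
    def assurance_level(qos, weights, tolerable):
        score = 0.0
        for metric, weight in weights.items():
            # 1.0 when the metric is ideal (0); 0.0 at or beyond the tolerable limit.
            score += weight * max(0.0, 1.0 - qos[metric] / tolerable[metric])
        return score

    qos       = {"response_time_ms": 120.0, "packet_loss": 0.01, "failed_acceptance_tests": 1}
    weights   = {"response_time_ms": 0.5,   "packet_loss": 0.3,  "failed_acceptance_tests": 0.2}
    tolerable = {"response_time_ms": 500.0, "packet_loss": 0.05, "failed_acceptance_tests": 5}
    print(assurance_level(qos, weights, tolerable))  # approximately 0.78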
To ensure the enforcement of SLAs and provide high security assurance in enterprise cloud computing, a generic monitoring framework needs to be developed. The challenges involved in effective monitoring and analysis of service/domain behavior include the following:
· Identification of significant metrics, such as response time, CPU usage, memory usage, etc., for service performance and behavior evaluation.
· Development of models for identifying deviations from performance goals (e.g., keeping the total response time below a specific threshold) and security goals (e.g., keeping service trust levels above a certain threshold).
· Design and development of adaptable service configurations and live migration solutions for increased resilience and availability.
Development of effective models for detection of anomalies in a service domain relies on careful selection of the performance and security parameters to be integrated into the models. Model parameters should be easy to obtain and representative of the performance and security characteristics of various services running on different platforms. We plan to investigate and utilize the following monitoring tools, which integrate with OpenStack, in order to gather system usage/resiliency parameters in real time [14]:
- Ceilometer [15]: Provides a framework to meter and collect infrastructure metrics such as CPU, network, and storage utilization. It provides alarms that fire when a metric crosses a predefined threshold, and can send alarm information to external servers.
- Monasca [16]: Provides a large framework for various aspects of monitoring including alarms, statistics, and measurements for all OpenStack components. Tenants can define what to measure, what statistics to collect, how to trigger alarms and the notification method.
- Heat [17]: Provides an orchestration engine to launch multiple composite cloud applications based on templates in the form of text files that can be treated like code, enabling actions such as autoscaling based on alarms received from Ceilometer. (A simplified sketch of this threshold-alarm pattern follows this list.)
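To make the alarm-and-response pattern concrete, the following simplified sketch (plain Python, not the Ceilometer/Monasca/Heat APIs) polls a metric, raises an alarm when it crosses a threshold, and triggers a placeholder response action.

    # Simplified sketch of the threshold-alarm pattern these tools implement.
    # The metric source and the response action are placeholders, not real APIs.
    import random
    import time

    def poll_cpu_utilization(instance_id):
        # Placeholder for a Ceilometer/Monasca query; random values for illustration.
        return random.uniform(0.0, 1.0)

    def scale_out(instance_id):
        # Placeholder response action (e.g., what a Heat autoscaling policy would trigger).
        print("ALARM: scaling out around instance", instance_id)

    def watch(instance_id, threshold=0.8, samples=10, period_s=1.0):
        for _ in range(samples):
            if poll_cpu_utilization(instance_id) > threshold:
                scale_out(instance_id)
            time.sleep(period_s)

    watch("service-vm-01")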
As a further improvement for dynamic service orchestration and self-healing, we plan to investigate models based on a graceful degradation approach to service composition. Based on user-specified or context-based policies, such models replace services that do not pass acceptance tests (see Figure 2) with services that are more likely to pass the tests, at the expense of decreased performance; a minimal sketch of this selection logic follows Figure 2.
Figure 2. Service acceptance test
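The following sketch illustrates the graceful-degradation selection logic under simple assumptions: each candidate service exposes an acceptance test, candidates are ordered from most to least preferred, and a less preferred (slower) candidate is accepted when a preferred one fails its test. The service names and health flags are hypothetical.

    # Illustrative sketch of acceptance-test-based service replacement.
    def passes_acceptance_test(service):
        # Placeholder: invoke the service on a known input and validate the output.
        return service["healthy"]

    def select_service(candidates):
        for service in candidates:  # candidates ordered from most to least preferred
            if passes_acceptance_test(service):
                return service      # first candidate that passes its acceptance test
        raise RuntimeError("no candidate service passed its acceptance test")

    candidates = [
        {"name": "image-analysis-primary", "latency_ms": 80,  "healthy": False},
        {"name": "image-analysis-replica", "latency_ms": 150, "healthy": True},  # degraded but acceptable
    ]
    print(select_service(candidates)["name"])  # image-analysis-replica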
1.3.2 Moving Target Defense for Resiliency/Self-healing
The traditional defensive security strategy for distributed systems is to prevent attackers from gaining control of the system using known techniques such as firewalls, redundancy, replication, and encryption. However, given sufficient time and resources, all of these methods can be defeated, especially when dealing with sophisticated attacks from advanced adversaries that leverage zero-day exploits. This highlights the need for more resilient, agile, and adaptable solutions to protect systems. MTD is a component of the NGC Cyber Resilient System project [13]. Sunil Lingayat of NGC has taken an interest in this work and connected us with other NGC researchers working in Dayton.