Mark Chekhanovskiy

CS265 Fall 2002

Research Topic:

Modern Distributed Systems Design

– Security and High Availability

-Measuring Availability

How are resiliency and high availability interconnected?

Define downtime and what causes it.

How to measure availability?

-Highly Available Data Management

Data management is the most sensitive area of modern distributed systems.

Quick overview of existing data topologies

-Redundant System Design

Redundant storage (RAID, Multihosting, Multipathing, Disk Arrays, JBOD, etc.)

Failover Configurations and Management

Introduction to SANs and the Fibre Channel protocol

Security aspects of data management in Storage Area Networks

Resilience and high availability mean that all of a system’s failure modes are known and well-defined, including those of its networks and applications. They mean that the recovery times for all known failures have an upper bound; we know how long a particular failure will have the system down. While there may be certain failures we cannot cope with very well, we know what they are, we know how to recover from them, and we have backup plans for use if our recoveries don’t work. A resilient system is one that can take a hit to a critical component and recover, coming back in a known, bounded, and generally acceptable period of time.

Measuring availability always turns out to be about measuring cost. In mission-critical systems, availability is the single most important property to consider. We must treat its cost not merely as a burden, but be able to evaluate it whenever “100%” uptime is demanded. Unfortunately, simply pairing two servers that each offer 99% uptime will not make the system 99.99% available, even though basic math might suggest so. Downtime can be defined as follows: “If a user cannot get his job done on time, the system is down.”
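Where does the naive 99.99% figure come from? Assuming fully independent failures and instantaneous failover (which no real pair achieves), each server is down with probability 0.01, so both are down simultaneously with probability 0.01 × 0.01 = 0.0001, i.e., 99.99% combined uptime. Failover time, shared components, and correlated failures all eat into that theoretical figure.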

Percentage Uptime / Percentage Downtime / Downtime per year / Downtime per week
98% / 2% / 7.3 days / 3h22m
99% / 1% / 3.65 days / 1h41m
99.8% / 0.2% / 17h30m / 20m10s
99.9% / 0.1% / 8h45m / 10m5s
99.99% / 0.01% / 52.5m / 1m
99.999% / 0.001% / 5.25m / 6s
99.9999% / 0.0001% / 31.5s / 0.6s
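The table follows directly from the percentages. A minimal sketch that reproduces the downtime arithmetic (the function name is my own):

```python
# Convert an uptime percentage into downtime per year and per week.
HOURS_PER_YEAR = 365 * 24        # 8,760 hours
MINUTES_PER_WEEK = 7 * 24 * 60   # 10,080 minutes

def downtime(uptime_percent):
    """Return (hours of downtime per year, minutes of downtime per week)."""
    down_fraction = 1.0 - uptime_percent / 100.0
    return down_fraction * HOURS_PER_YEAR, down_fraction * MINUTES_PER_WEEK

for pct in (98, 99, 99.8, 99.9, 99.99, 99.999, 99.9999):
    per_year, per_week = downtime(pct)
    print(f"{pct}%: {per_year:.2f} h/year, {per_week:.2f} min/week")
```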

There are many causes of system downtime; they fall into the following categories:

  • Planned – the easiest to reduce; includes scheduled system maintenance, swapping hot-swappable hard drives, cluster upgrades, and even failovers. Usually about 30% of all downtime;
  • People, or the human factor – mistakes happen, and the complexity and pace of innovation in IT equipment, software, and protocols demand ever greater knowledge from engineers. Usually about 15% of all downtime;
  • Software failures – due to software bugs and viruses. (About 40%)

Availability = MTBF / (MTBF + MTTR), where MTBF is the “mean time between failures” and MTTR is the “maximum time to repair”.
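To make the formula concrete (the numbers are illustrative, not from the source):

```python
def availability(mtbf_hours, mttr_hours):
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A server that fails every 10,000 hours and takes 1 hour to recover:
print(f"{availability(10_000, 1):.4%}")   # 99.9900%
```

Note that shrinking MTTR improves availability just as effectively as stretching MTBF, which is why the failover techniques below focus on minimizing repair time.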

So what could go wrong in Modern Distributed Systems?

  • Hardware
  • Environmental and Physical Failures
  • Network Failures
  • Database System Failures (application crash or hangs, resource shortfalls, index corruption, buggy software)
  • Web Server Failures
  • File and Print Server Failures

The Cost of Downtime:

Industry / Business Operation / Average Downtime cost per hour
Financial / Brokerage Operations / $6.45M
Financial / Credit Card/Sales Authorization / $2.6M
Media / Pay per view TV / $150K
Retail / Catalog sales / $90K-$115K
Transportation / Airlines / $89.5K

Levels of Availability:

  1. Regular Availability – Do Nothing Special;
  2. Increased Availability – Protect the Data (use RAID);
  3. High Availability – Protect the System (loosely coupled servers; hits the jackpot of 99.98% availability);
  4. Disaster Recovery – Protect the Organization;
  5. Fault-Tolerant Systems – systems built from double- and triple-redundant components working in parallel.

Disks, and the data stored on them, are the most critical part of just about any computer system. Why?

  1. Disks are the most likely component to fail
  2. Disks contain data
  3. The data must be protected
  4. Data accessibility must be ensured

How storage is managed:

  1. Disk(s) inside the computer system (e.g., SCSI). SCSI-1 ran at 3-5 MB/s and evolved to Ultra-3 at 160 MB/s. Initiators and targets are connected to a shared bus.
  2. Fibre Channel: 100 MB/s, evolving to 2 Gb/s. Devices can be up to 2 kilometers apart on a Fibre Channel network.
  3. Multi-hosting – one set of disks connected to more than one server.
  4. Multi-pathing – connecting a single host to a single disk array over more than one data path.
  5. Disk array – a single enclosure or cabinet containing slots for many disks.
  6. JBOD (Just a Bunch Of Disks) – a collection of disks with no hardware intelligence.
  7. SAN – Storage Area Network (breaks the one-to-one relationship between server and storage: storage virtualization, better utilization, centralized management and allocation, intrinsic resilience and high availability, no need for disk co-location, complex failover configurations, efficient resource deployment, LAN-free backups).

The RAID standard describes several ways to combine and manage a set of independent disks so that the resultant combination provides a level of disk redundancy.

RAID-0: striping – each chunk of data to be written to disk is broken into smaller segments, with each segment written to a separate disk. (Increases performance but decreases availability, since the loss of any one disk destroys the whole volume.)

RAID-1: mirroring – a model in which a copy of every byte on a disk is kept on a second disk. Keeping more than one copy of the data increases the level of redundancy. It provides good protection against the inevitable loss of a disk, and read performance is better than with a single disk. However, it requires 100% disk overhead, and resynchronizing a replaced drive requires a block-for-block copy of the entire contents of the original disk, which takes time and a great deal of I/O.

RAID-1+0 – Striped Mirrors

RAID-3, -4, and -5: parity RAID – each RAID volume requires space equivalent to one extra disk. This additional disk’s blocks contain calculated parity, generated by taking the XOR of the contents of the corresponding data blocks on all the other disks in the RAID volume (a toy sketch follows the RAID-5 entry).

RAID-3 – Virtual Disk Blocks. Every disk operation touches all disks regardless of the size of the write.

RAID-4 – Dedicated Parity Disk. The entire disk is devoted to RAID volume parity.

RAID-5 – Striped Parity Region. Works like RAID-4, but the parity region is striped across all disks.
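A toy sketch of the parity mechanism shared by RAID-3/4/5 (block contents and helper names are invented): the parity block is the XOR of the corresponding data blocks, so any single lost block can be rebuilt by XOR-ing the survivors with the parity.

```python
def xor_blocks(blocks):
    """XOR a list of equal-sized byte blocks, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [b"disk0blk", b"disk1blk", b"disk2blk"]   # toy data blocks
parity = xor_blocks(data)                        # stored on the parity disk

# Disk 1 dies: rebuild its block from the surviving disks plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```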

Windows and UNIX operating systems are using Journaled FileSystem (JFS) to reduce the risk of loosing data after system crashes and corresponding increase of system availability by limiting the need for full disk scan. JFS reserve chunk of disk space for a journal or intent log. When a write occurs an entry is written is written to the journal before the write is completed. The journal entry contains enough information for the filesystem to recreate the transaction.
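A toy model of the intent-log idea (the file name and record format are invented, not any real filesystem’s on-disk layout): the journal entry reaches stable storage before the in-place write begins, so a crash in between is recoverable by replay.

```python
import json
import os

JOURNAL = "intent.log"   # hypothetical journal file

def journaled_write(path, offset, data):
    # 1. Log enough information to redo the write, and force it to
    #    stable storage *before* touching the real file.
    entry = {"path": path, "offset": offset, "data": data.hex()}
    with open(JOURNAL, "a") as j:
        j.write(json.dumps(entry) + "\n")
        j.flush()
        os.fsync(j.fileno())
    # 2. Perform the in-place write; a crash here is recoverable.
    with open(path, "r+b") as f:   # toy model: assumes the file exists
        f.seek(offset)
        f.write(data)

def replay_journal():
    """After a crash, redo every logged write (replay is idempotent)."""
    with open(JOURNAL) as j:
        for line in j:
            e = json.loads(line)
            with open(e["path"], "r+b") as f:
                f.seek(e["offset"])
                f.write(bytes.fromhex(e["data"]))
```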

When designing highly available systems, we need to minimize the maximum time to repair. One practical solution is to set up two identical systems and tie them together, so that if one system fails, the other can take over. The migration of services from one server to another is called failover. Failover must meet the following requirements:

  • Transparent to clients;
  • Quick (no more than 5 minutes, ideally 0-2 minutes);
  • Minimal manual intervention, with guaranteed data access.

These can be achieved by having the following components (a heartbeat sketch follows the list):

  • Two servers: one primary, the other takeover;
  • Two network connections, with a third highly recommended: a pair of heartbeat networks that run directly from one server to the other but are completely independent of each other; the second is the public network, and the third connection is for administrators.
  • All disks in a failover pair should have some sort of redundancy; mirroring is always preferred over parity RAID.
  • Application portability – the application must run on both servers, one server at a time.
  • No single point of failure: if there is any component in the failover pair whose failure would cause both servers to go down or otherwise become unavailable, then we do not have high availability.
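A skeletal sketch of the heartbeat mechanism (addresses, timeouts, and the takeover hook are invented for illustration): the takeover server declares the primary dead only after several consecutive missed heartbeats on the private network, then claims its services.

```python
import socket
import time

HEARTBEAT_ADDR = ("10.0.0.1", 9000)   # hypothetical private heartbeat network
PROBE_TIMEOUT_S = 2.0                 # how long to wait for each echo
MISSES_ALLOWED = 3                    # consecutive misses before failover

def take_over():
    # Hypothetical hook: import the shared disks, claim the service
    # IP address, and start the applications locally.
    print("primary presumed dead: taking over services")

def primary_alive(sock):
    """Send one heartbeat probe and wait for the primary's echo."""
    try:
        sock.sendto(b"ping", HEARTBEAT_ADDR)
        sock.recv(16)
        return True
    except socket.timeout:
        return False

def monitor():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(PROBE_TIMEOUT_S)
    misses = 0
    while True:
        misses = 0 if primary_alive(sock) else misses + 1
        if misses >= MISSES_ALLOWED:
            take_over()
            return
        time.sleep(1.0)
```

Requiring several consecutive misses guards against failing over on a single dropped packet, and the second, independent heartbeat network guards against mistaking a cable failure for a server failure.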

The simplest and most common failover configurations are the two-node kind, of which there are two varieties: asymmetric and symmetric. In the asymmetric configuration, one node is active and doing critical work, while its partner node is a dedicated standby, ready to take over should the first node fail. In the symmetric configuration, both nodes do independent critical work; should either node fail, the survivor steps in and does double duty, serving both sets of services until the failed node can be brought back into service.

Security in IP Storage Networks:

The notion of transporting mission-critical storage data over untrusted networks would be unacceptable if auxiliary security methods were unavailable.

Off-the-shelf technologies include firewall products, authentication and encryption utilities, and VLANs for data centers, as well as VPNs for wide-area applications. Adapting FCP to IP transport, as the iFCP protocol does, lets even Fibre Channel end devices benefit from the advanced security utilities developed for IP.

Security in Fibre Channel SANs

Elementary security for Fibre Channel SANs is enforced by physical separation of networks: Fibre Channel assumes a dedicated network, isolated from user and public networks by its use of a unique protocol. A Fibre Channel SAN’s vulnerability to penetration is exposed through its management interface, which uses SNMP over Ethernet. Access to fabric management would not give a hacker direct access to storage data, but it would allow a storage resource to be reassigned to a different server. The typical security concerns of Fibre Channel SANs are therefore focused not on penetration from outside, but on segregation of storage resources within the SAN. Fibre Channel enforces basic separation of applications and departments within a SAN through zoning.

There are three types of zoning (a toy model of zone enforcement follows the list):

1) Port, or hard, zoning – restricts access based on devices’ physical port attachments to the switch.

2) WWN zoning – a more flexible zoning mechanism for Fibre Channel SANs that uses World Wide Names (WWNs) instead of port connections to enforce zones. The WWN is unique to every Fibre Channel device and is registered with the fabric during SNS[1] login. WWN zoning can be spoofed, just as a source IP address can be manipulated to bypass security enforcement.

3) LUN masking – zoning based on logical units. In a multivendor environment, the administrator may have to use several different configuration utilities to perform LUN masking on different storage resources.
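A toy model of how a fabric might enforce WWN zoning (zone names and WWNs are made up): two devices may see each other only if some zone contains both.

```python
# Hypothetical zone database: zone name -> set of member WWNs.
ZONES = {
    "payroll_zone": {"10:00:00:00:c9:2a:11:01", "50:06:0e:80:00:c3:00:01"},
    "dev_zone":     {"10:00:00:00:c9:2a:11:02", "50:06:0e:80:00:c3:00:02"},
}

def can_communicate(wwn_a, wwn_b):
    """Two devices may talk only if they share membership in some zone."""
    return any(wwn_a in members and wwn_b in members
               for members in ZONES.values())

print(can_communicate("10:00:00:00:c9:2a:11:01",
                      "50:06:0e:80:00:c3:00:01"))   # True: same zone
print(can_communicate("10:00:00:00:c9:2a:11:01",
                      "10:00:00:00:c9:2a:11:02"))   # False: different zones
```

As the text notes, this check is only as trustworthy as the WWNs themselves, since a WWN can be spoofed.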

Aside from port and WWN zoning and proprietary LUN masking implementations, Fibre Channel has no integrated security facilities. The FC-3 common services layer is a placeholder for data encryption and other functions, but these have not been fully formulated or implemented.

Security Options for IP Storage Networks

A distinct advantage of IP SAN technology is the ability to leverage sophisticated security utilities for authentication and data encryption.

1st line of defense – iSNS. As the central repository of data for device discovery and discovery enforcement, the iSNS server is a logical place to host security services. As part of the registration process, for example, an IP storage device could register its X.509 public-key certificate with the iSNS server. Once discovery domains are established, the iSNS server can distribute the appropriate public keys among devices in the same domain.

2nd line – LUN masking, as in Fibre Channel, and VLAN tagging, which enables individual packets to be associated with designated groups of authorized hosts. VLAN tagging can be used to enforce traffic separation and prioritization through a switched Ethernet.

3rd line – IP Security (IPsec), which has two main components: authentication of the identity of communicating peers on the network, and data encryption.

4th line – Gigabit Ethernet switches and IP routers may support policy-based security controls through access control lists (ACLs), specifying which groups of IP addresses are permitted or denied access to a destination address; policies can also be based on the type of IP traffic (as sketched below).
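A hedged sketch of first-match ACL evaluation (the rule format is my own, not any vendor’s CLI syntax): each packet is matched against an ordered permit/deny list keyed on source network and traffic type.

```python
import ipaddress

# Ordered rules: (action, source network, traffic type); first match wins.
ACL = [
    ("permit", ipaddress.ip_network("192.168.10.0/24"), "iscsi"),
    ("deny",   ipaddress.ip_network("0.0.0.0/0"),       "iscsi"),
]

def allowed(src_ip, traffic_type):
    addr = ipaddress.ip_address(src_ip)
    for action, network, kind in ACL:
        if addr in network and kind == traffic_type:
            return action == "permit"
    return False   # no matching rule: default deny

print(allowed("192.168.10.7", "iscsi"))   # True: inside the permitted subnet
print(allowed("10.1.2.3", "iscsi"))       # False: caught by the deny-all rule
```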

References:

  1. Evan Marcus and Hal Stern, Blueprints for High Availability: Designing Resilient Distributed Systems
  2. Ross Anderson, Security Engineering: A Guide to Building Dependable Distributed Systems
  3. Tom Clark, IP SANs: A Guide to iSCSI, iFCP, and FCIP Protocols for Storage Area Networks
  4. On-line article: “An intrusion-tolerant security server for an open distributed system”
  5. On-line article: “Transparent Fault Tolerance for Enterprise Applications”

[1] SNS – Storage Name Server, used for device discovery; iSNS – Internet Storage Name Service, its counterpart in IP SANs.