Features of Distributed Systems.

Resource sharing: Applications use resources at different locations.

Different resources at different locations (appearing as though all reside locally).

The same resources at different locations.

A combination of the two.

Why?

■ Resources available at different locations can be put to use instead of sitting idle.

■ Not everything needs to be duplicated everywhere.

■ Sharing the job load potentially speeds up processing.

Other features.

■ Transparency

The existence of multiple machines, servers, platforms, etc. should be invisible to users. Users see a single system.

Difficult to achieve. Further division:

● Access transparency: Remote resource access is exactly the same as local access, syntactically and semantically. This raises the issue of global naming of objects and the issue of distributed shared memory.

● Location transparency: The name of an object shouldn't reveal its location. Objects may need to move from one place to another, so a name must stay valid after a move. Hence, name transparency.

Also user mobility: A user can log in from any machine without using rlogin.

● Replication transparency: The existence of replicas of files and resources should be hidden from users. Two issues arise: naming of replicas (mapping a resource name to its replicas) and replication control (how many copies).

● Failure transparency: Partial failures may be tolerated, though with degraded service. This is the fault-tolerance issue. Complete failure transparency is not achievable at this stage of technology, and may be questionable in any case.

● Migration transparency: Linked with location transparency. For better performance or security, a distributed object may need to be migrated. The system should move the object automatically, and the move must be transparent to users. Migration decisions: (a) which object to move, (b) when to move it.

● Concurrency transparency: Achieving concurrency among spatially distributed processes with finite computing resources is a challenge. E.g. preventing concurrent updates of the same file by two different processes. Critical properties:
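
The link between location and migration transparency can be sketched with a toy name service. This is only an illustration under stated assumptions: the class `NameRegistry`, its methods, the object name `payroll.db`, and the node names are all hypothetical and do not come from any real system. The point it shows is that the name a user holds stays fixed while the system changes the location behind it.

```python
class NameRegistry:
    """Hypothetical name service: maps a location-independent name to a node."""

    def __init__(self):
        self._location = {}  # object name -> node currently holding the object

    def register(self, name, node):
        self._location[name] = node

    def migrate(self, name, new_node):
        # The system moves the object; the name that users hold is unchanged.
        if name not in self._location:
            raise KeyError(name)
        self._location[name] = new_node

    def resolve(self, name):
        # Users look objects up by name only and never handle node addresses.
        return self._location[name]


registry = NameRegistry()
registry.register("payroll.db", "node-a")
before = registry.resolve("payroll.db")

registry.migrate("payroll.db", "node-b")  # e.g. for load or security reasons
after = registry.resolve("payroll.db")

print(before, after)
```

The same name resolves before and after the move, which is what keeps the migration invisible to the user. A real system would also have to decide which object to move and when, which this sketch leaves out.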

♦ Event ordering: All access requests to the various resources must be properly ordered for consistency.

♦ Mutual-exclusion guarantee

♦ No-starvation guarantee

♦ Deadlock-free operation
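
The mutual-exclusion property can be illustrated on a single machine. The sketch below uses threads and `threading.Lock` as a stand-in for distributed processes updating one shared resource. That substitution is an assumption made for brevity: a real distributed system would need a distributed mutual-exclusion algorithm, and this lock gives no fairness (no-starvation) guarantee.

```python
import threading

counter = 0                # stands in for the shared file being updated
lock = threading.Lock()    # only one updater may hold this at a time


def worker(updates):
    global counter
    for _ in range(updates):
        # Critical section: the read-modify-write must not interleave.
        with lock:
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: no update is lost
```

With a single lock there is no circular wait, so this sketch cannot deadlock. With several resources and several locks, the deadlock-free property needs separate care.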

● Performance transparency: Ideally the system should reconfigure itself automatically, e.g. by load balancing, to improve or maintain performance. Some nodes being overloaded while other nodes sit idle should not be permitted.

● Scaling transparency: The system should be able to expand in scale without disrupting the activities of its users. Scalable algorithms should be used.

■ Reliability

A distributed system is expected to be more reliable than its corresponding centralized system.

♦ Reliability is enhanced through redundancy.

An $n$-redundancy system. Each component may fail with probability $p$. The probability that the system itself fails is

$$P_{\text{fail}} = p^{n}$$

This assumes that only one component is needed to keep the system running. If $k$ components out of $n$ are needed for a functionality, then the probability of failure is

$$P_{\text{fail}} = \sum_{i=0}^{k-1} \binom{n}{i} (1-p)^{i}\, p^{\,n-i}$$

Two assumptions lie behind the above:

  • all components are independent and have identical failure probabilities
  • if one fails it’s not repaired
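
The redundancy argument can be checked numerically. In the sketch below the symbols are assumptions chosen for illustration: `p` is the failure probability of one component, `n` the number of redundant components, and `k` the number of working components the system needs. It relies on the two assumptions listed above (independent, identical components; no repair).

```python
from math import comb


def failure_prob_any_one(p, n):
    """System fails only if all n components fail (one survivor is enough)."""
    return p ** n


def failure_prob_k_of_n(p, n, k):
    """System fails when fewer than k of the n components are working."""
    q = 1 - p  # probability that a single component works
    return sum(comb(n, i) * q**i * p ** (n - i) for i in range(k))


# Three components, each failing with probability 0.1.
print(failure_prob_any_one(0.1, 3))    # about 0.001
print(failure_prob_k_of_n(0.1, 3, 2))  # about 0.028
```

Requiring two working components out of three instead of one raises the failure probability from roughly 0.001 to roughly 0.028, so redundancy helps less the more components a function depends on.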

If a distributed OS is in charge of controlling and managing these resources, merely increasing redundancy will not by itself achieve more reliability.

Nature of fault:

● A fault could be mechanical or algorithmic, causing system failure.

● Fail-stop (or fault-stop) situation: The system stops functioning at a fault.

● Byzantine fault: The system continues to function but produces wrong results (as with an undetected software error).

● Fail-safe: In the event of a failure, the system falls back to a state in which it causes no harm.

● Fail-secure: Not the same as fail-safe. On failure, the system falls back to a state that keeps it secured. Both terms refer to the state the system is left in after a failure.

E.g. A magnetic door lock is held by electric current. If the current fails, the lock releases and the door opens: the normal fail-safe situation. If, however, the lock remains closed when the current fails, it is fail-secure.

The reliability issues are:

♦ Fault avoidance

♦ Fault tolerance

♦ Fault detection and recovery (repair)

■ Flexibility

A distributed system must appear open and flexible to its users, for the following reason:

● A distributed system usually keeps evolving, both in function and in character. Changes must appear compatible to users. This implies:

♦ Ease of modification. The system must be open and modular.

♦ Ease of enhancement. New services should be easily available within the old or new format.

♦ The OS kernel ought to be a microkernel, so that services can be added outside the kernel. Otherwise, every new service addition could require rebooting the system.

■ Performance

A distributed system must perform at least as well as a centralized system. This is a crucial parameter. If the OS components fail to work together as a distributed system, performance may degrade badly.

♦ Batch as much as possible (multiplexing, piggybacking, ...)

♦ Cache whenever possible

♦ Minimize copying of data

♦ Minimize network traffic

♦ Take advantage of fine-grained parallelism for multiprocessing.

■ Scalability

■ Heterogeneity

■ Security

■ Emulation of existing OS

Compare the following:

  1. A single-server CPU that processes $\mu$ jobs per unit time. Jobs arrive at the open queue at a rate of $\lambda$ jobs per unit time.

  2. A system of $n$ CPUs, each of which processes $\mu/n$ jobs per unit time. The total processing power is still the same, $\mu$. A job arriving at the head of the queue attempts to join a CPU found idle.

The results (for a single-server, infinite-queue system):

$\rho$ = traffic density = $\dfrac{\lambda}{\mu}$

$T$ = mean response time = $\dfrac{1}{\mu - \lambda}$

The corresponding results for the multi-CPU system, with each line resembling a single-server queue with job admission rate $\lambda/n$ and job processing rate $\mu/n$, are

$\rho_n = \dfrac{\lambda/n}{\mu/n} = \rho$

and

$T_n = \dfrac{1}{\mu/n - \lambda/n} = \dfrac{n}{\mu - \lambda} = nT$

Response time is terrible for a multi-CPU system with large $n$, even though each processor receives only $1/n$-th of the total incoming load.
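
The comparison can be tried with numbers. The sketch below assumes the standard M/M/1 model (Poisson arrivals, exponential service times), for which the mean response time is 1/(service rate - arrival rate). The names `lam` (total arrival rate), `mu` (total processing rate), and `n` (number of CPUs) and their values are assumptions picked for illustration.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue (assumed model)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)


lam, mu, n = 8.0, 10.0, 4

# One fast CPU serving the whole arrival stream.
single = mm1_response_time(lam, mu)

# n slow CPUs, each modelled as its own queue with 1/n of the load and power.
split = mm1_response_time(lam / n, mu / n)

print(single)          # 0.5
print(split)           # 2.0
print(split / single)  # 4.0
```

The traffic density is 0.8 in both cases, yet the mean response time grows by the factor `n`, because every job is served by a processor that is `n` times slower.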

Then why use distributed processing?

■ The total processing power of a distributed system can usually be made higher than that of a single machine of the PC type. Cost-wise this is a superb deal.

■ Even if some of the processors are unavailable (crashed or withdrawn from service), the system as a whole doesn't crash.

This is fault tolerance.

More on fault tolerance.

Some questions. Is fault tolerance always desirable? In some cases it may be. But for how long should a fault be tolerated? When should it be considered unacceptable?

Fault tolerance as an emergent property.

Also, an allied concept: healing or repair.

If a fault turns out to be intolerable, could the system attempt to heal itself? Self-healing as a system property!

But what is the system? Does the human observer, the controller who identifies the intolerable fault, belong to the system? …


An operating system framework for a distributed system.

In a centralized system the OS is the resource allocator and resource manager. A distributed system therefore needs a distributed OS that carries out the same set of functions.

Normally, for a centralized system,

For a multi unit system, we need to organize it as

DCE = Distributed Computing Environment

DCE ≠ an operating system

DCE: A TCP/IP-based architecture from the OSF (Open Software Foundation) that furnishes an open-system platform.

Think of a set of individual pillars. Each one can support its own weight and objects placed on top of it.

But to make a bridge on these pillars, all we need is a horizontal slab whose weight all the pillars can take. The horizontal slab is the DCE.

DCE provides integration, communication, and security, plus a common framework for time, directory, and file services. All of these are based on three computing models:

■ Client-server architecture

■ Remote Procedure Calls

■ Shared Files