Features of Distributed Systems.
Resource sharing: Applications using resources at different locations.
Different resources at different locations (appearing as though all reside locally)
Same resources at different locations
Combination of the two.
Why?
■ Resources available at different locations can be put to use instead of being left unutilized.
■ Not everything needs to be duplicated everywhere.
■ Sharing job load potentially speeds up processes.
Other features.
■ Transparency
Existence of multiple machines, servers, platforms, etc. should be invisible to users. Users see a single system.
Difficult to achieve. Further division:
● Access transparency: Remote resource access is exactly the same as local access, syntactically and semantically. Issues: global naming of objects, and distributed shared memory.
● Location transparency: The name of an object shouldn’t reveal its location. Objects need to move from one place to another; hence, name transparency.
Also user mobility: A user can log in from any machine without using rlogin.
● Replication transparency: The existence of replicas of files and resources should be hidden from users. Two issues: naming of replicas (mapping a resource name to its replicas) and replication control (how many copies).
● Failure transparency: Partial failures may be tolerated, though in a degraded form. This is the fault tolerance issue. Complete failure transparency is not achievable at this stage of technology, and may be questionable in itself.
● Migration transparency: Linked with location transparency. For better performance and security, a distributed object may need to be migrated. The system should move the object automatically, and the move must be transparent. Migration decisions: (a) which object to move, (b) when to move.
● Concurrency transparency: Sharing finite computing resources concurrently among spatially distributed processes is a challenge. E.g., prevent concurrent updates of the same file by two different processes. Critical properties:
♦ Event ordering: All access requests to various resources must be properly ordered for consistency
♦ Mutual-exclusion guarantee
♦ No starvation guarantee
♦ Deadlock free operation
● Performance transparency: Ideally a system should reconfigure itself automatically, through load balancing, to improve or maintain performance. Some nodes being overloaded while other nodes are idle should not be permitted.
● Scaling transparency: The system should be able to expand in scale without disrupting the activities of the users. Scalable algorithms should be used.
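The event-ordering property listed under concurrency transparency can be illustrated with logical clocks. The following is a minimal sketch using Lamport logical clocks, a technique these notes do not name; the class `LamportClock` and its method names are hypothetical and chosen only for this illustration.

```python
class LamportClock:
    """Logical clock: a counter that only moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is itself an event; the timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                      # a is now at 1
stamp = a.send()              # a is now at 2; the message carries 2
b.tick()                      # b is now at 1
received = b.receive(stamp)   # b becomes max(1, 2) + 1 = 3
```

Whenever one event causally precedes another, the first gets the smaller timestamp, which gives the processes a consistent basis for ordering access requests.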
■ Reliability
A distributed system is expected to be more reliable than its corresponding centralized system.
♦ Reliability is enhanced through redundancy.
An $n$-fold redundancy system. Each component may fail with a probability $p$. The probability that the system itself fails is $p^n$.
This assumes that only one component is needed to keep the system running. If $k$ components out of $n$ are needed for a functionality, then the probability of failure is $\sum_{i=0}^{k-1} \binom{n}{i} (1-p)^i \, p^{\,n-i}$.
Two assumptions lie behind the above:
- all components are independent and have identical failure probabilities
- a component that fails is not repaired
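The two cases can be checked numerically. The sketch below assumes `n` independent components, each failing with the same probability `p`, and a functionality that needs at least `k` of them working; the symbol and function names are chosen here for illustration.

```python
from math import comb


def failure_all(p, n):
    """Probability that all n components fail (only one is needed to run)."""
    return p ** n


def failure_k_of_n(p, n, k):
    """Probability that fewer than k of the n components are working."""
    q = 1 - p  # probability that a single component works
    return sum(comb(n, i) * q**i * p**(n - i) for i in range(k))


print(failure_all(0.1, 3))        # about 0.001
print(failure_k_of_n(0.1, 3, 2))  # about 0.028
```

With `k = 1` the second function reduces to the first, and demanding more working components (larger `k`) raises the failure probability for the same `n` and `p`.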
If a distributed OS is in charge of controlling and managing these resources, merely increasing redundancy will not by itself achieve more reliability.
Nature of fault:
● A fault causing system failure could be mechanical or algorithmic.
● Fail-stop or fault-stop situation: The system stops functioning at a fault.
● Byzantine fault: The system continues to function but produces wrong results (e.g. because of an undetected software error).
● Fail-safe: In the event of a failure, the system falls back to a state in which it does no harm.
● Fail-secure: Not the same as fail-safe. Both terms refer to the state in which the system is left after a failure; fail-secure means it is left on the secure side.
E.g. A magnetic door lock works by electric current. If there is no current, there is no magnetic lock and the door opens: the normal fail-safe situation. If, however, the lock remains closed, then it is fail-secure.
The reliability issues are:
♦ Fault avoidance
♦ Fault tolerance
♦ Fault detection and recovery (repair)
■ Flexibility
A distributed system must appear open and flexible to users, for the following reason:
● Usually a distributed system is evolving, both functionally and characteristically. Changes must appear compatible to users. This implies:
♦ Ease of modification. The system must be open and modular.
♦ Ease of enhancement. New services should be easy to make available in both the old and the new format.
♦ The OS kernel ought to be a microkernel. Otherwise every new service addition would require rebooting the system.
■ Performance
A distributed system must perform at least as well as a centralized system. This is a crucial parameter. If OS components fail to work together as a distributed system, the performance might degrade badly. Guidelines:
♦ Batch as much as possible (multiplexing, piggybacking, ...)
♦ Cache whenever possible
♦ Minimize copying of data
♦ Minimize network traffic
♦ Take advantage of fine-grain parallelism for multiprocessing.
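As a small illustration of the caching guideline, the sketch below wraps a simulated remote lookup in a local cache so that repeated requests do not reach the network. The function `remote_lookup` and the names it resolves are hypothetical stand-ins for a real remote call; the cache itself is Python's standard `functools.lru_cache`.

```python
from functools import lru_cache

remote_calls = 0  # counts simulated network round trips


def remote_lookup(name):
    """Stand-in for a request to a remote name server (hypothetical)."""
    global remote_calls
    remote_calls += 1
    return f"node-for-{name}"


@lru_cache(maxsize=128)
def cached_lookup(name):
    # Only a cache miss reaches the remote server.
    return remote_lookup(name)


for _ in range(5):
    cached_lookup("printer")   # one miss, then four hits
cached_lookup("fileserver")    # one more miss

print(remote_calls)
```

Six lookups cost two round trips here. A real system would also need to decide when cached entries become stale, which this sketch ignores.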
■ Scalability
■ Heterogeneity
■ Security
■ Emulation of existing OS
Compare the following:
a. A single server CPU with a performance of processing $\mu$ jobs per unit time. Jobs are coming into the open queue at a rate of $\lambda$ jobs per unit time.
b. A system of $n$ CPUs, each of which has a performance of processing $\mu/n$ jobs per unit time. The total processing power is still the same, $\mu$. A job arriving at the head of the queue attempts to join the CPU found idle.
The results (for a single server infinite queue system):
$\rho$ = traffic density = $\lambda/\mu$
Mean response time $T = \dfrac{1}{\mu - \lambda}$
The corresponding results for the multi-CPU system, each line resembling a single server queue with job admission rate of $\lambda/n$ and job processing rate of $\mu/n$, are
$\rho_n = \dfrac{\lambda/n}{\mu/n} = \rho$
and
$T_n = \dfrac{1}{\mu/n - \lambda/n} = \dfrac{n}{\mu - \lambda} = nT$
Response time is terrible for a multi-CPU system with large $n$ even though each processor is receiving $1/n$th of the total incoming load.
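The comparison can be tried with numbers. This sketch assumes the standard single-server result for Poisson arrivals and exponential service times (mean response time equal to one over service rate minus arrival rate); the names `lam`, `mu` and `n` stand for the arrival rate, the total service rate and the number of CPUs, and are chosen here for illustration.

```python
def response_time_single(lam, mu):
    """Mean response time of one single-server queue: 1 / (mu - lam)."""
    if lam >= mu:
        raise ValueError("the queue is stable only when lam < mu")
    return 1.0 / (mu - lam)


def response_time_split(lam, mu, n):
    """Each of n CPUs sees arrivals at lam / n and serves at mu / n."""
    return response_time_single(lam / n, mu / n)


lam, mu = 8.0, 10.0
t1 = response_time_single(lam, mu)     # 1 / (10 - 8) = 0.5
t4 = response_time_split(lam, mu, 4)   # 1 / (2.5 - 2.0) = 2.0
print(t1, t4)
```

The traffic density is the same in both cases, yet splitting the same total capacity over four separate queues makes the mean response time four times longer.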
Then why use distributed processing?
■ Usually the total processing power of a normal distributed system can be higher than that of a single PC-type machine. Cost-wise this is a superb deal.
■ Even if some of the processors aren’t available (crashed, or withdrawn from service), the system as a whole doesn’t crash. This is fault tolerance.
More on fault tolerance.
Some questions. Is fault tolerance always a desirable thing? In some cases, it may be desirable. For how long? When should it be considered unacceptable?
Fault tolerance as an emergent property.
Also, an allied concept: healing or repair.
If a fault turns out to be intolerable, could the system attempt to heal itself? Self-healing as a system property!
But what is the system? Does the human observer, the controller identifying the intolerable fault, belong to the system? …
An operating system framework for a distributed system.
In a centralized system, the OS is a resource allocator and a resource manager. Therefore, a distributed system needs a distributed OS that carries out the same set of functions.
Normally, for a centralized system,
For a multi unit system, we need to organize it as
DCE = Distributed Computing Environment
DCE An operating system
DCE: A TCP/IP-based architecture by OSF (Open Software Foundation) to furnish an open systems platform.
Think of a set of individual pillars. Each one can support its own weight and objects placed on top of it.
But to make a bridge on these pillars, all we need is a horizontal slab whose weight all the pillars could take. The horizontal slab is the DCE.
DCE provides integration, communication, security, and a common framework for time, directory and file services. All of these are based on three computing models:
■ Client-server architecture
■ Remote Procedure Calls
■ Shared Files
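To make the first two models concrete, here is a minimal sketch of the idea behind a remote procedure call: a client stub marshals the procedure name and arguments into a message, and the server unmarshals it, runs the procedure, and marshals the reply. It does not use DCE's own RPC interface. The names `PROCEDURES`, `server_handle` and `ClientStub` are hypothetical, JSON is an assumed wire format, and the "network" is a direct function call.

```python
import json

# Server side: procedures that clients may call by name (hypothetical examples).
PROCEDURES = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}


def server_handle(request_bytes):
    """Unmarshal a request, run the named procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


class ClientStub:
    """Makes a remote call look like a local method call."""

    def __init__(self, transport):
        self._transport = transport  # callable taking bytes, returning bytes

    def __getattr__(self, proc):
        # Invoked for any attribute not found normally, i.e. a procedure name.
        def call(*args):
            request = json.dumps({"proc": proc, "args": list(args)})
            reply = self._transport(request.encode("utf-8"))
            return json.loads(reply.decode("utf-8"))["result"]
        return call


# A real system would send the bytes over a socket instead of calling directly.
stub = ClientStub(server_handle)
total = stub.add(2, 3)
shout = stub.upper("dce")
print(total, shout)
```

The caller writes `stub.add(2, 3)` exactly as it would for a local object, which is the access transparency that the RPC model aims for.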