Q Mutual Exclusion in Distributed Systems

q Mutual Exclusion in Distributed Systems

v Limitations of a Distributed System

¨ Absence of global clock

§ The absence of perfectly synchronized clocks and global time in distributed systems makes the determination of the occurrences of events in different parts of the system difficult to do without some algorithms to help us

§ Many algorithms under certain conditions, make it possible to ascertain the order in which two events occur

¨ Absence of shared memory

v Lamport’s Logical Clocks

¨ Some definitions:

§ a ® b, if a and b are events in the same system, a occurred before b

§ a ® b, if a is the event of sending a message m in a process, b is the event of receipt of the same message m by another process

§ If a ® b and b ® c, then a ® c

§ Causally related events: past events influence future events

- Event a causally affects event b if a ® b

§ Concurrent events: Two distinct events a and b are said to be concurrent (a || b) if a !® b and b !® a

¨ Conditions satisfied by the system of clocks

§ For any event, if a ® b, then C(a) < C(b)

§ For any two events a and b in a process PI, if a occurs before b, then CI(a) < CI(b)

§ If a is the event of sending a message m from process Pi and b is the event of receiving the same message m at process Pj, then Ci(a) < Cj(b)

§ Clock Ci is incremented between any two successive events in process Pi

- Ci := Ci + d (d > 0)

- If a and b are two successive events in PI and a ® b, then Ci(b) := Ci(a)+ d

§ If event a is the sending of message m by process Pi, the message m is assigned a timestamp tm = Ci(a). On receiving the same message m by process Pi, CI is set to a value greater than or equal to its present value and greater than tm

- Cj := max(Cj , tm + d) (d > 0)

v If a ® b, then C(a) < C(b). The reverse is not necessary true if the events have occurred in different processes.

v Using Lamport’s clock timestamps, we cannot conclude the causal relationship between two events occurring in different processes by just looking at the timestamps of the events.

v One solution to this limitation is the use of Vector clocks, where on receipt of messages, a process learns about the more recent clock values of the rest of the processes in the system.

Vector Clocks

¨ Each process Pi is equipped with a clock Ci which is an integer vector of length n, where n is the number of processes

¨ Implementation rules for the vector clocks:

§ Clock Ci is incremented between any two successive events in process Pi

- Ci[ i ] := Ci[ i ] + d (d > 0)

§ If event a is the sending of the message m by process PI, then message m is assigned a vector timestamp tm = Ci( a ); on receiving the same message m by process Pj, Cj is updated as follows:

- "k, Cj[ k ] := max (Cj[ k ], tm[k])

¨ The basic idea behind vector clocks is that on the receipt of messages, a process learns about the more recent clock values of the rest of the processes in the system.

v Causally Related Events

¨ Events a and b are causally related, if ta < tb or tb < ta. Otherwise, these events are concurrent

v Mutual Exclusion for Distributed Systems

¨ Lack of shared memory (shared variables)

¨ Must rely on message passing to assure mutual exclusion

¨ Mutual exclusion algorithms for distributed systems:

§ Non-token based

§ Token based

v Measuring performance

¨ Number of messages necessary per CS (Critical Section) invocation

¨ Synchronization Delay: the time required after a site leaves the CS and before the next site enters the CS

¨ Throughput: CS execution rate

¨ Low and high system load performance rating

¨ Best and worst case performance

v Solving the distributed mutual exclusion problem

¨ The control site algorithm

§ Single point of failure

§ Uneven work load

§ High synchronization delay

§ Low system throughput

§ High and uneven network traffic

v Non-Token-Based Algorithms

¨ The concept of information structure forms the basis for unifying different non-token-based mutual exclusion algorithms

§ This information structure defines the data structure needed at a site to record the status of other sites

§ The information kept in the data structure is used by a site in making decisions when invoking mutual exclusion

¨ The information structure at site Si consists of the following three sets:

- Request set Ri

- Inform set Ii

- Status set Sti

¨ A site must obtain permission from all the sites in its request set before entering CS

¨ Every site must inform all the sites in its inform set when waiting to enter the CS or exiting the CS of its status change

¨ The status set contains the ids of sites for which Si maintains status information

¨ If Si Î Ij Þ Sj Î Sti

¨ A site maintains a variable CSSTAT containing the site’s knowledge of the status of the CS and a queue containing REQUEST messages in the order of their timestamps for which no GRANT message has been sent

¨ Correctness condition: To guarantee mutual exclusion, the information structure of sites in the generalized algorithm must satisfy the following conditions:

§ If "i: 1 £ i £ N :: Si Î Ii the following conditions are necessary and sufficient to guarantee mutual exclusion

§ "i: 1 £ i £ N :: Ii Ì Ri

§ "i"j: 1 £ i,j £ N :: (Ii Ç I j ¹ F)Ú(Si Î Rj Ù Sj Î Ri) (for every two sites, either they request permission from each other or they request permission from a common site)

¨ Lamport’s algorithm

§ Uses Lamport’s clocks

§ Every site SI keeps a request-queue containing mutual exclusion requests ordered by their timestamps

§ Every site has a request set: "i: 1 £ i £ N :: RI = {S1, S2, …, SN}

§ Number of messages per CS invocation are 3(N-1)

§ Synchronization delay is T

§ Requesting the CS by site Si:

§ Si Sends REQUEST(tsi, i) message to all sites in Ri and places the request on the request_queuei

§ Sj receives REQUEST(tsi, i), message and sends timestamped REPLY message back to Si, and places the request on the request_queuej.

§ Entering the CS by Si:

§ It has received a message with larger timestamp than (tsi, i) from all other sites (assuring that every one has received and replied to its request)

§ It’s request is at the top of the request_queuei

§ Releasing the CS by Si:

§ Removes its request from the top of its request queue and sends a timestamped RELEASE message to all the sites in its request set

§ Sj receives RELEASE message and removes the request from the request queue

¨ The proof of the theorem that Lamport’s algorithm achieves mutual exclusion can be done by contradiction

¨ Performance:

§ Requires 3(N-1) messages, where N is the number of sites in the request set

§ Has a synchronization delay of T, where T is the average message delay

¨ Ricart-Agrawala Algorithm

§ An improvement on Lamport’s algorithm which requires few number of messages

§ Requesting the CS by Si:

§ Sends timetamped REQUEST messages to all sites in its request set

§ Receiving site Sj sends a REPLY message to Si if it is neither requesting nor executing the CS or if it is requesting and has a higher timestamp than Si

§ Executing the CS by Si:

§ Enters the CS after it has received REPLY messages from all the sites in its request set

§ Releasing the CS by Si:

§ Sends REPLY messages to all the deferred requests

¨ The proof of the theorem that Ricart-Agrawala algorithm achieves mutual exclusion can also be done by contradiction

¨ Performance:

§ Requires 2(N-1) messages, where N is the number of sites in the request set

§ Has a synchronization delay of T, where T is the average message delay

v Token-Based Algorithms

¨ Unlike non-token-based algorithms, token-based algorithms are free from starvation and deadlock

¨ Suzuki-Kasami algorithm

§ CS is entered by the site having the token which is passed around the sites

§ A site void of the token, attempting to enter the CS broadcasts a REQUEST message for the token to all the other sites

§ Outdated REQUEST(i,n) messages are distinguished from the current ones by the array RNI[1…N] where RNI[ j ] is the largest sequence number received so far in a REQUEST message from Sj. When site Si receives a REQUEST(j,n) message, it sets RNi[ j ] := max(RNI[ j ], n)

§ The token consists of a queue of requesting sites, Q, and an array of integers LN[1..N], where LN[j] is the sequence number of the request that site Sj executed most recently. LN[I] := RNi[ i ] by SI to indicate that its request has been executed

§ Requesting the CS by Si when it does not have the token:

§ Increment RNi[i] and sends REQUEST(i,sn) messages to all sites. sn = RNi[i]

§ Receiving site Sj sets RNj[i] = max(RNi[i], sn) and if it has the token and does not need it, sends it to Si if RNj[i] = LN[I] + 1

§ Executing the CS by Si:

§ When it possesses the token

§ Releasing the CS by Si:

§ Sets LN[I] = RNi[i]

§ For every Sj whose ID is not in the token queue, it appends its ID to the token queue if RNi[j] = LN[j] + 1

§ If token queue is non empty after the above update, it deletes the top site ID from the queue and sends the token to the site indicated by the ID

¨ Performance:

§ Very simple yet very efficient

§ Requires ONLY 0 or N messages per CS execution, where N is the number of sites

§ Has a synchronization delay of 0 or T, where T is the average message delay