DATA ALLOCATION TECHNIQUES IN DISTRIBUTED
DATABASESYSTEMS
MITAT UYSAL
DoğuşUniversity
Department of Computer Engineering
Acıbadem-İstanbul
TURKEY
ABSTRACT
In this paper,different data allocation models in distributed database systems are reviewed and a new dynamic allocation algorithm for non-replicated distributed database systems,namely, the threshold algorithm,is formulated and proposed.The threshold algorithm reallocates data with respect to changing data access patterns.
This paper also presents a new simulation model for evaluating the performance of non-replicated distributed database systems.
Keywords: Data allocation techniques,distributed database design,threshold algorithms,Markov processes,system simulation,optimal algorithms
1
PROBLEM DEFINITION
A distributed database is a collection of multiple,logically interrelated databases over a computer network.A distributed database management system is then defined as the software management system that permits the management of the DDBS and makes the distribution transparent to the users.[1]
The relational model represents the data in a database as a collection of relations when a relation is thought of as a table of values,each row in the table represents a collection of related data values.
In relational database terminology,a row is called a tuple,a column name is called an attribute and the table is called a relation.
With respect to fragmentation,the important issue is the appropriate unit of distribution.There are clearly two alternatives for fragmentation:a)Horizontal fragmentation b)Vertical fragmentation
The organization of distributed database systems can be investigated along three dimensions:I)Level of sharing, II)Behavior of access patterns, and III)Level of knowledge on access pattern behavior[1].
1
DATA ALLOCATION PROBLEM
The allocation of resources across the nodes of a computer network is an old problem that has been studied extensively. Most of this work, however, does not address the problem of distributed database design, but rather that of placing individual files on a computer network [1].
We will define firstly the allocation problem:
DEFINITION: Assume that there are a set of database parts F={f1,f2,…….fn}(which are called as fragments of tables) and a network consisting of nodes N={N1,N2,….Nm} on which a set of applications A={a1,a2,…..ak} is running. The allocation problem involves finding the “optimal distribution of F to N” [1][2].
COST FUNCTION
Total cost function C can be expressed as follows:
C= Ccomm + Cproc + Cstorage
where
Ccomm is the communications cost for message and data
Cproc is the site processing cost (CPU and I/O)
Cstorage is the storage cost for data and programs at sites.
OPTIMALITY
Optimality can be defined with respect to two measures:
- Minimal cost: The allocation problem attempts to find an allocation scheme that minimizes the cost function that is given above.
- Performance: The allocation strategy is designed to maintain a performance metric. Two well-known ones are to minimize the response time and to maximize the system throughput at each site [3].
SOLUTION METHODS
A number of different heuristics have been applied to the solution of the database allocation problem.
The data allocation problem was introduced when Eswaran [4] first proposed the concept of data fragmentation. Studies on vertical fragmentation [5][6], horizontal fragmentation [7] and mixed fragmentation [8][9] were conducted.
In these studies, data allocation is determined prior to the design of a database depending on some static data access patterns and/or static query patterns. In a static environment where the access probabilities of nodes to fragments never change, a static allocation of fragments provides the best solution.
However, in a dynamic environment where these probabilities change over time, the static allocation solution would degrade database performance [1].
Initial studies on dynamic data allocation give a framework for data redistribution [10] and demonstrate how to perform the redistribution process in the minimum possible time [11].
THE NON-REDUNDANT “BEST FIT” METHOD
A general rule for data allocation states that data should be placed as close as possible to where it will be used and then load balancing should be considered to find a global optimization of system performance.
The non-redundant “best fit” method determines the single most likely site to allocate a table based on maximum benefit, where benefit is interpreted to mean total query and update references.
The “best-fit” approach for non-replicated allocation
We allocate Pi at the site j, where the number of references Bij to Pi is maximum
Bij = fkj (rki + uki)
k
To determine the appropriate allocation of a part, we need to measure the costs and benefits.
We need to know:
- the frequency of application k at node j as fkj
- the number of retrieval references of application k to part Pi as rki
- the number of update references of application k to part Pi as uki
[12]
The main advantage of this algorithm is its simplicity.
The disadvantages of the algorithm can be described as below:
- The number of local references may not accurately characterize time or cost (reads and writes given equal weights)
- There are no insights regarding replication
THE REDUNDANT “ALL BENEFICIAL SITES” METHOD
This method can be used for either the redundant or non-redundant case. It selects all sites for a table allocation where the benefit is greater than the cost for one additional copy of that table. It can be assumed to start with zero copies.
The benefit for database part P at node (site) N is measured by the difference in elapsed time to do a remote query to database part P from node N (i.e.,there is no replicated copy available locally). Total benefit for database part P at node N is the weighted sum of the benefit for each query times the frequency of queries.
The cost for database part P at node N is the total elapsed time for all the local updates of database part P plus the total elapsed time for all the remote updates for the given database part at that node.
Total cost for database part P at node N is the weighted sum of the cost for each update transaction times the frequency of update transactions.
The “all beneficial sites” method for replicated allocation
We place Pi at all sites j where the cost of retrieval references is larger than the cost of update references. Bij is evaluated as a difference:
Bij = fkj rki – C fkj´ uki
k k j´≠j
where C is a constant which measures the ratio between cost of update and retrieval access.
Advantages of the “non-redundant best-fit method” are:
- simplicity
- it can be applied to either redundant or non-redundant cases
- it reads and writes given appropriate weights
Disadvantages of the method can be described also as below:
- global averages of query and update time may not be realistic
- network topology and protocols are not taken into account
RECENT STUDIES IN THE ALLOCATION PROBLEM
Ulus,T. And Uysal,M. propose a new dynamic data allocation algorithm for non-replicated distributed databases and analyze the algorithm using a finite-state Markov chain.[13] This study is based on [2]. In this study, horizontal, vertical or mixed fragmentation can be used. The allocation unit can even be as small as a record or an attribute.
OPTIMAL ALGORITHM(Ulus and Uysal)
In distributed database systems, the performance increases when the fragments are stored at the nodes from which they are most frequently accessed. The problem is to find this particular node for each fragment. Counting the accesses of each node to a fragment offers a practical solution. Having the highest access value for a particular fragment, a node could be the primary candidate to store the fragment.
An m by n access counter matrix S, where m denotes the number of fragments and n denotes the number of nodes, is shown below.
Every element sij of S, where (i.e., non-negative integers), shows the number of accesses to fragment i by node j. A row of S shows the access counts of all nodes to a particular fragment, whereas a column of S shows the access counts of all fragments for a particular node. S is updated after each access to the fragment. To satisfy the minimum response time constraint, each fragment should be stored in the node with the highest access count in its corresponding row. In other words, if for all, then the fragment i should be stored in node x.
For example, let the following matrix be the access counter matrix, S, of a DDS with three nodes and three fragments.
In the example, A, B, C denote fragments and x, y, z denote nodes. Fragment A is accessed 3, 6 and 2 times by nodes x, y and z, respectively. For fragments B and C, the access counts are 5, 3, 1 and 1, 4, 2 respectively. According to S, fragment A and C should be stored in node y; and fragment B should be stored in node x.
The storage node of S is an issue of the algorithm. To store it in a central node creates extra network traffic overhead. It also leads to a reliability problem in case of central node failure. The best solution would be to decompose the matrix into rows and store each row together with its associated fragment in the same node. In this way, whenever the fragment migrates, its associated counters migrate as well. Figure 1 shows fragment i with its associated counters, s0 through sn.
Figure 1. Any fragment i in optimal algorithm.
1.For each (locally) stored fragment, initialize the access counter rows to zero. (sik=0, k=1,2,..n)
2.Process an access request for the stored fragment.
3.Increase the corresponding access counter of the accessing node for the stored fragment. (If node x accesses fragment i, set six= six +1.)
4.If the accessing node is the current owner, go to step 2 (i.e., local access, otherwise it is a remote access).
5.If the counter of a remote node is greater than the counter of the current owner node, transfer the ownership of the fragment together with the access counter array to the remote node (i.e., the fragment migrates). (If node x accesses fragment i and six>sij , send fragment i to node x.)
6.Go to step 2.
Figure 2. Optimal algorithm.
There are two inherent properties introduced by the optimal algorithm. The first one is the ownership property, that is, for each fragment the node with highest access counter value is the current owner node of the fragment, in which case the fragment is stored in this node.
The second one, the migration property, dictates that for any fragment the ownership is transferred to a new node if the access counter value of the new node exceeds the access counter value of current owner node. In this case, the particular fragment migrates and is stored in this new owner node. In other words, the owner node of the fragment changes.
An advantage of the optimal algorithm is central node independence. That is, since each node runs the algorithm autonomously, there is no central node dependence. Every node is of equal importance. Whenever one node crashes, the algorithm may continue its operation without the fragments stored in the crashed node.
There are two drawbacks associated with the optimal algorithm. The first one is the potential storage problem. As the fragment size decreases and/or the number of nodes increases, the size of the access counter matrix increases, which in turn results in extra storage space need for the access counter matrix. For instance, if the fragment size is one record and the number of nodes is 500, then for each record an array of 500 access counter values should be stored. In some cases, this access counter array size may exceed the record size.
The second drawback is the scaling problem for the data type that stores the access counter values. Since access counter values are continuously increasing, this problem may result in anomalies. For example, if one byte is chosen to store the counter values, then a value greater than 255 cannot be stored in this data type.
A potential timing problem, which may cause back and forth migration of a fragment, deserves explanation. Think of a case where there are two nodes denoted by Y and Z. Suppose at an instant, Y has the highest access counter for a fragment whereas Z’s counter is one less than Y's. Consider Z performs two successive accesses to the fragment, causing a transfer of the fragment from Y to Z. Later on, Y performs two successive accesses to the fragment, causing again the transfer of the fragment from Z to Y. If these successive two accesses of Y and Z continue in turns, the fragment will migrate back and forth between Y and Z after every two accesses. This can also be generalized to multiple nodes where fragments migrate in a circular fashion.
THE THRESHOLD ALGORITHM
In some cases, due to the extra storage space need, it could be very costly to use the optimal algorithm in its original form. For a less costly algorithm, the solution is to decrease the need for extra storage space. The proposed threshold algorithm in this paper serves this purpose.
Let the number of nodes be n and let xs denote the access probability of a node to a particular fragment. Suppose the fragment is stored in this particular node (i.e. it is the owner node). For the sake of simplicity, let xd denote the access probability of all the other nodes to this particular fragment. The owner does local access, whereas the remaining nodes do remote access to the fragment.
The probability that the owner node does not access the fragment is (n - 1)xd. The probability that the owner node does not perform two successive accesses is [(n - 1)xd]2. Similarly, the probability that the owner node does not perform m successive accesses is [(n - 1)xd]m. Therefore; the probability that the owner node performs at least one access of m successive accesses is 1-[(n - 1)xd]m.[13]
SIMULATION MODEL
Database parts are taken as tables. So, there are n tables in the database. The set of tables, T, is defined as below:
T= {T1, T2, ……………… Tn}
The topology of the network is given and the network has m nodes. The set of nodes, N is defined as below:
N= {N1, N2, ……………. Nm}
The set of queries which is running on this network and uses the database, Q, is defined as below:
Q= {q1, q2, ………………qk}
An n x m access counter matrix S is defined as below:
S=
where n denotes the number of tables and m denotes the number of nodes. Every element Sij of S where Sij Z+ U {0}, shows the number of accesses to table i by node j.
To satisfy the minimum number response time constraint, each table should be stored in the node with the highest access count in its corresponding row.
We assume a system model with Poisson query transaction arrivals. Let ti, i>=1, denote the arrival time of the i-th query transaction. Let Xi denote the inter-arrival time between the arrival of the i-th query transaction and the (i-1)-th query transaction, i.e.,
X1=t1
Xi=ti – ti-1i≥2
The distributions of the random variables {Ti : i≥1} or equivalently {Ai : i≥1} are specified to describe an arrival process.
One special class of arrival process that is particularly useful is a renewal process. A stochastic process is called a renewal process if the sequence inter-arrival times {Ai : i≥1} are independently identically distributed. Suppose A ~ fA is a “nice” density, i.e., one can generate samples of fA. Then the following simple algorithm simulates the corresponding renewal process:
- Generate A1, A2, ……….. An ~ FA (a) independent
- Set T1 A1 and Ti Ti-1 + Ai, i≥2
- Set N(t) =0 for t [0, T1] and N(t) =i for t [Ti,Ti+1), i≥1
N(t) is a continuous time stochastic process taking values in Zt ={0, 1, 2, ….}
In the simulation model,a distributed database system consists of an arbitrary but,for one simulation run,fixed number of sites.Each site has one CPU server,so at any point in time only one autonomously operating entity can be active at one site.
A site is responsible for properly scheduling the entities located on it.They are scheduled on a first-come/first-served basis.Rather than having transactions enter the system at arbitrary points in time,a number of transactions are started at the beginning of a simulation run and the system then preserves the same number of active transactions throughout the simulation.
ASSUMPTIONS
Access to data items is assumed to be distributed uniformly over the entire database. All transactions are of the same size and transaction operations arrive at each data item as a Poisson process.[14]
RESULTS
Performance in distributed database systems isdependent on the allocation of data among sites of the database.The allocation of data is traditionally static and determined off-line,using estimates of access frequencies.However,in many situations the access frequencies from various sites in the database are not known a-priori.Re-allocation of data using a dynamic method can significantly improve the performance of the distributed database system.In this paper,a new algorithm and a simulation model are given for achieving this task.
The threshold algorithm is modeled using a finite-state Markov chain. To simplify the model, a special case where the access probabilities of the nodes are all equal except a single node is examined. The equilibrium probabilities for a fragment in any node are obtained in terms of access probabilities and the threshold value. The behavior of a fragment, in reaction to a change in access probabilities or to a change in threshold value, is investigated. It is shown that the fragment tends to stay at the node with higher access probability. As the access probability of the node increases, the tendency to remain at this node also increases. It is also shown that as the threshold value increases, the fragment will tend to stay more at the node with higher access probability.
The theorems that are given for special cases in [13] are tested using simulation runs.
When the threshold algorithm with a threshold t is used, the database part will be in node Owith the probability Os is given in [13] as below:
where all nodes have equal access probability of Xd to a particular database part except node O, which has a different access probability of Xs, where Xs [0, 1] and Xd [0, 1], Xs 0, Xd 0 and Xs 1.