Otto Borchert
Computer Science 765
Maximizing the Effectiveness of Peano Trees through the use of Distributed Computing and Data Striping
Introduction
Peano trees are a new invention poised to change the way spatial data is recorded, used, evaluated, and searched. The technology is still young, and the ability to use it effectively is an important step toward making it mainstream. An array of tools will be necessary to exploit these Peano trees, also known as P-trees, to their full potential. One of the first tools required for widespread use is a way to increase the speed and efficiency of processing the information stored in them. A combination of data striping and distributed computing is an excellent first step toward these goals. This paper begins with background on Peano trees, data striping, and distributed computing, and describes how the three can be used in tandem to make P-trees a more useful technology. Finally, it describes a prototype system with an array of 16 nodes used to demonstrate the effectiveness of such a project.
Peano Trees - What are they?
The Peano tree is a relatively new invention by Dr. William Perrizo of the Computer Science department at North Dakota State University. Peano trees store spatial data in a lossless, compressed form on which it is easy to perform the operations used in data mining, such as counts, ANDs, ORs, and complements. [PER]
For an example of how a Peano tree is created, see figure one. To begin, a file of values is converted into its binary equivalent, as shown in the first stage of the figure. These binary values are then split into separate files based on bit position. The three "bit sequential" files are shown in the second stage of the figure. Bit sequential files are a related invention of Dr. Perrizo's. [PER] Next, these files are re-ordered into "Peano order": if you follow the Z-shaped lines in the figure, you can see how the Peano ordering is built. Finally, this information is organized into a tree structure. The first child represents the first quadrant of the whole file, the second child represents the second quadrant, and so on; the first child of the first child represents the first quadrant of the first quadrant, and so forth. A node value of one means the quadrant is "pure 1", that is, all of the values in the quadrant are 1. A node value of zero means one of two things: if the node has children, they describe how the subquadrants of this quadrant are laid out; if it has no children, the quadrant is pure 0, that is, all of the values in the quadrant are 0. (A sketch of this construction in code follows the figure.)
Figure 1. Peano Tree Example
Decimal Value    Binary Value
0                000
1                001
1                001
2                010
2                010
2                010
1                001
1                001
4                100
4                100
5                101
6                110
6                110
5                101
1                001
0                000
Bit Sequential Files (raster order)
Bit 1: 0000 0000 1111 1100
Bit 2: 0001 1100 0001 1000
Bit 3: 0110 0011 0010 0110
Bit position 1 in raster order, laid out as a 4x4 grid. Notice the Z pattern within each quadrant (and across quadrants) used to build the Peano ordering:

0 0 | 0 0
0 0 | 0 0
----+----
1 1 | 1 1
1 1 | 0 0
Peano Order
0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0
Peano tree
0
|
0 0 1 0
      |
      1 1 0 0
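To make the construction above concrete, the following sketch builds the P1 and NZ values for the bit-position-1 grid of Figure 1. The names (BitGrid, buildNode, Node) are invented for illustration and are not taken from the prototype code; this is a minimal sketch assuming a square grid whose side is a power of two, and it ignores memory cleanup.

#include <iostream>
#include <vector>

// Hypothetical node type: one P1 bit, one NZ bit, four children (null when pure).
struct Node {
    bool p1;        // 1 only if every bit in the quadrant is 1
    bool nz;        // 1 if the quadrant contains at least one 1
    Node* child[4]; // quadrant order: upper-left, upper-right, lower-left, lower-right
};

typedef std::vector<std::vector<int> > BitGrid;

// Recursively build the tree for the square quadrant with upper-left corner
// (row, col) and the given side length.
Node* buildNode(const BitGrid& g, int row, int col, int side) {
    Node* n = new Node;
    n->p1 = true;
    n->nz = false;
    for (int i = 0; i < 4; ++i) n->child[i] = 0;

    for (int r = row; r < row + side; ++r)
        for (int c = col; c < col + side; ++c) {
            if (g[r][c]) n->nz = true; else n->p1 = false;
        }

    // Mixed quadrant (NZ = 1, P1 = 0): recurse into the four subquadrants.
    if (n->nz && !n->p1 && side > 1) {
        int h = side / 2;
        n->child[0] = buildNode(g, row,     col,     h);
        n->child[1] = buildNode(g, row,     col + h, h);
        n->child[2] = buildNode(g, row + h, col,     h);
        n->child[3] = buildNode(g, row + h, col + h, h);
    }
    return n;
}

int main() {
    // Bit position 1 of Figure 1, in raster order.
    int raw[4][4] = { {0,0,0,0}, {0,0,0,0}, {1,1,1,1}, {1,1,0,0} };
    BitGrid g(4, std::vector<int>(4));
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) g[r][c] = raw[r][c];

    Node* root = buildNode(g, 0, 0, 4);
    std::cout << "root P1=" << root->p1 << " NZ=" << root->nz << std::endl;
    return 0;
}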
How exactly is this useful? Notice the considerable compression this allows, even in this simple example. Since spatial data is normally very continuous (as opposed to having discrete locations that change drastically from point to point), this is actually a fairly good example of a real-world scenario. Also, by ordering data by bit position you can gather more useful information: for example, the first bit position, which captures the largest changes in value, can be more useful than the last bit position, which only shows differences of 1. A number of recent presentations show the usefulness of Peano trees in a variety of applications, including microarray data, stock market data, genetic information, and the original use in crop analysis. [CLA]
Note that this is not the only way one could describe a Peano tree. The particular type of tree used in figure one is called the P1 tree, in which a node's value is 1 only if its quadrant is pure 1. In an NZ tree, a node's value is 0 only if its quadrant is pure 0, and 1 if the quadrant contains any 1s. This project uses the fact that comparing the two tree types reveals the underlying values: if a quadrant's P1 and NZ values are the same, all of the values in that quadrant are the same, whereas if they are different, the quadrant is mixed.
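The comparison just described can be expressed directly in code. The names below (classify, QuadrantKind) are illustrative rather than taken from the prototype; the example uses the child bits of the root from Figure 1.

#include <iostream>

enum QuadrantKind { PURE_ZERO, PURE_ONE, MIXED };

// If the NZ and P1 bits agree, the quadrant is pure (pure 1 when both are 1,
// pure 0 when both are 0); if they differ, the quadrant is mixed.
QuadrantKind classify(bool nz, bool p1) {
    if (nz == p1) return p1 ? PURE_ONE : PURE_ZERO;
    return MIXED;
}

int main() {
    // Children of the root in Figure 1: NZ = 0011, P1 = 0010.
    bool nz[4] = {0, 0, 1, 1};
    bool p1[4] = {0, 0, 1, 0};
    for (int i = 0; i < 4; ++i)
        std::cout << "quadrant " << i << ": " << classify(nz[i], p1[i]) << std::endl;
    return 0;
}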
Another way of describing a Peano tree is through a vector format. In this format, the quadrant in question is listed, followed by the NZ and P1 bits of its four children. Pure 0 and pure 1 quadrants appear only at the level where they first become pure; their child nodes are omitted because they would be redundant. For an example, see figure two below, which describes the P-tree shown in figure 1. Such a representation is even smaller, because no tree pointers are involved.
Figure 2 - Peano Tree Vector Format
Quadrant    NZ      P1
[]          0011    0010
[11]        1100    1100
Data Striping - Spreading Data Out
Ramakrishnan describes data striping as segmenting data into partitions of equal size for distribution across multiple disks. [RAM] With respect to Peano trees, this means distributing pieces of the P-tree across multiple disk drives. A number of well-proven methods of data striping already exist and are in practice today.
The most widely used form of data striping is found in RAID arrays. RAID level 0 describes data striping used to distribute data of any kind across multiple disks: data is split into segments of a fixed size, known as the striping unit, and placed on multiple disks. So, for example, the first byte of a file is stored on one disk, the second byte on a second disk, and so on.
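As a rough illustration of RAID level 0 placement (a sketch only, not tied to any particular RAID implementation), the disk holding a given byte of a file can be computed from the striping unit size:

#include <iostream>

// For RAID level 0, the byte at a given file offset lands on disk
// (offset / stripingUnit) mod numDisks.
int diskForOffset(long offset, long stripingUnit, int numDisks) {
    return (int)((offset / stripingUnit) % numDisks);
}

int main() {
    long stripingUnit = 1;   // one byte per unit, as in the example above
    int numDisks = 4;
    for (long offset = 0; offset < 8; ++offset)
        std::cout << "byte " << offset << " -> disk "
                  << diskForOffset(offset, stripingUnit, numDisks) << std::endl;
    return 0;
}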
Distributed Computing - Putting data on different computers
Distributed computing is the final link of the three technologies needed to build a prototype system. Using computers in different, yet close, physical locations allows processing to be distributed among the later tools that perform root counts, AND operations, and other data mining calculations.
According to Dr. Michael Stonebraker, among other sources, there are three main types of distributed computing: shared memory, shared disk, and shared nothing. Each has its own advantages and disadvantages. [STO]
The shared memory architecture uses multiple processors that share a common memory area; a symmetric multiprocessor computer running by itself would fall into this category. This architecture is the easiest to maintain, since there is only one central system, but it is difficult to scale up to large sizes: very few computers are built with more than 8 or 16 processors on one motherboard.
The shared disk architecture uses multiple processors, each with its own memory area, that share a hard disk. An example of such a system is a cluster of computers connected to an NFS server. According to Stonebraker, this methodology is a middle ground between shared memory and shared nothing when considering scalability and maintenance difficulty. [STO]
The shared nothing architecture, as the name implies, uses multiple processors that share neither disk nor memory. Normally this is accomplished with a network connection between multiple computers; an example is a Beowulf cluster of Linux machines. This architecture is the most difficult to maintain, because each system has its own operating system and hardware to look after, but there is no single point of failure and it scales up very well.
The Project
By combining these three topics into one project, I hope to develop the best way to store P-trees in a distributed environment. Some work has already been done on the best strategy for laying out P-trees as vectors for use in programs that calculate root counts and AND and OR operations.
First, one needs to determine a way of splitting the P-trees into "striping units" so that they can be distributed across multiple disks. There is an obvious point at which to divide the P-trees, namely at their children: each child can go to a different disk. Under this scheme, information about the first quadrant would be on disk 0, the second quadrant on disk 1, and so on.
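A minimal sketch of that division follows. The names (diskForQuadrant, the path representation) are hypothetical, introduced only to illustrate the mapping; the top-level child index of a quadrant decides its disk, wrapping around if there are fewer disks than children.

#include <iostream>
#include <vector>

// Hypothetical mapping: the top-level child index of a quadrant path decides
// which disk (or client) stores it, so quadrant [0...] goes to disk 0,
// [1...] to disk 1, and so on, wrapping if there are fewer disks.
int diskForQuadrant(const std::vector<int>& path, int numDisks) {
    if (path.empty()) return 0;    // the root quadrant stays with the server / disk 0
    return path[0] % numDisks;
}

int main() {
    int raw[2] = { 3, 0 };         // the first subquadrant of the fourth child
    std::vector<int> path(raw, raw + 2);
    std::cout << "quadrant [3,0] -> disk " << diskForQuadrant(path, 4) << std::endl;
    return 0;
}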
Next, one must determine a way to use multiple computers to create a distributed environment in which the data can be stored. A shared memory architecture could be feasible, using a multi-processor machine connected to an array of hard drives, but this would not fit established systems very well. Another idea is a shared nothing approach that uses TCP/IP to connect a server with a multitude of clients. This is the approach I used when building the prototype system.
Based upon these ideas, one could build a system as follows. First, a server takes a tabular file and creates a series of Peano trees that need to be striped. Next, clients connect to the server in order to perform the striping onto each of their respective disks. The server then takes the P-trees, gives each client a quadrant to work on, and keeps track of which trees and quadrants each client controls. The clients then process the data, striping the information that belongs to them and passing quadrants that need further striping on to other clients.
One question that naturally arises from this algorithm is what the best way to divide work amongst the clients would be. My idea is to take a pseudo round-robin approach. Since the server keeps track of which Peano trees correspond to the first bit positions, it should make an attempt to keep those quadrants on different computers, as they will more than likely be queried the most during calculations. For example, if the server parses three tabular files, it should make sure that the quadrants from the first bit-position file created from each tabular file are on separate systems.
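One possible reading of this pseudo round-robin idea is sketched below; the function clientForTree and the (fileIndex, bitPosition) numbering are invented for illustration. Offsetting each file's assignments by the file index keeps the heavily queried first-bit-position trees of different files on different machines, while still spreading each file's trees across all clients.

#include <iostream>

// Hypothetical assignment: tree (fileIndex, bitPosition) goes to client
// (fileIndex + bitPosition) mod numClients.  The fileIndex offset keeps the
// bit-position-0 trees of different files on separate machines.
int clientForTree(int fileIndex, int bitPosition, int numClients) {
    return (fileIndex + bitPosition) % numClients;
}

int main() {
    int numClients = 4;
    for (int file = 0; file < 3; ++file)
        for (int bit = 0; bit < 3; ++bit)
            std::cout << "file " << file << ", bit " << bit << " -> client "
                      << clientForTree(file, bit, numClients) << std::endl;
    return 0;
}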
A second question is what exactly a client does when it "processes the data." When a client receives a Peano tree, it looks at the NZ representation and the P1 representation of the tree and writes these two values to its hard drive. It then needs to determine whether more information must be stored, so it compares the NZ and P1 values. If they are the same, no further processing is required, because the quadrant is either pure 1 or pure 0. If they are different, the quadrant is mixed and needs to be written to another disk, so the client sends that information to the client responsible for storing that quadrant of data.
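The client-side processing just described might look roughly like the following sketch. It follows the design in this paper, but writeToLocalDisk and forwardToOwningClient are stand-ins for the prototype's actual file and socket code, not its real function names.

#include <cstdio>
#include <string>

// Stand-ins for the prototype's disk and socket code (assumed names).
void writeToLocalDisk(const std::string& quadrantId, bool nz, bool p1) {
    std::printf("store %s: NZ=%d P1=%d\n", quadrantId.c_str(), (int)nz, (int)p1);
}
void forwardToOwningClient(const std::string& quadrantId) {
    std::printf("forward mixed quadrant %s to the client that owns it\n", quadrantId.c_str());
}

// Processing for one received quadrant: record its NZ and P1 bits locally,
// then decide whether its subquadrants must be striped elsewhere.
void processQuadrant(const std::string& quadrantId, bool nz, bool p1) {
    writeToLocalDisk(quadrantId, nz, p1);
    if (nz != p1)                       // differing bits mean the quadrant is mixed
        forwardToOwningClient(quadrantId);
    // nz == p1: the quadrant is pure 0 or pure 1, nothing further to stripe
}

int main() {
    processQuadrant("[3]", true, false);  // the mixed fourth child from Figure 1
    processQuadrant("[2]", true, true);   // the pure-1 third child
    return 0;
}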
The Prototype System
In order to prove this would be a useful project, it would be best to build a prototype system. I will start by giving an overview of what I built and then go into the specifics of each module.
I chose to write the code in C++ for its modularity and my familiarity with network programming in C. My overall goal was to make the system as modular as possible so that implementations could be swapped in and out easily. For example, I feel that the P-tree implementation could be made much better; because of the modularity of the system, one would only need to swap in a new ptree.cpp and ptree.h file and change how the P-trees are converted into network strings.
There is extensive use of sockets, which are a standard across many platforms (including Windows and UNIX). This means that it may be feasible to re-write this program for use in Windows systems as well, but that is beyond the scope of this project.
PTree Implementation
The P-tree was implemented as a tree structure that stores the NZ and P1 tree types in one node and assigns four children to it, corresponding to the four children all P-trees have. This structure was chosen mainly for the ease of checking the NZ and P1 tree types. Note that my representation of P-trees is quite inefficient, because there are always four children, even for a pure node. I chose this because I did not wish to take time away from the more interesting question of whether a client/server system that striped data was feasible.
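A structure along these lines is sketched below. It is reconstructed from the description above rather than copied from ptree.h, so the field names are assumptions.

#include <cstddef>

// Reconstructed sketch of the node layout described above (field names are
// assumptions, not copied from ptree.h): the NZ and P1 bits for a quadrant
// live in one node, and four child pointers are always present, even for
// pure nodes - the inefficiency noted in the text.
struct PTreeNode {
    bool nz;               // non-zero bit: 1 if the quadrant contains any 1s
    bool p1;               // pure-1 bit: 1 only if the quadrant is entirely 1s
    PTreeNode* child[4];   // slots for the four subquadrants, NULL when unused

    PTreeNode(bool nzBit, bool p1Bit) : nz(nzBit), p1(p1Bit) {
        for (int i = 0; i < 4; ++i) child[i] = NULL;
    }
};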
Sending a Ptree across the network
Next, I needed a way to create a data stream to send across the network so that a client could understand and re-create the tree in question. To turn a P-tree into a network stream, I went through the tree recursively, giving each node an identification number and writing out its NZ and P1 values, producing one record per node with the following structure: nodeID|P1|NZ|child1nodeID|child2nodeID|child3nodeID|child4nodeID. When rebuilding the tree at the other end, the program went backwards through this structure and rebuilt the tree from the bottom up, returning the root node on completion.
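The serialization pass might be sketched as follows, using the record layout described above. The node struct is a plain version of the one sketched earlier, and the traversal details (children emitted before their parent, -1 for a missing child) are assumptions for illustration, not necessarily the prototype's exact choices.

#include <iostream>
#include <sstream>
#include <string>

// A plain version of the node struct sketched earlier (field names assumed).
struct PTreeNode {
    bool nz, p1;
    PTreeNode* child[4];
};

// Recursively append one record per node in the form
//   nodeID|P1|NZ|child1ID|child2ID|child3ID|child4ID
// with -1 marking a missing child.  Children are written before their parent
// here so the receiver can rebuild the tree bottom-up.  Returns this node's ID.
int serialize(const PTreeNode* n, int& nextId, std::ostringstream& out) {
    int id = nextId++;
    int childId[4];
    for (int i = 0; i < 4; ++i)
        childId[i] = n->child[i] ? serialize(n->child[i], nextId, out) : -1;
    out << id << '|' << n->p1 << '|' << n->nz;
    for (int i = 0; i < 4; ++i)
        out << '|' << childId[i];
    out << '\n';
    return id;
}

int main() {
    PTreeNode leaf = { true, true, { 0, 0, 0, 0 } };       // a pure-1 child
    PTreeNode root = { true, false, { &leaf, 0, 0, 0 } };  // mixed root
    std::ostringstream out;
    int nextId = 0;
    serialize(&root, nextId, out);
    std::cout << out.str();
    return 0;
}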
The drawback of this choice is that the stream is quite large. I used characters to send information instead of a bit stream, so the stream is at least eight times larger than it needs to be. Also, if I had used Peano vectors instead of sending a tree, the stream might shrink even further because of the lack of tree pointers. The advantage of this network format is easier-to-read code, and it was very easy to debug because I could view packets as they went across the network to see whether they were correct.
Client/Server Implementation
A server using this implementation goes through three states. First, it waits for connections. Once the specified number of clients has connected, the server performs a minimal amount of processing by converting the P-tree into a data stream as described above. This stream is then queued to be sent to the waiting clients. When data is returned from the clients, the resulting data is requeued for further processing. This loop continues until each node returns a signal indicating that it has reached a leaf node.
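Abstracting away the socket plumbing, the server's loop might be sketched like this. The names sendToClient, receiveFromClient, and WorkItem, and the "LEAF" signal, are assumptions introduced for illustration, not the prototype's actual interfaces; the full system would also reassign returned quadrants to their owning clients rather than back to the sender.

#include <iostream>
#include <queue>
#include <string>

// Stand-ins for the prototype's socket routines (assumed names, not its real API).
void sendToClient(int clientId, const std::string& stream) {
    std::cout << "send to client " << clientId << ": " << stream << std::endl;
}
// In the prototype this would block on the client's socket; the stub simply
// reports a leaf so that the sketch is self-contained.
std::string receiveFromClient(int clientId) {
    (void)clientId;
    return "LEAF";
}

struct WorkItem {
    int clientId;        // which client this tree section is assigned to
    std::string stream;  // the serialized tree section
};

// The server's loop after all clients have connected: hand out queued tree
// sections, requeue whatever the clients send back, and stop once every
// client has signalled that it reached a leaf.
void serverLoop(std::queue<WorkItem>& work, int numClients) {
    int leafSignals = 0;
    while (leafSignals < numClients && !work.empty()) {
        WorkItem item = work.front();
        work.pop();
        sendToClient(item.clientId, item.stream);
        std::string reply = receiveFromClient(item.clientId);
        if (reply == "LEAF") {
            ++leafSignals;
        } else {
            WorkItem next = { item.clientId, reply };   // requeue for further processing
            work.push(next);
        }
    }
}

int main() {
    std::queue<WorkItem> work;
    WorkItem first = { 0, "0|0|1|1|-1|-1|-1" };  // a serialized root record
    work.push(first);
    serverLoop(work, 1);
    return 0;
}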
The clients similarly have three states in their life cycle. They wait for data from the server, process this data into a P-tree, write out the data necessary for this tree section's storage, and return the child nodes to the server for distribution among the other clients in the system.
Conclusion
I was able to compile the clients on two different FreeBSD systems and, using NDSU's local Linux cluster, to test the project as described. As expected, results from the more remote FreeBSD systems came back more slowly than from the local network of Linux machines. However, the test did demonstrate the scalability of the system and its ability to be compiled on a variety of UNIX-variant platforms.
There are a number of improvements to be made to the system. I would include a better implementation of P-trees, some type of user interface, the ability to import any tabular data the user wants, and the use of bit streams instead of characters to transport and store data. Another improvement would be to give clients information about the other clients so that they could send information amongst themselves directly. This would increase overhead, but it would also make the system more stable, because it would remove the single point of failure that a server/client paradigm of this nature introduces.
Peano trees will be an important technology for the advancement of data mining, and the development of tools to use these trees more efficiently will be paramount to achieving their maximum potential. This project and prototype system have demonstrated the feasibility of data striping and distributed computing as a first step in building a toolset for P-tree manipulation. Hopefully, many more projects will follow to continue the excellent progress of this technology.
Note: The source code and presentation of this paper are available at:
References
[STO] Stonebraker, Michael. The Case for Shared Nothing. Appears at URL
[CLA] Class Presentations, Computer Science 765, North Dakota State University. November 26, December 3, 10, and 17, 2001. Schedule and topics appear at URL
[RAM] Ramakrishnan, Raghu and Johannes Gehrke. Database Management Systems: Second Edition. Boston: McGraw-Hill Higher Education, 2000.
[PER] Perrizo, William. Bit Sequential (bSQ) Data Model and Peano Count Trees (P-trees). Appears at URL