GRID COMPUTING
TECHNICAL PAPER ON ICT
JOHN THOMAS
School of Management Studies
CUSAT Kochi-22
Email:
Abstract: Grid computing is a form of distributed computing whereby a "super and virtual computer" is composed of a cluster of networked, loosely-coupled computers, acting in concert to perform very large tasks. This technology has been applied to computationally-intensive scientific, mathematical, and academic problems through volunteer computing, and it is used in commercial enterprises for such diverse applications as drug discovery, economic forecasting, seismic analysis, and back-office data processing in support of e-commerce and web services.
What distinguishes grid computing from typical cluster computing systems is that grids tend to be more loosely coupled, heterogeneous, and geographically dispersed. Also, while a computing grid may be dedicated to a specialized application, it is often constructed with the aid of general purpose grid software libraries and middleware.
Keywords: Grid computing, Internet
INTRODUCTION
According to a 2008 paper published by IEEE Internet Computing, "Grid Computing is a paradigm in which information is permanently stored in servers on the Internet and cached temporarily on clients that include desktops, entertainment centers, tablet computers, notebooks, wall computers, handhelds, sensors, monitors, etc."
The term grid computing originated in the early 1990s as a metaphor for making computing power as easy to access as an electric power grid, an idea developed in Ian Foster and Carl Kesselman's seminal work, "The Grid: Blueprint for a New Computing Infrastructure".
CPU scavenging and volunteer computing were popularized beginning in 1997 by distributed.net and later in 1999 by SETI@home to harness the power of networked PCs worldwide, in order to solve CPU-intensive research problems.
The ideas of the grid (including those from distributed computing, object oriented programming, web services and others) were brought together by Ian Foster, Carl Kesselman and Steve Tuecke, widely regarded as the "fathers of the grid" [1]. They led the effort to create the Globus Toolkit, incorporating not just computation management but also storage management, security provisioning, data movement, monitoring and a toolkit for developing additional services based on the same infrastructure, including agreement negotiation, notification mechanisms, trigger services and information aggregation. While the Globus Toolkit remains the de facto standard for building grid solutions, a number of other tools have been built that answer some subset of the services needed to create an enterprise or global grid.
During 2007 the term cloud computing came into popularity, which is conceptually similar to the canonical Foster definition of grid computing (in terms of computing resources being consumed as electricity is from the power grid). Indeed, grid computing is often (but not always) associated with the delivery of cloud computing systems.
Understanding Grid Computing
Grid computing describes both a platform and a type of application. A Grid computing platform dynamically provisions, configures, reconfigures, and deprovisions servers as needed. Grid applications are those that are extended to be accessible through the Internet. These Grid applications use large data centers and powerful servers that host Web applications and Web services.
Shashi B Mal, Director, Systems & Technology Group, IBM India/South Asia explained, “Grid computing is an emerging approach to shared infrastructure in which large pools of systems are linked together to provide IT services. Grid Computing will allow corporate data centers to operate more like the Internet by enabling computing across a distributed, globally accessible fabric of resources, rather than on local machines or remote server systems. Organizations can use them as much as they want and as wireless broadband connection options grow, wherever they need them.”
Grid computing describes how computer programs are hosted and operated over the Internet. The key feature of Grid computing is that both the software and the information held in it live on centrally located servers rather than on an end-user's computer. A Google spokesperson added, "This means people can access the information that they need from any device with an Internet connection—including mobile and handheld phones—rather than being chained to the desktop. It also means lower costs, since there is no need to install software or hardware."
Grids versus conventional supercomputers
"Distributed" or "grid" computing in general is a special type of parallel computing which relies on complete computers (with onboard CPU, storage, power supply, network interface, etc.) connected to a network (private, public or the Internet) by a conventional network interface, such as Ethernet. This is in contrast to the traditional notion of a supercomputer, which has many processors connected by a local high-speed computer bus.
The primary advantage of distributed computing is that each node can be purchased as commodity hardware, which when combined can produce similar computing resources to a multiprocessor supercomputer, but at lower cost. This is due to the economies of scale of producing commodity hardware, compared to the lower efficiency of designing and constructing a small number of custom supercomputers. The primary performance disadvantage is that the various processors and local storage areas do not have high-speed connections. This arrangement is thus well-suited to applications in which multiple parallel computations can take place independently, without the need to communicate intermediate results between processors.
The high-end scalability of geographically dispersed grids is generally favorable, due to the low need for connectivity between nodes relative to the capacity of the public Internet.
There are also some differences in programming and deployment. It can be costly and difficult to write programs so that they can be run in the environment of a supercomputer, which may have a custom operating system, or require the program to address concurrency issues. If a problem can be adequately parallelized, a "thin" layer of "grid" infrastructure can allow conventional, standalone programs to run on multiple machines (but each given a different part of the same problem). This makes it possible to write and debug on a single conventional machine, and eliminates complications due to multiple instances of the same program running in the same shared memory and storage space at the same time.
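The "independent computations" property described above can be sketched on a single machine with Python's multiprocessing module: each worker receives a different slice of the problem, and no intermediate results are exchanged between workers. This is a minimal single-machine analogy, not a real grid middleware; the work-unit split and worker count are illustrative assumptions.

```python
from multiprocessing import Pool

def work_unit(chunk):
    # Each "node" independently sums the squares of its slice;
    # no communication with other workers is needed.
    return sum(x * x for x in chunk)

def run_grid_style(data, n_workers=4):
    # Split the problem into independent parts, roughly one per worker.
    size = (len(data) + n_workers - 1) // n_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(n_workers) as pool:
        partials = pool.map(work_unit, chunks)
    # The only "communication" is the final reduction of partial results.
    return sum(partials)

if __name__ == "__main__":
    print(run_grid_style(list(range(1000))))  # 332833500
```

Problems that fit this pattern (parameter sweeps, independent simulations, chunked data scans) are exactly the ones that tolerate the slow inter-node links of a geographically dispersed grid.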
Data Grid
A data grid is a grid computing system that deals with data — the controlled sharing and management of large amounts of distributed data. These are often, but not always, combined with computational grid computing systems.
Many scientific and engineering applications require access to large amounts of distributed data (terabytes or petabytes). The size and number of these data collections has been growing rapidly in recent years and will continue to grow as new experiments and sensors come on-line, the costs of computation and data storage decrease and performances increase, and new computational science applications are developed.
Current large-scale data grid projects include the Biomedical Informatics Research Network (BIRN), the Southern California Earthquake Center (SCEC), and the Real-time Observatories, Applications, and Data management Network (ROADNet), all of which make use of the SDSC Storage Resource Broker as the underlying data grid technology. These applications require widely distributed access to data by many people in many places. The data grid creates virtual collaborative environments that support distributed but coordinated scientific and engineering research.
An In-Memory Data Grid, also referred to as IMDG, is a variant of the data grid in which the shared distributed data is held in the main memory of the participating nodes rather than on disk. A full description can be found under the data grid definition above.
Space-Based Architecture
Space-Based Architecture (SBA) is a software architecture pattern for achieving linear scalability of stateful, high-performance applications using the tuple space paradigm. It follows many of the principles of Representational State Transfer, Service-Oriented Architecture and Event-Driven Architecture, as well as elements of grid computing. With a space-based architecture, applications are built out of a set of self-sufficient units, known as processing-units (PU). These units are independent of each other, so that the application can scale by adding more units.
The SBA model is closely related to other patterns that have been proved successful in addressing the application scalability challenge, such as Shared-Nothing Architecture, used by Google, Amazon.com and other well-known companies. The model has also been applied by many firms in the securities industry for implementing scalable electronic securities trading applications.
Grid File System
A Grid File System is a computer file system whose goal is improved reliability and availability by taking advantage of many smaller file storage areas.
Comparisons
Because current file systems are designed to appear as a single disk managed entirely by a single computer, many new challenges arise in a grid scenario, in which any single disk within the grid should be capable of handling requests for any data contained in the grid.
Features
Most file storage utilizes layers of redundancy to achieve a high level of data protection (resilience against data loss). Current means of redundancy include replication and parity checks. Such redundancy can be implemented via a RAID array (whereby multiple physical disks appear to a local computer as a single disk, which may include data replication and/or disk partitioning). Similarly, a Grid File System would provide some level of redundancy (either at the logical file level or at the block level, possibly including some sort of parity check) across the various disks present in the grid.
Framework
First, a file table mechanism is necessary, and it must include a means of locating the target file within the grid. Second, a mechanism for working with file data must exist; this mechanism is responsible for serving file data in response to requests.
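The two mechanisms above (a file table that locates files, and a file data service that answers requests) can be sketched as a simple in-memory structure. The node names, replication factor, and `read_from_node` callback are hypothetical placeholders, not part of any real grid file system.

```python
class GridFileTable:
    """Toy file table: maps a file name to the grid nodes
    holding replicas of its data."""
    def __init__(self, replication=2):
        self.replication = replication
        self.table = {}          # filename -> list of node ids

    def add_file(self, name, nodes):
        # Record which nodes hold this file, up to the replication factor.
        self.table[name] = list(nodes)[:self.replication]

    def locate(self, name):
        # The "locating" mechanism: where in the grid does the file live?
        return self.table.get(name, [])

def fetch(file_table, name, read_from_node):
    # The "file data" mechanism: try each replica until one answers.
    # read_from_node(node, name) is a stand-in for a network read.
    for node in file_table.locate(name):
        data = read_from_node(node, name)
        if data is not None:
            return data
    raise FileNotFoundError(name)

ft = GridFileTable(replication=2)
ft.add_file("results.dat", ["node-a", "node-b", "node-c"])
print(ft.locate("results.dat"))  # ['node-a', 'node-b']
```

The separation matters: the file table can be small and widely replicated, while the (much larger) file data moves only between the nodes that actually hold it and the nodes that request it.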
Implementation
With the recent advent of torrent technology, a parallel can be drawn to a Grid File System, in that a torrent tracker (and search engine) would be the "File Table", and the torrent applications (transmitting the files) would be the "File Data" component. An RSS-feed-like mechanism could be used by File Table nodes to announce when new files are added to the table, triggering replication and similar processes.
A File system which incorporates Torrent technology (distributed replication, distributed data request/fulfillment) would likely be a good start for such a technology.
If both such systems (the file table and the file data) could be addressed as a single entity (i.e., using virtual nodes in a cluster), then growth of such a system could be controlled simply by deciding which roles each grid member would be responsible for (File Table and file lookups, and/or File Data).
Availability
Assuming there exists some method of managing data replication (assigning quotas, etc.) autonomously within the grid, data could be configured for high availability regardless of node loss or outage.
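As a rough illustration of that availability claim (the replica placement here is an assumed toy policy, not a prescribed algorithm), data replicated on r nodes remains readable as long as at least one replica's node survives:

```python
def survives(replica_nodes, failed_nodes):
    # Data is still available if any replica lives outside the failed set.
    return any(n not in failed_nodes for n in replica_nodes)

nodes = ["n1", "n2", "n3", "n4"]
replicas = ["n1", "n3"]          # replication factor 2

# With 2 replicas, the file survives every single-node outage...
assert all(survives(replicas, {f}) for f in nodes)
# ...but not the simultaneous loss of both replica nodes.
assert not survives(replicas, {"n1", "n3"})
```

Raising the replication factor trades storage (and update cost) for tolerance of more simultaneous failures, which is exactly the quota decision an autonomous replication manager would make.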
Troubles
The largest problem currently revolves around distributing data updates. Torrents support only a minimal hierarchy (currently implemented either as metadata in the torrent tracker, or strictly as UI-level categorization). Updating multiple nodes concurrently (assuming atomic transactions are required) introduces latency during updates and additions, often to the point of not being feasible. Additionally, a grid (network-based) file system breaks traditional TCP/IP paradigms: a file system involves low-level (ring 0) operations, yet here it requires complicated TCP/IP implementations, introducing layers of abstraction and complication to the process of creating such a grid file system.
Examples
Current examples of highly available data include:
- Network Load Balancing / CARP - splitting incoming requests among multiple computers, usually configured identically or as one whole.
- Shared Storage Clustering / SANs - a single disk (one or more physical disks acting as a single logical disk) is presented to multiple computers, which split incoming requests. This is usually used when more computing power is required than disk access.
- Data Replication / Mirroring - multiple computers synchronize data (usually point-in-time or snapshot based). Used more often for reporting (based on the last snapshot) or backup purposes.
- Data Partitioning - splitting data among multiple computers. In databases, data is often partitioned by table (certain tables exist on certain computers, or a table is split among multiple computers at certain "break points"); general files tend to be partitioned either by category (category-based folders) or by location (geographically separated).
Grid computing would bring the benefits from many such solutions, if it were widely adopted.
Semantic Grid
The Semantic Grid refers to an approach to Grid computing in which information, computing resources and services are described using the semantic data model. In this model the data and metadata are expressed as facts (small sentences), making them directly understandable to humans. This makes it easier for resources to be discovered and joined up automatically, which helps bring resources together to create virtual organizations. The descriptions constitute metadata and are typically represented using the technologies of the Semantic Web, such as the Resource Description Framework (RDF).
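The "facts" model can be sketched as subject-predicate-object triples, the basic unit of RDF; matching against a triple pattern is how resources are discovered. The resource names and predicates below are made-up examples, not a real ontology.

```python
# Each fact is a (subject, predicate, object) triple, the core of RDF.
facts = [
    ("cluster-A", "rdf:type",       "grid:ComputeResource"),
    ("cluster-A", "grid:cpuCount",  "128"),
    ("cluster-A", "grid:locatedIn", "Kochi"),
]

def query(facts, s=None, p=None, o=None):
    # Pattern match over the triples: None acts as a wildcard,
    # analogous to a variable in a SPARQL basic graph pattern.
    return [(fs, fp, fo) for fs, fp, fo in facts
            if (s is None or fs == s)
            and (p is None or fp == p)
            and (o is None or fo == o)]

# Discover every compute resource described in the metadata:
print(query(facts, p="rdf:type", o="grid:ComputeResource"))
# [('cluster-A', 'rdf:type', 'grid:ComputeResource')]
```

Because the same triple shape describes machines, data sets and services alike, a scheduler can discover and join up resources it has never seen before, which is the automation the Semantic Grid aims at.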
By analogy with the Semantic Web, the Semantic Grid can be defined as "an extension of the current Grid in which information and services are given well-defined meaning, better enabling computers and people to work in cooperation."
This notion of the Semantic Grid was first articulated in the context of e-Science, observing that such an approach is necessary to achieve a high degree of easy-to-use and seamless automation enabling flexible collaborations and computations on a global scale.
The use of Semantic Web and other knowledge technologies in Grid applications is sometimes described as the Knowledge Grid. Semantic Grid extends this by also applying these technologies within the Grid middleware.
Some Semantic Grid activities are coordinated through the Semantic Grid Research Group of the Global Grid Forum.
Architecture
The majority of Grid computing infrastructure currently consists of reliable services delivered through next-generation data centers that are built on compute and storage virtualization technologies. The services are accessible anywhere in the world, with the Grid appearing as a single point of access for all the computing needs of consumers. Commercial offerings need to meet the quality of service requirements of customers and typically offer service level agreements. Open standards and open source software are also critical to the growth of Grid computing.
Grid computing comes into focus only when you think about what IT always needs: a way to increase capacity or add capabilities on the fly without investing in new infrastructure, training new personnel, or licensing new software. Grid computing encompasses any subscription-based or pay-per-use service that, in real time over the Internet, extends IT's existing capabilities.
Grid computing is at an early stage, with a motley crew of providers large and small delivering a slew of Grid-based services, from full-blown applications to storage services to spam filtering. Yes, utility-style infrastructure providers are part of the mix, but so are SaaS (software as a service) providers such as Salesforce.com. Today, for the most part, IT must plug into Grid-based services individually, but Grid computing aggregators and integrators are already emerging.
InfoWorld talked to dozens of vendors, analysts, and IT customers to tease out the various components of Grid computing. Based on those discussions, here's a rough breakdown of what Grid computing is all about:
1. SaaS
This type of Grid computing delivers a single application through the browser to thousands of customers using a multitenant architecture. On the customer side, it means no upfront investment in servers or software licensing; on the provider side, with just one app to maintain, costs are low compared to conventional hosting. Salesforce.com is by far the best-known example among enterprise applications, but SaaS is also common for HR apps and has even worked its way up the food chain to ERP, with players such as Workday. And who could have predicted the sudden rise of SaaS "desktop" applications, such as Google Apps and Zoho Office?
2. Utility computing
The idea is not new, but this form of Grid computing is getting new life from Amazon.com, Sun, IBM, and others who now offer storage and virtual servers that IT can access on demand. Early enterprise adopters mainly use utility computing for supplemental, non-mission-critical needs, but one day, they may replace parts of the datacenter. Other providers offer solutions that help IT create virtual datacenters from commodity servers, such as 3Tera's AppLogic and Cohesive Flexible Technologies' Elastic Server on Demand. Liquid Computing's LiquidQ offers similar capabilities, enabling IT to stitch together memory, I/O, storage, and computational capacity as a virtualized resource pool available over the network.
3. Web services in the Grid
Closely related to SaaS, Web service providers offer APIs that enable developers to exploit functionality over the Internet, rather than delivering full-blown applications. They range from providers offering discrete business services -- such as Strike Iron and Xignite -- to the full range of APIs offered by Google Maps, ADP payroll processing, the U.S. Postal Service, Bloomberg, and even conventional credit card processing services.
4. Platform as a service
Another SaaS variation, this form of Grid computing delivers development environments as a service. You build your own applications that run on the provider's infrastructure and are delivered to your users via the Internet.