GridFTP: Advantages

Caitlin Minteer

Radford University

What Is GridFTP?

GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks. It is based on the Internet FTP protocol and implements extensions for high-performance operation that were either already specified in the FTP specification but not commonly implemented, or were proposed as new extensions by the GridFTP developers. The GridFTP protocol specification is a "proposed recommendation" document of the Global Grid Forum (GFD-R-P.020).

GridFTP uses basic Grid security on both control (command) and data channels. Other features include multiple data channels for parallel transfers, partial file transfers, third-party (direct server-to-server) transfers, reusable data channels, and command pipelining.
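
Third-party transfers follow the server-to-server pattern of classic FTP (RFC 959): the client asks the destination to listen (PASV), passes the returned address to the source with PORT, then issues STOR on the destination and RETR on the source, so file data never passes through the client. The sketch below builds that command sequence. It assumes the plain FTP form of the pattern; GSI authentication and GridFTP's own extensions (for example, for striped transfers) are not shown, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Matches the "(h1,h2,h3,h4,p1,p2)" tuple in an FTP "227 Entering Passive Mode" reply.
PASV_RE = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Extract the (host, port) a server is listening on from a 227 reply."""
    match = PASV_RE.search(reply)
    if match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(x) for x in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


def port_command(host: str, port: int) -> str:
    """Build the PORT command telling a server to connect to host:port."""
    return "PORT " + ",".join(host.split(".") + [str(port // 256), str(port % 256)])


def third_party_plan(dest_pasv_reply: str, filename: str) -> List[Tuple[str, str]]:
    """Return (server, command) pairs for a server-to-server copy.

    Assumes PASV was already sent to the destination and its reply is given.
    The client only sends commands; data flows directly between the servers.
    """
    host, port = parse_pasv_reply(dest_pasv_reply)
    return [
        ("source", port_command(host, port)),  # source will connect to dest's listener
        ("dest", f"STOR {filename}"),          # dest waits for data and writes it
        ("source", f"RETR {filename}"),        # source starts sending
    ]
```

For example, a destination reply of `227 Entering Passive Mode (10,0,0,2,4,1)` yields `PORT 10,0,0,2,4,1` for the source, because the last two numbers encode the port as 4 × 256 + 1 = 1025.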

Advantages of GridFTP

GridFTP is mainly used to move very large amounts of data securely in the scientific and commercial communities. Several features suit it to this role: security, parallel streams, striping, partial file transfer, third-party control with reliable and restartable data transfer, data extensibility, and protocol extensibility.

Security: Security tools deal with authentication (establishing the identity of users or services), protecting communications, and authorization (determining who is allowed to perform which actions), as well as with supporting functions such as managing user credentials and maintaining group membership information. The Globus GridFTP server and client use the Grid Security Infrastructure (GSI), which is built on a Public Key Infrastructure (PKI) and adds the capability of delegated authority through certificates. Delegation enables single sign-on in virtual organizations, so users no longer need to enter passwords at hundreds of different sites.
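
To illustrate why delegation gives single sign-on, the toy model below has a user unlock a long-lived credential once, issue a short-lived proxy from it, and let a remote service derive a further proxy to act on the user's behalf; each relying site only has to check the chain. It is a sketch under loud assumptions: there are no keys or signatures, the `Credential` type and function names are hypothetical, and the "/CN=proxy" naming rule is only loosely modeled on how GSI proxy certificates extend their issuer's name.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Credential:
    subject: str           # distinguished name, e.g. "/O=Grid/CN=Alice"
    issuer: Optional[str]  # None for the user's long-lived credential (a CA signs it in real PKI)
    expires: datetime


def delegate(parent: Credential, lifetime: timedelta, now: datetime) -> Credential:
    """Issue a short-lived proxy on behalf of `parent` (toy model: no keys, no signatures)."""
    # A proxy never outlives the credential that issued it.
    expires = min(parent.expires, now + lifetime)
    # Hypothetical naming rule: the proxy's name extends its issuer's name.
    return Credential(subject=parent.subject + "/CN=proxy", issuer=parent.subject, expires=expires)


def chain_is_valid(chain: List[Credential], now: datetime) -> bool:
    """Accept a chain only if each link was issued by the previous one and nothing has expired."""
    for parent, child in zip(chain, chain[1:]):
        if child.issuer != parent.subject or not child.subject.startswith(parent.subject):
            return False
        if child.expires > parent.expires:
            return False
    return all(cred.expires > now for cred in chain)
```

For example, a 12-hour proxy delegated from a one-year credential, then re-delegated to a remote server with a 30-day request, still expires after 12 hours, and the whole chain is rejected once that proxy lapses.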

Performance using Parallel Streams: GridFTP supports multiple parallel TCP streams between a single source and destination. On wide-area links this can raise aggregate bandwidth well above what a single stream achieves. Typical improvements are a factor of 3 to 5 over standard FTP, because parallel streams work around bottlenecks inherent in TCP/IP, such as the ceiling that a single stream's window size and round-trip time place on its throughput. See Performance Tuning of GridFTP.
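
A back-of-the-envelope model shows why parallel streams help on long links: a TCP stream can deliver at most one window of data per round trip, so N streams can carry up to N windows per round trip until the link itself saturates. The sketch below encodes that idealized bound; it ignores packet loss and congestion-control dynamics, the function names are hypothetical, and it is not how the 3 to 5 times figure above was measured.

```python
def single_stream_limit(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on one TCP stream's throughput, in bytes/s: one window per round trip."""
    return window_bytes / rtt_s


def parallel_throughput(streams: int, window_bytes: int, rtt_s: float,
                        link_bytes_per_s: float) -> float:
    """Idealized aggregate throughput of `streams` parallel TCP connections.

    Each stream is capped at window / RTT, and the total is capped by link capacity.
    Packet loss, congestion-control dynamics, and disk speed are ignored.
    """
    return min(streams * single_stream_limit(window_bytes, rtt_s), link_bytes_per_s)


if __name__ == "__main__":
    link = 1e9 / 8  # 1 Gbit/s expressed in bytes/s
    for n in (1, 4, 16):
        rate = parallel_throughput(n, 64 * 1024, 0.05, link)
        print(f"{n:2d} streams: {rate / 1e6:6.1f} MB/s")
```

With a 64 KiB window and a 50 ms round trip, one stream is limited to about 1.3 MB/s, far below a 1 Gbit/s link; four streams raise the bound to about 5.2 MB/s.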

Performance using Striping: GridFTP supports striped transfers, in which the data is spread across multiple servers at each end, and these can increase the performance factor by an order of magnitude. Moreover, striping and parallelism can be combined: multiple TCP streams may be open between each pair of servers participating in a striped transfer. See also GridFTP Striping Architecture and Performance.
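
One simple way to divide a file among striped servers is to cut it into fixed-size blocks and deal them out round-robin, so each server moves roughly an equal share. The sketch below computes such a layout. Round-robin fixed-size blocks are an assumption chosen for illustration, not necessarily the layout a given GridFTP deployment uses, and `blocked_layout` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def blocked_layout(file_size: int, block_size: int,
                   num_servers: int) -> Dict[int, List[Tuple[int, int]]]:
    """Assign fixed-size (offset, length) blocks of a file to servers round-robin."""
    if block_size <= 0 or num_servers <= 0:
        raise ValueError("block_size and num_servers must be positive")
    layout: Dict[int, List[Tuple[int, int]]] = {s: [] for s in range(num_servers)}
    for index, offset in enumerate(range(0, file_size, block_size)):
        length = min(block_size, file_size - offset)  # the last block may be short
        layout[index % num_servers].append((offset, length))
    return layout
```

Each server could then split its own blocks across several TCP streams, which is the combination of striping and parallelism described above.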

Partial File Transfer: Scientific users often need only a portion of a large file rather than the entire file. GridFTP supports this by letting the client specify the byte position in the file at which the transfer begins.
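
As a local analogy for a partial transfer, the generator below seeks to a starting byte offset and yields only the requested range, tagging each chunk with its position in the file so a receiver can place it correctly. The function name, chunk size, and (offset, data) framing are assumptions for illustration; this is not the GridFTP wire format.

```python
from typing import BinaryIO, Iterator, Tuple


def iter_partial(f: BinaryIO, offset: int, length: int,
                 chunk_size: int = 64 * 1024) -> Iterator[Tuple[int, bytes]]:
    """Yield (file_offset, data) chunks covering bytes [offset, offset + length) of f.

    Stops early if the file ends before offset + length.
    """
    end = offset + length
    f.seek(offset)  # skip everything before the requested start position
    pos = offset
    while pos < end:
        data = f.read(min(chunk_size, end - pos))
        if not data:  # reached end of file
            break
        yield pos, data
        pos += len(data)
```

For instance, requesting 5 bytes from offset 3 of a 10-byte file opened in binary mode returns only bytes 3 through 7, however large the rest of the file is.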

Third-Party Control and Reliable/Restartable Data Transfer: In many cases reliability matters more than speed; the demand for speed often comes from users having to babysit transfers rather than from any intrinsic application need. To provide reliability, the GridFTP server automatically sends restart markers (checkpoints) to the client. If the transfer fails, the client can restart it and supply the markers it received, and the server resumes the transfer where it left off. The Reliable File Transfer (RFT) service goes one step further: it provides a service interface similar to job submission and writes the restart markers to a database, so that transfers can survive a local fault. Additionally, clients can act as a third party to initiate transfers between two remote sites.
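
Restarting from markers amounts to working out which bytes have not yet been acknowledged. The sketch below assumes each restart marker reports a half-open byte range [start, end) that arrived safely; since blocks sent over parallel streams need not arrive in order, the acknowledged data may be several disjoint ranges rather than one prefix. The range representation and function names are assumptions, not GridFTP's actual marker format.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end)


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Merge overlapping or adjacent byte ranges into a sorted, disjoint list."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def missing_ranges(markers: List[Range], file_size: int) -> List[Range]:
    """Given byte ranges acknowledged by restart markers, return what still must be sent."""
    gaps: List[Range] = []
    pos = 0
    for start, end in merge_ranges(markers):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < file_size:
        gaps.append((pos, file_size))
    return gaps
```

For example, markers covering [0, 30), [30, 90), and [200, 300) of a 500-byte file leave only [90, 200) and [300, 500) to resend. Persisting the marker list, as RFT does with its database, is what lets this calculation survive a client crash.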

Data Extensibility: The Data Storage Interface (DSI) completely abstracts away the underlying storage. Any storage system for which a DSI is implemented can sit behind a GridFTP-compliant server. DSIs currently exist for POSIX file systems, HPSS, and the Storage Resource Broker (SRB), along with a prototype DSI for space reservation via the Condor NeST storage utility.
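
The value of a storage abstraction is that the transfer loop never needs to know where bytes live. The Python analogy below defines a minimal read/write interface, one in-memory backend, and a copy loop written only against the interface; a POSIX or archival backend would plug in the same way. All class and function names are hypothetical, and this is not the Globus DSI API.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StorageInterface(ABC):
    """Hypothetical storage abstraction: the server talks only to this interface."""

    @abstractmethod
    def read(self, path: str, offset: int, length: int) -> bytes: ...

    @abstractmethod
    def write(self, path: str, offset: int, data: bytes) -> None: ...


class InMemoryStorage(StorageInterface):
    """Toy backend that keeps each file as a bytearray in a dict."""

    def __init__(self) -> None:
        self.files: Dict[str, bytearray] = {}

    def read(self, path: str, offset: int, length: int) -> bytes:
        return bytes(self.files[path][offset:offset + length])

    def write(self, path: str, offset: int, data: bytes) -> None:
        buf = self.files.setdefault(path, bytearray())
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # zero-fill a gap before the write
        buf[offset:offset + len(data)] = data


def serve_copy(src: StorageInterface, dst: StorageInterface, path: str,
               size: int, chunk: int = 64 * 1024) -> None:
    """Copy loop that works with any pair of backends implementing the interface."""
    for offset in range(0, size, chunk):
        dst.write(path, offset, src.read(path, offset, min(chunk, size - offset)))
```

Because `serve_copy` depends only on `StorageInterface`, adding support for a new storage system means writing one new backend class rather than changing the transfer code.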

Protocol Extensibility: The Globus eXtensible IO (XIO) system uses an open, read, write, close abstraction that Globus GridFTP leverages to be transport-protocol agnostic. Hence, in environments where it makes sense, protocols much more aggressive than TCP can be used. To meet more specific extensibility needs, easy-to-use development libraries are also provided. An XIO driver is currently available for UDT (the UDP-based Data Transfer protocol).
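
The same idea applies to transports. The sketch below writes the sending logic only against an open/read/write/close driver interface, with a loopback driver standing in for a real network transport; a TCP or UDT driver could replace it without touching `send_all`. The `Driver` interface and class names are hypothetical Python analogies, not the Globus XIO API.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical transport driver with an open/read/write/close interface."""

    @abstractmethod
    def open(self) -> None: ...

    @abstractmethod
    def read(self, n: int) -> bytes: ...

    @abstractmethod
    def write(self, data: bytes) -> None: ...

    @abstractmethod
    def close(self) -> None: ...


class LoopbackDriver(Driver):
    """Toy transport that hands back whatever was written to it."""

    def __init__(self) -> None:
        self.buffer = bytearray()
        self.is_open = False

    def _check_open(self) -> None:
        if not self.is_open:
            raise RuntimeError("driver is not open")

    def open(self) -> None:
        self.is_open = True

    def read(self, n: int) -> bytes:
        self._check_open()
        data = bytes(self.buffer[:n])
        del self.buffer[:n]  # consume what was read
        return data

    def write(self, data: bytes) -> None:
        self._check_open()
        self.buffer.extend(data)

    def close(self) -> None:
        self.is_open = False


def send_all(driver: Driver, payload: bytes, chunk: int = 64 * 1024) -> int:
    """Transfer logic written only against Driver, so the transport can be swapped."""
    driver.open()
    try:
        for start in range(0, len(payload), chunk):
            driver.write(payload[start:start + chunk])
    finally:
        driver.close()
    return len(payload)
```

Keeping the transport behind a narrow interface is what makes it possible to trade TCP for a more aggressive protocol only where the network warrants it.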