Aim: Case study on BitTorrent

Theory:

BitTorrent is an application-layer network protocol used to distribute files. It uses a peer-to-peer (P2P) architecture in which each peer acts as both client and server, downloading from some peers while uploading to others. Serving capacity grows with the number of downloaders, which makes the system self-scaling. It also has a client-server element: peers contact a central server to find other peers they can connect to. BitTorrent was created by Bram Cohen in 2001 as a way to distribute the free Linux operating system.

The main difference between BitTorrent and other P2P protocols is that the protocol has no search functionality. Users do not share files directly from their computers as in other P2P networks; instead, a torrent file must be created that represents a peer-to-peer transfer session for a particular file or set of files. These torrent files cannot be searched for through the BitTorrent protocol itself. Most peers download them from a website, which usually hosts many torrent files and lets users upload their own.

BitTorrent Terminology

The following are the most common terms used with the BitTorrent protocol:

  1. Torrent: A small metadata file that contains information about the data you want to download, not the data itself.
  2. Peer: A peer is another computer on the Internet that is sharing the file you wish to download.
  3. Seed: A seed is a peer that has a complete copy of the content described by a torrent. The more seeds there are, the better the chances that a download will complete.
  4. Leech: A leech is a peer that downloads files but does not share or upload them from its computer.
  5. Swarm: A swarm is the group of users connected for a particular file, that is, all those who are uploading or downloading the same file.
  6. Tracker: A tracker is a server on the Internet that coordinates the actions of BitTorrent clients.
  7. Choked: A connection is choked when the transmitter is not sending anything on the link.

Once a peer has the torrent file, it can join the torrent session by opening the file in its BitTorrent application. A peer is always in one of two states. It is in the leecher state while it is still downloading the file and uploading the pieces it already has to other leechers. It is in the seed state once it has the complete file and is uploading to leechers. At least one seeder is needed for the torrent to stay alive; otherwise the leechers may not be able to finish.

A file is split into equal-size pieces, and the protocol keeps track of which pieces have been downloaded. Each peer maintains a list of the peers it is connected to, called the peer set. A peer can upload only to a subset of this peer set, called the active peer set. Peers also need to know which pieces of the content each peer in their peer set has. By knowing which peers a peer can upload to and which pieces all of its connected peers have, BitTorrent can deliver content efficiently. This results in less than a tenth of a percent of bandwidth overhead and reliably utilizes all available upload capacity. Unlike Gnutella, which establishes a relationship between just two peers, the BitTorrent protocol simultaneously gathers pieces of a file from several peers that already have the file or are in the process of obtaining it.

BitTorrent Architecture

BitTorrent is a hybrid network that uses both the client-server architecture and the peer-to-peer architecture. The centralized server is called the tracker. The tracker’s responsibility is to help peers find other peers. A tracker can host many torrent sessions, and for each session it keeps track of all the peers participating in that torrent. The peer contacts the tracker, and the tracker responds with a list of peers it may connect to. The tracker is not responsible for the actual distribution of the content at all.

The tracker’s bandwidth use is very low because the tracker protocol is simple and peers contact the tracker only when they start up and at defined intervals, usually 30 minutes. The peer knows the URL of the tracker because it is defined in the torrent file.

Fig: BitTorrent Architecture

The torrent file is a static ‘metainfo’ file that represents a session of content being distributed. It is created from the URL of the tracker and the actual file or files to be part of the torrent. The format of the torrent file is bencoding, which consists of nested dictionaries and lists that can contain strings and integers. The torrent file is a bencoded dictionary containing two keys, announce and info. The announce key is the URL of the tracker. The info key is a dictionary with the following keys: name, piece length, pieces, and either a length or a files key.
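The structure of the metainfo dictionary can be sketched in Python for the single-file case. This is only an illustration under stated assumptions: the helper name build_info, the 256 KiB piece length, the file name, and the tracker URL are hypothetical, and the pieces value is taken to be the concatenated 20-byte SHA-1 digests of the pieces.

```python
import hashlib

def build_info(name: str, data: bytes, piece_length: int = 262144) -> dict:
    """Build a single-file 'info' dictionary (hypothetical helper)."""
    # Split the content into fixed-size pieces; only the last one may be shorter.
    pieces = [data[i:i + piece_length] for i in range(0, len(data), piece_length)]
    # 'pieces' holds the 20-byte SHA-1 digest of every piece, concatenated.
    digests = b"".join(hashlib.sha1(p).digest() for p in pieces)
    return {
        "name": name,
        "piece length": piece_length,
        "pieces": digests,
        "length": len(data),  # single-file torrents use 'length' instead of 'files'
    }

metainfo = {
    "announce": "http://tracker.example.com/announce",  # hypothetical tracker URL
    "info": build_info("example.bin", b"x" * 600_000),  # made-up content
}
```

With 600,000 bytes of content and 262,144-byte pieces, the file is split into three pieces, so the pieces value is 60 bytes long.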

Working

Unlike some other peer-to-peer downloading methods, BitTorrent offloads some of the file-tracking work to a central server (called a tracker). Another difference is that it uses a principle called tit-for-tat: in order to receive data, you have to give it. With BitTorrent, the more you upload to others, the faster your downloads tend to be. Finally, to make better use of available Internet bandwidth (the pipeline for data transmission), BitTorrent downloads different pieces of the file you want simultaneously from multiple computers.

Overview of how BitTorrent works:

  • A peer acquires a .torrent file, which contains among other things A) the info dictionary describing the fileset, whose SHA-1 hash (the info_hash) identifies the torrent, B) the URL of the tracker, and C) an SHA-1 hash of every piece that the fileset is broken into, from which the number of pieces follows. The size of the pieces is set by the torrent itself.
  • The peer then connects to the tracker using the URL specified in the torrent. The tracker responds with a list of peers. Trackers speak HTTP or HTTPS, typically on port 80 or 443, although the announce URL may name any port.
  • The peer then selects another peer, using the information from the tracker, and contacts it directly to set up an exchange session, attempting to get a piece. Note that exchange sessions are carried out directly between peers and the tracker is NOT involved in the transfer. The tracker only provides information.
  • Once the peer has a piece, it verifies it against the SHA-1 hash and writes it to the file. It can then offer that piece when selecting another peer. Subsequent exchange sessions involve "trading" pieces.
  • The peer reconsults the tracker every so often to get an updated list of peers. The peer does not have to wait for one exchange to finish before starting another, so once it holds a number of pieces the transfer can speed up considerably. This is why torrents start slowly but gain speed quickly as the peer acquires pieces.
  • When a peer has all the pieces, each verified against its SHA-1 hash, it becomes a seeder and from then on does nothing but help keep the fileset highly available. Peers that do not have all the pieces are leechers.
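The piece verification step in the overview above can be sketched as follows. This is a minimal illustration, assuming the expected digests are stored back to back as 20-byte SHA-1 values; the function name piece_is_valid and the sample data are hypothetical.

```python
import hashlib

def piece_is_valid(piece: bytes, index: int, pieces_field: bytes) -> bool:
    """Compare a downloaded piece with its expected SHA-1 digest."""
    # Each SHA-1 digest is 20 bytes, so piece i occupies bytes [20*i, 20*i + 20).
    expected = pieces_field[index * 20:(index + 1) * 20]
    return hashlib.sha1(piece).digest() == expected

# Hypothetical two-piece torrent used only for demonstration.
original = [b"first piece", b"second piece"]
pieces_field = b"".join(hashlib.sha1(p).digest() for p in original)

ok = piece_is_valid(b"first piece", 0, pieces_field)   # piece matches its digest
bad = piece_is_valid(b"corrupted", 1, pieces_field)    # piece does not match
```

A piece that fails the check would be discarded and requested again, possibly from a different peer.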

A torrent with no seeds is usually considered dead. However, if the pieces held by the remaining peers together make up a complete copy of the file, those peers can still trade among themselves until each has a complete copy.

BitTorrent Protocol Details

  1. Overall operation

File sharing with the BitTorrent Protocol (BTP) is accomplished by means of two protocols, the Tracker HTTP Protocol (THP) and the Peer Wire Protocol (PWP). THP covers communication between a peer and the tracker for purposes such as joining the swarm, obtaining the list of peers, and reporting progress. PWP defines the mechanisms for communication between peers and thus deals with the actual download and upload of files.

  2. Bencoding

The metainfo file and the responses from the tracker are encoded in a simple, efficient, and extensible format called bencoding. Bencoded messages are nested dictionaries and lists, which consist of encoded integers and strings.

Bencoding is done as follows:

  • Integers are encoded as the prefix ‘i’, followed by the base-ten representation of the integer, followed by the suffix ‘e’. For example, the integer 10 is represented as ‘i10e’.
  • Strings are encoded as the length of the string, followed by a colon, followed by the actual string. For example, the string spam is represented as ‘4:spam’.
  • Lists are encoded as the prefix ‘l’, followed by the bencoded elements, followed by the suffix ‘e’.
  • Dictionaries are encoded as the prefix ‘d’, followed by key-value pairs in which keys are bencoded strings and values are bencoded elements, followed by the suffix ‘e’.
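The four rules above can be sketched as a small recursive encoder. This is an illustrative sketch rather than a reference implementation: the function name bencode is hypothetical, text strings are assumed to be UTF-8, and dictionary keys are emitted in sorted byte order, which the published BitTorrent specification requires but the rules above do not mention.

```python
def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists and dicts using the rules above."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; bencoding has no boolean type.
        raise TypeError("booleans cannot be bencoded")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption: text is UTF-8
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])  # keys in sorted raw-byte order
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

encoded_int = bencode(10)               # b"i10e"
encoded_str = bencode("spam")           # b"4:spam"
encoded_list = bencode(["spam", 10])    # b"l4:spami10ee"
```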

  3. Tracker HTTP Protocol

The Tracker HTTP Protocol provides methods for introducing peers to the others in the swarm. The tracker is an HTTP service that lets peers join the swarm by helping them find each other. It is neither involved in the transfer of the file nor holds a copy of it. It relies completely on periodic requests from the peers; if it misses a request from a peer, it assumes that the peer is dead.

  • Request:

The peer contacts the tracker by sending an HTTP GET request to the tracker URL obtained from the torrent file. The request parameters must be encoded according to the HTTP standard. The parameters are: info_hash, the hash value that the peer calculates over the information about the files to download; peer_id, the self-designated ID of the peer; port, the port number on which the peer is listening for connections from other peers; uploaded, the total number of bytes the peer has uploaded since joining the swarm; downloaded, the total number of bytes the peer has downloaded from the swarm; left, the number of bytes the peer still needs in order to complete the download; ip, the Internet-wide address of the peer; numwant, the number of peers the local peer wants to receive from the tracker; and event, which marks the request as started, completed, or stopped, or as a regular periodic request.
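Building such a request can be sketched with the Python standard library. This is a hedged illustration: the function name build_announce_url, the tracker URL, the peer ID, and the default numwant of 50 are all hypothetical, and the optional ip parameter is left out on the assumption that the tracker can use the source address of the request.

```python
from urllib.parse import urlencode

def build_announce_url(tracker_url: str, info_hash: bytes, peer_id: bytes,
                       port: int, uploaded: int, downloaded: int, left: int,
                       event: str = "started", numwant: int = 50) -> str:
    """Return the GET URL for a tracker announce (hypothetical helper)."""
    params = {
        "info_hash": info_hash,    # raw 20-byte hash; urlencode percent-escapes bytes
        "peer_id": peer_id,        # self-designated 20-byte ID
        "port": port,
        "uploaded": uploaded,      # bytes uploaded so far
        "downloaded": downloaded,  # bytes downloaded so far
        "left": left,              # bytes still needed
        "numwant": numwant,
        "event": event,
    }
    separator = "&" if "?" in tracker_url else "?"
    return tracker_url + separator + urlencode(params)

url = build_announce_url(
    "http://tracker.example.com/announce",  # hypothetical tracker
    info_hash=b"\x12" * 20,                 # placeholder hash
    peer_id=b"-PY0001-123456789012",        # made-up 20-byte peer ID
    port=6881, uploaded=0, downloaded=0, left=1000,
)
```

The resulting URL could then be fetched with any HTTP client; the response body is a bencoded dictionary.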

  • Response

Upon receiving the GET request, the tracker must respond with a document containing a bencoded dictionary. Its keys include: failure reason, a human-readable string giving the reason if the request to join the swarm failed; interval, the time in seconds between two consecutive regular requests; complete, the number of seeders; incomplete, the number of peers still downloading the file; and peers, a list of peers that can be contacted to download the file.
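Reading such a response can be sketched with a minimal bencoding decoder. The names bdecode and parse_tracker_response and the sample reply are hypothetical, and the sketch assumes the peers value is a list of dictionaries with ip and port keys; many real trackers return a compact binary form instead, which is not handled here.

```python
def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value at pos; return (value, next position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead in (b"l", b"d"):
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        if lead == b"l":
            return items, pos + 1
        # A dictionary is a flat run of key, value, key, value, ...
        return dict(zip(items[0::2], items[1::2])), pos + 1
    # Anything else is a string: <length>:<bytes>
    colon = data.index(b":", pos)
    start = colon + 1
    length = int(data[pos:colon])
    return data[start:start + length], start + length

def parse_tracker_response(body: bytes):
    """Return (interval, [(ip, port), ...]) or raise on a failure reply."""
    reply, _ = bdecode(body)
    if b"failure reason" in reply:
        raise RuntimeError(reply[b"failure reason"].decode("utf-8", "replace"))
    peers = [(p[b"ip"].decode("ascii"), p[b"port"]) for p in reply[b"peers"]]
    return reply[b"interval"], peers

# Made-up response: 5 seeders, 3 downloaders, 1800 s interval, one peer.
sample = (b"d8:completei5e10:incompletei3e8:intervali1800e"
          b"5:peersld2:ip9:192.0.2.14:porti6881eeee")
interval, peers = parse_tracker_response(sample)
```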

  4. Peer Wire Protocol

Peer Wire Protocol facilitates the communication between the peers in order to exchange the files they have. PWP describes the steps taken by a peer after it has received the information of the neighboring peers as a response from the tracker. PWP operates over a TCP connection.

  • Peer Wire Guidelines

PWP does not specify a standard algorithm for selecting the peers with whom files are shared. Instead, implementations should follow some guidelines when choosing an algorithm, such as uploading at least as much as they download and pipelining data requests.

  • Handshaking

The local peer should open a port to listen for connections from neighboring peers. The port on which the peer listens is implementation-specific and is reported to the tracker in the GET request. Before any actual data is shared, the remote peer should open a TCP connection and perform a handshake with the local peer. A handshake message consists of fields such as the protocol name, the peer ID, etc.
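The handshake message can be sketched with Python's struct module. The field layout used here (a length byte of 19, the string "BitTorrent protocol", 8 reserved bytes, the 20-byte info hash, and the 20-byte peer ID) follows the published BitTorrent specification rather than anything stated in this text, and the function names are hypothetical.

```python
import struct

PSTR = b"BitTorrent protocol"         # protocol name, 19 bytes
HANDSHAKE_FORMAT = "!B19s8s20s20s"    # 1 + 19 + 8 + 20 + 20 = 68 bytes

def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Pack a 68-byte handshake message."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    reserved = bytes(8)  # all-zero reserved bytes
    return struct.pack(HANDSHAKE_FORMAT, len(PSTR), PSTR, reserved,
                       info_hash, peer_id)

def parse_handshake(message: bytes):
    """Unpack a handshake and return (info_hash, peer_id)."""
    pstrlen, pstr, _reserved, info_hash, peer_id = struct.unpack(
        HANDSHAKE_FORMAT, message)
    if pstrlen != len(PSTR) or pstr != PSTR:
        raise ValueError("not a BitTorrent handshake")
    return info_hash, peer_id

handshake = build_handshake(b"\x01" * 20, b"-PY0001-123456789012")
```

A receiving peer would typically compare the info hash in the handshake with the torrents it is serving and drop the connection if none matches.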

  • Message communication

Once the handshake has been performed, both ends of the TCP connection are ready to communicate by sending messages. PWP messages inform neighboring peers about changes of state in the local peer (state-oriented messages) and transfer data blocks between peers (data-oriented messages). Interested, Uninterested, Choked, Unchoked, Have, and Bitfield are state-oriented messages, whereas Request, Cancel, and Piece are data-oriented messages.
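How these messages are framed on the wire can be sketched as follows. The layout (a 4-byte big-endian length prefix, a 1-byte message ID, then the payload) and the ID numbers come from the published BitTorrent specification, not from this text; the specification's own names for the state messages are choke, unchoke, interested, and not interested. The function names are hypothetical.

```python
import struct

# Message IDs as given in the published specification.
MESSAGE_IDS = {
    "choke": 0, "unchoke": 1, "interested": 2, "not interested": 3,
    "have": 4, "bitfield": 5, "request": 6, "piece": 7, "cancel": 8,
}
ID_TO_NAME = {number: name for name, number in MESSAGE_IDS.items()}

def encode_message(name: str, payload: bytes = b"") -> bytes:
    """Frame a message: length prefix, message ID, payload."""
    return struct.pack("!IB", 1 + len(payload), MESSAGE_IDS[name]) + payload

def encode_request(index: int, begin: int, length: int) -> bytes:
    """Request 'length' bytes of piece 'index' starting at offset 'begin'."""
    return encode_message("request", struct.pack("!III", index, begin, length))

def decode_message(frame: bytes):
    """Return (name, payload) for one complete frame."""
    (length,) = struct.unpack("!I", frame[:4])
    if length == 0:
        return "keep-alive", b""  # a zero-length frame carries no ID
    body = frame[4:4 + length]
    return ID_TO_NAME[body[0]], body[1:]

interested = encode_message("interested")   # state-oriented message
request = encode_request(0, 0, 16384)       # data-oriented message
```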

  • The End Game

Towards the end of the download session, the local peer may send Request messages for the remaining blocks to all of its neighboring peers in order to complete the download of the entire file. When a block is received successfully, the local peer sends Cancel messages for all of the still-pending requests for that block. Before this point, blocks are requested in stages: newer blocks are requested as responses to earlier requests arrive. The client enters the end game when requests for all the remaining blocks have been issued.
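The end game bookkeeping can be sketched as two small functions. This is a simplified model, not a client implementation: blocks are represented as hypothetical (piece index, offset) tuples, peers as plain names, and the function names are invented for illustration.

```python
def end_game_requests(missing_blocks, peer_pieces):
    """Ask every peer for every missing block whose piece it holds.

    missing_blocks: set of (piece_index, offset) tuples still needed.
    peer_pieces: dict mapping a peer name to the set of piece indices it has.
    Returns a dict mapping each peer to the sorted list of blocks to request.
    """
    return {
        peer: sorted(block for block in missing_blocks if block[0] in have)
        for peer, have in peer_pieces.items()
    }

def cancels_after_receipt(block, received_from, requests):
    """Peers that should be sent a Cancel once 'block' has arrived."""
    return sorted(
        peer for peer, blocks in requests.items()
        if peer != received_from and block in blocks
    )

peer_pieces = {"A": {7}, "B": {7, 8}}    # made-up swarm state
missing = {(7, 0), (8, 0)}
requests = end_game_requests(missing, peer_pieces)
to_cancel = cancels_after_receipt((7, 0), "A", requests)
```

In this example block (7, 0) is requested from both peers, so when it arrives from A the outstanding request to B is cancelled.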

  • Piece Selection Strategy

Piece selection has a great impact on the performance of the BitTorrent protocol. Pieces should be selected so that no piece of the file goes missing from the swarm. The goal is also to distribute the pieces to different peers as quickly as possible in order to increase download speed. This helps preserve a complete copy of the file even if the seeder leaves the swarm at some point. Several policies are employed in selecting what to download:

  • Strict policy: once one block of a piece has been requested, the remaining blocks of that piece are requested before any block belonging to another piece.
  • Rarest first: the peer downloads the pieces that are least common in the swarm first, so that rare pieces become available for other peers to download.
  • Random first piece: a peer that has just joined the swarm and does not yet hold any pieces selects a random available piece.
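The rarest first policy can be sketched as a simple count over what the connected peers hold. This is an illustration under assumptions: rarity is tracked per piece, availability is known from each peer's reported pieces, ties are broken at random, and the function name rarest_first is hypothetical.

```python
import random
from collections import Counter

def rarest_first(needed, peer_pieces, rng=random):
    """Pick the needed piece held by the fewest connected peers.

    needed: set of piece indices the local peer still lacks.
    peer_pieces: dict mapping a peer name to the set of piece indices it has.
    Returns a piece index, or None if no connected peer has a needed piece.
    """
    counts = Counter(
        piece for have in peer_pieces.values() for piece in have
        if piece in needed
    )
    if not counts:
        return None
    fewest = min(counts.values())
    candidates = sorted(p for p, c in counts.items() if c == fewest)
    return rng.choice(candidates)  # random tie-break among equally rare pieces

# Made-up swarm: piece 0 is held by three peers, piece 1 by two, piece 2 by one.
swarm = {"A": {0, 1}, "B": {0, 2}, "C": {0, 1}}
choice = rarest_first({0, 1, 2}, swarm)
```

Here piece 2 is chosen because only one peer holds it; downloading it first makes a second copy available to the rest of the swarm.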

  • Peer Selection Strategy

This describes the choking algorithm, which determines which of the neighboring peers to exchange blocks with. One method is to rate the peers periodically according to their upload rate to the client and other implementation-specific criteria. Used alone, this method has the disadvantage of being unfair to new peers, which have not yet had a chance to upload anything. Another method is to select peers at random at regular intervals, so that even new peers get a chance of being unchoked.
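The two methods can be combined in one selection round, sketched below. This is only an illustration: the function name choose_unchoked, the three rate-based slots, and the single random slot are assumptions, and real clients differ in how often they re-run the selection and in the criteria they use.

```python
import random

def choose_unchoked(upload_rates, interested, regular_slots=3, rng=random):
    """Select the peers to unchoke for the next interval.

    upload_rates: dict mapping a peer name to its recent upload rate to us.
    interested: iterable of peers that currently want data from us.
    Returns the set of peers to unchoke.
    """
    # Method 1: reward the peers that upload to us fastest.
    ranked = sorted(interested, key=lambda p: upload_rates.get(p, 0.0),
                    reverse=True)
    unchoked = set(ranked[:regular_slots])
    # Method 2: give one randomly chosen remaining peer a chance,
    # so that newcomers with nothing to offer yet are not starved.
    remaining = ranked[regular_slots:]
    if remaining:
        unchoked.add(rng.choice(remaining))
    return unchoked

rates = {"A": 50.0, "B": 40.0, "C": 30.0, "D": 5.0, "E": 0.0}  # made-up rates
selected = choose_unchoked(rates, list(rates))
```

With these rates, A, B, and C are always unchoked, and either D or E receives the random slot.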

The Future of BitTorrent

Because BitTorrent is a breakthrough in file transfer technology, it could be widely applied in many online services that involve file transfer.

  • Video and audio streaming

Broadband TV stations and online radio stations are popular in most countries. With existing technology, a large number of streaming servers is needed to provide a stable service to users. If BitTorrent (BT) technology were applied to video and audio streaming, all users consuming the same online TV or radio program could download and upload pieces of the stream at the same time, which could greatly reduce the service providers’ serving costs.

  • Video sharing community

In the era of Web 2.0, web content depends heavily on user submissions and contributions. With the success of YouTube, user-generated video has become a new personal publishing channel, with a huge amount of interesting content generated and consumed by the users themselves. However, the service is limited by video resolution and by download time as usage increases: the larger the video and the better its quality, the longer the download takes. The BitTorrent protocol could be used to provide high-quality video sharing among users. In this case, a video watched by a user would be stored on the user’s local hard drive, and the BitTorrent client running on that user’s machine would inform the central server about the new seed. The user could then contribute to distributing the video to other users. Since many videos in a video sharing community are watched by a huge number of users, BitTorrent would help greatly in distributing the load on the central server.

  • Online game

Online games are common around the world. Millions of users play together at any moment, and the enormous number of concurrent accesses to the servers limits how fast games load. Although game patches are kept small to speed things up, online game companies have not come up with a model for dealing with the large investment in servers and bandwidth. As in the video and audio streaming case above, BT could be applied to the online game industry to greatly decrease loading time. As the number of players increases, more computers act as servers, uploading and downloading the patches together. The speed of the game would then increase as more people join the game at the same time.