HTML5 WebSocket:

Full-Duplex Solution for the Web

Daniel Imhoff

Department of Computer Science
University of Wisconsin-Platteville
April 2nd, 2013

Abstract

Originally, the purpose of the Web was to deliver static documents from the server to the client. As the demand for interactivity increased and the potential of the Web was realized, it soon became clear that the tools of the Web were largely inadequate. Thus began the pursuit of an application stack that fulfilled the extensive requirements of application development while building on the large infrastructure already in place. HTML5 is a turning point in the history of the Web; it represents a shift from producing simple websites to creating rich applications with standardized, elegant, and, perhaps most importantly, simple development tools and APIs. One of the most intriguing developments in HTML5 is WebSocket, a part of HTML5 Connectivity. WebSocket is a solution to the lack of “push” technologies (the ability for servers to initiate data transfer) as well as to the half-duplex model in which the Web operates. Developers, for the first time in the history of the Web, will be able to use WebSocket to build truly real-time web applications.

Introduction

Throughout the years, the Web has blossomed and evolved from simple, static, text-based document sharing to a widely accepted and supported open industry standard application stack. Using highly scrutinized, communally open tools such as HTML, CSS, and JavaScript allows developers to develop applications and deploy them to virtually any computer or mobile device instantly. HTML5 WebSocket introduces full-duplex, bidirectional communication between the client application and remote servers over the Web without the use of poorly supported third-party plugins such as Java or Flash.[1] This means, for the first time in the history of the Web, developers are able to build truly real-time web applications.

HTML5 WebSocket comprises RFC 6455 (The WebSocket Protocol) and the WebSocket API (Application Programming Interface). The protocol is a set of rules for communication. It allows implementations of WebSocket to be different while allowing them to communicate with each other. The API controls the protocol and allows developers to use WebSocket in their applications.

History

The Web operates through HTTP (Hypertext Transfer Protocol), which is a request/response protocol for the client/server model; simply put, a web browser submits an HTTP request to the server and the server responds with the requested resource.[1] This protocol model is classified as half-duplex, which means traffic between client and server can travel in only one direction at a time, and only one of the parties (the client) is allowed to initiate communication. While this model worked well for sharing static HTML (Hypertext Markup Language) pages when the Web began, complex web applications quickly outgrew the protocols and APIs under which they were being developed.

Information that the web server returns such as news, sports statistics, medical information, and stock prices can often be out of date by the time the user digests it. Viewers are classically trained to “refresh” web pages to get the latest information. Developers are classically trained to implement convoluted “hacks” that simulate an elegant full-duplex connection.[4] Web developers have been working around these limitations for years with three well-known methods: polling, long polling, and streaming.

Polling sends an HTTP request to the server at regular intervals, regardless of whether the server has fresh information. Because the client cannot know exactly when information on the server is updated, polling inevitably creates unnecessary requests and connections.[1]

With long polling, a variant of polling, the browser still sends HTTP requests repeatedly, but the server holds each request open until either the information is updated, in which case it sends the updated information and completes the request, or the designated timeout is reached.[1]

Streaming is initiated by the client like polling and long polling, but the server responds with a stream that is continuously updated and kept open indefinitely. Because streaming is still encapsulated in HTTP, it is susceptible to buffering by proxy servers and firewalls.[1]

These methods provide almost real-time communication, but they introduce unnecessary overhead and inherent problems. Full-duplex connectivity is never obtained, as in each case the client must wait for responses from the server in order to initiate subsequent requests.

In 2011, Ian Fette of Google, Inc. and Alexey Melnikov of Isode Ltd. worked with the Internet Engineering Task Force (IETF) to develop, finalize and publish RFC 6455: The WebSocket Protocol specification. Since then, the specification has been adopted into modern browsers such as Google Chrome, Mozilla Firefox, Microsoft Internet Explorer, Apple Safari, and Opera.

After the WebSocket protocol specification was published, the WebSocket API was written and edited by Ian Hickson of Google, Inc. The WebSocket API is a specification standardized and published by the World Wide Web Consortium (W3C), an organization dedicated to developing and promoting web standards.

With the WebSocket protocol and WebSocket API specifications, developers are given simple, elegant, standardized tools for implementing full-duplex, bidirectional communication between client and server.

Importance

WebSocket is important because using WebSocket instead of polling, long polling, or streaming improves performance, simplifies the development process, and allows your application to be deployed virtually anywhere HTML5 applications can be deployed.

WebSocket drastically reduces the number of HTTP requests and responses compared with polling. It lowers bandwidth use, CPU load, and latency.[1]

WebSocket simplifies the development of real-time applications. With a small, powerful, and simple API, a developer can accomplish more, and more quickly, with WebSocket than with any other real-time strategy in web development.

WebSocket delivers the flexibility of transport layer protocols to the Web.[1] Real-time functionality can now be achieved in web applications such as chat, collaborative document editing, multiplayer games, etc.[1]

The WebSocket Protocol

The WebSocket Protocol defines in detail how computer systems can implement WebSockets for communication with other systems. The protocol comprises three parts: an opening handshake, the data transfer, and a closing handshake.[2]

The protocol also defines two URI schemes: ws, for WebSocket, and wss, for WebSocket Secure, which is encrypted over Transport Layer Security (TLS).

The Opening Handshake

WebSocket connections begin with an opening handshake, which is basically an HTTP request that represents an agreement to switch protocols from HTTP to WebSocket. The HTTP request such as the one shown in Listing 1 is sent from the client to the server, and then the server sends a response such as the one shown in Listing 2.[2]

Listing 1. Example of a WebSocket handshake request.

GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Origin: http://example.com
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

Listing 2. Example of a WebSocket handshake response.

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

To compute the Sec-WebSocket-Accept value, the server takes the value from Sec-WebSocket-Key and appends the string “258EAFA5-E914-47DA-95CA-C5AB0DC85B11”, a constant key suffix included in the protocol specification that every WebSocket server should know.[1] It then takes the SHA-1 hash of the resulting concatenation and base64-encodes it.[2]
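The computation described above can be sketched in a few lines. This is a minimal illustration that assumes a Node.js environment and its built-in crypto module; the helper name computeAccept is hypothetical and not part of any WebSocket library.

```javascript
// Sketch for Node.js; computeAccept is a hypothetical helper name.
const crypto = require("crypto");

// Constant key suffix defined in the protocol specification.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Concatenate the key and the suffix, hash with SHA-1, then base64-encode.
function computeAccept(secWebSocketKey) {
  return crypto
    .createHash("sha1")
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest("base64");
}

// Using the key from Listing 1 reproduces the accept value in Listing 2.
console.log(computeAccept("dGhlIHNhbXBsZSBub25jZQ=="));
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

A client can run the same computation on the key it sent and compare the result with the header it receives, which is how it confirms that the server actually understands the WebSocket protocol.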

The Sec- headers shown in the HTTP request and response above relate to the WebSocket protocol. Table 1 describes these headers in detail.[2]

Table 1. Descriptions of WebSocket Sec- headers.

Header / Description
Sec-WebSocket-Key / Appearing in the initial HTTP request, the key is used to prevent cross-protocol attacks.[1]
Sec-WebSocket-Accept / Appearing in the subsequent HTTP response, the accept is used to confirm that the server understands the WebSocket protocol.
Sec-WebSocket-Version / Indicates version compatibility. RFC 6455 defines the version as 13.[2]
Sec-WebSocket-Protocol / Subprotocol selector for advertising the protocols that a client can use.
Sec-WebSocket-Extensions / A list of protocol-level extensions supported by the client.[1]

Data Transfer

Once the handshake is complete and a connection is established, the client and server can send data to each other at any time, in full-duplex mode. Each message is carried in one or more frames, and a message can contain either text or binary data. In RFC 6455, every frame begins with a short header holding a FIN bit, a 4-bit opcode, a mask bit, and the payload length, followed by the payload itself; text frames carry UTF-8 data.[2] Earlier drafts of the protocol instead delimited text frames with a 0x00 byte and a 0xFF byte and used a length prefix only for binary frames.[6] See Figure 1 for the layout of a WebSocket frame.

Figure 1. A WebSocket frame.[6]

The opcode is a 4-bit field that defines the type of message being sent.[1] For a list of opcodes, see Table 2.

Table 2. Descriptions of Opcodes.[1]

Opcode / Type of Message / Description
0 / Continuation / The frame is a continuation of a message sent earlier.[2]
1 / Text / The type of frame is text.
2 / Binary / The type of frame is binary.
8 / Close / Closing handshake to end the connection.
9 / Ping / One party sends a ping to the other party.
10 / Pong / One party sends a pong to the other party.
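To make the opcode table concrete, the following sketch decodes a single small frame as laid out in RFC 6455: the first byte holds the FIN bit and the opcode, and the second byte holds the mask bit and a 7-bit payload length. The helper name parseSmallFrame is hypothetical, and the sketch deliberately ignores extended payload lengths (126 bytes and above) and fragmented messages.

```javascript
// Opcode names taken from Table 2.
const OPCODE_NAMES = {
  0: "continuation",
  1: "text",
  2: "binary",
  8: "close",
  9: "ping",
  10: "pong",
};

// Hypothetical helper: decodes one frame whose payload is at most 125 bytes.
// `bytes` is assumed to be a Uint8Array holding exactly one frame.
function parseSmallFrame(bytes) {
  const fin = (bytes[0] & 0x80) !== 0;    // top bit of the first byte
  const opcode = bytes[0] & 0x0f;         // low 4 bits of the first byte
  const masked = (bytes[1] & 0x80) !== 0; // top bit of the second byte
  const length = bytes[1] & 0x7f;         // low 7 bits of the second byte
  if (length > 125) {
    throw new Error("extended payload lengths are not handled in this sketch");
  }
  let offset = 2;
  let payload = bytes.slice(offset, offset + length);
  if (masked) {
    // A masked frame carries a 4-byte key before the payload;
    // each payload byte is XORed with the key, cycling through it.
    const key = bytes.slice(offset, offset + 4);
    offset += 4;
    payload = bytes
      .slice(offset, offset + length)
      .map((byte, i) => byte ^ key[i % 4]);
  }
  return { fin, type: OPCODE_NAMES[opcode] || "reserved", payload };
}

// The masked text frame "Hello" used as an example in RFC 6455.
const frame = parseSmallFrame(
  Uint8Array.from([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58])
);
console.log(frame.type, new TextDecoder().decode(frame.payload));
// prints "text Hello"
```

Browsers perform this decoding internally, so application code using the WebSocket API never sees raw frames; the sketch is only meant to show how the opcode selects the message type.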

The Closing Handshake

WebSocket connections are closed through a closing handshake, in which one party sends a close frame and the other party responds with a close frame of its own. Close frames carry the opcode 8, which tells the other party that the connection is closing.

Not all WebSocket connections end with the closing handshake, however. Because the WebSocket protocol is built over TCP, it is not uncommon for sockets to close abruptly due to one party (usually the client) disconnecting. WebSocket connections whose underlying TCP socket drops connectivity suddenly are closed normally without error.

The WebSocket API

The WebSocket API allows developers to interface their JavaScript with WebSockets. The API defines an object that comprises three parts: information about the state of the connection, methods that allow developers to interact with the WebSocket connection, and events that are fired when a WebSocket event occurs.

As we learned in the WebSocket protocol, messages can be sent back and forth over the underlying TCP connection in full-duplex mode. Methods defined by the WebSocket interface allow developers to send messages and close the connection. Event listeners allow developers to receive messages asynchronously as well as handle the current state of the WebSocket.

The WebSocket Constructor

The WebSocket constructor takes one required argument, url, representing the URL to which you want to connect, and one optional argument, protocols, which can be a string for a single subprotocol or an array for multiple subprotocols.[3] See Listing 3 for an example of an instantiation of the WebSocket interface.

Listing 3. Sample WebSocket instantiation.

// Create new WebSocket connection.
var socket = new WebSocket("ws://echo.websocket.org", "myProtocol");

In this example, we are creating a new WebSocket connection to echo.websocket.org over the myProtocol subprotocol. Subprotocols are useful to distinguish which type of WebSocket the client wants. There are three types of subprotocols:

·  Registered subprotocols: Officially registered protocols according to RFC 6455.

·  Open protocols: Widely used and standardized protocols which have not been registered as official protocols.

·  Custom protocols: Protocols that application developers create.
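When the constructor is given several subprotocols, the client advertises them in the Sec-WebSocket-Protocol header as a comma-separated list, and the server selects at most one of them. The following sketch shows one way a server might make that selection; the helper name selectSubprotocol and the subprotocol names are hypothetical.

```javascript
// Client side (not executed here): offer two subprotocols in order of preference.
// var socket = new WebSocket("ws://example.com/chat", ["chat", "superchat"]);

// Hypothetical server-side helper: returns the first subprotocol offered by
// the client that the server also supports, or null if there is none.
function selectSubprotocol(headerValue, supported) {
  const offered = headerValue
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name !== "");
  const match = offered.find((name) => supported.includes(name));
  return match === undefined ? null : match;
}

console.log(selectSubprotocol("chat, superchat", ["superchat", "chat"]));
// prints "chat"
```

Choosing the first match in the client's order treats that order as a preference ranking; a server could equally apply its own ranking.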

WebSocket Events

The WebSocket API is purely event-driven.[1] Applications define event listeners and wait for messages and changes in connection status from the other party which arrive asynchronously. These event listeners are defined for the four events of the WebSocket interface: open, message, error, and close.

Open

WebSocket connections are not opened instantaneously with the constructor—they are opened some time after and then the open event fires. By this time, the handshake has completed and the WebSocket is ready to send and receive messages. The callback for the open event is onopen. See Listing 4 for an example.

Listing 4. Sample WebSocket open event handler.

// Event handler for WebSocket connection opening.
socket.onopen = function(event) {
    alert("Connection opened!");
};

Message

The WebSocket message event is fired when messages are pushed from the other party. The callback for the message event is onmessage. See Listing 5 for an example.


Listing 5. Sample WebSocket message event handler.

// Event handler for receiving WebSocket messages.
socket.onmessage = function(event) {
    alert("Received message: " + event.data);
};

Error

The WebSocket error event is fired in response to unexpected errors. If an error is detected, the WebSocket connection is closed. The callback for the error event is onerror. See Listing 6 for an example.


Listing 6. Sample WebSocket error event handler.

// Event handler for errors in WebSocket connection.
socket.onerror = function(event) {
    console.log("WebSocket Error: ", event);
    // Custom, application-defined function for handling errors.
    handleErrors(event);
};

Close

The WebSocket close event is fired when the WebSocket connection is closed. The callback for the close event is onclose. See Listing 7 for an example.


Listing 7. Sample WebSocket close event handler.

// Event handler for closed connections.
socket.onclose = function(event) {
    console.log("WebSocket connection closed.");
};
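The four handlers are usually registered together, right after the socket is constructed. The sketch below shows one possible arrangement; the helper name attachLoggingHandlers and the log array are hypothetical and only serve to record the order in which events arrive.

```javascript
// Hypothetical helper: registers all four event handlers on a socket and
// appends a short description of each event to the given log array.
function attachLoggingHandlers(socket, log) {
  socket.onopen = function () {
    log.push("open");
  };
  socket.onmessage = function (event) {
    log.push("message: " + event.data);
  };
  socket.onerror = function () {
    log.push("error");
  };
  socket.onclose = function (event) {
    // The close event carries the status code of the closure.
    log.push("close: " + event.code);
  };
  return socket;
}

// In a browser (not executed here):
// var log = [];
// attachLoggingHandlers(new WebSocket("ws://echo.websocket.org"), log);
```

Because the connection opens asynchronously, handlers assigned immediately after construction are in place before the open event can fire.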

WebSocket Methods

The WebSocket interface contains two methods: send() and close().

send()

Once the WebSocket connection is opened, messages may be sent to the other party using the send() method, which accepts text or binary data. Calling send() while the connection is still being established throws an exception, so an application should wait for the open event first. See Listing 8 for an example.

Listing 8. Sample of a WebSocket sending a message.

// Wait for the open event before sending.
var socket = new WebSocket("ws://echo.websocket.org");
socket.onopen = function(event) {
    socket.send("Hi there!");
};
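An application that may want to send before the connection is ready can queue its messages and flush them when the open event fires. The following is a minimal sketch of that idea; createQueuedSender is a hypothetical helper, and it takes over the socket's onopen handler, which a real application might not want.

```javascript
const OPEN = 1; // numeric value of WebSocket.OPEN

// Hypothetical helper: returns a function that sends immediately when the
// socket is open and otherwise queues the message until the open event.
// Messages queued after the socket has started closing are never sent.
function createQueuedSender(socket) {
  const pending = [];
  socket.onopen = function () {
    while (pending.length > 0) {
      socket.send(pending.shift());
    }
  };
  return function sendWhenOpen(message) {
    if (socket.readyState === OPEN) {
      socket.send(message);
    } else {
      pending.push(message);
    }
  };
}

// In a browser (not executed here):
// var send = createQueuedSender(new WebSocket("ws://echo.websocket.org"));
// send("Hi there!"); // queued, then delivered once the connection opens
```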

close()

The close() method terminates the WebSocket connection with a closing handshake. It has two optional arguments: code, a numeric status code that tells the other party why the connection is closing, and reason, a human-readable string that explains why the connection was closed.
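As an illustration, the sketch below wraps close() in a helper that checks its arguments first. It relies on two constraints from the WebSocket API: an application may pass 1000 (normal closure) or a code in the range 3000 to 4999, and the reason must not exceed 123 bytes when encoded as UTF-8. The helper name closeWithReason is hypothetical.

```javascript
const NORMAL_CLOSURE = 1000; // status code for a normal closure

// Hypothetical helper: validates the arguments, then starts the closing handshake.
function closeWithReason(socket, code, reason) {
  const codeAllowed = code === NORMAL_CLOSURE || (code >= 3000 && code <= 4999);
  if (!codeAllowed) {
    throw new RangeError("close code must be 1000 or in the range 3000-4999");
  }
  const reasonBytes = new TextEncoder().encode(reason).length;
  if (reasonBytes > 123) {
    throw new RangeError("close reason must be at most 123 bytes of UTF-8");
  }
  socket.close(code, reason);
}

// In a browser (not executed here):
// closeWithReason(socket, NORMAL_CLOSURE, "Work complete");
```

The browser performs the same checks itself and throws if they fail; validating up front simply lets the application report the problem in its own terms.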

WebSocket Attributes

There are three available attributes in the WebSocket interface: readyState, bufferedAmount, and protocol.

readyState

This attribute is used to report the current state of the WebSocket. It is an integer value that can be compared to the four WebSocket connection state constants, listed in Table 3.

Table 3. Connection state constants.

Attribute constant / Value / Status
WebSocket.CONNECTING / 0 / The connection is going through the opening handshake.
WebSocket.OPEN / 1 / The connection is established.
WebSocket.CLOSING / 2 / The connection is going through the closing handshake.
WebSocket.CLOSED / 3 / The connection has been closed.
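A common use of readyState is to turn the numeric value into something readable for logging, or to guard a call to send(). The sketch below uses the numeric values from Table 3 directly so that it does not depend on a WebSocket implementation being present; the helper names describeState and isOpen are hypothetical.

```javascript
// State names indexed by the numeric values listed in Table 3.
const STATE_NAMES = ["CONNECTING", "OPEN", "CLOSING", "CLOSED"];

// Hypothetical helper: returns the name of the socket's current state.
function describeState(socket) {
  return STATE_NAMES[socket.readyState] || "UNKNOWN";
}

// Hypothetical helper: true only when the connection is established.
function isOpen(socket) {
  return socket.readyState === 1; // WebSocket.OPEN
}

// In a browser (not executed here):
// if (isOpen(socket)) { socket.send("Hi there!"); }
// console.log("Socket is " + describeState(socket));
```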

bufferedAmount