Common Performance Issues in Network Applications

Operating System

Part 1: Interactive Applications

White Paper

Abstract

A properly written network application can squeeze every possible cycle out of a high-performance network stack such as the TCP/IP implementation in the Microsoft® Windows® 2000 platform. Conversely, a poorly written network application can waste network and system resources and deliver substandard performance to its users. This paper is intended to help developers and others improve applications that communicate over networks.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2000 Microsoft Corporation. All rights reserved. Microsoft, Windows, and Windows NT are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

Other product and company names mentioned herein may be the trademarks of their respective owners.

Microsoft Corporation • One Microsoft Way • Redmond, WA 98052-6399 • USA

11/2000

Contents

Introduction

Network Terminology

Performance Dimensions

User vs. Administrator View of a Networked Application

Transactional vs. Streaming Applications

Different Network Environments

TCP/IP Limitations

Application Behaviors

TCP/IP-Specific Issues

Recognizing Slow Applications

Calculating Overhead with Netstat

A Slow Application

Writing the Slowest Application

The First Take

Cleaning Obvious Problems in the Code

Redesigning for Fewer Connects

Compressed Block Send

Future Improvements

Best Practices for Interactive Applications

Summary

For More Information

Introduction

The Windows 2000 TCP/IP protocol stack was written and tested with performance and scalability in mind. This implementation allows a properly written application to extract every possible cycle out of the networking hardware. Here are some examples of how the Windows 2000 TCP/IP stack performs:

· Windows 2000 Server has been tested with over 200,000 simultaneous TCP connections.

· Internet Information Services (IIS) on Windows 2000 ranked highly in the SPECweb96 benchmark, handling more than 25,000 HTTP requests per second.

· Windows 2000 was used to set a land speed record of more than 750 megabits per second (Mbps) on a transcontinental gigabit network consisting of 10 hops.

These figures show that Windows 2000 TCP/IP can process data very quickly. However, the purpose of this paper is not to help you write benchmarking applications that achieve these speeds. The intent is to show how to avoid writing slow applications: applications that fail to use even 10 percent of the available network bandwidth.

Some of the most common uses of the network come through productivity tools such as e-mail, Web access, and file transfer. If these basic tools do not meet users’ expectations, users become frustrated. This paper presents a few simple guidelines for writing interactive network applications that use the network efficiently.

This paper addresses:

· Network terminology needed to understand the metrics affecting applications written for the network.

· Performance dimensions that affect perceived and actual network performance of an application.

· Limitations of the TCP/IP protocol that can constrain application performance.

· Application behaviors that lead to poor network performance and how to recognize them.

· The adjustments needed to make a slow application utilize the network more efficiently.

· Best practices for writing interactive applications.

Network Terminology

The following metrics are used to measure various aspects of network and protocol performance. How your application performs in a given scenario depends on the values these metrics take in that scenario and on how the protocol stack responds to them.

Round Trip Time (RTT). RTT, expressed in milliseconds, is the total elapsed time for a request to travel from node ‘A’ to node ‘B’ plus the time for the reply from ‘B’ to return to ‘A’. The forward and reverse path times need not be equal.

RTT depends on the network infrastructure in place, the distance between nodes, network conditions, and packet size. On slower links, packet size, congestion, and payload compressibility have a significant impact on RTT. Link features such as forward error correction and data compression also affect RTT, because they introduce buffers and queues that add delay.
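One rough way for an application to observe RTT is to time TCP connection establishment, because connect() returns after a single request-reply exchange (the SYN and SYN-ACK). The Python sketch below assumes the target accepts TCP connections on the given port; the function name estimate_rtt_ms is illustrative, and the measured value includes some local stack processing in addition to network delay.

```python
import socket
import time


def estimate_rtt_ms(host: str, port: int, samples: int = 5, timeout_s: float = 3.0) -> float:
    """Estimate RTT in milliseconds from the fastest of several TCP connects.

    connect() completes once the SYN / SYN-ACK exchange finishes, so its
    duration approximates one round trip plus some local stack overhead.
    If host is a name rather than a numeric address, DNS lookup time is
    included in the measurement as well.
    """
    best = float("inf")
    for _ in range(samples):
        start = time.perf_counter()
        # Raises OSError (including socket.timeout) if the connect fails.
        conn = socket.create_connection((host, port), timeout=timeout_s)
        elapsed = time.perf_counter() - start
        conn.close()
        best = min(best, elapsed)
    return best * 1000.0
```

Taking the minimum rather than the average is a deliberate choice: the smallest sample is the one least inflated by queuing, so it is closest to the path's inherent RTT.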

Goodput. Goodput, measured in bits per second, is the rate at which useful application data is successfully processed by the receiver. It measures effective throughput and counts only application data, excluding packet, protocol, and media headers.

Protocol Overhead. Protocol Overhead, expressed as a percentage, is the number of non-application bytes (protocol and media framing) divided by the total number of bytes. In this paper, overhead is calculated for both directions, but it can be calculated separately for each direction.
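The two definitions above reduce to simple arithmetic. This sketch assumes the application counts its own payload bytes and that the total bytes on the wire come from interface statistics (for example, the byte counts netstat -e reports before and after a test run); the function names and the sample figures are hypothetical.

```python
def goodput_bps(app_bytes: int, elapsed_s: float) -> float:
    """Goodput: application payload bits delivered per second (headers excluded)."""
    return app_bytes * 8 / elapsed_s


def overhead_percent(app_bytes: int, total_bytes: int) -> float:
    """Protocol overhead: non-application bytes as a percentage of all bytes."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * (total_bytes - app_bytes) / total_bytes


# Hypothetical measurements for one test run.
APP_BYTES = 1_000_000   # payload counted by the application
WIRE_BYTES = 1_070_000  # all bytes seen at the interface, both directions
ELAPSED_S = 2.0

print(f"goodput  = {goodput_bps(APP_BYTES, ELAPSED_S) / 1e6:.1f} Mbps")
print(f"overhead = {overhead_percent(APP_BYTES, WIRE_BYTES):.1f} %")
```

With these figures, 1,000,000 application bytes delivered in 2 seconds is a goodput of 4 Mbps, and 70,000 bytes of framing out of 1,070,000 total is about 6.5 percent overhead.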

Bandwidth-Delay Product. The bandwidth-delay product is the product of the bits-per-second bandwidth of the network and the RTT, or delay, in seconds. This equates to the number of bits it takes to fill the “network pipe.” If this number is large, the TCP/IP stack must be able to deal with a large amount of unacknowledged data in order to keep the pipeline full. This is an important end-to-end metric for streaming applications.
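A worked example of the bandwidth-delay product, assuming a hypothetical path with 100 Mbps of bandwidth and an 80 ms RTT:

```python
def bandwidth_delay_product(bandwidth_bps: float, rtt_s: float) -> tuple[float, float]:
    """Return the (bits, bytes) needed to fill the network pipe for one RTT."""
    bits = bandwidth_bps * rtt_s
    return bits, bits / 8


# Hypothetical path: 100 Mbps with an 80 ms round trip time.
bits, nbytes = bandwidth_delay_product(100e6, 0.080)
print(f"{bits:,.0f} bits = {nbytes:,.0f} bytes in flight to fill the pipe")
```

Here 100,000,000 bits/s × 0.080 s = 8,000,000 bits, or 1,000,000 bytes, so the sender must be able to keep about 1 MB of unacknowledged data outstanding to keep this pipe full.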

Performance Dimensions

The ideal application delivers the best performance users can expect while using the fewest network resources in every networking environment. Users and administrators have different perceptions of what it means for an application to perform well on the network. Well-written applications meet the requirements of both.

User vs. Administrator View of a Networked Application

Users judge application performance based on their experience with the application, while administrators judge an application’s performance based on how efficiently it uses network resources.

If you ask users what they want in terms of application performance, they might say, “make it start and exit quickly” and “it must provide good interactive response, smart error handling with reasonable timeouts and positive feedback.” In short, users want it to be fast and predictable.

Administrators, on the other hand, might say things like, “conserve my network and PC resources with low overhead, minimum connections and bandwidth controls” and “make sure there are no help desk calls.” The administrator needs the application to scale.

These requirements translate directly into design decisions. For example, to make a network application initialize quickly, don’t make the user interface wait for the network. Some tasks can be performed before the network is available, or without the network altogether. If the network is not responding, the user may simply want to use the UI to close the application.

During shutdown, do not wait for the network: users who are closing an application are probably trying to leave their workstation. Well-written client-server applications should also handle abortive disconnects. Do not write an application that starts a potentially lengthy operation that cannot be interrupted at shutdown, such as synchronizing files or folders with a server. Networks are variable, and even small operations can take a long time on some networks. Provide positive feedback to users, including indications of progress and estimated completion times, so that users are not left wondering about the status of the application.
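The guidance above (keep network work off the UI path, and make lengthy operations interruptible at shutdown) can be sketched with a worker thread that checks a cancellation flag between short steps. This Python sketch is illustrative only: BackgroundSync is a hypothetical class, and each step stands in for one small unit of network work, such as synchronizing a single file.

```python
import threading
import time


class BackgroundSync:
    """Run hypothetical network steps off the UI thread, with fast shutdown."""

    def __init__(self, steps):
        self._steps = steps              # callables, each a short unit of work
        self._stop = threading.Event()
        self.completed = 0
        # daemon=True: a step still stuck at exit will not keep the process alive
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()             # returns immediately; UI stays responsive

    def _run(self) -> None:
        for step in self._steps:
            if self._stop.is_set():      # cancellation is checked between steps
                return
            step()
            self.completed += 1

    def shutdown(self, grace_s: float = 0.5) -> bool:
        """Request cancellation and wait at most grace_s; True if the worker stopped."""
        self._stop.set()
        self._thread.join(timeout=grace_s)
        return not self._thread.is_alive()
```

A flag can only be checked between steps, so a step blocked inside a network call cannot be cancelled this way; each step should therefore use its own timeout. The bounded join means shutdown waits at most grace_s seconds no matter what the network is doing.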

A good guideline for interactive response is 500 milliseconds (ms); if an operation takes longer, users may notice the delay. Good responsiveness also helps eliminate unnecessary help desk calls. Applications should be fast enough to give users confidence, because a very slow application can be perceived as unreliable.

Not all network errors are as critical as they may first appear. For example, if an application has already sent and received all of its data, it can likely ignore errors that occur while closing the connection. Never assume that the network (or the user) is available: handle errors without user intervention where possible, and ignore them when they are non-critical. An application should also define its own reasonable timeouts. For example, a Windows Sockets connect() request may block under some conditions for as much as 21 seconds, so applications may need to impose shorter timeouts that are appropriate for their users.
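A minimal sketch of an application-defined connect timeout, using Python's socket.create_connection(), which accepts a timeout argument. The helper name connect_with_timeout and the 3-second default are assumptions chosen for illustration; pick a value appropriate for your users.

```python
import socket
from typing import Optional


def connect_with_timeout(host: str, port: int, timeout_s: float = 3.0) -> Optional[socket.socket]:
    """Connect, but give up after timeout_s instead of the stack's default.

    Returns a connected socket, or None if the peer refused, was unreachable,
    or did not answer in time, so the caller can report the problem promptly.
    """
    try:
        # The timeout is set on the socket before connect() is attempted.
        sock = socket.create_connection((host, port), timeout=timeout_s)
    except OSError:  # covers socket.timeout, ConnectionRefusedError, and so on
        return None
    # The same timeout stays in effect for later send()/recv() calls.
    return sock
```

Note that the timeout does not cover host-name resolution, and create_connection() applies it to each resolved address in turn, so a name with several addresses can take longer overall.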

Conserving network bandwidth is partly about minimizing the protocol overhead your application incurs, and partly about eliminating unnecessary network traffic. Protocols with a lower header tax can be used to transfer application data. For example, when sending small amounts of non-critical or repeatable data, use UDP rather than TCP to avoid the overhead of connection establishment and maintenance. If the same data must be sent to multiple recipients, multicast may be an option. Be aware that UDP applications are not flow-controlled: sending faster than the available bandwidth allows can cause catastrophic network failure. The netstat utility, shipped with Windows 2000, can be run with its -e and -s options to display statistics for the various protocols.

System resources can be consumed quickly if proper restraint is not used. For example, sockets and TCP connections consume resources. Do not use several TCP connections per client when one connection will suffice.

In summary, for transactional applications, a good user experience and low network utilization are not conflicting goals. Because the network is a bottleneck, applications that use it more intensively spend more time waiting. As an analogy, think about the efficiency of shopping with a shopping list versus browsing to find what you want. Good network applications work the same way: they “know” what they need to do to complete tasks quickly.

Transactional vs. Streaming Applications

There are two basic types of network applications: transactional and streaming. These types could also be called interactive and batch-processing applications. Transactional applications are “stop and go” applications: their protocol operations are usually request-reply in nature, and the operations may need to be ordered, although not in all cases. Examples of transactional applications include synchronous RPC and some HTTP and DNS implementations.

With streaming applications, the objective is to move data, with little concern for data ordering. Many traditionally transactional applications can also be streamed. Some examples of streaming applications include network backup and FTP.

The type of your application determines the network and protocol characteristics that you should be concerned with.

Different Network Environments

As shown in the figure below, there are several different network environments that affect the networked behavior of your application. Properties that differentiate these environments include whether they are low or high bandwidth and have a low or high RTT. Network environments affect transactional and streaming applications in different ways. Transactional applications are more sensitive to RTT; streaming applications are more sensitive to bandwidth-delay product.

Dial-up networks and some wireless networks have variable RTT. Satellite networks generally have asymmetric bandwidth between upstream and downstream. Wireless LAN and ADSL are good examples of networks with bandwidth-delay products similar to that of Fast Ethernet.

TCP/IP Limitations

The TCP/IP protocol, like other protocols, has a number of built-in limitations. Most of these limitations become visible only when running a poorly written application. How they affect your application depends on whether it is a transactional or a streaming application.

Transactional applications are affected by the overhead of connection establishment and termination. For example, each time a connection is established on an Ethernet network, three packets of about 60 bytes each must be sent, and the exchange takes approximately 1 RTT. When a connection is closed, four more packets are exchanged. This overhead compounds when an application opens and closes connections frequently.

In addition, when a connection is established, “slow-start” takes place. Slow-start deliberately limits the number of data segments that can be sent before acknowledgements for those segments are received, a mechanism designed to limit network congestion. As a result, when a connection over Ethernet is first established, a 4-kilobyte (KB) transmission can take up to 3 or 4 RTTs, regardless of the receiver’s window size.
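The per-connection cost described above can be tallied with the figures from the text (3 setup packets, 4 teardown packets, about 60 bytes each, and about 1 RTT per handshake). This sketch assumes transactions run one after another and ignores slow-start and TIME-WAIT; connection_cost is a hypothetical helper.

```python
SETUP_PACKETS = 3          # connection establishment (from the text)
TEARDOWN_PACKETS = 4       # connection termination (from the text)
CONTROL_PACKET_BYTES = 60  # approximate size of each on Ethernet (from the text)


def connection_cost(transactions: int, reuse: bool, rtt_ms: float) -> tuple[int, float]:
    """Return (control_bytes, setup_delay_ms) spent on connection management.

    A new connection per transaction pays setup and teardown every time;
    a reused connection pays them once. Transactions are assumed sequential,
    so each handshake adds about one RTT of waiting.
    """
    connections = 1 if reuse else transactions
    control_bytes = connections * (SETUP_PACKETS + TEARDOWN_PACKETS) * CONTROL_PACKET_BYTES
    setup_delay_ms = connections * rtt_ms
    return control_bytes, setup_delay_ms
```

For 100 sequential transactions over a 50 ms RTT, opening a connection per transaction spends about 42,000 bytes and 5 seconds on connection management, compared with 420 bytes and 50 ms when a single connection is reused.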
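An idealized model shows where the 3 to 4 RTT figure comes from. It assumes an Ethernet MSS of 1460 bytes (a 1500-byte MTU minus 40 bytes of IP and TCP headers), a congestion window that starts at initial_cwnd segments and doubles every RTT, no losses, and no delayed ACK. It is a simplification, not a description of the Windows 2000 implementation, and slow_start_rtts is a hypothetical helper.

```python
import math


def slow_start_rtts(nbytes: int, mss: int = 1460, initial_cwnd: int = 1) -> int:
    """Count RTTs to send nbytes on a new connection under idealized slow-start.

    Model assumptions: the window starts at initial_cwnd segments and doubles
    every RTT, nothing is lost, the receive window never limits, and delayed
    ACK is ignored. One extra RTT is added for the connection handshake.
    """
    segments = math.ceil(nbytes / mss)
    cwnd, sent, rounds = initial_cwnd, 0, 0
    while sent < segments:
        sent += cwnd   # one window of segments per round trip
        cwnd *= 2      # every segment acknowledged, so the window doubles
        rounds += 1
    return rounds + 1  # + handshake
```

With these assumptions, 4,096 bytes is 3 segments, which take 2 RTTs to send plus 1 RTT for the handshake, for 3 RTTs in total; waiting on a delayed acknowledgement, which the model ignores, can push the real figure higher.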

A TCP/IP optimization known as the Nagle algorithm can also limit data transfer speed on a connection. The Nagle algorithm was designed to reduce protocol overhead for applications that send small amounts of data, such as a Telnet session that sends a single character at a time. Rather than immediately sending a packet with many bytes of headers and little data, the stack holds the small segment until the application supplies more data or an acknowledgement for previously sent data arrives.

Delayed acknowledgement (“delayed ACK”) was also designed into TCP/IP, to allow acknowledgements to be “piggybacked” more efficiently on return data from the receiving application. However, if no return data is forthcoming while the sending side is waiting for an acknowledgement, delays of about 200 ms per send can occur.

When a TCP connection is closed, the connection resources at the node that initiated the close are put into a wait state called TIME-WAIT. This guards against data corruption from duplicate packets that may linger in the network, and ensures that both ends are done with the connection. Because each connection in TIME-WAIT continues to hold resources (memory and ports), applications that frequently open and close connections can deplete them.

Besides delayed ACK and congestion-control mechanisms such as slow-start, streaming applications can also be limited by a default receive window that is too small on the receiving end. Because the sender can have at most one receive window of unacknowledged data outstanding, throughput is capped at roughly the window size divided by the RTT; if the window is smaller than the bandwidth-delay product, the pipe cannot be kept full.
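Windows Sockets lets an application disable the Nagle algorithm per socket with the TCP_NODELAY option, and Python's socket module exposes the same option. This sketch shows the call; the helper names are hypothetical. Disabling Nagle trades more, smaller packets for lower latency, so it is generally appropriate only when the application already hands each complete message to the stack in a single send.

```python
import socket


def disable_nagle(sock: socket.socket) -> None:
    """Turn off the Nagle algorithm so small segments are sent immediately."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def nagle_disabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```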
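A common way to hit the delayed-ACK stall is a write-write-read pattern: the application sends a small header, then the body, and then waits for a reply. The header goes out at once, but the Nagle algorithm holds the body until the header is acknowledged, and the receiver delays that acknowledgement because it has no reply to piggyback it on until the full request arrives. The sketch below avoids the stall by assembling the whole message into one buffer before sending; send_request is a hypothetical helper.

```python
import socket


def send_request(sock: socket.socket, *parts: bytes) -> None:
    """Send one logical message with a single sendall() instead of one send per part.

    Joining the parts first means no small trailing segment is left waiting
    for an ACK that the receiver is deliberately delaying.
    """
    sock.sendall(b"".join(parts))
```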
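TIME-WAIT puts an upper bound on how quickly a client that closes its own connections can open new ones, because each closed connection holds its local port for the TIME-WAIT interval. The figures below are assumptions for illustration (about 4,000 ephemeral ports, based on the commonly documented Windows 2000 MaxUserPort default of 5000, and a 240-second TIME-WAIT); check the actual values on the systems you target.

```python
# Assumed defaults, for illustration only; verify on the target system.
ASSUMED_EPHEMERAL_PORTS = 5000 - 1025 + 1  # ports 1025 through 5000
ASSUMED_TIME_WAIT_S = 240.0


def max_sustained_connect_rate(ephemeral_ports: int, time_wait_s: float) -> float:
    """Upper bound on new connections per second for a client that closes first.

    At steady state, every port used in the last time_wait_s seconds is still
    in TIME-WAIT, so at most ephemeral_ports connections fit in that interval.
    """
    return ephemeral_ports / time_wait_s
```

Under those assumptions the client can sustain only about 16 to 17 new connections per second before exhausting its local ports, which is one more reason to reuse connections.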
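The window limit can be made concrete. This sketch assumes a 65,535-byte receive window (the largest TCP window that can be advertised without window scaling) and a hypothetical 100 ms RTT; the helper names are illustrative.

```python
def window_limited_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Maximum throughput when at most one receive window is in flight per RTT."""
    return window_bytes * 8 / rtt_s


def required_window_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Window needed to keep the path full: the bandwidth-delay product in bytes."""
    return bandwidth_bps * rtt_s / 8


cap = window_limited_throughput_bps(65_535, 0.100)
need = required_window_bytes(100e6, 0.100)
print(f"throughput cap: {cap / 1e6:.2f} Mbps; window for 100 Mbps: {need:,.0f} bytes")
```

With those numbers, throughput is capped near 5.2 Mbps however fast the link is, while filling a 100 Mbps path at the same RTT would need a window of about 1,250,000 bytes.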