Capacity Planning for Office Communications Server 2007 Speech Server Deployments

White Paper


Contents

Introduction

Estimating Channel Capacity Requirements

Identification of Performance Goals

Estimating expected call load – incoming call applications

Estimating expected call load – outbound calling applications

Estimating Average Application load (application complexity)

Disk Space Considerations for Application Tuning and Reporting

Application Tuning Disk Space Requirements

Reporting Requirements

Estimating Hardware Requirements

“Reference” Server Hardware Specification

Suggested Load (channels) per computer per application type

Additional Factors that Affect Capacity

Deployment Topology

Bandwidth considerations

Example Application Capacity planning

Simple Application – Store Locator

Medium Complexity Application – Flight Booking

High Complexity Application – “How May I Help You” call director

Optimizing Speech Application Performance – tips and tricks

Consider converting Conversational Understanding grammars to GRXML

Minimize the amount of logging done

Prefer recorded prompts over TTS

Performance of SALT applications migrated from MSS V1 (R2)

Office Communications Server 2007, Speech Server Capacity Planning Tool

Introduction

By carefully planning the capacity of your Microsoft® Office Communications Server 2007 Speech Server deployment, you can ensure that your system's telephony and speech recognition (SR) resources are adequately sized to meet expected demands. You can estimate the number of computers you will need by factoring in the type of application you are deploying, the system performance goals, and the estimated call volume. After you have estimated the number of servers needed, you can fine-tune the deployment by testing the system under the expected call load to measure its actual capacity and performance.

In general, your system's capacity is constrained by the performance characteristics you expect from each computer in your deployment. As channel density (that is, the number of simultaneous calls) increases, application performance tends to decrease, eventually resulting in declined calls. To ensure a good experience for callers interacting with your speech application, Speech Server declines new calls if response times get too slow.

Estimating Channel Capacity Requirements

When estimating channel capacity requirements for your Speech Server deployment, you must consider the following three key factors:

·  System performance goals

·  Current and future call workloads

·  Application complexity

The business objectives for your Speech Server deployment, such as optimal call-handling service levels or high customer satisfaction scores, should weigh heavily in determining your system performance goals. Aside from the typical computer hardware elements (such as processor speed, the amount of memory, and hard disk capacity), call workloads and application complexity are the two main factors that most affect the performance of a Speech Server deployment. These factors play a significant role in determining the scale of your deployment. It is important to estimate these as accurately as possible to meet or exceed your performance objectives.

Identification of Performance Goals

Your Speech Server deployment might replace or supplement a current business function, such as call center agents handling routine or low-complexity inbound phone calls. Alternatively, it might provide new capabilities that are attractive to the business, such as enhancing customer service by providing outbound automated appointment reminders. To ensure that your deployment delivers on your business objectives, you need to establish performance goals for the deployment early in the design process. This provides a clear benchmark for success and helps guide your solution design and deployment.

Performance goals for Speech Server deployments typically include the following measurements:

·  User-perceived latency (UPL) — The length of time between the end of a caller's input and the response from Speech Server.

·  Call pass rate — The percentage of calls successfully answered by Speech Server rather than dropped, left unanswered, or blocked (including busy-out).

·  Resource utilization — The amount of server resources that the application consumes, such as CPU, memory, and hard disk.

User-Perceived Latency

A low average UPL is important to maintain caller satisfaction as well as to prevent runaway error conditions during an automated call. Users of your speech applications expect the timing of the application's responses to closely mimic that of human conversations. If users are forced to wait too long, they can become impatient and opt out of using the system. In other cases, they might start to repeat inputs that cause the application to reprocess or misrecognize responses. Because neither of these situations is desirable, always factor a reasonable UPL into your performance objectives:

A UPL of less than two seconds is generally acceptable; less than one second is ideal.[1]

Call Pass Rate

Most call centers already have metrics in place for acceptable call pass rates, usually between 95 and 99 percent. An increase in the number of unanswered, dropped, or blocked calls would almost certainly cause customer satisfaction ratings to drop, to the detriment of the original business goals. Thus, the performance goals for your Speech Server deployment should include meeting or exceeding the current call pass rate.

Target a call pass rate of at least 95 percent, preferably higher.

CPU Utilization

While application resilience and network issues can influence both call pass rates and UPL, managing and controlling system resources is an excellent way to optimize your deployment for meeting your UPL and call pass rate goals. Resources within the computer include the main subsystems of CPU, memory, and disk/storage. Of these, CPU utilization is the best indicator of resource availability, because resource problems elsewhere, such as a shortage of memory, result in excessive use of CPU resources. Thus, to ensure optimal system performance:

Set your maximum CPU utilization target at 70 percent or lower.

Performance Goals Summary

To recap, a good set of guidelines for your performance objectives should include the following three metrics:

·  UPL < 2 seconds

·  Call pass rate >= 95%

·  CPU utilization <= 70%
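One way to apply these targets is to check the results of each load-test run against them. The following Python sketch does that; the `LoadTestResult` structure, the `missed_goals` function, and the sample figures are hypothetical, and the thresholds are simply the guideline values above, which you may want to tighten to suit your own business objectives.

```python
from dataclasses import dataclass

# Guideline thresholds from the performance goals summary; adjust to your own goals.
MAX_UPL_SECONDS = 2.0        # UPL must be below this
MIN_CALL_PASS_RATE = 0.95    # call pass rate must be at least this
MAX_CPU_UTILIZATION = 0.70   # CPU utilization must not exceed this


@dataclass
class LoadTestResult:
    """Measurements from one load-test run (hypothetical structure)."""
    avg_upl_seconds: float   # average user-perceived latency
    calls_attempted: int
    calls_passed: int        # answered without being dropped, blocked, or busied out
    peak_cpu: float          # peak CPU utilization as a fraction (0.0 to 1.0)

    @property
    def call_pass_rate(self) -> float:
        return self.calls_passed / self.calls_attempted if self.calls_attempted else 0.0


def missed_goals(result: LoadTestResult) -> list[str]:
    """Return the names of the performance goals that the run failed to meet."""
    failures = []
    if result.avg_upl_seconds >= MAX_UPL_SECONDS:
        failures.append("UPL")
    if result.call_pass_rate < MIN_CALL_PASS_RATE:
        failures.append("call pass rate")
    if result.peak_cpu > MAX_CPU_UTILIZATION:
        failures.append("CPU utilization")
    return failures


# Illustrative figures only: latency and pass rate are fine, but CPU runs too hot.
run = LoadTestResult(avg_upl_seconds=1.4, calls_attempted=1000, calls_passed=962, peak_cpu=0.74)
print(missed_goals(run))  # ['CPU utilization']
```

A run like this one suggests that the channel count per computer should be reduced, even though callers are not yet seeing slow responses.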

Having established your performance objectives, you can now move to the next step in assessing your capacity needs: estimating current and future call workloads.

Estimating Expected Call Load – Incoming Call Applications

As previously mentioned, the workload placed on the computer affects the ability of your Speech Server deployment to achieve its performance objectives. The higher the workload, the more resources you need. You can determine the call workload using actual statistics from current operations, such as the number of calls currently coming into your call center or help desk. If actual statistics are not available, you need to estimate them.[2] The statistics most important in determining call workload are as follows:

·  Number of calls

·  Call concurrency

·  Call length

These call measurements help you identify the number of telephony channels required in the deployment. For example, one channel handles one call at a time. If each call takes ten minutes and each call begins as soon as the previous one ends, one channel can handle six calls in an hour. Therefore, if you expect 19 to 24 calls each hour, you need four channels, allowing four concurrent calls. However, real call patterns are never so evenly distributed, so you need a calculation method that accounts for calls arriving at random.
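The back-to-back arithmetic in this example can be sketched in Python as follows. It is a minimal illustration that assumes perfectly even, non-overlapping calls, which is exactly the assumption that breaks down for real traffic; the function name `naive_channels` is hypothetical.

```python
import math


def naive_channels(calls_per_hour: int, avg_call_seconds: float) -> int:
    """Channels needed if calls arrive back to back with no overlap or gaps.

    This ignores the randomness of real call arrivals, so it can badly
    underestimate the channel count for real traffic.
    """
    calls_per_channel_per_hour = 3600 / avg_call_seconds  # 6 for ten-minute calls
    return math.ceil(calls_per_hour / calls_per_channel_per_hour)


for calls in (19, 24):
    print(calls, naive_channels(calls, 10 * 60))  # prints "19 4" then "24 4"
```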

Estimating Required Telephony Ports Using an Erlang Calculation

One of the best ways to estimate the number of channels needed to handle your call workload is to use an Erlang traffic model. This well-known call-statistics calculation method helps you to determine how many channels you need to support call center traffic and gives you a good starting point for investigating your deployment requirements.[3]

Erlang calculations use a unit of measurement called the Erlang. An Erlang describes traffic through telephony equipment (such as incoming calls into a phone switch or PBX).

1 Erlang = 1 continuous 3600-second (60-minute) call

To determine the maximum number of Erlangs that you need to accommodate, multiply the number of calls during the busy hour (B)[4] by the average call length in seconds (L), and then divide by 3600:

Erlangs = (B*L)/3600

For example, if you estimate that your deployment will receive 1000 calls during the busy hour and the average length of a call is two minutes (120 seconds), the estimated number of Erlangs is determined as:

(1000*120)/3600 = 33.3 Erlangs
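The same formula can be expressed in Python, assuming you have hourly call counts from which to pick the busy hour. The function name `busy_hour_erlangs` and the hourly figures below are illustrative only, not real traffic data.

```python
def busy_hour_erlangs(hourly_call_counts: list[int], avg_call_seconds: float) -> float:
    """Erlangs = (B * L) / 3600, where B is the call count in the busiest hour."""
    busy_hour_calls = max(hourly_call_counts)  # B
    return busy_hour_calls * avg_call_seconds / 3600


# Illustrative hourly call counts for one day; the busy hour has 1000 calls.
hourly = [310, 540, 870, 1000, 760, 620, 480]
print(round(busy_hour_erlangs(hourly, 120), 1))  # 33.3
```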

When you have established the number of Erlangs your deployment will process, you can use an Erlang traffic calculation[5] to determine the number of required telephony channels. The Erlang traffic formula takes two inputs: the number of Erlangs you estimated and the percentage of blocked calls (busy signals) that you consider acceptable. For example, if you take the 33.3 Erlangs from above and can tolerate 1 percent call blockage, the Erlang traffic calculation determines that your deployment needs to scale to 45 concurrent channels.
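The figures in this example are consistent with the Erlang B formula, which assumes that blocked calls are cleared rather than queued. The following Python sketch assumes Erlang B is the model in use: it computes the blocking probability with the standard recurrence and adds channels until blocking falls to the acceptable level. The function names `erlang_b` and `channels_required` are hypothetical.

```python
def erlang_b(traffic_erlangs: float, channels: int) -> float:
    """Erlang B blocking probability for `channels` channels offered `traffic_erlangs`.

    Uses the standard recurrence B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)),
    which avoids the large factorials in the closed-form formula.
    """
    b = 1.0
    for n in range(1, channels + 1):
        b = traffic_erlangs * b / (n + traffic_erlangs * b)
    return b


def channels_required(traffic_erlangs: float, max_blocking: float) -> int:
    """Smallest channel count whose Erlang B blocking is at most max_blocking."""
    if not 0 < max_blocking <= 1:
        raise ValueError("max_blocking must be in (0, 1]")
    n, b = 0, 1.0
    while b > max_blocking:
        n += 1
        b = traffic_erlangs * b / (n + traffic_erlangs * b)
    return n


traffic = 1000 * 120 / 3600  # 33.3 Erlangs, from the busy-hour example
print(channels_required(traffic, 0.01))  # 45
```

With 44 channels the blocking probability for this traffic is roughly 1.3 percent; the 45th channel brings it just under the 1 percent target.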

Now that you know how to calculate call workload, and thus the number of concurrent channels your solution requires, the next factor to consider is the complexity of your application.

Use the Peak Call Load sheet of the Speech Server Capacity Planning workbook to calculate these numbers for your own deployment.

Estimating Expected Call Load – Outbound Calling Applications

Estimating the call load for an outbound calling application is quite different from estimating it for an application that accepts inbound calls, where you have no control over the call pattern. For an outbound application, the channel capacity that you need to scale to depends on:

·  How many calls need to be placed.

·  The time frame within which the calls must be placed.

·  The average call duration.

·  The proportion of unanswered calls.

For example, consider a video rental company that makes reminder calls to customers with overdue videos. They have to place up to 10,000 calls per day, between 9 A.M. and 5 P.M. – this equates to 1250 calls per hour. Assuming an average call duration of 45 seconds, the Erlang calculation is as follows:

Erlangs = (1250*45)/3600

= 56250/3600

= 15.625

For an outbound calling application, the call pattern is controlled by the application and is steady and predictable, so the number of Erlangs maps closely to the number of concurrent channels you need to scale to. Rounding up gives a scale goal of 16 concurrent channels.
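The outbound calculation can be sketched the same way. This Python example assumes that calls are paced evenly across the calling window and, like the example above, does not adjust for unanswered calls; the function name `outbound_channels` is hypothetical.

```python
import math


def outbound_channels(calls_to_place: int, window_hours: float,
                      avg_call_seconds: float) -> tuple[float, int]:
    """Return (Erlangs, channel goal) for an evenly paced outbound campaign.

    Because the application controls when calls are placed, traffic is spread
    evenly across the window, so the Erlang figure is simply rounded up.
    """
    calls_per_hour = calls_to_place / window_hours
    traffic = calls_per_hour * avg_call_seconds / 3600
    return traffic, math.ceil(traffic)


# Video rental reminder example: 10,000 calls between 9 A.M. and 5 P.M. (8 hours), 45 s each.
print(outbound_channels(10_000, 8, 45))  # (15.625, 16)
```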

Estimating Average Application Load (Application Complexity)

A Speech Server application can range from simple to complex, and the demand that the application makes on Speech Server resources increases with its complexity. To help you identify the complexity level of your application, the following sections describe the three broad application types: simple, average, and complex.

There is very little difference in performance between VoiceXML applications and .NET managed-code speech applications, so performance should not be a factor influencing your choice of application authoring style.

Simple Application

Simple applications include dual-tone multi-frequency (DTMF) applications and those with small amounts of simple automatic speech recognition (ASR). DTMF applications play pre-recorded prompts and collect touchtone key presses from callers in response to prompts. This application type might use text-to-speech (TTS) to play back small amounts of dynamic data, such as bank balances or callers' names. DTMF applications place a relatively low burden on the speech recognition engine and on system resources. The speech recognition capability of simple applications is typically limited to rudimentary single-token responses, such as “Yes/No” or digits. Whether they use touchtone alone or a combination of touchtone and speech, simple applications use sparse grammars that contain only a small number of entries.

An example of a simple application might be a department store locator application. Customers call a toll-free number and obtain the operating hours and address of a local store. The application requires customers to input their postal code using the telephone keypad. It then performs a database lookup and reads back the operating hours and address of the nearest store using TTS.

Average Applications

Average applications typically play a number of pre-recorded and/or TTS prompts and perform moderate levels of speech recognition using medium-sized or variable grammars. This application type might also use text-to-speech to play back dynamic data, such as bank balances or users' names. Average applications create greater load on the speech recognition engine than DTMF or simple applications, but because of the moderately sized grammars, the overall load is less than that of applications with large complex grammars. Average applications typically recognize more complex speech inputs, such as dates and place names.

An example of an average application is a bank account management application. Customers can carry out account management tasks over a telephone, such as checking balances, transferring money, and requesting statements. The application understands speech input, such as the account type (for example, checking or savings) and the task the customer wants to perform (for example, check balance, transfer funds, or get statement). Customers can use either the telephone keypad or speech to input account numbers and money amounts. The application uses text-to-speech to play back statements and balances.

Another example of an average complexity application is one for purchasing air travel tickets. The application prompts for information, such as the number of tickets, departure and arrival dates, locations, the class of ticket, and food requirements, and it receives and processes speech responses to these prompts. The application then uses the customer's credit card details to reserve the tickets. The user can also check that booked flights are on time and make lost luggage inquiries.

Complex Applications

Application complexity has a profound effect on the performance of Speech Server because complex applications put the highest burden on ASR and TTS resources. Complex applications perform extensive speech recognition using complex grammars (such as conversational grammars with many nodes and training sentences, or grammars with more than 25,000 items). This application type spends little time on less CPU-intensive activities, such as recording voice mail or using text-to-speech to play back large amounts of text (for example, e-mail messages).