Capacity Lab Study: Enterprise Intranet Collaboration Solution

Microsoft SharePoint Server 2010

This document is provided “as-is”. Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it.

Some examples depicted herein are provided for illustration only and are fictitious. No real association or connection is intended or should be inferred.

This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.

© 2010 Microsoft Corporation. All rights reserved.

Microsoft SharePoint Server 2010 Capacity Lab Study: Enterprise Intranet Collaboration Solution

Alex Soh, Dina Ayoub, Kfir Ami-ad
Microsoft Corporation

May 2010

Applies to: Microsoft® SharePoint® Server 2010

Summary: This whitepaper provides guidance on performance and capacity planning for a SharePoint Server 2010 enterprise intranet collaboration solution.

  • Lab environment specifications, such as hardware, farm topology and configuration;
  • Test farm dataset;
  • Test results analysis, which should help you determine the hardware, topology, and configuration you need to deploy a similar environment, and optimize your environment for appropriate capacity and performance characteristics.

Contents

Introduction

Glossary

Overview

Scaling Approach

Correlating the lab environment with a production environment

Methodology and Test Notes

Specifications

Hardware

Web and Application servers

Database Servers

Topology

Configuration

Workload

Dataset

Results and Analysis

Web Server Scale Out

Test methodology

Analysis

Results graphs and charts

Database Server Scale Out

Test methodology

Analysis

Results graphs and charts

Web server Scale Up

Test methodology

Analysis

Results graphs and charts

Comparing SharePoint Server 2010 and Office SharePoint Server 2007

Workload

Test methodology

Analysis

Results graphs and charts

Introduction

This document provides guidance about scaling out and scaling up servers in a Microsoft® SharePoint® Server 2010 enterprise intranet collaboration solution, based on a testing environment at Microsoft. Capacity planning informs decisions on acquiring hardware and making system configurations to optimize your solution. Different scenarios have different requirements, so it is important to supplement this guidance with additional testing on your own hardware and in your own environment. If your planned design and workload are similar to the environment described in this document, you can use this document to draw conclusions about scaling your environment up and out.

This document includes:

  • Specifications, which include hardware, topology, and configuration
  • The workload, which is the demand on the farm, including the number of users, and the usage characteristics
  • The dataset, including database sizes
  • Test results and analysis for scaling out Web servers
  • Test results and analysis for scaling up Web servers
  • Test results and analysis for scaling out database servers
  • Comparison between SharePoint 2007 and SharePoint 2010 regarding throughput and the effect on the Web and database servers

The SharePoint Server 2010 environment described in this document is a lab environment that mimics a production environment at a large company. The production environment hosts mission-critical team sites and publishing portals that internal organizations, teams, and projects use for enterprise collaboration. Employees use that production environment to track projects, collaborate on documents, and share information within their organization. The environment includes a large number of small sites used for ad-hoc projects and small teams. For details about the production environment, see SharePoint 2010 Technical Case Study: SharePoint Server 2010 Enterprise Intranet Collaboration Environment.

Before reading this document, it is important that you understand the key concepts behind SharePoint Server 2010 capacity management. The following documentation will help you learn about the recommended approach to capacity management and provide context for helping you understand how to make effective use of the information in this document, as well as define the terms used throughout this document.

  • Capacity management and sizing for SharePoint Server 2010
  • SharePoint Server 2010 Capacity Management: Software Boundaries and Limits

Also, we encourage you to read the following:

  • Storage and SQL Server capacity planning and configuration (SharePoint Server 2010)

Glossary

There are some specialized terms you will encounter in this document. Here are a few key terms and their definitions.

  • RPS: Requests per second. The number of requests received by a farm or server in one second. This is a common measurement of server and farm load.
    Note that requests are different from page loads; each page contains several components, each of which creates one or more requests when the page is loaded. Therefore, one page load creates several requests. Typically, authentication checks and events consuming negligible resources are not counted in RPS measurements.
  • Green Zone: This is the state at which the server can maintain the following set of criteria:
      • The server-side latency for at least 75% of the requests is less than 1 second.
      • All servers have a CPU utilization of less than 50%.
        Note: Because this lab environment did not have an active search crawl running, the database server was kept at 40% CPU utilization or lower, to reserve 10% for the search crawl load. This assumes Microsoft® SQL Server® Resource Governor is used in production to limit Search crawl load to 10% CPU.
      • Failure rate is less than 0.01%.
  • Red Zone (Max): This is the state at which the server can maintain the following set of criteria:
      • The HTTP request throttling feature is enabled, but no 503 errors (Server Busy) are returned.
      • Failure rate is less than 0.1%.
      • The server-side latency is less than 3 seconds for at least 75% of the requests.
      • Database server CPU utilization is less than 80%, which allows for 10% to be reserved for the Search crawl load, limited by using SQL Server Resource Governor.
  • AxBxC (Graph notation): This is the number of Web servers, application servers, and database servers respectively in a farm. For example, 8x1x2 means that the environment has 8 Web servers, 1 application server, and 2 database servers.
  • MDF and LDF: SQL Server physical files. For more information, see Files and Filegroups Architecture.
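The Green Zone and Red Zone criteria and the AxBxC notation defined above can be expressed as simple checks. The following Python sketch is illustrative only: the class, field, and function names are assumptions invented for this example, and the 40% database CPU limit in the Green Zone reflects the adjustment used in this lab rather than a general rule.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FarmSample:
    """One measurement interval for a farm (hypothetical field names)."""
    p75_latency_s: float      # server-side latency met by at least 75% of requests
    max_server_cpu_pct: float  # highest CPU utilization across Web/application servers
    db_cpu_pct: float          # database server CPU utilization
    failure_rate_pct: float    # failed requests as a percentage of all requests
    http_503_count: int = 0    # "Server Busy" responses from HTTP request throttling


def parse_topology(notation: str) -> Tuple[int, int, int]:
    """Split AxBxC notation into (Web, application, database) server counts."""
    web, app, db = (int(part) for part in notation.lower().split("x"))
    return web, app, db


def classify_zone(sample: FarmSample) -> str:
    """Classify a sample using the glossary criteria."""
    in_green = (
        sample.p75_latency_s < 1.0
        and sample.max_server_cpu_pct < 50.0
        and sample.db_cpu_pct <= 40.0   # lab adjustment: 10% reserved for search crawl
        and sample.failure_rate_pct < 0.01
    )
    if in_green:
        return "green"

    in_red = (
        sample.http_503_count == 0
        and sample.failure_rate_pct < 0.1
        and sample.p75_latency_s < 3.0
        and sample.db_cpu_pct < 80.0
    )
    return "red" if in_red else "beyond red"
```

For example, `parse_topology("8x1x2")` yields 8 Web servers, 1 application server, and 2 database servers, and a sample with 0.4 seconds of latency, 45% Web server CPU, 38% database CPU, and a 0.005% failure rate would be classified as being in the Green Zone.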

Overview

Scaling Approach

This section describes the specific order that we recommend for scaling machines in your environment, and is the same approach we took for scaling this lab environment. This approach will allow you to find the best configuration for your workload, and can be described as follows:

  1. First, we scaled out the Web servers. These were scaled out as far as possible under the tested workload, until the database server became the bottleneck and was not able to accommodate any more requests from the Web servers.
  2. Second, we scaled out the database server by moving half of the content databases to another database server. At this point, the Web servers were not creating sufficient load on the database servers, so they were scaled out further.
  3. Third, to test scale-up, we scaled up the Web servers rather than scaling them out. Scaling out the Web servers is commonly preferred over scaling them up because scaling out provides higher redundancy and availability.

Correlating the lab environment with a production environment

The lab environment outlined in this document is a smaller-scale model of a production environment at Microsoft, and although there are significant differences between the two environments, it can be useful to look at them side by side because they are both enterprise collaboration environments where the patterns observed should be similar.

The lab environment contains a subset of the data from the production environment, as well as some modifications to the workload. This affects the test results in terms of Web server memory usage, because the object cache on the production environment receives more hits on unique sites and thus uses more memory. The lab environment also has less data, and almost all of it is cached in memory, whereas the production environment carries over seven terabytes of data, so the database server on the production environment needs to perform more disk reads than the database server in the lab environment. Similarly, the hardware used in the lab environment is significantly different from that of the production environment it models, because there is less demand on those resources. The lab environment relies on more easily available hardware.

To get a better understanding of the differences between the environments, read the Specifications section in this document, and compare it to the specifications in the SharePoint 2010 Technical Case Study: SharePoint Server 2010 Enterprise Intranet Collaboration Environment.

Methodology and Test Notes

This document provides results from a test lab environment. Because this was a lab environment and not a production environment, we were able to control certain factors to show specific aspects of performance for this workload. In addition, certain elements of the production environment, listed below, were left out of the lab environment to simplify testing overhead. Note that omitting these elements is not recommended for production environments.

  • Between test runs, we modified only one variable at a time, to make it easy to compare results between test runs.
  • The database servers used in this lab environment were not part of a cluster because redundancy was not necessary for the purposes of these tests.
  • Search crawl was not running during the tests, whereas it might be running in a production environment. To take this into account, we lowered the SQL Server CPU utilization in our definition of ‘Green Zone’ and ‘Max’ to accommodate the resources that a search crawl would have consumed if it were running simultaneously with our tests. To learn more about this, read Storage and SQL Server capacity planning and configuration (SharePoint Server 2010).

Specifications

This section provides detailed information about the hardware, software, topology, and configuration of the lab environment.

Hardware

Web and Application servers

There are from one to eight Web servers in the farm, plus one application server.

Web Server / WFE1-8, and APP1
Processor(s) / 2 quad-core 2.33 GHz processors
RAM / 8 GB
Operating system / Windows Server 2008 R2
Size of the SharePoint drive / 80 GB
Number of NICs / 2
NIC Speed / 1 Gigabit
Authentication / NTLM
Load balancer type / Windows NLB
Services running locally / WFE 1-8: Basic Federated Services, including Timer Service, Admin Service, and Trace Service.
APP1: Word Automation Services, Excel Services, and Sandboxed Code Services.

Database Servers

There are from two to three database servers: up to two running the default SQL Server instance that houses the content databases, and one running the logging database. The logging database is not tracked in this document.

Note
If you enable usage reporting, we recommend that you store the logging database on a separate Logical Unit Number (LUN). For large deployments and some medium deployments, a separate LUN will not be sufficient, as the demand on the server’s CPU may be too high. In that case, you’ll need a separate database server box for the Logging database. In this lab environment, the logging database was stored in a separate instance of SQL Server, and its specifications are not included in this document.
Database Server – Default Instance / DB1-2
Processor(s) / 4 dual-core 3.19 GHz processors
RAM / 32 GB
Operating system / Windows Server 2008 R2
Storage and geometry / Direct Attached Storage (DAS)
Internal Array with 5 x 300GB 10krpm disk
External Array with 15 x 450GB 15krpm disk
6 x Content Data (External RAID0, 2 spindles 450GB each)
2 x Content Logs (Internal RAID0, 1 spindle 300GB each)
1 x Temp Data (Internal RAID0, 2 spindles 150GB each)
1 x Temp Log (Internal RAID0, 2 spindles 150GB each)
2 x Backup drive (Internal RAID0, 1 spindle each, 300GB each)
Number of NICs / 1
NIC Speed / 1 Gigabit
Authentication / NTLM
Software version / SQL Server 2008 R2 (pre-release version)

Topology

Configuration

To allow for optimum performance, the following configuration changes were made in this lab environment.

Setting / Value / Notes
Site Collection
Blob Caching / On / The default is Off. Enabling Blob Caching improves server efficiency by reducing calls to the database server for static page resources that may be frequently requested.
Database Server – Default Instance
Max degree of parallelism / 1 / The default is 0. To ensure optimal performance, we strongly recommend that you set max degree of parallelism to 1 for database servers that host SharePoint Server 2010 databases. For more information about how to set max degree of parallelism, see max degree of parallelism Option.

Workload

The transactional mix for the lab environment described in this document is similar to the workload characteristics of a production environment at Microsoft. For more information on the production environment, see SharePoint 2010 Technical Case Study: SharePoint Server 2010 Enterprise Intranet Collaboration Environment.

Here are the details of the mix for the lab tests run against SharePoint Server 2010 as compared to the production environment. While there are some minor differences in the workloads, both represent a typical transactional mix on an enterprise collaboration environment.

Dataset

The dataset for the lab environment described in this document is a subset of the dataset from a production environment at Microsoft. For more information on the production environment, see SharePoint 2010 Technical Case Study: SharePoint Server 2010 Enterprise Intranet Collaboration Environment.

Dataset Characteristics / Value
Database size (combined) / 130 GB
BLOB size / 108.3 GB
Number of content databases / 2
Number of site collections / 181
Number of Web applications / 1
Number of sites / 1384

Results and Analysis

The following results are ordered based on the Scaling Approach mentioned in the overview section of this document.

Web Server Scale Out

This section describes the test results that were obtained when we scaled out the number of Web servers in this lab environment.

Test methodology

- Add Web servers of the same hardware specifications, keeping the rest of the farm the same.

- Measure RPS, latency, and resource utilization.

Analysis

In our testing, we found that:

- The environment scaled to as many as four Web servers per database server; however, the increase in throughput was nonlinear, particularly on the addition of the fourth Web server.

- Beyond four Web servers, adding more Web servers produced no additional gains in throughput, because the bottleneck at that point was the database server CPU utilization.

- The average latency was almost constant throughout the entire test, unaffected by the number of Web servers and throughput.

Note
The conclusions described in this section are hardware specific, and the same throughput might have been achieved with a larger number of lower-end servers or a smaller number of higher-end servers. Similarly, changing the hardware of the database server would affect the results. To get an idea of how much the Web server hardware can affect these results, see the Web Server Scale Up section.
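The plateau described in the analysis above (no further throughput gain once the database server becomes the bottleneck) can be located in your own test results with a small helper. The following Python sketch is illustrative only: the function name and the 5% gain threshold are assumptions, and the RPS values in the usage note are made-up placeholders, not measurements from this lab.

```python
from typing import Dict


def find_scale_out_plateau(rps_by_web_servers: Dict[int, float],
                           min_gain: float = 0.05) -> int:
    """Return the Web server count after which adding one more server
    raises throughput by less than min_gain (a fraction, e.g. 0.05 = 5%).

    If every added server still yields at least min_gain, the largest
    tested count is returned.
    """
    counts = sorted(rps_by_web_servers)
    for prev, curr in zip(counts, counts[1:]):
        prev_rps = rps_by_web_servers[prev]
        gain = (rps_by_web_servers[curr] - prev_rps) / prev_rps
        if gain < min_gain:
            return prev
    return counts[-1]
```

With placeholder inputs such as `{1: 100, 2: 195, 3: 280, 4: 330, 5: 332}`, the helper returns 4, because the fifth Web server adds less than 5% throughput. Substitute the RPS values measured on your own hardware, and choose a threshold that matches your cost tolerance.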

Results graphs and charts

In the following graphs, the X axis shows the change in the number of Web servers in the farm, scaling from one Web server (1x1x1) to five Web servers (5x1x1).

  1. Latency and RPS
    The following graph shows how scaling out (adding Web servers) affects latency and RPS.
  2. Processor utilization
    The following graph shows how scaling out the Web servers affects processor utilization on the Web server(s) and the database server.

  3. SQL Server I/O operations per second (IOPs) for MDF and LDF files
    The following graphs show how the IOPs on the content databases change as the number of Web servers is scaled out. These are measured by looking at the following performance counters:
    - PhysicalDisk: Disk Reads / sec
    - PhysicalDisk: Disk Writes / sec
    In this lab environment, we determined that our data on IOPs was not representative of a production environment because our dataset was so small that we could fit much more of it in cache than would be possible in the production environment we are modeling. We calculated projected reads by multiplying the writes per second measured in the lab by the ratio of reads to writes in our production environment. The results below are averages, but there are also spikes that occur during certain operations which need to be accounted for. To learn more about estimating IOPs needed, see Storage and SQL Server capacity planning and configuration (SharePoint Server 2010).
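The projected-reads calculation described above is a single multiplication. The following Python sketch shows it under stated assumptions: the function and parameter names are invented for this example, and the numbers in the usage note are placeholders rather than values from the lab or the production environment.

```python
def projected_reads_per_sec(lab_writes_per_sec: float,
                            production_read_write_ratio: float) -> float:
    """Project disk reads/sec for a production-like dataset.

    lab_writes_per_sec: average PhysicalDisk Disk Writes/sec measured in the lab.
    production_read_write_ratio: reads divided by writes observed in production.
    """
    if lab_writes_per_sec < 0 or production_read_write_ratio < 0:
        raise ValueError("inputs must be non-negative")
    return lab_writes_per_sec * production_read_write_ratio
```

For example, with a placeholder lab measurement of 120 writes/sec and a placeholder production ratio of 2.5 reads per write, the projection is 300 reads/sec. Because this is an average, it does not capture the spikes that occur during certain operations.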

Maximum:

Green Zone:


Example of how to read these graphs:
An organization with a workload similar to that described in this document that expects 300 RPS to be its green zone could use a 3x1x1 topology, and would use roughly 600 physical disk reads/sec on the MDF file.