White Paper: Working with large lists in Office SharePoint® Server2007

Author:

Steve Peschka

Date published:

July 2007

Summary:

Microsoft performed performance testing against Microsoft® Office SharePoint® Server2007 to determine the performance characteristics of large SharePoint lists under different loads and modes of operation. This white paper presents their findings.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

 2007 Microsoft Corporation. All rights reserved.

Microsoft, SQL Server, Windows, SharePoint, and Active Directory are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Table of Contents

Goals

Test results and findings

Test characteristics

Data access methods

Browser

SPList with For/Each

SPList with SPQuery

SPList with DataTable

SPListItems with DataTable

Lists Web service

Search

PortalSiteMapProvider

Test harness

WinForm test application

WebPart and JavaScript

Web Part

Test results

Browser-based viewing and page size

The baseline test

Testing with a very large list

Comparing results with an indexed column

Comparing an indexed column to an ID column

Analyzing the results

Search

PortalSiteMapProvider

SPList

Data maintenance considerations

Data locking

Crawl times

Related content

Goals

The test results in this white paper are intended to demonstrate the difference in the performance characteristics of SharePoint lists containing large numbers of items when different data access types are used to present list contents. Test results in this white paper show how to optimize list performance through limits on the number of items that appear in a list, and by choosing the most appropriate method of retrieving list contents.

The tests upon which the results in this white paper are based were conducted by using artificially created test data and simulated users. Real-world results may vary depending on hardware, number of concurrent users, farm configuration, and user operations being performed.

Test results and findings

There is documented guidance for Microsoft®Office SharePoint® Server2007 regarding the maximum size of lists and list containers. For typical customer scenarios in which the standard Office SharePoint Server2007 browser-based user interface is used, the recommendation is that a single list should not have more than 2,000 items per list container. A container in this case means the root of the list, as well as any folders in the list— a folder is a container because other list items are stored within it. A folder can contain items from the list as well as other folders, and each subfolder can contain more of each, and so on. For example, that means that you could have a list with 1,990 items in the root of the site, 10 folders that each contain 2,000 items, and so on. The maximum number of items supported in a list with recursive folders is 5 million items.

In Office SharePoint Server2007, virtually all end-user data is stored in a list. A document library, for example, is just a specialized list. The same is true for calendars, contacts, and other interfaces; they are all just customized versions of the basic SharePoint list, also referred to as an SPList. The individual items in the list are referred to as list items generally, or an SPListItem in an SPListItemCollection in the Office SharePoint Server2007 object model. The findings in this article are equally important across all of the ways in which you store and work with data in a Office SharePoint Server2007 site.

There are some scenarios in which you want to take advantage of the features of Office SharePoint Server2007, but need to exceed the limit of 2,000 items per container. If you write your own interface for managing and retrieving the data, it’s quite possible that you can go past this limit without an adverse impact on farm performance. You may be able to manage larger lists to some extent by using views within Office SharePoint Server2007 that are filtered such that there are never more than 2,000 items returned. Filtered views provide better performance than just trying to view one large flat list, but are not as efficient as breaking down the list into different containers if you are using the predefined browser-based Office SharePoint Server2007 interface.

If you develop your own interface, there are several different ways to retrieve list data, each with different performance characteristics. Some data access methods perform very well, but are only useful in a limited number of scenarios. Finally, there are also performance tradeoffs that need to be made with other data maintenance tasks in addition to data retrieval.

Test characteristics

The tests in this white paper were conducted on a relatively underpowered Microsoft Virtual Server2005 R2 image to show a comparison of farm performance characteristics when different data access types are used to manipulate list data. The goal of these tests was not to establish a new arbitrary limit, or to deliver a “requests per second” type number that is typically used in a load style test to show raw throughput capacity. The virtual server image was running Office SharePoint Server2007 Enterprise Edition and had 1 gigabyte (GB) of allocated RAM. Virtual Server was running on a host machine with a 2 gigahertz (GHz) dual-core processor and 2GB of RAM.

Baseline tests were done first with a list containing 1,500 items. The list schema looked like this:

Title: Single line of text

Expense Category: Choice (Meals, Travel, Hotel, Supplies)

Amount: Currency

Deductible: Yes/No

Created By: Person or Group

Modified By: Person or Group

In the baseline tests, no columns were indexed; measurements were taken just to provide a relative value that could be used after the number of items in the list exceeded recommended boundaries. In the tests against a very large list, one set was done with no columns being indexed and a second round was done after configuring the Expense Category column to be indexed. The query that was executed in each one of the tests used a WHERE clause against the Expense Category field looking for the first 100 items that contained “Supplies.”

To provide another point of comparison, the data being selected was based on ID value in the tests against the very large list. The ID is a built-in numeric indexed field in all SharePoint lists that is well suited to queries. The query in this case was constructed with a WHERE clause that retrieved items where the ID ranged from 44,500 through 44,599.

Some tests were also run with the site under load. To create the load during the testing process, a LoadTest was created in the Microsoft Visual Studio® .NET 2005 development system to stress test the site. Instead of targeting a specific number of users in the test, it was configured as a goal-based test, or a test in which a target value is defined for a particular measurement, and the test determines the number of requests required to achieve the target. In this case, the goal that was configured for the test was to achieve a consistent target CPU utilization on the Office SharePoint Server2007 computer of from 60 through 80 percent.

Data access methods

Each test consisted of retrieving a subset of data from the list using one of a number of different data access methods. This section shows the different methods that were tested.

Note: The code samples included in the following sections are intended to show the process used to conduct tests. The code may not comply with coding best practices, and should not be used in a production environment without careful review and testing.

Browser

The list was viewed using a browser and the predefinedOffice SharePoint Server2007 interface. A special tool, which is described in the Test Harness section later in this white paper, was developed to accurately capture how long it takes to view that information and browse through pages of data.

SPList with For/Each

The Office SharePoint Server2007object model (OM) was used to retrieve the list into anSPList object. Each item in the list was then enumerated with a For/Each loop until items were found that matched the search criteria.

The following sample code was used for this method.

'get the site

Dim curSite As SPSite = New SPSite("

'get the web

Dim curWeb As SPWeb = curSite.OpenWeb()

'get our list

Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))

'get the collection of items in the list

Dim curItems As SPListItemCollection = curList.Items

'enumerate the items in the list

ForEach curItem As SPListItem In curItems

'do some comparison in here to see if it's an item we need

Next

SPList with SPQuery

The OM was used to create an SPQuery object that contained the query criteria. That object was then used to against an instance of the list in a SPList object. The results of the query were returned by calling the GetItems method on the SPList object.

The following sample code was used for this method.

'get the site

Dim curSite As SPSite = New SPSite("

'get the web

Dim curWeb As SPWeb = curSite.OpenWeb()

'create our query

Dim curQry As SPQuery = New SPQuery()

'configure the query

curQry.Query = "<Where<Eq<FieldRef Name='Expense_x0020_Category'/<Value Type='Text'>

Hotel</Value</Eq</Where>"

curQry.RowLimit = 100

'get our list

Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))

'get the collection of items in the list

Dim curItems As SPListItemCollection = curList.GetItems(curQry)

'enumerate the items in the list

ForEach curItem As SPListItem In curItems

'do something with each match

Next

SPList with DataTable

This is one of two methods that test using a Microsoft ADO.NET DataTable to work with the data. In this case an instance of the list is obtained with an SPList object. The data from it is then retrieved into a DataTable by calling the GetDataTable() method on the Items property—for example, SPList.Items.GetDataTable(). The DataTable’s DefaultView has a property called RowFilter that was then set to find the items. To keep the methodology between data access methods consistent, the DataTable was not cached between tests—it was filled each time by calling the GetDataTable() method. In a real-world scenario this test would have performed better had the DataTable been cached after the data was first retrieved, but it serves as a valuable point in comparison testing about the cost of this approach versus retrieving a DataTable from a selection of data that’s already filtered.

The following sample code was used for this method.

'get the site

Dim curSite As SPSite = New SPSite("

'get the web

Dim curWeb As SPWeb = curSite.OpenWeb()

'create our query

Dim curQry As SPQuery = New SPQuery()

'configure the query

curQry.Query = "<Where<Eq<FieldRef Name='Expense_x0020_Category'/<Value Type='Text'>

Hotel</Value</Eq</Where>"

curQry.RowLimit = 100

'get our list

Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))

'get the item in a datatable

Dim dt As DataTable = curList.Items.GetDataTable()

'get a dataview for filtering

Dim dv As DataView = dt.DefaultView

dv.RowFilter = "Expense_x0020_Category='Hotel'"

'enumerate matches

For rowNum AsInteger = 0 To dv.Count - 1

'do something with each match

Next

SPListItems with DataTable

This method is similar to the SPList with DataTable method, but with a twist. An instance of the list is retrieved through an SPList object. An SPQuery object is created to build a query, and that query is executed against the SPList object, which returns an SPListItems collection. The data from that collection is then retrieved into a DataTable by using the GetDataTable() method on the SPListItems collection.

The following sample code was used for this method.

'get the site

Dim curSite As SPSite = New SPSite("

'get the web

Dim curWeb As SPWeb = curSite.OpenWeb()

'create our query

Dim curQry As SPQuery = New SPQuery()

'configure the query

curQry.Query = "<Where<Eq<FieldRef Name='Expense_x0020_Category'/<Value Type='Text'>Hotel</Value</Eq</Where>"

curQry.RowLimit = 100

'get our list

Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))

'get the collection of items in the list

Dim curItems As SPListItemCollection = curList.GetItems(curQry)

'get the item in a datatable

Dim dt As DataTable = curItems.GetDataTable()

'enumerate matches

ForEach dr As DataRow In dt.Rows

'do something with each match

Next

Lists Web service

The Lists Web service, which comes with Windows SharePoint Services3.0 and Office SharePoint Server2007, was used to retrieve the data. A Collaborative Application Markup Language (CAML) query was created and submitted along with the list identifier, and an XML result set was returned from the Lists Web service.

The following sample code was used for this method.

'create a new xml doc we can use to create query nodes

Dim xDoc AsNew XmlDocument

'create our query node

Dim xQry As XmlNode = xDoc.CreateNode(XmlNodeType.Element, "Query", "")

'set the query constraints

xQry.InnerXml = "<Where<Eq<FieldRef Name='Expense_x0020_Category'/<Value Type='Text'>Hotel</Value</Eq</Where>"

'create the Web service proxy that is mapped to Lists.asmx

Using ws AsNew wsLists.Lists()

'configure it

ws.Credentials = System.Net.CredentialCache.DefaultCredentials

ws.Url = "

'create the optional elements

Dim xView As XmlNode = xDoc.CreateNode(XmlNodeType.Element, "ViewFields", "")

Dim xQryOpt As XmlNode = xDoc.CreateNode(XmlNodeType.Element, "QueryOptions", "")

'query the server

Dim xNode As XmlNode = ws.GetListItems("myListID", "", xQry, xView, "", xQryOpt, "")

'enumerate returned items

For nodeCount AsInteger = 0 To xNode.ChildNodes.Count - 1

'do something with each match

Next

EndUsing

Search

The OM was used to execute a query against the Office SharePoint Server2007 search engine and return the results as a ResultTableCollection. That was then further distilled down into an ADO.NET DataTable via the ResultTable of ResultType.RelevantResults from the ResultTableCollection.

The following sample code was used for this method.

'get the site

Dim curSite As SPSite = New SPSite("

'get the web

Dim curWeb As SPWeb = curSite.OpenWeb()

'get our list

Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))

Dim qry AsNew FullTextSqlQuery(curSite)

Dim SQL AsString = "SELECT Title, Rank, Size, Description, Write, Path, Deductible, ExpenseCategory, ID, Vendor, Amount FROM portal..scope() WHERE CONTAINS (""URL"",'""#SITEURL#Lists/#LISTURL#*""') #DEFAULT# ORDER BY ""Rank"""

'do token replacement

SQL = SQL.Replace("#SITEURL#", "

SQL = SQL.Replace("#LISTURL#", curList.Title)

SQL = SQL.Replace("#DEFAULT#", "AND FREETEXT (""ExpenseCategory"",'""Hotel""')")

qry.QueryText = SQL

qry.RowLimit = 100

qry.ResultTypes = ResultType.RelevantResults

'execute the query

Dim rtc As ResultTableCollection = qry.Execute()

Dim rt As ResultTable = rtc(ResultType.RelevantResults)

Dim dt AsNew DataTable()

dt.Load(rt, LoadOption.OverwriteChanges)

'enumerate matches

ForEach dr As DataRow In dt.Rows

'do something with each match

Next

PortalSiteMapProvider

One approach to retrieving list data in Office SharePoint Server2007that’s not very well known is the use of thePortalSiteMapProviderclass. It was originally created to help cache content for navigation. However, it also provides a nice automatic caching infrastructure for retrieving list data. The class includes a method called GetCachedListItemsByQuery that was used in this test. This method first retrieves data from a list based on an SPQuery object that is provided as a parameter to the method call. The method then looks in its cache to see if the items already exist. If they do, the method returns the cached results, and if not, it queries the list, stores the results in cache and returns them from the method call.

The following sample code was used for this method. Note that it is different from all of the previous examples in that you cannot use the PortalSiteMapProviderclassin Windows forms applications.

'get the current web

Dim curWeb As SPWeb = SPControl.GetContextWeb(HttpContext.Current)

'create the query

Dim curQry AsNew SPQuery()

curQry.Query = "<Where<Eq<FieldRef Name='Expense_x0020_Category'/<Value Type='Text'>Hotel</Value</Eq</Where>"

'get the portal map provider stuff

Dim ps As PortalSiteMapProvider = PortalSiteMapProvider.WebSiteMapProvider

Dim pNode As PortalWebSiteMapNode = TryCast(ps.FindSiteMapNode(curWeb.ServerRelativeUrl), PortalWebSiteMapNode)

'get the items

pItems = ps.GetCachedListItemsByQuery(pNode, "myListName_NotID", curQry, curWeb)