Everything You Always Wanted to Know About PowerPivot Data Refresh but Were Afraid to Ask

SQL Server Technical Article

Writer: Mariano Teixeira Neto

Technical Reviewers:Lee Graber,Heidi Steen, Dave Wickert

First Published:September 2010

Minor updates September 2012

Applies to: SQL Server 2008 R2

Summary:The purpose of PowerPivot for SharePoint is to allow users to share their PowerPivot workbooks in a secure and scalable environment. To ensure that you are sharing and collaborating on the most recent data available to you, PowerPivot for SharePoint provides a data refresh feature that lets you automatically update PowerPivot data in an Excel workbook. This white paper describes in detail the data refresh feature in PowerPivot for SharePoint. It starts by explaining the steps for setting up a data refresh schedule in SharePoint, and then it continues with an in-depth view on how data refresh works on a SharePoint farm. Both administrators and the business analysts who author and manage PowerPivot workbook data can benefit from learning more about setting up and using data refresh in a SharePoint environment.

Copyright

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in, or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred.

© 2010 Microsoft Corporation. All rights reserved.

Contents

Introduction

Creating a Schedule

Data Refresh

Schedule Details

Earliest Start Time

E-mail Notifications

Credentials

Data Sources

Infrastructure

PowerPivot Data Refresh Timer Job

Anatomy of PowerPivot Data Refresh

Parallelization

Sample Situations

Scenario 1

Scenario 2

Scenario 3

Best Practices (Tips and Tricks)

Managing the Data Refresh Settings in a PowerPivot Service Application

Refreshing a Workbook More Than Once a Day

Pausing All Data Refresh Operations on a Given Server

Working with the Unattended Data Refresh Account

Conclusion

Appendix

Introduction

Because PowerPivot for Microsoft Excel simplifies data integration in Excel, it is common for a PowerPivot workbook to contain data from multiple sources, including public, corporate, or local data in a single workbook. Some of this data might change slowly or not at all, while other data changes so quickly that it is virtually out of date as soon as you import it.

To help you keep your PowerPivot workbooks using the most current data, PowerPivot for SharePoint provides a data refresh feature that retrieves newer data from the external data sources you used to build the workbook. After a SharePoint administrator enables data refresh on the server, any PowerPivot workbook owner can schedule data refresh for workbooks saved to Shared Documents or some other SharePoint library. Figure 1 shows the drop-down menu that opens the data refresh schedule page.

Figure 1:Accessing PowerPivot Data Refresh from a Shared Documents library, from the context menu

PowerPivot Gallery, used for showcasing reports and workbooks, offers another way to get to the data refresh feature. SharePoint users who have Contributor permission can click the Calendar icon shown for each workbook in a PowerPivot Gallery. Note that if the user does not have sufficient privileges on the workbook, the Calendar icon is not available on the page. Figure 2 shows the icon that opens the data refresh schedule definition page.

Figure 2:Accessing PowerPivot Data Refresh from a PowerPivot Gallery, from the Calendar icon

Creating a Schedule

Regardless of how you get to the schedule definition page, the details of the page remain the same. Figure 3 shows the initial view of the page. The first step to setting up a data refresh schedule is to click Enable. This makes the page active so that you can fill in the values you want to use.

Figure 3:Enabling Data Refresh in the Manage Data Refresh page

The Manage Data Refresh page is organized into six sections.

Section / General description
Data Refresh / Enable or disable a data refresh schedule.
Schedule Details / Define the frequency and timing details of a data refresh.
Earliest Start Time / Specify the earliest start time for a data refresh.
E-mail Notifications / Specify e-mail address of the users to be notified in the event of data refresh failures.
Credentials / Provide the credentials that will be used to refresh data on your behalf.
Data Sources / Select which data sources should be automatically refreshed. You also use this section to create custom schedules that vary for each data source, or specify credentials for connecting to the data source.

Table 1:Manage Data Refresh page sections

Data Refresh

To enable or disable a data refresh schedule, select or clear the Enable check box on the Manage Data Refresh page. If this check box is selected, you can edit all parts of the data refresh schedule. If the check box is cleared, the page is read-only, and after you click OK, subsequent data refresh operations are prevented for that workbook.

Schedule Details

In the Schedule Details section, you can specify the frequency and timing details of the data refresh. There are four options to choose from:

  • Daily
  • Weekly
  • Monthly
  • Once

The Daily option enables you to schedule data refresh to occur every n day(s), every weekday, or on specific days of the week.

If you select the Also refresh as soon as possible check box, data is refreshed as soon as the server can process it. This refresh occurs in addition to the periodic data refresh schedule. This option is available for periodic schedules only (that is, daily, weekly, and monthly schedules). Select this check box if you want to verify that the data refresh will run properly. For example, you may not know whether data credentials are configured correctly. This option provides a way to test the data refresh before its scheduled execution time.

Figure 4:Daily schedule details options

The Weekly option is for scheduling data refresh on a weekly basis, such as every n week(s) or on specific days of the week.

Figure 5:Weekly schedule details options

The Monthly option schedules data refresh to run on a specific day of the month, or on the first, second, third or last specific day of the week every n month(s).

Figure 6:Monthly schedule details options

The Once option is for scheduling a one-time data refresh operation that runs as soon as the server can process the request. After the data refresh is complete, the system disables this schedule. Notice that the Also refresh as soon as possible check box is not available for this option.

Figure 7:Once schedule details option

Earliest Start Time

In the Earliest Start Time section, specify details about when you prefer data refresh to occur. You can enter a specific time before which data refresh should not start, or you can choose to refresh data after business hours. This page does not determine the time at which the data refresh actually starts; the schedule is queued and processed based on available resources. For example, if the server is busy with on-demand queries (which take precedence over data refresh jobs), the server waits to refresh your data until after those queries are processed. Another option for specifying earliest start time is to run data refresh after business hours. The definition of business hours for your organization is determined by the administrator of the PowerPivot service application (for more information, see Managing the Data Refresh Settings in a PowerPivot Service Application).

Figure 8:Earliest Start Time options

E-mail Notifications

In this section of the page, you can specify the e-mail addresses of anyone who should be notified of a data refresh failure. If you also want to receive notifications of successful data refresh operations,you can use the regular SharePoint alerting system for e-mail notification (the basis of the alert would be a new file added to the target document library).

Figure 9:E-mail Notifications option

Credentials

PowerPivot for SharePoint uses the SharePoint Secure Store Service to store any credentials used in data refresh. In the Credentials section of the schedule page, the schedule owner can specify the Windows credentials that will be used to refresh data on his or her behalf. Any data source that uses trusted or integrated security will be refreshed as this user. Note that in order for the data refresh to succeed, the credentials should have access to the data sources for this workbook. There are three options to choose from:

  • Use an account preconfigured by the administrator (this is the service application’s unattended data refresh account).
  • Use a specific Windows user name and password.
  • Use a predefined Secure Store Service target application ID that stores the Windows credentials you want to use.

Both the PowerPivot unattended data refresh account and the target application ID that stores arbitrary Windows credentials must be set up by a SharePoint administrator in Central Administration. Because these credentials are shared among all users, for instance, this option is typically used where additional credentials would be actually used for data access. A good example for when this option is advised is when all of the data sources use SQL Server authentication (that is, the actual user names and passwords are on each data source). In this case the unattended execution account can be a low permission service account. It is normally not a good idea to have the unattended execution account be someone’s primary user account: Because of the way data refresh uses Windows accounts, this means that anyone can impersonate that user if he or she accesses data using a trusted connection. For more information, see Anatomy of PowerPivot Data Refresh.

Figure 10:Credentials options – using an account configured by the administrator

A schedule owner can also choose to type the Windows user credentials to be used on the data refresh. These credentials are securely stored in SharePoint’s Secure Store Service.

Figure 11:Credentials options – specifying Windows user credentials

The third option lets a schedule owner specify credentials previously saved in a Secure Store Service Target Application ID. In order to use this option, you must enter the Target Application ID used to look up the credentials in the Secure Store Service. The Target Application ID specified must be a group entry, and both the interactive user and the PowerPivot System service account must have read access.

Figure 12:Credentials options – specifying a Target Application ID

Note: Setting up and maintaining Secure Store Service is outside the scope of this white paper. For more information about Secure Store Service, see the TechNet articles referenced in the appendix.

Data Sources

A workbook can have many data sources with different characteristics. You can choose to create a data refresh schedule using different options for each data source.

Figure 13:Data sources settings

The schedule definition page provides options for choosing the data sources to be refreshed and when to refresh them. It also provides fields for specifying database credentials or other non-Windows credentials used on the database connection. At least one data source must be selected in order to save the schedule. The data source’s credentials are not used for impersonation, but are instead included on the connection string as UserName and Password. These credentials override those used on the connection string for the original data import.

For each data source, different settings are available. You can specify a custom schedule data source or use the general schedule specified for this workbook.

Note: The only elements that can be changed on the connection string are the UserName and Password elements. To edit any of the other elements, for example, to change the source server name, you must download the workbook to your desktop, edit it using the PowerPivot Excel add-in, and then republish to SharePoint.

Figure 14:Data Sources details

Infrastructure

PowerPivot data refresh infrastructure is designed to process data refresh jobs as long as there are resources available to do the job. This section provides an overview of the PowerPivot Data Refresh infrastructure and shows the data refresh execution flow.

PowerPivot Data Refresh Timer Job

The PowerPivot data refresh timer job can be accessed through Central Administration. From Central Administration, click Monitoring, and then click Review job definitions. Look for PowerPivot data refresh timer job, as shown in Figure 15.

Figure 15:The PowerPivot data refresh timer job

By default, the PowerPivot data refresh timer job is set to run every minute. This is how frequently the system will look for data refresh schedules to be processed. You can change the periodicity of the timer job by clicking PowerPivot Data Refresh Timer Job, as shown in Figure 15. Changing this setting causes the data refresh schedules to be processed less frequently, because the default setting is the most frequent setting available. In general, this setting shouldn’t be changed.

Figure 16:The PowerPivot data refresh timer job details

This timer job is responsible for calling the PowerPivot System Service (PSS) whose task it is to check the queue of scheduled data refresh jobs. When the timer job runs, it verifies which data refresh schedules are past due according to the schedule’s specifications. That is, the system checks to see whether the current time is equal to or greater than the date and the earliest start time specified in the data refresh schedule. If there are runnable data refreshes, and if there are available resources to process it, a server with enough memory and CPU will kick off the data refresh process. A step-by-step walkthrough shows how data refresh works.

Anatomy of PowerPivot Data Refresh

  1. First, the system looks for enabled schedules that are runnable, meaning that their schedule time period has come due. Because many jobs might be scheduled to run at the same time, the system tries to run a job as soon as it has available resources. All of the PowerPivot SharePoint servers in a farm monitor and handle data refresh jobs the same way. Ultimately, one of them detects that your job is runnable and runs it.
  2. After impersonating the Windows user specified in the schedule, the system extracts the workbook from the content database using the SharePoint object model.For impersonation to work, the user must have supplied a valid Windows account in the schedule, and he or she must ensure that that account has Contribute (read/write) access rights to the workbook. The workbook is stored in a temporary folder (in the OLAP Backup folder) so that it can be used later (see step 9).
  3. The system sends a request to a local Microsoft SQL Server Analysis Services engine instance to extract the embedded database from the workbook and then loads the database into it. The database is loaded as read/write (so that it can be updated). This database is used for this data refresh job only – the system ensures that it is not used for querying while data refresh is underway (that is, no Analysis Servicesquery commands are allowed to execute).
  4. If one of the data sources specified in this schedule has custom data source credentials, the connection string for that data source ismodified to use those credentials. This is done using an XMLA command to the data source.
  5. The system impersonates the Windows user for a second time and sends processing commands to the database. This Windows account is the one that will be used for trusted or integrated security connections.In this step, Analysis Services establishes a connection to the external data sources and pulls updated data into the PowerPivotdatabase. The processing command is not sent to all tables or dimensions. The process commands are sent to just those objects that are dependent on the data sources listed in the schedule.
  6. The data source credentials (if any) are reset.
  7. The PowerPivotdatabase is saved back to the workbook.
  8. If it is not set already, the embedded connection’s property “Refresh data when opening the file” is set to true. This ensures that users immediately see the new data when the workbook is opened. It also means that snapshot generation will include the new data in aPowerPivot Gallery thumbnail image.
  9. Impersonating the Windows account for a third time, the workbook is saved back to the content database using the SharePoint binary object model. If the document library is a PowerPivot Gallery, the object model fires its ‘updated file’ event handler, which starts the snapshot generation process.
  10. The schedule’s status is updated with information about the job and saved in data refresh history.
  11. Finally, the refreshed read/write database is unloaded and cached as a read-only database so it is available for on-demand query requests for that data.

Parallelization

In order to scale out data refresh processing, there are two ways that PowerPivot for SharePoint provides parallelization. The first one is to have more than one application server that is data refresh enabled, and the second is to allow more the one data refresh job to run concurrently on the same application server.