
Large Synoptic Survey Telescope (LSST)

Site Specific Infrastructure Estimation Explanation

Mike Freemon and Steve Pietrowicz

LDM-143

7/17/2011

The contents of this document are subject to configuration control and may not be changed, altered, or their provisions waived without prior approval of the LSST Change Control Board.


Change Record

Version / Date / Description / Owner name
1 / 5/13/2006 / Initial version (as Document-1684) / Mike Freemon
2 / 9/27/2006 / General updates (as Document-1684) / Mike Freemon
3 / 9/7/2007 / General updates (as Document-1684) / Mike Freemon
4 / 7/17/2011 / General updates (as Document-1684) / Mike Freemon
5 / 4/11/2012 / Modified rates for power, cooling, floorspace, shipping / Mike Freemon

Table of Contents

Change Record

1 Overview of Sizing Model and Inputs Into LDM-144

2 Data Flow Among the Sheets Within LDM-144

3 DM-BaseSite ICD (LSE-77)

3.1 DM Power Capacity

3.2 DM Rack Space

4 Policies

4.1 Ramp up

4.2 Replacement Policy

4.3 Storage Overheads

4.4 Spares (hardware failures)

4.5 Extra Capacity

4.6 Multiple Copies for Data Protection and Disaster Recovery

5 Key Formulas

5.1 Compute Nodes: Teraflops Required

5.2 Compute Nodes: Bandwidth to Memory

5.3 Database Nodes: Teraflops Required

5.4 Database Nodes: Bandwidth to Memory

5.5 Database Nodes: Disk Bandwidth Per Node (Local Drives)

5.6 Disk Drives: Capacity

5.7 Disk Drives and Controllers (Image Storage): Bandwidth to Disk

5.8 GPFS NSDs

5.9 Disk Drives (Database Nodes): Aggregate Number of Local Drives

5.10 Disk Drives (Database Nodes): Minimum 2 Local Drives

5.11 Tape Media: Capacity

5.12 Tape Drives

5.13 HPSS Movers

5.14 HPSS Core Servers

5.15 10GigE Switches

5.16 Power Cost

5.17 Cooling Cost

5.18 Cooling Connection Fee

6 Selection of Disk Drive Types

6.1 Image Storage

6.2 Database Storage

7 Rates and Discounts

7.1 Power and Cooling Rates

7.2 Floorspace Leasing Rates

7.3 Shipping Rates

7.4 Academic and Non-Profit Discounts

8 DM Control System (DMCS) Servers

9 Additional Descriptions

9.1 Description of Barebones Nodes

10 Computing

10.1 Gigaflops per Core (Peak)

10.2 Cores per CPU Chip

10.3 Bandwidth to Memory per Node

10.4 System Bus Bandwidth per Node

10.5 Disk Bandwidth per Node

10.6 Cost per CPU

10.7 Power per CPU

10.8 Compute Nodes per Rack

10.9 Database Nodes per Rack

10.10 Power per Barebones Node

10.11 Cost per Barebones Node

11 Memory

11.1 DIMMs per Node

11.2 Capacity per DIMM

11.3 Bandwidth per DIMM

11.4 Cost per DIMM

11.5 Power per DIMM

12 Disk Storage

12.1 Capacity per Drive (Consumer SATA)

12.2 Sequential Bandwidth Per Drive (Consumer SATA)

12.3 IOPS Per Drive (Consumer SATA)

12.4 Cost Per Drive (Consumer SATA)

12.5 Power Per Drive (Consumer SATA)

12.6 Capacity Per Drive (Enterprise SATA)

12.7 Sequential Bandwidth Per Drive (Enterprise SATA)

12.8 IOPS Per Drive (Enterprise SATA)

12.9 Cost Per Drive (Enterprise SATA)

12.10 Power Per Drive (Enterprise SATA)

12.11 Disk Drive per Rack

13 Disk Controllers

13.1 Bandwidth per Controller

13.2 Drives Required per Controller

13.3 Cost per Controller

14 GPFS

14.1 Capacity Supported per NSD

14.2 Hardware Cost per NSD

14.3 Software Cost per NSD

14.4 Software Cost per GPFS Client

15 Tape Storage

15.1 Capacity Per Tape

15.2 Cost per Tape

15.3 Cost of Tape Library and HPSS

15.4 Bandwidth Per Tape Drive

15.5 Cost Per Tape Drive

15.6 Tape Drives per HPSS Mover

15.7 Hardware Cost per HPSS Mover

15.8 Hardware Cost per HPSS Core Server

16 Networking

16.1 Bandwidth per Infiniband Port

16.2 Ports per Infiniband Edge Switch

16.3 Cost per Infiniband Edge Switch

16.4 Cost per Infiniband Core Switch

16.5 Bandwidth per 10GigE Switch

16.6 Cost per 10GigE Switch

16.7 Cost per UPS


The LSST Site Specific Infrastructure Estimation Explanation

This document provides explanations and the basis for estimates for the technology predictions used in LDM-144 “Site Specific Infrastructure Estimation Model.”

The supporting materials referenced in this document are stored in Collection-974.

1  Overview of Sizing Model and Inputs Into LDM-144

Figure 1. The structure and relationships among the components of the DM Sizing Model

2  Data Flow Among the Sheets Within LDM-144

3  DM-BaseSite ICD (LSE-77)

LSE-77 defines and quantifies the DM infrastructure requirements for the BaseSite Facility in La Serena, Chile. This section provides additional details and justification for those requirements.

3.1  DM Power Capacity

The ICD specifies 440 kW.

Net Base CTR+DAC equipment power = 204 kW

Net Base AP (or commissioning cluster) equipment power reservation = 60 kW

Net replacement hardware power (10%) = 27 kW

Total net power for computing equipment = 291 kW

Adjustment for power utilization efficiency (1.5X) gives a total gross power, including power for cooling, of 437 kW.
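
As a cross-check, the power budget above can be reproduced with a short calculation. The sketch below simply restates the figures quoted in this section; the variable names and the round-up convention are illustrative, not taken from LDM-144.

import math

# Cross-check of the Base Site power budget above (values copied from this
# section; variable names and rounding convention are illustrative).
ctr_dac_kw = 204                # net Base CTR+DAC equipment power
ap_kw = 60                      # net Base AP (commissioning cluster) reservation
replacement_kw = math.ceil(0.10 * (ctr_dac_kw + ap_kw))   # 10% replacement margin -> 27 kW

net_kw = ctr_dac_kw + ap_kw + replacement_kw   # 291 kW net computing power
gross_kw = math.ceil(net_kw * 1.5)             # 1.5X utilization-efficiency adjustment

print(net_kw, gross_kw)         # 291 437 -- within the 440 kW specified by the ICD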

3.2  DM Rack Space

The ICD specifies 64 racks.

Storage racks are 1.5 compute rack equivalents, and tape racks are 1.6 compute rack equivalents.

Base CTR = 2 compute racks + 4.5 compute rack equivalents for storage + 12.8 compute rack equivalents for tape

Base DAC = 13 compute racks + 1.5 compute rack equivalents for storage

Base AP = 6 compute rack equivalents

Replacement hardware = 4 compute racks + 3 compute rack equivalents for storage + 12.8 rack equivalents for tape

Total compute rack equivalents = 60
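
The rack totals above can be tallied the same way; the short sketch below reproduces the figures quoted in this section, with rounding up to a whole rack assumed for the final total.

import math

# Tally of the compute rack equivalents listed above (values copied from this
# section; rounding up to a whole rack is an assumption).
base_ctr = 2 + 4.5 + 12.8        # compute + storage + tape
base_dac = 13 + 1.5              # compute + storage
base_ap = 6
replacement = 4 + 3 + 12.8       # compute + storage + tape

total = base_ctr + base_dac + base_ap + replacement
print(total, math.ceil(total))   # 59.6 -> 60 compute rack equivalents (the ICD specifies 64 racks)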

4  Policies

4.1  Ramp up

The ramp up policy during the Commissioning phase of Construction is described in LDM-129. Briefly, in 2018, we acquire and install the computing infrastructure needed to support Commissioning, for which we use the same sizing as that for the first year of Operations.

4.2  Replacement Policy

Compute Nodes / 5 Years

GPFS NSD Nodes / 5 Years

Disk Drives / 3 Years

Tape Media / 5 Years

Tape Drives / 3 Years

Tape Library System / Once at Year 5

4.3  Storage Overheads

RAID6 8+2 / 20%

Filesystem / 10%
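
How these two overheads combine when converting usable capacity into raw capacity is not spelled out here; the sketch below assumes they compound multiplicatively, which is only one possible convention and may differ from the accounting in LDM-144.

# Hypothetical application of the storage overheads above, assuming the RAID6
# 8+2 and filesystem overheads compound multiplicatively.
RAID6_OVERHEAD = 0.20        # 2 parity drives out of 10 in an 8+2 group
FILESYSTEM_OVERHEAD = 0.10

def raw_tb_required(usable_tb):
    """Raw disk capacity needed to deliver the requested usable capacity."""
    return usable_tb / ((1 - RAID6_OVERHEAD) * (1 - FILESYSTEM_OVERHEAD))

print(round(raw_tb_required(1000)))   # ~1389 TB of raw disk per 1000 TB usable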

4.4  Spares (hardware failures)

This is the margin for hardware failures. It accounts for the fact that, at any given point in time, some number of nodes and drives will be out of service due to hardware failures.

Compute Nodes / 3% of nodes

Disk Drives / 3% of drives

Tape Media / 3% of tapes

4.5  Extra Capacity

Disk / 10% of TB

Tape / 10% of TB

4.6  Multiple Copies for Data Protection and Disaster Recovery

Single tape copy at BaseSite

Dual tape copies at ArchSite (one goes offsite for disaster recovery)

See LDM-129 for further details.

5  Key Formulas

This section describes the key formulas used in LDM-144.

Some of these formulas are interrelated. For example, the minimum required number of nodes or drives is typically established by evaluating several formulas, each based on a different potentially constraining resource, and then taking the maximum of the set.

5.1  Compute Nodes: Teraflops Required

(number of compute nodes) >= (sustained TF required) / (sustained TF per node)

5.2  Compute Nodes: Bandwidth to Memory

(number of compute nodes) >=

(total memory bandwidth required) / (memory bandwidth per node)

5.3  Database Nodes: Teraflops Required

(number of database nodes) >= (sustained TF required) / (sustained TF per node)

5.4  Database Nodes: Bandwidth to Memory

(number of database nodes) >=

(total memory bandwidth required) / (memory bandwidth per node)

5.5  Database Nodes: Disk Bandwidth Per Node (Local Drives)

(number of database nodes) >=

(total disk bandwidth required) / (disk bandwidth per node)

where the disk bandwidth per node is a scaled function of PCIe bandwidth
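
As noted at the start of this section, the required node count is the maximum over the individual constraints. The sketch below illustrates that pattern for the database nodes (formulas 5.3 through 5.5); the input values in the example are placeholders, not figures from LDM-144.

import math

# Database node count as the maximum over the constraining resources
# (teraflops, memory bandwidth, disk bandwidth). Inputs are placeholders.
def database_nodes_required(sustained_tf_required, sustained_tf_per_node,
                            memory_bw_required, memory_bw_per_node,
                            disk_bw_required, disk_bw_per_node):
    by_flops = sustained_tf_required / sustained_tf_per_node
    by_memory_bw = memory_bw_required / memory_bw_per_node
    by_disk_bw = disk_bw_required / disk_bw_per_node
    return math.ceil(max(by_flops, by_memory_bw, by_disk_bw))

print(database_nodes_required(50, 0.4, 12000, 40, 3000, 2.0))   # -> 1500 (disk bandwidth bound)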

5.6  Disk Drives: Capacity

(number of disk drives) >= (total capacity required) / (capacity per disk drive)

5.7  Disk Drives and Controllers (Image Storage): Bandwidth to Disk

(number of disk controllers) = (total aggregate bandwidth required) /

(bandwidth per controller)

(number of disks) = MAX of A and B

where

A = (total aggregate bandwidth required) / (sequential bandwidth per drive)

B = (number of controllers) * (drives required per controller)
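
The sketch below applies the two formulas above to placeholder values: the controller count is driven by bandwidth, and the drive count is the larger of the bandwidth-driven count (A) and the controller-population count (B).

import math

# Image-storage controllers and drives per the formulas above; inputs are
# placeholders, not figures from LDM-144.
def controllers_and_drives(total_bw_gbs, bw_per_controller_gbs,
                           seq_bw_per_drive_gbs, drives_per_controller):
    controllers = math.ceil(total_bw_gbs / bw_per_controller_gbs)
    a = math.ceil(total_bw_gbs / seq_bw_per_drive_gbs)   # drives needed for bandwidth
    b = controllers * drives_per_controller              # drives needed to populate the controllers
    return controllers, max(a, b)

print(controllers_and_drives(40, 2.0, 0.1, 30))   # -> (20, 600)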

5.8  GPFS NSDs

(number of NSDs) = MAX of A and B

where

A = (total storage capacity required) / (capacity supported per NSD)

B = (total bandwidth) / (bandwidth per NSD)

5.9  Disk Drives (Database Nodes): Aggregate Number of Local Drives

(number of disk drives) >= A + B

where

A = (total disk bandwidth required) / (sequential disk bandwidth per drive)

B = (total IOPS required) / (IOPS per drive)

5.10  Disk Drives (Database Nodes): Minimum 2 Local Drives

There will be a minimum of two local drives per database node.
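
Taken together, formulas 5.9 and 5.10 bound the total number of local database drives. The sketch below combines them by taking the larger of the workload-driven count and the two-drives-per-node floor; that way of combining the two constraints, and the input values, are assumptions for illustration only.

import math

# Total local database drives: the larger of the aggregate workload-driven
# count (5.9) and the per-node minimum (5.10). Inputs are placeholders.
def database_local_drives(total_disk_bw_gbs, seq_bw_per_drive_gbs,
                          total_iops, iops_per_drive, database_nodes):
    by_workload = math.ceil(total_disk_bw_gbs / seq_bw_per_drive_gbs
                            + total_iops / iops_per_drive)
    by_minimum = 2 * database_nodes     # at least two local drives per node
    return max(by_workload, by_minimum)

print(database_local_drives(300, 0.1, 50000, 100, 1500))   # -> 3500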

5.11  Tape Media: Capacity

(number of tapes) >= (total capacity required) / (capacity per tape)

5.12  Tape Drives

(number of tape drives) = (total tape bandwidth required) /

(bandwidth per tape drive)

5.13  HPSS Movers

(number of movers) = MAX of A and B

where

A = (number of tape drives) / (tape drives per mover)

B = (total bandwidth required) / (bandwidth per mover)

5.14  HPSS Core Servers

(number of core servers) = 2

This is flat over time.

5.15  10GigE Switches

(number of switches) = MAX of A and B

where

A = (total number of ports required) / (ports per switch)

B = (total bandwidth required) / (bandwidth per switch)

5.16  Power Cost

(cost for the year) = (kW on-the-floor) * (rate per kWh) * 24 * 365

5.17  Cooling Cost

(cost for the year) = (mmbtu per hour) * (rate per mmbtu) * 24 * 365

where

mmbtu per hour = (btu per hour) / 1000000

btu per hour = watts * 3.412
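
The sketch below applies the power and cooling cost formulas above to a placeholder 300 kW load, using the FY2013 Archive Site rates from Section 7.1.1; it is illustrative only.

# Annual power and cooling cost per the formulas above. The 300 kW load is a
# placeholder; the rates are the FY2013 Archive Site rates from Section 7.1.1.
HOURS_PER_YEAR = 24 * 365

def annual_power_cost(kw_on_floor, rate_per_kwh):
    return kw_on_floor * rate_per_kwh * HOURS_PER_YEAR

def annual_cooling_cost(kw_on_floor, rate_per_mmbtu):
    mmbtu_per_hour = (kw_on_floor * 1000) * 3.412 / 1000000   # heat load in MMBtu per hour
    return mmbtu_per_hour * rate_per_mmbtu * HOURS_PER_YEAR

load_kw = 300.0
print(round(annual_power_cost(load_kw, 0.0746)))    # ~ $196,049 per year
print(round(annual_cooling_cost(load_kw, 16.71)))   # ~ $149,834 per year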

5.18  Cooling Connection Fee

(one-time cost) = ((high water MW) * 0.3412 / 12) * (rate per ton)

where

high water MW = (high water watts) / 1000000

high water watts = high water mark for watts over all the years of Operations

This is a one-time fee paid during Commissioning, and only applies at the Archive Site.

6  Selection of Disk Drive Types

At any particular point in time, disk drives are available in a range of capacities and prices. Optimizing for cost per TB requires selecting a different price point than optimizing for cost per drive. In LDM-144, the “InputTechPredictionsDiskDrives” sheet implements that logic using the technology prediction for disk drives based upon when leading edge drives become available. We assume a 15% drop in price each year for a particular type of drive at a particular capacity, and that drives at a particular capacity are only available for 5 years. The appropriate results are then used for the drives described in this section.
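
A minimal sketch of that selection logic is shown below. It assumes a 15% year-over-year price decline and a five-year availability window, as stated above; the drive capacities, introduction years, and introduction prices are placeholders, not values from the sheet.

# Minimal sketch of the drive-selection logic described above. Capacities,
# introduction years, and introduction prices are placeholders.
ANNUAL_PRICE_DECLINE = 0.15
YEARS_AVAILABLE = 5

drives = [
    # (capacity_tb, year_introduced, price_at_introduction_usd)
    (1, 2010, 70),
    (3, 2012, 100),
    (4, 2014, 180),
]

def available_drives(year):
    """Drives assumed to be on the market in a given year, with predicted prices."""
    for capacity_tb, year0, price0 in drives:
        age = year - year0
        if 0 <= age < YEARS_AVAILABLE:
            yield capacity_tb, price0 * (1 - ANNUAL_PRICE_DECLINE) ** age

def cheapest_per_tb(year):
    return min(available_drives(year), key=lambda d: d[1] / d[0])

def cheapest_per_drive(year):
    return min(available_drives(year), key=lambda d: d[1])

print(cheapest_per_tb(2014))      # (3, ~72.25) -- the $/TB optimization used for image storage
print(cheapest_per_drive(2014))   # (1, ~36.54) -- the $/drive optimization used for database nodes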

6.1  Image Storage

Disk drives for image storage sit behind disk controllers in a RAID configuration. Manufacturers warn against using commodity SATA drives in such environments, citing considerations such as failure rates under heavy duty cycles and time-limited error recovery (TLER) settings. Experience using such devices in RAID configurations supports those warnings. Therefore, we select Enterprise SATA drives for image storage and optimize for the cheapest cost per unit of capacity.

SAS drives are not used as sequential bandwidth is the primary motivation for the drive selection, and SATA provides a more economical solution.

6.2  Database Storage

The disk drives for the database nodes are local, i.e., they are physically contained inside the database worker nodes and are directly attached. Unlike most database servers, where IOPS is the primary consideration, sequential bandwidth is the driving constraint in our qserv-based database servers. Since these are local drives, and since they run in a shared-nothing environment where the normal operating procedure is to take a failing node out of service without end-user impact, we do not require RAID or other fault-tolerant solutions at the physical infrastructure layer. Therefore, we optimize for the cheapest cost per drive, and so select consumer SATA drives for the database nodes.

SAS drives are not used as sequential bandwidth is the primary motivation for the drive selection, and SATA provides a more economical solution.

7  Rates and Discounts

7.1  Power and Cooling Rates

7.1.1  Archive Site

The power rate for the University of Illinois for 2013 is $0.0746 per kWh.

The cooling rate for the University of Illinois for 2013 is $16.71 per mmbtu.

See Document-15107:

https://docushare.lsstcorp.org/docushare/dsweb/Get/Document-15107/FY13UtilityRates.pdf

which is also available at:

http://www.energymanagement.illinois.edu/pdfs/FY13UtilityRates.pdf

7.1.2  Base Site

The 2013 power rate for La Serena is $0.154 per kWh (USD).

The 2013 cooling rate for La Serena is $34.42 per mmbtu (USD).

Power Rate

See Document-14992.

Additional description:

On 10/2/2013 7:22 AM, Jeff Barr wrote:

> ... *right now* the current electric rate at the current exchange rate
> (October 2, 2013) is:
> 71.79 CLP/kWh / 503.09 CLP/USD = 0.143 USD/kWH
>
> As previously noted there are transmission losses that are distributed
> to all the users, both on the La Serena Recinto and on Cerro Pachón, so
> for the final cost of effective kWH metered at the facility ~8% should
> be added to that rate:
> 0.143 x 1.08 = 0.154 USD/kWH

Cooling Rate

The cooling technology and power utilization efficiency (PUE) are not yet known for the La Serena facility. As an approximation, the cooling rates are assumed to be proportional to the power rates. In particular, the power rate at La Serena is 2.06 times the power rate in Champaign, IL. Until the specific attributes of the La Serena facility are known, we assume the cooling rates at La Serena follow the same ratio, i.e., that the cooling rates at La Serena are 2.06 times the cooling rates in Champaign. That represents a PUE of ~1.7.
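
The rate derivations above can be reproduced directly; the sketch below restates the arithmetic, with the ratio rounded to 2.06 as in the text.

# Reproduction of the 2013 Base Site rate derivations above.
clp_per_kwh = 71.79              # quoted electric rate
clp_per_usd = 503.09             # exchange rate on October 2, 2013
transmission_loss = 0.08         # ~8% transmission losses added to the metered rate

power_rate = round(clp_per_kwh / clp_per_usd * (1 + transmission_loss), 3)
print(power_rate)                # 0.154 USD/kWh

champaign_power_rate = 0.0746    # USD/kWh (Section 7.1.1)
champaign_cooling_rate = 16.71   # USD/MMBtu (Section 7.1.1)

ratio = round(power_rate / champaign_power_rate, 2)   # -> 2.06
print(round(champaign_cooling_rate * ratio, 2))       # 34.42 USD/MMBtu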