Large Synoptic Survey Telescope (LSST)
Site Specific Infrastructure Estimation Explanation
Mike Freemon and Steve Pietrowicz
LDM-143
7/17/2011
The contents of this document are subject to configuration control and may not be changed, altered, or their provisions waived without prior approval of the LSST Change Control Board.
Change Record
Version / Date / Description / Owner name
1 / 5/13/2006 / Initial version (as Document-1684) / Mike Freemon
2 / 9/27/2006 / General updates (as Document-1684) / Mike Freemon
3 / 9/7/2007 / General updates (as Document-1684) / Mike Freemon
4 / 7/17/2011 / General updates (as Document-1684) / Mike Freemon
5 / 4/11/2012 / Modified rates for power, cooling, floorspace, shipping / Mike Freemon
Table of Contents
Change Record
1 Overview of Sizing Model and Inputs Into LDM-144
2 Data Flow Among the Sheets Within LDM-144
3 DM-BaseSite ICD (LSE-77)
3.1 DM Power Capacity
3.2 DM Rack Space
4 Policies
4.1 Ramp up
4.2 Replacement Policy
4.3 Storage Overheads
4.4 Spares (hardware failures)
4.5 Extra Capacity
4.6 Multiple Copies for Data Protection and Disaster Recovery
5 Key Formulas
5.1 Compute Nodes: Teraflops Required
5.2 Compute Nodes: Bandwidth to Memory
5.3 Database Nodes: Teraflops Required
5.4 Database Nodes: Bandwidth to Memory
5.5 Database Nodes: Disk Bandwidth Per Node (Local Drives)
5.6 Disk Drives: Capacity
5.7 Disk Drives and Controllers (Image Storage): Bandwidth to Disk
5.8 GPFS NSDs
5.9 Disk Drives (Database Nodes): Aggregate Number of Local Drives
5.10 Disk Drives (Database Nodes): Minimum 2 Local Drives
5.11 Tape Media: Capacity
5.12 Tape Drives
5.13 HPSS Movers
5.14 HPSS Core Servers
5.15 10GigE Switches
5.16 Power Cost
5.17 Cooling Cost
5.18 Cooling Connection Fee
6 Selection of Disk Drive Types
6.1 Image Storage
6.2 Database Storage
7 Rates and Discounts
7.1 Power and Cooling Rates
7.2 Floorspace Leasing Rates
7.3 Shipping Rates
7.4 Academic and Non-Profit Discounts
8 DM Control System (DMCS) Servers
9 Additional Descriptions
9.1 Description of Barebones Nodes
10 Computing
10.1 Gigaflops per Core (Peak)
10.2 Cores per CPU Chip
10.3 Bandwidth to Memory per Node
10.4 System Bus Bandwidth per Node
10.5 Disk Bandwidth per Node
10.6 Cost per CPU
10.7 Power per CPU
10.8 Compute Nodes per Rack
10.9 Database Nodes per Rack
10.10 Power per Barebones Node
10.11 Cost per Barebones Node
11 Memory
11.1 DIMMs per Node
11.2 Capacity per DIMM
11.3 Bandwidth per DIMM
11.4 Cost per DIMM
11.5 Power per DIMM
12 Disk Storage
12.1 Capacity per Drive (Consumer SATA)
12.2 Sequential Bandwidth Per Drive (Consumer SATA)
12.3 IOPS Per Drive (Consumer SATA)
12.4 Cost Per Drive (Consumer SATA)
12.5 Power Per Drive (Consumer SATA)
12.6 Capacity Per Drive (Enterprise SATA)
12.7 Sequential Bandwidth Per Drive (Enterprise SATA)
12.8 IOPS Per Drive (Enterprise SATA)
12.9 Cost Per Drive (Enterprise SATA)
12.10 Power Per Drive (Enterprise SATA)
12.11 Disk Drive per Rack
13 Disk Controllers
13.1 Bandwidth per Controller
13.2 Drives Required per Controller
13.3 Cost per Controller
14 GPFS
14.1 Capacity Supported per NSD
14.2 Hardware Cost per NSD
14.3 Software Cost per NSD
14.4 Software Cost per GPFS Client
15 Tape Storage
15.1 Capacity Per Tape
15.2 Cost per Tape
15.3 Cost of Tape Library and HPSS
15.4 Bandwidth Per Tape Drive
15.5 Cost Per Tape Drive
15.6 Tape Drives per HPSS Mover
15.7 Hardware Cost per HPSS Mover
15.8 Hardware Cost per HPSS Core Server
16 Networking
16.1 Bandwidth per Infiniband Port
16.2 Ports per Infiniband Edge Switch
16.3 Cost per Infiniband Edge Switch
16.4 Cost per Infiniband Core Switch
16.5 Bandwidth per 10GigE Switch
16.6 Cost per 10GigE Switch
16.7 Cost per UPS
The LSST Site Specific Infrastructure Estimation Explanation
This document provides explanations and the basis for estimates for the technology predictions used in LDM-144 “Site Specific Infrastructure Estimation Model.”
The supporting materials referenced in this document are stored in Collection-974.
1 Overview of Sizing Model and Inputs Into LDM-144
Figure 1. The structure and relationships among the components of the DM Sizing Model
2 Data Flow Among the Sheets Within LDM-144
3 DM-BaseSite ICD (LSE-77)
LSE-77 defines and quantifies the DM infrastructure requirements for the BaseSite Facility in La Serena, Chile. This section provides additional details and justification for those requirements.
3.1 DM Power Capacity
The ICD specifies 440 kW.
Net Base CTR+DAC equipment power = 204 kW
Net Base AP (or commissioning cluster) equipment power reservation = 60 kW
Net replacement hardware power (10%) = 27 kW
Total net power for computing equipment = 291 kW
Adjusting for power utilization efficiency (1.5X) gives a total gross power, including power for cooling, of 437 kW.
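As a cross-check of the arithmetic above, a minimal Python sketch (all values are taken from this section; rounding the replacement power up to a whole kW is an assumption):

import math

base_ctr_dac_kw = 204        # net Base CTR+DAC equipment power
base_ap_kw = 60              # net Base AP / commissioning cluster reservation

net_equipment_kw = base_ctr_dac_kw + base_ap_kw         # 264 kW
replacement_kw = math.ceil(0.10 * net_equipment_kw)     # 10% replacement hardware -> 27 kW
total_net_kw = net_equipment_kw + replacement_kw        # 291 kW

pue = 1.5                    # adjustment for power utilization efficiency
total_gross_kw = total_net_kw * pue                     # 436.5 kW, i.e. ~437 kW gross
print(total_net_kw, total_gross_kw)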
3.2 DM Rack Space
The ICD specifies 64 racks.
Storage racks are 1.5 compute rack equivalents, and tape racks are 1.6 compute rack equivalents.
Base CTR = 2 compute racks + 4.5 compute rack equivalents for storage + 12.8 compute rack equivalents for tape
Base DAC = 13 compute racks + 1.5 compute rack equivalents for storage
Base AP = 6 compute rack equivalents
Replacement hardware = 4 compute racks + 3 compute rack equivalents for storage + 12.8 rack equivalents for tape
Total compute rack equivalents = 60
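The rack arithmetic above can be summarized in a short sketch (all figures are from this section and are already expressed in compute rack equivalents):

# Compute rack equivalents (storage racks count as 1.5, tape racks as 1.6).
base_ctr = 2 + 4.5 + 12.8      # compute + storage + tape = 19.3
base_dac = 13 + 1.5            # compute + storage        = 14.5
base_ap = 6.0                  # reservation              =  6.0
replacement = 4 + 3 + 12.8     # compute + storage + tape = 19.8

total_equivalents = base_ctr + base_dac + base_ap + replacement   # 59.6, rounded to 60
print(total_equivalents)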
4 Policies
4.1 Ramp up
The ramp up policy during the Commissioning phase of Construction is described in LDM-129. Briefly, in 2018, we acquire and install the computing infrastructure needed to support Commissioning, for which we use the same sizing as that for the first year of Operations.
4.2 Replacement Policy
Compute Nodes: 5 Years
GPFS NSD Nodes: 5 Years
Disk Drives: 3 Years
Tape Media: 5 Years
Tape Drives: 3 Years
Tape Library System: Once at Year 5
4.3 Storage Overheads
RAID6 (8+2): 20%
Filesystem: 10%
4.4 Spares (hardware failures)
This is margin for hardware failures: it accounts for the fact that, at any given point in time, some number of nodes and drives will be out of service due to hardware failures.
Compute Nodes: 3% of nodes
Disk Drives: 3% of drives
Tape Media: 3% of tapes
4.5 Extra Capacity
Disk: 10% of TB
Tape: 10% of TB
4.6 Multiple Copies for Data Protection and Disaster Recovery
Single tape copy at BaseSite
Dual tape copies at ArchSite (one goes offsite for disaster recovery)
See LDM-129 for further details.
5 Key Formulas
This section describes the key formulas used in LDM-144.
Some of these formulas are interrelated. For example, the calculations that establish the minimum number of nodes or drives typically evaluate several formulas, each based on a different potentially constraining resource, and then take the maximum of the set to establish the minimum needed.
5.1 Compute Nodes: Teraflops Required
(number of compute nodes) >= (sustained TF required) / (sustained TF per node)
5.2 Compute Nodes: Bandwidth to Memory
(number of compute nodes) >=
(total memory bandwidth required) / (memory bandwidth per node)
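The compute node count is the larger of the two constraints in 5.1 and 5.2, following the maximum-of-constraints pattern described at the start of this section. A minimal sketch (the function and variable names, and rounding up to a whole node, are illustrative):

import math

def compute_nodes_required(sustained_tf_required, sustained_tf_per_node,
                           mem_bw_required, mem_bw_per_node):
    by_teraflops = sustained_tf_required / sustained_tf_per_node   # Section 5.1
    by_memory_bw = mem_bw_required / mem_bw_per_node               # Section 5.2
    return math.ceil(max(by_teraflops, by_memory_bw))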
5.3 Database Nodes: Teraflops Required
(number of database nodes) >= (sustained TF required) / (sustained TF per node)
5.4 Database Nodes: Bandwidth to Memory
(number of database nodes) >=
(total memory bandwidth required) / (memory bandwidth per node)
5.5 Database Nodes: Disk Bandwidth Per Node (Local Drives)
(number of database nodes) >=
(total disk bandwidth required) / (disk bandwidth per node)
where the disk bandwidth per node is a scaled function of PCIe bandwidth
5.6 Disk Drives: Capacity
(number of disk drives) >= (total capacity required) / (capacity per disk drive)
5.7 Disk Drives and Controllers (Image Storage): Bandwidth to Disk
(number of disk controllers) = (total aggregate bandwidth required) /
(bandwidth per controller)
(number of disks) = MAX of A and B
where
A = (total aggregate bandwidth required) / (sequential bandwidth per drive)
B = (number of controllers) * (drives required per controller)
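A sketch of the two-step calculation in 5.7 (the names and the rounding up are illustrative, not taken from LDM-144):

import math

def image_storage_sizing(total_bw, bw_per_controller,
                         seq_bw_per_drive, drives_per_controller):
    controllers = math.ceil(total_bw / bw_per_controller)
    drives_by_bandwidth = math.ceil(total_bw / seq_bw_per_drive)   # constraint A
    drives_by_controllers = controllers * drives_per_controller    # constraint B
    return controllers, max(drives_by_bandwidth, drives_by_controllers)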
5.8 GPFS NSDs
(number of NSDs) = MAX of A and B
where
A = (total storage capacity required) / (capacity supported per NSD)
B = (total bandwidth) / (bandwidth per NSD)
5.9 Disk Drives (Database Nodes): Aggregate Number of Local Drives
(number of disk drives) >= A + B
where
A = (total disk bandwidth required) / (sequential disk bandwidth per drive)
B = (total IOPS required) / (IOPS per drive)
5.10 Disk Drives (Database Nodes): Minimum 2 Local Drives
There will be a minimum of two local drives per database node.
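Sections 5.9 and 5.10 combine as follows; a minimal sketch (the names and rounding are illustrative):

import math

def database_local_drives(disk_bw_required, seq_bw_per_drive,
                          iops_required, iops_per_drive, num_db_nodes):
    by_workload = math.ceil(disk_bw_required / seq_bw_per_drive    # term A (5.9)
                            + iops_required / iops_per_drive)      # term B (5.9)
    return max(by_workload, 2 * num_db_nodes)                      # floor of two per node (5.10)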
5.11 Tape Media: Capacity
(number of tapes) >= (total capacity required) / (capacity per tape)
5.12 Tape Drives
(number of tape drives) = (total tape bandwidth required) /
(bandwidth per tape drive)
5.13 HPSS Movers
(number of movers) = MAX of A and B
where
A = (number of tape drives) / (tape drives per mover)
B = (total bandwidth required) / (bandwidth per mover)
5.14 HPSS Core Servers
(number of core servers) = 2
This is flat over time.
5.15 10GigE Switches
(number of switches) = MAX of A and B
where
A = (total number of ports required) / (ports per switch)
B = (total bandwidth required) / (bandwidth per switch)
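Sections 5.8, 5.13, and 5.15 all share the same structure: size by whichever of two constraints is binding. A generic sketch (the helper name, rounding, and example values are illustrative):

import math

def units_required(total_a, per_unit_a, total_b, per_unit_b):
    # Larger of the two constraints, rounded up to a whole unit.
    return math.ceil(max(total_a / per_unit_a, total_b / per_unit_b))

# Example: GPFS NSD count (5.8) from total capacity and total bandwidth,
# using hypothetical values.
nsds = units_required(total_a=5000, per_unit_a=240,   # TB required, TB per NSD
                      total_b=40, per_unit_b=2.5)     # GB/s required, GB/s per NSD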
5.16 Power Cost
(cost for the year) = (kW on-the-floor) * (rate per kWh) * 24 * 365
5.17 Cooling Cost
(cost for the year) = (mmbtu) * (rate per mmbtu) * 24 * 365
where
mmbtu = btu / 1000000
btu = watts * 3.412 (heat load in BTU per hour; 1 watt = 3.412 BTU/hr)
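A sketch of the annual power and cooling cost formulas in 5.16 and 5.17 (the conversion constants are those given above; the function names are illustrative):

HOURS_PER_YEAR = 24 * 365
BTU_PER_WATT_HOUR = 3.412          # 1 watt of load produces 3.412 BTU of heat per hour

def annual_power_cost(kw_on_floor, rate_per_kwh):
    return kw_on_floor * rate_per_kwh * HOURS_PER_YEAR             # Section 5.16

def annual_cooling_cost(watts_on_floor, rate_per_mmbtu):
    mmbtu_per_hour = watts_on_floor * BTU_PER_WATT_HOUR / 1000000  # Section 5.17
    return mmbtu_per_hour * rate_per_mmbtu * HOURS_PER_YEAR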
5.18 Cooling Connection Fee
(one-time cost) = ((high water MW) * 0.3412 / 12) * (rate per ton)
where
high water MW = (high water watts) / 1000000
high water watts = high water mark for watts over all the years of Operations
This is a one-time fee paid during Commissioning, and only applies at the Archive Site.
6 Selection of Disk Drive Types
At any particular point in time, disk drives are available in a range of capacities and prices. Optimizing for cost per TB leads to a different price point than optimizing for cost per drive. In LDM-144, the “InputTechPredictionsDiskDrives” sheet implements this selection logic using the technology prediction for disk drives, which is based upon when leading-edge drives become available. We assume that the price of a drive of a particular type and capacity drops 15% each year, and that drives of a particular capacity are only available for 5 years. The appropriate results are then used for the drive types described in this section.
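A minimal sketch of the pricing assumption described above (the 15% annual price decline and the 5-year availability window are from this section; the function name and arguments are hypothetical):

def predicted_drive_price(intro_price, intro_year, year, availability_years=5):
    # Price of a drive of a given capacity in a given year, assuming a 15%
    # price drop per year and availability for only 5 years after introduction.
    age = year - intro_year
    if age < 0 or age >= availability_years:
        return None                     # drive not yet, or no longer, available
    return intro_price * (1 - 0.15) ** age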
6.1 Image Storage
Disk drives for image storage sit behind disk controllers in a RAID configuration. Manufacturers warn against using commodity SATA drives in such environments, citing considerations such as failure rates caused by heavy duty cycles and time-limited error recovery (TLER) settings. Experience using such devices in RAID configurations supports those warnings. We therefore select Enterprise SATA drives for image storage and optimize for the cheapest cost per unit of capacity.
SAS drives are not used because sequential bandwidth is the primary motivation for the drive selection, and SATA provides a more economical solution.
6.2 Database Storage
The disk drives for the database nodes are local, i.e. they are physically contained inside the database worker nodes and are directly attached. Unlike most database servers, where IOPS is the primary consideration, sequential bandwidth is the driving constraint in our qserv-based database servers. Since these are local drives, and since they run in a shared-nothing environment where the normal operating procedure is to take a failing node out of service without end-user impact, we do not require RAID or other fault-tolerant solutions at the physical infrastructure layer. We therefore optimize for the cheapest cost per drive, and so select consumer SATA drives for the database nodes.
SAS drives are not used because sequential bandwidth is the primary motivation for the drive selection, and SATA provides a more economical solution.
7 Rates and Discounts
7.1 Power and Cooling Rates
7.1.1 Archive Site
The power rate for the University of Illinois for 2013 is $0.0746 per kWh.
The cooling rate for the University of Illinois for 2013 is $16.71 per mmbtu.
See Document-15107:
https://docushare.lsstcorp.org/docushare/dsweb/Get/Document-15107/FY13UtilityRates.pdf
which is also available at:
http://www.energymanagement.illinois.edu/pdfs/FY13UtilityRates.pdf
7.1.2 Base Site
The 2013 power rate for La Serena is $0.154 per kWh (USD).
The 2013 cooling rate for La Serena is $34.42 per mmbtu (USD).
Power Rate
See Document-14992.
Additional description:
On 10/2/2013 7:22 AM, Jeff Barr wrote:
> ... *right now *the current electric rate at the
> current exchange rate (October 2, 2013) is:
> 71.79 CLP/kWh / 503.09 CLP/USD = 0.143 USD/kWH
>
> As previously noted there are transmission losses that are distributed
> to all the users, both on the La Serena Recinto and on Cerro Pachón, so
> for the final cost of effective kWH metered at the facility ~8% should
> be added to that rate:
> 0.143 x 1.08 = 0.154 USD/kWH
Cooling Rate
The cooling technology and power utilization efficiency (PUE) are not yet known for the La Serena facility. As an approximation, the cooling rates are assumed to be proportional to the power rates. In particular, the power rates at La Serena are 2.06 times the power rates at Champaign. Until the specific attributes of the La Serena facility are known, we assume the cooling rates at La Serena follow the same ratio, i.e. that the cooling rates at La Serena are 2.06 times the cooling rates in Champaign, IL. That represents a PUE of ~1.7.
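The scaling described above works out as follows (a sketch; the rates are those quoted in Sections 7.1.1 and 7.1.2):

champaign_power_rate = 0.0746     # USD per kWh, 2013
champaign_cooling_rate = 16.71    # USD per mmbtu, 2013
la_serena_power_rate = 0.154      # USD per kWh, 2013

ratio = round(la_serena_power_rate / champaign_power_rate, 2)   # 2.06
la_serena_cooling_rate = champaign_cooling_rate * ratio         # 34.42 USD per mmbtu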