
Large Synoptic Survey Telescope (LSST)

Automated Operation of the LSST Data Management System

Kian-Tat Lim

LDM-230

Latest Revision: October 10, 2013

This LSST document has been approved as a Content-Controlled Document by the LSST DM Technical Control Team. If this document is changed or superseded, the new document will retain the Handle designation shown above. The control is on the most recent digital document with this Handle in the LSST digital archive and not printed versions. Additional information may be found in the LSST DM TCT minutes.

The contents of this document are subject to configuration control and may not be changed, altered, or their provisions waived without prior approval of the LSST Change Control Board.


Change Record

Version / Date / Description / Owner name
1 / 5/22/2013 / Initial version / Kian-Tat Lim
1.1 / 10/9/2013 / Updates resulting from the Process Control and Data Products Reviews / Kian-Tat Lim
1.2 / 10/10/2013 / TCT approved / R Allsman

Table of Contents

1 Introduction

2 Alert Production

2.1 Base DMCS and OCS Commandable Entities

2.1.1 init command

2.1.2 configure command

2.1.3 enable command

2.1.4 disable command

2.1.5 release command

2.1.6 stop command

2.1.7 abort command

2.1.8 reset command

2.1.9 startIntegration event

2.1.10 nextVisit event

2.2 EFD replication

2.3 Alert Production Hardware

2.3.1 Replicator

2.3.2 Distributor

2.3.3 Worker

2.4 Catch-Up Archiver

2.5 Calibration image and engineering image modes

2.6 Daytime DM operations mode

2.7 Failure Modes

2.8 Maintenance and Upgrades

3 Calibration Products Production

4 Data Release Production

4.1 Overall Sequence

4.2 Detailed Sequence

4.3 Parallelization

4.4 Input and Output

4.5 Failure Modes

4.6 Maintenance and Upgrades

5 Data Access Center

5.1 Databases

5.2 Image Storage

5.3 Level 3 Storage and Compute

5.4 Failure Modes

5.5 Maintenance and Upgrades

6 Appendix: Abbreviations


Automated Operations of the
LSST Data Management System

1 Introduction

This document details the automated operations concept for the LSST Data Management System (DMS). It describes how the various components of the application[1], middleware[2], and infrastructure[3] layers of the DMS work together to enable generation of, storage of, and access to the Level 1, Level 2, and Level 3 data products[4]. It specifies how processing and data will flow from one component to another.

There are four major parts within the DMS: the Alert Production and its associated Archivers, the Calibration Products Production, the Data Release Production (DRP), and the Data Access Center (DAC), which also provides facilities for Level 3 science processing.

These four parts are implemented across four major centers located at two sites: the Base Center and the Chilean DAC at the AURA compound in La Serena, Chile, and the Archive Center and the US DAC at NCSA in Urbana-Champaign, Illinois, USA. The DM system also communicates with the Camera and the Observatory Control System located at the Summit Facility on Cerro Pachón, Chile.

Note that many of these operations rely on the LSST Observatory Network to transfer data and/or control information. The operations specific to the network itself are not in the scope of this document; they are covered in the LSST Network Operations and Management Plan (Document-11918). This document presumes that the network is operating normally except where specifically called out.

2 Alert Production

The Alert Production's primary responsibilities are:

  1. To archive all images from the Camera, including science images, wavefront sensor images, calibration frames, and engineering images, to tape archives at both the Base and Archive Centers,
  2. To process these images to generate Level 1 data products, especially alerts indicating that something has changed on the sky and orbits for Solar System objects, and
  3. To provide image quality feedback to the Observatory Control System (OCS).

The science images include both crosstalk-corrected images that are used for immediate Level 1 processing and raw, uncorrected images that are permanently stored.

The Alert Production can be described from a “top-down” perspective, starting with the “commandable entities”, which are software devices that the OCS can send commands to and receive status messages, events, and telemetry from. It can also be described from a “bottom-up” perspective starting with the physical machines used. Here, we start with the top-down view, going into more detail on the machines and their operations afterwards.

For context, here are the basic functions of some of the Data Management (DM) infrastructure components (see Figure 1):

1. “Replicator” computers at the Base that receive images from the Camera and associated telemetry, transfer them to local storage, and send them over the wide-area network (WAN) to the distributor machines at the Archive.

2. A network outage buffer at the Base that retains a copy of each image in non-volatile storage for a limited time in case of WAN failure.

3. Tape archives at the Base and Archive that retain permanent copies of each image and other data products.

4. Shared disk storage for inputs and Level 1 data products at the Chilean and US DACs.

5. “Distributor” computers at the Archive that receive images and telemetry from the replicator machines and transfer them to local storage and the worker machines.

6. “Worker” computers at the Archive that perform the Alert Production computations.

7. Base and Archive DM Control Systems (DMCSs) running on one or more computers at each location that control and monitor all processing.

8. A DM Event Services Broker running on one or more computers at the Archive that mediates all DM Event Services messaging traffic.

9. A Calibration database at the US DAC that keeps information necessary to calibrate images.

10. Engineering and Facilities Database (EFD) replicas at the Chilean and US DACs that store all observatory commands and telemetry.

11. The Level 1 database at the Chilean and US DACs that stores the Level 1 catalog data products.

12. The Level 2 database at the US DAC that stores measurements of astronomical Objects.

13. An Alert Production control database at the Base that maintains records of all data transfer and processing and is used by the Base DMCS.

2.1 Base DMCS and OCS Commandable Entities

The Alert Production hardware is divided into four commandable entities from the perspective of the OCS:

1. Archiver: responsible for archiving images in real time.

2. Catch-Up Archiver: responsible for archiving images that did not get captured in real time due to an outage of some part of the DM system.

3. EFD Replicator: responsible for replicating the EFD from the Summit to the Chilean DAC and the US DAC.

4. Alert Production Cluster: responsible for generating Level 1 data products.

Each commandable entity can be commanded by the OCS to configure, enable, or disable itself, along with obeying other generic OCS commands such as init, release, stop, and abort. Each commandable entity publishes events and telemetry to the OCS for use by the observatory operations staff. The command/action/response protocol used by the OCS is common to all subsystems and is a standard real-time system control mechanism used, for example, by the ATST[5]. The configure/enable/disable message pattern is also a common one; it is used, for example, in the LHCb control system[6].

All these commandable entities are implemented in the Base DMCS. They all run on a single machine, which is the only one that communicates directly with the OCS. If it fails, as detected by heartbeat monitoring, it is powered down and a spare machine is enabled at the same IP address, possibly missing one or more visits.
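As an illustration of this failover behavior, a heartbeat monitor could be sketched as follows. This is a minimal Python sketch, not the DM implementation; the timeout value and the power-control interface are assumptions.

    # Minimal heartbeat-failover sketch (illustrative only; the timeout and
    # the power-control interface are assumptions, not part of the DM baseline).
    import time

    HEARTBEAT_TIMEOUT = 30.0  # assumed seconds of silence before failover

    class DmcsFailoverMonitor:
        def __init__(self, primary_host, spare_host, power_controller):
            self.primary_host = primary_host   # active Base DMCS machine
            self.spare_host = spare_host       # standby machine, same IP on takeover
            self.power = power_controller      # hypothetical remote power interface
            self.last_heartbeat = time.time()

        def on_heartbeat(self):
            # Called whenever a heartbeat message arrives from the primary.
            self.last_heartbeat = time.time()

        def check(self):
            # Invoked periodically; fail over if the primary has gone silent.
            if time.time() - self.last_heartbeat > HEARTBEAT_TIMEOUT:
                self.power.power_off(self.primary_host)  # fence the failed machine
                self.power.power_on(self.spare_host)     # spare takes over the same
                # IP address; one or more visits may be missed during the switch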

The Base DMCS communicates with the OCS via the Data Distribution Service (DDS), through which it receives commands according to a well-defined asynchronous command protocol[7] and sends command result messages, status updates, events, and telemetry. It should be noted that the commandable entities do their processing while in the IDLE state from the perspective of the command protocol.

The Base DMCS will be booted before the start of each night's observing to ensure that the system is in a clean configuration. When it cold boots, the Base DMCS performs a self-test sequence to verify that it can communicate with the DM Event Services Broker (for DM-internal communications) and the OCS (via DDS). After the self-test sequence, the commandable entities start up with no configuration defined and publish the OFFLINE state to the OCS.

The Base DMCS uses the Orchestration Manager (currently baselined to be implemented using HTCondor[8]) to start jobs on the replicators, distributors, and workers. The Orchestration Manager may run on the Base DMCS host or another machine.

The typical sequence of OCS commands after a cold boot will be init, configure, and enable for each commandable entity.
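The state handling and the post-boot command sequence described above can be illustrated with the following schematic Python sketch; the class, method, and configuration names are assumptions, not the actual Base DMCS interfaces.

    # Schematic sketch of a commandable entity's state handling (illustrative
    # only; the real Base DMCS implementation and its interfaces will differ).
    OFFLINE, IDLE, ERROR = "OFFLINE", "IDLE", "ERROR"

    class CommandableEntity:
        def __init__(self, name):
            self.name = name
            self.state = OFFLINE   # published to the OCS after a cold boot
            self.config = None     # no configuration defined at startup
            self.enabled = False

        def init(self):
            # Move from OFFLINE to the normal commandable IDLE state, provided
            # DM engineering has not locked out OCS global control.
            self.state = IDLE

        def configure(self, config):
            # Validate the configuration and its prerequisites, then install it;
            # the entity is always left disabled at the end of configure.
            self.disable()
            self.config = config

        def enable(self):
            # Rejected if no configuration has been selected.
            if self.config is None:
                raise RuntimeError("enable rejected: no configuration selected")
            self.enabled = True    # e.g. subscribe to startIntegration or nextVisit

        def disable(self):
            self.enabled = False   # e.g. unsubscribe; jobs already running complete

    # Typical sequence after a cold boot (configuration payload is hypothetical):
    archiver = CommandableEntity("Archiver")
    archiver.init()
    archiver.configure({"mode": "science"})
    archiver.enable()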

2.1.1 init command

This instructs the OCS-visible commandable entity controlled by the Base DMCS to move from the OFFLINE state to the normal commandable IDLE state. Successful completion requires that the Base DMCS ensure that OCS global control is not locked out by DM engineering (e.g., software installation or diagnostic tests).

2.1.2 configure command

This tells one of the OCS-visible commandable entities controlled by the Base DMCS to establish or change its configuration. The configuration includes the set of computers to be used, the software to be executed on them, and parameters used to control that software. There will be several standard configurations used during operations (although each configuration will change with time); each such configuration can be thought of as a mode of the corresponding DM commandable entities. Some modes may apply to multiple commandable entities at the same time. Changing modes (by reconfiguring the commandable entities) is expected to take from seconds to possibly a few minutes; it is intended that mode changes may occur at any time and multiple times during a night.

Besides the normal science observing mode, available configurations will include raw calibration image and engineering image modes for the Archiver and Alert Production Cluster, in which there are no visits and different data products are generated. Another mode for the Alert Production Cluster will be daytime DM operations (disconnected from the Camera), in which the cluster performs Solar System object orbit fitting and various daily maintenance and update tasks while the Archiver is disabled or offline.

First, the Base DMCS verifies the command format and accepts the command. Then it checks that the configuration is legal and consistent and that various prerequisites are met. When the check is complete, the commandable entity is disabled (see the disable command in section 2.1.4), the configuration is installed, and success is returned to the OCS. If the configuration is illegal or cannot be installed properly, a command error (non-fatal) with failure reason is sent instead.

All of the commandable entities' configurations include the version of the software to be used. This version must have already been installed on the participating machines. The presence of the necessary software versions is checked by the Base DMCS in the Alert Production database (as maintained by system management tools).

The Archiver's configuration prerequisite is that sufficient replicator/distributor pairs are available.

The Catch-Up Archiver's configuration prerequisite is that sufficient catch-up-dedicated replicator/distributor pairs are available.

The Alert Production Cluster's prerequisite is that sufficient workers are available.

The EFD Replicator's prerequisite is that communication with the US DAC EFD replica is possible.

At the end of a configure command, the commandable entity is always disabled.
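The configure-time checks described above might be sketched as follows; the database query helpers, configuration keys, and thresholds are placeholders for illustration, not values from the DM design.

    # Illustrative sketch of configure-time prerequisite checks (the query
    # helpers, configuration keys, and thresholds are hypothetical).
    def check_configure_prerequisites(entity, config, ap_db):
        """Return None on success, or a failure reason for a command error."""
        # The requested software version must already be installed on the
        # participating machines, as recorded in the Alert Production database.
        if not ap_db.version_installed(config["software_version"], config["hosts"]):
            return "requested software version not installed on all hosts"

        if entity == "Archiver":
            if ap_db.count_available("replicator_distributor_pairs") < config["min_pairs"]:
                return "insufficient replicator/distributor pairs"
        elif entity == "CatchUpArchiver":
            if ap_db.count_available("catchup_pairs") < config["min_pairs"]:
                return "insufficient catch-up replicator/distributor pairs"
        elif entity == "AlertProductionCluster":
            if ap_db.count_available("workers") < config["min_workers"]:
                return "insufficient workers"
        elif entity == "EFDReplicator":
            if not ap_db.can_reach("us_dac_efd_replica"):
                return "US DAC EFD replica unreachable"
        return None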

2.1.3 enable command

This command enables the commandable entity to run and process events and data. An enable command is rejected if no configuration has been selected by a prior configure command to the commandable entity.

Enabling the Archiver causes the Base DMCS to subscribe to the “startIntegration” event.

Enabling the Catch-Up Archiver allows it to scan for unarchived images to be handled and enables the Orchestration Manager to schedule image archive jobs.

Enabling the Alert Production Cluster causes the Base DMCS to subscribe to the “nextVisit” event in normal science mode; another event may be subscribed to in calibration or engineering mode.

Enabling the EFD Replicator causes the Base DMCS to enable the US DAC EFD replica to be a slave to the Chilean DAC EFD replica.

2.1.4 disable command

This command disables the commandable entity, preventing it from running and from processing new events and data.

Disabling the Archiver causes it to unsubscribe from the “startIntegration” event. It does not terminate any replicator jobs already executing.

Disabling the Catch-Up Archiver stops it from scanning for unarchived images and tells the Orchestration Manager to stop scheduling any new image archive jobs.

Disabling the Alert Production Cluster causes it to unsubscribe from the “nextVisit” event. It does not terminate any worker jobs already executing. In particular, the processing for the current visit (not just exposure) will normally complete.

Disabling the EFD Replicator causes the Base DMCS to disable the slave operation of the US DAC EFD replica.

2.1.5 release command

This is the equivalent of a disable command, but the commandable entity goes to the OFFLINE state.

2.1.6 stop command

If issued during a configure command, this command causes the commandable entity to go into the no configuration state.

If issued during any other command, this command is ignored.

2.1.7 abort command

If issued during a configure command, this command causes the commandable entity to go into the ERROR state with no configuration.

If issued at any other time, this command does nothing except change the commandable entity to the ERROR state. In particular, an abort received during enable will leave the system enabled and taking data, but in the ERROR state from the command-processing standpoint. Note that stopping the processing of any commandable entity is handled by the disable command, not the abort command.

2.1.8 reset command

This command performs the equivalent of the disable command and leaves the commandable entity in the IDLE state with no configuration.

In addition to the above commands, the Base DMCS subscribes to and responds to the following events published through the OCS DDS:

2.1.9 startIntegration event

Upon receipt of a startIntegration event, if the Archiver has been enabled, the Base DMCS launches replicator jobs. One job is launched for each science raft (21), and one more job is launched to handle wavefront sensor images. The middleware will preferentially allocate these jobs to the pool of fully-operational replicators, falling back to the pool of local-only replicators if more than two jobs are assigned per fully-operational replicator. (See section 2.3.1 below for a more complete description of the replicator pools.)

If a replicator machine fails, the Orchestration Manager will automatically reschedule its job on another replicator machine (or a Catch-Up Archiver replicator).

The Base DMCS will track the submission, execution, and results of all replicator jobs using Orchestration Manager facilities and the Alert Production control database.
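The fan-out of replicator jobs on a startIntegration event might look roughly like the sketch below; the submit() and record_jobs() calls are stand-ins for the Orchestration Manager and Alert Production control database interfaces, which are not specified here.

    # Illustrative fan-out of replicator jobs for one startIntegration event
    # (the orchestration and database interfaces are hypothetical stand-ins).
    NUM_SCIENCE_RAFTS = 21

    def on_start_integration(event, orchestration, ap_control_db):
        jobs = []
        # One replicator job per science raft...
        for raft in range(NUM_SCIENCE_RAFTS):
            jobs.append(orchestration.submit(job="replicate_raft",
                                             image=event.image_id,
                                             raft=raft,
                                             pool="fully_operational_replicators"))
        # ...plus one job to handle the wavefront sensor images.
        jobs.append(orchestration.submit(job="replicate_wavefront",
                                         image=event.image_id,
                                         pool="fully_operational_replicators"))
        # Record the submissions so execution and results can be tracked.
        ap_control_db.record_jobs(event.image_id, jobs)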

2.1.10 nextVisit event

Upon receipt of a nextVisit event, if the Alert Production Cluster has been enabled, the Base DMCS launches worker jobs. One job is launched for each CCD (189) and four more jobs are launched for the wavefront sensors. These jobs are sent to the Orchestration Manager for distribution to the worker machines.

If a worker machine fails, the Orchestration Manager will automatically reschedule its job(s) on another worker machine (at lower priority, so that it can be suspended or terminated if the machine is needed to handle a current visit).

The Base DMCS will track the submission, execution, and results of all worker jobs using Orchestration Manager facilities and the Alert Production control database.
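The analogous worker fan-out for a nextVisit event, including the lower priority used when jobs are rescheduled after a worker failure, might be sketched as follows (again with hypothetical stand-ins for the Orchestration Manager interface and priority values).

    # Illustrative worker job fan-out for one nextVisit event; the priority
    # values and the submit()/resubmit() calls are hypothetical stand-ins.
    NUM_SCIENCE_CCDS = 189
    NUM_WAVEFRONT_JOBS = 4
    NORMAL_PRIORITY, RESCHEDULE_PRIORITY = 10, 1

    def on_next_visit(event, orchestration, ap_control_db):
        jobs = [orchestration.submit(job="process_ccd", visit=event.visit_id,
                                     ccd=ccd, priority=NORMAL_PRIORITY)
                for ccd in range(NUM_SCIENCE_CCDS)]
        jobs += [orchestration.submit(job="process_wavefront", visit=event.visit_id,
                                      sensor=s, priority=NORMAL_PRIORITY)
                 for s in range(NUM_WAVEFRONT_JOBS)]
        ap_control_db.record_jobs(event.visit_id, jobs)

    def on_worker_failure(failed_jobs, orchestration):
        # Rescheduled jobs run at lower priority so they can be suspended or
        # terminated if their machine is needed for the current visit.
        for job in failed_jobs:
            orchestration.resubmit(job, priority=RESCHEDULE_PRIORITY)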

2.2 EFD replication

Not included in the Alert Production per se, but closely tied to it, is replication of the Engineering and Facilities Database (EFD) from the Summit to the Chilean DAC and from the Chilean DAC to the US DAC.

The replication is implemented by the standard replication mechanisms of the database management system selected to implement the EFD. The latency for the replication from the Summit to the Chilean DAC is anticipated to typically be on the order of milliseconds, although latencies of up to one visit time are acceptable. The latency for the replication from the Chilean DAC to the US DAC is to be as short as possible, constrained by the available bandwidth from Chile to the US, but no longer than 24 hours (except when a network outage occurs). The typical case for Chile-to-US replication is expected to be seconds or less.

The Alert Production computations will require telemetry stored in the EFD. The design does not rely on replication for this information, however. At the Base, the local Chilean DAC EFD replica is queried for some information, but the OCS telemetry stream is also monitored for changes more recent than those reflected in the results of the query. This essential data is then sent along with the image data to the Archive for processing. If the replication proves to have sufficiently low latency and to be sufficiently reliable, it will be easy to switch to an alternate mode in which the US DAC EFD replica is queried for the information of interest.
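The combination of an EFD query with monitoring of the telemetry stream could be sketched as follows; the query and telemetry-monitor interfaces are assumptions made for illustration.

    # Illustrative sketch: query the local Chilean DAC EFD replica, then
    # overlay any more recent values seen on the OCS telemetry stream.
    def gather_visit_telemetry(efd_replica, telemetry_monitor, keys, visit_start):
        # Baseline values from the EFD replica at the Base.
        values = {key: efd_replica.latest(key, before=visit_start) for key in keys}
        # Samples received on the telemetry stream may be newer than the replica.
        for key, sample in telemetry_monitor.recent_samples(keys):
            if sample.timestamp > values[key].timestamp:
                values[key] = sample
        # This essential data travels with the image data to the Archive.
        return values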

2.3 Alert Production Hardware

We now describe the detailed operations performed by each Alert Production infrastructure component. The sequence of operations for a typical visit is shown in Figure .

All DM hardware is monitored by DM system administration tools, which publish results via the Archive DM Control System. Each machine verifies its software installation on boot (e.g. via hash or checksum).
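For example, the boot-time check could compare checksums of the installed files against a manifest, along the lines of the following sketch (the manifest format and file layout are assumptions).

    # Illustrative boot-time check of the software installation against a
    # manifest of expected SHA-256 checksums (the manifest format is assumed:
    # one "<hex digest> <path>" entry per line).
    import hashlib

    def verify_installation(manifest_path):
        """Return the list of files whose checksums do not match the manifest."""
        mismatches = []
        with open(manifest_path) as manifest:
            for line in manifest:
                expected, path = line.split(maxsplit=1)
                path = path.strip()
                digest = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        digest.update(chunk)
                if digest.hexdigest() != expected:
                    mismatches.append(path)
        return mismatches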