‘Jun2011’ VCM Testing of MUSE Control Software

ESO Garching, 2011-06-28

J.Knudstrup

Summary

A small VCM test was carried out in June 2011. This is the first test since the SW team was restructured in January 2011. The last VCM test of the MUSE control SW was carried out at the beginning of October 2010.

Since then, the SW has undergone a major refactoring. During this process, the comments resulting from the code review have also been taken into account.

Apart from project team restructuring and the SW refactoring, some other changes are:

  1. The MUSE consortium has agreed to adopt Subversion (SVN) as its configuration control system, replacing CMM, which has been used so far for the VLTSW.
  2. For the VCM tests executed at ESO, virtual machines (VMs) are now used instead of real computers. This may affect the test results, as executing the tests on the ‘virtual HW’ is considerably heavier than on the real HW. This needs to be confirmed by executing the same tests on the consortium’s IWS.

The consortium seems to have experienced some issues in connection with the migration to and usage of SVN. It would be good to summarize the issues encountered and to propose how to improve the situation. Note that the MUSE consortium is one of the first to adopt SVN. ESO does, however, intend to migrate all SW handled via CMM to SVN within the next months. It is positive that the MUSE project has already done the migration.

The test carried out at ESO by the contact person is not a real VCM test, as it used the “mumtoul” locality module and not the “mumgar” module. Due to the migration from CMM to SVN, little time was available to carry out the test, and the intention was to produce at least some preliminary results/conclusions on the last 5-6 months of work on the control SW.

In general, due to time constraints, this test can only be considered as a ‘mini VCM test’.

The main conclusions of the test session are:

-Building the code base checked out from SVN worked without problems.

-In general, some instability issues were observed whilst working with the SW. These may have been caused by running the SW in a VM.

-A serious issue encountered is that the FITS files produced are invalid. It seems that there are issues in NGC to be fixed.

-A number of deliverables still need to be implemented:

  • The complete set of templates (to be complemented gradually over the next months).
  • The OS Status Panel is not implemented.
  • The Science Data GUI is under design, and needs to be implemented.
  • The automatic “INSC”-based tests should be prepared.

Regarding the instability issues encountered: if these are caused, or partially caused, by running in a VM, it should probably be a general requirement for all VLTSW core SW (e.g. NGC/OPT) and for the instrument SW that it shall be possible to deploy it in a VM. Note that no ‘real’ IWSs are available in the VCM any more.

In general, the MUSE control SW team has done great work, but there still seems to be quite some work ahead before the control SW is complete.

It may be useful for the consortium to prepare a small document listing all the deliverables planned according to the SW design, indicating the status of each, together with a realistic deadline and the amount of resources needed to complete the deliverable.

Despite the issues encountered and reported below, I believe the consortium is on the right track!

It is desirable to carry out an additional VCM test soon, this time using the proper locality module for ESO Garching.

Note that the MUSE VCM tests so far have been carried out running the SW entirely in simulation within the IWS CCS environment, deployed together with the TCS simulator running in a separate environment. It should be discussed whether a more realistic test environment should be used for future tests, although this may not be feasible due to time constraints.

Note also that it is recommended to execute a VCM test every ~3 months. Due to the restructuring of the project team and the refactoring of the SW, this has not been possible. It should, however, be attempted to comply with this for the remaining part of the development phase.

The details of the test follow below. Issues encountered are highlighted in yellow. Serious issues encountered are highlighted in orange.

Observations

The following observations were made during the test. The consortium is kindly requested to fill in the “Answer/Feedback from Consortium” column for each item after investigating the issue. Subsequently, tickets may be filed where needed.

1 / Building from SVN / Checking out the code base from SVN and building everything from scratch.
Step / Observations / Answer/Feedback from Consortium / Comments Contact Person
svn co $SVNREPO/tags/RC/MUSE/TOUL/0.33 MUSE / No issues, takes a while before completing due to the big calibration files.
export TARGET=NO_HW
pkginBuild mumtoul -target NO_HW / Completed without issues. Some warnings were issued though:
Building muotsf/src: make clean all man install ... OK (warnings ignored).
2 / Starting SW/Bringing ONLINE / Starting SW from Scratch and bringing it to operational condition
Step / Observations / Answer/Feedback from Consortium / Comments Contact Person
chmod 775 /insroot/MUSE/SYSTEM/COMMON/SETUPFILES/ / Don't forget to continue to iterate on issue reported in VLTSW20110234. / OK. The following comment has been added “As this file is a temporary file, it is proposed to save file into $VLTDATA/tmp directory. Is it OK?”
muinsStart / Some error messages in the log monitor, can be addressed later. / OK.
Started TCS simulator / OK / -
Executed STANDBY from MUSE Control Panel / OK / -
Executed ONLINE from MUSE Control Panel / Received error message due to timeout in connection with NGC interaction:
bossINTERFACE.C:2841 bossERR_SUBSYSTEM_REPLY S Error reply received from sub-system DCS for command ONLINE
evhDB_CMD_SEND.C:866 evhERR_CMD_ERR_REPLY W Received an error reply to command ONLINE from process ngcocon_mungcopt on environment wmuse
ngcoconROUTE.C:488 ngcoconERR_TIMEOUT S Command ONLINE sent to ngcoits_ngc1 timed out.
ngcoconROUTE.C:488 ngcoconERR_TIMEOUT S Command ONLINE sent to ngcoits_ngc4 timed out.
ngcoconROUTE.C:488 ngcoconERR_TIMEOUT S Command ONLINE sent to ngcoits_ngc2 timed out.
ngcoconROUTE.C:488 ngcoconERR_TIMEOUT S Command ONLINE sent to ngcoits_ngc3 timed out.
This happened many times during the tests.
MUSE: Investigate if this problem can be reproduced on their IWS. If not, it may be caused by running the SW in a VM. / The NGC timeout was observed during tests in Garching in spring, when a VM was used. On the real HW in Grenoble, this problem never occurs.
Resubmitted STANDBY + ONLINE / System went to ONLINE/IDLE as expected
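
The timeouts above recurred many times during the session, so it may help to tally them per NGC process when comparing VM runs with runs on real HW. The following is a minimal sketch in Python, assuming the log has been saved as plain text and that timeout lines always have the layout quoted above. The helper name count_timeouts is hypothetical and not part of the MUSE or VLT SW.

```python
import re
from collections import Counter

# Matches the ngcocon timeout lines quoted in this report. That every
# timeout line in a real log has exactly this layout is an assumption.
TIMEOUT_RE = re.compile(
    r"ngcoconERR_TIMEOUT\s+\S+\s+Command\s+(?P<command>\S+)\s+sent to\s+"
    r"(?P<process>\S+)\s+timed out"
)


def count_timeouts(log_lines):
    """Return a Counter mapping (command, process) to the number of timeouts."""
    counts = Counter()
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("command"), match.group("process"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "ngcoconROUTE.C:488 ngcoconERR_TIMEOUT S Command ONLINE sent to ngcoits_ngc1 timed out.",
        "ngcoconROUTE.C:488 ngcoconERR_TIMEOUT S Command ONLINE sent to ngcoits_ngc4 timed out.",
        "bossINTERFACE.C:2841 bossERR_SUBSYSTEM_REPLY S Error reply received from sub-system DCS for command ONLINE",
    ]
    for (command, process), n in sorted(count_timeouts(sample).items()):
        print(f"{command} -> {process}: {n}")
```

Lines that do not match the pattern, such as the boss and evh error lines, are simply ignored.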
3 / Starting panels / Exercising starting the various panels with the muinsStart tool
Step / Observations / Answer/Feedback from Consortium / Comments Contact Person
muinsStart -panel OS_CONTROL / OK / -
muinsStart -panel OS_STATUS / Panel pops up stating “Under development”.
MUSE: What is the expected delivery date of this? / This panel is still under development. We are iterating with ‘science’ people in the consortium on mockups before the implementation, which should start very soon.
muinsStart -panel ICS / OK, but the following log messages are printed on the shell:
USER DEBUG> DB EVENT ATTACH FAIL for @wmuse<alias>CLS1.intPower
Reason : cannot attach event to "@wmuse<alias>CLS1.intPower"
Invalid DB attribute or filter : VECTORS and TABLES use 'w' filter only
USER DEBUG> DB EVENT ATTACH FAIL for @wmuse<alias>CLS2.intPower
Reason : cannot attach event to "@wmuse<alias>CLS2.intPower"
Invalid DB attribute or filter : VECTORS and TABLES use 'w' filter only
USER DEBUG> DB EVENT ATTACH FAIL for @wmuse<alias>CLS3.intPower
Reason : cannot attach event to "@wmuse<alias>CLS3.intPower"
Invalid DB attribute or filter : VECTORS and TABLES use 'w' filter only
USER DEBUG> DB EVENT ATTACH FAIL for @wmuse<alias>CLS4.intPower
Reason : cannot attach event to "@wmuse<alias>CLS4.intPower"
Invalid DB attribute or filter : VECTORS and TABLES use 'w' filter only
USER DEBUG> DB EVENT ATTACH FAIL for @wmuse<alias>CLS5.intPower
Reason : cannot attach event to "@wmuse<alias>CLS5.intPower"
Invalid DB attribute or filter : VECTORS and TABLES use 'w' filter only
USER DEBUG> DB EVENT ATTACH FAIL for @wmuse<alias>CLS6.intPower
Reason : cannot attach event to "@wmuse<alias>CLS6.intPower"
Invalid DB attribute or filter : VECTORS and TABLES use 'w' filter only / These errors will be investigated.
muinsStart -panel DCS_ngc1 / Failed, nothing happens. Logged on stdout:
-> INIT : Reading MUSE configuration.
MUSE> START: Fri Jun 24 14:31:43 UTC 2011
MUSE> ARGS : -panel DCS_ngc1.
ngcoui: Command not found.
MUSE> Panel MUSE DCS Stand-alone Panel ngc1: started.
MUSE> END : Start end.
MUSE> END : Fri Jun 24 14:31:44 UTC 2011 / This problem will be investigated.
muinsStart -panel RTD_ngc1 / OK / -
muinsStart -panel RTD / OK / -
muinsStart -panel TCCD / OK / -
muinsStart -panel TCCD_RTD / OK / -
muinsStart -panel ALARM / OK / -
muinsStart -panel LOG / OK / -
muinsStart -panel OS_ENGINEERING / OK / -
4 / MUSE OS Engineering Panel / Exercising OS Engineering Panel
Step / Observations / Answer/Feedback from Consortium / Comments Contact Person
Pressed Science "GUI ..." / MUSE> ERROR: Unknown panel Science. / This panel does not exist yet
Pressed TCCD "GUI ..." / MUSE> ERROR: Unknown panel SgsCCD. / Panels related to slow-guiding are still under development. Should be delivered very soon.
Pressed ICS "GUI ..." / OK
5 / TCCD SW / Exercising the TCCD SW
Step / Observations / Answer/Feedback from Consortium / Comments Contact Person
muinsStart -panel TCCD
muinsStart -panel TCCD_RTD / OK / -
Execute exposure / OK, image displayed in TCCD RTD / -
6 / Restart SW/Bring ONLINE / The purpose of the test is to verify the reliability of starting up the system.
(this test should probably be repeated 5-10 times in future VCM tests)
Step / Observations / Answer/Feedback from Consortium / Comments Contact Person
muinsStop / OK / -
muinsStart / OK / -
Submit ONLINE / OK. Note, sometimes it fails; see above. / -
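
The repetition suggested for this test could be scripted. The sketch below is only an illustration: it assumes that muinsStop and muinsStart return exit status 0 on success, which this report does not confirm, and it does not submit ONLINE, since that step was done from the control panel here. The function names are hypothetical.

```python
import subprocess


def run_command(argv):
    """Run one command and return True if it exited with status 0.

    Treating exit status 0 as success is an assumption about the tools.
    """
    return subprocess.run(argv, check=False).returncode == 0


def restart_cycles(cycles, run=run_command):
    """Run muinsStop followed by muinsStart `cycles` times.

    Returns a list of (cycle_number, stop_ok, start_ok) tuples. The `run`
    argument can be replaced by a stub for dry runs.
    """
    results = []
    for cycle in range(1, cycles + 1):
        stop_ok = run(["muinsStop"])
        start_ok = run(["muinsStart"])
        results.append((cycle, stop_ok, start_ok))
    return results


def failed_cycles(results):
    """Return the numbers of the cycles in which stop or start failed."""
    return [cycle for cycle, stop_ok, start_ok in results
            if not (stop_ok and start_ok)]


if __name__ == "__main__":
    outcome = restart_cycles(5)
    print("Failed cycles:", failed_cycles(outcome))
```

Passing a stub as `run` allows the bookkeeping to be checked without touching a running system.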
7 / Preset Template/SGS / Exercise the SGS via the Preset Template. Afterwards the SGS loop should be closed.
Step / Observations / Answer/Feedback from Consortium / Comments Contact Person
Load "MUSE_wfm_acq_Preset" in BOB. / OK / -
Execute "MUSE_wfm_acq_Preset" in BOB / Error message:
2011-06-27 10:06:05.429839 wmutcs prs prsControl PRESET: Command PRESET failed
2011-06-27 10:06:05.406702 wmutcs trkws trkwsControl trkwsCON_MAIN_TASK.C:2417 64 7206 1 W trkwsERR_LIMIT : Invalid Coordinates, result out of limit
2011-06-27 10:06:05.408150 wmutcs evh prsAction evhDB_CMD_SERIAL.C:668 191 7206 2 W evhERR_CMD_ERR_REPLY : Received an error reply to command OBJSTAR from process trkwsControl on environment wmutcs
Suggest setting the target to:
TEL.TARG.ALPHA "000000";
TEL.TARG.DELTA "-850000";
in the test/example OBs. / Target coordinates will be updated according to the suggested values.
Redefined targets in OB / - / -
Started auto guider, active optics / - / -
Execute "MUSE_wfm_acq_Preset" in BOB / Closing the SGS loop sometimes fails:
Error 1:
ngcoconROUTE.C:488 ngcoconERR_TIMEOUT S Command ONLINE sent to ngcoits_ngc1 timed out.
Error 2:
2011-06-27 14:06:18.014088 wmuse clipv muosgsControl clipvDATA_IO.C:122 2 4857 1 S clipvERR_PARAM : Parameter error (Final image event has not been received) / The error related to NGC has never been observed on a real machine; it is probably related to the VM (TBC).
The error related to the final image event occurs from time to time and is probably related to the LCU_SIMULATION mode of the TCCD. However, this should be confirmed. I think it never occurs when the CLIP image simulator is used.
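
Since the “Final image event has not been received” error appears only sporadically, one option is for the slow-guiding loop to tolerate isolated failures and to stop only after several consecutive ones. The sketch below illustrates that policy in Python. It assumes a threshold of 3 consecutive errors (the figure mentioned in the consortium feedback on the observation test) and a hypothetical per-iteration success flag; it is not the MUSE SGS implementation.

```python
class ConsecutiveErrorGuard:
    """Decide whether a loop may continue, based on consecutive failures."""

    def __init__(self, max_consecutive=3):
        if max_consecutive < 1:
            raise ValueError("max_consecutive must be at least 1")
        self.max_consecutive = max_consecutive
        self.consecutive = 0

    def record(self, ok):
        """Record one loop iteration.

        Returns True if the loop should keep running, False once the
        number of consecutive failures reaches the threshold.
        """
        if ok:
            # Any successful iteration resets the failure count.
            self.consecutive = 0
            return True
        self.consecutive += 1
        return self.consecutive < self.max_consecutive


if __name__ == "__main__":
    guard = ConsecutiveErrorGuard(max_consecutive=3)
    # Hypothetical sequence of iteration outcomes (True = image received).
    for outcome in [True, False, True, False, False, False]:
        if not guard.record(outcome):
            print("Stopping loop after repeated errors")
            break
```

With this policy, a single missed image would be logged and skipped, and only a persistent fault would open the loop.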
8 / Execute observation / Execute standard observation template; the SGS loop should be closed with a preset template
Step / Observations / Answer/Feedback from Consortium / Comments Contact Person
Execute "MUSE_wfm_acq_Preset" in BOB / OK, but may fail; see above. / -
Load “MUSE_obs_Standard” OB in BOB / OK / -
Execute “MUSE_obs_Standard” OB in BOB / Usually works, but fails occasionally. Once the execution got stuck for 47 minutes.
TBI if this is caused by running in a VM.
MUSE: Try to reproduce by exercising this OB/template.
Sometimes the SGS loop is opened (->“OFF”) while executing this OB. The following error messages are logged:
2011-06-28 09:50:04.024605 wmuse clipv muosgsControl clipvDATA_IO.C:122 2 2701 1 S clipvERR_PARAM : Parameter error (Final image event has not been received)
2011-06-28 09:50:04.024626 wmuse clipv muosgsControl clipvIMG_WRAPPER_FLOAT.C:228 15 2701 2 S clipvERR_DATA_IO : Data I/O Error (could not open input)
MUSE: Investigate if this can be reproduced. / Same comment as above. We need to check whether this error is related to LCU SIMULATION or not. However, the slow-guiding should be more robust against such errors; i.e. when this error is detected, the process could log the error and wait for the next one. For example, the loop could be stopped if 3 consecutive errors occur.
Verify data produced / The two output images produced are illegal/corrupt.
  1. The filename is wrong. A ticket has been filed -> NGC/OPT for this (VLTSW20110170).
  2. The contents of the file are corrupt, starting after the 6th extension. A ticket has been filed -> NGC/OPT for this (VLTSW20110169).
For point 2, “fitsverify” was executed on an output file:
$ fitsverify /vlt/MUSE/INSROOT/SYSTEM/DETDATA/OBS_OBS179_0001.1.fits
fitsverify 4.16 (CFITSIO V3.250)
------
File: /vlt/MUSE/INSROOT/SYSTEM/DETDATA/OBS_OBS179_0001.1.fits
7 Header-Data Units in this file.
======HDU 1: Primary Array ======
228 header keywords
Null data array; NAXIS = 0
======HDU 2: Image Exten. ======
102 header keywords
SER-NO=221 16-bit integer pixels, 2 axes (4224 x 4240),

======HDU 6: Image Exten. ======
102 header keywords
SER-NO=225 16-bit integer pixels, 2 axes (4224 x 4240),
======HDU 7: Image Exten. ======
102 header keywords
SER-NO=226 16-bit integer pixels, 2 axes (4224 x 4240),
< End-of-File >
*** Error: File has extra byte(s) after last HDU at byte 215089920.
++++++++++++++++++++++ Error Summary ++++++++++++++++++++++
HDU# Name (version) Type Warnings Errors
1 Primary Array 0 0
2 SER-NO=221 Image Array 0 0
3 SER-NO=222 Image Array 0 0
4 SER-NO=223 Image Array 0 0
5 SER-NO=224 Image Array 0 0
6 SER-NO=225 Image Array 0 0
7 SER-NO=226 Image Array 0 0
End-of-file 0 1
**** Verification found 0 warning(s) and 1 error(s). **** / SPRs related to these problems have been submitted. Waiting for a new NGC SW version that fixes these bugs.
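
As background to the fitsverify error above: the FITS standard organises a file as a sequence of 2880-byte blocks, with each header and each data unit padded to a whole number of blocks. The Python sketch below shows the padding arithmetic for one of the 4224 x 4240, 16-bit image extensions and a quick size check for a file. The function names are hypothetical. A file size that is a multiple of 2880 does not prove that the file is valid, so this complements fitsverify rather than replacing it.

```python
import os

FITS_BLOCK = 2880  # block size in bytes defined by the FITS standard


def padded_size(nbytes):
    """Round a byte count up to a whole number of FITS blocks."""
    blocks = -(-nbytes // FITS_BLOCK)  # ceiling division
    return blocks * FITS_BLOCK


def image_data_bytes(naxis1, naxis2, bitpix):
    """Unpadded size in bytes of a 2-axis image data unit."""
    return naxis1 * naxis2 * abs(bitpix) // 8


def is_block_aligned(path):
    """Return True if the file size is a whole number of FITS blocks."""
    return os.path.getsize(path) % FITS_BLOCK == 0


if __name__ == "__main__":
    raw = image_data_bytes(4224, 4240, 16)
    print("Image data bytes:", raw)
    print("Padded to blocks:", padded_size(raw))
    # Offset reported by fitsverify for the end of the last HDU.
    print("Offset is on a block boundary:", 215089920 % FITS_BLOCK == 0)
```

The offset reported by fitsverify, 215089920, is itself a multiple of 2880, which is consistent with the last HDU ending on a block boundary and the extra bytes following it.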
9 / Image Reconstruction / Small test of Fast Image Reconstruction.
(instructions received from G.Zins)
Step / Observations / Answer/Feedback from Consortium / Comments Contact Person
Enable fast reconstruction:
msgSend "" muoControl SETUP "-function DET.FRS.ST T"
MESSAGEBUFFER:
-1 / OK, but why is “-1” returned? / ‘-1’ is the expoId. This should be checked with Ezster, but I think that ‘-1’ means that the current expoId has been used.
Start RTD and set camera to SCIENCE_0 / OK / -
Run observation OB (MUSE_obs_Standard) / OK.
-Reconstructed images displayed continuously.
-Note: executing the observation template with FRS enabled is extremely heavy on a virtual machine; it took 2270s in this case. / -
10 / Generic Offset Template
Step / Observations / Answer/Feedback from Consortium / Comments Contact Person
Load MUSE_obs_GenericOffset in BOB / - / -
Run MUSE_obs_GenericOffset in BOB with FRS activated / Completed, but it took 3767s to complete the OB. / -
Run MUSE_obs_GenericOffset in BOB without FRS activated / OK. Completed in 318s. / -

MUSE Control SW June2011 VCM Testing – ESO Garching – 2011-06-28 – J.Knudstrup – Page 1
