Appendix I.

From: merrill@mira (Michael Merrill)
Date: Fri, 10 Dec 93 14:11:14 MST
To: buchholz
Subject: WILDFIRE recovery

Here is my current version of WILDFIRE recovery procedures. "itrofflwn -ms filename" where "n" is the laser printer number would print it out. Please check for accuracy and let me know whatever mods or additions are required.

------------------------------------------------------------------------------

 

WILDFIRE System Recovery Procedures

The following procedures are intended to assist the user of the KPNO Infrared Instrumentation System WILDFIRE in recognizing and diagnosing a variety of computer, IRAF, and WILDFIRE system failure modes. Specific recipes are given for restoring the system to full operation.

 

The situations covered include:

resurrecting IMTOOL/SAOIMAGE

resurrecting GTERM/XTERM

re-engaging an unresponsive GTERM/XTERM/IMTOOL/SAOIMAGE

stopping an unresponsive movie sequence

re-engaging an unresponsive mouse

resurrecting the WILDFIRE IMAGE SAVER PROCESS

unresponsive WILDFIRE CONTROL

WILDFIRE RESTART procedures

when the INSTRUMENT STATUS window has vanished

simple (WARM) RESTART

stalled system (WARM) RESTART

WARM RESTART after instrument computer re-boot

COLD RESTART after instrument power interruption

COLD RESTART after DSP Heurikon box power interruption

SUN computer re-boot

 

RESURRECTING IMTOOL/SAOIMAGE

If the "IRAF" imtool (saoimage) image display window has disappeared, first look for its icon to see if it has merely been closed. If you find the icon, click on it with the left mouse button to re-open it. If you can't find the icon, type "ps" in any available window and look for the imtool (saoimage) display task. If you can't find it, the imtool (saoimage) display task has died.

NOTE: Temporary loss of the display task does not affect the WILDFIRE control process itself in any way. You can continue to take data while you sort things out.

As noted below, you might have to kill multiple "display" tasks after order is restored. Within the WILDFIRE observe task, selecting "none" for the "channel to be displayed" query will turn off the automatic display.

If the "IRAF" imtool (saoimage) image display window has died, you can restart it either by selecting "imtool" from the WILDFIRE submenu within the Sun main menu or by going to the console window and typing:

imtool & <return>

or

saoimage & <return>

to restart the process and get a new IMTOOL (SAOIMAGE) display window. Type =imcur within IRAF in the GTERM window. The window will reopen and display the image cursor. Type <return> in the imtool window to get back to GTERM. Move the imtool window to a more convenient place if needed. (Move the arrow cursor to the edge of the window and "grab" it by holding down the middle mouse button; "drag" the window where you want it and release the middle mouse button.)

Verify that there are no pending "display" tasks by typing:

irafproc <return>

in the GTERM or WILDFIRE CONTROL windows. Look for the "display" process in the resulting output. Kill all pending "display" processes with the command "kill -9 PID", where PID is the process number (the leftmost entry on the line where the display task appears).
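When several display tasks have piled up, picking out their PIDs by eye is error-prone. The following is a minimal sketch, assuming (as stated above) that the PID is the leftmost field of each line of irafproc or ps output; the helper names and the sample output are hypothetical, and real irafproc output may be laid out differently. It only prints the kill commands so that you can check them before typing them.

```python
# Sketch: list "kill -9" commands for pending "display" processes.
# ASSUMPTION: the PID is the leftmost field of each irafproc/ps line.
# find_pids() and SAMPLE are hypothetical, for illustration only.


def _names_match(field: str, name: str) -> bool:
    """True if a field is `name` itself or a path ending in /name."""
    return field == name or field.endswith("/" + name)


def find_pids(ps_output: str, name: str = "display") -> list[int]:
    """Return the PIDs (leftmost field) of lines whose other fields name `name`."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip blank lines and header lines such as "PID TT STAT TIME COMMAND".
        if not fields or not fields[0].isdigit():
            continue
        if any(_names_match(f, name) for f in fields[1:]):
            pids.append(int(fields[0]))
    return pids


# Made-up output; real irafproc output may differ.
SAMPLE = """\
  PID TT STAT  TIME COMMAND
 1234 p0 S     0:01 display
 1240 p1 R     0:00 imexamine
 1251 p2 S     0:00 /usr/local/bin/display
"""

if __name__ == "__main__":
    for pid in find_pids(SAMPLE):
        print(f"kill -9 {pid}")
```

Passing a different name (for example "imexamine") lists the PIDs for that task instead, which fits the order of kills suggested for regaining control of the mouse.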

NOTE: The most likely reason for the imtool window to vanish is that two sessions were trying to display to it simultaneously, for instance the WILDFIRE automatic display option and an imexamine. There is a fairly narrow window at the start of a display when clobbering the imtool data stream will confuse the imtool to the point that it will choose to kill itself rather than enter some unpredictable state. After the imtool is successfully configured for a new image, the remainder of the data, no matter how confused, will just look like pixels. (Actually, due to buffering, there will tend to be interleaved bands from the two images.)

RESURRECTING GTERM/XTERM

If the IRAF gterm (xterm) window has died, you can restart it either by selecting "gterm" ("xterm") from the WILDFIRE submenu within the Sun main menu or by going to the console window and typing:

gterm & <return>

or

xterm & <return>

to get a new GTERM (XTERM) window. Then type "cl <return>" in the window to restart IRAF. Finally change back to the appropriate directory.

RE-ENGAGING AN UNRESPONSIVE GTERM/XTERM/IMTOOL/SAOIMAGE

If IRAF has hung or has otherwise gotten lost or confused inside its GTERM/IMTOOL (XTERM/SAOIMAGE) and graphics windows, you can use whatever it takes in the way of <CTRL-C>, logout, or kill commands on the appropriate IRAF processes to regain control.

Start by attempting a <CTRL-C> in each of the IMTOOL (SAOIMAGE), GTERM (XTERM), and graphics windows (select "show graph" from the GTERM menu as needed to re-display the graph). If you regain control, type:

flprc <return>

in the GTERM window and resume operations.

You can freely quit IRAF at any time by typing:

logout <return> in the IRAF GTERM (XTERM) window.

and restart IRAF by typing:

cl <return> in the IRAF GTERM (XTERM) window.

NOTE: The most likely reason for the imtool window to hang is the same as for it to vanish: two sessions were trying to display to it simultaneously, for instance the WILDFIRE automatic display option and an imexamine, and the imtool data stream was clobbered during the fairly narrow window at the start of a display while the imtool was being configured for the new image.

RE-ENGAGING AN UNRESPONSIVE MOUSE

If you have lost control of the mouse, the most likely candidate is IRAF. You can restore control as follows:

o+ Log in remotely to the WILDFIRE control system (berry at the 1.3m) as "ir50inch" from whatever computer system is readily available to you.

o+ Type irafproc <return> (or "ps -a <return>" if "irafproc" doesn't work) and examine the IRAF processes running.

o+ Kill whichever IRAF processes you need to regain control of the mouse, starting with the display process, then the imexam process, and continuing with the rest as needed. As soon as you regain control of the mouse, exit IRAF (type "logout <return>"), restart it if necessary (type "cl <return>") in its GTERM window, and resume.

STOPPING AN UNRESPONSIVE MOVIE SEQUENCE

If the movie sequence has hung (you have tried pressing <return> in the WILDFIRE control window several times, but new images just keep coming in):

o+ type exit <return> in the WILDFIRE control window.

Then complete the following SIMPLE RESTART sequence to get back to full operational control:

When you see the "berry" prompt in the INSTRUMENT CONTROL window, you can re-start the system as follows. In the INSTRUMENT CONTROL window:

o+ type go SQIID <return>

o+ answer y <return> to the question "Do you want windows?"

o+ type setup sqiid <return>. The system output will pause (of order one minute) after it has issued one or more "running" statements and then resume.

o+ answer y <return> to the question "Do you want to activate the array?" (if you failed to answer "y", you can activate the detectors by typing activate <return> when you see the "%" prompt). [Note: SQIID has no button to push.]

o+ type puse parameterfile to restore the saved parameterfile, including the "eask" choices (type puse sqiid if you don't remember what it was)

o+ use "ped" or "ask", paying special attention to the "header directory", "pixel directory", "filename template", "display", and "picture index" values to verify that the images will be going where you want them.

You should now be able to continue with your observing.

RESURRECTING THE WILDFIRE IMAGE SAVER PROCESS

If the WILDFIRE image saver process has died, go to the WILDFIRE housekeeping STATUS window and type:

saver v & <return>

to restart the process and adopt that window (the one you just typed in) for output. Don't panic if you typed the command in the WILDFIRE control window or another shelltool by mistake: the resulting choice of output window might be inconvenient, but it will not be destructive. (You can create a new shelltool from the main Sun menu if you need one.) You can restart the saver process whenever necessary. In particular, if saver has died during an observation (e.g., because it ran out of disk space) and, as a consequence, WILDFIRE control has hung waiting to complete the save, you can restart saver without resorting to a RESTART to regain control of WILDFIRE. Before restarting saver, either make space within the existing WILDFIRE pixeldir or reset the pixeldir by typing "newImdir" in the WILDFIRE window and entering an appropriate directory.
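Since a dead saver is often a full disk, it can help to check the free space in the pixel directory before restarting it. A minimal sketch, using Python's standard `shutil.disk_usage`; the 50 MB threshold and the function name are hypothetical, not WILDFIRE settings, and /data2/pixels is simply the pixel directory shown in the typical session transcript.

```python
import shutil
import sys

# HYPOTHETICAL threshold: the procedures do not specify a figure.
MIN_FREE_MB = 50


def pixeldir_has_room(pixeldir: str, min_free_mb: float = MIN_FREE_MB) -> bool:
    """True if the filesystem holding `pixeldir` has at least `min_free_mb` MB free."""
    free_bytes = shutil.disk_usage(pixeldir).free
    return free_bytes >= min_free_mb * 1024 * 1024


if __name__ == "__main__":
    # e.g.  python check_pixeldir.py /data2/pixels
    pixeldir = sys.argv[1] if len(sys.argv) > 1 else "."
    if pixeldir_has_room(pixeldir):
        print("Space looks OK; restart saver with:  saver v &")
    else:
        print("Make space in the pixeldir or use newImdir before restarting saver")
```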

UNRESPONSIVE WILDFIRE CONTROL

If the WILDFIRE control process has hung or died, follow the appropriate RESTART procedures described below to resume operations as needed:

1) If the power to the cooler heads goes down but the computer power is unaffected, the WILDFIRE electronics should still be running and WILDFIRE control should not be affected. Do not cycle power on the DSP and SQIID electronics if only the cooler power goes off or glitches.

2) If WILDFIRE control is lost, follow the appropriate WARM RESTART procedure described below to resume operations.

3) Use the COLD RESTART procedures described below only when problems taking data appear after a power glitch and the WARM RESTART procedures do not fix the problem. BE CONSERVATIVE!!! Turn off the power to the SQIID electronics only as a last resort, if all other attempts to restart fail. If it becomes necessary to cycle the power, or if a power failure causes hardware or software problems, use the appropriate COLD RESTART procedure below to power up the system.

WILDFIRE RESTART PROCEDURES

The following procedures are intended as a guide for restoring the WILDFIRE system following varying levels of system failure:

o+ when the INSTRUMENT STATUS window has vanished

o+ simple (WARM) RESTART

o+ stalled system (WARM) RESTART

o+ WARM RESTART after instrument computer re-boot

o+ COLD RESTART after instrument power interruption

o+ COLD RESTART after dsp heurikon box power interruption

Re-booting the computer and cycling power to the instrument, the Sun instrument computer (berry), or the DSP in the Heurikon box within berry's rack in the computer room are not part of normal WILDFIRE/SQIID operations and should not be done without proper consultation.

Follow the procedures labeled "WARM" when you are restarting WILDFIRE and the WILDFIRE system (DSP electronics and/or SQIID electronics) has not been powered down, and the procedures labeled "COLD" when all or part of the WILDFIRE system has been powered down.
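The warm/cold rule just stated, together with the conditions given at the head of each procedure, amounts to a small decision table, checked from the most severe case down. A sketch under those assumptions; the `Situation` fields and `choose_procedure` are hypothetical names, and combinations the procedures do not cover (for example a computer re-boot together with an instrument power loss) are not handled specially and still call for consultation.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Observer's answers; field names are hypothetical."""
    dsp_box_powered_down: bool = False   # black DSP Heurikon box lost power
    instrument_power_lost: bool = False  # power to the instrument interrupted
    computer_rebooted: bool = False      # berry re-booted, DSP box stayed on
    control_hung: bool = False           # INSTRUMENT CONTROL unresponsive, data stalled
    control_crashed: bool = False        # STATUS window gone and/or "berry" prompt back
    cooler_power_only: bool = False      # only the cooler heads lost power


def choose_procedure(s: Situation) -> str:
    """Return the name of the procedure to follow, most severe case first."""
    if s.dsp_box_powered_down:
        return "COLD RESTART AFTER DSP HEURIKON BOX POWER INTERRUPTION"
    if s.instrument_power_lost:
        return "COLD RESTART AFTER INSTRUMENT POWER INTERRUPTION"
    if s.computer_rebooted:
        return "WARM RESTART AFTER INSTRUMENT COMPUTER RE-BOOT"
    if s.control_hung:
        return "STALLED SYSTEM (WARM) RESTART"
    if s.control_crashed:
        return "SIMPLE (WARM) RESTART"
    if s.cooler_power_only:
        # The electronics keep running: do not cycle DSP/SQIID power.
        return "NO RESTART (do not cycle DSP/SQIID power)"
    return "NO RESTART"


if __name__ == "__main__":
    print(choose_procedure(Situation(control_hung=True)))
```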

After a warm restart, the dark current might be a bit high (about a factor of two) for a while. This elevated dark should decay back to normal within the hour. After a cold restart (in which the detector has been deactivated and the entire system re-energized), the dark current might be quite high (a factor of ten or more) for a while. This highly elevated dark should decay to within a factor of two of normal within an hour and be back to normal after an additional hour has elapsed.
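The timescales above can be tabulated as a rough step function of time since the restart, which is handy when judging whether a high dark frame is still expected. This is a sketch only: the text gives approximate factors and "within the hour" timescales, which the code treats as hard steps at 1 h and 2 h, and the function name is hypothetical.

```python
from typing import Optional


def dark_elevation_bound(restart: str, hours: float) -> Optional[float]:
    """Rough upper bound on dark current relative to normal after a restart.

    Returns None where the text gives no upper bound ("a factor of ten or
    more"). ASSUMPTION: the stated timescales are treated as sharp steps.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    if restart == "warm":
        # About a factor of two, back to normal within the hour.
        return 2.0 if hours < 1.0 else 1.0
    if restart == "cold":
        if hours < 1.0:
            return None  # factor of ten or more: no upper bound given
        # Within a factor of two after an hour, normal after another hour.
        return 2.0 if hours < 2.0 else 1.0
    raise ValueError("restart must be 'warm' or 'cold'")


if __name__ == "__main__":
    for h in (0.5, 1.5, 2.5):
        print(h, dark_elevation_bound("cold", h))
```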

 

Typical SQIID WARM RESTART session (user actions are the commands typed at the prompts and the answers typed after each question):

go SQIID
Do you want windows? [n] y
LD-NET (Network Loader), Version 89.1 [Link I/O Driver: `SCIO']
Copyright (c) 1986-1989 by Logical Systems

Loading first phase of bootstrap to root node 1
Finished loading first phase, awaiting first acknowledge
Loading second phase of bootstrap to root node 1
Bootstrap loaded, awaiting acknowledge
Successfully bootstrapped root node 1

Bootstrapping the remainder of the network:
Bootstrapping node 100
Bootstrapping node 101
Bootstrapping node 102
Bootstrapping node 103
Bootstrapping node 203
Bootstrapping node 202
Bootstrapping node 201
Bootstrapping node 200
Bootstrapping node 2
Bootstrapping node 50
Bootstrapping node 51
Bootstrapping node 10
Bootstrapping node 11
Network successfully bootstrapped

Downloading program: ../tld/SQIID/b011.tld
Downloading program: ../tld/SQIID/inst.tld
Downloading program: ../tld/SQIID/seq.tld
Downloading program: ../tld/SQIID/dspw.tld
Program downloading completed

WILDFIRE SYSTEM CONTROL
LAST BUILT: Wed Sep 1 13:08:36 MST 1993
[1] 21194
in time for j = 0.44.
in time for h = 0.44.
in time for k = 0.44.
in time for l = 0.44.
NOTE: all paramters should be given in jhkl order.
Be sure to set itvoffset before using sqtv or sqiid.

% setup sqiid
in time for j = 0.44.
in time for h = 0.44.
in time for k = 0.44.
in time for l = 0.44.
Killed.
Killed.
Killed.
Killed.
in time for j = 0.44.
in time for h = 0.44.
in time for k = 0.44.
in time for l = 0.44.
0 128 256 384 512 640 768 896 1024 1152 1280 1408
running
running
running
running
[NOTE: system output will pause here of order a minute before resuming]
Array 0, Even Data Offset 0 set to 0.088
Array 1, Even Data Offset 0 set to 0.285
Array 2, Even Data Offset 0 set to 0.052
Array 3, Even Data Offset 0 set to 1.005
Array 0, Odd Data Offset 0 set to -0.019
Array 1, Odd Data Offset 0 set to -0.019
Array 2, Odd Data Offset 0 set to -0.010
Array 3, Odd Data Offset 0 set to -0.019
setbias: rvalue 1.478925
Array 0, Detector Bias 0 set to 3.007
setbias: rvalue 1.506375
Array 1, Detector Bias 0 set to 3.010
Do you want to activate the array? (y or [n]) y
setbias: rvalue 1.521930
Array 2, Detector Bias 0 set to 3.009
setbias: rvalue 1.525285
Array 3, Detector Bias 0 set to 3.009
Push the button, Max! [NOTE: SQIID has no button to push]
% Array 1 activated, Array 2 activated, Array 3 activated, Array 4 activated,

% ask
Integration time (s): [5 5 5 1] 5 5 5 2
Filename: [test%03d%s] n1s001
Header Directory: [/data2/ir50inch/sqiid1]
Pixel Directory: [/data2/pixels]
Picture index: [0 0 0 0] 1
% display j
Now displaying j.
% plist
Number of coadds 1 1 1 1
Number of lnrs 1 1 1 1
Number of pictures 1
Integration time (seconds) 5 5 5 2
Filename template n1s001%03d%s
Header Directory /data2/ir50inch/sqiid1
Pixel Directory /data2/pixels
Process mode stare stare stare stare
Picture index 1 1 1 1
Microcode sqdptsi
Channels to display j
RA of object 0:00:00
DEC of object 0:00:00
EPOCH of object 1950
Observation offset 0
Type of observation object
Current airmass 1
Object name dark
Comment none
Image name list file /tmp/list
Title
Header Var 1
Header Var 2
Header Var 3
Header Var 4
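If you save the text of a restart session to a file, a quick scan for the milestones seen in the typical session can confirm that bootstrapping, downloading, and array activation all happened. A minimal sketch; treating these particular lines as the checks that matter is an assumption, and `restart_problems` is a hypothetical name. Note that "Killed." lines appear in the typical session itself, so they are not treated as failures here.

```python
import re
import sys

# Lines taken from the typical session; ASSUMPTION: these are the key milestones.
MILESTONES = (
    "Network successfully bootstrapped",
    "Program downloading completed",
    "running",
)


def restart_problems(log: str, n_arrays: int = 4) -> list[str]:
    """Return what looks missing from a saved restart log (empty list = looks OK)."""
    problems = [f"missing: {m}" for m in MILESTONES if m not in log]
    # The session reports "Array 1 activated" ... "Array 4 activated".
    activated = {int(n) for n in re.findall(r"Array (\d+) activated", log)}
    for n in range(1, n_arrays + 1):
        if n not in activated:
            problems.append(f"array {n} not activated")
    return problems


if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        for problem in restart_problems(f.read()):
            print(problem)
```

If an array is reported as not activated, the procedures above say you can type activate <return> at the "%" prompt.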

INSTRUMENT STATUS WINDOW HAS VANISHED:

 

If the INSTRUMENT STATUS window has vanished, you should first check to see if it has simply been closed. Type `fireproc' from an active window (type ! fireproc if the command is not found) and look for the sundaemons process.

If the process is present, the window has been closed and you need to find it and open it (if the icon is not visible, then it is probably behind one of your open windows). Inside OpenWindows, you can check the "windows" item in the menu for the status of all operating windows. If the Instrument Status window is present, open it, and continue with your observing.

If the Instrument Status window has died, perform the Simple Restart procedure listed below.

SIMPLE (WARM) RESTART:

(NOTE: if the power to the instrument and/or the DSP Heurikon box in the computer room has been interrupted, or the computer has been re-booted, this procedure might fail. Look below for more specific procedures.)

If WILDFIRE has crashed (the INSTRUMENT STATUS window has vanished and could not be found by the above procedure, and/or the "berry" prompt has returned to the INSTRUMENT CONTROL window), the following steps should restore operation within the INSTRUMENT CONTROL window:

o+ If you see the "%" prompt, first type "exit" to cleanly exit the processes and get the "berry" prompt.

When you see the "berry" prompt in the INSTRUMENT CONTROL window, you can re-start the system as follows. In the INSTRUMENT CONTROL window:

o+ type go SQIID <return>

o+ answer y <return> to the question "Do you want windows?"

o+ type setup sqiid <return>. The system output will pause (of order one minute) after it has issued one or more "running" statements and then resume.

o+ answer y <return> to the question "Do you want to activate the array?" (if you failed to answer "y", you can activate the detectors by typing activate <return> when you see the "%" prompt). [Note: SQIID has no button to push.]

o+ type puse parameterfile to restore the saved parameterfile, including the "eask" choices (type puse sqiid if you don't remember what it was)

o+ use "ped" or "ask", paying special attention to the "header directory", "pixel directory", "filename template", "display", and "picture index" values to verify that the images will be going where you want them.

You should now be able to continue with your observing.
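The "ped"/"ask" check can also be done on captured "plist" output. The sketch below picks out the fields the step above names, using the labels printed by plist in the typical session, and expands the filename template. ASSUMPTION: the template's %03d takes the picture index and %s the channel letter; that reading is inferred from the "n1s001%03d%s" example and is not stated in these procedures. Function names are hypothetical.

```python
# Field labels as printed by "plist" in the typical session.
CHECK_KEYS = (
    "Header Directory",
    "Pixel Directory",
    "Filename template",
    "Channels to display",
    "Picture index",
)


def parse_plist(text: str) -> dict[str, str]:
    """Pick the CHECK_KEYS values out of captured plist output."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        for key in CHECK_KEYS:
            if line.startswith(key):
                values[key] = line[len(key):].strip()
                break
    return values


def expected_names(values: dict[str, str], channels: str = "jhkl") -> list[str]:
    """ASSUMPTION: %03d <- picture index, %s <- channel letter."""
    template = values["Filename template"]
    index = int(values["Picture index"].split()[0])
    return [template % (index, ch) for ch in channels]


PLIST_SAMPLE = """\
Filename template n1s001%03d%s
Header Directory /data2/ir50inch/sqiid1
Pixel Directory /data2/pixels
Picture index 1 1 1 1
Channels to display j
"""

if __name__ == "__main__":
    vals = parse_plist(PLIST_SAMPLE)
    for key in CHECK_KEYS:
        print(f"{key}: {vals.get(key, '(not found)')}")
    print(expected_names(vals))
```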

STALLED SYSTEM (WARM) RESTART:

 

If WILDFIRE is hung (INSTRUMENT CONTROL window unresponsive and data collection stalled):

o+ type <control C> <control C> in the INSTRUMENT CONTROL window

o+ type hung several (2 or 3) times in the CONSOLE WINDOW until either the WILDFIRE "%" prompt or the UNIX "berry" prompt returns in the INSTRUMENT CONTROL window.

o+ if you see the "%" prompt first, type exit <return> in the INSTRUMENT CONTROL window; the berry prompt will return.

When you see the "berry" prompt in the INSTRUMENT CONTROL window, you can re-start the system as follows. In the INSTRUMENT CONTROL window:

o+ type go SQIID <return>

o+ answer y <return> to the question "Do you want windows?"

o+ type setup sqiid <return>. The system output will pause (of order one minute) after it has issued one or more "running" statements and then resume.

o+ answer y <return> to the question "Do you want to activate the array?" (if you failed to answer "y", you can activate the detectors by typing activate <return> when you see the "%" prompt). [Note: SQIID has no button to push.]

o+ type puse parameterfile to restore the saved parameterfile, including the "eask" choices (type puse sqiid if you don't remember what it was)

o+ use "ped" or "ask", paying special attention to the "header directory", "pixel directory", "filename template", "display", and "picture index" values to verify that the images will be going where you want them.

You should now be able to continue with your observing.

WARM RESTART AFTER INSTRUMENT COMPUTER RE-BOOT:

 

If the instrument computer has been rebooted (but the black DSP Heurikon box within berry's rack in the computer room has remained on):

o+ login as "ir50inch" on the instrument computer (berry) with password

o+ answer <return> to the "term" prompt then,

o+ if you want to use Sunview, type "st" or,

o+ if you want to use Openwindows, then type "op".

o+ start IRAF by typing "cl <return>" in the IRAF Gterm/Xterm window.

When you see the "berry" prompt in the INSTRUMENT CONTROL window, you can re-start the system as follows. In the INSTRUMENT CONTROL window:

o+ type go SQIID <return>

o+ answer y <return> to the question "Do you want windows?"

o+ type setup sqiid <return>. The system output will pause (of order one minute) after it has issued one or more "running" statements and then resume.

o+ answer y <return> to the question "Do you want to activate the array?" (if you failed to answer "y", you can activate the detectors by typing activate <return> when you see the "%" prompt). [Note: SQIID has no button to push.]

o+ type puse parameterfile to restore the saved parameterfile, including the "eask" choices (type puse sqiid if you don't remember what it was)

o+ use "ped" or "ask", paying special attention to the "header directory", "pixel directory", "filename template", "display", and "picture index" values to verify that the images will be going where you want them.

You should now be able to continue with your observing.

 

COLD RESTART AFTER INSTRUMENT POWER INTERRUPTION:

 

If the power to the instrument was interrupted (but the black DSP Heurikon box within berry's rack in the computer room stayed on and the computer was not re-booted), in the INSTRUMENT CONTROL window:

o+ type powerup SQIID <return>

o+ when instructed, turn on power to the instrument

o+ answer y <return> to the question "Do you want windows?"

o+ type setup sqiid <return>. The system output will pause (of order one minute) after it has issued one or more "running" statements and then resume.

o+ answer y <return> to the question "Do you want to activate the array?" (if you failed to answer "y", you can activate the detectors by typing activate <return> when you see the "%" prompt). [Note: SQIID has no button to push.]

o+ type puse parameterfile to restore the saved parameterfile, including the "eask" choices (type puse sqiid if you don't remember what it was)

o+ use "ped" or "ask", paying special attention to the "header directory", "pixel directory", "filename template", "display", and "picture index" values to verify that the images will be going where you want them.

You should now be able to continue with your observing.

 

COLD RESTART AFTER DSP HEURIKON BOX POWER INTERRUPTION:

 

If the black DSP Heurikon box in berry's rack in the computer room has been powered down, then you must:

o+ turn off power to the instrument

o+ verify that the DSP box is turned on

o+ reboot the instrument computer (berry) by typing <L1 A> or <Stop A>

o+ login as "ir50inch" on the instrument computer (berry) with password

o+ answer <return> to the "term" prompt then,

o+ if you want to use Sunview, type "st" or,

o+ if you want to use Openwindows, then type "op".

o+ start IRAF by typing "cl <return>" in the IRAF Gterm/Xterm window

In the INSTRUMENT CONTROL window type:

o+ "coldstart" (If it takes longer than a few seconds to complete, a major problem has occurred and you should turn off the DSP box and start this procedure again.)

o+ type powerup SQIID <return>

o+ when instructed, turn on power to the instrument

o+ answer y <return> to the question "Do you want windows?"

o+ type setup sqiid <return>. The system output will pause (of order one minute) after it has issued one or more "running" statements and then resume.

o+ answer y <return> to the question "Do you want to activate the array?" (if you failed to answer "y", you can activate the detectors by typing activate <return> when you see the "%" prompt). [Note: SQIID has no button to push.]

o+ type puse parameterfile to restore the saved parameterfile, including the "eask" choices (type puse sqiid if you don't remember what it was)

o+ use "ped" or "ask", paying special attention to the "header directory", "pixel directory", "filename template", "display", and "picture index" values to verify that the images will be going where you want them.

You should now be able to continue with your observing.

 

SUN COMPUTER RE-BOOT

If it becomes necessary to re-boot the Sun computer, a task which should be handled by Mountain Computer Support Staff, the WILDFIRE system should be safed first by typing:

exit <return>

in the WILDFIRE CONSOLE window.

Please consult with either the Mountain Computer Support Staff or your designated SQIID support person before re-booting the Sun computer. Indiscriminate re-booting of the Sun computer can have a deleterious impact on SQIID observations.

To safely re-boot the Sun computer from its console press:

<L1> and <A> keys together, then type:

G 0

Then, after the disk sync is complete press:

<L1> and <A> keys together again, then type:

b

Tape drive hang-ups can usually be cleared during the day with minimal impact on SQIID operations.

After the Sun Computer has been re-booted, follow the WARM RESTART AFTER INSTRUMENT COMPUTER RE-BOOT procedure above.

 



National Optical Astronomy Observatories, 950 N. Cherry, PO Box 26732, Tucson, AZ 85726, Phone: 520-318-8000, FAX: 520-318-8360

Posted: 23Mar98