NEWFIRM Data Reduction Pipeline Overview
Mark Dickinson
Current draft: 27 August 2004

Introduction

This document is an in-progress record of notes about the NEWFIRM data processing pipeline. It is intended to describe the main issues that will govern the form and operation of the pipeline, the calibration reference files that will be created and used, and the data products that the pipeline will generate. The document also serves as an outline for the steps in the pipeline. All of these things will be filled out in progressively more detail, and links can be made to other documents being developed in parallel.

Contents:


Calibration reference frames:

The following calibration reference information is needed in order to process NEWFIRM data through pipeline calibration. Here, we assume that these are "static" reference files that would apply to data taken during a given night (or at least for a given block of observations). This is in contrast to potentially time-variable information needed to characterize and remove other instrumental signatures, such as the sky background and photometric calibration. These static reference frames would be generated from calibration data taken using standardized procedures during each night (darks, flats) or less frequently if the relevant signature is stable (linearity, geometric distortion). Here, we describe the expected characteristics of each instrumental signature and the type of calibration data that are needed to correct that signature. We highlight the metadata needed for pipeline modules to construct the reference file from a particular calibration data set, and briefly note what kind of pipeline steps are needed to carry this out.

1) Static bad pixel masks

Masks are needed that identify the locations of known bad or unstable pixels in the array (for example, dead, hot, or otherwise unreliable pixels). Generally, these should be fairly stable with time, but a regular program of instrument calibration should include a procedure for monitoring bad pixels on the array and updating the bad pixel masks.

2) "Dark" frames

For infrared arrays, "dark" frames are usually used to measure both the true dark current and the two-dimensional bias structure, both of which are generally functions of integration time. We will call these "dark frames" here, but in practice for modern arrays the main effect one is trying to remove is often the signature of the bias.

Observers should take dark frames using the same integration times that are used for science (or any other) observations. If these are not available for a given exposure time, we can explore using frames with the nearest exposure time, or interpolating between other exposure times, but this would require testing.
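
To make this concrete, here is a minimal sketch (in Python with numpy; the function name and the grouping of darks by integration time are illustrative assumptions, not pipeline specifications) of building and applying a master dark:

    import numpy as np

    def make_master_dark(dark_frames):
        # Median-combine individual dark exposures (all taken with the same
        # integration time) into a single master dark reference frame.
        stack = np.stack([np.asarray(d, dtype=float) for d in dark_frames], axis=0)
        return np.median(stack, axis=0)

    # Usage sketch: group the darks by integration time and subtract the
    # master dark whose exposure time matches each science frame.
    # master_dark_60s = make_master_dark(darks_60s)
    # science_minus_dark = science_frame - master_dark_60s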

3) Linearity calibration

Infrared arrays are generally inherently nonlinear. The count rate measured from a source with a given intensity will vary depending on the number of electrons already accumulated within a pixel. This is usually characterized by a function, ideally calibrated for each pixel on the array but sometimes just characterized for the array as a whole, which relates the number of counts measured in a pixel to the number of counts that would have been accumulated if the array behavior had been linear. It is often assumed (ideally, based on experimental data), or at least *defined*, that the accumulation of counts is linear when the pixel is "empty" (i.e., at low count levels), and becomes increasingly nonlinear at higher count levels. Often this behavior then rolls over or saturates at high count levels, when the array becomes severely nonlinear; data are often corrected up to some threshold count level and regarded as "saturated" beyond that point.

An additional complication is that infrared arrays are usually operated in a mode such as double-correlated sampling or Fowler sampling. In double-correlated sampling, the array is first reset electronically. After a short interval, each pixel is read non-destructively, and its value is recorded (the "zeroth read"). The exposure continues, and each pixel is read again after the desired exposure time has elapsed since the zeroth read. Generally, the difference between the two reads is computed on board the instrument, and only that difference is saved. (Sometimes it is possible to save both the initial and final read values as separate images using an engineering mode for the instrument.) In the case of Fowler sampling, the array is read multiple times during the "zeroth" and final reads, and the results are summed or averaged in order to reduce the effective readout noise. The signal that was accumulated in a pixel during the zeroth read is subtracted away and its actual value is lost (unless that readout is saved separately). However, it should not be neglected when computing the total signal accumulated in the pixel for the purposes of linearity correction. Generally, this zeroth read signal must be estimated from the total signal accumulated in the pixel, converted to a count rate, and multiplied by the time interval between pixel reset and the zeroth read. Ideally, this estimate is itself corrected (often iteratively), because the measured counts in the difference image are themselves nonlinear and do not accurately represent the initial count rate when the array was still nearly linear at low count levels.
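
A minimal sketch of such an iterative correction follows, under the simplifying assumption that the counts accumulated before the zeroth read are small enough to be treated as nearly linear; the function and parameter names (lin_correct, t_zero, etc.) are hypothetical placeholders for whatever linearity model is adopted:

    import numpy as np

    def linearize_diff_image(diff, t_exp, t_zero, lin_correct, n_iter=3):
        # diff        : measured counts (final read minus zeroth read), 2-D array
        # t_exp       : time between the zeroth and final reads (s)
        # t_zero      : time between pixel reset and the zeroth read (s)
        # lin_correct : function mapping measured (nonlinear) counts to linear counts
        #
        # The signal accumulated before the zeroth read was subtracted on board,
        # so it is re-estimated from the count rate and added back before the
        # nonlinearity correction is applied; the estimate is refined iteratively
        # because the measured counts are themselves nonlinear.
        rate = diff / float(t_exp)                # first guess at the count rate
        for _ in range(n_iter):
            zeroth = rate * t_zero                # counts accumulated before the zeroth
                                                  # read (assumed nearly linear, low level)
            total = diff + zeroth                 # estimated counts at the final read,
                                                  # measured relative to reset
            diff_lin = lin_correct(total) - zeroth
            rate = diff_lin / float(t_exp)        # improved (linearized) count rate
        return diff_lin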

Linearity calibration is usually measured by taking a series of exposures of a uniformly, stably illuminated source such as the dome flat field white spot or (if the white spot illumination is not stable) an opaque source that provides relatively constant thermal emission in a long-wavelength filter (generally K-band or a narrower filter in the 2 to 2.5 micron wavelength region). Frames are taken with gradually increasing exposure times, so that more counts are accumulated with each time increment. If the illumination were constant and the array were linear, the recorded counts would increase linearly with increasing exposure time. Deviations from this trend are measured from the data (ideally on a per-pixel basis), and some function is fit to expected (linear) counts vs. measured counts, or to (measured/expected) count rate vs. measured counts. Dark frames must be taken and subtracted for each exposure time in the sequence to remove the pedestal level (generally dominated by electronic bias effects, not actual dark current signal). An additional complication is that if the illumination source is not precisely constant with time, then one should intersperse exposures taken with a fixed reference exposure time between the exposures with increasing exposure times, in order to monitor the illumination count rate. One then fits a time-varying function to this reference count rate to account for the resulting variation in the "expected" linear counts.
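
As an illustration of the per-pixel fit described above, the following sketch assumes that the dark-subtracted ramp exposures and a reference count rate (derived from the interspersed monitoring exposures) are already in hand as numpy arrays; the fitting function, polynomial order, and variable names are illustrative choices only:

    import numpy as np

    def fit_linearity(exposure_times, dark_subtracted_frames, ref_rate, order=3):
        # exposure_times         : 1-D array of integration times (s)
        # dark_subtracted_frames : 3-D array (n_exposures, ny, nx) of measured counts,
        #                          with matching-exposure darks already subtracted
        # ref_rate               : 2-D array of the illumination count rate (counts/s),
        #                          e.g. from the fixed reference exposures that monitor
        #                          lamp drift
        # Returns polynomial coefficients (order+1, ny, nx) relating measured counts
        # to expected (linear) counts for each pixel.
        n, ny, nx = dark_subtracted_frames.shape
        expected = exposure_times[:, None, None] * ref_rate   # linear prediction
        coeffs = np.empty((order + 1, ny, nx))
        for j in range(ny):
            for i in range(nx):
                # Fit expected (linear) counts as a function of measured counts;
                # evaluating this polynomial later linearizes the science data.
                coeffs[:, j, i] = np.polyfit(dark_subtracted_frames[:, j, i],
                                             expected[:, j, i], order)
        return coeffs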

4) Flat fields

Flat fielding may be achieved using exposures of the illuminated dome spot ("dome flats"), twilight sky ("twilight flats"), or by combining dithered exposures of the dark sky ("dark sky flats").

Experience with previous IR instruments shows that it is hard to know a priori which sort of flat will work best for a given filter. E.g., for both IRIM and FLAMINGOS at KPNO, it has been reported that dome flats work best at K-band, while dark sky flats work best at J and H. The testing is usually done by dithering stars through many positions over the array area during photometric conditions (the so-called "thousand points of light" test), then reducing the data using the various flats that are available, and finally performing photometry to see which type of flat field minimizes the scatter in photometry from place to place over the array. Previous experience shows that the answer generally remains "stable" with time - e.g., if dome flats work best, they will always work best. This suggests that these tests can be done as part of the scientific characterization and verification of the instrument, and the best choice can then be adopted for each filter from then on as part of standard calibration procedures.
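
As a sketch of the figure of merit for such a test (the inputs are hypothetical: a 2-D array of repeated magnitude measurements of the same stars at many dither positions, reduced with one candidate flat), one might compute:

    import numpy as np

    def photometric_scatter(mags):
        # mags : 2-D array (n_stars, n_positions) of magnitudes for the same stars
        #        measured at many dither positions across the array, after reducing
        #        the data with one candidate flat field.
        # Returns the mean per-star rms scatter; repeating this for each type of
        # flat, the flat that minimizes the scatter would be adopted for the filter.
        residuals = mags - np.mean(mags, axis=1, keepdims=True)
        return float(np.mean(np.std(residuals, axis=1)))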

For dome flats, it is common to take exposures with the dome lamps on and others with the dome lamps off (using the same exposure times), and then take the difference of the two. By doing this, we remove not only the "dark" (+ bias) signature, but also any stray light or thermal emission that is not properly imaged through the optical path, and which could distort the "shape" of the dome flat. For twilight and dark sky flats, this is not possible. Those flats must be dark subtracted using conventional dark frames.
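
A minimal sketch of this lamps-on minus lamps-off construction, assuming the individual exposures have already been loaded as numpy arrays; the normalization to unit median is an illustrative choice:

    import numpy as np

    def make_dome_flat(lamps_on_frames, lamps_off_frames):
        # Median-combine the lamps-on and lamps-off exposures (same integration
        # time), difference them to remove the dark/bias signature and any stray
        # or thermal light, and normalize the result to unit median.
        on = np.median(np.stack(lamps_on_frames, axis=0), axis=0)
        off = np.median(np.stack(lamps_off_frames, axis=0), axis=0)
        diff = on - off
        return diff / np.median(diff)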

Another important consideration is that ideally the flat fields should be taken with a signal level (counts per pixel) that is similar to that which is achieved in the science observations, in order to avoid errors due to differential nonlinearity. If the nonlinearity can be reliably measured and corrected, then this should not really be necessary, but general lore has it that it is nevertheless a good idea. This is another issue to be tested during the scientific characterization and validation of NEWFIRM.

5) Electronic cross-talk

Some infrared arrays show electronic cross-talk behavior, such that signal on a given pixel can impact the measured counts at other places in the array, e.g., by producing electronic ghost images, or sometimes by elevating or depressing signal in other pixels (e.g. rows or columns) downstream in the readout sequence. The nature of this varies for different arrays and readout electronics, and must be tested and characterized for NEWFIRM. We do not attempt to describe possible calibrations or correction procedures here, but include this placeholder to indicate that this may be an issue for NEWFIRM. It may be possible to characterize the behavior and correct it deterministically, as is done for MOSAIC (e.g.) via cross-talk coefficients.

6) Geometric distortion

In order to combine and coadd NEWFIRM images, we will need to map pixel positions on the detector arrays to astrometric positions on the sky. This mapping will likely include nonlinear terms, which we describe here as geometric distortion, such that simple, linear image translations are not sufficient to align one image to another. These terms should be measured using calibration data; several possibilities exist:

It is hoped that the nonlinear terms of the distortion pattern are fairly stable for NEWFIRM and can be calibrated only occasionally, eliminating the need for users to do this on a nightly or per-run basis. This needs to be tested and verified during the scientific verification of the instrument.

As part of this procedure, the relative sky positions, orientations, and pixel scales of the four NEWFIRM detectors should also be measured.
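
To illustrate the form such a mapping might take, the following sketch applies a low-order 2-D polynomial transformation from pixel coordinates to tangent-plane sky coordinates; the coefficient representation is an assumption made for illustration, not a statement of how the NEWFIRM distortion solution will actually be stored:

    def pixel_to_sky(x, y, coeffs_xi, coeffs_eta):
        # Map detector pixel coordinates (x, y) to tangent-plane sky coordinates
        # (xi, eta) using a low-order 2-D polynomial.  The coefficient dictionaries
        # are keyed by (i, j) and represent terms c_ij * x**i * y**j; a purely
        # linear solution would use only the (0, 0), (1, 0), and (0, 1) terms,
        # while the geometric distortion is carried by the higher-order terms.
        xi = sum(c * x**i * y**j for (i, j), c in coeffs_xi.items())
        eta = sum(c * x**i * y**j for (i, j), c in coeffs_eta.items())
        return xi, eta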


Basic pipeline procedures for science data

1) Frame-level removal of instrumental signatures:

2) Sky subtraction (first pass)

3) Sky mapping:

4) Data quality monitoring:

Interlude: cosmic ray masking

At some point, we need to identify and mask cosmic ray events and other transient pixel defects. The problem should be closely analogous to that for CCD data, e.g., MOSAIC. We might consider doing this either on a per-frame basis (with spatial filters), or during the image combination (next step below) with a sigma-rejection or "diff-detect" scheme. We do not discuss this further here, but we assume that some procedure is used, and that masks are generated that record the positions of pixels to be excluded from the combined images.
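
A minimal sketch of a sigma-rejection scheme of the kind mentioned above, assuming registered, sky-subtracted frames stacked in a numpy array; the threshold, iteration count, and MAD-based scatter estimate are illustrative choices:

    import numpy as np

    def cosmic_ray_masks(aligned_frames, nsigma=5.0, n_iter=2):
        # aligned_frames : 3-D array (n_frames, ny, nx) of registered, sky-subtracted
        #                  exposures of the same field.
        # Returns a boolean array of the same shape: True marks pixels that deviate
        # from the stack median by more than nsigma times a robust scatter estimate
        # and should be excluded from the combined image.
        data = np.asarray(aligned_frames, dtype=float)
        mask = np.zeros(data.shape, dtype=bool)
        for _ in range(n_iter):
            clipped = np.where(mask, np.nan, data)
            med = np.nanmedian(clipped, axis=0)
            sig = 1.4826 * np.nanmedian(np.abs(clipped - med), axis=0)  # MAD-based sigma
            mask = np.abs(data - med) > nsigma * np.maximum(sig, 1e-6)
        return mask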

5) Image combination (first pass):

6) Object masking:

7) Sky subtraction (second pass):

8) [ Recalculate zeropoint scaling? ]

9) Image combination (second pass):


Data products

Here, we discuss the data products that we expect the pipeline to produce.

Image products for reduced data:

HST/NICMOS images offer a possible model for the calibrated science images from NEWFIRM; the NICMOS calibrated files carry five image types per exposure: the science image, an error (uncertainty) image, data quality flags, the number of samples per pixel (NSAMP), and the effective integration time per pixel (TIME). It is not necessarily the case that we want to have all five of these for NEWFIRM data, and perhaps not all of them for each type of data product (see below). However, I list them all here for consideration.

We might consider recording only the first three data types for processed data products that represent single exposures, and all five for combined image mosaics. We should consider whether we think users would actually find the NSAMP and TIME images useful for combined mosaics, or if they would just be satisfied with ERROR or WEIGHT images alone.
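
For illustration only, here is a sketch of how such a multi-extension image product might be written, using the astropy.io.fits library as one possible tool; the extension names follow the NICMOS-style layout discussed above, and the function and its arguments are hypothetical:

    from astropy.io import fits

    def write_product(filename, sci, err, dq, nsamp, time):
        # Package a combined image as a multi-extension FITS file, one extension
        # per data type, following the NICMOS-style layout discussed above.
        hdul = fits.HDUList([
            fits.PrimaryHDU(),                        # header-only primary HDU
            fits.ImageHDU(data=sci, name='SCI'),      # calibrated science image
            fits.ImageHDU(data=err, name='ERR'),      # per-pixel uncertainty
            fits.ImageHDU(data=dq, name='DQ'),        # data-quality flags
            fits.ImageHDU(data=nsamp, name='SAMP'),   # number of samples per pixel
            fits.ImageHDU(data=time, name='TIME'),    # effective integration time per pixel
        ])
        hdul.writeto(filename, overwrite=True)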

Types of processed data:

We envision three types of processed data:


Still TBD: