Reference Files, Parameter Files and CRDS
The JWST pipeline uses version-controlled reference files and parameter files to supply pipeline steps with necessary data and to set pipeline and step parameters, respectively. Both types of file use the ASDF format and are managed by the Calibration References Data System (CRDS).
Most pipeline steps rely on reference files that contain calibration data or other information needed to process the data. The reference files are instrument-specific and are periodically updated as data processing evolves and as understanding of the instruments improves. They are created, tested, and validated by the JWST Instrument Teams, who ensure that all files are in the correct format and have all required header keywords. The files are then delivered to the Reference Data for Calibration and Tools (ReDCaT) Management Team. Finally, the files are ingested into the JWST Calibration References Data System (CRDS) and made available to users, the pipeline team, and any other ground subsystem that needs access to them.
Information about all the reference files used by the Calibration Pipeline can be found at Reference File Information, as well as in the documentation for each Calibration Step that uses a reference file. The table at Reference File Types lists the reference file types and the calibration steps to which they correspond.
Parameter files, which like reference files are encoded in ASDF and version-controlled by CRDS, define the ‘best’ set of parameters for pipeline steps as determined by the JWST instrument teams, based on instrument, observing mode, filter, etc. They may also evolve over time as understanding of the calibration improves.
By default, when the pipeline is run via strun, or via the corresponding method in the Python interface, the appropriate parameter file is determined and retrieved by CRDS and used to set step parameters.
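The selection described above can be pictured as a lookup keyed on dataset metadata. The following is a minimal, purely illustrative sketch: the rule table, the metadata keys, the `threshold` parameter, and the function names are all hypothetical and do not reflect the real CRDS selection rules or any actual JWST parameter file.

```python
# Hypothetical illustration of metadata-driven parameter selection.
# None of these keys, values, or parameters come from real CRDS rules.

DEFAULT_PARAMS = {"threshold": 4.0}

# Hypothetical rules: (instrument, observing mode, filter) -> parameter overrides.
# "*" matches any value.
RULES = [
    (("NIRCAM", "IMAGE", "F200W"), {"threshold": 5.0}),
    (("MIRI", "IMAGE", "*"), {"threshold": 3.5}),
]


def matches(pattern, key):
    """Return True if every pattern field equals the key field or is '*'."""
    return all(p == "*" or p == k for p, k in zip(pattern, key))


def select_parameters(instrument, mode, filt):
    """Return a copy of the defaults updated with the first matching rule."""
    key = (instrument, mode, filt)
    params = dict(DEFAULT_PARAMS)
    for pattern, overrides in RULES:
        if matches(pattern, key):
            params.update(overrides)
            break
    return params
```

For example, `select_parameters("MIRI", "IMAGE", "F770W")` picks up the MIRI override, while a dataset that matches no rule gets the defaults unchanged.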
The Calibration References Data System (CRDS) manages the reference files used by the pipeline. For the JWST pipeline, CRDS manages both data reference files and parameter reference files, which contain step parameters.
CRDS consists of external servers that hold all available reference files, and the machinery to map the correct reference files to datasets and download them to a local cache directory.
When the pipeline is run, CRDS uses the metadata in the input file to determine the correct reference files for that dataset and, if they have not already been downloaded, copies them to the local cache directory so they are available on your filesystem for the pipeline to use.
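The download-once behavior can be sketched as a simple cache check. This is a conceptual sketch only, not how CRDS is implemented: the function name `get_reference`, the flat cache layout, and the `fetch` callable (which stands in for the network download from the CRDS server) are all hypothetical.

```python
from pathlib import Path
from typing import Callable


def get_reference(filename: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return the local path of a reference file, fetching it only if it is not cached.

    Hypothetical helper: `fetch` stands in for downloading from a CRDS server.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    local_path = cache_dir / filename
    if not local_path.exists():
        # Not cached yet: download once and store the bytes locally.
        local_path.write_bytes(fetch(filename))
    # Cached (or just downloaded): later calls reuse the same local file.
    return local_path
```

Calling `get_reference` twice for the same file name invokes `fetch` only on the first call; the second call returns the cached path.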
Outside the STScI network, the environment variables `CRDS_PATH` and `CRDS_SERVER_URL` must be set before running the pipeline, as described below.
Reference Files Mappings (CRDS Context)
One of the main functions of CRDS is to associate a dataset with its best
reference files. This mapping is referred to as the ‘CRDS context’ and is
defined in a pmap file. The pmap file is itself version-controlled, which
allows the reference file mapping to be accessed as it stood at any point
in time, and any previous set of reference files to be restored if desired.
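The versioned mapping described above can be pictured as a lookup keyed first by context and then by dataset properties. This toy model is illustrative only: the context names, keys, and file names are hypothetical, and real pmap files are CRDS rules files rather than Python dictionaries.

```python
# Toy model only: context names, keys and file names are hypothetical.
CONTEXTS = {
    "toy_0001.pmap": {("NIRCAM", "dark"): "nircam_dark_v1.asdf"},
    "toy_0002.pmap": {("NIRCAM", "dark"): "nircam_dark_v2.asdf"},
}

# The newest context plays the role of the default, most up-to-date mapping.
# Zero-padded names sort correctly as strings.
LATEST_CONTEXT = max(CONTEXTS)


def best_reference(instrument, reftype, context=None):
    """Return the reference file selected by `context` (default: latest)."""
    mapping = CONTEXTS[context or LATEST_CONTEXT]
    return mapping[(instrument, reftype)]
```

Because each context is frozen, asking an older context for the same dataset keeps returning the older file, which is what makes reverting to a previous set of reference files possible.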
The CRDS context is set by default to always give access
to the most recent reference file deliveries and selection rules, i.e., the
‘best’, most up-to-date set of reference files. On occasion it might be
necessary or desirable to use one of the non-default mappings in order to, for
example, run different versions of the pipeline software or use older versions
of the reference files. This can be accomplished by setting the environment
variable CRDS_CONTEXT to the desired project mapping version, e.g.
$ export CRDS_CONTEXT='jwst_0421.pmap'
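A script that relies on a pinned context may want to check the value it is about to use. The helper below is a hypothetical sketch, not part of CRDS: it reads `CRDS_CONTEXT`, returns `None` when the variable is unset (meaning the default context applies), and assumes that context names follow the `jwst_NNNN.pmap` form of the example above.

```python
import os
import re
from typing import Mapping, Optional

# Assumption: project mapping names look like the example above, e.g. "jwst_0421.pmap".
CONTEXT_PATTERN = re.compile(r"jwst_\d{4}\.pmap")


def resolve_context(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the pinned context from CRDS_CONTEXT, or None to use the default.

    Hypothetical helper for scripts that want to log or sanity-check the pin.
    """
    value = environ.get("CRDS_CONTEXT")
    if value is None:
        return None  # No pin: the default, most up-to-date context applies.
    if not CONTEXT_PATTERN.fullmatch(value):
        raise ValueError(f"Unexpected CRDS_CONTEXT value: {value!r}")
    return value
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.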
For more information about CRDS, including context lists, see the JWST CRDS website:
The CRDS server can be found at
Inside the STScI network, the pipeline defaults are sufficient and no further action is necessary.
To run the pipeline outside the STScI network, CRDS must be configured by setting two environment variables:
CRDS_PATH: Local folder where CRDS content will be cached.
CRDS_SERVER_URL: The server from which to pull reference information.
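Before launching a long run outside the STScI network, it can help to fail fast if either variable is missing. The function below is a hypothetical convenience check, not part of the pipeline or CRDS; it treats an empty string the same as an unset variable.

```python
import os

# The two variables listed above.
REQUIRED_VARS = ("CRDS_PATH", "CRDS_SERVER_URL")


def missing_crds_vars(environ=os.environ):
    """Return the names of required CRDS variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

A wrapper script could, for example, call `missing_crds_vars()` and exit with a message listing the names it returns before any pipeline step starts.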
To set up access to the server, use the following settings:
Setting CRDS Environment Variables in Python
The CRDS environment variables need to be defined before importing anything
from crds. The examples above show how to set an environment variable in
the shell, but this can also be done within a Python session by assigning to
`os.environ`, as in the snippet below.
In general, scripts should assume that the environment variables have been set before
they run. If you need to define the CRDS environment variables within a script,
the following code snippet is the suggested method. These lines should be the first
executable lines of the script:
import os
# Set the CRDS variables before importing crds or any pipeline code.
os.environ['CRDS_PATH'] = 'path_to_local_cache'
os.environ['CRDS_SERVER_URL'] = 'url-of-server-to-use'
# Now import anything else needed
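Because scripts should normally defer to values already set in the shell, a variant of the snippet above uses `os.environ.setdefault`, which assigns a value only when the variable is not already defined. This is a suggested alternative rather than the documented method, and the cache path and server URL remain placeholders, as in the original snippet.

```python
import os

# Only fill in values the shell has not already provided.
# Both values are placeholders; substitute your own cache path and server URL.
os.environ.setdefault("CRDS_PATH", "path_to_local_cache")
os.environ.setdefault("CRDS_SERVER_URL", "url-of-server-to-use")

# Now import anything else needed (crds, the pipeline, etc.)
```

With this pattern, a value exported in the shell (for example a site-specific cache directory) takes precedence over the fallback written in the script.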