Prepared by Ross N. Hoffman, Phil Partain, Tanya Peevey, and Steve Finley
November 3, 2018
The 9-km ECMWF Cubic Octahedral (O1280) grid Nature Run (ECO1280) created by ECMWF and hosted by CIRA/CSU is a single uninterrupted 14-month (10248-h) long forecast made with a circa 2016 version (IFS cycle 43r1) of the operational global deterministic ECMWF atmospheric forecast model.
The ECO1280 may be used for any research purpose. Interested researchers should acquire the ECO1280 by contacting CIRA/CSU to describe planned usage and to obtain data access instructions. (See the FAQ “How do I download the ECO1280?”) Resulting publications or presentations must acknowledge ECMWF and CIRA/CSU. (See the FAQ “What are proper acknowledgments to state when using the ECO1280?”)
This FAQ should answer your questions about OSSEs, NRs, and the ECO1280. It explains how to obtain and use the ECO1280, and how to acknowledge ECMWF for its efforts in developing it. Along the way, this FAQ offers pointers to additional information about the ECMWF model, necessary software resources, and more. If you have a question that is not answered, or is inadequately answered, by this FAQ, please write to ECO1280@colostate.edu so we can improve it.
The following people have contributed to planning and executing the ECO1280 project: Robert Atlas (NOAA/AOML), Ross N. Hoffman (NOAA/AOML), Lars Isaksen (ECMWF), Nils Wedi (ECMWF), Sylvie Malardel (ECMWF & Meteo-France), Pedro Maciel (ECMWF), Lidia Cucurull (NOAA/AOML), Tanya Peevey (CIRES and NOAA/ESRL), Christian Kummerow (Colorado State University), Philip Partain (CIRA/Colorado State Univ.), Steve Finley (CIRA/Colorado State Univ.), and more to come.
Observing System Simulation Experiments (OSSEs) extend the concept of Observing System Experiments (OSEs) to new sensors. OSEs are data denial experiments to determine the impact of existing observing systems. In a similar approach, OSSEs determine the impact of new observing systems by performing data denial experiments that assimilate synthetic observations simulated from a realistic Nature Run (NR). The NR is used in simulation experiments as an alternative reality. That is, in an OSSE the NR is taken to be the truth, and we simulate all observations, including those from sensors that do not yet exist, by sampling the NR. We then conduct experiments with and without the proposed sensors to see what their impact is on improving analyses and forecasts.
According to Hoffman and Atlas (2016) OSSEs “provide a rigorous, cost-effective approach to evaluate the potential impact of new observing systems and alternate deployments of existing systems, and to optimize observing strategies. They are also used to prepare for the assimilation of new types of data in order to accelerate their application to operational prediction, as well as to optimize the assimilation of existing data.” Further “For the OSSEs to produce accurate quantitative results, all of the components of the OSSE system must be realistic.”
A Nature Run (NR), which is used to represent the atmosphere, is a long forecast. Ideally an NR is generated by a state-of-the-art numerical model and realistically represents all phenomena that affect the new observing system. Since no forecast system is perfect, there should be realistic differences between the NR model and the model used for assimilation and forecasting. After a couple of weeks, an NR will no longer be a good forecast of reality. For all researchers, the value of the new ECMWF NR is that the ECMWF model is very advanced, complete, and accurate. For NOAA researchers, an additional value for the reliability of future OSSE results is that there are some differences between the ECMWF and NOAA models, just as there are differences between reality and any model.
The ECMWF atmospheric forecast model is hydrostatic, spectral, and uses a hybrid vertical coordinate. In the ECMWF Integrated Forecast System (IFS), the Earth is a sphere of radius 6,371,229 m, and gravity is a constant, g = 9.80665 m/s².
The ECMWF NR (ECO1280) uses the TCo1279L137 configuration of the ECMWF Integrated Forecast System (IFS) model. This model version has 9-km (average) resolution in gridpoint space, T1279 spectral truncation, and 137 vertical layers. For physical processes (convection, radiation, etc.) the ECO1280 uses an octahedral reduced Gaussian grid (O1280). The vertical model structure is a hybrid sigma-pressure coordinate system. This configuration, denoted IFS cycle 43r1, was operational between Nov 2016 and July 2017. ECO1280 starts 0000 UTC 30 Sep 2015 with the rapid intensification of Hurricane Joaquin (See Fig. 1).
The model uses a spectral coefficient representation for some calculations, but basically the model is defined on the transform horizontal O1280 grid on the “full” vertical levels (aka layer midpoints). The “half” vertical levels are the interfaces between the layers. Some quantities (e.g., pressure, geopotential height) are defined on both the half and full levels.
Essentially, all that is needed to fully describe the atmosphere and its time rate of change in the ECMWF model atmosphere are the 3D prognostic variables (temperature, wind components, specific humidity, cloud hydrometeors (droplets, rain, ice, snow), cloud fraction, and ozone) on the transform grid at the full levels, and the 2D prognostic variables (surface pressure and other surface and ground variables) and the prescribed 2D fields (SST, the surface topography) on the transform grid. All of these fields are archived in the ECO1280 distribution. For the convenience of users, additional diagnostic fields that are complicated to calculate (such as geopotential height and vertical velocity at full levels) are also included. Actually running the ECMWF model requires additional information describing the model, as well as various physical constants (g, Rdry, etc.). In addition, during the model time integration, some quantities at previous time steps are used by the model to ensure numerical stability of some dissipative processes; these time-lagged quantities must be included in restart files.
The ECO1280 should be used with caution above the lower stratosphere due to large explicit damping in this region. From the surface to level 30 (~11 hPa) the horizontal diffusion is very weak and increases with wavenumber to avoid a build up of energy at the smallest resolved scales. From level 30 to level 16 (~1.5 hPa), this “spectral viscosity” increases with height. Finally, from level 15 to 1, the explicit diffusion for the divergence (only) becomes very large in a so-called “sponge layer” in order to damp out upward propagating gravity waves that would otherwise be reflected from the top of the model.
The ECO1280 distribution includes all quantities that are archived operationally. The GRIB codes and variable definitions are listed in spreadsheet ECO1280-GRIB of the Excel workbook ECO1280 (also in ECO1280-GRIB or ECO1280-GRIB). Each row of this table contains the GRIB parameter ID, variable name, units, and the collection the variable belongs to. (See the FAQ “What are the collections?”)
Compared to the usual operational practice, GRIB ID 54 contains the natural log of full level pressure, and three additional special variables are also archived:
(a) Two experimental fields, the 3D Gaussian grid fields of convective rain and snow flux (using experimental product GRIB codes 101 and 102), which permit simulating all-sky radiances following Geer et al. (2010a) and Geer et al. (2010b). (b) The geopotential height (geopotential divided by g) at model full levels (GRIB code 156; this was 3008 in earlier NRs). This is a diagnostic quantity, but its calculation is complicated.
The variables are organized into the following collections of similar or related variables. At each forecast time there is one archive file for each collection. The vorticity and divergence spectral components are in a separate collection to allow the use of the ‘int’ tool described in the FAQ “How do I convert the ECO1280 to a regular latitude-longitude grid?” The collections are:
ml_spec: Spectral fields on model levels (T1279) – One file is about 2.3 GB.
ml_spec_vo_d: Spectral fields for vorticity and divergence on model levels (T1279) – One file is about 0.9 GB.
lnsp_spec: Spectral fields at the surface (T1279) – One file is about 3.3 MB.
ml_reducgg: Grid point fields on model levels (O1280) – One file is about 11.7 GB.
surf_reducgg: Surface fields (O1280) – One file is about 1.7 GB.
These collection names make up part of the GRIB filenames.
All ECO1280 variables are saved every 3 h. During the first month, data are saved every 1 h. ECMWF saves a full restart file at the “end of each month”, which is defined to be 00 UTC on the last day of the month. (The list of restart times, in forecast hours, is: 744, 1464, 2208, 2952, 3648, 4392, 5112, 5856, 6576, 7320, 8064, 8784, 9528, 10248.) Plans are to rerun at least one additional month at 1-h resolution, during a period with intense mesoscale convective activity over the continental U.S. An additional month at 1-h resolution during the autumn of 2016 may be archived as a second hurricane case.
Note that accumulation, minimum and maximum type variables depend on the archive interval and will be different for 1-h and 3-h archiving. These include:
This only affects the surf_reducgg collection. Otherwise the 3-hourly files are identical. At CIRA/CSU for months with 1-h archiving, we only store the 1-h archive files. A researcher interested in using the 3-h archive for these months only has to download every third file, unless she is interested in one of the variables depending on the post-processing interval. In that unusual case, the researcher would have to download all the surf_reducgg files, and calculate the 3-h version of these fields from the 1-h version.
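The relation between forecast hour and calendar time can be sketched in a few lines of Python. This is only an illustration: it assumes the 0000 UTC 30 Sep 2015 start time and the "00 UTC on the last day of the month" restart definition given in this FAQ, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Start of the ECO1280 forecast, as stated in this FAQ (UTC).
START = datetime(2015, 9, 30, 0)


def valid_time(forecast_hour):
    """Return the valid time (UTC) of a given forecast hour."""
    return START + timedelta(hours=forecast_hour)


def restart_hours(last_hour=10248):
    """Forecast hours that fall at 00 UTC on the last day of a month."""
    hours = []
    t = START + timedelta(days=1)  # skip the start time itself
    while t <= valid_time(last_hour):
        # A day is the last of its month if the next day is the 1st.
        if (t + timedelta(days=1)).day == 1:
            hours.append(int((t - START).total_seconds() // 3600))
        t += timedelta(days=1)
    return hours


print(valid_time(744))      # valid time of the first restart file
print(restart_hours())      # should reproduce the list of restart times
```

The computed list can be compared against the restart times quoted above as a quick sanity check of the date arithmetic (2016 is a leap year, which is why the February step is 696 h).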
The ECMWF model is a so-called “spectral” model, so the main variables (wind, temperature, ln(ps)) are really represented by spectral coefficients, and gridded values are derived as needed. Most physics is computed on the Gaussian grid, which is designed so that transforms from spectral to grid and back again will be accurate. Traditionally “accurate” meant “will not alias quadratic calculations” when transformed back to spectral space. But most physics is more nonlinear than that, and the O1280 grid is designed to be more accurate in this sense.
In “T1279” the T refers to triangular spectral truncation and 1279 is the maximum wavenumber included in the truncation. Triangular truncation provides a uniform effective horizontal resolution over the sphere and is preferred to represent topography (Hoskins, 1980). The T1279 truncation has a nominal horizontal resolution of 9 km. However, there are several possible ways to convert spectral truncation to horizontal resolution, and the effective resolution of the ECO1280 is coarser than 9 km. See Laprise (1992).
The O1280 grid is a cubic octahedral reduced Gaussian grid with 1280 latitudes in each hemisphere. Compared to previously used transform grids, this grid provides more resolution in physical (as opposed to spectral) space, but reduces the number of grid points along latitude circles toward the poles more aggressively. See the discussion in the ECMWF Software Wiki entry about the O1280 grid for more information. The spreadsheet GG1280 in file ECO1280 (also in ECO1280-GG1280 or ECO1280-GG1280) contains the Gaussian latitudes and m, the number of grid points for each latitude. At each latitude, the longitudes begin at 0° and increase by steps of 360°/m. Since the number of points per latitude is generally not a product of powers of 2, 3, and 5 only, the “old” Temperton (1983) FFT992 is replaced with FFTW.
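The layout of the grid rows can be illustrated with a short Python sketch. It assumes the octahedral rule that row i, counted from the pole, has 4i + 16 points; that rule comes from ECMWF's description of octahedral grids and is not stated in this FAQ, so the GG1280 spreadsheet remains the authoritative source for m. The function names are hypothetical.

```python
import math

N = 1280                      # latitude rows between a pole and the equator
EARTH_RADIUS_M = 6371229.0    # IFS Earth radius quoted in this FAQ


def points_on_row(i):
    """Longitudes on row i (1 = nearest the pole, N = nearest the equator).

    Assumption: the octahedral rule m = 4*i + 16.
    """
    return 4 * i + 16


def longitudes(m):
    """Longitudes (degrees) on a row with m points: 0, 360/m, 2*360/m, ..."""
    return [k * 360.0 / m for k in range(m)]


# Both hemispheres have the same row lengths.
total_points = 2 * sum(points_on_row(i) for i in range(1, N + 1))

# Side of a square whose area equals the mean area per grid point.
mean_spacing_km = math.sqrt(
    4.0 * math.pi * EARTH_RADIUS_M ** 2 / total_points) / 1000.0

print(points_on_row(1), points_on_row(N))   # shortest and longest rows
print(total_points)
print(round(mean_spacing_km, 2))
print(longitudes(points_on_row(1))[:3])
```

Under these assumptions the mean spacing comes out a little under 9 km, consistent with the "9-km (average)" description of the grid.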
The vertical model structure is a hybrid sigma-pressure coordinate system. In this system levels follow topography near the surface and are constant pressure levels in the upper atmosphere. There is a smooth transition in between. See Fig. 2.
We prefer to think of the vertical structure in terms of layers and interfaces. However, historically, the ECMWF model vertical structure is described in terms of levels—full levels for the layers and half-levels for the interfaces. In this nomenclature, the surface is level 137½ and the top of the model is at p½=0 hPa. The geopotential height at full level 137 is typically 10 m above the surface elevation, zs. The model top is p=2 Pa (0.02 hPa) at level 1.
The main prognostic variables—wind, temperature, humidity, cloud water variables—are defined at the full levels. During model integration, some diagnostic variables—vertical velocity—are defined at the half levels and some—geopotential height—are defined at both. However, all ECO1280 variables are saved at full model levels with the exception of the surface quantities, including the two surface fields—surface geopotential and ln(surface pressure)—that are stored as spectral coefficients.
The pressure at each hybrid level for the case of surface pressure = 1013.25 hPa can be found in the L137 spreadsheet of the Excel workbook ECO1280 (also in ECO1280-L137 or ECO1280-L137). The L137 spreadsheet also includes the a_n and b_n coefficients that specify the half-level pressures in terms of the surface pressure ps as p = a_n + b_n ps.
Full-level pressure is included in the ECO1280 data because it depends, in a complicated way, on the vertical discretization scheme used in the model.
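As a minimal sketch of how the a and b coefficients are used, the Python snippet below evaluates the standard hybrid-coordinate relation, half-level pressure = a + b × surface pressure. The five coefficient pairs are made-up toy values for a four-layer column, not the L137 values; real coefficients should be read from the L137 spreadsheet, and full-level pressures should be taken from the archive rather than derived this way.

```python
def half_level_pressures(a, b, surface_pressure):
    """Half-level (interface) pressures in Pa from hybrid coefficients.

    a: coefficients in Pa, b: dimensionless coefficients, both ordered
    from the model top to the surface.
    """
    return [a_n + b_n * surface_pressure for a_n, b_n in zip(a, b)]


# Toy coefficients (hypothetical, for illustration only).
a = [0.0, 2000.0, 6000.0, 3000.0, 0.0]   # Pa
b = [0.0, 0.0, 0.3, 0.8, 1.0]

ps = 101325.0                             # surface pressure in Pa
p_half = half_level_pressures(a, b, ps)
print(p_half)
```

With b = 0 near the top the interfaces are constant-pressure surfaces, and with a = 0 and b = 1 at the bottom the lowest interface coincides with the surface, which is the terrain-following behaviour described for the hybrid coordinate.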
The key conditions of the ECMWF license to CIRA/CSU are:
Researchers who acquire the ECO1280 from CIRA/CSU agree to the following conditions:
State that “We thank ECMWF for producing and CIRA/CSU for distributing the T1279 ECMWF Nature Run.”
First, request an account by writing to us at ECO1280@colostate.edu. Please use the following as a template for your request.
Please provide an account to download ECO1280 files from CIRA/CSU. We/I plan to use the ECO1280 in the following experiments. [Describe your experiments here.] We/I plan to download ECO1280 data for the time period [give your NR time period] from IP address [give your IP address]. We/I agree to the terms and conditions of use of the ECO1280: [Insert here a copy of the list of conditions that researchers who acquire the ECO1280 from CIRA/CSU agree to from the FAQ “What are the conditions attached to the ECO1280?”.]
Thank you. My name below serves as my signature,
[Your name here]
You will then be sent instructions on how to connect to our FTP server and retrieve the files. We use the SFTP protocol, and recommend using the LFTP application to make downloading many large files easier.
You probably do not need the entire ECO1280 for your experiments. Scale the following by the number of months you will download. Multiply by 3 for 1-h archived data.
Storage required for 1 month of data at 3-h archiving includes:
In practice you may not have to maintain all of these versions at your facility.
The ECO1280 is stored in GRIB format. Files are organized by collection and forecast hour.
The model actually runs in 64-bit precision, but for most fields GRIB uses 16-bit accuracy. For some (e.g., cloud cover) 8-bit accuracy is used. (You can use ‘grib_ls -P bitsPerValue filename’ to inquire about this parameter.)
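To get a feel for what 16-bit or 8-bit packing means for a field, the following Python sketch estimates the spacing between representable values. It is an approximation that assumes the packed values are spread evenly between the field minimum and maximum; actual GRIB simple packing also applies binary and decimal scale factors, so the real step can be somewhat larger. The function name and the example ranges are hypothetical.

```python
def approx_packing_step(field_min, field_max, bits_per_value):
    """Approximate spacing between representable values after packing."""
    levels = 2 ** bits_per_value - 1
    return (field_max - field_min) / levels


# Illustrative ranges only, not taken from the ECO1280 files.
print(approx_packing_step(180.0, 320.0, 16))   # temperature in K
print(approx_packing_step(0.0, 1.0, 8))        # cloud cover as a fraction
```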
A specific file name is gxuz_ml_spec_vo_d_526.grb.
A typical file name is of the form expName_collection_hour.grb. Here _collection_ is of the form _levType_dataRepresentation_. The meanings of the tokens in the ECO1280 filenames are:
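For scripting, the naming convention can be handled with a small parser such as the Python sketch below. It assumes that the experiment name contains no underscore and that the collection is one of the names listed in this FAQ; the function name is hypothetical.

```python
# Collection names taken from this FAQ.
COLLECTIONS = (
    "ml_spec", "ml_spec_vo_d", "lnsp_spec",
    "ml_reducgg", "surf_reducgg", "pl_regll",
)


def parse_eco1280_name(filename):
    """Split 'expName_collection_hour.grb' into its three tokens."""
    stem, ext = filename.rsplit(".", 1)
    if ext != "grb":
        raise ValueError(f"unexpected extension in {filename!r}")
    # Assumption: the experiment name is the text before the first underscore.
    exp_name, rest = stem.split("_", 1)
    # The forecast hour is the text after the last underscore.
    collection, hour = rest.rsplit("_", 1)
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection {collection!r}")
    return {"exp_name": exp_name, "collection": collection, "hour": int(hour)}


print(parse_eco1280_name("gxuz_ml_spec_vo_d_526.grb"))
```

Splitting from both ends keeps collection names that themselves contain underscores, such as ml_spec_vo_d, intact.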
That depends. First, estimate how big your download will be. See the FAQ “How big is the ECO1280?” When we downloaded the ECO1280 from ECMWF, we had the best performance downloading from the ECMWF operational dissemination.ecmwf.int ftp server using lftp, which essentially runs multiple parallel ftp sessions. A typical lftp session would be:
Transfer rates on the order of 1 TB/day across the Atlantic are achievable and you should be able to exceed that within the US high-speed networks. For example, lftp transfers between NOAA/Boulder-DMZ and the Theia supercomputer achieved transfer rates of 3.5 TB/day.
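A rough planning estimate can be scripted as in the Python sketch below. It takes the approximate per-file sizes quoted for the native collections in this FAQ, assumes they are in gigabytes, assumes a 30-day month, and ignores any latitude-longitude or pressure-level versions you may also want. The function names are hypothetical.

```python
# Approximate size of one file per collection, in GB (assumed unit).
SIZES_GB = {
    "ml_spec": 2.3,
    "ml_spec_vo_d": 0.9,
    "lnsp_spec": 0.0033,
    "ml_reducgg": 11.7,
    "surf_reducgg": 1.7,
}


def monthly_volume_tb(days=30, interval_h=3, sizes_gb=SIZES_GB):
    """Approximate volume (TB) of one month of the selected collections."""
    n_times = days * 24 // interval_h
    return n_times * sum(sizes_gb.values()) / 1000.0


def transfer_days(volume_tb, rate_tb_per_day=1.0):
    """Days needed to move volume_tb at a sustained transfer rate."""
    return volume_tb / rate_tb_per_day


volume = monthly_volume_tb()
print(round(volume, 2), "TB per month at 3-h archiving")
print(round(transfer_days(volume, 1.0), 1), "days at 1 TB/day")
print(round(transfer_days(volume, 3.5), 1), "days at 3.5 TB/day")
```

Passing a smaller dictionary (for example only ml_reducgg and surf_reducgg) shows how much is saved by downloading just the collections an experiment needs.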
The following data files are in the Constants directory on the FTP server:
The Earth in IFS is a sphere of radius 6,371,229 m, and gravity is a constant, g = 9.80665 m/s².
It is best to always interpolate spectral fields to the model Gaussian grid and then interpolate to the desired latitude-longitude grid. The easiest way to do this is to download a Dockerfile from https://github.com/CSU-CIRA/ECO1280_environment that will do the work of obtaining and installing the software and required libraries into a Docker container. Otherwise, to manually install the software on your computer follow the instructions below.
Scripts that can be downloaded here make use of the EMOSLIB ‘int’ tool to convert spectral and Gaussian grids to latitude-longitude. Therefore you will need to download and install EMOSLIB, ecCodes, and FFTW from:
Additional information on ecCodes may be found here:
Both FFTW and ecCodes must be installed before installing EMOSLIB. We found it useful to add ‘>& log-xxx.txt &’ where xxx is the name of the build step (and maybe an attempt number) to the end of each command in order to record the STDOUT and STDERR for troubleshooting any build issues. This command syntax should work for both the C Shell and Bourne Shell families. Some variations on the standard installs are required for using these packages with the ECO1280. In our suggestions below you will see the following syntax: ‘>&’ to capture the command output; ‘&’ at the end of each command line to use batch execution; and ‘\’ at the end of lines to indicate that the command continues on the next line. If you do use batch, do not submit the next command in a sequence until the current command is completed.
You’ll also need to have gcc, cmake, NetCDF, and NumPy installed on your computer to build these libraries. The three basic steps for each library are:
Be sure to have an installation location in mind (<install_dir> below)
FFTW must be installed as a shared library rather than a static library and needs to be built twice, once in single precision and once in double precision. The default commands along with recommended variations are shown below. The example command is for the single-precision build; remove --enable-single for the double-precision build.
The recommended steps needed to build FFTW are:
ecCodes is installed as follows:
EMOSLIB is installed as follows:
Specify the location of both the ecCodes and FFTW libraries and turn on ENABLE_INSTALL_TOOLS (passed to cmake as -DENABLE_INSTALL_TOOLS=ON), which is needed to include the ‘int’ tool in the build.
In case of problems, enable the debug option when making the ecCodes or EMOSLIB library by adding the following to the cmake command:
A collection of bash shell scripts is provided that can be used to create the regular latitude-longitude data using the EMOSLIB ‘int’ tool. These scripts are in the file ECO1280_int_scripts. (This file is also stored on the sftp site in subdirectory ‘Info’.) Of the scripts provided the main script is ‘convert2ll’. This script reads filenames one by one from STDIN and converts each input file to a regular latitude-longitude NetCDF or GRIB file. Since each file can be converted independently and since ‘int’ uses only a single processor, you could queue multiple ‘convert2ll’ jobs at once. The command line arguments to ‘convert2ll’ are:
Before using convert2ll you must set three environment variables using the following commands or something similar:
Here we assume that ‘int’ and ‘grib_to_netcdf’ are in the search path. Otherwise set the environment variables to full filenames. The scripts assume the input file names follow the file naming convention for the ECO1280 described previously. Output file names are constructed from the output_directory_name (from option -O), the input file basename concatenated with ‘_2regll_dn’ (with n from option -D), and the extension ‘nc’ or ‘grb’ (depending on option -f).
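The output naming rule can be mirrored in Python when you need to predict or check the names of converted files. This sketch is not part of the provided scripts. It assumes that "basename" means the input file name without its directory and without the .grb extension, and that n is the value passed to option -D; the function name and arguments are hypothetical.

```python
from pathlib import Path


def regll_output_name(output_dir, input_file, d_value, extension):
    """Build the expected convert2ll output path for one input file."""
    if extension not in ("nc", "grb"):
        raise ValueError("extension must be 'nc' or 'grb'")
    # Assumption: the basename excludes the directory and the '.grb' suffix.
    stem = Path(input_file).stem
    return Path(output_dir) / f"{stem}_2regll_d{d_value}.{extension}"


print(regll_output_name("regll", "/data/gxuz_ml_reducgg_526.grb", 1, "nc"))
```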
Note that CF_T1279_R0050000 (or CF_T1279_R0100000) is a Legendre coefficients cache file and is created by ‘int’ if it is not found in the current directory. When this occurs, an error is reported, but it is really a warning. When this file is present it saves some time, but it is large and can be safely deleted when you are done using ‘int’.
To use the ‘int’ or ‘grib_to_netcdf’ tools directly on the command line or in your own scripts use the same options found in the commands in the scripts ‘int_gg2ll’, ‘int_sc2gg’, ‘int_sc2gg_vod’, and ‘ll_grb2nc’.
You can read the whole thing at:
In general, grid to grid interpolation performs bilinear interpolation, generating each point of the output grid from its four neighbouring points in the input grid with the following exceptions:
The truncation can be controlled using the truncation option in INTIN.
In EMOSLIB, the –INTOUT:accuracy option controls only the packing of the resulting field values (the encoding to the GRIB file); all operations are carried out in double precision.
When this happens, the size of the message does not fit in the variable allocated in the GRIB1 message (Octets 5-7 of Section 0). (This happens sometimes, but not always, for 1/10° resolution latitude-longitude grids, since the GRIB data compression results in Section 4s of different sizes depending on the particular field.) ECMWF has implemented a workaround, but other GRIB decoder software may not work properly since the ECMWF software produces nonstandard GRIB1 messages.
Further, for 1/16° grids, the grid increment cannot be encoded because it requires more than the milli-degree precision allowed by GRIB1.
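Both GRIB1 limits can be checked with simple arithmetic, as in the Python sketch below. The three length octets give a maximum of 2^24 − 1 bytes per message, and a grid increment is encodable only if it is a whole number of millidegrees. The data-size estimate is a simplification that counts only the packed values and ignores the other GRIB sections; the grid dimensions and bit depths in the example are illustrative, and the function names are hypothetical.

```python
import math
from fractions import Fraction

# Octets 5-7 of Section 0 hold the total length: three octets.
GRIB1_MAX_MESSAGE_BYTES = 2 ** 24 - 1


def millidegree_encodable(increment_deg):
    """True if the increment is a whole number of millidegrees."""
    return (Fraction(increment_deg) * 1000).denominator == 1


def packed_data_bytes(n_lon, n_lat, bits_per_value):
    """Approximate size of the packed data values alone."""
    return math.ceil(n_lon * n_lat * bits_per_value / 8)


print(millidegree_encodable(Fraction(1, 10)))   # 100 millidegrees
print(millidegree_encodable(Fraction(1, 16)))   # 62.5 millidegrees

# Global 0.1-degree grid including both poles: 3600 x 1801 points.
for bits in (16, 24):
    size = packed_data_bytes(3600, 1801, bits)
    print(bits, size, size > GRIB1_MAX_MESSAGE_BYTES)
```

Exact fractions are used for the increments because a floating-point 0.1 is not exactly one tenth.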
Possible workarounds: use GRIB2, use only the ECMWF ecCodes GRIB decoder software, split messages, or convert to NetCDF using the ecCodes tool grib_to_netcdf. For details, see section “GRIB edition 1 message size” on the following website: https://software.ecmwf.int/wiki/display/FCST/Detailed+information+of+implementation+of+IFS+cycle+41r2 And “What about regular latitude-longitude grids?” in the following presentation:
Usually, it makes sense to interpolate ECO1280 vertical profiles horizontally to your observation locations at two surrounding time levels, and then interpolate linearly in time, and finally in the vertical.
In the horizontal, bilinear interpolation should be adequate for most purposes. However, near coastlines, beware: Some quantities are not smooth across a coastline. Also, for convective hydrometeors nearest neighbor may be more appropriate since convection is a grid point phenomenon. Here’s an explanation why:
For model variables defined in spectral space, in theory, the spectral representation can be evaluated on any new grid or, indeed, at any set of locations we like. We have obtained best results with the ‘int’ tool by a two-step process: (i) transforming spectral T1279 to the O1280 grid; and (ii) then interpolating to a regular latitude-longitude grid. In this case, when interpolating the grid variables it makes sense to consider the grid locations to be cell centers. If the new grid is of a high enough resolution (we have used 0.1° × 0.1° latitude-longitude grids), bilinear horizontal interpolation is appropriate.
The ECO1280 archive includes geopotential at the full levels, vertical velocity at the full levels, etc. For most purposes interpolating these quantities to a new grid should be fine, but for delicate computations it may be more consistent to interpolate the model prognostic quantities and then apply the model diagnostics at the target locations.
Model variables defined on the Gaussian grid are another matter. The physics grid calculations assume no variation in the horizontal, i.e., each Gaussian grid point is the center of an infinite homogeneous domain for the calculation of convection, radiation, etc. Therefore, a first order horizontal interpolation would be nearest neighbor. This is done for hydrometeors by Geer et al. (2010b). For surface quantities, nearest neighbor of the same surface type is more appropriate. In addition, for surface quantities, adjustments for differences in topography between the target and nearest neighbor should be considered (e.g., for surface pressure, surface temperature, precipitation, etc.).
In the vertical, generally interpolate in geopotential height or log(pressure). Near the surface differences between geometric height and geopotential height are negligible.
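The time and vertical steps of this sequence can be sketched with NumPy as follows. The sketch assumes the profiles have already been interpolated horizontally to the observation location at the two surrounding archive times. The function names and the sample numbers are hypothetical.

```python
import numpy as np


def interp_time(profile_t0, profile_t1, t0, t1, t):
    """Linear interpolation in time between two profiles on the same levels."""
    w = (t - t0) / (t1 - t0)
    return (1.0 - w) * np.asarray(profile_t0) + w * np.asarray(profile_t1)


def interp_log_pressure(values, pressures, target_pressure):
    """Interpolate a profile to target_pressure, linearly in log(pressure)."""
    logp = np.log(np.asarray(pressures, dtype=float))
    order = np.argsort(logp)          # np.interp needs increasing abscissae
    vals = np.asarray(values, dtype=float)[order]
    return np.interp(np.log(target_pressure), logp[order], vals)


# Made-up temperature profiles (K) at two archive times, 3 h apart.
pressures = [20000.0, 50000.0, 100000.0]            # Pa
temp_t0 = [218.0, 249.0, 279.0]
temp_t1 = [222.0, 251.0, 281.0]

# The pressure profile also changes with time and should be interpolated
# the same way; it is held fixed here to keep the example short.
temp = interp_time(temp_t0, temp_t1, t0=0.0, t1=3.0, t=1.5)
print(temp)
print(interp_log_pressure(temp, pressures, 70000.0))
```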
No. As a convenience, for the purpose of forecast verification, the ECO1280 distribution includes the standard variables interpolated to 1/2° latitude-longitude grids at the following mandatory and other selected pressure levels (hPa): 1000, 925, 850, 700, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10, 7, 5, 3, 2, 1. One file is about 142 MB. The variables in collection pl_regll are
Once you have vertically interpolated the NR, you have essentially created “perfect” conventional observations of wind, temperature, and/or humidity (Boukabara et al. 2018). You now need to add random and systematic errors (Errico et al. 2013). This can be simple or complex depending on how realistic these errors must be for the requirements of your OSSE. For other data types, such as radiance or radio occultation observations, simulating observations will depend on your forward operators. If designed flexibly, the forward operator may be able to work directly with the ECMWF model structure. Otherwise, vertically interpolating the ECO1280 to the favored vertical coordinate system of the forward operator will be necessary.
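At the simple end of that range, errors can be added as a constant bias plus uncorrelated Gaussian noise. The NumPy sketch below shows only this simplest case; it is not a method prescribed by this FAQ, and realistic error models may need correlated or state-dependent errors. The function name and the numbers are hypothetical.

```python
import numpy as np


def add_observation_errors(perfect_obs, sigma, bias=0.0, seed=None):
    """Add a systematic bias and uncorrelated Gaussian random errors."""
    rng = np.random.default_rng(seed)
    perfect_obs = np.asarray(perfect_obs, dtype=float)
    noise = rng.normal(loc=0.0, scale=sigma, size=perfect_obs.shape)
    return perfect_obs + bias + noise


# Made-up "perfect" temperatures (K) sampled from the NR.
perfect = [250.0, 265.0, 280.0]
simulated = add_observation_errors(perfect, sigma=0.8, bias=0.2, seed=42)
print(simulated)
```

Fixing the seed makes a simulated observation set reproducible, which helps when the same observations must be reused across several experiments.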
What about observed winds near the surface?
Satellite winds over the ocean are reported as equivalent 10 m neutral stability winds. These include ASCAT, QuikSCAT, SSM/I, and CYGNSS. These are a special case where you should use the ECO1280 10 m neutral winds u- and v-components (228131, 228132) directly.
Other near surface wind observations require special treatment. Winds near the surface follow a log profile, but there are stability effects. For winds above 10 m, it may be sufficient to interpolate log(wind speed) in height. For winds below 10 m, it may be sufficient to adjust the height of the ECO1280 10 m wind components to the height of the observed wind. Here are some methods:
For observations of surface winds that do not include the height, you should use the height, possibly randomly perturbed, assumed in your data assimilation procedures.
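One simple way to adjust the 10 m wind components to another height is a neutral logarithmic profile, sketched below in Python. This ignores the stability effects mentioned above, and the roughness length z0 is a placeholder value that should be replaced by one appropriate to the surface. The function name is hypothetical.

```python
import math


def adjust_wind_height(u10, v10, z_obs, z0=1.0e-4):
    """Scale 10 m wind components to height z_obs with a neutral log profile.

    z0 is the roughness length in metres (placeholder default).
    Wind direction is left unchanged.
    """
    if z_obs <= z0:
        raise ValueError("z_obs must be above the roughness length")
    factor = math.log(z_obs / z0) / math.log(10.0 / z0)
    return u10 * factor, v10 * factor


print(adjust_wind_height(8.0, -3.0, 10.0))   # unchanged at 10 m
print(adjust_wind_height(8.0, -3.0, 4.0))    # weaker wind at 4 m
```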