5. Analysis of output

5.1. Output file structure

The output is written to three files: output_aux.hdf5, output_indata.hdf5, and output_ray.hdf5. This is a big departure from RH, which contained several more output files. In particular, RH 1.5D will not write all the information that was written by RH, due to the sheer size it would take for large 3D simulations. The files are written in the machine-independent, self-describing HDF5 format. The contents of the files are organised in groups, variables, and attributes. Groups and variables can be imagined as directories and files in a filesystem. Inside groups, different variables and dimensions can be organised. The content of the output files can vary: some runs will have more detailed information and therefore more variables written to the files.

HDF5 is an open, platform-independent format, and therefore interfaces to many programming languages are available. The main interface libraries are available in C, C++, Fortran, and Java. But there are also interfaces for Python (h5py), Julia, IDL (from version 6.2), MATLAB , Octave, Perl, and R.

The RH 1.5D output format is standard HDF5 but it is also compatible with NetCDF 4 readers: in most cases one needs to specify only the variable or group name to read the data. The HDF5 and NetCDF libraries provide useful command line tools, which can be used to gather information about the RH 1.5D files or extract data. Additionally, there is a more complete set of tools written in Python to read and analyse these files.

Warning

Because of the limitations of different languages, not all interfaces support all HDF5 features. Some libraries (e.g. IDL or plain h5py in Python) will not detect missing data in arrays (written with the fill value). In such cases, reading variables with missing data (see Output file structure), the data are read with no warning or indication of those that have special fill values.

The structure of the three output files is given below.

Note

When a column fails to converge, output for that column is not written. This means that the variables that depend on (nx, ny) will have some values missing. HDF5 marks these values as missing data and uses a fill value (of 9.9692e+36). When the 15D_DEPTH_ZCUT option is used, not all heights will be used in the calculation. The code does not read the skipped parts of the atmosphere. When writing such variables of nz, only the points that were used are written to the file, and the rest will be marked as missing data (typically the z cut height varies with the column).

5.1.1. output_aux.hdf5

This file contains the level populations and radiative rates. For each active atom or molecule, it contains different groups called atom_XX or molecule_XX, where XX is the identifier for the species (e.g. MG, CO).

Note

The atmosphere dimensions on many of the output files are not necessarily the same as in the atmosphere file. They depend on the number of columns calculated, which are a function of X/Y_START/END/STEP.

It has the following global attributes:

atmosID Identifier for the atmosphere file.
rev_id Revision identifier.
nx Number of points in x dimension
ny Number of points in y dimension
nz Number of points in height dimension

Inside each of the atom/molecule groups, the following dimensions can exist:

Name Description
x Horizontal x dimension.
y Horizontal y dimension.
height Vertical dimension.
level Number of atomic levels.
line Number of atomic transitions
continuum Number of bound-free transitions.
vibration_level Number of molecule vibration levels.
molecular_line Number of molecular lines.
rotational_state Number of rotational states.

The atom groups can contain the following optional variables:

Name Dimensions Description
populations (level, x, y, height) Atomic populations.
populations_LTE (level, x, y, height) Atomic LTE populations.
Rij_line (line, x, y, height) Radiative rates out of the line.
Rji_line (line, x, y, height) Radiative rates into the line.
Rij_continuum (continuum, x, y, height) Radiative rates out of the bf transition.
Rji_continuum (continuum, x, y, height) Radiative rates into the bf transition.

The molecule groups can contain the following optional variables:

Name Dimensions Description
populations (vibration_level, x, y, height) Molecular populations.
populations_LTE (vibration_level, x, y, height) Molecular LTE populations.

All units are SI.

Note

In older versions it was possible to specify the keyword 15D_WRITE_EXTRA and get additional output written to output_aux.hdf5 (e.g. a new opacity group and more rates). While the procedures are still in writeAux_p.c, the functionality is deprecated because other changes in the code were not compatible with this way of writing the output. It is possible that this functionality will return at a later version.

5.1.2. output_indata.hdf5

This file contains data and metadata related to the run. It contains three groups: input (mostly settings from keyword.input), atmos (atmospheric variables), and mpi (several variables relating to the run).

It has the following global attributes:

atmosID Identifier for the atmosphere file.
rev_id Revision identifier.
nx Number of points in x dimension
ny Number of points in y dimension
nz Number of points in height dimension

The input group contains all the input files (except atmosphere and molecular data), and a few attributes that are options from keyword.input. It contains the following string variables:

Variable Description
atom_groups Array with names of atom groups. Other is the same as atom order in atoms.input
atoms_file_contents Contents of atoms.input saved into a string
keyword_file_contents Contents of keyword.input saved into a string
kurucz_file contents Contents of kurucz.input saved into a string, only if used

The input group also has other groups inside. If Kurucz line lists are used, it contains groups called Kurucz_line_file0, …, Kurucz_line_fileN, where N-1 is the total number of line list files. The other groups are all atom files (PASSIVE and ACTIVE), and they take the names of atom_XX, where XX is the element name (for a list of these, see the variable atom_groups above). Inside all of these groups (Kurucz and atom) there is one variable, called file_contents, which contains the file saved intro a string and an attribute, called file_name, which contains the file name and path. These input options and files are read instead of the original files when doing a rerun.

The atmos groups contains the dimensions x, y, height, element and ray. It also contains the following variables:

Name Dimensions Units Description
temperature (x, y, height) K Temperatures
velocity_z (x, y, height) m s-1 Vertical velocities
electron_density (x, y, height) m s-3 Electron densities
height_scale (x, y, height) m Height scale used. Can be different for every column when depth refine is used.
element_weight (element) a.m.u. Atomic weights
element_abundance (element)   Element abundances relative to hydrogen.
muz (ray)   mu values for each ray.
muz (ray)   mu weights for each ray.
x (x) m Spatial coordinates along x axis.
y (y) m Spatial coordinates along y axis.

Note

When 15D_DEPTH_REFINE is used, each column will have a different (optimised) height scale, but they all have the same number of depth points (nz). In these cases, it is very important to save the height variable because otherwise one does not know how to relate the height relations of quantities from different columns.

The atmos group also contains the following attributes:

moving Unsigned int, 1 if velocity fields present.
stokes Unsigned int, 1 if stokes output present.

The mpi group contains the dimensions x, y, and iteration (maximum number of iterations).

Warning

iteration is currently hardcoded in the code to a maximum of 1500. If you try to run more than 1500 iterations, there will be an error writing to the output.

The mpi group also contains several variables:

Name Dimensions Description
xnum (x) Indices of x positions calculated.
xnum (x) Indices of x positions calculated.
task_map (x, y) Maps which process ran which column.
task_map_number (x, y) Maps which task number each column was.
iterations (x, y) Number of iterations used for each column.
convergence (x, y) Indicates if each column converged or not. Possible values are 1 (converged), 0 (non converged), or -1 (crashed).
delta_max (x, y) Final value for delta_max when iteration finished.
delta_max_history (x, y, iteration) Evolution of delta_max
z_cut (x, y) Height index of the temperature cut.

The mpi group also contains the following attributes: x_start, x_end, x_step, y_start, y_end, and y_step, all of which are options from keyword.input.

5.1.3. output_ray.hdf5

This file contains the synthetic spectra and can also contain extra information such as opacities and the source function. It contains only the root group. Its dimensions are x, y, wavelength, and eventually wavelength_selected and height. The latter two are only present when ray.input specifies more than 0 wavelengths for detailed output, and it matches Nsource, the number of those wavelengths entered in ray.input.

It can contain the following variables:

Name Dimensions Units Description
wavelength (wavelength) nm Wavelength scale.
intensity (x, y, wavelength) W m-2 Hz-1 sr-1 Synthetic disk-centre intensity (Stokes I).
stokes_Q (x, y, wavelength) W m-2 Hz-1 sr-1 Stokes Q. Optional.
stokes_U (x, y, wavelength) W m-2 Hz-1 sr-1 Stokes U. Optional.
stokes_V (x, y, wavelength) W m-2 Hz-1 sr-1 Stokes V. Optional.
tau_one_height (x, y, wavelength) m Height where optical depth reaches unity, for each column. Optional.
wavelength_selected (wavelength_selected)   Wavelength scale for the detailed output variables below. Optional.
wavelength_indices (wavelength_selected)   Indices of wavelengths selected for variables below. Optional.
chi (x, y, height, wavelength_selected) m-1 Total opacity (line and continuum). Optional.
source_function (x, y, height, wavelength_selected) W m-2 Hz-1 sr-1 Total opacity (line and continuum). Optional.
Jlambda (x, y, height, wavelength_selected) W m-2 Hz-1 sr-1 Angle-averaged radiation field. Optional.
scattering (x, y, height, wavelength_selected)   Scattering term multiplied by Jlambda. Optional.

The wavelength is in nm, air or vacuum units, depending if VACUUM_TO_AIR is TRUE or FALSE (in keyword.input). chi is in m-1and tau_one_height in m.

Despite internally being calculated in double precision, all the output (except the wavelength scale) is written in single precision to save disk space.

The full Stokes vector is only written when in keyword.input STOKES_MODE is not NO_STOKES and the STOKES_INPUT is set.

The chi, source_function, and Jlambda variables depend on the 3D grid and on wavelength. Therefore, for even moderate grid sizes they can take huge amounts of space. If nx = ny = nz = 512 and wavelength_selected = 200, each of these variables will need 100Gb of disk space. For a simulation with a cubic grid of 10243 points and saving the full output for 1000 wavelength points, output_ray.hdf5 will occupy a whopping 12Tb per snapshot of disk space. To avoid such problems, these large arrays are only written when ray.input contains Nsource > 0, and for the wavelengths selected.

The output_ray.hdf5 file contains the following global attributes:

atmosID Identifier for the atmosphere file
snapshot_number Number of simulation snapshot (from atmosphere file)
rev_id Revision identifier
nx Number of points in x dimension
ny Number of points in y dimension
nz Number of points in height dimension
nwave Number of wavelength points
wavelength_selected Number of wavelength points selected for detailed output
creation_time Local time when file was created

5.2. Command line tools

Two useful command line tools that come with HDF5 are h5dump and h5repack.

h5dump can be used with the -H option to look at the header of a file: see the dimensions, variables, groups. It can also be used to print a text version of any variable in an HDF5 file (e.g. this can be redirected to a text file). When printing a variable (dataset in HDF5) one uses the option -d variable, and the resulting output is the same as in the -H mode, with the variable printed at the end. The NetCDF ncdump program offers an even clearer look into the file (e.g. used with the -h option to print out the header).

The h5repack program can be used to copy and modify the parameters of HDF5 files. It can convert the files between different format versions, compress variables, etc. Of particular importance is the option for rechunking a file. Chunking in HDF5 files can be used to improve performance by changing the disk structures to improve different read patterns. It is analogous to fully or partially transposing the variables along certain dimensions.

See also

h5dump guide
Detailed information about h5dump.
h5repack guide
Detailed information about h5repack.
Chunking in HDF5
Description on the advantages of chunking.

5.3. Reading output in Python

The helita package has a complete python interface to read the output, input, and visualise files from RH 1.5D. The `helita tools are described in detail in section helita interface.

If `helita is not available, the easiest and fastest way to read the RH 1.5D output (or input) files in Python is via the xarray package. xarray can load the output files as a dataset directly, but in the case of the output_aux.hdf5 and output_indata.hdf5 one needs to specify which group to read (see above).

Here is a quick example on how to read some output from RH 1.5D with xarray:

>>> import xarray
>>> ray = xarray.open_dataset("output_ray.hdf5")
>>> ray
<xarray.Dataset>
Dimensions:              (height: 82, wavelength: 902, wavelength_selected: 10, x: 1, y: 1)
Coordinates:
  * wavelength           (wavelength) float64 28.0 31.4 32.8 33.7 34.3 35.3 ...
  * wavelength_selected  (wavelength_selected) float64 85.1 276.4 278.5
  * x                    (x) float64 0.0
  * y                    (y) float64 0.0
Dimensions without coordinates: height
Data variables:
    Jlambda              (x, y, height, wavelength_selected) float64 ...
    chi                  (x, y, height, wavelength_selected) float64 ...
    intensity            (x, y, wavelength) float64 ...
    scattering           (x, y, height, wavelength_selected) float64 ...
    source_function      (x, y, height, wavelength_selected) float64 ...
    wavelength_indices   (wavelength_selected) int32 ...
Attributes:
    atmosID:              FALC_82_5x5.hdf5 (Wed Jan 10 15:29:28 2018)
    snapshot_number:      0
    rev_id:               001d537  Tiago Pereira  2018-01-10 12:34:07 +0100
    nx:                   1
    ny:                   1
    nz:                   82
    nwave:                902
    wavelength_selected:  3
    creation_time:        2018-01-10T16:16:42+0100
>>> aux = xarray.open_dataset("output_aux.hdf5", group="atom_MG")
>>> aux
<xarray.Dataset>
Dimensions:          (continuum: 10, height: 82, level: 11, line: 15, x: 1, y: 1)
Coordinates:
  * x                (x) float64 0.0
  * y                (y) float64 0.0
Dimensions without coordinates: continuum, height, level, line
Data variables:
    Rij_continuum    (continuum, x, y, height) float64 ...
    Rij_line         (line, x, y, height) float64 ...
    Rji_continuum    (continuum, x, y, height) float64 ...
    Rji_line         (line, x, y, height) float64 ...
    populations      (level, x, y, height) float64 ...
    populations_LTE  (level, x, y, height) float64 ...
Attributes:
    nlevel:      11
    nline:       15
    ncontinuum:  10

5.4. Reading output in IDL

There are no specific IDL routines for reading the output from RH 1.5D. However, there is a utility function that can be used to variables from HDF5/netCDF4 files, under the idl/ directory in a file named read_ncdf_var.pro. The function read_ncdf_var() can be used to read variables from an HDF5 or netCDF4 file, e.g.:

IDL> data = read_ncdf_var("output_ray.hdf5", "intensity")
IDL> help, data
DATA            FLOAT     = Array[902, 512, 512]
IDL> pops = read_ncdf_var("output_aux.hdf5", "populations", groupname="atom_CA")
IDL> help, pops
POPS            FLOAT     = Array[400, 512, 512, 5]

Note

The IDL analysis suite of RH does not work with RH 1.5D.