Batch Tomogram Reconstruction with Batchruntomo in IMOD 4.9
University of Colorado, Boulder
Introduction
Setting General Batch Processing Parameters
The Stack Table
Setting Parameters for the Data Sets
Running the Data Sets
Introduction
Etomo provides an interface for reconstructing multiple tomograms
automatically using
Batchruntomo. The data sets
should be sufficiently similar so that, for the most part, the same
parameters and procedures can
be applied to all of them. The
interface allows you to set a number of parameters, but in each case a
different value can be used for an individual data set. The parameters that we
think are most likely to vary are included in a table of data sets. For the
other parameters, there is one tab of the interface to set the values to
apply in general, which are referred to as global values. If necessary, you
can open a copy of this screen for an individual data set and set
different values there.
You may want to go through the Example of
Batch Reconstruction either before or after reading this document.
For simplicity, this interface presents a selected subset of the many
parameters that can
be set from the regular reconstruction interface. For now, we are relying
on templates as a mechanism for controlling the values of parameters not
exposed in the interface. Templates and the current editor for saving them
are described in Using Etomo. In
brief, they are text files with the extension ".adoc" containing name-value
pairs called directives, whose format is described in the
Batchruntomo man page. The
available directives are listed in the directive table. If you want to make a template
for personal use and do so by hand, put it in the directory
.etomotemplate under your home directory (this is where Etomo's template
editor places user templates by default). Using Etomo describes what to do with templates
for general use. The plan is for there to be an Advanced button
that will open a full directive editor with all possible choices.
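As a rough illustration of the template format, the sketch below parses directive text into name-value pairs. It assumes one "name = value" directive per line and that lines starting with "#" are comments; the Batchruntomo man page remains the authority on the format. The helper name parse_directives is hypothetical, and the two sample directives are ones quoted later in this document.

```python
def parse_directives(text):
    """Parse 'name = value' lines into a dict (hypothetical helper).

    Assumes one directive per line and '#' comment lines; check the
    Batchruntomo man page for the authoritative format.
    """
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a name = value pair: {raw!r}")
        directives[name.strip()] = value.strip()
    return directives


sample = """\
# Directives quoted elsewhere in this document
setupset.copyarg.binning = 2
comparam.prenewst.newstack.AntialiasFilter = -1
"""
parsed = parse_directives(sample)
for name, value in parsed.items():
    print(f"{name} -> {value}")
```

A check like this can be handy before copying a hand-written ".adoc" file into .etomotemplate, since a line without "=" is reported immediately.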
The interface is organized into four tabs that would generally be
visited in sequence from left to right. However, they can all be accessed
and changed at any time. Also note that you can close the interface and
reopen it to resume working on a project; you should find all of the
settings as you left them, although a few may no longer be changed. The
project file has the extension ".ebt".
Setting General Batch Processing Parameters
The Batch Setup tab has items that should be filled in first.
- Moving stacks to directories. There are two different
options for moving data sets from
their current locations into another location where each will be given
its own directory for processing. If each data set is already in its own
directory for processing, then leave Stacks are
already in dataset directories selected. With either of the other two
options, when
Batchruntomo starts processing
a data set, it creates a directory for it under the indicated location,
with the root name of the data set, and moves the
raw stack(s) there, as well as any associated files (".mdoc", ".rawtlt",
and ".log"). With the option Move all stacks to dataset directories
under, all data set directories will be created in the same
location, which must already exist. After you turn on this radio
button, you can select this
location with the directory chooser window, which will allow you to
create it if necessary. This option is ideal for
handling stacks that have been transferred to a single location after
being acquired, but in fact the data sets can be in multiple locations.
With the option Move stacks to dataset directories under their
current locations, the directories will be created right where the
stacks are and the stacks will be moved down one level for processing.
This option is convenient if stacks are already in the parent directory of
where you want them processed, and it is essential if you have already
sorted stacks into several different parent directories (e.g., one for
each specimen type) and want to maintain that tree structure.
- Starting directive file. You can initialize the global
parameters for these data sets with a directive file from previous batch
processing. This provides an alternative to making a template file from
a similar data set. If you later decide that you do not want to use any
starting file, press the Clear button.
- Templates. Templates can be selected just as in the
reconstruction setup page. If you do not have a starting directive file,
these selections are initialized with the defaults that can be chosen in
the Options-Settings dialog. With a starting directive file, the
template entries in that file, if any, are used to initialize these
entries. Select one of the stock system templates in order to activate
the Sobel filter centering during bead tracking with an appropriate
smoothing parameter. The template entries are shown in blue because
parameters in the Dataset Values that are derived from a
template are also shown in blue.
- Project Name and Location. The project files consist of the
command file for running
Batchruntomo, the Etomo project
file, and a directive file with global settings. They will be named
with the root name shown in the box. The default name was
designed to be reasonably
unique as well as fairly interpretable (the 6 digits after the date
are hours, minutes, seconds without any colons), but you are free to
change it. The Location is initialized to the directory from
which you started Etomo, but another location can be selected.
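The three choices for moving stacks can be summarized with a small sketch of where each data set directory would end up. This is only an illustration of the description above, not Batchruntomo's own code: the option labels "in_place", "under_target", and "under_current" are hypothetical names, and the root name is taken simply as the file name without its extension, which ignores the "a"/"b" suffix of dual-axis stacks.

```python
from pathlib import PurePosixPath


def planned_dataset_dir(stack, option, target=None):
    """Return where a stack's data set directory would be located.

    option is a hypothetical label: "in_place" (stacks are already in
    data set directories), "under_target" (all directories under one
    location), or "under_current" (directories under each stack's
    current location).
    """
    stack = PurePosixPath(stack)
    root = stack.stem  # e.g. "cell1" for cell1.st (single-axis assumption)
    if option == "in_place":
        return stack.parent
    if option == "under_target":
        if target is None:
            raise ValueError("under_target needs a target location")
        return PurePosixPath(target) / root
    if option == "under_current":
        return stack.parent / root
    raise ValueError(f"unknown option: {option}")


print(planned_dataset_dir("/data/raw/cell1.st", "under_target", "/data/proc"))
print(planned_dataset_dir("/data/raw/cell1.st", "under_current"))
```

The paths used here are made-up examples; with the second option, stacks sorted into several parent directories keep that tree structure because each directory is created next to its stack.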
The Stack Table
On the Stacks tab, you add the tilt series that you want to process to a
table. When you press Add Stack(s), a file chooser will open to
allow you to select the stack files. You can select multiple files and add
them together. If you have many dual-axis data sets, you can select all of
the "a" and "b" files together, and the program will show just the "a" files
in the table. The stacks can have an extension of either ".st" or ".mrc";
the latter will be renamed to ".st" for processing.
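The way "a" and "b" files collapse into one table row can be sketched as follows. This is a simplified model of the behavior described above, under the assumption that a pair counts as dual-axis only when both stacks were selected; the function name table_rows and the file names are hypothetical.

```python
from pathlib import PurePosixPath


def table_rows(paths):
    """Group selected stack files into table rows (hypothetical sketch).

    Returns a list of (shown_name, is_dual_axis) tuples. When both the
    "a" and "b" stacks of a pair are selected, only the "a" file is shown.
    """
    stems = {PurePosixPath(p).stem for p in paths
             if PurePosixPath(p).suffix in (".st", ".mrc")}
    rows = []
    for stem in sorted(stems):
        if stem.endswith("a") and stem[:-1] + "b" in stems:
            rows.append((stem, True))    # the "a" file represents the pair
        elif stem.endswith("b") and stem[:-1] + "a" in stems:
            continue                     # the "b" file is folded into its "a" row
        else:
            rows.append((stem, False))   # no partner selected: single axis
    return rows


rows = table_rows(["cell1a.st", "cell1b.st", "cell2.mrc", "betaa.st"])
print(rows)
```

Note that "betaa.st" stays single-axis in this sketch because no matching "b" stack was selected, even though its name ends in "a".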
For your first addition to the table, the Dual
axis, Montage, and Beads on Two Surfaces checkboxes are
set based on the defaults that you have set in the Options-Settings dialog,
as modified by any templates you have chosen. The setting for
the Dual Axis box will also be modified as appropriate when both "a" and
"b" stacks are entered, or when the stack root name does not end in "a" or
"b". Further entries will inherit the settings of these three boxes from
the previous line in the table. The Copy Down button will copy these
three settings from the selected line to the one below, which is fairly
useless and should be changed to copy to all lines below. For now, the
easiest way to get these boxes set for a large number of data sets is to add
one, set the buttons, then add the rest.
The Boundary Model is used to indicate regions where the fiducial
seed model should be selected for tracking. If you have
data sets needing such models, check the box before pressing the 3dmod icon
to draw the model, so that 3dmod can be given the right filename and
location. The file is named with the data set root name plus
"_rawbound.mod" and is placed in the current location of the data set. For
a dual-axis data set, the model is transformed to be used with the second
axis. You need to draw one or more contours just on one view, the
zero-degree one if possible.
If entries are made to Exclude Views, the views will
currently be carried through into the coarse and fine aligned stacks but
skipped in tracking, alignment and reconstruction. A directive can be
supplied in a template or starting batch file to remove the excluded views.
Setting Parameters for the Data Sets
- Distortion Corrections. You can select an image distortion field
file and a magnification gradient file if you have those distortion
corrections available. If you have data from SerialEM, the binning will
be detected when the header is scanned. Otherwise, if you have data with
a binning other than 1, you would need to have a directive in a template
file (e.g., "setupset.copyarg.binning = 2").
- X-ray Removal. Removal of X-rays and other extreme artifacts
will be done if Remove X-rays is selected. The raw stack will be
"archived" automatically with
Archiveorig. If you have
large artifacts present on every section, you can select an existing
Manual replacement model to erase them. Or, you can
press Make in 3dmod and make such a model on the first data set.
The convention in these models is that object 1 should have patches in
which each pixel has a model point, object 2 should have lines to
remove, and object 3 should have patches defined by boundary contours.
In any case, the model will be copied into each of the data sets from
where it exists.
- Alignment Method and Tracking Parameters. All of the alignment
methods available in Etomo can be run with automation. With the Autoseed and
track method, the seed model for the second axis
will be made by first running
Transferfid then using
Autofidseed to add points to
the model, which is useful if there is a significant shift between the
two axes. Local tracking is done by default. If you use patch
tracking and break contours into pieces, the pieces will have the
default length that is used in Etomo based on the number of views.
With direct detectors,
especially K2 in counting mode, you may want anti-aliased reduction for
the prealigned stack, and this would also require a template entry
"comparam.prenewst.newstack.AntialiasFilter = -1". Similarly,
anti-aliased reduction for the final aligned stack would require
"comparam.newst.newstack.AntialiasFilter = -1".
- Alignment. Robust fitting is used in all cases in
Tiltalign with the default
tuning factor of 1. A template entry such as
"comparam.align.tiltalign.KFactorScaling = 0.9" could be used for more
aggressive downweighting of outlying points.
The only parameters that can be set here are
whether to use local alignments and whether to enable solving
for distortion (X-stretching). The program will not allow the latter
unless gold is actually found to be on two surfaces in reasonable
amounts. The new script
Restrictalign will be called
to reduce the alignment parameters automatically in order to maintain a
minimum and/or target ratio of measurements to unknowns. There are
directives to control that process if necessary. Other than this, the
one template entry that might be needed for alignment would be to
enable the beam tilt solution with
"comparam.align.tiltalign.BeamTiltOption = 2".
- Tomogram Positioning. Positioning can be done with
whole binned-down tomograms. For plastic sections, the thickness is chosen
automatically, and Findsection
will be used to find the surface of the section and generate a model for
Tomopitch with 5 pairs of lines.
For cryosamples, an unbinned thickness must be specified in
the Tomogram thickness field, and a complex sequence of operations
is done with Cryoposition. For
this analysis, it is essential that gold beads be removed from the volume
before detecting the structure. When patch tracking or fiducialless
alignment is done, you must indicate whether gold beads are present with
the Sample has gold beads checkbox and fill in the Bead size
field there if it is empty. The extra thickness added when running
Tomopitch can be set with a
directive "comparam.tomopitch.tomopitch.ExtraThickness". For cryo, this
usually needs to be a generous number, and Batchruntomo will set it to 25 by default.
- CTF Correction. CTF correction can be done with fitting in
Ctfplotter to all individual
images or to a series of blocks of images. For the latter,
select Autofit range and fill in entries for the range of angles
to fit and the angular step between ranges (e.g., 20 and 10 to fit
blocks of views over 20 degrees with 10 degree steps between the
blocks). You must fill in the range and step when using this option,
and the Defocus box with the expected defocus in all cases.
- Gold Erasing. Gold fiducials can be erased using a model
that is completed to have points on all views by selecting Use fiducial
model. Alternatively, all beads can be found in an appropriately
binned tomogram by selecting Find beads in 3D, in which case the
unbinned thickness must be entered in Tomogram thickness. If patch
tracking is used, there is no way to enter the needed bead size through
the interface and a size would have to be entered with a
"setupset.copyarg.gold" directive.
- Reconstruction. The tomogram can be built with
back-projection or with SIRT, or both. For SIRT, indicate a list of
iterations to leave, or more likely just a single number for the number
of iterations. There are three choices for specifying the tomogram
thickness, one by total unbinned pixels, one by binned pixels, and one to
use a calculated value plus a specified margin. The last option will use a
specified thickness as the fallback if there is no calculated thickness
available or if it is much smaller than the fallback (only 0.4 times as
big). This option is the default because it is the most
general-purpose. The
calculated value is initially based on the distance between gold on two
surfaces, but is superseded if positioning is done.
- Postprocessing. The tomogram can be reoriented with
Trimvol, and optionally converted
to bytes if Fraction of Z slices to analyze is selected. There
is a default of 0.33 if the text box is left blank. For
plastic sections, the tomogram can also be trimmed to section limits
found with Findsection by
selecting Find plastic section limits and add. An entry for the
amount to add is required; it can be either the number of binned pixels
or a fraction of the measured thickness.
- Datasets Table for Specific Values. The simple table at the
bottom has a button with which you can open a parameter value dialog for
a single data set. When you press Open, that data set will be given
a copy of the current parameters, and after that point, its values are
separate and unaffected by any further changes in the global values. You
can revert to global values in the standalone dialog, and that will
discard any special values that you set.
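The template directives mentioned in this section can be gathered in one place, as in the sketch below, which renders them as "name = value" lines. It is a collection for reference rather than a recommended template: most data sets would need only a few of these entries. The values are the ones quoted above, except for the bead size given to "setupset.copyarg.gold", which is a hypothetical placeholder because the text gives no value; ExtraThickness uses the cryo default of 25.

```python
# Directive names and values quoted in the section above.
DIRECTIVES = {
    "setupset.copyarg.binning": "2",
    "setupset.copyarg.gold": "10",  # hypothetical bead size, not from the text
    "comparam.prenewst.newstack.AntialiasFilter": "-1",
    "comparam.newst.newstack.AntialiasFilter": "-1",
    "comparam.align.tiltalign.KFactorScaling": "0.9",
    "comparam.align.tiltalign.BeamTiltOption": "2",
    "comparam.tomopitch.tomopitch.ExtraThickness": "25",
}


def render_template(directives):
    """Render directives as one 'name = value' line each."""
    return "".join(f"{name} = {value}\n" for name, value in directives.items())


template_text = render_template(DIRECTIVES)
print(template_text, end="")
```

The rendered text could be saved with the ".adoc" extension and trimmed to the entries that a particular kind of data set actually needs.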
Running the Data Sets
When you select the Run tab, you should first make selections in
the Run Actions section to indicate whether to use multiple CPUs and
one or more GPUs. If you make no selections, only a single CPU will be
used. After you select either Use multiple CPUs or Parallel
GPUs, a table appears at the top with
computing resources. If both are selected, the table will show available
CPUs on the left and available GPUs on the
right. Your selections determine what resources are used for each single
reconstruction. Reconstructions are run sequentially, not in parallel with
each other.
If you enable Email notification and enter an address,
Batchruntomo will send an email
whenever a data set is aborted and when all processing is complete. For the
email to work, you may need to define an SMTP mail server; this can be done
in the Options-Settings dialog.
The Subset of steps to run section allows you to control a stopping
or starting point for the run. If you turn on Stop after, you can
select one of the available stopping points. All data sets selected for
running (by means
of the button in the Run column of the Datasets table) will be run to
the same selected stopping point. When the run stops at such a point, you
can then turn on Start from and select a starting point. Generally
you would want to select the starting point paired with the stopping point,
but you can go back to an earlier one if desired. Ordinarily, it would not
work to
select a
starting point later than the earliest point reached by any of the data sets
included in the run, so this is not allowed by default. However, if you
have completed a step manually, such that it would not be a problem to start
at a later step, then you can select Enable starting from any step
and select any step. When starting past the Fine Alignment step, Batchruntomo no longer recomputes the fine
alignment (which involves adjusting alignment parameters as needed).
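The restriction on starting points can be expressed as a short sketch. The step names in STEPS are hypothetical stand-ins for the choices offered in the interface, and the function models only the rule stated above: by default the starting point may not be later than the earliest point reached by any data set in the run.

```python
# Hypothetical step names in processing order; the real choices are the
# ones offered by the Start from and Stop after controls.
STEPS = ["Preprocessing", "Coarse Alignment", "Fiducial Tracking",
         "Fine Alignment", "Positioning", "Reconstruction"]


def allowed_start_steps(reached, enable_any=False, steps=STEPS):
    """Return the steps that may be chosen as a starting point.

    reached maps each data set in the run to the step it reached and
    must not be empty. With enable_any (modeling "Enable starting from
    any step"), every step is allowed.
    """
    if enable_any:
        return list(steps)
    earliest = min(steps.index(step) for step in reached.values())
    return list(steps[:earliest + 1])


reached = {"cell1": "Fine Alignment", "cell2": "Coarse Alignment"}
print(allowed_start_steps(reached))
print(allowed_start_steps(reached, enable_any=True))
```

In this example the data set that stopped at the earlier step limits the choice for the whole run, which is why finishing a step manually and enabling any step can be useful.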
Press Run to start a run. During the run, you can use Kill
Process to stop processing immediately, or Pause to make it stop
after the current data set finishes or reaches the stopping point. After a
Pause or Kill, the Resume button can be used to restart the run from
where it left off. When resuming from a Kill, the data set that was killed
will be run from the beginning or from the selected starting point, not from
the step where it was killed.
Almost nothing can be changed after a Pause
or Kill: data set parameters and starting and stopping points are disabled;
data sets marked for running can be dropped out but none can be added. The
situation is more flexible when all data sets have reached a selected
stopping point; it is possible to manipulate which sets are included in
the run. However, data set parameters currently still cannot be changed
(we plan to enable a subset of parameters that will have an effect when
changed). To remove all these restrictions in either situation,
press Reset. This has several consequences: 1) data set parameters
can be edited again; 2) the Resume button is disabled and the program
forgets about what would be needed to resume; 3) all data sets selected for
running will be run from the selected starting point or the beginning, even
if they already reached that point. Thus, if you use Reset after a
Pause or Kill, you have to manually turn off the Run checkboxes for
any data sets that have already been run and that you do not wish to rerun.
Etomo saves and runs a single command file in
the project directory, named "rootname.com", where "rootname" is the project
root name from the Batch Setup tab.
During the run, a corresponding log file will be created in the project
directory and will contain all of the log
output from the run. There will be a copy of the portion of that log for
each data set in its respective directory, named "batchruntomo.log"; this can
be opened with the Open Log button in the Run table. The
full log for all runs can be opened from the menu brought up by
right-clicking over the panel. Selected extracts are shown in the Project
Log.
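For checking on a run from outside Etomo, the file names given above are enough to locate the command file and the per-data-set logs. The sketch below assumes only those two names ("rootname.com" and "batchruntomo.log"); the helper names and the example paths are hypothetical, and the name of the full project log is not assumed.

```python
from pathlib import Path


def command_file(project_dir, rootname):
    """Path of the single command file saved in the project directory."""
    return Path(project_dir) / f"{rootname}.com"


def existing_dataset_logs(dataset_dirs):
    """Map each data set directory to its batchruntomo.log, if present."""
    logs = {}
    for directory in dataset_dirs:
        log = Path(directory) / "batchruntomo.log"
        if log.is_file():
            logs[str(directory)] = log
    return logs


print(command_file("/data/proc", "batch_project"))
print(existing_dataset_logs(["/data/proc/cell1", "/data/proc/cell2"]))
```

A data set directory without a log in the result has presumably not been started yet, although the status shown in the Run table remains the reliable indicator.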
You can exit Etomo after starting a run and reopen it later. The program
will "reconnect" to the run, whether it is finished or not, and update the
status for all data sets.