Batch Tomogram Reconstruction with IMOD

Batch Tomogram Reconstruction with Batchruntomo in IMOD 5.1

University of Colorado, Boulder

Introduction
Setting General Batch Processing Parameters
The Stack Table
Setting Basic Parameters for the Data Sets
Using the Advanced Parameter Interface
Running the Data Sets
Running on a Cluster

Introduction

Etomo provides an interface for reconstructing multiple tomograms automatically using Batchruntomo. The data sets should be sufficiently similar so that, for the most part, the same parameters and procedures can be applied to all of them. The interface allows you to set a number of parameters, but in each case a different value can be used for an individual data set. The parameters that we think are most likely to vary are included in a table of data sets. For the other parameters, there is one tab of the interface to set the values to apply in general, which are referred to as global values. If necessary, you can open a copy of this screen for an individual data set and set different values there.

You may want to go through the Example of Batch Reconstruction either before or after reading this document.

For simplicity, the basic interface presents a selected subset of the many parameters that can be set from the regular reconstruction interface. However, you can now switch to an advanced interface that presents a much larger number of other parameters. Initially, we relied on templates as a mechanism for controlling the values of parameters not exposed in the interface. Templates still have an important role even with the advanced parameter screen available. Templates and the current editor for saving them are described in Using Etomo. In brief, they are text files with the extension ".adoc" containing name-value pairs called directives, whose format is described in the Batchruntomo man page. The available directives are listed in the directive table. If you want to make a template for personal use and do so by hand, put it in the directory .etomotemplate under your home directory (this is where Etomo's template editor places user templates by default). Using Etomo describes what to do with templates for general use.
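As a rough illustration of what a hand-made template contains, the following Python sketch formats a few directives as name-value lines and can write them to a template file. The two directives are ones quoted later in this document; the helper names, the "name = value" spacing, and the idea of generating the file from a script are assumptions for illustration only, and the Batchruntomo man page remains the authority on the format.

```python
from pathlib import Path


def format_directives(directives):
    """Return template text with one 'name = value' directive per line."""
    return "".join(f"{name} = {value}\n" for name, value in directives.items())


def write_template(directory, filename, directives):
    """Write the directives to directory/filename (hypothetical helper)."""
    if not filename.endswith(".adoc"):
        raise ValueError("template files use the extension .adoc")
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / filename
    path.write_text(format_directives(directives))
    return path


# Directives quoted elsewhere in this document.
example = {
    "setupset.copyarg.binning": 2,
    "comparam.prenewst.newstack.AntialiasFilter": -1,
}

if __name__ == "__main__":
    # Print the template text rather than touching the home directory.
    print(format_directives(example), end="")
```

For a personal template, the target directory would be .etomotemplate under your home directory, for example `write_template(Path.home() / ".etomotemplate", "mytemplate.adoc", example)`, where the file name is hypothetical.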

The interface is organized into four tabs that would generally be visited in sequence from left to right. However, they can all be accessed and changed at any time. Also note that you can close the interface and reopen it to resume working on a project; you should find all of the settings as you left them, although a few may no longer be changed. The project file has the extension ".ebt".

Setting General Batch Processing Parameters

The Batch Setup tab has items that should be filled in first.

Moving stacks to directories. There are two different options for moving data sets from their current locations into another location where each will be given its own directory for processing. If each data set is already in its own directory for processing, then leave Stacks are already in dataset directories selected. With either of the other two options, when Batchruntomo starts processing a data set, it creates a directory for it under the indicated location, with the root name of the data set, and moves the raw stack(s) there, as well as any associated files (".mdoc", ".rawtlt", and ".log"). With the option Move all stacks to dataset directories under, all data set directories will be created in the same location, which must already exist. After you turn on this radio button, you can select this location with the directory chooser window, which will allow you to create it if necessary. This option is ideal for handling stacks that have been transferred to a single location after being acquired, but in fact the data sets can be in multiple locations. With the option Move stacks to dataset directories under their current locations, the directories will be created right where the stacks are and the stacks will be moved down one level for processing. This option is convenient if stacks are already in the parent directory of where you want them processed, and it is essential if you have already sorted stacks into several different parent directories (e.g., one for each specimen type) and want to maintain that tree structure.
Starting directive file. You can initialize the global parameters for these data sets with a directive file from previous batch processing. This provides an alternative to making a template file from a similar data set. If you later decide that you do not want to use any starting file, press the Clear button.
Templates. Templates can be selected just as in the reconstruction setup page. If you do not have a starting directive file, these selections are initialized with the defaults that can be chosen in the Options-Settings dialog. With a starting directive file, the template entries in that file, if any, are used to initialize these entries. Select one of the stock system templates in order to activate the Sobel filter centering during bead tracking with an appropriate smoothing parameter. The template entries are shown in blue because parameters in the Dataset Values that are derived from a template are also shown in blue.
Project Name and Location. The project files consist of the command file for running Batchruntomo, the Etomo project file, and a directive file with global settings. They will be named with the root name shown in the box. The default name was designed to be reasonably unique as well as fairly interpretable (the 6 digits after the date are hours, minutes, seconds without any colons), but you are free to change it. The Location is initialized to the directory from which you started Etomo, but another location can be selected.
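The three choices for moving stacks can be summarized by where each data set ends up. The sketch below computes that directory for a single-axis stack; the option labels and the function name are hypothetical, and it takes the root name to be the file name without its extension, which ignores the "a"/"b" suffix of dual-axis stacks.

```python
from pathlib import PurePosixPath


def dataset_directory(stack_path, option, common_parent=None):
    """Return the directory in which a stack would be processed.

    option is a hypothetical label for the three radio buttons:
      "in_place"      - stacks are already in dataset directories
      "common"        - move all stacks to directories under common_parent
      "under_current" - move stacks to directories under their current locations
    """
    stack = PurePosixPath(stack_path)
    root = stack.stem  # assumed root name, e.g. "cell1" for cell1.st
    if option == "in_place":
        return stack.parent
    if option == "common":
        if common_parent is None:
            raise ValueError("a common parent directory is required")
        return PurePosixPath(common_parent) / root
    if option == "under_current":
        return stack.parent / root
    raise ValueError(f"unknown option: {option}")


if __name__ == "__main__":
    print(dataset_directory("/data/raw/cell1.st", "common", "/data/proc"))
    print(dataset_directory("/data/raw/cell1.st", "under_current"))
```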

The Stack Table

On the Stacks tab, you add the tilt series that you want to process to a table. When you press Add Stack(s), a file chooser will open to allow you to select the stack files. You can select multiple files and add them together. If you have many dual-axis data sets, you can select all of the "a" and "b" files together, and the program will show just the "a" files in the table. The stacks can have an extension of either ".st" or ".mrc"; the latter will be renamed to ".st" for processing.
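The handling of "a" and "b" files can be pictured with a small sketch. It lists one entry per data set, keeping only the "a" file of a pair, and reports the name a stack would have after the ".mrc" to ".st" renaming. This is a simplified guess at the behavior, not Etomo's code: the function names are hypothetical, and a dual-axis flag of None stands for "not determined by the file names".

```python
from pathlib import PurePosixPath


def table_entries(stack_files):
    """Return (stem, dual_axis) pairs for the stacks that would be listed.

    dual_axis is True when both "a" and "b" stacks were selected, False when
    the name does not end in "a" or "b", and None when it cannot be inferred.
    """
    stems = {
        PurePosixPath(f).stem
        for f in stack_files
        if PurePosixPath(f).suffix in (".st", ".mrc")
    }
    entries = []
    for stem in sorted(stems):
        if stem.endswith("a") and stem[:-1] + "b" in stems:
            entries.append((stem, True))   # pair found; show only the "a" file
        elif stem.endswith("b") and stem[:-1] + "a" in stems:
            continue                       # represented by its "a" partner
        elif stem[-1] in "ab":
            entries.append((stem, None))   # lone "a" or "b": left to defaults
        else:
            entries.append((stem, False))
    return entries


def processing_name(stack_file):
    """Stacks with the extension .mrc are renamed to .st for processing."""
    path = PurePosixPath(stack_file)
    return str(path.with_suffix(".st")) if path.suffix == ".mrc" else str(path)


if __name__ == "__main__":
    print(table_entries(["t1a.st", "t1b.st", "cell.mrc", "t2a.st"]))
    print(processing_name("raw/cell.mrc"))
```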

For your first addition to the table, the Dual axis, Montage, and Beads on Two Surfaces checkboxes are set based on the defaults that you have set in the Options-Settings dialog, as modified by any templates you have chosen. The setting for the Dual axis box will also be modified as appropriate when both "a" and "b" stacks are entered, or when the stack root name does not end in "a" or "b". Further entries will inherit the settings of these three boxes from the previous line in the table. The Copy Down button will copy these three settings from the selected line to the one below, which is fairly useless and should be changed to copy to all lines below. For now, the easiest way to get these boxes set for a large number of data sets is to add one, set the buttons, then add the rest.

The Boundary Model is used to indicate regions where the fiducial seed model should be selected for tracking, or where patch tracking should be done. If you have data sets needing such models, check the box before pressing the 3dmod icon to draw the model, so that 3dmod can be given the right filename and location. The file is named with the data set root name plus "_rawbound.mod" and is placed in the current location of the data set. For a dual-axis data set, the model is transformed to be used with the second axis. When using fiducials, the transformation is based on the previous run of Transferfid for transferring fiducials; when patch tracking, Transferfid is used at this point to find the transformation. You need to draw one or more contours just on one view, the zero-degree one if possible.

If entries are made to Exclude Views, the views will currently be carried through into the coarse and fine aligned stacks but skipped in tracking, alignment and reconstruction. An option can be set in the Preprocessing section of the Advanced screen or a directive supplied in a template or starting batch file to remove the excluded views. That same section has options to enable automatic exclusion of dark images near the ends of the tilt series stack (e.g., by setting SD criterion for excluding high tilt views to 0.5).

Setting Parameters for the Data Sets

Distortion Corrections. You can select an image distortion field file and a magnification gradient file if you have those distortion corrections available. If you have data from SerialEM, the binning will be detected when the header is scanned. Otherwise, if you have data with a binning other than 1, you would need to have a directive in a template file (e.g., "setupset.copyarg.binning = 2").
X-ray Removal. Removal of X-rays and other extreme artifacts will be done if Remove X-rays is selected. The raw stack will be "archived" automatically with Archiveorig. If you have large artifacts present on every section, you can select an existing Manual replacement model to erase them. Or, you can press Make in 3dmod and make such a model on the first data set. The convention in these models is that object 1 should have patches in which each pixel has a model point, object 2 should have lines to remove, and object 3 should have patches defined by boundary contours. In any case, the model will be copied into each of the data sets from where it exists.
Alignment Method and Tracking Parameters. All of the alignment methods available in Etomo can be run with automation. With the Autoseed and track method, the seed model for the second axis will be made by first running Transferfid and then using Autofidseed to add points to the model, which is useful if there is a significant shift between the two axes. Local tracking is done by default. If you use patch tracking and break contours into pieces, the pieces will have the default length that is used in Etomo based on the number of views. With direct detectors, especially K2 in counting mode, you may want anti-aliased reduction for the prealigned stack, and this would also require a template entry "comparam.prenewst.newstack.AntialiasFilter = -1". Similarly, anti-aliased reduction for the final aligned stack would require "comparam.newst.newstack.AntialiasFilter = -1".
Alignment. Robust fitting is used in all cases in Tiltalign with the default tuning factor of 1. A template entry such as "comparam.align.tiltalign.KFactorScaling = 0.9" could be used for more aggressive downweighting of outlying points. The only parameters that can be set here are whether to use local alignments and whether to enable solving for distortion (X-stretching). The program will not allow the latter unless gold is actually found to be on two surfaces in reasonable amounts after fiducial tracking. The new script Restrictalign will be called to reduce the alignment parameters automatically in order to maintain a minimum and/or target ratio of measurements to unknowns. There are directives to control that process if necessary. Other than this, the one template entry that might be needed for alignment would be to enable the beam tilt solution with "comparam.align.tiltalign.BeamTiltOption = 2".
Tomogram Positioning. Positioning can be done with whole binned-down tomograms. For plastic sections, the thickness is chosen automatically, but a fixed value to be used instead can be placed in the Tomogram thickness field. Findsection will be used to find the surface of the section and generate a model for Tomopitch with 5 pairs of lines. For cryosamples, an unbinned thickness must be specified in the Tomogram thickness field, and a complex sequence of operations is done with Cryoposition. For this analysis, it is essential that gold beads be removed from the volume before detecting the structure. When patch tracking or coarse alignment only are done, you must indicate whether gold beads are present with the Sample has gold beads checkbox and fill in the Bead size field there if it is empty. The extra thickness added when running Tomopitch can be set with a directive "comparam.tomopitch.tomopitch.ExtraThickness". For cryo, this usually needs to be a generous number, and Batchruntomo will set it to 25 by default.
CTF Correction. CTF correction can be done with fitting in Ctfplotter to all individual images or to a series of blocks of images. For the latter, select Autofit range and fill in entries for the range of angles to fit and the angular step between ranges (e.g., 20 and 10 to fit blocks of views over 20 degrees with 10 degree steps between the blocks). You must fill in the range and step when using this option, and the Defocus box with the expected defocus in all cases.
Gold Erasing. Gold fiducials can be erased using a model that is completed to have points on all views by selecting Use fiducial model. Alternatively, all beads can be found in an appropriately binned tomogram by selecting Find beads in 3D, in which case the unbinned thickness must be entered in Tomogram thickness. If patch tracking is used, there is no way to enter the needed bead size through the interface and a size would have to be entered with a "setupset.copyarg.gold" directive.
Reconstruction. The tomogram can be built with regular R-weighted backprojection, with a SIRT-like filter, or with SIRT. When doing either SIRT or the SIRT-like filter, a regular backprojection can also be computed; simply check the boxes for the two types desired. (In fact, you have to turn off R-weighted backprojection to avoid getting both.) For the SIRT-like filter, fill in the Equivalent iterations field with a single value. For SIRT, that field is labeled Iterations to leave and can take a list of iterations to leave, or more likely just a single value for the number of iterations.
There are three choices for specifying the tomogram thickness, one by total unbinned pixels, one by binned pixels, and one to use a calculated value plus a specified margin. The last of these will use a specified thickness as the fallback if there is no calculated thickness available or if it is too much smaller than the fallback (only 0.4 times as big). This option is the default because it is the most general-purpose. The calculated value is initially based on the distance between gold on two surfaces, but is superseded if positioning is done.
Postprocessing. The tomogram can be reoriented with Trimvol, and optionally converted to bytes if Fraction of Z slices to analyze is selected. There is a default of 0.33 if the text box is left blank. For plastic sections, the tomogram can also be trimmed to section limits found with Findsection by selecting Find plastic section limits and add. An entry for the amount to add is required; it can be either the number of binned pixels or a fraction of the measured thickness.
The Advanced interface can be used to set other trimming options, specify NAD filtering of the trimmed volume, or generate a reduced and/or filtered volume as in the Post Processing dialog of the reconstruction interface.
Datasets Table for Specific Values. The simple table at the bottom has a button with which you can open a parameter value dialog for a single data set. When you press Open, that data set will be given a copy of the current parameters, and after that point, its values are separate and unaffected by any further changes in the global values. You can revert to global values in the standalone dialog, and that will discard any special values that you set.
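The fallback rule for the "calculated value plus margin" thickness choice, described under Reconstruction above, can be sketched as follows. The function name is hypothetical, and two details are assumptions: that the calculated value alone (without the margin) is compared with 0.4 times the fallback, and that a value exactly at the cutoff is accepted.

```python
def choose_thickness(calculated, margin, fallback, min_fraction=0.4):
    """Return the thickness for the 'calculated value plus margin' choice.

    calculated may be None when no calculated thickness is available.
    min_fraction is the assumed cutoff relative to the fallback thickness.
    """
    if calculated is None or calculated < min_fraction * fallback:
        return fallback
    return calculated + margin


if __name__ == "__main__":
    print(choose_thickness(None, 20, 200))  # no calculated value available
    print(choose_thickness(70, 20, 200))    # calculated value too small
    print(choose_thickness(150, 20, 200))   # calculated value plus margin
```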

Using the Advanced Parameter Interface

Press the Advanced button on the Dataset Values tab to switch to the advanced interface. The same button can be used on the Dataset Values dialog for an individual data set to set advanced parameters for that set. The button changes to Basic and can be used to switch back to the basic interface. Unlike Advanced mode in the reconstruction interface, the Advanced dialog here does not include any of the parameters in the basic interface.

This interface is organized as a set of stacked sections that can be individually opened and closed by clicking the bar with the name of the section. The sections correspond to the ones in the master directive table, and all directives described in that table but not included in the basic interface are available in the advanced interface. If you are using a template or starting batch file that includes any directives not listed in the master table, then there will be an additional section at the bottom with those directives.

The directives that appear in the interface are controlled by the two choices in the Which Directives to Show box. Turning on Only items containing a value will make it show just the directives that have a value set from either a template, starting batch file, or by your setting the value. With Only items output to batch file checked, only the directives that will actually be written to the batch file will be shown, not ones with template values.

When the directives being shown are restricted, sections with no directives to show are closed and disabled.

The entry fields depend on the type of parameter.

Boolean entries will have a combo box with "Yes", "No", and blank, where blank means no directive will be output.
Items with a fixed set of choices will have a combo box with the choices and a blank for no directive. The choices will be listed with a number and description, and that is what will appear in the field, but the actual directive just has the number.
File entries have a file chooser button as well as an eraser button that can be used to clear out an entry.
Numeric entries have a text field. If the label includes "and", then two values are expected. Check the tooltip for a longer description including the type and number of values expected and any additional notes, all taken directly from the directive table.

Values from templates and their labels will be shown in blue, as well as values from the "batchDefaults.adoc" file, which is treated like a bottom-level template. When there is a non-boolean value from a template, the X to the right of the field will be enabled. This button allows you to override the template and revert to a default value by placing a blank directive into the batch file. Press this button to select an override; the field will then turn black and display ">OVERRIDE<". The override button will remain enabled and you can press it again to revert to the template value.
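One way to think about the override is as a lookup through layers of directives, where a blank entry in the batch file hides any template value. The sketch below models that idea only; the function name, the use of dictionaries, and the source labels are hypothetical, and it does not reflect how Etomo stores directives internally.

```python
def resolve(name, batch_file, templates):
    """Return (value, source) for a directive.

    templates is a list of dicts from lowest to highest priority, with the
    contents of batchDefaults.adoc treated as the bottom-level template.
    An empty string in batch_file stands for a blank directive (an override).
    """
    if name in batch_file:
        value = batch_file[name]
        if value == "":
            return None, "override"   # revert to the program default
        return value, "batch"
    for layer in reversed(templates):
        if name in layer:
            return layer[name], "template"   # would be shown in blue
    return None, "default"


if __name__ == "__main__":
    templates = [{"comparam.align.tiltalign.KFactorScaling": "0.9"}]
    print(resolve("comparam.align.tiltalign.KFactorScaling", {}, templates))
    print(resolve("comparam.align.tiltalign.KFactorScaling",
                  {"comparam.align.tiltalign.KFactorScaling": ""}, templates))
```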

The program checks whether entries match the expected type and number and do not have extraneous characters. Erroneous entries will be displayed in red. However, there is no check for whether more complex restrictions are satisfied, such as two options being mutually exclusive or negative values for entries that should be positive. These errors will not be caught until the program in question runs.
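The kind of check described here, type and number of values but nothing more, might look like the following sketch. The comma separator, the "int"/"float" labels, and the function name are assumptions for illustration; note that a negative value passes, just as the text warns.

```python
import re

# Assumed patterns for plain integers and simple decimal numbers.
PATTERNS = {
    "int": re.compile(r"[+-]?\d+"),
    "float": re.compile(r"[+-]?(\d+\.?\d*|\.\d+)"),
}


def check_entry(text, kind, count):
    """Return True if text holds exactly `count` values of the given kind.

    Only the type and number of values are checked; restrictions such as
    "must be positive" are not tested.
    """
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != count:
        return False
    return all(PATTERNS[kind].fullmatch(p) is not None for p in parts)


if __name__ == "__main__":
    print(check_entry("20, 10", "int", 2))   # two integers, as expected
    print(check_entry("2x", "int", 1))       # extraneous character
    print(check_entry("-5", "int", 1))       # passes despite being negative
```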

Running the Data Sets

When you select the Run tab, you should first make selections in the Resources to Use section to indicate whether to use multiple CPUs and one or more GPUs. If you make no selections, only a single CPU will be used. After you select either Use multiple CPUs or Parallel GPUs, a table appears at the top with computing resources. If both are selected, the table will show available CPUs on the left and available GPUs on the right.

With Use multiple CPUs selected, you have the option of running batch jobs in parallel. When not using that option, reconstructions are run sequentially. Your selections in the Resources section determine what resources are used for each single reconstruction. When Run multiple batch jobs in parallel is selected, there is one command file per data set and they are all passed to Processchunks to run, along with the number to run at one time. If a cluster is available, the Use a cluster checkbox in the Parallel Processing section will be enabled and can be selected to use the cluster instead. See Running on a Cluster for details. When running with regular computer resources, the selected CPUs are divided as equally as possible among the different jobs, whereas the GPUs are managed dynamically so that each job can have access to multiple GPUs when it reaches a step that needs GPUs. See the Splitbatch man page for details on how this dynamic allocation works. The entry for Maximum # of GPUs to use by one job determines how many GPUs a job will request, but it may get fewer or only one. If Local GPU is selected, it does not mean that there is one GPU on each machine; select this only if you want to use just a single GPU on the machine running Etomo.
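Dividing CPUs "as equally as possible" can be illustrated with simple integer arithmetic. This is only a sketch of the idea with a hypothetical function name; the actual allocation is done by Splitbatch and may differ, for example in how CPUs on different machines are grouped.

```python
def divide_cpus(total_cpus, num_jobs):
    """Split total_cpus among num_jobs so that shares differ by at most one."""
    if num_jobs < 1:
        raise ValueError("at least one job is required")
    base, extra = divmod(total_cpus, num_jobs)
    # The first `extra` jobs each receive one additional CPU.
    return [base + 1 if i < extra else base for i in range(num_jobs)]


if __name__ == "__main__":
    print(divide_cpus(10, 4))
```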

If you enable Email notification and enter an address, Batchruntomo will send an email whenever a data set is aborted and when all processing is complete. For the email to work, you may need to define an SMTP mail server; this can be done in the Options - Settings dialog.

The Subset of steps to run section allows you to control a stopping or starting point for the run. If you turn on Stop after, you can select one of the available stopping points. All data sets selected for running (by means of the button in the Run column of the Datasets table) will be run to the same selected stopping point. When the run stops at such a point, you can then turn on Start from and select a starting point. Generally you would want to select the starting point paired with the stopping point, but you can go back to an earlier one if desired. Ordinarily, it would not work to select a starting point later than the earliest point reached by any of the data sets included in the run, so this is not allowed by default. However, if you have completed a step manually, such that it would not be a problem to start at a later step, then you can select Enable starting from any step and select any step. When starting past the Fine Alignment step, Batchruntomo no longer recomputes the fine alignment (which involves adjusting alignment parameters as needed).
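The restriction on starting points can be modeled roughly as below. This is a simplification under stated assumptions: the step labels are placeholders rather than Etomo's actual step names, each data set is described by the last point it reached, and starting points are treated as interchangeable with the points reached.

```python
def allowed_start_points(steps, reached, enable_any=False):
    """Return the starting points that may be selected.

    steps      - ordered list of step labels (placeholders here)
    reached    - dict mapping each data set in the run to the point it reached
    enable_any - models the 'Enable starting from any step' checkbox
    """
    if enable_any or not reached:
        return list(steps)
    earliest = min(steps.index(point) for point in reached.values())
    # Nothing later than the earliest point reached by any data set.
    return list(steps[: earliest + 1])


if __name__ == "__main__":
    steps = ["A", "B", "C", "D"]
    print(allowed_start_points(steps, {"set1": "C", "set2": "B"}))
    print(allowed_start_points(steps, {"set1": "C", "set2": "B"}, enable_any=True))
```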

Press Run to start a run. During the run, you can use Kill Process to stop processing as quickly as possible, or Pause to make it stop after the current data set finishes or reaches the stopping point (or when all running data sets reach that point, if running in parallel). When running multiple jobs in parallel, a Kill will not take effect for each job until it finishes its current step and checks for a quit signal. After a Pause or Kill, the Resume button can be used to restart the run from where it left off. When resuming from a Kill, each data set that was killed will be run from the beginning or from the selected starting point, not from the step where it was killed.

Almost nothing can be changed after a Pause or Kill: data set parameters and starting and stopping points are disabled; data sets marked for running can be dropped out but none can be added. The situation is more flexible when all data sets have reached a selected stopping point; it is possible to manipulate which sets are included in the run. However, data set parameters currently still cannot be changed (we plan to enable a subset of parameters that will have an effect when changed). To remove all these restrictions in either situation, press Reset. This has several consequences: 1) data set parameters can be edited again; 2) the Resume button is disabled and the program forgets about what would be needed to resume; 3) all data sets selected for running will be run from the selected starting point or the beginning, even if they already reached a later point. Thus, if you use Reset after a Pause or Kill, you have to manually turn off the Run checkboxes for any data sets that have already been run and that you do not wish to rerun.

When not running in parallel, Etomo saves and runs a single command file in the project directory, named "rootname.com", where "rootname" is the project root name from the Batch Setup tab. During the run, a corresponding log file will be created in the project directory and will contain all of the log output from the run. There will be a copy of the portion of that log for each data set in its respective directory, named "batchruntomo.log"; this can be opened with the Open Log button in the Run table. The full log for all runs can be opened from the menu brought up by right-clicking over the panel. Selected extracts are shown in the Project Log. The portion of the project log for each data set can be opened with the Proj Log button.

The situation is similar when running in parallel, except that there is a command file, and eventually a corresponding log file, for each data set in the project directory. This log is essentially the same as the "batchruntomo.log" in the data set directory, but if there is an error before the latter is started, you would have to examine the log for that data set in the project directory.
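For the non-parallel case, the files to look for can be collected with a short sketch like this one. The function name is hypothetical, and the full log being named "rootname.log" is an assumption based on the usual pairing of command and log files; only "rootname.com" and "batchruntomo.log" are stated above.

```python
from pathlib import PurePosixPath


def batch_files(project_dir, rootname, dataset_dirs):
    """Return the expected command and log file paths for a sequential run."""
    project = PurePosixPath(project_dir)
    return {
        "command_file": project / f"{rootname}.com",
        "full_log": project / f"{rootname}.log",  # assumed file name
        "dataset_logs": [
            PurePosixPath(d) / "batchruntomo.log" for d in dataset_dirs
        ],
    }


if __name__ == "__main__":
    files = batch_files("/proj", "batchA", ["/data/cell1", "/data/cell2"])
    print(files["command_file"])
    for log in files["dataset_logs"]:
        print(log)
```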

You can exit Etomo after starting a run and reopen it later. The program will "reconnect" to the run, whether it is finished or not, and update the status for all data sets.

Running on a Cluster

A cluster can be used only when running multiple Batchruntomo jobs in parallel. (If the cpu.adoc file on the system defines only cluster queues and no computers, running in parallel is obligatory.) When using a cluster is selected, there are several changes in the interface: 1) The Maximum # of GPUs to use by one job entry is disabled; instead the number of GPUs is determined by the properties of the queue being run on. 2) The Resources to Use items are disabled. 3) The cluster-specific section How a Batch Job Should Run Processes is enabled and has a set of choices that depend on what kind of queues are available in the table.

A variety of ways of running on a cluster are supported. The Splitbatch man page has a complete description of the different possibilities and how to configure the cpu.adoc for them. The choices enabled in How a Batch Job Should Run Processes reflect which possibilities are available and let you control which kind of queues can be selected in the Parallel Processing table. The possibilities are:

Each job gets access to a single core and no GPU. Batchruntomo will submit chunks for parallelized operations like reconstruction to the queue and run other operations directly on that core. Select Job submits processes to single-core queue to run in this mode. This option is available only if there is a queue defined that provides just one core and no GPU. In this case, the spinner is available in the Used column of the Parallel Processing table to select the maximum number of jobs to submit to the queue.
Each job is allocated several cores on a node, and possibly one or more GPUs. Batchruntomo will run all CPU-based operations using these resources and not submit any of them to the queue. It will allow multi-threaded operations (parallelized with OpenMP) to use all of the cores and run other parallelized operations in multiple chunks. If GPU(s) are provided, it will use those for operations that can be run on a GPU, splitting them into chunks if there are multiple GPUs. Select Job runs on a node and runs processes directly on that node to run in this mode, which is available only if there is a queue with either a GPU or multiple cores. In this case, the Used column of the table contains no spinner and cannot be edited; its value is kept equal to the value for Run up to # jobs.
Either of these modes can be combined with a different way of accessing a GPU: using a "secondary" queue that provides one GPU. In this case, all operations that can be done on a GPU are split into chunks and the chunks are submitted to that secondary queue. In this way, GPUs are requested only when needed, similar to the dynamic allocation of GPUs when not running on a cluster. Check Use one GPU on secondary queue for this option, which is available only if there is a queue that provides just one GPU. Once this is checked, the radio buttons in the 2nd column of the table are enabled, and an eligible queue can be selected there. Also, the radio buttons in the 1st column are disabled for queues offering a GPU; a CPU-only queue must be selected there. The spinner in the Used column for the secondary queue should be set to the maximum number of GPU jobs that one Batchruntomo job will submit at once. If this is set to one, the operation will not be split into chunks, which is probably more efficient.

After selecting the desired mode of operation (if there are any such choices), select the appropriate queue(s) from among the enabled ones.
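The conditions under which each choice is available, as listed above, can be restated in code. The Queue class and the key names are hypothetical and do not correspond to the cpu.adoc format, which is described in the Splitbatch man page; the sketch only encodes the three availability rules.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """Hypothetical summary of what one cluster queue provides per job."""
    name: str
    cores: int
    gpus: int


def available_choices(queues):
    """Return which run modes the set of queues would enable."""
    return {
        # Needs a queue that provides just one core and no GPU.
        "single_core_queue": any(q.cores == 1 and q.gpus == 0 for q in queues),
        # Needs a queue with either a GPU or multiple cores.
        "run_on_node": any(q.gpus > 0 or q.cores > 1 for q in queues),
        # Needs a queue that provides just one GPU.
        "secondary_gpu_queue": any(q.gpus == 1 for q in queues),
    }


if __name__ == "__main__":
    queues = [Queue("short", 1, 0), Queue("node", 16, 0)]
    print(available_choices(queues))
```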