Batch Tomogram Reconstruction with Batchruntomo in IMOD 4.11

University of Colorado, Boulder

Setting General Batch Processing Parameters
The Stack Table
Setting Basic Parameters for the Data Sets
Using the Advanced Parameter Interface
Running the Data Sets


Etomo provides an interface for reconstructing multiple tomograms automatically using Batchruntomo. The data sets should be sufficiently similar so that, for the most part, the same parameters and procedures can be applied to all of them. The interface allows you to set a number of parameters, but in each case a different value can be used for an individual data set. The parameters that we think most likely to vary are included in a table of data sets. For the other parameters, there is one tab of the interface to set the values to apply in general, which are referred to as global values. If necessary, you can open a copy of this screen for an individual data set and set different values there.

You may want to go through the Example of Batch Reconstruction either before or after reading this document.

For simplicity, the basic interface presents a selected subset of the many parameters that can be set from the regular reconstruction interface. However, you can now switch to an advanced interface that presents a much larger number of other parameters. Initially, we relied on templates as a mechanism for controlling the values of parameters not exposed in the interface. Templates still have an important role even with the advanced parameter screen available. Templates and the current editor for saving them are described in Using Etomo. In brief, they are text files with the extension ".adoc" containing name-value pairs called directives, whose format is described in the Batchruntomo man page. The available directives are listed in the directive table. If you want to make a template for personal use and do so by hand, put it in the directory .etomotemplate under your home directory (this is where Etomo's template editor places user templates by default). Using Etomo describes what to do with templates for general use.
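To make the directive format concrete, here is a minimal Python sketch that reads "name = value" lines from a template's text. It assumes that lines beginning with "#" are comments and that the first "=" separates the name from the value. It does not handle every construct allowed in an .adoc file, so treat the Batchruntomo man page as the authority on the real format. The two directive names in the example are illustrative only; check the directive table before relying on them.

```python
# Minimal sketch of reading directives from a template's text.
# Assumptions: "#" starts a comment line, blank lines are ignored, and the
# first "=" splits name from value. Other .adoc constructs are not handled.


def parse_directives(text: str) -> dict:
    """Return a dict mapping directive names to their string values."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=" not in line:
            continue  # not a name-value pair; ignored in this sketch
        name, value = line.split("=", 1)
        # A later occurrence of the same name replaces an earlier one.
        directives[name.strip()] = value.strip()
    return directives


# Example template text; the directive names are illustrative only.
EXAMPLE = """
# personal template (hypothetical)
setupset.copyarg.dual = 1
setupset.copyarg.gold = 10
"""

print(parse_directives(EXAMPLE))
```

A directive with nothing after the "=" comes back as an empty string, which is how a blank directive would look to this sketch.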

The interface is organized into four tabs that would generally be visited in sequence from left to right. However, they can all be accessed and changed at any time. Also note that you can close the interface and reopen it to resume working on a project; you should find all of the settings as you left them, although a few may no longer be changed. The project file has the extension ".ebt".

Setting General Batch Processing Parameters

The Batch Setup tab has items that should be filled in first.

The Stack Table

On the Stacks tab, you add the tilt series that you want to process to a table. When you press Add Stack(s), a file chooser opens so that you can select the stack files. You can select multiple files and add them together. If you have many dual-axis data sets, you can select all of the "a" and "b" files together, and the program will show just the "a" files in the table. The stacks can have an extension of either ".st" or ".mrc"; files ending in ".mrc" will be renamed to ".st" for processing.
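The selection rules above can be modeled roughly as follows. This is a hypothetical Python sketch, not Etomo code: it takes the chosen file names, keeps one row per data set (dropping the "b" stack when its "a" partner is also selected), and reports each name with the ".st" extension used for processing. Directory parts of the paths are ignored for simplicity.

```python
from pathlib import PurePath


def processing_names(filenames: list) -> list:
    """Hypothetical model of the stack-table rules described above.

    Returns one processing name per table row: when both "<root>a" and
    "<root>b" stacks are selected, only the "a" stack gets a row, and a
    ".mrc" extension is replaced by ".st".
    """
    stems = set()
    for name in filenames:
        path = PurePath(name)
        if path.suffix not in (".st", ".mrc"):
            raise ValueError(f"unsupported extension: {name}")
        stems.add(path.stem)

    rows = []
    for stem in sorted(stems):
        # Skip the "b" axis of a dual-axis pair; its "a" stack represents it.
        if stem.endswith("b") and stem[:-1] + "a" in stems:
            continue
        rows.append(stem + ".st")
    return rows


print(processing_names(["cellAa.st", "cellAb.st", "cellB.mrc"]))
```

Here the pair "cellAa"/"cellAb" produces a single row, and "cellB.mrc" is listed under its processing name "cellB.st".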

For your first addition to the table, the Dual axis, Montage, and Beads on Two Surfaces checkboxes are set based on the defaults that you have set in the Options-Settings dialog, as modified by any templates you have chosen. The setting of the Dual axis box will also be modified as appropriate when both "a" and "b" stacks are entered, or when the stack root name does not end in "a" or "b". Further entries will inherit the settings of these three boxes from the previous line in the table. The Copy Down button copies these three settings from the selected line only to the line below it, which is of little use; it should be changed to copy to all lines below. For now, the easiest way to get these boxes set for a large number of data sets is to add one data set, set its boxes, then add the rest.
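One way to read the checkbox rules is sketched below in Python. This is a hypothetical model, not Etomo's implementation. In particular, it assumes that the Dual axis adjustment is applied to every new row, although the text above describes it only for the first addition. The first row starts from the defaults, each later row starts from the row above, and Dual axis is forced on for a complete "a"/"b" pair or off for a root name that does not end in "a" or "b".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RowFlags:
    dual_axis: bool
    montage: bool
    beads_on_two_surfaces: bool


def flags_for_rows(defaults: RowFlags, roots: list, paired: set) -> list:
    """Hypothetical model of how the three checkboxes are filled in.

    defaults: settings from Options-Settings as modified by templates.
    roots:    stack root names in the order they are added.
    paired:   roots for which both the "a" and "b" stacks were entered.
    """
    rows = []
    previous = defaults  # the first row starts from the defaults
    for root in roots:
        flags = previous  # later rows inherit from the row above
        if root in paired:
            flags = replace(flags, dual_axis=True)
        elif not root.endswith(("a", "b")):
            flags = replace(flags, dual_axis=False)
        rows.append(flags)
        previous = flags
    return rows
```

Montage and Beads on Two Surfaces simply propagate down the table in this model, which is why setting them on the first data set before adding the rest saves effort.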

The Boundary Model is used to indicate regions where the fiducial seed model should be selected for tracking, or where patch tracking should be done. If you have data sets needing such models, check the box before pressing the 3dmod icon to draw the model, so that 3dmod can be given the right filename and location. The file is named with the data set root name plus "_rawbound.mod" and is placed in the current location of the data set. For a dual-axis data set, the model is transformed to be used with the second axis. When using fiducials, the transformation is based on the previous run of Transferfid for transferring fiducials; when patch tracking, Transferfid is used at this point to find the transformation. You need to draw one or more contours on just one view, preferably the zero-degree one.

If you make entries in Exclude Views, those views will currently be carried through into the coarse and fine aligned stacks but skipped in tracking, alignment, and reconstruction. A directive can be supplied in a template or starting batch file to remove the excluded views.
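As an illustration of what an Exclude Views entry means for the later steps, the following Python sketch expands a list such as "1-3,45" into view numbers and returns the views that would still be used. It assumes comma-separated numbers and dash ranges with views numbered from 1; negative numbers and other list syntax are not handled. The function names are hypothetical.

```python
def parse_view_list(spec: str) -> list:
    """Expand an entry such as "1-3,45" into a sorted list of view numbers."""
    views = set()
    for part in spec.replace(" ", "").split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            views.update(range(int(first), int(last) + 1))
        else:
            views.add(int(part))
    return sorted(views)


def views_used(num_views: int, excluded_spec: str) -> list:
    """Views (numbered from 1) used for tracking, alignment, and reconstruction.

    All views remain in the aligned stacks; only these are used downstream.
    """
    excluded = set(parse_view_list(excluded_spec))
    return [v for v in range(1, num_views + 1) if v not in excluded]


print(views_used(6, "1,5-6"))
```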

Setting Basic Parameters for the Data Sets

Using the Advanced Parameter Interface

Press the Advanced button on the Dataset Values tab to switch to the advanced interface. The same button can be used in the Dataset Values dialog for an individual data set to set advanced parameters for that set. The button then changes to Basic, which switches back to the basic interface. Unlike Advanced mode in the reconstruction interface, the Advanced dialog here does not include any of the parameters in the basic interface.

This interface is organized as a set of stacked sections that can be individually opened and closed by clicking the bar with the name of the section. The sections correspond to the ones in the master directive table, and all directives described in that table but not included in the basic interface are available in the advanced interface. If you are using a template or starting batch file that includes any directives not listed in the master table, then there will be an additional section at the bottom with those directives.

The directives that appear in the interface are controlled by the two choices in the Which Directives to Show box. Turning on Only items containing a value shows just the directives that have a value, whether from a template, from a starting batch file, or set by you. With Only items output to batch file checked, only the directives that will actually be written to the batch file are shown, not ones with template values.

When the directives being shown are restricted, sections with no directives to show are closed and disabled.
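The two display filters and their effect on sections can be modeled as in this hypothetical Python sketch. Each directive records whether it has a value from any source and whether that value would be written to the batch file; a template-only value counts as having a value but is not output. The field and function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Directive:
    section: str
    name: str
    value: Optional[str]   # None: no value from a template, batch file, or user
    output_to_batch: bool  # True if the value would be written to the batch file


def shown(directives, only_with_value=False, only_output=False):
    """Apply the two "Which Directives to Show" filters (hypothetical model)."""
    result = []
    for d in directives:
        if only_with_value and d.value is None:
            continue
        if only_output and not d.output_to_batch:
            continue
        result.append(d)
    return result


def section_enabled(sections, visible):
    """A section with nothing left to show is closed and disabled."""
    nonempty = {d.section for d in visible}
    return {s: s in nonempty for s in sections}
```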

The entry fields depend on the type of parameter.

Values from templates and their labels are shown in blue; this includes values from the "batchDefaults.adoc" file, which is treated like a bottom-level template. When there is a non-boolean value from a template, the X button to the right of the field is enabled. This button allows you to override the template and revert to a default value by placing a blank directive into the batch file. Press this button to select an override; the field will then turn black and display ">OVERRIDE<". The override button remains enabled, and you can press it again to revert to the template value.
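The layering of values and the effect of an override can be sketched in Python as follows. This is a hypothetical model that assumes templates are applied from lowest priority ("batchDefaults.adoc") upward, with later layers taking precedence, and that a value in the batch file wins over all templates. A blank batch-file entry, which is what the override writes, makes the sketch return None to mean "use the program's own default".

```python
from typing import Optional

OVERRIDE = ""  # a blank directive in the batch file overrides template values


def effective_value(name: str, templates: list, batch: dict) -> Optional[str]:
    """Resolve a directive value (hypothetical model).

    templates: dicts ordered from lowest priority (batchDefaults.adoc) upward.
    batch:     values that would be written to the batch file.
    Returns None when the program's own default should be used.
    """
    if name in batch:
        value = batch[name]
        return None if value == OVERRIDE else value
    for layer in reversed(templates):  # higher-priority templates checked first
        if name in layer:
            return layer[name]
    return None
```

Removing the blank entry from the batch file, as pressing the X button a second time does, lets the template value show through again in this model.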

The program checks whether entries match the expected type and number and do not have extraneous characters. Erroneous entries will be displayed in red. However, there is no check for whether more complex restrictions are satisfied, such as two options being mutually exclusive or negative values for entries that should be positive. These errors will not be caught until the program in question runs.
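The kind of checking described above, matching type and count but not range or consistency, can be illustrated with this Python sketch. It assumes multiple values are separated by commas; the function name and the patterns are hypothetical. Note that "-5" passes as an integer even where a positive value is required, mirroring the limitation described in the text.

```python
import re

# Accepted forms for each entry type (an assumption for this sketch).
_PATTERNS = {
    "int": r"[+-]?\d+",
    "float": r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?",
}


def entry_ok(text: str, kind: str, count: int) -> bool:
    """True if text holds exactly `count` comma-separated values of `kind`.

    Only type and number are checked; ranges and mutual exclusions are not.
    """
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != count:
        return False
    pattern = re.compile(_PATTERNS[kind])
    return all(pattern.fullmatch(p) for p in parts)
```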

Running the Data Sets

When you select the Run tab, you should first make selections in the Run Actions section to indicate whether to use multiple CPUs and one or more GPUs. If you make no selections, only a single CPU will be used. After you select either Use multiple CPUs or Parallel GPUs, a table appears at the top with computing resources. If both are selected, the table will show available CPUs on the left and available GPUs on the right.

With Use multiple CPUs selected, you have the option of running batch jobs in parallel. When not using that option, reconstructions are run sequentially. Your selections in the Resources section determine what resources are used for each single reconstruction. When Run multiple batch jobs in parallel is selected, there is one command file per data set and they are all passed to Processchunks to run, along with the number to run at one time. The selected CPUs are divided as equally as possible among the different jobs, whereas the GPUs are managed dynamically so that each job can have access to multiple GPUs when it reaches a step that needs GPUs. See the Splitbatch man page for details on how this dynamic allocation works. The entry for Maximum # of GPUs to use by one job determines how many GPUs a job will request, but it may get fewer or only one. If Local GPU is selected, it does not mean that there is one GPU on each machine; select this only if you want to use just a single GPU on the machine running Etomo.
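For parallel batch jobs, dividing the selected CPUs "as equally as possible" can be pictured with the following Python sketch. It is a hypothetical illustration that treats the selected CPUs as a flat list of slots; how Etomo actually groups CPUs across machines may differ. GPUs are not modeled, since they are allocated dynamically as described in the Splitbatch man page.

```python
def divide_cpus(cpus: list, num_jobs: int) -> list:
    """Split selected CPUs as equally as possible among parallel batch jobs."""
    if num_jobs < 1:
        raise ValueError("need at least one job")
    base, extra = divmod(len(cpus), num_jobs)
    groups, start = [], 0
    for job in range(num_jobs):
        size = base + (1 if job < extra else 0)  # first `extra` jobs get one more
        groups.append(cpus[start:start + size])
        start += size
    return groups


print(divide_cpus(["c1", "c2", "c3", "c4", "c5"], 2))
```

With five CPUs and two jobs, one job gets three CPUs and the other gets two.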

If you enable Email notification and enter an address, Batchruntomo will send an email whenever a data set is aborted and when all processing is complete. For the email to work, you may need to define an SMTP mail server; this can be done in the Options - Settings dialog.

The Subset of steps to run section allows you to control a stopping or starting point for the run. If you turn on Stop after, you can select one of the available stopping points. All data sets selected for running (by means of the button in the Run column of the Datasets table) will be run to the same selected stopping point. When the run stops at such a point, you can then turn on Start from and select a starting point. Generally you would want to select the starting point paired with the stopping point, but you can go back to an earlier one if desired. Ordinarily, it would not work to select a starting point later than the earliest point reached by any of the data sets included in the run, so this is not allowed by default. However, if you have completed a step manually, such that it would not be a problem to start at a later step, then you can select Enable starting from any step and select any step. When starting past the Fine Alignment step, Batchruntomo no longer recomputes the fine alignment (which involves adjusting alignment parameters as needed).
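The default restriction on starting points can be expressed as a short Python sketch. The step names and their order below are hypothetical placeholders, not Etomo's actual list of stopping points. Without Enable starting from any step, the allowed choices run up to and including the earliest point reached by any data set in the run.

```python
# Hypothetical, simplified list of stopping/starting points in run order.
STEPS = ["setup", "coarse alignment", "tracking", "fine alignment", "reconstruction"]


def allowed_start_points(reached: dict, enable_any: bool = False) -> list:
    """Starting points that could be chosen for the next run.

    reached: data set name -> stopping point it has reached (non-empty).
    """
    if enable_any:
        return list(STEPS)
    earliest = min(STEPS.index(point) for point in reached.values())
    return STEPS[: earliest + 1]
```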

Press Run to start a run. During the run, you can use Kill Process to stop processing as quickly as possible, or Pause to make it stop after the current data set finishes or reaches the stopping point (or when all running data sets reach that point, if running in parallel). When running multiple jobs in parallel, a Kill will not take effect for each job until it finishes its current step and checks for a quit signal. After a Pause or Kill, the Resume button can be used to restart the run from where it left off. When resuming from a Kill, each data set that was killed will be run from the beginning or from the selected starting point, not from the step where it was killed.

Almost nothing can be changed after a Pause or Kill: data set parameters and starting and stopping points are disabled; data sets marked for running can be dropped out but none can be added. The situation is more flexible when all data sets have reached a selected stopping point; it is possible to manipulate which sets are included in the run. However, data set parameters currently still cannot be changed (we plan to enable a subset of parameters that will have an effect when changed). To remove all these restrictions in either situation, press Reset. This has several consequences: 1) data set parameters can be edited again; 2) the Resume button is disabled and the program forgets about what would be needed to resume; 3) all data sets selected for running will be run from the selected starting point or the beginning, even if they already reached a later point. Thus, if you use Reset after a Pause or Kill, you have to manually turn off the Run checkboxes for any data sets that have already been run and that you do not wish to rerun.
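The difference between Resume and Reset for a data set that is still selected for running can be summarized in this Python sketch. It is a hypothetical restatement of the rules above, not Etomo code, and the status labels are invented for illustration.

```python
from typing import Optional


def next_start(status: str, start_point: Optional[str] = None,
               reset: bool = False) -> Optional[str]:
    """Where a data set selected for running starts when the run continues.

    status: "finished", "killed", or "not started" (hypothetical labels).
    Returns None if the data set is not run again.
    """
    first = start_point or "beginning"
    if reset:
        # Reset forgets earlier progress: every selected data set starts over.
        return first
    if status == "finished":
        return None  # Resume does not rerun data sets that already finished
    # A killed data set restarts from the starting point, not the killed step.
    return first
```

The reset branch is why finished data sets must have their Run checkboxes turned off by hand after a Reset.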

When not running in parallel, Etomo saves and runs a single command file in the project directory, named "", where "rootname" is the project root name from the Batch Setup tab. During the run, a corresponding log file will be created in the project directory and will contain all of the log output from the run. There will be a copy of the portion of that log for each data set in its respective directory, named "batchruntomo.log"; this can be opened with the Open Log button in the Run table. The full log for all runs can be opened from the menu brought up by right-clicking over the panel. Selected extracts are shown in the Project Log. The portion of the project log for each data set can be opened with the Proj Log button.

The situation is similar when running in parallel, except that there is a command file, and eventually a corresponding log file, for each data set in the project directory. This log is essentially the same as the "batchruntomo.log" in the data set directory, but if there is an error before the latter is started, you would have to examine the log for that data set in the project directory.

You can exit Etomo after starting a run and reopen it later. The program will "reconnect" to the run, whether it is finished or not, and update the status for all data sets.