Guide to Particle Estimation for Electron Tomography (PEET)

Boulder Laboratory for 3-D Electron Microscopy of Cells


Table of Contents


1. Introduction to PEET
2. The Setup Tab
2.1. Root Name for Output
2.2. Directory
2.3. Volume Table
2.4. Action Buttons
2.5. Reference Section
2.6. Volume Size
2.7. Missing Wedge Compensation Section
2.8. Masking Section
2.9. Particle Y Axis Section
2.10. Initial Motive List Section
3. The Run Tab
3.1. Parallel Processing
3.2. Iteration Table
3.3. Spherical Sampling for Theta and Psi Section
3.4. Number of Particles to Average
3.5. Use Equal Numbers of Particles From All Tomograms
3.6. Optional / Advanced Features
3.7. Action Buttons


1. Introduction to PEET

PEET is a set of programs, separate from IMOD, to align and average particles/subvolumes extracted from 3D volumes.  It iteratively refines the alignment of each particle with a reference volume over several iterations.  Subsequent iterations refine the best alignment found in the previous iteration by aligning to a new reference generated from a subset of the previously aligned particles.  A graphical user interface permits users to set up options (search parameters, etc.), and to control the execution of the process.  This guide will explain the options in the interface.

Each PEET project must reside in its own directory because of the potentially large number of intermediate and output files created.  Input files (IMOD models and volumes, initial motive lists, etc.) can also reside in this directory, but are not required to.

To access the interface and create a new PEET project, run eTomo and either press the New PEET button or select File-->New-->PEET from the menu.  This will bring up a dialog allowing you to specify the directory where the new project is to be located, as well as a base or root name to be used in generating output and intermediate file names.  This dialog will also allow you to copy all the settings from an existing project as a starting point for the new one.

Alternatively, to open an existing PEET project you can simply cd to the desired directory and run eTomo with the name of the desired project file, e.g. "eTomo myProject.epe", or run eTomo with no arguments and select File-->Open-->PEET from the menu.  You can then browse to and select the desired project file.  (PEET project files have a .epe suffix, as described below).

The eTomo PEET gui has 2 tabs, labeled Setup and Run.  The Setup tab has options for defining the data to be worked on and setting up initial conditions and other features of the search.  The Run tab has a table for entering parameters to control each iteration of the search and options for the final averaging operation.

2. The Setup Tab

2.1. Root Name for Output

(Read only). Displays the base name that PEET will use to create output and intermediate files.  To make this manual concrete, we will assume a base name of "myRun".

2.2. Directory

(Read only). Displays the directory containing the PEET project.

Assuming a base name of "myRun", some of the important text files in this directory are listed in the table below.  (Here, #k and #m stand for integers, and "*" is a wildcard which can match any series of characters.)

myRun.epe

The project file, containing eTomo status and other options appearing on the screen but not stored in the PEET parameter file

myRun.prm

The parameter file, containing PEET parameter assignments in Matlab syntax.  See the PEET man page for additional details.

myRun*.com

Command scripts that will be distributed to user-selected computers for execution.

myRun_MOTL_Tom#k_Iter#m.csv

Optimal alignments found for tomogram #k particles during alignment iteration #m - 1.  PEET uses the contents as inputs subject to further refinement in iteration #m.  These files are in "comma separated value" (.csv) format, so they can be examined or modified in text editors or in spreadsheet programs like Excel or OpenOffice Calc.  "MOTL" stands for motive list, a set of rotations, translations, and other information about each particle, including the correlation coefficient from the alignment with the reference.

myRun*.log

Logs created by the shell commands.  Each .com file creates a corresponding log file.


Files with .mrc extension in this folder are volumes stored in MRC format.  References and final averages are examples of important volumes created by PEET and stored in this format.  For example, myRun_Ref#m.mrc is the reference volume created for alignment during iteration #m.  Similarly, the average volume resulting from combining #n particles at iteration #m is stored in myRun_AvgVol_#mP#n.mrc.
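The naming scheme above lends itself to simple scripted bookkeeping.  The following is a minimal Python sketch, not part of PEET, that recovers the tomogram, iteration, and particle numbers from file names.  The function names are hypothetical, and the patterns assume only the name layouts described in this section.

```python
import re

# Patterns mirror the names given in this section, e.g.
# myRun_MOTL_Tom#k_Iter#m.csv and myRun_AvgVol_#mP#n.mrc.
MOTL_RE = re.compile(r"^(?P<root>.+)_MOTL_Tom(?P<tom>\d+)_Iter(?P<iter>\d+)\.csv$")
AVG_RE = re.compile(r"^(?P<root>.+)_AvgVol_(?P<iter>\d+)P(?P<n>\d+)\.mrc$")


def parse_motl_name(name):
    """Return (root, tomogram number, iteration number), or None if no match."""
    m = MOTL_RE.match(name)
    if m is None:
        return None
    return m.group("root"), int(m.group("tom")), int(m.group("iter"))


def parse_avg_name(name):
    """Return (root, iteration number, particle count), or None if no match."""
    m = AVG_RE.match(name)
    if m is None:
        return None
    return m.group("root"), int(m.group("iter")), int(m.group("n"))
```

Combined with a directory listing, helpers like these can be used to find, for example, the motive lists belonging to the last completed iteration.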

2.3. Volume Table

The Volume Table can have multiple rows, each referring to a volume and corresponding model to be included in the alignment.  A row can be selected by clicking => near the left end of the row.  Each row can contain 5 entries, described in the following table. 

Volume

The name of a file containing a tomogram in MRC format and oriented so that X / Y planes correspond to the plane of the specimen.  (Note that post-processing will typically be necessary after reconstruction to meet this requirement.)  The same volume can be entered on more than one row.

Model

The name of a file containing an IMOD model for the current volume.  All points in the first object of the model, and only those points, will be used, with each point specifying the center of a particle to be aligned and averaged.  (Points in objects other than the first can be useful for specifying orientation in some cases.  See Y Axis Type below.)  Typically, the object will contain either scattered points for isolated particles, or open contours if the particles lie along filaments, and will be assigned the corresponding type.

Initial MOTL

The name of a file containing an initial motive list, specifying rotations and / or translations required to approximately align each particle with the reference.  This can permit use of a more restricted search for the optimal alignment.  The file format is identical to that of the .csv MOTL files used by PEET to transfer alignment information between iterations.  This parameter is active only if the User supplied csv files option is selected in the "Initial Motive List" section, described below.

Tilt Range

Two values defining the tilt range used to collect the tomogram.  Tilt range is needed to correct averages and alignments for the effects of the tomographic "missing wedge".  This option is disabled by default.  To enable it, check Enabled under "Missing Wedge Compensation".  If you have a tilt file containing the tilt angles for the currently selected tomogram, you can read the appropriate values from the file by pressing the Read Tilt File button beneath the table.  The default type of file is a "tilt.log", whose angles will always be correct.  You can also change the filter in the file chooser to select a ".tlt" file, whose angles will be correct as long as no additional angle offset was entered when running the Tilt program.  Tilt range must be specified in order to use missing wedge compensation.


2.4. Action Buttons

Up, Down, Delete, and Dup buttons to the right of the volume table allow modifying the volume table by moving, deleting, or duplicating a selected existing row.  To select a row press => near the left of the row.  The Insert button adds a new, blank row at the bottom of the volume table.  Volume, Model, and Initial MOTL fields for any row in the volume table can be entered or modified either by selecting the file browser (folder) icon to the right of the field, or by typing directly in the text box.  Tilt ranges can be entered or modified by typing in the appropriate text box, or by pressing Read Tilt File if you have an appropriate file.

2.5. Reference Section

The Reference section allows specification of the initial reference to be used for the first alignment iteration.  If the first radio button is pressed, a single, specified particle from one of the input volumes will be used as the initial reference.  Tomograms and particles are both numbered sequentially from 1, and particles are numbered as if all the contours were combined into one; e.g. if the first contour has 100 particles, then the 20th particle of the second contour will be particle number 120.  Note that PEET will align and average only particles and contours contained in object 1.  Additional objects will be ignored during alignment and averaging, although they are used by some specific PEET programs (e.g. see modTwist2EM).
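The particle numbering rule can be illustrated with a short Python sketch.  The function name and arguments are hypothetical; the only rule encoded is the one stated above, namely that particles are counted from 1 as if all contours of object 1 were joined end to end.

```python
def global_particle_number(contour_sizes, contour, index_in_contour):
    """Return the 1-based particle number across concatenated contours.

    contour_sizes:    number of points in each contour of object 1, in order
    contour:          1-based contour number
    index_in_contour: 1-based position of the particle within that contour
    """
    if not 1 <= contour <= len(contour_sizes):
        raise ValueError("contour number out of range")
    if not 1 <= index_in_contour <= contour_sizes[contour - 1]:
        raise ValueError("particle index out of range for this contour")
    # Count every particle in the preceding contours, then add the offset.
    return sum(contour_sizes[:contour - 1]) + index_in_contour
```

For the example in the text, global_particle_number([100, 50], 2, 20) gives 120.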

If User supplied file is selected, the specified MRC file will be used as the initial reference. In this case, the user is responsible for ensuring that the supplied reference is appropriately sized.  (In the other 2 cases, PEET will choose an adequate reference size automatically).

Finally, if multiparticle reference is selected the specified number (a power of 2 between 4 and 1024 chosen from the drop-down box) of randomly chosen particles will be aligned and combined using a binary tree structure, and the resulting average used as the initial reference.

2.6. Volume Size

Specifies the size in voxels of the subvolumes to be aligned and averaged.  Cubical volumes are preferred when possible, although they are not required.  The size specified should be that of the volume of interest plus at least twice the maximum Search Distance (described under the Run Tab below) in each direction.
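As a worked example of this sizing rule, the Python sketch below computes a minimum volume size.  It assumes the rule means that each dimension should be the size of the region of interest plus twice the largest Search Distance used for that axis at any iteration; the function name is hypothetical.

```python
def minimum_volume_size(interest, search_distances):
    """Return the smallest (nx, ny, nz) satisfying the sizing rule.

    interest:         (nx, ny, nz) size of the region of interest in voxels
    search_distances: one entry per iteration, each either a single number
                      or an (x, y, z) triple, as in the Iteration Table
    """
    per_axis_max = [0, 0, 0]
    for d in search_distances:
        # A single number applies to all three axes.
        triple = (d, d, d) if isinstance(d, (int, float)) else tuple(d)
        per_axis_max = [max(a, b) for a, b in zip(per_axis_max, triple)]
    return tuple(n + 2 * d for n, d in zip(interest, per_axis_max))
```

For a 40-voxel cube of interest and search distances of 5, 4, and 3 at successive iterations, this gives 50 voxels per side.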

2.7. Missing Wedge Compensation Section

Check Enabled and specify Tilt Range in the Volume Table to use PEET's missing wedge compensation during alignment and averaging.  When selected, object space averaging of the aligned particles will be replaced by a weighted averaging in Fourier space, with each particle contributing only to regions in which its projections are informative, i.e. those lying outside its tomographic missing wedge.  Additionally, during alignment of single particles with a reference with known missing wedge (typically at the first iteration), correlation coefficients will be normalized by the fractional overlap between the informative regions of the particle and the reference.

Edge shift specifies the number of pixels in from the edge of the missing wedge to include when averaging with missing wedge compensation.  Some useful information seems to spread into the missing wedge during backprojection.  A value of 1 voxel is typically appropriate.

An additional type of missing wedge compensation is available when selecting particles to be averaged at the end of each alignment iteration (i.e. when creating a new reference or final averages).  If the number of Weight groups is set to a number larger than 1, particles are divided into groups based on their missing wedge orientation.  Correlation scores are then scaled so that each group has approximately the same median score.  The result is that nearly equal numbers of particles from each of the orientation groups will be included in the reference (or the final average), thus minimizing missing wedge bias.  The appropriate setting must be chosen heuristically, but depends on the number of available particles as well as the number of distinct orientations present in the data.  This option is most effective if particles assume a variety of original orientations.  If particles all have about the same orientation (e.g. particles from a non-twisting microtubule sitting in the X-Y plane), this type of compensation may not be necessary or helpful unless Random axial rotations (described below) or a program like modTwist2EM is used to provide varying starting orientations.
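The idea behind weight groups can be sketched in a few lines of Python.  This is an illustration only: the text states that scores are scaled so that groups have approximately equal medians, but not how.  The sketch assumes a simple multiplicative scaling of each group to the overall median and assumes positive group medians; the function name is hypothetical.

```python
from statistics import median


def equalize_group_medians(scores, groups):
    """Scale scores so every orientation group has the same median score.

    scores: correlation score of each particle
    groups: weight-group label of each particle (same length as scores)
    """
    target = median(scores)
    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    # Assumed scheme: one multiplicative factor per group.
    factors = {g: target / median(v) for g, v in by_group.items()}
    return [s * factors[g] for s, g in zip(scores, groups)]
```

After scaling, choosing the highest scoring particles draws roughly equally from each group, which is the behavior described above.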

2.8. Masking Section

Users can mask out an area of the reference that they want to exclude from being used for cross-correlation.  The default is no masking.  However, the user can choose a sphere, a cylinder, or a user supplied volume as the mask by checking the corresponding radio button.  For example, when averaging an icosahedral virus, you might use a spherical mask to align only on the capsid and mask out the interior of the virus, or vice versa.  A cylindrical mask could be used when aligning subunits of a microtubule to exclude variable material outside the microtubule, such as a neighboring microtubule.

To use a volume containing an arbitrary mask select the User supplied binary file option, and input the name of the file containing the mask volume.  A voxel of the reference volume will be masked out if the value of its corresponding voxel in the mask volume is zero.  The dimensions of the mask volume need not match those of the reference volume.  In this case, the mask volume is centered on the reference, and mask values outside the reference volume are ignored.  Mask values outside the mask volume, but within the reference, are determined by the value of the majority of the 8 corner voxels of the mask volume.  (I.e., if 5 or more corners are zero, the mask will be zero outside the volume;  conversely, if 4 or fewer corners are zero, the mask will be one outside the mask volume.)  A convenient way to generate a mask volume is to use the IMOD program imodmop, which can be used to retain pixels inside of spheres and cylinders as well as inside of closed contours.  For this purpose, it does not matter that this program leaves original density values rather than 1's in the region that is retained.  (I.e. the user supplied mask is not strictly required to be binary.)
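The corner-majority rule for a mask volume smaller than the reference can be written out as a short Python sketch.  The function name is hypothetical; the logic follows the rule stated above, treating any nonzero corner value as "not masked" since the mask need not be strictly binary.

```python
def outside_fill_value(corner_values):
    """Return the mask value (0 or 1) used outside the mask volume.

    corner_values: the 8 corner voxel values of the mask volume.
    """
    if len(corner_values) != 8:
        raise ValueError("a 3D volume has exactly 8 corner voxels")
    zeros = sum(1 for v in corner_values if v == 0)
    # 5 or more zero corners: masked out (0); 4 or fewer: retained (1).
    return 0 if zeros >= 5 else 1
```

Note that an even split of 4 zero and 4 nonzero corners results in the region outside the mask volume being retained.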

If the Sphere or Cylinder mask options are chosen, the user can specify both inner and outer radii.  Reference voxels inside the inner radius or outside the outer radius will be masked out.

A spherical or user supplied mask will be centered on the center of the reference volume.

A cylindrical mask will be centered on the reference volume, with its long axis along the reference's Y axis.  Alternatively, you can override the orientation of a cylindrical mask by selecting Manual Cylinder Orientation and specifying the desired Z and Y rotations in degrees.

2.9. Particle Y Axis Section

The particle Y axis (or twist axis) is the axis PEET uses for the first angular search, the one referred to as Phi in the iteration table.  (Particle X and Z axes are determined from Y to form a right handed coordinate system, with some arbitrary conventions to resolve ambiguities).  If Tomogram Y axis is selected, the particle Y axis will coincide with the tomogram Y axis.  In this case, Phi will be around the tomogram Y axis, Theta around the tomogram Z axis, and Psi around the tomogram X axis.

If Particle model points is selected, the first rotation axis will vary from particle to particle and will be the vector connecting 2 consecutive model points in the IMOD model contour.  With this option, Phi represents twisting around the contour axis; Theta represents bending or turning in the X-Y plane, and Psi represents dipping in Z.  This option is most useful when particles sit along contours, for example, when particles are along a microtubule.  When using this option, be sure to model separate filaments in separate contours.

End points of contour is similar to Particle model points, except that the vector between the first and the last points in the contour will be taken as the Y axis for each point in that contour.  This can be useful when the contours are nearly straight, as it provides a less noisy estimate of the Y axis.

2.10. Initial Motive List Section

Controls in this section provide several options for setting up an initial motive list with starting orientations for all particles.  This can allow searching over a limited angular range, improving both throughput and accuracy, even when the particles have significantly different starting orientations. 

2.10.1. Set all rotational values to zero

All particles will have their initial orientation left unaltered. This is appropriate if the particles are scattered, independent particles, no prior alignment information is available, and the orientations are either not readily apparent from the images or you choose not to bother extracting this information, preferring instead to use a more extensive alignment search.

2.10.2. Align particle Y axes

Automatically generate initial motive list rotations to align each particle's Y axis with that of the reference.  This is particularly useful when modeling filaments and Particle Y axis is set to Particle model points or End points of contour.

2.10.3. User supplied csv files

Use this option if you have a motive list file in .csv format specifying the approximate rotation (and, optionally, translation) required to orient each particle.  Such motive list (MOTL) files can be obtained from a prior PEET run or from an auxiliary program that computes initial orientations from a model for a particular situation.  PEET programs modTwist2EM, spikeInit, and slicer2MOTL are examples of such auxiliary programs.  Additionally, program modifyMotiveList can be used to incorporate additional rotations and translations into an existing motive list, e.g. when imposing symmetry.

2.10.4. Uniform random rotations

A random rotation drawn from a uniform distribution will be generated for each particle.  In conjunction with a limited range angular search, this can be useful in avoiding missing wedge bias for isolated particles.  With suspected icosahedral symmetry, for example, a useful approach is to use this option along with angular search ranges of roughly 32 degrees.  Note that random rotations will only help avoid missing wedge bias if other knowledge allows using less than a full spherical search.

2.10.5. Random axial rotations

An initial motive list will be automatically generated which aligns each particle's Y axis to that of the reference, followed by a random rotation around that axis.  This can be helpful in minimizing missing wedge bias, e.g. when averaging sections of a cylindrical filament with similar missing wedge orientations and known axial symmetry (e.g. a 15 protofilament microtubule).  Random axial rotations will help eliminate missing wedge bias only if other knowledge allows restricting the axial (Phi) search range.

3. The Run Tab

3.1. Parallel Processing

PEET can distribute the computational load of aligning and averaging particles over multiple processors on one or more systems, as specified by the user.  This section allows you to select which systems and how many processors on each system are to be used.  See the eTomo User's Guide for details on the controls in this section and on how to configure parallel processing on your system(s). For information on how to use a cluster queue, see the man pages for processchunks and the cpu.adoc file.

3.2. Iteration Table

Each row of this table controls a single iteration and has five types of entries.

The Angular Search Range entries set search ranges and increments (step sizes) in degrees for the 3 rotations (Phi, Theta, and Psi, referring to rotations around Y, Z, and X axes).  For example, if Max and Step fields of Phi are set to 6 and 2 respectively, Phi will have a range of -6 to +6 and will assume 7 values (-6, -4, -2, 0, 2, 4, and 6).  A general rule is that except for special cases (e.g. a no-search iteration to generate a new reference) earlier iterations should have larger ranges and step sizes than later ones.  Additionally, multiple iterations with larger step sizes are typically faster than a single search with a large range and small step size.  A reasonable compromise is to use a Max of 3 times Step, as in the example above.  Both Max and Step can then be reduced by a factor of 2 at each subsequent iteration, until the desired precision is achieved.  Searches with even fewer steps can be done by using a Max of twice the increment, provided that Max and Step are reduced by a factor of only 1.5-1.6 between iterations.  In many cases, it may be necessary to use more evaluations during the first iteration to obtain a reasonable initial alignment.  To skip searching around a particular axis entirely, set the Max to 0 and Step to 1.
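The relationship between Max, Step, and the angles actually searched, along with the halving strategy suggested above, can be sketched in Python as follows.  The function names are hypothetical, and the sketch assumes Max is a whole multiple of Step, as in the example; the text does not say how other combinations are handled.

```python
def search_angles(max_angle, step):
    """Return the angles searched for one axis, from -Max to +Max."""
    if step <= 0:
        raise ValueError("Step must be positive")
    n = round(max_angle / step)
    if abs(n * step - max_angle) > 1e-9:
        raise ValueError("this sketch assumes Max is a multiple of Step")
    return [i * step for i in range(-n, n + 1)]


def halving_schedule(first_step, iterations, ratio=3):
    """Return (Max, Step) per iteration with Max = ratio * Step,
    both halved at each subsequent iteration."""
    return [(ratio * first_step / 2 ** i, first_step / 2 ** i)
            for i in range(iterations)]
```

With Max 6 and Step 2, search_angles returns the 7 values listed in the text, and search_angles(0, 1) returns the single value 0, i.e. no search around that axis.  Starting from a Step of 8 degrees, halving_schedule(8, 3) gives (24, 8), (12, 4), and (6, 2).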

Search Distance can be either a single number or three numbers, and specifies the amount of translation in pixels allowed in the X, Y, and Z directions during alignment searches.  For example, setting it to 2 will limit the translation to between -2 and +2 in each of the 3 directions.  These directions are in the tomogram rather than particle coordinates, so using different limits would be useful only if particles all had similar orientations with respect to at least one axis.  This entry does not affect execution time, but small values can make the alignment more reliable by preventing spurious correlation peaks at higher translations from being chosen.  Before correlation, regions of width equal to the search distance at each edge of the volume are set to zero to avoid wrap around effects.  Thus, the larger the search distance, the fewer voxels are actually included in the correlation and the lower the signal-to-noise ratio of the correlation.  A search distance of 0 can be specified to disable searching, allowing the quality of the initial alignment to be determined.
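To get a feel for how much of the volume is lost to edge zeroing, the following Python sketch estimates the fraction of voxels that remain in the correlation.  It assumes a band as wide as the search distance is zeroed at both ends of each axis, as described above; the function name is hypothetical.

```python
def correlated_fraction(volume_size, search_distance):
    """Return the fraction of voxels left after zeroing the edge bands.

    volume_size:     (nx, ny, nz) in voxels
    search_distance: (dx, dy, dz) in voxels
    """
    kept = 1.0
    for n, d in zip(volume_size, search_distance):
        inner = max(n - 2 * d, 0)  # voxels not zeroed along this axis
        kept *= inner / n
    return kept
```

For a 40-voxel cube and a search distance of 5 in every direction, only about 42% of the voxels contribute, which illustrates why overly generous search distances reduce the signal-to-noise ratio.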

High Freq. Filter specifies parameters for filtering out high frequencies.  As in IMOD, frequencies are in reciprocal pixels, with 0.5 being the Nyquist frequency, and 0.86 corresponding to the highest frequency in the corners of 3D FFTs.  The Cutoff field sets the cutoff frequency, below which there is no attenuation, and above which there is progressively more attenuation.  Above the cutoff, the response falls like the right half of a Gaussian with standard deviation given by the Sigma field.  Both the reference and the particle are filtered separately, so the correlation is essentially filtered twice.  One way to see what a reference or particle looks like with a particular filter is to apply the same filter to a file with the IMOD program mtffilter, e.g., with the command "mtffilter -3d -low cutoff,sigma input_file output_file".  You can do this with the reference particle extracted by PEET, or you can extract a particle from the tomogram in 3dmod using the rubber band in the Zap window and the Extract entry in the File menu.
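The shape of the filter can be made concrete with a small Python sketch.  It assumes the conventional Gaussian form exp(-(f - Cutoff)^2 / (2 Sigma^2)) above the cutoff, which is one reading of the description above rather than a statement of PEET's exact implementation; the function names are hypothetical.

```python
import math


def lowpass_response(freq, cutoff, sigma):
    """Filter response at a frequency given in reciprocal pixels."""
    if freq <= cutoff:
        return 1.0  # no attenuation at or below the cutoff
    return math.exp(-((freq - cutoff) ** 2) / (2.0 * sigma ** 2))


def correlation_response(freq, cutoff, sigma):
    """Effective response when reference and particle are both filtered."""
    r = lowpass_response(freq, cutoff, sigma)
    return r * r
```

With a Cutoff of 0.2 and Sigma of 0.05, a frequency of 0.25 (one Sigma above the cutoff) is attenuated to about 0.61 in each volume, or about 0.37 in the correlation.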

Reference Threshold controls how many particles are used to form the reference for the next iteration.  If a value larger than one is specified, it represents the number of particles to use, with higher correlation coefficients chosen preferentially.  A rule of thumb is to use two thirds of the total number of particles.  In some cases, however, you might want to include all particles on a first round to minimize any missing wedge bias in the new reference.  If a value less than one is specified, it represents a correlation coefficient threshold, above which particles will be included in computation of the new reference.  We have not found specifying the threshold in this manner particularly useful.
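The two interpretations of Reference Threshold can be summarized in a short Python sketch.  The function name is hypothetical.  The text does not say how a value of exactly 1 is treated, so the sketch rejects it rather than guess.

```python
def select_for_reference(ccs, reference_threshold):
    """Return indices of particles used for the next reference,
    ordered from highest to lowest correlation coefficient.

    ccs: correlation coefficient of each particle
    """
    order = sorted(range(len(ccs)), key=lambda i: ccs[i], reverse=True)
    if reference_threshold > 1:
        # Interpreted as a particle count: take the best scoring ones.
        return order[:int(reference_threshold)]
    if reference_threshold < 1:
        # Interpreted as a correlation coefficient threshold.
        return [i for i in order if ccs[i] > reference_threshold]
    raise ValueError("a value of exactly 1 is not covered by this sketch")
```

Following the rule of thumb above, a data set of 300 particles would typically use a Reference Threshold of 200.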

Duplicate Tolerance provides control over PEET's duplicate rejection logic.  When the search range is comparable to the spacing between particles, and especially when particles lie along filaments, an alignment search can sometimes result in multiple particles pointing to essentially the same position and orientation.  If not corrected, these errors can bias subsequent references and averages, and can lead to overestimation of resolution using Fourier Shell Correlation.  If Remove duplicate particles after each iteration is checked (below the Iteration Table), PEET will attempt to identify cases of duplication, and will remove offending particles from further averaging or consideration during the current iteration.  The Shift and Angular tolerances specify the maximum separation in integer pixels and degrees, respectively, at which particles can be considered duplicates.  The shift tolerance is applied separately to each of the tomogram X, Y, and Z coordinates.  Duplication is determined independently near the end of each iteration, so a particle ignored as a duplicate in one iteration is not necessarily excluded from later iterations.  A value of 0 for either or both of the angular and shift tolerances can be used to disable duplicate removal for the associated iteration.
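A duplicate test along these lines might look like the Python sketch below.  The function name is hypothetical, and the sketch assumes the angular difference between the two particle orientations has already been reduced to a single angle in degrees; how PEET measures that difference is not described here.

```python
def is_duplicate(pos_a, pos_b, angular_separation_deg, shift_tol, angular_tol):
    """Return True if two aligned particles count as duplicates.

    pos_a, pos_b: aligned (x, y, z) positions in tomogram coordinates
    """
    if shift_tol == 0 or angular_tol == 0:
        return False  # a tolerance of 0 disables duplicate removal
    # The shift tolerance is applied separately to X, Y, and Z.
    close = all(abs(a - b) <= shift_tol for a, b in zip(pos_a, pos_b))
    return close and angular_separation_deg <= angular_tol
```

Because the shift test is per axis, two particles offset by 1 pixel in each of X, Y, and Z still fall within a shift tolerance of 1.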

As for the Volume Table, action buttons to the right of the Iteration Table allow adding, deleting, or rearranging lines in the table.

3.3. Spherical Sampling for Theta and Psi Section

This option is designed to make full angular searching on the first iteration more efficient by sampling orientation space at relatively uniform intervals.  It is referred to as spherical sampling because the second and third search angles (Theta and Psi) are chosen to give approximately uniform spacing when represented on the surface of a sphere.  (Phi corresponds to rotation around the particle's twist or Y axis).  Simply varying both Theta and Psi with regular increments results in oversampling near the "poles".  With spherical sampling enabled, PEET avoids this oversampling.  If Full sphere is checked, PEET will sample the whole sphere.  If Half sphere is checked, PEET will only sample one hemisphere.  The Sample interval field specifies the Theta search interval in degrees, as well as the Psi search interval at the equator.  For a unit sample sphere, sample points are placed on latitude lines separated by the sample interval, with approximately 360 / Sample interval points along each latitude line at the equator, decreasing with latitude to no more than a single point at each pole.  For either a full or half sphere angular search, enter a Max of 180, with an appropriate Step for Phi in the Iteration Table, as well as the Sample interval for spherical sampling.  When in doubt, it is typically reasonable to choose the Sample interval equal to the Phi Step size.
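The following Python sketch illustrates why spherical sampling needs far fewer orientations than a regular grid.  It is not PEET's algorithm: it assumes the number of points on each latitude line scales with the cosine of the latitude and that exactly one point is placed at each pole.  The function name is hypothetical.

```python
import math


def points_per_latitude(sample_interval_deg):
    """Return (latitude in degrees, number of points) for a full sphere."""
    n_equator = 360.0 / sample_interval_deg
    steps = int(round(180.0 / sample_interval_deg))
    result = []
    for i in range(steps + 1):
        lat = -90 + i * sample_interval_deg
        # Circumference of a latitude line shrinks as cos(latitude).
        n = max(1, round(n_equator * math.cos(math.radians(lat))))
        result.append((lat, n))
    return result
```

With a Sample interval of 30 degrees, this scheme places 12 points on the equator and 46 points in total, compared with 7 x 12 = 84 points for a regular grid with the same increments.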

3.4. Number of Particles to Average

Specify how many particles to use to form averages.  For example, if Start, Step, End, and Additional numbers are set to 50, 50, 200, and 205, respectively, there will be 5 final averages, containing 50, 100, 150, 200 and 205 particles.  If End is not bigger than Start by a multiple of the increment (e.g., 160), the last number in the sequence will be the largest value in the sequence that does not exceed End (e.g., 150).  Values listed under Additional numbers must be comma separated, monotonically increasing, and greater than the End value of the arithmetic sequence.  The largest average requested should be less than the total number of particles available.
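The list of averages produced by a given set of entries can be previewed with a short Python sketch.  The function name is hypothetical; the rules encoded are those stated above.

```python
def particle_counts(start, step, end, additional=()):
    """Return the particle counts for which averages will be made."""
    # Arithmetic sequence from Start, never exceeding End.
    counts = list(range(start, end + 1, step))
    extra = list(additional)
    if any(b <= a for a, b in zip(extra, extra[1:])):
        raise ValueError("Additional numbers must be increasing")
    if extra and extra[0] <= end:
        raise ValueError("Additional numbers must be greater than End")
    return counts + extra
```

For the example in the text, particle_counts(50, 50, 200, [205]) returns [50, 100, 150, 200, 205], while particle_counts(50, 50, 160) stops at 150.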

3.5. Use Equal Numbers of Particles from All Tomograms

These checkboxes provide another tool for reducing bias when combining data from multiple tomograms.  If checked, PEET will attempt to use equal numbers of particles from each tomogram when computing averages or new references.  (Unequal numbers may still be used if required to achieve the requested number of particles).  If left unchecked, it will choose particles with the highest correlation scores regardless of which tomogram(s) they are in.

3.6. Optional / Advanced Features

This section contains a number of useful, but less often used controls.

If Align averages to have their Y axes vertical is selected, average volumes will be reoriented to have their Y axis approximately vertical.  This is particularly useful for particles along a straight or slightly curved filament.  Note that the reorientation is applied only to the particles as they are averaged and is not reflected in the motive list.

If Use absolute value of cross-correlation is checked, PEET will maximize the absolute value of the cross-correlation, rather than its signed value, during alignment searches.  This can help prevent noise from reinforcing to match features in the reference.  This works well for many biological structures which are globular or irregularly shaped.  Use it with caution, however, with repeating or highly symmetric structures where there is a chance of aligning "out of phase" (i.e. dark on light).

If Save individual aligned particles is checked, each individual subvolume contributing to one of the final averages will be saved to an MRC file named aligned*.mrc.  Use this option judiciously, as it can consume large amounts of disk space.

Normally PEET treats the Angular Search Ranges and Search Distance from each iteration as completely separate from those at any other iteration.  So, for example, if Search Distances of 5, 4, and 3 voxels were specified at each of 3 iterations, it is possible that a particle could be translated by up to 5+4+3 = 12 voxels in each of the X, Y, and Z directions.  If Strict search limit checking is checked, the maximum shift from the starting position in each direction is limited to the largest distance specified at any iteration, in this case 5 voxels.  (Maximum Angular Search Ranges are treated similarly).  This can be useful when refining an initial alignment, since the optimum at each iteration is expected to be within the search range of the previous ones.
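The difference between the two modes comes down to a sum versus a maximum, as the following Python sketch shows.  The function name is hypothetical, and the sketch only computes the bound on total displacement; it does not model how PEET enforces it.

```python
def max_total_shift(search_distances, strict):
    """Return the largest possible shift from the starting position
    along one axis after all iterations.

    search_distances: the Search Distance used at each iteration
    strict:           True if Strict search limit checking is checked
    """
    if strict:
        # Limited to the largest distance given at any iteration.
        return max(search_distances)
    # Each iteration can add its full distance to the previous shift.
    return sum(search_distances)
```

For the example in the text, max_total_shift([5, 4, 3], strict=False) is 12 and max_total_shift([5, 4, 3], strict=True) is 5.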

Particles per CPU controls the splitting of alignment searches into "chunks" for parallel execution on multiple CPUs.  No more than the specified number of particles will be sent simultaneously to a single CPU for alignment.  The smaller this value, the more command and log files will be created, and, within limits, the greater the overall throughput.  Higher values, however, can lead to excessive delays should you need to kill (or pause) and then resume the processing.  The default of 5 is a good choice for most purposes.
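The effect of this setting on the number of chunks can be estimated with the Python sketch below.  It assumes particles are simply split into consecutive runs of at most the given size; the actual division performed by PEET (for example, whether chunks ever span tomograms) is not described here, and the function name is hypothetical.

```python
def split_into_chunks(n_particles, particles_per_cpu=5):
    """Return (first, last) 1-based particle numbers for each chunk."""
    if particles_per_cpu < 1:
        raise ValueError("Particles per CPU must be at least 1")
    return [(first, min(first + particles_per_cpu - 1, n_particles))
            for first in range(1, n_particles + 1, particles_per_cpu)]
```

Under this assumption, 23 particles at the default of 5 per CPU produce 5 chunks, the last containing particles 21 through 23.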

Debug level (0, 1, 2, or 3) controls the amount of information included in the log files, with higher numbers leading to more verbose logs.

PEET can filter out low frequencies from both particles and the reference.  Low frequency cutoff is the frequency in inverse pixels below which the signal will be attenuated.  As in the iteration table, Sigma is the standard deviation of the Gaussian whose left half defines the shape of the resulting high-pass filter.  We have rarely found low frequency filtering to be useful, and do not recommend its routine use.

3.7. Action Buttons

After reviewing and selecting Systems and / or processors in the Processing Table, press Run to start processing.  At this point, eTomo will write the parameter file and run the PEET program prmParser, producing a series of command (.com) files which will be executed by processchunks to carry out the requested computations.  During processing, status will be indicated by a progress bar at the top of the page.  Additional status information may be obtained from the messages in the log window, as well as from the Parallel processing section.  Note that once a run has been started it is permissible to close the eTomo gui if desired; if the same project is later opened in a new eTomo window, it will reconnect to any processes still running and update the status accordingly.

When processing is complete, press Open Averaged Volumes in 3dmod to view the computed averages in 3dmod.  The Isosurface window will be opened automatically.  Note that the isosurface threshold can be adjusted independently for each average.  Similarly, pressing Open Reference Files in 3dmod will allow you to view the references generated at each iteration.  You can also select Remake Averaged Volumes to recompute averages using the existing alignment, e.g. after changing the number of particles to average or other settings.