alignframes(1)          General Commands Manual          alignframes(1)

NAME
alignframes - Aligns and sums camera movie frames and stacks sums

SYNOPSIS
alignframes options

DESCRIPTION
Alignframes aligns frames from direct electron detectors and other cameras that can output a set of frames from an acquisition. It can take input from multiple frame files and produce a single stack that is ready to use for tilt series processing. It implements two basic alternatives for an initial alignment that can optionally be refined: aligning each frame to a reference accumulated from previously aligned frames; or solving for the shifts of individual images by fitting to shifts measured between many pairs of frames. The latter approach has several advantages and can be applied in various ways. The advantages are:

1) It involves multiple measurements for every image and thus can average over more information for aligning the first few frames than the cumulative method can.

2) Robust regression is used for the fitting, so a small proportion of bad alignments can be tolerated and should not degrade the result. Thus, this method should be more resistant to occasional failed correlations due to fixed pattern noise.

3) The fitting yields an estimate of the residual error for each measurement, from which it is possible to derive an error measure that reflects the overall quality of the alignment and that can be used to compare results with different parameters.

The fitting to shifts between pairs of frames is done for successive sets of frames, 7 by default. With that setting, each frame is aligned to all preceding frames for the first 7 frames, then the first fit is done and a best shift determined for the first frame. The 8th frame is then aligned to frames 2 through 7, and the next fit yields a shift for the second frame.
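The scheduling of correlations within successive sets can be sketched as follows. This is only an illustration of the description above, not code from Alignframes; the function name correlation_pairs and its arguments are hypothetical. Frames are numbered from 1, and a set size at least as large as the number of frames yields every pair.

```python
def correlation_pairs(num_frames, set_size=7):
    """List the (earlier, later) frame pairs that would be correlated.

    Hypothetical helper: each frame is paired with the preceding frames
    that fall inside the same set of `set_size` successive frames.
    """
    pairs = []
    for later in range(2, num_frames + 1):
        # Oldest frame that still shares a set with `later`
        first = max(1, later - set_size + 1)
        for earlier in range(first, later):
            pairs.append((earlier, later))
    return pairs


if __name__ == "__main__":
    # Frame 8 is paired only with frames 2 through 7 at the default set size
    print([p for p in correlation_pairs(8) if p[1] == 8])
    # Number of correlations for 11 frames: sets of 7 versus all pairs
    print(len(correlation_pairs(11)), len(correlation_pairs(11, 11)))
```

Comparing the two counts for a given number of frames shows how quickly the all-pairs approach grows relative to successive sets.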
It is possible to align every frame to every other one and do one fit to find all of the shifts at once, but this approach does not seem to give any advantage in the typical case, and the number of correlations can become quite large because it is proportional to the square of the number of frames. When pairwise fits with sets of more than 7 frames are indicated, another alternative strategy is to do pairwise fits with sets of half the frames (see below).

The program allows one to test the quality of the fits with different high frequency filter cutoffs, and even with different binnings. Multiple filters can be tested quickly in one run through the data, but testing multiple binnings requires multiple runs through the data.

After the initial alignment, it is possible to realign each frame to a reference consisting of all other aligned frames. (For high-noise data, it is essential to leave the frame being aligned out of the reference, or it would dominate the alignment.) This refinement can help after alignment to a cumulative reference, but has seemed superfluous in most tests with initial alignments from pairwise shifts.

All frame data are maintained as Fourier transforms, so each additional alignment only involves one inverse FFT. Frames are shifted into alignment in Fourier space to avoid losses from interpolation, and they are reduced in size for the final sum (if at all) by cropping in Fourier space to avoid aliasing.

Fourier transforms of even and odd frames are summed separately and a Fourier ring correlation is computed routinely. The program reports the frequencies at which the FRC crosses 0.5 and 0.25 in cycles/pixel of the summed images, and also reports the mean value around a frequency of 0.25/pixel (half-Nyquist). The FRC is the only tool for comparing results with the cumulative alignment to those with pairwise alignment or for assessing the change from refining a pairwise alignment at the end.
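To make the reported crossing frequencies concrete, the sketch below finds where a sampled FRC curve first falls below a given level. It assumes linear interpolation between ring centers; how Alignframes itself locates the crossing is not stated here, and the function name frc_crossing is hypothetical.

```python
def frc_crossing(freqs, frc, level):
    """Return the frequency where the curve first drops below `level`.

    Assumption: the crossing is linearly interpolated between the two
    ring centers that bracket it. Returns None if there is no crossing.
    """
    for i in range(1, len(frc)):
        if frc[i - 1] >= level > frc[i]:
            # Fraction of the way from ring i-1 to ring i
            frac = (frc[i - 1] - level) / (frc[i - 1] - frc[i])
            return freqs[i - 1] + frac * (freqs[i] - freqs[i - 1])
    return None


if __name__ == "__main__":
    # Made-up curve values, for illustration only
    freqs = [0.1, 0.2, 0.3]
    curve = [0.9, 0.6, 0.2]
    print(frc_crossing(freqs, curve, 0.5))
    print(frc_crossing(freqs, curve, 0.25))
```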
The FRC can also be used for validating the choice of filter or binning suggested by trying multiple values. However, changes in the FRC may generally be quite small, so it will usually be helpful to assess a change of parameters with a number of frame stacks. The program Subtractcurves can help in this assessment as described below.

Quite strong high-frequency filtering is needed for typical frame alignments. The filter cutoff is entered in frequency units of the unbinned data so that a particular value has about the same effect at different binnings. The default filter (0.06/pixel) is close to what is typically needed, but smaller values (down to 0.05) or possibly larger values (up to ~0.1) may give better results.

Binning (actually antialiased reduction in size) accomplishes most of the removal of high-frequency noise prior to the application of the frequency filter, so these two operations are partly redundant. Most of the motivation for binning is to speed up the alignment; however, after a certain point, additional binning will somewhat reduce the accuracy with which the correlation peak position can be measured. Thus, the program facilitates testing with different binnings, although such testing is probably only needed when getting started with a particular class of data, whereas testing with multiple filters is likely to be used more routinely.

Alignment Strategies

The big challenges in aligning frames are the low signal-to-noise ratio and interference from fixed pattern noise. There are some significant distinctions between tilt series and single particle data when considering what methods to apply. First, tilt series may have a lower dose per frame, but they are likely to have more features in the image than some single-particle images, and thus more signal to align with. Second, there can be beam damage such as doming of the ice within a series of single-particle frames, which would never happen for a set of tilt series frames.
One can think of a series of possible strategies for dealing with increasingly difficult data:

1) When there is strong signal and no appreciable fixed pattern noise, the simple method of aligning to a cumulative reference of already aligned frames may be adequate. Here, refinement at the end may help if it improves the shifts for the first few frames, which were subject to the noisiest correlations.

2) Images with reasonable SNR and little fixed pattern noise should work with the default method of aligning all pairs among successive sets of 7 frames.

3) If the noise is higher or signal lower, or if there is serious fixed pattern noise, then pairwise alignment among much larger sets of frames will be needed.

3A) For tilt series data, using all pairs of frames would generally be appropriate.

3B) For single particle data, it is better to do pairwise alignment among half the total number of frames, to avoid correlating across large changes in the specimen. However, if fixed pattern noise is a problem, it may be necessary to use all pairs instead, so that some correlations with large shifts will be available for all frames. (Fixed pattern noise makes the correlations unreliable or inaccurate for small shifts; even when the filtered peak at the origin is smaller than the true correlation peak, it can displace the peak position if the two peaks overlap.)

3C) Refinement at the end is risky if there is serious fixed pattern noise.

4) If the noise is just too high for alignments between single frames to work, then grouping can be used. A group size of 3 can help considerably. Refinements can be done at the end by correlating either single frames or, if necessary, just the grouped frames with sums of other frames.
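The strategies above correspond to particular option entries documented under OPTIONS. The sketch below records one reading of that correspondence as a lookup table; it is not a recommendation from the program, the function name strategy_options is hypothetical, and the refinement iteration count is left to the caller because no value is suggested here.

```python
def strategy_options(strategy, refine_iterations=None):
    """Return a list of alignframes arguments for a numbered strategy.

    Assumed mapping (one reading of the strategy list):
      1  -> cumulative reference        (-pair 0)
      2  -> default successive sets     (-pair 7)
      3A -> all pairs of frames         (-pair -1)
      3B -> sets of half the frames     (-pair -2)
      4  -> grouping with group size 3  (-group 3)
    """
    table = {
        "1": ["-pair", "0"],
        "2": ["-pair", "7"],
        "3A": ["-pair", "-1"],
        "3B": ["-pair", "-2"],
        "4": ["-group", "3"],
    }
    options = list(table[strategy])
    if refine_iterations is not None:
        # -refine takes the maximum number of refinement iterations
        options += ["-refine", str(refine_iterations)]
    return options


if __name__ == "__main__":
    print(" ".join(["alignframes"] + strategy_options("3B")))
```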
Text Output with Single Parameter Settings

In all cases, output for a file starts with a line like

       File 1 (Feb21_10.12.15.mrc): 11 frames

When there is only one filter and binning being used, the following two lines present summary statistics for the results of doing robust fitting to the shifts for each set of pairwise alignments. All of the values and distances are in unbinned pixels.

The first line has these items:

       Weighted residual mean: The mean residual error value, averaged over all of the fits. This is a weighted error, so aberrant shifts that are completely down-weighted are not reflected here.
       SD: The standard deviation of those weighted mean errors.
       mean max: The mean value of the maximum weighted residual, averaged over all the fits.
       max max: The maximum weighted residual seen in any of the fits.

The second line has these items:

       Max unweighted resid mean: The average of the maximum unweighted residual values seen in the fits.
       max: The maximum unweighted residual seen in any fit.
       Dist: Raw sum of distances from one frame position to the next.
       smoothed: Sum of smoothed distances. If spline smoothing is used, this is the distance for the smoothed shifts that are used to sum the images. Otherwise, this is based on a local polynomial smoothing that is not very good.

Finally, there is a summary of the output from Fourier ring correlation. This line reports the frequencies (in cycles/pixel) at which the curve crosses below 0.5, 0.25, and 0.125. The last number on the line is the mean value of the curve around 0.25/pixel (half-Nyquist). If the sum is reduced in size, these frequencies are in terms of the binned pixels.

The most important value on the first line is the weighted residual mean. After each fit, the shift between each pair of images is computed from the solved shifts of the individual images, and the residual for that shift is the difference between the computed and measured values, multiplied by the weighting factor applied to the measurement.
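As a rough illustration of this definition, the sketch below computes weighted residuals for one fit. It assumes that the residual is the length of the difference vector and that a measured shift is that of the later frame relative to the earlier one; the data layout and the name weighted_residuals are hypothetical, and the weights are simply supplied rather than derived by robust fitting.

```python
import math


def weighted_residuals(shifts, measurements):
    """Weighted residual for each measured pairwise shift.

    shifts: dict mapping frame number to its solved (x, y) shift.
    measurements: list of (i, j, dx, dy, weight), where (dx, dy) is the
    measured shift of frame j relative to frame i (an assumed convention).
    """
    residuals = []
    for i, j, dx, dy, weight in measurements:
        # Pair shift implied by the solved shifts of the two frames
        cx = shifts[j][0] - shifts[i][0]
        cy = shifts[j][1] - shifts[i][1]
        residuals.append(weight * math.hypot(cx - dx, cy - dy))
    return residuals


if __name__ == "__main__":
    solved = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (2.0, 0.0)}
    measured = [
        (1, 2, 1.0, 0.0, 1.0),   # agrees with the solution
        (2, 3, 1.0, 0.3, 1.0),   # off by 0.3 pixel
        (1, 3, 5.0, 0.0, 0.0),   # aberrant shift, fully down-weighted
    ]
    res = weighted_residuals(solved, measured)
    print("mean", sum(res) / len(res), "max", max(res))
```

The third measurement shows why a completely down-weighted bad shift does not appear in the weighted mean.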
The mean of these errors gives an indication of how accurate the shifts should be, on average. Good values are in the range of 0.05 to ~0.3, but some sets may give values in the range of 1-3. The latter cases are a sign that one should try analyzing a higher number of pairwise shifts or even try grouping. Using more pairwise shifts will not improve the mean residual but will allow greater averaging over these random errors; grouping should improve the residual.

The maximum residual values on the first line should not be many times larger than the mean; these values indicate that there may be a shift in error by that amount.

The Max unweighted resid mean on the second line reflects how often there are bad shifts measured between pairs of images. With no bad shifts, it should not be much bigger than the maximum weighted residual; values above a few pixels may indicate that there are bad correlations in most fits. Reducing the filter cutoff, reducing the maximum allowed shift, and grouping may improve this value; increasing the number of pairs being fit will increase the ability of the program to reject the bad shifts as outliers. (As long as the weighted maximum residuals are not high, the program is already able to reject the bad shifts.)

The distance values on the second line reflect the total specimen movement, and comparison between raw and smoothed distances gives some indication of how jittery the solved shifts are. They become more important when trying different filter settings, as explained below.

Text Output with Tests of Binning and/or Filters

If you specify either multiple filters with the -vary option or multiple binnings with the -test option, there will be output similar to what was just described for each condition being tested. An initial line for each condition shows the binning, the filter cutoff value, and the sigma for the filter falloff. The latter is varied in proportion to the cutoff value.
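The proportional scaling of the sigma can be written out directly. The sketch below uses the default cutoff and sigma given under -radius2 and -sigma2 (0.06 and 0.0086) as the base values; the function name sigma_for_cutoff is hypothetical.

```python
DEFAULT_RADIUS2 = 0.06    # default high-frequency cutoff (-radius2)
DEFAULT_SIGMA2 = 0.0086   # default falloff sigma (-sigma2)


def sigma_for_cutoff(radius2, base_radius2=DEFAULT_RADIUS2,
                     base_sigma2=DEFAULT_SIGMA2):
    """Sigma kept in the same ratio to the cutoff as the base values."""
    return base_sigma2 * radius2 / base_radius2


if __name__ == "__main__":
    for cutoff in (0.05, 0.06, 0.08):
        print(f"rad2 = {cutoff:.3f}  sig2 = {sigma_for_cutoff(cutoff):.4f}")
```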
When there are multiple filters, the program will compose a "hybrid" solution that is based on the filter that gives the lowest residual error after each fit to a set of pairwise shifts. The set of results from this hybrid solution appears after the ones for the various filters, with an initial line showing Hybrid results, bin =. This solution is not used by default, only if the -hybrid option is given.

After all of the results, there will be a line indicating which condition is considered best, such as:

       File 1: Best at bin = 8 rad2 = 0.060 sig2 = 0.0086 mean res = 0.121

However, the selection of a best solution must be treated with caution. If fixed pattern noise is significant, the fit may improve dramatically with high filter cutoffs. One sign that fixed pattern noise could be taking over is a substantial decline in the distance travelled with higher filter settings.

If the program is run on more than one file, then at the end there will be a report of the number of times each combination of binning and filter cutoff gave the best solution. For example:

       Number of times each condition is best (rad2 in parentheses):
       bin = 6    3 (0.050)    1 (0.060)    0 (0.080)
       bin = 8    3 (0.050)    1 (0.060)    1 (0.080)

This indicates that a cutoff of 0.05 is generally better than 0.06, and that there is not much difference between binnings 6 and 8.

Not all combinations of binning and filter cutoff are meaningful. Filter cutoffs that are at or above the Nyquist frequency for a particular binning will have little or no effect.
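Because filter cutoffs are entered in unbinned frequency units, the Nyquist frequency after binning is 0.5 divided by the binning in those same units. A small sketch of the check described above, with hypothetical function names:

```python
def nyquist_unbinned(binning):
    """Nyquist frequency of binned images, in cycles per unbinned pixel."""
    return 0.5 / binning


def cutoff_is_meaningful(radius2, binning):
    """True if the cutoff lies below Nyquist for this binning."""
    return radius2 < nyquist_unbinned(binning)


if __name__ == "__main__":
    for binning in (2, 3, 4, 6, 8):
        print(binning, nyquist_unbinned(binning))
    # A cutoff of 0.08 can still act at binning 6 but not at binning 8
    print(cutoff_is_meaningful(0.08, 6), cutoff_is_meaningful(0.08, 8))
```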
Here are the Nyquist frequencies for common binnings:

       Binning    Nyquist frequency
          2            0.25
          3            0.167
          4            0.125
          6            0.083
          8            0.062

Support for Frame Files with Extended Header Data

If the frame files have an extended header (as files saved by UCSFtomo do), the program will look for several features:

1) If the header is large enough to contain a gain reference, then these data will be extracted and used to gain-normalize the frames, unless a different gain reference is supplied with the -gain option. The reference will be assumed to be in the correct orientation to apply to the frames; but if it is not, the -rotation option can be used to reorient it.

2) If the header appears to contain valid tilt angles, the program will use these values to break the frames into separate sets for aligning and summing, unless the -break option is given. It will recognize one place where there are two sets at the same angle and make two sums there, provided that there are at least 5 sets of frames of the same size prior to that place. Thus, if you use -frame to try aligning a subset of tilt angles right around the starting point of a tilt series, you may have to use -break to prevent these two images from being combined. The tilt angles will be placed into the extended header of the output file unless provided by another source (the -tilt or -stack options).

3) If the extended header contains valid entries for pixel size and tilt axis rotation angle, the pixel size will be placed into the standard header location and a title will be added with the rotation angle.

Note that although multiple files can still be entered when there is an extended header, the program will insist that their properties match.

Computer and GPU Memory Usage

The amount of computer memory required for this processing depends mostly on the size of the images and whether all of the frames will be held in memory until all alignments are completed. Each frame held in memory requires 4 times as many bytes as pixels.
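The rule of four bytes per pixel makes it easy to estimate the memory needed for a given number of held frames. The sketch below is only arithmetic on that rule; the image sizes and frame counts are example values, and the function names are hypothetical.

```python
def frame_bytes(nx, ny):
    """Bytes needed to hold one frame at 4 bytes per pixel."""
    return 4 * nx * ny


def held_frames_bytes(nx, ny, num_frames):
    """Bytes needed when num_frames frames are held at once."""
    return num_frames * frame_bytes(nx, ny)


if __name__ == "__main__":
    gib = 1024 ** 3
    # One 4096 x 4096 frame
    print(frame_bytes(4096, 4096) / gib)
    # Sixty 2048 x 2048 images held together
    print(held_frames_bytes(2048, 2048, 60) / gib)
```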
Frames will be held in memory if only one binning is used and any of the following options are used: multiple filters (unless the -hybrid option is used to allow the final setting of one shift after each pairwise fit), pairwise alignment among all frames, refinement after initial alignment, or smoothing of shifts (which occurs by default with 15 or more frames). Otherwise, the number of frames held until the end will equal the number of frames used for pairwise fits (plus the group size minus one, if grouping). However, when more than one binning is tested, the program will not hold any frames but instead read them in again on a second pass.

The amount of memory needed for the binned images being aligned can also become large if the binning is small. These images are all held until the end if there is refinement of the initial alignment; otherwise the number retained will be the number used for pairwise fits.

When using a GPU, the size of unbinned images dominates the usage, but space is needed for only 3 or 4 image-sized arrays. This can approach 1 GB for ~8K images. The frames are actually held in computer memory until their shifts are finalized. Binned images for alignment will also consume significant amounts of the GPU memory; these images stay on the GPU instead of in computer memory, and the requirements there are the same as they would be in computer memory. If binning is only to ~2Kx2K and there are 60 frames, with refinement at the end, then 1 GB would be needed; fewer frames or more binning (both typical) would result in half this requirement for the aligned frames. Thus, 2 GB of memory should suffice for typical usage, but 4 GB would handle almost any anticipated need.

OPTIONS
Alignframes uses the PIP package for input (see the manual page for pip). Options can be specified either as command line arguments (with the -) or one per line in a command file (without the -).
Options can be abbreviated to unique letters; the currently valid abbreviations for short names are shown in parentheses.

-input (-in) OR -InputFile     File name
       Input file with images to correlate. Non-option arguments will also be used for input files, with those entries used after any names entered with this option. If -foutput is entered, all non-option arguments will be used for input files; otherwise all but the last will be. Input files need not be entered if an mdoc file is entered with the names of the frame files. (Successive entries accumulate)

-output (-o) OR -OutputImageFile     File name
       If this option is not entered, the last non-option argument will be used for this output file. An output file is required unless -nosum is entered.

-list (-l) OR -ListOfInputFiles     File name
       Name of file with list of input files, one per line. Filenames entered this way are equivalent to ones entered with -input or as non-option arguments; the latter two entries cannot be used along with a file list.

-break (-br) OR -BreakFramesIntoSets     Integer
       If the input consists of a series of single-frame files, this option must be used to combine them into one or more sets of frames to be aligned and summed. Additionally, the option can be used to break a file with many frames into multiple sets of frames, each of which will be aligned and summed. The input frames (either the whole collection of single-frame files, or the frames in one multi-frame input file) will be divided into groups of the given size, with any extra frames distributed among the initial groups. For example, for 50 single-image files or one file with 50 frames, and an entry of 8, there will be 6 summed images, with 9 frames in the first two and 8 frames in the rest. There must be at least as many single-image files, or as many frames in each input file, as the number given. For single-image files, this option cannot be entered with the -frame or -assess options.
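The division of frames into sets in the example above can be reproduced with a short sketch. It assumes that the number of sets is the frame count divided by the entered size, rounded down, with the leftover frames given one each to the first sets; this matches the example of 50 frames and an entry of 8, but the program's exact rule for other cases is not stated here. The function name split_into_sets is hypothetical.

```python
def split_into_sets(total_frames, set_size):
    """Return the number of frames in each set, in order.

    Assumption: total_frames // set_size sets are formed and the extra
    frames are distributed one apiece among the initial sets.
    """
    if set_size < 1 or total_frames < set_size:
        raise ValueError("need at least as many frames as the set size")
    num_sets = total_frames // set_size
    base, extra = divmod(total_frames, num_sets)
    return [base + 1 if i < extra else base for i in range(num_sets)]


if __name__ == "__main__":
    print(split_into_sets(50, 8))
```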
       For multi-frame files, the option cannot be entered with -assess and will work with -frame. In either case, the option should work with -stack or -mdoc unless frame filenames are being taken from the mdoc file. Frame files with tilt angles in the extended header will automatically be broken into sets by tilt angle, so this option is not needed in that case.

-skip (-sk) OR -SkipFileChecks
       Skip initial check that all input files have the same size and data mode; this check can take significant time with many non-MRC single-frame files. This option is allowed only with single-frame input files.

-stack (-sta) OR -CorrespondingStack     File name
       Name of image stack of sums corresponding to the input files, such as a tilt series where each image is a sum of unaligned frames. This file will be used for the basic header information of the output file, thus preserving titles and extended header data.

-mdoc (-md) OR -MetadataFile     File name
       Name of a metadata autodoc (mdoc) file with a section for each input file to be aligned and stacked. This file is an alternative way to get basic header information for the output file, as well as tilt angles into the extended header. In addition, if there are no input filenames entered as arguments, input filenames will be obtained from all of the sections in the mdoc file with "SubFramePath" entries. However, the paths in those entries are ignored; the frame files must all be in the current directory unless the -path option is entered with an alternative path. This capability is useful for bidirectional tilt series or if a Record image was acquired more than once at a tilt angle, since only the frame file for the last Record image will be used. If input filenames are entered as arguments, there must be at least as many sections in the mdoc file as input files if tilt angles are to be obtained from the mdoc file (i.e., if -tilt is not entered).
-path (-pat) OR -PathToFramesInMdoc     Text string
       Current path to the frame files listed in an mdoc file, when these are being used as the input filenames. If this option is not entered, the program must be run in the directory where the frames are located to access files listed in an mdoc file.

-ignore (-ig) OR -IgnoreZvaluesInMdoc
       Take sections in order from the mdoc file instead of by Z value. With this option, mdoc file sections can be removed or rearranged to control which frame files are stacked. Otherwise, sections must exist for all Z values being accessed, starting at 0.

-adjust (-ad) OR -AdjustAndWriteMdoc
       Correct entries in the input mdoc file for changes in image size, binning, pixel size, or data mode, and write a new file with the name of the output file plus ".mdoc". This option has no effect unless an mdoc is entered.

-tilt (-ti) OR -TiltAngleFile     File name
       File with tilt angles to insert into the header of the output file. The file should have one tilt angle per line, and must have at least as many angles as frame files being stacked. Tilt angles will be placed into the extended header in the UCSF/FEI format, one floating point value per section. With this entry, tilt angles will not be used from a corresponding stack or mdoc file.

-xfext (-x) OR -TransformExtension     Text string
       Extension for output file(s) with image transformations having shifts in columns 5 and 6. One file will be produced for each input file, with the input file extension replaced by the given extension. These files have the absolute shifts being applied to each frame, not relative shifts between successive frames.

-frc OR -FRCOutputFile     File name
       Output file for Fourier ring correlations between sums of even and odd frames, which are computed when a sum is produced. The file will have a series of lines, each with the file number, the frequency at the center of the ring, and the correlation coefficient.
       When a GPU is used, the program may not compute the FRC if there is only enough memory to sum into one buffer on the GPU instead of two.

-ring (-ri) OR -RingSpacingForFRC     Floating point
       Spacing between the rings of the Fourier ring correlation, in cycles/pixel of the summed images. The default is 0.005, which is needed for resolving closely spaced CTF oscillations. Larger values like 0.02 - 0.05 will provide more averaging for situations with more widely spaced oscillations. See the section below, Evaluating and Visualizing Differences with FRC Curves.

-gain (-ga) OR -GainReferenceFile     File name
       Gain reference for normalizing unprocessed or dark-subtracted frames. The gain reference should be a floating point file with a mean of 1. If this option is entered, it supersedes a gain reference found in the extended header of the frame files.

-rotation (-ro) OR -RotationAndFlip     Integer
       Rotation and flip operation that needs to be applied to the gain reference to match the orientation of the frames being corrected. Enter a number from 0 to 7 by taking the rotation angle counterclockwise divided by 90, plus 4 for a flip around the Y axis before the rotation. (This corresponds to the RotationAndFlip property used in SerialEM for a K2 camera, but it is also possible to save such frames without the rotation and flip.) Enter -1 to have this number taken from an "r/f" entry in the title of the first input file.

-dark (-da) OR -DarkReferenceFile     File name
       Dark reference to be subtracted before multiplying by a gain reference when the frames are saved as unprocessed data.

-defect (-def) OR -CameraDefectFile     File name
       File of camera defects to correct. The defect file is put out by SerialEM for versions of DigitalMicrograph from GMS 2.3.1 and higher when frames are not gain-normalized. The program will determine the binning of the image relative to these defect coordinates by assuming that the images are more than half the camera size.
       It will decide to scale the coordinates in the defect list up by 2 if necessary for super-resolution frames. These decisions will be reported and can be overridden with the next two options.

-double (-do) OR -DoubleDefectCoords
       Scale camera defect coordinates by 2 if they are not already scaled. This option should not be needed.

-imagebinned (-im) OR -ImagesAreBinned     Floating point
       Binning of images, which could be needed for defect correction if frames are not bigger than half the camera size.

-truncate (-tru) OR -TruncateAbove     Two floats
       Replace values above the given limit with the mean of surrounding values. The mean is taken from pixels in a 7x7 area, excluding the center 9.

-binning (-bi) OR -AlignAndSumBinning     Two integers
       Image reductions to apply when aligning and when summing. The default for summing is 1, and the default for aligning is chosen by seeing which binning out of 2, 3, 4, 6, or 8 brings the size being correlated closest to 1250. If -test is entered with one or more binnings, an entry for alignment binning is ignored.

-mode (-mo) OR -ModeToOutput     Integer
       Mode for output image file: 0 for bytes, 1 or 6 for signed or unsigned integers, or 2 for floating point. The default is to use the mode of the input file unless it is 0, in which case the default is to use mode 1.

-scale (-sc) OR -ScalingOfSum     Floating point
       Amount to scale summed values before output. The default is no scaling; however, note that reduction of the output size will scale the data up by the square of the reduction factor. Such scaling mimics the summing of counts by binning during data acquisition.

-total (-to) OR -TotalScalingOfData     Floating point
       Search the titles of the first input file for a scaling factor, and apply an additional scaling to the summed values to bring the total scaling to the amount entered. If no scaling is found in a title, it is assumed to be 1 and the full scaling specified here will be applied.
       A default total scaling of 30 will be applied if the input data consists of bytes or 4-bit values, gain normalization is being applied, and the output mode is not set to 2.

-frames (-fra) OR -StartingEndingFrames     Two integers
       First and last frame in each file to align and sum, numbered from 1. The default is to do all the frames. The starting frame number must be no bigger than the smallest number of frames in any file.

-group (-gr) OR -GroupSize     Integer
       Number of frames to sum for correlations between groups of frames; such groups are needed when correlations between single frames are too noisy to give reliable results. Since correlations are done only between non-overlapping groups, grouping reduces the number of measured shifts from which each frame's shift can be determined. Frames will be grouped in one of two ways: in non-overlapping blocks, or in successive overlapping groups, referred to as "slide grouping". The latter is used only when the total number of frames is large enough to allow a linear equation to be fit to the shifts; for example, this requires 8 frames for a group size of 3. With slide grouping, each frame will have a different shift. If the program has to drop back to block grouping, all frames in a block will have the same shift.

-pair (-pai) OR -PairwiseFrames     Integer
       Number of frames or groups to use in successive pairwise alignments, or 0 to use alignment to a cumulative reference of already-aligned frames. The default is 7. With an entry of -1 or a value equal to or bigger than the number of frames, the program will align all pairs of frames or groups and do a single fit. With an entry of -2, -3, or -4, it will do pairwise alignments among sets of one-half, one-third, or one-fourth of the frames or groups, but with a minimum of 7 included.

-kfactor (-k) OR -KFactorForFits     Floating point
       K factor for robust fitting to pairwise alignments. The default is 4.5; a smaller value will down-weight more outliers in the fits.
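The interpretation of the -pair entry described above can be summarized in a short sketch. The rounding used for one-half, one-third, or one-fourth of the frames is an assumption (integer division here), as is capping the result at the number of frames; the function name frames_per_fit is hypothetical.

```python
def frames_per_fit(pair_entry, num_frames):
    """Number of frames (or groups) in each pairwise fit for a -pair entry.

    Returns 0 for alignment to a cumulative reference. Rounding of the
    fractional cases is assumed, not taken from the program.
    """
    if pair_entry == 0:
        return 0
    if pair_entry == -1 or pair_entry >= num_frames:
        return num_frames            # all pairs, single fit
    if pair_entry in (-2, -3, -4):
        fraction = num_frames // -pair_entry
        return min(num_frames, max(7, fraction))
    if pair_entry > 0:
        return pair_entry
    raise ValueError("unsupported -pair entry")


if __name__ == "__main__":
    for entry in (0, 7, -1, -2, -4):
        print(entry, frames_per_fit(entry, 40))
```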
-reverse (-rev) OR -ReverseOrder
       Reverse order of processing and start with the last image. This should make very little difference when using pairwise alignments, but is a potentially useful option when using alignment to a cumulative reference, unless there is substantial fixed pattern noise.

-shift (-sh) OR -ShiftLimit     Integer
       Limit on distance to search for correlation peak, in unbinned pixels. If the previous frame was aligned to the same reference being aligned to, the center of the region searched corresponds to the peak position for the previous frame. The default is 20.

-refine (-ref) OR -RefineAlignment     Integer
       Refine an initial alignment based on pairwise correlations by correlating each frame with an aligned sum of all but that frame. The entry gives the maximum number of iterations that will be run, but iterations will stop if the biggest change in shift falls below a threshold.

-rgroup (-rg) OR -RefineWithGroupSums
       When using group sums for the initial alignment, refine the alignment with group sums as well, instead of single frames. This may be needed if the signal-to-noise ratio is too low even for the correlation between a single frame and the sum of other frames. The shifts converge more slowly in this case, so more iterations may be needed.

-stop (-sto) OR -StopIterationsAtShift     Floating point
       Maximum change in shift at which to stop iterating the refinement of initial shifts by correlating with the sum of frames. The default is 0.1.

-rrad2 (-rr) OR -RefineRadius2     Floating point
       High frequency filter cutoff (radius 2) for refining the alignment. The default is to use the same filter that was used to obtain the alignment, or the filter that gave the best overall error value when a hybrid alignment was used.

-smooth (-sm) OR -MinForSplineSmoothing     Integer
       Smooth the shifts with a spline curve whose smoothing parameter is found with generalized cross-validation, but only if the number of frames is at least as big as the entered value.
       This method requires a fairly large number of frames to be reliable; the documentation for the cross-validation code being used suggests 20 frames may be needed. Smoothing should not be used with fewer than 10 frames. For numbers between 10 and ~20, a minority of images may come out slightly worse with smoothing, so it would be advisable to evaluate results with and without smoothing, such as with an FRC. The default is currently 20; enter 0 to disable smoothing.

-plottable (-pl) OR -PlottableShiftFile     File name
       Filename for output file with raw and smoothed shifts. The smoothed shifts will be from spline smoothing if it was done, otherwise from local polynomial smoothing. The shifts will be put into the file one per line, starting with a type number of 10 times the file number for the raw shifts or that value plus 1 for the smoothed shifts (e.g., 10 and 11 for the first file).

-trim (-tri) OR -TrimFraction     Floating point
       Fraction of image size to trim off each edge for correlations. The default is 0.02, which is the same amount as the padding.

-taper (-ta) OR -TaperFraction     Floating point
       Fraction of image size to taper on each edge for correlations. The image is tapered down to the mean at its edge. This tapering is usually an important component for reliable correlations. The default is 0.1. If this fraction is set to 0 and there is no trimming, the program obtains the FFT for cross-correlation by extracting it from the FFT of the full image instead of by reducing, padding and tapering the image and taking an FFT of that. The padding/tapering extent of the full image is increased from 2% on each edge to 5% in this case.

-antialias (-an) OR -AntialiasFilter     Integer
       Type of filter for image reduction when trimming or tapering. The standard values of 1 to 6 are available as in Newstack, with 1 corresponding to binning. The default is 4 for a Mitchell filter, which seems to be optimal on average for this application.

       -radius1 OR -FilterRadius1   Floating point
              Low spatial frequencies in the cross-correlations will be
              attenuated by a Gaussian curve that is 1 at this cutoff
              radius and falls off below this radius with a standard
              deviation specified by FilterSigma2. Spatial frequency
              units range from 0 to 0.5. This option is here for the sake
              of completeness; use FilterSigma1 instead of this entry for
              more predictable attenuation of low frequencies.

       -radius2 OR -FilterRadius2   Floating point
              High spatial frequencies in the cross-correlations will be
              attenuated by a Gaussian curve that is 1 at this cutoff
              radius and falls off above this radius with a standard
              deviation specified by FilterSigma2. Unlike in other
              applications, this value is entered in frequency units
              (1/pixel) of the input frames, not of the reduced images
              being correlated. It is scaled by the reduction before
              being applied to the reduced image, which means that a
              particular value will give about the same amount of
              filtering regardless of the binning. The default is 0.06.

       -sigma1 OR -FilterSigma1   Floating point
              Sigma value to filter low frequencies in the correlations
              with a curve that is an inverted Gaussian. This filter is 0
              at 0 frequency and rises to 1 with the given sigma value.
              However, if a negative value of radius1 is entered, this
              filter will be zero from 0 to |radius1| and then rise to 1.
              The default is 0.03, expressed in frequency units (1/pixel)
              of the reduced images being correlated.

       -sigma2 OR -FilterSigma2   Floating point
              Sigma value for the Gaussian rolloff below and above the
              cutoff frequencies specified by FilterRadius1 and
              FilterRadius2. Like radius 2, this value is entered in
              unbinned frequency units and will be scaled by the reduction
              being applied for alignment. The default is 0.0086.

       -test (-te) OR -TestBinnings   Multiple integers
              Set of binnings at which to test pairwise alignments.
              Each binning involves a separate pass through the frames of
              an input file, plus another pass to make a sum at the end
              with the best binning.

       -vary (-v) OR -VaryFilter   Multiple floats
              Set of radius2 filter values to test. This option can be
              entered separately for each binning, but that should not be
              necessary, for two reasons. First, because these values are
              in unbinned frequency units, each one would have about the
              same effect for the different binnings. Second, there is
              little cost to applying extra filters, because different
              filters are applied to a small subarea of an unfiltered
              correlation. Sigma2 will automatically be set for each
              filter so that it is in the same ratio to the particular
              radius2 value as the basic sigma2 is to the basic radius2
              value. Thus, to provide a different set of sigma2 values
              for these filters, you need to enter -radius2 or -sigma2.
              (Successive entries accumulate)

       -assess (-as) OR -AssessWithFrames   Two integers
              Starting and ending frame to use during testing of
              parameters with multiple binnings or filters, numbered from
              1. The default is to use all frames.

       -good (-go) OR -GoodEnoughError   Floating point
              Combined error measure that is sufficient to stop testing
              binnings for a particular set of frames. This error is a
              weighted sum of two measures: the mean of the weighted mean
              residual errors from the set of fits, and the maximum
              weighted residual seen in any fit. (These are described as
              Weighted residual mean and max max in the section below,
              Text Output with Single Parameter Settings.) The latter is
              weighted by the value entered with the -weight option.

       -weight (-w) OR -MaxResidualWeight   Floating point
              Weighting applied to maximum weighted residual from all fits
              when combining with the mean weighted residual to obtain a
              single error measure. The default is 0.1.

       -hybrid (-hy) OR -UseHybridShifts
              Derive a set of shifts while alignments are being done by
              using the results from the best filter after each individual
              fit.
              By default, when given multiple filters to test, the program
              will decide on the best overall filter after all fits are
              done and use the shifts from that filter. This option will
              reduce memory requirements, unless the alignment is being
              refined at the end.

       -nosum (-n) OR -NoSumsOutput
              Do alignments without making a summed image; no output
              filename should be entered.

       -gpu (-gp) OR -UseGPU   Integer
              Use the GPU (graphics processing unit) for computations if
              possible; enter 0 to use the best GPU on the system, or the
              number of a specific GPU (numbered from 1). If GPU memory
              is a limitation, the program will prioritize forming the sum
              on the GPU over doing the alignment there, and will compute
              odd and even sums as the lowest priority. If alignment
              becomes possible on the GPU only by deferring the summing,
              and if CPU memory is sufficient for that, then it will keep
              the entire stack of frames in memory and sum them after
              aligning.

       -memory (-me) OR -MemoryLimitGB   Floating point
              Limit on memory usage in gigabytes. When the memory usage
              would exceed this amount for a set of input frames, the
              program will run through the data in two passes, one to get
              the alignment and one to make the sum. This may not be
              possible if the -assess option is used. The default is 12
              GB.

       -debug (-deb) OR -DebugOutput   Integer
              This entry is a sum of flags for particular kinds of output:
              the last digit controls the printed output from the program
              (1 for basic output, 2 for more verbose output from each
              fit); 10 will give timing output; 100 will give output of
              cross-correlations during initial alignment; 1000 will give
              output of refining correlations; 10000 will give output of
              the sums of odd and even frames. Images are output with
              names "faimg-n.mrc" where n is increased sequentially.
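
As a sketch of how a -debug entry is composed, the shell arithmetic below adds the flag values listed above for verbose fit output, timing output, and output of the initial cross-correlations. The variable names are illustrative only and are not part of alignframes.

```shell
# Flag values are taken from the -debug description; the variable names
# are hypothetical and used only for this illustration.
fit_output=2       # last digit: 2 = more verbose output from each fit
timing=10          # timing output
initial_corr=100   # cross-correlations during initial alignment

# The entry is simply the sum of the selected flags.
debug=$((fit_output + timing + initial_corr))
echo "-debug $debug"
```

With these three flags the sum is 112, so the entry would be "-debug 112".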

       -help (-he) OR -usage
              Print help output

       -StandardInput
              Read parameter entries from standard input

EXAMPLES
       The example commands below would all be entered in one line, or as
       multiple lines with a backslash at the end of every line except the
       last, as they are shown here due to line length limits. Most
       options could be abbreviated more than they are.

   Frames from Tilt Series
       An image from a tilt series with ~4Kx~4K images might align well
       with a command as simple as

          alignframes Feb21_10.43.50.mrc Feb21_10.43.50_ali.mrc

       where you can add "-gpu 0" to use a GPU on this or any of the
       following commands. This will use a default binning of 3 and
       filter cutoff of 0.06/pixel. If you already know that you want a
       different binning (say, 4) or filter (say, 0.05), then the easiest
       way to enter this is with

          alignframes -bin 4,1 -vary 0.05 Feb21_10.43.50.mrc \
            Feb21_10.43.50_ali.mrc

       where using -vary will scale the sigma for the filter rolloff
       automatically. If the frame files for your tilt series list in
       order from one high tilt to the other and the tilt series file is
       available (say, cell4.mrc),

          alignframes -bin 4,1 -vary 0.05 -stack cell4.mrc Feb21_*.mrc \
            cell4_ali.mrc

       will process the entire tilt series and move information from the
       header of the tilt series file into the new file. However, if the
       tilt series was taken bidirectionally and the frame files do not
       list in order, or if there were any duplicate images taken, then
       you want to supply the .mdoc file instead of the stack and input
       files:

          alignframes -bin 4,1 -vary 0.05 -mdoc cell4.mrc.mdoc cell4_ali.mrc

       which will process the frames in the right order and place
       essential information in the output file header.

       To explore filter settings, it is best to run on a collection of
       files, perhaps about 20.
       If "Feb21_10.4*.mrc" lists the desired number of files, you could
       explore a range of filters at binning 3 with

          alignframes -vary 0.05,.06,.08,.1 -nosum Feb21_10.4*.mrc

       The summary at the end will tell which filter gave the lowest mean
       residual. Be sure to scan through the results and look at whether
       either the mean residual or the distance moved declines a lot with
       increased filtering, which may be a sign that fixed pattern noise
       is a problem. Also note whether the maximum unweighted residual is
       high. If it goes over 5-10 pixels, you can probably reduce the
       occurrence of bad fits by restricting the allowed shift with
       "-shift 10" or even "-shift 5". Obviously, you do not want to set
       this limit lower than the maximum possible shift from one frame to
       the next.

   Evaluating and Visualizing Differences with FRC Curves
       Suppose you want to use FRCs to evaluate the difference between
       two conditions, such as filter 0.05 versus 0.06, or with and
       without refinement. You need to make output sums to get an FRC.

          alignframes -vary 0.05 -frc sample.frc Feb21_10.4*.mrc sample.mrc
          alignframes -vary 0.05 -ref 5 -frc sample-refine.frc \
            Feb21_10.4*.mrc sample.mrc

       The ".frc" files have all of the FRC curves for a run, each with a
       "type" number equal to its file number in the alignframes text
       output. You can plot one curve (say, the fourth one) with

          onegenplot -ty 4 -sym 0 sample.frc

       To compare FRCs, run

          subtractcurves -ave 10 -rad 2 sample-refine.frc sample.frc \
            refine-diff.dat

       which will subtract each pair of corresponding curves and average
       over 10 points to reduce the noise in the differences. You could
       have avoided the averaging here with the option "-ring .05" to
       alignframes, which will make the FRC curve much less noisy in
       Onegenplot. (The default FRC output is good for seeing the CTF
       oscillations in single-particle data, but a larger ring size will
       often be more appropriate for tilt series.)

       You could examine each difference in a separate graph with

          onegenplot -ty 1 -sym 0 refine-diff.dat &
          onegenplot -ty 2 -sym 0 refine-diff.dat &
          etc.

       But to assess the overall benefit of the difference in conditions,
       it is more efficient to look at a lot of points at once:

          onegenplot -ty 1,2,3,4,5,6,7,8,9,10 refine-diff.dat &
          onegenplot -ty 11,12,13,14,15,16,17,18,19,20 refine-diff.dat &
          etc.

       When looking at these curves, treat any improvements below about
       1.3 times the cutoff frequency with caution because they could
       arise from overfitting. Improvements past this point would not
       reflect overfitting, although if the fit is locking in on fixed
       pattern noise, it would boost high frequency correlations.

   Frames for Single-Particle Reconstruction
       For single-particle data, the best initial approach is to fit to
       pairwise shifts among about half of the frames. So for a set of 34
       frames, if you wanted to do an initial assessment with two filters
       and look at the FRC,

          alignframes -vary 0.05,.06 -pair 17 -frc test.frc \
            Feb15_10.13.15.mrc test.mrc

       which will use a default binning of 6 for ~8K frames. A more
       flexible command for fitting to shifts among half the frames,
       regardless of the exact frame count, is

          alignframes -vary 0.05,.06 -pair -2 -frc test.frc \
            Feb15_10.13.15.mrc test.mrc

       If you had some indication that fixed pattern noise was a problem,
       or if the data seemed particularly noisy, giving mean residuals
       above 1, you could use pairwise shifts among all frames with

          alignframes -vary 0.05 -pair -1 -frc test.frc Feb15_10.13.15.mrc \
            test.mrc

       For even noisier situations, you can use grouping to reduce the
       mean residual and improve the fits:

          alignframes -vary 0.05 -pair -1 -group 3 -refine 5 -frc test.frc \
            Feb15_10.13.15.mrc test.mrc

       where the refinement at the end may or may not be beneficial,
       depending on just how noisy the single frames are.
       The "-refine 5" entry specifies up to 5 iterations of refinement,
       which is almost always sufficient unless refining the data as
       groups with the -rgroup option.

   Processing of Raw Frames
       If you have collected dark-subtracted data from a K2 camera into
       MRC or compressed TIFF files, you can process them with

          alignframes -gain SuperRef_Feb20_20.26.21.dm4 -scale 39.3 -rot -1 \
            -defect defects_Feb20_20.26.21.txt -pair 17 Feb21_10.32.56.tif \
            Feb21_10.32.56_ali.mrc

       using the gain reference copied into the data directory and the
       defect file written there by SerialEM. The option "-rot -1"
       specifies that the gain reference needs to be rotated by the value
       for rotation and flip indicated by "r/f" in a label in the file
       header. MRC files started having this label in SerialEM 3.4, TIFF
       files in SerialEM 3.5, so check either kind of file by running
       "header" on the file. If the label does not show "r/f", the number
       to use in the "-rot" entry would be the RotationAndFlip value in
       the SerialEM properties file. If frames were saved without
       rotation, do not include this option.

       Either the "-scale" option as shown here, or the "-total" option
       with a total scaling, or the option "-mode 2", should be entered to
       preserve the precision of the data when they are written. The
       program will use a default total scaling of 30 when normalizing if
       the input data are byte values and data are not being written as
       floating point with "-mode 2". However, it is probably better not
       to rely on the default and instead scale the electron counts by the
       same factor that is applied in SerialEM (39.3 in this example, a
       typical value when not dividing by 2).

       You could add an option such as "-trunc 7" to replace values above
       7 with the local mean. Use the command

          clip hist Feb21_10.32.56.tif

       to see the distribution of pixel values. There will be a point at
       which the values stop falling rapidly and then have a long tail;
       removal of values above there is indicated.
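
If several frame files from the same session need the same treatment, the command above can be wrapped in a shell loop. This is only a sketch: it assumes that one gain reference and one defect file apply to every file matching the pattern, and the helper name ali_name and the output naming are illustrative choices, not something alignframes requires.

```shell
# Hypothetical helper: derive an output name such as
# Feb21_10.32.56_ali.mrc from Feb21_10.32.56.tif.
ali_name() {
    printf '%s_ali.mrc\n' "${1%.tif}"
}

# Assumes the gain reference and defect file below apply to all inputs.
for f in Feb21_*.tif; do
    [ -e "$f" ] || continue     # skip if the pattern matched nothing
    alignframes -gain SuperRef_Feb20_20.26.21.dm4 -scale 39.3 -rot -1 \
        -defect defects_Feb20_20.26.21.txt -pair 17 "$f" "$(ali_name "$f")"
done
```

The options are the same ones used in the single-file command; only the input and output names change from one pass of the loop to the next.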

   Assessing Fixed Pattern Noise
       This procedure is not convenient, but does work. The first step
       when fixed pattern noise is suspected to affect the correlations is
       to output the individual shifts with the option "-deb 2". Use only
       a single filter for simpler output. You will see a series of lines
       like

          1 to 0   -1.95  -0.17  near   0.00   0.00
          2 to 0   -1.57   0.02  near  -1.95  -0.17
          2 to 1   -0.60  -0.77  near   0.00   0.00
          3 to 0   -1.79   0.44  near  -1.90  -0.30
          3 to 1    0.44  -1.32  near  -0.28  -0.45
          3 to 2    0.36   0.02  near   0.00   0.00

       Each line shows a shift between a pair of frames (numbered from 0),
       then the shift that it was assumed to be "near" when comparing with
       the maximum allowed shift. If you see many shifts close to 0 (but
       not exactly 0), then they could be due to fixed pattern noise.
       (Shifts of exactly 0 might indicate that the maximum shift was too
       low.) Try with some higher filter values and see if the incidence
       of near-zero values increases. Set a binning lower than the
       default (e.g., 4 instead of 6, 2 instead of 3) if necessary to make
       a filter value be below Nyquist. In the above example, a higher
       filter gave:

          1 to 0   -0.80  -0.22  near   0.00   0.00
          2 to 0   -0.71   0.07  near  -0.80  -0.22
          2 to 1   -0.08  -0.38  near   0.00   0.00
          3 to 0   -0.92   0.42  near  -0.77  -0.15
          3 to 1    0.17  -0.81  near  -0.02  -0.15
          3 to 2    0.30  -0.29  near   0.00   0.00

       To visualize the correlation peak from fixed pattern noise, you
       need to run the program with options that will make it save
       unbinned correlations with no high-frequency filtering. This is
       available only when not using a GPU and when specifying more than
       one filter.

          alignframes -bin 1,1 -vary 0.05,0.06 -deb 102 -frame 1,5 -nosum \
            Feb15_10.13.15.mrc

       There will be a series of lines indicating the names of saved
       images and the measured shift between a pair of frames. Pick one
       with more than a few pixels of shift, which may end up being the
       one between frames 0 and 4.
       For example:

          Saved lf correlation image frame 4 in ./faimg-18.mrc mean -0.00
          Saved correlation image frame 4 in ./faimg-19.mrc mean -0.03
          4 to 0   -0.59  -2.70  near  -1.90   0.14

       The last line shows the shift between frames 4 and 0, then the
       shift that it was assumed to be "near" when comparing with the
       maximum allowed shift. You want to examine the "lf correlation
       image" in "faimg-18.mrc" ("faimg-19.mrc" is a small subarea with
       high-frequency filtering, centered on the expected shift, so is not
       suitable). You can open this file directly in 3dmod, but it will
       be easier to load a centered subarea. Run "header faimg-18.mrc" to
       see its size, NX and NY. Determine these values:

          xst = NX / 2 - 200
          xnd = NX / 2 + 199
          yst = NY / 2 - 200
          ynd = NY / 2 + 199

       and open the file with

          3dmod -x xst,xnd -y yst,ynd faimg-18.mrc

       Open a slicer window and close it; this will place the current
       point marker on the middle pixel. Zoom the Zap window up to about
       6 and turn off the high-quality display (the checkerboard) to see
       individual pixels. Adjust black and white levels to 0 and 255.
       The degree to which this central pixel stands out indicates the
       amount of fixed pattern noise. Increase the black level to see
       just how much it stands out. If you can see 4 brighter pixels
       around the central one, the situation is particularly bad.

       You may wonder where the real correlation peak went! It is very
       diffuse and the high-frequency filtering is essential. Use
       Edit-Image-Process to open the processing dialog and select the
       Fourier filtering panel. Set the high frequency cutoff to your
       filter value and the falloff to one-seventh of that and press
       "Apply" to filter. The real peak will now be prominent. If it is
       close to the origin, you might see its position change with
       different filters as it merges with the fixed pattern peak.

AUTHOR
       David Mastronarde

SEE ALSO
       Email bug reports to mast at colorado dot edu.

IMOD 4.9.10                                                  alignframes(1)