Manpage of usePreviousPca


Section: User Commands (1)


usePreviousPca - decompose aligned subvolumes using previously determined principal components  


usePreviousPca pcaMatFile prmFile iter numPart
usePreviousPca pcaMatFile prmFile iter numPart avgFilename
usePreviousPca pcaMatFile prmFile iter numPart avgFilename autoclose
usePreviousPcaSP pcaMatFile prmFile iter numPart
usePreviousPcaSP pcaMatFile prmFile iter numPart avgFilename
usePreviousPcaSP pcaMatFile prmFile iter numPart avgFilename autoclose  


Clustering / unsupervised classification of aligned subvolumes in PEET is typically done by wedge-masked difference (WMD)-corrected principal component analysis (PCA; see programs pca and pcaSP), followed by k-means clustering (see program clusterPca).

PCA is compute- and memory-intensive, however, and can become time-consuming or impractical for large data sets. A reasonable alternative in these cases is to perform PCA on a representative subset of the data, and then to run usePreviousPca (or its single-precision variant, usePreviousPcaSP) on the full data set in place of pca or pcaSP, prior to running clusterPca.
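The idea of reusing previously determined principal components can be illustrated with a minimal NumPy sketch. It assumes each aligned subvolume has already been flattened into one row of a particles-by-voxels matrix; a PCA basis is learned from a random subset and then reused to compute coefficients for every particle. The function names (fit_pca, project) and the toy data are hypothetical, and this is not PEET's MATLAB implementation: in usePreviousPca the rows would be wedge-masked differences rather than raw densities, and missing-wedge handling (pcaMethod) is ignored here.

```python
import numpy as np

def fit_pca(subset, n_components):
    """Learn a PCA basis from an (n_subset x n_voxels) data matrix."""
    mean = subset.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(subset - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(data, mean, components):
    """Coefficients of every particle along the previously determined components."""
    return (data - mean) @ components.T

rng = np.random.default_rng(0)
n_part, n_vox = 500, 1000
full = rng.normal(size=(n_part, n_vox))                       # toy "full data set"
subset = full[rng.choice(n_part, size=100, replace=False)]    # representative subset

mean, comps = fit_pca(subset, n_components=20)   # expensive step, done once on the subset
coeffs = project(full, mean, comps)              # cheap step, applied to all particles
print(coeffs.shape)                              # (500, 20)
```

The expensive decomposition only ever sees the subset; projecting the remaining particles is a single matrix product, which is why this route scales better to large data sets.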

pcaMatFile
The path / name of the *.mat file containing the results of the previous principal component analysis. (Caution: usePreviousPca generates an output *.mat file prefixed with pca<numPart>, as described below. Make sure that the input pcaMatFile is located in a different directory, or is named so that it will not be overwritten!)
prmFile
The name of the parameter file. See the PEET and averageAll man pages, and below, for descriptions of parameter file settings. The parameter file must contain the same settings previously used to align and average the subvolumes; it may also contain the additional parameters described below, which control the behavior of usePreviousPca.
iter
An integer specifying the alignment iteration number to analyze.
numPart
An integer specifying the number of particles to analyze.
avgFilename (optional, but highly recommended)
The name of the MRC file containing the averaged subvolume to subtract when computing wedge-masked differences. Typically, this should be an average containing numPart subvolumes. If this parameter is omitted, the wedge-masked difference correction will not be performed. avgFilename has no effect, and can be omitted, when pcaMethod (below) is 3.
autoclose (optional)
Normally, usePreviousPca plots histograms of the coefficients along the first eigenvectors and waits for the user to close this window manually before exiting. If autoclose is non-zero, it exits on completion without waiting.
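What the wedge-masked difference computes can be sketched roughly as follows: before the average is subtracted from a particle, it is restricted in Fourier space to the region that particle actually sampled, so the difference is not dominated by the particle's missing wedge. This is not PEET's code. The single tilt axis along y, the ±60° tilt range, the helper names and the toy data are all hypothetical, and the real implementation may differ, e.g. in how masks are derived and whether additional filtering or normalization is applied.

```python
import numpy as np

def wedge_mask(shape, half_angle_deg):
    """Hypothetical missing-wedge mask for a single tilt axis along y.

    Arrays are indexed (z, y, x). A Fourier component is kept (True) when it
    lies inside the sampled tilt range, i.e. |kz| <= |kx| * tan(half_angle).
    """
    fz, fy, fx = (np.fft.fftfreq(n) for n in shape)
    kz, _, kx = np.meshgrid(fz, fy, fx, indexing="ij")
    return np.abs(kz) <= np.abs(kx) * np.tan(np.deg2rad(half_angle_deg))

def wedge_masked_difference(particle, average, mask):
    """Subtract the average after restricting it to the particle's sampled Fourier region."""
    masked_average = np.fft.ifftn(np.fft.fftn(average) * mask).real
    return particle - masked_average

rng = np.random.default_rng(1)
shape = (16, 16, 16)
mask = wedge_mask(shape, 60.0)
average = rng.normal(size=shape)

# Toy particle: the average as seen through the same wedge, plus noise.
clean = np.fft.ifftn(np.fft.fftn(average) * mask).real
noise = 0.1 * rng.normal(size=shape)
particle = clean + noise

wmd = wedge_masked_difference(particle, average, mask)
# A plain difference also contains the part of the average lying in the missing wedge.
naive = particle - average
print(np.abs(wmd - noise).max() < 1e-12, np.linalg.norm(naive) > np.linalg.norm(wmd))
# True True
```

In this toy case the wedge-masked difference recovers only the particle-specific signal (here, the added noise), while the plain difference is swamped by missing-wedge content that says nothing about structural variation.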

Parameters below are specific to pca / usePreviousPca:

pcaSzSubvol = integer vector of length 3
The size (less than or equal to szVol) of a central subvolume to analyze. Default is to use the full particle size (szVol).
pcaMethod = < 1 | 2 | 3 >
pcaFnParticleMask = string
If desired, restrict analysis to specific region(s) within the subvolume by specifying the name of an MRC volume of size szVol containing a binary mask, with non-zero entries marking the voxels to be analyzed and 0's the voxels to ignore. (Default = no mask, i.e. use all voxels.)
pcaNumEigenimages = <int>
The number of eigenimages to save. (Default = 4).
pcaMaxNumComponents = <int>
An upper limit on the number of principal components and corresponding coefficients/features to be saved (Default = 20). Saving many more components than will be used for clustering will consume large amounts of disk space and result in long write times.
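The voxel-selection parameters can be pictured as deciding which voxels make up each particle's feature vector. The minimal NumPy sketch below uses hypothetical helper names and omits MRC file I/O. It also assumes that, when both pcaSzSubvol and pcaFnParticleMask are set, the szVol-sized mask is cropped together with the volume; this page does not say how PEET combines the two.

```python
import numpy as np

def central_subvolume(vol, sz_subvol):
    """Crop a centered box of shape sz_subvol from vol (cf. pcaSzSubvol)."""
    starts = [(n - s) // 2 for n, s in zip(vol.shape, sz_subvol)]
    return vol[tuple(slice(b, b + s) for b, s in zip(starts, sz_subvol))]

def particle_features(vol, sz_subvol=None, mask=None):
    """Flatten the voxels to analyze into a 1-D feature vector.

    Assumption: if both options are given, the szVol-sized mask
    (cf. pcaFnParticleMask) is cropped along with the volume.
    """
    if sz_subvol is not None:
        vol = central_subvolume(vol, sz_subvol)
        if mask is not None:
            mask = central_subvolume(mask, sz_subvol)
    if mask is None:
        return vol.ravel()
    return vol[mask != 0]          # keep only voxels with non-zero mask entries

def to_eigenimage(component, mask):
    """Scatter one component (one value per analyzed voxel) back into a volume."""
    img = np.zeros(mask.shape)
    img[mask != 0] = component
    return img

rng = np.random.default_rng(2)
sz_vol = (32, 32, 32)
vol = rng.normal(size=sz_vol)
mask = np.zeros(sz_vol, dtype=np.uint8)
mask[8:24, 8:24, 8:24] = 1         # hypothetical region of interest
feat = particle_features(vol, sz_subvol=(24, 24, 24), mask=mask)
print(feat.size)                   # 4096 voxels analyzed instead of 32**3 = 32768
```

Shrinking the feature vector this way reduces both memory use and run time, and to_eigenimage shows why saved components can still be displayed as volumes: each entry maps back to a known voxel.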

If <basename> is the output basename (fnOutput) specified in the .prm file, the primary output of usePreviousPca will be a file pca<numPart>_<basename>.mat containing the results of the decomposition and other data required by program clusterPca. Depending on the volume size, number of particles, and number of components saved, this file can require tens of gigabytes.

Additionally, usePreviousPca will produce histograms showing the distributions of the first 8 coefficients/features. These plots are helpful for choosing which features to use for subsequent clustering. A copy in PDF format will also be written to file usePreviousPcaFig<1>.pdf. Other formats may be selected from the figure's File pull-down menu.
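If the coefficients are available as a particles-by-components array (how they are laid out inside the .mat file is not described here), the same per-feature distributions can be tabulated with NumPy. The helper name and the toy data are hypothetical; PEET draws its own plots. For example, a feature whose histogram is clearly bimodal is a natural candidate for clustering.

```python
import numpy as np

def coefficient_histograms(coeffs, n_features=8, bins=30):
    """Return (counts, bin_edges) for each of the first n_features coefficient columns."""
    n = min(n_features, coeffs.shape[1])
    return [np.histogram(coeffs[:, j], bins=bins) for j in range(n)]

rng = np.random.default_rng(3)
coeffs = rng.normal(size=(600, 20))   # toy data: 600 particles x 20 saved components
coeffs[:300, 0] += 6.0                # feature 0 separates two hypothetical classes

for j, (counts, edges) in enumerate(coefficient_histograms(coeffs)):
    # Crude text rendering of each histogram; feature 0 should show two peaks.
    print(j, "".join(" .:|"[min(3, int(c) // 10)] for c in counts))
```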

NOTES: Like pca, this program is both compute- and memory-intensive. A fast, multi-core machine with at least 32 GB of RAM is suggested for typical applications (e.g. 600 particles of size 140x140x140 voxels). Specific requirements scale roughly with the product of volume size and number of particles. Insufficient memory will result in either thrashing (the system becomes unresponsive while showing very low CPU usage) or an error message. When full resolution is not required, prior binning may make a previously unworkable situation tractable, and will also reduce noise sensitivity. Alternatively, program usePreviousPcaSP is functionally identical to usePreviousPca, except that key data structures and computations are single- rather than double-precision, reducing memory requirements by nearly a factor of 2.
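Because requirements scale with the product of volume size and number of particles, a rough lower bound can be worked out directly. This sketch assumes the dominant cost is a single dense particles-by-voxels array with one value per voxel; that is an assumption, and the real working set also includes intermediate data, so actual usage will be higher. It also shows why single precision saves "nearly" rather than exactly a factor of 2 overall: only the terms like this one shrink by exactly half.

```python
def data_matrix_bytes(sz_vol, num_part, bytes_per_value):
    """Bytes needed to hold one value per voxel per particle."""
    nx, ny, nz = sz_vol
    return nx * ny * nz * num_part * bytes_per_value

double = data_matrix_bytes((140, 140, 140), 600, 8)   # double precision (usePreviousPca)
single = data_matrix_bytes((140, 140, 140), 600, 4)   # single precision (usePreviousPcaSP)
binned = data_matrix_bytes((70, 70, 70), 600, 8)      # binned by 2: 8x fewer voxels

print(f"{double / 1e9:.1f} GB, {single / 1e9:.1f} GB, {binned / 1e9:.1f} GB")
# 13.2 GB, 6.6 GB, 1.6 GB
```

Binning by 2 in each dimension cuts the voxel count, and hence this term, by a factor of 8, which is why binning can turn an unworkable case into a tractable one.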


John Heumann  


PEET(1), alignSubset(1), averageAll(1), removeDuplicates(1), clusterPca(1), and pca(1).




This document was created by man2html, using the manual pages.
Time: 18:16:05 GMT, January 11, 2021