PCA is compute and memory intensive, however, and can become time consuming or impractical for large data sets. A reasonable alternative in these cases is to perform PCA on a representative subset of the data, and then to invoke usePreviousPca (or its single-precision variant, usePreviousPcaSP) in place of pca or pcaSP on the full data set prior to running clusterPca.
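The idea of reusing a previously computed PCA can be illustrated with a minimal NumPy sketch. This is not the implementation used by pca or usePreviousPca; the function names fit_pca and project, the synthetic data, and the subset size are all assumptions chosen for illustration. The expensive decomposition is run only on the subset, and the full data set is then projected onto the resulting components.

```python
import numpy as np

def fit_pca(subset, n_components):
    """Fit PCA on a (n_samples, n_features) subset; return its mean and components."""
    mean = subset.mean(axis=0)
    # SVD of the centered subset; the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(subset - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(data, mean, components):
    """Project data onto previously computed components (no new decomposition)."""
    return (data - mean) @ components.T

rng = np.random.default_rng(0)
# Synthetic stand-in for flattened particle volumes (hypothetical sizes).
full = rng.normal(size=(600, 50))
subset = full[rng.choice(600, size=100, replace=False)]

mean, comps = fit_pca(subset, n_components=8)   # costly step, subset only
coeffs = project(full, mean, comps)             # cheap step, full data set
print(coeffs.shape)
```

The projection step needs only a matrix product per particle, which is why it remains practical where a full decomposition is not. The result is only as good as the subset is representative of the full data set.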
Additionally, usePreviousPca will produce histograms presenting the distributions of the first 8 coefficients/features. These graphs are helpful for choosing features to use for subsequent clustering. A copy in PDF format will also be written to file usePreviousPcaFig<1>.pdf. Other formats may be selected from the figure's pull-down File menu.
NOTES: Like pca, this program is both compute and memory intensive. A fast, multi-core machine with at least 32 GB of RAM is suggested for typical applications (e.g. 600 particles of size 140x140x140 voxels). Specific requirements scale roughly with the product of volume size and number of particles. Insufficient memory will result in thrashing (the system will become unresponsive while showing very low CPU usage) or an error message. When full resolution is not required, prior binning may make a previously unworkable situation tractable, and will also reduce noise sensitivity. Alternatively, program usePreviousPcaSP is functionally identical to usePreviousPca except that key data structures and computations are single- rather than double-precision, reducing memory requirements by nearly a factor of 2.
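As a rough illustration of the scaling described above, the following Python sketch estimates the size of a single in-memory copy of the particle data. The helper data_matrix_bytes is hypothetical, and the figure is only a lower bound: it assumes one value per voxel per particle and ignores masks, working copies, and other overhead, so the program's actual footprint will be larger.

```python
def data_matrix_bytes(n_particles, side, bytes_per_value, binning=1):
    """Bytes needed to hold every voxel of every particle once (cubic volumes)."""
    voxels = (side // binning) ** 3
    return n_particles * voxels * bytes_per_value

GIB = 1024 ** 3

# The typical case from the notes: 600 particles of 140x140x140 voxels.
double = data_matrix_bytes(600, 140, 8)             # double precision
single = data_matrix_bytes(600, 140, 4)             # single precision
binned = data_matrix_bytes(600, 140, 8, binning=2)  # binned by 2, double precision

print(f"double precision: {double / GIB:.1f} GiB")  # 12.3 GiB
print(f"single precision: {single / GIB:.1f} GiB")  # 6.1 GiB
print(f"binned by 2:      {binned / GIB:.1f} GiB")  # 1.5 GiB
```

Under these assumptions, single precision halves the estimate, while binning by 2 reduces it eightfold because each of the three dimensions is halved.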