splitbatch(1)               General Commands Manual              splitbatch(1)

NAME
       splitbatch - Produce multiple command files for running batchruntomo in parallel

SYNOPSIS
       splitbatch -num # [-max #] batchruntomo_command_file

DESCRIPTION
       Splitbatch is a Python script that takes a command file for running the Batchruntomo program on multiple tilt series and produces multiple command files (jobs), with one data set per file, to be run in parallel by Processchunks.

       The input command file should contain CPUMachineList and GPUMachineList entries to Batchruntomo with the full collection of resources available for all of the jobs.  Splitbatch will make sure that the CPUMachineList specifies at least two machines, but does not otherwise deal with the CPUs.  Instead, Processchunks must be given this full processor list as well as the maximum number of jobs to run in parallel in its -M option.  It will divide up the CPUs so that each job is directed to use a specific set of CPUs, and so that the first machine assigned for each job, the one the job is run on, has as many cores as possible to use for programs parallelized with OpenMP.  In this way, competition between jobs for CPU cores should be minimized.

       Since GPUs are more scarce and used only occasionally, the GPU list is not subdivided and assigned either in the command files or by Processchunks(1).  Instead, when a job needs to run a step with GPUs, it passes the full list of available GPUs to the Gpuallocator program along with the maximum number of GPUs that it would like to use.  The program keeps track of GPUs allocated to the set of jobs and responds with whatever GPUs are available, from 1 to the given maximum.  If none are available, Batchruntomo will keep trying indefinitely, issuing periodic warnings when the wait is long.  It is not clear whether the best strategy is to set the maximum to the full number available or not.
       If it is set to the full number, then every job will get all the GPUs when they are free, and other jobs will have to wait when they are already reserved.  Otherwise, a job is much less likely to have to wait but will get fewer GPUs, perhaps only one, while the rest of the GPUs may become idle when their jobs finish.

       With this arrangement for allocating CPUs dynamically, the jobs must be run with Processchunks at the command line.

       All of the jobs can be controlled through a single file that Batchruntomo(1) checks for quit, pause, or finish signals.  The name of this file is the rootname of the batch input file with extension ".cmds".  To make all jobs quit as soon as possible, use the command
              echo Q > rootname.cmds
       or use F instead of Q to make them all quit after finishing their current data sets.

   Using a Cluster Queue
       The management of resources is completely different when running Batchruntomo in parallel on a cluster queue.  Four different arrangements for use of cluster resources are supported:

       1) A single CPU per run.  The top-level Processchunks puts each Batchruntomo job on the queue up to the number to run in parallel.  The batch command file has information about the queue command and maximum number of jobs to submit at once, subject to override by environment variables set by Processchunks when it runs it.  selections to take effect.  Batchruntomo runs single command files directly and runs operations that can be parallelized in chunks on the queue with Processchunks.  Everything is limited to a single thread, so operations parallelized with multi-threading (most of the single command files) will run more slowly than usual.  This mode provides efficient use of CPU resources on the cluster but may incur less efficient use of the file system and memory caching if many data sets are run at once.

       2) Multiple CPU cores and one or more GPUs per run.
       This requires that the Queuechunk command being used (generally obtained by Etomo from the cpu.adoc) has a resource request for a specific number of cores and possibly GPUs.  Batchruntomo runs single command files directly and allows them to use all the cores available for multi-threaded operations.  Operations on a CPU that can be parallelized in chunks are run directly with Processchunks, up to the given number of cores.  Operations on a GPU are also run directly in chunks if more than one GPU is available, or as a single command file if there is only one GPU.  This mode is comparable to running Batchruntomo on a single computer, where sometimes the cores and more often the GPU are fully utilized.

       3-1 and 3-2) A separate queue for GPU operations, with one GPU per task.  The main queue would not have a GPU allocated.  Steps using a GPU are run on this secondary queue.  Because Batchruntomo has to submit each GPU operation to run on this queue, and no single IMOD process uses more than one GPU, only a single GPU can be taken advantage of by each command file or chunk being run.  When more than one GPU is available overall on this queue, operations will be split into chunks and each chunk submitted to the queue; otherwise a single command file will be submitted.  In mode 3-1, only a single core is available and non-GPU operations are run as for mode 1.  In mode 3-2, multiple cores are available to each job and non-GPU operations are run as for mode 2.  This mode allows more efficient use of GPUs on the cluster, provided that the cluster is configured to allocate just one GPU from a multi-GPU node when the queue submission requests that.  If not, mode 2 is best.

       Each of these modes involves a particular set of options given to Processchunks(1) and Batchruntomo.  The options in the batch command file will govern if the command file is run outside of Processchunks(1), but if it is run by Processchunks, the options provided to the latter override the ones in the command file.
       Mode 1) The queue is specified to Processchunks by
              -q: the maximum number of queue entries
              The machine list, a queuechunk command
       and to Batchruntomo by
              -QueueCommand: the queuechunk command
              -MaxJobsOnQueue: the maximum number of queue entries

       Mode 2) The queue is specified to Processchunks by
              -q: the maximum number of queue entries
              The machine list, a queuechunk command
              -JC: the number of cores
              -JG: the number of GPUs
       and to Batchruntomo by
              -CoresPerClusterJob: the number of cores
              -GPUsPerClusterJob: the number of GPUs

       Modes 3-1 and 3-2) The secondary queue is specified to Processchunks(1) by
              -SQ: the queuechunk command
              -SN: the maximum number of queue entries to submit at once
       and to Batchruntomo by
              -GPUQueueCommand: the queuechunk command
              -MaxGPUJobsOnQueue: the maximum number of queue entries
       The primary queue is specified to both programs as for modes 1 and 2, respectively, except that for mode 2, only cores should be specified, not GPUs.

       To enforce consistent usage, Splitbatch will object if the Batchruntomo(1) command file has either of the CPUMachineList or GPUMachineList options for non-cluster processing included along with the main queue options, -QueueCommand or -CoresPerClusterJob.

OPTIONS
       When the program is invoked with no arguments or with -h, it gives a usage statement that shows the default values for these options as well as the currently allowed abbreviations of the short option names.

       -comfile OR -CommandFile   File
              The input Batchruntomo command file name, with or without its extension.  This entry is required.  The command file can be entered either specifically with this option or as a non-option argument.

       -maxgpu OR -MaxGPUsForOneJob   value
              Maximum number of GPUs to request for a step that needs a GPU.  The default is 4.

       -help OR -usage
              Print a usage statement and exit.
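
       The consistency rule stated before OPTIONS, that the machine-list options for non-cluster processing may not appear together with the main queue options, can be illustrated with a short Python sketch.  This is not Splitbatch's actual code: the function name find_option_conflict is hypothetical, and the sketch assumes that the command file holds one option per line as a name followed by its value, that "#" starts a comment, and that "$" starts the command line itself.

```python
# Option names taken from the text above; everything else is illustrative.
LOCAL_OPTIONS = {"CPUMachineList", "GPUMachineList"}
QUEUE_OPTIONS = {"QueueCommand", "CoresPerClusterJob"}


def find_option_conflict(command_text):
    """Return (local, queue) option names found together, or None if consistent."""
    seen = set()
    for line in command_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments, and the "$batchruntomo ..." command line.
        if not stripped or stripped.startswith(("#", "$")):
            continue
        # Assumed layout: the option name is the first whitespace-separated token.
        seen.add(stripped.split()[0])
    local = sorted(seen & LOCAL_OPTIONS)
    queue = sorted(seen & QUEUE_OPTIONS)
    if local and queue:
        return local, queue
    return None
```

       A file containing both a CPUMachineList line and a QueueCommand line would be reported as a conflict, while a file with only machine lists or only queue options would pass.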

CLUSTER EXAMPLES
       This section lists some Processchunks commands for running in the different modes, and also shows how the entries should appear in cpu.adoc.

       Mode 1 - a simple queue named "cluster":
          processchunks -M 4 -q 32 -Q cluster 'queuechunk -t pbs' \
               batchJul25-193628_BB1
       cpu.adoc:
          [Queue = cluster]
          command = queuechunk -t pbs
          number = 300

       Mode 2 - a queue where multiple cores and GPUs can be requested:
          processchunks -M 4 -q 6 -JC 8 -JG 4 'queuechunk -t slurm -l -c8,-n1,--partition=sgpu' \
               batchJul25-193628_BB1
       cpu.adoc:
          [Queue = BatchGPU]
          command = queuechunk -t slurm -l -c8,-n1,--partition=sgpu
          coresPerClusterJob = 8
          gpusPerClusterJob = 4
          number = 8

       Such a configuration is best for accessing GPUs when the cluster software is not configured to give only one GPU instead of all the GPUs on the node.  Here the "number" would be set to the number of such GPU nodes available on the cluster.  The coresPerClusterJob should be set to the same number as whatever entry in the queue command specifies the number of cores to allocate ("-c8" here).

       Note that the attributes coresPerClusterJob and gpusPerClusterJob were named coresPerNode and gpusPerNode prior to 4.12.40.  Both names were misleading because such a job may not get an entire cluster node, and using coresPerNode in this context conflicted with the previous usage of it.  coresPerNode is still used when specifying a slurm queue running in exclusive resource allocation mode, but such a queue is not suitable for multi-core processing within a batch job and will be treated by Etomo as a simple queue having one core per job.
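
       The requirement that coresPerClusterJob match the core count in the queue command can be checked mechanically.  The following Python sketch is an illustration only and not part of IMOD: the function name cores_match is hypothetical, and it assumes that the core request appears as a "-cN" item delimited by spaces or commas, as in the Slurm example above.

```python
import re


def cores_match(queue_command, cores_per_cluster_job):
    """Return True if the -cN request in queue_command equals cores_per_cluster_job."""
    # Find "-c" followed by digits, delimited by whitespace, commas, or string ends.
    found = re.search(r"(?:^|[\s,])-c(\d+)(?=$|[\s,])", queue_command)
    if found is None:
        # No explicit core request, so there is nothing to compare against.
        return False
    return int(found.group(1)) == cores_per_cluster_job
```

       For the Mode 2 entry above, the command string with "-c8" and a coresPerClusterJob of 8 would pass this check, while a value of 6 would not.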

       Mode 3-1 - a primary queue with one core and a secondary queue with one GPU:
          processchunks -M 4 -q 32 -SN 4 \
               -SQ 'queuechunk -t slurm -l -c1,-n1,-G1,--partition=aa100' \
               'queuechunk -t slurm -l -c1,-n1,--partition=amilan' \
               batchJul25-193628_BB1
       cpu.adoc:
          [Queue = BatchPrimary]
          command = queuechunk -t slurm -l -c1,-n1,--partition=amilan
          number = 300

          [Queue = alpine-1GPU]
          command = queuechunk -t slurm -l -c1,-n1,-G1,--partition=aa100
          coresPerClusterJob = 1
          gpusPerClusterJob = 1
          number = 24

       Here the number might be the total number of GPUs available (e.g., 8 machines with 3 each), provided that the cluster software can assign each one separately.  With "-SN 4", each batch job will submit up to 4 chunks to the queue with GPUs.  If the queue software does allow getting more than one core per job, it might be preferable to use the next setup instead:

       Mode 3-2 - a primary queue with six cores and a secondary queue with one GPU:
          processchunks -M 4 -q 32 -SN 4 \
               -SQ 'queuechunk -t slurm -l -c1,-n1,-G1,--partition=aa100' \
               'queuechunk -t slurm -l -c6,-n1,--partition=amilan' \
               batchJul25-193628_BB1
       cpu.adoc:
          [Queue = BatchCores]
          command = queuechunk -t slurm -l -c6,-n1,--partition=amilan
          coresPerClusterJob = 6
          number = 50

          [Queue = alpine-1GPU]
          command = queuechunk -t slurm -l -c1,-n1,-G1,--partition=aa100
          coresPerClusterJob = 1
          gpusPerClusterJob = 1
          number = 24

       If you define any multicore queues, note that these are useful only for parallel Batchruntomo processing.  For all other parallel processing in Etomo, you need to define an additional simple queue with only one core allocated per job, since that is all that will be used in those situations.

FILES
       The command files are given the same root name as the input file and are numbered from "-001".  There is a finishing file to remove command files, but the log files are left.

AUTHOR
       David Mastronarde  <mast at colorado dot edu>

SEE ALSO
       processchunks(1), batchruntomo(1)

                                                                 splitbatch(1)