How to submit jobs on Combo
Queuing System
For efficient use of the cluster, two monitoring/job management packages, PBS/Torque and Maui, have been installed.
After logging in to the cluster, the user is on the master node. Any program that is run there executes immediately on the master. This is the "interactive mode", which is convenient for running simple commands such as ls or vi, or for editing/compiling a program. Long computing jobs, however, should be submitted through the queuing system. A submitted job waits in a queue for its turn and is then sent to one or more compute node(s), to which it has dedicated access until it finishes. As a result, the job runs faster and the cluster is utilized more efficiently.
Sample PBS job scripts
PBS job script for parallel OpenMPI
You'll need to use PBS (Portable Batch System) scripts to set up and launch jobs on the cluster. While it is possible to submit batch requests with an elaborate command line invocation, it is much easier to use PBS scripts, which are more transparent and can be reused for sets of slightly different jobs. A PBS script does two key things:
1. It tells the scheduler about your job, such as:
- The name of the program executable
- How many CPUs you need and how long the job will run
- What to do if something goes wrong
2. The scheduler will 'run' your script when it comes time to launch your job.
A typical PBS script looks like this:
#!/bin/bash
#PBS -l nodes=1:ppn=1,walltime=5:00:00
#PBS -N jobname
#PBS -M yourID@my.cityu.edu.hk
#PBS -m abe
cd /your/job/directory/
./Command &> output
exit 0;
The first "#PBS -l" line tells the scheduler to use one node with one processor per node (1 CPU in total), and this job will abort if not completed in 12 hours. You should put your job's name after "#PBS -N". If you would like to receive emails regarding to this job, you may leave your email address after "#PBS -M". The "#PBS -m abe" asks the system to email you when the job Aborts, Begins, and Ends.
Estimating Resources Requested
Estimating walltime as accurately as possible helps Maui/Torque schedule your job more efficiently. If your job requires 10-20 hours to finish, do not ask for a much longer walltime. Please review the available queues and queue parameters offered by Combo.
Estimating the number of nodes and the number of CPU cores is equally important. Requesting more nodes or more CPU cores than the job needs will remove these resources from the available pool.
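For example (the figures below are purely illustrative, not queue limits on Combo), a job expected to finish in about 12 hours could request a walltime with a modest safety margin rather than the maximum allowed:

# Job is expected to need about 12 hours; add a small margin instead of requesting the queue maximum
#PBS -l nodes=1:ppn=4,walltime=14:00:00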
Specify the nodes to be used
You may specify which nodes to use with options on the qsub command line or in the submission script.
Example:
qsub -l nodes=compute-0-2:ppn=4+compute-0-3:ppn=3
This requests 4 processors on compute-0-2 and 3 processors on compute-0-3.
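The same request can also be written as a directive inside the submission script; a minimal sketch reusing the node names from the example above:

#PBS -l nodes=compute-0-2:ppn=4+compute-0-3:ppn=3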
Submit Your Jobs
Submitting Serial Jobs
A serial job on Combo is defined as a job that requires no more than one node and therefore does not involve any data communication between compute nodes.
Although OpenMP (not OpenMPI) jobs can use more than one CPU core, all of those cores reside within a single node, so OpenMP jobs are also considered serial jobs (a sketch of an OpenMP script follows the example below).
A serial job usually uses 1 to 16 CPU cores within one node; this is specified in the "#PBS -l" line. The PBS script should look like this:
#!/bin/bash
#PBS -l nodes=1:ppn=1,walltime=5:00:00
#PBS -N jobname
#PBS -M yourID@my.cityu.edu.hk
#PBS -m abe
cd /your/job/directory/
./Command &> output
exit 0;
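For an OpenMP job that uses several cores of one node, request the cores with ppn and set OMP_NUM_THREADS to match. This is a minimal sketch, assuming your program was compiled with OpenMP support; ./omp_program is a placeholder name.

#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=5:00:00
#PBS -N omp_jobname
#PBS -M yourID@my.cityu.edu.hk
#PBS -m abe
cd $PBS_O_WORKDIR
# Use as many threads as cores requested on the node
export OMP_NUM_THREADS=8
./omp_program &> output
exit 0;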
Submit your batch job from the frontend with the command
qsub [job_script]
The job is assigned a job name and a job id, which can be used with the various commands described below.
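For example (the script name and the job id shown here are illustrative only; the exact form of the id depends on the cluster configuration):

$ qsub myjob.pbs
12345.combo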
Monitor Your Jobs
To see progress information for running jobs, the commands showq (Maui) and qstat (Torque) can be used. Both commands give you a summary of the status of submitted jobs and queues, but they give slightly different types of information. qstat shows a list of all running and waiting jobs in the queue, sorted by job identifier.
Please note that it sometimes takes a minute for a submitted job to show up under showq.
Another difference is that qstat shows the time used by running jobs, while showq displays the time left until the job will be killed by the queue system. When a job has finished, it no longer appears in the qstat or showq output.
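For example, to list only your own jobs (yourID is a placeholder for your user name):

qstat -u yourID
showq -u yourID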
In addition, the web-based cluster monitor Ganglia is a very helpful tool for checking compute-node load and status. Go to http://combos.tk/ganglia to view the status of Combo.
To delete a running job, use
qdel [jobid]
Frequently Used PBS Commands
PBS supplies a command line interface, which is used to submit, monitor, modify, and delete jobs. The following are some frequently used PBS user commands and their functions:
Command | Description |
---|---|
qsub | Submit a job |
qstat | List information about queues and jobs |
qdel jobid | Delete a job |
Frequently Used qsub Options
Command | Description |
---|---|
qsub -l list | Set job resource list |
qsub -N <jobname> | Set job name to <jobname> |
qsub -q <queue_name> | Submit to queue <queue_name> |
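These options can be combined on one command line; a sketch (the script name myjob.pbs is a placeholder, and <queue_name> must be one of the queues available on Combo):

qsub -N test_run -q <queue_name> -l nodes=1:ppn=2,walltime=1:00:00 myjob.pbs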
Resources requested on the command line take precedence over the directive lines in the script file. For example, if you submit a job with the command
qsub -l nodes=2:ppn=4 [jobscript]
the job will run on 2 compute nodes with 4 processors each, regardless of what is stated in the script file.
qsub -l nodes=compute-0-0:ppn=16 [jobscript]
the job will run on the specified node (compute-0-0 in this case) with 16 processors.
Frequently Used qstat Options
Command | Description |
---|---|
qstat -a | List all jobs with details |
qstat -q | List all queues on the system |
qstat -n | List all jobs with node information |
qstat -u userid | List all jobs owned by user userid |
qstat -r | List all running jobs |
qstat -f jobid | List all information known about the specified job (jobid) |
Must Read
- Very nice qsub tutorial from NYU: https://wikis.nyu.edu/display/NYUHPC/Tutorial+-+Submitting+a+job+using+qsub
Acknowledgement
With reference to:
- High Performance Cluster Computing Centre (HPCCC) at Hong Kong Baptist University, http://www.sci.hkbu.edu.hk/hpccc/index.php
- High Performance Computing Service (HPCS) at Cambridge, http://www.hpc.cam.ac.uk/
- HPC at NYU, https://wikis.nyu.edu/display/NYUHPC/High+Performance+Computing+at+NYU