Using PBS job scripts to submit tasks to a job queue on a Linux server

This article describes how to use scripts for the PBS (Portable Batch System) job management system on a Linux server to submit a task to the server's queue and have it executed.

Recently, I needed to run code on my school's public supercomputer. Like most supercomputing facilities, it uses a job queue to accept, manage, and order the tasks of different users, so that everyone can share the machine's resources fairly. Since this supercomputer accepts tasks through PBS, this article introduces how to write a PBS script and use it to submit your own code execution request to the server (that is, to submit a task).

PBS (Portable Batch System) is open-source software for managing and scheduling computing tasks; it is a commonly used job scheduling system for managing and allocating computing resources on large computing clusters and supercomputers. To use it, we first submit a job to the cluster; PBS then schedules the job and allocates resources to it based on resource availability, job priority, and other factors. The basic workflow is as follows:

The user writes a PBS script that describes the task's resource requirements, the commands to execute, and other relevant information.
The user runs a PBS command to submit that script to the PBS system.
The PBS system places the job in the job queue, where it waits for execution, based on the resource requirements stated in the script and the resources available on the cluster.
When computing resources become available, the PBS system selects a job and assigns it to a suitable compute node.
The job runs on the compute node until it completes, reaches the preset runtime limit, or fails with an error during execution.

What follows, then, is an introduction to writing a PBS script and using it to submit your own task to the server.

First, let's clarify the goal of this article. We have an executable file under some path on the server (or a Python code file), and we would like to run this executable (or Python code file) on the supercomputer.

With the goal clear, we can get started. First, if needed, use cd to go to your own working directory. Here I go directly into the directory where the executable is stored; the command is as follows.

cd Data_Reflectance_Rec

Next, take a look at the files in the current path with the following command. ls lists the files and subdirectories in a directory.

ls

Next, create the PBS script with the following command. I have named it py_task.pbs here, where .pbs is the customary extension for PBS script files. This is the file we will later submit to the PBS system.

touch py_task.pbs

touch is a commonly used command that creates an empty file or updates the access and modification timestamps of an existing file. After creating the file, list the current path again with the following command.

ls

Running the above commands gives the result shown in the figure below. You can see that the PBS script file py_task.pbs has been created.

Once the script file has been created, we can start editing it. Here I chose to edit it with Vim, so just run the following command.

vim py_task.pbs

Vim is a powerful text editor widely used for writing code and editing text in a command-line environment. Running the above command gives the result shown in the figure below; you can see that the PBS script file py_task.pbs has been opened in Vim.

Next, press the i key to enter insert mode, as shown in the figure below.

We can then edit the PBS script file in Vim. Here we give 2 templates for PBS script files; the first template is shown below.

#!/bin/bash
#PBS -N py_task
#PBS -q rtlab1_4
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:30:00
#PBS -o /data1/home/LiliAircas/Data_Reflectance_Rec/task/py_task.out
#PBS -e /data1/home/LiliAircas/Data_Reflectance_Rec/task/py_task.err
hostname
date "+%Y/%m/%d %H:%M:%S"
python /data1/home/LiliAircas/Data_Reflectance_Rec/code/
date "+%Y/%m/%d %H:%M:%S"
# Finally, remember to leave a blank line at the end of the file

The first line is a shebang (also known as a hashbang) line, which specifies the interpreter used to run the script. Here, /bin/bash indicates that the script will be executed by the Bash interpreter.

Next, starting from the second line, the statements that begin with # are not ordinary comments but directives for the PBS job scheduling system. Each directive starts with #PBS followed by an option: -N py_task sets the job name to py_task; -q rtlab1_4 submits the job to the rtlab1_4 queue; -l nodes=1:ppn=4 requests 1 node and 4 processors on that node to run the job; -l walltime=00:30:00 sets the job's maximum runtime to 30 minutes. The following 2 lines, with -o and -e, specify the files that receive the job's standard output and error output, respectively.
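To make the dual role of these lines concrete, here is a minimal sketch. It writes a small demo script (a hypothetical file in a temporary directory, not the article's py_task.pbs), lists its #PBS lines with grep, and then runs it with plain Bash to show that the shell itself skips those lines as comments; only the scheduler interprets them when the script is submitted.

```shell
# Write a hypothetical demo script to a temporary directory.
workdir=$(mktemp -d)
demo="$workdir/demo_task.pbs"
cat > "$demo" <<'EOF'
#!/bin/bash
#PBS -N demo_task
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:30:00
echo "task body runs"
EOF

# The directive lines that the scheduler would read at submission time.
directive_count=$(grep -c '^#PBS' "$demo")
echo "Found $directive_count PBS directives:"
grep '^#PBS' "$demo"

# Run the file directly with Bash: the #PBS lines are skipped as comments,
# so only the echo command produces output.
plain_output=$(bash "$demo")
echo "$plain_output"
```

Running a job script directly like this can also be a quick way to test its commands on a login node, provided the task itself is light enough to run there.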

After that, the next 2 lines print the name of the host executing the script and the current date and time, respectively; the script then calls the Python interpreter to run the Python code file. Finally, the current date and time are printed again, which lets us estimate roughly how long the task took.
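The two date calls only print timestamps, so the duration still has to be worked out by hand. As a sketch of one alternative, the script could record the time in epoch seconds before and after the task and print the difference. This assumes that the date command on the compute node supports the +%s format (GNU and BSD date do); the sleep command is only a stand-in for the real task.

```shell
# Record the start time in seconds since the epoch.
start=$(date +%s)

# Stand-in for the real task (for example, the python or executable call).
sleep 1

# Record the end time and compute the difference.
end=$(date +%s)
elapsed=$((end - start))

# Print the elapsed time as HH:MM:SS.
printf 'Elapsed: %02d:%02d:%02d\n' \
    $((elapsed / 3600)) $((elapsed % 3600 / 60)) $((elapsed % 60))
```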

As for the blank line at the end, some tutorials say it is there to follow script file conventions and to improve readability and structural clarity; others say that on some server versions, omitting this blank line leaves the script's commands unrecognized. So, to be safe, I added 1 blank line.
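One plausible reading of this advice (an assumption on my part, not something the tutorials state) is that the last line of the file must end with a newline character, which some editors omit. The sketch below checks for that with a small helper function, ends_with_newline, which is a name introduced here for illustration; the two demo files are hypothetical.

```shell
# Two hypothetical demo files: one ends with a newline, one does not.
workdir=$(mktemp -d)
printf '#!/bin/bash\nhostname\n' > "$workdir/with_newline.pbs"
printf '#!/bin/bash\nhostname'   > "$workdir/without_newline.pbs"

# ends_with_newline FILE: succeed if FILE is non-empty and its last byte
# is a newline. Command substitution strips a trailing newline, so an
# empty result from tail means the last byte was a newline.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

for f in "$workdir"/*.pbs; do
    if ends_with_newline "$f"; then
        echo "$(basename "$f"): ends with a newline"
    else
        echo "$(basename "$f"): no trailing newline"
    fi
done
```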

The edited script file is shown in the figure below. Note that the screenshot contains some errors: for example, hostname is written as Hostname, and there is no blank line at the end. So treat the image as a reference only, and follow the text version of the code above when modifying your own PBS script file.

In addition, we give a second template, in which the PBS script runs an executable, as follows.

#!/bin/bash
#PBS -N py_task
#PBS -q rtlab1_4
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -o /data1/home/LiliAircas/Data_Reflectance_Rec/code/py_task.out
#PBS -e /data1/home/LiliAircas/Data_Reflectance_Rec/code/py_task.err
hostname
date "+%Y/%m/%d %H:%M:%S"
cd /data1/home/LiliAircas/Data_Reflectance_Rec/code
./Alignment_Server
date "+%Y/%m/%d %H:%M:%S"

The meaning of each part of this script has been described above, so we will not go through it again here.

Once you have finished editing your script file in Vim, you need to save it and exit Vim. First, press the Esc key to leave insert mode; then type the 3 characters :wq in sequence and press Enter to save the file and exit Vim.
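If you would rather not use an interactive editor, the same kind of file can also be written with a here-document. The sketch below does this in a temporary directory (a stand-in for your own working directory) with a shortened, hypothetical script body, and then uses bash -n, which parses a script without executing it, as a quick syntax check before submission.

```shell
# Stand-in for your own working directory.
workdir=$(mktemp -d)
script="$workdir/py_task.pbs"

# Write the script; the quoted 'EOF' keeps the content literal.
cat > "$script" <<'EOF'
#!/bin/bash
#PBS -N py_task
#PBS -q rtlab1_4
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:30:00
hostname
date "+%Y/%m/%d %H:%M:%S"

EOF

# bash -n only parses the script; nothing in it is executed.
if bash -n "$script"; then
    syntax_status="ok"
else
    syntax_status="error"
fi
echo "Syntax check: $syntax_status"
```

Note that bash -n only catches shell syntax errors; it does not validate the #PBS directives, which are plain comments as far as Bash is concerned.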

We can then submit our PBS script file to the system with the following command.

qsub py_task.pbs

The above command submits the PBS script file py_task.pbs that we just edited to the PBS job scheduling system, and the job starts waiting for the system to allocate resources to run it. Running the command gives the result shown in the figure below.

If there are no problems, a number is printed, as shown in the figure above; this is the ID of the task we just submitted.
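Since later commands such as qdel need this ID, it can be convenient to keep it in a shell variable. The sketch below assumes that qsub prints the identifier in the form number.servername, which is common but may differ on your system; the value 1250752.mgr is a hypothetical stand-in so that the snippet runs without a PBS installation.

```shell
# Hypothetical identifier; in real use it would be captured with:
#   full_id=$(qsub py_task.pbs)
full_id="1250752.mgr"

# Strip everything from the first dot onward to keep only the number.
job_number=${full_id%%.*}

echo "Full job ID:   $full_id"
echo "Numeric part:  $job_number"
```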

Of course, running the above command sometimes produces an error like the one shown in the figure below, namely qsub: submit error (Unauthorized Request...).

This is mostly caused by submitting a job to a queue to which you do not have access; in that case, contact the server's administrators to obtain access.

Next, here are a few more common commands for the PBS system.

First, we can get information about all nodes of the supercomputer with the following command.

pbsnodes

Running the above command gives the result shown in the figure below; you can see that information about the different nodes is listed.

It is also possible to follow the above command with the name of a specific node, so that only that node's information is returned, as in the following command.

pbsnodes cu02

Running the above command gives the result shown in the figure below. As in the previous figure, the IDs of the tasks currently running on this node are displayed; for example, the purple box in the figure below marks the ID of one particular task.

Second, we can get the current status of all tasks in the queue with the following command.

qstat

Running the above command gives the result shown in the figure below; you can see one task, which I submitted myself. In my case, I can only see my own tasks, not the tasks that other people have in the queue at the same time. This was probably set up by the administrators of our school's servers, so that each user can only see the tasks submitted from his or her own account.

Third, the details of the tasks in the queue can be viewed with the following command.

qstat -f

Running the above command gives the result shown in the figure below.

Additionally, the qdel command followed by a task's ID removes that task from the queue; for example, the following command.

qdel 1250752

After running the above command, use the qstat command to view the tasks in the queue, and you can see that the specified task has been deleted, although with a delay: if you run qstat immediately after qdel, task 1250752 is still listed; only when qstat is run again a little later has task 1250752 disappeared.
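Because of this delay, a script that needs to know when the task is really gone could poll the scheduler instead of checking once. The following is only a sketch: job_listed and wait_for_removal are helper names introduced here, and job_listed assumes that qstat followed by a job ID exits with a non-zero status once the scheduler no longer knows the job. Depending on how the server is configured, finished or deleted jobs may stay listed for some time.

```shell
# job_listed JOB_ID: succeed while the scheduler still reports the job.
# Assumption: `qstat JOB_ID` returns non-zero for an unknown job.
job_listed() {
    qstat "$1" >/dev/null 2>&1
}

# wait_for_removal JOB_ID [MAX_TRIES] [DELAY_SECONDS]
# Poll until the job is no longer listed. Returns 0 once it is gone,
# or 1 if it is still listed after MAX_TRIES checks.
wait_for_removal() {
    job_id=$1
    max_tries=${2:-10}
    delay=${3:-2}
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if ! job_listed "$job_id"; then
            return 0
        fi
        sleep "$delay"
        tries=$((tries + 1))
    done
    return 1
}
```

With these definitions, a call such as qdel 1250752 followed by wait_for_removal 1250752 would return only after the task has left the listing or the attempts have run out.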

Once the task has finished, we can run the following 2 commands in turn to view the job's standard output and error output files. cat is a commonly used command that concatenates the specified files and prints their contents.

cat py_task.out
cat py_task.err

Running the above commands gives the result shown in the figure below. In my case, the task ran into some permission errors, so the py_task.err file contains the error messages produced during the task's execution.
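A quick way to tell whether anything was written to the error file, without reading it first, is the shell's -s test, which succeeds only for a non-empty file. The sketch below uses hypothetical files in a temporary directory, and check_err is a helper name introduced here. Keep in mind that an empty error file is only a hint: some programs report problems on standard output or only through their exit status.

```shell
# Hypothetical output files in a temporary directory.
workdir=$(mktemp -d)
printf 'Permission denied\n' > "$workdir/py_task.err"   # made-up message
: > "$workdir/clean.err"                                # empty error file

# check_err FILE: print the contents and return 1 if FILE is non-empty;
# otherwise report that nothing was written and return 0.
check_err() {
    if [ -s "$1" ]; then
        echo "Errors reported in $(basename "$1"):"
        cat "$1"
        return 1
    fi
    echo "$(basename "$1") is empty; no errors reported"
}

check_err "$workdir/py_task.err" || true
check_err "$workdir/clean.err"
```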

If there are no errors, you can then view the result files produced by your task, according to what the task does.

At this point, the job is done.