This article describes how to write a PBS (Portable Batch System) job script on a Linux server, submit it to the server's job queue, and have the task executed.
Recently, I needed to run code on my school's public supercomputer. Like most supercomputing facilities, it uses a job queue to submit, manage, and order the tasks of different users, so that they can share its resources more fairly. Since this supercomputer uses PBS for job submission, this article introduces how to write a PBS script and use it to submit your own code execution request to the server (that is, to submit a task).
PBS (Portable Batch System) is open source software for managing and scheduling computing tasks; it is a commonly used job scheduling system for managing and allocating computing resources on large computing clusters or supercomputers. To use it, we first submit a job to the computing cluster; PBS then schedules the job and allocates resources to it based on resource availability, job priority, and other factors. The basic workflow is as follows:
- The user writes a PBS script that describes the task's resource requirements, execution commands, and other relevant information.
- The user runs a PBS command to submit this script to the PBS system.
- The PBS system places the job in the job queue to wait for execution, based on the resource requirements stated in the script and the resources available on the cluster.
- When computing resources become available, the PBS system selects a job and assigns it to an appropriate compute node.
- The job executes on the compute node until it completes, or a preset runtime limit is reached, or an error occurs during task execution, etc.
What follows is an introduction to writing a PBS script and using it to submit your own tasks to the server.
First of all, let's clarify the requirements of this article. Suppose we already have an executable file (or a Python code file) under some path on the server, and we want to run this executable (or Python code file) on the supercomputer.
With the requirement clearly defined, we can get started. First, if needed, we can use cd to go to our own working directory. Here I go directly into the directory where the executable is stored; the code is as follows.
cd Data_Reflectance_Rec
Next, take a look at the files in the current path with the following code. ls is used to list the files and subdirectories in a directory.
ls
Next, create the PBS script with the following code. I have named it py_task.pbs, where .pbs is the customary extension for PBS script files. This is the file we will later submit to the PBS system.
touch py_task.pbs
touch is a commonly used command that creates an empty file or updates the access and modification timestamps of an existing file. After creating the file, you can list the files under the current path again with the following code.
ls
Execute the above code as shown below. You can see that the PBS script file py_task.pbs has been created.
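To make the two behaviours of touch concrete, here is a small sketch that can be run anywhere. It works in a temporary directory created with mktemp (my own choice for the demo, so that no real file is affected), and shows that the first touch creates an empty file while a later touch leaves existing content alone.

```shell
# Work in a throwaway directory so no real project file is touched.
demo_dir=$(mktemp -d)
demo_file="$demo_dir/py_task.pbs"

# First call: the file does not exist yet, so touch creates it empty.
touch "$demo_file"
size_after_create=$(wc -c < "$demo_file" | tr -d ' ')

# Add some content, then touch again: the content is kept,
# only the timestamps are updated.
echo '#!/bin/bash' > "$demo_file"
touch "$demo_file"
size_after_second_touch=$(wc -c < "$demo_file" | tr -d ' ')

echo "bytes after create: $size_after_create"
echo "bytes after second touch: $size_after_second_touch"
```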
Once the script file has been created, we can start editing it. Here I have chosen to edit it with Vim, so just execute the following code.
vim py_task.pbs
Vim is a powerful text editor widely used for writing code and editing text in a command-line environment. Execute the above code as shown below. You can see that the PBS script file py_task.pbs has been opened in Vim.
Next, press the i key to enter insert mode, as shown below.
We can then edit the PBS script file in Vim. Here we give 2 templates for the PBS script file; the first template is shown below.
#!/bin/bash
#PBS -N py_task
#PBS -q rtlab1_4
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:30:00
#PBS -o /data1/home/LiliAircas/Data_Reflectance_Rec/task/py_task.out
#PBS -e /data1/home/LiliAircas/Data_Reflectance_Rec/task/py_task.err
hostname
date "+%Y/%m/%d %H:%M:%S"
python /data1/home/LiliAircas/Data_Reflectance_Rec/code/
date "+%Y/%m/%d %H:%M:%S"
Finally, remember to leave a blank line at the end of the script.
Of these, line 1 is a shebang (also known as hashbang) line, which specifies the interpreter used to run the script. Here, /bin/bash indicates that the script will be executed by the Bash interpreter.
Next, the lines from line 2 onward that begin with #PBS are not ordinary comments but directives for the PBS job scheduling system. Each directive starts with #PBS followed by a different option: -N py_task sets the name of the job to py_task; -q rtlab1_4 submits the job to the rtlab1_4 queue; -l nodes=1:ppn=4 requests 1 node and 4 processors on that node to run the job; -l walltime=00:30:00 sets the maximum runtime of the job to 30 minutes. The following 2 lines specify the files that receive the job's standard output and error output, respectively.
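To see which lines of a script count as directives, the following sketch writes a shortened copy of the first template to a temporary file and filters it with grep. The temporary path and the extra comment line are my own additions for the demo; to Bash every line here that starts with # is a comment, and it is only the #PBS prefix that PBS looks for.

```shell
demo_dir=$(mktemp -d)
script="$demo_dir/py_task.pbs"

# A shortened copy of the first template, plus one ordinary comment.
cat > "$script" <<'EOF'
#!/bin/bash
#PBS -N py_task
#PBS -q rtlab1_4
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:30:00
# An ordinary comment: it does not start with #PBS.
hostname
EOF

# Keep only the lines that begin with the #PBS prefix.
directives=$(grep '^#PBS' "$script")
directive_count=$(printf '%s\n' "$directives" | wc -l | tr -d ' ')

echo "$directives"
echo "directive lines: $directive_count"
```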
After that, the next 2 lines print the hostname of the machine running the script and the current date and time, respectively; then the Python interpreter is called to execute the Python code file. Finally, the current date and time are printed again, which lets us estimate roughly how long the task took to run.
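If you would rather have the script compute the duration than subtract the two timestamps by hand, one possible sketch is below. It assumes your date command supports +%s (seconds since the epoch), which is common on Linux but is not part of the templates above; the sleep is only a stand-in for the real workload.

```shell
# Record the start time both as epoch seconds and in readable form.
start_epoch=$(date +%s)
date "+%Y/%m/%d %H:%M:%S"

sleep 1   # stand-in for the real workload

# Record the end time and print the difference in seconds.
end_epoch=$(date +%s)
date "+%Y/%m/%d %H:%M:%S"
elapsed=$((end_epoch - start_epoch))
echo "Elapsed: ${elapsed} s"
```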
As for the blank line at the end, some tutorials say it is there to conform to script file conventions and to improve readability and structural clarity; others say that on some server versions, omitting this blank line causes script commands not to be recognized. So, just to be sure, I added 1 blank line.
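I cannot confirm which explanation is right. One plausible reading (an assumption on my part, not something those tutorials state) is that the trouble comes from a last line that is not terminated by a newline character. The sketch below checks a file for that; the function name ends_with_newline and the two sample files are made up for the demo.

```shell
# Succeeds when the file is non-empty and its last byte is a newline.
# $(...) strips a trailing newline, so the captured text is empty
# exactly when that last byte is a newline.
ends_with_newline() {
  [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

demo_dir=$(mktemp -d)
printf 'hostname\ndate' > "$demo_dir/no_newline.pbs"
printf 'hostname\ndate\n' > "$demo_dir/with_newline.pbs"

for f in "$demo_dir"/*.pbs; do
  if ends_with_newline "$f"; then
    echo "$f: ends with a newline"
  else
    echo "$f: last line is not newline-terminated"
  fi
done
```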
The edited script file is shown below. Note that the screenshot contains some errors: for example, hostname is written as Hostname, and there is no blank line at the end. So treat the images only as a reference, and follow the text version of the code above when modifying your own PBS script file.
In addition, here is 1 template of a PBS script that runs an executable file, as follows.
#!/bin/bash
#PBS -N py_task
#PBS -q rtlab1_4
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -o /data1/home/LiliAircas/Data_Reflectance_Rec/code/py_task.out
#PBS -e /data1/home/LiliAircas/Data_Reflectance_Rec/code/py_task.err
hostname
date "+%Y/%m/%d %H:%M:%S"
cd /data1/home/LiliAircas/Data_Reflectance_Rec/code
./Alignment_Server
date "+%Y/%m/%d %H:%M:%S"
The meaning of each part of this script file has been described earlier, so we will not go through it again here.
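The second template hard-codes the directory it changes into. As a possible variation, PBS sets the environment variable PBS_O_WORKDIR to the directory from which qsub was run, and PBS_JOBID to the job's ID, so a script can use those instead. The sketch below falls back to the current directory and a placeholder label when it is run outside PBS; that fallback is my own addition so the snippet can be tried on any machine.

```shell
#!/bin/bash
# Inside a PBS job, PBS_O_WORKDIR is the directory qsub was run from.
# Outside PBS it is unset, so fall back to the current directory.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

# PBS_JOBID is also set by PBS; the fallback text is only a placeholder.
job_label="${PBS_JOBID:-not-in-pbs}"
echo "Job $job_label running in $workdir"
```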
Once you have finished editing your script file in Vim, you need to exit Vim. First, press the Esc key to leave insert mode; then type the 3 keys :wq in sequence and press Enter to save and exit Vim.
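Before submitting, it can be worth checking that the saved file is at least valid shell. The sketch below uses bash -n, which parses a script without executing it; the sample file is a shortened stand-in written to a temporary directory. Note that this only checks shell syntax and says nothing about whether the #PBS directives are acceptable to the scheduler.

```shell
demo_dir=$(mktemp -d)
script="$demo_dir/py_task.pbs"

# A shortened stand-in for the edited script.
printf '%s\n' \
  '#!/bin/bash' \
  '#PBS -N py_task' \
  'hostname' \
  'date "+%Y/%m/%d %H:%M:%S"' > "$script"

# bash -n reads and parses the script but runs none of its commands.
if bash -n "$script"; then
  syntax_ok=yes
else
  syntax_ok=no
fi
echo "syntax check passed: $syntax_ok"
```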
We can then submit our PBS script file to the system; this can be accomplished with the following code.
qsub py_task.pbs
The above code submits the PBS script file py_task.pbs that we just edited to the PBS job scheduling system, which then waits for resources to be allocated before running the job. Execute the above code as shown below.
If there are no problems, a number is printed as shown above; this is the ID of the task we just submitted.
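Since qsub prints the ID on standard output, a script can capture it for later use with commands such as qdel. On many PBS installations the printed ID has the form number.servername; the sketch below assumes that form and strips the server part. The helper name job_number and the sample value used when qsub is not installed are made up for illustration.

```shell
# Strip everything from the first dot onward, leaving the numeric part.
job_number() {
  printf '%s\n' "${1%%.*}"
}

if command -v qsub >/dev/null 2>&1; then
  # On a PBS host: really submit the script and capture the printed ID.
  job_id=$(qsub py_task.pbs)
else
  # No PBS here: use a hypothetical sample value for the demo.
  job_id="1250752.example-server"
fi

echo "full ID: $job_id"
echo "number only: $(job_number "$job_id")"
```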
Of course, executing the above code sometimes results in an error as shown below, namely the message qsub: submit error (Unauthorized Request...).
Most of these are caused by submitting a job to a queue to which you do not have access; in this case, you need to contact the server's administrators so that you can get access.
Next, here are a few more common commands of the PBS system.
First, we can get information about all the nodes of the current supercomputer by using the following code.
pbsnodes
Execute the above code as shown below; you can see that the information about the different nodes is listed.
It is also possible to follow the above command with the name of a specific node, thus obtaining information only about the specified node; as in the following code.
pbsnodes cu02
The execution of the above code is shown below; as in the previous figure, the IDs of the tasks currently running on this node are displayed. For example, the purple box in the figure below marks the ID of one particular task.
Second, we can get the current status of all the tasks in the queue by using the following code.
qstat
Executing the above code, as shown below; you can see that there is a task, which was submitted by myself. In my case, after executing the above code, I can only see my own submitted tasks, but not tasks submitted by other people that exist in the queue at the same time - it feels like this may have been set up by the administrators of our school's servers, so that each user can only see the tasks that have been submitted by his or her own account.
In addition, the details of the tasks in the queue can be viewed by using the following code.
qstat -f
Execute the above code as shown below.
Additionally, the qdel command followed by a task's ID removes the specified task from the queue; for example, the following code.
qdel 1250752
After executing the above code, use the qstat command to view the tasks in the queue; you can see that the specified task has been deleted, but with a delay. If you run qstat immediately after qdel, task 1250752 may still be listed; only when you run qstat again a little later has task 1250752 disappeared.
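Because of this delay, a script that needs to know when the task is really gone can poll qstat a few times. The sketch below is one way to do that. It assumes that qstat followed by a job ID exits with a non-zero status once the job is no longer known, which may not hold on every installation (some keep finished jobs listed for a while); the function names and the retry settings are my own.

```shell
# Succeeds while qstat still reports the job.
# Assumption: qstat exits non-zero for a job it no longer knows about.
job_listed() {
  qstat "$1" >/dev/null 2>&1
}

# Poll until the job is gone: at most max_tries checks, delay seconds apart.
wait_for_job_exit() {
  wait_job_id=$1
  wait_max_tries=${2:-10}
  wait_delay=${3:-2}
  wait_try=0
  while [ "$wait_try" -lt "$wait_max_tries" ]; do
    if ! job_listed "$wait_job_id"; then
      return 0
    fi
    wait_try=$((wait_try + 1))
    sleep "$wait_delay"
  done
  return 1
}

if wait_for_job_exit 1250752 5 1; then
  echo "Job 1250752 is no longer listed"
else
  echo "Job 1250752 is still listed after 5 checks"
fi
```

On a machine without qstat the check fails at once, so the sketch reports the job as no longer listed; it is only meaningful on a PBS host.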
Once the task has finished, we can execute the following 2 lines of code in turn to view the job's standard output and error output files. cat is a commonly used command that concatenates the specified files and prints their contents.
cat py_task.out
cat py_task.err
Execute the above code as shown below. In my case, the task ran into some permission errors during execution, so the py_task.err file contains the error messages reported while the task was running.
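A quick way to tell whether anything was written to the error file is to test whether it is non-empty. The sketch below wraps that in a small helper; the name report_job_result and the sample files are made up for the demo. Keep in mind that a non-empty error file does not always mean the task failed, since some programs print warnings there.

```shell
# Print the error file if it has content, otherwise the standard output file.
report_job_result() {
  out_file=$1
  err_file=$2
  if [ -s "$err_file" ]; then
    echo "Errors were reported:"
    cat "$err_file"
    return 1
  fi
  echo "No errors reported; standard output follows:"
  if [ -f "$out_file" ]; then
    cat "$out_file"
  fi
  return 0
}

# Demo with stand-in files in a temporary directory.
demo_dir=$(mktemp -d)
echo "task finished" > "$demo_dir/py_task.out"
: > "$demo_dir/py_task.err"            # empty error file
echo "Permission denied" > "$demo_dir/failed.err"

report_job_result "$demo_dir/py_task.out" "$demo_dir/py_task.err"
report_job_result "$demo_dir/py_task.out" "$demo_dir/failed.err" || echo "(the second call reported errors)"
```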
If there are no errors, you can then check the result files produced by your task.
At this point, the job is done.