Preface
It's been a while since my last post. There's been a lot going on, and I've started a fun new project, so I'll come back with a post about it once the follow-up work is mostly done.
In our current AI project, the team shares a GPU server for model training and data processing. So that every team member can use this server efficiently, we decided to set up a multi-user shared environment. That way, everyone can easily access the server's computing power, whether for code development, model testing, or result validation.
This article documents the process of configuring a Linux shared environment, and will hopefully help teams facing similar requirements.
Setting up users and groups
To effectively manage our GPU servers, we first need to create new user accounts and organize these accounts into a dedicated user group. Doing so makes it easier to manage permissions and access controls.
Creating Users and Groups
First, create a group:
sudo groupadd gpugroup
Next, create a user:
sudo adduser [username]
Then add the user to the group:
sudo usermod -a -G gpugroup [username]
Configuring SSH Public Key Login
So that team members can connect to the server securely over SSH, we standardize on public key authentication and disable password logins.
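Disabling password logins is done in the SSH daemon's configuration; a minimal sketch of the relevant lines in /etc/ssh/sshd_config (after editing, restart the SSH service, e.g. sudo systemctl restart sshd; on some distributions the service is named ssh):

```
# /etc/ssh/sshd_config — relevant settings
PubkeyAuthentication yes
PasswordAuthentication no
```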
Each user needs to generate a key pair (if they don't already have one) and send the public key to the administrator. The administrator then adds these public keys to the user's .ssh/authorized_keys file.
First, make sure each user's home directory contains a .ssh directory; if it doesn't exist, create it with the following commands:
sudo mkdir /home/[username]/.ssh
sudo chmod 700 /home/[username]/.ssh
Then append the public key to the authorized_keys file (using sudo tee here, because with plain shell redirection the append would run without root privileges):
echo [public key content] | sudo tee -a /home/[username]/.ssh/authorized_keys
sudo chmod 600 /home/[username]/.ssh/authorized_keys
Finally, since the commands above create the files as root, hand the .ssh directory back to the user:
sudo chown -R [username]: /home/[username]/.ssh
Replace [username] and [public key content] with the actual username and public key.
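The steps above can be wrapped in a small helper. This is only a sketch: install_key is a made-up name, and the HOME_BASE variable (defaulting to /home) is an assumption added so the function can be tried out against a scratch directory before running it for real as root.

```shell
# Sketch of a helper that installs a public key for one user.
# install_key is a made-up name; HOME_BASE lets you point it at /home
# (the default) or at a scratch directory for a dry run.
install_key() {
  user=$1
  pubkey=$2
  base=${HOME_BASE:-/home}
  dir="$base/$user/.ssh"
  mkdir -p "$dir"
  chmod 700 "$dir"
  printf '%s\n' "$pubkey" >> "$dir/authorized_keys"
  chmod 600 "$dir/authorized_keys"
}
```

When run as root against /home, remember to chown the resulting .ssh directory back to the user afterwards, as in the commands above.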
Creating a shared folder
Previously I kept the code in a home directory, but in practice I found this wasn't conducive to sharing.
After all, it always feels odd to expose a folder inside one user's home directory to other users.
For security and organization, it's usually better to create a dedicated directory for shared projects, and asking GPT said as much:
Security: avoid unnecessary permission leaks. Putting a project in someone's home directory can accidentally grant other users too much access, which may lead to leaks of sensitive information or accidental data corruption.
Manageability: a dedicated shared directory keeps administration simple and straightforward. You can easily control who can access it without worrying about affecting other personal data or settings.
Scalability: as the team grows, more users will need access to these shared resources. A separate shared directory makes it easier to manage users and permissions, instead of constantly adjusting permission settings under a home directory.
Clear permissions: a separate shared directory makes setting and adjusting permissions much clearer and easier. For example, dedicated security and backup policies can be defined for it.
So I created a directory /srv/projects under /srv to use for sharing.
Group Permission Setting
Next, set the ownership of the team's shared project folder /srv/projects to the gpugroup group:
sudo chown -R root:gpugroup /srv/projects
sudo chmod -R 775 /srv/projects
Set Group ID (SGID)
SGID is a special permission setting that ensures that any new file or directory created in a directory automatically inherits that directory's group. This is perfect for our project directory because it allows all members of the team to access and modify files without having to worry about group settings for individual files.
You can use the following command to set the SGID bit:
sudo chmod g+s /path/to/directory
For example, since our project directory is /srv/projects, the command is:
sudo chmod g+s /srv/projects
Once this is set, all new files and directories created under /srv/projects automatically belong to that group, keeping permissions consistent.
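You can see this inheritance without root by experimenting in a scratch directory. A small sketch (on Linux filesystems, subdirectories inherit not only the group but the SGID bit itself):

```shell
# Quick check of SGID inheritance, no root needed
demo=$(mktemp -d)
chmod g+s "$demo"      # set the SGID bit on the parent
mkdir "$demo/sub"      # create a child directory
ls -ld "$demo/sub"     # the "s" in the group permission bits shows the inherited SGID
rm -rf "$demo"
```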
Adjust umask (optional)
PS: I didn't set this up myself; it's possible but not strictly necessary.
umask is a system setting that determines the default permissions for newly created files and directories. To ensure that team members can edit each other's files, we need to set an appropriate umask value.
The usual umask value is 022, which means newly created files default to 644 (read/write for the owner, read-only for group and others) and new directories default to 755 (read/write/execute for the owner, read/execute for group and others). For teamwork, we can set umask to 002 instead, so that new files get 664 (owner and group can read and write, others can read) and new directories get 775 (read/write/execute for owner and group, read/execute for others).
You can temporarily change the umask value with the following command:
umask 002
To change it permanently, set it in the user's shell configuration file, such as .bashrc or .profile:
echo "umask 002" >> ~/.bashrc
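A quick way to see the effect described above is to try it in a scratch directory:

```shell
# See the effect of umask 002 on new files and directories
tmp=$(mktemp -d)
(
  umask 002
  touch "$tmp/newfile"    # files: 666 & ~002 = 664
  mkdir "$tmp/newdir"     # directories: 777 & ~002 = 775
)
stat -c '%a %n' "$tmp/newfile" "$tmp/newdir"
rm -rf "$tmp"
```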
Other notes
There are a few more details. For example, I had originally installed conda into my home directory;
now it has to be reinstalled under /srv/apps rather than in home.
That way all users can share the Python virtual environments.
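The permission setup for /srv/apps mirrors what we did for the project directory. A sketch as a small helper (share_dir is a made-up name, and the recursive chmod/chgrp recipe is my generalization of the commands used earlier; run it as root against the real directory):

```shell
# share_dir is a made-up helper that applies the same group-sharing
# recipe used for /srv/projects to any directory.
share_dir() {
  dir=$1
  group=$2
  chgrp -R "$group" "$dir"                   # hand the whole tree to the shared group
  chmod -R g+rwX "$dir"                      # group read/write; execute only on dirs and already-executable files
  find "$dir" -type d -exec chmod g+s {} +   # new files inside inherit the group
}
# Run as root, e.g.: share_dir /srv/apps gpugroup   (after installing conda there)
```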
Show progress when copying files
The cp -R command does not display progress; you can use rsync or pv instead.
rsync
rsync -ah --progress source destination
Parameter explanation:
- -a: archive mode; preserves symbolic links, file permissions, owner and group information, and so on.
- -h: human-readable output, making the numbers easier to read.
- --progress: displays the progress of the copy.
pv
Here pv is used together with tar; some systems don't ship with pv, so you may have to install it first.
tar cf - source/ | pv | tar xf - -C destination
This command packs up the source directory, pipes it through pv to display progress, and then unpacks it into the destination directory.
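The same pipe works without pv (pv just sits in the middle reporting throughput), so you can sanity-check the tar part on its own; a minimal runnable sketch using scratch directories:

```shell
# Tar-pipe copy; insert "| pv |" between the two tars for a progress readout
src=$(mktemp -d)
dst=$(mktemp -d)
echo "hello" > "$src/file.txt"
tar cf - -C "$src" . | tar xf - -C "$dst"
ls "$dst"              # the copied file.txt appears in the destination
rm -rf "$src" "$dst"
```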
Wrapping up
That's pretty much it.
I've also recently been exploring some project management tools and deployed MatterMost;
maybe I'll write an article documenting that next.