- 00 Requirements
- 01 Install docker
- 02 Preparation
- 03 Configure Dockerfile and Docker Compose
- 04 Start docker
- 05 Test whether you can ssh into this container (you may need to go back to 04 to debug)
(With thanks to the experts whose strong technical skills made this possible.)
00 Requirements
You need to set up some new servers that you can reach only through ssh boss@172.16.1.100. The goal is to create your own Docker container under the /data1 disk and then use the server by connecting to that container over ssh.
(Both the account boss and the address 172.16.1.100 are fictitious. Replace them with the address of the server you are configuring and an account you can use.)
System: Ubuntu 20.04 with an NVIDIA graphics card.
01 Install docker
(Docker was already installed on the servers I had to configure, so I did not do this step myself. The following tutorial was generated by an LLM.)
# First, make sure there are no older versions of Docker on the system
sudo apt-get remove docker docker-engine containerd runc
# Then, update the package list and install the necessary packages
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
# Add Docker's official GPG key
curl -fsSL /linux/ubuntu/gpg | sudo apt-key add -
# Set up a stable version of Docker repository
sudo add-apt-repository "deb [arch=amd64] /linux/ubuntu $(lsb_release -cs) stable"
# Update the package list to include packages in the Docker repository
sudo apt-get update
# Install Docker CE, the Docker CLI and containerd
sudo apt-get install docker-ce docker-ce-cli containerd.io
# Check Docker installation version
docker --version
# Verify that Docker is installed successfully. This command will download and run a test image.
sudo docker run hello-world
# Finally, configure Docker to start at boot
sudo systemctl enable docker
To run docker commands without sudo, add the current user to the docker group:
sudo usermod -aG docker $USER
Log out and back in (or restart the system) for the group change to take effect.
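After logging back in, it can help to confirm that the group change really took effect. The following is only a small sketch; the helper name in_group is made up for this example and is not part of Docker.

```shell
# Return success if <user> belongs to <group>.
# Usage: in_group <user> <group>
in_group() {
    # id -nG prints the user's group names separated by spaces;
    # padding both sides with spaces lets us match whole names only.
    case " $(id -nG "$1") " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (run on the server):
#   in_group "$USER" docker && echo "docker can be used without sudo"
```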
02 Preparation
Create a new directory to hold the Docker files and change its ownership. (<user_name> is a placeholder for my name; when running the commands, replace it with the name you want the container to have.)
sudo mkdir /data1/<user_name>
sudo chown boss /data1/<user_name>/ -R
sudo chgrp boss /data1/<user_name>/ -R
mkdir /data1/<user_name>/docker
mkdir /data1/<user_name>/project
Configure authorized_keys for ssh:
cd /data1/<user_name>/docker/
vim authorized_keys
# Paste in the contents of id_rsa.pub, found in the .ssh directory of your home directory on your local computer
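If you would rather not paste the key by hand in vim, the same step can be scripted. This is a minimal sketch that assumes the public key has already been copied to the server as a file; the function name add_pubkey and the /tmp path in the example are made up for illustration.

```shell
# Append a public key to an authorized_keys file (skipping duplicates)
# and restrict the file to its owner.
# Usage: add_pubkey <public_key_file> <authorized_keys_file>
add_pubkey() {
    pub=$1
    auth=$2
    touch "$auth"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF -- "$(cat "$pub")" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    chmod 600 "$auth"
}

# Example (hypothetical paths):
#   add_pubkey /tmp/id_rsa.pub /data1/<user_name>/docker/authorized_keys
```

Running it twice with the same key leaves a single line in the file, so it is safe to repeat.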
03 Configure Dockerfile and Docker Compose
Create a new Dockerfile:
cd /data1/<user_name>/docker/
vim Dockerfile
Contents of the Dockerfile:
# Base image (run docker images to see which images are already on the server)
FROM nvidia/cuda:11.6.0-devel-ubuntu20.04
# Set the time zone
ENV TZ=Asia/Shanghai
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
# Install basic software
RUN apt-get update && \
    apt-get install -y \
        openssh-server \
        python3 \
        python3-pip \
        vim \
        git \
        wget \
        curl \
        unzip \
        sudo \
        net-tools \
        iputils-ping \
        build-essential \
        cmake \
        htop \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# Install other software
RUN apt-get update && \
    apt-get install -y \
        tmux \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# Create a user (make the same UID as the host to avoid permission issues)
RUN useradd -m -u 1001 -s /bin/bash <user_name>
# sudo without password
RUN echo "<user_name> ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
USER <user_name>
WORKDIR /home/<user_name>
# Create .ssh directory and set permissions
RUN mkdir -p /home/<user_name>/.ssh && \
    chown -R <user_name>:<user_name> /home/<user_name>/.ssh && \
    chmod 700 /home/<user_name>/.ssh
# Install Conda
RUN wget /miniconda/Miniconda3-latest-Linux-x86_64.sh -O && \
bash -b -p /home/<user_name>/miniconda && \
rm
RUN /home/<user_name>/miniconda/bin/conda init bash
CMD ["/bin/bash"]
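The useradd line in the Dockerfile hard-codes UID 1001, and the comment there says it should match the host. A quick way to check this on the server before building might look like the sketch below; the function name uid_matches is made up for this example.

```shell
# Compare the UID of a host account with the UID written in the Dockerfile.
# Usage: uid_matches <host_user> <uid_in_dockerfile>
uid_matches() {
    host_uid=$(id -u "$1") || return 2
    if [ "$host_uid" = "$2" ]; then
        echo "match: $1 has UID $host_uid"
    else
        echo "mismatch: $1 has UID $host_uid, Dockerfile uses $2"
        return 1
    fi
}

# Example (run on the server):
#   uid_matches boss 1001
```

If it reports a mismatch, change the number after -u in the useradd line to the UID printed for the host account.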
Before writing the Docker Compose file, check which ports are free:
sudo netstat -tuln
# Find a port that is not listed, such as 8012
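Scanning the netstat output by eye is easy to get wrong, so here is a rough sketch of a filter that does the check for you. The function name port_listed is made up for this example; it only reads text on stdin, so it should also work with ss -tuln output, assuming the local address column has the usual address:port form.

```shell
# Read `netstat -tuln` style output on stdin and succeed if any
# field ends in :<port>, i.e. something is already listening there.
# Usage: <listing command> | port_listed <port>
port_listed() {
    awk -v port="$1" '
        {
            for (i = 1; i <= NF; i++) {
                # Matches 0.0.0.0:22 and :::22, but not 0.0.0.0:8022
                if ($i ~ (":" port "$")) found = 1
            }
        }
        END { exit !found }
    '
}

# Example (run on the server):
#   if sudo netstat -tuln | port_listed 8012; then
#       echo "8012 is taken"
#   else
#       echo "8012 looks free"
#   fi
```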
Then create a new Docker Compose file:
cd /data1/<user_name>/docker/
vim
Contents:
version: '3.8'
services:
  <user_name>:
    container_name: <user_name> # Set container name
    build: . # Build the image from the Dockerfile in the current directory
    image: <user_name> # Image name
    restart: unless-stopped
    runtime: nvidia # Enable GPU support
    ports:
      - "8012:22" # Choose an unoccupied port (confirm that 8012 is free)
    volumes:
      - /data1/<user_name>/project:/home/<user_name>/project # Mount the project directory
      - /data1/<user_name>/docker/authorized_keys:/home/<user_name>/.ssh/authorized_keys # SSH
    environment:
      - NVIDIA_DRIVER_CAPABILITIES=all
    command: /bin/bash -c "sudo service ssh start && sleep infinity"
A Docker Compose file compatible with older versions of Docker (I am not familiar with the old versions; this was written entirely by the experts):
services:
  <user_name>:
    container_name: <user_name> # Set container name
    build: . # Build the image from the Dockerfile in the current directory
    restart: unless-stopped
    ports:
      - "8012:22" # Choose an unoccupied port (confirm that 8012 is free)
    volumes:
      - /data1/<user_name>/project:/home/<user_name>/project # Mount the project directory
      - /data1/<user_name>/docker/authorized_keys:/home/<user_name>/.ssh/authorized_keys # SSH
    environment:
      - NVIDIA_DRIVER_CAPABILITIES=all
    command: /bin/bash -c "sudo service ssh start && sleep infinity"
04 Start docker
Then start the container:
cd /data1/<user_name>/docker/
docker compose build # Build the image from the Dockerfile
docker compose up -d # Start the container
# For older versions of Docker
docker-compose build # Build the image from the Dockerfile
docker-compose up -d # Start the container
# Open a shell inside the container and take a look
docker exec -it <user_name> bash
# Run ls and you will see two directories: miniconda and project. Only project is mapped to the host disk, so any file you do not want to lose must go in project.
# View directory permissions
ls -al
# If you find permission problems, exit the container and fix the ownership of the directory on the host
sudo chown boss /data1/<user_name>/ -R
sudo chgrp boss /data1/<user_name>/ -R
# If the Dockerfile turns out to be wrong, or you want to add something, rebuild and restart
docker compose build # Build the image from the Dockerfile
docker compose up -d # Start the container
# To change the ownership of ~/.ssh inside the container, enter the container first
docker exec -it <user_name> bash
sudo chown <user_name> ~/.ssh -R
sudo chgrp <user_name> ~/.ssh -R
# Temporarily stop the container, then start it again
docker compose stop
docker compose start
# Stop and remove the container
docker compose down
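Since the commands above come in two spellings (docker compose and docker-compose), a small wrapper can pick whichever one the server has. This is only a sketch; the function name compose is made up for this example.

```shell
# Run Docker Compose with whichever front end is installed:
# the "docker compose" plugin if present, else the older docker-compose.
# Usage: compose <subcommand> [args...]
compose() {
    if docker compose version >/dev/null 2>&1; then
        docker compose "$@"
    elif command -v docker-compose >/dev/null 2>&1; then
        docker-compose "$@"
    else
        echo "neither 'docker compose' nor 'docker-compose' found" >&2
        return 127
    fi
}

# Example (run in /data1/<user_name>/docker/):
#   compose build
#   compose up -d
```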
05 Test whether you can ssh into this container (you may need to go back to 04 to debug)
# Connect on your local computer
ssh -p 8012 <user_name>@172.16.1.100
If the ssh connection fails (for example, it asks for a password), the likely cause is the ownership or permissions of .ssh or authorized_keys, inside or outside the container. On the host they must be owned by boss; inside the container they must be owned by <user_name>.
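To see at a glance what the ownership and permissions currently are, something like the following sketch can be run both on the host and inside the container. The function names are made up for this example, stat -c assumes GNU coreutils (as on Ubuntu), and the 700/600 modes are the conventional values, which are stricter than what sshd strictly requires as far as I know.

```shell
# Print mode and owner of one path and compare the mode with an expected value.
# Usage: check_mode <path> <expected_octal_mode>
check_mode() {
    got=$(stat -c '%a' "$1") || return 2
    owner=$(stat -c '%U' "$1")
    if [ "$got" = "$2" ]; then
        echo "ok   $1 mode=$got owner=$owner"
    else
        echo "FIX  $1 mode=$got owner=$owner (expected $2)"
        return 1
    fi
}

# Check a .ssh directory (700) and its authorized_keys file (600).
# Usage: check_ssh_perms <path_to_.ssh>
check_ssh_perms() {
    rc=0
    check_mode "$1" 700 || rc=1
    check_mode "$1/authorized_keys" 600 || rc=1
    return "$rc"
}

# Example (inside the container):
#   check_ssh_perms ~/.ssh
```

The owner column in the output is the thing to compare against boss (on the host) or <user_name> (inside the container).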
If the following appears while connecting:
ECDSA host key for [172.16.1.100]:8012 has changed and you have requested strict checking.
Host key verification failed.
you need to delete the entry for 172.16.1.100 from known_hosts. The full error message includes the command to run.
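For reference, the cleanup usually comes down to ssh-keygen -R with the host written the same way as in the error message. A minimal sketch, run on your local computer; the two function names are made up for this example.

```shell
# Build the host pattern as OpenSSH writes it in known_hosts:
# plain "host" for port 22, "[host]:port" for any other port.
# Usage: known_hosts_pattern <host> <port>
known_hosts_pattern() {
    if [ "$2" = "22" ]; then
        printf '%s\n' "$1"
    else
        printf '[%s]:%s\n' "$1" "$2"
    fi
}

# Remove the stale entry from the default known_hosts file.
# Usage: forget_host <host> <port>
forget_host() {
    ssh-keygen -R "$(known_hosts_pattern "$1" "$2")"
}

# Example:
#   forget_host 172.16.1.100 8012
```

After that, the next ssh connection asks you to accept the new host key. This is expected here because the container was rebuilt.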