With the rapid development of AI and large models, sharing GPU resources in the cloud has become a necessity: it reduces hardware cost, improves resource utilization, and meets the massively parallel computing demands of model training and inference.
Kubernetes' built-in resource scheduling can only allocate GPUs by the number of whole cards, but for deep learning and similar workloads the resource that actually runs out first is usually GPU memory, so scheduling by card count wastes a lot of resources.
There are currently two GPU resource sharing schemes. One decomposes a physical GPU into multiple virtual GPUs (vGPUs) and schedules by vGPU count; the other schedules based on the GPU's memory.
In this article, we walk through installing the Kubernetes components that enable scheduling based on GPU memory.
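To make the difference concrete, here is a rough sketch of the two request styles as they appear in a container spec; nvidia.com/gpu is the stock device-plugin resource, while the aliyun.com/gpu-mem resource used for memory-based scheduling is provided by the gpushare plugin installed later in this article:
# scheduling by whole cards: the pod exclusively occupies one GPU
resources:
  limits:
    nvidia.com/gpu: 1

# scheduling by GPU memory: the pod only reserves 3 GiB on a shared GPU
resources:
  limits:
    aliyun.com/gpu-mem: 3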
System Information
- System: CentOS Stream 8
- Kernel: 4.18.0-490.el8.x86_64
- Driver: NVIDIA-Linux-x86_64-470.182.03
- Docker: 20.10.24
- Kubernetes version: 1.24.0
1. Driver installation
Please visit the NVIDIA website and download/install the driver yourself: https://www.nvidia.com/Download/?lang=en-us
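For reference, a minimal sketch of installing the downloaded .run driver package listed in the system information above, assuming gcc and the matching kernel headers are available:
sudo dnf install -y gcc kernel-devel-$(uname -r) kernel-headers-$(uname -r)   # build prerequisites, if missing
sudo sh NVIDIA-Linux-x86_64-470.182.03.run
nvidia-smi   # should list the GPU once the driver is loaded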
2. docker installation
Please install docker or another container runtime yourself. If you use a different container runtime, refer to the NVIDIA documentation for the third step of the configuration: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/#installation-guide
Note: Docker, containerd and podman are officially supported, but this document has only been verified with docker, so be aware of possible differences when running another container runtime.
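If Docker is not installed yet, one common approach on CentOS Stream 8 is the upstream docker-ce repository; a rough sketch follows (package versions may differ from the 20.10.24 used here):
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io
sudo systemctl enable --now docker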
3. NVIDIA Container Toolkit Installation
- Set up the repository and GPG key
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
    && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
- Start Installation
sudo dnf clean expire-cache --refresh
sudo dnf install -y nvidia-container-toolkit
- Modify the docker configuration file to add the container runtime implementation
sudo nvidia-ctk runtime configure --runtime=docker
- Modify /etc/docker/daemon.json to set nvidia as the default container runtime (required)
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
- Restart docker and verify that the configuration takes effect
sudo systemctl restart docker
sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
If output similar to the following is returned, the configuration was successful:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
4. Installation of the K8S GPU scheduler
- First, apply the following YAML to deploy the scheduler extender
# rbac yaml
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: gpushare-schd-extender
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - update
      - patch
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - bindings
      - pods/binding
    verbs:
      - create
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
      - list
      - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gpushare-schd-extender
subjects:
  - kind: ServiceAccount
    name: gpushare-schd-extender
    namespace: kube-system
# deployment yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: gpushare
      component: gpushare-schd-extender
  template:
    metadata:
      labels:
        app: gpushare
        component: gpushare-schd-extender
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      hostNetwork: true
      tolerations:
        - effect: NoSchedule
          operator: Exists
          key: node-role.kubernetes.io/master
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists
        - effect: NoSchedule
          operator: Exists
          key: node.cloudprovider.kubernetes.io/uninitialized
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      serviceAccount: gpushare-schd-extender
      containers:
        - name: gpushare-schd-extender
          image: registry.cn-hangzhou.aliyuncs.com/acs/k8s-gpushare-schd-extender:1.11-d170d8a
          env:
            - name: LOG_LEVEL
              value: debug
            - name: PORT
              value: "12345"
# service yaml
---
apiVersion: v1
kind: Service
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
  labels:
    app: gpushare
    component: gpushare-schd-extender
spec:
  type: NodePort
  ports:
    - port: 12345
      name: http
      targetPort: 12345
      nodePort: 32766
  selector:
    # select the gpushare-schd-extender pods
    app: gpushare
    component: gpushare-schd-extender
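Before moving on, it is worth confirming that the extender pod and its NodePort Service are actually up; the labels used below come from the YAML above:
kubectl get pods -n kube-system -l app=gpushare,component=gpushare-schd-extender -o wide
kubectl get svc gpushare-schd-extender -n kube-system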
- Add a scheduling policy configuration file to the /etc/kubernetes directory (scheduler-policy-config.yaml in the examples below)
# scheduler-policy-config.yaml
---
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
extenders:
  # Calling the extender through the Service address did not work for some reason; the NodePort address must be used
  - urlPrefix: "http://{nodeIP}:{nodePort}/gpushare-scheduler"
    filterVerb: filter
    bindVerb: bind
    enableHTTPS: false
    nodeCacheCapable: true
    managedResources:
      - name: aliyun.com/gpu-mem
        ignoredByScheduler: false
    ignorable: false
Note: be sure to replace {nodeIP}:{nodePort} in the urlPrefix above with the IP of one of your nodes and the NodePort of the gpushare-schd-extender Service (32766 in the Service definition above), otherwise the scheduler will not be able to reach the extender!
The NodePort can be queried with the following command:
kubectl get service gpushare-schd-extender -n kube-system -o jsonpath='{.spec.ports[?(@.name=="http")].nodePort}'
- Modify the kubernetes scheduler configuration /etc/kubernetes/manifests/kube-scheduler.yaml
1. In the command: section, add
- --config=/etc/kubernetes/scheduler-policy-config.yaml
2. Add the pod mounts
In volumeMounts:, add
- mountPath: /etc/kubernetes/scheduler-policy-config.yaml
  name: scheduler-policy-config
  readOnly: true
In volumes:, add
- hostPath:
    path: /etc/kubernetes/scheduler-policy-config.yaml
    type: FileOrCreate
  name: scheduler-policy-config
Note: be careful not to make any mistakes here, otherwise you may run into inexplicable errors.
An example of the modified manifest is sketched below:
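Only the added lines and their immediate surroundings are shown; the existing flags, image, and other fields of your kube-scheduler.yaml stay untouched, and the file name follows the scheduler-policy-config.yaml naming assumed above:
spec:
  containers:
  - name: kube-scheduler
    command:
    - kube-scheduler
    # ... existing flags stay as they are ...
    - --config=/etc/kubernetes/scheduler-policy-config.yaml    # added
    volumeMounts:
    # ... existing mounts ...
    - mountPath: /etc/kubernetes/scheduler-policy-config.yaml  # added
      name: scheduler-policy-config
      readOnly: true
  volumes:
  # ... existing volumes ...
  - hostPath:                                                  # added
      path: /etc/kubernetes/scheduler-policy-config.yaml
      type: FileOrCreate
    name: scheduler-policy-config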
- Configure RBAC and install the device plugin
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-rbac.yaml
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml
5. Adding labels to GPU nodes
kubectl label node <target_node> gpushare=true
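Once the device plugin pod is running on the labeled node, the node should start reporting the shared GPU memory resource in its allocatable list (assuming the default aliyun.com/gpu-mem resource name used in the scheduler policy above):
kubectl describe node <target_node> | grep gpu-mem
# or query the allocatable value directly
kubectl get node <target_node> -o jsonpath='{.status.allocatable.aliyun\.com/gpu-mem}'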
6. install the kubectl gpu plugin
cd /usr/bin/
wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
chmod u+x /usr/bin/kubectl-inspect-gpushare
7. Validation
- Querying GPU resource usage using kubectl
# kubectl inspect gpushare
NAME                   IPADDRESS     GPU0(Allocated/Total)  GPU Memory(GiB)
-uf61h64dz1tmlob9hmtb  192.168.0.71  6/15                   6/15
-uf61h64dz1tmlob9hmtc  192.168.0.70  3/15                   3/15
------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
9/30 (30%)
- Create a workload that requests GPU memory and check how it is scheduled
apiVersion: apps/v1
kind: Deployment
metadata:
  name: binpack-1
  labels:
    app: binpack-1
spec:
  replicas: 1
  selector: # define how the deployment finds the pods it manages
    matchLabels:
      app: binpack-1
  template: # define the pods specifications
    metadata:
      labels:
        app: binpack-1
    spec:
      tolerations:
        - effect: NoSchedule
          key: cloudClusterNo
          operator: Exists
      containers:
        - name: binpack-1
          image: cheyang/gpu-player:v2
          resources:
            limits:
              # unit: GiB
              aliyun.com/gpu-mem: 3
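After applying this Deployment, the pod should land on a node with at least 3 GiB of free GPU memory, and the allocation reported by the inspect plugin should grow accordingly; roughly (binpack-1.yaml is just an assumed file name for the manifest above):
kubectl apply -f binpack-1.yaml
kubectl get pods -l app=binpack-1 -o wide
kubectl inspect gpushare   # the Allocated column of the chosen node should now include the 3 GiB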
8. Troubleshooting
If something was not installed successfully during the process, you can check the device plugin pod logs:
kubectl get po -n kube-system -o=wide | grep gpushare-device
kubectl logs -n kube-system <pod_name>
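The scheduler extender itself also logs every filter/bind request (LOG_LEVEL is set to debug in its Deployment above), and the kube-scheduler static pod will complain if the --config file or the extender URL is wrong; both are worth checking when pods stay Pending:
kubectl logs -n kube-system deployment/gpushare-schd-extender
kubectl logs -n kube-system -l component=kube-scheduler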
Reference addresses:
NVIDIA official container-toolkit installation documentation: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/#docker
AliCloud GPU plugin installation: https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/docs/