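Installation

Install the dependencies

Two dependencies are needed: gcc and kernel-devel (the install command in step 3 also pulls in dkms). The kernel-devel version must match the version of the running kernel; otherwise later steps will fail because required files cannot be found.

1) Check the running kernel version: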

[root@k104 vGPU]# uname -r
3.10.0-1127.el7.x86_64

2) List the kernel-devel versions that are available and pick the one that matches the kernel:

[root@k104 vGPU]# yum list | grep kernel-devel
kernel-devel.x86_64 3.10.0-1127.el7 @/kernel-devel-3.10.0-1127.el7.x86_64

3) Install the dependencies:

yum install kernel-devel-$(uname -r) gcc dkms -y
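
Because a mismatched kernel-devel only shows up later as missing-file errors, it can help to confirm the match right after installing. The following is a minimal sketch and not part of the original steps: the helper name kernel_devel_matches is made up for illustration, and it assumes installed packages are named kernel-devel-<release>, as in the yum output above.

```shell
# kernel_devel_matches RELEASE PACKAGE...
# Succeeds if one of the PACKAGE names is exactly kernel-devel-RELEASE.
# (Hypothetical helper; the name does not come from any existing tool.)
kernel_devel_matches() {
    release="$1"
    shift
    for pkg in "$@"; do
        if [ "$pkg" = "kernel-devel-$release" ]; then
            return 0
        fi
    done
    return 1
}

# Typical use on the host (commented out so the sketch has no side effects):
# kernel_devel_matches "$(uname -r)" $(rpm -q kernel-devel) \
#     || echo "kernel-devel does not match the running kernel" >&2
```

Keeping the comparison in a pure function means it can be checked with sample strings on a machine that has neither rpm nor a GPU.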

Disable the nouveau driver that ships with the system (takes effect after a reboot)

echo "blacklist nouveau" >> /lib/modprobe.d/dist-blacklist.conf
echo "options nouveau modeset=0" >> /lib/modprobe.d/dist-blacklist.conf
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)
systemctl set-default multi-user.target

# Reboot once the commands above have finished
reboot

# After the reboot, check whether nouveau has been disabled
lspci -nn |grep -i NVI
lspci -kkd 10de:1eb8
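
The check above is done by eye. As a rough sketch, the same check can be scripted by pulling the "Kernel driver in use:" field out of the lspci -k output. The helper names driver_in_use and nouveau_in_use are hypothetical, and the sketch assumes lspci prints that field in its usual form.

```shell
# driver_in_use: read `lspci -k` output on stdin and print the value of
# every "Kernel driver in use:" line (prints nothing if no driver is bound).
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# nouveau_in_use: succeed only if nouveau is reported as a bound driver.
nouveau_in_use() {
    driver_in_use | grep -qx 'nouveau'
}

# Typical use on the host (commented out; needs lspci and the GPU):
# if lspci -kkd 10de:1eb8 | nouveau_in_use; then
#     echo "nouveau is still bound to the card" >&2
# fi
```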

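Before disabling: lspci -k reports nouveau under "Kernel driver in use" for the card.

After disabling: the "Kernel driver in use: nouveau" line no longer appears.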
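
One detail about the blacklist step at the top of this section: the two echo commands append with >>, so running them a second time leaves duplicate lines in dist-blacklist.conf. A minimal sketch of an idempotent variant follows; append_once is a hypothetical helper name, not an existing command.

```shell
# append_once LINE FILE
# Append LINE to FILE only if FILE does not already contain that exact line.
append_once() {
    line="$1"
    file="$2"
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string.
    # A missing FILE makes grep fail, so the line is appended and FILE created.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Typical use on the host (commented out; writes to a system file):
# conf=/lib/modprobe.d/dist-blacklist.conf
# append_once "blacklist nouveau" "$conf"
# append_once "options nouveau modeset=0" "$conf"
```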

Install the NVIDIA GPU driver

# Make the installer executable and run it
chmod +x NVIDIA-Linux-x86_64-470.82.01.run && ./NVIDIA-Linux-x86_64-470.82.01.run

# Verify the driver once the installation has finished
nvidia-smi

PS: If you run into this error: Error: failed to start container "nginx": Error response from daemon: error gathering device information while adding custom device "/dev/nvidia-uvm": no such file or directory

you can try loading the module manually. Reference: https://blog.csdn.net/JosephThatwho/article/details/107869332

[root@k8s-node3 package]# ls /dev | grep nvidia
nvidia0
nvidia-caps
nvidiactl
[root@k8s-node3 package]# nvidia-modprobe -u -c=0
[root@k8s-node3 package]# ls /dev | grep nvidia
nvidia0
nvidia-caps
nvidiactl
nvidia-uvm
nvidia-uvm-tools
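
The transcript above compares two ls listings by hand. As a rough sketch, the check can be scripted so that nvidia-modprobe runs only when the UVM nodes are actually missing. The helper name missing_uvm_nodes is hypothetical, and the two node names are taken from the listing above.

```shell
# missing_uvm_nodes [DIR]
# Print the names of the UVM device nodes that are absent from DIR
# (DIR defaults to /dev). Prints nothing when both nodes exist.
missing_uvm_nodes() {
    dir="${1:-/dev}"
    for node in nvidia-uvm nvidia-uvm-tools; do
        [ -e "$dir/$node" ] || printf '%s\n' "$node"
    done
}

# Typical use on the host (commented out; needs the NVIDIA driver installed):
# if [ -n "$(missing_uvm_nodes)" ]; then
#     nvidia-modprobe -u -c=0
# fi
```

Taking the directory as an argument keeps the function easy to try against a scratch directory before pointing it at /dev.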

References