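Introduction

With the growth of AI and cloud applications, the demand for GPU compute keeps rising. A single data-center GPU is very powerful, so dedicating one to a single VM or program easily wastes resources. Using GPU compute flexibly has therefore become a requirement for many vendors, and a variety of GPU virtualization solutions has emerged.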

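The mainstream GPU virtualization solutions today fall into three directions:

  • Mediated Pass-Through: driven mainly by NVIDIA. NVIDIA GRID vGPU is NVIDIA's native capability: the NVIDIA vGPU driver partitions GPU memory and compute, and the partitioning scheme is selected by vGPU type.

  • API Forwarding: the main solutions include vCUDA and rCUDA, used mainly in container scenarios. CUDA API calls are intercepted and forwarded, so the application's GPU requests are sent to a GPU for processing.

  • Device Emulation: used mainly in virtual machine scenarios. The virtualization layer emulates a GPU and, backed by physical GPU hardware, exposes GPU capability to the guest.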

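The performance of each approach is as follows:

This article covers the VirtIO GPU implementation in the Device Emulation direction, from both the underlying principle and a hands-on deployment. All GPU virtualization references are listed at the end of the article for readers who want to dig deeper.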

How it works

The implementation has three parts: QEMU, virglrenderer and virtio-gpu. Guest applications speak unmodified OpenGL to Mesa, but instead of Mesa handing the commands over to the hardware, they are channeled through virtio-gpu in the guest to QEMU on the host.

QEMU then receives the raw graphics stack state (Gallium state) and uses virglrenderer to translate it into OpenGL, which runs as entirely ordinary OpenGL on the host machine.

The host OpenGL stack does not even have to be Mesa; it could, for example, be NVIDIA's proprietary stack.
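In this pipeline, the host-side OpenGL work ends up on whichever driver owns the DRM render node that QEMU renders through (the deployment in this article uses /dev/dri/renderD128). As a minimal sketch, assuming a Linux host with sysfs mounted at /sys, the helper below follows the standard sysfs `device/driver` symlink to show which kernel driver that is. `render_node_driver` and the `DRM_SYSFS` override are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the kernel driver bound to a DRM render node.
# Accepts either "renderD128" or "/dev/dri/renderD128".
# DRM_SYSFS defaults to /sys/class/drm and can be overridden (e.g. for testing).
render_node_driver() {
  local node=${1#/dev/dri/}
  local sysfs=${DRM_SYSFS:-/sys/class/drm}
  local link="$sysfs/$node/device/driver"
  if [ -e "$link" ]; then
    # device/driver is a symlink into /sys/bus/<bus>/drivers/<driver-name>
    basename "$(readlink -f "$link")"
  else
    echo "unknown"
    return 1
  fi
}
```

For example, `render_node_driver /dev/dri/renderD128` should print `nvidia` when the NVIDIA driver owns the node; a different name indicates that the OpenGL work would run on another GPU or driver.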

Deployment

Environment

  • CPU:Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz

  • GPU:NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8]

  • Kernel:5.4.134-1.el7.elrepo.x86_64

  • OS:CentOS Linux release 7.8.2003 (Core)

  • Python:3.6.8

  • GCC:4.8.5

  • Qemu:4.2.0

  • Libvirt:5.9.0
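To record the same environment details on another host, a small helper like the one below can be used. It is a sketch, and `collect_env` is a hypothetical name: it only reads /proc/cpuinfo, /etc/redhat-release, `lspci` and each tool's own `--version` output, and it reports missing tools instead of failing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the fields from the environment list above.
# Tools that are not installed are reported as "not found" instead of failing.
collect_env() {
  local tool
  if [ -r /proc/cpuinfo ]; then
    echo "CPU: $(grep -m 1 'model name' /proc/cpuinfo | cut -d: -f2- | sed 's/^ *//')"
  fi
  if command -v lspci >/dev/null 2>&1; then
    echo "GPU: $(lspci -nn | grep -i nvidia | head -n 1)"
  fi
  echo "Kernel: $(uname -r)"
  if [ -r /etc/redhat-release ]; then
    echo "OS: $(cat /etc/redhat-release)"
  fi
  for tool in python3 gcc qemu-system-x86_64 libvirtd; do
    if command -v "$tool" >/dev/null 2>&1; then
      # Keep only the first line of each tool's version banner
      echo "$tool: $("$tool" --version 2>&1 | head -n 1)"
    else
      echo "$tool: not found"
    fi
  done
}
```

Note that QEMU built from source later in this article lands in /usr/local/bin, so that directory needs to be on PATH for the helper to find it.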

Basic setup

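# Install development tools (optional; you can also install missing packages as build errors come up)
sudo yum -y groupinstall "Development Tools"

# Install the meson build tool
pip3 install meson

# Install the ninja build tool
wget https://github.com/ninja-build/ninja/releases/download/v1.11.1/ninja-linux.zip
unzip ninja-linux.zip && sudo cp ninja /usr/bin/

# Install GCC 8.x (the system GCC 4.x cannot build these projects)
sudo yum -y install centos-release-scl
sudo yum -y install devtoolset-8-gcc*
# This opens a new shell with GCC 8 on PATH; run the remaining steps inside it
scl enable devtoolset-8 bash
gcc -v

# Install build dependencies
sudo yum install -y cmake ninja-build

# Install libepoxy dependencies
sudo yum -y install libX11-devel mesa-libEGL mesa-libEGL-devel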
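Because `scl enable devtoolset-8 bash` only changes PATH inside the new shell it starts, it is easy to end up building with the system GCC 4.8.5 by accident. The sketch below checks the toolchain from inside that shell; `version_ge` and `check_build_tools` are hypothetical helper names, and the version comparison assumes GNU coreutils `sort -V`, which CentOS 7 ships.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed if version A >= version B (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_build_tools: report meson/ninja and require gcc >= 8.
# Run it inside the `scl enable devtoolset-8 bash` shell.
check_build_tools() {
  local tool gcc_ver rc=0
  for tool in meson ninja; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool $("$tool" --version)"
    else
      echo "missing: $tool"; rc=1
    fi
  done
  # -dumpversion prints e.g. "4.8.5" or "8"; fall back to 0 if gcc is absent
  gcc_ver=$(gcc -dumpversion 2>/dev/null || echo 0)
  if version_ge "$gcc_ver" 8; then
    echo "gcc $gcc_ver: OK"
  else
    echo "gcc $gcc_ver: older than 8"; rc=1
  fi
  return "$rc"
}
```

Running `check_build_tools` outside the devtoolset shell should report `gcc 4.8.5: older than 8` and return a non-zero status.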

Building libepoxy

libepoxy is a library that manages OpenGL function pointers for you. It is also a dependency of virglrenderer, which is built next.

git clone https://github.com/anholt/libepoxy.git

cd libepoxy && mkdir _build && cd _build
meson --prefix=/usr
ninja
sudo ninja install
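The later virglrenderer and QEMU builds typically locate libepoxy through pkg-config, so it is worth confirming that the freshly installed .pc file is visible. A minimal sketch follows; `pkg_version` is a hypothetical helper name. If the file was installed outside the default search path, setting `PKG_CONFIG_PATH` to its directory fixes the lookup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version pkg-config reports for a module,
# or "not found" (with a non-zero status) if its .pc file is not visible.
pkg_version() {
  if pkg-config --exists "$1" 2>/dev/null; then
    pkg-config --modversion "$1"
  else
    echo "not found"
    return 1
  fi
}
```

Usage: `pkg_version epoxy`. Since egl-headless relies on EGL, `pkg-config --variable=epoxy_has_egl epoxy` should also print `1`, assuming the installed epoxy.pc exposes that variable as libepoxy's meson build does.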

Building virglrenderer

virglrenderer is the component QEMU uses to provide accelerated rendering.
It receives Gallium state from the guest kernel through the virtio-gpu interface and translates it into OpenGL on the host. It also translates shaders from the TGSI format used by Gallium into the GLSL format used by OpenGL.

git clone https://gitlab.freedesktop.org/virgl/virglrenderer.git

cd virglrenderer && mkdir _build && cd _build
meson --prefix=/usr
ninja
sudo ninja install
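Once virglrenderer is installed under /usr, the QEMU build and the running QEMU both need the dynamic linker to find `libvirglrenderer.so.*`. The sketch below checks the linker cache listing produced by `ldconfig -p`; `has_shared_lib` is a hypothetical helper name, and it simply looks for a library whose name starts with the given prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldconfig -p` output on stdin and succeed if any
# cached library name (first field of each line) starts with the prefix $1.
has_shared_lib() {
  awk -v p="$1" 'index($1, p) == 1 { found = 1 } END { exit !found }'
}
```

Usage: `ldconfig -p | has_shared_lib libvirglrenderer.so || echo "not in cache"`. If the library is missing from the cache right after installation, running `sudo ldconfig` refreshes it.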

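Building QEMU

The virtualization layer needs virglrenderer and OpenGL support enabled, but the RPM-packaged QEMU has them disabled by default, so QEMU has to be rebuilt from source. For common build problems, see the article 《Libvirt研发:Qemu编译》 ("Libvirt R&D: Building QEMU").

Version 4.2.0 is used here. Two other QEMU versions were tested, and both had problems:

  • 2.12.0 builds successfully, but starting the VM fails with "unsupported configuration: This QEMU doesn't support OpenGL rendernode with egl-headless graphics type".

  • 7.2.0 needs a C build environment that differs too much from CentOS 7.8, so the build fails.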

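wget https://download.qemu.org/qemu-4.2.0.tar.xz --no-check-certificate
tar -xf qemu-4.2.0.tar.xz
cd qemu-4.2.0 && mkdir build && cd build

../configure --target-list=x86_64-softmmu --enable-kvm \
--enable-spice --enable-vnc --enable-guest-agent \
--enable-rbd --enable-seccomp --enable-numa \
--enable-virglrenderer --enable-opengl \
--disable-glusterfs

# Installs to the default prefix /usr/local, i.e. /usr/local/bin/qemu-system-x86_64
make -j"$(nproc)"
sudo make install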
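To make sure the VM will run the binary built with virglrenderer and libepoxy (and not, say, an RPM-provided QEMU elsewhere on the system), the installed binary's shared-library list can be checked. A minimal sketch, with `check_qemu_links` as a hypothetical helper name that reads `ldd` output on stdin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldd` output on stdin and report whether each
# library needed for virgl/OpenGL support is linked in.
# Returns non-zero if any of them is missing.
check_qemu_links() {
  local ldd_out lib rc=0
  ldd_out=$(cat)
  for lib in libvirglrenderer libepoxy; do
    if grep -q "$lib" <<<"$ldd_out"; then
      echo "linked: $lib"
    else
      echo "missing: $lib"; rc=1
    fi
  done
  return "$rc"
}
```

Usage: `ldd /usr/local/bin/qemu-system-x86_64 | check_qemu_links`. As a second check, `/usr/local/bin/qemu-system-x86_64 -device virtio-vga,help | grep virgl` should list a `virgl` property on a build with virglrenderer enabled.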

Libvirt Domain XML

In the VM's domain XML, change the graphics and video sections to the following. The main changes are:

  • Switch the graphics protocol to SPICE.

  • Add an egl-headless graphics device to provide OpenGL support, so that a local host device is used for 3D acceleration.

    The libvirt documentation describes this display type as follows:

    This display type provides support for an OpenGL accelerated display accessible both locally and remotely (for comparison, Spice's native OpenGL support only works locally using UNIX sockets at the moment, but has better performance). Since this display type doesn't provide any window or graphical console like the other types, for practical reasons it should be paired with either vnc or spice graphics types. This display type is only supported by QEMU domains (needs QEMU 2.10 or newer). Since 5.0.0, this element accepts a <gl> sub-element with an optional attribute rendernode which can be used to specify an absolute path to a host's DRI device to be used for OpenGL rendering.

  • Use the virtio video model with accel3d enabled.

<graphics type='spice' autoport='yes' listen='0.0.0.0'>
  <listen type='address' address='0.0.0.0'/>
</graphics>
<graphics type='egl-headless'>
  <gl rendernode='/dev/dri/renderD128'/>
</graphics>
<video>
  <model type='virtio' heads='1' primary='yes'>
    <acceleration accel3d='yes'/>
  </model>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
</video>
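Before restarting the VM, the edited domain XML can be sanity-checked for the three changes listed above. This is a sketch based on simple pattern matching, not a full XML parse, and `check_virgl_domain_xml` is a hypothetical helper name; it reads the XML on stdin and accepts either quote style around attribute values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read domain XML on stdin and check for the SPICE and
# egl-headless graphics devices and the virtio video model with accel3d.
check_virgl_domain_xml() {
  local xml label pattern rc=0
  xml=$(cat)
  while IFS='|' read -r label pattern; do
    if grep -Eq "$pattern" <<<"$xml"; then
      echo "ok: $label"
    else
      echo "missing: $label"; rc=1
    fi
  done <<'EOF'
spice graphics|<graphics[^>]*type=['"]spice['"]
egl-headless graphics|<graphics[^>]*type=['"]egl-headless['"]
virtio video model|<model[^>]*type=['"]virtio['"]
3D acceleration|<acceleration[^>]*accel3d=['"]yes['"]
EOF
  return "$rc"
}
```

Usage, with the instance name from the command line shown in this article: `virsh dumpxml instance-00000012 | check_virgl_domain_xml`.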

The QEMU command line the VM ends up running with is:

/usr/local/bin/qemu-system-x86_64 -name guest=instance-00000012,debug-threads=on \
-S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-55-instance-00000012/master-key.aes \
-machine pc-q35-2.12,accel=kvm,usb=off,dump-guest-core=off \
-cpu Cascadelake-Server,ss=on,vmx=on,hypervisor=off,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=off,kvm=off \
-m 8192 -overcommit mem-lock=off -smp 4,sockets=4,cores=1,threads=1 -uuid f56bfe1f-43ec-42c8-b4a8-7705fd22ff01 \
-smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=20.3.0,serial=f56bfe1f-43ec-42c8-b4a8-7705fd22ff01,uuid=f56bfe1f-43ec-42c8-b4a8-7705fd22ff01,family=Virtual Machine \
-no-user-config -nodefaults -chardev socket,id=charmonitor,fd=43,server,nowait -mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on \
-device qemu-xhci,id=usb,bus=pci.2,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 \
-drive file=/var/lib/nova/instances/f56bfe1f-43ec-42c8-b4a8-7705fd22ff01/disk.config,format=raw,if=none,id=drive-sata0-0-0,readonly=on,cache=none,discard=unmap \
-device ide-cd,bus=ide.0,drive=drive-sata0-0-0,id=sata0-0-0,write-cache=on \
-object secret,id=virtio-disk0-secret0,data=yNCwz/EwiF7jJ6PbcEdzYkLjqrH9K9rf8dGLsf4CKkA=,keyid=masterKey0,iv=NySICz8lLuVEceJIZjk+Ug==,format=base64 \
-drive file=rbd:cinder-volumes/7d0fcc10-a094-4acb-8b4f-b4ec3756fb93:id=cinder:auth_supported=cephx\;none:mon_host=111.111.9.104\:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0,cache=writeback,discard=unmap \
-device virtio-blk-pci,scsi=off,bus=pci.4,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on,serial=7d0fcc10-a094-4acb-8b4f-b4ec3756fb93 \
-netdev tap,fd=47,id=hostnet0,vhost=on,vhostfd=48 -device virtio-net-pci,host_mtu=1500,netdev=hostnet0,id=net0,mac=fa:16:3e:5b:9f:f2,bus=pci.1,addr=0x0 \
-add-fd set=3,fd=50 -chardev pty,id=charserial0,logfile=/dev/fdset/3,logappend=on -device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=49,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5903,addr=0.0.0.0,disable-ticketing,seamless-migration=on \
-display egl-headless,rendernode=/dev/dri/renderD128 \
-device virtio-vga,id=video0,virgl=on,max_outputs=1,bus=pcie.0,addr=0x1 \
-device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
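On a running host, the same options can be confirmed from the live process rather than from the XML. Since /proc/<pid>/cmdline separates arguments with NUL bytes, the sketch below converts them to spaces before searching. `check_qemu_cmdline` is a hypothetical helper name, and the strings it looks for are taken from the command line above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a QEMU command line on stdin (e.g. the
# NUL-separated /proc/<pid>/cmdline) and check for the virgl-related options.
check_qemu_cmdline() {
  local args needle rc=0
  args=$(tr '\0' ' ')
  for needle in 'egl-headless' 'rendernode=' 'virgl=on'; do
    if grep -qF -- "$needle" <<<"$args"; then
      echo "found: $needle"
    else
      echo "missing: $needle"; rc=1
    fi
  done
  return "$rc"
}
```

Usage: `pid=$(pgrep -f 'guest=instance-00000012' | head -n 1)`, then `check_qemu_cmdline < "/proc/$pid/cmdline"`.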

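Capability tests

The tests used a Windows Server 2019 VM. Whether the VM has 3D acceleration was judged by checking which DirectX features are enabled inside the VM and by running a 3D game.

DirectX features

Press Win+R to open the Run dialog and enter dxdiag to start the DirectX Diagnostic Tool. The results showed that all DirectX features were enabled.

Game test

The game used for testing was League of Legends. Without 3D acceleration, the game could not reach its in-game interface. With VirtIO GPU configured, the game interface opened successfully. However, the results are not yet satisfactory: the game is still noticeably laggy, and performance is far from that of a real graphics card.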

References

Key documents

Technical introductions

Source code analysis

Hands-on notes

GPU virtualization solutions