
GPU in K3s via LXC — Operational Guide

Overview

Privileged LXC on Proxmox with a shared GPU (bind mount), running standalone K3s. The host keeps full access to the GPU; there is no exclusive passthrough.

Host Proxmox (pve-ippri-31)
|-- NVIDIA Driver 580.x (host, .run installer)
|-- /dev/nvidia* (shared via bind mount)
|-- sysctl: vm.overcommit_memory=1, kernel.panic=10 (/etc/sysctl.d/99-k3s-lxc.conf)
|
+-- LXC gpu-sp-01 (privileged, nesting, keyctl)
    |-- NVIDIA userspace driver (.run --no-kernel-module, same version as the host)
    |-- nvidia-container-toolkit + CDI specs
    |-- /dev/kmsg -> /dev/console (via kmsg-fix.service)
    +-- K3s server (default-runtime: nvidia)
        |-- containerd (auto-detects the nvidia runtime)
        +-- NVIDIA Device Plugin (CDI) -> nvidia.com/gpu: 1

Prerequisites

On the Proxmox host

  • NVIDIA driver installed (nvidia-smi working)
  • Debian 13 LXC template available (local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst)
  • Sysctl settings for K3s in LXC:
Terminal window
# /etc/sysctl.d/99-k3s-lxc.conf (on the host)
vm.overcommit_memory = 1
kernel.panic = 10
kernel.panic_on_oops = 1
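
These are among the values kubelet checks when protect-kernel-defaults is enabled (the playbook turns that on later). Since the LXC shares the host kernel, they must be set on the host; a quick way to load and verify them:

Terminal window
# On the host: load all sysctl config files, then confirm the values took effect
sysctl --system
sysctl vm.overcommit_memory kernel.panic kernel.panic_on_oops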

Infra

  • Pulumi configured (PULUMI_CONFIG_PASSPHRASE + auth via env vars: PROXMOX_VE_ENDPOINT, PROXMOX_VE_USERNAME, PROXMOX_VE_PASSWORD)
  • Storage linstor-ssd-01 active
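
To confirm the storage is active on the target host, the standard Proxmox CLI is enough:

Terminal window
# On the host: the storage should be listed with status "active"
pvesm status | grep linstor-ssd-01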

Provision the LXC

Terminal window
cd infra/proxmox-sp
source venv/bin/activate
export PULUMI_CONFIG_PASSPHRASE="<passphrase>"
export PROXMOX_VE_ENDPOINT="https://192.168.0.11:8006"
export PROXMOX_VE_USERNAME="root@pam"
export PROXMOX_VE_PASSWORD="<password>"
pulumi up
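
For a dry run of the planned changes before applying, the standard Pulumi preview works here too:

Terminal window
pulumi preview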

The LXC gpu-sp-01 (VMID 1010) will be created on pve-ippri-31 with:

  • 8 cores, 64 GB RAM, 100 GB disk (linstor-ssd-01)
  • IP 192.168.0.61/23
  • Nesting + keyctl enabled
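
To confirm the container came up with the expected settings, the standard pct commands on the host apply:

Terminal window
# On the host:
pct status 1010
pct config 1010 | grep -E "cores|memory|features|net0"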

GPU passthrough (manual)

The Pulumi provider has a bug with device_passthroughs. Add the entries manually:

Terminal window
# On the host, append to /etc/pve/lxc/1010.conf:
cat >> /etc/pve/lxc/1010.conf << 'EOF'
# GPU passthrough (NVIDIA)
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 510:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
EOF
pct stop 1010 && pct start 1010

Major numbers: 195 = nvidia, 510 = nvidia-uvm. Note that the nvidia-uvm major is allocated dynamically and can differ between boots; verify with ls -la /dev/nvidia* on the host.
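
The kernel's device registry gives the same answer without parsing ls output, which makes for a quick cross-check:

Terminal window
# On the host: list the majors registered for the NVIDIA char devices
grep nvidia /proc/devices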

Install the userspace driver in the LXC

The LXC shares the host kernel, but needs NVIDIA binaries/libraries matching the host driver version:

Terminal window
ssh root@192.168.0.61
DRIVER_VERSION=$(cat /sys/module/nvidia/version) # host kernel module version (shared kernel)
wget -O /tmp/nvidia.run "https://us.download.nvidia.com/XFree86/Linux-x86_64/${DRIVER_VERSION}/NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"
chmod +x /tmp/nvidia.run
/tmp/nvidia.run --no-kernel-module --silent
nvidia-smi # should now work
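
A version mismatch between the userspace libraries and the host kernel module makes nvidia-smi fail, so it is worth comparing both explicitly (standard nvidia-smi query flags):

Terminal window
# In the LXC: userspace driver version as reported by the GPU stack
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Must match the host kernel module:
cat /sys/module/nvidia/version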

Configure the LXC (nvidia-ctk + K3s + GPU)

Terminal window
# Apply the sysctl settings on the host first (if not already done)
ssh root@<host> "sysctl -p /etc/sysctl.d/99-k3s-lxc.conf"
# Run the playbook
uv run ansible-playbook config/proxmox/gpu-lxc.yaml -e target=gpu-sp-01

The playbook:

  1. Installs prerequisites (curl, gnupg) and creates the /dev/kmsg fix (see the sketch after this list)
  2. Installs nvidia-container-toolkit and generates the CDI specs
  3. Installs K3s with default-runtime: nvidia and protect-kernel-defaults: true
  4. Deploys the NVIDIA Device Plugin (DaemonSet)
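
For reference, a minimal sketch of what the /dev/kmsg fix typically looks like as a systemd unit; the playbook's actual unit name and ordering may differ:

Terminal window
# /etc/systemd/system/kmsg-fix.service (illustrative sketch)
[Unit]
Description=Link /dev/kmsg to /dev/console (K3s-in-LXC workaround)
Before=k3s.service

[Service]
Type=oneshot
ExecStart=/usr/bin/ln -sf /dev/console /dev/kmsg
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target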

Validate

GPU in the LXC

Terminal window
ssh root@192.168.0.61 nvidia-smi

K3s + GPU

Terminal window
ssh root@192.168.0.61
kubectl get nodes
kubectl describe node | grep nvidia.com/gpu
# Should show: nvidia.com/gpu: 1

Smoke test (GPU pod)

Terminal window
kubectl run gpu-test --image=nvidia/cuda:12.9.0-base-ubuntu24.04 \
--restart=Never \
--overrides='{"spec":{"containers":[{"name":"gpu-test","image":"nvidia/cuda:12.9.0-base-ubuntu24.04","command":["nvidia-smi"],"resources":{"limits":{"nvidia.com/gpu":"1"}}}]}}'
kubectl logs gpu-test
kubectl delete pod gpu-test

Troubleshooting

nvidia-smi fails in the LXC

The GPU is not accessible via the bind mounts. Check:

Terminal window
# On the host, check the LXC config:
grep -E "lxc\.(mount|cgroup)" /etc/pve/lxc/1010.conf
# Should contain the lxc.cgroup2.devices.allow and lxc.mount.entry lines
# Check the major numbers:
ls -la /dev/nvidia* # 195 and 510 (or another value)

K3s fails with “open /dev/kmsg: no such file”

The kmsg-fix.service did not run. Check:

Terminal window
systemctl status kmsg-fix
ls -la /dev/kmsg # should be a symlink to /dev/console
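
If the unit exists but did not run, re-enabling it (or creating the symlink by hand, matching the approach sketched earlier) unblocks K3s:

Terminal window
systemctl enable --now kmsg-fix
# Or, as a one-off manual fix:
ln -sf /dev/console /dev/kmsg
systemctl restart k3s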

K3s fails with “open /proc/sys/vm/overcommit_memory: read-only”

The sysctl was not configured on the HOST. The LXC inherits a read-only /proc/sys from the host:

Terminal window
# On the HOST (not in the LXC):
cat /etc/sysctl.d/99-k3s-lxc.conf
sysctl vm.overcommit_memory # should be 1

Device plugin does not detect the GPU (“Incompatible strategy”)

The CDI specs were not generated. Inside the LXC:

Terminal window
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# Restart the device plugin:
kubectl delete pod -n kube-system -l app.kubernetes.io/name=nvidia-device-plugin
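
nvidia-ctk can also list the devices described by the generated spec, which confirms the file is valid before restarting the plugin:

Terminal window
# Should print at least nvidia.com/gpu=0 (and/or nvidia.com/gpu=all)
nvidia-ctk cdi list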

GPU pod stuck in Pending

Terminal window
kubectl describe pod <nome> | grep -A10 Events

If the events show “Insufficient nvidia.com/gpu”: another pod is already holding the GPU (1 GPU = 1 pod, no sharing).
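
To see how much of the GPU resource is already committed on the node (plain kubectl, no extra tooling):

Terminal window
kubectl describe node | grep -A 8 "Allocated resources"
# nvidia.com/gpu shows the requests/limits already placed on the node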

Adding new GPU nodes

To replicate this on ippri-33 or ippri-34:

  1. Add the LXC to infra/proxmox-sp/config.py
  2. Add it to inventory/hosts.yaml (see the sketch after this list)
  3. On the host: configure the sysctl settings + GPU passthrough for the LXC
  4. Install the userspace driver (same version as the host)
  5. Run the playbook
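
A hypothetical shape for the inventory entry; the actual group name, host name, and address in inventory/hosts.yaml will differ:

Terminal window
# inventory/hosts.yaml (illustrative; group, host name, and IP are placeholders)
all:
  children:
    gpu_nodes:
      hosts:
        gpu-sp-02:
          ansible_host: 192.168.0.62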

Next phases

Phase   What's coming                        Reference
1       DRA + KAI Scheduler (GPU sharing)    roadmap/gpu-platform.md
2       vLLM + Open WebUI                    roadmap/gpu-platform.md
3       TensorZero + Langfuse                roadmap/gpu-platform.md