# GPU in K3s via LXC: Operational Guide
## Overview
A privileged LXC on Proxmox with a shared GPU (bind mount), running standalone K3s. The host keeps full access to the GPU; there is no exclusive assignment to the container.
```
Proxmox host (pve-ippri-31)
|-- NVIDIA driver 580.x (host, .run installer)
|-- /dev/nvidia* (shared via bind mount)
|-- sysctl: vm.overcommit_memory=1, kernel.panic=10 (/etc/sysctl.d/99-k3s-lxc.conf)
+-- LXC gpu-sp-01 (privileged, nesting, keyctl)
    |-- NVIDIA userspace driver (.run --no-kernel-module, same version as the host)
    |-- nvidia-container-toolkit + CDI specs
    |-- /dev/kmsg -> /dev/console (via kmsg-fix.service)
    +-- K3s server (default-runtime: nvidia)
        |-- containerd (auto-detects the nvidia runtime)
        +-- NVIDIA Device Plugin (CDI) -> nvidia.com/gpu: 1
```

## Prerequisites
### On the Proxmox host
- NVIDIA driver installed (`nvidia-smi` working)
- Debian 13 LXC template available (`local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst`)
- Sysctl settings for K3s in LXC:
```
# /etc/sysctl.d/99-k3s-lxc.conf (on the host)
vm.overcommit_memory = 1
kernel.panic = 10
kernel.panic_on_oops = 1
```
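To apply the file without rebooting, reload all sysctl drop-ins on the host:

```bash
# Loads everything under /etc/sysctl.d/ (run on the host)
sysctl --system
```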
### Infra

- Pulumi configured (`PULUMI_CONFIG_PASSPHRASE` plus auth via env vars: `PROXMOX_VE_ENDPOINT`, `PROXMOX_VE_USERNAME`, `PROXMOX_VE_PASSWORD`)
- Storage linstor-ssd-01 active
## Provision the LXC
```bash
cd infra/proxmox-sp
source venv/bin/activate
export PULUMI_CONFIG_PASSPHRASE="<passphrase>"
export PROXMOX_VE_ENDPOINT="https://192.168.0.11:8006"
export PROXMOX_VE_USERNAME="root@pam"
export PROXMOX_VE_PASSWORD="<password>"
pulumi up
```

The LXC gpu-sp-01 (VMID 1010) will be created on pve-ippri-31 with:
- 8 cores, 64 GB RAM, 100 GB disk (linstor-ssd-01)
- IP 192.168.0.61/23
- Nesting + keyctl enabled
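A quick sanity check on the host after `pulumi up` (optional; `pct config` just prints the container's settings):

```bash
# On the host: confirm cores, memory, rootfs, and features
pct config 1010
```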
## GPU passthrough (manual)
The Pulumi provider has a bug with `device_passthroughs`, so add the entries manually:
```bash
# On the host, edit /etc/pve/lxc/1010.conf:
cat >> /etc/pve/lxc/1010.conf << 'EOF'
# GPU passthrough (NVIDIA)
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 510:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
EOF
```
```bash
pct stop 1010 && pct start 1010
```

Major numbers: 195 = nvidia, 510 = nvidia-uvm. Check with `ls -la /dev/nvidia*` on the host.
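The majors can differ between driver releases, so it is safer to read them from the host than to trust the values above (a short check, assuming the driver modules are loaded):

```bash
# On the host: the first of the two comma-separated numbers is the major
ls -la /dev/nvidia*
# Cross-check against the kernel's registered character devices
grep nvidia /proc/devices   # e.g. "195 nvidia-frontend", "510 nvidia-uvm"
```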
## Install the userspace driver in the LXC
The LXC shares the host's kernel but needs matching NVIDIA userspace binaries and libraries:
```bash
ssh root@192.168.0.61
DRIVER_VERSION=$(cat /sys/module/nvidia/version)  # the host's driver version, visible inside the LXC
wget -O /tmp/nvidia.run "https://us.download.nvidia.com/XFree86/Linux-x86_64/${DRIVER_VERSION}/NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"
chmod +x /tmp/nvidia.run
/tmp/nvidia.run --no-kernel-module --silent
nvidia-smi  # should work
```
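To confirm the versions actually match between the host's kernel module and the LXC's userspace, a minimal check (both commands run inside the LXC):

```bash
cat /sys/module/nvidia/version                                # kernel module (the host's)
nvidia-smi --query-gpu=driver_version --format=csv,noheader   # userspace libs in the LXC
# A mismatch shows up as "Driver/library version mismatch" instead of output
```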
## Configure the LXC (nvidia-ctk + K3s + GPU)

```bash
# Apply the sysctl settings on the host first (if not already done)
ssh root@<host> "sysctl -p /etc/sysctl.d/99-k3s-lxc.conf"
```
```bash
# Run the playbook
uv run ansible-playbook config/proxmox/gpu-lxc.yaml -e target=gpu-sp-01
```

The playbook:
- Installs prerequisites (curl, gnupg) and sets up the /dev/kmsg fix
- Installs nvidia-container-toolkit and generates the CDI specs
- Installs K3s with `default-runtime: nvidia` and `protect-kernel-defaults: true` (config check sketched below)
- Deploys the NVIDIA Device Plugin (DaemonSet)
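The K3s flags above end up in the standard K3s config file; a quick way to verify them (the path is the K3s default, but the exact file layout is the playbook's, so treat this as a sketch):

```bash
# Inside the LXC: confirm the flags the playbook set
cat /etc/rancher/k3s/config.yaml
# Expected to include:
#   default-runtime: nvidia
#   protect-kernel-defaults: true
```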
## Validate
### GPU in the LXC
```bash
ssh root@192.168.0.61 nvidia-smi
```
### K3s + GPU

```bash
ssh root@192.168.0.61
kubectl get nodes
kubectl describe node | grep nvidia.com/gpu
# Should show: nvidia.com/gpu: 1
```
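An alternative that queries the advertised capacity directly (single-node cluster assumed; note the escaped dot in the JSONPath key):

```bash
kubectl get node -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}{"\n"}'
# Prints "1" once the device plugin has registered the GPU
```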
### Smoke test (GPU pod)

```bash
kubectl run gpu-test --image=nvidia/cuda:12.9.0-base-ubuntu24.04 \
  --restart=Never \
  --overrides='{"spec":{"containers":[{"name":"gpu-test","image":"nvidia/cuda:12.9.0-base-ubuntu24.04","command":["nvidia-smi"],"resources":{"limits":{"nvidia.com/gpu":"1"}}}]}}'
```
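The same smoke test can be expressed as a manifest piped to `kubectl apply`, which is easier to read than the inline `--overrides` JSON (equivalent sketch):

```bash
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
    - name: gpu-test
      image: nvidia/cuda:12.9.0-base-ubuntu24.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: "1"
EOF
```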
```bash
kubectl logs gpu-test
kubectl delete pod gpu-test
```

## Troubleshooting
### nvidia-smi fails in the LXC
The GPU is not reachable through the bind mounts. Check:
```bash
# On the host, check the LXC config:
grep -E "lxc\.(mount|cgroup)" /etc/pve/lxc/1010.conf
# Should contain the lxc.cgroup2.devices.allow and lxc.mount.entry lines
```
```bash
# Check the major numbers:
ls -la /dev/nvidia*  # 195 and 510 (or others)
```

### K3s fails with "open /dev/kmsg: no such file"
The kmsg-fix.service did not run. Check:
```bash
systemctl status kmsg-fix
ls -la /dev/kmsg  # should be a symlink -> /dev/console
```
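If the unit exists but did not run, the symlink can be recreated by hand (the same operation the service performs, per the layout above) and the unit re-enabled:

```bash
ln -sf /dev/console /dev/kmsg       # what kmsg-fix does at boot
systemctl enable --now kmsg-fix     # make sure it also runs on the next boot
systemctl restart k3s
```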
### K3s fails with "open /proc/sys/vm/overcommit_memory: read-only"

The sysctl settings were not applied on the HOST. The LXC inherits a read-only /proc/sys from the host:
```bash
# On the HOST (not in the LXC):
cat /etc/sysctl.d/99-k3s-lxc.conf
sysctl vm.overcommit_memory  # should be 1
```
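To fix it without rebooting the node, apply the file on the host and restart K3s inside the container (`pct exec` runs a command inside the LXC):

```bash
# On the HOST:
sysctl -p /etc/sysctl.d/99-k3s-lxc.conf
pct exec 1010 -- systemctl restart k3s
```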
### Device plugin does not detect the GPU ("Incompatible strategy")

The CDI specs were not generated. Inside the LXC:
```bash
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# Restart the device plugin:
kubectl delete pod -n kube-system -l app.kubernetes.io/name=nvidia-device-plugin
```
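To confirm the spec took effect, the toolkit can list the CDI devices it knows about:

```bash
nvidia-ctk cdi list
# Should list nvidia.com/gpu=0 and nvidia.com/gpu=all
```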
### GPU pod stuck in Pending

```bash
kubectl describe pod <name> | grep -A10 Events
```

If the events show "Insufficient nvidia.com/gpu", another pod already holds the GPU (1 GPU = 1 pod; no sharing yet).
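To find which pod currently holds the GPU, one option is filtering on the resource limit (a sketch; assumes `jq` is installed):

```bash
kubectl get pods -A -o json | jq -r '
  .items[]
  | select(any(.spec.containers[]; .resources.limits["nvidia.com/gpu"] != null))
  | "\(.metadata.namespace)/\(.metadata.name)"'
```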
## Adding new GPU nodes
To replicate on ippri-33 or ippri-34:
- Add the LXC in `infra/proxmox-sp/config.py`
- Add it to `inventory/hosts.yaml`
- On the host: configure the sysctl settings and the GPU passthrough in the LXC config
- Install the userspace driver (same version as the host)
- Run the playbook
## Next phases
| Phase | What comes next | Reference |
|---|---|---|
| 1 | DRA + KAI Scheduler (GPU sharing) | roadmap/gpu-platform.md |
| 2 | vLLM + Open WebUI | roadmap/gpu-platform.md |
| 3 | TensorZero + Langfuse | roadmap/gpu-platform.md |