Tutor/Open edX
Online course platform (LMS) deployed via ArgoCD, with manifests generated by Tutor v21.0.4.
Overview
- Tutor v21.0.4 generates the Kustomize manifests (we do not use tutor k8s start; ArgoCD applies them)
- 11 deployments, 4 PVCs, 3 Jobs (ArgoCD hooks)
- Caddy as internal router: receives all traffic on NodePort 31855 and routes by Host header
- Namespace: tutor-openedx
Architecture

    Internet
        |
    Traefik (debian-proxy, TLS)
        |
    NodePort 31855
        |
    Caddy (internal router)
        |
        +-- lms.colabh.org --------------> lms:8000 (LMS, Django/uWSGI)
        +-- cms.lms.colabh.org ----------> cms:8000 (Studio, Django/uWSGI)
        +-- apps.lms.colabh.org ---------> mfe:8002 (Micro Frontends, React)
        +-- meilisearch.lms.colabh.org --> meilisearch:7700 (Search)
        +-- notes.lms.colabh.org --------> notes:8000 (notes, future)
        +-- discovery.lms.colabh.org ----> discovery:8000 (catalog, future)

    Internal services (ClusterIP):
        mysql:3306, mongodb:27017, redis:6379, smtp:8025
        lms-worker, cms-worker (Celery)

Domains
All subdomains point to the same backend (Caddy NodePort 31855). Caddy then does host-based routing internally.
| Domain | Service | Role |
|---|---|---|
| lms.colabh.org | lms:8000 | LMS: student-facing interface |
| cms.lms.colabh.org | cms:8000 | Studio: course authoring |
| apps.lms.colabh.org | mfe:8002 | Micro Frontends (authn, learning, gradebook, etc.) |
| meilisearch.lms.colabh.org | meilisearch:7700 | Public search API |
| notes.lms.colabh.org | notes:8000 | Student notes (not deployed yet) |
| discovery.lms.colabh.org | discovery:8000 | Course catalog (not deployed yet) |
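The host-based routing in the table amounts to a lookup keyed on the Host header. The sketch below only mirrors the table as a shell function to make the mapping explicit; it is not the Caddyfile, and the function name backend_for is hypothetical.

```shell
#!/bin/sh
# Illustrative only: mirrors the domain table as a Host-header lookup.
# Caddy performs this matching itself; backend_for is a made-up helper name.
backend_for() {
  case "$1" in
    lms.colabh.org)             echo "lms:8000" ;;
    cms.lms.colabh.org)         echo "cms:8000" ;;
    apps.lms.colabh.org)        echo "mfe:8002" ;;
    meilisearch.lms.colabh.org) echo "meilisearch:7700" ;;
    notes.lms.colabh.org)       echo "notes:8000" ;;
    discovery.lms.colabh.org)   echo "discovery:8000" ;;
    *)                          echo "no-route"; return 1 ;;
  esac
}

# Example: which backend serves the Micro Frontends host?
backend_for apps.lms.colabh.org
```

Any host that is not in the table falls through to the default branch, which is the case to handle when a new subdomain is added in DNS but not yet in the router.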
Cloudflare DNS: all of them are CNAME records pointing to traefik-afq-franca.cppsunesp.org (the same target as lms.colabh.org).
They are created through the Cloudflare API:

    # Create a CNAME for a new Open edX subdomain (replace SUBDOMAIN)
    source .env
    curl -X POST -H "Authorization: Bearer $CF_DNS_API_TOKEN" \
      -H "Content-Type: application/json" \
      "https://api.cloudflare.com/client/v4/zones/70cd39241c7861ecdf4644477069bf3c/dns_records" \
      -d '{"type":"CNAME","name":"SUBDOMAIN.colabh.org","content":"traefik-afq-franca.cppsunesp.org","ttl":1,"proxied":false}'

Traefik route: config/traefik/routes/tutor-openedx.yaml, a single IngressRoute that joins all hosts with ||. When adding a subdomain, also add it to the match: rule of the IngressRoute.
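When several subdomains have to be created, only the record name changes in the curl call above. A minimal sketch of a helper that builds the JSON body follows; the function name cname_payload is hypothetical, while the target host and record fields are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper: prints the JSON body for a DNS-only CNAME record.
# Argument: the part of the name before .colabh.org (for example notes.lms).
cname_payload() {
  sub="$1"
  printf '{"type":"CNAME","name":"%s.colabh.org","content":"traefik-afq-franca.cppsunesp.org","ttl":1,"proxied":false}\n' "$sub"
}

# Example: print the payload for notes.lms.colabh.org
cname_payload "notes.lms"
```

The output can be piped into the same curl command by replacing the inline body with -d @-, which makes curl read the request body from standard input.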
Components and resources
| Component | Image | Requests (CPU/Mem) | Limits (Mem) | PVC | Role |
|---|---|---|---|---|---|
| caddy | caddy:2.7.4 | 50m / 64Mi | 128Mi | - | Internal HTTP router |
| lms | overhangio/openedx:21.0.4-indigo | 500m / 1Gi | 2Gi | - | LMS Django (uWSGI, 2 workers) |
| cms | overhangio/openedx:21.0.4-indigo | 250m / 1Gi | 2Gi | - | Studio Django (uWSGI, 2 workers) |
| lms-worker | overhangio/openedx:21.0.4-indigo | 250m / 512Mi | 1Gi | - | LMS Celery worker |
| cms-worker | overhangio/openedx:21.0.4-indigo | 250m / 512Mi | 1Gi | - | CMS Celery worker |
| mfe | overhangio/openedx-mfe:21.0.0-indigo | 50m / 128Mi | 256Mi | - | React Micro Frontends (own internal Caddy) |
| meilisearch | getmeili/meilisearch:v1.8.4 | 100m / 256Mi | 1Gi | 5Gi | Search engine |
| mysql | mysql:8.4.0 | 250m / 512Mi | 1Gi | 5Gi | Relational database (courses, users) |
| mongodb | mongo:7.0.28 | 100m / 256Mi | 512Mi | 5Gi | Document database (modulestore, forum) |
| redis | redis:7.4.5 | 50m / 128Mi | 256Mi | 1Gi | Cache + message broker (Celery) |
| smtp | devture/exim-relay:4.96-r1-0 | 50m / 64Mi | 128Mi | - | Email relay |
Totals, summed from the table above: requests ~1.9 CPU and ~4.4 Gi RAM; limits ~9.3 Gi RAM; PVCs 16 Gi (meilisearch 5 + mysql 5 + mongodb 5 + redis 1).
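The totals can be rechecked against the component table whenever a request or limit changes. The sketch below sums the three numeric columns with awk; the values are transcribed by hand from the table (1Gi written as 1024Mi), and the function name totals is hypothetical.

```shell
#!/bin/sh
# Sum CPU requests (millicores), memory requests (Mi) and memory limits (Mi).
# Columns: component, CPU request, memory request, memory limit.
totals() {
  awk '{ cpu += $2; req += $3; lim += $4 }
       END { printf "cpu=%dm req=%dMi lim=%dMi\n", cpu, req, lim }' <<'EOF'
caddy        50   64  128
lms         500 1024 2048
cms         250 1024 2048
lms-worker  250  512 1024
cms-worker  250  512 1024
mfe          50  128  256
meilisearch 100  256 1024
mysql       250  512 1024
mongodb     100  256  512
redis        50  128  256
smtp         50   64  128
EOF
}

totals
```

With the values in the table this prints cpu=1900m req=4480Mi lim=9472Mi, that is 1.9 CPU, about 4.4 Gi of memory requests and about 9.3 Gi of memory limits.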
Jobs (ArgoCD hooks)
| Job | Hook | Role |
|---|---|---|
| mysql-job | PostSync (wave 0) | Creates the DB and the openedx user (wait loop until MySQL is up) |
| lms-job | PostSync (wave 1) | Django migrations, collectstatic, creates Sites |
| cms-job | PostSync (wave 1) | CMS Django migrations, collectstatic |
All of them use hook-delete-policy: HookSucceeded, so each Job is removed after it succeeds. Sync waves ensure that mysql-job completes before lms-job and cms-job start.
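For reference, the hook, delete policy and wave are expressed as three ArgoCD annotations on each Job. The sketch below prints that annotation block for a given hook and wave; the annotation keys are the standard ArgoCD ones, but the helper name hook_annotations is hypothetical and the exact layout in k8s/jobs.yml is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: prints the ArgoCD annotations for a hook Job.
# Arguments: hook type (for example PostSync) and sync wave (for example 1).
hook_annotations() {
  hook="$1"; wave="$2"
  cat <<EOF
annotations:
  argocd.argoproj.io/hook: $hook
  argocd.argoproj.io/hook-delete-policy: HookSucceeded
  argocd.argoproj.io/sync-wave: "$wave"
EOF
}

# Example: annotations matching the lms-job row of the table
hook_annotations PostSync 1
```

The wave is quoted because Kubernetes annotation values must be strings; lower waves run first, which is what orders mysql-job ahead of the other two.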
File structure

    apps/tutor-openedx/
      kustomization.yml            -- Kustomize: namespace, labels, configMapGenerator
      version                      -- Tutor version (21.0.4)
      k8s/
        namespace.yml              -- Namespace tutor-openedx
        deployments.yml            -- 11 Deployments
        services.yml               -- 9 Services (Caddy = NodePort, the rest = ClusterIP)
        volumes.yml                -- 4 PVCs
        jobs.yml                   -- 3 Jobs with ArgoCD hooks
      apps/
        caddy/Caddyfile            -- Caddy config (host routing, body limits, gzip)
        openedx/
          config/lms.env.yml       -- LMS environment variables (Django)
          config/cms.env.yml       -- CMS environment variables (Django)
          settings/lms/            -- LMS Django settings (production.py)
          settings/cms/            -- CMS Django settings (production.py)
          uwsgi.ini                -- uWSGI config
        redis/redis.conf           -- Redis config
      plugins/mfe/apps/mfe/
        Caddyfile                  -- MFE Caddy (serves the React apps + proxies /api/mfe_config)

Architectural decisions
Why keep Caddy (instead of removing it and using direct NodePorts)
Caddy works as Tutor's internal router. Without it we would need 6 separate NodePorts plus 6 IngressRoutes in Traefik. With Caddy:
- 1 NodePort (31855) for everything
- Per-route body limits are preserved (1MB profile images, 250MB CMS uploads, 4MB LMS default)
- The favicon rewrite and gzip compression are preserved
- The MFE's /api/mfe_config proxy to the LMS is preserved (internal routing)
- Tutor upgrades regenerate the Caddyfile, so there is no manual rework
Why Kustomize and not Helm
Tutor has no official Helm chart. It generates a kustomization.yml whose configMapGenerator builds ConfigMaps from the Python/YAML configuration files. ArgoCD detects kustomization.yml automatically and uses Kustomize.
Why plaintext secrets in Git
This is temporary. Tutor writes the secrets (MySQL root password, Django secret key, JWT RSA key, Meilisearch API key) into the configuration files. They are listed in the .gitleaks.toml allowlist.
Roadmap: migrate to SOPS/age (encrypted in the repo, decrypted by ArgoCD at deploy time).
Why local-path storage (and not a LINSTOR StorageClass)
The disk of VM 103 is already replicated by LINSTOR at the Proxmox layer (DRBD between nodes 21 and 31). Using LINSTOR CSI inside the VM would replicate the data twice. The K3s local-path provisioner creates PVCs under /var/lib/rancher/k3s/storage/, which already sits inside the replicated LINSTOR volume.
Operations
Create a superuser

    # In the LMS pod (via kubectl on VM 103)
    kubectl -n tutor-openedx exec -it deploy/lms -- \
      ./manage.py lms createsuperuser --username admin --email admin@lms.colabh.org

Reindex courses in Meilisearch

    kubectl -n tutor-openedx exec -it deploy/lms -- \
      ./manage.py lms reindex_studio --experimental

View logs

    # LMS
    kubectl -n tutor-openedx logs deploy/lms -f
    # CMS
    kubectl -n tutor-openedx logs deploy/cms -f
    # Workers
    kubectl -n tutor-openedx logs deploy/lms-worker -f
    kubectl -n tutor-openedx logs deploy/cms-worker -f

Tutor upgrade
When a new version of Tutor/Open edX is released:

1. Create a branch:

        git checkout -b upgrade/tutor-vXX

2. On VM 103 (ssh vm-cpps-02):

        uv pip install --upgrade "tutor[full]"
        tutor config save  # keeps the existing config, updates defaults
        # Manifests are regenerated in ~/.local/share/tutor/env/

3. Copy them into the local repo:

        # From the local machine:
        ssh vm-cpps-02 "cd /root/.local/share/tutor/env && tar czf - --exclude=build --exclude=dev --exclude=local ." \
          | tar xzf - -C apps/tutor-openedx/

4. Review the diff:

        git diff apps/tutor-openedx/
        # Check: did images change? new env vars? ports? ConfigMaps?

5. Readjust what Tutor overwrote:
   - k8s/services.yml: Caddy must be NodePort 31855 (Tutor generates ClusterIP)
   - k8s/services.yml: MFE must be ClusterIP (Tutor generates NodePort)
   - k8s/jobs.yml: ArgoCD hook annotations (PreSync/PostSync)
   - k8s/deployments.yml: resource requests/limits
   - kustomization.yml: namespace must be tutor-openedx (Tutor generates openedx)
   - k8s/namespace.yml: name must be tutor-openedx
   - Remove dev/test files if they were copied

6. Validate:

        # On the VM:
        scp -r apps/tutor-openedx/ vm-cpps-02:/tmp/tutor-validate/
        ssh vm-cpps-02 "kubectl kustomize /tmp/tutor-validate/"

7. Commit, push, and merge into main. ArgoCD applies the diff (rolling update).
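The manual readjustments listed in the upgrade steps above are easy to forget, so a quick grep-based check before committing may help. This is a minimal sketch, not an existing script in the repo: the function name check_overrides is hypothetical, and it assumes the usual YAML spellings (namespace:, name:, nodePort:, and the argocd.argoproj.io/hook annotation prefix). It does not verify resource requests/limits or the MFE service type.

```shell
#!/bin/sh
# Hypothetical post-upgrade check: greps for the manual adjustments that
# Tutor overwrites. Argument: path to the copied env (apps/tutor-openedx).
check_overrides() {
  dir="$1"; rc=0
  expect() {  # expect FILE FIXED-STRING
    if grep -qF -- "$2" "$dir/$1" 2>/dev/null; then
      echo "ok       $1: $2"
    else
      echo "MISSING  $1: $2"; rc=1
    fi
  }
  expect kustomization.yml "namespace: tutor-openedx"
  expect k8s/namespace.yml "name: tutor-openedx"
  expect k8s/services.yml  "nodePort: 31855"
  expect k8s/jobs.yml      "argocd.argoproj.io/hook"
  return "$rc"
}

# Usage (from the repo root): check_overrides apps/tutor-openedx
```

The function returns a non-zero status when any expected string is missing, so it can gate the commit step or run in CI alongside kubectl kustomize.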
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Pods in CrashLoop | ConfigMap missing or incorrect | Check the kustomize build and confirm that configMapGenerator generates all ConfigMaps |
| mysql-job fails (PreSync) | MySQL is not up yet | Check the MySQL PVC and the logs of the mysql pod |
| lms-job/cms-job fail (PostSync) | MySQL unreachable or migrations failing | Check the mysql service and the job logs |
| MFE does not load (CORS) | Domain not in the whitelist | Check CORS_ORIGIN_WHITELIST in apps/openedx/settings/lms/production.py |
| 502 on notes/discovery | Services not deployed | Expected: notes and discovery will be added later |
| Caddy 502 | Target pod is not running | Run kubectl -n tutor-openedx get pods and check which pod is down |
| Login redirects to HTTP | ENABLE_HTTPS=false in Tutor | Expected: TLS is terminated at Traefik, not at Caddy |
Backup (to be implemented)
| Component | Method | Frequency | Destination |
|---|---|---|---|
| MySQL | CronJob mysqldump | Daily | SeaweedFS S3 (s3.colabh.org) |
| MongoDB | CronJob mongodump | Daily | SeaweedFS S3 |
| Meilisearch | Not needed (rebuilt from MySQL) | - | - |
| Redis | Not needed (ephemeral cache) | - | - |
Protection against hardware failure: already covered by LINSTOR (the VM disk is replicated via DRBD).
Protection against application errors (corruption, accidental deletes): requires logical backups (mysqldump/mongodump).