
Tutor/Open edX


Online course platform (LMS) deployed via ArgoCD, using manifests generated by Tutor v21.0.4.


Overview

  • Tutor v21.0.4 generates Kustomize manifests (we do not use tutor k8s start; ArgoCD is what applies them)
  • 11 Deployments, 4 PVCs, 3 Jobs (ArgoCD hooks)
  • Caddy as the internal router: it receives all traffic on NodePort 31855 and routes by Host header
  • Namespace: tutor-openedx

Architecture

Internet
|
Traefik (debian-proxy, TLS)
|
NodePort 31855
|
Caddy (internal router)
|
+-- lms.colabh.org --------------> lms:8000          (LMS, Django/uWSGI)
+-- cms.lms.colabh.org ----------> cms:8000          (Studio, Django/uWSGI)
+-- apps.lms.colabh.org ---------> mfe:8002          (Micro Frontends, React)
+-- meilisearch.lms.colabh.org --> meilisearch:7700  (Search)
+-- notes.lms.colabh.org --------> notes:8000        (notes, future)
+-- discovery.lms.colabh.org ----> discovery:8000    (catalog, future)

Internal services (ClusterIP):
  mysql:3306, mongodb:27017, redis:6379, smtp:8025
  lms-worker, cms-worker (Celery)

Domains

All subdomains point to the same backend (Caddy on NodePort 31855). Caddy does the host-based routing internally.

Domain | Service | Role
lms.colabh.org | lms:8000 | LMS, learner interface
cms.lms.colabh.org | cms:8000 | Studio, course authoring
apps.lms.colabh.org | mfe:8002 | Micro Frontends (authn, learning, gradebook, etc.)
meilisearch.lms.colabh.org | meilisearch:7700 | Public search API
notes.lms.colabh.org | notes:8000 | Learner notes (not deployed yet)
discovery.lms.colabh.org | discovery:8000 | Course catalog (not deployed yet)
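To confirm that Caddy routes each Host header to the right Service without involving DNS or Traefik, you can call the NodePort directly and set the Host header by hand. The sketch below assumes a hypothetical NODE_IP variable holding the address of VM 103; check_route and check_all_routes are illustrative names, and notes/discovery are left out because they are not deployed yet. A 502 means Caddy answered but the target pod did not; 000 means curl could not connect at all.

```shell
#!/usr/bin/env bash
# Sketch: probe Caddy's host-based routing directly on the NodePort.
# NODE_IP is a hypothetical variable: set it to the address of VM 103.

check_route() {
  local host="$1" code
  # -o /dev/null discards the body; -w prints only the HTTP status code.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    -H "Host: ${host}" "http://${NODE_IP}:${NODE_PORT:-31855}/")
  printf '%s %s\n' "$host" "$code"
}

check_all_routes() {
  local host
  # Only the hosts that currently have a backend deployed.
  for host in lms.colabh.org cms.lms.colabh.org apps.lms.colabh.org meilisearch.lms.colabh.org; do
    check_route "$host"
  done
}
```

Usage, from a machine that can reach the VM: set NODE_IP to the VM's address and run check_all_routes; each line shows a host and the status code Caddy returned for it.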

Cloudflare DNS: all of them are CNAMEs to traefik-afq-franca.cppsunesp.org (the same target as lms.colabh.org).

They were created through the Cloudflare API:

# Create a CNAME for a new Open edX subdomain
# (replace SUBDOMINIO with the part before .colabh.org, e.g. notes.lms)
source .env  # must define CF_DNS_API_TOKEN
curl -X POST -H "Authorization: Bearer $CF_DNS_API_TOKEN" \
-H "Content-Type: application/json" \
"https://api.cloudflare.com/client/v4/zones/70cd39241c7861ecdf4644477069bf3c/dns_records" \
-d '{"type":"CNAME","name":"SUBDOMINIO.colabh.org","content":"traefik-afq-franca.cppsunesp.org","ttl":1,"proxied":false}'
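Running the POST twice for the same name is easy to do by accident. A minimal sketch that first lists records filtered by type and name (the Cloudflare v4 list endpoint accepts `type` and `name` query parameters) and only creates the CNAME when none exists. ensure_cname is a hypothetical helper; it assumes CF_DNS_API_TOKEN is already exported and that responses come back as compact JSON, so the string checks are heuristics, not JSON parsing.

```shell
#!/usr/bin/env bash
# Sketch: create a CNAME only if no record with that name exists yet.
# Assumes CF_DNS_API_TOKEN is exported (e.g. via `source .env`).
CF_ZONE_ID="70cd39241c7861ecdf4644477069bf3c"
CF_TARGET="traefik-afq-franca.cppsunesp.org"
CF_API="https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records"

ensure_cname() {
  local fqdn="$1" existing resp
  # List CNAME records with exactly this name.
  existing=$(curl -s -H "Authorization: Bearer ${CF_DNS_API_TOKEN}" \
    "${CF_API}?type=CNAME&name=${fqdn}")
  # The quoted name only appears in the response if a record already exists.
  if printf '%s' "$existing" | grep -qF "\"${fqdn}\""; then
    echo "exists: ${fqdn}"
    return 0
  fi
  resp=$(curl -s -X POST -H "Authorization: Bearer ${CF_DNS_API_TOKEN}" \
    -H "Content-Type: application/json" "${CF_API}" \
    -d "{\"type\":\"CNAME\",\"name\":\"${fqdn}\",\"content\":\"${CF_TARGET}\",\"ttl\":1,\"proxied\":false}")
  # Heuristic success check, assuming compact JSON in the response.
  if printf '%s' "$resp" | grep -q '"success":true'; then
    echo "created: ${fqdn}"
  else
    echo "failed: ${resp}" >&2
    return 1
  fi
}
```

Example: `ensure_cname notes.lms.colabh.org` prints either `exists: ...` or `created: ...`.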

Traefik route: config/traefik/routes/tutor-openedx.yaml, a single IngressRoute whose rule joins all hosts with ||. When adding a subdomain, also add it to the IngressRoute's match: rule.
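Because one IngressRoute covers every host, its match: expression grows with each subdomain. A small helper can generate it from a host list so the string is not edited by hand; build_match_rule is a hypothetical name, and its output still has to be pasted into config/traefik/routes/tutor-openedx.yaml.

```shell
#!/usr/bin/env bash
# Sketch: build a Traefik rule of the form Host(`a`) || Host(`b`) || ...
build_match_rule() {
  local rule="" host
  for host in "$@"; do
    # ${rule:+ || } inserts the separator only after the first host.
    rule+="${rule:+ || }Host(\`${host}\`)"
  done
  printf '%s\n' "$rule"
}
```

Example: `build_match_rule lms.colabh.org cms.lms.colabh.org` prints ``Host(`lms.colabh.org`) || Host(`cms.lms.colabh.org`)``.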


Components and resources

Component | Image | Requests (CPU / Mem) | Limit (Mem) | PVC | Role
caddy | caddy:2.7.4 | 50m / 64Mi | 128Mi | - | Internal HTTP router
lms | overhangio/openedx:21.0.4-indigo | 500m / 1Gi | 2Gi | - | LMS Django (uWSGI, 2 workers)
cms | overhangio/openedx:21.0.4-indigo | 250m / 1Gi | 2Gi | - | Studio Django (uWSGI, 2 workers)
lms-worker | overhangio/openedx:21.0.4-indigo | 250m / 512Mi | 1Gi | - | LMS Celery worker
cms-worker | overhangio/openedx:21.0.4-indigo | 250m / 512Mi | 1Gi | - | CMS Celery worker
mfe | overhangio/openedx-mfe:21.0.0-indigo | 50m / 128Mi | 256Mi | - | React Micro Frontends (internal Caddy)
meilisearch | getmeili/meilisearch:v1.8.4 | 100m / 256Mi | 1Gi | 5Gi | Search engine
mysql | mysql:8.4.0 | 250m / 512Mi | 1Gi | 5Gi | Relational database (courses, users)
mongodb | mongo:7.0.28 | 100m / 256Mi | 512Mi | 5Gi | Document database (modulestore, forum)
redis | redis:7.4.5 | 50m / 128Mi | 256Mi | 1Gi | Cache + message broker (Celery)
smtp | devture/exim-relay:4.96-r1-0 | 50m / 64Mi | 128Mi | - | Email relay

Total requests: 1.9 CPU, ~4.4 Gi RAM (4480Mi). Total limits: ~9.3 Gi RAM (9472Mi). Total PVCs: 16 Gi (meilisearch 5 + mysql 5 + mongodb 5 + redis 1).
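The totals can be rechecked by summing the table's columns. In this sketch the values are copied from the component table, with Gi converted to Mi (1Gi = 1024Mi); RESOURCES and resource_totals are illustrative names.

```shell
#!/usr/bin/env bash
# Columns: component, CPU request (m), memory request (Mi), memory limit (Mi).
RESOURCES='caddy 50 64 128
lms 500 1024 2048
cms 250 1024 2048
lms-worker 250 512 1024
cms-worker 250 512 1024
mfe 50 128 256
meilisearch 100 256 1024
mysql 250 512 1024
mongodb 100 256 512
redis 50 128 256
smtp 50 64 128'

resource_totals() {
  awk '{ cpu += $2; req += $3; lim += $4 }
       END { printf "requests: %dm CPU, %dMi RAM; limits: %dMi RAM\n", cpu, req, lim }'
}

printf '%s\n' "$RESOURCES" | resource_totals
# prints: requests: 1900m CPU, 4480Mi RAM; limits: 9472Mi RAM
```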


Jobs (ArgoCD hooks)

Job | Hook | Role
mysql-job | PostSync (wave 0) | Creates the openedx database and user (waits in a loop until MySQL is up)
lms-job | PostSync (wave 1) | Django migrations, collectstatic, creation of Sites
cms-job | PostSync (wave 1) | CMS Django migrations, collectstatic

All of them use hook-delete-policy: HookSucceeded, so each Job is deleted once it succeeds. Sync waves ensure that mysql-job completes before lms-job and cms-job run.
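When the Jobs are regenerated by Tutor, these annotations have to be added back by hand. A small generator keeps the three hook annotations consistent; hook_annotations is a hypothetical helper, and the two-space indentation assumes the block goes directly under a top-level metadata: key in k8s/jobs.yml. The sync-wave value is quoted because annotation values must be strings.

```shell
#!/usr/bin/env bash
# Sketch: print the ArgoCD hook annotations for a Job at a given sync wave.
hook_annotations() {
  local wave="$1"
  cat <<EOF
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
    argocd.argoproj.io/sync-wave: "${wave}"
EOF
}
```

Usage: `hook_annotations 0` for mysql-job, `hook_annotations 1` for lms-job and cms-job.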


File layout

apps/tutor-openedx/
  kustomization.yml        -- Kustomize: namespace, labels, configMapGenerator
  version                  -- Tutor version (21.0.4)
  k8s/
    namespace.yml          -- tutor-openedx namespace
    deployments.yml        -- 11 Deployments
    services.yml           -- 9 Services (Caddy = NodePort, the rest = ClusterIP)
    volumes.yml            -- 4 PVCs
    jobs.yml               -- 3 Jobs with ArgoCD hooks
  apps/
    caddy/Caddyfile        -- Caddy config (host routing, body limits, gzip)
    openedx/
      config/lms.env.yml   -- LMS environment variables (Django)
      config/cms.env.yml   -- CMS environment variables (Django)
      settings/lms/        -- LMS Django settings (production.py)
      settings/cms/        -- CMS Django settings (production.py)
      uwsgi.ini            -- uWSGI config
    redis/redis.conf       -- Redis config
  plugins/mfe/apps/mfe/
    Caddyfile              -- MFE Caddy (serves the React apps + proxies /api/mfe_config)

Architecture decisions

Why keep Caddy (instead of removing it and using direct NodePorts)

Caddy acts as Tutor's internal router. Without it, we would need 6 separate NodePorts plus 6 IngressRoutes in Traefik. With Caddy:

  • 1 NodePort (31855) for everything
  • Per-route body limits are preserved (1MB profile images, 250MB CMS uploads, 4MB LMS default)
  • The favicon rewrite and gzip compression are preserved
  • The MFE's /api/mfe_config proxy to the LMS (internal routing) is preserved
  • Tutor upgrades regenerate the Caddyfile, so there is no manual rework

Why Kustomize and not Helm

Tutor has no official Helm chart. It generates a kustomization.yml whose configMapGenerator builds ConfigMaps from the Python/YAML configuration files. ArgoCD detects kustomization.yml automatically and renders the app with Kustomize.
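Since ArgoCD renders the app with Kustomize, you can render it the same way and list which ConfigMaps configMapGenerator produces (by default their names carry a content-hash suffix). This is a sketch: list_configmaps is a hypothetical helper, and its awk parsing relies on kustomize's usual output layout (top-level `kind:`, `metadata.name` indented by two spaces), so it is not a general YAML parser.

```shell
#!/usr/bin/env bash
# Sketch: print ConfigMap names from `kubectl kustomize` output read on stdin.
list_configmaps() {
  awk '
    /^---/                            { kind = ""; next }   # new YAML document
    /^kind: /                         { kind = $2 }
    kind == "ConfigMap" && /^  name: / { print $2; kind = "" }
  '
}
```

Usage: `kubectl kustomize apps/tutor-openedx/ | list_configmaps`.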

Why secrets are in plaintext in Git

Temporary. Tutor writes the secrets (MySQL root password, Django secret key, JWT RSA key, Meilisearch API key) into the configuration files. They are on the .gitleaks.toml allowlist.

Roadmap: migrate to SOPS/age (encrypted in the repo, decrypted by ArgoCD at deploy time).

Why local-path storage (and not a LINSTOR StorageClass)

The disk of VM 103 is already replicated by LINSTOR at the Proxmox layer (DRBD between nodes 21 and 31). Using the LINSTOR CSI driver inside the VM would mean replicating twice. The K3s local-path provisioner creates PVCs under /var/lib/rancher/k3s/storage/, which already sits on the replicated LINSTOR volume.
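This argument only holds if every PVC really lives under /var/lib/rancher/k3s/storage/. A sketch to check that from the VM, assuming the k3s local-path provisioner created hostPath volumes (its usual mode); pv_paths and pvs_outside_storage are hypothetical helpers, and a `local`-type PV would have an empty hostPath and be flagged as well.

```shell
#!/usr/bin/env bash
# Sketch: flag tutor-openedx volumes that are not under the local-path directory.
STORAGE_ROOT=/var/lib/rancher/k3s/storage

pv_paths() {
  # One line per PV bound to a claim in tutor-openedx: "<claim><TAB><host path>"
  kubectl get pv -o jsonpath='{range .items[?(@.spec.claimRef.namespace=="tutor-openedx")]}{.spec.claimRef.name}{"\t"}{.spec.hostPath.path}{"\n"}{end}'
}

pvs_outside_storage() {
  # index() == 1 means the path starts with STORAGE_ROOT/.
  pv_paths | awk -F'\t' -v root="${STORAGE_ROOT}/" 'index($2, root) != 1 { print $1 "\t" $2 }'
}
```

Empty output from `pvs_outside_storage` means all four PVCs sit on the replicated volume.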


Operations

Create a superuser

# In the LMS pod (via kubectl on VM 103)
kubectl -n tutor-openedx exec -it deploy/lms -- \
./manage.py lms createsuperuser --username admin --email admin@lms.colabh.org

Reindex courses in Meilisearch

kubectl -n tutor-openedx exec -it deploy/lms -- \
./manage.py lms reindex_studio --experimental

View logs

# LMS
kubectl -n tutor-openedx logs deploy/lms -f
# CMS
kubectl -n tutor-openedx logs deploy/cms -f
# Workers
kubectl -n tutor-openedx logs deploy/lms-worker -f
kubectl -n tutor-openedx logs deploy/cms-worker -f

Upgrading Tutor

When a new Tutor/Open edX version is released:

  1. Create a branch: git checkout -b upgrade/tutor-vXX

  2. On VM 103 (ssh vm-cpps-02):

    uv pip install --upgrade "tutor[full]"
    tutor config save # keeps the existing config, updates defaults
    # Manifests are regenerated in ~/.local/share/tutor/env/
  3. Copy them into the local repo:

    # From the local machine:
    ssh vm-cpps-02 "cd /root/.local/share/tutor/env && tar czf - --exclude=build --exclude=dev --exclude=local ." \
    | tar xzf - -C apps/tutor-openedx/
  4. Review the diff:

    git diff apps/tutor-openedx/
    # Check: did images change? new env vars? ports? ConfigMaps?
  5. Re-apply what Tutor overwrote:

    • k8s/services.yml: Caddy must be NodePort 31855 (Tutor generates ClusterIP)
    • k8s/services.yml: MFE must be ClusterIP (Tutor generates NodePort)
    • k8s/jobs.yml: ArgoCD hook annotations (PostSync, sync-wave, hook-delete-policy)
    • k8s/deployments.yml: resource requests/limits
    • kustomization.yml: namespace must be tutor-openedx (Tutor generates openedx)
    • k8s/namespace.yml: name must be tutor-openedx
    • Remove dev/test files if they were copied
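  6. Validate:

    # From the local machine: copy to the VM and render there.
    # Remove any previous copy first; otherwise scp -r nests the directory.
    ssh vm-cpps-02 "rm -rf /tmp/tutor-validate"
    scp -r apps/tutor-openedx/ vm-cpps-02:/tmp/tutor-validate/
    ssh vm-cpps-02 "kubectl kustomize /tmp/tutor-validate/"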
  7. Commit, push, and merge into main. ArgoCD applies the diff (rolling update).
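Step 5 is the easiest one to get partially wrong after a regeneration. A grep-based smoke test can run before step 6; check_overrides is a hypothetical helper, it only looks for plain-text markers (it does not parse YAML), and it does not cover the MFE ClusterIP or the resource limits, which still need the manual diff review.

```shell
#!/usr/bin/env bash
# Sketch: smoke-test the repo-specific overrides listed in step 5.
check_overrides() {
  local dir="${1:-apps/tutor-openedx}" rc=0 hooks
  # Caddy must be exposed on NodePort 31855.
  grep -q 'nodePort: 31855' "$dir/k8s/services.yml" 2>/dev/null \
    || { echo "missing: nodePort 31855 in k8s/services.yml"; rc=1; }
  # Kustomize namespace must be tutor-openedx, not Tutor's default.
  grep -q '^namespace: tutor-openedx' "$dir/kustomization.yml" 2>/dev/null \
    || { echo "missing: namespace tutor-openedx in kustomization.yml"; rc=1; }
  # The Namespace object itself must carry the same name.
  grep -q 'name: tutor-openedx' "$dir/k8s/namespace.yml" 2>/dev/null \
    || { echo "missing: name tutor-openedx in k8s/namespace.yml"; rc=1; }
  # All three Jobs must be ArgoCD PostSync hooks.
  hooks=$(grep -c 'argocd.argoproj.io/hook: PostSync' "$dir/k8s/jobs.yml" 2>/dev/null)
  if [ "${hooks:-0}" -ne 3 ]; then
    echo "expected 3 PostSync hooks in k8s/jobs.yml, found ${hooks:-0}"
    rc=1
  fi
  return "$rc"
}
```

Usage: `check_overrides apps/tutor-openedx && echo "overrides OK"`; each missing override is printed on its own line.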


Troubleshooting

Symptom | Likely cause | Fix
Pods in CrashLoopBackOff | Missing or incorrect ConfigMap | Check the kustomize build; confirm that configMapGenerator generates every ConfigMap
mysql-job fails (PostSync) | MySQL is not up yet | Check the MySQL PVC and the logs of the mysql pod
lms-job/cms-job fail (PostSync) | MySQL unreachable or migrations failing | Check the mysql Service and the job logs
MFE does not load (CORS) | Domain not in the whitelist | Check CORS_ORIGIN_WHITELIST in apps/openedx/settings/lms/production.py
502 on notes/discovery | Services not deployed | Expected: notes and discovery will be added later
Caddy 502 | Target pod is not running | kubectl -n tutor-openedx get pods, then find which pod is down
Login redirects to HTTP | ENABLE_HTTPS=false in Tutor | Expected: TLS is terminated at Traefik, not at Caddy
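For the CrashLoop and "Caddy 502" rows, it helps to filter the pod list down to pods that are not fully ready. A pod in CrashLoopBackOff typically still reports phase Running, so this sketch looks at the STATUS and READY columns of `kubectl get pods --no-headers` instead of the pod phase; unhealthy_pods is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print pods that are not fully ready.
# Input columns: $1 NAME, $2 READY (e.g. 1/1), $3 STATUS.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")
    # Completed hook Jobs are fine; anything else must be Running with all containers ready.
    if ($3 != "Completed" && ($3 != "Running" || ready[1] != ready[2]))
      print $1, $2, $3
  }'
}
```

Usage: `kubectl -n tutor-openedx get pods --no-headers | unhealthy_pods`.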

Backup (to be implemented)

Component | Method | Frequency | Destination
MySQL | mysqldump CronJob | Daily | SeaweedFS S3 (s3.colabh.org)
MongoDB | mongodump CronJob | Daily | SeaweedFS S3
Meilisearch | Not needed (rebuilt from MySQL) | - | -
Redis | Not needed (ephemeral cache) | - | -

Protection against hardware failure: already covered by LINSTOR (the VM disk is replicated via DRBD).

Protection against application errors (corruption, accidental deletion): requires logical backups (mysqldump/mongodump).
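The planned CronJobs could run something along these lines. This is a sketch under several assumptions, all hypothetical: the aws CLI is configured with SeaweedFS S3 credentials, BACKUP_BUCKET names an existing bucket, MYSQL_ROOT_PASSWORD comes from the CronJob environment, and the jobs run in the tutor-openedx namespace so the mysql and mongodb Service names resolve. backup_key, backup_mysql and backup_mongodb are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch of the planned daily logical backups (not implemented yet).
S3_ENDPOINT="https://s3.colabh.org"
BACKUP_BUCKET="${BACKUP_BUCKET:-openedx-backups}"   # hypothetical bucket name

# Object key per component and day, e.g. mysql/2024-05-01.sql.gz
backup_key() {
  local kind="$1" day="$2" ext
  case "$kind" in
    mysql)   ext="sql.gz" ;;
    mongodb) ext="archive.gz" ;;
    *) echo "unknown component: $kind" >&2; return 1 ;;
  esac
  printf '%s/%s.%s\n' "$kind" "$day" "$ext"
}

# Subshell bodies keep `set -o pipefail` local: a failed dump is reported as a
# failed run instead of being masked by the aws exit status (a partial object
# may still be uploaded).
backup_mysql() (
  set -o pipefail
  key=$(backup_key mysql "$(date +%F)") || exit 1
  # --single-transaction: consistent InnoDB snapshot without locking tables.
  mysqldump -h mysql -u root -p"${MYSQL_ROOT_PASSWORD}" \
      --single-transaction --all-databases \
    | gzip \
    | aws s3 cp - "s3://${BACKUP_BUCKET}/${key}" --endpoint-url "$S3_ENDPOINT"
)

backup_mongodb() (
  set -o pipefail
  key=$(backup_key mongodb "$(date +%F)") || exit 1
  # --archive without a file name streams to stdout; --gzip compresses it.
  mongodump --host mongodb --archive --gzip \
    | aws s3 cp - "s3://${BACKUP_BUCKET}/${key}" --endpoint-url "$S3_ENDPOINT"
)
```

Streaming straight to `aws s3 cp -` avoids needing scratch space in the pod; the trade-off is that a failed dump can leave a truncated object, which the exit status makes visible.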