Keycloak Kubernetes Setup
01 Architecture Overview
This setup deploys Keycloak as a StatefulSet backed by a dedicated PostgreSQL instance. External traffic flows through Cloudflare (TLS termination), hits the NGINX Ingress Controller, and is proxied to the Keycloak service on port 8080.
Browser ──HTTPS──▶ Cloudflare ──HTTP──▶ NGINX Ingress ──HTTP──▶ Keycloak :8080
                                                                    │
                                                                    ▼
                                                            PostgreSQL :5432
                                                                    │
                                                                    ▼
                                                         PersistentVolume (10Gi)
Keycloak's embedded Infinispan cache uses JGroups for cluster discovery, backed by a headless Service (keycloak-discovery). When you scale to two replicas, sessions replicate automatically between pods.
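As a rough illustration of the discovery mechanism described above, a headless Service for JGroups might look like the sketch below. Only the name keycloak-discovery comes from this guide; the keycloak namespace, the app: keycloak pod label, and the use of port 7800 (the JGroups TCP port Keycloak uses by default) are assumptions to adapt to your manifest.

```yaml
# Sketch of a headless Service used for JGroups DNS discovery.
# Assumed (not from the guide): namespace "keycloak", pod label "app: keycloak".
apiVersion: v1
kind: Service
metadata:
  name: keycloak-discovery
  namespace: keycloak
spec:
  clusterIP: None                  # headless: DNS resolves to the pod IPs directly
  publishNotReadyAddresses: true   # optional: lets pods find each other before they are Ready
  selector:
    app: keycloak
  ports:
    - name: jgroups
      port: 7800
      targetPort: 7800
```

Because the Service has no cluster IP, a DNS lookup of its name returns one record per Keycloak pod, which is what the cluster members use to find each other.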
02 Prerequisites
Before you begin, make sure you have the following in place:
- A running Kubernetes cluster (k3s, EKS, GKE, AKS, etc.)
- NGINX Ingress Controller installed (via Helm or manifest)
- A StorageClass that supports dynamic provisioning (e.g. hcloud-volumes, gp3, standard)
- kubectl configured and pointing at your cluster
- A domain with DNS managed by Cloudflare (or any reverse proxy)
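Verify your setup with kubectl get nodes, kubectl get storageclass, and kubectl get pods -n ingress-nginx (substitute the namespace your Ingress Controller was installed into).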
03 The Complete Manifest
The entire stack lives in a single YAML file. It creates a dedicated namespace, a Secret for credentials, a PostgreSQL Deployment with persistent storage, and the Keycloak StatefulSet with an NGINX Ingress resource.
Namespace & Secrets
Warning: Never use default credentials in production. Generate strong passwords and consider using an external secret manager like Vault, Sealed Secrets, or External Secrets Operator.
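A minimal sketch of the namespace and Secret is shown here. The names keycloak and keycloak-secrets, the key names, and the placeholder values are all illustrative assumptions; the KC_BOOTSTRAP_ADMIN_* keys assume a Keycloak release that uses the bootstrap admin mechanism (26 or later).

```yaml
# Sketch only: names and keys are assumptions, values are placeholders.
apiVersion: v1
kind: Namespace
metadata:
  name: keycloak
---
apiVersion: v1
kind: Secret
metadata:
  name: keycloak-secrets
  namespace: keycloak
type: Opaque
stringData:                       # stringData accepts plain text; Kubernetes stores it base64-encoded
  POSTGRES_USER: keycloak
  POSTGRES_PASSWORD: CHANGE_ME_DB_PASSWORD
  KC_BOOTSTRAP_ADMIN_USERNAME: admin
  KC_BOOTSTRAP_ADMIN_PASSWORD: CHANGE_ME_ADMIN_PASSWORD
```

Replace both CHANGE_ME values with generated passwords before applying, or have your secret manager create the Secret instead of committing it to the manifest.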
PostgreSQL with Persistent Storage
Note: The subPath: pgdata setting is important because PostgreSQL requires the data directory to be empty on first init. A freshly formatted volume often contains a lost+found directory at its root; mounting a subPath keeps that directory out of the data directory and avoids initialization failures.
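The following sketch shows where the subPath fits in a PostgreSQL Deployment with a 10Gi claim. The resource names, labels, image tag, and the keycloak-secrets Secret are assumptions for illustration, and the ClusterIP Service that exposes port 5432 is left out for brevity.

```yaml
# Sketch: PVC plus PostgreSQL Deployment using subPath. Names and image tag are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: keycloak
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: keycloak
spec:
  replicas: 1
  strategy:
    type: Recreate               # stop the old pod before starting a new one on the same volume
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16     # assumed version; pin the one you actually run
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: keycloak
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: keycloak-secrets
                  key: POSTGRES_USER
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: keycloak-secrets
                  key: POSTGRES_PASSWORD
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
              subPath: pgdata    # data lives in <volume root>/pgdata, away from lost+found
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: postgres-data
```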
Keycloak StatefulSet & Services
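A condensed sketch of the StatefulSet follows. It is not the complete manifest: the image tag, labels, Secret name, the postgres Service hostname, the sso.example.com hostname, and the cache settings are assumptions, and the option names assume a recent Keycloak release (health endpoints on the management port 9000, hostname given as a full URL). Check them against the version you deploy.

```yaml
# Sketch of the Keycloak StatefulSet. Names, image tag, and option values are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: keycloak
  namespace: keycloak
spec:
  serviceName: keycloak-discovery    # headless Service used for JGroups discovery
  replicas: 1
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
    spec:
      containers:
        - name: keycloak
          image: quay.io/keycloak/keycloak:26.0   # assumed tag; pin your own
          args: ["start"]
          ports:
            - name: http
              containerPort: 8080
            - name: management
              containerPort: 9000
            - name: jgroups
              containerPort: 7800
          env:
            - name: KC_DB
              value: postgres
            - name: KC_DB_URL
              value: jdbc:postgresql://postgres:5432/keycloak   # assumes a Service named "postgres"
            - name: KC_DB_USERNAME
              valueFrom:
                secretKeyRef:
                  name: keycloak-secrets
                  key: POSTGRES_USER
            - name: KC_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: keycloak-secrets
                  key: POSTGRES_PASSWORD
            - name: KC_BOOTSTRAP_ADMIN_USERNAME
              valueFrom:
                secretKeyRef:
                  name: keycloak-secrets
                  key: KC_BOOTSTRAP_ADMIN_USERNAME
            - name: KC_BOOTSTRAP_ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: keycloak-secrets
                  key: KC_BOOTSTRAP_ADMIN_PASSWORD
            - name: KC_HOSTNAME
              value: https://sso.example.com
            - name: KC_HTTP_ENABLED          # plain HTTP on 8080; TLS ends upstream
              value: "true"
            - name: KC_PROXY_HEADERS         # trust X-Forwarded-* headers set by the proxy
              value: xforwarded
            - name: KC_HEALTH_ENABLED
              value: "true"
            - name: KC_CACHE_STACK           # assumed; newer releases may prefer a different stack
              value: kubernetes
            - name: JAVA_OPTS_APPEND
              value: -Djgroups.dns.query=keycloak-discovery.keycloak.svc.cluster.local
          startupProbe:
            httpGet:
              path: /health/started
              port: 9000
            periodSeconds: 2
            failureThreshold: 300            # 300 x 2s = up to 10 minutes to start
```

A ClusterIP Service named keycloak exposing port 8080 is assumed alongside this and is what the Ingress would route to.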
NGINX Ingress (Cloudflare TLS Termination)
Why proxy-buffer-size? Keycloak sends large HTTP response headers (especially during OIDC token exchanges). If the headers exceed NGINX's proxy buffer, NGINX returns 502 Bad Gateway errors. The 128k value is a safe default for Keycloak deployments.
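As a sketch, the buffer size is set through an annotation on the Ingress resource. The resource name, namespace, ingress class nginx, and backend Service name keycloak are assumptions; the hostname and the 128k value follow this guide.

```yaml
# Sketch of the Ingress. Name, namespace, class, and backend Service name are assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: keycloak
  namespace: keycloak
  annotations:
    nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"   # room for Keycloak's large headers
spec:
  ingressClassName: nginx
  rules:
    - host: sso.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keycloak
                port:
                  number: 8080
```

No tls section is defined here because, in this setup, Cloudflare handles the browser-facing certificate.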
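04 Deploying to the Cluster
Apply the manifest with kubectl apply -f followed by the path to your manifest file, then watch the pods come up with kubectl get pods -n <namespace> -w.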
PostgreSQL should be Running within 30 seconds. Keycloak takes longer (1–3 minutes) because it runs database migrations on first start. The startup probe gives it up to 10 minutes (300 × 2s) before Kubernetes considers it failed.
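Once both pods show Running and 1/1 READY, verify the Ingress with kubectl get ingress -n <namespace>; the ADDRESS column should show the Ingress Controller's external IP once it has been assigned.
05 Cloudflare & DNS Configuration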
Two things to configure on the Cloudflare dashboard:
DNS Record
Create an A record pointing your domain to the external IP of the NGINX Ingress Controller's LoadBalancer service. Enable the orange cloud (Proxy) for Cloudflare protection.
| Type | Name | Content | Proxy |
|---|---|---|---|
| A | sso | YOUR_LOADBALANCER_IP | Proxied (orange) |
SSL/TLS Mode
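Set the SSL/TLS encryption mode to Full (not "Full (Strict)"). Keycloak runs on HTTP behind the Ingress and the cluster has no publicly trusted certificate. In "Full" mode Cloudflare encrypts the client-facing connection and connects to your origin over HTTPS without validating the certificate, so the Ingress Controller's default self-signed certificate is sufficient. "Full (Strict)" would require a certificate at the origin that Cloudflare trusts.
Important: Do not use "Flexible" mode. It can cause redirect loops with Keycloak's KC_PROXY_HEADERS configuration. "Full" is the correct choice.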
06 Without NGINX Ingress & Persistent Volumes
Not every cluster has an Ingress Controller or a dynamic StorageClass. Here's how to adapt the manifest for a minimal setup — useful for development, testing, or bare-metal clusters without a cloud volume provisioner.
What Changes
| Component | Full Setup | Minimal Setup |
|---|---|---|
| Ingress | NGINX Ingress + Cloudflare | NodePort Service (direct access) |
| PostgreSQL Storage | PVC with StorageClass | emptyDir (ephemeral) |
| TLS | Cloudflare terminates | None (HTTP only) |
| Hostname | KC_HOSTNAME set | KC_HOSTNAME_STRICT=false |
Modified PostgreSQL (No PVC)
Remove the PersistentVolumeClaim entirely and replace the volume with emptyDir:
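A sketch of the relevant fragment of the PostgreSQL Deployment is shown here; the container name and the volume name data are assumptions. The subPath is no longer needed, since an emptyDir starts out empty.

```yaml
# Fragment of the PostgreSQL Deployment (names are assumptions).
spec:
  template:
    spec:
      containers:
        - name: postgres
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: data
          emptyDir: {}     # scratch storage tied to the pod's lifetime
```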
Warning: With emptyDir, all data is lost when the PostgreSQL pod restarts. This is acceptable for development and testing only. Never use this in production.
NodePort Instead of Ingress
Remove the Ingress resource and change the Keycloak Service type from ClusterIP to NodePort:
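A sketch of the modified Service is below. The Service name, namespace, and app: keycloak selector are assumptions; the node port 30080 matches the access URL used in this section and must fall inside the cluster's NodePort range (30000-32767 by default).

```yaml
# Sketch of the Keycloak Service as a NodePort (name and selector are assumptions).
apiVersion: v1
kind: Service
metadata:
  name: keycloak
  namespace: keycloak
spec:
  type: NodePort
  selector:
    app: keycloak
  ports:
    - name: http
      port: 8080
      targetPort: 8080
      nodePort: 30080      # exposed on every node's IP
```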
Keycloak Environment Changes
Replace the hostname configuration in the Keycloak StatefulSet env section:
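One possible shape for the changed variables is sketched below. It assumes the full setup defined KC_HOSTNAME and KC_PROXY_HEADERS, which are dropped here because there is no fixed hostname and no proxy in front of Keycloak.

```yaml
# Fragment of the Keycloak container's env section for the minimal setup.
# KC_HOSTNAME and KC_PROXY_HEADERS are removed rather than set.
env:
  - name: KC_HOSTNAME_STRICT
    value: "false"         # accept whatever host the request arrives on
  - name: KC_HTTP_ENABLED
    value: "true"          # serve plain HTTP on port 8080
```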
Access Keycloak at http://<NODE_IP>:30080.
07 Post-Installation Steps
After Keycloak starts, you'll see a warning banner: "You are logged in as a temporary admin user." This is expected. The bootstrap admin account should be replaced with a permanent one.
Create a Permanent Admin
- Log in to the Admin Console at https://sso.example.com/admin
- Navigate to the master realm → Users → Create user
- Set a username, email, first/last name
- Go to Credentials tab → set a strong password (toggle Temporary to Off)
- Go to Role mappings → assign the admin role
- Log out, log in with the new account, and delete the bootstrap admin user
Create Your First Realm
- Click Create Realm next to "Current realm"
- Give it a name (e.g. myapp)
- Create users and clients (OIDC/SAML) within that realm
Best Practice: Never use the master realm for application users. Keep it exclusively for Keycloak administration. Create separate realms for each application or tenant.
08 Production Hardening Checklist
| Area | Action |
|---|---|
| Credentials | Use Kubernetes Secrets with strong, generated passwords. Consider External Secrets Operator or Sealed Secrets. |
| Database | Use a managed PostgreSQL service or a proper operator (CloudNativePG, Zalando). Never run production databases with emptyDir. |
| Replicas | Scale Keycloak to 2 replicas for high availability. JGroups handles session replication automatically. |
| Resources | Tune CPU/memory requests and limits based on actual load. Monitor with Prometheus + Grafana. |
| TLS | If not using Cloudflare, configure cert-manager with Let's Encrypt for automated certificate management. |
| Backups | Set up regular PostgreSQL backups (pg_dump, WAL archiving, or operator-managed backups). |
| Network Policies | Restrict traffic so only the Keycloak pods can reach PostgreSQL, and only the Ingress can reach Keycloak. |
| Monitoring | Enable Keycloak metrics endpoint and scrape with Prometheus. Set up alerts for pod restarts and health check failures. |
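To make the Network Policies row concrete, the sketch below admits traffic to PostgreSQL only from Keycloak pods. The namespace and the app: postgres / app: keycloak labels are assumptions, and the policy only takes effect if the cluster's CNI plugin enforces NetworkPolicy.

```yaml
# Sketch: only Keycloak pods may reach PostgreSQL on 5432 (labels are assumptions).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-allow-keycloak
  namespace: keycloak
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: keycloak
      ports:
        - protocol: TCP
          port: 5432
```

A second policy of the same shape, selecting the Keycloak pods and allowing traffic from the Ingress Controller's namespace, would cover the other half of that row.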