Integrating Storage in Kubernetes
In an ideal world, every microservice would be completely stateless — handling requests, returning responses, and storing nothing. Stateless services are easy to scale, replace, and redeploy. However, almost every real system has state somewhere: a relational database holding customer records, a search index, a message broker, a cache. At some point, data has to live somewhere.
Integrating that data with Kubernetes is often the most challenging part of building a distributed system. Containerized, cloud-native patterns — decoupled, immutable, declarative — apply naturally to stateless web APIs. Storage is different. Storage solutions often require imperative setup steps, direct IP addressing, or physical proximity to data. These patterns do not fit neatly into the container model.
Kubernetes offers several approaches for dealing with stateful workloads. This article covers the first and most common: importing an external storage service that already exists outside the cluster. Future articles cover reliable singletons and StatefulSets.
The Problem of Data Gravity
Most containerized systems are not built from scratch. They are adapted from existing applications that run on virtual machines, and those VMs hold years of production data. You cannot simply containerize the application and leave the database behind — the data has mass, a pull toward where it already lives. Migrating terabytes of production data is expensive, risky, and time-consuming. This tendency for existing data to resist movement is called data gravity.
Kubernetes provides a clean mechanism for dealing with this: you can represent an external service inside the cluster as if it were a native Kubernetes Service. Your applications never know the difference. The database appears to them as just another cluster service, even though it is actually running on a VM or in a cloud provider's managed database offering.
This pattern is also extremely useful for maintaining identical configuration between environments. In production your application connects to a legacy on-premises database. In testing it connects to a lightweight transient database container. You can name both my-database — one in the prod namespace, one in the test namespace. The application configuration never changes between environments; only the actual backing service differs.
Core Concepts
Services Without Selectors
When you create a normal Kubernetes Service, you provide a label selector — a query that finds the Pods that should receive traffic. But for an external service, there are no Pods. Instead, there is just a DNS hostname or IP address sitting outside the cluster.
Kubernetes supports this with two approaches depending on whether you have a DNS name or only an IP address for the external service.
ExternalName Services (DNS-based)
If your external service has a DNS name, use a Service of type ExternalName. Instead of creating an A record (a name-to-IP mapping) in the cluster's internal DNS, Kubernetes creates a CNAME record that aliases your chosen service name to the external hostname.
The key benefit is that applications inside the cluster use the short, stable name you chose (for example, analytics-db). The external DNS name — which may be long, managed by a cloud provider, or subject to change — is hidden behind that alias. If the external database moves to a different hostname, you update the Service definition once and nothing else changes.
IP-Address Services with Endpoints (IP-based)
Sometimes you do not have a DNS name for the external service — only an IP address. In this case Kubernetes can still represent the service internally, but you must manage the mapping manually using an Endpoints resource.
Normally Kubernetes populates Endpoints automatically by watching Pods that match a Service's label selector. When there is no selector, Kubernetes allocates a virtual IP for the Service but leaves the Endpoints list empty. You are responsible for creating the Endpoints object yourself, pointing it at the external IP.
Because Kubernetes will not update this Endpoints record automatically, you must ensure the IP address stays current. Either guarantee that the external server's IP never changes (a static IP assignment), or build automation that updates the Endpoints record whenever the IP changes.
Namespaces Enable Environment Parity
One of the most powerful benefits of representing external services inside Kubernetes is namespace isolation. Consider a Reporting API deployed into two namespaces:
| Namespace | Service Name | Resolves To |
|---|---|---|
| prod | analytics-db | analytics-db.databases.company.com (production server) |
| test | analytics-db | analytics-db-test.databases.company.com (test server) |
Both versions of the Reporting API use the connection string Host=analytics-db. Neither knows nor cares that a different backing server is used in each environment.
Comparison: ExternalName vs IP-based External Service
| | ExternalName Service | Service + Endpoints (IP-based) |
|---|---|---|
| Requires DNS name? | Yes | No — IP address is sufficient |
| DNS record type created | CNAME | A record (virtual IP) |
| Endpoints managed by | None needed (DNS alias only) | You (manually) |
| Load balancing across IPs | No (DNS round-robin only) | Yes — list multiple IPs in addresses |
| Update required when backend moves | Update the externalName field | Update the Endpoints resource |
Hands-On: Kubernetes Commands
Inspecting Services and Endpoints
View all Services in a Namespace:
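For example, assuming a namespace named reporting (the name used in the walkthrough later in this article):

```shell
kubectl get services -n reporting
```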
Describe a Service to see its type and how it is configured:
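For instance, for the analytics-db Service used later in this article:

```shell
kubectl describe service analytics-db -n reporting
```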
View the Endpoints for a Service. For a selector-based Service this is populated automatically. For an IP-based external Service, check that your manual Endpoints were accepted:
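A sketch, using the IP-based legacy-metrics-db Service from the walkthrough below:

```shell
kubectl get endpoints legacy-metrics-db -n reporting
```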
Testing DNS Resolution from Inside the Cluster
The most reliable way to confirm that a Service resolves correctly is to run a temporary Pod inside the same Namespace and perform a DNS lookup:
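One way to do this, using a throwaway busybox Pod (the image tag is illustrative):

```shell
kubectl run dns-test --rm -it --restart=Never \
  --image=busybox:1.36 --namespace=reporting \
  -- nslookup analytics-db
```

The --rm flag deletes the Pod as soon as the lookup finishes, so nothing is left behind.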
For an ExternalName Service you should see the CNAME chain leading to the external hostname. For an IP-based Service you should see the virtual cluster IP.
Testing TCP Connectivity
Confirm that traffic actually reaches the external server on the expected port:
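For example, with netcat from a temporary Pod. The image here is nicolaka/netshoot, a commonly used network-troubleshooting image whose netcat supports the -z (scan) and -v (verbose) flags; any image with a comparable netcat build works:

```shell
kubectl run tcp-test --rm -it --restart=Never \
  --image=nicolaka/netshoot --namespace=reporting \
  -- nc -zv legacy-metrics-db 5432
```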
Editing an Endpoints Resource
If the external server's IP address changes, update the Endpoints record in place:
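Either edit the live object directly or re-apply an updated manifest, for example:

```shell
# Open the live Endpoints object in an editor
kubectl edit endpoints legacy-metrics-db -n reporting

# Or re-apply a corrected manifest file
kubectl apply -f legacy-metrics-db-endpoints.yaml
```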
Step-by-Step Example
In this walkthrough we configure a .NET Reporting API to connect to two external PostgreSQL databases: one reached by DNS name, and one reached by IP address only. Both are made available to the application under stable, cluster-internal names.
Step 1: Create the Namespace
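The walkthrough uses a namespace named reporting:

```shell
kubectl create namespace reporting
```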
Step 2: Import a DNS-Named External Database (ExternalName)
The main analytics database is a managed PostgreSQL instance. Its hostname is analytics-db.databases.company.com. We import it into the cluster under the name analytics-db. Save this as analytics-db-externalname.yaml:
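A minimal manifest for this might look as follows:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: analytics-db
  namespace: reporting
spec:
  type: ExternalName
  externalName: analytics-db.databases.company.com
```

Apply it with kubectl apply -f analytics-db-externalname.yaml.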
From this moment, any Pod in the reporting Namespace can resolve the hostname analytics-db. Kubernetes DNS will return a CNAME pointing to analytics-db.databases.company.com, which then resolves to the cloud database's IP address. The application does not need to know the cloud provider's hostname at all.
Verify the CNAME record was created:
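For example, with a DNS lookup from a temporary Pod (the busybox image tag is illustrative):

```shell
kubectl run dns-check --rm -it --restart=Never \
  --image=busybox:1.36 --namespace=reporting \
  -- nslookup analytics-db
```

The output should show the name resolving via analytics-db.databases.company.com.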
Step 3: Import an IP-Based External Database
A legacy metrics database runs on a VM with no DNS name — only an IP address (10.0.1.25) and port 5432. To import this, we need two resources: a Service to give it a stable cluster name, and an Endpoints resource to tell Kubernetes where traffic should go.
First, create the Service with no selector. Save this as legacy-metrics-db-service.yaml:
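A sketch of such a selector-less Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: legacy-metrics-db
  namespace: reporting
spec:
  ports:
  - port: 5432
    protocol: TCP
```

Note the absence of a selector field; that is what tells Kubernetes not to manage Endpoints for this Service automatically.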
At this point the Service exists and has a virtual IP address, but it has no endpoints — traffic sent to it will go nowhere. Now create the Endpoints resource. The name must exactly match the Service name. Save this as legacy-metrics-db-endpoints.yaml:
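A matching Endpoints manifest might look like this:

```yaml
apiVersion: v1
kind: Endpoints
metadata:
  # Must exactly match the Service name
  name: legacy-metrics-db
  namespace: reporting
subsets:
- addresses:
  - ip: 10.0.1.25
  ports:
  - port: 5432
```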
Now traffic sent to legacy-metrics-db:5432 inside the cluster will be forwarded to 10.0.1.25:5432.
If the database has two replicas for redundancy, list both IPs in the addresses array:
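A sketch, assuming the second replica sits at 10.0.1.26 (an illustrative address, not given earlier):

```yaml
subsets:
- addresses:
  - ip: 10.0.1.25
  - ip: 10.0.1.26  # hypothetical second replica
  ports:
  - port: 5432
```

Kubernetes will then spread connections across both addresses.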
Step 4: Deploy the Reporting API
The Reporting API is a .NET 10 application. It connects to analytics-db using the short cluster-internal hostname — it never needs to know that the database is external. Save this as reporting-api-deployment.yaml:
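A sketch of the Deployment; the image name, replica count, container port, and connection-string details beyond Host=analytics-db are illustrative assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reporting-api
  namespace: reporting
spec:
  replicas: 2
  selector:
    matchLabels:
      app: reporting-api
  template:
    metadata:
      labels:
        app: reporting-api
    spec:
      containers:
      - name: reporting-api
        image: registry.example.com/reporting-api:1.0  # placeholder image
        ports:
        - containerPort: 8080
        env:
        # Standard .NET configuration convention for connection strings;
        # in a real deployment the credentials would come from a Secret
        - name: ConnectionStrings__AnalyticsDb
          value: "Host=analytics-db;Port=5432;Database=analytics;Username=reporting"
```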
Notice that the connection string uses Host=analytics-db — just the short service name. Kubernetes DNS handles the rest. This same connection string works unchanged whether the backing database is a cloud-managed instance, a VM, or a containerized database running inside the cluster.
Step 5: Verify the Setup
Confirm that the Deployment is running:
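For example:

```shell
kubectl get deployments -n reporting
kubectl get pods -n reporting
```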
Check that both Services have entries in the cluster:
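For example:

```shell
kubectl get services -n reporting
```

You should see analytics-db with type ExternalName and legacy-metrics-db with type ClusterIP.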
Verify the Endpoints were created correctly:
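For the IP-based Service:

```shell
kubectl get endpoints legacy-metrics-db -n reporting
```

The output should list 10.0.1.25:5432.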
Test DNS resolution from a temporary Pod inside the reporting Namespace:
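For example (the busybox image tag is illustrative):

```shell
kubectl run dns-test --rm -it --restart=Never \
  --image=busybox:1.36 --namespace=reporting \
  -- nslookup analytics-db
```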
Step 6: Demonstrate Environment Parity
The real power of this pattern becomes clear when you consider environment parity. Create a separate reporting-test Namespace:
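```shell
kubectl create namespace reporting-test
```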
Deploy an analytics-db ExternalName Service in the test Namespace, pointing to a different test database:
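A sketch, using the test hostname from the parity table earlier:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: analytics-db
  namespace: reporting-test
spec:
  type: ExternalName
  externalName: analytics-db-test.databases.company.com
```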
Now you can deploy the exact same Reporting API Deployment (with the same connection string Host=analytics-db) into reporting-test, and it will automatically connect to the test database — zero configuration changes required.
Step 7: Clean Up
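Deleting the two namespaces removes everything created in this walkthrough; the external databases themselves are untouched:

```shell
kubectl delete namespace reporting
kubectl delete namespace reporting-test
```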
Summary
Integrating external storage into Kubernetes doesn't require migrating data. By representing external services as Kubernetes Services, you gain all the benefits of cluster-native service discovery while keeping your existing infrastructure exactly where it is. Here is what we covered:
- Data gravity means existing data resists movement. Kubernetes provides a way to reference external services without migrating them.
- For external services reachable by a DNS name, use a Service of type ExternalName. Kubernetes creates a CNAME record in cluster DNS that aliases your chosen name to the external hostname.
- For external services reachable only by IP address, create a Service without a selector and manually create an Endpoints resource that maps the Service to the external IP. You are responsible for keeping the Endpoints record up to date.
- Namespace isolation makes environment parity trivial. Deploy the same application with the same configuration into prod and test namespaces, with each namespace containing a same-named Service pointing to its environment-specific backend.
- Applications written to use a cluster-internal service name (like analytics-db) require no code or configuration changes when the backing service is later migrated into the cluster as a native workload.