Kubernetes Entry · Created: 10 Mar 2026 · Updated: 10 Mar 2026

Kubernetes Jobs and CronJobs

So far in this series we have focused on long-running processes — web APIs, databases, and background services that run until they are explicitly upgraded or decommissioned. These workloads are managed by Deployments, ReplicaSets, and DaemonSets, all of which keep Pods running indefinitely.

But not every workload is long-running. Many real-world tasks are short-lived and one-off: applying a database migration, generating a monthly report, compressing old log files, or sending a batch of notification emails. If you ran these tasks as regular Pods, Kubernetes would restart them endlessly after they finish — your migration would re-apply on every exit, your report would regenerate in a loop.

The Job object solves this problem. A Job creates one or more Pods and ensures they run until successful termination (exit code 0). Once a Pod completes successfully, the Job does not restart it. If a Pod fails, the Job controller creates a replacement. This makes Jobs ideal for batch processing, one-time setup tasks, and any workload that should run to completion and then stop.

Kubernetes also provides the CronJob object, which creates a new Job on a recurring schedule — just like the Unix cron daemon, but with the full power of Kubernetes orchestration behind it.

This article covers every Job pattern you will encounter in production, from simple one-shot tasks to parallel batch processing and work-queue consumers, plus scheduled CronJobs for recurring work.

Core Concepts

Step 1: Jobs vs. Regular Pods — Why the Difference Matters

A regular Pod managed by a Deployment has a restartPolicy of Always. This means the kubelet will restart the container every time it exits, regardless of the exit code. That behavior is perfect for a web server that should stay up 24/7, but disastrous for a task like a database migration:

| Scenario | Regular Pod (restartPolicy: Always) | Job Pod (restartPolicy: OnFailure / Never) |
|---|---|---|
| Container exits with code 0 (success) | Container is restarted immediately | Pod is marked Completed — no restart |
| Container exits with code 1 (failure) | Container is restarted immediately | Container is restarted in-place (OnFailure) or a new Pod is created (Never) |
| Node crashes mid-execution | Pod rescheduled on another node | Job controller creates a new Pod on another node |

The key insight: a Job stops when the work is done. A Deployment never stops on purpose — it always tries to keep Pods running.

Step 2: The Job Object — How It Works

The Job object is a Kubernetes controller responsible for creating and managing Pods defined in its spec.template. It tracks how many Pods have completed successfully and continues creating new Pods until the desired number of completions is reached.

Here is the lifecycle of a Job:

  1. You submit a Job manifest to the Kubernetes API server.
  2. The Job controller reads the Pod template and creates one or more Pods.
  3. Each Pod is scheduled onto a node and runs the specified container(s).
  4. If a Pod succeeds (exits with code 0), the Job controller records the completion. If a Pod fails, the controller either restarts it in-place or creates a replacement, depending on the restartPolicy.
  5. Once the number of successful completions reaches spec.completions, the Job is marked as Complete.
  6. The completed Job and its Pods are retained in the cluster so you can inspect their logs. They are not automatically deleted.

Because many Jobs can create Pods from near-identical templates, the Job controller automatically generates a unique controller-uid label, uses it as the Job's selector, and applies it to every Pod it creates. This prevents accidental overlap with Pods from other Jobs or controllers.

Step 3: Job Patterns — One Shot, Parallel, and Work Queue

Jobs support three primary patterns, controlled by two fields: completions (how many Pods must succeed) and parallelism (how many Pods can run at the same time). Think of completions as how much work and parallelism as how many workers:

| Pattern | Use Case | Behavior | completions | parallelism |
|---|---|---|---|---|
| One Shot | Database migration, one-time setup | A single Pod runs once until it succeeds | 1 | 1 |
| Parallel Fixed Completions | Batch report generation, data processing | Multiple Pods run in parallel until a fixed total succeeds | 1+ | 1+ |
| Work Queue | Processing items from a message queue | Multiple Pods run in parallel; the Job completes when the first Pod exits with code 0 | 1 (or unset) | 2+ |

An analogy: imagine a restaurant kitchen. A one-shot job is like a single chef cooking one special dish — once it is done, the kitchen closes. A parallel fixed completions job is like four chefs each preparing their own dish — once all dishes are ready, the kitchen closes. A work queue job is like four chefs sharing one stack of orders — they keep cooking until the order stack is empty.

Step 4: Understanding completions and parallelism

Let's break down the two key fields in detail:

  1. spec.completions — The total number of Pods that must terminate successfully for the Job to be considered complete. Defaults to 1. If you set completions: 10, the Job will keep creating Pods until 10 have exited with code 0.
  2. spec.parallelism — The maximum number of Pods that can run simultaneously. Defaults to 1. If you set parallelism: 5 and completions: 10, Kubernetes will run up to 5 Pods at a time, launching new ones as existing Pods finish, until 10 total succeed.

A practical example: suppose you need to generate 800 invoices and each Pod generates 100. You need 8 successful completions (completions: 8). You have enough cluster resources for 4 Pods at once (parallelism: 4). Kubernetes will launch 4 Pods, and as each finishes, a new one takes its place, until all 8 complete.
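Expressed as a manifest fragment, that scenario looks like this. This is a sketch only: the image name is a hypothetical placeholder, and the Pod template is trimmed to the essentials.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: invoice-batch
spec:
  completions: 8    # 8 Pods must exit with code 0 before the Job is Complete
  parallelism: 4    # at most 4 Pods run at any moment
  template:
    spec:
      containers:
      - name: invoice-generator
        image: billing/invoice-generator:1.0   # hypothetical image
      restartPolicy: OnFailure
```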

Step 5: Restart Policies and Failure Handling

Jobs only support two restart policies: OnFailure and Never. The Always policy is not allowed because it would conflict with the Job's run-to-completion semantics. The choice between them affects how failures are handled:

| restartPolicy | What Happens on Failure | Result | Recommendation |
|---|---|---|---|
| OnFailure | The kubelet restarts the container inside the same Pod | One Pod, multiple restarts. Clean and tidy. | Use this for most Jobs |
| Never | The Job controller creates a brand-new Pod | Multiple failed Pods accumulate. Useful for debugging. | Use when you need to inspect each failed Pod |

When a Pod fails with restartPolicy: OnFailure, the kubelet applies an exponential backoff delay (10s, 20s, 40s, ... capped at 5 minutes) before restarting the container. This state is reported as CrashLoopBackOff and prevents a crashing container from consuming node resources in a tight loop.
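The backoff progression is easy to sketch: each restart doubles the previous delay until the cap is reached. A minimal shell illustration, using the kubelet's documented defaults rather than values read from a live cluster:

```shell
# Sketch of the CrashLoopBackOff delay progression:
# each restart doubles the wait, capped at 300 seconds (5 minutes).
delay=10
for attempt in 1 2 3 4 5 6 7; do
  echo "restart attempt $attempt: wait ${delay}s"
  delay=$((delay * 2))
  if [ "$delay" -gt 300 ]; then delay=300; fi
done
```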

The spec.backoffLimit field (default: 6) sets the maximum number of retries before the Job is marked as Failed. Choose this value based on how transient you expect failures to be. For a database migration that either works or doesn't, a backoffLimit of 2–4 is reasonable. For a flaky network call, you might set it higher.

You can also use liveness probes with Jobs. If a worker Pod gets stuck (for example, a deadlock or infinite loop with no progress), the liveness probe will detect it and restart or replace the Pod — just like it does for Deployment Pods.
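A liveness probe on a Job Pod is declared exactly as on any other Pod. A hedged sketch, assuming the worker touches a heartbeat file (/tmp/alive) that a deadlocked process would stop updating — the file path, threshold, and timings are illustrative assumptions, not taken from this article's Jobs:

```yaml
# Sketch: kill the worker if its heartbeat file is older than 60 seconds.
spec:
  template:
    spec:
      containers:
      - name: worker
        image: busybox:1.37
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - test $(( $(date +%s) - $(stat -c %Y /tmp/alive) )) -lt 60
          initialDelaySeconds: 30
          periodSeconds: 15
      restartPolicy: OnFailure
```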

Step 6: CronJobs — Scheduling Recurring Jobs

A CronJob is a higher-level object that creates a new Job on a schedule. It uses the standard Unix cron syntax:

# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of the week (0 - 6, Sunday = 0)
# │ │ │ │ │
# * * * * *

Common schedule examples:

| Schedule Expression | Meaning |
|---|---|
| 0 2 * * * | Every day at 2:00 AM |
| */15 * * * * | Every 15 minutes |
| 0 0 1 * * | First day of every month at midnight |
| 0 */6 * * * | Every 6 hours |
| 30 8 * * 1-5 | Weekdays at 8:30 AM |

CronJobs have two important history settings:

  1. successfulJobsHistoryLimit (default: 3) — How many completed Jobs to keep. Older ones are automatically deleted.
  2. failedJobsHistoryLimit (default: 1) — How many failed Jobs to keep for debugging.

The CronJob controller also supports a concurrencyPolicy field that controls what happens when a new Job is due but the previous one is still running:

  1. Allow (default) — Multiple Jobs can run concurrently.
  2. Forbid — Skip the new Job if the previous one is still running.
  3. Replace — Cancel the running Job and start a new one.
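A minimal sketch of where concurrencyPolicy sits in the manifest — the CronJob name, image, and command are illustrative placeholders, not part of this chapter's walkthrough:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-sync            # hypothetical name
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid     # skip a run if the previous Job is still active
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sync
            image: busybox:1.37
            command: ["sh", "-c", "echo syncing; sleep 30"]
          restartPolicy: OnFailure
```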

Step 7: Job History and Automatic Cleanup

Unlike Pods managed by a Deployment, completed Jobs and their Pods are not automatically deleted by default. They remain in the cluster so you can inspect their logs and status. Over time, this can clutter your cluster with thousands of completed Job objects.

There are two ways to handle cleanup:

  1. TTL controller — Set spec.ttlSecondsAfterFinished on the Job. For example, ttlSecondsAfterFinished: 3600 deletes the Job and its Pods one hour after completion. This is the recommended approach for automated cleanup.
  2. Manual cleanup — Use kubectl delete job <name> to remove a Job and all its Pods.
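For reference, the TTL field sits at the top level of the Job spec. A minimal sketch — the container here is a stand-in, not one of this chapter's manifests:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: schema-migrate
spec:
  ttlSecondsAfterFinished: 3600   # delete the Job and its Pods one hour after it finishes
  template:
    spec:
      containers:
      - name: migrate
        image: busybox:1.37
        command: ["sh", "-c", "echo migrating; exit 0"]
      restartPolicy: OnFailure
```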

For CronJobs, the successfulJobsHistoryLimit and failedJobsHistoryLimit fields handle this automatically.

Hands-On: Kubernetes Commands

Create a Job from a manifest file:

kubectl apply -f schema-migrate-job.yaml

List all Jobs in the current namespace (including completed ones):

kubectl get jobs

Describe a Job to see its status, completions, and events:

kubectl describe job schema-migrate

List Pods created by a specific Job using label selectors:

kubectl get pods -l job-name=schema-migrate

View the logs of a completed Job Pod:

kubectl logs job/schema-migrate

Watch Pods in real time as a parallel Job progresses:

kubectl get pods -l job-name=invoice-batch --watch

Delete a Job and all its Pods:

kubectl delete job schema-migrate

Delete a Job but keep its Pods running (orphan them):

kubectl delete job schema-migrate --cascade=orphan

Create a CronJob from a manifest file:

kubectl apply -f report-cleanup-cronjob.yaml

List all CronJobs:

kubectl get cronjobs

Describe a CronJob to see its schedule, last run, and next trigger time:

kubectl describe cronjob report-cleanup

Manually trigger a CronJob immediately (creates a one-off Job):

kubectl create job manual-cleanup --from=cronjob/report-cleanup

Suspend a CronJob to stop it from creating new Jobs:

kubectl patch cronjob report-cleanup -p '{"spec":{"suspend":true}}'

Resume a suspended CronJob:

kubectl patch cronjob report-cleanup -p '{"spec":{"suspend":false}}'

Clean up all resources labelled for this chapter:

kubectl delete job,cronjob,rs,svc -l chapter=jobs

Step-by-Step Example

In this walkthrough you will deploy a one-shot database migration Job, a parallel batch processing Job, a work-queue consumer pattern with multiple workers, and a scheduled CronJob. Each example builds on a different Job pattern so you can see how completions, parallelism, and restartPolicy interact in practice.

Step 1: Deploy a One-Shot Database Migration Job

One-shot Jobs are the simplest pattern: one Pod, one task, run until success. This is perfect for tasks like database schema migrations. The following Job simulates running Entity Framework Core migrations for a .NET ShippingApi application. Save the manifest as schema-migrate-job.yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: schema-migrate
  labels:
    app: shipping-api
    component: migration
    chapter: jobs
spec:
  backoffLimit: 4
  template:
    metadata:
      labels:
        app: shipping-api
        component: migration
        chapter: jobs
    spec:
      containers:
      - name: migrate
        image: mcr.microsoft.com/dotnet/sdk:10.0
        command:
        - sh
        - -c
        - |
          echo "Starting database migration for ShippingApi...";
          echo "Applying migration: 001_CreateShipmentsTable";
          sleep 2;
          echo "Applying migration: 002_AddTrackingIndex";
          sleep 2;
          echo "Applying migration: 003_SeedCarrierData";
          sleep 2;
          echo "All migrations applied successfully.";
          exit 0
        env:
        - name: DOTNET_ENVIRONMENT
          value: "Production"
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
      restartPolicy: OnFailure

Apply the manifest to create the Job:

kubectl apply -f schema-migrate-job.yaml

Key points about this manifest:

  1. restartPolicy: OnFailure tells the kubelet to restart the container inside the same Pod if it exits with a non-zero code. The Pod stays on the same node.
  2. backoffLimit: 4 means the Job will retry up to 4 times before being marked as Failed.
  3. Neither completions nor parallelism is set, so both default to 1 — the classic one-shot pattern.

Step 2: Inspect the Job and Its Pod

After applying the Job, check its status:

kubectl describe job schema-migrate

You will see output similar to:

Name: schema-migrate
Namespace: default
Parallelism: 1
Completions: 1
Pods Statuses: 0 Running / 1 Succeeded / 0 Failed
Events:
... Reason Message
... ------ -------
... SuccessfulCreate Created pod: schema-migrate-7x2kp

The Pods Statuses line confirms one Pod succeeded. Now view the Pod's logs to see the migration output:

kubectl logs job/schema-migrate

Expected output:

Starting database migration for ShippingApi...
Applying migration: 001_CreateShipmentsTable
Applying migration: 002_AddTrackingIndex
Applying migration: 003_SeedCarrierData
All migrations applied successfully.

The Job ran exactly once, the migration completed, and the Pod stopped. Kubernetes will not restart it. The Job object and its Pod remain in the cluster so you can inspect them. You can also see the Pod in a Completed state:

kubectl get pods -l job-name=schema-migrate
NAME READY STATUS RESTARTS AGE
schema-migrate-7x2kp 0/1 Completed 0 45s

Step 3: Understand Job Failure Behavior

What happens when a Job Pod fails? The behavior depends on the restartPolicy:

Scenario A: restartPolicy: OnFailure

When the container exits with a non-zero code, the kubelet restarts it inside the same Pod. You will see the RESTARTS counter increment and eventually CrashLoopBackOff status as the kubelet applies exponential backoff delays:

kubectl get pods -l job-name=schema-migrate

NAME READY STATUS RESTARTS AGE
schema-migrate-7x2kp 0/1 CrashLoopBackOff 4 3m

This is the recommended approach for most Jobs because it keeps the cluster clean — only one Pod exists, and it retries in place.

Scenario B: restartPolicy: Never

With restartPolicy: Never, the kubelet does not restart the failed container. Instead, it marks the Pod as failed. The Job controller notices and creates a brand-new Pod. If the failure persists, you end up with multiple failed Pods:

kubectl get pods -l job-name=schema-migrate

NAME READY STATUS RESTARTS AGE
schema-migrate-abc12 0/1 Error 0 2m
schema-migrate-def34 0/1 Error 0 90s
schema-migrate-ghi56 0/1 Error 0 60s
schema-migrate-jkl78 1/1 Running 0 10s

This creates "junk" in your cluster but is useful when you need to inspect each failed Pod individually (for example, to read different error logs from each attempt). For production use, prefer restartPolicy: OnFailure to keep things tidy.

Step 4: Deploy a Parallel Batch Processing Job

Now let's scale up. Suppose your billing system needs to generate invoices in batches. You want 8 total batch runs, each generating a set of invoices, with up to 4 running simultaneously. This is the parallel fixed completions pattern.

Save the following manifest as invoice-batch-job.yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: invoice-batch
  labels:
    app: billing-api
    component: batch
    chapter: jobs
spec:
  completions: 8
  parallelism: 4
  backoffLimit: 6
  template:
    metadata:
      labels:
        app: billing-api
        component: batch
        chapter: jobs
    spec:
      containers:
      - name: invoice-generator
        image: mcr.microsoft.com/dotnet/runtime:10.0
        command:
        - sh
        - -c
        - |
          echo "Invoice batch worker started on $(hostname)";
          for i in $(seq 1 5); do
            echo "Generating invoice $i of 5...";
            sleep 1;
          done;
          echo "Batch complete.";
          exit 0
        env:
        - name: DOTNET_ENVIRONMENT
          value: "Production"
        resources:
          requests:
            cpu: "200m"
            memory: "128Mi"
          limits:
            cpu: "400m"
            memory: "256Mi"
      restartPolicy: OnFailure

Apply it:

kubectl apply -f invoice-batch-job.yaml

The Job will immediately launch 4 Pods (the parallelism limit). As each Pod completes, a new one is created until 8 total have succeeded.

Step 5: Monitor Parallel Job Progress

Use the --watch flag to see Pods come and go in real time:

kubectl get pods -l job-name=invoice-batch --watch

You will see output similar to:

NAME READY STATUS RESTARTS AGE
invoice-batch-2nk7f 1/1 Running 0 3s
invoice-batch-8xp4m 1/1 Running 0 3s
invoice-batch-kw9vt 1/1 Running 0 3s
invoice-batch-r5hbn 1/1 Running 0 3s
invoice-batch-2nk7f 0/1 Completed 0 8s
invoice-batch-zt3qx 0/1 Pending 0 0s
invoice-batch-zt3qx 1/1 Running 0 1s
...

Notice that exactly 4 Pods run at any time. As one completes, a new one is scheduled immediately. Once all 8 completions are done, no more Pods are created.

Check the final Job status:

kubectl describe job invoice-batch
Completions: 8
Parallelism: 4
Pods Statuses: 0 Running / 8 Succeeded / 0 Failed

All 8 batches completed successfully. Clean up before the next step:

kubectl delete job invoice-batch

Step 6: Set Up a Work Queue Infrastructure

The work queue pattern is different from fixed completions. Instead of each Pod doing independent work, all Pods pull items from a shared queue. The Job completes when the queue is empty and a worker exits successfully.

In this example, we will deploy a Redis server as the work queue and then launch consumer workers. First, create a ReplicaSet to manage the Redis queue server. Using a ReplicaSet (rather than a bare Pod) ensures the server is automatically restarted if the node fails. Save this as task-queue-rs.yaml:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: task-queue
  labels:
    app: task-queue
    component: server
    chapter: jobs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: task-queue
      component: server
      chapter: jobs
  template:
    metadata:
      labels:
        app: task-queue
        component: server
        chapter: jobs
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
          protocol: TCP
        resources:
          requests:
            cpu: "100m"
            memory: "64Mi"
          limits:
            cpu: "200m"
            memory: "128Mi"

Apply it:

kubectl apply -f task-queue-rs.yaml

Next, expose the Redis server with a Service so that worker Pods can find it via DNS. Save this as task-queue-service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: task-queue
  labels:
    app: task-queue
    component: server
    chapter: jobs
spec:
  ports:
  - port: 6379
    protocol: TCP
    targetPort: 6379
  selector:
    app: task-queue
    component: server

Apply it:

kubectl apply -f task-queue-service.yaml

Verify the queue server is running:

kubectl get pods -l app=task-queue
NAME READY STATUS RESTARTS AGE
task-queue-9m4xz 1/1 Running 0 30s

Step 7: Create Consumer Workers for the Queue

Now deploy a Job that runs 5 worker Pods in parallel. Each worker simulates pulling items from the queue and processing them. In a real application, these workers would connect to Redis and dequeue work items. Save this as queue-processor-job.yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: queue-processor
  labels:
    app: task-queue
    component: worker
    chapter: jobs
spec:
  parallelism: 5
  template:
    metadata:
      labels:
        app: task-queue
        component: worker
        chapter: jobs
    spec:
      containers:
      - name: worker
        image: busybox:1.37
        command:
        - sh
        - -c
        - |
          echo "Queue worker started on $(hostname)";
          # Pseudo-random item count between 3 and 7, derived from the clock
          # (avoids $RANDOM, which BusyBox sh may not provide).
          ITEMS=$(( $(date +%s) % 5 + 3 ));
          for i in $(seq 1 $ITEMS); do
            echo "Processing work item $i of $ITEMS...";
            sleep 2;
          done;
          echo "No more items in queue. Worker exiting.";
          exit 0
        resources:
          requests:
            cpu: "100m"
            memory: "64Mi"
          limits:
            cpu: "200m"
            memory: "128Mi"
      restartPolicy: OnFailure

Apply it:

kubectl apply -f queue-processor-job.yaml

Notice that completions is not set. When you omit completions and set parallelism to more than 1, Kubernetes enters work queue mode: once any single Pod exits with code 0, the Job begins winding down. No new Pods are started, and the Job waits for all running Pods to finish.
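In a real consumer, the exit condition would come from the queue itself rather than a simulated counter. A hedged sketch of what the worker's container section could look like, assuming items had been pushed onto a Redis list named work-queue (a list this walkthrough never actually populates):

```yaml
# Sketch only: replaces the simulated loop with a real Redis dequeue.
# Assumes the task-queue Service from Step 6 and a hypothetical list "work-queue".
containers:
- name: worker
  image: redis:7-alpine            # redis-cli ships in this image
  command:
  - sh
  - -c
  - |
    while true; do
      ITEM=$(redis-cli -h task-queue RPOP work-queue)
      if [ -z "$ITEM" ]; then
        echo "Queue empty. Worker exiting."
        exit 0                     # a zero exit tells the Job to wind down
      fi
      echo "Processing $ITEM"
    done
```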

Watch the workers process their items:

kubectl get pods -l job-name=queue-processor --watch
NAME READY STATUS RESTARTS AGE
queue-processor-4kf7n 1/1 Running 0 5s
queue-processor-8m2xp 1/1 Running 0 5s
queue-processor-bn3rv 1/1 Running 0 5s
queue-processor-ht6wz 1/1 Running 0 5s
queue-processor-ws9jk 1/1 Running 0 5s
queue-processor-bn3rv 0/1 Completed 0 11s
queue-processor-4kf7n 0/1 Completed 0 15s
...

All 5 workers run in parallel, process their items, and exit. Once all are done, the Job is complete. Check the logs of any worker to see its output:

kubectl logs job/queue-processor

Step 8: Schedule a CronJob for Recurring Cleanup

Finally, let's set up a CronJob that runs a cleanup task every day at 2:00 AM. This is useful for tasks like purging old reports, archiving logs, or compacting database indexes. Save the following as report-cleanup-cronjob.yaml:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: report-cleanup
  labels:
    app: billing-api
    component: cleanup
    chapter: jobs
spec:
  schedule: "0 2 * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: billing-api
            component: cleanup
            chapter: jobs
        spec:
          containers:
          - name: cleanup
            image: mcr.microsoft.com/dotnet/runtime:10.0
            command:
            - sh
            - -c
            - |
              echo "Report cleanup started at $(date)";
              echo "Scanning for expired reports older than 30 days...";
              sleep 3;
              echo "Removed 47 expired reports.";
              echo "Compacting database indexes...";
              sleep 2;
              echo "Cleanup finished at $(date)";
              exit 0
            env:
            - name: DOTNET_ENVIRONMENT
              value: "Production"
            resources:
              requests:
                cpu: "100m"
                memory: "128Mi"
              limits:
                cpu: "250m"
                memory: "256Mi"
          restartPolicy: OnFailure

Apply it:

kubectl apply -f report-cleanup-cronjob.yaml

Verify the CronJob was created:

kubectl get cronjobs
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
report-cleanup 0 2 * * * False 0 <none> 10s

The CronJob will not run until 2:00 AM. To test it immediately, you can manually create a Job from the CronJob template:

kubectl create job manual-cleanup --from=cronjob/report-cleanup

Watch the manually triggered Job run:

kubectl logs job/manual-cleanup
Report cleanup started at Wed Mar 11 02:00:00 UTC 2026
Scanning for expired reports older than 30 days...
Removed 47 expired reports.
Compacting database indexes...
Cleanup finished at Wed Mar 11 02:00:05 UTC 2026

Key CronJob details to note:

  1. successfulJobsHistoryLimit: 3 keeps only the last 3 successful Job runs. Older completed Jobs are automatically deleted.
  2. failedJobsHistoryLimit: 1 keeps only the most recent failed Job for debugging.
  3. You can suspend a CronJob at any time to stop it from creating new Jobs, and resume it later. This is useful during maintenance windows.

Step 9: Clean Up All Resources

All resources in this walkthrough were labelled with chapter: jobs. Use this label to clean everything up in one command:

kubectl delete job,cronjob,rs,svc -l chapter=jobs

Also delete the manually triggered Job:

kubectl delete job manual-cleanup

Summary

  1. A Job creates one or more Pods that run until successful termination. Unlike Deployments, a Job does not restart Pods that exit successfully.
  2. The one-shot pattern (completions: 1, parallelism: 1) is perfect for one-time tasks like database migrations.
  3. The parallel fixed completions pattern processes batch work by running multiple Pods simultaneously until a target number of completions is reached.
  4. The work queue pattern runs multiple parallel workers that pull from a shared queue. The Job completes when a worker exits successfully, signalling the queue is empty.
  5. Use restartPolicy: OnFailure for most Jobs to keep the cluster clean. Use restartPolicy: Never only when you need to inspect each failed Pod.
  6. backoffLimit controls how many retries a Job attempts before being marked as Failed.
  7. A CronJob creates a new Job on a cron schedule. Use successfulJobsHistoryLimit and failedJobsHistoryLimit to control how many old Jobs are retained.
  8. Use ttlSecondsAfterFinished on standalone Jobs to automatically clean them up after completion.
