Introduction
Kubernetes Pods are the smallest deployable units in Kubernetes, representing one or more containers that share resources. Understanding the Pod lifecycle is crucial for debugging, monitoring, and managing applications in Kubernetes.
This guide walks through the complete Pod lifecycle:
- Pod Creation: From YAML manifest to scheduling
- State Transitions: Pending → Running → Succeeded/Failed
- Init Containers: Pre-application setup
- Container Restart Policies: How Kubernetes handles failures
- Termination: Graceful shutdown process
Part 1: Pod Lifecycle Overview
Complete Pod State Machine
A Pod moves through five states. The key transitions:

- Pending → Running: the Pod is scheduled, images are pulled, and containers start
- Pending → Failed: fatal startup errors such as invalid config (a failed image pull normally keeps the Pod Pending with ImagePullBackOff)
- Running → Succeeded: all containers completed successfully (restartPolicy: Never/OnFailure)
- Running → Failed: a container failed and won't restart
- Running → Running: a container is restarted in place (restartPolicy: Always/OnFailure)
- Running → Terminating: a delete request is received while the Pod runs
- Terminating → Succeeded: graceful shutdown succeeds
- Terminating → Failed: force termination after the grace period
- Succeeded/Failed → cleanup: the Pod object is removed

What each state means:

- Pending: Pod accepted by the cluster; waiting for scheduling, pulling images, starting init containers, creating the container runtime
- Running: Pod is executing; at least one container is running (possibly starting or restarting), the application is serving traffic, health checks are active
- Succeeded: all containers terminated successfully with exit code 0; they will not be restarted; a Job/CronJob has completed
- Failed: Pod terminated in failure; non-zero exit code, OOMKilled, restart limit exceeded, or node failure
- Terminating: Pod is shutting down; SIGTERM sent, grace period active, endpoints removed, cleanup in progress
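You can read the current state straight from the Pod's status; a quick sketch (the pod name myapp-pod is illustrative):

```bash
# Print just the Pod phase (Pending, Running, Succeeded, Failed, or Unknown)
kubectl get pod myapp-pod -o jsonpath='{.status.phase}'

# List all Pods with their phase in a custom column
kubectl get pods -o custom-columns=NAME:.metadata.name,PHASE:.status.phase
```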
Pod Creation to Running Flow
1. A user submits the Pod manifest; the API server validates the YAML and writes the object to etcd.
2. The scheduler looks for a suitable node. If none qualifies, the Pod stays Pending with reason Unschedulable (insufficient resources, node selector mismatch, taints/tolerations).
3. Otherwise the Pod is assigned to a node (spec.nodeName is set) and the kubelet on that node receives the Pod spec.
4. The kubelet pulls the container images. If pulling fails (image doesn't exist, registry auth failed, network issues), the Pod stays Pending with reason ImagePullBackOff.
5. If init containers are defined, they run sequentially. A failing init container puts the Pod in Init:Error or Init:CrashLoopBackOff.
6. Once all init containers succeed (or none are defined), the kubelet creates the main containers, sets up networking, mounts volumes, and starts all containers in the Pod.
7. If a startup probe is defined, it runs first; a container that keeps failing it is not ready and eventually lands in CrashLoopBackOff. When the probe passes (or none is defined), the Pod is Running: containers ready, liveness and readiness probes active.
8. The Pod is added to Service endpoints and starts receiving traffic.
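To watch this flow happen on a live cluster, you can follow the Pod's status as it moves from Pending to Running (the pod name myapp-pod is illustrative):

```bash
# Watch status changes as the Pod is created
# (Pending -> ContainerCreating -> Running)
kubectl get pod myapp-pod --watch

# If the Pod is stuck, the Events section shows scheduling
# and image-pull problems
kubectl describe pod myapp-pod | sed -n '/Events:/,$p'
```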
Part 2: Pod Creation Sequence
API Server to Kubelet Communication
1. kubectl apply sends the manifest; the API server validates required fields, resource limits, and the security context.
2. The API server writes the Pod object to etcd with Status: Pending and an empty nodeName.
3. The scheduler evaluates nodes (CPU/memory available, affinity rules, taints/tolerations), picks the best one (node-1 in this example), and binds the Pod; the API server updates Pod.spec.nodeName = "node-1" in etcd.
4. The kubelet on node-1, watching for Pods bound to its node, fetches the Pod specification from the API server.
5. The kubelet asks the container runtime to pull the image (nginx:1.21); the runtime fetches the layers from the registry, then extracts and caches the image.
6. The kubelet has the runtime create a container from the Pod spec config and start it.
7. The kubelet updates the Pod status (Phase: Running, containerStatuses: ready), which the API server saves to etcd.
8. The kubelet starts health checks (startup, readiness, and liveness probes) and keeps monitoring the container from then on.
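After step 7, the Pod's status stanza recorded in etcd looks roughly like this (a trimmed sketch; the timestamp is illustrative):

```yaml
status:
  phase: Running
  conditions:
    - type: PodScheduled    # set once the scheduler binds the Pod
      status: "True"
    - type: Initialized     # all init containers completed
      status: "True"
    - type: ContainersReady # every container passed readiness
      status: "True"
    - type: Ready           # Pod may be added to Service endpoints
      status: "True"
  containerStatuses:
    - name: nginx
      ready: true
      restartCount: 0
      state:
        running:
          startedAt: "2024-01-01T00:00:00Z"
```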
Part 3: Init Containers
Init containers run before app containers and must complete successfully before the main containers start.
Init Container Execution Flow
1. Init container 1 (check-database) starts and loops until the database answers (nc -z db 5432, sleeping 2s between attempts). A non-zero exit code yields Init:Error and the container is retried with backoff; exit code 0 moves on.
2. Init container 2 (setup-config) copies templates into place (cp /config-template/* /config/ and chmod 600 /config/*). On failure it is likewise retried; on success the next init container starts.
3. Init container 3 (migration) runs ./run-migrations.sh. If its retries are exhausted, the Pod shows Init:CrashLoopBackOff.
4. When all init containers have completed successfully, the main containers start and the Pod status becomes Running.
Init Container Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  initContainers:
    # Init container 1: Wait for database
    - name: check-database
      image: busybox:1.35
      command: ['sh', '-c']
      args:
        - |
          until nc -z postgres-service 5432; do
            echo "Waiting for database..."
            sleep 2
          done
          echo "Database is ready!"
    # Init container 2: Setup configuration
    - name: setup-config
      image: busybox:1.35
      command: ['sh', '-c']
      args:
        - |
          cp /config-template/app.conf /config/
          chmod 600 /config/app.conf
      volumeMounts:
        - name: config
          mountPath: /config
        - name: config-template
          mountPath: /config-template
    # Init container 3: Run migrations
    - name: run-migrations
      image: myapp:v1.0
      command: ['./migrate']
      env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: url
  containers:
    # Main application container
    - name: myapp
      image: myapp:v1.0
      ports:
        - containerPort: 8080
      volumeMounts:
        - name: config
          mountPath: /config
  volumes:
    - name: config
      emptyDir: {}
    - name: config-template
      configMap:
        name: app-config
```
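While the Pod initializes, each init container's progress can be checked individually (a sketch using the names from the manifest above):

```bash
# Show init container progress (e.g. Init:0/3, Init:1/3, Running)
kubectl get pod myapp-pod

# Logs from a specific init container
kubectl logs myapp-pod -c check-database

# Detailed init container states and exit codes
kubectl get pod myapp-pod -o jsonpath='{.status.initContainerStatuses[*].state}'
```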
Part 4: Container Restart Policies
Restart Policy Decision Tree
When a container exits, the kubelet checks the exit code and the Pod's restartPolicy:

- Exit code 0 (success):
  - Always: restart the container after a backoff delay
  - OnFailure / Never: no restart; the container stays terminated
- Non-zero exit code (failure):
  - Always / OnFailure: restart the container after a backoff delay
  - Never: no restart; the Pod status becomes Failed

Restarts use an exponential backoff: the delay starts at 10s and doubles on each crash (10s, 20s, 40s, ...), capped at five minutes, and resets after a container has run successfully for 10 minutes. While a container is waiting out this delay after repeated failures, the Pod shows CrashLoopBackOff. When no container will be restarted and all containers are done, the Pod's final status is Succeeded or Failed.
Restart Policy Comparison
| Policy | On Success (Exit 0) | On Failure (Exit ≠ 0) | Use Case |
|---|---|---|---|
| Always | Restart with backoff | Restart with backoff | Long-running services, web servers |
| OnFailure | No restart | Restart with backoff | Batch jobs that should retry on failure |
| Never | No restart | No restart | One-time tasks, completed jobs |
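The same policies appear in Job specs, where only OnFailure and Never are allowed and the retry budget moves up a level: a minimal sketch (job name and image are illustrative) in which the Job controller, not the kubelet, recreates failed Pods:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: report-job
spec:
  backoffLimit: 4          # Job controller retries the Pod up to 4 times
  template:
    spec:
      restartPolicy: Never # Jobs only allow Never or OnFailure
      containers:
        - name: report
          image: report-generator:v1
```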
Restart Policy Examples
```yaml
# Always restart - for services
apiVersion: v1
kind: Pod
metadata:
  name: web-server
spec:
  restartPolicy: Always # Default
  containers:
    - name: nginx
      image: nginx:1.21
---
# Restart on failure - for batch jobs
apiVersion: v1
kind: Pod
metadata:
  name: data-processor
spec:
  restartPolicy: OnFailure
  containers:
    - name: processor
      image: data-processor:v1
---
# Never restart - for one-time tasks
apiVersion: v1
kind: Pod
metadata:
  name: migration
spec:
  restartPolicy: Never
  containers:
    - name: migrate
      image: migrate:v1
```
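To see a restart policy in action, watch the restart count and the reason for the last termination (a sketch using the web-server Pod above):

```bash
# RESTARTS column increments on each restart;
# STATUS may show CrashLoopBackOff between attempts
kubectl get pod web-server --watch

# Restart count and last termination reason (e.g. Error, OOMKilled)
kubectl get pod web-server \
  -o jsonpath='{.status.containerStatuses[0].restartCount}{"\n"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'
```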
Part 5: Health Checks (Probes)
Kubernetes uses three types of probes to check container health:
Probe Types and Execution Flow
1. Startup probe: if configured, it runs first, executing every periodSeconds. While it fails, the kubelet retries until failureThreshold is exceeded, at which point the container is killed and the restart policy applies. Once it passes, the container counts as initialized and the other probes take over. If no startup probe is configured, the other checks begin immediately.
2. Liveness probe (is the container alive?): runs continuously every periodSeconds. A success resets the failure count; once consecutive failures reach failureThreshold, the container is restarted as unhealthy.
3. Readiness probe (can the container accept traffic?): also runs continuously every periodSeconds. On success the Pod is marked Ready and added to Service endpoints; on failure it is marked Not Ready and removed from the endpoints so no traffic reaches it. Readiness failures never restart the container.

Liveness and readiness run in parallel for the life of the container.
Probe Configuration Examples
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-probes
spec:
  containers:
    - name: app
      image: myapp:v1.0
      ports:
        - containerPort: 8080
      # Startup probe - runs first, protects slow-starting apps
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 0
        periodSeconds: 5
        failureThreshold: 30 # 30 * 5 = 150 seconds to start
      # Liveness probe - restarts container if it fails
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
        failureThreshold: 3
        successThreshold: 1
        timeoutSeconds: 5
      # Readiness probe - controls traffic routing
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
        failureThreshold: 3
        successThreshold: 1
        timeoutSeconds: 3
      # Example TCP probe
      # livenessProbe:
      #   tcpSocket:
      #     port: 8080
      #   periodSeconds: 10
      # Example exec probe
      # livenessProbe:
      #   exec:
      #     command:
      #       - cat
      #       - /tmp/healthy
      #   periodSeconds: 10
```
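To confirm the probes behave as configured, check the Pod's readiness and the Service endpoints (the Service name app-service is illustrative):

```bash
# READY column shows containers passing their readiness probe (e.g. 1/1)
kubectl get pod app-with-probes

# Probe configuration and any probe-failure events
kubectl describe pod app-with-probes

# A ready Pod appears in its Service's endpoints; a not-ready one is removed
kubectl get endpoints app-service
```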
Part 6: Pod Termination
Graceful Shutdown Process
1. A user (or controller) deletes the Pod; the API server marks it Terminating and starts the grace period (30s by default).
2. Two things happen in parallel: the endpoints controller removes the Pod from its Service so no new traffic is routed to it, and the kubelet is told to terminate the Pod.
3. The kubelet first runs the preStop hook, if one is defined (e.g., calling a /shutdown endpoint). The hook's runtime counts against the grace period.
4. The kubelet sends SIGTERM to the containers. The application should finish in-flight requests, close connections, save state, and release resources.
5. If the process exits before the grace period ends, the shutdown is clean: the kubelet removes the container and reports the Pod as terminated successfully.
6. If the grace period expires with the container still running, the kubelet sends SIGKILL and the process is killed immediately (forced termination).
7. The kubelet cleans up: volumes are removed, the network is released, the container is deleted. The API server removes the Pod object from etcd and reports the Pod as deleted.
Pod with PreStop Hook Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown
spec:
  terminationGracePeriodSeconds: 60 # Wait up to 60s
  containers:
    - name: app
      image: myapp:v1.0
      ports:
        - containerPort: 8080
      lifecycle:
        # Called before SIGTERM
        preStop:
          exec:
            # exec hooks take a single command list (no separate args field)
            command:
              - /bin/sh
              - -c
              - |
                # Notify application to stop accepting new requests
                curl -X POST http://localhost:8080/shutdown
                # Wait for in-flight requests to complete
                sleep 15
```
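The grace period can also be overridden at deletion time (a sketch; use --force with care, since it skips graceful shutdown):

```bash
# Delete with the Pod's configured grace period (60s here)
kubectl delete pod graceful-shutdown

# Override the grace period for this deletion only
kubectl delete pod graceful-shutdown --grace-period=10

# Skip graceful shutdown entirely: SIGKILL immediately (may drop requests)
kubectl delete pod graceful-shutdown --grace-period=0 --force
```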
Part 7: Common Pod Issues and Debugging
Pod Status Troubleshooting
Start from the Pod's status in kubectl get pods and branch on what you see:

- Pending
  - Check: kubectl describe pod, the Events section
  - Common causes: insufficient resources, no nodes match the selector, taints on nodes, volume mount issues
- ImagePullBackOff / ErrImagePull
  - Check: kubectl describe pod, look closely at the image name
  - Common causes: typo in the image name, image doesn't exist, registry auth needed, network issues
- CrashLoopBackOff
  - Check: kubectl logs pod, kubectl logs pod --previous, kubectl describe pod
  - Common causes: application crash, missing config/secrets, failed liveness probe, OOMKilled
- Running but not ready
  - Check: kubectl describe pod, the readiness probe, kubectl logs pod
  - Common causes: readiness probe failing, app not listening on the expected port, dependencies not ready, slow startup
- Error / Failed
  - Check: kubectl logs pod, kubectl describe pod, the exit code in the status
  - Common causes: application error, init container failed, invalid command, resource limits exceeded
Essential Debugging Commands
```bash
# Get pod status
kubectl get pods
kubectl get pods -o wide                      # Show node and IP

# Detailed pod information
kubectl describe pod <pod-name>

# View pod logs
kubectl logs <pod-name>
kubectl logs <pod-name> -c <container-name>   # Multi-container pod
kubectl logs <pod-name> --previous            # Previous container instance
kubectl logs <pod-name> --follow              # Stream logs

# Execute commands in pod
kubectl exec <pod-name> -- <command>
kubectl exec -it <pod-name> -- /bin/sh        # Interactive shell

# Check pod events
kubectl get events --sort-by=.metadata.creationTimestamp

# View pod YAML
kubectl get pod <pod-name> -o yaml

# Check resource usage
kubectl top pod <pod-name>

# Port forwarding for local access
kubectl port-forward <pod-name> 8080:80
```
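When a container image has no shell, an ephemeral debug container can be attached instead (requires a reasonably recent kubectl and cluster; the image and names are illustrative):

```bash
# Attach a temporary busybox container to a running Pod for inspection
kubectl debug -it <pod-name> --image=busybox:1.35 --target=<container-name>

# Or copy the Pod and override the command to debug a crashing container
kubectl debug <pod-name> -it --copy-to=debug-copy --container=<container-name> -- sh
```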
Part 8: Pod Lifecycle Best Practices
Configuration Checklist
- Resource Requests and Limits

```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```

- Health Checks

```yaml
startupProbe: # For slow-starting apps
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
livenessProbe: # Restart if unhealthy
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
readinessProbe: # Control traffic routing
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
```

- Graceful Shutdown

```yaml
terminationGracePeriodSeconds: 60
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 15"]
```

- Security Context

```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL
```
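Putting the checklist together, a production-leaning Pod spec might look like this (a sketch; the app name, image, and paths are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: production-app
spec:
  terminationGracePeriodSeconds: 60
  securityContext:          # Pod-level: applies to all containers
    runAsNonRoot: true
    runAsUser: 1000
  containers:
    - name: app
      image: myapp:v1.0
      ports:
        - containerPort: 8080
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"
          cpu: "500m"
      startupProbe:
        httpGet: { path: /healthz, port: 8080 }
        failureThreshold: 30
        periodSeconds: 10
      livenessProbe:
        httpGet: { path: /healthz, port: 8080 }
        periodSeconds: 10
      readinessProbe:
        httpGet: { path: /ready, port: 8080 }
        periodSeconds: 5
      securityContext:      # Container-level hardening
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 15"]
```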
Comparison: Pod Restart Policies
| Scenario | Always | OnFailure | Never |
|---|---|---|---|
| Success (Exit 0) | Restarts | Stays stopped | Stays stopped |
| Failure (Exit ≠ 0) | Restarts | Restarts | Stays stopped |
| Best for | Services, daemons | Batch jobs, tasks | One-time jobs |
| Example | Web server, API | Data processing | Database migration |
| Pod final status | Running (never reaches a terminal phase) | Succeeded/Failed | Succeeded/Failed |
Conclusion
Understanding the Kubernetes Pod lifecycle is essential for:
- Debugging: Quickly identify why Pods aren’t running
- Reliability: Configure proper health checks and restart policies
- Performance: Optimize startup and shutdown processes
- Observability: Know where to look when things go wrong
Key takeaways:
- Pods transition through well-defined states: Pending → Running → Succeeded/Failed
- Init containers prepare the environment before app containers start
- Restart policies determine how Kubernetes handles container failures
- Health probes (startup, liveness, readiness) ensure application health
- Graceful shutdown with preStop hooks prevents data loss
The step-by-step flows in this guide show how Kubernetes orchestrates containerized applications from creation to termination.
Further Reading
- Kubernetes Pod Lifecycle
- Configure Liveness, Readiness and Startup Probes
- Init Containers
- Pod Termination
Master the Pod lifecycle to build resilient Kubernetes applications!