
Kubernetes Liveness vs Readiness Probe

Checking the health of applications is an essential task in cloud-native environments. By default, the kubelet on each node checks the health of each process running in a container. If the process exits, Kubernetes restarts the container. This is the simplest way of checking the health of a container.

Never run multiple processes in a single container. If you run multiple processes at once, one might fail, and the container would still be considered healthy.

This kind of health check is called a process health check and is very useful for applications that can detect failures and shut themselves down. But in many cases, this won’t be enough. For example, if your application runs into a deadlock or its response time grows too long due to resource constraints, it will still be considered healthy, and Kubernetes won’t restart the container. That’s where Liveness and Readiness Probes come into play.

What are Liveness and Readiness Probes

Liveness and Readiness Probes in Kubernetes are methods to check the health of a container from outside the container. This is important because the application itself might not be capable of monitoring its own health. The following methods are available for a probe (a rough sketch of each follows the list):

  • Exec probe
    • Executes a given command in the container namespace and expects exit code 0
  • TCP Socket probe
    • Tests for a successful TCP connection
  • HTTP probe
    • Performs an HTTP GET against the container and expects a successful response code between 200 and 399
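
Here is roughly how each method would look inside a container spec. The commands, ports, and paths are placeholders, and a probe uses exactly one of these methods:

# Exec probe: runs a command in the container, exit code 0 means success
livenessProbe:
  exec:
    command: ["cat", "/tmp/alive"]

# TCP Socket probe: succeeds if a TCP connection to the port can be opened
livenessProbe:
  tcpSocket:
    port: 3306

# HTTP probe: succeeds on a response code between 200 and 399
livenessProbe:
  httpGet:
    path: /health
    port: 8080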

Liveness Probe

A Liveness Probe is similar to the process health check. It tries to fix the problem of an unhealthy container by killing the container and starting a new one. If killing the container doesn’t help, there is no benefit in a failing Liveness Probe: Kubernetes keeps killing the container without fixing the underlying problem.

An example of a useful Liveness Probe is a terminated database connection in a web application. Some applications establish a database connection only in their initialization phase. If the connection gets interrupted, the application won’t establish a new one when the database becomes available again. A failing Liveness Probe forces the container to restart and run the application initialization again until the application connects to the database successfully.
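
To make that concrete, here is a rough sketch of what such a database-aware /health handler could look like in Go. It is not part of the example application below and assumes you keep a *sql.DB handle from the initialization phase:

package health

import (
	"context"
	"database/sql"
	"net/http"
	"time"
)

// Handler is a sketch of a /health endpoint whose result depends on the database connection.
// It assumes db is a *sql.DB that was opened during application initialization.
func Handler(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// PingContext verifies the connection; if it is broken, the Liveness Probe
		// receives a 500 and Kubernetes will eventually restart the container.
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()
		if err := db.PingContext(ctx); err != nil {
			w.WriteHeader(http.StatusInternalServerError)
			w.Write([]byte("unhealthy: " + err.Error()))
			return
		}
		w.Write([]byte("healthy"))
	}
}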

Readiness Probe

Sometimes a container may be unhealthy and restarting it won’t help either. The difference between a Readiness Probe and a Liveness Probe is the action taken when a health check fails. If a readiness check fails, the container is removed from the Kubernetes service endpoints. This way Kubernetes makes sure that an unhealthy container does not receive any new traffic.

The Readiness Probe gets its name because the most common use case is a container with a startup phase that needs some time to get ready to serve requests. Another good example is testing a container’s response time to HTTP requests: longer response times may indicate that the container is under high load, and a Readiness Probe can be used to shield the container from new traffic until its response time decreases again.
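
One simple way to express the response-time case is a tight probe timeout: if the endpoint does not answer fast enough, the check fails and the pod is temporarily removed from the service endpoints. The path and numbers below are placeholders, not taken from the example app:

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 5     # check every 5 seconds
  timeoutSeconds: 1    # a response slower than 1 second counts as a failure
  failureThreshold: 2  # two consecutive failures mark the pod as not ready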

Liveness and Readiness Probes in Action

To illustrate the description above, we’ll implement a small example of Liveness and Readiness Probes with Go. We start with a Go application that calls time.Sleep to simulate a 30-second initialization phase. After that, it creates a file to indicate that the initialization was successful. Additionally, we add two HTTP handler functions:

  • /health
    • returns HTTP 200 for healthy and 500 for unhealthy internal state
    • prints the current time when called
  • /kill
    • changes the internal health state of the application to false

main.go

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"time"
)

func main() {
	// Simulate a 30-second initialization phase.
	time.Sleep(time.Second * time.Duration(30))
	// Create a file to signal that initialization is done (checked by the Readiness Probe).
	ioutil.WriteFile("init.txt", []byte("Initialization done."), 0644)

	// Internal health state, toggled by the /kill endpoint.
	healthy := true

	// /health returns 200 while healthy and 500 once the state is set to unhealthy.
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Printf("Called at: %v\n", time.Now())
		if healthy {
			w.WriteHeader(http.StatusOK)
			w.Write([]byte("healthy"))
		} else {
			w.WriteHeader(http.StatusInternalServerError)
			w.Write([]byte("unhealthy"))
		}
	})

	// /kill flips the internal state to unhealthy.
	http.HandleFunc("/kill", func(w http.ResponseWriter, r *http.Request) {
		healthy = false
		w.Write([]byte("Set to unhealthy"))
	})

	err := http.ListenAndServe(":8080", nil)
	if err != nil {
		panic(err)
	}
}

I published a Docker Image on Docker Hub that runs this application.
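
If you want to poke at the app outside of Kubernetes first, you could run the image directly with Docker (assuming, as in the deployment below, that the app listens on port 8080 inside the container):

docker run --rm -p 8080:8080 mjgodocker/probe-test:1.0.0
# after the 30-second init phase, in another terminal:
curl http://localhost:8080/health
# healthy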

Now let’s have a look into the Kubernetes resource description to run this application with a Deployment and specify the Liveness and Readiness Probes.

deployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: probe-test-deployment
  labels:
    app: probe-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: probe-test
  template:
    metadata:
      labels:
        app: probe-test
    spec:
      containers:
      - name: probe-test
        image: mjgodocker/probe-test:1.0.0
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 35
        readinessProbe:
          exec:
            command: ["stat", "init.txt"]

The deployment starts two pods with the Docker image running the Go app. We specified a Liveness and a Readiness Probe. The Liveness Probe sends an HTTP GET to container port 8080 with the path /health. We set an initial delay of 35 seconds; otherwise, the Liveness Probe would fail and restart the container before the app is even listening for HTTP requests. The Readiness Probe uses the exec method to run the command “stat init.txt” in the container’s namespace. This command returns exit code 0 once the file init.txt, which is created 30 seconds after the app starts, exists.

We’ll apply this deployment to a local minikube cluster in a moment. But first, let’s create another resource description for a NodePort Service to make the two replicas available via the minikube IP and to load-balance between them.

service.yml

apiVersion: v1
kind: Service
metadata:
  name: probe-test-service
spec:
  type: NodePort
  selector:
    app: probe-test
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
      nodePort: 30001

Now we apply the two resources:

kubectl apply -f deployment.yml
kubectl apply -f service.yml
kubectl get pods
# Output:
# NAME                                     READY   STATUS    RESTARTS   AGE
# probe-test-deployment-7f44df4478-4sl98   0/1     Running   0          10s
# probe-test-deployment-7f44df4478-hht72   0/1     Running   0          10s

You can see that both pods are in status “Running”, but 0/1 containers are Ready until the 30-second sleep in the application startup is over. Attempts to connect to the application through the service will fail during that time. As soon as the app creates init.txt, the pods become ready to serve requests through the service. Test that by sending a curl:

# Get your minikube IP
MINIKUBE_IP=$(minikube ip)

curl http://$MINIKUBE_IP:30001/health
# Output: healthy
# You can use kubectl logs <pod> for both pods to see that they both serve requests
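
You can also watch the readiness change from the Kubernetes side: the service lists a pod as an endpoint only once its Readiness Probe succeeds. A quick way to observe this is:

kubectl get endpoints probe-test-service --watch
# The ENDPOINTS column stays empty while the pods are not ready
# and shows both pod IPs once the Readiness Probes succeed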

Now let’s set one of the Pods to unhealthy by calling the /kill endpoint. After that, we monitor what’s happening with incoming requests by using “watch” to call the /health endpoint every second.

curl http://$MINIKUBE_IP:30001/kill

# Call the health endpoint once every second
watch -n 1 curl http://$MINIKUBE_IP:30001/health

Can’t use watch in PowerShell or cmd? Just switch to a real shell, or google for some weird loop stuff. For example while ($true){ curl http://$MINIKUBE_IP:30001/health ; sleep 1 }

You’ll see that the curl alternates between “healthy” and “unhealthy” responses. The load balancing works, and the unhealthy responses come from the pod we just set to unhealthy. Probes are executed with a default period of 10 seconds; you can change that with the “periodSeconds” attribute of a Liveness or Readiness Probe definition. By default a Liveness Probe also has to fail three consecutive times (the “failureThreshold” attribute) before Kubernetes kills the container, so after a few failed checks the unhealthy container gets restarted and every curl returns “healthy” again.

Kubernetes killed the unhealthy container and started a new one, yet the curl never returned an error. Our Readiness Probe keeps the new container out of the service endpoints until the app eventually creates the init.txt file again.
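
If the default timing is too slow for your use case, you can tune the probes. Here is one possible variation of the Liveness Probe from the deployment above; the values are only an illustration:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 35  # wait for the simulated init phase
  periodSeconds: 5         # probe every 5 seconds instead of the default 10
  timeoutSeconds: 1        # each check has to answer within 1 second
  failureThreshold: 2      # restart after 2 consecutive failures instead of 3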

Conclusion

Liveness Probes are great if failures in a container can be fixed by restarting the container. However, keep in mind that you are responsible for implementing good methods to determine the health of your container. Your app has to provide an interface that allows Kubernetes to check its health in a reliable manner. There are many libraries that help to implement such features. One example for Go is https://github.com/heptiolabs/healthcheck.

A Readiness Probe shields your container from receiving requests until it signals that it is ready to process them. There are many use cases where a Readiness Probe comes in useful:

  • containers need time to initialize, like in our example
  • you’re running a clustered application like Elasticsearch and the nodes have to find each other before accepting requests
  • the response time of your container gets too high, and you want to shield it from traffic for some time
  • the rolling update deployment strategy of Kubernetes uses Readiness Probes to perform zero-downtime deployments (see the sketch below)
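
For the last point, here is the rough idea: during a rolling update Kubernetes only continues replacing old pods once the new ones report Ready. A strategy section like the following (the values are chosen for illustration and are not part of the deployment above) gives zero-downtime updates as long as the Readiness Probe is meaningful:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0  # never remove an old pod before a new one is Ready
      maxSurge: 1        # start at most one extra pod during the update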