This post covers how to debug OOMKilled Go applications in Kubernetes.
Have you ever wondered how to debug an OOMKilled Go application in Kubernetes? When the container gets killed, its memory is released, and there is no chance to get a memory profile of the application. Such errors are often caused by edge cases: the container restarts and the error is gone.
The hard part is that if you don’t know what caused the excessive memory usage, you don’t know how to reproduce it. And if you don’t know how to reproduce it, how are you supposed to debug it?
I hit a problem like this recently in the operator of the open-source project I’m working on. I added a feature that automatically writes a heap profile once the container’s memory usage hits a certain threshold. Since I think this is useful for other Go applications as well, I extracted the package into a dedicated repository: https://github.com/johannes94/go-heapdump-threshold.
The idea is to regularly check the current memory usage and compare it with the configured memory limit. Those checks are used to automatically write a heap profile to the filesystem before the Go application gets OOMKilled.
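Roughly, the check boils down to something like the following. This is only a simplified illustration of the idea, not the package’s actual implementation; the function name, file naming, and the hard-coded limit are made up for the example:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

// checkAndDump compares the currently allocated heap with the configured limit
// and writes a heap profile once usage crosses the threshold.
func checkAndDump(limit uint64, threshold float64, dir string) error {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	if float64(m.HeapAlloc) < threshold*float64(limit) {
		return nil // still below the threshold, nothing to do
	}
	f, err := os.Create(fmt.Sprintf("%s/heapdump-%d.pprof", dir, time.Now().Unix()))
	if err != nil {
		return err
	}
	defer f.Close()
	return pprof.WriteHeapProfile(f)
}

func main() {
	const limit = 150 * 1024 * 1024 // container memory limit in bytes (150Mi here)
	for range time.Tick(30 * time.Second) {
		if err := checkAndDump(limit, 0.80, "./dump"); err != nil {
			fmt.Fprintln(os.Stderr, "writing heap profile failed:", err)
		}
	}
}
```

The actual package additionally waits between dumps, so a single memory spike doesn’t flood the volume with profiles.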
Let’s delve into an example.
Dump the Go application's memory profile
The core struct of the package is HeapProfiler. It uses the pprof package to write memory profiles of the Go application. You can create a new HeapProfiler by calling the function NewHeapProfiler:
```go
...
// This creates a new HeapProfiler with the following configuration:
// - Start to dump heap profiles when the used heap is greater than or equal to 80% of the limit
// - A memory limit defined in bytes, e.g. 100000000 for 100MB
// - Create files in the "./dump" directory
// - Wait for 1 minute after a heap profile was written before writing another one
heapProfiler := heapprofiler.NewHeapProfiler(0.80, limit, "./dump", 1*time.Minute)
...
```
After you create the HeapProfiler, you can start checking the threshold in a separate goroutine:
```go
...
ctx := context.Background()
// Start a background process that checks the threshold every 30 seconds and dumps a heap profile if necessary
go heapProfiler.DumpHeapOnThreshhold(ctx, 30*time.Second)
...
```
Then start the other tasks of your Go application as usual. With this goroutine running, your Go application will write a heap profile to ./dump/heapdump/<datetime> whenever memory consumption crosses the configured threshold, waiting at least one minute between dumps.
Kubernetes config for the Go application
We use the Go application's Pod configuration to pass the memory limit through and to persist the memory profiles across container restarts. Here's a complete pod.yaml example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: example-app
  name: example-app
spec:
  containers:
    - image: example-app:latest
      name: example-app
      resources:
        requests:
          cpu: 200m
          memory: 100Mi
        limits:
          cpu: 200m
          memory: 150Mi
      env:
        - name: MEMORY_LIMIT
          valueFrom:
            resourceFieldRef:
              containerName: example-app
              resource: limits.memory
      imagePullPolicy: Never
      volumeMounts:
        - mountPath: ./dump
          name: dump
  volumes:
    - name: dump
      emptyDir: {}
```
Let’s look at the crucial parts here in detail.
Pass through memory limit with Kubernetes Downward API
The Go application needs to know the memory limit configured for its container in order to calculate the threshold that triggers the memory profile dump. This limit can be passed to the container as an environment variable using the Kubernetes Downward API.
```yaml
# pod.yaml
env:
  - name: MEMORY_LIMIT
    valueFrom:
      resourceFieldRef:
        containerName: example-app
        resource: limits.memory
```
We can then use the environment variable within the Go application to set the limit of HeapProfiler.
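As a sketch of that wiring, the snippet below parses MEMORY_LIMIT and hands it to NewHeapProfiler. It assumes the repository above can be imported under the heapprofiler alias and that the limit parameter is an integer number of bytes; adjust types and paths to the actual API:

```go
package main

import (
	"context"
	"log"
	"os"
	"strconv"
	"time"

	// Import path taken from the repository mentioned above; adjust the alias or
	// path if the module layout differs.
	heapprofiler "github.com/johannes94/go-heapdump-threshold"
)

func main() {
	// MEMORY_LIMIT is injected via the Downward API (see pod.yaml) and contains
	// the container memory limit in bytes, e.g. "157286400" for 150Mi.
	limit, err := strconv.ParseInt(os.Getenv("MEMORY_LIMIT"), 10, 64)
	if err != nil {
		log.Fatalf("invalid MEMORY_LIMIT: %v", err)
	}

	// Dump a heap profile to ./dump once 80% of the limit is used, and wait at
	// least one minute between dumps. Cast limit if NewHeapProfiler expects a
	// different integer type.
	heapProfiler := heapprofiler.NewHeapProfiler(0.80, limit, "./dump", 1*time.Minute)

	ctx := context.Background()
	go heapProfiler.DumpHeapOnThreshhold(ctx, 30*time.Second)

	// ... start the rest of your application here ...
}
```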
Configure Pod to persist memory profiles
HeapProfiler writes the memory profile to a configurable path in the container's filesystem. We can use Kubernetes volumes to persist those files across container restarts. In this example, we're using an emptyDir volume. If you'd like to persist memory profiles across Pod recreation as well, consider using a PersistentVolume instead (a sketch follows the snippet below).
```yaml
# pod.yaml
volumeMounts:
  - mountPath: ./dump
    name: dump
volumes:
  - name: dump
    emptyDir: {}
```
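To survive Pod recreation, one option is to back the dump volume with a PersistentVolumeClaim instead of an emptyDir. This is only a sketch; the claim name, size, and storage class are placeholders you'd adapt to your cluster:

```yaml
# pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dump
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

Then reference the claim from the Pod instead of the emptyDir:

```yaml
# pod.yaml
volumes:
  - name: dump
    persistentVolumeClaim:
      claimName: dump
```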
Analyze the memory profile
Get the memory profile from the volume
With the configuration above, the next time your Go application runs into an OOMKilled error, you'll find a memory profile of the application right in the volume. By analyzing that profile, you have a chance to figure out what's causing the excessive memory usage and to find the root cause of the OOMKilled error.
You can get the memory profile out of the emptyDir or the PersistentVolume by using kubectl cp. Some slimmed-down container images don't ship the tar binary that kubectl cp relies on inside the container; in that case, you can try to run:
kubectl exec <pod-name> -c <container-name> -- cat <path-to-dump> > heapprofile.dump
If your container doesn't provide the cat command either, try adding a sidecar container to your pod.yaml that does and mount the dump volume into it as well.
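A minimal sidecar could look like the following. The busybox image and the dump-reader name are just illustrations; any image that ships cat and tar will do:

```yaml
# pod.yaml -- additional entry under spec.containers
- name: dump-reader
  image: busybox:1.36 # illustrative; any image with cat/tar works
  command: ["sh", "-c", "while true; do sleep 3600; done"]
  volumeMounts:
    - mountPath: /dump
      name: dump
```

With the sidecar in place you can copy the profiles through it, e.g. kubectl cp example-app:/dump -c dump-reader ./dump.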
Analyze the memory profile
Once you have the memory profile, you can analyze it to find the root cause of your Go application's OOMKilled error. Here are some resources that show how to work with pprof memory profiles (a quick-start command follows the list):
- https://go.dev/doc/diagnostics
- https://tusharsheth.medium.com/how-i-found-memory-leaks-in-the-golang-app-using-pprof-56e5d55363ba
- https://medium.com/compass-true-north/memory-profiling-a-go-service-cd62b90619f9
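As a starting point, you can load the copied profile into go tool pprof; the commands below assume the file was saved locally as heapprofile.dump:

```sh
# Interactive mode: type "top" or "list <function>" at the pprof prompt
go tool pprof heapprofile.dump

# Or explore the profile, including flame graphs, in the browser
go tool pprof -http=:8080 heapprofile.dump
```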
I hope this package helps you debug OOMKilled Go applications in Kubernetes.
Follow me on LinkedIn for more content related to Go and Kubernetes.