This post covers how to debug OOMKilled Go applications in Kubernetes.
Have you ever wondered how to debug an OOMKilled Go application in Kubernetes? When the container gets killed, its memory is released, and there is no chance to get a memory profile of the application. Such errors are often caused by edge cases: the container restarts and the error is gone.
The hard part is that if you don’t know what caused the excessive memory usage, you don’t know how to reproduce it. And if you don’t know how to reproduce it, how are you supposed to debug it?
I hit a problem like this recently in the operator of the open-source project I’m working on. I added a feature that automatically writes a heap profile once the container’s memory usage hits a certain threshold. Since I think this is useful for other Go applications as well, I extracted the package into a dedicated repository: https://github.com/johannes94/go-heapdump-threshold.
The idea is to regularly check the current memory usage and compare it with the configured memory limit. Those checks are used to automatically write a heap profile to the filesystem before the Go application gets OOMKilled.
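Roughly, the check boils down to something like the following. This is only a simplified illustration of the idea, not the package’s actual implementation; the function name, file naming, and the hard-coded limit are made up for the example:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

// checkAndDump compares the currently allocated heap with the configured limit
// and writes a heap profile once usage crosses the threshold.
func checkAndDump(limit uint64, threshold float64, dir string) error {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	if float64(m.HeapAlloc) < threshold*float64(limit) {
		return nil // still below the threshold, nothing to do
	}
	f, err := os.Create(fmt.Sprintf("%s/heapdump-%d.pprof", dir, time.Now().Unix()))
	if err != nil {
		return err
	}
	defer f.Close()
	return pprof.WriteHeapProfile(f)
}

func main() {
	const limit = 150 * 1024 * 1024 // container memory limit in bytes (150Mi here)
	for range time.Tick(30 * time.Second) {
		if err := checkAndDump(limit, 0.80, "./dump"); err != nil {
			fmt.Fprintln(os.Stderr, "writing heap profile failed:", err)
		}
	}
}
```

The actual package additionally waits between dumps, so a single memory spike doesn’t flood the volume with profiles.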
Let’s delve into an example.
Dump the Go application's memory profile
The core struct of the package is HeapProfiler. It uses the pprof package to write memory profiles of the Go application. You can create a new HeapProfiler by calling the function NewHeapProfiler:
```go
...
// This creates a new HeapProfiler with the following configuration:
// - Start to dump heap profiles when the used heap is greater than or equal to 80% of the limit
// - A memory limit defined in bytes, e.g. 100000000 for 100MB
// - Create files in the "./dump" directory
// - Wait for 1 minute after a heap profile was written before writing another one
heapProfiler := heapprofiler.NewHeapProfiler(0.80, limit, "./dump", 1*time.Minute)
...
```
After you create the HeapProfiler, you can start checking the threshold in a separate goroutine:
```go
...
ctx := context.Background()
// Start a background process that checks the threshold every 30 seconds and dumps a heap profile if necessary
go heapProfiler.DumpHeapOnThreshhold(ctx, 30*time.Second)
...
```
Then start the other tasks of your Go application as usual. With this goroutine running, your Go application will write a heap profile to ./dump/heapdump/<datetime> whenever memory consumption crosses the configured threshold, waiting at least one minute between dumps.
Kubernetes config for the Go application
We use the Go application's Pod configuration to pass the memory limit through and to persist the memory profiles across container restarts. Here's a complete pod.yaml example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: example-app
  name: example-app
spec:
  containers:
    - image: example-app:latest
      name: example-app
      resources:
        requests:
          cpu: 200m
          memory: 100Mi
        limits:
          cpu: 200m
          memory: 150Mi
      env:
        - name: MEMORY_LIMIT
          valueFrom:
            resourceFieldRef:
              containerName: example-app
              resource: limits.memory
      imagePullPolicy: Never
      volumeMounts:
        - mountPath: ./dump
          name: dump
  volumes:
    - name: dump
      emptyDir: {}
```
Let’s look at the crucial parts here in detail.
Pass through memory limit with Kubernetes Downward API
The Go application needs to know the memory limit configured for its container in order to calculate the threshold that triggers the memory profile dump. This limit can be passed to the container as an environment variable using the Kubernetes Downward API.
```yaml
# pod.yaml
env:
  - name: MEMORY_LIMIT
    valueFrom:
      resourceFieldRef:
        containerName: example-app
        resource: limits.memory
```
We can then use the environment variable within the Go application to set the limit of HeapProfiler.
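As a sketch of that wiring, the snippet below parses MEMORY_LIMIT and hands it to NewHeapProfiler. It assumes the repository above can be imported under the heapprofiler alias and that the limit parameter is an integer number of bytes; adjust types and paths to the actual API:

```go
package main

import (
	"context"
	"log"
	"os"
	"strconv"
	"time"

	// Import path taken from the repository mentioned above; adjust the alias or
	// path if the module layout differs.
	heapprofiler "github.com/johannes94/go-heapdump-threshold"
)

func main() {
	// MEMORY_LIMIT is injected via the Downward API (see pod.yaml) and contains
	// the container memory limit in bytes, e.g. "157286400" for 150Mi.
	limit, err := strconv.ParseInt(os.Getenv("MEMORY_LIMIT"), 10, 64)
	if err != nil {
		log.Fatalf("invalid MEMORY_LIMIT: %v", err)
	}

	// Dump a heap profile to ./dump once 80% of the limit is used, and wait at
	// least one minute between dumps. Cast limit if NewHeapProfiler expects a
	// different integer type.
	heapProfiler := heapprofiler.NewHeapProfiler(0.80, limit, "./dump", 1*time.Minute)

	ctx := context.Background()
	go heapProfiler.DumpHeapOnThreshhold(ctx, 30*time.Second)

	// ... start the rest of your application here ...
}
```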
Configure Pod to persist memory profiles
HeapProfiler writes the memory profile to a configurable path in the container's filesystem. We can use Kubernetes volumes to persist those files across container restarts. In this example, we're using an emptyDir volume. If you'd like to persist memory profiles across Pod recreation as well, consider using a PersistentVolume instead (a sketch follows the snippet below).
```yaml
# pod.yaml
volumeMounts:
  - mountPath: ./dump
    name: dump
volumes:
  - name: dump
    emptyDir: {}
```
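To survive Pod recreation, one option is to back the dump volume with a PersistentVolumeClaim instead of an emptyDir. This is only a sketch; the claim name, size, and storage class are placeholders you'd adapt to your cluster:

```yaml
# pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dump
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

Then reference the claim from the Pod instead of the emptyDir:

```yaml
# pod.yaml
volumes:
  - name: dump
    persistentVolumeClaim:
      claimName: dump
```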
Analyze the memory profile
Get the memory profile from the volume
With the configuration above, the next time your Go application runs into an OOMKilled error, you'll find a memory profile of the application right in the volume. By analyzing that profile, you have a chance to figure out what's causing the excessive memory usage and to find the root cause of the OOMKilled error.
You can get the memory profile out of the emptyDir or the PersistentVolume by using kubectl cp. Some slimmed-down container images don't ship the tar binary that kubectl cp relies on inside the container; in that case, you can try to run:
kubectl exec <pod-name> -c <container-name> -- cat <path-to-dump> > heapprofile.dump
If your container doesn't provide the cat command either, try adding a sidecar container to your pod.yaml that does and mount the dump volume into it as well.
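A minimal sidecar could look like the following. The busybox image and the dump-reader name are just illustrations; any image that ships cat and tar will do:

```yaml
# pod.yaml -- additional entry under spec.containers
- name: dump-reader
  image: busybox:1.36 # illustrative; any image with cat/tar works
  command: ["sh", "-c", "while true; do sleep 3600; done"]
  volumeMounts:
    - mountPath: /dump
      name: dump
```

With the sidecar in place you can copy the profiles through it, e.g. kubectl cp example-app:/dump -c dump-reader ./dump.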
Analyze the memory profile
Once you have the memory profile, you can analyze it to find the root cause of your Go application's OOMKilled error. Here are some resources that show how to work with pprof memory profiles (a quick-start command follows the list):
- https://go.dev/doc/diagnostics
- https://tusharsheth.medium.com/how-i-found-memory-leaks-in-the-golang-app-using-pprof-56e5d55363ba
- https://medium.com/compass-true-north/memory-profiling-a-go-service-cd62b90619f9
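As a starting point, you can load the copied profile into go tool pprof; the commands below assume the file was saved locally as heapprofile.dump:

```sh
# Interactive mode: type "top" or "list <function>" at the pprof prompt
go tool pprof heapprofile.dump

# Or explore the profile, including flame graphs, in the browser
go tool pprof -http=:8080 heapprofile.dump
```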
I hope this package helps you debug OOMKilled Go applications in Kubernetes.
Follow me on LinkedIn for more content related to Go and Kubernetes.