Troubleshooting Applications on Kubernetes

John Harris

This guide lists common commands and approaches to troubleshoot applications on Kubernetes. In this guide we assume that:

  • You are familiar with kubectl, the Kubernetes command-line client.
  • You have access to the Kubernetes cluster you want to troubleshoot.
  • You are familiar with the common Kubernetes resources, such as Deployments, Services, Pods, etc.

Below is a list covering some common issues and how to troubleshoot them in a Kubernetes environment.

Pods showing ‘CrashLoopBackOff’ status

This usually indicates an issue with the application. Use the kubectl logs command to get logs from the pod.

If the pod has multiple containers, you first have to find the container that is crashing.

Use the kubectl describe command on the pod to find out which container is crashing. The following example shows the list of containers in the kubectl describe output. Notice that the container named bad is in the Waiting state with reason CrashLoopBackOff, its last state is Terminated, and its restart count is greater than zero. This is the container that keeps crashing.

Containers:
  bad:
    Container ID:  containerd://dd42e41890e04253915445...
    Image:         busybox
    Image ID:      docker.io/library/busybox@sha256:83...
    Port:          <none>
    Host Port:     <none>
    Args:
      sleep
      1
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 18 May 2020 10:47:03 -0400
      Finished:     Mon, 18 May 2020 10:47:04 -0400
    Ready:          False
    Restart Count:  3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dfl9d (ro)
  good:
    Container ID:   containerd://8a8ce59842cce4d8c98f...
    Image:          nginx
    Image ID:       docker.io/library/nginx@sha256:30...
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Mon, 18 May 2020 10:46:14 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dfl9d (ro)
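
Once you know which container is crashing, the logs of its previous (terminated) run are often the most useful place to look. The following is a minimal sketch, assuming a pod named example (a placeholder) that contains the container bad from the output above. The helper name crash_logs is hypothetical; -c and --previous are standard kubectl logs flags.

```shell
# crash_logs POD CONTAINER
# Print the logs of the previous (terminated) instance of a container.
# The function name is illustrative only.
crash_logs() {
  kubectl logs "$1" -c "$2" --previous
}

# Usage (the pod name "example" is a placeholder):
#   crash_logs example bad
```

In the output above, the last state of the bad container shows Reason: Completed and Exit Code: 0. This means the process exited on its own (sleep 1 finishes after one second) rather than failing with an error, so the logs may be empty and the container's command is the place to look.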

Services are unreachable or not available

As a confidence check, it is always useful to verify that the service has endpoints.

Use the kubectl get endpoints command to verify that a service has at least one endpoint:

$ kubectl get endpoints example
NAME      ENDPOINTS                       AGE
example   10.244.0.21:80,10.244.0.22:80   27s

If the service does not have endpoints, verify the following:

  • The service’s pod selector matches the labels on the desired pods.
  • The pods backing the service are passing their readiness probe.
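
The two checks above can be sketched as a pair of small helpers. This is only an illustration: the function names are hypothetical, and the service name example and the label app=example are placeholders for your own values.

```shell
# service_selector SERVICE
# Print the label selector that the service uses to find its pods.
service_selector() {
  kubectl get service "$1" -o jsonpath='{.spec.selector}'
}

# pods_matching SELECTOR
# List the pods that carry the given labels. A pod whose READY column
# shows fewer ready containers than total (for example 0/1) is not
# passing its readiness probe and is left out of the endpoints.
pods_matching() {
  kubectl get pods -l "$1" -o wide
}

# Usage (names and labels are placeholders):
#   service_selector example
#   pods_matching app=example
```

If pods_matching returns no pods for the selector printed by service_selector, the selector and the pod labels do not match.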

For more in-depth troubleshooting, you can run a dnsutils pod inside the cluster and test name resolution from there.

Pods showing ‘Pending’ status

If a pod is stuck in the Pending state, it usually means that the pod cannot be scheduled to a node. The most common cause of this issue is that there is no node with enough resources to satisfy the pod’s resource requests.

To diagnose this issue, use kubectl describe and look at the events at the bottom of the output. The following is an example that shows what to look for:

$ kubectl describe pod example
### Output truncated for brevity
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  17s (x2 over 17s)  default-scheduler  0/1 nodes are available: 1 Insufficient memory.
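
When the event reports insufficient CPU or memory, it can help to compare what the pod requests with what each node can offer. The following is a rough sketch with hypothetical helper names; the pod name example is a placeholder.

```shell
# pod_requests POD
# Print the resource requests of every container in the pod.
pod_requests() {
  kubectl get pod "$1" -o jsonpath='{.spec.containers[*].resources.requests}'
}

# node_allocatable
# Print the allocatable CPU and memory of every node.
node_allocatable() {
  kubectl get nodes -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}

# Usage:
#   pod_requests example
#   node_allocatable
```

Allocatable is the total a node can hand out, not what is still free. The "Allocated resources" section of kubectl describe node shows how much of it other pods have already requested.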

Pods showing ‘ContainerCreating’ status

The most common causes for this issue are:

  • Missing configmaps referenced in volume mounts
  • Missing secrets referenced in volume mounts

To diagnose this issue, use kubectl describe on the pod and look at the events at the bottom of the output. The following is an example that shows what to look for:

$ kubectl describe pod example
### Output truncated for brevity
Events:
  Type     Reason       Age               From                         Message
  ----     ------       ----              ----                         -------
  Normal   Scheduled    10s               default-scheduler            Successfully assigned kube-system/example-796885bff7-cf7nc to kind-control-plane
  Warning  FailedMount  3s (x5 over 10s)  kubelet, kind-control-plane  MountVolume.SetUp failed for volume "foo" : secret "foo" not found
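
To confirm that the referenced object really is missing, you can check for it in the pod's namespace. The following is a minimal sketch; resource_exists is a hypothetical helper, and the namespace kube-system and the secret foo are taken from the example events above.

```shell
# resource_exists NAMESPACE KIND NAME
# Report whether an object (for example a secret or configmap) exists.
# Relies on kubectl get exiting with a non-zero status when the object
# cannot be retrieved.
resource_exists() {
  if kubectl -n "$1" get "$2" "$3" >/dev/null 2>&1; then
    echo "$2/$3 found in namespace $1"
  else
    echo "$2/$3 NOT found in namespace $1"
  fi
}

# Usage:
#   resource_exists kube-system secret foo
#   resource_exists kube-system configmap foo
```

Secrets and configmaps are namespaced, so an object with the right name in a different namespace does not satisfy the volume mount.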

Pods showing ‘ErrImagePull’ status

The ErrImagePull condition means that the node is unable to pull the container image from the container image registry (e.g. Harbor). Some potential causes of this issue:

  • The registry is unavailable or inaccessible from the node
  • The container image does not exist in the registry
  • The container image specified in the deployment manifest is incorrect

Use the kubectl describe command on the pod to troubleshoot this issue. The events section at the bottom of the output should have useful context. The following example shows what to look for:

$ kubectl describe pod example
### Output truncated for brevity
Events:
  Type     Reason     Age   From                         Message
  ----     ------     ----  ----                         -------
  Normal   Scheduled  11s   default-scheduler            Successfully assigned kube-system/example-7cc7c59cbb-4h6cv to kind-control-plane
  Normal   Pulling    11s   kubelet, kind-control-plane  Pulling image "non-existent"
  Warning  Failed     10s   kubelet, kind-control-plane  Failed to pull image "non-existent": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/non-existent:latest": failed to resolve reference "docker.io/library/non-existent:latest": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
  Warning  Failed     10s   kubelet, kind-control-plane  Error: ErrImagePull
  Normal   BackOff    10s   kubelet, kind-control-plane  Back-off pulling image "non-existent"
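
A quick way to rule out a typo in the manifest is to print the image references the pod is actually using. This is only a sketch; pod_images is a hypothetical helper name and example is a placeholder pod name.

```shell
# pod_images POD
# Print the image reference of every container in the pod, exactly as
# the kubelet will try to pull it.
pod_images() {
  kubectl get pod "$1" -o jsonpath='{.spec.containers[*].image}'
}

# Usage:
#   pod_images example
```

If the image reference is correct and the registry requires authentication, also check that the pod references an image pull secret in its spec.imagePullSecrets field.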

Useful Commands

This section lists commands that are useful in day-to-day interactions with Kubernetes:

Listing resources

Use the kubectl get command to list resources of one or more types:

kubectl get deployments,pods

Specify the wide output format for additional information:

$ kubectl get pods -o wide
NAME                     READY   STATUS              RESTARTS   AGE   IP           NODE                 NOMINATED NODE   READINESS GATES
cassandra-0              0/1     ContainerCreating   0          47s   <none>       kind-control-plane   <none>           <none>
redis-5c7c978f78-wlbkn   1/1     Running             0          27s   10.244.0.6   kind-control-plane   <none>           <none>

Use the --show-labels flag to display the labels of resources:

$ kubectl get pods --show-labels
NAME                     READY   STATUS              RESTARTS   AGE   LABELS
cassandra-0              0/1     ContainerCreating   0          98s   app=cassandra,chart=cassandra-5.4.2,controller-revision-hash=cassandra-6d7b4575f6,heritage=Helm,release=cassandra,statefulset.kubernetes.io/pod-name=cassandra-0
redis-5c7c978f78-wlbkn   1/1     Running             0          78s   pod-template-hash=5c7c978f78,run=redis

Use the yaml output format if you want to get the entire YAML definition of a resource:

kubectl -n bow get deployment vendor-abstraction -o yaml

Getting application logs

Use the kubectl logs command to get application logs.

kubectl -n bow logs <pod name>

To get logs from a specific container within the pod, use the -c flag:

kubectl -n bow logs vendor-abstraction -c tag-blink-servce

Use the -f or --follow flag to follow/tail the logs.
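
Following the logs of a long-running pod from the very beginning can print a lot of history first. The following sketch combines --follow with the --tail flag to start from the most recent lines. The helper name tail_logs and the default of 100 lines are assumptions made for this example.

```shell
# tail_logs NAMESPACE POD [LINES]
# Follow the logs of a pod, starting from the last LINES lines
# (100 if LINES is not given).
tail_logs() {
  kubectl -n "$1" logs "$2" --follow --tail="${3:-100}"
}

# Usage:
#   tail_logs bow vendor-abstraction
#   tail_logs bow vendor-abstraction 20
```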

Forward local ports into a Pod

You can forward local ports into the pod’s network using kubectl port-forward. With this command, you essentially open a network tunnel into the pod.

For example, if you have a pod that listens on port 9090, you can forward your local machine’s port 8080 to the pod’s port 9090 using the following command:

kubectl -n bow port-forward zuul-gateway 8080:9090

Once this command is running, you can access the pod’s 9090 port via localhost:8080.
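
Because kubectl port-forward keeps running in the foreground, a one-off check usually needs a second terminal. The following is a rough sketch of a wrapper that opens the tunnel in the background, runs a single command, and then closes the tunnel. The helper name with_port_forward and the fixed two-second wait are assumptions made for this example, not part of kubectl.

```shell
# with_port_forward NAMESPACE POD LOCAL:REMOTE COMMAND [ARGS...]
# Open a port-forward in the background, run COMMAND, then stop the
# port-forward and return COMMAND's exit status.
with_port_forward() {
  pf_ns=$1
  pf_pod=$2
  pf_ports=$3
  shift 3

  kubectl -n "$pf_ns" port-forward "$pf_pod" "$pf_ports" >/dev/null 2>&1 &
  pf_pid=$!

  # Crude wait for the tunnel to come up; adjust as needed.
  sleep 2

  pf_status=0
  "$@" || pf_status=$?

  # Stop the background port-forward if it is still running.
  kill "$pf_pid" 2>/dev/null || true
  return "$pf_status"
}

# Usage (assumes curl is installed locally):
#   with_port_forward bow zuul-gateway 8080:9090 curl -s http://localhost:8080/
```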

Exec into a running Pod

You can run commands within the context of your pod using kubectl exec. Do not use this to configure or modify application behavior at run time.

For example, to run ps -ef in a container, you would run:

kubectl exec -it example-pod -- ps -ef

Note: Keep in mind that the container must have the binary you are trying to execute (ps in the above example). Otherwise, you will get an error.

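If the binary is not available, use ephemeral containers (alpha when this guide was written; stable since Kubernetes 1.25) to attach a debugging container to the pod.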
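
On clusters and kubectl versions that support ephemeral containers, the kubectl debug command can add one to a running pod. The following is a minimal sketch; debug_pod is a hypothetical helper name, busybox is only one possible debugging image, and example-pod and the container name are placeholders.

```shell
# debug_pod POD CONTAINER
# Attach an interactive busybox ephemeral container to a running pod.
# --target asks for the process namespace of CONTAINER to be shared,
# which depends on support in the container runtime.
debug_pod() {
  kubectl debug -it "$1" --image=busybox --target="$2"
}

# Usage (names are placeholders):
#   debug_pod example-pod example-container
```

Inside the debugging container, tools shipped with the image (ps in the case of busybox) are available even if the application image does not include them.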


Get the documentation of a specific resource kind

You can get the documentation of a specific resource kind (e.g. Deployment or Pod) using kubectl explain. This command will fetch the API documentation of the resource and display it in the terminal.

For example, to get the Pod documentation:

kubectl explain pod

You can drill into specific fields within the resource. For example, to get the documentation for the pod’s spec field:

kubectl explain pod.spec
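
When you do not know the exact field name yet, the --recursive flag of kubectl explain prints the names of all nested fields without their descriptions. The following sketch wraps it in a hypothetical helper named explain_fields.

```shell
# explain_fields PATH
# Print the tree of field names below a resource or field path.
explain_fields() {
  kubectl explain "$1" --recursive
}

# Usage:
#   explain_fields pod.spec.containers
#   explain_fields deployment.spec.strategy
```

Once you have found the field in the tree, run kubectl explain on its full path to read its description.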