A fundamental skill needed by all practitioners deploying to Kubernetes is debugging issues as they arise on the Kubernetes platform. Issues range from application deployment problems to Kubernetes system issues to network issues. The question is: what is a good starting point for debugging?
In this section, you will learn a general debugging workflow, along with techniques for accessing containers and nodes.
To help solidify your learning, it is recommended that you learn these commands, because they work on vanilla Kubernetes and the majority of Kubernetes releases.
Note: Ephemeral debug containers are discussed later in this section. It is recommended that you use Kubernetes 1.23 or later, where this feature is in beta and enabled by default. If your Kubernetes version is earlier than 1.23, you must enable this feature manually.
The general workflow is to get the events, learn about the pod state, and then get the pod logs. If these steps do not provide sufficient information, further steps are needed, such as checking the application configuration and network connectivity from within a container.
Always run:
kubectl get events -n <namespace> --sort-by .lastTimestamp [-w]
This provides a list of events that occur in a namespace, with the most recent event at the bottom of the list. Adding the optional -w flag allows the output to be watched.
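If the namespace is busy, you can narrow the output with a field selector. For example, a quick first check that lists only warning events:
kubectl get events -n <namespace> --field-selector type=Warning --sort-by .lastTimestamp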
Run:
kubectl describe pod/<pod-name> -n <namespace>
to get more details of a pod.
Run:
kubectl logs <pod-name> -n <namespace> [name-of-container, if multiple] [-f]
This gets logs from a pod. Adding the optional -f flag enables tailing of the logs. Note that logs only work with the pod API, but you can still select a pod to get logs from through a deployment or daemonset reference. For example:
kubectl logs ds/<daemonset-name> -n <namespace> [name-of-container, if multiple] [-f]
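If a container has crashed and been restarted, the logs of the previous instance are often the most useful. A simple example using the --previous flag:
kubectl logs <pod-name> -n <namespace> --previous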
A useful command line tool is stern because it allows tailing of logs from multiple pods and containers. Stern can also apply regular expressions to filter specific logs. Kail is another similar log helper tool. For most debugging purposes, try to limit the logging scope to avoid cognitive overload that may distract you from finding the real problem.
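As a rough sketch of stern usage, assuming stern is installed and the regular expression matches the pods of interest:
stern -n <namespace> "<pod-name-regex>" --tail 50
This tails the last 50 lines from every matching pod and container in the namespace.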
If you cannot access the cluster, check the log aggregation platform, if the cluster is connected to one, for example Splunk or ELK if they are set up. From the log platform, try searching by the node name/IP and/or components such as kubelet and kube-proxy. If the cluster is not accessible because kubectl commands do not work, you must access the cluster nodes to find more details. See “Accessing nodes” for more information.
If the cluster is still accessible and the existing logs point to another system issue, it is possible to access the pod's container to perform tests, such as validating the state of a running process and its configuration, or checking the container's network connectivity. See “Accessing containers” for more information.
The above is a good general workflow to start debugging. The documentation on Kubernetes Monitoring, Logging and Debugging provides a good outline of other debugging approaches and techniques that could be utilized.
The goal here is to access the container that an application is running in, view log files, and/or run further debugging commands, such as validating the running configuration.
The following are some techniques to get into containers.
kubectl exec
Accessing a container can be achieved through the kubectl exec command. However, this may not be permitted on a cluster due to RBAC permissions. To exec into a container, run:
kubectl exec -it -n <namespace> <pod-name> [-c <container-name>] "--" sh -c "clear; (bash || ash || sh)"
Note: The above exec command is from Lens. Using tools like Lens can make this a lot easier via a GUI.
Sometimes there is no shell accessible in the target container. Another method is to start a pod from a container image that has all the network debugging tools. The netshoot container image has a number of tools that make it easy to debug network issues. You can instantiate a netshoot container and exec into it to perform tests as follows:
kubectl run netshoot --image=nicolaka/netshoot -- sleep infinity && \
kubectl exec -it netshoot -- sh -c "clear; (bash || ash || sh)"
However, this is a new pod that does not have access to a target pod's filesystem. This approach is useful for debugging aspects such as network connectivity. A deep dive into these aspects as problem sources is explored in the next learning path on Heuristic Approach.
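For example, a few connectivity checks that could be run from inside the netshoot container; the service name and port below are placeholders for illustration:
nslookup <service-name>.<namespace>.svc.cluster.local
nc -vz <service-name>.<namespace>.svc.cluster.local <port>
curl -sv http://<service-name>.<namespace>:<port>/
The first resolves the service through the cluster DNS, the second checks whether the TCP port is reachable, and the third exercises an HTTP endpoint.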
As discussed in the previous section, using kubectl exec may be insufficient because the target container has no shell, or because the container has already crashed and is inaccessible. To overcome this, a different image can be deployed via kubectl run. However, this is a separate pod that does not share the process namespace of the pod of interest, which means you cannot access the target pod's process details or filesystem. To alleviate this limitation, use ephemeral debug containers.
Introduced as an alpha in Kubernetes 1.16 and currently a beta in Kubernetes 1.23, ephemeral containers are helpful for troubleshooting issues. By default, the EphemeralContainers feature gate is enabled as a beta in Kubernetes 1.23. On earlier versions, you must enable the feature gate before it can be used.
The documentation for debug containers shows usage examples demonstrating how to use them. The examples are included below, along with content aligned to this article.
Ephemeral containers are similar to regular containers. The difference is that they are limited in functionality. For example, ephemeral containers have no restart capability, no ports, no scheduling guarantees, and no startup/liveness probes. To view a list of limitations, see What is an Ephemeral Container?.
Compared to what was previously done with netshoot and kubectl exec, the key feature here is that an ephemeral container can be attached to the pod that you want to debug (the target pod) by sharing its process namespace. This means that it is possible to see a target pod container's processes and filesystem. There are a number of variations of the kubectl debug command, and its usage depends on what is required for debugging. One additional note to consider is that debug containers continue to run after exit and need to be manually deleted, for example with kubectl delete pod <name-of-debug-pod>.
A container with no shell means that it is not possible to kubectl exec into the container using a shell command, such as sh or bash. For this scenario, the following command can be run:
kubectl debug <name-of-existing-pod> -it --image=nicolaka/netshoot --target=<name-of-container-in-existing-pod>
This utilizes the netshoot image, with all the network debugging utilities, as a separate ephemeral container running in the existing pod named <name-of-existing-pod>. It also opens an interactive command prompt using -it. The --target flag lets you target the process namespace of the existing running container, whose name is usually the same as the pod's.
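Once attached, and assuming the target's main process appears as PID 1 in the shared process namespace and the debug container has permission to read its /proc entry, you can inspect the target's processes and filesystem, for example:
ps aux
ls /proc/1/root/
cat /proc/1/root/etc/resolv.conf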
This scenario introduces the problem where a container has crashed or completed and it is not possible to kubectl exec into the container because it no longer exists. You can address this by making a copy of the target pod with an ephemeral container attached, to inspect its process and filesystem. Use the following command:
kubectl debug <name-of-existing-pod> -it --image=nicolaka/netshoot --share-processes --copy-to=<new-name-of-existing-pod>
This creates a debug container using netshoot. The debug container shares the process namespace in a new pod, named <new-name-of-existing-pod>, which is a copy of <name-of-existing-pod>.
This scenario is different from the previous one in that the startup command needs to be changed, either to ensure that a container remains running or to extract additional information. This is achieved through a copy of the target pod, without affecting the original. One situation where this is particularly useful is when a target container crashes or exits immediately on startup and the goal is to interactively try out the process or review its filesystem. To do this, change the startup command to one that does not end the container, or to one that helps provide more information, for example by increasing the application's debug verbosity, issuing a long sleep command, or running a shell. You can achieve this with the following command:
kubectl debug <name-of-existing-pod> -it --copy-to=<new-name-of-existing-pod> --container=<name-of-container-in-pod> -- sh
Take note of the command specifier -- sh. The example changes the start command to call a shell on the copied container. This may need to be replaced with an appropriate start command that enables debugging of the pod.
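For instance, a variation that keeps the copied container alive with a long sleep so that it can be exec'd into afterwards (assuming the image provides a shell):
kubectl debug <name-of-existing-pod> --copy-to=<new-name-of-existing-pod> --container=<name-of-container-in-pod> -- sleep 1d
kubectl exec -it <new-name-of-existing-pod> -c <name-of-container-in-pod> -- sh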
This scenario creates a copy of a target pod with a different container image. This is useful in situations where a production container may not contain all the utilities needed for debugging, or does not produce debug-level output. It is run with the following command:
kubectl debug <name-of-existing-pod> --copy-to=<new-name-of-existing-pod> --set-image=*=nicolaka/netshoot
The difference is the parameter --set-image=*=nicolaka/netshoot, which works the same way as kubectl set image: it replaces all existing container images, specified by *, with the new image nicolaka/netshoot.
The goal of accessing the nodes is to ascertain why a pod running on the node is failing, or why a particular node is failing. Two areas that you can check are the log files of the pods, and the process status of Kubernetes static pods and services. You can also run debugging commands, such as netcat and curl, to debug network connectivity issues if these tools exist on the host.
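For example, assuming default ports and that these utilities are installed on the node, a couple of quick checks might be:
curl -s http://localhost:10248/healthz
nc -vz <api-server-ip-or-fqdn> 6443
The first queries the kubelet's local health endpoint, and the second checks connectivity from the node to the API server.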
To determine where the log files are located, it is recommended that you check the documentation of the components to see which log files to access. For most distributions of Kubernetes, log files can be found under /var/log/. For the containerd container runtime, /var/log/pods holds pod logs per namespace and container, and /var/log/containers contains symlinks to the container logs in /var/log/pods.
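For example, on a containerd-based node you can browse and tail these logs directly; the exact directory names include the namespace, pod name, and pod UID:
ls /var/log/pods/
tail -f /var/log/containers/<pod-name>*.log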
Key Kubernetes components, such as the kubelet and containerd (for the containerd container runtime), use systemd in most distributions to initialize Kubernetes components. To retrieve logs so that you can inspect the journals of these components, run the following command:
journalctl -u <name-of-component>
Usually, <name-of-component> is just kubelet. If that doesn't work, you can find the systemd unit for the kubelet by grepping for kubelet inside /etc/systemd/system.
If the Kubernetes components were not started by systemd, use lsof against the kubelet process to see where the logs are being written:
pgrep -i kubelet | xargs lsof
In some Kubernetes distributions, the Kubernetes components run as Docker containers. If they are not running under systemd, use docker ps or crictl ps to check whether this is the case.
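For example, filtering the running containers for Kubernetes components:
docker ps | grep kube
crictl ps | grep kube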
Some Kubernetes distributions also configure the Kubernetes components to send their logs to syslog, which is usually written to /var/log/messages.
For readers unfamiliar with how to access nodes, this section describes three methods of getting onto a node.
Use SSH.
SSH access is performed via SSH keys. To access the nodes via SSH keys, run the following command:
ssh -i <certificate.pem> <username>@<ip/fqdn of node>
If only password access is set up, you can SSH with the following command:
ssh <username>@<ip/fqdn of node>
and enter the password when prompted.
Accessing the nodes via kubectl exec.
If RBAC permits and privileged container permissions are available, it is possible to mount a node's logs as a hostPath volume into a pod, and inspect those logs after kubectl exec'ing into the container. The following YAML creates a debugging pod to help with this:
kubectl apply -f - <<-EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: debugging
  labels:
    app: debugging
spec:
  selector:
    matchLabels:
      app: debugging
  template:
    metadata:
      labels:
        app: debugging
    spec:
      containers:
      - name: debugging
        image: nicolaka/netshoot
        # keep the container running so it can be exec'd into
        command: ["sleep", "infinity"]
        volumeMounts:
        - mountPath: /node-logs
          name: node-logs
      volumes:
      - name: node-logs
        hostPath:
          # node log directory mounted into the pod
          path: /var/log
          type: Directory
EOF
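After the deployment is ready, the node logs mounted at /node-logs can be read from inside the pod, for example:
kubectl exec -it deploy/debugging -- ls /node-logs/pods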
Debugging nodes using ephemeral debug containers.
In addition to the other use cases of debug containers, it is also possible to create an ephemeral privileged container on a target node of interest with the node's filesystem mounted at /host. This opens access to the node's host filesystem, its network, and its process namespace. It is run with an interactive shell using the following command:
kubectl debug node/<name-of-node> -it --image=nicolaka/netshoot
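Once the interactive shell opens, the node's root filesystem is available under /host, for example:
ls /host/var/log/pods
chroot /host
The chroot step is optional, requires sufficient privileges, and lets commands run against the node's own binaries. When finished, delete the node debug pod that kubectl debug created, for example with kubectl delete pod <name-of-node-debug-pod>.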