Kubernetes debugging
By Aravind
Table of contents
- Set alias for long commands
- Check which file is consuming the most space
- Delete all Evicted pods
- Advertise out of a specific interface
- Label a node
- Join a cluster
- Starting cluster
- Uninstall k8s
- Install k8s
- Renew certificates
Set alias for long commands
alias k=kubectl
Try it out!
root@k8s-master:~/cka_practice/cert# k get pods
NAME READY STATUS RESTARTS AGE
virt-launcher-ubuntu-kv1-mgvvz 1/1 Running 0 18d
virt-launcher-vsrx-sriov-qrfxm 2/2 Running 0 19d
Check which file is consuming the most space
du -h <dir> 2>/dev/null | grep '[0-9\.]\+G'
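A variant that lists the largest individual files and directories instead of everything over a gigabyte (a sketch assuming GNU du and sort with human-readable sorting support):
du -ah <dir> 2>/dev/null | sort -rh | head -20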
Delete all Evicted pods
This can be used when there are hundreds of evicted pods and you want to delete all of them.
kubectl get pods | grep Evicted | awk '{print $1}' | xargs kubectl delete pod
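If the evicted pods are spread across namespaces, a hedged variant of the same pipeline (relying on the column order of kubectl get pods -A) is:
kubectl get pods -A | grep Evicted | awk '{print $2, "-n", $1}' | xargs -L1 kubectl delete pod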
Advertise out of a specific interface
Typically, kubeadm init picks the interface that carries the default route when creating the cluster. You can advertise the API server on a specific interface instead by passing the flag below to kubeadm init:
--apiserver-advertise-address=<the eth1 IP address>
Example
kubeadm init --apiserver-advertise-address=192.169.1.11 --pod-network-cidr=10.244.0.0/16
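One way to double-check which address the API server ended up advertising (a sketch; the endpoints of the built-in kubernetes Service reflect the advertise address):
kubectl get endpoints kubernetes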
Label a node
Sometimes pods may not get scheduled because a node is not labeled as master, or because node selectors are in use and the expected label is mismatched or missing. You can label the node as shown below.
Find labels
root@master:~# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
master Ready control-plane 2m43s v1.25.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=
worker1 Ready <none> 116s v1.25.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=worker1,kubernetes.io/os=linux
worker2 Ready <none> 72s v1.25.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=worker2,kubernetes.io/os=linux
Label a node
root@master:~# kubectl label nodes master node-role.kubernetes.io/master=
node/master labeled
root@master:~# kubectl label nodes worker1 node-role.kubernetes.io/worker=
node/worker1 labeled
root@master:~# kubectl label nodes worker2 node-role.kubernetes.io/worker=
Verify
root@master:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane,master 4m26s v1.25.3
worker1 Ready worker 3m39s v1.25.3
worker2 Ready worker 2m55s v1.25.3
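For the node-selector case mentioned above, a minimal sketch of consuming the worker label (the pod name nginx-on-worker and the nginx image are just placeholders):
kubectl run nginx-on-worker --image=nginx --overrides='{"apiVersion":"v1","spec":{"nodeSelector":{"node-role.kubernetes.io/worker":""}}}'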
Join a cluster
When a cluster is already created and you want to join a new node, you need a bootstrap token.
Obtain the token
$ kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
jt5hul.iy8150scfrqvf3l3 1h 2022-11-17T19:29:31Z authentication,signing The default bootstrap token generated by 'kubeadm init'. system:bootstrappers:kubeadm:default-node-token
Create a token and print the join command
$ kubeadm token create --print-join-command
kubeadm join 192.168.2.12:6443 --token nmlxzk.kt5gr2e5pgi8nb4n --discovery-token-ca-cert-hash sha256:3a032c12536c479c959aec0ee2d300a8847d9d0a75f6bd9892be1d61c6afbdb5
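If the default 24h TTL does not suit, the TTL can be set explicitly when creating the token (sketch):
kubeadm token create --ttl 2h --print-join-command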
Use the token on new worker node
user@newworker:~$ sudo kubeadm join 192.168.2.12:6443 --token nmlxzk.kt5gr2e5pgi8nb4n --discovery-token-ca-cert-hash sha256:3a032c12536c479c959aec0ee2d300a8847d9d0a75f6bd9892be1d61c6afbdb5
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: blkio
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Verify
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master NotReady control-plane 22h v1.25.4
worker1 NotReady <none> 22h v1.25.4
worker2 NotReady <none> 6m18s v1.25.4
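Nodes typically stay NotReady like this until a CNI plugin (Flannel, in the install section below) is applied. To see the exact reason (a sketch, using worker2 from the output above):
kubectl describe node worker2 | grep -i ready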
Starting cluster
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
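Alternatively, if you are running as root, exporting the admin kubeconfig also works:
export KUBECONFIG=/etc/kubernetes/admin.conf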
Error while dialing dial unix /var/run/dockershim.sock
When a k8s cluster is brought up, there are situations where you might see the error "Error while dialing dial unix /var/run/dockershim.sock", as shown below:
root@jcnr2:~# crictl ps
WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
WARN[0000] image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
E0428 19:02:30.325516 24230 remote_runtime.go:390] "ListContainers with filter from runtime service failed" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory\"" filter="&ContainerFilter{Id:,State:&ContainerStateValue{State:CONT
To solve this, ensure the following is present in the file /etc/crictl.yaml:
root@vm:~# more /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
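Depending on the crictl version, the same endpoints can be written by crictl itself instead of editing the file by hand (a sketch, not verified on every release):
sudo crictl config --set runtime-endpoint=unix:///run/containerd/containerd.sock --set image-endpoint=unix:///run/containerd/containerd.sock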
Restart service
sudo systemctl restart containerd
Verify
root@jcnr2:~# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
kubelet failing
root@vm:~# service kubelet status
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset:>
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Fri 2023-04-28 1>
Docs: https://kubernetes.io/docs/home/
Process: 25125 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_C>
Main PID: 25125 (code=exited, status=1/FAILURE)
Notice that kubelet is not running, and hence kubeadm init fails. To fix this, the following sequence sometimes helps:
sudo kubeadm reset
sudo kubeadm init phase certs all
sudo kubeadm init phase kubeconfig all
sudo kubeadm init phase control-plane all --pod-network-cidr 10.244.0.0/16
sudo sed -i 's/initialDelaySeconds: [0-9][0-9]/initialDelaySeconds: 240/g' /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/failureThreshold: [0-9]/failureThreshold: 18/g' /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/timeoutSeconds: [0-9][0-9]/timeoutSeconds: 20/g' /etc/kubernetes/manifests/kube-apiserver.yaml
sudo kubeadm init --v=1 --skip-phases=certs,kubeconfig,control-plane --ignore-preflight-errors=all --pod-network-cidr 10.244.0.0/16
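Before and after the reset, kubelet's own logs usually point at the actual failure (assuming a systemd-based host with journald):
journalctl -u kubelet --no-pager | tail -50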
Load images into the container runtime
Images to be loaded should be in .tar format, so when the files are .tgz we need to gunzip them first.
root@jcnr2:~# gunzip crpd.tgz
root@jcnr2:~# ls -l crpd.tar
-rw-r--r-- 1 root root 507279360 Apr 28 19:26 crpd.tar
root@jcnr2:~# ctr -n=k8s.io image import crpd.tar
unpacking docker.io/library/crpd:23.1R1.8 (sha256:bb82530036904d12f19bc2036a3734450712014780e3d27b8de841929a16fc97)...done
root@jcnr2:~# crictl images | grep crpd
docker.io/library/crpd 23.1R1.8 a1748707249d3 507MB
Pods not scheduled
This could be because the node is tainted (for example, with the control-plane taint). Remove the taint to allow pod scheduling, since this is meant to be an all-in-one cluster where the master node itself runs pods.
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
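To confirm the taint is gone (sketch):
kubectl describe nodes | grep -i taint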
Master node labeling
kubectl label node ubuntu node-role.kubernetes.io/master=
Make file issues when compiling
Error when compiling on Ubuntu using make:
strip jcnr
make: strip: Command not found
make: *** [Makefile:38: docker-images] Error 127
To fix this, install binutils (which provides strip) along with make:
apt install binutils
Connection refused when using kubectl commands
We might see a connection-refused error when running commands such as kubectl get pods:
[root@testvm user]# kubectl get pods -A
The connection to the server localhost:8080 was refused - did you specify the right host or port?
In such cases, there could be multiple reasons.
Verify kubectl config
[root@testvm user]# kubectl config view
apiVersion: v1
clusters: null
contexts: null
current-context: ""
kind: Config
preferences: {}
users: null
This output is empty, which is wrong; it should contain cluster, context, and user entries.
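While the kubeconfig is being fixed, a quick way to confirm the API server itself is reachable is to point kubectl directly at the admin config (sketch):
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes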
Verify if pods are running
[root@testvm03 test]# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
a421932c4e18f 28d55f91d3d8f7f6bd66168eed2cfd72b448be5e7807055c05a77499ce5c0674 3 minutes ago Running kube-proxy 0 5503fb3113a7a kube-proxy-2xz8r
d43febcc6a688 3ea2571fcc83d8e2cb02bff3da18165a38799e59c78cbe845cd807631f0c5cc3 4 minutes ago Running kube-controller-manager 0 1c7c2ed2f969b kube-controller-manager-testvm.ocpvm2.net
578e38437b299 165df46c1bb9b79c9b441ac039ac408ed6788164404dad56f966c810dc61f05a 4 minutes ago Running kube-scheduler 1 a705545a02333 kube-scheduler-testvm.ocpvm2.net
e50fdc53512fc dc245db8c2faecaeac427ebcdf308ebe2c60e40728bf4f45f33d857ef3179969 4 minutes ago Running kube-apiserver 1 18e27e12fa20e kube-apiserver-testvm.ocpvm2.net
4d1213b151d2e 4694d02f8e611efdffe9fb83a86d9d2969ef57b4b56622388eca6627287d6fd6 4 minutes ago Running etcd 1 065ab508a642c etcd-testvm.ocpvm2.net
This shows that the pods are running correctly and the kubeconfig context is the issue. This can happen when the config file is not placed correctly.
- Verify that the config file under ~/.kube/config is correct.
- If it has content and you brought up the cluster with sudo, verify the same file is present under /root/.kube; if not, copy it:
cp -r /home/test/.kube /root/
Validate the above using
[root@testvm03 test ~]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-565d847f94-d6r69 0/1 ContainerCreating 0 8m37s
kube-system coredns-565d847f94-sbmt8 0/1 ContainerCreating 0 8m37s
kube-system etcd-testvm.ocpvm2.net 1/1 Running 1 8m51s
kube-system kube-apiserver-testvm.ocpvm2.net 1/1 Running 1 8m53s
kube-system kube-controller-manager-testvm.ocpvm2.net 1/1 Running 0 8m51s
kube-system kube-proxy-2xz8r 1/1 Running 0 8m37s
kube-system kube-scheduler-testvm.ocpvm2.net 1/1 Running 1 8m51s
Multiple runtimes found
If you hit the multiple-CRI-endpoints error below, you need to tell kubeadm which runtime socket to use:
[user@testvm images]$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
Found multiple CRI endpoints on the host. Please define which one do you wish to use by setting the 'criSocket' field in the kubeadm configuration file: unix:///var/run/containerd/containerd.sock, unix:///var/run/crio/crio.sock
To see the stack trace of this error execute with --v=5 or higher
This is because both the CRI-O and containerd sockets are present on the host, and we would need to uninstall one of them. Alternatively, point kubeadm to the exact runtime we want to use:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket /var/run/containerd/containerd.sock
The --cri-socket flag should also be passed to kubeadm reset if the cluster is running on a non-default socket.
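For example (a sketch, assuming containerd is the runtime to keep):
sudo kubeadm reset --cri-socket unix:///var/run/containerd/containerd.sock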
Images not present in crictl even though loaded successfully
This happens when the container runtime is not recognized correctly. Ensure containerd is referenced consistently everywhere. Follow the steps below if you need to migrate from one container runtime to another (docker -> containerd or crio -> containerd).
Migrating to a different container run time
Drain the node
[user@testvm images]$ kubectl drain testvm.ocpvm2.net --ignore-daemonsets
node/testvm.ocpvm2.net cordoned
Warning: ignoring DaemonSet-managed Pods: kube-flannel/kube-flannel-ds-c5v45, kube-system/kube-multus-ds-tnjhl, kube-system/kube-proxy-2xz8r
evicting pod kube-system/coredns-565d847f94-w7snq
evicting pod kube-system/coredns-565d847f94-sbmt8
pod/coredns-565d847f94-w7snq evicted
pod/coredns-565d847f94-sbmt8 evicted
node/testvm.ocpvm2.net drained
Stop kubelet
[user@testvm images]$ systemctl stop kubelet
Edit node file
[user@testvm images]$ sudo kubectl edit no testvm.ocpvm2.net
node/testvm.ocpvm2.net edited
Change the value of the kubeadm.alpha.kubernetes.io/cri-socket annotation to unix:///run/containerd/containerd.sock, then save and close.
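The same change can be made non-interactively with kubectl annotate (a sketch, using the node name from this example):
kubectl annotate node testvm.ocpvm2.net --overwrite kubeadm.alpha.kubernetes.io/cri-socket=unix:///run/containerd/containerd.sock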
Verify that the node is using the new runtime
[user@testvm images]$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
testvm.ocpvm2.net Ready,SchedulingDisabled control-plane 16h v1.25.4 192.168.2.16 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.19.2.el8_7.x86_64 containerd://1.6.21
[user@testvm ~]$ sudo kubectl get pods -A
[sudo] password for user:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-565d847f94-9wb5z 0/1 Pending 0 6m37s
kube-system coredns-565d847f94-gmqkg 0/1 Pending 0 6m37s
kube-system etcd-testvm.ocpvm2.net 1/1 Running 8 6m50s
kube-system kube-apiserver-testvm.ocpvm2.net 1/1 Running 8 6m50s
kube-system kube-controller-manager-testvm.ocpvm2.net 1/1 Running 28 6m49s
kube-system kube-proxy-t66jp 1/1 Running 0 6m37s
kube-system kube-scheduler-testvm.ocpvm2.net 1/1 Running 21 6m50s
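The node still shows SchedulingDisabled because it was drained earlier. Assuming the migration is complete, start kubelet again (if it is not already running) and uncordon the node:
sudo systemctl start kubelet
kubectl uncordon testvm.ocpvm2.net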
Kubernetes nodes not ready?
When the nodes are not ready, kubectl describe node <name> gives an idea of what is wrong. Here, we notice an InvalidDiskCapacity warning:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 42m kube-proxy
Normal NodeHasSufficientMemory 42m (x4 over 42m) kubelet Node testvm.ocpvm2.net status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 42m (x4 over 42m) kubelet Node testvm.ocpvm2.net status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 42m (x4 over 42m) kubelet Node testvm.ocpvm2.net status is now: NodeHasSufficientPID
Normal Starting 42m kubelet Starting kubelet.
Warning InvalidDiskCapacity 42m kubelet invalid capacity 0 on image filesystem
Normal NodeAllocatableEnforced 42m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 42m kubelet Node testvm.ocpvm2.net status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 42m kubelet Node testvm.ocpvm2.net status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 42m kubelet Node testvm.ocpvm2.net status is now: NodeHasSufficientPID
Look at disk space and ensure things are correct:
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 58G 0 58G 0% /dev
tmpfs 63G 0 63G 0% /dev/shm
tmpfs 63G 92M 63G 1% /run
tmpfs 63G 0 63G 0% /sys/fs/cgroup
/dev/nvme0n1p5 625G 32G 594G 5% /
/dev/nvme0n1p3 301G 12G 289G 4% /home
/dev/nvme0n1p2 1014M 414M 601M 41% /boot
/dev/nvme0n1p1 1.1G 10M 1.1G 1% /boot/efi
tmpfs 13G 4.0K 13G 1% /run/user/1007
Disk space looks fine, so the containerd configuration is likely the problem. Regenerate the default configuration and restart containerd:
sudo containerd config default > config.toml
sudo cp config.toml /etc/containerd/config.toml
[user@testvm ~]$ systemctl restart containerd
Validate
[user@testvm ~]$ sudo kubectl get nodes
NAME STATUS ROLES AGE VERSION
testvm.ocpvm2.net Ready control-plane,master 50m v1.25.4
[user@testvm ~]$ sudo kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-p2xrc 1/1 Running 0 43m
kube-system coredns-565d847f94-9wb5z 1/1 Running 0 50m
kube-system coredns-565d847f94-gmqkg 1/1 Running 0 50m
kube-system etcd-testvm.ocpvm2.net 1/1 Running 8 50m
kube-system kube-apiserver-testvm.ocpvm2.net 1/1 Running 8 50m
kube-system kube-controller-manager-testvm.ocpvm2.net 1/1 Running 28 50m
kube-system kube-proxy-t66jp 1/1 Running 0 50m
kube-system kube-scheduler-testvm.ocpvm2.net 1/1 Running 21 50m
Uninstall k8s cluster
kubeadm reset
# on Debian-based systems
sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni kube*
sudo apt-get autoremove
# on CentOS-based systems
sudo yum autoremove
# for all distributions
sudo rm -rf ~/.kube
kubeadm reset -f
rm -rf /etc/cni /etc/kubernetes /var/lib/dockershim /var/lib/etcd /var/lib/kubelet /var/run/kubernetes ~/.kube/*
iptables -F && iptables -X
iptables -t nat -F && iptables -t nat -X
iptables -t raw -F && iptables -t raw -X
iptables -t mangle -F && iptables -t mangle -X
systemctl restart docker
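If Flannel was the CNI, leftover virtual interfaces may also need removal (the interface names below are the Flannel defaults):
sudo ip link delete cni0
sudo ip link delete flannel.1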
Install k8s
sudo apt-get update
sudo apt install apt-transport-https curl
Install containerd (reference: https://docs.docker.com/engine/install/ubuntu/)
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install containerd.io
Create containerd configuration
sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml
Edit /etc/containerd/config.toml and set SystemdCgroup = true:
sudo nano /etc/containerd/config.toml
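A non-interactive alternative, assuming the default config.toml layout generated above:
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml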
sudo systemctl restart containerd
Install Kubernetes
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add
sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
sudo apt install kubeadm kubelet kubectl kubernetes-cni
Disable swap
sudo swapoff -a
Check and remove any swap entry if exists
sudo nano /etc/fstab
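A sketch that comments out any swap line non-interactively (check the file afterwards before rebooting):
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab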
Avoid the error "/proc/sys/net/bridge/bridge-nf-call-iptables does not exist" on kubeadm init (reference: https://github.com/kubernetes/kubeadm/issues/1062). This is not necessary if Docker is also installed.
sudo modprobe br_netfilter
Enable IP forwarding by setting /proc/sys/net/ipv4/ip_forward to 1:
sudo sysctl -w net.ipv4.ip_forward=1
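To make the module and sysctl settings survive a reboot, the usual approach (a sketch mirroring the upstream kubeadm prerequisites) is:
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system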
kubeadm init for use with Flannel
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
Copy the admin kubeconfig as the kubeadm output says:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Apply Flannel (reference https://github.com/flannel-io/flannel)
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/v0.20.2/Documentation/kube-flannel.yml
All should be running now:
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-mcjmm 1/1 Running 0 76s
kube-system coredns-787d4945fb-fb59g 1/1 Running 0 8m8s
kube-system coredns-787d4945fb-t25tj 1/1 Running 0 8m8s
kube-system etcd-kube-master 1/1 Running 0 8m19s
kube-system kube-apiserver-kube-master 1/1 Running 0 8m19s
kube-system kube-controller-manager-kube-master 1/1 Running 0 8m19s
kube-system kube-proxy-2hz29 1/1 Running 0 8m8s
kube-system kube-scheduler-kube-master 1/1 Running 0 8m19s
Renew certificates
Sometimes the cluster certificates expire, and fixing that can be painful given the scattered resources out there.
First check if pods are running as expected
kubectl get pods --insecure-skip-tls-verify=true
Verify cert expirations
kubeadm certs check-expiration
Delete all existing certs
rm /etc/kubernetes/pki/apiserver*
rm /etc/kubernetes/pki/front*
Re-initialize and renew the certificates to make sure the process completes cleanly without errors:
kubeadm init phase certs all
kubeadm certs renew all
Restart kubelet and set up the kubeconfig again:
service kubelet restart
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
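Re-running the expiration check should now show the renewed dates:
kubeadm certs check-expiration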
All commands should now work fine.
Switching context on Kubernetes/OpenShift
[core@b4-96-91-d4-e2-c0 Juniper_Cloud_Native_Router_23.2]$ kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
admin qctq05 admin
* jcnr/api-qctq05-idev-net:6443/system:admin api-qctq05-idev-net:6443 system:admin/api-qctq05-idev-net:6443 jcnr
pktgen/api-qctq05-idev-net:6443/system:admin api-qctq05-idev-net:6443 system:admin/api-qctq05-idev-net:6443 pktgen
[core@b4-96-91-d4-e2-c0 Juniper_Cloud_Native_Router_23.2]$ oc config use-context admin
Switched to context "admin".
[core@b4-96-91-d4-e2-c0 Juniper_Cloud_Native_Router_23.2]$ helm ls
[core@b4-96-91-d4-e2-c0 Juniper_Cloud_Native_Router_23.2]$ kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* admin qctq05 admin
jcnr/api-qctq05-idev-net:6443/system:admin api-qctq05-idev-net:6443 system:admin/api-qctq05-idev-net:6443 jcnr
pktgen/api-qctq05-idev-net:6443/system:admin api-qctq05-idev-net:6443 system:admin/api-qctq05-idev-net:6443 pktgen
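To print only the active context (sketch):
kubectl config current-context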
tags: kubernetes