Troubleshooting AKS Cluster Pods with Status CrashLoopBackOff
In this post, I walk through how I troubleshot a CrashLoopBackOff failure detected on pods in an AKS deployment of a containerized React app I built.
App repository: https://github.com/leoimewore/job_search
Below is the initial Deployment and Service manifest used in this project:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: job-search
spec:
  replicas: 3
  selector:
    matchLabels:
      app: job-search
  template:
    metadata:
      labels:
        app: job-search
    spec:
      containers:
        - name: job-search
          image: leoimewore/job_search:10
          ports:
            - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: job-search
spec:
  selector:
    app: job-search
  ports:
    - port: 80
      protocol: TCP
      targetPort: 3000
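Assuming the manifest above is saved as job-search.yaml (a filename I'm using for illustration), it can be applied and checked like this:

```shell
# Apply the Deployment and Service to the cluster
kubectl apply -f job-search.yaml

# Confirm both objects were created
kubectl get deploy,svc job-search
```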
I first listed the pods and saw the CrashLoopBackOff status:

kubectl get pods

The first step was to inspect the deployment:
kubectl describe deploy job-search
The results indicated three unavailable replicas, as shown at the top. I also reviewed the event section, but it contained no useful information about the pod error.
The second step was to inspect the pod itself:
kubectl describe pod job-search-54f78ffd47-flzfc
The event section of this command showed a Backoff warning:
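Besides reading the Events section, the waiting reason can also be queried directly from the pod's container status with jsonpath (pod name taken from the describe step above):

```shell
# Print the waiting reason of the first container, e.g. CrashLoopBackOff
kubectl get pod job-search-54f78ffd47-flzfc \
  -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}'
```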
I then searched the kubelet logs on the node for additional information:

kubectl get --raw "/api/v1/nodes/aks-nodepool1-10044506-vmss000000/proxy/logs/messages" | grep kubelet | grep back-off
I also checked the pod logs directly:
kubectl logs job-search-54f78ffd47-flzfc
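When a container is in a restart loop, the current instance may not have produced any output yet; the logs of the previous (crashed) instance are often more informative:

```shell
# Fetch logs from the last terminated container instance
kubectl logs job-search-54f78ffd47-flzfc --previous
```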
With this information, I concluded there had to be an issue with my Docker container. After further investigation, I found that the architecture of my Docker image differed from that of the AKS cluster nodes. See below:
docker image inspect be8 | grep Archi

Result: "Architecture": "arm64", while my Kubernetes cluster nodes were amd64.
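The node architecture can be confirmed from the cluster itself rather than assumed; for example:

```shell
# List the CPU architecture reported by each node's kubelet
kubectl get nodes \
  -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture
```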
Next, I used Docker Buildx to build a multi-architecture image.

First, I bootstrapped a buildx builder on my machine:
docker buildx create --name multiarch --bootstrap --use
Then I rebuilt and pushed the image for both platforms:
docker buildx build --push \
--platform linux/arm64,linux/amd64 \
--tag leoimewore/job_search:14 \
.
I confirmed on Docker Hub that the repository now showed a multi-platform image.
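This can also be verified from the CLI with buildx's imagetools subcommand, which prints the manifest list for the tag:

```shell
# Both linux/amd64 and linux/arm64 should appear under Platform
docker buildx imagetools inspect leoimewore/job_search:14
```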
Finally, I updated the image on the deployment:
kubectl set image deployment/job-search job-search=leoimewore/job_search:14
This created three new replicas with the status Running:
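The rollout can also be watched until the new replicas are available:

```shell
# Block until the new ReplicaSet's pods are rolled out
kubectl rollout status deployment/job-search

# List the pods for this app and confirm they are Running
kubectl get pods -l app=job-search
```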
Thank you, and I hope this helps you develop a step-by-step process for troubleshooting pod issues in Kubernetes clusters. Please share feedback on any mistakes or areas for improvement in the comments.