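A possible fix for a k8s cluster error. This morning one of our test clusters became inaccessible; the symptoms and the troubleshooting steps are recorded here.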
Symptom 1:
[root@test-10-23-2-8 ~]# kubectl get po
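The command did not return promptly. After waiting 1-2 minutes it failed with: Error from server (InternalError): an error on the server ("") has prevented the request from succeeding
To see the detailed request log, raise kubectl's verbosity: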
kubectl version --v=7
I1203 08:55:22.083493 3691533 loader.go:375] Config loaded from file: /root/.kube/config
I1203 08:55:22.084521 3691533 cert_rotation.go:137] Starting client certificate rotation controller
I1203 08:55:22.084529 3691533 round_trippers.go:420] GET https://127.0.0.1:11081/version?timeout=32s
I1203 08:55:22.084544 3691533 round_trippers.go:427] Request Headers:
I1203 08:55:22.084551 3691533 round_trippers.go:431] Accept: application/json, */*
I1203 08:55:22.084556 3691533 round_trippers.go:431] User-Agent: kubectl/v1.18.6 (linux/amd64) kubernetes/dc73548
I1203 08:55:22.108562 3691533 round_trippers.go:446] Response Status: 200 OK in 23 milliseconds
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dc73548d265", GitTreeState:"clean", BuildDate:"2020-07-28T08:10:15Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"cae93a63867", GitTreeState:"clean", BuildDate:"2020-08-11T08:41:18Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
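With --v=7, kubectl logs each request line (GET https://...) and, later, a matching "Response Status:" line. When a command such as kubectl get po hangs, a small filter over that output shows which request never got an answer. This is a minimal sketch that assumes the klog line format shown above; summarize_requests is a hypothetical helper name, not part of kubectl.

```bash
# summarize_requests (hypothetical helper): read kubectl --v=7 log lines on stdin
# and print one "METHOD URL -> STATUS" line per request.
# Assumes the klog format shown above: "... round_trippers.go:NNN] GET https://..."
# followed later by "... Response Status: 200 OK in 23 milliseconds".
summarize_requests() {
  awk '
    /round_trippers/ {
      for (i = 1; i < NF; i++) {
        # a request line has an HTTP method followed by a URL
        if ($i ~ /^(GET|POST|PUT|PATCH|DELETE)$/ && $(i + 1) ~ /^https?:\/\//) {
          # the previous request never logged a response
          if (pending) print req " -> no response logged"
          req = $i " " $(i + 1)
          pending = 1
          break
        }
      }
    }
    /Response Status:/ {
      line = $0
      sub(/.*Response Status: /, "", line)
      print req " -> " line
      pending = 0
    }
    # a request still pending at end of input is the one that hung
    END { if (pending) print req " -> no response logged" }
  '
}
```

Usage: kubectl get po --request-timeout=15s --v=7 2>&1 | summarize_requests. kubectl writes its verbose log to stderr, hence the 2>&1; --request-timeout caps each request instead of waiting minutes.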
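Symptom 2:
The management web console showed:
Unable to access kubernetes / etcd
Kubernetes and etcd have run into a minor problem, but your applications are not affected. You can try reloading.
If the problem persists, please contact customer support.
In debug mode, the underlying error was: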
message: "HTTPSConnectionPool(host='172.31.0.1', port=443): Max retries exceeded with url: /apis/apps/v1/namespaces/test-system/deployments (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 113] No route to host',))"
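[Errno 113] No route to host (EHOSTUNREACH) means the kernel could not reach 172.31.0.1 at all, as opposed to the API server answering with an error. Treating 172.31.0.1 as the in-cluster address of the API server is an assumption; the post does not say what it is. The connection can be retried from a node without extra tools using bash's /dev/tcp redirection, bounded by coreutils timeout. This is a sketch; check_tcp is a hypothetical helper name.

```bash
# check_tcp HOST PORT [SECONDS] (hypothetical helper): try to open a TCP
# connection and report the result. Uses bash's /dev/tcp pseudo-device;
# `timeout` (GNU coreutils) caps the wait so a host that silently drops
# packets does not block the check. Returns 1 when the connection fails.
check_tcp() {
  local host=$1 port=$2 secs=${3:-3}
  if timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "$host:$port reachable"
  else
    echo "$host:$port unreachable"
    return 1
  fi
}
```

Usage: check_tcp 172.31.0.1 443. An unreachable result only shows the path is broken; it does not say which machine is down.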
Symptom 3:
The kubelet log on the node showed:
Dec 03 08:03:15 test-10-23-2-8 kubelet[35086]: E1203 08:03:15.884685 35086 controller.go:136] failed to ensure node lease exists, will retry in 7s, error: an error on the server ("") has prevented the request from succeeding (get leases.coordination.k8s.io test-10-23-2-8)
Dec 03 08:03:16 test-10-23-2-8 kubelet[35086]: I1203 08:03:16.404936 35086 prober.go:124] Readiness probe for "calico-node-t2nvp_kube-system(9bb8e18d-2b6a-4996-a87f-e152b7935bb4):calico-node" failed (failure): 2020-12-03 00:03:16.356 [INFO][195255] confd/health.go 180: Number of node(s) with BGP peering established = 0
Dec 03 08:03:16 test-10-23-2-8 kubelet[35086]: calico/node is not ready: BIRD is not ready: BGP not established with 10.23.2.21,10.23.2.19,10.21.2.147
Dec 03 08:03:16 test-10-23-2-8 kubelet[35086]: I1203 08:03:16.645775 35086 trace.go:116] Trace[1666794611]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:135 (started: 2020-12-03 08:02:46.279600487 +0800 CST m=+56421.330370531) (total time: 30.366144752s):
Dec 03 08:03:16 test-10-23-2-8 kubelet[35086]: Trace[1666794611]: [30.366144752s] [30.366144752s] END
Dec 03 08:03:16 test-10-23-2-8 kubelet[35086]: E1203 08:03:16.645801 35086 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.CSIDriver: an error on the server ("") has prevented the request from succeeding (get csidrivers.storage.k8s.io)
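The log in Symptom 3 practically gives the answer away: this node cannot connect to 10.23.2.21, 10.23.2.19 and 10.21.2.147 (calico reports no BGP session established with any of them).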
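The calico-node readiness failure names every BGP peer it could not reach. When that list is long or spread across many log lines, a filter can pull the addresses out. This is a sketch assuming the message wording "BGP not established with IP1,IP2,..." shown above and IPv4 peers; bgp_down_peers is a hypothetical helper name.

```bash
# bgp_down_peers (hypothetical helper): read log text on stdin and print each
# IPv4 peer listed after "BGP not established with", one per line, deduplicated.
bgp_down_peers() {
  sed -n 's/.*BGP not established with \([0-9.,]*\).*/\1/p' |
    tr ',' '\n' |          # one address per line
    sed '/^$/d' |          # drop empty entries
    LC_ALL=C sort -u       # stable order, no duplicates
}
```

Usage, assuming kubelet runs as a systemd unit named kubelet: journalctl -u kubelet --since "1h ago" | bgp_down_peers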
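A check showed that someone had shut down those three machines. Starting them up again one by one resolved the problem. So when this happens, don't panic: first check whether any other machines in the cluster have gone offline, then look for clues in the logs.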
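One way to do that first check, if the API server still answers, is to look for nodes whose STATUS is not Ready. In this incident kubectl itself was slow, so bound the call with --request-timeout; if it fails entirely, probe the node IPs directly instead. This is a sketch assuming the default column layout of kubectl get nodes -o wide (INTERNAL-IP in column 6); not_ready_nodes is a hypothetical helper name.

```bash
# not_ready_nodes (hypothetical helper): read `kubectl get nodes -o wide --no-headers`
# on stdin and print NAME and INTERNAL-IP for every node whose STATUS is not Ready.
# "Ready,SchedulingDisabled" (a cordoned but healthy node) is treated as Ready.
not_ready_nodes() {
  awk '$2 != "Ready" && $2 !~ /^Ready,/ { print $1, $6 }'
}
```

Usage: kubectl get nodes -o wide --no-headers --request-timeout=15s | not_ready_nodes, then ping or log in to each listed IP to see whether the machine is actually powered on.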