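This post was recovered from a web cache after I forgot to renew the site in the second week of December. The images are lost, sorry about that.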
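Background
A peculiar title: i40e-ovs-tc. The first symptom was that ssh logins to the k8s cluster failed on all three physical machines with kex_exchange_identification: read: Connection reset by peer. The BMC web console showed the kernel log being flooded with the warning "i40e 0000:18:00.0: Invalid traffic class".
The fix
A colleague suggested switching virtual consoles with alt+f1 through alt+f6. That did not help much: the log kept flooding the screen and logging in was very hard. Typing half-blind, I managed to enter the username and password and restarted sshd. After that ssh seemed to work some of the time, so the first thing I tried was updating the i40e driver.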
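When the console is being flooded like this, it can help to measure the flood and quiet the console before doing anything else. The following is only a sketch, not what was done at the time: count_invalid_tc is a hypothetical helper name, and dmesg -n 1 is the standard util-linux way to lower the console log level.

```shell
#!/bin/sh
# Count kernel log lines carrying the i40e warning.
# Reads log text on stdin, so it works with dmesg output or a saved file.
# count_invalid_tc is a hypothetical helper name, not part of any tool.
count_invalid_tc() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status clean in that case.
    grep -c 'Invalid traffic class' || true
}

# Usage on the affected host (as root):
#   dmesg | count_invalid_tc
#   dmesg -n 1    # only emergency messages reach the console from now on
```

Lowering the console level does not fix anything; it only makes the local console usable again while the real cause is tracked down.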
Updating the i40e driver
Driver download page: the latest version is 2.17.4.
I installed roughly these packages:
yum -y install rpm-build rpmdevtools
yum install gcc kernel-headers kernel-devel
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd i40e-2.16.11
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.uPhM53
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd i40e-2.16.11
+ make -C src clean
make: Entering directory `/root/rpmbuild/BUILD/i40e-2.16.11/src'
common.mk:82: *** Kernel header files not in any of the expected locations.
common.mk:83: *** Install the appropriate kernel development package, e.g.
common.mk:84: *** kernel-devel, for building kernel modules and try again. Stop.
make: Leaving directory
error: Bad exit status from /var/tmp/rpm-tmp.uPhM53 (%build)
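This error usually means there is no build tree for the kernel that is actually running. One common cause, which I am assuming here rather than stating as what happened on these machines, is that yum installed a kernel-devel newer than the running kernel, so /lib/modules/<release>/build is a dangling symlink. A minimal check, where kernel_build_tree is a hypothetical helper name:

```shell
#!/bin/sh
# Report whether a kernel build tree exists for a given release.
# $1: kernel release (defaults to the running kernel, uname -r)
# $2: modules root (defaults to /lib/modules; overridable for testing)
kernel_build_tree() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    # -d follows symlinks, so a dangling "build" link counts as missing.
    if [ -d "$root/$release/build" ]; then
        echo "ok: $root/$release/build"
    else
        echo "missing: $root/$release/build"
        return 1
    fi
}

# Usage:
#   kernel_build_tree
#   yum install "kernel-devel-$(uname -r)"   # if it reports "missing"
```

Pinning kernel-devel to the output of uname -r only works if that exact version is still in the configured repositories.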
In the end I just installed everything in one go, and the build went through:
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Checking for unpackaged file(s): /usr/lib/rpm/check-files /root/rpmbuild/BUILDROOT/i40e-2.17.4-1.x86_64
Wrote: /root/rpmbuild/RPMS/x86_64/i40e-2.17.4-1.x86_64.rpm
Wrote: /root/rpmbuild/RPMS/x86_64/auxiliary-1.0.0-1.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.n2andN
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd i40e-2.17.4
+ rm -rf /root/rpmbuild/BUILDROOT/i40e-2.17.4-1.x86_64
+ exit 0
Install the new driver:
rpm -Uvh i40e-2.17.4-1.x86_64.rpm
rpm -Uvh auxiliary-1.0.0-1.x86_64.rpm
Check with modinfo i40e:
modinfo i40e
filename: /lib/modules/3.10.0-1160.el7.x86_64/kernel/drivers/net/ethernet/intel/i40e/i40e.ko.xz
version: 2.8.20-k
The version never seemed to change, so I later ran make and make install by hand.
Swap the module: rmmod i40e; modprobe i40e
dracut --force
At last the driver update was done: the version went from 2.8.20-k to 2.17.4.
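To tell at a glance which build modprobe will pick up, it is enough to pull the version and filename fields out of the modinfo output. The sketch below uses a hypothetical helper, modinfo_field. As far as I know, a version ending in -k and a path under kernel/drivers indicate the driver shipped with the kernel, while Intel's out-of-tree build normally installs under an updates directory; treat both as assumptions and check them on your own system.

```shell
#!/bin/sh
# Print one field from modinfo output read on stdin.
# $1: field name without the colon, e.g. version or filename
# modinfo_field is a hypothetical helper name.
modinfo_field() {
    awk -v key="$1:" '
        $1 == key {
            sub(/^[^:]*:[ \t]*/, "")   # drop "field:" and the padding after it
            print
            exit
        }'
}

# Usage:
#   modinfo i40e | modinfo_field version
#   modinfo i40e | modinfo_field filename
#   cat /sys/module/i40e/version    # version of the module loaded right now
```

Comparing the modinfo version with /sys/module/i40e/version shows whether the module on disk and the module in memory are the same build.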
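ovs and tc
But the Invalid traffic class error still showed up at boot. A look at the code suggested it is related to tc, so I removed the tc offload settings from ovs and restarted it: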
ovs-vsctl remove Open_vSwitch . other_config hw-offload
ovs-vsctl remove Open_vSwitch . other_config tc-policy
systemctl restart openvswitch
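After the restart it is worth confirming that both keys are really gone. ovs-vsctl get Open_vSwitch . other_config prints the remaining map, and the sketch below checks that output for a key. other_config_state is a hypothetical helper name, and eth0 in the usage comments is a placeholder interface, not one taken from these machines.

```shell
#!/bin/sh
# Report whether a key is present in an other_config map read on stdin.
# Expects the format ovs-vsctl prints, e.g. {hw-offload="true", tc-policy=skip_sw}
# $1: key name, e.g. hw-offload
# other_config_state is a hypothetical helper name.
other_config_state() {
    # The key must follow "{", a space, or a comma, so a longer key that
    # merely ends in the same text does not match.
    if grep -Eq -- "(^|[{ ,])$1="; then
        echo present
    else
        echo absent
    fi
}

# Usage:
#   ovs-vsctl get Open_vSwitch . other_config | other_config_state hw-offload
#   ovs-vsctl get Open_vSwitch . other_config | other_config_state tc-policy
#   tc qdisc show dev eth0                     # eth0 is a placeholder
#   ethtool -k eth0 | grep hw-tc-offload       # NIC-side tc offload feature
```

If both keys report absent and the warning still appears, the tc and ethtool output are the next things to inspect.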