一个伪linux粉丝的blog

  1. 首页
  2. unix/linux
  3. 正文

ClusterLoader2压测K8s的血泪史

29 5 月, 2025 10点热度 0人点赞 0条评论

背景:

最近接了个POC任务,其中一项是用ClusterLoader2对K8s集群进行压测。本以为是个"常规项目",结果这玩意儿前前后后折腾了我2周!昨晚终于跑通,现在趁热纪录,给自己和新手避坑~

 

🚀 极简操作指南

克隆代码(版本要对!)

1
git clone -b release-1.31 https://github.com/kubernetes/perf-tests.git

本次k8s 环境1.31 , 因此直接clone 这个版本分支

终极启动命令

1
./clusterloader --masterip=10.29.229.221 --testconfig=config.yaml --provider=local --provider-configs=ROOT_KUBECONFIG=/root/kubeconfig4clusterloader --kubeconfig=/root/kubeconfig4clusterloader --v=5 --enable-prometheus-server=false   --nodes=2 2>&1 | tee output.txt

 

用到的config.yaml  测试配置文件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
name: nginx-pod-latency-test
namespace:
  number: 1
 
tuningSets:
- name: Uniform5qps
  globalQPSLoad:
    qps: 10
    burst: 10
 
steps:
- module:
    path: /perf-tests/clusterloader2/testing/load/modules/pod-startup-latency.yaml
    params:
      namespaces: 1
      minPodsInSmallCluster: 10
      image: registry.cn-shanghai.aliyuncs.com/my2021/2048:motd
      resourceLimits:
        - name: cpu
          limit: 1
        - name: memory
          limit: 1Gi
 
- module:
    path: /perf-tests/clusterloader2/testing/load/modules/reconcile-objects.yaml
    params:
      actionName: "create"
      namespaces: 1
      tuningSet: Uniform5qps
      testMaxReplicaFactor: 1.0
      operationTimeout: "5m"
      deploymentImage: "registry.cn-shanghai.aliyuncs.com/my2021/2048:motd"
      daemonSetImage: "registry.cn-shanghai.aliyuncs.com/my2021/2048:motd"
      bigDeploymentSize: 10
      bigDeploymentsPerNamespace: 1
      mediumDeploymentSize: 0
      mediumDeploymentsPerNamespace: 0
      smallDeploymentSize: 0
      smallDeploymentsPerNamespace: 0
      smallStatefulSetSize: 0
      smallStatefulSetsPerNamespace: 0
      mediumStatefulSetSize: 0
      mediumStatefulSetsPerNamespace: 0
      smallJobSize: 0
      smallJobsPerNamespace: 0
      mediumJobSize: 0
      mediumJobsPerNamespace: 0
      bigJobSize: 0
      bigJobsPerNamespace: 0
      daemonSetReplicas: 0
      CL2_CHECK_IF_PODS_ARE_UPDATED: "true"
      CL2_DISABLE_DAEMONSETS: "true"
      CL2_ENABLE_PVS: "false"
 
- module:
    path: /perf-tests/clusterloader2/testing/load/modules/reconcile-objects.yaml
    params:
      actionName: "delete"
      namespaces: 1
      tuningSet: Uniform5qps
      testMaxReplicaFactor: 0.0
      operationTimeout: "5m"
      bigDeploymentSize: 10
      bigDeploymentsPerNamespace: 0
      mediumDeploymentSize: 0
      mediumDeploymentsPerNamespace: 0
      smallDeploymentSize: 0
      smallDeploymentsPerNamespace: 0
      smallStatefulSetSize: 0
      smallStatefulSetsPerNamespace: 0
      mediumStatefulSetSize: 0
      mediumStatefulSetsPerNamespace: 0
      smallJobSize: 0
      smallJobsPerNamespace: 0
      mediumJobSize: 0
      mediumJobsPerNamespace: 0
      bigJobSize: 0
      bigJobsPerNamespace: 0
      daemonSetReplicas: 0
      CL2_CHECK_IF_PODS_ARE_UPDATED: "true"
      CL2_DISABLE_DAEMONSETS: "true"
      CL2_ENABLE_PVS: "false"

 

💥 踩过的深坑

 

第一个坑,构建二进制文件失败

go build -x -v -o clusterloader  clusterloader.go

提示版本报错

go: errors parsing go.mod:
/root/perf-tests/clusterloader2/go.mod:4: invalid go version '1.22.4': must match format 1.23

按照提示,改下即可继续.   同事告诉我go 1.24.3版本构建无需修改这个文件, 升级了节点的go环境,再跑果然正常了。

安装go 1.24版本 的2种操作

1
2
3
4
5
6
7
8
9
sudo add-apt-repository ppa:longsleep/golang-backports
sudo apt install golang-1.24
echo 'export PATH=/usr/lib/go-1.24/bin:$PATH' >> ~/.profile
sed -i 's|export PATH=$PATH:/usr/lib/go-1.24/bin|export PATH=/usr/lib/go-1.24/bin:$PATH|' ~/.profile
source ~/.profile
which go
go env GOROOT
cd perf-tests/clusterloader2/cmd/
GOARCH=amd64 GOOS=linux go build -x -v -o clusterloader clusterloader.go

小结:Go版本≥1.24才能丝滑构建

第3个坑:路径玄学

  • 灵异现象:配置文件死活找不到

配置的config.yaml 文件,里面的引用文件居然不能用绝对路径,只能相对它的路径,如下这种格式

- module:
path: /perf-tests/clusterloader2/testing/load/modules/reconcile-objects.yaml

 

第3个坑 prometheus

参数启用了--enable-prometheus-server=true 时,

会创建一个monitor的ns,和几个pod,以及 名为ssd的sc

kubectl get sc |grep ssd
ssd kubernetes.io/gce-pd Delete Immediate false 125m

解决办法自然是不用prometheus ,这里的值改成fasle,或者直接去掉。

 

调参相关

1, 默认延时问题

这里的值5秒,虚拟环境还真拉不起一个pod,改长点,后来发现物理机可以1-3秒全部跑完,不用改。

{{$podStartupLatencyThreshold := DefaultParam .CL2_POD_STARTUP_LATENCY_THRESHOLD "5s"}}  #改到50s

/perf-tests/clusterloader2/testing/load/simple-deployment.yaml

2, configmap和secrets问题

deployment.yaml ,里面调用了 configmap和secrets ,导致big 测试时过不去,注释掉这2个相关设置清净了。

🎉 胜利的曙光

成功压测结果类似如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
......
I0529 20:08:37.024845 1474234 simple_test_executor.go:87] StatelessPodStartupLatency_PodStartupLatency: {
  "version": "1.0",
  "dataItems": [
    {
      "data": {
        "Perc50": 32737.558438,
        "Perc90": 44789.393112,
        "Perc99": 48925.585207
      },
      "unit": "ms",
      "labels": {
        "Metric": "schedule_to_watch"
      }
    },
    {
      "data": {
        "Perc50": 33592.923438,
        "Perc90": 44913.893112,
        "Perc99": 50046.076207
      },
      "unit": "ms",
      "labels": {
        "Metric": "pod_startup"
      }
    },
......
I0529 20:08:37.027944 1474234 reflector.go:319] Stopping reflector apps/v1, Resource=deployments (0s) from pkg/mod/k8s.io/client-go@v0.32.3/tools/cache/reflector.go:251
I0529 20:08:47.083242 1474234 simple_test_executor.go:402] Resources cleanup time: 10.058361346s
I0529 20:08:47.083302 1474234 clusterloader.go:252] --------------------------------------------------------------------------------
I0529 20:08:47.083309 1474234 clusterloader.go:253] Test Finished
I0529 20:08:47.083314 1474234 clusterloader.go:254]   Test: config-5.yaml
I0529 20:08:47.083320 1474234 clusterloader.go:255]   Status: Success
I0529 20:08:47.083326 1474234 clusterloader.go:259] --------------------------------------------------------------------------------
 
JUnit report was created: /root/junit.xml
I0529 20:08:47.086100 1474234 prometheus.go:331] Get snapshot from Prometheus

 

补充说明,最终在物理机上测试,平均1-3秒搞定。

 

参考资料

k8s部署zabbix 6.0并添加监控(监控k8s资源)
https://bingerambo.com/posts/2020/12/k8s集群性能测试-clusterloader/
K8s集群性能测试

 

相关文章:

  1. kloxo lighttpd failed
  2. redmine-install-notes
  3. "too many open files" from kubectl logs
  4. iptables forbidden port
标签: clusterloader
最后更新:29 5 月, 2025

wanjie

这个人很懒,什么都没留下

点赞
< 上一篇

文章评论

razz evil exclaim smile redface biggrin eek confused idea lol mad twisted rolleyes wink cool arrow neutral cry mrgreen drooling persevering
取消回复

This site uses Akismet to reduce spam. Learn how your comment data is processed.

归档
分类
  • network / 332篇
  • Uncategorized / 116篇
  • unix/linux / 122篇
  • 业界资讯 / 38篇
  • 公司杂事 / 11篇
  • 数码影像 / 12篇
  • 美剧 / 3篇
  • 美图共赏 / 21篇
  • 英语学习 / 3篇
标签聚合
职责 Linux Ubuntu nexus dreamhost空间 邮件归档 gitlab 刷机 VPS squid d90 docker 网通 网站运营 虚拟主机 Google Voice kernel Google 天翼live k8s 泰国 dreamhost Nginx ldap webhook deepseek unveiled today postgres brew iMac

COPYRIGHT © 2008-2025 wanjie.info. ALL RIGHTS RESERVED.

Theme Kratos Made By Seaton Jiang