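How to Test UCX

Summary

This article describes how to test the two interface layers of UCX: UCT and UCP.

Background

UCT is the lower layer of UCX. It talks directly to the communication hardware and offers only simple communication primitives. UCP, the layer above, wraps UCT and provides higher-level functionality to software such as MPI.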

UCT

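Step 1: Select the device and transport

UCT tests require both -d <device> and -x <transport>. Both values can be read from the output of ucx_info -d. Use a Transport field together with the Device field that immediately follows it.

For example:

ucx_perftest -d mlx5_0:1 -x rc_verbs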
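Picking the pair by hand from ucx_info -d is error-prone on hosts with many devices. Below is a minimal sketch that lists every device/transport pair. It assumes the ucx_info -d layout in which each `Transport:` line is directly followed by its `Device:` line, so check this against your UCX version. `list_uct_pairs` and `uct_args` are hypothetical helper names and are not part of UCX.

```shell
#!/bin/sh
# Hypothetical helpers (not part of UCX) for choosing -d/-x values.

# Read `ucx_info -d` output on stdin and print "<device> <transport>" pairs.
# Assumes each "#      Transport: <tl>" line is followed by "#         Device: <dev>".
list_uct_pairs() {
    awk '
        $2 == "Transport:" { tl = $3; next }                   # remember the transport name
        $2 == "Device:" && tl != "" { print $3, tl; tl = "" }  # pair it with the next device
    '
}

# Turn one "<device> <transport>" pair into ucx_perftest UCT arguments.
uct_args() {
    printf -- '-d %s -x %s\n' "$1" "$2"
}
```

For example, `ucx_info -d | list_uct_pairs | grep mlx5_0` should print candidates such as `mlx5_0:1 rc_verbs`. `uct_args mlx5_0:1 rc_verbs` then turns that pair into `-d mlx5_0:1 -x rc_verbs`.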

Step 2: Select the test

The test is selected with -t <test>. The UCT tests are:

 am_lat - UCT active message latency
put_lat - UCT put latency
add_lat - UCT atomic add latency
    get - UCT get latency / bandwidth / message rate
   fadd - UCT atomic fetch-and-add latency / rate
   swap - UCT atomic swap latency / rate
  cswap - UCT atomic compare-and-swap latency / rate
  am_bw - UCT active message bandwidth / message rate
 put_bw - UCT put bandwidth / message rate

Step 3: Select the data layout

Some tests require a specific data layout, selected with -D <layout>:

short - short messages (default, cannot be used for get)
bcopy - copy-out (cannot be used for atomics)
zcopy - zero-copy (cannot be used for atomics)
iov   - scatter-gather list (iovec)
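The valid layouts depend on the test type, so a wrapper script can reject invalid combinations before it contacts the server. The sketch below encodes only the restrictions listed above. The helper name `check_uct_layout` is hypothetical, and the rules may differ between UCX versions. ucx_perftest itself still has the final say.

```shell
#!/bin/sh
# Hypothetical helper: check a UCT test/layout pair against the restrictions
# listed above. Returns 0 if the pair is allowed, 1 otherwise.
check_uct_layout() {
    case "$2" in
        short)
            # short is the default layout but cannot be used for get
            [ "$1" != "get" ] ;;
        bcopy|zcopy)
            # bcopy and zcopy cannot be used for atomic tests
            case "$1" in
                add_lat|fadd|swap|cswap) return 1 ;;
                *) return 0 ;;
            esac ;;
        iov)
            # no restriction is listed for iov here
            return 0 ;;
        *)
            echo "unknown layout: $2" >&2
            return 1 ;;
    esac
}
```

Usage: `check_uct_layout put_bw zcopy && ucx_perftest -d mlx5_0:1 -x rc_verbs -t put_bw -D zcopy gpu6`.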

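Step 4: Select the message size

Some bandwidth tests need a larger message size, set with -s <size>, to saturate the link. The number of iterations is set with -n <iters>.

Running one size per invocation is tedious. Instead, -b <file> runs several parameter combinations from a batch file. Many example batch files are installed under <path-to-ucx>/share/ucx/perftest.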
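A batch file can also be generated rather than copied. The sketch assumes the batch format described in the ucx_perftest help text: the first word of each line is the test name, which appears in the Test column of the output, and the rest of the line holds command-line arguments. `gen_pow2_batch` and its iteration schedule are illustrative and do not reproduce the contents of the shipped msg_pow2 file.

```shell
#!/bin/sh
# Hypothetical helper: write a ucx_perftest batch file for -b covering powers
# of two from $2 to $3 bytes. Each line is "<label> <args>": the first word is
# the test name shown in the Test column, the rest are per-test arguments.
# The iteration schedule is illustrative, not the one in the shipped msg_pow2.
gen_pow2_batch() {
    out=$1
    size=$2
    max=$3
    iters=1000000
    : > "$out"                               # create or truncate the file
    while [ "$size" -le "$max" ]; do
        echo "$size -s $size -n $iters" >> "$out"
        size=$((size * 2))
        # halve the iteration count per step, never dropping below 1000
        if [ "$iters" -ge 2000 ]; then iters=$((iters / 2)); fi
    done
}
```

For example, `gen_pow2_batch ./my_pow2 8 2097152` writes lines from `8 -s 8 -n 1000000` up to 2 MB. The file is then passed with `-b ./my_pow2` on both the server and the client.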

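Step 5: Start the server and the client

The server can omit every parameter except -b <file>. It can omit -b as well when only a single test is run. Wrapping the server in a loop such as while true; do ucx_perftest; done is not a substitute for -b.

ucx_perftest -b ./msg_pow2

The client must provide the server address and the test parameters, for example: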

ucx_perftest -d mlx5_0:1 -x rc_verbs -t put_bw -D zcopy -b ./msg_pow2 gpu6

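Other parameters

I have not looked into the remaining parameters yet, and it is worth reading through them yourself. Anything useful I find will be added here.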

UCP

Step 1: Select the test

The test is selected with -t <test>. The UCP tests are:

     tag_lat - UCP tag match latency
      tag_bw - UCP tag match bandwidth
tag_sync_lat - UCP tag sync match latency
 tag_sync_bw - UCP tag sync match bandwidth
 ucp_put_lat - UCP put latency
  ucp_put_bw - UCP put bandwidth
     ucp_get - UCP get latency / bandwidth / message rate
     ucp_add - UCP atomic add bandwidth / message rate
    ucp_fadd - UCP atomic fetch-and-add latency / bandwidth / rate
    ucp_swap - UCP atomic swap latency / bandwidth / rate
   ucp_cswap - UCP atomic compare-and-swap latency / bandwidth / rate
   stream_bw - UCP stream bandwidth
  stream_lat - UCP stream latency
  ucp_am_lat - UCP am latency
   ucp_am_bw - UCP am bandwidth / message rate

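Step 2: Select the message size

Some bandwidth tests need a larger message size, set with -s <size>, to saturate the link. The number of iterations is set with -n <iters>.

Running one size per invocation is tedious. Instead, -b <file> runs several parameter combinations from a batch file. Many example batch files are installed under <path-to-ucx>/share/ucx/perftest.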

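Step 3: Select the transport

Transports are selected with the environment variables UCX_TLS, UCX_SELF_DEVICES, UCX_SHM_DEVICES and UCX_NET_DEVICES.

The meaning, accepted values and defaults of the UCX environment variables can be listed with ucx_info -f.

The currently effective UCX configuration can be shown with ucx_info -c.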
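Setting the variables on the command line keeps each run self-describing. Below is a minimal sketch for bash or any POSIX shell. `run_ucp_test` is a hypothetical wrapper. The fallback values `rc,self,sm` and `mlx5_0:1` are only examples matching the setup used later in this article.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of UCX): run ucx_perftest with an explicit
# UCP transport selection. UCX_TLS and UCX_NET_DEVICES are real UCX variables;
# the fallback values are only examples for the mlx5_0 setup in this article.
# Set DRY_RUN=1 to print the command instead of running it.
run_ucp_test() {
    tls=${UCX_TLS:-rc,self,sm}
    net=${UCX_NET_DEVICES:-mlx5_0:1}
    # rebuild the argument list: env VAR=... ucx_perftest <original args>
    set -- env "UCX_TLS=$tls" "UCX_NET_DEVICES=$net" ucx_perftest "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

`DRY_RUN=1 run_ucp_test -t tag_lat -f gpu19` only prints the command. To check whether UCX picked the values up, run ucx_info -c with the same variables set.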

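Step 4: Start the server and the client

The server can omit every parameter except -b <file>. It can omit -b as well when only a single test is run. Wrapping the server in a loop such as while true; do ucx_perftest; done is not a substitute for -b.

ucx_perftest -b ./msg_pow2

The client must provide the server address and the test parameters, for example: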

ucx_perftest -t ucp_put_bw -b ./msg_pow2 gpu6

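Other parameters

I have not looked into the remaining parameters yet, and it is worth reading through them yourself. Anything useful I find will be added here.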

Parameters common to UCT and UCP

Memory options

Messages can be placed in host memory or in GPU memory:

-m <send mem type>[,<recv mem type>]
memory type of message for sender and receiver (host)
    host - System memory
    cuda - NVIDIA GPU memory
    cuda-managed - NVIDIA GPU managed/unified memory

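Output

  • -f: if a single test runs for more than 1 second, ucx_perftest prints its progress as well as the results. Add -f to print only the final results.
  • -v: print the results in CSV format, which makes post-processing easier.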

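CPU pinning

The process can be pinned to CPU cores. The numbers are core IDs, not socket IDs. In principle pinning improves performance. It also silences the warning "UCX WARN CPU affinity is not set. Performance may be impacted".

-c <cpulist>   set affinity to this CPU list (separated by comma) (off)

This parameter must be set on both the server and the client. Otherwise the other side still prints the warning.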
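To find which cores count as "near" a given NIC, Linux exposes the local CPU list of each PCI device in sysfs. The sketch assumes the path `/sys/class/infiniband/<dev>/device/local_cpulist`. `nic_local_cpu` is a hypothetical helper, and `SYSFS_ROOT` exists only so that the helper can be tested against a fake tree.

```shell
#!/bin/sh
# Hypothetical helper: print the first CPU core local to an RDMA device, for -c.
# Assumes the Linux sysfs file /sys/class/infiniband/<dev>/device/local_cpulist,
# which holds a list such as "14-27,42-55". SYSFS_ROOT is only for testing.
nic_local_cpu() {
    list=$(cat "${SYSFS_ROOT:-/sys}/class/infiniband/$1/device/local_cpulist") || return 1
    first=${list%%,*}      # "14-27,42-55" -> "14-27"
    echo "${first%%-*}"    # "14-27" -> "14"
}
```

Both sides need -c, so run the helper on each host, for example `ucx_perftest -c "$(nic_local_cpu mlx5_0)" ...`. The same core number can be passed to `numactl --physcpubind` for the ib perftest runs.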

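Other parameters

I have not looked into the remaining parameters yet, and it is worth reading through them yourself. Anything useful I find will be added here.

Test cases

100G EDR InfiniBand NIC

Conclusions first

In the conclusions below, ~ marks an approximate value ("about").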

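  • In the IB perftest tests, using a CPU that is NOT local to the NIC raises IB (_verbs) latency by ~160 ns
  • In the IB perftest tests, using a CPU that is NOT local to the NIC lowers the IB (_verbs) small-message rate (MsgRate[Mpps]) by ~26%
  • In the IB perftest tests, using a CPU that is NOT local to the NIC has almost no effect on IB (_verbs) (large-message) bandwidth
  • In the latency tests, the (small-message) latency of both UCT and UCP is almost unchanged compared with IB (_verbs)
  • In the latency tests, compared with native IB (_verbs), the UCT (small-message) message rate drops by 20% with a CPU local to the NIC and by 29% with a CPU NOT local to the NIC
  • In the latency tests, compared with native IB (_verbs), the UCP (small-message) message rate drops by 28% with a CPU local to the NIC and by 25% with a CPU NOT local to the NIC
  • In the UCT latency tests, put_lat has the lowest latency, am_lat is slightly higher, and add_lat is the highest, ~1 us above put_lat
  • In the UCT latency tests, using a CPU NOT local to the NIC degrades UCT small-message latency, message rate (MsgRate[Mpps]) and bandwidth by ~10%
  • In the UCT bandwidth tests, the maximum bandwidth UCT reaches matches IB (_verbs)
  • In the UCT bandwidth tests, bcopy saturates the link at ~4KB, but the maximum message size is only 8256 bytes (probably limited by a related configuration parameter)
  • In the UCT bandwidth tests, IB (_verbs) saturates the link at ~4KB. With zero-copy, UCT reaches only ~10% of full bandwidth at 4KB and needs 512KB messages to saturate the link
  • In the UCT bandwidth tests, using a CPU NOT local to the NIC has almost no effect on UCT (large-message) bandwidth
  • In the UCP latency tests, using a CPU NOT local to the NIC degrades UCP small-message latency by ~10%
  • In the UCP latency tests, latency, message rate and (small-message) bandwidth differ greatly between operations
  • In the UCP bandwidth tests, bandwidth does not always increase with message size (e.g. stream_bw); this may need parameter tuning
  • In the UCP bandwidth tests, with a CPU local to the NIC, 4KB messages reach ~82% of IB (_verbs) performance (ucp_am_bw)
  • In the UCP bandwidth tests, with a CPU NOT local to the NIC, UCP bandwidth at 4KB degrades by a further 16% (ucp_am_bw)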
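The percentages above are relative changes against a baseline. Below is a small sketch of that arithmetic, where `pct_change` and `delta_ns` are hypothetical helpers. Applied to the ib_write_lat averages reported below (1.55 us near, 1.71 us far), it gives a 160 ns (+10.3%) increase. Applied to the 16-byte ib_write_bw message rates (7.133210 vs 5.180663 Mpps), it gives a 27.4% drop, in line with the ~26% quoted for small messages.

```shell
#!/bin/sh
# Hypothetical helpers for the arithmetic behind the conclusions above.

# Relative change from baseline $1 to value $2, as a signed percentage.
pct_change() {
    awk -v base="$1" -v val="$2" 'BEGIN { printf "%+.1f%%\n", (val - base) / base * 100 }'
}

# Difference between two latencies given in microseconds, printed in nanoseconds.
delta_ns() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", (b - a) * 1000 }'
}
```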

ib perftest tests

Latency test
Near CPU
numactl --physcpubind=14 ib_write_lat -F -d mlx5_0 --iters 100000 gpu6
---------------------------------------------------------------------------------------
                    RDMA_Write Latency Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 1
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 220[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 2       100000          1.48           11.65        1.53              1.55             0.30            1.58        8.12
---------------------------------------------------------------------------------------
Far CPU
numactl --physcpubind=0 ib_write_lat -F -d mlx5_0 --iters 100000 gpu6
---------------------------------------------------------------------------------------
                    RDMA_Write Latency Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 1
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 220[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 2       100000          1.59           9.30         1.70              1.71             0.24            1.75        6.13
---------------------------------------------------------------------------------------
Bandwidth test
Near CPU
numactl --physcpubind=14 ib_write_bw -F -a -d mlx5_0 --iters=10000 --perform_warm_up gpu6
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 2          10000            12.85              12.78              6.699589
 4          10000            27.49              26.79              7.023679
 8          10000            54.67              53.17              6.968536
 16         10000            110.25             108.84             7.133210
 32         10000            217.47             216.09             7.080729
 64         10000            441.01             436.70             7.154857
 128        10000            882.02             860.80             7.051688
 256        10000            1706.98            1661.90            6.807136
 512        10000            3368.53            3354.59            6.870204
 1024       10000            6016.92            5825.04            5.964844
 2048       10000            8935.23            8922.25            4.568190
 4096       10000            10371.08            10362.64                  2.652835
 8192       10000            10418.60            10404.98                  1.331837
 16384      10000            10302.62            10302.14                  0.659337
 32768      10000            10310.39            10302.12                  0.329668
 65536      10000            10329.40            10329.02                  0.165264
 131072     10000            10338.65            10338.29                  0.082706
 262144     10000            10419.36            10416.33                  0.041665
 524288     10000            10392.79            10385.47                  0.020771
 1048576    10000            10379.42            10378.66                  0.010379
 2097152    10000            10380.60            10379.62                  0.005190
 4194304    10000            10380.31            10380.04                  0.002595
 8388608    10000            10383.98            10383.97                  0.001298
---------------------------------------------------------------------------------------
Far CPU
numactl --physcpubind=0 ib_write_bw -F -a -d mlx5_0 --iters=10000 --perform_warm_up gpu6
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 2          10000            7.38               7.09               3.719234
 4          10000            19.87              19.74              5.173599
 8          10000            40.39              40.26              5.276447
 16         10000            79.64              79.05              5.180663
 32         10000            159.28             158.22             5.184397
 64         10000            320.49             314.66             5.155441
 128        10000            643.59             634.78             5.200158
 256        10000            1254.02            1231.33            5.043531
 512        10000            2538.20            2530.61            5.182694
 1024       10000            5026.05            5017.23            5.137640
 2048       10000            8251.26            8239.19            4.218463
 4096       10000            9963.10            9952.78            2.547910
 8192       10000            10328.81            10323.26                  1.321378
 16384      10000            10323.53            10317.71                  0.660334
 32768      10000            10363.19            10363.04                  0.331617
 65536      10000            10349.26            10348.72                  0.165580
 131072     10000            10383.06            10375.07                  0.083001
 262144     10000            10375.78            10368.52                  0.041474
 524288     10000            10380.77            10379.93                  0.020760
 1048576    10000            10395.79            10394.86                  0.010395
 2097152    10000            10420.25            10261.75                  0.005131
 4194304    10000            10406.98            10237.38                  0.002559
 8388608    10000            10340.00            10282.59                  0.001285
---------------------------------------------------------------------------------------

UCT tests

Latency test
Near CPU
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |        latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
ucx_perftest -d mlx5_0:1 -x rc_verbs -c 14 -t put_lat -f gpu6
                     1000000      1.550     1.562     1.570        4.89       4.86      640327      636851
ucx_perftest -d mlx5_0:1 -x rc_verbs -c 14 -t am_lat -f gpu6
                     1000000      1.616     1.650     1.648        4.62       4.63      606179      606657
ucx_perftest -d mlx5_0:1 -x rc_verbs -c 14 -t add_lat -f gpu6
                     1000000      2.553     2.591     2.594        2.94       2.94      385903      385516
Far CPU
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |        latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
ucx_perftest -d mlx5_0:1 -x rc_verbs -c 0 -t put_lat -f gpu6
                     1000000      1.713     1.729     1.729        4.41       4.41      578447      578202
ucx_perftest -d mlx5_0:1 -x rc_verbs -c 0 -t am_lat -f gpu6
                     1000000      1.869     1.903     1.900        4.01       4.02      525571      526398
ucx_perftest -d mlx5_0:1 -x rc_verbs -c 0 -t add_lat -f gpu6
                     1000000      2.772     2.797     2.795        2.73       2.73      357502      357794
Bandwidth test
Near CPU
ucx_perftest -d mlx5_0:1 -x rc_verbs -c 14 -t put_bw -D bcopy -b ./msg_pow2 -f gpu6
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         8           2000000      0.145     0.173     0.173       44.15      44.15     5786886     5786886
        16           2000000      0.148     0.176     0.176       86.74      86.74     5684532     5684532
        32           2000000      0.146     0.175     0.175      174.81     174.81     5728052     5728052
        64           2000000      0.147     0.186     0.186      327.66     327.66     5368320     5368320
       128           1400000      0.148     0.196     0.196      621.62     621.62     5092302     5092302
       256            700000      0.153     0.181     0.181     1346.55    1346.55     5515470     5515470
       512            300000      0.180     0.222     0.222     2195.09    2195.09     4495550     4495550
      1024            200000      0.192     0.221     0.221     4424.35    4424.35     4530561     4530561
      2048            100000      0.248     0.321     0.321     6083.51    6083.51     3114791     3114791
      4096            100000      0.332     0.386     0.386    10122.70   10122.70     2591438     2591438
      8192             80000      0.662     0.736     0.736    10617.50   10617.50     1359057     1359057
[1701272248.196790] [gpu5:137304:0]         libperf.c:542  UCX  ERROR Message size (16384) is larger than max supported (8256)
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
ucx_perftest -d mlx5_0:1 -x rc_verbs -c 14 -t put_bw -D zcopy -b ./msg_pow2 -f gpu6
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         8           2000000      2.954     3.010     2.984        2.53       2.56      332265      335101
        16           2000000      3.005     3.079     3.009        4.96       5.07      325063      332382
        32           2000000      3.002     3.025     3.014       10.09      10.13      330707      331833
        64           2000000      2.976     3.004     3.024       20.32      20.19      332891      330731
       128           1400000      3.005     3.051     3.043       40.01      40.11      327725      328587
       256            700000      3.072     3.120     3.100       78.26      78.75      320549      322548
       512            300000      3.033     3.072     3.072      158.93     158.93      325486      325486
      1024            200000      3.219     3.261     3.261      299.50     299.50      306694      306694
      2048            100000      3.452     3.517     3.517      555.41     555.41      284372      284372
      4096            100000      3.899     3.937     3.937      992.19     992.19      254003      254003
      8192             80000      4.562     4.599     4.599     1698.62    1698.62      217426      217426
     16384             40000      6.002     6.074     6.074     2572.28    2572.28      164630      164630
     32768             20000      7.925     7.970     7.970     3920.83    3920.83      125473      125473
     65536             10000     10.638    10.755    10.755     5811.20    5811.20       92989       92989
    131072              5000     16.783    16.710    16.710     7480.36    7480.36       59855       59855
    262144              2500     28.426    28.420    28.420     8796.48    8796.48       35200       35200
    524288              1200     51.595    52.002    52.002     9614.93    9614.93       19246       19246
   1048576               600     97.607    98.573    98.573    10144.73   10144.73       10162       10162
   2097152               300    192.163   193.393   193.393    10341.62   10341.62        5188        5188
Far CPU
ucx_perftest -d mlx5_0:1 -x rc_verbs -c 0 -t put_bw -D bcopy -b ./msg_pow2 -f gpu6
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         8           2000000      0.201     0.241     0.241       31.63      31.63     4145552     4145552
        16           2000000      0.200     0.246     0.246       61.99      61.99     4062772     4062772
        32           2000000      0.202     0.241     0.241      126.57     126.57     4147608     4147608
        64           2000000      0.205     0.248     0.248      246.22     246.22     4034131     4034131
       128           1400000      0.203     0.253     0.253      482.07     482.07     3949115     3949115
       256            700000      0.205     0.245     0.245      996.69     996.69     4082451     4082451
       512            300000      0.220     0.279     0.279     1748.94    1748.94     3581846     3581846
      1024            200000      0.239     0.297     0.297     3290.69    3290.69     3369680     3369680
      2048            100000      0.285     0.383     0.383     5096.75    5096.75     2609560     2609560
      4096            100000      0.375     0.443     0.443     8808.37    8808.37     2254964     2254964
      8192             80000      0.683     0.822     0.822     9499.07    9499.07     1215896     1215896
[1701272338.822857] [gpu5:139520:0]         libperf.c:542  UCX  ERROR Message size (16384) is larger than max supported (8256)
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
ucx_perftest -d mlx5_0:1 -x rc_verbs -c 0 -t put_bw -D zcopy -b ./msg_pow2 -f gpu6
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         8           2000000      3.220     3.262     3.246        2.34       2.35      306553      308086
        16           2000000      3.215     3.233     3.241        4.72       4.71      309272      308575
        32           2000000      3.202     3.239     3.231        9.42       9.44      308757      309488
        64           2000000      3.238     3.258     3.260       18.74      18.72      306981      306723
       128           1400000      3.246     3.270     3.269       37.33      37.34      305787      305910
       256            700000      3.298     3.308     3.303       73.81      73.92      302325      302769
       512            300000      3.328     3.408     3.365      143.27     145.08      293562      297135
      1024            200000      3.492     3.496     3.496      279.36     279.36      286070      286070
      2048            100000      3.681     3.729     3.729      523.82     523.82      268198      268198
      4096            100000      4.155     4.160     4.160      939.00     939.00      240387      240387
      8192             80000      4.778     4.821     4.821     1620.62    1620.62      207442      207442
     16384             40000      6.225     6.285     6.285     2486.16    2486.16      159118      159118
     32768             20000      8.198     8.267     8.267     3780.02    3780.02      120967      120967
     65536             10000     10.849    10.905    10.905     5731.53    5731.53       91714       91714
    131072              5000     17.343    16.970    16.970     7365.77    7365.77       58938       58938
    262144              2500     29.310    29.213    29.213     8557.76    8557.76       34245       34245
    524288              1200     53.064    53.097    53.097     9416.66    9416.66       18849       18849
   1048576               600    100.253   100.615   100.615     9938.87    9938.87        9955        9955
   2097152               300    197.949   199.050   199.050    10047.72   10047.72        5041        5041
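Saturation points such as "bcopy fills the link at ~4KB" and "zcopy needs ~512KB" can be read off these tables mechanically. Below is a minimal sketch, assuming the nine-column data rows printed with -f as shown above: size, iterations, three overhead columns, two bandwidth columns and two message-rate columns. `saturation_size` is a hypothetical helper. With a 90% threshold it returns 4096 for the near-CPU bcopy table and 524288 for the near-CPU zcopy table above.

```shell
#!/bin/sh
# Hypothetical helper: read ucx_perftest -f table output on stdin and print the
# smallest message size whose average bandwidth reaches $1 percent of the peak
# average bandwidth in that table.
# Assumes data rows have 9 whitespace-separated columns: size, iterations,
# 3 latency/overhead columns, 2 bandwidth columns (MB/s), 2 message-rate columns.
saturation_size() {
    awk -v pct="$1" '
        NF == 9 && $1 ~ /^[0-9]+$/ {                 # data rows only; skips borders, headers, logs
            n++; size[n] = $1; bw[n] = $6 + 0        # column 6 = average bandwidth
            if (bw[n] > peak) peak = bw[n]
        }
        END {
            for (i = 1; i <= n; i++)
                if (bw[i] >= peak * pct / 100) { print size[i]; exit }
        }
    '
}
```

Usage, assuming the table is written to standard output: `ucx_perftest -d mlx5_0:1 -x rc_verbs -t put_bw -D zcopy -b ./msg_pow2 -f gpu6 | saturation_size 90`.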

UCP tests

Latency test
Script
#!/bin/bash
set -e

SERVER=gpu19
AFFINITY=14
SLEEP=1
export UCX_NET_DEVICES=mlx5_0:1

if [ "$HOSTNAME" == "$SERVER" ]
then
    echo Run as server
    SERVER=""
else
    echo Run as client
    # wait longer so the server is listening again before the client reconnects
    SLEEP=3
fi

# $SERVER is deliberately unquoted: on the server it is empty and drops out
# of the command line, on the client it is the server hostname.
for TEST in ucp_put_lat stream_lat tag_lat ucp_am_lat tag_sync_lat ucp_get ucp_fadd ucp_swap ucp_cswap
do
    echo "ucx_perftest -c $AFFINITY -t $TEST -f $SERVER"
    ucx_perftest -c "$AFFINITY" -t "$TEST" -f $SERVER
    sleep "$SLEEP"
done
Near CPU
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |        latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
ucx_perftest -c 14 -t ucp_put_lat -f gpu19
                     1000000      1.544     1.563     1.568        4.88       4.87      639737      637742
ucx_perftest -c 14 -t stream_lat -f gpu19
                     1000000      1.699     1.737     1.738        4.39       4.39      575720      575464
ucx_perftest -c 14 -t tag_lat -f gpu19
                     1000000      1.709     1.743     1.740        4.38       4.39      573575      574853
ucx_perftest -c 14 -t ucp_am_lat -f gpu19
                     1000000      1.725     1.752     1.753        4.36       4.35      570896      570548
ucx_perftest -c 14 -t tag_sync_lat -f gpu19
                     1000000      2.859     2.926     2.928        2.61       2.61      341716      341587
ucx_perftest -c 14 -t ucp_get -f gpu19
                     1000000      3.152     3.195     3.191        2.39       2.39      313023      313346
ucx_perftest -c 14 -t ucp_fadd -f gpu19
                     1000000      5.388     5.463     5.467        1.40       1.40      183040      182919
ucx_perftest -c 14 -t ucp_swap -f gpu19
                     1000000      5.377     5.465     5.462        1.40       1.40      182992      183092
ucx_perftest -c 14 -t ucp_cswap -f gpu19
                     1000000      5.388     5.474     5.470        1.39       1.39      182672      182812
Far CPU
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |        latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
ucx_perftest -c 0 -t ucp_put_lat -f gpu19
                     1000000      1.692     1.717     1.718        4.44       4.44      582274      582026
ucx_perftest -c 0 -t stream_lat -f gpu19
                     1000000      1.950     1.991     1.992        3.83       3.83      502375      501908
ucx_perftest -c 0 -t tag_lat -f gpu19
                     1000000      1.961     2.002     2.001        3.81       3.81      499584      499641
ucx_perftest -c 0 -t ucp_am_lat -f gpu19
                     1000000      2.028     2.061     2.071        3.70       3.68      485250      482817
ucx_perftest -c 0 -t tag_sync_lat -f gpu19
                     1000000      3.278     3.320     3.317        2.30       2.30      301190      301480
ucx_perftest -c 0 -t ucp_get -f gpu19
                     1000000      3.468     3.508     3.515        2.17       2.17      285037      284467
ucx_perftest -c 0 -t ucp_fadd -f gpu19
                     1000000      6.028     6.105     6.114        1.25       1.25      163802      163561
ucx_perftest -c 0 -t ucp_swap -f gpu19
                     1000000      6.026     6.101     6.124        1.25       1.25      163918      163287
ucx_perftest -c 0 -t ucp_cswap -f gpu19
                     1000000      6.039     6.103     6.130        1.25       1.24      163855      163123
Bandwidth test
Script
#!/bin/bash
set -e

SERVER=gpu19
AFFINITY=14
SLEEP=1
BATCH=/usr/share/ucx/perftest/msg_pow2
export UCX_NET_DEVICES=mlx5_0:1

if [ "$HOSTNAME" == "$SERVER" ]
then
    echo Run as server
    SERVER=""
else
    echo Run as client
    # wait longer so the server is listening again before the client reconnects
    SLEEP=3
fi

# $SERVER is deliberately unquoted: on the server it is empty and drops out
# of the command line, on the client it is the server hostname.
for TEST in tag_bw tag_sync_bw ucp_put_bw ucp_get stream_bw ucp_am_bw
do
    echo "ucx_perftest -c $AFFINITY -t $TEST -b $BATCH -f $SERVER"
    ucx_perftest -c "$AFFINITY" -t "$TEST" -b "$BATCH" -f $SERVER
    sleep "$SLEEP"
done
Near CPU
ucx_perftest -c 14 -t tag_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.199     0.212     0.212        4.50       4.50     4723653     4723653
         2           2000000      0.195     0.214     0.214        8.91       8.91     4670923     4670923
         4           2000000      0.189     0.203     0.203       18.75      18.75     4915248     4915248
         8           2000000      0.189     0.210     0.210       36.27      36.27     4753744     4753744
        12           2000000      0.189     0.207     0.207       64.46      64.46     4827770     4827770
        16           2000000      0.189     0.207     0.207       73.56      73.56     4820671     4820671
        24           2000000      0.188     0.205     0.205      111.43     111.43     4868490     4868490
        32           2000000      0.195     0.203     0.203      150.33     150.33     4925878     4925878
        40           2000000      0.206     0.218     0.218      175.25     175.25     4594204     4594204
        48           2000000      0.202     0.217     0.217      211.17     211.17     4613130     4613130
        64           2000000      0.210     0.226     0.226      270.20     270.20     4426914     4426914
        80           2000000      0.210     0.229     0.229      333.69     333.69     4373698     4373698
        96           2000000      0.201     0.218     0.218      419.30     419.30     4579822     4579822
       128           1400000      0.227     0.267     0.267      456.46     456.46     3739298     3739298
       256            700000      0.241     0.266     0.266      917.16     917.16     3756678     3756678
       300            700000      0.233     0.267     0.267     1071.18    1071.18     3744036     3744036
       512            300000      0.253     0.280     0.280     1746.63    1746.63     3577089     3577089
      1024            200000      0.285     0.335     0.335     2914.24    2914.24     2984187     2984187
      2048            100000      0.332     0.360     0.360     5424.88    5424.88     2777538     2777538
      3000            100000      0.320     0.404     0.404     7073.34    7073.34     2472313     2472313
      4096            100000      0.396     0.475     0.475     8229.77    8229.77     2106822     2106822
      6000            100000      0.496     0.605     0.605     9458.24    9458.24     1652947     1652947
      8192             80000      0.679     0.824     0.824     9476.89    9476.89     1213042     1213042
     10000             80000      0.856     1.059     1.059     9004.57    9004.57      944198      944198
     16384             40000      1.335     1.645     1.645     9496.77    9496.77      607793      607793
     25000             40000      2.255     2.598     2.598     9177.53    9177.53      384934      384934
     32768             20000      2.824     3.416     3.416     9148.12    9148.12      292740      292740
     45000             20000      0.335     4.132     4.132    10386.10   10386.10      242014      242014
     65536             10000      5.410     6.076     6.076    10286.53   10286.53      164584      164584
    100000             10000      0.330     9.202     9.202    10363.98   10363.98      108674      108674
    131072              5000      0.320    12.144    12.144    10293.32   10293.32       82347       82347
    262144              2500      0.317    24.180    24.180    10339.15   10339.15       41357       41357
    524288              1200      0.322    48.247    48.247    10363.39   10363.39       20727       20727
   1048576               600      0.315    97.349    97.349    10272.35   10272.35       10272       10272
   2097152               300      0.351   195.870   195.870    10210.83   10210.83        5105        5105
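The column groups are not independent: bandwidth and message rate both follow from the per-message average time. Take the last row of the table above: 2097152 bytes, 195.870 usec, 10210.83 MB/s. Those numbers match a "MB/s" column in units of 2^20 bytes per second rather than 10^6. The sketch below captures that relationship. It is inferred from the numbers here, not taken from the ucx_perftest source.

```python
def derived_metrics(size_bytes: int, avg_usec: float) -> tuple:
    """Return (bandwidth in 2**20 bytes/s, message rate in msg/s) for one row.

    Assumption: bandwidth = size / time, reported in units of 2**20 bytes/s.
    This is inferred from the table values, not from UCX documentation.
    """
    seconds = avg_usec * 1e-6          # per-message time in seconds
    bandwidth_mib_s = size_bytes / seconds / 2**20
    msg_rate = 1.0 / seconds           # one message per `seconds`
    return bandwidth_mib_s, msg_rate

# Last row of the table above: 2097152 bytes, 195.870 usec average overhead
bw, rate = derived_metrics(2097152, 195.870)
print(f"{bw:.1f} MB/s, {rate:.0f} msg/s")
```

The same formulas applied to the 1-byte row (0.212 usec) give 4.50 MB/s and roughly 4.7 million msg/s. Small differences from the printed message rate come from the rounded overhead values.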
ucx_perftest -c 14 -t tag_sync_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.205     0.306     0.306        3.12       3.12     3271550     3271550
         2           2000000      0.225     0.320     0.320        5.96       5.96     3126979     3126979
         4           2000000      0.218     0.314     0.314       12.15      12.15     3185601     3185601
         8           2000000      0.227     0.316     0.316       24.18      24.18     3168874     3168874
        12           2000000      0.229     0.324     0.324       41.17      41.17     3083290     3083290
        16           2000000      0.228     0.315     0.315       48.48      48.48     3177291     3177291
        24           2000000      0.225     0.317     0.317       72.12      72.12     3151006     3151006
        32           2000000      0.217     0.321     0.321       95.09      95.09     3115984     3115984
        40           2000000      0.228     0.321     0.321      118.66     118.66     3110628     3110628
        48           2000000      0.222     0.326     0.326      140.26     140.26     3064002     3064002
        64           2000000      0.222     0.322     0.322      189.79     189.79     3109598     3109598
        80           2000000      0.227     0.319     0.319      239.30     239.30     3136585     3136585
        96           2000000      0.230     0.329     0.329      277.97     277.97     3036214     3036214
       128           1400000      0.230     0.325     0.325      375.70     375.70     3077762     3077762
       256            700000      0.232     0.353     0.353      692.22     692.22     2835317     2835317
       300            700000      0.243     0.335     0.335      855.25     855.25     2989330     2989330
       512            300000      0.246     0.401     0.401     1216.45    1216.45     2491281     2491281
      1024            200000      0.298     0.432     0.432     2259.54    2259.54     2313767     2313767
      2048            100000      0.323     0.427     0.427     4577.82    4577.82     2343841     2343841
      3000            100000      0.322     0.477     0.477     6003.60    6003.60     2098411     2098411
      4096            100000      0.325     0.503     0.503     7762.35    7762.35     1987163     1987163
      6000            100000      0.488     0.632     0.632     9046.84    9046.84     1581050     1581050
      8192             80000      0.692     0.839     0.839     9311.14    9311.14     1191826     1191826
     10000             80000      0.906     1.066     1.066     8943.37    8943.37      937780      937780
     16384             40000      1.423     1.674     1.674     9334.75    9334.75      597424      597424
     25000             40000      2.301     2.632     2.632     9056.74    9056.74      379867      379867
     32768             20000      2.812     3.360     3.360     9300.87    9300.87      297628      297628
     45000             20000      0.322     4.149     4.149    10343.67   10343.67      241025      241025
     65536             10000      5.249     6.052     6.052    10327.18   10327.18      165235      165235
    100000             10000      0.325     9.207     9.207    10358.59   10358.59      108618      108618
    131072              5000      0.322    11.959    11.959    10452.35   10452.35       83619       83619
    262144              2500      0.316    23.897    23.897    10461.66   10461.66       41847       41847
    524288              1200      0.318    48.534    48.534    10302.00   10302.00       20604       20604
   1048576               600      0.328    97.277    97.277    10279.95   10279.95       10280       10280
   2097152               300      0.342   196.537   196.537    10176.19   10176.19        5088        5088
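Comparing many of these tables by hand is tedious. Each data row has nine whitespace-separated fields, in the column order of the header, so a saved ucx_perftest log can be parsed with a few lines of Python. The sketch below relies only on that layout. The names `PerfRow`, `parse_row` and `parse_log` are our own, not part of any UCX tooling.

```python
from typing import List, NamedTuple, Optional

class PerfRow(NamedTuple):
    size: int            # message size in bytes (first column, labelled "Test")
    iterations: int
    p50_usec: float      # 50th-percentile overhead/latency
    avg_usec: float
    overall_usec: float
    bw_avg: float        # "MB/s" columns
    bw_overall: float
    rate_avg: float      # msg/s columns
    rate_overall: float

def parse_row(line: str) -> Optional[PerfRow]:
    """Parse one data row of a ucx_perftest table; return None for anything else."""
    fields = line.split()
    if len(fields) != 9:
        return None  # table borders, header rows, blank lines
    try:
        return PerfRow(int(fields[0]), int(fields[1]), *map(float, fields[2:]))
    except ValueError:
        return None  # e.g. the "ucx_perftest ..." command line, which also has 9 tokens

def parse_log(text: str) -> List[PerfRow]:
    """Collect every data row from saved ucx_perftest output."""
    return [row for row in map(parse_row, text.splitlines()) if row is not None]
```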
ucx_perftest -c 14 -t ucp_put_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.065     0.268     0.268        3.56       3.56     3732429     3732429
         2           2000000      0.069     0.268     0.268        7.12       7.12     3731010     3731010
         4           2000000      0.065     0.270     0.270       14.12      14.12     3701401     3701401
         8           2000000      0.069     0.276     0.276       27.67      27.67     3626894     3626894
        12           2000000      0.072     0.272     0.272       49.12      49.12     3679101     3679101
        16           2000000      0.069     0.265     0.265       57.61      57.61     3775729     3775729
        24           2000000      0.069     0.267     0.267       85.63      85.63     3741023     3741023
        32           2000000      0.065     0.283     0.283      107.75     107.75     3530812     3530812
        40           2000000      0.065     0.275     0.275      138.58     138.58     3632737     3632737
        48           2000000      0.065     0.274     0.274      166.91     166.91     3646168     3646168
        64           2000000      0.065     0.283     0.283      215.67     215.67     3533600     3533600
        80           2000000      0.065     0.280     0.280      272.25     272.25     3568447     3568447
        96           2000000      0.069     0.277     0.277      330.07     330.07     3605260     3605260
       128           1400000      0.064     0.299     0.299      407.82     407.82     3340898     3340898
       256            700000      0.070     0.309     0.309      790.76     790.76     3238956     3238956
       300            700000      0.064     0.305     0.305      938.34     938.34     3279750     3279750
       512            300000      0.070     0.333     0.333     1466.38    1466.38     3003153     3003153
      1024            200000      0.070     0.363     0.363     2689.44    2689.44     2753984     2753984
      2048            100000      0.064     0.366     0.366     5337.85    5337.85     2732980     2732980
      3000            100000      0.068     0.424     0.424     6747.41    6747.41     2358391     2358391
      4096            100000      0.068     0.480     0.480     8141.40    8141.40     2084199     2084199
      6000            100000      0.068     0.581     0.581     9851.69    9851.69     1721708     1721708
      8192             80000      0.705     0.741     0.741    10546.72   10546.72     1349980     1349980
     10000             80000      0.065     1.013     1.013     9419.00    9419.00      987653      987653
     16384             40000      1.326     1.529     1.529    10218.25   10218.25      653968      653968
     25000             40000      2.023     2.314     2.314    10302.08   10302.08      432101      432101
     32768             20000      2.660     3.017     3.017    10357.98   10357.98      331455      331455
     45000             20000      3.638     4.139     4.139    10368.90   10368.90      241613      241613
     65536             10000      5.865     6.073     6.073    10292.10   10292.10      164674      164674
    100000             10000      9.338     9.212     9.212    10352.88   10352.88      108558      108558
    131072              5000     11.998    12.133    12.133    10302.13   10302.13       82417       82417
    262144              2500     24.182    24.306    24.306    10285.68   10285.68       41143       41143
    524288              1200     48.160    48.408    48.408    10328.81   10328.81       20658       20658
   1048576               600     97.003    97.600    97.600    10245.92   10245.92       10246       10246
   2097152               300    195.280   196.816   196.816    10161.77   10161.77        5081        5081
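A bandwidth test needs a message size large enough to saturate the link, which is what -s (or a -b file) is for. The helper below picks the smallest size that reaches a chosen fraction of the peak. It is our own sketch, not part of ucx_perftest. The rows are overall bandwidth values copied from the ucp_put_bw table above.

```python
def saturation_size(rows, fraction=0.9):
    """Return the smallest message size whose bandwidth reaches `fraction` of the peak.

    rows: list of (size_bytes, bandwidth) pairs sorted by size.
    """
    peak = max(bw for _, bw in rows)
    for size, bw in rows:
        if bw >= fraction * peak:
            return size
    return None  # only reachable if fraction > 1

# (size in bytes, overall bandwidth in MB/s) from the ucp_put_bw table
UCP_PUT_BW = [(2048, 5337.85), (3000, 6747.41), (4096, 8141.40), (6000, 9851.69),
              (8192, 10546.72), (10000, 9419.00), (16384, 10218.25),
              (65536, 10292.10), (2097152, 10161.77)]

print(saturation_size(UCP_PUT_BW, 0.9), saturation_size(UCP_PUT_BW, 0.95))
```

With these values, 90% of the peak is first reached at 6000 bytes and 95% at 8192 bytes. The threshold fraction itself is an arbitrary choice.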
ucx_perftest -c 14 -t ucp_get -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |        latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      3.115     3.160     3.170        0.30       0.30      316456      315419
         2           2000000      3.120     3.197     3.195        0.60       0.60      312774      312965
         4           2000000      3.148     3.192     3.173        1.20       1.20      313319      315144
         8           2000000      3.158     3.192     3.197        2.39       2.39      313309      312745
        12           2000000      3.161     3.207     3.212        4.16       4.16      311784      311379
        16           2000000      3.155     3.192     3.192        4.78       4.78      313258      313268
        24           2000000      3.178     3.223     3.211        7.10       7.13      310231      311456
        32           2000000      3.157     3.193     3.184        9.56       9.58      313158      314047
        40           2000000      3.188     3.223     3.226       11.84      11.83      310288      310025
        48           2000000      3.151     3.193     3.200       14.34      14.30      313177      312497
        64           2000000      3.167     3.204     3.212       19.05      19.00      312088      311357
        80           2000000      3.205     3.260     3.250       23.41      23.47      306782      307676
        96           2000000      3.179     3.233     3.220       28.32      28.43      309356      310529
       128           1400000      3.164     3.251     3.253       37.55      37.52      307623      307365
       256            700000      3.287     3.317     3.314       73.60      73.66      301458      301707
       300            700000      3.325     3.350     3.343       85.39      85.57      298467      299099
       512            300000      3.335     3.386     3.378      144.20     144.53      295326      296006
      1024            200000      3.502     3.577     3.577      273.04     273.04      279590      279590
      2048            100000      3.866     3.926     3.926      497.46     497.46      254699      254699
      3000            100000      3.972     4.047     4.047      707.00     707.00      247113      247113
      4096            100000      4.242     4.294     4.294      909.62     909.62      232863      232863
      6000            100000      4.707     4.780     4.780     1197.13    1197.13      209213      209213
      8192             80000      5.026     5.105     5.105     1530.33    1530.33      195883      195883
     10000             80000      5.391     5.447     5.447     1750.79    1750.79      183584      183584
     16384             40000      6.384     6.450     6.450     2422.45    2422.45      155037      155037
     25000             40000      7.718     7.916     7.916     3011.75    3011.75      126322      126322
     32768             20000      8.909     9.005     9.005     3470.41    3470.41      111053      111053
     45000             20000      9.808     9.868     9.868     4348.92    4348.92      101337      101337
     65536             10000     11.138    11.242    11.242     5559.71    5559.71       88955       88955
    100000             10000     14.618    15.068    15.068     6329.14    6329.14       66366       66366
    131072              5000     17.377    17.463    17.463     7158.16    7158.16       57265       57265
    262144              2500     29.586    29.548    29.548     8460.68    8460.68       33843       33843
    524288              1200     54.075    54.020    54.020     9255.82    9255.82       18512       18512
   1048576               600    102.472   102.897   102.897     9718.49    9718.49        9718        9718
   2097152               300    202.672   203.467   203.467     9829.59    9829.59        4915        4915
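For ucp_get, latency grows roughly linearly with message size. A common way to summarize this is the latency/bandwidth model T(n) ≈ α + n/β. A least-squares fit over a few rows of the table above yields a fixed cost α and an asymptotic bandwidth β. Both the model and the sample rows are our choice; ucx_perftest does not report either quantity. Because the large messages dominate the fit, α may come out above the measured 1-byte latency.

```python
def fit_alpha_beta(samples):
    """Least-squares fit of latency_usec = alpha + slope * size_bytes.

    Returns (alpha_usec, bandwidth) where bandwidth = 1 / slope,
    converted from bytes/usec to units of 2**20 bytes/s.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    slope = sxy / sxx                   # usec per byte
    alpha = mean_y - slope * mean_x     # usec, fixed per-operation cost
    bandwidth_mib_s = (1.0 / slope) * 1e6 / 2**20
    return alpha, bandwidth_mib_s

# (size in bytes, average latency in usec) from the ucp_get table
UCP_GET = [(1, 3.160), (4096, 4.294), (65536, 11.242),
           (262144, 29.548), (1048576, 102.897), (2097152, 203.467)]

alpha, beta = fit_alpha_beta(UCP_GET)
print(f"alpha ~ {alpha:.2f} usec, beta ~ {beta:.0f} MB/s")
```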
ucx_perftest -c 14 -t stream_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.190     0.198     0.198        4.80       4.80     5038176     5038176
         2           2000000      0.189     0.191     0.191        9.99       9.99     5239167     5239167
         4           2000000      0.183     0.191     0.191       20.01      20.01     5246313     5246313
         8           2000000      0.184     0.193     0.193       39.50      39.50     5177699     5177699
        12           2000000      0.184     0.198     0.198       67.39      67.39     5047179     5047179
        16           2000000      0.184     0.194     0.194       78.79      78.79     5163596     5163596
        24           2000000      0.183     0.194     0.194      118.20     118.20     5164407     5164407
        32           2000000      0.183     0.199     0.199      153.64     153.64     5034324     5034324
        40           2000000      0.195     0.202     0.202      188.53     188.53     4942272     4942272
        48           2000000      0.198     0.202     0.202      226.94     226.94     4957636     4957636
        64           2000000      0.203     0.205     0.205      298.46     298.46     4889928     4889928
        80           2000000      0.203     0.205     0.205      372.33     372.33     4880168     4880168
        96           2000000      0.197     0.212     0.212      432.10     432.10     4719698     4719698
       128           1400000      0.220     0.237     0.237      515.43     515.43     4222441     4222441
       256            700000      0.232     0.251     0.251      970.90     970.90     3976796     3976796
       300            700000      0.233     0.248     0.248     1153.39    1153.39     4031373     4031373
       512            300000      0.251     0.295     0.295     1653.79    1653.79     3386955     3386955
      1024            200000      0.268     0.293     0.293     3330.81    3330.81     3410752     3410752
      2048            100000      0.355     0.376     0.376     5191.18    5191.18     2657886     2657886
      3000            100000      3.915     3.977     3.977      719.39     719.39      251445      251445
      4096            100000      4.233     4.289     4.289      910.71     910.71      233142      233142
      6000            100000      4.515     4.576     4.576     1250.38    1250.38      218519      218519
      8192             80000      4.865     4.919     4.919     1588.22    1588.22      203293      203293
     10000             80000      5.687     5.765     5.765     1654.39    1654.39      173475      173475
     16384             40000      6.953     7.060     7.060     2213.18    2213.18      141644      141644
     25000             40000      8.488     8.612     8.612     2768.55    2768.55      116121      116121
     32768             20000      9.293     9.382     9.382     3330.90    3330.90      106589      106589
     45000             20000     10.422    10.561    10.561     4063.61    4063.61       94689       94689
     65536             10000     12.192    12.458    12.458     5016.98    5016.98       80272       80272
    100000             10000     15.582    15.650    15.650     6093.65    6093.65       63897       63897
    131072              5000     19.750    19.884    19.884     6286.59    6286.59       50293       50293
    262144              2500     33.013    33.308    33.308     7505.70    7505.70       30023       30023
    524288              1200     60.050    60.697    60.697     8237.59    8237.59       16475       16475
   1048576               600    114.637   115.666   115.666     8645.55    8645.55        8646        8646
   2097152               300    223.855   225.724   225.724     8860.39    8860.39        4430        4430
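In the stream_bw table, bandwidth falls from 5191.18 MB/s at 2048 bytes to 719.39 MB/s at 3000 bytes. Over the same step the per-message time jumps from 0.376 to 3.977 usec. This probably reflects a change in how UCX transfers messages of that size, for example a protocol threshold. The output alone does not show the cause. The hypothetical helper below flags such steps in any of these tables.

```python
def largest_drop(rows):
    """Return (size_before, size_after, ratio) for the steepest bandwidth drop
    between consecutive message sizes, where ratio = bw_after / bw_before."""
    worst = None
    for (s0, bw0), (s1, bw1) in zip(rows, rows[1:]):
        ratio = bw1 / bw0
        if worst is None or ratio < worst[2]:
            worst = (s0, s1, ratio)
    return worst

# (size in bytes, overall bandwidth in MB/s) around the step in the stream_bw table
STREAM_BW = [(1024, 3330.81), (2048, 5191.18), (3000, 719.39),
             (4096, 910.71), (6000, 1250.38)]

print(largest_drop(STREAM_BW))
```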
ucx_perftest -c 14 -t ucp_am_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.202     0.233     0.233        4.09       4.09     4290106     4290106
         2           2000000      0.205     0.232     0.232        8.22       8.22     4308654     4308654
         4           2000000      0.202     0.227     0.227       16.79      16.79     4400304     4400304
         8           2000000      0.202     0.227     0.227       33.64      33.64     4408947     4408947
        12           2000000      0.215     0.227     0.227       58.78      58.78     4402618     4402618
        16           2000000      0.205     0.237     0.237       64.27      64.27     4212219     4212219
        24           2000000      0.212     0.226     0.226      101.36     101.36     4428393     4428393
        32           2000000      0.204     0.225     0.225      135.69     135.69     4446241     4446241
        40           2000000      0.223     0.238     0.238      160.60     160.60     4210047     4210047
        48           2000000      0.228     0.245     0.245      186.99     186.99     4084791     4084791
        64           2000000      0.217     0.244     0.244      249.95     249.95     4095240     4095240
        80           2000000      0.216     0.248     0.248      307.78     307.78     4034121     4034121
        96           2000000      0.216     0.239     0.239      382.84     382.84     4181642     4181642
       128           1400000      0.238     0.266     0.266      458.78     458.78     3758308     3758308
       256            700000      0.251     0.303     0.303      805.90     805.90     3300968     3300968
       300            700000      0.246     0.277     0.277     1031.73    1031.73     3606164     3606164
       512            300000      0.252     0.303     0.303     1610.94    1610.94     3299198     3299198
      1024            200000      0.279     0.334     0.334     2922.35    2922.35     2992490     2992490
      2048            100000      0.318     0.366     0.366     5340.50    5340.50     2734334     2734334
      3000            100000      0.340     0.440     0.440     6507.95    6507.95     2274692     2274692
      4096            100000      0.352     0.495     0.495     7883.63    7883.63     2018210     2018210
      6000            100000      0.486     0.605     0.605     9457.76    9457.76     1652863     1652863
      8192             80000      0.681     0.823     0.823     9493.43    9493.43     1215159     1215159
     10000             80000      0.877     1.149     1.149     8302.76    8302.76      870608      870608
     16384             40000      1.354     1.684     1.684     9278.65    9278.65      593833      593833
     25000             40000      2.498     2.758     2.758     8645.25    8645.25      362608      362608
     32768             20000      3.122     3.463     3.463     9023.55    9023.55      288754      288754
     45000             20000      0.326     4.140     4.140    10365.65   10365.65      241537      241537
     65536             10000      4.536     6.080     6.080    10280.28   10280.28      164484      164484
    100000             10000      0.372     9.173     9.173    10396.20   10396.20      109012      109012
    131072              5000      0.332    12.079    12.079    10348.37   10348.37       82787       82787
    262144              2500      0.327    24.113    24.113    10367.77   10367.77       41471       41471
    524288              1200      0.337    48.441    48.441    10321.90   10321.90       20644       20644
   1048576               600      0.334    97.113    97.113    10297.23   10297.23       10297       10297
   2097152               300      0.364   195.900   195.900    10209.30   10209.30        5105        5105
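Comparing these results with a link rate quoted in Gbit/s requires converting the bandwidth column. The sketch below assumes "MB/s" means 2^20 bytes per second, which the large-message rows support: bandwidth times overhead reproduces the message size. Under that assumption, the highest ucp_am_bw value converts as follows.

```python
def mib_s_to_gbit_s(mib_per_s: float) -> float:
    """Convert 2**20 bytes per second to 10**9 bits per second."""
    return mib_per_s * 2**20 * 8 / 1e9

# Highest overall bandwidth in the ucp_am_bw table (100000-byte messages)
peak = mib_s_to_gbit_s(10396.20)
print(f"{peak:.1f} Gbit/s")
```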
Far CPU (client pinned with -c 0)
ucx_perftest -c 0 -t tag_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.252     0.268     0.268        3.56       3.56     3735971     3735971
         2           2000000      0.248     0.267     0.267        7.14       7.14     3740898     3740898
         4           2000000      0.247     0.265     0.265       14.38      14.38     3769936     3769936
         8           2000000      0.248     0.272     0.272       28.00      28.00     3670041     3670041
        12           2000000      0.252     0.274     0.274       48.71      48.71     3648490     3648490
        16           2000000      0.246     0.265     0.265       57.53      57.53     3770412     3770412
        24           2000000      0.246     0.262     0.262       87.35      87.35     3816510     3816510
        32           2000000      0.248     0.272     0.272      112.04     112.04     3671187     3671187
        40           2000000      0.262     0.277     0.277      137.91     137.91     3615270     3615270
        48           2000000      0.261     0.285     0.285      160.57     160.57     3507632     3507632
        64           2000000      0.270     0.282     0.282      216.48     216.48     3546830     3546830
        80           2000000      0.258     0.281     0.281      271.36     271.36     3556820     3556820
        96           2000000      0.268     0.281     0.281      325.52     325.52     3555524     3555524
       128           1400000      0.289     0.316     0.316      386.66     386.66     3167491     3167491
       256            700000      0.300     0.333     0.333      733.10     733.10     3002783     3002783
       300            700000      0.290     0.324     0.324      882.82     882.82     3085671     3085671
       512            300000      0.309     0.364     0.364     1342.53    1342.53     2749498     2749498
      1024            200000      0.333     0.414     0.414     2356.34    2356.34     2412891     2412891
      2048            100000      0.389     0.438     0.438     4461.63    4461.63     2284355     2284355
      3000            100000      0.368     0.513     0.513     5572.58    5572.58     1947759     1947759
      4096            100000      0.382     0.600     0.600     6505.64    6505.64     1665444     1665444
      6000            100000      0.584     0.766     0.766     7470.03    7470.03     1305482     1305482
      8192             80000      0.710     0.911     0.911     8580.09    8580.09     1098251     1098251
     10000             80000      0.858     1.240     1.240     7691.70    7691.70      806533      806533
     16384             40000      1.386     1.811     1.811     8628.18    8628.18      552204      552204
     25000             40000      2.468     2.887     2.887     8259.57    8259.57      346431      346431
     32768             20000      3.256     3.670     3.670     8514.07    8514.07      272450      272450
     45000             20000      0.377     4.207     4.207    10201.19   10201.19      237705      237705
     65536             10000      5.522     6.166     6.166    10135.56   10135.56      162169      162169
    100000             10000      0.366     9.304     9.304    10249.81   10249.81      107477      107477
    131072              5000      0.385    12.349    12.349    10122.45   10122.45       80980       80980
    262144              2500      0.376    24.497    24.497    10205.44   10205.44       40822       40822
    524288              1200      0.375    49.280    49.280    10146.12   10146.12       20292       20292
   1048576               600      0.366    98.275    98.275    10175.53   10175.53       10176       10176
   2097152               300      0.411   195.463   195.463    10232.09   10232.09        5116        5116
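The far-CPU runs repeat the same client commands with only the -c core changed: 14 for the near runs, 0 for these. The sketch below prints the whole matrix of client command lines. The test list, core numbers, message-size file and server name are copied from this section and should be adjusted for other machines. The script only builds the strings; every client still needs a ucx_perftest server running on gpu19.

```python
import shlex

# Copied from the command lines in this section; adjust for other machines.
MSG_FILE = "/usr/share/ucx/perftest/msg_pow2"
SERVER = "gpu19"
TESTS = ["tag_bw", "tag_sync_bw", "ucp_put_bw", "ucp_get", "stream_bw", "ucp_am_bw"]
CORES = {"near": 14, "far": 0}  # -c values used for the near- and far-CPU runs

def client_commands(tests=TESTS, cores=CORES, msg_file=MSG_FILE, server=SERVER):
    """Return (label, command line) for every core/test combination."""
    cmds = []
    for label, core in cores.items():
        for test in tests:
            argv = ["ucx_perftest", "-c", str(core), "-t", test, "-b", msg_file, "-f", server]
            cmds.append((label, shlex.join(argv)))  # shlex.join needs Python 3.8+
    return cmds

for label, cmd in client_commands():
    print(f"[{label}] {cmd}")
```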
ucx_perftest -c 0 -t tag_sync_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.271     0.438     0.438        2.18       2.18     2284104     2284104
         2           2000000      0.278     0.438     0.438        4.35       4.35     2281912     2281912
         4           2000000      0.272     0.436     0.436        8.76       8.76     2295597     2295597
         8           2000000      0.275     0.433     0.433       17.61      17.61     2307780     2307780
        12           2000000      0.275     0.445     0.445       30.03      30.03     2248990     2248990
        16           2000000      0.275     0.437     0.437       34.95      34.95     2290790     2290790
        24           2000000      0.276     0.439     0.439       52.13      52.13     2277782     2277782
        32           2000000      0.265     0.439     0.439       69.49      69.49     2277074     2277074
        40           2000000      0.284     0.441     0.441       86.56      86.56     2269156     2269156
        48           2000000      0.286     0.447     0.447      102.43     102.43     2237527     2237527
        64           2000000      0.277     0.439     0.439      138.95     138.95     2276600     2276600
        80           2000000      0.278     0.449     0.449      169.91     169.91     2227092     2227092
        96           2000000      0.282     0.444     0.444      205.99     205.99     2249982     2249982
       128           1400000      0.287     0.450     0.450      271.05     271.05     2220418     2220418
       256            700000      0.295     0.466     0.466      523.49     523.49     2144214     2144214
       300            700000      0.303     0.466     0.466      613.39     613.39     2143951     2143951
       512            300000      0.311     0.505     0.505      967.83     967.83     1982107     1982107
      1024            200000      0.345     0.551     0.551     1773.34    1773.34     1815901     1815901
      2048            100000      0.386     0.580     0.580     3367.11    3367.11     1723958     1723958
      3000            100000      0.365     0.670     0.670     4269.79    4269.79     1492401     1492401
      4096            100000      0.369     0.760     0.760     5137.37    5137.37     1315167     1315167
      6000            100000      0.631     0.899     0.899     6367.67    6367.67     1112831     1112831
      8192             80000      0.774     1.060     1.060     7370.99    7370.99      943486      943486
     10000             80000      1.064     1.391     1.391     6854.24    6854.24      718720      718720
     16384             40000      1.721     2.011     2.011     7769.00    7769.00      497216      497216
     25000             40000      2.809     3.157     3.157     7551.94    7551.94      316751      316751
     32768             20000      3.528     3.848     3.848     8120.69    8120.69      259862      259862
     45000             20000      0.364     4.252     4.252    10091.81   10091.81      235156      235156
     65536             10000      0.392     6.171     6.171    10128.66   10128.66      162059      162059
    100000             10000      0.371     9.330     9.330    10221.47   10221.47      107180      107180
    131072              5000      0.363    12.485    12.485    10012.34   10012.34       80099       80099
    262144              2500      0.378    24.511    24.511    10199.60   10199.60       40798       40798
    524288              1200      0.375    49.322    49.322    10137.37   10137.37       20275       20275
   1048576               600      0.383    98.492    98.492    10153.16   10153.16       10153       10153
   2097152               300      0.534   200.003   200.003     9999.85    9999.85        5000        5000
ucx_perftest -c 0 -t ucp_put_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.065     0.314     0.314        3.03       3.03     3180206     3180206
         2           2000000      0.069     0.318     0.318        6.01       6.01     3148402     3148402
         4           2000000      0.065     0.315     0.315       12.11      12.11     3175101     3175101
         8           2000000      0.065     0.317     0.317       24.08      24.08     3155574     3155574
        12           2000000      0.065     0.322     0.322       41.48      41.48     3106766     3106766
        16           2000000      0.065     0.320     0.320       47.71      47.71     3126841     3126841
        24           2000000      0.072     0.318     0.318       71.95      71.95     3143358     3143358
        32           2000000      0.065     0.332     0.332       91.90      91.90     3011472     3011472
        40           2000000      0.065     0.330     0.330      115.61     115.61     3030533     3030533
        48           2000000      0.065     0.333     0.333      137.44     137.44     3002452     3002452
        64           2000000      0.069     0.331     0.331      184.43     184.43     3021650     3021650
        80           2000000      0.065     0.328     0.328      232.81     232.81     3051474     3051474
        96           2000000      0.065     0.328     0.328      279.40     279.40     3051841     3051841
       128           1400000      0.070     0.363     0.363      336.69     336.69     2758142     2758142
       256            700000      0.068     0.374     0.374      653.38     653.38     2676251     2676251
       300            700000      0.068     0.370     0.370      773.99     773.99     2705304     2705304
       512            300000      0.064     0.380     0.380     1283.70    1283.70     2629020     2629020
      1024            200000      0.068     0.416     0.416     2344.89    2344.89     2401163     2401163
      2048            100000      0.068     0.513     0.513     3808.67    3808.67     1950041     1950041
      3000            100000      0.065     0.546     0.546     5244.00    5244.00     1832910     1832910
      4096            100000      0.065     0.630     0.630     6200.78    6200.78     1587399     1587399
      6000            100000      0.068     0.759     0.759     7537.36    7537.36     1317249     1317249
      8192             80000      0.068     0.907     0.907     8612.85    8612.85     1102444     1102444
     10000             80000      0.068     1.187     1.187     8034.83    8034.83      842513      842513
     16384             40000      1.330     1.545     1.545    10111.43   10111.43      647132      647132
     25000             40000      2.028     2.439     2.439     9776.06    9776.06      410037      410037
     32768             20000      2.655     3.176     3.176     9840.65    9840.65      314901      314901
     45000             20000      3.653     4.230     4.230    10145.45   10145.45      236406      236406
     65536             10000      6.145     6.217     6.217    10052.42   10052.42      160839      160839
    100000             10000      9.558     9.481     9.481    10058.92   10058.92      105475      105475
    131072              5000     12.242    12.430    12.430    10056.32   10056.32       80451       80451
    262144              2500     24.788    25.098    25.098     9960.94    9960.94       39844       39844
    524288              1200     48.920    49.229    49.229    10156.60   10156.60       20313       20313
   1048576               600     97.599    98.038    98.038    10200.11   10200.11       10200       10200
   2097152               300    198.453   200.047   200.047     9997.67    9997.67        4999        4999
ucx_perftest -c 0 -t ucp_get -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |        latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      3.473     3.526     3.543        0.27       0.27      283635      282234
         2           2000000      3.506     3.557     3.546        0.54       0.54      281150      281971
         4           2000000      3.540     3.580     3.571        1.07       1.07      279300      280024
         8           2000000      3.530     3.591     3.586        2.12       2.13      278446      278859
        12           2000000      3.555     3.593     3.591        3.72       3.72      278317      278440
        16           2000000      3.530     3.591     3.573        4.25       4.27      278495      279852
        24           2000000      3.561     3.592     3.586        6.37       6.38      278384      278874
        32           2000000      3.519     3.567     3.538        8.56       8.63      280337      282678
        40           2000000      3.491     3.523     3.525       10.83      10.82      283863      283720
        48           2000000      3.480     3.548     3.510       12.90      13.04      281843      284903
        64           2000000      3.445     3.498     3.485       17.45      17.51      285855      286923
        80           2000000      3.589     3.622     3.619       21.07      21.08      276111      276298
        96           2000000      3.558     3.594     3.586       25.47      25.53      278215      278888
       128           1400000      3.522     3.570     3.567       34.20      34.22      280145      280332
       256            700000      3.588     3.640     3.645       67.07      66.98      274703      274349
       300            700000      3.647     3.688     3.688       77.58      77.59      271156      271183
       512            300000      3.660     3.719     3.715      131.31     131.42      268921      269153
      1024            200000      3.892     3.922     3.922      248.97     248.97      254943      254943
      2048            100000      4.126     4.164     4.164      469.07     469.07      240163      240163
      3000            100000      4.365     4.418     4.418      647.65     647.65      226370      226370
      4096            100000      4.788     4.830     4.830      808.67     808.67      207019      207019
      6000            100000      5.295     5.337     5.337     1072.17    1072.17      187375      187375
      8192             80000      5.699     5.754     5.754     1357.86    1357.86      173806      173806
     10000             80000      5.617     5.767     5.767     1653.72    1653.72      173405      173405
     16384             40000      6.635     6.726     6.726     2323.18    2323.18      148683      148683
     25000             40000      7.788     7.838     7.838     3041.73    3041.73      127580      127580
     32768             20000      8.810     8.851     8.851     3530.71    3530.71      112983      112983
     45000             20000      9.619     9.752     9.752     4400.76    4400.76      102545      102545
     65536             10000     11.397    11.488    11.488     5440.46    5440.46       87047       87047
    100000             10000     14.592    15.092    15.092     6319.04    6319.04       66260       66260
    131072              5000     17.308    17.506    17.506     7140.32    7140.32       57123       57123
    262144              2500     30.098    30.190    30.190     8280.89    8280.89       33124       33124
    524288              1200     55.179    55.641    55.641     8986.22    8986.22       17972       17972
   1048576               600    105.772   105.985   105.985     9435.30    9435.30        9435        9435
   2097152               300    208.972   209.947   209.947     9526.23    9526.23        4763        4763
ucx_perftest -c 0 -t stream_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.246     0.255     0.255        3.75       3.75     3928763     3928763
         2           2000000      0.237     0.242     0.242        7.88       7.88     4129884     4129884
         4           2000000      0.237     0.252     0.252       15.13      15.13     3967096     3967096
         8           2000000      0.243     0.245     0.245       31.11      31.11     4077647     4077647
        12           2000000      0.238     0.245     0.245       54.41      54.41     4074988     4074988
        16           2000000      0.243     0.254     0.254       60.03      60.03     3934003     3934003
        24           2000000      0.246     0.245     0.245       93.47      93.47     4083641     4083641
        32           2000000      0.244     0.247     0.247      123.80     123.80     4056524     4056524
        40           2000000      0.250     0.265     0.265      143.91     143.91     3772524     3772524
        48           2000000      0.255     0.258     0.258      177.11     177.11     3869100     3869100
        64           2000000      0.255     0.265     0.265      230.18     230.18     3771264     3771264
        80           2000000      0.249     0.262     0.262      291.49     291.49     3820556     3820556
        96           2000000      0.248     0.255     0.255      359.15     359.15     3922869     3922869
       128           1400000      0.274     0.294     0.294      415.13     415.13     3400765     3400765
       256            700000      0.295     0.335     0.335      728.56     728.56     2984171     2984171
       300            700000      0.278     0.312     0.312      918.07     918.07     3208904     3208904
       512            300000      0.298     0.342     0.342     1427.21    1427.21     2922921     2922921
      1024            200000      0.332     0.377     0.377     2588.53    2588.53     2650655     2650655
      2048            100000      0.362     0.540     0.540     3614.70    3614.70     1850728     1850728
      3000            100000      4.201     4.241     4.241      674.65     674.65      235808      235808
      4096            100000      4.501     4.546     4.546      859.28     859.28      219977      219977
      6000            100000      4.822     4.869     4.869     1175.15    1175.15      205372      205372
      8192             80000      5.188     5.231     5.231     1493.36    1493.36      191150      191150
     10000             80000      6.102     6.134     6.134     1554.65    1554.65      163016      163016
     16384             40000      7.405     7.493     7.493     2085.17    2085.17      133451      133451
     25000             40000      9.114     9.197     9.197     2592.46    2592.46      108736      108736
     32768             20000      9.715     9.777     9.777     3196.31    3196.31      102282      102282
     45000             20000     10.911    10.981    10.981     3908.29    3908.29       91070       91070
     65536             10000     12.558    12.670    12.670     4932.91    4932.91       78926       78926
    100000             10000     18.098    18.201    18.201     5239.59    5239.59       54941       54941
    131072              5000     20.839    20.957    20.957     5964.66    5964.66       47717       47717
    262144              2500     35.423    35.692    35.692     7004.37    7004.37       28017       28017
    524288              1200     65.245    65.493    65.493     7634.38    7634.38       15269       15269
   1048576               600    126.844   158.038   158.038     6327.59    6327.59        6328        6328
   2097152               300    244.375   271.827   271.827     7357.63    7357.63        3679        3679
ucx_perftest -c 0 -t ucp_am_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.257     0.288     0.288        3.31       3.31     3472662     3472662
         2           2000000      0.255     0.282     0.282        6.76       6.76     3544357     3544357
         4           2000000      0.254     0.285     0.285       13.37      13.37     3505057     3505057
         8           2000000      0.262     0.280     0.280       27.22      27.22     3567459     3567459
        12           2000000      0.262     0.288     0.288       46.41      46.41     3476218     3476218
        16           2000000      0.255     0.288     0.288       52.96      52.96     3470813     3470813
        24           2000000      0.255     0.280     0.280       81.85      81.85     3576173     3576173
        32           2000000      0.255     0.281     0.281      108.55     108.55     3556940     3556940
        40           2000000      0.277     0.300     0.300      127.30     127.30     3337187     3337187
        48           2000000      0.271     0.295     0.295      154.94     154.94     3384657     3384657
        64           2000000      0.267     0.292     0.292      209.13     209.13     3426447     3426447
        80           2000000      0.278     0.299     0.299      255.34     255.34     3346832     3346832
        96           2000000      0.273     0.298     0.298      307.07     307.07     3354039     3354039
       128           1400000      0.282     0.331     0.331      369.07     369.07     3023387     3023387
       256            700000      0.288     0.338     0.338      721.82     721.82     2956579     2956579
       300            700000      0.287     0.336     0.336      850.32     850.32     2972097     2972097
       512            300000      0.309     0.413     0.413     1182.39    1182.39     2421542     2421542
      1024            200000      0.333     0.418     0.418     2336.19    2336.19     2392261     2392261
      2048            100000      0.382     0.446     0.446     4379.39    4379.39     2242248     2242248
      3000            100000      0.385     0.595     0.595     4805.94    4805.94     1679798     1679798
      4096            100000      0.385     0.544     0.544     7174.79    7174.79     1836747     1836747
      6000            100000      0.592     0.679     0.679     8424.19    8424.19     1472233     1472233
      8192             80000      0.733     0.899     0.899     8688.64    8688.64     1112146     1112146
     10000             80000      0.973     1.313     1.313     7264.01    7264.01      761687      761687
     16384             40000      1.435     1.861     1.861     8394.44    8394.44      537244      537244
     25000             40000      2.762     2.999     2.999     7949.14    7949.14      333411      333411
     32768             20000      3.542     3.825     3.825     8169.41    8169.41      261421      261421
     45000             20000      0.378     4.240     4.240    10121.66   10121.66      235852      235852
     65536             10000      0.409     6.119     6.119    10214.74   10214.74      163436      163436
    100000             10000      0.393     9.332     9.332    10219.28   10219.28      107157      107157
    131072              5000      0.375    12.156    12.156    10282.82   10282.82       82263       82263
    262144              2500      0.371    24.673    24.673    10132.62   10132.62       40530       40530
    524288              1200      0.380    49.301    49.301    10141.78   10141.78       20284       20284
   1048576               600      0.373    98.927    98.927    10108.50   10108.50       10109       10109
   2097152               300      0.387   199.137   199.137    10043.35   10043.35        5022        5022
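Statements such as "needs 512 KB messages to saturate the link" can be read off these tables mechanically. Below is a minimal sketch. It assumes the data rows look exactly like the ones above: nine whitespace-separated numeric columns, in the order size, iterations, three overhead/latency columns, two bandwidth columns, two message-rate columns. The helper name `first_saturating_size` and the threshold values are illustrative choices. They are not part of ucx_perftest.

```shell
#!/bin/bash
# first_saturating_size THRESHOLD < ucx_perftest_output
# Hypothetical helper: prints the smallest message size whose overall
# bandwidth (column 7) reaches THRESHOLD x the peak overall bandwidth of the
# same table. Assumes the 9-column data rows shown above; header and border
# lines are skipped because their first field is not a number.
first_saturating_size() {
    awk -v thr="$1" '
        BEGIN { n = 0; peak = 0 }
        NF == 9 && $1 ~ /^[0-9]+$/ {
            size[n] = $1; bw[n] = $7; n++
            if ($7 + 0 > peak) peak = $7 + 0
        }
        END {
            for (i = 0; i < n; i++)
                if (bw[i] + 0 >= thr * peak) { print size[i]; exit }
        }'
}
```

For example, piping the full ucp_put_bw table above through `first_saturating_size 0.95` gives 16384. The peak is 10200.11 MB/s, and 16384 bytes is the first size at or above 95% of it.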

25G RoCE Ethernet NIC tests

Conclusions first

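  • RoCE latency is about 0.5 us higher than IB, which matches the expectation that Ethernet latency is slightly higher than IB's.
  • In the perftest tests, using a CPU that is NOT affine to the NIC raises native RoCE latency by ~170 ns (IB: ~160 ns).
  • In the perftest tests, using a CPU that is NOT affine to the NIC lowers the native RoCE small-message rate (MsgRate[Mpps]) by ~22% (IB: ~26%).
  • In the perftest tests, using a CPU that is NOT affine to the NIC has almost no effect on native RoCE (large-message) bandwidth (IB is not affected either).
  • In the latency tests, the (small-message) latency of both UCT and UCP is almost unchanged compared with native RoCE.
  • In the latency tests, the UCT (small-message) message rate is lower than native RoCE. The drop is ~15% (IB: ~20%) on a NIC-affine CPU and ~20% (IB: ~29%) on a CPU that is NOT affine to the NIC.
  • In the latency tests, the UCP (small-message) message rate is lower than native RoCE. The drop is ~30% (IB: ~28%) on a NIC-affine CPU and ~28% (IB: ~25%) on a CPU that is NOT affine to the NIC.
  • In the UCT latency tests, put_lat has the lowest latency, am_lat is slightly higher, and add_lat is the highest, about 1 us above the others (same as IB).
  • In the UCT latency tests, using a CPU that is NOT affine to the NIC degrades UCT small-message latency, message rate (MsgRate[Mpps]) and bandwidth by ~26% (IB: ~10%).
  • In the UCT bandwidth tests, the maximum bandwidth UCT reaches is the same as native RoCE (same as IB).
  • In the UCT bandwidth tests, bcopy saturates the link at ~1 KB (IB: ~4 KB). However, the largest supported message size is only 8256 bytes, probably because of a related parameter setting.
  • In the UCT bandwidth tests, native RoCE already saturates the link at ~512 bytes. With zero-copy, UCT reaches only ~4% of full bandwidth at 4 KB (IB: ~10%) and needs 512 KB messages to saturate the link (same as IB).
  • In the UCT bandwidth tests, using a CPU that is NOT affine to the NIC has almost no effect on UCT (large-message) bandwidth.
  • In the UCP latency tests, using a CPU that is NOT affine to the NIC degrades UCP small-message latency by ~7% (IB: ~10%).
  • In the UCP latency tests, latency, message rate and (small-message) bandwidth differ enormously between operations.
  • In the UCP bandwidth tests, IB bandwidth does not always increase with message size (e.g. stream_bw). This problem did not appear in the 25G RoCE Ethernet UCP tests.
  • In the UCP bandwidth tests, on a CPU that IS affine to the NIC, 512-byte messages reach ~70% of native RoCE performance (stream_bw).
  • In the UCP bandwidth tests, on a CPU that is NOT affine to the NIC, UCP bandwidth at 512 bytes degrades by a further ~30% (stream_bw).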
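Each percentage in the list above is a relative change between two entries in the tables. The author does not say which rows each bullet uses. The sketch below therefore only illustrates the arithmetic, using rows picked from the native RoCE perftest output: the 64-byte message rate and the 2-byte typical latency, each for near and far CPU. It assumes `awk` is available. `pct_change` is a hypothetical helper name.

```shell
#!/bin/bash
# pct_change OLD NEW: relative change from OLD to NEW in percent, one decimal.
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

# ib_write_bw, 64-byte MsgRate[Mpps]: near CPU vs far CPU
pct_change 6.933873 5.402410    # message rate drop of about 22%

# ib_write_lat, 2-byte t_typical[usec]: far CPU minus near CPU, in ns
awk 'BEGIN { printf "%.0f ns\n", (2.23 - 2.06) * 1000 }'
```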

ib perftest tests

Latency test
Near CPU (NIC-affine)
# numactl --physcpubind=14 ib_write_lat -F -d mlx5_2 --iters 100000 gpu19
---------------------------------------------------------------------------------------
                    RDMA_Write Latency Test
 Dual-port       : OFF          Device         : mlx5_2
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 220[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 2       100000          2.01           10.43        2.06              2.09             0.34    2.12             8.59
---------------------------------------------------------------------------------------
Far CPU (not NIC-affine)
# numactl --physcpubind=0 ib_write_lat -F -d mlx5_2 --iters 100000 gpu19
---------------------------------------------------------------------------------------
                    RDMA_Write Latency Test
 Dual-port       : OFF          Device         : mlx5_2
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 220[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x03c3 PSN 0x97d6aa RKey 0x0034a2 VAddr 0x002b99d5c43000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:20:09:20
 remote address: LID 0000 QPN 0x0347 PSN 0x6f3d6f RKey 0x00926d VAddr 0x002b7e84e8f000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:20:09:19
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 2       100000          2.12           35.92        2.23              2.25             0.27    2.28             8.03
---------------------------------------------------------------------------------------
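The near/far split above uses core 14 and core 0 without saying how they were chosen. Presumably core 14 sits on the NUMA node that hosts mlx5_2 and core 0 does not. On Linux, the PCI device behind an RDMA device exposes `numa_node` and `local_cpulist` in sysfs, so the near cores can be looked up directly. Below is a minimal sketch. The `SYSFS` override exists only so the functions can be tested without real hardware. The helper names are hypothetical.

```shell
#!/bin/bash
# nic_numa_info DEV: print the NUMA node and NIC-local CPU list of an RDMA
# device, read from /sys/class/infiniband/DEV/device/{numa_node,local_cpulist}.
# SYSFS is a hypothetical override for testing; it defaults to /sys.
nic_numa_info() {
    local dev="$1"
    local d="${SYSFS:-/sys}/class/infiniband/$dev/device"
    echo "numa_node=$(cat "$d/numa_node") local_cpulist=$(cat "$d/local_cpulist")"
}

# first_cpu CPULIST: first core of a list such as "14-27,42-55"
first_cpu() {
    local first="${1%%,*}"   # keep only the part before the first comma
    echo "${first%%-*}"      # drop the upper end of a range
}
```

Usage might look like `nic_numa_info mlx5_2`. The first NIC-local core can then be passed as `numactl --physcpubind=<core>` or `ucx_perftest -c <core>`.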
Bandwidth test
Near CPU (NIC-affine)
# numactl --physcpubind=14 ib_write_bw -F -a -d mlx5_2 --iters=10000 --perform_warm_up gpu19
Requested SQ size might be too big. Try reducing TX depth and/or inline size.
Current TX depth is 128 and  inline size is 0 .
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_2
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 2          10000            7.88               7.82               4.098549
 4          10000            27.11              26.11              6.845110
 8          10000            54.07              52.65              6.900310
 16         10000            108.14             106.04             6.949527
 32         10000            215.11             212.69             6.969377
 64         10000            429.05             423.21             6.933873
 128        10000            833.27             830.40             6.802627
 256        10000            1579.26            1559.61            6.388179
 512        10000            2478.59            2474.59            5.067964
 1024       10000            2715.03            2708.94            2.773959
 2048       10000            2737.07            2735.88            1.400770
 4096       10000            2748.94            2746.59            0.703126
 8192       10000            2753.66            2752.97            0.352380
 16384      10000            2756.24            2755.89            0.176377
 32768      10000            2757.99            2757.92            0.088253
 65536      10000            2758.50            2758.47            0.044136
 131072     10000            2758.92            2758.86            0.022071
 262144     10000            2759.14            2759.13            0.011037
 524288     10000            2759.00            2758.89            0.005518
 1048576    10000            2759.25            2759.23            0.002759
 2097152    10000            2759.11            2759.10            0.001380
 4194304    10000            2759.13            2759.13            0.000690
 8388608    10000            2759.12            2759.11            0.000345
---------------------------------------------------------------------------------------
Far CPU (not NIC-affine)
# numactl --physcpubind=0 ib_write_bw -F -a -d mlx5_2 --iters=10000 --perform_warm_up gpu19
Requested SQ size might be too big. Try reducing TX depth and/or inline size.
Current TX depth is 128 and  inline size is 0 .
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_2
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 2          10000            6.48               6.44               3.375411
 4          10000            20.96              20.84              5.463458
 8          10000            41.58              40.69              5.333438
 16         10000            84.03              83.58              5.477685
 32         10000            166.30             165.55             5.424836
 64         10000            335.42             329.74             5.402410
 128        10000            662.44             658.22             5.392126
 256        10000            1292.42            1286.00            5.267444
 512        10000            2376.30            2346.89            4.806440
 1024       10000            2715.07            2712.65            2.777752
 2048       10000            2738.54            2737.21            1.401452
 4096       10000            2748.19            2747.36            0.703324
 8192       10000            2753.68            2753.68            0.352470
 16384      10000            2756.41            2756.30            0.176403
 32768      10000            2758.00            2757.82            0.088250
 65536      10000            2758.52            2758.41            0.044135
 131072     10000            2758.96            2758.93            0.022071
 262144     10000            2759.10            2759.07            0.011036
 524288     10000            2759.13            2759.08            0.005518
 1048576    10000            2759.22            2759.22            0.002759
 2097152    10000            2759.20            2759.18            0.001380
 4194304    10000            2759.21            2759.21            0.000690
 8388608    10000            2759.22            2759.21            0.000345
---------------------------------------------------------------------------------------
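Both bandwidth tables level off at about 2759 MB/sec. Two pieces of arithmetic help interpret that number.

First, perftest's MB/sec appears to be 2^20 bytes per second. Multiplying 512 B by 5.067964 Mpps and dividing by 2^20 gives 2474.59, exactly the BW average in the near-CPU table.

Second, the plateau can be compared with the 25 Gb/s line rate. The sketch below estimates per-packet overhead from standard RoCEv2 framing. The header sizes are assumptions, not measurements: preamble+SFD 8 B, inter-frame gap 12 B, Ethernet header 14 B, FCS 4 B, IPv4 20 B, UDP 8 B, BTH 12 B, ICRC 4 B, no VLAN tag, and the 16-byte RETH on the first packet of each message ignored. RoCEv2 over IPv4 is presumed from the IPv4-mapped GIDs in the output. Under these assumptions, the measured plateau and the framing estimate both come out at about 92.6% of line rate. That hints the plateau is a wire limit rather than a CPU limit. The helper names are hypothetical.

```shell
#!/bin/bash
# bw_mib SIZE_BYTES MSG_RATE_MPPS: bandwidth in MiB/s (perftest's "MB/sec")
bw_mib() {
    awk -v s="$1" -v r="$2" 'BEGIN { printf "%.2f\n", s * r * 1e6 / 1048576 }'
}

# link_mib GBPS: nominal line rate expressed in the same MiB/s unit
link_mib() {
    awk -v g="$1" 'BEGIN { printf "%.2f\n", g * 1e9 / 8 / 1048576 }'
}

# efficiency BW LINK: measured bandwidth as a percentage of the line rate
efficiency() {
    awk -v bw="$1" -v lr="$2" 'BEGIN { printf "%.1f\n", bw / lr * 100 }'
}

# roce_efficiency MTU: payload share of each packet on the wire, using the
# ASSUMED per-packet overhead listed above (82 bytes in total)
roce_efficiency() {
    awk -v mtu="$1" 'BEGIN {
        hdr = 8 + 12 + 14 + 4 + 20 + 8 + 12 + 4   # preamble+IFG, Eth+FCS, IPv4, UDP, BTH, ICRC
        printf "%.1f\n", mtu / (mtu + hdr) * 100
    }'
}

bw_mib 512 5.067964             # matches the 512-byte BW average above
efficiency 2759.22 "$(link_mib 25)"
roce_efficiency 1024            # Mtu : 1024[B] in the perftest output
```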

UCT tests

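Latency test
Script
#!/bin/bash
set -e

SERVER=gpu19       # host that runs the server side of ucx_perftest
AFFINITY=${1:-14}  # CPU core for -c: 14 = NIC-affine (near), 0 = far
SLEEP=1

if [ "$HOSTNAME" == "$SERVER" ]
then
    echo Run as server
    SERVER=""      # the server is started without a peer host name
else
    echo Run as client
    SLEEP=2        # wait a bit longer so the server is listening again
fi
COMMANDS=(
    "ucx_perftest -d mlx5_2:1 -x rc_verbs -c $AFFINITY -t put_lat -f"
    "ucx_perftest -d mlx5_2:1 -x rc_verbs -c $AFFINITY -t am_lat -f"
    "ucx_perftest -d mlx5_2:1 -x rc_verbs -c $AFFINITY -t add_lat -f"
)

for COMMAND in "${COMMANDS[@]}"
do
    echo $COMMAND $SERVER
    # left unquoted on purpose: word-split the command string and drop the
    # empty $SERVER argument on the server side
    $COMMAND $SERVER
    sleep $SLEEP
done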
Near CPU (NIC-affine)
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |        latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
ucx_perftest -d mlx5_2:1 -x rc_verbs -c 14 -t put_lat -f gpu19
                     1000000      2.073     2.103     2.101        3.63       3.63      475456      475873
ucx_perftest -d mlx5_2:1 -x rc_verbs -c 14 -t am_lat -f gpu19
                     1000000      2.135     2.176     2.180        3.51       3.50      459578      458735
ucx_perftest -d mlx5_2:1 -x rc_verbs -c 14 -t add_lat -f gpu19
                     1000000      3.092     3.142     3.144        2.43       2.43      318261      318043
Far CPU (not NIC-affine)
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |        latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
ucx_perftest -d mlx5_2:1 -x rc_verbs -c 0 -t put_lat -f gpu19
                     1000000      2.233     2.257     2.254        3.38       3.38      443162      443646
ucx_perftest -d mlx5_2:1 -x rc_verbs -c 0 -t am_lat -f gpu19
                     1000000      2.388     2.423     2.421        3.15       3.15      412781      413045
ucx_perftest -d mlx5_2:1 -x rc_verbs -c 0 -t add_lat -f gpu19
                     1000000      3.323     3.347     3.348        2.28       2.28      298775      298718
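One way to turn these near/far pairs into the percentages quoted in the conclusions is to capture the script's output to a log and post-process it. Below is a minimal sketch. It assumes each log looks like the listings above: the echoed `ucx_perftest` command, then one result row with eight numeric columns (iterations, three latency columns, two bandwidth columns, two message-rate columns). `lat_summary`, `compare`, `near.log` and `far.log` are hypothetical names. `compare` relies on bash process substitution and `join`.

```shell
#!/bin/bash
# lat_summary < LOG
# For every echoed "ucx_perftest ... -t <test> ..." line, print
# "<test> <overall latency>" taken from the following 8-column result row.
lat_summary() {
    awk '
        /^ucx_perftest/ {
            for (i = 1; i < NF; i++) if ($i == "-t") name = $(i + 1)
            next
        }
        NF == 8 && $1 ~ /^[0-9]+$/ { print name, $4 }'
}

# compare NEAR_LOG FAR_LOG
# Join both summaries on the test name and print
# "<test> <near usec> <far usec> <relative change>".
compare() {
    join <(lat_summary < "$1" | sort) <(lat_summary < "$2" | sort) |
        awk '{ printf "%s %s %s %+.1f%%\n", $1, $2, $3, ($3 - $2) / $2 * 100 }'
}
```

Usage might look like this: run the script once with `14` and once with `0` on the client, saving the output as `near.log` and `far.log`, then run `compare near.log far.log`. For the rows above, put_lat rises from 2.101 to 2.254 usec (+7.3%) and add_lat from 3.144 to 3.348 usec (+6.5%).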
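Bandwidth test
Script
#!/bin/bash
set -e

SERVER=gpu19       # host that runs the server side of ucx_perftest
AFFINITY=${1:-14}  # CPU core for -c: 14 = NIC-affine (near), 0 = far
SLEEP=1

if [ "$HOSTNAME" == "$SERVER" ]
then
    echo Run as server
    SERVER=""      # the server is started without a peer host name
else
    echo Run as client
    SLEEP=2        # wait a bit longer so the server is listening again
fi
# -b points to a message-size batch file from the UCX perftest examples
COMMANDS=(
    "ucx_perftest -d mlx5_2:1 -x rc_verbs -c $AFFINITY -t put_bw -D bcopy -b /usr/share/ucx/perftest/msg_pow2 -f"
    "ucx_perftest -d mlx5_2:1 -x rc_verbs -c $AFFINITY -t put_bw -D zcopy -b /usr/share/ucx/perftest/msg_pow2 -f"
)

for COMMAND in "${COMMANDS[@]}"
do
    echo $COMMAND $SERVER
    # left unquoted on purpose: word-split the command string and drop the
    # empty $SERVER argument on the server side
    $COMMAND $SERVER
    sleep $SLEEP
done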
Near CPU (NIC-affine)
ucx_perftest -d mlx5_2:1 -x rc_verbs -c 14 -t put_bw -D bcopy -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.140     0.178     0.178        5.35       5.35     5609615     5609615
         2           2000000      0.143     0.178     0.178       10.74      10.74     5633060     5633060
         4           2000000      0.142     0.171     0.171       22.33      22.33     5852746     5852746
         8           2000000      0.138     0.169     0.169       45.03      45.03     5902390     5902390
        12           2000000      0.138     0.168     0.168       79.39      79.39     5946119     5946119
        16           2000000      0.138     0.181     0.181       84.38      84.38     5529905     5529905
        24           2000000      0.143     0.172     0.172      132.81     132.81     5802437     5802437
        32           2000000      0.142     0.181     0.181      169.00     169.00     5537871     5537871
        40           2000000      0.142     0.170     0.170      223.81     223.81     5867053     5867053
        48           2000000      0.139     0.169     0.169      271.64     271.64     5933967     5933967
        64           2000000      0.142     0.178     0.178      343.19     343.19     5622843     5622843
        80           2000000      0.144     0.183     0.183      418.04     418.04     5479351     5479351
        96           2000000      0.142     0.179     0.179      511.91     511.91     5591390     5591390
       128           1400000      0.140     0.185     0.185      661.50     661.50     5418991     5418991
       256            700000      0.142     0.201     0.201     1212.52    1212.52     4966485     4966485
       300            700000      0.149     0.185     0.185     1544.71    1544.71     5399164     5399164
       512            300000      0.184     0.223     0.223     2187.37    2187.37     4479753     4479753
      1024            200000      0.357     0.362     0.362     2697.66    2697.66     2762414     2762414
      2048            100000      0.713     0.714     0.714     2736.43    2736.43     1401067     1401067
      3000            100000      1.041     1.045     1.045     2737.01    2737.01      956663      956663
      4096            100000      1.416     1.424     1.424     2744.00    2744.00      702470      702470
      6000            100000      2.077     2.085     2.085     2744.62    2744.62      479662      479662
      8192             80000      2.828     2.841     2.841     2750.36    2750.36      352050      352050
ucx_perftest -d mlx5_2:1 -x rc_verbs -c 14 -t put_bw -D zcopy -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      3.959     4.013     4.001        0.24       0.24      249183      249942
         2           2000000      3.971     4.018     4.008        0.47       0.48      248868      249485
         4           2000000      3.985     4.052     4.016        0.94       0.95      246876      249008
         8           2000000      3.981     4.032     4.014        1.89       1.90      248108      249137
        12           2000000      3.976     4.010     4.020        3.33       3.32      249406      248736
        16           2000000      3.965     4.002     4.020        3.81       3.80      249949      248783
        24           2000000      3.968     4.022     4.023        5.69       5.69      248650      248548
        32           2000000      4.005     4.047     4.028        7.54       7.58      247105      248282
        40           2000000      3.972     4.021     4.025        9.49       9.48      248717      248438
        48           2000000      4.006     4.054     4.048       11.29      11.31      246693      247028
        64           2000000      4.016     4.060     4.049       15.03      15.07      246334      246987
        80           2000000      4.022     4.065     4.074       18.77      18.73      245988      245476
        96           2000000      4.012     4.062     4.080       22.54      22.44      246196      245119
       128           1400000      4.055     4.088     4.090       29.86      29.84      244632      244484
       256            700000      4.185     4.229     4.228       57.73      57.75      236451      236531
       300            700000      4.232     4.277     4.277       66.89      66.89      233804      233799
       512            300000      4.365     4.395     4.409      111.10     110.75      227538      226812
      1024            200000      4.845     4.893     4.893      199.58     199.58      204368      204368
      2048            100000      5.264     5.297     5.297      368.69     368.69      188773      188773
      3000            100000      5.655     5.701     5.701      501.83     501.83      175405      175405
      4096            100000      6.068     6.137     6.137      636.51     636.51      162947      162947
      6000            100000      6.765     6.742     6.742      848.68     848.68      148319      148319
      8192             80000      7.793     7.810     7.810     1000.36    1000.36      128048      128048
     10000             80000      8.111     8.145     8.145     1170.85    1170.85      122775      122775
     16384             40000     10.313    10.393    10.393     1503.41    1503.41       96221       96221
     25000             40000     13.298    13.408    13.408     1778.12    1778.12       74581       74581
     32768             20000     15.976    16.076    16.076     1943.90    1943.90       62208       62208
     45000             20000     20.294    20.361    20.361     2107.71    2107.71       49116       49116
     65536             10000     27.372    27.518    27.518     2271.26    2271.26       36344       36344
    100000             10000     39.248    39.429    39.429     2418.70    2418.70       25364       25364
    131072              5000     49.905    50.174    50.174     2491.34    2491.34       19935       19935
    262144              2500     95.115    95.614    95.614     2614.68    2614.68       10463       10463
    524288              1200    185.503   186.575   186.575     2679.89    2679.89        5364        5364
   1048576               600    366.342   368.328   368.328     2714.97    2714.97        2719        2719
   2097152               300    727.829   732.223   732.223     2731.41    2731.41        1370        1370
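The bandwidth and message-rate columns are derived from the per-message overhead. Take the 8192-byte row of the near-CPU bcopy table, with message size S = 8192 bytes and average overhead t = 2.841 usec. The reported 2750.36 MB/s and 352050 msg/s are reproduced if MB means 2^20 bytes, which suggests that is the unit ucx_perftest uses:

```latex
\[
\mathrm{BW} = \frac{S}{t \cdot 2^{20}}
            = \frac{8192}{2.841\times10^{-6}\cdot 1048576}
            \approx 2750\ \mathrm{MB/s},
\qquad
\mathrm{rate} = \frac{1}{t}
              = \frac{1}{2.841\times10^{-6}}
              \approx 3.52\times10^{5}\ \mathrm{msg/s}
\]
```

The small differences come from rounding in the printed overhead. The same formula with S = 8 bytes and t = 2.104 usec gives about 3.63 MB/s, which matches the ucp_put_lat row of the UCP latency results. That suggests those tests ran with an 8-byte message.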
Far CPU (-c 0, a core on a different NUMA node from the NIC)
ucx_perftest -d mlx5_2:1 -x rc_verbs -c 0 -t put_bw -D bcopy -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.188     0.250     0.250        3.81       3.81     3998004     3998004
         2           2000000      0.191     0.239     0.239        7.98       7.98     4186037     4186037
         4           2000000      0.188     0.256     0.256       14.88      14.88     3900644     3900644
         8           2000000      0.188     0.227     0.227       33.56      33.56     4398963     4398963
        12           2000000      0.188     0.237     0.237       56.29      56.29     4215730     4215730
        16           2000000      0.188     0.240     0.240       63.56      63.56     4165584     4165584
        24           2000000      0.192     0.242     0.242       94.77      94.77     4140617     4140617
        32           2000000      0.191     0.232     0.232      131.31     131.31     4302686     4302686
        40           2000000      0.192     0.246     0.246      155.18     155.18     4067896     4067896
        48           2000000      0.192     0.242     0.242      188.96     188.96     4127868     4127868
        64           2000000      0.188     0.237     0.237      257.31     257.31     4215747     4215747
        80           2000000      0.193     0.232     0.232      329.41     329.41     4317691     4317691
        96           2000000      0.194     0.231     0.231      396.67     396.67     4332701     4332701
       128           1400000      0.193     0.245     0.245      498.65     498.65     4084933     4084933
       256            700000      0.195     0.269     0.269      908.71     908.71     3722079     3722079
       300            700000      0.197     0.242     0.242     1183.45    1183.45     4136459     4136459
       512            300000      0.230     0.278     0.278     1756.13    1756.13     3596568     3596568
      1024            200000      0.235     0.360     0.360     2714.37    2714.37     2779530     2779530
      2048            100000      0.710     0.714     0.714     2734.86    2734.86     1400262     1400262
      3000            100000      1.042     1.044     1.044     2739.18    2739.18      957423      957423
      4096            100000      1.417     1.422     1.422     2747.42    2747.42      703346      703346
      6000            100000      2.078     2.084     2.084     2745.84    2745.84      479874      479874
      8192             80000      2.829     2.840     2.840     2751.18    2751.18      352156      352156
ucx_perftest -d mlx5_2:1 -x rc_verbs -c 0 -t put_bw -D zcopy -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      4.188     4.242     4.246        0.22       0.22      235725      235499
         2           2000000      4.231     4.253     4.248        0.45       0.45      235131      235426
         4           2000000      4.222     4.240     4.252        0.90       0.90      235860      235158
         8           2000000      4.229     4.249     4.246        1.80       1.80      235356      235505
        12           2000000      4.198     4.262     4.259        3.13       3.13      234656      234796
        16           2000000      4.232     4.255     4.267        3.59       3.58      235021      234335
        24           2000000      4.230     4.269     4.266        5.36       5.37      234271      234423
        32           2000000      4.198     4.253     4.254        7.18       7.17      235145      235049
        40           2000000      4.225     4.278     4.266        8.92       8.94      233766      234411
        48           2000000      4.227     4.265     4.260       10.73      10.74      234475      234725
        64           2000000      4.218     4.263     4.262       14.32      14.32      234567      234645
        80           2000000      4.264     4.294     4.282       17.77      17.82      232901      233561
        96           2000000      4.283     4.304     4.301       21.27      21.28      232351      232478
       128           1400000      4.282     4.321     4.302       28.25      28.37      231535      232424
       256            700000      4.405     4.451     4.449       54.85      54.88      224687      224774
       300            700000      4.463     4.475     4.497       63.94      63.62      223491      222378
       512            300000      4.642     4.657     4.668      104.84     104.59      214723      214207
      1024            200000      5.108     5.182     5.130      188.44     190.37      193004      194941
      2048            100000      5.530     5.566     5.566      350.89     350.89      179658      179658
      3000            100000      5.928     5.961     5.961      479.92     479.92      167745      167745
      4096            100000      6.297     6.313     6.313      618.73     618.73      158396      158396
      6000            100000      7.008     7.066     7.066      809.82     809.82      141527      141527
      8192             80000      7.781     7.837     7.837      996.86     996.86      127600      127600
     10000             80000      8.323     8.392     8.392     1136.40    1136.40      119162      119162
     16384             40000     10.552    10.623    10.623     1470.84    1470.84       94136       94136
     25000             40000     13.534    13.616    13.616     1751.02    1751.02       73445       73445
     32768             20000     16.235    16.347    16.347     1911.65    1911.65       61176       61176
     45000             20000     20.522    20.639    20.639     2079.33    2079.33       48454       48454
     65536             10000     27.592    27.751    27.751     2252.14    2252.14       36038       36038
    100000             10000     39.440    39.667    39.667     2404.21    2404.21       25213       25213
    131072              5000     50.248    50.474    50.474     2476.53    2476.53       19816       19816
    262144              2500     95.577    96.154    96.154     2599.98    2599.98       10404       10404
    524288              1200    185.879   186.684   186.684     2678.32    2678.32        5361        5361
   1048576               600    366.744   368.600   368.600     2712.97    2712.97        2717        2717
   2097152               300    728.118   732.340   732.340     2730.97    2730.97        1370        1370
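Comparing bcopy and zcopy by eye is tedious, so it helps to ask at which message size each mode gets close to its own peak bandwidth. The following is a minimal sketch, assuming bash and awk. saturation_point is a hypothetical helper name and is not part of UCX. It expects one batch table per input, because mixing the bcopy and zcopy tables would blend two peaks.

```shell
# Hypothetical helper (not part of UCX): scan one ucx_perftest batch table
# ("-b msg_pow2" output) and report the peak "overall" bandwidth plus the
# smallest message size whose overall bandwidth reaches FRACTION of it.
# Usage: saturation_point FRACTION [FILE...]   (reads stdin if no FILE)
saturation_point() {
    local frac="$1"
    shift
    awk -v frac="$frac" '
        BEGIN { n = 0; peak = 0 }
        # result rows have 9 numeric columns: size, iterations, 3 overhead
        # columns, 2 bandwidth columns (MB/s), 2 message-rate columns
        NF == 9 && $1 ~ /^[0-9]+$/ {
            size[n] = $1 + 0
            bw[n] = $7 + 0            # column 7 = overall bandwidth
            if (bw[n] > peak) peak = bw[n]
            n++
        }
        END {
            for (i = 0; i < n; i++) {
                if (bw[i] >= frac * peak) {
                    printf "peak=%.2f MB/s reached_at=%d bytes\n", peak, size[i]
                    exit
                }
            }
        }
    ' "$@"
}
```

With a fraction of 0.95, the near-CPU tables above give 1024 bytes for bcopy (2697.66 of 2750.36 MB/s) but 262144 bytes for zcopy (2614.68 of 2731.41 MB/s). The far-CPU tables give the same two sizes.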

UCP tests

Latency tests
Script
#!/bin/bash
set -e

SERVER=gpu19
AFFINITY=14
SLEEP=1
export UCX_NET_DEVICES=mlx5_2:1

if [ "$HOSTNAME" == "$SERVER" ]
then
    echo Run as server
    SERVER=""
else
    echo Run as client
    SLEEP=3
fi

echo "ucx_perftest -c $AFFINITY -t ucp_put_lat -f $SERVER"
ucx_perftest -c $AFFINITY -t ucp_put_lat -f $SERVER
sleep $SLEEP

echo "ucx_perftest -c $AFFINITY -t stream_lat -f $SERVER"
ucx_perftest -c $AFFINITY -t stream_lat -f $SERVER
sleep $SLEEP

echo "ucx_perftest -c $AFFINITY -t tag_lat -f $SERVER"
ucx_perftest -c $AFFINITY -t tag_lat -f $SERVER
sleep $SLEEP

echo "ucx_perftest -c $AFFINITY -t ucp_am_lat -f $SERVER"
ucx_perftest -c $AFFINITY -t ucp_am_lat -f $SERVER
sleep $SLEEP

echo "ucx_perftest -c $AFFINITY -t tag_sync_lat -f $SERVER"
ucx_perftest -c $AFFINITY -t tag_sync_lat -f $SERVER
sleep $SLEEP

echo "ucx_perftest -c $AFFINITY -t ucp_get -f $SERVER"
ucx_perftest -c $AFFINITY -t ucp_get -f $SERVER
sleep $SLEEP

echo "ucx_perftest -c $AFFINITY -t ucp_fadd -f $SERVER"
ucx_perftest -c $AFFINITY -t ucp_fadd -f $SERVER
sleep $SLEEP

echo "ucx_perftest -c $AFFINITY -t ucp_swap -f $SERVER"
ucx_perftest -c $AFFINITY -t ucp_swap -f $SERVER
sleep $SLEEP

echo "ucx_perftest -c $AFFINITY -t ucp_cswap -f $SERVER"
ucx_perftest -c $AFFINITY -t ucp_cswap -f $SERVER
sleep $SLEEP
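The nine blocks of this script differ only in the test name, so the sequence can also be written as a loop, following the same pattern as the $COMMAND loop used for the UCT tests. The following is a minimal sketch, assuming bash. run_perftests is a hypothetical helper name, and the hostname check from the script above is still expected to run first.

```shell
# Hypothetical helper: run each named ucx_perftest test in turn.
#   $1 = server host name (empty when running on the server itself)
#   $2 = CPU core passed to -c
#   $3 = pause between tests in seconds
#   remaining arguments = test names
run_perftests() {
    local server="$1" affinity="$2" pause="$3"
    shift 3
    local t
    for t in "$@"; do
        echo "ucx_perftest -c $affinity -t $t -f $server"
        # $server stays unquoted so that an empty value adds no argument,
        # exactly as in the original script
        ucx_perftest -c "$affinity" -t "$t" -f $server || return 1
        sleep "$pause"
    done
}
```

Calling it with "$SERVER", "$AFFINITY", "$SLEEP" and the test names ucp_put_lat stream_lat tag_lat ucp_am_lat tag_sync_lat ucp_get ucp_fadd ucp_swap ucp_cswap should issue the same commands as the script above. The || return 1 stops the sequence at the first failing test, as set -e does in the original.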
Near CPU (-c 14)
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |        latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
ucx_perftest -c 14 -t ucp_put_lat -f gpu19
                     1000000      2.072     2.104     2.106        3.63       3.62      475315      474790
ucx_perftest -c 14 -t stream_lat -f gpu19
                     1000000      2.253     2.285     2.291        3.34       3.33      437668      436584
ucx_perftest -c 14 -t tag_lat -f gpu19
                     1000000      2.258     2.298     2.295        3.32       3.32      435104      435764
ucx_perftest -c 14 -t ucp_am_lat -f gpu19
                     1000000      2.261     2.304     2.304        3.31       3.31      434107      434000
ucx_perftest -c 14 -t tag_sync_lat -f gpu19
                     1000000      3.412     3.464     3.467        2.20       2.20      288645      288438
ucx_perftest -c 14 -t ucp_get -f gpu19
                     1000000      4.162     4.214     4.218        1.81       1.81      237319      237064
ucx_perftest -c 14 -t ucp_fadd -f gpu19
                     1000000      6.426     6.520     6.513        1.17       1.17      153382      153550
ucx_perftest -c 14 -t ucp_swap -f gpu19
                     1000000      6.416     6.506     6.512        1.17       1.17      153699      153569
ucx_perftest -c 14 -t ucp_cswap -f gpu19
                     1000000      6.436     6.521     6.525        1.17       1.17      153350      153247
Far CPU (-c 0)
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |        latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
ucx_perftest -c 0 -t ucp_put_lat -f gpu19
                     1000000      2.238     2.256     2.257        3.38       3.38      443223      443161
ucx_perftest -c 0 -t stream_lat -f gpu19
                     1000000      2.492     2.518     2.525        3.03       3.02      397166      396098
ucx_perftest -c 0 -t tag_lat -f gpu19
                     1000000      2.505     2.529     2.529        3.02       3.02      395395      395350
ucx_perftest -c 0 -t ucp_am_lat -f gpu19
                     1000000      2.566     2.589     2.601        2.95       2.93      386206      384435
ucx_perftest -c 0 -t tag_sync_lat -f gpu19
                     1000000      3.805     3.824     3.826        2.00       1.99      261523      261375
ucx_perftest -c 0 -t ucp_get -f gpu19
                     1000000      4.504     4.560     4.557        1.67       1.67      219281      219451
ucx_perftest -c 0 -t ucp_fadd -f gpu19
                     1000000      7.061     7.112     7.120        1.07       1.07      140599      140453
ucx_perftest -c 0 -t ucp_swap -f gpu19
                     1000000      7.057     7.119     7.134        1.07       1.07      140478      140183
ucx_perftest -c 0 -t ucp_cswap -f gpu19
                     1000000      7.082     7.174     7.138        1.06       1.07      139393      140104
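In these latency logs, each result row sits directly under its command line, so the near-CPU and far-CPU runs can be compared mechanically. The following is a minimal awk-based sketch. summarize_lat is a hypothetical name, and the sketch assumes the log was saved in exactly the layout shown above, with one command line followed by one 8-column result row.

```shell
# Hypothetical helper: turn a saved latency log into
# "<test> <median usec> <average usec>" lines.
# Usage: summarize_lat [FILE...]   (reads stdin if no FILE)
summarize_lat() {
    awk '
        $1 == "ucx_perftest" {
            # remember the value that follows "-t"
            for (i = 2; i < NF; i++) if ($i == "-t") name = $(i + 1)
            next
        }
        # result row: iterations, 50%ile, average, overall, 2 bandwidth, 2 rate
        name != "" && NF == 8 && $1 ~ /^[0-9]+$/ {
            print name, $2, $3
            name = ""
        }
    ' "$@"
}
```

You can sort the near-CPU and far-CPU summaries and join them by test name to show the cost of the remote core. For example, the average ucp_put_lat latency rises from 2.104 to 2.256 usec (about 7%), and ucp_cswap rises from 6.521 to 7.174 usec (about 10%).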
Bandwidth tests
Script
#!/bin/bash
set -e

SERVER=gpu19
AFFINITY=14
SLEEP=1
export UCX_NET_DEVICES=mlx5_2:1

if [ "$HOSTNAME" == "$SERVER" ]
then
    echo Run as server
    SERVER=""
else
    echo Run as client
    SLEEP=3
fi

echo "ucx_perftest -c $AFFINITY -t tag_bw -b /usr/share/ucx/perftest/msg_pow2 -f $SERVER"
ucx_perftest -c $AFFINITY -t tag_bw -b /usr/share/ucx/perftest/msg_pow2 -f $SERVER
sleep $SLEEP

echo "ucx_perftest -c $AFFINITY -t tag_sync_bw -b /usr/share/ucx/perftest/msg_pow2 -f $SERVER"
ucx_perftest -c $AFFINITY -t tag_sync_bw -b /usr/share/ucx/perftest/msg_pow2 -f $SERVER
sleep $SLEEP

echo "ucx_perftest -c $AFFINITY -t ucp_put_bw -b /usr/share/ucx/perftest/msg_pow2 -f $SERVER"
ucx_perftest -c $AFFINITY -t ucp_put_bw -b /usr/share/ucx/perftest/msg_pow2 -f $SERVER
sleep $SLEEP

echo "ucx_perftest -c $AFFINITY -t ucp_get -b /usr/share/ucx/perftest/msg_pow2 -f $SERVER"
ucx_perftest -c $AFFINITY -t ucp_get -b /usr/share/ucx/perftest/msg_pow2 -f $SERVER
sleep $SLEEP

echo "ucx_perftest -c $AFFINITY -t stream_bw -b /usr/share/ucx/perftest/msg_pow2 -f $SERVER"
ucx_perftest -c $AFFINITY -t stream_bw -b /usr/share/ucx/perftest/msg_pow2 -f $SERVER
sleep $SLEEP

echo "ucx_perftest -c $AFFINITY -t ucp_am_bw -b /usr/share/ucx/perftest/msg_pow2 -f $SERVER"
ucx_perftest -c $AFFINITY -t ucp_am_bw -b /usr/share/ucx/perftest/msg_pow2 -f $SERVER
sleep $SLEEP
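Both scripts hard-code AFFINITY=14 for the near-CPU runs, and the far-CPU runs use -c 0. On another machine, the near cores can be looked up in sysfs. The following sketch assumes the usual Linux layout, in which /sys/class/infiniband/<device>/device points at the NIC's PCI device, and that device exposes the numa_node and local_cpulist attributes. nic_locality is a hypothetical helper name.

```shell
# Hypothetical helper (not part of UCX): print the NUMA node and the CPU list
# that Linux reports as local to an RDMA device.
# Usage: nic_locality DEVICE[:PORT] [SYSFS_ROOT]
#   DEVICE[:PORT] is the UCX_NET_DEVICES / -d value, e.g. mlx5_2:1
#   SYSFS_ROOT defaults to /sys (overridable for testing)
nic_locality() {
    local dev="${1%%:*}"                 # drop the ":port" suffix
    local sysfs="${2:-/sys}"
    local pci="$sysfs/class/infiniband/$dev/device"

    if [ ! -r "$pci/local_cpulist" ] || [ ! -r "$pci/numa_node" ]; then
        echo "nic_locality: no locality info for $dev under $sysfs" >&2
        return 1
    fi
    # numa_node reads -1 when the platform does not report a node
    echo "numa_node=$(cat "$pci/numa_node") local_cpus=$(cat "$pci/local_cpulist")"
}
```

Any core in local_cpus is a candidate for the near-CPU -c value, and a core outside it is a candidate for the far-CPU case. numactl -H lists which cores belong to each node.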
Near CPU (-c 14)
ucx_perftest -c 14 -t tag_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.206     0.224     0.224        4.25       4.25     4455842     4455842
         2           2000000      0.198     0.229     0.229        8.32       8.32     4361185     4361185
         4           2000000      0.195     0.219     0.219       17.43      17.43     4568463     4568463
         8           2000000      0.199     0.219     0.219       34.80      34.80     4561565     4561565
        12           2000000      0.199     0.215     0.215       61.96      61.96     4640972     4640972
        16           2000000      0.200     0.229     0.229       66.59      66.59     4364352     4364352
        24           2000000      0.205     0.219     0.219      104.28     104.28     4555924     4555924
        32           2000000      0.205     0.224     0.224      135.94     135.94     4454432     4454432
        40           2000000      0.218     0.237     0.237      161.23     161.23     4226676     4226676
        48           2000000      0.208     0.225     0.225      203.39     203.39     4443072     4443072
        64           2000000      0.215     0.239     0.239      254.98     254.98     4177510     4177510
        80           2000000      0.210     0.228     0.228      334.93     334.93     4389951     4389951
        96           2000000      0.210     0.235     0.235      389.87     389.87     4258419     4258419
       128           1400000      0.236     0.273     0.273      446.36     446.36     3656546     3656546
       256            700000      0.260     0.277     0.277      881.29     881.29     3609756     3609756
       300            700000      0.245     0.285     0.285     1005.41    1005.41     3514180     3514180
       512            300000      0.270     0.293     0.293     1668.26    1668.26     3416586     3416586
      1024            200000      0.274     0.397     0.397     2461.35    2461.35     2520418     2520418
      2048            100000      0.324     0.739     0.739     2643.57    2643.57     1353508     1353508
      3000            100000      0.367     1.044     1.044     2740.84    2740.84      957993      957993
      4096            100000      0.388     1.450     1.450     2694.17    2694.17      689707      689707
      6000            100000      0.466     2.086     2.086     2743.52    2743.52      479465      479465
      8192             80000      0.538     2.869     2.869     2722.94    2722.94      348537      348537
     10000             80000      0.092     3.511     3.511     2716.27    2716.27      284821      284821
     16384             40000      0.091     5.711     5.711     2736.14    2736.14      175113      175113
     25000             40000      0.092     8.783     8.783     2714.58    2714.58      113858      113858
     32768             20000      0.091    11.454    11.454     2728.21    2728.21       87303       87303
     45000             20000      0.091    15.748    15.748     2725.17    2725.17       63501       63501
     65536             10000      0.090    22.938    22.938     2724.75    2724.75       43596       43596
    100000             10000      0.335    34.627    34.627     2754.12    2754.12       28879       28879
    131072              5000      0.319    45.361    45.361     2755.69    2755.69       22046       22046
    262144              2500      0.327    90.664    90.664     2757.43    2757.43       11030       11030
    524288              1200      0.331   181.282   181.282     2758.13    2758.13        5516        5516
   1048576               600      0.335   362.510   362.510     2758.55    2758.55        2759        2759
   2097152               300      0.338   725.013   725.013     2758.57    2758.57        1379        1379
ucx_perftest -c 14 -t tag_sync_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.223     0.322     0.322        2.96       2.96     3102801     3102801
         2           2000000      0.225     0.331     0.331        5.77       5.77     3024593     3024593
         4           2000000      0.236     0.335     0.335       11.40      11.40     2989491     2989491
         8           2000000      0.238     0.328     0.328       23.23      23.23     3044613     3044613
        12           2000000      0.229     0.327     0.327       40.84      40.84     3058529     3058529
        16           2000000      0.238     0.329     0.329       46.34      46.34     3036823     3036823
        24           2000000      0.229     0.334     0.334       68.53      68.53     2994079     2994079
        32           2000000      0.238     0.333     0.333       91.78      91.78     3007415     3007415
        40           2000000      0.245     0.344     0.344      110.95     110.95     2908456     2908456
        48           2000000      0.244     0.332     0.332      138.04     138.04     3015459     3015459
        64           2000000      0.238     0.340     0.340      179.27     179.27     2937182     2937182
        80           2000000      0.238     0.344     0.344      221.58     221.58     2904322     2904322
        96           2000000      0.238     0.340     0.340      269.46     269.46     2943219     2943219
       128           1400000      0.240     0.335     0.335      364.29     364.29     2984299     2984299
       256            700000      0.252     0.371     0.371      658.75     658.75     2698244     2698244
       300            700000      0.252     0.365     0.365      783.02     783.02     2736844     2736844
       512            300000      0.265     0.410     0.410     1190.05    1190.05     2437222     2437222
      1024            200000      0.333     0.464     0.464     2103.44    2103.44     2153919     2153919
      2048            100000      0.762     0.778     0.778     2510.86    2510.86     1285559     1285559
      3000            100000      1.071     1.078     1.078     2654.40    2654.40      927781      927781
      4096            100000      1.475     1.482     1.482     2635.21    2635.21      674614      674614
      6000            100000      2.108     2.117     2.117     2702.60    2702.60      472313      472313
      8192             80000      2.885     2.901     2.901     2692.99    2692.99      344703      344703
     10000             80000      3.525     3.541     3.541     2693.14    2693.14      282396      282396
     16384             40000      5.723     5.749     5.749     2718.02    2718.02      173953      173953
     25000             40000      8.773     8.814     8.814     2705.02    2705.02      113457      113457
     32768             20000     11.436    11.487    11.487     2720.43    2720.43       87054       87054
     45000             20000     15.717    15.788    15.788     2718.25    2718.25       63340       63340
     65536             10000     22.845    22.961    22.961     2722.05    2722.05       43553       43553
    100000             10000      0.327    34.627    34.627     2754.15    2754.15       28879       28879
    131072              5000      0.318    45.356    45.356     2755.99    2755.99       22048       22048
    262144              2500      0.321    90.666    90.666     2757.37    2757.37       11029       11029
    524288              1200      0.332   181.276   181.276     2758.23    2758.23        5516        5516
   1048576               600      0.332   362.517   362.517     2758.49    2758.49        2758        2758
   2097152               300      0.332   724.974   724.974     2758.72    2758.72        1379        1379
ucx_perftest -c 14 -t ucp_put_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.072     0.286     0.286        3.33       3.33     3496149     3496149
         2           2000000      0.068     0.279     0.279        6.84       6.84     3586724     3586724
         4           2000000      0.074     0.284     0.284       13.45      13.45     3524782     3524782
         8           2000000      0.072     0.288     0.288       26.46      26.46     3468646     3468646
        12           2000000      0.068     0.278     0.278       47.95      47.95     3591715     3591715
        16           2000000      0.068     0.286     0.286       53.39      53.39     3498719     3498719
        24           2000000      0.073     0.290     0.290       78.95      78.95     3449185     3449185
        32           2000000      0.074     0.294     0.294      103.96     103.96     3406627     3406627
        40           2000000      0.068     0.302     0.302      126.49     126.49     3315748     3315748
        48           2000000      0.068     0.288     0.288      158.97     158.97     3472728     3472728
        64           2000000      0.072     0.302     0.302      202.21     202.21     3312949     3312949
        80           2000000      0.074     0.287     0.287      265.61     265.61     3481403     3481403
        96           2000000      0.072     0.292     0.292      313.69     313.69     3426376     3426376
       128           1400000      0.072     0.325     0.325      375.61     375.61     3076964     3076964
       256            700000      0.066     0.349     0.349      699.21     699.21     2863958     2863958
       300            700000      0.067     0.320     0.320      894.58     894.58     3126801     3126801
       512            300000      0.070     0.336     0.336     1453.49    1453.49     2976753     2976753
      1024            200000      0.070     0.425     0.425     2296.04    2296.04     2351146     2351146
      2048            100000      0.711     0.714     0.714     2734.24    2734.24     1399931     1399931
      3000            100000      1.040     1.045     1.045     2736.69    2736.69      956542      956542
      4096            100000      1.418     1.424     1.424     2743.08    2743.08      702228      702228
      6000            100000      2.078     2.086     2.086     2743.35    2743.35      479435      479435
      8192             80000      2.831     2.841     2.841     2749.90    2749.90      351987      351987
     10000             80000      3.492     3.504     3.504     2721.37    2721.37      285356      285356
     16384             40000      5.655     5.677     5.677     2752.47    2752.47      176158      176158
     25000             40000      8.642     8.669     8.669     2750.24    2750.24      115353      115353
     32768             20000     11.305    11.340    11.340     2755.74    2755.74       88184       88184
     45000             20000     15.524    15.573    15.573     2755.83    2755.83       64216       64216
     65536             10000     22.602    22.671    22.671     2756.84    2756.84       44109       44109
    100000             10000     34.495    34.597    34.597     2756.51    2756.51       28904       28904
    131072              5000     45.200    45.332    45.332     2757.43    2757.43       22059       22059
    262144              2500     90.395    90.669    90.669     2757.27    2757.27       11029       11029
    524288              1200    180.785   181.370   181.370     2756.80    2756.80        5514        5514
   1048576               600    361.566   363.048   363.048     2754.45    2754.45        2754        2754
   2097152               300    723.122   727.351   727.351     2749.71    2749.71        1375        1375
ucx_perftest -c 14 -t ucp_get -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |        latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      4.172     4.214     4.221        0.23       0.23      237284      236918
         2           2000000      4.145     4.200     4.197        0.45       0.45      238108      238269
         4           2000000      4.168     4.222     4.212        0.90       0.91      236852      237426
         8           2000000      4.148     4.220     4.210        1.81       1.81      236992      237538
        12           2000000      4.156     4.228     4.232        3.16       3.15      236530      236270
        16           2000000      4.184     4.219     4.216        3.62       3.62      237007      237192
        24           2000000      4.164     4.231     4.223        5.41       5.42      236362      236774
        32           2000000      4.202     4.264     4.244        7.16       7.19      234518      235626
        40           2000000      4.197     4.237     4.235        9.00       9.01      236015      236106
        48           2000000      4.201     4.234     4.234       10.81      10.81      236186      236165
        64           2000000      4.193     4.238     4.245       14.40      14.38      235975      235598
        80           2000000      4.203     4.234     4.245       18.02      17.97      236160      235572
        96           2000000      4.212     4.281     4.280       21.39      21.39      233582      233621
       128           1400000      4.252     4.316     4.310       28.28      28.32      231685      232028
       256            700000      4.374     4.437     4.456       55.02      54.79      225367      224416
       300            700000      4.481     4.511     4.502       63.43      63.56      221692      222144
       512            300000      4.671     4.707     4.716      103.74     103.54      212463      212055
      1024            200000      5.170     5.243     5.238      186.26     186.43      190730      190902
      2048            100000      5.714     5.754     5.754      339.41     339.41      173777      173777
      3000            100000      6.175     6.242     6.242      458.36     458.36      160207      160207
      4096            100000      6.448     6.545     6.545      596.83     596.83      152788      152788
      6000            100000      7.121     7.157     7.157      799.53     799.53      139728      139728
      8192             80000      7.837     7.968     7.968      980.43     980.43      125495      125495
     10000             80000      8.486     8.556     8.556     1114.66    1114.66      116881      116881
     16384             40000     10.913    10.982    10.982     1422.74    1422.74       91055       91055
     25000             40000     13.664    13.783    13.783     1729.80    1729.80       72553       72553
     32768             20000     16.374    16.530    16.530     1890.46    1890.46       60495       60495
     45000             20000     20.675    20.808    20.808     2062.45    2062.45       48058       48058
     65536             10000     27.805    27.955    27.955     2235.77    2235.77       35772       35772
    100000             10000     39.653    39.799    39.799     2396.24    2396.24       25126       25126
    131072              5000     50.605    50.834    50.834     2458.98    2458.98       19672       19672
    262144              2500     95.554    95.878    95.878     2607.47    2607.47       10430       10430
    524288              1200    186.013   186.543   186.543     2680.34    2680.34        5361        5361
   1048576               600    367.075   368.023   368.023     2717.22    2717.22        2717        2717
   2097152               300    728.878   730.700   730.700     2737.10    2737.10        1369        1369
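The column groups in these tables are related to each other. Judging from the rows above (inferred from the numbers, not taken from the UCX documentation), the message rate is about 10^6 divided by the average latency in usec. Bandwidth equals size × message rate / 2^20, so "MB/s" here means 2^20 bytes per second. The Python sketch below checks these relations against three rows of the ucp_get -c 14 table. The helper names are hypothetical.

```python
# Check how the columns of a ucx_perftest row relate to each other.
# Assumption (inferred from the tables, not from UCX documentation):
#   msg/s ~= 1e6 / average latency in usec
#   MB/s   = size * msg/s / 2**20   (so "MB" means 2**20 bytes)

MIB = 2 ** 20


def rate_from_latency(avg_usec: float) -> float:
    """Messages per second implied by an average per-message time in usec."""
    return 1e6 / avg_usec


def bandwidth_from_rate(size_bytes: int, msg_rate: float) -> float:
    """Bandwidth in MB/s (2**20 bytes/s) implied by size and message rate."""
    return size_bytes * msg_rate / MIB


def close(a: float, b: float, rel: float = 1e-3) -> bool:
    """True if a and b agree within a relative tolerance."""
    return abs(a - b) <= rel * max(abs(a), abs(b))


# (size, average latency usec, average MB/s, average msg/s),
# copied from the ucp_get -c 14 table above.
ROWS = [
    (1, 4.214, 0.23, 237284),
    (65536, 27.955, 2235.77, 35772),
    (2097152, 730.700, 2737.10, 1369),
]

if __name__ == "__main__":
    for size, lat, bw, rate in ROWS:
        est_rate = rate_from_latency(lat)
        est_bw = bandwidth_from_rate(size, rate)
        print(f"{size:>8} B: rate {est_rate:9.0f} vs {rate} "
              f"({'ok' if close(est_rate, rate) else 'differs'}), "
              f"bw {est_bw:8.2f} vs {bw}")
```

The same bandwidth relation holds in the bandwidth tests. For example, stream_bw with 1-byte messages gives 4720463 msg/s, which works out to 4.50 MB/s.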
ucx_perftest -c 14 -t stream_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.204     0.212     0.212        4.50       4.50     4720463     4720463
         2           2000000      0.206     0.208     0.208        9.19       9.19     4816582     4816582
         4           2000000      0.204     0.208     0.208       18.32      18.32     4802357     4802357
         8           2000000      0.198     0.211     0.211       36.11      36.11     4732866     4732866
        12           2000000      0.204     0.207     0.207       64.38      64.38     4822053     4822053
        16           2000000      0.205     0.210     0.210       72.49      72.49     4750954     4750954
        24           2000000      0.204     0.211     0.211      108.68     108.68     4748112     4748112
        32           2000000      0.205     0.216     0.216      141.16     141.16     4625571     4625571
        40           2000000      0.209     0.216     0.216      176.90     176.90     4637424     4637424
        48           2000000      0.213     0.214     0.214      213.59     213.59     4666017     4666017
        64           2000000      0.212     0.214     0.214      284.66     284.66     4663885     4663885
        80           2000000      0.212     0.214     0.214      357.14     357.14     4681078     4681078
        96           2000000      0.210     0.228     0.228      400.88     400.88     4378659     4378659
       128           1400000      0.245     0.254     0.254      480.57     480.57     3936830     3936830
       256            700000      0.249     0.265     0.265      922.04     922.04     3776679     3776679
       300            700000      0.247     0.260     0.260     1100.89    1100.89     3847890     3847890
       512            300000      0.257     0.279     0.279     1750.41    1750.41     3584834     3584834
      1024            200000      0.276     0.385     0.385     2538.64    2538.64     2599563     2599563
      2048            100000      0.335     0.739     0.739     2642.96    2642.96     1353198     1353198
      3000            100000      0.402     1.044     1.044     2741.37    2741.37      958177      958177
      4096            100000      0.402     1.448     1.448     2697.18    2697.18      690478      690478
      6000            100000      0.468     2.084     2.084     2745.15    2745.15      479750      479750
      8192             80000      0.559     2.868     2.868     2723.72    2723.72      348636      348636
     10000             80000      0.827     3.504     3.504     2721.83    2721.83      285404      285404
     16384             40000      1.115     5.709     5.709     2736.85    2736.85      175158      175158
     25000             40000      1.898     8.764     8.764     2720.31    2720.31      114098      114098
     32768             20000      2.224    11.440    11.440     2731.58    2731.58       87411       87411
     45000             20000      3.105    15.734    15.734     2727.63    2727.63       63558       63558
     65536             10000      4.497    22.885    22.885     2731.07    2731.07       43697       43697
    100000             10000      6.876    34.994    34.994     2725.27    2725.27       28576       28576
    131072              5000      8.787    45.846    45.846     2726.52    2726.52       21812       21812
    262144              2500     17.815    91.659    91.659     2727.49    2727.49       10910       10910
    524288              1200    185.278   183.223   183.223     2728.92    2728.92        5458        5458
   1048576               600    370.527   366.515   366.515     2728.40    2728.40        2728        2728
   2097152               300    744.418   733.126   733.126     2728.04    2728.04        1364        1364
ucx_perftest -c 14 -t ucp_am_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.214     0.248     0.248        3.84       3.84     4029585     4029585
         2           2000000      0.215     0.243     0.243        7.84       7.84     4108226     4108226
         4           2000000      0.218     0.242     0.242       15.79      15.79     4138911     4138911
         8           2000000      0.218     0.246     0.246       31.06      31.06     4070857     4070857
        12           2000000      0.222     0.254     0.254       52.60      52.60     3939769     3939769
        16           2000000      0.213     0.242     0.242       62.97      62.97     4127082     4127082
        24           2000000      0.224     0.242     0.242       94.77      94.77     4140574     4140574
        32           2000000      0.220     0.254     0.254      119.93     119.93     3929959     3929959
        40           2000000      0.232     0.251     0.251      152.09     152.09     3986915     3986915
        48           2000000      0.236     0.262     0.262      174.66     174.66     3815527     3815527
        64           2000000      0.232     0.254     0.254      240.26     240.26     3936349     3936349
        80           2000000      0.231     0.257     0.257      296.69     296.69     3888742     3888742
        96           2000000      0.226     0.251     0.251      364.11     364.11     3977106     3977106
       128           1400000      0.250     0.297     0.297      410.91     410.91     3366176     3366176
       256            700000      0.251     0.299     0.299      817.06     817.06     3346673     3346673
       300            700000      0.258     0.316     0.316      905.33     905.33     3164342     3164342
       512            300000      0.273     0.314     0.314     1552.70    1552.70     3179920     3179920
      1024            200000      0.273     0.413     0.413     2364.64    2364.64     2421395     2421395
      2048            100000      0.335     0.745     0.745     2620.17    2620.17     1341525     1341525
      3000            100000      0.393     1.050     1.050     2723.87    2723.87      952063      952063
      4096            100000      0.394     1.450     1.450     2693.41    2693.41      689514      689514
      6000            100000      0.458     2.085     2.085     2743.89    2743.89      479529      479529
      8192             80000      0.537     2.869     2.869     2723.22    2723.22      348572      348572
     10000             80000      0.095     3.516     3.516     2712.61    2712.61      284437      284437
     16384             40000      0.101     5.718     5.718     2732.74    2732.74      174895      174895
     25000             40000      0.096     8.789     8.789     2712.63    2712.63      113776      113776
     32768             20000      0.097    11.463    11.463     2726.21    2726.21       87239       87239
     45000             20000      0.098    15.776    15.776     2720.37    2720.37       63389       63389
     65536             10000      0.101    22.948    22.948     2723.50    2723.50       43576       43576
    100000             10000      0.329    34.627    34.627     2754.13    2754.13       28879       28879
    131072              5000      0.340    45.358    45.358     2755.87    2755.87       22047       22047
    262144              2500      0.338    90.664    90.664     2757.45    2757.45       11030       11030
    524288              1200      0.338   181.281   181.281     2758.15    2758.15        5516        5516
   1048576               600      0.353   362.486   362.486     2758.72    2758.72        2759        2759
   2097152               300      0.352   724.937   724.937     2758.86    2758.86        1379        1379
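To see where a transfer saturates the link, you can parse the data rows and find the smallest size that reaches a chosen fraction of the peak bandwidth. The sketch below is hypothetical: parse_rows, saturation_size and the 90% threshold are illustrative choices, not part of ucx_perftest. It assumes every data row has the nine numeric columns shown in the header, and it skips border and header lines. Fed a few rows of the ucp_am_bw -c 14 table, it reports 2048 bytes as the first size at or above 90% of the 2758.86 MB/s peak.

```python
# Parse ucx_perftest data rows and find where bandwidth saturates.
# Hypothetical helper names; assumes each data row has the nine numeric
# columns of the table header, in the same order.

from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    size: int            # message size in bytes
    iterations: int      # "# iterations" column
    bw_overall: float    # "overall" bandwidth, MB/s
    rate_overall: float  # "overall" message rate, msg/s


def parse_rows(text: str) -> List[Row]:
    """Keep only numeric data rows; borders, headers and commands are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9 or not fields[0].isdigit():
            continue
        rows.append(Row(int(fields[0]), int(fields[1]),
                        float(fields[6]), float(fields[8])))
    return rows


def saturation_size(rows: List[Row], fraction: float = 0.9) -> Optional[int]:
    """Smallest size whose overall bandwidth is >= fraction * peak bandwidth."""
    if not rows:
        return None
    peak = max(r.bw_overall for r in rows)
    for r in sorted(rows, key=lambda r: r.size):
        if r.bw_overall >= fraction * peak:
            return r.size
    return None  # only reached when fraction > 1


# A few rows copied from the ucp_am_bw -c 14 table above.
SAMPLE = """\
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
       512            300000      0.273     0.314     0.314     1552.70    1552.70     3179920     3179920
      1024            200000      0.273     0.413     0.413     2364.64    2364.64     2421395     2421395
      2048            100000      0.335     0.745     0.745     2620.17    2620.17     1341525     1341525
      3000            100000      0.393     1.050     1.050     2723.87    2723.87      952063      952063
   2097152               300      0.352   724.937   724.937     2758.86    2758.86        1379        1379
"""

if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    print("90% of peak bandwidth first reached at", saturation_size(rows), "bytes")
```

Raising the threshold to 95% moves the answer to 3000 bytes. The choice of threshold therefore matters when you quote a "saturation size".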
Far CPU (-c 0)
ucx_perftest -c 0 -t tag_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.246     0.271     0.271        3.52       3.52     3688049     3688049
         2           2000000      0.245     0.272     0.272        7.01       7.01     3676044     3676044
         4           2000000      0.246     0.268     0.268       14.25      14.25     3734458     3734458
         8           2000000      0.247     0.277     0.277       27.51      27.51     3606001     3606001
        12           2000000      0.246     0.276     0.276       48.32      48.32     3618771     3618771
        16           2000000      0.252     0.273     0.273       55.95      55.95     3666664     3666664
        24           2000000      0.252     0.272     0.272       84.11      84.11     3675031     3675031
        32           2000000      0.246     0.277     0.277      110.00     110.00     3604577     3604577
        40           2000000      0.260     0.273     0.273      139.53     139.53     3657719     3657719
        48           2000000      0.258     0.284     0.284      161.39     161.39     3525646     3525646
        64           2000000      0.260     0.275     0.275      221.98     221.98     3636999     3636999
        80           2000000      0.267     0.274     0.274      278.27     278.27     3647351     3647351
        96           2000000      0.260     0.277     0.277      331.04     331.04     3615885     3615885
       128           1400000      0.295     0.325     0.325      375.60     375.60     3076930     3076930
       256            700000      0.298     0.334     0.334      730.00     730.00     2990098     2990098
       300            700000      0.302     0.347     0.347      825.58     825.58     2885610     2885610
       512            300000      0.325     0.417     0.417     1172.02    1172.02     2400307     2400307
      1024            200000      0.340     0.414     0.414     2361.58    2361.58     2418261     2418261
      2048            100000      0.365     0.740     0.740     2639.01    2639.01     1351171     1351171
      3000            100000      0.423     1.044     1.044     2739.53    2739.53      957534      957534
      4096            100000      0.449     1.449     1.449     2696.61    2696.61      690332      690332
      6000            100000      0.515     2.083     2.083     2746.55    2746.55      479994      479994
      8192             80000      0.600     2.866     2.866     2725.82    2725.82      348905      348905
     10000             80000      0.090     3.510     3.510     2717.10    2717.10      284908      284908
     16384             40000      0.091     5.711     5.711     2735.86    2735.86      175095      175095
     25000             40000      0.091     8.774     8.774     2717.20    2717.20      113968      113968
     32768             20000      0.091    11.441    11.441     2731.51    2731.51       87408       87408
     45000             20000      0.091    15.743    15.743     2725.95    2725.95       63519       63519
     65536             10000      0.089    22.930    22.930     2725.64    2725.64       43610       43610
    100000             10000      0.093    34.993    34.993     2725.30    2725.30       28577       28577
    131072              5000      0.372    45.360    45.360     2755.74    2755.74       22046       22046
    262144              2500      0.373    90.659    90.659     2757.59    2757.59       11030       11030
    524288              1200      0.384   181.289   181.289     2758.03    2758.03        5516        5516
   1048576               600      0.382   362.504   362.504     2758.59    2758.59        2759        2759
   2097152               300      0.395   724.934   724.934     2758.87    2758.87        1379        1379
ucx_perftest -c 0 -t tag_sync_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.275     0.455     0.455        2.10       2.10     2198312     2198312
         2           2000000      0.277     0.466     0.466        4.10       4.10     2148126     2148126
         4           2000000      0.283     0.458     0.458        8.32       8.32     2182051     2182051
         8           2000000      0.276     0.452     0.452       16.89      16.89     2213572     2213572
        12           2000000      0.278     0.446     0.446       29.92      29.92     2241152     2241152
        16           2000000      0.277     0.463     0.463       32.99      32.99     2161879     2161879
        24           2000000      0.273     0.451     0.451       50.79      50.79     2218933     2218933
        32           2000000      0.271     0.449     0.449       68.03      68.03     2229184     2229184
        40           2000000      0.283     0.465     0.465       82.00      82.00     2149454     2149454
        48           2000000      0.283     0.462     0.462       99.04      99.04     2163554     2163554
        64           2000000      0.291     0.472     0.472      129.38     129.38     2119767     2119767
        80           2000000      0.289     0.468     0.468      162.90     162.90     2135130     2135130
        96           2000000      0.283     0.471     0.471      194.46     194.46     2124067     2124067
       128           1400000      0.287     0.475     0.475      257.26     257.26     2107469     2107469
       256            700000      0.298     0.485     0.485      503.43     503.43     2062050     2062050
       300            700000      0.309     0.508     0.508      562.72     562.72     1966851     1966851
       512            300000      0.319     0.532     0.532      917.64     917.64     1879323     1879323
      1024            200000      0.363     0.573     0.573     1704.25    1704.25     1745154     1745154
      2048            100000      0.715     0.792     0.792     2467.06    2467.06     1263135     1263135
      3000            100000      1.062     1.077     1.077     2656.67    2656.67      928574      928574
      4096            100000      1.474     1.483     1.483     2633.61    2633.61      674204      674204
      6000            100000      2.109     2.116     2.116     2703.72    2703.72      472509      472509
      8192             80000      2.883     2.896     2.896     2697.47    2697.47      345276      345276
     10000             80000      3.523     3.544     3.544     2690.98    2690.98      282170      282170
     16384             40000      5.719     5.746     5.746     2719.16    2719.16      174026      174026
     25000             40000      8.765     8.802     8.802     2708.77    2708.77      113614      113614
     32768             20000     11.431    11.476    11.476     2723.10    2723.10       87139       87139
     45000             20000     15.713    15.774    15.774     2720.67    2720.67       63396       63396
     65536             10000     22.845    22.946    22.946     2723.80    2723.80       43581       43581
    100000             10000     34.895    35.044    35.044     2721.36    2721.36       28535       28535
    131072              5000      0.373    45.360    45.360     2755.73    2755.73       22046       22046
    262144              2500      0.369    90.670    90.670     2757.27    2757.27       11029       11029
    524288              1200      0.367   181.279   181.279     2758.18    2758.18        5516        5516
   1048576               600      0.374   362.529   362.529     2758.40    2758.40        2758        2758
   2097152               300      0.382   724.950   724.950     2758.81    2758.81        1379        1379
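Putting the two -c 0 tag runs side by side shows the cost of synchronous tag sends. In the UCP API, a sync tag send (ucp_tag_send_sync_nbx) completes only after the remote side has matched the message. The working assumption here is that tag_sync_bw exercises that path. The Python sketch below divides the overall message rates copied from the tag_sync_bw and tag_bw tables. The ratio is about 0.60 for 1-byte messages and about 0.72 at 1024 bytes. It rises to roughly 0.99 at 8192 bytes and reaches 1.0 at 2097152 bytes, where both runs are bandwidth-bound.

```python
# Ratio of overall message rates, tag_sync_bw / tag_bw, both run with -c 0.
# Values are copied from the two tables above (msg/s, "overall" column).

TAG_BW = {1: 3688049, 1024: 2418261, 8192: 348905, 2097152: 1379}
TAG_SYNC_BW = {1: 2198312, 1024: 1745154, 8192: 345276, 2097152: 1379}


def rate_ratio(sync: dict, plain: dict) -> dict:
    """Per-size ratio sync / plain for sizes present in both tables."""
    return {s: sync[s] / plain[s] for s in sorted(plain) if s in sync}


if __name__ == "__main__":
    for size, ratio in rate_ratio(TAG_SYNC_BW, TAG_BW).items():
        print(f"{size:>8} B: tag_sync_bw / tag_bw = {ratio:.3f}")
```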
ucx_perftest -c 0 -t ucp_put_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.065     0.324     0.324        2.94       2.94     3084430     3084430
         2           2000000      0.065     0.335     0.335        5.70       5.70     2985889     2985889
         4           2000000      0.069     0.333     0.333       11.46      11.46     3004971     3004971
         8           2000000      0.069     0.331     0.331       23.03      23.03     3018982     3018982
        12           2000000      0.072     0.332     0.332       40.20      40.20     3010575     3010575
        16           2000000      0.072     0.323     0.323       47.20      47.20     3093054     3093054
        24           2000000      0.065     0.333     0.333       68.75      68.75     3003612     3003612
        32           2000000      0.072     0.349     0.349       87.49      87.49     2866742     2866742
        40           2000000      0.069     0.330     0.330      115.49     115.49     3027629     3027629
        48           2000000      0.065     0.336     0.336      136.15     136.15     2974204     2974204
        64           2000000      0.069     0.332     0.332      183.97     183.97     3014145     3014145
        80           2000000      0.071     0.331     0.331      230.45     230.45     3020546     3020546
        96           2000000      0.065     0.338     0.338      270.51     270.51     2954685     2954685
       128           1400000      0.064     0.365     0.365      334.58     334.58     2740884     2740884
       256            700000      0.066     0.401     0.401      609.35     609.35     2495901     2495901
       300            700000      0.064     0.406     0.406      703.83     703.83     2460060     2460060
       512            300000      0.064     0.458     0.458     1065.11    1065.11     2181345     2181345
      1024            200000      0.064     0.425     0.425     2300.25    2300.25     2355457     2355457
      2048            100000      0.680     0.715     0.715     2732.83    2732.83     1399211     1399211
      3000            100000      1.039     1.044     1.044     2739.21    2739.21      957422      957422
      4096            100000      1.419     1.423     1.423     2744.73    2744.73      702652      702652
      6000            100000      2.076     2.084     2.084     2745.72    2745.72      479849      479849
      8192             80000      2.829     2.839     2.839     2751.99    2751.99      352255      352255
     10000             80000      3.489     3.504     3.504     2721.70    2721.70      285391      285391
     16384             40000      5.655     5.673     5.673     2754.25    2754.25      176272      176272
     25000             40000      8.640     8.667     8.667     2750.89    2750.89      115381      115381
     32768             20000     11.303    11.339    11.339     2755.93    2755.93       88190       88190
     45000             20000     15.525    15.569    15.569     2756.46    2756.46       64230       64230
     65536             10000     22.603    22.670    22.670     2756.96    2756.96       44111       44111
    100000             10000     34.496    34.592    34.592     2756.91    2756.91       28908       28908
    131072              5000     45.201    45.324    45.324     2757.92    2757.92       22063       22063
    262144              2500     90.395    90.664    90.664     2757.43    2757.43       11030       11030
    524288              1200    180.778   181.382   181.382     2756.61    2756.61        5513        5513
   1048576               600    361.560   363.040   363.040     2754.52    2754.52        2755        2755
   2097152               300    723.132   727.320   727.320     2749.82    2749.82        1375        1375
ucx_perftest -c 0 -t ucp_get -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |        latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      4.508     4.599     4.593        0.21       0.21      217415      217728
         2           2000000      4.560     4.608     4.586        0.41       0.42      217005      218073
         4           2000000      4.559     4.610     4.612        0.83       0.83      216931      216842
         8           2000000      4.549     4.577     4.582        1.67       1.66      218479      218231
        12           2000000      4.545     4.576     4.580        2.92       2.92      218534      218362
        16           2000000      4.583     4.621     4.603        3.30       3.32      216406      217265
        24           2000000      4.557     4.589     4.583        4.99       4.99      217893      218196
        32           2000000      4.594     4.631     4.615        6.59       6.61      215926      216691
        40           2000000      4.546     4.568     4.574        8.35       8.34      218910      218626
        48           2000000      4.555     4.606     4.604        9.94       9.94      217102      217191
        64           2000000      4.522     4.555     4.578       13.40      13.33      219560      218419
        80           2000000      4.597     4.629     4.629       16.48      16.48      216044      216007
        96           2000000      4.629     4.708     4.631       19.45      19.77      212398      215937
       128           1400000      4.577     4.614     4.606       26.46      26.50      216736      217118
       256            700000      4.728     4.789     4.772       50.98      51.16      208830      209556
       300            700000      4.795     4.822     4.819       59.34      59.37      207394      207507
       512            300000      5.016     5.051     5.035       96.66      96.98      197965      198616
      1024            200000      5.487     5.570     5.551      175.32     175.91      179525      180133
      2048            100000      6.015     6.071     6.071      321.69     321.69      164705      164705
      3000            100000      6.435     6.462     6.462      442.74     442.74      154748      154748
      4096            100000      6.900     6.976     6.976      559.95     559.95      143348      143348
      6000            100000      7.468     7.526     7.526      760.31     760.31      132873      132873
      8192             80000      8.355     8.412     8.412      928.69     928.69      118872      118872
     10000             80000      8.850     8.864     8.864     1075.87    1075.87      112813      112813
     16384             40000     11.101    11.142    11.142     1402.33    1402.33       89749       89749
     25000             40000     14.012    14.086    14.086     1692.55    1692.55       70991       70991
     32768             20000     16.792    16.863    16.863     1853.22    1853.22       59303       59303
     45000             20000     21.038    21.116    21.116     2032.32    2032.32       47356       47356
     65536             10000     28.077    28.290    28.290     2209.25    2209.25       35348       35348
    100000             10000     39.959    40.085    40.085     2379.14    2379.14       24947       24947
    131072              5000     50.815    50.968    50.968     2452.50    2452.50       19620       19620
    262144              2500     95.930    96.357    96.357     2594.52    2594.52       10378       10378
    524288              1200    186.328   186.824   186.824     2676.31    2676.31        5353        5353
   1048576               600    367.128   368.045   368.045     2717.06    2717.06        2717        2717
   2097152               300    729.302   731.010   731.010     2735.94    2735.94        1368        1368
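The two ucp_get tables (-c 14 and -c 0) can be compared in the same way. The sketch below computes how much the average latency grows when the test is pinned with -c 0. The values are copied from the tables, and the function name is hypothetical. With -c 0, latency is 0.3 to 0.4 usec higher at every size. That is about 9% for 1-byte gets and under 0.1% at 2097152 bytes. This pattern fits a roughly fixed per-operation cost on the far core, though that is an interpretation these runs alone do not prove.

```python
# Relative growth of average ucp_get latency with -c 0 versus -c 14.
# Values (usec, "average" latency column) copied from the two ucp_get tables.

C14 = {1: 4.214, 1024: 5.243, 65536: 27.955, 2097152: 730.700}
C0 = {1: 4.599, 1024: 5.570, 65536: 28.290, 2097152: 731.010}


def pct_increase(base: dict, other: dict) -> dict:
    """Percentage by which `other` exceeds `base`, per message size."""
    return {s: 100.0 * (other[s] - base[s]) / base[s] for s in sorted(base) if s in other}


if __name__ == "__main__":
    for size, pct in pct_increase(C14, C0).items():
        delta = C0[size] - C14[size]
        print(f"{size:>8} B: +{delta:.3f} usec (+{pct:.2f}%)")
```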
ucx_perftest -c 0 -t stream_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.251     0.257     0.257        3.71       3.71     3894659     3894659
         2           2000000      0.250     0.255     0.255        7.49       7.49     3927676     3927676
         4           2000000      0.249     0.265     0.265       14.38      14.38     3768528     3768528
         8           2000000      0.254     0.263     0.263       29.06      29.06     3809300     3809300
        12           2000000      0.254     0.270     0.270       49.47      49.47     3704896     3704896
        16           2000000      0.257     0.267     0.267       57.18      57.18     3747122     3747122
        24           2000000      0.252     0.266     0.266       85.97      85.97     3756010     3756010
        32           2000000      0.248     0.259     0.259      117.90     117.90     3863345     3863345
        40           2000000      0.254     0.262     0.262      145.39     145.39     3811420     3811420
        48           2000000      0.254     0.272     0.272      168.42     168.42     3679285     3679285
        64           2000000      0.260     0.270     0.270      226.19     226.19     3705962     3705962
        80           2000000      0.255     0.262     0.262      290.95     290.95     3813577     3813577
        96           2000000      0.255     0.270     0.270      339.17     339.17     3704595     3704595
       128           1400000      0.287     0.316     0.316      386.71     386.71     3167895     3167895
       256            700000      0.307     0.321     0.321      760.59     760.59     3115361     3115361
       300            700000      0.300     0.317     0.317      902.26     902.26     3153636     3153636
       512            300000      0.326     0.397     0.397     1229.75    1229.75     2518532     2518532
      1024            200000      0.326     0.397     0.397     2461.84    2461.84     2520926     2520926
      2048            100000      0.378     0.739     0.739     2643.00    2643.00     1353215     1353215
      3000            100000      0.465     1.044     1.044     2741.25    2741.25      958137      958137
      4096            100000      0.458     1.448     1.448     2696.88    2696.88      690402      690402
      6000            100000      0.535     2.083     2.083     2746.61    2746.61      480006      480006
      8192             80000      0.606     2.864     2.864     2727.95    2727.95      349177      349177
     10000             80000      0.919     3.497     3.497     2726.89    2726.89      285935      285935
     16384             40000      1.222     5.700     5.700     2741.29    2741.29      175443      175443
     25000             40000      2.058     8.752     8.752     2724.12    2724.12      114258      114258
     32768             20000      2.431    11.430    11.430     2734.08    2734.08       87491       87491
     45000             20000      3.368    15.728    15.728     2728.60    2728.60       63581       63581
     65536             10000      4.822    22.872    22.872     2732.60    2732.60       43722       43722
    100000             10000      7.513    34.938    34.938     2729.64    2729.64       28622       28622
    131072              5000      9.558    45.773    45.773     2730.89    2730.89       21847       21847
    262144              2500     19.337    91.574    91.574     2730.04    2730.04       10920       10920
    524288              1200    185.298   183.008   183.008     2732.13    2732.13        5464        5464
   1048576               600    370.604   366.193   366.193     2730.80    2730.80        2731        2731
   2097152               300    744.353   732.059   732.059     2732.02    2732.02        1366        1366
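The columns in these rows are tied together by simple arithmetic. Judging by the numbers, the first column is the message size in bytes: ucx_perftest uses the first word of each `-b` batch-file line as the test name, and in msg_pow2 that word is the size. The bandwidth column equals size × message rate / 2^20, and the average overhead equals 10^6 / message rate in usec. The Python sketch below checks both relations on three rows copied from the stream_bw run above. `parse_row` and `check_row` are hypothetical helpers. The 9-field row layout is taken from the table header, and the tolerances are chosen to absorb the rounding of the printed values.

```python
import math

# In these tables the MB/s column matches size * msg_rate / 2**20,
# i.e. "MB" here means 2**20 bytes.
MIB = 1 << 20


def parse_row(line):
    """Split one ucx_perftest result row (9 whitespace-separated fields)."""
    f = line.split()
    if len(f) != 9:
        raise ValueError(f"expected 9 fields, got {len(f)}: {line!r}")
    return {
        "size": int(f[0]),        # test name from the -b file (= message size in msg_pow2)
        "iters": int(f[1]),       # number of iterations
        "lat_avg": float(f[3]),   # average overhead per message (usec)
        "bw_avg": float(f[5]),    # average bandwidth (MB/s)
        "rate_avg": float(f[7]),  # average message rate (msg/s)
    }


def check_row(row):
    """True if bandwidth and overhead are consistent with the message rate.

    abs_tol absorbs the 2- and 3-decimal rounding of the printed values;
    rel_tol absorbs the integer rounding of the message rate.
    """
    bw_from_rate = row["size"] * row["rate_avg"] / MIB
    usec_from_rate = 1e6 / row["rate_avg"]
    return (math.isclose(bw_from_rate, row["bw_avg"], rel_tol=1e-3, abs_tol=0.01)
            and math.isclose(usec_from_rate, row["lat_avg"], rel_tol=1e-3, abs_tol=0.001))


# Three rows copied verbatim from the stream_bw run above.
ROWS = """\
         1           2000000      0.251     0.257     0.257        3.71       3.71     3894659     3894659
      1024            200000      0.326     0.397     0.397     2461.84    2461.84     2520926     2520926
   2097152               300    744.353   732.059   732.059     2732.02    2732.02        1366        1366
"""

if __name__ == "__main__":
    for line in ROWS.splitlines():
        row = parse_row(line)
        print(row["size"], round(row["size"] * row["rate_avg"] / MIB, 2), check_row(row))
```

The same check accepts any row from the tables in this section. A row that fails it was probably mis-copied or uses a different column layout.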
ucx_perftest -c 0 -t ucp_am_bw -b /usr/share/ucx/perftest/msg_pow2 -f gpu19
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|     Test     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
         1           2000000      0.268     0.295     0.295        3.24       3.24     3395326     3395326
         2           2000000      0.265     0.297     0.297        6.43       6.43     3368613     3368613
         4           2000000      0.271     0.302     0.302       12.61      12.61     3306052     3306052
         8           2000000      0.269     0.300     0.300       25.45      25.45     3335530     3335530
        12           2000000      0.271     0.303     0.303       44.05      44.05     3298958     3298958
        16           2000000      0.271     0.299     0.299       51.06      51.06     3346161     3346161
        24           2000000      0.277     0.302     0.302       75.74      75.74     3308913     3308913
        32           2000000      0.270     0.300     0.300      101.74     101.74     3333905     3333905
        40           2000000      0.277     0.301     0.301      126.83     126.83     3324899     3324899
        48           2000000      0.278     0.298     0.298      153.62     153.62     3355874     3355874
        64           2000000      0.278     0.302     0.302      201.81     201.81     3306523     3306523
        80           2000000      0.288     0.311     0.311      245.18     245.18     3213616     3213616
        96           2000000      0.275     0.305     0.305      300.44     300.44     3281589     3281589
       128           1400000      0.296     0.331     0.331      369.09     369.09     3023582     3023582
       256            700000      0.314     0.342     0.342      713.13     713.13     2920964     2920964
       300            700000      0.301     0.356     0.356      803.77     803.77     2809372     2809372
       512            300000      0.325     0.424     0.424     1150.99    1150.99     2357229     2357229
      1024            200000      0.326     0.393     0.393     2481.92    2481.92     2541486     2541486
      2048            100000      0.370     0.742     0.742     2630.55    2630.55     1346841     1346841
      3000            100000      0.432     1.056     1.056     2708.38    2708.38      946648      946648
      4096            100000      0.454     1.452     1.452     2690.63    2690.63      688800      688800
      6000            100000      0.513     2.083     2.083     2747.60    2747.60      480178      480178
      8192             80000      0.589     2.864     2.864     2727.73    2727.73      349150      349150
     10000             80000      0.094     3.512     3.512     2715.44    2715.44      284734      284734
     16384             40000      0.095     5.713     5.713     2735.03    2735.03      175042      175042
     25000             40000      0.095     8.776     8.776     2716.83    2716.83      113952      113952
     32768             20000      0.095    11.449    11.449     2729.54    2729.54       87345       87345
     45000             20000      0.100    15.749    15.749     2725.00    2725.00       63497       63497
     65536             10000      0.104    22.918    22.918     2727.15    2727.15       43634       43634
    100000             10000      0.102    35.006    35.006     2724.30    2724.30       28566       28566
    131072              5000      0.386    45.358    45.358     2755.88    2755.88       22047       22047
    262144              2500      0.402    90.660    90.660     2757.54    2757.54       11030       11030
    524288              1200      0.378   181.267   181.267     2758.37    2758.37        5517        5517
   1048576               600      0.392   362.508   362.508     2758.56    2758.56        2759        2759
   2097152               300      0.385   724.970   724.970     2758.73    2758.73        1379        1379
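The stream_bw and ucp_am_bw runs can be compared by two numbers: each run's peak bandwidth, and the smallest message size from which every larger size stays within 99% of that peak. The sketch below computes both and converts the peak to Gbit/s so it can be set beside nominal link speeds. The 99% threshold and this definition of "plateau" are arbitrary choices for illustration, and `plateau` and `mib_to_gbit` are hypothetical helpers. MB is taken as 2^20 bytes, which is what the table arithmetic implies. Only rows of 1024 bytes and above are copied from the two tables.

```python
# (message size in bytes, average bandwidth in MB/s), copied from the
# stream_bw and ucp_am_bw tables above, sizes >= 1024 only.
STREAM_BW = [
    (1024, 2461.84), (2048, 2643.00), (3000, 2741.25), (4096, 2696.88),
    (6000, 2746.61), (8192, 2727.95), (10000, 2726.89), (16384, 2741.29),
    (25000, 2724.12), (32768, 2734.08), (45000, 2728.60), (65536, 2732.60),
    (100000, 2729.64), (131072, 2730.89), (262144, 2730.04), (524288, 2732.13),
    (1048576, 2730.80), (2097152, 2732.02),
]
UCP_AM_BW = [
    (1024, 2481.92), (2048, 2630.55), (3000, 2708.38), (4096, 2690.63),
    (6000, 2747.60), (8192, 2727.73), (10000, 2715.44), (16384, 2735.03),
    (25000, 2716.83), (32768, 2729.54), (45000, 2725.00), (65536, 2727.15),
    (100000, 2724.30), (131072, 2755.88), (262144, 2757.54), (524288, 2758.37),
    (1048576, 2758.56), (2097152, 2758.73),
]


def plateau(rows, fraction=0.99):
    """Return (peak, size): size is the smallest message size from which every
    larger size stays at or above fraction * peak (None if even the largest doesn't)."""
    rows = sorted(rows)
    peak = max(bw for _, bw in rows)
    start = None
    # Walk down from the largest size and stop at the first dip below the threshold.
    for size, bw in reversed(rows):
        if bw < fraction * peak:
            break
        start = size
    return peak, start


def mib_to_gbit(mb_per_s):
    """Convert MB/s (2**20 bytes per second) to decimal Gbit/s."""
    return mb_per_s * (1 << 20) * 8 / 1e9


if __name__ == "__main__":
    for name, rows in (("stream_bw", STREAM_BW), ("ucp_am_bw", UCP_AM_BW)):
        peak, size = plateau(rows)
        print(f"{name}: peak {peak:.2f} MB/s = {mib_to_gbit(peak):.1f} Gbit/s, "
              f"within 99% of peak from {size} B upward")
```

With these rows, stream_bw peaks at 2746.61 MB/s (about 23.0 Gbit/s) and stays within 99% of that from 6000 B upward. ucp_am_bw peaks slightly higher at 2758.73 MB/s (about 23.1 Gbit/s), but it only stays within 99% of its peak from 131072 B upward.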

100G IB + 25G RoCE aggregated UCP bandwidth test

Script

```



#### Results