Comments (25)
wow, code from branch sandbox-roberto-remove-memory-alloc really unlocked the beast
also, the encrypted measurements against your staging look quite promising, thanks a lot for this!
when do you think we'll have both server and client fixes merged into the main branches?
if no one else has any further things, we can close this one. thanks again!
./ndt7-client-go/cmd/ndt7-client/ndt7-rob -scheme ws
upload: complete
Server: ndt-mlab1-ord06.mlab-oti.measurement-lab.org
Client: [REDACTED]
Latency: 4.3 ms
Download: 912.9 Mbit/s
Upload: 43.1 Mbit/s
Retransmission: 0.00 %
upload: complete
Server: ndt-mlab3-ord05.mlab-oti.measurement-lab.org
Client: [REDACTED]
Latency: 3.7 ms
Download: 872.6 Mbit/s
Upload: 43.0 Mbit/s
Retransmission: 0.00 %
upload: complete
Server: ndt-mlab3-ord06.mlab-oti.measurement-lab.org
Client: [REDACTED]
Latency: 4.4 ms
Download: 920.9 Mbit/s
Upload: 43.0 Mbit/s
Retransmission: 0.00 %
./ndt7-client-go/cmd/ndt7-client/ndt7-rob -locate.url=https://locate-dot-mlab-staging.appspot.com/v2/nearest -scheme=ws
upload: complete
Server: ndt-mlab4-ord05.mlab-staging.measurement-lab.org
Client: [REDACTED]
Latency: 4.5 ms
Download: 919.6 Mbit/s
Upload: 43.0 Mbit/s
Retransmission: 0.00 %
upload: complete
Server: ndt-mlab4-ord05.mlab-staging.measurement-lab.org
Client: [REDACTED]
Latency: 4.1 ms
Download: 898.8 Mbit/s
Upload: 43.1 Mbit/s
Retransmission: 0.00 %
upload: complete
Server: ndt-mlab4-ord05.mlab-staging.measurement-lab.org
Client: [REDACTED]
Latency: 4.0 ms
Download: 891.3 Mbit/s
Upload: 43.0 Mbit/s
Retransmission: 0.00 %
./ndt7-client-go/cmd/ndt7-client/ndt7-rob -locate.url=https://locate-dot-mlab-staging.appspot.com/v2/nearest
upload: complete
Server: ndt-mlab4-ord03.mlab-staging.measurement-lab.org
Client: [REDACTED]
Latency: 3.1 ms
Download: 841.9 Mbit/s
Upload: 43.0 Mbit/s
Retransmission: 0.00 %
upload: complete
Server: ndt-mlab4-ord05.mlab-staging.measurement-lab.org
Client: [REDACTED]
Latency: 4.0 ms
Download: 850.7 Mbit/s
Upload: 43.0 Mbit/s
Retransmission: 0.02 %
upload: complete
Server: ndt-mlab4-ord04.mlab-staging.measurement-lab.org
Client: [REDACTED]
Latency: 3.9 ms
Download: 855.8 Mbit/s
Upload: 43.0 Mbit/s
Retransmission: 0.00 %
./ndt7-client-go/cmd/ndt7-client/ndt7-rob
upload: complete
Server: ndt-mlab2-ord05.mlab-oti.measurement-lab.org
Client: [REDACTED]
Latency: 2.9 ms
Download: 193.9 Mbit/s
Upload: 43.0 Mbit/s
Retransmission: 0.00 %
from ndt7-client-go.
Another thing to note is that the changes represented here only affect the subset of clients that are low-resourced embedded devices with gigabit connections. The ndt7-client-go change that @ggmartins mentioned above seems to have made the most significant difference to the results reported above, and we are currently setting up a framework to test the impact of the server-side change on other clients, devices and operating systems. We'll be publishing a blog post that goes into more detail about all the recent changes to NDT.
from ndt7-client-go.
more info on this:
running Ndt7 from multiple "vantage" points inside the same network:
Jetson NX: ~450Mbps
(ndt7 dw) Linux appflow 4.9.140-tegra #1 SMP PREEMPT Fri Oct 16 12:25:00 PDT 2020 aarch64 aarch64 aarch64 GNU/Linux (Ubuntu 20)
Jetson Nano: ~450Mbps
(ndt7 dw) Linux netrics 4.9.201-tegra #1 SMP PREEMPT Fri Feb 19 08:40:32 PST 2021 aarch64 aarch64 aarch64 GNU/Linux (Ubuntu 20)
Raspberry Pi: ~150Mbps
(ndt7 dw) Linux netrics 5.4.0-1035-raspi #38-Ubuntu SMP PREEMPT Tue Apr 20 21:37:03 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux (Ubuntu 20)
Full output here: https://chicago-cdac.github.io/nm-exp-active-netrics/debug/output.ndt7.txt
it's worth mentioning that the degraded ndt7 speed (~150Mbps) also shows up for a teammate of mine who has a Comcast connection with more than 300Mbps downstream. He runs the same Raspberry Pi model and OS.
Thanks,
G
from ndt7-client-go.
well, certainly less than 100% for individual cores/threads,
full ps
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 169808 12044 ? Ss May18 0:37 /sbin/init fixrtc splash
root 2 0.0 0.0 0 0 ? S May18 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? I< May18 0:00 [rcu_gp]
root 4 0.0 0.0 0 0 ? I< May18 0:00 [rcu_par_gp]
root 8 0.0 0.0 0 0 ? I< May18 0:00 [mm_percpu_wq]
root 9 0.0 0.0 0 0 ? S May18 0:19 [ksoftirqd/0]
root 10 0.0 0.0 0 0 ? I May18 0:17 [rcu_preempt]
root 11 0.0 0.0 0 0 ? S May18 0:01 [migration/0]
root 12 0.0 0.0 0 0 ? S May18 0:00 [idle_inject/0]
root 14 0.0 0.0 0 0 ? S May18 0:00 [cpuhp/0]
root 15 0.0 0.0 0 0 ? S May18 0:00 [cpuhp/1]
root 16 0.0 0.0 0 0 ? S May18 0:00 [idle_inject/1]
root 17 0.0 0.0 0 0 ? S May18 0:01 [migration/1]
root 18 0.0 0.0 0 0 ? S May18 0:02 [ksoftirqd/1]
root 20 0.0 0.0 0 0 ? I< May18 0:03 [kworker/1:0H-kblockd]
root 21 0.0 0.0 0 0 ? S May18 0:00 [cpuhp/2]
root 22 0.0 0.0 0 0 ? S May18 0:00 [idle_inject/2]
root 23 0.0 0.0 0 0 ? S May18 0:01 [migration/2]
root 24 0.0 0.0 0 0 ? S May18 0:01 [ksoftirqd/2]
root 27 0.0 0.0 0 0 ? S May18 0:00 [cpuhp/3]
root 28 0.0 0.0 0 0 ? S May18 0:00 [idle_inject/3]
root 29 0.0 0.0 0 0 ? S May18 0:01 [migration/3]
root 30 0.0 0.0 0 0 ? S May18 0:01 [ksoftirqd/3]
root 33 0.0 0.0 0 0 ? S May18 0:00 [kdevtmpfs]
root 34 0.0 0.0 0 0 ? I< May18 0:00 [netns]
root 35 0.0 0.0 0 0 ? S May18 0:00 [rcu_tasks_kthre]
root 36 0.0 0.0 0 0 ? S May18 0:00 [kauditd]
root 37 0.0 0.0 0 0 ? S May18 0:00 [khungtaskd]
root 38 0.0 0.0 0 0 ? S May18 0:00 [oom_reaper]
root 39 0.0 0.0 0 0 ? I< May18 0:00 [writeback]
root 40 0.0 0.0 0 0 ? S May18 0:00 [kcompactd0]
root 41 0.0 0.0 0 0 ? SN May18 0:00 [ksmd]
root 90 0.0 0.0 0 0 ? I< May18 0:00 [kintegrityd]
root 91 0.0 0.0 0 0 ? I< May18 0:00 [kblockd]
root 92 0.0 0.0 0 0 ? I< May18 0:00 [blkcg_punt_bio]
root 93 0.0 0.0 0 0 ? I< May18 0:00 [tpm_dev_wq]
root 94 0.0 0.0 0 0 ? I< May18 0:00 [ata_sff]
root 95 0.0 0.0 0 0 ? I< May18 0:00 [md]
root 96 0.0 0.0 0 0 ? I< May18 0:00 [edac-poller]
root 97 0.0 0.0 0 0 ? I< May18 0:00 [devfreq_wq]
root 98 0.0 0.0 0 0 ? S May18 0:00 [watchdogd]
root 101 0.0 0.0 0 0 ? S May18 0:00 [kswapd0]
root 102 0.0 0.0 0 0 ? S May18 0:00 [ecryptfs-kthrea]
root 104 0.0 0.0 0 0 ? I< May18 0:00 [kthrotld]
root 105 0.0 0.0 0 0 ? S May18 0:00 [irq/41-aerdrv]
root 107 0.0 0.0 0 0 ? I< May18 0:00 [DWC Notificatio]
root 109 0.0 0.0 0 0 ? S< May18 0:00 [vchiq-slot/0]
root 110 0.0 0.0 0 0 ? S< May18 0:00 [vchiq-recy/0]
root 111 0.0 0.0 0 0 ? S< May18 0:00 [vchiq-sync/0]
root 112 0.0 0.0 0 0 ? I< May18 0:00 [ipv6_addrconf]
root 121 0.0 0.0 0 0 ? I< May18 0:00 [kstrp]
root 124 0.0 0.0 0 0 ? I< May18 0:00 [kworker/u9:0]
root 130 0.0 0.0 0 0 ? I< May18 0:00 [cryptd]
root 154 0.0 0.0 0 0 ? S May18 0:00 [spi0]
root 155 0.0 0.0 0 0 ? I< May18 0:00 [sdhci]
root 156 0.0 0.0 0 0 ? S May18 0:00 [irq/28-mmc0]
root 157 0.0 0.0 0 0 ? I< May18 0:00 [charger_manager]
root 166 0.0 0.0 0 0 ? I< May18 0:00 [mmc_complete]
root 167 0.0 0.0 0 0 ? I< May18 0:04 [kworker/0:1H-mmc_complete]
root 210 0.0 0.0 0 0 ? I< May18 0:03 [kworker/2:1H-kblockd]
root 211 0.0 0.0 0 0 ? I< May18 0:02 [kworker/3:2H-kblockd]
root 754 0.0 0.0 0 0 ? I< May18 0:00 [raid5wq]
root 813 0.0 0.0 0 0 ? S May18 0:03 [jbd2/mmcblk0p2-]
root 814 0.0 0.0 0 0 ? I< May18 0:00 [ext4-rsv-conver]
root 892 0.0 0.2 67452 19080 ? S<s May18 0:06 /lib/systemd/systemd-journald
root 919 0.0 0.0 20164 4604 ? Ss May18 0:01 /lib/systemd/systemd-udevd
root 951 0.0 0.0 0 0 ? S May18 0:00 [vchiq-keep/0]
root 952 0.0 0.0 0 0 ? S< May18 0:00 [SMIO]
root 1194 0.0 0.0 0 0 ? I< May18 0:00 [mmal-vchiq]
root 1195 0.0 0.0 0 0 ? I< May18 0:00 [cfg80211]
root 1199 0.0 0.0 0 0 ? I< May18 0:00 [mmal-vchiq]
root 1209 0.0 0.0 0 0 ? I< May18 0:00 [mmal-vchiq]
root 1213 0.0 0.0 0 0 ? I< May18 0:00 [mmal-vchiq]
root 1323 0.0 0.0 0 0 ? I< May18 0:00 [brcmf_wq/mmc1:0]
root 1325 0.0 0.0 0 0 ? S May18 0:00 [brcmf_wdog/mmc1]
root 1530 0.0 0.0 0 0 ? I< May18 0:00 [kaluad]
root 1531 0.0 0.0 0 0 ? I< May18 0:00 [kmpath_rdacd]
root 1532 0.0 0.0 0 0 ? I< May18 0:00 [kmpathd]
root 1533 0.0 0.0 0 0 ? I< May18 0:00 [kmpath_handlerd]
root 1534 0.0 0.2 280232 16604 ? SLsl May18 0:31 /sbin/multipathd -d -s
root 1545 0.0 0.0 0 0 ? S< May18 0:00 [loop0]
root 1548 0.0 0.0 0 0 ? S< May18 0:00 [loop1]
root 1549 0.0 0.0 0 0 ? S< May18 0:00 [loop2]
systemd+ 1573 0.0 0.0 89940 6240 ? Ssl May18 0:01 /lib/systemd/systemd-timesyncd
systemd+ 1613 0.0 0.0 26136 5964 ? Ss May18 0:16 /lib/systemd/systemd-networkd
systemd+ 1615 0.0 0.1 24124 11860 ? Ss May18 0:35 /lib/systemd/systemd-resolved
root 1649 0.0 0.0 237504 6736 ? Ssl May18 0:11 /usr/lib/accountsservice/accounts-daemon
message+ 1650 0.0 0.0 8304 4628 ? Ss May18 0:03 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 1653 0.0 0.0 80940 1508 ? Ssl May18 0:49 /usr/sbin/irqbalance --foreground
root 1654 0.0 0.2 29108 16644 ? Ss May18 0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
syslog 1656 0.0 0.0 221180 4172 ? Ssl May18 0:01 /usr/sbin/rsyslogd -n -iNONE
root 1659 0.0 0.0 16468 6608 ? Ss May18 0:02 /lib/systemd/systemd-logind
root 1661 0.0 0.0 12376 4752 ? Ss May18 0:02 /sbin/wpa_supplicant -u -s -O /run/wpa_supplicant
root 1688 0.0 0.0 8336 2576 ? Ss May18 0:01 /usr/sbin/cron -f
root 1692 0.0 0.4 48524 32780 ? Ss May18 0:01 /usr/bin/python3 /usr/bin/salt-minion
root 1705 0.0 0.2 107832 19316 ? Ssl May18 0:00 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
daemon 1707 0.0 0.0 3592 1816 ? Ss May18 0:00 /usr/sbin/atd -f
root 1711 0.0 0.0 6836 1892 ttyS0 Ss+ May18 0:00 /sbin/agetty -o -p -- \u --keep-baud 115200,38400,9600 ttyS0 vt220
root 1713 0.0 0.0 5312 1468 tty1 Ss+ May18 0:00 /sbin/agetty -o -p -- \u --noclear tty1 linux
root 1726 0.0 0.0 12208 6408 ? Ss May18 0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root 1732 0.0 0.0 232936 6072 ? Ssl May18 0:00 /usr/lib/policykit-1/polkitd --no-debug
root 1811 0.1 0.7 986536 57444 ? Sl May18 6:06 /usr/bin/python3 /usr/bin/salt-minion
root 1813 0.0 0.3 125164 26592 ? S May18 0:00 /usr/bin/python3 /usr/bin/salt-minion
root 2125 0.0 0.0 0 0 ? S< May18 0:00 [loop3]
root 2193 0.0 0.0 0 0 ? S< May18 0:00 [loop4]
root 2253 0.0 0.3 1148884 29436 ? Ssl May18 1:10 /usr/lib/snapd/snapd
root 2544 0.0 0.0 0 0 ? S< May18 0:00 [loop5]
avahi 6392 0.0 0.0 7152 3072 ? Ss May18 0:01 avahi-daemon: running [netrics-2.local]
avahi 6393 0.0 0.0 6888 328 ? S May18 0:00 avahi-daemon: chroot helper
root 37722 0.0 0.0 0 0 ? I 17:55 0:00 [kworker/3:0-events]
root 39183 0.0 0.0 0 0 ? I 19:00 0:01 [kworker/2:1-events]
root 40291 0.0 0.0 0 0 ? I 20:03 0:00 [kworker/0:0-events]
root 40493 0.0 0.0 0 0 ? I 20:15 0:00 [kworker/u8:1-events_power_efficient]
root 40671 0.0 0.0 0 0 ? I 20:25 0:00 [kworker/0:1-events]
root 40750 0.0 0.0 0 0 ? I< 20:25 0:00 [kworker/3:1H]
root 40753 0.0 0.0 0 0 ? I< 20:25 0:00 [kworker/1:2H]
root 40756 0.0 0.0 0 0 ? I 20:25 0:00 [kworker/2:2-events]
root 40757 0.0 0.0 0 0 ? I 20:25 0:00 [kworker/1:0-events]
ubuntu 40761 0.1 0.1 19372 9920 ? Ss 20:26 0:00 /lib/systemd/systemd --user
ubuntu 40762 0.0 0.0 171608 6876 ? S 20:26 0:00 (sd-pam)
root 40845 0.0 0.0 0 0 ? I 20:26 0:00 [kworker/3:2-events]
root 40872 0.0 0.0 0 0 ? I< 20:26 0:00 [kworker/2:0H]
root 40891 0.0 0.0 15460 7808 ? Ss 20:27 0:00 sshd: ubuntu [priv]
ubuntu 40965 0.0 0.0 15596 4364 ? S 20:27 0:00 sshd: ubuntu@pts/0
ubuntu 40966 0.0 0.0 9664 4608 pts/0 Ss 20:27 0:00 -bash
root 40976 0.0 0.0 15460 7680 ? Ss 20:28 0:00 sshd: ubuntu [priv]
ubuntu 41050 0.0 0.0 15596 4464 ? S 20:28 0:00 sshd: ubuntu@pts/1
ubuntu 41051 0.0 0.0 9664 4608 pts/1 Ss+ 20:28 0:00 -bash
root 41093 0.0 0.0 0 0 ? I 20:30 0:00 [kworker/0:2-events]
root 41164 0.0 0.0 0 0 ? I< 20:31 0:00 [kworker/0:2H]
root 41165 0.0 0.0 0 0 ? I 20:31 0:00 [kworker/1:1-events]
root 41166 0.0 0.0 0 0 ? I 20:31 0:00 [kworker/u8:0-events_power_efficient]
root 41206 0.0 0.0 0 0 ? I 20:32 0:00 [kworker/2:0-events]
root 41218 0.0 0.0 0 0 ? I 20:32 0:00 [kworker/3:1-mm_percpu_wq]
root 41236 0.0 0.0 0 0 ? I< 20:32 0:00 [kworker/3:0H]
root 41255 0.0 0.0 0 0 ? I< 20:33 0:00 [kworker/1:1H]
root 41262 0.0 0.0 0 0 ? I< 20:35 0:00 [kworker/2:2H]
root 41344 0.0 0.0 0 0 ? I 20:36 0:00 [kworker/1:2-events]
root 41345 0.0 0.0 0 0 ? I 20:36 0:00 [kworker/u8:2-events_unbound]
ubuntu 41346 0.0 0.0 11140 3144 pts/0 R+ 20:36 0:00 ps -auxww
from ndt7-client-go.
same problem with a fresh raspbian install 32bits:
Linux raspberrypi 5.10.17-v7l+ #1414 SMP Fri Apr 30 13:20:47 BST 2021 armv7l GNU/Linux
go1.16.4.linux-armv6l.tar.gz
log
pi@raspberrypi:~ $ go get -v github.com/m-lab/ndt7-client-go/cmd/ndt7-client
go: downloading github.com/m-lab/ndt7-client-go v0.4.1
go: downloading github.com/m-lab/go v0.1.43
go: downloading github.com/gorilla/websocket v1.4.2
go: downloading github.com/m-lab/locate v0.4.1
go: downloading github.com/m-lab/ndt-server v0.20.2
go: downloading github.com/araddon/dateparse v0.0.0-20200409225146-d820a6159ab1
go: downloading github.com/m-lab/tcp-info v1.5.2
github.com/m-lab/ndt-server/metadata
runtime/cgo
github.com/araddon/dateparse
github.com/m-lab/go/rtx
github.com/m-lab/locate/api/v2
github.com/m-lab/ndt7-client-go/internal/params
github.com/m-lab/tcp-info/tcp
github.com/m-lab/go/flagx
net
net/textproto
github.com/m-lab/go/anonymize
crypto/x509
vendor/golang.org/x/net/http/httpproxy
github.com/m-lab/tcp-info/inetdiag
github.com/m-lab/ndt-server/ndt7/model
vendor/golang.org/x/net/http/httpguts
github.com/m-lab/ndt7-client-go/spec
github.com/m-lab/ndt7-client-go/cmd/ndt7-client/internal/emitter
crypto/tls
net/http/httptrace
net/http
github.com/m-lab/locate/api/locate
github.com/gorilla/websocket
github.com/m-lab/ndt7-client-go/internal/websocketx
github.com/m-lab/ndt7-client-go/internal/download
github.com/m-lab/ndt7-client-go/internal/upload
github.com/m-lab/ndt7-client-go
github.com/m-lab/ndt7-client-go/cmd/ndt7-client
pi@raspberrypi:~ $ cd ~/go/bin/
pi@raspberrypi:~/go/bin $ ./ndt7-client
download in progress with ndt-mlab2-ord06.mlab-oti.measurement-lab.org
Avg. speed : 121.0 Mbit/s
download: complete
upload in progress with ndt-mlab2-ord06.mlab-oti.measurement-lab.org
Avg. speed : 23.7 Mbit/s
upload: complete
Server: ndt-mlab2-ord06.mlab-oti.measurement-lab.org
Client: [REDACTED]
Latency: 7.2 ms
Download: 121.0 Mbit/s
Upload: 23.7 Mbit/s
Retransmission: 0.00 %
pi@raspberrypi:~/go/bin $ ./ndt7-client
download in progress with ndt-mlab1-ord05.mlab-oti.measurement-lab.org
Avg. speed : 122.0 Mbit/s
download: complete
upload in progress with ndt-mlab1-ord05.mlab-oti.measurement-lab.org
Avg. speed : 24.0 Mbit/s
upload: complete
Server: ndt-mlab1-ord05.mlab-oti.measurement-lab.org
Client: [REDACTED]
Latency: 6.2 ms
Download: 122.0 Mbit/s
Upload: 24.0 Mbit/s
Retransmission: 0.00 %
pi@raspberrypi:~/go/bin $ ./ndt7-client
download in progress with ndt-mlab3-ord06.mlab-oti.measurement-lab.org
Avg. speed : 121.3 Mbit/s
download: complete
upload in progress with ndt-mlab3-ord06.mlab-oti.measurement-lab.org
Avg. speed : 24.1 Mbit/s
upload: complete
Server: ndt-mlab3-ord06.mlab-oti.measurement-lab.org
Client: [REDACTED]
Latency: 5.8 ms
Download: 121.3 Mbit/s
Upload: 24.1 Mbit/s
Retransmission: 0.00 %
pi@raspberrypi:~/go/bin $
from ndt7-client-go.
sure, here you go:
./ndt7-client-go/cmd/ndt7-client/ndt7-rob -locate.url=https://locate-dot-mlab-staging.appspot.com/v2/nearest -scheme=wss
upload: complete
Server: ndt-mlab4-ord05.mlab-staging.measurement-lab.org
Client: [REDACTED]
Latency: 3.9 ms
Download: 814.9 Mbit/s
Upload: 42.9 Mbit/s
Retransmission: 0.01 %
upload: complete
Server: ndt-mlab4-ord06.mlab-staging.measurement-lab.org
Client: [REDACTED]
Latency: 3.7 ms
Download: 827.5 Mbit/s
Upload: 42.9 Mbit/s
Retransmission: 0.00 %
upload: complete
Server: ndt-mlab4-ord05.mlab-staging.measurement-lab.org
Client: [REDACTED]
Latency: 4.3 ms
Download: 851.3 Mbit/s
Upload: 42.9 Mbit/s
Retransmission: 0.00 %
upload: complete
Server: ndt-mlab4-ord05.mlab-staging.measurement-lab.org
Client: [REDACTED]
Latency: 4.0 ms
Download: 844.3 Mbit/s
Upload: 42.9 Mbit/s
Retransmission: 0.00 %
FYI, your PR is being tested on multiple connections ranging from 100Mbps to 1000Mbps and with RPis and Jetson, a total of ~10 devices. If you don't hear anything else from me, that means your PR is nailing it.
from ndt7-client-go.
This looks much more sane for me now, too.
from ndt7-client-go.
Same issue for me from my home, compare Ookla to NDT7.
from ndt7-client-go.
@feamster are those results also using ndt7-client-go on a raspi?
from ndt7-client-go.
Thank you both for following up! I am going to help people in the core M-Lab team to better understand those issues!
from ndt7-client-go.
@ggmartins Thank you for reporting this and for the kind words! May I ask which version of Go you used to build the client running on the Raspberry Pi?
On some CPUs (namely, armv7 without a hardware AES implementation) TLS adds a lot of overhead because the crypto is implemented in Go. In those cases, I would recommend using -scheme=ws to disable TLS, which usually makes a big difference. However, what's strange is that you're seeing poor performance on what I believe is an arm64 CPU that supports AES. Could you please make sure you're using a recent Go release (ideally, 1.16) to build the client?
Also, the output of cat /proc/cpuinfo on the Raspberry Pi would be helpful. Thanks!
from ndt7-client-go.
@ggmartins Yes, AES is (only) used by TLS encryption and the lack of hardware support is definitely the culprit. :)
Would you be willing to build and test the sandbox-roberto-remove-memory-alloc branch? I've made some optimizations that improved the performance on an armv7 device I have here quite significantly when running the measurement without TLS.
Additionally, it's probably also worth testing with the next release (v0.20.6) of ndt-server that is available on our staging environment. You can use a server from the staging environment by running the ndt7-client like this:
./ndt7-client -locate.url=https://locate-dot-mlab-staging.appspot.com/v2/nearest -scheme=ws
(I expect the staging environment to perform a bit better even when using TLS, but you won't get reasonable speeds without hardware encryption anyway.)
A test with just -scheme=ws and one with -locate.url=https://locate-dot-mlab-staging.appspot.com/v2/nearest -scheme=ws would allow me to understand what's slowing the client down a bit better. Thanks!
from ndt7-client-go.
sure, I can do that in the next couple of hours, bwm.
from ndt7-client-go.
@jlivingood let me break this down for you:
- segment @ ~150Mbps, rpi4 wss (encrypted, old client code)
- segment @ ~740Mbps, rpi4 ws (unencrypted, old client code)
- segment @ ~900Mbps, rpi4 ws (unencrypted, new client code)
(all of this using production servers)
not in the graph:
~850Mbps rpi4 wss (encrypted, new client code on staging servers, to be released soon)
https://locate-dot-mlab-staging.appspot.com/v2/nearest
my takeaway here: even with ChaCha20 we'll still see encrypted measurements underperforming unencrypted ones. The question is, do we really need encryption? imho we don't, but we do need obfuscation, so I think the holy grail here would be WebSockets supporting an ultra-lightweight obfuscation method that outperforms the lightest encryption method available. I'm not an expert in the matter, maybe this already exists.
cheers,
G
from ndt7-client-go.
What is the CPU load when running the test? Is ndt7-client-go using 100% of the CPU?
from ndt7-client-go.
@ggmartins Thank you very much for providing detailed information!
I find it interesting that in the htop screenshot you posted there is a Go thread using 150% of the CPU. This strikes me as a goroutine doing too much work, even though I don't know very well what this goroutine may be doing.
To shed more light on what's happening, it would probably help to use the diff at #60 to collect a CPU profile. Would you mind building ndt7-client using my fork at the branch indicated in #60? This PR adds a CLI flag, -profile <file>, which collects CPU profile information. If you don't mind sharing the collected cpuprofile file, looking at it would certainly help me and the core team understand much better what the bottleneck could be.
I suppose the ideal is to gather a profile for both the arm64 and the arm32 devices you mentioned.
from ndt7-client-go.
@ggmartins another useful data point we could collect here just occurred to me (it's been a long time since I looked at this ndt7 codebase, but I'm still interested in understanding this bug, which could also affect us at @ooni).
There is a way to run ndt7-client w/o encryption: ./ndt7-client -scheme ws. This forces the unencrypted ws scheme. A basic test is to execute the client in this mode and check whether the performance differs. A striking difference would be a reasonable indicator that the problem is encryption. (I know the Go codebase does not support hardware acceleration for AES on arm32, though I am not sure this is the root cause of your issue, because the problem you reported occurs on an arm64 device, for which there should be support.)
Thanks again for investigating this problem!
from ndt7-client-go.
@feamster with more info on your setup we can make it faster! Is that arm32, arm64, or desktop? Thanks for reporting that!
from ndt7-client-go.
@bassosimone running w/o enc unlocked better numbers:
ubuntu@netrics:~/ndt7profile/ndt7-client-go/cmd/ndt7-client$ ./ndt7-client-prof
download in progress with ndt-mlab3-ord03.mlab-oti.measurement-lab.org
Avg. speed : 155.1 Mbit/s
download: complete
upload in progress with ndt-mlab3-ord03.mlab-oti.measurement-lab.org
Avg. speed : 24.5 Mbit/s
upload: complete
...
Retransmission: 0.00 %
ubuntu@netrics:~/ndt7profile/ndt7-client-go/cmd/ndt7-client$ ./ndt7-client-prof -scheme ws
download in progress with ndt-mlab1-ord06.mlab-oti.measurement-lab.org
Avg. speed : 414.3 Mbit/s
download: complete
upload in progress with ndt-mlab1-ord06.mlab-oti.measurement-lab.org
Avg. speed : 24.6 Mbit/s
upload: complete
...
Retransmission: 0.00 %
ubuntu@netrics:~/ndt7profile/ndt7-client-go/cmd/ndt7-client$ ./ndt7-client-prof -scheme ws
download in progress with ndt-mlab3-ord04.mlab-oti.measurement-lab.org
Avg. speed : 409.3 Mbit/s
download: complete
upload in progress with ndt-mlab3-ord04.mlab-oti.measurement-lab.org
Avg. speed : 24.6 Mbit/s
upload: complete
...
Retransmission: 0.00 %
although Jetson Nano can do better:
./ndt7-client-prof
download in progress with ndt-mlab1-ord05.mlab-oti.measurement-lab.org
Avg. speed : 456.4 Mbit/s
download: complete
upload in progress with ndt-mlab1-ord05.mlab-oti.measurement-lab.org
Avg. speed : 23.5 Mbit/s
upload: complete
Server: ndt-mlab1-ord05.mlab-oti.measurement-lab.org
Client: [REDACTED]
Latency: 27.6 ms
Download: 456.4 Mbit/s
Upload: 23.5 Mbit/s
Retransmission: 0.00 %
from ndt7-client-go.
@bassosimone yes, this: Linux raspberrypi 5.10.17-v7l+ #1414 SMP Fri Apr 30 13:20:47 BST 2021 armv7l GNU/Linux
Disabling encryption sped things up a bit, but still nowhere near iperf3
from ndt7-client-go.
Hi @robertodauria, thanks for helping on this.
yes, we see the problem, no aes on hw for the rpi, that explains the cheap price, at least :-)
is the aes only used for encryption? because we tried disabling it with no joy for speeds > 500Mbps (see Nick's graphics above). I mean, there's an unmatched performance with Ookla's speedtest that gets noticeable at higher speeds and we desperately need the retransmission rate from ndt7 :-) Let us know your thoughts
Jetson Nano
processor : 0
model name : ARMv8 Processor rev 1 (v8l)
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x1
CPU part : 0xd07
CPU revision : 1
processor : 1
model name : ARMv8 Processor rev 1 (v8l)
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x1
CPU part : 0xd07
CPU revision : 1
processor : 2
model name : ARMv8 Processor rev 1 (v8l)
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x1
CPU part : 0xd07
CPU revision : 1
processor : 3
model name : ARMv8 Processor rev 1 (v8l)
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x1
CPU part : 0xd07
CPU revision : 1
Jetson NX
cat /proc/cpuinfo
processor : 0
model name : ARMv8 Processor rev 0 (v8l)
BogoMIPS : 62.50
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer : 0x4e
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x004
CPU revision : 0
MTS version : 50168445
processor : 1
model name : ARMv8 Processor rev 0 (v8l)
BogoMIPS : 62.50
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer : 0x4e
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x004
CPU revision : 0
MTS version : 50168445
processor : 2
model name : ARMv8 Processor rev 0 (v8l)
BogoMIPS : 62.50
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer : 0x4e
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x004
CPU revision : 0
MTS version : 50168445
processor : 3
model name : ARMv8 Processor rev 0 (v8l)
BogoMIPS : 62.50
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer : 0x4e
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x004
CPU revision : 0
MTS version : 50168445
processor : 4
model name : ARMv8 Processor rev 0 (v8l)
BogoMIPS : 62.50
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer : 0x4e
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x004
CPU revision : 0
MTS version : 50168445
processor : 5
model name : ARMv8 Processor rev 0 (v8l)
BogoMIPS : 62.50
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer : 0x4e
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x004
CPU revision : 0
MTS version : 50168445
Raspberry Pi 4 Model B Rev 1.4
processor : 0
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
processor : 1
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
processor : 2
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
processor : 3
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
Hardware : BCM2835
Revision : d03114
Serial : 100000005d172b32
Model : Raspberry Pi 4 Model B Rev 1.4
on both rpi and nano we're using ubuntu 64 mostly, and with go1.16.4.linux-arm64.tar.gz, same results on rpi:
go version
go version go1.16.4 linux/arm64
ubuntu@netrics:~/golang$ ~/go/bin/ndt7-client
download in progress with ndt-mlab1-ord05.mlab-oti.measurement-lab.org
Avg. speed : 154.3 Mbit/s
download: complete
upload in progress with ndt-mlab1-ord05.mlab-oti.measurement-lab.org
Avg. speed : 23.6 Mbit/s
upload: complete
Server: ndt-mlab1-ord05.mlab-oti.measurement-lab.org
Client: [REDACTED]
Latency: 26.8 ms
Download: 154.3 Mbit/s
Upload: 23.6 Mbit/s
Retransmission: 0.00 %
fwiw, I was able to replicate the problem on a raspbian 32bits:
Linux raspberrypi 5.10.17-v7l+ #1414 SMP Fri Apr 30 13:20:47 BST 2021 armv7l GNU/Linux
go1.16.4.linux-armv6l.tar.gz
Thanks,
G
from ndt7-client-go.
@ggmartins Very happy to hear that!
The changes to the client have just been merged to the master branch with the PR above.
There is also an additional change I've made to automatically detect if the AES crypto extensions are available (on x86, ARMv7 and ARMv8) and default to using WS if not, so if you are building the latest code you won't need to specify the scheme anymore.
I'm quite surprised by the improvements with TLS and the staging ndt-server. On my test device TLS was ~50% faster (~130Mb/s -> ~190Mb/s), but you are seeing 4.4x the previous speed. Could you please confirm that result by explicitly setting -scheme=wss -locate.url=https://locate-dot-mlab-staging.appspot.com/v2/nearest?
The promotion of ndt-server v0.20.6 from staging to production should happen before the end of the month -- likely next week. :)
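The merged detection code is not shown in this thread. As a rough illustration of the idea only, the sketch below checks the Features (ARM) or flags (x86) lines of /proc/cpuinfo for the aes extension and falls back to ws when it is missing. The function names hasAESFlag and defaultScheme are hypothetical, and the real client may well detect the extension differently.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hasAESFlag reports whether any "Features" (ARM) or "flags" (x86) line in
// the given /proc/cpuinfo text lists the "aes" extension.
// (Hypothetical helper, Linux only; not the code merged in the client.)
func hasAESFlag(cpuinfo string) bool {
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), ":", 2)
		if len(parts) != 2 {
			continue
		}
		key := strings.ToLower(strings.TrimSpace(parts[0]))
		if key != "features" && key != "flags" {
			continue
		}
		for _, f := range strings.Fields(parts[1]) {
			if f == "aes" {
				return true
			}
		}
	}
	return false
}

// defaultScheme picks "wss" when hardware AES is present and "ws" otherwise.
func defaultScheme(cpuinfo string) string {
	if hasAESFlag(cpuinfo) {
		return "wss"
	}
	return "ws"
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	fmt.Println("default scheme:", defaultScheme(string(data)))
}
```

Fed the Features lines posted earlier in this thread, this sketch would choose wss for the Jetson Nano (which lists aes) and ws for the Raspberry Pi 4 (which does not).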
from ndt7-client-go.
OK, it makes more sense now. Thanks for testing again! :)
Go 1.16 changed the TLS negotiation so that if the client does not signal that it supports hardware AES it defaults to ChaCha20, for which there is an assembly implementation in Go's crypto library for ARM64 but not for ARMv7. That explains why I wasn't seeing that much improvement (my ODROID runs on ARMv7).
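To see which ChaCha20 suites a given Go toolchain can negotiate, one option is to list them from crypto/tls. This is a small sketch rather than anything taken from the client; the helper name chacha20Suites is made up for illustration, and it only shows what the library implements, not which suite a particular connection ends up using.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"strings"
)

// chacha20Suites returns the names of the ChaCha20-Poly1305 cipher suites
// implemented by the crypto/tls package of the Go toolchain in use.
// (Hypothetical helper for illustration.)
func chacha20Suites() []string {
	var names []string
	for _, s := range tls.CipherSuites() {
		if strings.Contains(s.Name, "CHACHA20") {
			names = append(names, s.Name)
		}
	}
	return names
}

func main() {
	for _, name := range chacha20Suites() {
		fmt.Println(name)
	}
}
```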
However, without TLS my ODROID is pretty consistently giving me the same or slightly better results than Ookla's command-line client after that fix.
Closing this issue, for now. Please feel free to reopen if needed.
from ndt7-client-go.
What's the exact increase in speed above? Eyeballing it, it looks like a ~300% increase in measurement results or, alternatively, the prior measurements under-reported "speeds" by around 80%? Anyway, it would be good to understand both of these stats.
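Both statistics can be worked out from two numbers. The sketch below uses the approximate Raspberry Pi 4 figures reported elsewhere in this thread (~150 Mbit/s with the old client over wss, ~900 Mbit/s with the new client over ws) purely as example inputs; those two runs differ in encryption as well as client code, so the result is indicative only.

```go
package main

import "fmt"

// percentIncrease returns how much larger after is than before, expressed as
// a percentage of before.
func percentIncrease(before, after float64) float64 {
	return (after - before) / before * 100
}

// percentUnderReported returns the share of the actual value that the old
// measurement missed, expressed as a percentage of the actual value.
func percentUnderReported(old, actual float64) float64 {
	return (actual - old) / actual * 100
}

func main() {
	// Approximate Mbit/s figures quoted in the thread, used as example inputs.
	old, fixed := 150.0, 900.0
	fmt.Printf("increase: %.0f%%\n", percentIncrease(old, fixed))
	fmt.Printf("under-reported by: %.0f%%\n", percentUnderReported(old, fixed))
}
```

With these inputs the increase is 500% (six times the old figure) and the old result under-reports by roughly 83%.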
from ndt7-client-go.
These improvements look great.
Many of our initial deployments of ndt7 were seeing 150 Mbps, then 740 Mbps, before a fix was eventually deployed. (ndt1 never even came close to 1 Gbps). Some clients are still seeing 850 Mbps in comparison to other speedtests. Presumably there are many clients who have higher downstream speeds than 150 Mbps at home and some of the ndt7 data in the public data may also be invalid.
The changes here do affect the subset of clients we tested on, but you don't have any way of knowing that they only affect those clients without extensive testing. The issues were related to Go's garbage collector and the default encryption used in an old version of Go, both of which would be exacerbated by running on an embedded device, but nonetheless omnipresent and possibly an issue for other tests. The particular garbage collection slowdown is a known issue for WebSockets in Go more broadly: gorilla/websocket#134
And it is in fact pretty common for people to run these kinds of tests on embedded devices that are always on, attached to routers, etc. That's how we've been doing it since 2010, and many projects deploy router-based speed tests. Anyone who was using this test prior to June 4, 2021 may very well have injected bad data into the database.
In hindsight, if you have metadata about the nature of the device on which the test was being performed, it may be possible to go back and look for patterns in the old data to try to clean it up. There may be other ways to clean things up. For example, if you see a client that consistently measures 900 Mbps but periodically experiences drops, that's less likely this particular bug. But if the measurement is always 150 Mbps, you really have no way of knowing what's causing that absent more metadata.
Absent a cleanup effort, the only conclusion we can draw is that some non-zero and possibly significant amount of NDT test data may be subject to client-side limitations that affect the accuracy of the test, and that any data prior to June 4, 2021 should be discarded. Ethical practice suggests that you should expunge the old, inaccurate data from the M-Lab servers (or at the least, annotate it or move it to a legacy table) so that it doesn't continue to be misused by the public.
from ndt7-client-go.