Comments (7)
Update: After directly flashing my rpi4 with OpenWRT 23.05.2 with Linux v5.15.137 compiled by OpenWRT, I got 1.01 Gbit/sec!
| Raspberry Pi 4 / BCM2711* | OpenWRT 23.05.2 / 5.15.137 | 1.01 Gbits/sec |
from wg-bench.
One interesting finding: by using CONFIG_PREEMPT_NONE instead of CONFIG_PREEMPT in the kernel config, we can reach ~700 Mbps on the 6.1.y kernel. CONFIG_PREEMPT_NONE is set by default in the OpenWRT kernel.
Connecting to host 169.254.200.2, port 5201
[ 5] local 169.254.200.1 port 47296 connected to 169.254.200.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 78.2 MBytes 656 Mbits/sec 0 402 KBytes
[ 5] 1.00-2.00 sec 80.2 MBytes 672 Mbits/sec 0 441 KBytes
[ 5] 2.00-3.00 sec 79.6 MBytes 668 Mbits/sec 0 441 KBytes
[ 5] 3.00-4.00 sec 80.3 MBytes 674 Mbits/sec 0 441 KBytes
[ 5] 4.00-5.00 sec 80.8 MBytes 678 Mbits/sec 0 441 KBytes
[ 5] 5.00-6.00 sec 81.0 MBytes 679 Mbits/sec 0 441 KBytes
[ 5] 6.00-7.00 sec 79.5 MBytes 667 Mbits/sec 0 441 KBytes
[ 5] 7.00-8.00 sec 80.1 MBytes 672 Mbits/sec 0 441 KBytes
[ 5] 8.00-9.00 sec 80.1 MBytes 672 Mbits/sec 0 441 KBytes
[ 5] 9.00-10.00 sec 79.7 MBytes 668 Mbits/sec 0 441 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 799 MBytes 671 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 798 MBytes 669 Mbits/sec receiver
iperf Done.
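To check which preemption model a given kernel was built with before comparing numbers, something like the following could be used. This is only a minimal sketch: it assumes the running kernel exposes its config at /proc/config.gz (which requires CONFIG_IKCONFIG_PROC, and on some distributions loading the configs module first), and the helper names read_kernel_config and enabled_options are made up for illustration.

```python
import gzip
import re
from pathlib import Path


def read_kernel_config(path="/proc/config.gz"):
    """Return the kernel config as text.

    Assumption: the config is available at `path`, either gzip-compressed
    (as /proc/config.gz is) or as a plain .config file.
    """
    p = Path(path)
    if p.suffix == ".gz":
        with gzip.open(p, "rt") as fh:
            return fh.read()
    return p.read_text()


def enabled_options(config_text, names):
    """Map each option name to True if the config contains `NAME=y`."""
    result = {}
    for name in names:
        # Anchor on the whole line so CONFIG_PREEMPT does not match
        # CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_NOTIFIERS.
        pattern = rf"^{re.escape(name)}=y$"
        result[name] = re.search(pattern, config_text, re.MULTILINE) is not None
    return result


if __name__ == "__main__":
    options = ["CONFIG_PREEMPT_NONE", "CONFIG_PREEMPT_VOLUNTARY", "CONFIG_PREEMPT"]
    for name, on in enabled_options(read_kernel_config(), options).items():
        print(f"{name}: {'y' if on else 'not set'}")
```

Passing the path of a build tree's .config instead of the default lets the same check run against a kernel that has not been booted yet.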
from wg-bench.
Another interesting finding: by turning off CONFIG_FTRACE together with using CONFIG_PREEMPT_NONE, we can reach ~1.1 Gbps on bcm2711_defconfig with rpi-6.1.y.
Connecting to host 169.254.200.2, port 5201
[ 5] local 169.254.200.1 port 37182 connected to 169.254.200.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 135 MBytes 1.13 Gbits/sec 0 818 KBytes
[ 5] 1.00-2.00 sec 130 MBytes 1.09 Gbits/sec 0 860 KBytes
[ 5] 2.00-3.00 sec 126 MBytes 1.05 Gbits/sec 0 975 KBytes
[ 5] 3.00-4.00 sec 130 MBytes 1.09 Gbits/sec 0 1022 KBytes
[ 5] 4.00-5.00 sec 130 MBytes 1.09 Gbits/sec 0 1.07 MBytes
[ 5] 5.00-6.00 sec 132 MBytes 1.11 Gbits/sec 0 1.07 MBytes
[ 5] 6.00-7.00 sec 132 MBytes 1.11 Gbits/sec 0 1.14 MBytes
[ 5] 7.00-8.00 sec 132 MBytes 1.11 Gbits/sec 0 1.26 MBytes
[ 5] 8.00-9.00 sec 129 MBytes 1.08 Gbits/sec 0 1.26 MBytes
[ 5] 9.00-10.01 sec 130 MBytes 1.08 Gbits/sec 0 1.48 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.01 sec 1.28 GBytes 1.09 Gbits/sec 0 sender
[ 5] 0.00-10.01 sec 1.27 GBytes 1.09 Gbits/sec receiver
iperf Done.
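When comparing several runs like the two above, it can help to pull the per-interval bitrates out of the iperf3 text instead of eyeballing them. The following is a rough sketch that assumes the sender-side text layout shown in this thread (with Retr and Cwnd columns); interval_mbps is a hypothetical helper name, and the regular expression has only been matched against lines of this shape.

```python
import re
import statistics

# Matches per-interval lines such as
#   [ 5] 0.00-1.00 sec 78.2 MBytes 656 Mbits/sec 0 402 KBytes
# and skips the final sender/receiver summary lines, which have no Cwnd column.
INTERVAL_RE = re.compile(
    r"^\[\s*\d+\]\s+[\d.]+-[\d.]+\s+sec\s+[\d.]+\s+\w+\s+"
    r"([\d.]+)\s+([MG])bits/sec\s+\d+\s+[\d.]+\s+\w+$"
)


def interval_mbps(output):
    """Return the per-interval bitrates in Mbit/s found in iperf3 text output."""
    rates = []
    for line in output.splitlines():
        m = INTERVAL_RE.match(line.strip())
        if m:
            value = float(m.group(1))
            # Normalize Gbits/sec to Mbits/sec so runs are comparable.
            rates.append(value * 1000 if m.group(2) == "G" else value)
    return rates


if __name__ == "__main__":
    import sys

    rates = interval_mbps(sys.stdin.read())
    if rates:
        print(f"intervals: {len(rates)}")
        print(f"mean: {statistics.mean(rates):.1f} Mbit/s")
        print(f"min/max: {min(rates):.1f} / {max(rates):.1f} Mbit/s")
```

Saved under any file name, it can be fed a run by piping the iperf3 output into it on standard input.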
However, turning off CONFIG_FTRACE also turns off a series of options that depend on it, so further investigation is needed to see which option actually hinders performance. The resulting config diff:
104d103
< # CONFIG_BPF_LSM is not set
139d137
< CONFIG_TASKS_RUDE_RCU=y
260d257
< CONFIG_TRACEPOINTS=y
1603d1599
< # CONFIG_BATMAN_ADV_TRACING is not set
1637d1632
< # CONFIG_NET_DROP_MONITOR is not set
2965d2959
< # CONFIG_ATH6KL_TRACING is not set
8041d8034
< # CONFIG_PSTORE_FTRACE is not set
8492d8484
< # CONFIG_TRACE_MMIO_ACCESS is not set
8726d8717
< # CONFIG_DEBUG_PAGE_REF is not set
8803,8804d8793
< CONFIG_TRACE_IRQFLAGS=y
< CONFIG_TRACE_IRQFLAGS_NMI=y
8837d8825
< CONFIG_NOP_TRACER=y
8845d8832
< CONFIG_TRACER_MAX_TRACE=y
8847,8853d8833
< CONFIG_RING_BUFFER=y
< CONFIG_EVENT_TRACING=y
< CONFIG_CONTEXT_SWITCH_TRACER=y
< CONFIG_RING_BUFFER_ALLOW_SWAP=y
< CONFIG_PREEMPTIRQ_TRACEPOINTS=y
< CONFIG_TRACING=y
< CONFIG_GENERIC_TRACER=y
8855,8895c8835
< CONFIG_FTRACE=y
< # CONFIG_BOOTTIME_TRACING is not set
< CONFIG_FUNCTION_TRACER=y
< CONFIG_FUNCTION_GRAPH_TRACER=y
< CONFIG_DYNAMIC_FTRACE=y
< CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
< CONFIG_FUNCTION_PROFILER=y
< CONFIG_STACK_TRACER=y
< CONFIG_IRQSOFF_TRACER=y
< CONFIG_SCHED_TRACER=y
< # CONFIG_HWLAT_TRACER is not set
< # CONFIG_OSNOISE_TRACER is not set
< # CONFIG_TIMERLAT_TRACER is not set
< # CONFIG_FTRACE_SYSCALLS is not set
< CONFIG_TRACER_SNAPSHOT=y
< CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP=y
< CONFIG_BRANCH_PROFILE_NONE=y
< # CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
< # CONFIG_PROFILE_ALL_BRANCHES is not set
< CONFIG_BLK_DEV_IO_TRACE=y
< CONFIG_KPROBE_EVENTS=y
< # CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
< # CONFIG_UPROBE_EVENTS is not set
< CONFIG_BPF_EVENTS=y
< CONFIG_DYNAMIC_EVENTS=y
< CONFIG_PROBE_EVENTS=y
< CONFIG_FTRACE_MCOUNT_RECORD=y
< CONFIG_FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY=y
< # CONFIG_SYNTH_EVENTS is not set
< # CONFIG_HIST_TRIGGERS is not set
< # CONFIG_TRACE_EVENT_INJECT is not set
< # CONFIG_TRACEPOINT_BENCHMARK is not set
< # CONFIG_RING_BUFFER_BENCHMARK is not set
< # CONFIG_TRACE_EVAL_MAP_FILE is not set
< # CONFIG_FTRACE_RECORD_RECURSION is not set
< # CONFIG_FTRACE_STARTUP_TEST is not set
< # CONFIG_RING_BUFFER_STARTUP_TEST is not set
< # CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS is not set
< # CONFIG_PREEMPTIRQ_DELAY_TEST is not set
< # CONFIG_KPROBE_EVENT_GEN_TEST is not set
< # CONFIG_RV is not set
---
> # CONFIG_FTRACE is not set
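To narrow down which of these options matters, it may be easier to work with the changed options as a name-to-value mapping than with raw diff hunks. Below is a minimal sketch, assuming two ordinary kernel .config files; parse_config and changed_options are illustrative names, and an option missing from a file is reported as None while an explicit "is not set" line is reported as "n".

```python
import re
import sys

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Parse .config text into {option: value}; 'is not set' becomes 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def changed_options(before, after):
    """Return {name: (old, new)} for options that differ; None means absent."""
    names = sorted(set(before) | set(after))
    return {
        n: (before.get(n), after.get(n))
        for n in names
        if before.get(n) != after.get(n)
    }


if __name__ == "__main__":
    # Usage: pass the two config file paths as arguments.
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        diff = changed_options(parse_config(f_old.read()), parse_config(f_new.read()))
    for name, (old, new) in diff.items():
        print(f"{name}: {old} -> {new}")
```

With the changed set in hand, options can be re-enabled one at a time (or bisected) and the benchmark rerun after each build.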
from wg-bench.
Yet another interesting finding: turning off CONFIG_IRQSOFF_TRACER, along with using CONFIG_PREEMPT_NONE, also reaches ~1.1 Gbps.
Turning on CONFIG_IRQSOFF_TRACER also changes the following options:
8803a8804,8805
> CONFIG_TRACE_IRQFLAGS=y
> CONFIG_TRACE_IRQFLAGS_NMI=y
8849a8852
> CONFIG_PREEMPTIRQ_TRACEPOINTS=y
8861c8864
< # CONFIG_IRQSOFF_TRACER is not set
---
> CONFIG_IRQSOFF_TRACER=y
from wg-bench.
On my RPi 4B running OpenWrt 23.05.2 (64-bit), the measured result was 881 Mbps.
from wg-bench.
BTW, I believe 32-bit vs 64-bit should show some difference; should we indicate this in the results?
from wg-bench.
BTW, I believe 32-bit vs 64-bit should show some difference; should we indicate this in the results?
For an out-of-order CPU, it is normal for 32-bit and 64-bit to show the same performance; sometimes 64-bit may even be slower, because its wider pointers consume more cache capacity. The intuition that 64-bit is faster comes from the doubled register width, which lets something like a 64-bit arithmetic operation finish in one instruction. But keep in mind that 64-bit operations also have longer latency in the physical CPU circuit, which may require a lower frequency or more cycles to produce a result. It's the same with SIMD.
The crypto algorithms in WireGuard are ChaCha20 and Poly1305, which also use SIMD (i.e. Arm NEON) for the computation. If the microarchitecture does not provide wide enough SIMD processing in a single cycle, we will get the same performance whether in 32-bit or 64-bit mode.
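As a small illustration of why register width matters little here: ChaCha20 is specified entirely in terms of 32-bit words (add, XOR, rotate), so its core step does not need 64-bit arithmetic. The sketch below is a plain-Python rendering of the ChaCha quarter round from RFC 8439, written only to show the word size; it is not WireGuard's or the kernel's NEON implementation, and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF  # every ChaCha20 state word is 32 bits wide


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a, b, c, d):
    """ChaCha quarter round on four 32-bit words (RFC 8439, section 2.1)."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


if __name__ == "__main__":
    # Input words taken from the quarter-round test vector in RFC 8439.
    words = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print(" ".join(f"{w:08x}" for w in words))
```

A SIMD implementation applies the same 32-bit operations to several words in parallel, so throughput depends on how wide the vector unit is rather than on whether the general-purpose registers are 32 or 64 bits.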
from wg-bench.