RT-AX86U Pro / AES Instruction set / Hardware-based vpn acceleration?

AES instruction sets in the CPU count as hardware AES, afaik.

It's a specific set of instructions in the ARMv8-A set - it has to be licensed, which the router SoCs have done, but there are ARMv8 chips that do not have hardware AES - Pis for example do not, as Broadcom did not license the instructions for those CPUs (even Pi4)...

AllWinner H5 (Cortex-A53) in 64-bit - fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

Rpi 3 (Cortex-A53) in 64 bit mode - fp asimd evtstrm crc32 cpuid
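
To check this on your own box, you can look for the aes flag in /proc/cpuinfo. A minimal sketch, assuming a Linux shell with grep (BusyBox should do); has_cpu_feature is a made-up helper name, not something that ships with the firmware:

```shell
# has_cpu_feature FEATURE [FILE]
# Hypothetical helper: succeeds (exit 0) if FEATURE appears as a whole word
# on a "Features" (ARM) or "flags" (x86) line of FILE.
# FILE defaults to /proc/cpuinfo.
has_cpu_feature() {
    feature=$1
    file=${2:-/proc/cpuinfo}
    grep -E '^(Features|flags)[[:space:]]*:' "$file" 2>/dev/null |
        grep -qw -- "$feature"
}

if has_cpu_feature aes; then
    echo "CPU advertises AES instructions"
else
    echo "no AES flag found"
fi
```

The same call with pmull, sha1 or sha2 covers the other crypto extensions in the H5 line above.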
 
Wireguard does not use AES - chacha20-Poly1305 is used there...

Yeah, that was kinda 2 thoughts blended into one paragraph. Now if someone could come up with a ChaPoly instruction set, imagine how fast it would be. On x86 it can easily exceed 1 Gbps without any dedicated instruction set or hardware acceleration.
 
The list is fairly long: various AES, SHA, MD5 variants, etc... Check the list of supported kernel cryptos, and look for all the entries that say "module: bcmspu" in them.

Code:
cat /proc/crypto
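
That output gets long, so filtering it helps. A rough sketch, assuming the usual /proc/crypto layout where each entry has "name", "driver" and "module" lines in that order; list_crypto_by_module is a hypothetical helper name:

```shell
# list_crypto_by_module MODULE [FILE]
# Hypothetical helper: prints "name<TAB>driver" for every entry whose
# "module" field equals MODULE. FILE defaults to /proc/crypto.
list_crypto_by_module() {
    awk -v want="$1" -F ' *: *' '
        $1 == "name"   { name = $2 }      # remember the algorithm name
        $1 == "driver" { driver = $2 }    # remember the driver backing it
        $1 == "module" && $2 == want { print name "\t" driver }
    ' "${2:-/proc/crypto}"
}

# Example (on the router):
#   list_crypto_by_module bcmspu
```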

My most recent test (from 2018) on the RT-AX88U allowed me to hit 387 Mbps, with lower CPU usage than another test done with pcrypto:

Code:
E:\Share>iperf -c 192.168.50.12 -N -M 1400 -t 20
------------------------------------------------------------
Client connecting to 192.168.50.12, TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[316] local 10.10.10.1 port 14909 connected with 192.168.50.12 port 5001
[ ID] Interval       Transfer     Bandwidth
[316]  0.0-20.0 sec    924 MBytes    387 Mbits/sec
 
 
Mem: 434776K used, 469692K free, 0K shrd, 3888K buff, 36492K cached
CPU:  0.2% usr 20.7% sys  0.3% nic 59.7% idle  0.0% io  0.0% irq 18.9% sirq
Load average: 3.32 3.37 2.32 4/191 13694
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
  240     2 admin    RW       0  0.0   3 23.9 [pdc_rx]
  230     2 admin    RW       0  0.0   0 13.4 [bcmsw_rx]
 1170     1 admin    S N   9068  1.0   1  1.6 httpds -s -i br0 -p 8443

 2608     1 admin    R N  21208 2.3   1  0.4 aaews --sdk_log_dir=/tmp
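
For anyone double-checking the iperf numbers above: the transfer column is in binary megabytes (1024 x 1024 bytes) while the bandwidth column is in decimal megabits. A quick sketch of the arithmetic, assuming awk is available; mbits_per_sec is just an illustrative name:

```shell
# mbits_per_sec MBYTES SECONDS
# Converts a transfer given in binary MBytes over SECONDS
# into decimal Mbits/sec, printed with one decimal place.
mbits_per_sec() {
    awk -v mb="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", mb * 1048576 * 8 / secs / 1000000 }'
}

mbits_per_sec 924 20   # prints 387.6
```

That lines up with the reported 387 Mbits/sec, allowing for rounding in the displayed transfer size.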

BCMSPU usage stats can be monitored here:

Code:
admin@stargate:/sys# cat kernel/debug/bcmspu/stats
Number of SPUs.........0
Current sessions.......0
Session count..........0
Cipher setkey..........0
Cipher Ops.............0
Hash Ops...............0
HMAC setkey............0
HMAC Ops...............0
AEAD setkey............0
AEAD Ops...............0
Bytes of req data......0
Bytes of resp data.....0
Channel full...........0
Channel send failures..0
Check ICV errors.......0
Packets blogged (us)...0
                (ds)...0
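
To see whether a given test actually goes through the SPU, one option is to read a counter before and after the transfer. A minimal sketch, assuming the stats file keeps the "Label.....value" layout shown above; spu_counter is a made-up helper name, not part of the firmware:

```shell
# spu_counter LABEL [FILE]
# Hypothetical helper: prints the numeric value on the line starting with
# LABEL followed by a run of dots.
# FILE defaults to /sys/kernel/debug/bcmspu/stats.
spu_counter() {
    sed -n "s/^$1\.\.*\([0-9][0-9]*\)[[:space:]]*\$/\1/p" \
        "${2:-/sys/kernel/debug/bcmspu/stats}"
}

# Example (on the router):
#   before=$(spu_counter "Cipher Ops")
#   ... run the iperf test through the tunnel ...
#   after=$(spu_counter "Cipher Ops")
#   echo "cipher ops during test: $((after - before))"
```

If the difference stays at 0 while traffic is flowing, the crypto is most likely being done in software.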


No, because Wireguard uses Chacha20. Faster performance than AES, but not hardware accelerated so it's more CPU intensive than a hardware-accelerated AES implementation.

Hm so in theory an IPSEC VPN is a better choice on these routers than OpenVPN? Even if the hardware caps out at 3xx Mbps (still better than OpenVPN), your CPU is still free to process other things. Of course IPSEC is a bit more complicated to get right and potentially not quite as secure (though unlikely to be an issue for a home user).
 
Hm so in theory an IPSEC VPN is a better choice on these routers than OpenVPN?
Yes. However Asuswrt only supports IPSEC servers, not clients.
 
Nice way to put in a really useful chip and then cripple it with software.

All relative - IPSec is fast, but with a lot of options, more than what is supportable for a general purpose product.

OpenVPN, in its favor, has the conf files that one can just drop in and run... which is one of the reasons why it's so popular - I have issues with the design, but I can't argue with the ubiquity of it in the market.

WG, in many ways, is the best of both worlds - easy to configure, and fairly performant - IPSec and WG compare well in speed...
 
Unfortunately it seems WG hardware support or instruction sets are not on the horizon. I guess in reality, most of the people that would benefit are just home users with inexpensive routers. Corporations doing encryption and VPN can spin up a VM or virtual appliance in AWS or Azure for a tiny bit of money (in their view) and scale it to whatever performance they need. Maybe now that home internet is regularly scaling above 1G, at the same time as more and more people are using VPNs (and the paid VPN companies and router manufacturers want to keep their customer base), something will come along.

Pretty much all I deal with is IPSEC (NGE now) because our clients require something standards-based, thoroughly tested and proven, and constantly updated, and IPSEC is the only thing that falls into that category. Cisco routers and the majority of firewalls have hardware offloading for it, so it scales fairly well.
 
Nice way to put in a really useful chip and then cripple it with software.
They didn't cripple anything, they probably just felt it wasn't worth their development time to focus on implementing IPSEC client support. They seem to have done some groundwork toward it back then, but never finalized it, so it probably got de-prioritized at some point.
 
Maybe now that home internet is scaling above 1G regularly at the same time more and more people are using VPNs (and the paid VPN companies and router manufacturers want to keep their customer base) something will come along.
The most obvious way up there is OpenVPN DCO support, which might also in theory make bcmspu usable, since it would no longer require the costly context switches. We'd need a more up-to-date kernel to test out that theory however. It depends on whether DCO still relies on OpenSSL crypto, or if it leverages the Linux crypto modules.
 
