OpenVPN performance

Tried both on and off, triple checked the settings each time, and both times I was getting the standard 40-50% load on the second CPU. Given HW NAT offloads from the CPU by design, I'm not entirely sure what's going on in your case - but if it works for you that's cool!
 
According to Merlin, HW NAT on the AC56U/AC68U is not true "hardware" acceleration. It's handled by the CTF kernel module, so it obviously runs in kernel space. A small trick by Broadcom that makes for a huge marketing gimmick.

Does "second CPU" mean CPU1? If so, that alone doesn't tell us much. CPU1 runs the OpenVPN server in the Merlin build. CPU0 runs CTF in sirq. I'm interested in your CPU0 utilisation. Why didn't you report both, as I sincerely requested in a prior post? With "HW NAT" on vs. off, CPU0 should show at least some difference. If it doesn't, there is some other issue with your setup.

EDIT: you can check CPU0 & CPU1 by connecting to the Asus via telnet/PuTTY, typing "top -d1" and then pressing "1".
 
If you guys are expecting more than ~30-40 Mbit/s from any of the current routers, you're going to be disappointed. My RT-AC68U is overclocked to 1200 MHz and I can get about 25 Mbit/s TCP on a *local LAN* through the VPN, and 35 Mbit/s UDP. When testing, I'm using iperf to another local Linux server over the VPN. One of my cores is at 99% during the transfer and the other is idle, because OpenVPN on the router won't use both cores. If you want to use BOTH cores, you're going to have to run both VPN servers at once, statically assign each VPN to its own core, and do load balancing on both ends to saturate both VPNs.

EDIT: NAT acceleration makes no difference in my tests. I've also tested with the "tinc" VPN, with similar results.

EDIT: I proved myself wrong. I can now get 60 Mbit/s using a tinc VPN and a decent cipher.
 
If you're using the Merlin build, OpenVPN server1 is hard-coded to run on CPU1 while the kernel processes packets on CPU0. So your CPU utilisation doesn't sound right to me. Has something been customised in your setup?
 
Nothing is different. One core is 100% while the other is doing very little work (maybe 5%) processing packets. It's working as expected.
 
I then ran some tests with tinc between the ac66 (installed from entware) and the same server and got 60mbps. So I think openvpn has limitations.

I also run tinc, and I tested both tinc and OpenVPN on a local LAN. tinc is slightly faster, but not by a huge amount. Maybe 10% faster, and that is probably because of the different ciphers.
 
To process packets at 30Mbps, one core (CPU0) doesn't need 99% of its cycles.

My AC56U overclocked to 1200MHz (so the same as your AC68U in this respect...) shows about 40% load AND I'm doing it in software (i.e. HW NAT disabled). That's at about 38Mbps.

Isn't it ironic that in your case the system uses HW NAT and yet shows 99% load on one core (CPU0, I assume)...?
 
Are you reading top correctly? Are you reading the 45% next to the openvpn line? Please post a screenshot of your top output.
 
I'm reading the second and third rows of the "top" output, which are the CPU0 and CPU1 utilisation respectively. I only read the idle values and take 100 minus idle as the CPU load. So, if anything, my figures overestimate the actual CPU load of OpenVPN in both kernel and user land.
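For anyone who wants to apply the "100 minus idle" reading mechanically, here is a minimal Python sketch. It assumes the per-CPU lines look like the BusyBox top output pasted elsewhere in this thread; the function name `cpu_loads` and the regular expressions are illustrative, not part of any router firmware.

```python
import re

# Two per-CPU summary lines in the format BusyBox top prints after pressing "1".
# The numbers are copied from a top capture posted in this thread.
SAMPLE = """\
CPU0: 35.7% usr 10.3% sys  0.0% nic  0.0% idle  0.0% io  0.0% irq 53.8% sirq
CPU1:  0.0% usr  1.5% sys  0.0% nic 91.8% idle  0.0% io  0.0% irq  6.5% sirq
"""

CPU_LINE = re.compile(r"^(CPU\d+):(.*)$")   # e.g. "CPU0: <fields>"
FIELD = re.compile(r"([\d.]+)%\s+(\w+)")    # e.g. "35.7% usr"


def cpu_loads(top_text):
    """Return {cpu_name: load_percent}, where load = 100 - idle."""
    loads = {}
    for line in top_text.splitlines():
        match = CPU_LINE.match(line.strip())
        if not match:
            continue  # skip Mem:, Load average:, process rows, etc.
        fields = {name: float(value) for value, name in FIELD.findall(match.group(2))}
        loads[match.group(1)] = round(100.0 - fields["idle"], 1)
    return loads


if __name__ == "__main__":
    for cpu, load in cpu_loads(SAMPLE).items():
        print(f"{cpu}: {load}% busy")
```

Note that this counts everything that is not idle (usr, sys, sirq and so on), so it is an upper bound on what OpenVPN itself is using.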
 
You're saying that in your case you are seeing 55% idle time for CPU1 and some other low number for CPU0 during a 30 Mbit OpenVPN transfer?

Btw, FYI, with 'taskset -p 0/1 $PID' you can move the openvpn process to either core and test. If I toggle NAT acceleration, I get similar CPU usage with openvpn.

I'm guessing top is incorrect on your build? :)

Try this. Let's see what top reports if you max your CPU out. Run an openssl test in one window ( openssl speed md5 ) and top in the other. Does your top show 100% utilization (0% idle)?

Also, do you mind popping a screenshot up here of your top output during an openvpn speed test?

EDIT: Also, are you running compression in openvpn? That takes cycles. What cipher/encryption?

EDIT2: I just disabled HW Accel and got the same results. I'm on the latest Merlin build, btw.
 
For me it's about 60% idle time for CPU0. Idle time is even higher for CPU1. That's with a 38Mbps OpenVPN transfer (using BF-CBC...in case you didn't read all my previous posts in this thread :)

I know about taskset. I even tried with Entware's openvpn-openssl build. Btw, it's 'taskset -p 1/2/3 $pid' I believe. The argument is a bitmask: binary 11 is the full mask, binary 01 (1 in decimal) is CPU0, and binary 10 (2 in decimal) is CPU1. If you don't mind which core and want to let the kernel decide, you can put 3, which corresponds to binary 11.
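To make the mask arithmetic concrete, here is a small Python sketch that builds a taskset-style affinity mask from a list of CPU indices. It only computes numbers and does not touch any process; the helper names are made up for illustration. As far as I know taskset reads the mask as hexadecimal, but for the values 1, 2 and 3 the hex and decimal digits are the same.

```python
def affinity_mask(cpus):
    """Build a taskset-style bitmask: bit N set means CPU N is allowed."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


def cpus_in_mask(mask):
    """Inverse helper: list the CPU indices enabled in a mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


if __name__ == "__main__":
    # CPU0 only, CPU1 only, then both cores (let the kernel decide).
    for cpus in ([0], [1], [0, 1]):
        mask = affinity_mask(cpus)
        print(f"CPUs {cpus}: mask {mask} (binary {mask:02b}, hex {mask:#x})")
```

On a dual-core router this gives 1 for CPU0, 2 for CPU1 and 3 for both, which matches the values passed to taskset above.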

I tried compression on and off. Not making a huge difference. Maybe about +/- 1Mbps. My tests were done through WAN.

I'm not surprised that HW NAT on/off has no effect on throughput in your case, since you are testing on LAN. What surprises me is your CPU0 & CPU1 utilisation. Earlier in your posts you said one core was at 99% load and the other at almost 0%. Is that still the case in your re-test?
 
Yes, it's 'taskset -p' with 1, 2, or 3 (auto).

Yes, it is the same every time. It is consistent. I don't mean to be a naysayer, but on an ARM chipset at these clock speeds you're just not going to see 38 Mbit/s while only using 40% of one core.

Even if you move away from the Asus router to a Raspberry Pi 2, you're going to see about 35-45 Mbit/s and 100% CPU core saturation. Here are other people:

https://www.reddit.com/r/raspberry_pi/comments/2vnxb7/

I'm using AES rather than Blowfish, but I doubt that is making that large a difference.

Here are my stats when running at about 35 Mbit/s on a LAN.

Code:
Mem: 116220K used, 139512K free, 0K shrd, 10768K buff, 57128K cached
CPU0: 35.7% usr 10.3% sys  0.0% nic  0.0% idle  0.0% io  0.0% irq 53.8% sirq
CPU1:  0.0% usr  1.5% sys  0.0% nic 91.8% idle  0.0% io  0.0% irq  6.5% sirq
Load average: 0.67 0.28 0.22 3/80 32413
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
8376     1 calisro  R     4512  1.7   0 45.5 /etc/openvpn/vpnserver2 --cd /etc/openvpn/server2 --config config.ovpn
8758     1 calisro  S     4348  1.7   0  4.0 /etc/openvpn/vpnclient2 --cd /etc/openvpn/client2 --config config.ovpn
  278     2 calisro  RW       0  0.0   1  3.5 [kworker/1:1]
  990     1 calisro  S     6484  2.5   0  0.2 httpd -s -p 8443
  511     1 calisro  S     6112  2.3   1  0.1 watchdog02
If I move it to the other CPU, I get around 40 Mbit/s (better), but still 100% CPU, now on the other core.
Code:
Mem: 116260K used, 139472K free, 0K shrd, 10768K buff, 57132K cached
CPU0:  3.1% usr  2.5% sys  0.0% nic 21.7% idle  0.0% io  0.0% irq 72.4% sirq
CPU1: 48.5% usr 20.9% sys  0.0% nic  3.9% idle  0.0% io  0.0% irq 26.5% sirq
Load average: 0.36 0.24 0.22 3/80 32417
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
8376     1 calisro  R     4512  1.7   1 43.6 /etc/openvpn/vpnserver2 --cd /etc/openvpn/server2 --config config.ovpn
  277     2 calisro  SW       0  0.0   0 11.8 [kworker/0:1]
8758     1 calisro  S     4348  1.7   0  5.5 /etc/openvpn/vpnclient2 --cd /etc/openvpn/client2 --config config.ovpn
 
Interesting tidbit. I changed my cipher from AES-256-CBC to AES-192-CBC. Now I can do 45 Mbit/s on an internal LAN over OpenVPN.

What is more interesting is what @kvic was pointing out. With the weaker cipher, my CPU usage is only 80% on the core OpenVPN is using. I also tried BF-CBC for the heck of it and got the same throughput, but the CPU was more like 75% utilized.

I'll stick with AES-192 for now, but it's interesting that it won't fully max out the CPU at the higher bandwidth.
 
And here's another tidbit. I re-tested the tinc VPN using aes-192-cbc and, unlike OpenVPN, it will saturate my core and also produce 60 Mbit/s (putting the tinc daemon on core 2).

Now why will tinc use the whole 100% of the core to give me better bandwidth, but OpenVPN won't? Very interesting indeed.
 
I look at CPU usage using htop (from Entware). It has nice bar graphs. I can confirm my AC68 overclocked to 1200 MHz can do 35 Mbit/s and only use part of the CPU, about 50% on each core. There are certainly plenty more clock cycles available for faster OpenVPN. I don't know how much faster, as 35 Mbit/s is my max. The bulk cipher is AES-256-CBC. I haven't noticed a difference in speed between compression on and off. Any loss in clock cycles may be made up by the slightly increased throughput due to the header compression.
 
I retested and here is the update.

For both WAN & LAN, about 60Mbps. CPU load is 40% for CPU0 and 85% for CPU1.

To recap my OpenVPN server: AC56U overclocked to 1200/667, bound to CPU1, cipher BF-CBC. I added the "fast-io" directive (I don't know if it has any measurable effect).

For people interested in IPv6 inside the OpenVPN tunnel, I found only a 6% reduction in throughput. Not bad at all, actually.

Indeed, OpenVPN cannot max out the CPU, though it gets close. Could it be less optimised for ARM? Or a complication with the Asuswrt platform? I don't know, but it's worth looking into another day.

Until then it's enough for me. The throughput is more than what I need :cool:

EDIT:

I feel I still owe the thread a few clarifications. The above re-test figures were taken using iperf in the following way:
  • client (LAN) <> OpenVPN Server (AC56U) <> server (LAN)
  • client (WAN) <> OpenVPN Server (AC56U) <> server (LAN)
The figures from the other day (8Mbps/13Mbps vs 38Mbps/16Mbps) were taken using the speedtest client, like this:
  • client (WAN) <> OpenVPN Server (AC56U) <> Speedtest.net Server (in local Internet Exchange)
The speedtest scenario indicates the Asuswrt/OpenVPN server has huge potential for improvement. Also, with HW NAT disabled, the throughput increase is very obvious.
 
VPN provider: IPVanish
Connection method: hardwired (gateway)
Speed without VPN: 170 down, 25 up
Speed with VPN: 35 down (avg), 10 up

I keep getting randomly disconnected in games of Madden. Is there anything in the log below that would indicate a drop, and why?

Also, how do I change the cipher to a lower bit strength? I'd rather have 128, since it won't hog as much bandwidth.

Thanks!

Aug 17 03:12:57 openvpn[4146]: /usr/sbin/ip route add 0.0.0.0/1 via 172.20.16.1
Aug 17 03:12:57 openvpn[4146]: /usr/sbin/ip route add 128.0.0.0/1 via 172.20.16.1
Aug 17 03:12:57 openvpn-routing: Skipping, client 1 not in routing policy mode
Aug 17 03:12:57 openvpn[4146]: Initialization Sequence Completed
Aug 17 03:30:27 dnsmasq-dhcp[4193]: DHCPDISCOVER(br0) 192.168.1.50 00:d9:d1:d7:d0:bf
Aug 17 03:30:27 dnsmasq-dhcp[4193]: DHCPOFFER(br0) 192.168.1.50 00:d9:d1:d7:d0:bf
Aug 17 03:30:27 dnsmasq-dhcp[4193]: DHCPREQUEST(br0) 192.168.1.50 00:d9:d1:d7:d0:bf
Aug 17 03:30:27 dnsmasq-dhcp[4193]: DHCPACK(br0) 192.168.1.50 00:d9:d1:d7:d0:bf
Aug 17 04:12:53 openvpn[4146]: TLS: soft reset sec=0 bytes=409759404/0 pkts=545163/0
Aug 17 04:12:53 openvpn[4146]: VERIFY OK: depth=1, /C=US/ST=FL/L=Winter_Park/O=IPVanish/OU=IPVanish_VPN/CN=IPVanish_CA/emailAddress=support@ipvanish.com
Aug 17 04:12:53 openvpn[4146]: VERIFY X509NAME OK: /C=US/ST=FL/L=Winter_Park/O=IPVanish/OU=IPVanish_VPN/CN=sea-a05.ipvanish.com/emailAddress=support@ipvanish.com
Aug 17 04:12:53 openvpn[4146]: VERIFY OK: depth=0, /C=US/ST=FL/L=Winter_Park/O=IPVanish/OU=IPVanish_VPN/CN=sea-a05.ipvanish.com/emailAddress=support@ipvanish.com
Aug 17 04:12:53 openvpn[4146]: Data Channel Encrypt: Cipher 'AES-256-CBC' initialized with 256 bit key
Aug 17 04:12:53 openvpn[4146]: Data Channel Encrypt: Using 256 bit message hash 'SHA256' for HMAC authentication
Aug 17 04:12:53 openvpn[4146]: Data Channel Decrypt: Cipher 'AES-256-CBC' initialized with 256 bit key
Aug 17 04:12:53 openvpn[4146]: Data Channel Decrypt: Using 256 bit message hash 'SHA256' for HMAC authentication
Aug 17 04:12:53 openvpn[4146]: Control Channel: TLSv1, cipher TLSv1/SSLv3 DHE-RSA-AES256-SHA, 2048 bit RSA
 
I can't resist the temptation to hit physical limits...so here we go. I did another round of tests.

Client (WAN) <> OpenVPN Server (AC56U) <> Server (iperf LAN/speedtest WAN)
AC56U fw 378.55 overclocked to 1200/667
OpenVPN Server 2.3.7 (bound to CPU1)

Code:
no compression / no cipher / no multihome / no HW NAT:
    iperf 88.9Mbps Openvpn 47% (user), CPU1 load 94% (user+kernel)
    speedtest 17.1Mbps/16.9Mbps

no compression / DES-CBC / no multihome / no HW NAT:
    iperf 54.3Mbps Openvpn 47% (user), CPU1 load 94% (user+kernel)
    speedtest 7.2Mbps/11.2Mbps

no compression / BF-CBC / no multihome / no HW NAT:
    iperf 67.6Mbps Openvpn 49% (user), CPU1 load 95% (user+kernel)
    speedtest 19.7Mbps/15.0Mbps

no compression / AES-128-CBC / no multihome / no HW NAT:
    iperf 72.4Mbps Openvpn 49% (user), CPU1 load 95% (user+kernel)
    speedtest 28.9Mbps/20.5Mbps

The main difference from the last test is that the iperf server has moved onto the AC56U itself, and I bound it to CPU0. A few observations:
  • A big surprise to me: the OpenVPN guys did a damn good job of optimising the AES cipher for ARM! Forget about the rest and use it if you're on ARM (regardless of client or server).
  • 95% utilisation of CPU1 is about the max I could push.
  • The speedtest dot net scenario leaves much to be desired in comparison. There is huge potential for optimising WAN-to-WAN routing performance in OpenVPN+Asuswrt. Waiting for heroes to enlighten the dark world.
I'm simply amazed at the BCM4708. The little beast can deliver a lot more than people think. Once OpenVPN is multithreaded and multicore... I can't stop imagining a further boost on this good old (by then) AC56U.
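To put the iperf numbers above on a common scale, here is a short Python sketch that expresses each cipher's throughput as a percentage of the no-cipher run. The figures are copied from the results block in this post; the dictionary and function names are just for illustration.

```python
# iperf throughput in Mbps for each cipher setting, copied from the results above.
IPERF_MBPS = {
    "none": 88.9,
    "DES-CBC": 54.3,
    "BF-CBC": 67.6,
    "AES-128-CBC": 72.4,
}


def relative_throughput(results, baseline="none"):
    """Return each entry as a percentage of the baseline entry's throughput."""
    base = results[baseline]
    return {cipher: round(100.0 * mbps / base, 1) for cipher, mbps in results.items()}


if __name__ == "__main__":
    ranked = sorted(relative_throughput(IPERF_MBPS).items(), key=lambda kv: -kv[1])
    for cipher, percent in ranked:
        print(f"{cipher:12s} {percent:5.1f}% of no-cipher throughput")
```

By this measure AES-128-CBC keeps roughly four fifths of the unencrypted throughput on this box, ahead of BF-CBC and DES-CBC. It is a single run per cipher, so treat the ranking as indicative only.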
 
Nice benchmark. Speedtest dot net just isn't a good benchmark tool. You have no idea if the server is overloaded, how many network hops there are, etc. It's good for a "finger in the air" gauge, is all.

thx for the work!
 
