
Wireguard VPN Client: killswitch activation -> LAN administration lock-out


GoatInTheMachine

Hi all - title pretty much says it all, but more details are below.

I've read through several discussions (example) regarding killswitch changes/issues. FWIW, it appears to me that most have concerned OpenVPN instead of Wireguard connections. AFAIK, none cites issues with NTP sync on boot as a possible cause.

Below I've included a few things that I've tried to fix this. Since the failure mode includes admin lock-out and necessitates a router reset, I've had a difficult time gaining transparency into the failure (i.e. accessing logs) and iterating through potential solutions. (Perhaps I could log to a USB device?) Currently my solution is to leave the killswitch disabled.

It's very possible that I'm missing something obvious and/or have a fundamental misunderstanding. I'm here to learn, so thanks in advance for sharing your thoughts!



Config ---------------------------------

RT-AX86U
3004.388.8_2


VPN - WireGuard Client (WGC1 - commercial VPN):
Peer -> Allowed IPs: 0.0.0.0/0, ::0/0

VPN Director rule:
Local IP: 0.0.0.0/0
Remote IP: -
Iface: WGC1


Observations ---------------------------------

On reboot (killswitch: No):
  • Nominal behavior (devices all have internet access via WGC1)
  • Log entries show very inaccurate time until NTP sync

On reboot (killswitch: Yes):
  • LAN devices have no internet access [WGC1 assumed down]
    Possible explanation:
    - internet access is blocked by killswitch until WGC1 initialization
    - WGC1 initialization requires accurate time
    - time is inaccurate until NTP sync
    - NTP sync requires internet access
    - [loop]
    Fixes tried:
    - Add static NTP server IP (tried local and public) to the WebUI NTP sync config; then, add VPN Director rule binding traffic from router IP to NTP server IP to the WAN Iface
  • LAN devices can communicate (ping, http) with peer LAN devices
  • LAN devices cannot access router admin interfaces [http(s), ssh, ping] via router LAN IP
    Possible explanation:
    - Unclear, but killswitch seems to block access to any resource beyond router LAN IP
    Fixes tried:
    - Add explicit VPN Director rules binding traffic between a device LAN IP and router LAN IP to the WAN Iface
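
The suspected boot-time loop above can be written down as a small dependency graph and checked for a cycle. This is only a sketch of the hypothesis in this post: the node names are made-up labels (not firmware identifiers), and whether WGC1 really needs accurate time is exactly the open question.

```python
# Hypothetical model of the suspected boot deadlock.
# Node names are illustrative labels, not firmware identifiers.
DEPENDS_ON = {
    "internet_access": ["wgc1_up"],    # killswitch blocks traffic until WGC1 is up
    "wgc1_up": ["accurate_time"],      # assumption taken from the post
    "accurate_time": ["ntp_sync"],
    "ntp_sync": ["internet_access"],
}


def find_cycle(graph, start):
    """Return the dependency cycle reachable from start, or None."""
    path = []
    seen = set()

    def visit(node):
        if node in path:
            # Back at a node that is still being resolved: that is a loop.
            return path[path.index(node):] + [node]
        if node in seen:
            return None
        seen.add(node)
        path.append(node)
        for dep in graph.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        return None

    return visit(start)


# Same graph, but with NTP no longer depending on the tunnel
# (e.g. an NTP rule bound to WAN that the killswitch does not cover).
FIXED = dict(DEPENDS_ON, ntp_sync=[])

print(find_cycle(DEPENDS_ON, "internet_access"))
print(find_cycle(FIXED, "internet_access"))
```

In this model, exempting NTP breaks the loop. Since the NTP-to-WAN rule tried above did not restore access on the real router, the model is probably missing something, which fits the separate admin lock-out observation.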


Questions ---------------------------------

  • Is there a way to recover access to an admin interface without resetting the router? (It's a pain to test configs when the router needs a full reset each time.)
  • Is a failed NTP sync actually causing issues, or is it a red-herring?
    • I'm assuming that the killswitch is activated immediately upon clicking "Apply" (versus upon reboot). Is that correct?
  • A similar thread suggests adding a VPN Director rule binding all traffic from router LAN IP to the WAN Iface.
    • This rule seems like a more general version of the attempted fix above which binds "traffic from router LAN IP to NTP server IP to the WAN Iface", so it's unclear it would help.
    • As I understand it, this rule would bind all traffic from the router to the WAN Iface (instead of to WGC1), so it would prevent all router applications (e.g. AMTM AdGuard Home) from making requests through WGC1. This behavior seems undesirable to me, although I could be convinced otherwise.
  • What else might I try? I suspect more specific Allowed IPs or VPN Director rules?
 
Things are not always what they seem. The WAN and WGCx choices in VPN Director rules do not mean exactly what you might think. Routing normally uses the main route table, which is mostly maintained by the kernel. It contains routes to every interface and network the router is aware of, as well as some added by the firmware (like WAN DNS to the WAN interface). Entries in a routing table are strictly destination based (e.g. destinations 192.168.50.x route to br0). There is always a "default route" with the lowest priority, so if no more specific route is found, this is where data for unknown destinations is sent.
When adding an internet VPN client there are suddenly 2 routes to the same destinations (internet destinations). One way to handle this is to create a separate routing table, copy some of the main routing table entries and override the default route with the VPN. Periodic updates of the policy route tables are, I believe, a quite recent addition to the firmware. Whether communication is routed using the main route table (WAN) or the policy route table (VPN) is controlled via rules in VPN Director.

This means that these policy routing tables are not kept as up to date as the main routing table, and they do not always contain all the information. Until quite recently, routes to the WG server were not part of the policy route tables, which meant that when LAN was set to use VPN, LAN clients could not communicate with the WG server. It's tricky to get the right things in there for all the various configuration possibilities.

Choosing Interface: WGC1 in VPN Director will not force the use of the VPN; it merely points to the policy route table instead of the main one. If there is a specific route there sending this destination over WAN, it will still go over WAN.

The router has several interfaces: WAN, LAN (br0), Guest WiFi, VPN, etc. When a local router process wishes to send data, which source address will it use? If nothing special is configured, it does a quick route lookup to figure out which interface the data is likely to go out on and uses that interface's address as the source address (usually the address on the main route table's default route, for unknown destinations). This means that even if you set up 192.168.50.1 to WGC1, most router communication will still be over WAN, as the router primarily uses this address when communicating with LAN, not when communicating with various internet servers. One exception to this is the router GUI, which is configured to always use the LAN IP, and that is one of the reasons for this issue.
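
A rough sketch of that source-address choice, heavily simplified compared to what the kernel really does. The WAN addresses (203.0.113.x) are documentation placeholders and not taken from this thread.

```python
import ipaddress

# Simplified main table: (destination, device, source address used on it).
# All addresses are illustrative placeholders.
MAIN_TABLE = [
    ("0.0.0.0/0", "eth0", "203.0.113.20"),       # default route via WAN
    ("203.0.113.0/24", "eth0", "203.0.113.20"),  # WAN subnet
    ("192.168.50.0/24", "br0", "192.168.50.1"),  # LAN
]

# Most specific routes first, so the first hit is the best match.
BY_SPECIFICITY = sorted(
    MAIN_TABLE,
    key=lambda route: ipaddress.ip_network(route[0]).prefixlen,
    reverse=True,
)


def pick_source(dest):
    """Return (device, source address) for a router-generated packet to dest."""
    dest_ip = ipaddress.ip_address(dest)
    for net, dev, src in BY_SPECIFICITY:
        if dest_ip in ipaddress.ip_network(net):
            return dev, src
    raise LookupError("no route to " + dest)


print(pick_source("192.168.50.23"))  # talking to a LAN client
print(pick_source("1.1.1.1"))        # talking to an internet server
```

Only the first packet carries 192.168.50.1 as its source. The second one leaves with the WAN address, so a VPN Director rule keyed on Local IP 192.168.50.1 never sees it.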

Using 0.0.0.0/0 to WGC1 will indeed send all routed data to the policy table, which is a tricky business as this table is neither complete nor always up to date, especially since some very specific router functions must have internet access before things start to work properly (e.g. NTP).

I would advise you to think about which router functions actually need to use the VPN and which do not, and don't send ALL data to the policy route table.

One method I have used successfully before is to tackle this differently. As we know, the router would normally only use LAN IP 192.168.50.1 when it is communicating with LAN, so we could formulate two rules:

#Rule 1
Local IP: 192.168.50.0/24
Remote IP: leave blank
IFace: WGC1
Comment: Lan to VPN

#Rule 2
Local IP: leave blank
Remote IP: 192.168.50.0/24
IFace: WAN
Comment: Local to Main

This would still send source IP 192.168.50.1 to the policy table (WGC1) unless the destination is 192.168.50.x, in which case it is sent to the main route table (WAN). Now, typical router communication will still be over WAN, but this means that you could create configuration rules for certain applications (like AGH, Unbound, Transmission, etc.) and force these to bind to 192.168.50.1 and thus use the VPN. This should also make sure you will not lose the router GUI connection when the killswitch disables access to the policy routing tables.
Here is some inspiration for how to do this with Unbound/Transmission when using the WGM addon: https://github.com/ZebMcKayhan/Wire...p-transmission-andor-unbound-to-use-wg-client I'd guess there are similar settings for AGH...
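
To make the effect of the two rules concrete, here is a minimal first-match sketch. It assumes WAN rules are evaluated before WGC rules (which matches the priorities in the `ip rule` listing later in this thread), and 203.0.113.20 is a placeholder for the router's WAN address.

```python
import ipaddress

# The two rules as (local, remote, table); None means "left blank".
# WAN rules are listed first on the assumption they are evaluated first.
RULES = [
    (None, "192.168.50.0/24", "main"),  # Rule 2: Local to Main
    ("192.168.50.0/24", None, "wgc1"),  # Rule 1: Lan to VPN
]


def in_net(addr, net):
    """True if net is blank (matches anything) or contains addr."""
    return net is None or ipaddress.ip_address(addr) in ipaddress.ip_network(net)


def table_for(src, dst):
    """Return the routing table chosen for a packet; the first match wins."""
    for local, remote, table in RULES:
        if in_net(src, local) and in_net(dst, remote):
            return table
    return "main"  # no VPN Director rule matched


print(table_for("192.168.50.23", "1.1.1.1"))       # LAN client to internet
print(table_for("192.168.50.1", "192.168.50.23"))  # router GUI replying to a LAN client
print(table_for("192.168.50.1", "9.9.9.9"))        # application bound to the LAN IP
print(table_for("203.0.113.20", "9.9.9.9"))        # ordinary router traffic from the WAN IP
```

The four cases come out as wgc1, main, wgc1 and main: GUI replies stay on the main table, while anything deliberately bound to 192.168.50.1 follows the VPN.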

/Zeb
 
I think at least part of the problem here is the 0.0.0.0/0 rule w/ the VPN Director.

With this rule in place, it is effectively the same as NOT using the VPN Director at all, except now the router itself is denied access to the WAN due to the kill switch, which it needs to reestablish the VPN!!!

You can sort of get away w/ it on the initial connection, before any kill switch is active, since the router will typically create a static route to the remote server over the WAN once the VPN is established. But the problems come later should the VPN fail requiring a restart. Domain name resolution might fail (should you be using a domain name for the server) if the DNS server(s) remain bound to the VPN. Or now w/ the kill switch being persistent w/ the VPNs, you can't get connected on a reboot.

As a general rule, it's best to NOT have the router (or more specifically, the VPN client) itself participate in the VPN if the intent is to support the VPN for the benefit of other devices. You need to make sure the router (VPN client) can *manage* the VPN at all times. But if you make it an active participant, you create these kinds of problems.
 

Part 1/3

Thank you @ZebMcKayhan and @eibgrad for your thoughtful replies! I think they sent me down a productive path.

I started by implementing two slightly modified versions of the 2 rules suggested by @ZebMcKayhan:

VPN Director rules 1,2
Desc: LAN to LAN
Local IP: 192.168.a.0/24 <-- primary subnet (vs 192.168.b.0/24 for guest)
Remote IP: 192.168.a.0/24
IFace: WAN

ensures killswitch activation does not interfere with LAN to LAN communication

Desc: LAN to VPN
Local IP: 192.168.0.0/16
Remote IP: -
IFace: WGC1

routes all router LAN clients (on primary or guest subnets) to the VPN
- as mentioned, the router's own traffic is exempted from this rule
- it's unclear to me how this rule fails to capture router's LAN IP of 192.168.a.1 (presumably, there is another routing rule/table with precedence not shown in the WebUI)

Empirically, these rules correctly route all client traffic to the VPN (with the exception of the router's own traffic - more on that later); with the killswitch active, they also maintain uninterrupted access to the router's admin interfaces when a) WGC1 deactivates and when b) the router restarts. However, as Zeb stated, the WebUI abstraction provides an incomplete picture of the underlying routing logic, which makes it difficult to predict the consequences of changes to the VPN Director and/or WG Client. To gain insight into how these WebUI configuration changes affect the resulting routing behavior, I spent some time in the router's ssh interface. Specifically, Zeb's discussion of policy/kernel routing tables led me to probe routing rules + tables with the ip command suite.

At the risk of providing excessive detail for something that others consider trivial, I'll include editorialized ip commands and responses below - to the extent my comments are correct, I know I would have found a resource like this helpful.
 

Part 2/3

In general, my experimental loop was to:
- perturb settings within the WebUI (VPN Director rules and/or WG Client configuration)
- inspect routing rules + tables with the following ip commands; note any changes

To see the router's policy routing rules, I used the following command:

ip -c rule
-c: format response with color
Rich (BB code):
0: from all lookup local
10010: from 192.168.a.a1 lookup main
10011: from 192.168.a.a2 lookup main
10012: from 192.168.a.a3 lookup main
10013: from 192.168.a.a4 lookup main
10014: from 192.168.b.b1 lookup main
10015: from 192.168.a.0/24 to 192.168.a.0/24 lookup main
11210: from 192.168.0.0/16 lookup wgc1
12215: from 192.168.0.0/16 prohibit
32766: from all lookup main
32767: from all lookup default

This list of policy routing rules almost exactly mirrors the one in VPN Director (note: I also implemented 5 VPN Director rules to bypass specific clients from WGC1). Given a routing task with a source + destination IP, my understanding is that the router considers these rules top-to-bottom (i.e. high-to-low priority rank) for IP matches before dispatching to a routing table (i.e. local, main, wgc1, or default).

Notably, the rule with priority rank 12215, which prohibits traffic originating from the LAN, is only present in this list when the killswitch is enabled. Effectively, it appears this prohibit rule is the killswitch - since the preceding rule routes the same source IP range to wgc1, the router will only consider the killswitch rule if the preceding rule is removed (i.e. if WGC1 is down). To me it appears that the killswitch rule's IP range is the most general (possibly the union) of Local IPs bound to the WGC1 Iface in the VPN Director. Notably, the killswitch rule's IP range generally does not consider Remote IPs (more on that later).
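
Here is a small sketch of that reading of the rule list. It models my interpretation, not the firmware: it assumes the 11210 rule disappears while WGC1 is down, uses 192.168.1.0/24 as a hypothetical stand-in for 192.168.a.0/24, and leaves out the per-client bypass rules.

```python
import ipaddress

LAN = "192.168.1.0/24"  # hypothetical stand-in for 192.168.a.0/24


def build_rules(wgc1_up, killswitch):
    """Rebuild a simplified rule list as (priority, from, to, action)."""
    rules = [(10015, LAN, LAN, "lookup main")]
    if wgc1_up:
        # Assumption: this rule only exists while the tunnel is up.
        rules.append((11210, "192.168.0.0/16", None, "lookup wgc1"))
    if killswitch:
        rules.append((12215, "192.168.0.0/16", None, "prohibit"))
    rules.append((32766, None, None, "lookup main"))
    return rules


def matches(addr, net):
    """True if net is unset (from/to all) or contains addr."""
    return net is None or ipaddress.ip_address(addr) in ipaddress.ip_network(net)


def decide(src, dst, wgc1_up, killswitch):
    """Return (priority, action) of the first rule matching the packet."""
    for prio, src_net, dst_net, action in build_rules(wgc1_up, killswitch):
        if matches(src, src_net) and matches(dst, dst_net):
            return prio, action


client = "192.168.1.50"
print(decide(client, "1.1.1.1", wgc1_up=True, killswitch=True))       # tunnel up
print(decide(client, "1.1.1.1", wgc1_up=False, killswitch=True))      # tunnel down
print(decide(client, "192.168.1.1", wgc1_up=False, killswitch=True))  # admin UI, tunnel down
```

With the tunnel down, internet-bound client traffic falls through to the prohibit rule, while traffic to the router's LAN IP is still caught earlier by the LAN to LAN rule. That is consistent with keeping admin access.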

(Aside - the killswitch's behavior and high-level mechanism are, of course, detailed in the 3004.388.8_2 changelog and other wizard-grams [1, 2]).

The main and wgc1 routing tables themselves can be printed with the following commands:

ip -d -c route (list table main)
-d: more detail
-c: format response with color
similar to netstat -r and route
Rich (BB code):
     default via WAN-gateway  dev eth0  proto boot    scope global
    WAN-dns1 via WAN-gateway  dev eth0  proto boot    scope global  metric 1
    WAN-dns2 via WAN-gateway  dev eth0  proto boot    scope global  metric 1
                  WAN-subnet  dev eth0  proto kernel  scope link    src WAN-router
                 WAN-gateway  dev eth0  proto kernel  scope link
                    WGC1-dns  dev wgc1  proto boot    scope link
                 127.0.0.0/8  dev lo    proto boot    scope link
              192.168.a.0/24  dev br0   proto kernel  scope link    src 192.168.a.1
              192.168.b.0/24  dev br1   proto kernel  scope link    src 192.168.b.1

ip -d -c route list table wgc1:
-d: more detail
-c: format response with color
Rich (BB code):
                   0.0.0.0/1  dev wgc1  proto boot    scope link
     default via WAN-gateway  dev eth0  proto boot    scope global
    WAN-dns1 via WAN-gateway  dev eth0  proto boot    scope global  metric 1
    WAN-dns2 via WAN-gateway  dev eth0  proto boot    scope global  metric 1
                  WAN-subnet  dev eth0  proto kernel  scope link    src WAN-router
                 WAN-gateway  dev eth0  proto kernel  scope link
                 127.0.0.0/8  dev lo    proto boot    scope link
                 128.0.0.0/1  dev wgc1  proto boot    scope link
VPN-endpoint via WAN-gateway  dev eth0  proto boot    scope global
              192.168.a.0/24  dev br0   proto kernel  scope link    src 192.168.a.1
              192.168.b.0/24  dev br1   proto kernel  scope link    src 192.168.b.1
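
As far as I understand it, the wgc1 table can keep the WAN default route and still send internet traffic into the tunnel because route lookup prefers the most specific match: 0.0.0.0/1 and 128.0.0.0/1 together cover all of IPv4 and both beat the /0 default, while the /32 endpoint route beats them in turn. A sketch with placeholder addresses (198.51.100.7 stands in for VPN-endpoint, 192.168.1.0/24 for 192.168.a.0/24):

```python
import ipaddress

# Simplified wgc1 table as (destination, device). Addresses are placeholders.
WGC1_TABLE = [
    ("0.0.0.0/1", "wgc1"),
    ("128.0.0.0/1", "wgc1"),
    ("0.0.0.0/0", "eth0"),        # default via WAN-gateway
    ("198.51.100.7/32", "eth0"),  # VPN-endpoint via WAN-gateway
    ("192.168.1.0/24", "br0"),
]


def lookup(dest, table=WGC1_TABLE):
    """Return (matched prefix, device) using longest-prefix match."""
    dest_ip = ipaddress.ip_address(dest)
    candidates = [
        (ipaddress.ip_network(net), dev)
        for net, dev in table
        if dest_ip in ipaddress.ip_network(net)
    ]
    net, dev = max(candidates, key=lambda c: c[0].prefixlen)
    return str(net), dev


print(lookup("1.1.1.1"))       # lower half of the address space
print(lookup("200.1.1.1"))     # upper half of the address space
print(lookup("198.51.100.7"))  # the tunnel's own endpoint
print(lookup("192.168.1.23"))  # LAN client
```

The endpoint case matters: without its own more specific route, the encrypted tunnel packets would be routed back into the tunnel.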

A seemingly complete routing table (not shown) can be retrieved via:
ip -d -c route list table all

In general, changes I made to the VPN Director + WG Client seemed to be directly reflected in the routing rules but introduced minimal changes to the routing tables themselves.

Along with traceroute, I found the following command useful for testing how the router will direct traffic from the router's source IP to a given destination IP:
ip -c route get <dest-ip>

I'd love to know if there's a similar way to query the router rules/tables with arbitrary combinations of source + destination IPs.
Update - @ZebMcKayhan suggested appending the option [ from ADDRESS iif STRING ]:
ip route get <dest-ip> from <src-ip> iif br0
iif br0: incoming interface bridge0
 

Part 3/3

As I've alluded, I'm interested in directing (at least some) router-generated traffic over the VPN. However, as I've experienced and as @eibgrad highlighted, care needs to be taken so that if WGC1 is deactivated (for whatever reason, be it manually or on reboot):
- the killswitch does not block LAN devices from router admin interfaces
- the killswitch does not create a deadlock between critical resources (e.g. NTP, DNS) and the VPN connection

Short of self-hosting my own NTP and DNS (and likely other infrastructure I am overlooking), I think a reasonable compromise is to direct router traffic through WGC1 while explicitly excluding it from the killswitch rule.

By this logic, I can add a third VPN Director rule:

VPN Director rule 3
Desc: ROUTER to VPN
Local IP: -
Remote IP: 0.0.0.0/0
IFace: WGC1

directs router traffic through WGC1 (in combination with above 2 VPN Director rules)
- can be modified to be more specific than 0.0.0.0/0 (e.g. by specifying a particular DNS server)

Since this rule does not specify a Local IP, we can (correctly) predict that this IP range will not be included in the killswitch rule:

ip -c rule
Rich (BB code):
0: from all lookup local
10010: from 192.168.a.a1 lookup main
10011: from 192.168.a.a2 lookup main
10012: from 192.168.a.a3 lookup main
10013: from 192.168.a.a4 lookup main
10014: from 192.168.b.b1 lookup main
10015: from 192.168.a.0/24 to 192.168.a.0/24 lookup main
11210: from 192.168.0.0/16 lookup wgc1
11211: from all lookup wgc1
12215: from 192.168.0.0/16 prohibit
32766: from all lookup main
32767: from all lookup default

The resulting rule list is identical, save for the addition of the rule with priority rank 11211. In fact, were I to directly edit the routing policy rules (or if the WebUI provided more fine-grained access), I believe I could achieve the same functional result by modifying the rule with priority 11210 to replace 192.168.0.0/16 with all.

Regardless, with this rule in place, I can empirically verify with traceroute that the router directs its own traffic to WGC1 when the interface is up and to the WAN when the interface is down; nonetheless, upon manually deactivating WGC1 or rebooting the router, I can still access the router's admin interfaces with a LAN device.

I've implemented the above 3 VPN Director rules and have yet to encounter any issues. As always though, I'd be happy to hear further thoughts, comments, and concerns - perhaps my newfound sense of clarity is ill-gotten =) Thanks again!



Remaining Questions:
  • How exactly does the router evade capture by a routing rule with an IP range of 192.168.0.0/16?
  • I'd love to know if there's a way to query router rules/tables with arbitrary combinations of source + destination IPs
    • ip route get <dest-ip> from <src-ip> iif br0 (courtesy @ZebMcKayhan)
  • How does Peer - Allowed IPs interact with the routing tables? (I assume intersection?)

TL;DR - VPN Director rules:
Description      Local IP          Remote IP         Iface
LAN to LAN       192.168.a.0/24    192.168.a.0/24    WAN
LAN to VPN       192.168.0.0/16    -                 WGC1
ROUTER to VPN    -                 0.0.0.0/0         WGC1
where 192.168.a.0/24 is your primary LAN subnet
 
Sounds like you had a productive couple of days! Thanks for the write-up.

How exactly does the router evade capture by a routing rule with an IP range of 192.168.0.0/16? (I've seen threads suggesting this is an intentional choice, but the mechanism is still unclear to me).
Try searching the web for which source address a new packet gets on a device with multiple interfaces. The router is not bound to LAN in that sense (except for the GUI). The router will not use this address as its source address, so your rule will not apply to it.
Edit: e.g. http://linux-ip.net/html/routing-saddr-selection.html


I'd love to know if there's a way to query router rules/tables with arbitrary combinations of source + destination IPs
Try:
Code:
ip route get 1.1.1.1 from 192.168.128.10 iif br0
1.1.1.1 from 192.168.128.10 dev wgc1 table wgc1
    cache iif br0
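
If you end up running this for many source/destination pairs, the first line of the reply is easy to pick apart. The helper below is hypothetical and only handles output shaped like the sample above (a destination followed by keyword/value pairs); replies that start with a route type such as "local", or that contain flags without a value, would need extra handling.

```python
def parse_route_get(output):
    """Extract fields from the first line of `ip route get` output."""
    tokens = output.strip().splitlines()[0].split()
    result = {"dest": tokens[0]}
    rest = tokens[1:]
    # Assumption: the remaining tokens are keyword/value pairs
    # (from, via, dev, table, ...), as in the sample reply.
    for key, value in zip(rest[0::2], rest[1::2]):
        result[key] = value
    return result


SAMPLE = """1.1.1.1 from 192.168.128.10 dev wgc1 table wgc1
    cache iif br0"""

fields = parse_route_get(SAMPLE)
print(fields["dev"], fields["table"])
```

For the sample this reports device wgc1 and table wgc1, i.e. the packet would be handed to the VPN policy table.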


How does Peer - Allowed IPs interact with the routing tables? (I assume intersection?)
The Allowed IPs become the routes in table wgcX. 0.0.0.0/0 as an Allowed IP becomes a default route for this table.
They are also used inside WireGuard itself for routing.
 
