This is a lengthy post so I can lay out all of the steps I've taken so far to try and pin this down. I own one RBK50 in AP mode and two RBS50 satellites, one on ethernet backhaul and the other on 5 GHz wireless backhaul. I've been experiencing at least one outage roughly every 24 hours, at a random time of day. It starts with devices still showing as connected but having no internet access; if I restart the wifi/ethernet adapter on any device it will reconnect but not receive an IP configuration from DHCP. Hostnames on or outside the local network can no longer be resolved, but I can still ping and access local devices by their IP (if they still have one). If I grab my laptop and assign a static IP and DNS server (on either wifi or ethernet) it will actually work. Meanwhile the satellites usually show a purple ring, along with the RBK50.
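For anyone who wants to reproduce the triage I do when it happens, this is roughly the check I run from the laptop: can the OS resolver still resolve anything, and can I still reach local hosts by plain IP? It's only a sketch; the two addresses are placeholders for your own Pi-hole and a known wired host, and the ping flags are the Linux ones.

# Quick outage triage from a client machine: is it DNS/DHCP only, or is the
# link itself dead? The IPs below are placeholders, not my real addresses.
import socket
import subprocess

PIHOLE_IP = "192.168.1.2"   # placeholder: the Pi-hole / DHCP server
NAS_IP = "192.168.1.3"      # placeholder: a known wired local host

def can_resolve(name="google.com"):
    # Resolution via whatever resolver the OS is currently configured with.
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False

def can_ping(ip):
    # Plain ICMP reachability by IP, which works even when DNS is down.
    # -c 1 / -W 2 are Linux iputils flags (one packet, two second timeout).
    return subprocess.run(["ping", "-c", "1", "-W", "2", ip],
                          capture_output=True).returncode == 0

if __name__ == "__main__":
    print("DNS resolution:", "OK" if can_resolve() else "FAILED")
    for ip in (PIHOLE_IP, NAS_IP):
        print(f"ping {ip}:", "OK" if can_ping(ip) else "FAILED")

During an outage this consistently shows DNS failing while pings to local IPs still succeed, which is what points the finger at the DHCP/DNS path rather than the radios.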
My DHCP and DNS server is an instance of Pi-hole running in a Debian jail on my FreeNAS server. I suspected it at first, but I've had Pi-hole running on another server and on a Raspberry Pi for years without issue, right up until the Orbis were introduced.
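To rule the Pi-hole out during an outage, I point a query straight at its IP rather than going through whatever the Orbi is doing in between. A minimal sketch using dnspython (pip install dnspython); the IP is a placeholder for your own Pi-hole address.

# Ask the Pi-hole directly by IP, bypassing the normal resolver path.
import dns.resolver

PIHOLE_IP = "192.168.1.2"   # placeholder for the Pi-hole jail's address

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = [PIHOLE_IP]
resolver.lifetime = 3.0     # fail fast if nothing answers

try:
    answer = resolver.resolve("netgear.com", "A")
    print("Pi-hole answered:", [r.address for r in answer])
except Exception as exc:
    print("No answer from Pi-hole:", exc)

Whenever I've run this from a statically configured machine mid-outage, the Pi-hole answers fine, which is why I no longer think it's the culprit.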
What I did notice is that once an outage starts, my FreeNAS server, which is connected to the RBK50 via ethernet, starts spamming ethernet up/down messages in its logs, the same kind of messages you get from repeatedly disconnecting/reconnecting the cable or disabling/enabling the adapter. So the problem isn't limited to WiFi.
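If anyone wants to line those flaps up against the outage times, this is roughly the tally I run against the FreeNAS log. It's a sketch: /var/log/messages is the standard location, but the exact "link state changed" wording depends on the NIC driver, so adjust the pattern for your box.

# Count link up/down events per minute in the FreeNAS system log so bursts
# can be matched against the outage windows.
import re
from collections import Counter

LOG_FILE = "/var/log/messages"   # standard FreeNAS/FreeBSD system log
PATTERN = re.compile(r"(\S+ +\d+ [\d:]+).*link state changed to (UP|DOWN)")

flaps_per_minute = Counter()
with open(LOG_FILE, errors="replace") as log:
    for line in log:
        match = PATTERN.search(line)
        if match:
            timestamp, state = match.groups()
            # Bucket by minute (drop the seconds) to make bursts obvious.
            flaps_per_minute[timestamp[:-3] + " " + state] += 1

for bucket, count in sorted(flaps_per_minute.items()):
    print(f"{bucket}: {count}")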
The only way to fix it is to press the on/off button on the RBK50, sometimes 2-3 times in a row: it will come back up, the lights go blue, the internet works, and then 30 seconds later it dies again.
I only recently discovered that the RBK50 has a debug menu. I spent an hour last night going through the logs and found a couple of things that don't seem right, at least one of which happened during an outage. I had a browser tab open on the RBK50 debug page and managed to grab the log files before the connection fully went.
I've had this issue since, I think, the official 2.6.1.40 firmware, all the way through 2.7.2.102. For the past 2 or 3 months I've been using Voxel's firmware, which unfortunately did not solve the problem, although it does behave better when it works.
wireless-log1.txt shows these messages almost every minute. If I recall correctly, the higher channels are for the hidden backhaul network, right?
23.12.11.351292 HYDR bandmon ERR : bandmonMBSAHandleRawMediumUtilizationUpdateEvent: Failed to resolve channel information for channel 157
23.12.12.352566 HYDR bandmon ERR : bandmonMBSAHandleRawMediumUtilizationUpdateEvent: Failed to resolve channel information for channel 157
23.12.14.158770 HYDR steermsg info : steermsgRxLoadBalancingComplete: Received load balancing complete from 9C:3D:CF:F8:52:CD, transaction ID [59] steering attempted [0] (mid [48469])
23.12.14.159385 HYDR bandmon info : bandmonMBSAHandleLoadBalancingCompleteEvent: 9C:3D:CF:F8:52:CD did not perform any active steering
23.12.14.493842 HYDR csh ERR : New shell session (3/5) using sd 40
23.12.14.653669 HYDR wlanif debug: wlanifBSteerEventsHandleActivityChange: 8E:85:80:01:F7:56 activity status changes to INACTIVE APId 255 ChanId 8 ESSId 0
23.12.14.806459 HYDR wlanif debug: wlanifBSteerEventsHandleActivityChange: 8E:85:80:01:F7:56 activity status changes to ACTIVE APId 255 ChanId 8 ESSId 0
23.12.15.354934 HYDR bandmon ERR : bandmonMBSAHandleRawMediumUtilizationUpdateEvent: Failed to resolve channel information for channel 157
23.12.15.827807 HYDR wlanif debug: wlanifBSteerEventsHandleActivityChange: B8:8A:EC:22:93:E5 activity status changes to ACTIVE APId 255 ChanId 40 ESSId 0
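Before moving on, a quick sanity check on where channel 157 actually sits, using the standard Wi-Fi channel-to-frequency arithmetic. That it lands in the upper part of the 5 GHz band is just maths; that the Orbi keeps its dedicated backhaul up there is my assumption.

# Standard channel-to-centre-frequency mapping for 2.4 GHz and 5 GHz Wi-Fi.
def channel_to_mhz(channel: int) -> int:
    if 1 <= channel <= 13:
        return 2407 + 5 * channel       # 2.4 GHz, channels 1-13
    if channel == 14:
        return 2484                     # 2.4 GHz, Japan-only channel 14
    if 36 <= channel <= 177:
        return 5000 + 5 * channel       # 5 GHz channels
    raise ValueError(f"unexpected channel {channel}")

for ch in (8, 40, 157):                 # channel numbers seen in wireless-log1.txt
    print(f"channel {ch}: {channel_to_mhz(ch)} MHz")
# channel 157 -> 5785 MHz, i.e. the upper (UNII-3) part of the 5 GHz band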
A more interesting error is one that happened at the time of the outage, again in wireless-log1.txt:
23.17.15.999483 HYDR wlanManager ERR : wlanManager_isAP: ioctl() failed, ifName: eth1.
This gets repeated a LOT in that particular log file. I'm not sure how the router numbers its ethernet ports; I can't assume eth1 is port 1, as internally it might start counting from eth0. The only other time I've seen ioctl errors is when hard drives or USB sticks have gone bad and stopped responding. I really hope this isn't a hardware problem.
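Since I managed to save wireless-log1.txt, here is a rough tally I use to see whether those ioctl errors are constant background noise or spike around the outage. It only assumes the HH.MM.SS.microseconds timestamp format shown in the excerpt above.

# Tally the wlanManager_isAP ioctl failures per minute in the saved log.
import re
from collections import Counter

LOG_FILE = "wireless-log1.txt"   # the file pulled from the Orbi debug page
PATTERN = re.compile(r"^(\d+\.\d+)\.\d+\.\d+ .*wlanManager_isAP: ioctl\(\) failed")

errors_per_minute = Counter()
with open(LOG_FILE, errors="replace") as log:
    for line in log:
        match = PATTERN.match(line)
        if match:
            errors_per_minute[match.group(1)] += 1   # bucket by HH.MM

for minute, count in sorted(errors_per_minute.items()):
    print(f"{minute}: {count} ioctl failures")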
Googling bandmonMBSAHandleRawMediumUtilizationUpdateEvent gets me exactly two results, both on the Netgear forums but relating to the RBR40, and both of those posts go unaddressed.
I hope someone can be bothered to read all of this. Figuring out what is going on has become a bit of a hobby/obsession of mine. Over the months I've tried every combination of settings for channels, daisy-chaining, beamforming, etc., and the problem seems to be deeper than a surface-level configuration issue.