aex.perez
Senior Member
For the last few days my AX88U on 388.4 was acting weird, and weirder every day. It was changing channels on 5GHz WiFi way too often to be normal, the 802.3ad (link aggregation) setting was disabling my NAS, and 2.4GHz devices were dropping off and could no longer establish a connection. When it got worse I finally had to take action to "fix" things, thinking it would be easy.
It probably would've been easier and faster to start from scratch: do a full reset, recover, reinstall the scripts, reboot and re-establish the mesh (AX86 nodes), resetting the mesh nodes along the way. But I thought I had been hacked, and I was curious how that could have happened with no services exposed to the Internet.
Then, on further inspection, scribe, Skynet and spdMerlin had all stopped working, or would randomly stop performing their function. For example, Skynet stopped logging, even via the script's debug options. Scribe would report that its configuration file was formatted wrong and fail to run. spdMerlin wouldn't produce results when running a speedtest.
Then I noticed the 5GHz channel changing from the default I had configured and went into the configuration via the GUI. I could no longer pick the channel: only auto or a limited number of channels were available in the GUI, none that enabled 160MHz bandwidth. Certain channel numbers ending with an "I" also showed up and I was able to select them. If I picked 80MHz, more channels would appear in the GUI, but still not all of those for the US region. Even 20/40/80/160MHz didn't help.
I immediately went to the command line via ssh and ran htop on the AX88 and the AX86 nodes, looking for any services running that shouldn't be, or abnormal CPU/RAM utilization. There were none, and everything running was within the norms for my setup. Then I listed out NVRAM (nvram getall) on all devices to make sure they were all set for the US. They were, and I even matched them against copies I had stored previously. All good.
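For anyone wanting to repeat the NVRAM comparison, here is a minimal sketch of how two saved dumps could be diffed. It assumes each dump is plain text with one key=value pair per line; the function names are my own, and the keys in the example (wl1_country_code, wl1_chanspec) are only illustrative. It does not handle values that span multiple lines.

```python
def parse_nvram_dump(text):
    """Parse 'key=value' lines from a saved NVRAM dump into a dict."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and any summary lines without '='
        key, _, value = line.partition("=")  # split on the first '=' only
        settings[key.strip()] = value
    return settings


def diff_nvram(old_text, new_text):
    """Return (added, removed, changed) settings between two dumps."""
    old = parse_nvram_dump(old_text)
    new = parse_nvram_dump(new_text)
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed


if __name__ == "__main__":
    # Illustrative data only, not taken from a real router.
    before = "wl1_country_code=US\nwl1_chanspec=36/160\n"
    after = "wl1_country_code=US\nwl1_chanspec=0\n"
    added, removed, changed = diff_nvram(before, after)
    for key, (old_value, new_value) in sorted(changed.items()):
        print(f"{key}: {old_value!r} -> {new_value!r}")
```

Diffing by key rather than line by line means the order of the entries in the two dumps doesn't matter.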
At that point I used BACKUPMON to run a restore for the router from a few weeks back. I uninstalled the scripts I referenced, then installed them fresh. After the scribe install none of the default filters came back or got installed, but luckily the backup had a copy of them. Got those recovered eventually.
Did the restore, then rebooted a few times to get the scripts installed and the router and nodes up on the restored config. Changed the password on the router to be safe. Verified no open services were exposed to the internet. Ran a GRC ShieldsUP! scan (amongst others) to verify nothing was open that should not have been. Also ran a couple of malware tools on the PC, which came up clean. Along the way I compared the results to how things were before everything started acting strange.
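As a rough complement to the external scans, a quick TCP connect check can show which ports answer on a given address. This is only a sketch: the function name and the example address and port list are my own assumptions, and a check run from inside the LAN says nothing about what is reachable from the internet, so it doesn't replace an outside scan.

```python
import socket


def open_tcp_ports(host, ports, timeout=1.0):
    """Return the ports from `ports` that accept a TCP connection on `host`."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports


if __name__ == "__main__":
    # Hypothetical router address and a few common service ports.
    print(open_tcp_ports("192.168.1.1", [22, 23, 80, 443, 8443]))
```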
Once the router was booted, I was no longer getting errors about the NAS losing connectivity, and devices on 2.4GHz were no longer dropping off (and none showing as connected but unreachable, as they were before). 5GHz has been stable on its channel at 160MHz bandwidth for 32 hours so far, where I was getting a channel change every 5 minutes during all this, after being stable for months on end before that. The scripts are all running and the scribe filters are restored. So far, the last 12 hours have been uneventful, so back to stability.
Some of this reminded me of the AiCloud issues recently reported, but again: no open services on the router exposed to the internet, no changes in the amount of data uploaded/downloaded, nothing running that shouldn't be on either the router or the nodes, and CPU utilization within my expected norms on the router and nodes. I never had AiCloud (or anything else) running either. Region was still set to US. Also, 388.4 is installed on the router and nodes, and the Trend Micro signatures were current on both.
Normally, at this point I'd be pointing the finger at hardware. But after restoring the configs, reinstalling the scripts from scratch, recovering the scribe filters, changing the password and rebooting, everything is back to the way it was.
I'm left wondering if corrupted device RAM or the USB/SSDs could've caused these anomalies. But I would expect to still have problems if it was hardware related, and I'm no longer seeing any. Also note that the router is on a UPS and on filtered power, in case anyone thinks a power spike might've had an influence.
Very happy that it's all working again. Nothing suspicious was found running or taking up CPU, and there were no abnormal uploads or bandwidth consumption during all of this or after. But still my doubts remain.
Nothing like looking a gift horse in the mouth as they say, but curiosity is eating away at me as to a possible cause.
I tried to be thorough.
So what else should I be looking at, or have looked at, to find a root cause?