Watchdog crash problem

DocUmibozu · Feb 7, 2020

Hello,
I've been experiencing a strange problem for the last few days.
Every 2-3 days the router's web interface and SSH shell hang.
Logging in to the web interface is impossible. Logging in via SSH is possible, but any command I issue hangs.
The only solution is to power the router off manually, using the switch.
I've inspected the log and everything looks normal until it starts to fill with this error:

Feb 7 15:17:38 RT-AC66U_B1 check_watchdog: [check_watchdog] restart watchdog for no heartbeat
Feb 7 15:17:38 RT-AC66U_B1 rc_service: check_watchdog 234:notify_rc restart_watchdog

This is the only log message before I switch the router off, and it's repeated every minute.
Any ideas?
Thanks to all

dave14305 · Feb 7, 2020

Which firmware version?

DocUmibozu · Feb 7, 2020

dave14305 said:
Which firmware version?

384.14_2, the latest stable one

L&LD · Feb 7, 2020

Was this router ever fully reset to factory defaults after flashing the RMerlin firmware? Without using a saved backup config file to configure it?

If not, please see the link in my signature below to get your router back to a good/known configuration.

I have installed a few RT-AC66U_B1 routers for customers and I haven't seen this issue with any of them.

DocUmibozu · Feb 8, 2020

Yes, flash erase after 384.14 upgrade

DocUmibozu · Feb 8, 2020

Update:

the complete log of the error is this:

Feb 7 16:13:38 RT-AC66U_B1 custom_script: Running /jffs/scripts/service-event (args: restart watchdog)
Feb 7 16:21:38 RT-AC66U_B1 check_watchdog: [check_watchdog] restart watchdog for no heartbeat
Feb 7 16:21:38 RT-AC66U_B1 rc_service: check_watchdog 234:notify_rc restart_watchdog

So it seems that the crash starts from /jffs/scripts/service-event.
This script contains one call to uiscribe and one call to Skynet.

dave14305 · Feb 8, 2020

DocUmibozu said:
Update:

the complete log of the error is this:

Feb 7 16:13:38 RT-AC66U_B1 custom_script: Running /jffs/scripts/service-event (args: restart watchdog)
Feb 7 16:21:38 RT-AC66U_B1 check_watchdog: [check_watchdog] restart watchdog for no heartbeat
Feb 7 16:21:38 RT-AC66U_B1 rc_service: check_watchdog 234:notify_rc restart_watchdog

So it seems that the crash starts from /jffs/scripts/service-event.
This script contains one call to uiscribe and one call to Skynet.

No, the service-event is the last step from the previous restart. It runs when any service is restarted/started/stopped.
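
For anyone who wants to check that ordering in their own log, here is a minimal Python sketch meant to be run on a PC against a downloaded copy of the syslog, not on the router. It assumes the lines look like the excerpts quoted in this thread; the function names `classify` and `watchdog_timeline` are made up for illustration and are not part of the firmware or any tool.

```python
import re

# Assumption: lines look like the excerpts in this thread, e.g.
# "Feb 7 16:21:38 RT-AC66U_B1 check_watchdog: [check_watchdog] restart watchdog for no heartbeat"
TIME_RE = re.compile(r"\b\d{2}:\d{2}:\d{2}\b")


def classify(line):
    """Return a label for watchdog-related lines, or None for anything else."""
    if "restart watchdog for no heartbeat" in line:
        return "no_heartbeat"
    if "notify_rc restart_watchdog" in line:
        return "restart_request"
    if "service-event" in line and "restart watchdog" in line:
        return "service_event"
    return None


def watchdog_timeline(lines):
    """Return (time, label) pairs for watchdog-related lines, in log order."""
    events = []
    for line in lines:
        label = classify(line)
        if label is None:
            continue
        match = TIME_RE.search(line)
        # "?" marks a line with no recognizable HH:MM:SS timestamp.
        events.append((match.group(0) if match else "?", label))
    return events
```

Fed the three lines from the post above, this gives a `service_event` entry at 16:13:38, then `no_heartbeat` and `restart_request` at 16:21:38. That matches the service-event line belonging to the earlier restart rather than triggering the next one.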

DocUmibozu · Feb 8, 2020

Yes, you are right, the sequence seems to be:

1. something goes wrong;
2. watchdog comes into play and starts service-event;
3. the loop continues because the problem in point 1 isn't solved.

I'll do some more investigation.
Thanks to you all, I'll be in touch

saccleo · Feb 26, 2020

DocUmibozu said:
Yes, you are right, the sequence seems to be:

1. something goes wrong;
2. watchdog comes into play and starts service-event;
3. the loop continues because the problem in point 1 isn't solved.

I'll do some more investigation.
Thanks to you all, I'll be in touch

Is there any way to solve the problem? I have run into the same issue.
Same router model and same error log; about every 3 days the web UI and SSH cannot be accessed.
I have upgraded to the newest firmware and started from a blank flash.

DocUmibozu · Feb 26, 2020

Hi,
Go to Administration/System, then under Network Monitoring select Ping instead of DNS Query.
For me this solved the problem. Before, the uptime was 4-6 days; now my router has been running for 14 days and the glitch hasn't shown up again.

ech · Feb 26, 2020

I had this happen yesterday as well - loads of the watchdog restarts getting logged. I was able to run some commands though, and see that any attempt to access /jffs was hanging... so there were a lot of "cp .../tmp/syslog... /jffs" processes running and hung.

reboot didn't work - or not cleanly, and I had to power-cycle to get the device (RT-AC68U running 384.15) running again. It had been up about 10 days.

Also, I don't have either ping or dns network monitoring enabled.

DocUmibozu · Feb 26, 2020

That looks like a jffs corruption problem.
Have you tried formatting jffs again and redoing the configuration from scratch?

saccleo · Feb 26, 2020

DocUmibozu said:
Hi,
Go to Administration/System, then under Network Monitoring select Ping instead of DNS Query.
For me this solved the problem. Before, the uptime was 4-6 days; now my router has been running for 14 days and the glitch hasn't shown up again.

I got some help from official Asus support, who said to change the wireless settings for 2.4 GHz and 5 GHz: use N mode instead of Auto or Legacy.

DocUmibozu · Feb 26, 2020

saccleo said:
I got some help from official Asus support, who said to change the wireless settings for 2.4 GHz and 5 GHz: use N mode instead of Auto or Legacy.

Consider that my router has wifi disabled.... I don't think Asus support gave you a correct answer....

saccleo · Feb 26, 2020

DocUmibozu said:
Consider that my router has wifi disabled.... I don't think Asus support gave you a correct answer....

Maybe. I have checked my config; Ping and DNS Query under Administration/System are both unchecked.
Because I could not log in via SSH when it happened, no more information could be found.

DocUmibozu · Feb 26, 2020

saccleo said:
Because I could not log in via SSH when it happened, no more information could be found.

Yes, neither could I...

saccleo · Mar 1, 2020

ech said:
I had this happen yesterday as well - loads of the watchdog restarts getting logged. I was able to run some commands though, and see that any attempt to access /jffs was hanging... so there were a lot of "cp .../tmp/syslog... /jffs" processes running and hung.

reboot didn't work - or not cleanly, and I had to power-cycle to get the device (RT-AC68U running 384.15) running again. It had been up about 10 days.

Also, I don't have either ping or dns network monitoring enabled.

Which command did you use to find the reason for the crash?
ps or top?

ech · Mar 1, 2020

saccleo said:
Which command did you use to find the reason for the crash?
ps or top?

Didn't see a crash - just a hang. And "ps" showed a huge number of those "cp .../tmp/syslog... /jffs" commands running.
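
If you manage to capture the `ps` output to a file while the router is in this state, a rough way to count those stuck copies on a PC is sketched below in Python. The full command is elided in the post above, so the sketch only looks for lines that contain a standalone `cp` token together with both `/tmp/syslog` and `/jffs`. The function name `count_hung_copies` is hypothetical, and the matching rule is an assumption, not a description of what the firmware runs.

```python
def count_hung_copies(ps_lines):
    """Count ps lines that look like a cp from /tmp/syslog... into /jffs.

    Assumption: each process is on its own line and the command text is
    included in full. Only substrings quoted in the thread are matched.
    """
    count = 0
    for line in ps_lines:
        fields = line.split()
        # Require "cp" as its own whitespace-separated token, so that
        # names merely containing "cp" are not counted.
        if "cp" in fields and "/tmp/syslog" in line and "/jffs" in line:
            count += 1
    return count
```

A count that keeps growing between two captures would fit the picture of copies piling up behind a /jffs that no longer responds, though it does not say why /jffs is hanging.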

I've reformatted /jffs (backed up jffs, selected the reformat on next reboot option, rebooted, then restored /jffs, then rebooted again) and haven't had the problem since... though I had only seen this on this one occasion as well, so will have to see if it happens again or not.

bengalih · Mar 5, 2020

I believe I'm seeing the same problem here. On a RT-AC68U running 384.15.
This morning I was able to log into SSH, but after getting the MOTD/banner I did not get a command prompt.
I was also unable to get the WebUI login.

I downloaded my syslog via plink (that worked) and apart from a bunch of kernel block messages from skynet and what look like normal dropbear logins I have only the following:

Code:

    Line 332: Mar  5 07:53:07 check_watchdog: [check_watchdog] restart watchdog for no heartbeat
    Line 332: Mar  5 07:53:07 check_watchdog: [check_watchdog] restart watchdog for no heartbeat
    Line 333: Mar  5 07:53:07 rc_service: check_watchdog 308:notify_rc restart_watchdog
    Line 848: Mar  5 08:00:07 check_watchdog: [check_watchdog] restart watchdog for no heartbeat
    Line 848: Mar  5 08:00:07 check_watchdog: [check_watchdog] restart watchdog for no heartbeat
    Line 849: Mar  5 08:00:07 rc_service: check_watchdog 308:notify_rc restart_watchdog

My first login attempt this morning was at 7:58, so one of those watchdog checks is prior to my login and one is after. It's worth noting that the syslog only contains data from 7:49 am; I'm not sure if this is due to the number of logged entries from skynet/kernel (total log is about 1100 lines).
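
To see how far apart the "no heartbeat" restarts are, the timestamps can be pulled out of a saved log and compared. The Python sketch below is one way to do that on a PC. It assumes all entries fall on the same day (the date is ignored), tolerates a search-result prefix such as `Line 332:`, and skips consecutive entries with an identical timestamp. The name `heartbeat_gaps` is made up for this example.

```python
import re
from datetime import datetime

TIME_RE = re.compile(r"\b(\d{2}:\d{2}:\d{2})\b")


def heartbeat_gaps(lines):
    """Return the gaps, in seconds, between successive 'no heartbeat' entries.

    Assumption: all entries are from the same day, so only HH:MM:SS is
    compared. A log that crosses midnight would produce a negative gap.
    """
    times = []
    for line in lines:
        if "restart watchdog for no heartbeat" not in line:
            continue
        match = TIME_RE.search(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        # Skip consecutive entries that repeat the same timestamp.
        if not times or stamp != times[-1]:
            times.append(stamp)
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]
```

On the six lines quoted above this returns a single gap of 420 seconds, i.e. the two restarts at 07:53:07 and 08:00:07 are seven minutes apart.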

I issued a service restart_httpd via plink and immediately the web interface login became available. I logged on to the webUI, although I can't say that I actually got the interface because as I waited for it to load I toggled away and also issued a service restart_sshd. When I did that I lost my putty connection (the one where I didn't get the command prompt), and it also appeared that I lost my webui. After that I can no longer access the webUI nor SSH at all, so the restart seemed to not restart but totally kill both services.

My device was fully reset (100% jffs reformatted, nvram cleared, new firmware flashed, etc) last week when I upgraded to .15 so I can't see how that is related. I have been running for years on prior versions and never experienced this before.

That being said, on my latest build I did allow AMTM to do all my commands regarding usb drive formatting and swapfiles, etc where in the past I had done those myself. I also haven't been running Skynet very long, I ran it for maybe a week prior to my latest upgrade and fresh format.
Apart from that and some other minor customization I can't think of anything intrusive enough that should be causing this.

bengalih · Mar 6, 2020

As an additional follow up I rebooted my router today after several days of it being in the above state.
While my WiFi and Internet had continued to work in the state it was in, after the reboot I see that most other functions had failed to operate.
For instance, no traffic was captured for use by Traffic Analyzer.
My scheduled cru/cron file backup job had not run.

So far this has been a one-time occurrence. If it continues to happen I will begin rolling back to some of the configuration settings that I had prior to my last firmware update and full router reset. Specifically I will reformat using EXT3 instead of EXT4, set my swap to 4GB instead of 2GB and eliminate Skynet. While most of these should not have bearing on this issue, those are really the only new variables since my rebuild apart from the actual firmware itself.
