What's new

Scribe syslog-ng intermittent failure to shutdown which messes all logging

  • SNBForums Code of Conduct

    SNBForums is a community for everyone, no matter what their level of experience.

    Please be tolerant and patient of others, especially newcomers. We are all here to share and learn!

    The rules are simple: Be patient, be nice, be helpful or be gone!

Tarek Yag

Occasional Visitor
Hi!

Many thanks to @cmkelley for this great addon that I heavily rely on!

I've been diagnosing this issue for a while now trying to troubleshoot as much as I can to provide as much details as possible. I've seen similar issues in other threads that I could relate to, but I need some more assistance to decide what is the best fix for my issue. It might get useful again for the whole community if it turns out that we need a global config check built into the original script.

I had my DSL-AC68U previously running on firmware 386.12_0-gnuton0_beta1 (latest beta back then) setup with the latest scribe 3.2.0 and syslog-ng 4.7.1, and it was running smoothly without any issues at all.
A couple of months ago, I setup my DSL-AC68U again from scratch but this time with firmware 386.12_0-gnuton1 (latest release right now) with a slightly different setup. Additionally, I added my RT-AC68U again to my home network, setup with the latest official Merlin 386.14. Both my routers right now have the latest version of all addons, scripts, and packages (amtm 4.9.2, scribe 3.2.0, syslog-ng 4.7.1), and they both have the exact same issue!

Now, straight to the point. At the beginning, I noticed some intermittent loss of logging, and as I have my own reboot and shutdown scripts that does my own stuff, I noticed that a lot of times syslog-ng fails to shutdown properly as a service using command "/opt/etc/init.d/S01syslog-ng stop", which might very likely mess logging on the next boot up, so I have to make another reboot to sort things out again!

So, I started by trying to restart the service manually outside my own scripts using the command "/opt/etc/init.d/S01syslog-ng restart", and the same issue happens, syslog-ng frequently fails to shutdown, then it keeps consuming 49.9% of the router's CPU until I kill it using the service kill command "/opt/etc/init.d/S01syslog-ng kill". Again, it messes logging most of the times and I have to reboot the router.

Next, I uninstalled scribe completely, then installed it again as normal, knowing that all scribe status checks are ok before and after the re-installation process, but the problem persisted with syslog-ng frequently not shutting down gracefully. And to make sure everything is done right, I did all of this more than four times!

Next, I edited syslog-ng.conf (at /opt/etc) to exclude (comment out) all logging sources at "source src", and started re-adding them one by one to try to figure out which source is possibly causing the problem, and to my surprise, two sources were causing the issue, enabling any one of them triggers the issue, these two are:
unix-dgram("/dev/log" so_rcvbuf(65536) flags(syslog-protocol));
internal();

Without these two sources, I can restart syslog-ng as many times as I want without it freezing and me having to kill the service to free the CPU and restart syslog-ng, and of course, logging works normally but missing logs from these two sources for sure!

Please also note that I debugged the service script itself by uncommenting "ARGS="-v" in "rc.func.syslog-ng" but got no useful information. I also ran the service manually using command "syslog-ng -Frevd" but again, no useful information when the script freezes in full verbose mode! And most importantly, I never found klogd or logger running when syslog-ng fails to shutdown. I find logger running after a reboot only when syslog-ng fails to start after a failed shutdown.

I've read in other related posts that /jffs/syslog.log must be a symlink in a normal setup, but mine is not! I guess this should be related to my issue. I've attached all useful screenshots for anyone willing to help, plus all key folder and file listings.

Note: I tried hard reboot (unplugging the router's AC adapter) many times too and in all test scenarios, but syslog-ng still frequently didn't start at boot up, and had logger run instead.

Note: I tried running syslog-ng with no log filters, but had the same issue too!


Thanks!


Code:
# l /tmp

lrwxrwxrwx    1 atr      root            21 Oct 22 04:26 syslog.log -> /opt/var/log/messages
-rw-rw-rw-    1 atr      root            24 Oct 22 04:26 syslog.log-1

Code:
# l /jffs

drwxrwxrwx    2 atr      root             0 Oct 22 04:26 syslog.log/
drwxrwxrwx    2 atr      root             0 Oct 22 04:26 syslog.log-1/

Code:
# l /opt/var/log

-rw-------    1 atr      root          1256 Oct 22 04:26 messages
 

Attachments

  • 2024-10-22_04-24-01.png
    2024-10-22_04-24-01.png
    65.7 KB · Views: 9
  • 2024-10-22_04-24-44.png
    2024-10-22_04-24-44.png
    140.7 KB · Views: 9
  • 2024-10-22_04-27-12.png
    2024-10-22_04-27-12.png
    143 KB · Views: 9
Last edited:
I'm away from my routers so it will take me a bit to unpack all this, but I suspect at root the problem is that scribe isn't set up for syslog-ng to be started/stopped through rc, and your custom start/stop scripts are adding to that. When scribe starts or restarts it kills the daemons and starts syslog-ng; when it stops it needs to stop syslog-ng and restart the daemons, and I don't think the .func helper does that. Those two sources are the ports the daemons serve to syslog, which I think is why you see them in the mix. But I'm not sure.
 
I'm away from my routers so it will take me a bit to unpack all this, but I suspect at root the problem is that scribe isn't set up for syslog-ng to be started/stopped through rc, and your custom start/stop scripts are adding to that. When scribe starts or restarts it kills the daemons and starts syslog-ng; when it stops it needs to stop syslog-ng and restart the daemons, and I don't think the .func helper does that. Those two sources are the ports the daemons serve to syslog, which I think is why you see them in the mix. But I'm not sure.
Yes, you are probably on the right track — there is an option (rs.) within the scribe interface to restart syslog-ng:
Code:
                            _
                         _ ( )          
       ___    ___  _ __ (_)| |_      __  
     /',__) /'___)( '__)| || '_`\  /'__`\
     \__, \( (___ | |   | || |_) )(  ___/
     (____/`\____)(_)   (_)(_,__/'`\____)
     syslog-ng and logrotate installation
     v3.2_1 (master)    Coded by cmkelley

 =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

     s.    Show scribe status
     rl.   Reload syslog-ng.conf
     lr.   Run logrotate now
     rs.   Restart syslog-ng
     st.   Stop syslog-ng & logrotate cron

     u.    Update scribe
     uf.   Update filters
     su.   scribe utilities
     e.    Exit scribe

     is.   Reinstall scribe
     zs.   Remove scribe

 =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

 Please select an option:
 
First of all, thank you very much for your interest in helping me to troubleshoot and solve this out.
I'm Sorry to interrupt you both guys, but I can confirm that all rc functions get called both on normal boot, reboot, and shutdown in Merlin firmware by design as it's normally supposed to be.

And to make it more clear, it is documented that all rc functions get called in numerical order from lowest to highest at boot time and vice versa when shutting down for reboot. Moreover, please note again that the failure to shutdown and the whole issue happens with or without my custom scripts, and the whole thing gets messed up frequently as described earlier.

All the functions you talked about (start, stop, restart, ..) for syslog-ng specifically are done through "rc.func.syslog-ng" which is written specifically for scribe by the script author @cmkelley

More evidence about the usage of rc functions, in the jffs scripts folder, script "services-stop" calls the following command to stop services explicitly:
Code:
/opt/etc/init.d/rc.unslung stop # Added by amtm
And vice versa, the start functions are called while mounting Entware from "post-mount"

As a side note now, as they should become out of our discussion, my custom scripts just call the jffs "services-stop" script explicitly, exactly as it's done in normal reboot using "service reboot" command, which is totally official. I'm not executing any explicit commands with rc functions.

Again, everything was working totally fine on a previous router setup with an older firmware. So, I'm totally positive it's an issue with opening or closing the log sockets, as I can see after executing a "fuser" or "lsof" command when syslog-ng freezes on shutdown

Are you interested in the output of fuser and lsof commands?
But their results are unreliable because they give slightly different output each time when syslog-ng is freezing
 
Last edited:

Latest threads

Support SNBForums w/ Amazon

If you'd like to support SNBForums, just use this link and buy anything on Amazon. Thanks!

Sign Up For SNBForums Daily Digest

Get an update of what's new every day delivered to your mailbox. Sign up here!
Top