What's new

Scribe [SOLVED] syslog-ng intermittent failure to shutdown which messes all logging

  • SNBForums Code of Conduct

    SNBForums is a community for everyone, no matter what their level of experience.

    Please be tolerant and patient of others, especially newcomers. We are all here to share and learn!

    The rules are simple: Be patient, be nice, be helpful or be gone!

Tarek Yag

Regular Contributor
Hi!

Many thanks to @cmkelley for this great addon that I heavily rely on!

I've been diagnosing this issue for a while now trying to troubleshoot as much as I can to provide as much details as possible. I've seen similar issues in other threads that I could relate to, but I need some more assistance to decide what is the best fix for my issue. It might get useful again for the whole community if it turns out that we need a global config check built into the original script.

I had my DSL-AC68U previously running on firmware 386.12_0-gnuton0_beta1 (latest beta back then) setup with the latest scribe 3.2.0 and syslog-ng 4.7.1, and it was running smoothly without any issues at all.
A couple of months ago, I setup my DSL-AC68U again from scratch but this time with firmware 386.12_0-gnuton1 (latest release right now) with a slightly different setup. Additionally, I added my RT-AC68U again to my home network, setup with the latest official Merlin 386.14. Both my routers right now have the latest version of all addons, scripts, and packages (amtm 4.9.2, scribe 3.2.0, syslog-ng 4.7.1), and they both have the exact same issue!

Now, straight to the point. At the beginning, I noticed some intermittent loss of logging, and as I have my own reboot and shutdown scripts that does my own stuff, I noticed that a lot of times syslog-ng fails to shutdown properly as a service using command "/opt/etc/init.d/S01syslog-ng stop", which might very likely mess logging on the next boot up, so I have to make another reboot to sort things out again!

So, I started by trying to restart the service manually outside my own scripts using the command "/opt/etc/init.d/S01syslog-ng restart", and the same issue happens, syslog-ng frequently fails to shutdown, then it keeps consuming 49.9% of the router's CPU until I kill it using the service kill command "/opt/etc/init.d/S01syslog-ng kill". Again, it messes logging most of the times and I have to reboot the router.

Next, I uninstalled scribe completely, then installed it again as normal, knowing that all scribe status checks are ok before and after the re-installation process, but the problem persisted with syslog-ng frequently not shutting down gracefully. And to make sure everything is done right, I did all of this more than four times!

Next, I edited syslog-ng.conf (at /opt/etc) to exclude (comment out) all logging sources at "source src", and started re-adding them one by one to try to figure out which source is possibly causing the problem, and to my surprise, two sources were causing the issue, enabling any one of them triggers the issue, these two are:
unix-dgram("/dev/log" so_rcvbuf(65536) flags(syslog-protocol));
internal();

Without these two sources, I can restart syslog-ng as many times as I want without it freezing and me having to kill the service to free the CPU and restart syslog-ng, and of course, logging works normally but missing logs from these two sources for sure!

Please also note that I debugged the service script itself by uncommenting "ARGS="-v" in "rc.func.syslog-ng" but got no useful information. I also ran the service manually using command "syslog-ng -Frevd" but again, no useful information when the script freezes in full verbose mode! And most importantly, I never found klogd or logger running when syslog-ng fails to shutdown. I find logger running after a reboot only when syslog-ng fails to start after a failed shutdown.

I've read in other related posts that /jffs/syslog.log must be a symlink in a normal setup, but mine is not! I guess this should be related to my issue. I've attached all useful screenshots for anyone willing to help, plus all key folder and file listings.

Note: I tried hard reboot (unplugging the router's AC adapter) many times too and in all test scenarios, but syslog-ng still frequently didn't start at boot up, and had logger run instead.

Note: I tried running syslog-ng with no log filters, but had the same issue too!


Thanks!


Code:
# l /tmp

lrwxrwxrwx    1 atr      root            21 Oct 22 04:26 syslog.log -> /opt/var/log/messages
-rw-rw-rw-    1 atr      root            24 Oct 22 04:26 syslog.log-1

Code:
# l /jffs

drwxrwxrwx    2 atr      root             0 Oct 22 04:26 syslog.log/
drwxrwxrwx    2 atr      root             0 Oct 22 04:26 syslog.log-1/

Code:
# l /opt/var/log

-rw-------    1 atr      root          1256 Oct 22 04:26 messages
 

Attachments

  • 2024-10-22_04-24-01.png
    2024-10-22_04-24-01.png
    65.7 KB · Views: 13
  • 2024-10-22_04-24-44.png
    2024-10-22_04-24-44.png
    140.7 KB · Views: 13
  • 2024-10-22_04-27-12.png
    2024-10-22_04-27-12.png
    143 KB · Views: 13
Last edited:
I'm away from my routers so it will take me a bit to unpack all this, but I suspect at root the problem is that scribe isn't set up for syslog-ng to be started/stopped through rc, and your custom start/stop scripts are adding to that. When scribe starts or restarts it kills the daemons and starts syslog-ng; when it stops it needs to stop syslog-ng and restart the daemons, and I don't think the .func helper does that. Those two sources are the ports the daemons serve to syslog, which I think is why you see them in the mix. But I'm not sure.
 
I'm away from my routers so it will take me a bit to unpack all this, but I suspect at root the problem is that scribe isn't set up for syslog-ng to be started/stopped through rc, and your custom start/stop scripts are adding to that. When scribe starts or restarts it kills the daemons and starts syslog-ng; when it stops it needs to stop syslog-ng and restart the daemons, and I don't think the .func helper does that. Those two sources are the ports the daemons serve to syslog, which I think is why you see them in the mix. But I'm not sure.
Yes, you are probably on the right track — there is an option (rs.) within the scribe interface to restart syslog-ng:
Code:
                            _
                         _ ( )          
       ___    ___  _ __ (_)| |_      __  
     /',__) /'___)( '__)| || '_`\  /'__`\
     \__, \( (___ | |   | || |_) )(  ___/
     (____/`\____)(_)   (_)(_,__/'`\____)
     syslog-ng and logrotate installation
     v3.2_1 (master)    Coded by cmkelley

 =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

     s.    Show scribe status
     rl.   Reload syslog-ng.conf
     lr.   Run logrotate now
     rs.   Restart syslog-ng
     st.   Stop syslog-ng & logrotate cron

     u.    Update scribe
     uf.   Update filters
     su.   scribe utilities
     e.    Exit scribe

     is.   Reinstall scribe
     zs.   Remove scribe

 =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

 Please select an option:
 
First of all, thank you very much for your interest in helping me to troubleshoot and solve this out.
I'm Sorry to interrupt you both guys, but I can confirm that all rc functions get called both on normal boot, reboot, and shutdown in Merlin firmware by design as it's normally supposed to be.

And to make it more clear, it is documented that all rc functions get called in numerical order from lowest to highest at boot time and vice versa when shutting down for reboot. Moreover, please note again that the failure to shutdown and the whole issue happens with or without my custom scripts, and the whole thing gets messed up frequently as described earlier.

All the functions you talked about (start, stop, restart, ..) for syslog-ng specifically are done through "rc.func.syslog-ng" which is written specifically for scribe by the script author @cmkelley

More evidence about the usage of rc functions, in the jffs scripts folder, script "services-stop" calls the following command to stop services explicitly:
Code:
/opt/etc/init.d/rc.unslung stop # Added by amtm
And vice versa, the start functions are called while mounting Entware from "post-mount"

As a side note now, as they should become out of our discussion, my custom scripts just call the jffs "services-stop" script explicitly, exactly as it's done in normal reboot using "service reboot" command, which is totally official. I'm not executing any explicit commands with rc functions.

Again, everything was working totally fine on a previous router setup with an older firmware. So, I'm totally positive it's an issue with opening or closing the log sockets, as I can see after executing a "fuser" or "lsof" command when syslog-ng freezes on shutdown

Are you interested in the output of fuser and lsof commands?
But their results are unreliable because they give slightly different output each time when syslog-ng is freezing
 
Last edited:
I haven't tried it your way because I don't want to put my routers in an unstable state, so the following is only a suspicion.

I think the problem might be your use of services-stop instead of service-event stop, so you are missing all this in the scribe script:
Code:
stop_sng(){
    printf "$white stopping %s ..." "$sng"
    $S01sng_init stop
    # remove any syslog links
    clear_loglocs
    mv "$optmsg" "$syslog_loc"
    ln -s "$syslog_loc" "$optmsg"
    printf "$white starting system klogd and syslogd ..."
    start_syslogd
    if ! $banner; then return; fi
    printf "\n$white %s will be started at next reboot; you\n" "$sng"
    printf " may type '%s restart' at shell prompt, or\n" "$script_name"
    printf " select rs from %s menu to restart %s $std\n\n" "$script_name" "$sng"
}

stop_lr(){ if cru l | grep -q $lr; then cru d $lr; fi; }

menu_stop(){
    stop_sng
    stop_lr
}

...

start_syslogd(){
    service start_logger
    count=30
    while ! sld_rng && [ $count -gt 0 ]
    do
        sleep 1 # give syslogd time to start up
        count=$(( count - 1 ))
    done
    if [ $count -eq 0 ]; then printf "\n$red UNABLE TO START SYSLOGD!  ABORTING!\n$std"; exit 1; fi
}

More specifically, service-event stop calls /jffs/scripts/scribe service_event "$@" & # added by scribe
and services-stop doesn't. All of the above is in scribe, not rc. So you are just killing syslog-ng without starting syslogd. That means the sockets aren't being serviced by anything and all your logging stops. If you call scribe first, it stops syslog-ng, starts syslogd and then exits, so then the normal rc function goes to work. Neither S01syslog-ng nor the helper script does that in the normal rc function. And that is all you get with services-stop. I'm not sure any of the other scripts fool that deeply with asuswrt, but note that your way skips whatever the other scripts lodged in service-event do to handle a shutdown/stop/start/restart.

I think the reason it worked before and not now might have something to do with changes in amtm, but I'm not sure. Also, I'm on 388.8_2 so I may be way off base looking at my routers.
 
Last edited:
I haven't tried it your way because I don't want to put my routers in an unstable state
I highly appreciate your efforts to help in debugging the issue with me, and I do understand the limits of this.

your use of services-stop instead of service-event stop, so you are missing all this in the scribe script
I totally agree with you, but basically there's no aim to restart the system's logger, as we're not trying to use it after stopping or restarting syslog-ng. I can confirm again, as I'm still experiencing the issue, the whole issue happens with normal reboot too (even without executing my scripts during operation at all). And yet, in a separate scenario, syslog-ng sometimes doesn't start on boot up, instead, logger starts when the former fails.

I think the reason it worked before and not now might have something to do with changes in amtm, but I'm not sure.
I'm sorry, both were running on amtm 4.9.1, there was no version difference, only the firmware version changed between the two setups.

Could you please assist me with the best way to decide on which single resource failed when the error happens? I need to pin-point exactly which socket or file froze first which caused syslog-ng to freeze. And would you point me to the default /jffs/syslog.log file attributes and how to restore them?

I've read in other related posts that /jffs/syslog.log must be a symlink in a normal setup, but mine is not! I guess this should be related to my issue.
Next, I edited syslog-ng.conf (at /opt/etc) to exclude (comment out) all logging sources at "source src", and started re-adding them one by one to try to figure out which source is possibly causing the problem, and to my surprise, two sources were causing the issue, enabling any one of them triggers the issue, these two are:
 
I'd try a couple of things.

1. Uninstall scribe entirely and see if the same problem is happening. Go back to the beta firmware and see if the same problem happens on the DSL version.
2. Go into scribe and edit all the update config calls (I think there are four) and comment them out. Then edit syslog-ng.conf to go back to version 3.6, and see if the same thing happens. There is a lot of development activity in syslog-ng, none of it interesting for us, but maybe running it in an old compatibility setting would be illuminnating.
3. Somewhat tiresome, but pepper the scripts with logger statements to see if anything is hanging. service-event is a blocking script so scribe has some wait states to exit, and maybe there is a timing or race condition going on. Unfortunately rc is binary so you can't see what happens when scribe calls service start_logger. (logger here meaning starting the syslogd daemon and not the logger function.)

I think for /jffs/syslog.log you just delete the link; the attributes would be 666 like syslog.log-1.
 
Thank you for the recommendations.

Uninstall scribe entirely and see if the same problem is happening
I already did while troubleshooting before starting this thread.

Go back to the beta firmware and see if the same problem happens on the DSL version.
This exactly will be my last resort for troubleshooting, as I will have to interrupt internet availability at home.

Go into scribe and edit all the update config calls (I think there are four) and comment them out
What are the update config calls?

I'll continue troubleshooting at my earliest free time. Thank you!
 
What are the update config calls?
Scribe has a function called sync_conf()that it uses to update syslog-ng.conf to whatever the installed version of syslog-ng is. You need to comment out each time the function is called in order for syslog-ng to start in a compatibility mode for an earlier version, like 3.6 instead of 4.7, otherwise scribe will on the fly update the 3.6 to 4.7 and you can't test whether the treatement by syslog-ng of one of your sources has changed.
 
Finally! I found the ultimate solution without messing my two routers. I will write it down here if someone faces the same issue any time in the future, especially for owners of old models.

Last time I tried this I had to re-setup my RT-AC68U router from scratch as it destroyed Entware setup, hence the router's whole setup. This time I made a full backup, then experimented a few things to find my way through the best solution without messing anything.

So, since I was skeptical since the very beginning that the whole issue is caused by some update that got released. I focused my troubleshooting for any version differences between the time of my first setup with the beta firmware, and the time of the second setup with latest versions.

So today, I tried reinstalling all Entware packages again. The first time I tried this a couple of weeks ago after disabling backport package updates, it wasn't able to reinstall most of the packages and the whole setup got destroyed immediately. So, this time I tried again and got the exact same result!

I believe there's some issue that makes Entware always try to reinstall the same versions that were installed from the backport repo even while backport repo is disabled, and packages list updated many times. It never tries to use the main repo package versions, which makes the reinstallation fail, stating that packages were not found in any repo.

Next, I faced the problem that the backport updates option is enabled by default when AMTM installs Entware. To workaround this, I installed Entware then disabled backport updates. Then I made a list of installed package names (excluding opkg and entware-*), removed them manually, then installed them again. And YES! Entware is still working perfectly!

Now, I restored the full backup, made a list of all installed package names again (excluding opkg and entware-*), removed them, then installed them again from the main repo only, totally avoiding any packages from the backport repo. And YES! Everything is working super fine, and the syslog-ng issue vanished away!

Eventually, applied the same fix to my other router and everything went smooth again!

You can find all the steps of the fix in the following posts.
 
Last edited:
So, long story short, I suspected that it was a binary issue since the beginning, because it was the second major change after the firmware version between my two setup times. It seems that the issue got introduced with the latest (and maybe last) Entware backport packages update for routers with ARM kernel v2.6

@thelonelycoder, as these models are becoming more obsolete, and some updates are already breaking some of their functionalities (there are other examples than this issue), would you please consider making the backport repo option selectable with a prompt right before installing Entware through your AMTM project? At least for AC routers, as I had to do a hard workaround to bypass the automatic enablement of backport updates.

After having a few issues in my last setup alone, I very much now prefer keeping my router setup with the last original package versions available in the main repo and never download updates.
 
Perhaps you should consider changing the title of this thread and marking it [SOLVED] since it doesn't appear to be a scribe issue.
 
Fix Steps:

If you're experiencing issues downgrading your packages from versions available in the Entware backports repo to lower versions available in the main repo. The following commands are used to force downgrade all of your packages to latest versions available in the main Entware repo.

Make sure you make a full backup of your router, and keep it in a safe place, before trying this fix!
Use it at your own risk!

1. Through AMTM menu, open "Entware packages" menu and make sure to turn off option number 6:
Parallel use Entware-backports Repo
2. Use the following command to produce a full list of your installed packages (excluding opkg and entware-*) in a format (space separated names) ready to be used with next commands
Code:
opkg status | grep 'Package: ' | sed 's/Package: //' | grep -vw 'opkg' | grep -v 'entware-' | echo $(tr '\n' ' ')
3. Use the following command, replacing "[PACKAGES]" with the output of the previous command, to remove all installed packages
Code:
opkg --force-depends remove [PACKAGES]
4. Finally, use the following command, replacing "[PACKAGES]" too, to install all removed packages again
Code:
opkg install [PACKAGES]
 
Last edited:

Latest threads

Support SNBForums w/ Amazon

If you'd like to support SNBForums, just use this link and buy anything on Amazon. Thanks!

Sign Up For SNBForums Daily Digest

Get an update of what's new every day delivered to your mailbox. Sign up here!
Top