What's new

Scribe scribe - syslog-ng and logrotate installer

  • SNBForums Code of Conduct

    SNBForums is a community for everyone, no matter what their level of experience.

    Please be tolerant and patient of others, especially newcomers. We are all here to share and learn!

    The rules are simple: Be patient, be nice, be helpful or be gone!

I've only been on A4 for 10 hours, running the local NTP server. I haven't had syslog-ng restarting, and no blank kernel lines.
Also, try
Code:
filter f_empty { match("^ *$"); };
 
I've only been on A4 for 10 hours, running the local NTP server. I haven't had syslog-ng restarting, and no blank kernel lines.
Also, try
Code:
filter f_empty { match("^ *$"); };
If 384.11 solves the blank kernel lines then I'm probably going to not bother trying to strip the blank lines. It seems too risky. I'll give this solution a try though on 384.10 so that if it reappears we have a known solution.
 
@cmkelley, you might think about this. With 11A4, it seems to take a long time before syslog-ng starts. Stubby now loads before, skynet is running, etc. The openservers are up, etc. So there are a lot of messages before syslog-ng starts that would be filtered if syslog-ng were running, particularly the BLOCKED, DROP IN, and so forth. I was noodling on how to read the startup syslog.log through syslog-ng.

So I did this:

1. I added into syslog-ng conf a new source: source s_startup {file("/opt/var/log/startupmessages" ) ; };
2. I added source(s_startup) to the message log block, so that file would end up in messages, if not dealt with before.
3. I added the same source to my openvpn config file, skynet config file, and pixelserv config file(not really necessary, since pixelserv starts after syslog-ng).
4. Then I changed rc.func.syslog-ng like so:
Code:
 # link to messages if not already a symlink
    if [ ! -L "/tmp/syslog.log" ]
    then
        #/tmp/syslog.log is running and we don't want to read the startup messages from the prior reboot
    rm -f /opt/var/log/startupmessages
        #cat the syslog to startupmessages so they will be read in by syslog-ng
    cat /tmp/syslog.log > /opt/var/log/startupmessages
        rm -f /tmp/syslog.log /tmp/syslog.log-1
        ln -s /opt/var/log/messages /tmp/syslog.log
    fi

syslog-ng kicks off an error message every millisecond or so if a source file doesn't exist, so I don't delete the startupmessages until it runs through the cycle again.

I'm testing it now. Seems to be working.
 
Last edited:
I've only been on A4 for 10 hours, running the local NTP server. I haven't had syslog-ng restarting, and no blank kernel lines.
Also, try
Code:
filter f_empty { match("^ *$"); };
I'm still on 10_2 where would I insert that line, syslog-ng.conf or in a new syslog-ng "helper" file or filter file in syslog-ng.d?
 
I'm still on 10_2 where would I insert that line, syslog-ng.conf or in a new syslog-ng "helper" file or filter file in syslog-ng.d?
You'd have to create a filter file in syslog-ng.d. But it would need a destination and log statement also, not just simply the one line.
 
I'm still on 10_2 where would I insert that line, syslog-ng.conf or in a new syslog-ng "helper" file or filter file in syslog-ng.d?
I was just making a suggestion for how to trap a blank line. Syslog-ng is supposed to drop them anyway, so this is a filter that traps for two or more spaces. I think @cmkelley is right to let 11 settle first before we see what might need to be done.
 
I this a clue to the blank kernel lines in the webGUI syslog? The blank lines stopped for just over 30 minutes, with these showing instead with 4 - 8 per second. I trimmed this back to just three messages or cloudflare blocks my post. :rolleyes:
Code:
Apr 25 18:25:43 RT-AC86U-4608 kernel: net_ratelimit: 1547 callbacks suppressed
Apr 25 18:25:43 RT-AC86U-4608 kernel: protocol 0800 is buggy, dev eth0
Apr 25 18:25:53 RT-AC86U-4608 kernel: device eth0 left promiscuous mode
These stopped and blank kernel lines are back.
 
I this a clue to the blank kernel lines in the webGUI syslog? The blank lines stopped for just over 30 minutes, with these showing instead with 4 - 8 per second. I trimmed this back to just three messages or cloudflare blocks my post. :rolleyes:
Code:
Apr 25 18:25:43 RT-AC86U-4608 kernel: net_ratelimit: 1547 callbacks suppressed
Apr 25 18:25:43 RT-AC86U-4608 kernel: protocol 0800 is buggy, dev eth0
Apr 25 18:25:53 RT-AC86U-4608 kernel: device eth0 left promiscuous mode
These stopped and blank kernel lines are back.
Seems unlikely to be coincidence, but it could be a third process affecting both of them. I wouldn't know how to track that down.
 
If 384.11 solves the blank kernel lines then I'm probably going to not bother trying to strip the blank lines. It seems too risky. I'll give this solution a try though on 384.10 so that if it reappears we have a known solution.
Well 11_A4 does not remove them, but I see no issues with scribe after 2 1/2 hours. I did not dig deeply into the delayed start of syslog-ng, but I see no effects of it looking through all the logs, including the original syslog until syslog-ng loads. (shrug)
 
On my 87U with 11A4, syslog-ng starts 1 min 14 seconds after syslogd had started and 2 seconds after it exits. On 10, syslog-ng started at the 30 second mark. In that time the router generates about 500 messages. So I kind of like feeding the starting syslog into syslog-ng to scrub those. It takes a few seconds for that to happen, so in that time 5 or 6 new log messages are mixed in to the earlier messages.

The net_ratelimit indicates the kernel suppressed 1547 log messages because there were too many coming in too short a time. The second message could be something trying to pull an IP address and failing, but something about network traffic is spamming the kernel log.
 
Last edited:
11b1 seems to work fine with scribe.
 
v0.9_3 is out.

If the webui causes a restart_time event, then klogd and syslogd are restarted individually rather than through a restart_logger call. Also, something isn't letting go of the messages file, so I've added an ugly hack to force the messages inode to change. I think this isn't a 100% fix yet, but the messages file at least seems to work oaky.

Also of note, as of 384.11B1, when the webui calls restart_time, service-event-end is never started. However if restart_time is called on the command line, service-event-end is called. I have no idea why that is, and yes, I reported it RMerlin, so you don't need to.
 
Not sure about line 679. You have a two minute loop that waits until klogd and syslogd are both running; if they are not the loop keeps going. If they are, you restart syslog-ng, which will kill them, and returns to the loop to set the clock at 6. But kill_logger will take some non-trivial time to move and cat the (possibly long) messages file before it returns. If in the six seconds they are restarted, you will go around again and possibly never exit.

Syslog-ng keeps a destination file open for 60 seconds in case new log messages need to be written; this is the default time-reap() parameter. Could that be what is hanging on to messages?
 
Not sure about line 679. You have a two minute loop that waits until klogd and syslogd are both running; if they are not the loop keeps going. If they are, you restart syslog-ng, which will kill them, and returns to the loop to set the clock at 6. But kill_logger will take some non-trivial time to move and cat the (possibly long) messages file before it returns. If in the six seconds they are restarted, you will go around again and possibly never exit.

Syslog-ng keeps a destination file open for 60 seconds in case new log messages need to be written; this is the default time-reap() parameter. Could that be what is hanging on to messages?
I'm okay with that possibility. If something keeps restarting syslogd and klogd, I want to kill them again and have another go at it. Hopefully nothing should keep restarting them indefinitely.

The real reason for that is the opposite. It's got a two minute timer on it. If it kills them in 2 seconds and they don't reappear in 5 more seconds, then I'm arbitrarily going to assume they are dead and won't re-animate on their own. :) No need to go another 118 seconds, but I'm a bit uncomfortable on just calling it good like I had been.

If it's just the time-reap parameter then I'd have expected it to clear with a just a killall -HUP or even a restart. But as of now, I'm changing the inode (i.e. mv and copy it back to itself), sending a -HUP and restarting it. Of the three, I think only the -HUP may be unnecessary but it also can't hurt.

The current solution also doesn't depend on service-event-end, as it seems it can't since one doesn't appear after a webui event causes a restart_time. But that's also kind of a bonus, it doesn't rely on 384.11 to work. :)
 
I follow not waiting for another 118 seconds. What I meant was you reset the counter to 6, go around again and move and cat the file, and then perhaps reset the counter again to 6, so it might never time out of the 120 second loop. As opposed to killing the processes, waiting 6, and if they aren't running, true out of the function.

I agree restarting should close out an open file. If you do scribe restart when syslog-ng is running, you get different process ids, unlike when you issue syslog-ng -Fevd when syslog-ng is already running, which keeps the same process numbers. In any case, unless I'm actively doing something, I only get a daily diversion logrotate message and a daily QOS persistence check into my messages file, and no process restarts that I can tell. I get about 15,000 pixelserv messages in that 24 hours. So while everything is running, I really can't tell if something is holding on to the messages file.
 
Last edited:
I follow not waiting for another 118 seconds. What I meant was you reset the counter to 6, go around again and move and cat the file, and then perhaps reset the counter again to 6, so it might never time out of the 120 second loop. As opposed to killing the processes, waiting 6, and if they aren't running, true out of the function.

I agree restarting should close out an open file. If you do scribe restart when syslog-ng is running, you get different process ids, unlike when you issue syslog-ng -Fevd when syslog-ng is already running, which keeps the same process numbers. In any case, unless I'm actively doing something, I only get a daily diversion logrotate message and a daily QOS persistence check into my messages file, and no process restarts that I can tell. I get about 15,000 pixelserv messages in that 24 hours. So while everything is running, I really can't tell if something is holding on to the messages file.
I think you are mis-reading the lines in scribe. The move and cat the message file only happens if both syslogd and klogd restart themselves in that 6 second period. Just instead of checking once at the end of 6 seconds to see if they've restarted, it checks once a second for 6 seconds.

if "$( pidof klogd )" is null, that is, if klogd isn't running, then it doesn't even check if syslogd is running, it waits a seconds and then checks again. In fact, the way it's written, if only klogd or only syslogd gets restarted, they won't get killed. I should probably fix that case. :)

EDIT: Fixed the logic and also did a test to see if both syslogd and klogd were restarted in the same cycle, and they were (several times), so the chances of running the restart twice appear low.
 
Last edited:
I follow not waiting for another 118 seconds. What I meant was you reset the counter to 6, go around again and move and cat the file, and then perhaps reset the counter again to 6, so it might never time out of the 120 second loop. As opposed to killing the processes, waiting 6, and if they aren't running, true out of the function.

I agree restarting should close out an open file. If you do scribe restart when syslog-ng is running, you get different process ids, unlike when you issue syslog-ng -Fevd when syslog-ng is already running, which keeps the same process numbers. In any case, unless I'm actively doing something, I only get a daily diversion logrotate message and a daily QOS persistence check into my messages file, and no process restarts that I can tell. I get about 15,000 pixelserv messages in that 24 hours. So while everything is running, I really can't tell if something is holding on to the messages file.
So, pretty sure I've found the issue ... it's reading from /dev/kmsg (i.e. the system() source) instead of /proc/kmsg. Reverting that fixed the holding on to the messages file issue I was hacking around by mv and cat (i.e. I was able to remove that bit of ugliness), and also a message dumping problem I saw when klogd and syslogd were restarted. The downside is the old partial message issue is back, although by copying some directives from the expansion of the system() source, so far they've all been proceeded by "Error processing log message:" so at least it seems to be flagging them. EDIT: BONUS! I can reliably filter the partial messages. EDIT2: And currently marking all Skynet (IPTables) messages as errors. Fun.

All that said, I'm dog tired and reluctant to push the changes until I've had a fresher think about them. EDIT: Proven to be a good idea, I found a major logic error that would have filled up small thumb drives very quickly. Ooops.
 
Last edited:
Makes sense why it isn't happening to me as my devices are all on the 2,6 kernel reading from proc/kmsg.
 
So, pretty sure I've found the issue ... it's reading from /dev/kmsg (i.e. the system() source) instead of /proc/kmsg. Reverting that fixed the holding on to the messages file issue I was hacking around by mv and cat (i.e. I was able to remove that bit of ugliness), and also a message dumping problem I saw when klogd and syslogd were restarted. The downside is the old partial message issue is back, although by copying some directives from the expansion of the system() source, so far they've all been proceeded by "Error processing log message:" so at least it seems to be flagging them. EDIT: BONUS! I can reliably filter the partial messages. EDIT2: And currently marking all Skynet (IPTables) messages as errors. Fun.

All that said, I'm dog tired and reluctant to push the changes until I've had a fresher think about them. EDIT: Proven to be a good idea, I found a major logic error that would have filled up small thumb drives very quickly. Ooops.
Just thought I'd drop in again to mention my setup with klogd still running (I also wrote my own similar service-start script loop to kill syslogd - actually it calls scribe's own rc.func-syslog-ng, albeit an older versoin without the killall klogd) and it hasn't missed a beat. Was thinking that if you were inclined to do so with scribe, it should allow both 2.6 kernal and the HND kernel to read the kernel (klogd) messages without issue as they are then just collecting from the same place.
Plus, at least for me, I feel this has the added advantage of not messing too much with an embedded system where unlike a typical linux distro, klogd is linked into many other parts of the firmware (eg the time service).

Either way, I appreciate all the work you've put in @cmkelley, thanks mate.
 
v0.10_0 is up (sorry it isn't 1.0)
  • using system() as source turns out to cause problems on HND routers, so I've reverted syslog-ng.conf to use /proc/kmsg; system() uses /dev/kmsg on HND routers
  • added some flags to sourcing /dev/log and /proc/kmsg which appear after short testing to have eliminated fractured IPTables (Skynet) messages
  • reverted the "move and cat" hack that sourcing from /dev/kmsg necessitated
  • added a syslog-ng filter to filter out blank messages from the kernel; these are filtered to /dev/null, however the filter file has a commented out line to filter them to a file; if you wish to verify for yourself that all the filtered out messages are indeed blank, simply remove the '#' at the front of the line and run 'scribe restart' to log to /opt/var/log/blank.log
  • found a bug in (presumably) ASUS' logic where a webui event that causes a restart_time event will not subsequently trigger the 384.11 service-event-end script; as it turns out this script works fine when called by service-event so no dependency on 384.11 is created
Unfortunately, once again /opt/etc/syslog-ng.conf and /opt/etc/init.d/rc.func.syslog-ng have changed. The only way to force these to be automatically updated is to force reinstall of scribe. I know I need a better way once 1.x is out, but part of the point of this extended beta is to get these two ironed out so hopefully they won't change in the near future. For now, if you don't want to force reinstall, the updated syslog-ng.conf will at /opt/share/syslog-ng/examples/syslog-ng.conf-scribe and the updated rc.func.syslog-ng will be at /opt/tmp/scribe-master/init.d/rc.func.syslog-ng
 

Latest threads

Support SNBForums w/ Amazon

If you'd like to support SNBForums, just use this link and buy anything on Amazon. Thanks!

Sign Up For SNBForums Daily Digest

Get an update of what's new every day delivered to your mailbox. Sign up here!
Top