Scribe scribe - syslog-ng and logrotate installer

kernol · Nov 10, 2019

cmkelley said:
scribe v2.2_0 is out.

Please update everything, including filter files, some defaults have had to change. It's likely (but not confirmed) you need to be using syslog-ng 3.23 if you use skynet for the new filter to work.

Excellent - well done - I confirm Skynet log now working properly - and I do have syslog-ng v3.23. Easily updated with the new amtm feature ep in Diversion - option 6 "Manage Entware packages"- then option 5 "update and upgrade installed packages".

cmkelley · Nov 10, 2019

I should mention, version 2.2_0 would have been even more delayed without @Butterfly Bones basically doing all my testing for me. Life has been very crazy and his help was indispensable.

elorimer · Nov 10, 2019

Thanks for the back up function. I've updated; I've moved the time reap back down to 2 for skynet, and I've changed the startup to pour the start up messages back into syslog-ng instead of directly into messages. So far so good on the 56U. UiScribe hadn't quite caught up until a reboot.

Also, null now needs to be excluded from Uiscribe, either with a comment line or explicitly in Uiscribe.

cmkelley · Nov 10, 2019

elorimer said:
Thanks for the back up function. I've updated; I've moved the time reap back down to 2 for skynet, and I've changed the startup to pour the start up messages back into syslog-ng instead of directly into messages. So far so good on the 56U. UiScribe hadn't quite caught up until a reboot.

Also, null now needs to be excluded from Uiscribe, either with a comment line or explicitly in Uiscribe.

My rationale for using time_reap(60) is because that's what it was before they changed the functionality.

I don't understand what you're doing with the startup messages, or what the advantage to your method might be.

UiScribe should have got a restart command after upgrading, I'll see if I missed that somewhere.

elorimer · Nov 10, 2019

At the beginning when the symlink doesn't exist, you cat syslog.log to messages. They are captured without the host. I cat the syslog.log to a startup.log file, and then I have a syslog-ng configuration that reads the startup.log as a source. They go into messages that way, but with the host. Thereafter startup.log isn't written to, so syslog-ng never processes anything further. But that way, I can also send the messages to loggly with the host information; otherwise they are lost to loggly. I don't change the time stamp (I think I could) just because that is a good way to tell what happened from the reboot.

elorimer · Nov 10, 2019

cmkelley said:
My rationale for using time_reap(60)

I think maybe we were just lucky. I changed it to 2 so as to minimize the chance that skynet would come along and stomp.

cmkelley · Nov 10, 2019

elorimer said:
I think maybe we were just lucky. I changed it to 2 so as to minimize the chance that skynet would come along and stomp.

Looking at the syslog-0 file, I don't think time_reap works the way it seems to be described anyways. I never have 60 seconds between entries but they are all written just the same.

In hindsight, I would have tried to find a better alternative for syslogd than syslog-ng. The documentation is borderline unintelligible.

kernol · Nov 11, 2019

elorimer said:
Thanks for the back up function. ....
Also, null now needs to be excluded from Uiscribe, either with a comment line or explicitly in Uiscribe.

Option to exclude already available in uiScribe - go to option 1 "Customise list of logs ..." and then toggle /dev/null off [in my case it was item 8 of the list].

elorimer · Nov 11, 2019

cmkelley said:
Looking at the syslog-0 file, I don't think time_reap works the way it seems to be described anyways. I never have 60 seconds between entries but they are all written just the same.

TL;DR: I may not be following, but I think it is the opposite. Holding the file open doesn't prevent syslog-ng from writing the messages; it speeds it up.

When syslog-ng writes to a file, it held the file open by default for 60 seconds after the write. If another message comes along in that time, it writes it to the file a little faster, then restarts the clock. After 60 seconds it closed the file. Every hour, along comes skynet which wants to do its in-place sed. Sed reads the log file, does its processing of it to a temporary file, and then renames the temporary file as the original name. If syslog-ng has the file open at that moment, it can't, and whatever changes skynet wanted to make are lost until the next hour, which then will also include the messages syslog-ng was writing at the moment skynet wanted to rename the file. If skynet was in the process of renaming the file, and syslog-ng wanted to write, then the message would be held in the syslog-ng buffer until skynet closed the file. This would be the same if syslogd was running and we were dealing with /tmp/syslog.log. A downside of skynet's doing its sed on the primary log file, but reduced likelihood of a collision since syslog-ng is only writing skynet messages to that file.

When syslog-ng came along with its changes, it allowed that 60 seconds (or whatever the global time-reap was set to) to be changed on a per log file basis. But it also overrode the global 60 seconds in our case, because it was thought nobody would be writing to a file without a template, so syslog-ng left the file open forever. That meant skynet could never complete its sed. For reasons I don't understand, the result was new syslog-0.log file, with nothing written, while syslog-ng continued on its way writing to the original file, lost in space. Explicitly invoking time-reap(60) restores the original behavior, but in our application I don't think we are writing that much, so closing the file after 2 seconds makes the possibility that much less.

From the comments for this commit:

Code:

The first problem: time-reap() can only be set globally, for all destinations.
The second problem: It can not be disabled. We should be able to disable it, if we are sure, that no problematic templating is set for that destination.
...
Hmm I am not against making this configurable, however if its about disabling time-reap() for cases where the destination is not using macros, I would simply disable reaping automatically in that case. There's code to decide if a destination is macro based or not. 
...
I added a commit, which disables time-reap() for non-templated filename destinations by default.
The user still has the option, to turn it on, and still has the option to turn off for templated filename destinations. This applies for both pipe() and file().

elorimer · Nov 11, 2019

I've submitted a comment to the syslog-ng folks. If you want to follow: https://github.com/syslog-ng/syslog-ng/pull/2798

elorimer · Nov 11, 2019

elorimer said:
I think maybe we were just lucky. I changed it to 2 so as to minimize the chance that skynet would come along and stomp.

Also, I've always wondered why the skynet-0.log seemed to skip some hourly summaries. It could be related to the 60 seconds; since changing to 2 seconds I haven't skipped any.

cmkelley · Nov 11, 2019

elorimer said:
TL;DR: I may not be following, but I think it is the opposite. Holding the file open doesn't prevent syslog-ng from writing the messages; it speeds it up.

When syslog-ng writes to a file, it held the file open by default for 60 seconds after the write. If another message comes along in that time, it writes it to the file a little faster, then restarts the clock. After 60 seconds it closed the file. Every hour, along comes skynet which wants to do its in-place sed. Sed reads the log file, does its processing of it to a temporary file, and then renames the temporary file as the original name. If syslog-ng has the file open at that moment, it can't, and whatever changes skynet wanted to make are lost until the next hour, which then will also include the messages syslog-ng was writing at the moment skynet wanted to rename the file. If skynet was in the process of renaming the file, and syslog-ng wanted to write, then the message would be held in the syslog-ng buffer until skynet closed the file. This would be the same if syslogd was running and we were dealing with /tmp/syslog.log. A downside of skynet's doing its sed on the primary log file, but reduced likelihood of a collision since syslog-ng is only writing skynet messages to that file.

When syslog-ng came along with its changes, it allowed that 60 seconds (or whatever the global time-reap was set to) to be changed on a per log file basis. But it also overrode the global 60 seconds in our case, because it was thought nobody would be writing to a file without a template, so syslog-ng left the file open forever. That meant skynet could never complete its sed. For reasons I don't understand, the result was new syslog-0.log file, with nothing written, while syslog-ng continued on its way writing to the original file, lost in space. Explicitly invoking time-reap(60) restores the original behavior, but in our application I don't think we are writing that much, so closing the file after 2 seconds makes the possibility that much less.

Okay, I think I figured out why any time_reap except 0 will work. With time_reap(0) the syslog-0 file never closes but Skynet can scrape it just the same. The problem is, it can't be closed and a new one created, so it sits empty. Not sure why it goes completely empty, but that may be down to the way Skynet does the close and re-open or maybe the sed operation. With time_reap(60), after that happens and a minute passes in that condition, syslog-ng thinks skynet-0 is idle, and so it closes skynet-0 and Skynet can go on its merry way. I think the biggest gap I saw between entries in syslog-0 was on the order of 20 seconds, so with time_reap(60) it's never closing until the hour comes up and it "hangs" for a minute after Skynet scrapes it.

So I do agree time_reap needs to be lowered (considerably). Maybe time_reap(2) doesn't have any adverse consequences, but I'm not sure it doesn't either. I'd like to be a bit more scientific about it, although a setting based on what I'm seeing might not reflect what other people are seeing, so maybe there isn't a one-size-fits-all method to set it.

elorimer · Nov 11, 2019

cmkelley said:
With time_reap(0) the syslog-0 file never closes but Skynet can scrape it just the same. The problem is, it can't be closed and a new one created, so it sits empty. Not sure why it goes completely empty, but that may be down to the way Skynet does the close and re-open or maybe the sed operation.

Two blind guys groping for an explanation, aren't we. I'm following most of your thoughts on this, except the above.

Skynet writes the BLOCKED messages, etc. to the system logger but maintains its stats separately. On the hour, a skynet chron job (firewall save) runs the stats, Purge_Log and Print_Log functions: Skynet collects its stats, deletes via sed its regular writes from syslog-0.log, and then writes the summary stat line to the system logger--which goes through syslog-ng. When skynet does the sed, it reads all of syslog-o.log even if syslog-ng has it locked, because locking doesn't prevent reading. It deletes from its buffer all of the regular writes, and writes the results to a temporary file; that file should only include (mostly), the old hourly stats. Then, sed wants to rename the temporary file syslog-o.log, and delete the old syslog-0.log; it can't do that if syslog-ng has the file locked. That all takes around 10 seconds on my 56U, during which skynet is doing no other logging, so if time-reap is less than 10 seconds, syslog-ng closes the file, releases the lock, and skynet is back to normal. I imagine your 86U is a lot faster, and an AX88 faster still, so something less than 10 seconds might be necessary. Also this behavior derives from the kernel so later routers might be different.

Hoping the syslog-ng folks will explain a bit more later.

Butterfly Bones · Nov 11, 2019

Is this relevant to the time-reap() discussion with regard to Skynet? I saw this one before after the syslog-ng 3.23 but dismissed it until you two have this dialog about writing to files.

Code:

Nov 11 14:00:03 RT-AC86U-4608 Skynet: [#] 129047 IPs (+0) -- 27203 Ranges Banned (+0) || 15684 Inbound -- 302 Outbound Connections Blocked! [save] [3s]
Nov 11 16:00:00 RT-AC86U-4608 Skynet: [*] Killing Locked Processes (save) (pid=894)
Nov 11 16:00:00 RT-AC86U-4608 Skynet: [*]   894 usr_name  3904 S    sh /jffs/scripts/firewall save
Nov 11 16:00:03 RT-AC86U-4608 Skynet: [#] 129047 IPs (+0) -- 27203 Ranges Banned (+0) || 16090 Inbound -- 311 Outbound Connections Blocked! [save] [3s]

Note the 1500 hour line is missing. I actually have every hour summary for the last couple days, as elorimer states, except for the 1500 hour missing above.

kernol · Nov 11, 2019

With latest scribe installed - default settings and filters - I am missing a LOT more hourly summaries ...

Code:

Nov 11 15:00:03 RT-AC86U-8178 Skynet: [#] 128128 IPs (+0) -- 1645 Ranges Banned (+0) || 642 Inbound -- 0 Outbound Connections Blocked! [save] [3s]
Nov 11 23:00:03 RT-AC86U-8178 Skynet: [#] 128128 IPs (+0) -- 1645 Ranges Banned (+0) || 1889 Inbound -- 0 Outbound Connections Blocked! [save] [3s]
Nov 12 00:00:02 RT-AC86U-8178 Skynet: [#] 128128 IPs (+0) -- 1645 Ranges Banned (+0) || 2071 Inbound -- 0 Outbound Connections Blocked! [save] [2s]
Nov 12 03:00:03 RT-AC86U-8178 Skynet: [#] 128128 IPs (+0) -- 1645 Ranges Banned (+0) || 2525 Inbound -- 0 Outbound Connections Blocked! [save] [3s]

elorimer · Nov 12, 2019

@kernol: Your 86u is a lot faster than my 56U, and looks to take only 2-3 seconds to do the hourly save. Try changing the time-reap parameter in the file destination definition in the skynet configuration file from 60 to 1 or 2 and tell us what happens.

@Butterfly Bones : This suggests to me that when the hourly chron job kicked off, it found the skynet lock file (created probably by the 1500 chron job) in place with a time stamp more than 30 minutes, assumed something hung, and cleared the lock file before starting. The question then is why the 1500 chron job didn't finish cleanly. Maybe this time reap business, maybe something else.

kernol · Nov 12, 2019

elorimer said:
@kernol: Your 86u is a lot faster than my 56U, and looks to take only 2-3 seconds to do the hourly save. Try changing the time-reap parameter in the file destination definition in the skynet configuration file from 60 to 1 or 2 and tell us what happens.

Will do as soon as I get back home (7 hours away). Thanks.

kernol · Nov 12, 2019

kernol said:
Will do as soon as I get back home (7 hours away). Thanks.

OK - so last night I fully uninstalled skynet and scribe - cleaned out all the old logs -kept the messages symlink in place. Rebooted and reinstalled skynet then scribe. Checked all configs and left the system running. Since then I have only had 2 hourly summaries from skynet - 1st at 13:00hrs today and the next at 20:00hrs this evening.

I have now changed time-reap to 2 seconds as requested - reloaded scribe config and restarted. Will report back in the morning on how the hourly summaries progressed. Not holding my breath though - think there may be more to the problem and surprised others aren't also experiencing this.

kernol · Nov 12, 2019

kernol said:
I have now changed time-reap to 2 seconds as requested - reloaded scribe config and restarted. Will report back in the morning on how the hourly summaries progressed. Not holding my breath though - think there may be more to the problem and surprised others aren't also experiencing this.

Got the first skynet summary in the log since time-reap change - [23:00hrs] - will let you know in the morning whether it kept going overnight.

elorimer · Nov 12, 2019

kernol said:
Since then I have only had 2 hourly summaries from skynet - 1st at 13:00hrs today and the next at 20:00hrs this evening.

Taking this in a slightly different direction. I am not sure I follow why for skynet we need hourly summaries in the first place, and why we need to purge the logs in skynet. If we ran the cron daily instead of hourly, what do we lose?

And then, if we wanted to purge the logs, why not use logrotate? I can see the skynet log getting very large and bogging down uiScribe's display, but we could run logrotate hourly, or for that matter exclude the messages that will get purged hourly and rotate them off.

Thinking about this only because I have a guess that the ax88 might run the hourly save in less than a second, so even time-reap(1) will be too long.

Thread starter	Title	Forum	Replies	Date
	Scribe [SOLVED] syslog-ng intermittent failure to shutdown which messes all logging	Asuswrt-Merlin AddOns	15	Oct 21, 2024
J	AdGuardHome There are tons of dnsproxy error messages occuring on syslog.	Asuswrt-Merlin AddOns	2	Jul 26, 2024
T	Updated scMerlin to 2.5.5 - syslog stops at 'syslogd exiting'	Asuswrt-Merlin AddOns	9	Jun 25, 2024

Scribe scribe - syslog-ng and logrotate installer

Very Senior Member

Very Senior Member

Part of the Furniture

Very Senior Member

Part of the Furniture

Part of the Furniture

Very Senior Member

Very Senior Member

Part of the Furniture

Part of the Furniture

Part of the Furniture

Very Senior Member

Part of the Furniture

Very Senior Member

Very Senior Member

Part of the Furniture

Very Senior Member

Very Senior Member

Very Senior Member

Part of the Furniture

Similar threads

Similar threads

Support SNBForums w/ Amazon

Sign Up For SNBForums Daily Digest