A few random thoughts reading this, never having had the problem but admiring the tenacity with which it is being attacked.
1. There really isn't any reason why syslog-ng has to start first or early. I think it just seems to make sense that it should, and that's why it is S01. But copying everything logged up to the point that syslogd is killed into the messages file (which goes back to the first method of running syslog-ng) and then starting syslog-ng keeps a continuous log.
2. There is a hiccup that can occur where the messages file loses some of the earliest logging. I haven't quite figured it out, but I think it is a combination of the buffers and the lack of flow control. In my case, there are about 1100 messages in the startup sequence before S01 kicks in. They aren't long messages, though. But between killing syslogd/klogd, copying messages, and starting syslog-ng, there is a gap. That's partly why I use a larger buffer, copy the startup sequence to a file (sudden thought: maybe I would be better off just renaming it), start syslog-ng, and then use flow control, all in an effort to shrink that gap as much as possible. Too large a buffer also results in lost messages, and I haven't figured out why. But a log_fifo_size of 256 is too small and a log_msg_size of 16k is too big (also, you have to multiply the two together, and then by the number of destinations, to get total memory use). I use 2048 and 4096.
3. I absolutely hate the idea of putting the kill instructions in a 30-second loop. I can imagine how painful this is for @cmkelley. I don't know what might be happening at the sockets if syslogd and klogd keep restarting, but it seems likely that messages from this part of the start cycle are being lost if the daemons are hitting the sockets before syslog-ng steps in to do so. I'd be less troubled if syslog-ng were S99. Are messages being lost with this new method? Perhaps you need to log how many kill cycles you are running to tell.
4. All in all, my suggestion would be to go back to what was working for most people, but perhaps add a watchdog at the 30-second mark that checks whether syslog-ng is running and, if not, restarts scribe.
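To make point 4 a bit more concrete, here is a rough sketch of the kind of one-shot watchdog I have in mind. It is only an illustration, not how scribe does anything: the function names are mine, it assumes `pidof` is available on the box, and the restart command in the usage comment is a placeholder because I am not going to guess at the real one.

```shell
#!/bin/sh
# One-shot watchdog sketch: wait, check for a process, run a restart command if it is missing.

# Returns 0 if a process with this name is running (assumes pidof exists on the system).
is_running() {
    pidof "$1" >/dev/null 2>&1
}

# watchdog <process> <delay_seconds> <restart command and its arguments...>
watchdog() {
    proc=$1
    delay=$2
    shift 2
    sleep "$delay"
    if is_running "$proc"; then
        echo "$proc is running"
    else
        echo "$proc not running, restarting"
        "$@"    # whatever restart command the caller supplied
    fi
}

# Hypothetical usage; the path below is a placeholder, not scribe's actual command:
# watchdog syslog-ng 30 /path/to/your/scribe-restart-command
```

The point of doing it this way is that the check happens once, after the dust has settled, instead of killing syslogd/klogd over and over for 30 seconds.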
Separately, syslog-ng will use the original time stamp of a message (except for network sources, where it seems to add its own as well), but you can also have it use the time stamp of when it is processed.
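Going back to the buffer numbers in point 2, the memory arithmetic is easy to check with a few lines of shell. This is just a back-of-the-envelope sketch: I am assuming 2048 is the log_fifo_size and 4096 is the log_msg_size, the count of 3 destinations is made up for illustration, and the result is a worst-case ceiling (every queue full of maximum-size messages), not what syslog-ng will actually allocate.

```shell
#!/bin/sh
# Rough upper bound on queue memory:
# log_fifo_size (messages) x log_msg_size (bytes) x number of destinations.
estimate_queue_bytes() {
    fifo_size=$1
    msg_size=$2
    destinations=$3
    echo $(( fifo_size * msg_size * destinations ))
}

# Assumed mapping: log_fifo_size=2048, log_msg_size=4096.
# 3 destinations is a hypothetical count; substitute your own.
bytes=$(estimate_queue_bytes 2048 4096 3)
echo "worst case: $bytes bytes ($(( bytes / 1024 / 1024 )) MiB)"
```

On a router with limited RAM, that ceiling is worth keeping in mind before bumping either number.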