/sbin/preinit Memory Leak in 380 series FW

To recap the essential but subtle difference:

378.x

    do {
        ret = sigwait(&sigset, &state);
    } while (ret);

380.x

    do {
        ret = sigwaitinfo(&sigset, &info);
    } while (ret == -1);
    state = info.si_signo;

The 380.x code does the equivalent but retrieves more information about the signal (for their debugging purposes?), and somehow that causes the leak in the process.
 
I don't see why this would cause a leak, unless there's a bug in sigwaitinfo() itself. The info struct is pre-allocated, so it should be reused on every sigwaitinfo() call.

In essence, I don't see anything wrong with Asus using this function call.
 
I also don't see the "how it's leaked"... but it's "leaking"... that's the fun part.

I agree with sfx's sentiment about the code quality. I just want to point out one specific issue that sits barely below the surface.

This process also receives at least two alarm timeouts per second, for a simple purpose. It's insane, so I won't rule out complications from there.
 
My personal opinion is that Asuswrt's whole event system needs to be overhauled. The original Tomato design was never intended to handle so many types of events, and it has become unmanageable at the code level.
 
Next place to look would be in uclibc, which provides that function.
 
Hi all,
I tested using the method @kvic provided, and I can't reproduce the leak on the ASUSWRT 380 code base. We no longer maintain the 378 code base.
The debug output from my RT-AC87U router with firmware 380-2868:
Before: 2320 kB
After: 2320 kB
After is 2320kB
2016-04-15_105945.png

My shell script:
2016-04-15_105955.png

Thanks,
Vanic
 
I also just tested your script on my RT-AC88U (based on GPL 380_2697), and for me there is zero change in VmRSS size. It stayed at 2040 kB before and after the test.

@kvic, aren't you using your homebrew build of uclibc with pthread support? Based on a quick look at the uclibc source code, this can have an impact on signal handling.
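Since the test script above was only posted as a screenshot, here is a rough C sketch of the kind of before/after VmRSS check being discussed. It assumes a Linux /proc filesystem, and read_vmrss_kb is a made-up helper name; this is not the script Vanic or kvic actually ran.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the VmRSS value (in kB) that the kernel
 * reports in /proc/<pid>/status, or -1 if it cannot be read. */
static long read_vmrss_kb(pid_t pid)
{
    char path[64];
    char line[256];
    long kb = -1;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    /* Scan line by line until the "VmRSS:" entry is found. */
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
            break;
    }
    fclose(fp);
    return kb;
}
```

On the router, /sbin/preinit would presumably be PID 1, so the idea is to call read_vmrss_kb(1) once before the signal test and once after, then compare the two numbers.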
 
Let's put it this way.

I have a heavily patched custom kernel, as I mentioned earlier in this thread. I also run uClibc 0.9.33.2 with native pthread library support, as I mentioned elsewhere on the forum. I have an RT-AC56U.

The "leak" is reproducible on the RT-AC56U with stock 380.58. As John mentioned, and I think it is worth noting, I always have "drop_caches=0".

It seems the RT-AC3200 can reproduce it with stock 380.58 too (from FTC's feedback).

I'm happy that reverting the change solved my problem. Anything beyond that may or may not be a bonus for Asus users.
 
One possible explanation is that the issue lies in the kernel and has already been fixed upstream, since both Vanic (2868) and I (2697) run newer kernels (and there have been a few low-level changes in the kernel in those 2xxx builds).

I doubt the issue lies in uclibc itself, after reviewing that code - it's deceptively simple.

https://git.busybox.net/uClibc/tree/libpthread/nptl/sysdeps/unix/sysv/linux/sigwaitinfo.c?h=0.9.33

Might be worth either checking these kernel changes (they're on my Git), or retrying the same test using one of the alpha builds I provide for 380.59 (based on 2697).

(Interesting bit: sigwait() actually calls sigwaitinfo() under certain build parameters, depending on whether pthread support is enabled or not).
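To illustrate that last remark, here is a minimal sketch of how a sigwait()-style call can be layered on top of sigwaitinfo(). This is not uClibc's actual code, just the general idea, and sigwait_via_sigwaitinfo is a made-up name. The main work is translating between the two return conventions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>

/* Hypothetical wrapper (not the uClibc implementation): gives
 * sigwait() semantics using sigwaitinfo() underneath. */
static int sigwait_via_sigwaitinfo(const sigset_t *set, int *sig)
{
    siginfo_t info;
    int ret;

    /* Retry if the wait was interrupted by an unrelated signal handler. */
    do {
        ret = sigwaitinfo(set, &info);
    } while (ret == -1 && errno == EINTR);

    if (ret == -1)
        return errno;   /* sigwait() reports failure as a positive error number */

    *sig = ret;         /* sigwaitinfo() returns the signal number on success */
    return 0;
}
```

Note that the siginfo_t lives on the stack here, so nothing in a wrapper of this shape allocates memory.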
 
That's right - he was tweaking/tuning uclibc - I forgot about that, and yes, that could definitely cause some issues with pthreads enabled.
 
Perhaps they made the explicit function call to make something thread safe rather than relying on the implicit function call. Normally, though, a commit like this would have a comment explaining why the change was made.
 
Thanks for the info, but I'm not gonna try and test ;)

I had tied my own hands and was not going to touch the firmware code before I saw this leak. But it annoyed me... With it done, hopefully I won't have to do another build for a very long time.

At the moment, I have the user space from 380.58 alpha 3 with a few minor patches, plus a very robust and capable kernel patched by myself and upstream. I already have the essence of the SDK7.14 kernel changes. Overall the firmware is tinkered to a point where I like it very much, or maybe to the point of diminishing returns.

I recall there was a guy on this forum who tinkered with the firmware for a year and stopped, then ran that firmware for a few years until now (?). I feel like I have reached that moment. I hope no oops forces me to get my hands dirty again anytime soon.
 
FWIW: I can confirm this on my AC88 with 380.58 using your script, @kvic.

I seem to have stumbled on the memory leak through heavy use of OpenVPN server 1. I'll try to test that some more tomorrow and see whether what I'm seeing is this memory leak... I'll be back :)
 
