Still trying to fix my R7800 reboot issue

NetBytes

Senior Member
Hi,
For all of this year I have been trying to solve a problem introduced into Voxel's firmware starting with .63SF.
When I reboot my R7800 it comes up in a bridge-like mode and starts blasting data out to the cable modem
BEFORE it tries to get a DHCP lease from the cable modem. This leaves the R7800 disconnected from the internet
until I pull the WAN wire, count to 10, and re-plug it, at which point everything recovers and works fine.
I am not alone with this problem.

I have tried a lot of suggestions, all of which have failed, and I would like to solve this.
This is definitely a firmware issue: with .62SF everything works; with anything later it is broken.
What process turns on the WAN/LAN ports?
What process gets the R7800's DHCP lease from the cable modem?
Perhaps I can programmatically mimic dropping the WAN connection, though the few variants of this I have tried all failed.
Any thoughts you might have would be appreciated.

This thread contains the previous discussions:
 
The process getting the dhcp lease is udhcpc.
You probably want to look into /etc/init.d/net-wan, /etc/init.d/net-br, /etc/init.d/net-br-dhcpc-helper, but reading the other thread, it seems you already did that.

Why not compare the init.d scripts between .62SF and .63SF and see what is different? Install .62SF and copy /etc/init.d to a USB stick (cp -r /etc/init.d /mnt/sda1/init.d.62sf), then install .63SF and do the same (cp -r /etc/init.d /mnt/sda1/init.d.63sf). You can then compare the two copies with diff.
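
As a rough sketch of that comparison, assuming the two copies were saved under the paths mentioned above and that the diff on the router (or on a PC with the USB stick plugged in) supports -r, -q and -u. The function names are made up for illustration:

```shell
# List which files differ between two saved copies of /etc/init.d.
# -r: recurse into subdirectories, -q: only report which files differ.
compare_initd() {
    old=$1
    new=$2
    diff -rq "$old" "$new"
}

# Show the actual changes for one script as a unified diff.
show_change() {
    old=$1
    new=$2
    name=$3
    diff -u "$old/$name" "$new/$name"
}

# Example usage with the paths from the post:
# compare_initd /mnt/sda1/init.d.62sf /mnt/sda1/init.d.63sf
# show_change   /mnt/sda1/init.d.62sf /mnt/sda1/init.d.63sf net-wan
```

Note that diff exits with status 1 when it finds differences, which is expected here and not an error.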
 
This is a great idea, and I did this very early on. I have a USB stick with the full file systems on it.
I have been trying so many things that I stopped (re)installing the kamoj addon and instead created the autorun scripts discussed in the R7800 settings documentation thread,
which keeps an install / firmware change / reinstall cycle relatively easy.
A full diff check against everything shows nearly nothing significant.

Thanks for the info about udhcpc, a new angle I will check.
 
/etc/init.d/opmode is the script that initializes the eth ports and bridges.
It does that via functions in /lib/cfgmgr/opmode.sh

So perhaps you can do a diff on those files to see what has changed?
Really appreciate the suggestions from everyone.
I've been through /etc/init.d/opmode many times.
Pretty sure I've also been through /lib/cfgmgr/opmode.sh but will check again.
 
Question 1: If I copy the udhcpc from .62 into my current router firmware, will it be overwritten on (re)boot?
Question 2: What is the 'thing' that detects the removal and subsequent reinsertion of the WAN port cable?
 
1) No, it won’t be overwritten, but it might not work if it was compiled differently.
Anyway, udhcpc is just the DHCP client binary, and I doubt it is the problem (it is more a matter of when and how it is used). We know the binary works, since you get assigned an IP when you replug the wire.
If you want to use another DHCP client, install another version or client using Entware and try it manually.

Did you find any difference between the 2 versions in the scripts? The solution must reside in /etc somewhere, as I believe it is not a problem with a binary but with a setting or a script. It could be a difference in a sleep delay or in the order of execution of some commands, or it could be sysctl related.
I had to rewrite the IPv6 scripts a bit to get IPv6 working with my provider.
 
I've really been over /etc and am pretty much done with comparison checking.
I found very few changes, and those were trivial.
I originally hoped I would find an obvious change.

I'm now months into testing and making changes to get the system to work.
I have soft-bricked the thing so many times while testing changes that I started this thread to see if another idea might be what I need.

If a start script is (for example) S65dosomething, how would I make it S60dosomething?
Is the rename enough? Must I edit the script to change the "start=65" into "start=60"? Anything else?
 
You got it: edit the script so that its start level matches (change START=65 to START=60) and rename the entry in /etc/rc.d as well (/etc/rc.d/S65dosomething becomes /etc/rc.d/S60dosomething).
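
A minimal sketch of that change, assuming the firmware follows the usual OpenWrt layout where entries in /etc/rc.d are symlinks to scripts in /etc/init.d. The helper name set_start_level is hypothetical, and it is worth trying it on a copy of /etc first given how easy it is to soft-brick the router:

```shell
# Hypothetical helper: move an init script to a new start level.
# Usage: set_start_level <etc dir> <script name> <new level>
set_start_level() {
    etc=$1
    name=$2
    new=$3
    script="$etc/init.d/$name"

    # Read the current level from the first START=NN line in the script.
    old=$(sed -n 's/^START=\([0-9][0-9]*\).*/\1/p' "$script" | head -n 1)
    if [ -z "$old" ]; then
        echo "no START= line found in $script" >&2
        return 1
    fi

    # Rewrite the START line via a temp file, keeping the file's permissions.
    sed "s/^START=.*/START=$new/" "$script" > "$script.tmp" &&
        cat "$script.tmp" > "$script" &&
        rm -f "$script.tmp"

    # Replace the rc.d entry so its name carries the new level.
    rm -f "$etc/rc.d/S$old$name"
    ln -sf "../init.d/$name" "$etc/rc.d/S$new$name"
}

# Example usage (on the router itself the first argument would be /etc):
# set_start_level /etc dosomething 60
```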
 
Any success?

Don’t know if this can help or was already posted here or in the other thread, but I found how NG forces the LAN physical signal down for X seconds:
Code:
echo -n X > /proc/switch_phy
Replace X with an integer number of seconds.
 
Wow, thank you so much! This has not been posted before.
I have had zero success so far.

My last test was to put 'net-wan restart' in rc.local as an end-of-booting attempt and this did nothing.
I had modified the script heavily to log everything it was doing and a large number of the internal variables to a file /bootlog.txt.
This showed that the script is doing what it says it will with the proper variables set to accurate values.

I need to drop the WAN physical signal down for 10 seconds.
When you say LAN, do you mean it drops all the ports, including the WAN one?
 
It drops the switch, so I suppose it drops all the ports as there is no differentiation between LAN and WAN. NG comments are always talking about LAN, because it is what they need to drop.
Best is to try ;)
 
I will try. I'm sure we all too well understand the balance between testing and our 'users' putting up with it!
Am definitely trying this first chance I can. Thanks Again!
 
I understand this balancing very well;)

I hope this will work for you; your problem is strange and hard to troubleshoot.
 
Out of curiosity...
Have you tried the @HELLO_wORLD tip yet?
I finally tried his suggestion yesterday (real life has been in the way), using the command with a value of 10 in my rc.local, and unfortunately it does not help.
The command disconnects all the LAN ports *AND* the Wi-Fi radios, but does not affect the WAN port.
I was *really* hoping this would work and was in no rush to report that it did not.

Software is my issue, but I can't find what part of it. I am beginning to think that the Chinese software Voxel removed in .63 has a side effect that
helps in my situation. My choices are to stay on .62, where a reboot auto-reconnects, or to stay current for the security fixes and reconnect manually.

P.S.: Starting with both the cable modem and the router powered off:
If I let the modem boot first and then the R7800, the full 'system' startup fails and there is no internet.
If I let the R7800 boot first and then power on the cable modem, the full 'system' startup works.

P.P.S.: Leaving the R7800 in the 'un-connected' state for hours does not help either; it never recovers.
Removing and re-plugging the WAN wire is the only way to restore the connection.
 
The /sbin/11k_scan script was introduced in .63SF.
It is called at boot from /etc/init.d/wlan-common (START=80) and /lib/wifi/qcawifi.sh.
It starts some type of network scan. You can
try removing it (or chmod -x ...) and disabling 11k on the debug page.

You can also try booting with all Wi-Fi turned off to see if the behavior differs.

I remember that the start order of some program(s) changed in .63SF as well:
See /etc/rc.d:
S60dnsmasq -> S61dnsmasq.

Have you tried to create a new entry in /etc/rc.d, e.g. /etc/rc.d/S00aaaaaawait30 that calls /etc/init.d/aaaaaawait30 that just delays e.g. 30 sec?
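
For illustration, such a delay entry could be created roughly like this. It is only a sketch: it assumes the firmware uses the OpenWrt-style /etc/rc.common wrapper with START= levels and runs the rc.d entries in order, and the helper name make_boot_delay is made up:

```shell
# Hypothetical helper: create an init script that only sleeps, plus its
# S00 entry in rc.d so that it runs before everything else at boot.
# Usage: make_boot_delay <etc dir> <seconds>
make_boot_delay() {
    etc=$1
    secs=$2
    name="aaaaaawait$secs"

    # Write the init script; $secs is expanded now, when the file is created.
    cat > "$etc/init.d/$name" <<EOF
#!/bin/sh /etc/rc.common
# Delay the rest of the boot sequence by $secs seconds.
START=00

start() {
    sleep $secs
}
EOF
    chmod +x "$etc/init.d/$name"

    # Link it into rc.d under the S00 prefix.
    ln -sf "../init.d/$name" "$etc/rc.d/S00$name"
}

# Example usage on the router:
# make_boot_delay /etc 30
```

With 30 as the argument this produces /etc/init.d/aaaaaawait30 and /etc/rc.d/S00aaaaaawait30, the names used in the suggestion above.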

Also, a lot of Chinese software (funjsq) was ADDED in .63SF, not removed.

Sorry I can't be of much more help right now.
I hope we can sort this out one day. We cannot give up!

 
I appreciate the suggestions/interest in my issue.
I have altered the boot sequence on different versions to match the .62 boot order, but this did not help.
The differences in boot sequence between .62 and .63 were really small. I have also tried using .62 versions
on later firmware, and this did not help either.
I actually stayed on .63 for a long time trying to find the 'magic' change that broke things relative to .62.
FWIW: The Chinese software (funjsq) was -removed- in .63; I just rechecked changes.log to make sure.

I don't understand the suggestion to delay the router's booting; I'm not sure what that accomplishes.
I will look at the 11k_scan script when I can. I already have 11k disabled.

I believe the main issue since .63 is that while the R7800 boots, what should be LAN-only traffic is leaked onto the WAN port, causing the cable modem to lock out the connection. The first traffic out of the WAN port should be the request for its IP address from the cable modem. It is as if the .63 and later firmware versions start with the router in Bridge mode, where previously it always started in Router mode.

Thanks for the injection of hope!
 
You are welcome! And do check the usage of 11k_scan !!!
[Attachment: funjsq.png]

This is a really hard problem to pin down...

2 ideas:

1) maybe part of the problem is coming from the modem... Any chance you can change any setting on this side?

2) maybe using tcpdump as early as possible to sniff what is being sent to the modem?
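
A rough sketch of idea 2, assuming tcpdump is available on the router (for example installed through Entware). The interface name brwan and the output path below are placeholders, not confirmed values for the R7800, and the function name is made up:

```shell
# Hypothetical helper: capture the first packets seen on an interface.
# Usage: capture_wan <interface> <output pcap file> [packet count]
capture_wan() {
    iface=$1
    out=$2
    count=${3:-500}

    if ! command -v tcpdump >/dev/null 2>&1; then
        echo "tcpdump not found in PATH" >&2
        return 127
    fi

    # -i: interface to listen on, -c: stop after COUNT packets,
    # -w: write the raw packets to a pcap file for later inspection.
    tcpdump -i "$iface" -c "$count" -w "$out"
}

# Example (placeholder names), run in the background so boot continues:
# capture_wan brwan /mnt/sda1/boot.pcap 500 &
```

The resulting file can be opened on a PC to see what reaches the modem before the DHCP request. The capture can only start once the interface exists and the output location is writable, so the very first packets may still be missed.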
 
