Possible issue with new client discovery methodology?

rlcronin

Regular Contributor
This is going to sound far-fetched, and I'm not sure I believe it myself, but since I installed firmware versions that contain the new client discovery methodology, I periodically find that all the devices on the LAN have lost connectivity (as if the router simply stopped talking to them). I have also observed that if I then unplug the Ethernet cable from a certain Linux machine (a RHEL 6.x PC that I use for work), all the other devices regain their connection within 30 seconds. So I presume that something happened on the Linux machine that flooded the router and left it unable to do anything else.

I suppose it may not have anything to do with the client discovery functionality, but prior to migrating to recent firmwares (later than 35_4) this never happened, the implication being that something in later firmwares may be causing it.

I need help figuring out how to get to the bottom of this. From time to time I have to travel and I count on being able to VPN back into my network from afar. I can't have this happening while I am halfway across the country. Suggestions as to how to proceed? I am tempted to just revert back to an older firmware, but I'd rather try to help diagnose the issue and get it fixed.
--
bc
 
The device discovery method has not changed - it still relies on sending an ARP packet and waiting for a reply. Only some timings were adjusted, and one bug was fixed where the router would try to access the wrong port when determining whether a client had a DLNA server running.

Make sure you don't have any IP conflict on the network, or duplicate MACs (if somehow you were duplicating/modifying them).
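
One rough way to look for suspicious address pairings is to inspect the ARP table of a Linux host on the LAN. The sketch below is only an illustration, not something from the firmware: it assumes text in the `/proc/net/arp` column layout, and the function names and sample addresses are hypothetical. It lists MAC addresses that answer for more than one IP. That can be legitimate (a host with several addresses), so treat any hit as a lead to investigate rather than proof of a conflict.

```python
from collections import defaultdict

# Hypothetical sample in the /proc/net/arp column layout (addresses are made up).
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.10     0x1         0x2         aa:bb:cc:dd:ee:01     *        br0
192.168.1.11     0x1         0x2         AA:BB:CC:DD:EE:01     *        br0
192.168.1.20     0x1         0x2         aa:bb:cc:dd:ee:02     *        br0
192.168.1.30     0x1         0x0         00:00:00:00:00:00     *        br0
"""


def parse_arp_table(text):
    """Return (ip, mac) pairs from text in the /proc/net/arp layout."""
    pairs = []
    for line in text.strip().splitlines():
        if line.startswith("IP address"):  # header row
            continue
        fields = line.split()
        if len(fields) < 6:  # not a complete table row
            continue
        ip, mac = fields[0], fields[3].lower()
        if mac == "00:00:00:00:00:00":  # incomplete entry, no ARP reply yet
            continue
        pairs.append((ip, mac))
    return pairs


def macs_with_multiple_ips(pairs):
    """Map each MAC seen with more than one IP to its sorted list of IPs."""
    by_mac = defaultdict(set)
    for ip, mac in pairs:
        by_mac[mac].add(ip)
    return {mac: sorted(ips) for mac, ips in by_mac.items() if len(ips) > 1}


for mac, ips in macs_with_multiple_ips(parse_arp_table(SAMPLE)).items():
    print(f"{mac} answers for: {', '.join(ips)}")
```

To try it against a real machine, replace `SAMPLE` with the contents of `/proc/net/arp` read on a Linux host on the same LAN.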
 
Would there be log entries that might help reveal that? Otherwise I am not sure how to go about doing that.
--
bc
 

I would start with Syslog on the Linux box. If you have Windows boxes being "kicked" from LAN, also check their Event Viewer.

Make sure you don't have more than one DHCP server on your LAN as well.
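
When reviewing syslog for a hang, long silent stretches can be as telling as the messages themselves. The following is a minimal sketch, assuming classic `Mon DD HH:MM:SS` syslog timestamps; the function names, the 30-minute threshold, and the sample lines are all hypothetical. Classic syslog lines carry no year, so the code assumes one.

```python
from datetime import datetime, timedelta


def parse_timestamp(line, year=2000):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a classic syslog line.

    Returns None for lines that do not start with such a stamp.
    The year is an assumption because classic syslog omits it.
    """
    parts = line.split()
    if len(parts) < 3:
        return None
    try:
        return datetime.strptime(
            f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(minutes=30), year=2000):
    """Return (before, after) pairs where consecutive stamps differ by >= threshold."""
    gaps = []
    previous = None
    for line in lines:
        stamp = parse_timestamp(line, year)
        if stamp is None:  # skip annotations or malformed lines
            continue
        if previous is not None and stamp - previous >= threshold:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps


# Hypothetical log lines, for illustration only.
SAMPLE_LINES = [
    "Jan 15 06:00:15 myhost daemon[100]: first message",
    "Jan 15 06:00:18 myhost daemon[100]: second message",
    "--> a hand-written note, ignored by the parser",
    "Jan 15 09:17:23 myhost kernel: logging restarted",
]

for before, after in find_gaps(SAMPLE_LINES):
    print(f"silent from {before:%b %d %H:%M:%S} to {after:%b %d %H:%M:%S} ({after - before})")
```

A gap that ends with a logging-daemon startup message suggests the machine was hung or rebooted in that window, which narrows down where to look in other logs.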
 
Syslog doesn't really show anything conclusive:

Jan 15 06:00:15 bctp avahi-daemon[2511]: Invalid query packet.
Jan 15 06:00:15 bctp avahi-daemon[2511]: Invalid query packet.
Jan 15 06:00:18 bctp avahi-daemon[2511]: Invalid query packet.
Jan 15 07:49:50 bctp symcfgd: subscriber 14 has left -- closed 0 remaining handles
Jan 15 07:49:50 bctp rtvscand: Download of virus definition file from LiveUpdate server succeeded.
--> Sometime in this interval, the system went unresponsive
Jan 15 09:17:23 bctp kernel: imklog 5.8.10, log source = /proc/kmsg started.

I found the system still powered on, but with a blank screen and unresponsive to any keyboard or trackpad input.

The Windows boxes on the network all had exclamation points on the Network tray icon, indicating no network access. An attempt to start an SSH tunnel to the Linux machine timed out. An attempt to open Chrome and go to www.google.com yielded an error.

There were no network-related entries in the event viewer.

Pulling the ethernet cable from the Linux machine (while still powered on) caused the exclamation points to disappear from the Windows network tray icon. Thereafter, I could get to www.google.com again, but the Linux machine was still unresponsive. I had to force power it off to get it to recover.

Would a simple duplicate IP address somewhere on the network cause something like this? It really seemed as though the router was getting hammered somehow and unable to do much of anything. The fact that simply pulling the ethernet cable from the Linux machine made the problem go away is suspicious. What sorts of things might cause these symptoms?

A bit more background ...

This is the 4th or 5th time this has happened in the past month or so. Today's occurrence was the only time I've seen syslog entries related to Symantec AntiVirus just before the problem (so I don't think I can necessarily conclude that this is some kind of Symantec issue).

The Linux machine has a DHCP reservation in the router. Might that be somehow related? It is also constantly connected to my employer's corporate network using a VPN service provided by AT&T.
--
bc
 

An IP conflict would also show in Windows' Event Viewer.

If the machine is flat out unresponsive then it looks like a problem with that machine itself. It might be crashing, and sending garbage down the network while in its crashed state, causing a problem with the network switch.
 
That sounds plausible. Yet there is nothing in /var/log/messages that shows evidence of a crash. How does one debug a Linux crash like this? I'm more of a Windows guy. We've been given these Linux machines for connecting to the company network to improve overall security, but it does complicate things because a lot of us have limited Linux skills (at least as compared to Windows skills).
--
bc
 

Most of the time, a Linux crash will dump a kernel panic error on the console. It might be a good idea to configure your video card not to go into sleep mode, and to leave the screen on a TTY console, so that in case of a kernel panic the text will still be on screen the next time you check.

You can also start doing some of the traditional HW tests, such as running memtest86+ overnight to check for memory errors.
 
Is a "TTY console" just a standard terminal window, or something else?
--
bc
 
Just for closure, I'm happy to report that I think I may have found the problem (using powertop to reveal the culprit) and it has nothing whatsoever to do with the router or firmware. Thanks again Merlin for your advice (and patience). Much appreciated.
--
bc
 
