having the most peculiar issue

lordv

Occasional Visitor
I think I have narrowed this down to the router, but I would like to submit my issue to you guys for your perusal.

This issue just cropped up recently (I can't recall what day exactly. I would have been at least on build 42. I am on build 43 now.) and it took me a while to figure out exactly what was happening. I thought it was spanning tree for a while, but it turns out that it wasn't. Time to explain.

So I have two switches. I replaced my 8-port Netgear switch with a 16-port Cisco SG100-16. Both switches are dumb (unmanaged) switches. I have a server that is doing DHCP and DNS duty for the network, and both of those services are disabled in the router. I am going to try my best to draw out my network with text.

modem ->       router
              /     |
DHCP/DNS server   switch

Some of the clients are plugged into the router's switch ports and some are connected to the switch. I have a total of 9 items (including the switch) that would be plugged in, spread between the router and the switch. I typically keep one router port open, due to the placement of the router, just in case I need to troubleshoot something.

That said, that means there are 3 things (including the switch) plugged in to the router. The rest is plugged in to the switch. Now, on the switch if I have 6 things plugged in all internet traffic stops (see my pings in the paste bin below). And by all traffic I mean I can't get to any of the LAN resources or any WAN resources.

If I unplug just one item from the switch (it doesn't matter what it is), all traffic resumes like normal. And I mean immediately. The moment item X is unplugged, everything resumes.

I have tried plugging things into different ports on the switch and tried unplugging different items. I also switched back to my 8-port Netgear switch, thinking the switch was the problem, but the behavior was the same.

I had initially thought it may have to do with spanning tree (which was enabled), but it turns out it wasn't because I disabled spanning tree and the behavior was the same.

I also tested by plugging another device into the empty port on the router with everything else plugged in (all 9 devices), and the same thing happened, which further leads me to believe the issue is with the router. If I am wrong, please correct me. My router is showing 11 clients connected now, most of which are wired (8 to be precise) and the rest are wireless.

So here are the pings, I have truncated them quite a bit because they were running for a long time, but this will help give you an idea. To be clear, we have pings from server (plugged into router directly) to the internet (google.com) and to a client (which is plugged in to the switch). We have pings from another server (which is plugged into the switch) to the internet (google.com) and to another client on the network (plugged into the router) and one client pinging the internet while on wifi.

Client pinging Google while on wifi: I plugged in the device at line 3. At line 4 you can see the timeouts start. I unplugged it at line 30. You then see REALLY high ping times before they drop back to normal by line 38:

http://fpaste.org/109897/

Server connected to switch pinging google. The device was plugged in at line 3 then the ping didn't appear to do anything. It just stopped. Once I unplugged the device everything continued like normal. You can see how it goes from a 19ms time to 32253ms and other REALLY high times (I wasn't seeing this on my screen) until line 15 when the device was unplugged.

http://fpaste.org/109898/

You can see the same behavior here. This time it was the same server pinging another device on the LAN (this device is attached to the switch). You see really low ping times then they shoot WAY up (the super high times were not displayed on my screen until I unplugged the device) then they drop back down:

http://fpaste.org/109899/

In this paste you see the DHCP server (connected to the router) pinging the other server (connected to the switch) and we see the same behavior. LAN type ping times, they then shoot WAY up when another device is plugged in, then you see them drop back down once the device is unplugged.

http://fpaste.org/109900/

And finally the DHCP server (connected to the router) pinging google. You can see it shoot way up then drop back down when the device is unplugged.

http://fpaste.org/109901/
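For anyone sifting through long ping logs like the ones above, here is a minimal Python sketch that flags very slow replies and skipped sequence numbers. It assumes Linux iputils-style reply lines (`icmp_seq=... time=... ms`); the function name `flag_slow_pings` and the 1000 ms threshold are illustrative choices, not taken from the pastes.

```python
import re

# Assumed iputils-style reply line, e.g.
# "64 bytes from 8.8.8.8: icmp_seq=5 ttl=54 time=32253 ms"
REPLY = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+) ms")

def flag_slow_pings(lines, threshold_ms=1000.0):
    """Return (slow, missing): replies slower than threshold_ms as
    (seq, ms) pairs, and sequence numbers that never got a reply."""
    slow, seen = [], []
    for line in lines:
        m = REPLY.search(line)
        if not m:
            continue  # header, timeout, or summary line
        seq, ms = int(m.group(1)), float(m.group(2))
        seen.append(seq)
        if ms > threshold_ms:
            slow.append((seq, ms))
    if not seen:
        return slow, []
    missing = sorted(set(range(min(seen), max(seen) + 1)) - set(seen))
    return slow, missing
```

Feeding it the saved output of a long-running ping (one line per list element) gives a quick summary of where the stall began and ended.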

I have tried everything I can think of, but I just don't know what to make of my findings. I tried unplugging different devices to see if it was one specific device causing the issue (it was not device specific). I tried plugging in another device to the router to see if that would cause the issue (it did). The cables are all in good shape. The only thing I could think it could possibly be is the firmware or the router hardware itself. Is there any sort of debugging I can do to help figure this out? Would switching to the stock ASUS firmware potentially resolve things? Maybe DD-WRT? Tomato? I would rather not have to switch to dd-wrt or tomato if I can avoid it. At any rate, please comment with your thoughts.
 
Bonus question: Can I roll back to earlier firmware versions without issues? I am thinking I would like to try that and see if the firmware is the problem.
 
FWiW...
When I first employed the ASUS in my network (which looks similar to yours, except that my switches are smart-managed) I experienced about the same issues.

My suspect was spanning tree as well. I had it enabled in the switches and the router (which is the default setting for my switches and the ASUS internal switch, I believe).

Disabling this did not make the effect go away.
I ended up resetting all network components/switches and physically disconnecting their power.
After that everything picked up and is going smoothly ever since (with spanning tree enabled/default).
 
Thank you for the reply Ford. You are truly one froody dude. Interesting. I haven't tried physically disconnecting the power from the devices, but I have powered everything down and brought it back up.

Does anyone else have any comments?
 
No idea, besides restarting every device on your network (router, switches and clients) in case there's something odd going on at the ARP caching level.

Check system logs for reports of any IP or hostname conflict (in case you had a mixture of fixed and DHCP IPs, with a DHCP scope overlapping those static IPs).
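As a quick sanity check for the overlap scenario described above, something like the following Python sketch lists any static addresses that fall inside the DHCP pool. The function name `statics_inside_pool` and the example addresses are hypothetical; substitute the pool range and static assignments from your own DHCP server.

```python
from ipaddress import ip_address

def statics_inside_pool(pool_start, pool_end, static_ips):
    """Return the static IPs that fall inside the DHCP pool range
    (inclusive), i.e. addresses the server might also hand out."""
    lo, hi = ip_address(pool_start), ip_address(pool_end)
    return [ip for ip in static_ips if lo <= ip_address(ip) <= hi]

# Hypothetical example values, not taken from this thread.
conflicts = statics_inside_pool(
    "192.168.1.100", "192.168.1.200",
    ["192.168.1.10", "192.168.1.150"],
)
print(conflicts)
```

An empty list means no statically configured address can collide with a lease from that pool.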
 
I will take a look. I presume I can view a syslog on the router itself as well? I haven't ssh'd in and snooped around the router yet. I have been keeping an eye on the syslog for the DHCP/DNS server, though. I am probably going to roll back a couple of builds and see what happens. There is no harm in rolling back to a previous build, right? To clarify that question: bricking should not occur, right?
 
So you see this when a device connected to the switch pings another device connected to the same switch. Broadcast storm?

Try running wireshark or tcpdump on one of those devices and see if/what you're flooding your network with.
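If eyeballing the capture is tedious, a rough Python sketch like this one counts Ethernet broadcast frames per source MAC, which is one way to spot a device flooding the LAN. It assumes a classic `.pcap` file with an Ethernet link type (what `tcpdump -w` writes; recent Wireshark versions default to pcapng, which this does not handle), and `count_broadcasts` is just an illustrative name.

```python
import struct
from collections import Counter

BROADCAST = b"\xff" * 6  # destination MAC ff:ff:ff:ff:ff:ff

def count_broadcasts(pcap_bytes):
    """Count Ethernet broadcast frames per source MAC in the raw
    bytes of a classic (microsecond-resolution) pcap capture."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    counts = Counter()
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: ts_sec, ts_usec, captured length, original length
        _sec, _usec, incl_len, _orig = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset)
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        if len(frame) >= 12 and frame[:6] == BROADCAST:
            counts[frame[6:12].hex(":")] += 1  # source MAC (Python 3.8+)
    return counts
```

Calling `count_broadcasts(open("capture.pcap", "rb").read()).most_common(5)` (with `capture.pcap` standing in for your own file) would show the five chattiest broadcasters.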
 
Reverting to an older firmware should be fine for testing purposes. If you decide to stick with an older firmware, however, it might be better to do a factory default reset and reconfigure it, to ensure you are using the optimum low-level parameters for the wifi radios.
 
Could be connected to the same switch, could be the router, could be an external site. It also occurs if I plug in the final device into the router, not just the switch. If it were a broadcast storm, why did this never occur in the past (even without equipment changes)? That is the first question that comes to mind. Not saying that it isn't a broadcast storm, just odd that it would happen all of a sudden and with two different switches from two different vendors (which is what leads me to think it is something with the router/firmware). I should grab a pcap with Wireshark to try and figure out what the heck is going on, but the first thing I am going to do is try rolling back the firmware. I just have this sneaking suspicion that it may be firmware related. If I roll back and it still occurs then I will break out wireshark and do a pcap.
 
I agree with all your points and that it's probably the router/firmware. But as none of it makes any sense, looking at a wireshark pcap might be a quick way of identifying the source of the problem.
 
Good point :). If I get some time this weekend I will have a go. I don't want to break things on a weeknight if I can avoid it. I work on computers/networks all day, don't want to come home and do the same thing. LEL. I will report back with the results.
 
Well...WTH. This weekend I plugged in the final device again and everything still worked just fine. I did take a pcap, but nothing looked out of the ordinary (which I suppose was to be expected). My guess is that it was probably a broadcast storm tied to DHCP. I have long been contemplating ditching the whole DHCP/DNS server that I have in place and just letting the router handle it. That is still on the table.

On a related note, Merlin, will you be able to add in any other dynamic DNS services? Or provide users the option to just add their own in a future firmware upgrade?
 
No plan to. Everyone has his own favorite service, and trying to keep up with all of these + any change from their providers would be a full blown project on its own. Those requests should rather be sent to the ez-ipupdate developer.
 
