Today's delightful new problem (10gbit/s connection incoming)

So far there's not really that much difference, but then again, I don't really have anything that can push throughput above my previous 1 Gbit/s either, so that's kind of expected.

Only thing I've seen improved so far is the routing internally in the ISP's network, which I believe is related to the change in my service. Just a few days before the change I was, as an example, seeing 13 hops between me and amazon.com; now, after switching up to 10 Gbit/s, it's down to 11 hops. It could also be related to the changes in equipment at the fiber station, which I understood were needed before they could switch me up. As I'm running a Fingbox doing periodic speedtests against M-Lab's servers (measurementlab.net), I've also gone from 16 ms (stable ping on every test for the last two weeks) down to 11 ms against the test server used.

At the moment the next hop out from my local router looks kind of overloaded, with ping times jumping from 1.5 ms up to 75 ms in the worst case, so that probably pulls down performance quite a bit right now. Seeing as my connection outwards should not be "capped" relative to what my hardware can do, I really should be able to push up towards a gigabit/s, but I'm not seeing those numbers and will probably investigate a bit, if the ISP is helpful.

As for the models you're linking: I'm currently leaning towards an 8-core model. It may or may not be the 2146NT configuration, or simply the 2141I. At the moment I'm not really anticipating using the QAT features, so the 20% price increase currently seems like it would be wasted money. My local vendor also has a roughly 50% price increase going from the 8-core 2146NT to the 12-core 2166NT, which puts it a bit outside my current target budget.

Then again, the price of a 2141I is about what a Mikrotik CCR1036 would end up at, and the price point of the 2166NT is a bit above the Ubiquiti EdgeRouter Infinity. I guess it all falls back a bit on the keyword "repurposing": I know my ISP is already testing equipment they will try to hand their customers, and once the Qualcomm IPQ807x products start hitting the market for router use, I kind of anticipate the existing hardware solutions will become obsolete. Going with a Xeon D-2100 or similar, at least it can be repurposed for almost any use case, which would be harder to do with a Mikrotik or Ubiquiti device.
 
Hi @Magebarf - thanks for following up. When I looked today I saw a vendor that had the 2166NT for ~33% more than the 2146NT, which seemed reasonable to me (there was a much larger jump to the 16 Core version though).

The Qualcomm IPQ807x series of chips does look interesting and boasts some impressive specs:

https://wikidevi.com/wiki/Qualcomm#ax_2

However, I'm not sure if they will necessarily make other options obsolete. Once products with these chips come out, I'll be very curious to see some benchmarks - and I'd be looking for packets per second numbers running a realistic workload (something like an IMIX test). I suppose with enough optimization and using the right tools, maybe these devices will be able to push 3 - 4 Million+ packets per second across the firewall to get close to 10Gbit from IMIX traffic. Can't wait for some tests and benchmarks :)

I do see your point about repurposing though - if you are just looking to set up a simple network with multi-gigabit support, it may not make sense to spend a ton of $$$ on a Supermicro system now if you plan to switch to a Qualcomm based solution down the road (assuming these perform up to par at 10Gbit WAN speeds). The Supermicro 5018D-F8NT I have now running pfSense is a good bit cheaper than the 21x6NT offerings and will probably get you almost halfway there, i.e. the best I have seen is ~1.4 - 1.5 million packets per second across the firewall, which with an IMIX load implies a throughput close to 4 Gbit/s. Just some additional things to think about as you make your decision - I'll be curious to hear what you ultimately decide on.
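To show where those figures come from, here's a rough back-of-the-envelope sketch (my assumptions: the common "simple IMIX" mix of 7x64 B, 4x594 B and 1x1518 B Ethernet frames, plus 20 B of preamble/inter-frame gap per frame on the wire - real traffic will differ):

```python
# Back-of-the-envelope IMIX <-> packets-per-second arithmetic.
# Assumptions: "simple IMIX" frame mix of 7x64 B, 4x594 B, 1x1518 B
# and 20 B of preamble + inter-frame gap per frame on the wire.

IMIX_FRAMES = [(64, 7), (594, 4), (1518, 1)]   # (frame size in bytes, weight)
WIRE_OVERHEAD = 20                              # preamble + inter-frame gap

total_weight = sum(w for _, w in IMIX_FRAMES)
avg_wire_bytes = sum((size + WIRE_OVERHEAD) * w
                     for size, w in IMIX_FRAMES) / total_weight

def pps_for_throughput(gbit_per_s):
    """Packets per second needed to fill a given line rate with IMIX traffic."""
    return gbit_per_s * 1e9 / (avg_wire_bytes * 8)

def throughput_for_pps(mpps):
    """Approximate Gbit/s carried by a given Mpps of IMIX traffic."""
    return mpps * 1e6 * avg_wire_bytes * 8 / 1e9

print(f"average IMIX frame on the wire: {avg_wire_bytes:.0f} B")
print(f"10 Gbit/s of IMIX ~ {pps_for_throughput(10) / 1e6:.1f} Mpps")
print(f"1.5 Mpps of IMIX  ~ {throughput_for_pps(1.5):.1f} Gbit/s")
```

That lines up with the 3-4 Mpps figure for 10 Gbit above, and puts ~1.5 Mpps somewhere around 4-4.5 Gbit/s depending on how much framing overhead you count.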
 
Yep, there are many variables intertwined...

With regards to the IPQ807x series: on the page you linked there was a link to the datasheet for the 8078, which is where I'm assuming you need to go.
The document was watermarked and not supposed to be out in the open, which is why that link is dead by now.

From memory, I recall the numbers were a routing capacity of about 25 Gbps at 64 bytes, which is in the ballpark of the 35-40 Mpps listed (25 Gbps divided by roughly 84 bytes per 64-byte frame on the wire works out to about 37 Mpps).
How those numbers are affected by NAT etc. is a lot harder to know, but the device is supposed to have serious oomph, even overshooting what today's dream use case would be.
In addition to that, there was also a chip doing crypto acceleration, specced for multiple gigabit/s of VPN traffic.
So, the IPQ series is definitely not to be disregarded, but seeing as I haven't heard anyone announce a routing product - besides ISPs themselves going for subscription models - I'm unsure when to expect them.
 
Those are some very impressive specs indeed - I assume they have a custom ASIC then to offload common packet sizes to (or maybe just 64 byte?) that will provide the acceleration? I'm still a little skeptical how all this will work in a real world mixed packet size use case, but I'm ready to be impressed :). I don't really know what kind of pps we could expect from a XeonD 2146NT or 2166NT based setup running pfSense, but I hope that it would approach 3 Million pps. Another interesting site to check out with some PPS benchmarks on various FreeBSD based system configs would be the BSD Router Project page.
 
Lots to digest here...

If this were the case, my CCR would be slower than the current line of asus routers.

the CCR is a good device - and a good example of systems engineering - put resources where they can benefit.

That being said - the Broadcom SOHO platform is remarkably good for a given purpose, and so are QCA, Marvell, and to some degree MediaTek (ex-Ralink) and Realtek.

Those SOHO platforms do get bound up, though, when considering the loads that the CCR can carry.

Also, PCIe latency isn't as high as you think, especially now that the controller is located on the CPU rather than the chipset. IPC is king, and performance is based on the total MIPS (not the architecture) - this is where the MIPS CPUs have the advantage - but more development being done on ARM, and cheaper ARM chips that provide more total clocks than MIPS, is what makes them get chosen, along with other SoC features like WiFi, PCIe, USB 3 and more.

at 10Gbe - latency obviously matters - and while IPC is important, clocks rule...

AMD has done a great job with the Zen platform - and I'm looking forward to seeing higher clock speeds - Epyc is still limited to the sub-3GHz range at the moment...

Going into the MIPS/ARM thing - I always thought that MIPS was a better solution with threads and processes - that being said, ARM did a better job getting SoC vendors on board.

Even one of the better MIPS guys - Cavium, moved over to ARM with ARMv8....

You also have marketing: phones use ARM and people are hooked on the CPU, so using it in your router sells. I mean, it runs software quite decently - only slower than MIPS in IPC where routing, NAT and QoS are concerned, but at the other end of the spectrum, for math and floating point, ARM rules.

Going to the Broadcom AC1900 platforms - there's no FP or SIMD support there... here are the features reported by the BCM4708, a dual-core Cortex-A9. Many on this forum are familiar with it, as it's the core SoC for the RT-AC68U series, along with the well-known Netgear R7000:

Features : swp half thumb fastmult edsp

It can't even do division - a Cortex-A on ARMv7 needs VFP3/4 or NEON to do that, and there it needs to be a float via VFP, which the BCM4708 lacks...
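Those flags come straight from /proc/cpuinfo, by the way. If anyone wants to check what their own box reports, here's a minimal sketch (assuming a Linux system with Python available, which stock router firmware usually doesn't have - on the router itself you'd just cat /proc/cpuinfo and eyeball the Features line):

```python
# Read the CPU feature flags the kernel reports and check for FP/SIMD.
# On a BCM4708-class SoC the Features line has no vfp/neon entries at all.

def cpu_features(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            if line.lower().startswith(("features", "flags")):  # ARM / x86
                return set(line.split(":", 1)[1].split())
    return set()

feats = cpu_features()
print("reported:", " ".join(sorted(feats)) or "(nothing found)")
for flag in ("vfp", "vfpv3", "vfpv4", "neon", "asimd"):
    print(f"{flag:>6}: {'yes' if flag in feats else 'no'}")
```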

So that puts the math argument to rest, as we know that the Broadcom AC1900 platforms can switch/route well enough, and even do complex things like OpenVPN with OpenSSL to a certain degree..

But yet, the RT-AC68U and R7000, years after introduction, are still considered good devices for a SOHO network.

ARM is actually more efficient or faster than x86 - and this is where x86 does lose out if you compare the server ARM variants - but router manufacturers want to provide all-in-ones, and MIPS performs those other functions poorly.

I wouldn't say it's more efficient than MIPS or x86 - at the SOHO scale it really was about clocks. ARM was faster than MIPS, and while Intel said much about LPIA (aka Bonnell, aka 1st-gen Atom), ARM's Cortex-A9 was smaller than x86 Bonnell, and the fabs spoke...

They kept cranking out MIPS based SoC's as needed until the OEM's said enough...

This is why ARM was chosen: it had more development than MIPS, and it ran the other stuff consumer manufacturers wanted to implement much better. If you run a file server on OpenWrt with a 400 MHz single-core MIPS, you'd get about 2 MB/s; run it on the AC68U (stock 800 MHz dual-core ARM) and you'd get around 40 MB/s, whether with FAT32 or the more efficient ext2 Linux file system (itself more than 4x faster). So that is why ARM was picked - not because it was faster at networking than MIPS; it's only about 50% slower in that regard (IPC), compensated by the fact that you can get much higher clocks and core counts than MIPS on the low end (about 4x more total clocks combined).

Not sure where you're going with this statement - except that in the NAS space - MIPS wasn't really a factor there in the long term - ARM did out-execute them from both a business and engineering perspective...

Some insight into a project from more than 10 years back... we were looking at doing something really interesting... notice that MIPS wasn't in the picture - and we're talking the 2005-2006 timeframe - ARM was already interesting, PPC was strong, and Intel had their nose in the tent, so to speak.
  • ARM - i.MX from Freescale, OMAP from Texas Instruments, IXP from Intel, and Marvell's Kirkwood - Qualcomm's Scorpion was slideware back in the day...
  • PPC - the QorIQ chip was very interesting (also from Freescale/Moto)
  • x86 - the only candidate was LPIA at the time - which meant Bonnell, aka first-gen Atom
We ended up going with the Marvell Kirkwood - which probably explains to this day why I'm still a bit biased towards Marvell ;)

anyways - bringing this discussion full circle...

on x86-64 - AMD has made some fantastic moves with the Zen arch - huge step forward, and on certain tasks, they more than outperform Intel...

Intel has a serious challenge with AMD, and I grant that. That being said, there are certain use cases - and these are relevant to the discussion here - where Intel and the Skylake-based Xeons still prevail, and a lot of this is due to pure clock speed. It's up to AMD to push the clocks; so far they're more focused on how many threads they can offer rather than fewer cores at higher clocks, and the Zen fabric does have some challenges with core vs. memory vs. which bus to move traffic across. The Epycs do have a surplus of bandwidth, and deserve a fair amount of attention as AMD scales the platform.

Going back to PacketShader - it was an interesting concept. These days both AMD and nVidia are focused more on DL/ML, and there the effort is paying off; AMD's x86 side is focused in other places (we have GPUs), and Intel has their own angle with the AVX-512 instructions (Xeon Phi seems to be a bust - they've deprecated that platform, it seems). Certain Xeon SKUs with low core counts and very high clock rates are very interesting...

Where Intel does still have an advantage is, obviously, AVX-512 where it matters - AVX isn't just about math, it's about moving data across the core and platform - plus OmniPath on-chip with certain SKUs, and of course DPDK and QAT. AMD doesn't have an answer there; it's not a priority for them at the moment.
 
Those are some very impressive specs indeed - I assume they have a custom ASIC then to offload common packet sizes to (or maybe just 64 byte?) that will provide the acceleration? I'm still a little skeptical how all this will work in a real world mixed packet size use case, but I'm ready to be impressed :). I don't really know what kind of pps we could expect from a XeonD 2146NT or 2166NT based setup running pfSense, but I hope that it would approach 3 Million pps. Another interesting site to check out with some PPS benchmarks on various FreeBSD based system configs would be the BSD Router Project page.

In addition I'm also thinking I could try to experiment and build my own setup using CentOS and VPP with the NAT plugin.
If nothing else just the understanding of how VPP works would be useful, especially if TNSR is any good indication of the current trends.
 
at 10Gbe - latency obviously matters - and while IPC is important, clocks rule...

I know this has been part of the discussion for quite a while here, so it probably would have been a good idea to ask this earlier, but better late than never I guess. :)

Exactly what do you mean when saying "clocks"?

My assumption is that we're purely talking CPU frequency, in raw hertz. Ticks per second. Is this correct?

The main advantage this brings is that it's the most direct way to increase performance, as IPC may or may not benefit a specific use case, whereas an increase in frequency benefits the whole package and thus all scenarios? In addition, higher frequency also directly lowers the time between context switches and the servicing of interrupt routines, thus operating at a lower latency?

Or have I misinterpreted exactly what concepts are being put into the term of "clocks"?
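To make the question a bit more concrete, here's the kind of back-of-the-envelope I had in mind (purely illustrative clock/packet-rate pairs, nothing measured):

```python
# Rough per-packet cycle budget for one core: how many CPU cycles are
# available per packet at a given core clock and packets-per-second target.
# The clock/rate pairs below are purely illustrative.

def cycles_per_packet(clock_ghz, mpps_per_core):
    return clock_ghz * 1e9 / (mpps_per_core * 1e6)

for clock_ghz, mpps in [(1.2, 1.0), (2.0, 1.5), (3.0, 1.5), (4.6, 3.0)]:
    budget = cycles_per_packet(clock_ghz, mpps)
    print(f"{clock_ghz:.1f} GHz core at {mpps:.1f} Mpps -> {budget:.0f} cycles/packet")
```

The way I read it, higher clocks (or better IPC) buy a bigger cycle budget per packet, while more cores buy more packets in parallel - provided the NIC and driver can spread flows across them. Is that roughly the right mental model?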
 
Lots to digest here...

...
PPC is one of the best choices, but more development on ARM just meant they didn't get picked. Apple did use them in the past, and that's saying something. I do like Sun SPARC though :p

As I was saying about clocks, my 1.2 GHz 36-core TILE does pretty well compared to 2 GHz or faster ARMs. Between a 32-core Epyc at 2 GHz and an ARM server at 3 GHz that only has 4 cores, which will be faster at networking?

Latencies and clocks are different things. A lot of busses are parallel, meaning they don't rely on clocks, and the NIC is parallel as well, so it's much more different than you think. You don't need a 10 GHz CPU to route 10 Gb/s, do you? I mean, look at switch CPUs: they are massively parallel, low-clocked CPUs with massive parallel busses.
 
PPC is one of the best choices, but more development on ARM just meant they didn't get picked. Apple did use them in the past, and that's saying something. I do like Sun SPARC though :p

For the project at hand, the PPC was just too expensive... Kirkwood met all the project requirements...
 
Also got a heads up that the Netgear Nighthawk X10 has been updated to be able to use the SFP+ port as WAN port: https://kb.netgear.com/000058223/R9000-Firmware-Version-1-0-3-16

Could this be the 1st consumer-level wireless router that could potentially support >1Gbps WAN?

Understand this was never its intended purpose, since this feature was added after the fact via firmware, and I don't believe Tim benchmarked the SFP+ throughput as part of his original review here.

Would expect more consumer-level wireless routers to slowly begin supporting >1Gbps WAN throughput 'out of the box' as more folks are able to transition over to 10GE service...
 
For the project at hand, the PPC was just too expensive... Kirkwood met all the project requirements...
Yeah, both MIPS and PPC cost a lot more than ARM for the performance, not to mention the development and support as well. As I was also saying, clocks aren't king here either, as clocks between archs aren't equal in many ways; there are plenty of other differences between architectures, like their internal busses and more.

For example, the TILE has low internal latency from the NIC all the way to the CPU. Same with PCIe ports connected to the CPU (one good thing about some of the SoCs in routers too). On Epyc the PCIe ports are connected directly to the CPU, but Intel's Xeon/high-performance equivalents have some PCIe ports on the CPU and some on the chipset, and it gets quite confusing.

There are plenty of differences, and latency actually doesn't matter much internally because of the way routing works: pushing those packets requires internal bandwidth more than latency, and in any case the latency of any recent x86 far outclasses the smaller SoCs like ARM and MIPS. Clocks aren't everything either, as I was saying. LTT has been using servers with many-core, multi-socket Xeons at 2+ GHz pushing multiple 10GbE for storage, video editing and so on.
 
In addition I'm also thinking I could try to experiment and build my own setup using CentOS and VPP with the NAT plugin.
If nothing else just the understanding of how VPP works would be useful, especially if TNSR is any good indication of the current trends.

I looked into VPP a little bit today and it looks like a very powerful platform that offers significant speed increases on basic commodity hardware. The documentation on the fd.io Wiki also looks pretty good and there is even a small tutorial on how to setup a home gateway. Of course, the only thing a VPP based custom setup won't have is a nice UI to configure everything, but since you are already looking into VPP as a viable option, I assume you won't need that :). The one thing that left me wondering though: If you use VPP for routing/NAT, what are you planning to use as your firewall, and how would this impact packet throughput?

Going back to a BSD-based pfSense setup: instead of going with a Supermicro-based Xeon D 2146NT or similar, what would prevent you from just building your custom firewall using something like an i7-8700, i7-8700K, or i7-8086K? The 8086K has 6 cores and boosts up to 5 GHz (so a very fast clock). The i7-8700 looks like a really nice option with a high clock speed (3.2 GHz base, 4.6 GHz turbo), 6 cores, but only a 65 W TDP. You could put that together with a dual 10Gbit Ethernet adapter (e.g. Intel X540/X550) and a dual 10Gbit SFP+ adapter (e.g. Chelsio T520). Having said that, the only issue I see there is fewer PCI Express lanes - 16 in this kind of setup vs. 32 in the Xeon D-2100 based setups. Would that cause a huge performance detriment in your use case? I guess it depends to an extent on the network topology and network demands. I just figured that a higher-clocked multi-core i7 might be able to drive more packets per second and maybe even come in a bit cheaper in terms of budget. Plus, it could be repurposed easily later as a desktop/server.
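On the PCIe lane question, a rough sanity check (my assumptions: roughly 985 MB/s usable per PCIe 3.0 lane after encoding overhead, and each adapter sitting in an x8 electrical slot):

```python
# Rough PCIe bandwidth check for a 16-lane desktop CPU driving two
# dual-port 10GbE adapters in x8/x8 slots.
# Assumption: ~985 MB/s usable per PCIe 3.0 lane (after 128b/130b
# encoding, before protocol overhead).

PCIE3_LANE_MB_S = 985          # usable MB/s per PCIe 3.0 lane (approx.)
slot_lanes = 8                 # each adapter in an x8 electrical slot
ports_per_nic = 2
port_gbit = 10

slot_bw_gb_s = slot_lanes * PCIE3_LANE_MB_S / 1000   # GB/s per direction
nic_need_gb_s = ports_per_nic * port_gbit / 8        # GB/s one direction at line rate

print(f"x8 slot bandwidth : ~{slot_bw_gb_s:.1f} GB/s per direction")
print(f"dual 10G ports    : {nic_need_gb_s:.1f} GB/s per direction at line rate")
print(f"headroom per slot : ~{slot_bw_gb_s / nic_need_gb_s:.1f}x")
```

So the x8/x8 split itself shouldn't be the bottleneck for four 10G ports; the more practical question is whether a given board actually wires two slots as x8 electrical rather than x16/x4.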
 
I looked into VPP a little bit today and it looks like a very powerful platform that offers significant speed increases on basic commodity hardware. The documentation on the fd.io Wiki also looks pretty good and there is even a small tutorial on how to setup a home gateway. Of course, the only thing a VPP based custom setup won't have is a nice UI to configure everything, but since you are already looking into VPP as a viable option, I assume you won't need that :). The one thing that left me wondering though: If you use VPP for routing/NAT, what are you planning to use as your firewall, and how would this impact packet throughput?

Yes, those are parts that would require a lot of consideration, which is why this would be a skunkworks project behind a more tried and tested solution in a virtualized environment.
As for a more concrete answer, I don't fully know yet. I know there are flow tables and security groups available to configure in VPP, although they would definitely impose some performance hit. I did see an article on hardware acceleration for the security groups feature, and I would have to do some research to see whether that has made its way into the code base or not.

The lack of a GUI is not that bad while things are in an experimental state and it's only me doing this. There are some tools that work from configuration files, which should be quite useful for small adjustments etc. If I get things running so well that it's time to make a more generalized distribution, however, that's when a UI for this type of thing would become a necessity.

I did also find FRINX (https://frinx.io/) and a lot of interest from OpenStack, so with all the work that seems to be going on around VPP at the moment, it is definitely a bit of a moving target.

Going back to a BSD-based pfSense setup: instead of going with a Supermicro-based Xeon D 2146NT or similar, what would prevent you from just building your custom firewall using something like an i7-8700, i7-8700K, or i7-8086K? The 8086K has 6 cores and boosts up to 5 GHz (so a very fast clock). The i7-8700 looks like a really nice option with a high clock speed (3.2 GHz base, 4.6 GHz turbo), 6 cores, but only a 65 W TDP. You could put that together with a dual 10Gbit Ethernet adapter (e.g. Intel X540/X550) and a dual 10Gbit SFP+ adapter (e.g. Chelsio T520). Having said that, the only issue I see there is fewer PCI Express lanes - 16 in this kind of setup vs. 32 in the Xeon D-2100 based setups. Would that cause a huge performance detriment in your use case? I guess it depends to an extent on the network topology and network demands. I just figured that a higher-clocked multi-core i7 might be able to drive more packets per second and maybe even come in a bit cheaper in terms of budget. Plus, it could be repurposed easily later as a desktop/server.

This could be a way to go as well, but I'm not all that worried about repurposing a Xeon D or an Atom C either; I'll definitely be needing an upgrade to my NAS or something similar anyhow. The repurposing concern was more related to buying Mikrotik or Ubiquiti equipment, even though they may be slightly cheaper than some options for a homegrown router solution. No matter which solution I go with (homegrown or COTS), I do anticipate that a few years down the line it will be interesting to replace.

I would have to do some calculations on what a beefily configured i7 would end up costing, and also weigh in power consumption etc. ECC, or the lack thereof, is something I haven't even thought about in the context of a software router.
 
The most important factor in getting 10 Gb/s internet with processing is really just internal bandwidth. Remember that a CPU's rating implies a maximum memory controller speed, and the memory settings matter as well. So it doesn't matter whether you use an i7, a many-core Atom-based Xeon, or even a Mikrotik CCR: the internal bandwidths matter most. As long as you have enough total CPU power and sufficient internal bandwidth, you'll do 10 Gb/s fine.
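As a rough illustration (my own numbers, assuming dual-channel DDR4-2400 at its theoretical peak and that each routed packet crosses the memory bus a few times for RX DMA, CPU processing and TX DMA):

```python
# Compare theoretical dual-channel DDR4-2400 bandwidth against the memory
# traffic of a 10 Gbit/s stream, assuming each packet crosses the memory
# bus a few times (RX DMA, CPU touch, TX DMA, ...). Illustrative only.

ddr4_2400_channel_b_s = 2400e6 * 8        # bytes/s per 64-bit channel
mem_bw_b_s = 2 * ddr4_2400_channel_b_s    # dual channel, theoretical peak

wire_b_s = 10e9 / 8                       # 10 Gbit/s in bytes/s, one direction
for crossings in (2, 4):
    share = crossings * wire_b_s / mem_bw_b_s * 100
    print(f"{crossings} memory crossings/packet -> "
          f"{share:.1f}% of a {mem_bw_b_s / 1e9:.1f} GB/s theoretical peak")
```

Those are theoretical peaks, of course - what you can actually sustain under small-packet DMA load is a good deal lower, which is exactly why those internal paths deserve the attention.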
 
So I have a question. If you only use IPv6, can you connect a 10-gig layer 3 switch directly, without NAT? I am not the best at IPv6, but at some point you could do away with NAT using IPv6. Could this be a solution? Your network design would be a little strange in how you use all the 10 gig. You could do a bunch of LAGGs.

So the way it would work is that you create a VLAN with a point-to-point outside IP address. You use the layer 3 switch to route all your local traffic to the point-to-point IP address, which would also be your default route.
 
So I have a question. If you only use IPv6, can you connect a 10-gig layer 3 switch directly, without NAT? I am not the best at IPv6, but at some point you could do away with NAT using IPv6. Could this be a solution? Your network design would be a little strange in how you use all the 10 gig. You could do a bunch of LAGGs.

So the way it would work is that you create a VLAN with a point-to-point outside IP address. You use the layer 3 switch to route all your local traffic to the point-to-point IP address, which would also be your default route.

That would be doable, in case the ISP provides a fully native IPv6 setup. With something like 6rd or other tunneling, you would still get stuck with issues.
In addition, somewhere in the chain, reaching IPv4-only resources from IPv6 would become a severe bottleneck.
 
I was thinking of the case where you only use IPv6 and no IPv4.

So now you've got me thinking. Could you hook up several gig routers to the IPv6 switch, where the routers do 6to4 tunnels? This way you could have IPv4 also. IPv4 would only exist within the router or routers; everything outside the routers would be IPv6-only in the layer 3 switch.

I may have the tunnel backwards.
 
So now you've got me thinking. Could you hook up several gig routers to the IPv6 switch, where the routers do 6to4 tunnels? This way you could have IPv4 also. IPv4 would only exist within the router or routers; everything outside the routers would be IPv6-only in the layer 3 switch.

Hmm... got me to thinking - if I were an ISP that was pushing 100 percent IPv6, I would likely also deploy CGN for the IPv4 apps that are not IPv6 native (yes, there are still many apps* that need IPv4 and likely will never be updated) - and CGN solves a lot of the IPv4 public IP management issues...

CGN is getting more common, and like NAT, it's a bit ugly from an architecture perspective.

* gaming consoles are a good example here...
 
