sfx2000
(disclaimer - I'm just a customer - I have no direct interests with Intel, Netgate, ADI Engineering, or pfSense - I'm sharing this info to clear things up, as the whole C20xx mess isn't very clear)
As many now know - some Intel C20xx series chips have a potential issue where the LPC controller can burn out, resulting in a bricked device within 36 months of service life.
The sensational post was on The Register... that's ok, it's page clicks, and it is a serious issue for some vendors
https://www.theregister.co.uk/2017/02/06/cisco_intel_decline_to_link_product_warning_to_faulty_chip/
The reasonable post was over on ServeTheHome.com - where Patrick Kennedy did a great job, taking the time to reach out to various vendors...
https://www.servethehome.com/intel-atom-c2000-series-bug-quiet/
Putting it outright - see this...
Which is kind of a big deal for some platforms - if the LPC burns out and it hosts the BIOS, then the device doesn't boot...
Seems like a yikes moment, and rightfully so - there are C20xx chips all over the place, so the Reg's clickbait is reasonable here.
pfSense/Netgate/ADI Engineering - Rangeley boxes
Netgate doesn't use those lines out of the chipset - and they've taken actions to protect the chip against potential failures via a coreboot update.
The pfSense branded boxes include the SG-8860, SG-4860, SG-2440, and SG-2220 - plus the Netgate branded RCC-VE devices, which do not have pfSense bundled directly.
Top level - The Netgate/ADI/pfSense boxes do not appear to have the Intel issue with C20xx
pfSense does ship some Supermicro based units that might be impacted - basically, if you're working with pfSense/Netgate, you pretty much know what you have, and when in doubt, reach out to support for clarification.
Direct from the pfSense Community forums... like I mentioned - the ADI designed boxes don't appear to be impacted - comments from jwt below...
One of the things to note is that the LPC bus (including SERIRQ) is not used on RCC-VE (SG-8860, SG-4860, SG-2440), and RCC-DFF2 (SG-2220).
The LPC bus is used on RCC (XG-2758), and all these units have been reworked to implement the fix.
The LPC bus is also used on the affected units from companies including Supermicro, Lanner, HPE, ASRock, and yes, even Cisco.
A design that uses the component in question will place potentially big loads on the signal in question at the board level. Every time the signal transitions from 0 to 1, there is a big current spike through the weak pullup charging the external capacitance (board traces, external loads), plus DC driver requirement for the off-chip inputs, termination networks, etc.
Hypothetically, consider the situation where the on-chip output pullup drivers that are “weak” in a given design would have much, much less loading on our boards. Our design presents zero external load, only the on-chip load. This on-chip load is on the order of 10x lower than a board where the LPC bus is in-use. This lower loading stresses the weak pull-up transistor much less than a design that has all the additional capacitive loading on the signal(s) in question due to the presence of the LPC bus.
Capacitive loading is a thing. Feel free to educate yourself.
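(editorial aside - a rough way to see why the load matters: charging a capacitance C from 0 to V through a pull-up dissipates about ½·C·V² in the pull-up on every rising edge, regardless of how weak it is. The sketch below is only an illustration of that textbook relation. The 3.3 V swing and both capacitance values are hypothetical placeholders picked to match the "on the order of 10x" figure above; they are not measurements from Intel, ADI, or Netgate.)

```python
def pullup_energy_per_edge(c_load_farads, v_swing_volts):
    """Energy dissipated in the pull-up while charging C from 0 V to V: 0.5 * C * V^2."""
    return 0.5 * c_load_farads * v_swing_volts ** 2


# Hypothetical values, chosen only to illustrate a 10x difference in loading.
V_SWING = 3.3        # volts, assumed signal swing
C_ON_CHIP = 5e-12    # farads, assumed on-chip load only (bus not routed)
C_BOARD = 50e-12     # farads, assumed total load with traces and devices attached

e_chip = pullup_energy_per_edge(C_ON_CHIP, V_SWING)
e_board = pullup_energy_per_edge(C_BOARD, V_SWING)
ratio = e_board / e_chip

print(f"on-chip only: {e_chip * 1e12:.1f} pJ per rising edge")
print(f"bus in use:   {e_board * 1e12:.1f} pJ per rising edge")
print(f"ratio:        {ratio:.1f}x")
```

The energy per edge scales linearly with the load, so a 10x lighter load means roughly 10x less stress on the pull-up for every transition, which is the point being made here.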
Rangeley and other embedded communications processors have a rated lifetime characterized for 24x7 usage and 10 years. Desktop CPUs and even Avoton are characterized for an 8 hour workday usage for 5 years. This number actually depends on the SKU in question, but is generally true.
Before you protest that Rangeley and Avoton are the same die: true, but fails to account for the bin sorting/yield management that makes for different SKUs and families. Chips are extensively tested while they're still on the wafers.
One failed core (for any reason) probably means you get a quad core. Out comes the laser to cut the fuses, and *bam*, 4 cores evaporate off the die. More than one failed core, but less than 4: IDK, ask Intel. More than 4 failed cores, or won't run at 2.4GHz plus some margin, and for sure you have a 2 core. QAT part failed, or something else wrong, and they make it an Avoton.
In case I'm not being clear: here's something to think about: They're all the same die, but Rangeley is rated for much longer lifetimes. Ask yourself why.
5 years * 365 * 8 = 14.6k hours. (This number is really 5 x 52 x 5 x 8 = 10,400 hours, but use the higher figure as you wish.)
10 years * 365 * 24 = 87.6k hours.
Capacitive loading is, again, a thing.
Not using the LPC bus is... unusual for an Intel design.
We have zero need for it, so we didn't use it.
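(editorial aside - the rated-lifetime arithmetic in jwt's comments is easy to check. A minimal sketch in Python, using only the figures quoted above; the function and variable names are mine, not anything from Intel or Netgate.)

```python
def rated_hours(years, days_per_year, hours_per_day):
    """Total powered-on hours implied by a duty-cycle rating."""
    return years * days_per_year * hours_per_day


# 8 hour workday for 5 years, counted generously as every day of the year
workday_generous = rated_hours(5, 365, 8)

# Same rating counted as 52 five-day work weeks per year
workday_strict = rated_hours(5, 52 * 5, 8)

# 24x7 for 10 years
always_on = rated_hours(10, 365, 24)

print(f"8h/day, 5 years (365 days):    {workday_generous:,} hours")
print(f"8h/day, 5 years (work weeks):  {workday_strict:,} hours")
print(f"24x7, 10 years:                {always_on:,} hours")
print(f"24x7 vs generous workday: {always_on / workday_generous:.1f}x")
print(f"24x7 vs strict workday:   {always_on / workday_strict:.1f}x")
```

Either way you count it, the 24x7 rating works out to somewhere between 6x and 8.4x more powered-on hours than the workday rating.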
SW updates for Netgate/pfSense on the HW
ADI Engineering has issued a coreboot update - coreboot is basically the BIOS of the RCC-VE/SG platforms (similar to CFE or U-Boot for those on ARM)...
In a nutshell... (caution - pdf link here)
RELEASE ADI_RCCVE-01.00.00.12
Release Date: 03/01/2017
The versions of software components used in this release are:
• SageBIOS: SageBios_Mohon_Peak_292.
• FSP: RANGELEY_FSP_POSTGOLD3.
• microcode: M01406D8125 for B0 stepping.
• Descriptor: ADI unlocked
New Features
• Workaround for Intel C2000 Errata AVR.58
A software workaround for Intel C2000 Errata AVR.50 has been implemented in this release. The workaround disables SERIRQ to prevent indeterminate interrupt behavior for systems that do not have external pull up resistor on SERIRQ PIN.
(sfx - editorial comment - I see what the fix is, but this needs to be cleaned up - it's AVR.54, not .50 or .58)
For current pfSense/Netgate C20xx machines...
If running pfSense - this is an easy update for these boards... they've added a package that automates the update to some degree...
And the same document has manual update steps for the RCC-VE-2*** series if you're running CentOS or another platform outside of pfSense.
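(editorial aside - if you want to script a check for whether a box already has the fixed firmware, one approach is to compare its reported release string against ADI_RCCVE-01.00.00.12 from the release notes. This is a sketch under assumptions: how you obtain the installed string is up to you and not shown, the helper names are mine, and the 01.00.00.09 string in the example is made up for illustration. It only understands the ADI_RCCVE-xx.xx.xx.xx format shown above.)

```python
import re

# Release that introduced the SERIRQ workaround, per the ADI release notes.
FIXED_RELEASE = "ADI_RCCVE-01.00.00.12"

_RELEASE_RE = re.compile(r"ADI_RCCVE-(\d+)\.(\d+)\.(\d+)\.(\d+)")


def parse_release(release):
    """Turn 'ADI_RCCVE-01.00.00.12' into a comparable tuple like (1, 0, 0, 12)."""
    match = _RELEASE_RE.fullmatch(release.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_workaround(installed_release):
    """True if the installed release is at or after the fixed release."""
    return parse_release(installed_release) >= parse_release(FIXED_RELEASE)


if __name__ == "__main__":
    # Both strings below are example inputs; the first is hypothetical.
    for release in ("ADI_RCCVE-01.00.00.09", "ADI_RCCVE-01.00.00.12"):
        status = "has the workaround" if has_workaround(release) else "needs the update"
        print(f"{release}: {status}")
```

Comparing the numeric fields as a tuple avoids the trap of comparing the strings directly, which would go wrong if a field ever grew past two digits.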
Going back to Netgate's initial response...
And it's nice to see that pfSense, Netgate, and ADI Engineering back their equipment and customers...
https://www.netgate.com/blog/clock-signal-component-issue.html
Although most Netgate Security Gateway appliances will not experience this problem, we are committed to replacing or repairing products affected by this issue for a period of at least 3 years from date of sale, for the original purchaser.
A board level workaround has been identified for the existing production stepping of the component which resolves the issue. This workaround is being cut into production as soon as possible after Chinese New Year. Additionally, some of our products are able to be reworked post-production to resolve the issue.
We apologize for the limited information available at this time. Due to confidentiality agreements, we are restricted in what we can discuss. We will communicate additional information as it becomes available.
As always, please be assured we will do the right thing for our customers at Netgate and the pfSense community.