Solved Fixed! - ASUS XT8 - Wrong FW flashed - Rescue not available - CFE OK

fleckens

Occasional Visitor
Hi all

I'm hoping someone can help me please...

I have an ASUS router that looks like someone (not me) has flashed it with the wrong FW. The symptom is that the LED goes solid white, then turns solid green and stays like that.

I've opened it up, connected an RS232 serial adapter to the board and compared the output with a known good board. This confirms my suspicion of a wrong flash, see the attached image for a side-by-side comparison.

Pressing the reset button resets the device, but holding it down does not enter rescue mode; the light will not turn purple.

Long-pressing the WPS button at power-on didn't get me into rescue either, but it does get me to a CFE command prompt.

It seems it doesn't get far enough to start the network stack, so it's not visible to the PC running recovery.


If I just let it boot normally, I only see these errors repeating over and over once it has finished booting and settled down.

MDIO Error: MDIO got failure status on phy 24
MDIO Error: MDIO got failure status on phy 24
MDIO Error: MDIO got failure status on phy 24
MDIO Error: MDIO got failure status on phy 24
MDIO Error: MDIO got failure status on phy 24
MDIO Error: MDIO got failure status on phy 24
MDIO Error: MDIO got failure status on phy 24
MDIO Error: MDIO got failure status on phy 24

As I have a CFE command line and can interact with it, is it possible to upload a NAND flash image this way?

Thank you.

1727260162440.png
 
OK.... I got impatient and unsoldered a known good NAND from a donor XT8 which is dead and replaced the one in mine. All good I thought, work looks clean, no shorts and each pin inspected for good physical connection.

I now get the white light turning to blinking green as expected but then it boot loops. The serial out shows lots of repeated MDIO errors and then it reboots/loops.

MDIO Error: MDIO got failure status on phy 31
MDIO Error: MDIO got failure status on phy 31
MDIO Error: MDIO got failure status on phy 31
etc.

Now I have no trouble getting it into rescue with a long press of reset at power-on and flashing the latest official XT8 FW, which all completed successfully. But the MDIO problem persists.

After some investigation I've found the donor NAND holds the MAC addresses of the donor router's hardware, and these no longer match my router. How can I change this? I think this could be the cause of my MDIO issues. I think MDIO uses MAC address in communications.
 
Update... I managed to change the MAC address using the CFE command line option "b" to set board parameters. The MAC address is now correct, but I still have the MDIO issue. I'm starting to lean towards the idea that this router has an underlying hardware issue and someone thought they'd flash the firmware to fix it, and in the process made it worse by loading the wrong FW. Now that it has the right FW it's a lot better, as rescue mode is fine, but I'm now probably faced with the original issue.

All output on the serial port looks fine until it tries "Detecting PHYs"

Broadcom Archer Packet Accelerator Intializing
Open archer Netdevice
Archer DS DPI Initialized
Archer US DPI Initialized
Archer WLAN Interface Construct (Threshold 32 packets, Timeout 500 usec, Jiffies 10)
Sysport 0 WOL IRQ 53
Sysport 1 WOL IRQ 59
Archer WLAN Rx Thread Initialized
SYSPORT Driver Constructed (Budget: 128, Coalescing: 826828 nsec)
Archer Mode set to L2+L3
Archer Mode set to L2+L3
Initialized Archer Host Layer
Flow Table (256, 64): e167b000, 4194304 bytes
Flow Info Table (44, 65536): e1a7c000, 2883584 bytes
Command List Table (128): virt 0xd6400000, phys 0x16400000, size 2097152 bytes
Command List Control Table (8): d6ac0000, 131072 bytes
Sysport Classifier Initialized: Maximum 16384 flows
Archer IQ status changed from 0 to 1
[NTC bitpool] idx_pool_init: 551:FHW[0]:Create Index Pool_Size = 16384
Retrieve num_fhw_path=64
Pathstats allocated 2048 bytes
Broadcom Packet Flow Cache HW acceleration enabled.
Enabled Archer binding to Flow Cache
Broadcom Archer Network Processor Char Driver v0.1 Registered <339>
Archer TCP Pure ACK Enabled
crossbar Mux: connect cb_idx:0 int_ep 0 to ext_ep 2
crossbar Mux: connect cb_idx:0 int_ep 1 to ext_ep -1

Detecting PHYs...
MDIO Error: MDIO got failure status on phy 31

Loading firmware into detected PHYs...

Adjusted SF2 SGPHY: sphy_ctrl=0x0001081b
getrcal for res select 5, int 667, ext 638, ratio 122 ratio1 152, rcal 9
Setting SGMII Calibration value to 0x9
PLL_PON(8051): 0x0250; TX_PON(8067): 0x0048; RX_PON(80b2): 0x0980
MDIO Error: MDIO got failure status on phy 31
MDIO Error: MDIO got failure status on phy 31
MDIO Error: MDIO got failure status on phy 31
MDIO Error: MDIO got failure status on phy 31
MDIO Error: MDIO got failure status on phy 31
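For anyone comparing captures like the one above, a small script can tally which PHY addresses the errors name. This is only a rough sketch: it assumes the serial output has been saved as plain text, and the function name `count_mdio_failures` is made up for illustration. The pattern matches the error line exactly as it appears in this thread.

```python
import re
from collections import Counter

# Matches the error line exactly as it appears in the serial output above.
MDIO_ERROR = re.compile(r"MDIO Error: MDIO got failure status on phy (\d+)")


def count_mdio_failures(log_text: str) -> Counter:
    """Count MDIO failure lines per PHY address in a captured log."""
    return Counter(int(m.group(1)) for m in MDIO_ERROR.finditer(log_text))


if __name__ == "__main__":
    # Sample lines copied from the boot output in this post.
    sample = """\
Detecting PHYs...
MDIO Error: MDIO got failure status on phy 31
Loading firmware into detected PHYs...
MDIO Error: MDIO got failure status on phy 31
MDIO Error: MDIO got failure status on phy 31
"""
    for phy, hits in sorted(count_mdio_failures(sample).items()):
        print(f"phy {phy}: {hits} failure(s)")
```

Running it over a capture from before and after a firmware change makes it easy to see whether the failing address moved (here, 24 before the NAND swap and 31 after).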


Any thoughts anyone?
 
One other thing of note.... the 2.5Gb chip on the two boards is different... I wonder if that has a bearing on it. Both boards are the same version, 1.40.
 
"I think MDIO uses MAC address in communications"

Been a long time since I've messed around with CFE, but u-boot performs similar tasks, and both have to support the same Linux basics...

With u-boot, the parent defines the bus, the child defines the PHY by node ID - MAC address comes in a little bit later, as normally we define MAC addresses in another MTD partition to support production, e.g. we do the RF cal on the board for WiFi, and if pass, we assign the MAC addresses for all interfaces and write that to a partition - for QCA devices, it's the ART.

IMHO - Sounds like HW damage, to be honest - you're getting the MDIO error because the node is not responding
 
Hi, thanks for responding.....

I'm now happy it's not MAC related, as I found that this can be changed.

As you say, it's probably a hardware fault, but the one thing that's nagging me is that the 2.5Gb ethernet chip is not the same between the donor router and the target router. And it's failing at exactly the point where it needs to load the FW into this chip.

I've compared the bootup sequences for a working and non working:

Left hand side is working XT8 that has a 54991 chip
Right hand is the faulty router which has a 50991 chip but a transplanted NAND from a router with a 54991 chip.
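If the two serial captures are saved as text, the side-by-side comparison can also be done with a diff rather than by eye. A minimal sketch using Python's standard `difflib`; the function name `diff_boot_logs` is hypothetical, and the "working" sample below is a made-up stand-in built from lines in this thread, not a real capture from the good board.

```python
import difflib


def diff_boot_logs(good: str, bad: str) -> list[str]:
    """Return unified-diff lines between two captured boot logs."""
    # Drop blank lines and trailing whitespace so only real output is compared.
    good_lines = [ln.rstrip() for ln in good.splitlines() if ln.strip()]
    bad_lines = [ln.rstrip() for ln in bad.splitlines() if ln.strip()]
    return list(
        difflib.unified_diff(
            good_lines, bad_lines, fromfile="working", tofile="faulty", lineterm=""
        )
    )


if __name__ == "__main__":
    # Hypothetical stand-in for the working board's output.
    working = "Detecting PHYs...\nLoading firmware into detected PHYs...\n"
    # Lines as seen on the faulty board in this thread.
    faulty = (
        "Detecting PHYs...\n"
        "MDIO Error: MDIO got failure status on phy 31\n"
        "Loading firmware into detected PHYs...\n"
    )
    print("\n".join(diff_boot_logs(working, faulty)))
```

Lines starting with `+` appear only on the faulty board, which points straight at where the two boots diverge.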


The ASUS firmware for both is the same, v1 hardware. I'm wondering if there is a device tree or some other statement stored somewhere that defines this chip differently.

I've checked the power supply for this chip and that seems fine.


1727342272556.png



There is an option in CFE to load a DTB via TFTP but I would need to find a working one...

1727343213911.png


I just checked all my XT8s and have one with the same chip... found this in the logs on one of my nodes:

Jan 1 00:00:29 kernel: 50991EL B0 3590:50c9 --> 0x7
Jan 1 00:00:29 kernel: Loading firmware into detected PHYs...
Jan 1 00:00:29 kernel: Firmware version: Blackfin B0 v02-02-06
Jan 1 00:00:29 kernel: Loading firmware into PHYs: map=0x80 count=1
Jan 1 00:00:29 kernel: Halt the PHYs processors operation
Jan 1 00:00:29 kernel: Upload the firmware into the on-chip memory
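One way to read the `map=0x80 count=1` line, assuming (and this is only an assumption, not something the driver output states) that `map` is a bitmask with one bit per PHY address: 0x80 has a single bit set, bit 7, which would line up with the `--> 0x7` on the detection line. A quick sketch of that arithmetic, with a hypothetical helper name:

```python
def set_bits(mask: int) -> list[int]:
    """Return the positions of the 1-bits in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    phy_map = 0x80  # value from the "map=0x80 count=1" kernel line
    addresses = set_bits(phy_map)
    # ASSUMPTION: each set bit corresponds to one PHY address.
    print(addresses, len(addresses))
```

If that reading is right, the working node finds its 50991 at address 7, while the faulty board is complaining about address 31.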

Any ideas how I can extract this bit of the config and add it to the non-working router?

The last option would be to also replace the chip but that would not be easy as it's a small BGA.

I see that gzb90 had this same issue over a year ago but hasn't been around since.
 
Left hand side is working XT8 that has a 54991 chip
Right hand is the faulty router which has a 50991 chip but a transplanted NAND from a router with a 54991 chip.

Take a photo of the two chips - the 54991/50991 part numbers don't line up to anything...
 
Not much to see.... they're ethernet chips. See below for the 54991; the 50991 looks identical.

I'm convinced this is just a device tree type issue. Will spend more time on it tomorrow.


1727378035119.png
 
They do if you put "BCM" on the front of them. Hint: they're 2.5Gb ethernet transceivers.

Yah - saw that jumping around over on Mouser and Digikey...

Broadcom doesn't support the 50991 part, and the 54991 one in certain SKUs is still available.

54991 is not a drop-in replacement for the 50991 - this is both for the SGMII interfaces (1.8 vs 3.3) and the magnetics out to the port...

Looking at the logs provided - the bus/port impacted is not the PHYs noted, but on the SoC itself, as the MDIO bus fails...

My best guess is that the XT8 got a lightning hit, and smoked the port - just saying...
 
That's certainly a possibility..... however, see my first post: it fails on PHY 24 before the NAND swap and now fails on PHY 31 after the NAND swap. The only thing that's changed is the FW, and if the SoC is trying different PHYs, wouldn't that suggest the bus is OK? Also, the 3 ethernet ports work fine, as I managed to use ASUS rescue after swapping the NAND. I'm no expert, but I would expect to see something different if it had lightning damage. I'm still very much leaning towards a config/hardware parameter/device tree type issue.

Maybe I'm being too optimistic :). Will take another look later, maybe I'll scope around that chip and compare with the working board.

I'm determined not to let this thing beat me, just yet!
 
Found out that this device is also known as the RT-AX95.

In CFE you have the option to change the board type (option B). So even though I had AX6600 selected, this is not the right model for my board. I changed this to RT-AX95 and there are no more PHY errors. It's now working fine.

So it should be set to AX6600 if the board has a 54991 ethernet chip, and RT-AX95 if it has the 50991 chip.

There is also a third board option in CFE, AX66001, and in total there were about 10 different routers to select from. WTF Asus! How confusing. I guess at the time these routers were being produced they used whichever chips they could get hold of, as it was around the time of the chip shortage.

So, moral of my story..... if you use a donor NAND or NAND image, make sure you change the board selected in CFE; the same FW is obviously used across a handful of boards.
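For reference, the pairing worked out in this thread can be written down as a small lookup. This is just a sketch of what was found on these XT8 boards, not an official ASUS table. The names `BOARD_ID_BY_PHY` and `expected_board_id` are made up for illustration, and the board name for the 50991 is taken from the post where the setting was changed.

```python
# Pairings reported in this thread for the XT8; not an official ASUS table.
BOARD_ID_BY_PHY = {
    "BCM54991": "AX6600",
    "BCM50991": "RT-AX95",
}


def expected_board_id(phy_marking: str) -> str:
    """Map the 2.5Gb PHY marking read off the board to the CFE board type."""
    key = phy_marking.strip().upper()
    if not key.startswith("BCM"):
        key = "BCM" + key  # the chips are marked without the "BCM" prefix
    for chip, board_id in BOARD_ID_BY_PHY.items():
        # Prefix match so suffixed markings such as "50991EL" still resolve.
        if key.startswith(chip):
            return board_id
    raise ValueError(f"no board type known in this thread for {phy_marking!r}")


if __name__ == "__main__":
    print(expected_board_id("50991"))
    print(expected_board_id("54991"))
```

The point is simply to check the chip on the target board before trusting whatever board type came along with the donor NAND.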

Anyway glad it's fixed and hope this helps someone. I guess it would apply to other ASUS routers of this generation.
 