My R7800 developed a bad bit in one of its NAND flash pages. It's in the netgear mtd partition, the one that has the UBI volumes in it. Because of this, the overlay ubifs filesystem doesn't mount and the router is pretty much totally broken. But at least the Voxel firmware lets me log in with telnet, so I should be able to fix it.
While ECC is supposed to fix bad bits and UBI is supposed to deal with bad blocks, two details make this not work.
I should be able to fix it by using ubiformat to reformat the partition and then reload the various volumes that were on it. But where to get the original contents of those volumes?
I've seen references saying that to truly factory-restore the firmware, one should erase that partition with flash_erase. So something must restore the default contents afterwards, or that would break the device. Note that one shouldn't actually flash_erase it, as that destroys the UBI erase counters (the wear-leveling information). Better to use ubiformat, which erases the partition but keeps the erase counters. Restoring the contents is the same problem either way, though.
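As a hedged sketch of the reformat-and-reload step, here is a small Python helper meant to run on a PC (not the router) that only prints the mtd-utils command sequence, so it can be reviewed and pasted into the telnet session. ubidetach, ubiformat, ubiattach, ubimkvol and ubiupdatevol are standard mtd-utils tools, but whether the Voxel image ships all of them is an assumption. The mtd number, volume IDs, names, sizes and image files are hypothetical placeholders; the real values would have to come from the original layout, which is exactly the open question above.

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass
class Volume:
    """One UBI volume to recreate. All values used below are placeholders."""
    vol_id: int  # UBI volume ID; determines the /dev/ubiN_ID node
    name: str    # volume name passed to ubimkvol -N
    size: str    # size for ubimkvol -s, e.g. "8MiB"
    image: str   # file holding the volume's original contents (source unknown)


def recovery_plan(mtd_num: int, volumes: list[Volume], ubi_num: int = 0) -> list[list[str]]:
    """Build the command sequence as argv lists; nothing is executed here."""
    ubi_dev = f"/dev/ubi{ubi_num}"
    plan = [
        # Release the partition if UBI still has it attached.
        ["ubidetach", "-m", str(mtd_num)],
        # Erase the partition while keeping the existing erase counters.
        ["ubiformat", f"/dev/mtd{mtd_num}", "-y"],
        # Attach the freshly formatted partition as UBI device ubi_num.
        ["ubiattach", "-m", str(mtd_num), "-d", str(ubi_num)],
    ]
    for v in volumes:
        # Create the empty volume with a fixed ID, then write its image into it.
        plan.append(["ubimkvol", ubi_dev, "-n", str(v.vol_id), "-N", v.name, "-s", v.size])
        plan.append(["ubiupdatevol", f"{ubi_dev}_{v.vol_id}", v.image])
    return plan


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; only execute when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            # ubidetach fails harmlessly if UBI was not attached, so don't abort on it.
            subprocess.run(argv, check=argv[0] != "ubidetach")


if __name__ == "__main__":
    # Hypothetical example: the mtd number and volume details are made up.
    run_plan(recovery_plan(mtd_num=99, volumes=[Volume(0, "example_vol", "8MiB", "example_vol.img")]))
```

Defaulting to a dry run is deliberate: ubiformat is destructive, and the printed commands are easier to check against the partition table before anything is erased.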
There are two reasons this one bit error breaks UBI.
ECC should fix any one-bit error, but this bit is in an erased block. Erased blocks are all 1s, which is not valid ECC information. A NAND page can only be written once before it needs to be erased, so it's not possible to write valid ECC data to an erased page: it couldn't then be used for real data, having already been written once. So erased pages have no ECC. That should be fine, as there is no data (erased!) on them to protect anyway. But it does mean that a one-bit error on an erased page shows up as a bad page, rather than being automatically fixed.
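To make the erased-page problem concrete, here is a minimal sketch of how a raw page read without ECC could be told apart: in an erased page every bit should be 1, so each 0 bit is a bitflip. The function names, the 2048-byte page size and the one-bitflip threshold are assumptions for illustration, not what the R7800's NAND driver actually does.

```python
def count_zero_bits(data: bytes) -> int:
    """Count 0 bits; in an erased (all-0xFF) page every one of them is a bitflip."""
    return sum(8 - bin(b).count("1") for b in data)


def classify_page(data: bytes, max_bitflips: int = 1) -> str:
    """Hypothetical classifier for a raw page read without ECC.

    'erased'           every byte is 0xFF
    'erased-bitflips'  almost all 0xFF, at most max_bitflips zero bits
    'programmed'       anything else: real data, which ECC is meant to protect
    """
    flips = count_zero_bits(data)
    if flips == 0:
        return "erased"
    if flips <= max_bitflips:
        return "erased-bitflips"
    return "programmed"


if __name__ == "__main__":
    page = bytearray(b"\xff" * 2048)  # assumed 2 KiB page, freshly erased
    print(classify_page(bytes(page)))  # erased
    page[100] &= 0xF7                  # one cell flips from 1 to 0
    print(classify_page(bytes(page)))  # erased-bitflips
```

A page in the middle category is the case described above: not real data, but not a clean erased page either, and ECC has nothing to correct it with.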
UBI is supposed to be able to deal with bad blocks. But it does this by checking whether they are bad after it erases them. If a freshly erased block isn't bad, it's put in the blank block pool and expected to stay blank. UBI can't cope with a blank block going bad on its own without ever being used, which is apparently what happened here.
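The gap described above can be shown with a toy model. This is a hypothetical Python simulation of the behaviour as described here, not UBI's real algorithm: blocks are verified once, right after erasure, and anything that passes goes into a free pool that is trusted until a block is handed out. The class name, the tiny block size and the "stuck" blocks are all made up for illustration.

```python
class ToyUbi:
    """Toy model of the check-after-erase behaviour; NOT UBI's actual code."""

    def __init__(self, n_blocks: int, block_size: int = 16, stuck: set[int] = frozenset()) -> None:
        # Every block starts out holding (all-zero) data.
        self.blocks = [bytearray(block_size) for _ in range(n_blocks)]
        self.stuck = set(stuck)  # hypothetical blocks whose erase never fully succeeds
        self.free_pool: list[int] = []
        self.bad: set[int] = set()

    def is_blank(self, i: int) -> bool:
        return all(b == 0xFF for b in self.blocks[i])

    def reclaim(self, i: int) -> None:
        """Erase block i, check it once, then mark it bad or add it to the pool."""
        self.blocks[i][:] = b"\xff" * len(self.blocks[i])
        if i in self.stuck:
            self.blocks[i][0] = 0xFE  # erase didn't take: a genuinely bad block
        if self.is_blank(i):
            self.free_pool.append(i)
        else:
            self.bad.add(i)

    def flip_bit(self, i: int, byte: int, bit: int) -> None:
        """Simulate a spontaneous 1 -> 0 bitflip while the block sits unused."""
        self.blocks[i][byte] &= ~(1 << bit) & 0xFF

    def take_free(self) -> int:
        """Hand out a pooled block without re-checking that it is still blank."""
        return self.free_pool.pop(0)


if __name__ == "__main__":
    ubi = ToyUbi(n_blocks=3, stuck={2})
    for i in range(3):
        ubi.reclaim(i)
    print(ubi.bad, ubi.free_pool)   # {2} [0, 1]
    ubi.flip_bit(0, byte=5, bit=3)  # block 0 goes bad while sitting in the pool
    blk = ubi.take_free()
    print(blk, ubi.is_blank(blk))   # 0 False
```

Block 2 is caught because its failure shows up at erase time; block 0 is not, because nothing looks at it again between the check and its next use.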