Can't format JFFS partition =(

I did implement bad block support a few years ago, but I can't remember if it was only in the mtd-erase code or in the other operations as well.
Only in the code, I think... because we can't erase or mount our partition.
 
I did implement bad block support a few years ago, but I can't remember if it was only in the mtd-erase code or in the other operations as well.

I did a quick look at the code, and it looks like there's a path for the ARM routers where bad blocks aren't taken into account on an erase. I wrote a patch based on @sfx2000 's OpenWRT patch (had to change it a bit, since we have a different mtd) to add a bad block skip. It compiles, but I'm not in a position to test it for a few days. If anyone wants to try it 'untested', post here with your router type and I'll do a build for you.
 
I did a quick look at the code, and it looks like there's a path for the ARM routers where bad blocks aren't taken into account on an erase. I wrote a patch based on @sfx2000 's OpenWRT patch (had to change it a bit, since we have a different mtd) to add a bad block skip. It compiles, but I'm not in a position to test it for a few days. If anyone wants to try it 'untested', post here with your router type and I'll do a build for you.

Can you post the patch to Gist or paste.bin? I can review it.
 
Your mtd_block_is_bad() doesn't seem to handle things entirely the same way as the code I used here:

https://github.com/RMerl/asuswrt-merlin/blob/master/release/src/router/rc/mtd.c#L127

Based on my code, if the ioctl() returns a value < 0, you need to check the secondary error code to determine if the issue is that the NAND device simply doesn't support badblock mapping (in which case you shouldn't treat it as an error and simply report that the block isn't bad), or if it's a genuine error (in which case we could handle it as if it was a bad block I suppose? I'm not 100% sure on that one).

My recommendation would be:

1) check the secondary error code like in my version of the code and deal accordingly
2) also migrate unlock_and_erase() (which contains my code) to share the same mtd_bad_block() function.

We could make the skipbb variable a global, so if for some reason we encounter an HW revision without badblock support, we will disable future checks until the next router reboot.
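To make the recommendation above concrete, here is a minimal sketch of a shared bad-block check with a sticky global flag. It is not the code from the linked mtd.c. It assumes the "secondary error code" is errno and that EOPNOTSUPP is the value meaning "no bad-block support". The names sketch_block_is_bad, bb_query_fn and the two stubs are hypothetical; on a real router the callback would wrap ioctl(fd, MEMGETBADBLOCK, &offset).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: >0 means bad, 0 means good, <0 means failure with
 * errno set. On real hardware it would wrap ioctl(fd, MEMGETBADBLOCK, &offset). */
typedef int (*bb_query_fn)(long long offset);

/* Sticky flag: once set, no more bad-block queries are made until restart. */
static int skipbb = 0;

/* Returns 1 if the block at `offset` should be skipped, 0 otherwise. */
static int sketch_block_is_bad(bb_query_fn query, long long offset)
{
    int ret;

    assert(query != NULL);
    if (skipbb)
        return 0;               /* device has no bad-block table, stop asking */

    errno = 0;
    ret = query(offset);
    if (ret > 0)
        return 1;               /* driver reports the block as bad */
    if (ret < 0 && errno == EOPNOTSUPP)
        skipbb = 1;             /* assumption: this errno means "unsupported" */

    /* Any other failure is ignored here (one possible policy); the erase
     * that follows gets to decide whether the operation really fails. */
    return 0;
}

/* Stub for illustration: a driver without bad-block support. */
static int stub_calls = 0;
static int stub_unsupported(long long offset)
{
    (void)offset;
    stub_calls++;
    errno = EOPNOTSUPP;
    return -1;
}

/* Stub for illustration: only the block at 0x20000 is bad. */
static int stub_one_bad(long long offset)
{
    return offset == 0x20000;
}
```

With stub_unsupported the first call sets skipbb, and later calls return 0 without touching the driver again. Whether a non-EOPNOTSUPP failure should be ignored or treated as fatal is the open question here; the sketch just picks the "ignore" side.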

I'll work on getting a patch including those changes together, then you can have a look at it.
 
Here's my proposed patch, based on both our code:

https://gist.github.com/RMerl/a603628bd2f6d81de5b977b86967d7a5

Changes:

1) replaced my current code with a call to mtd_is_bad_block()
2) made mtd_is_bad_block() check if bb reporting is supported, if not, set a global flag to skip future checks
3) Made the function available for MIPS as well (as it's now used for the RT-AC66U)
4) If an error occurs when requesting a BB check, silently ignore it

That fourth point, I'm kinda torn on. It's how I used to handle it, as I consider that a BB check shouldn't be a fatal error - the erase attempt afterward should decide whether the action should truly fail or not. Thoughts?
 
That fourth point, I'm kinda torn on. It's how I used to handle it, as I consider that a BB check shouldn't be a fatal error - the erase attempt afterward should decide whether the action should truly fail or not. Thoughts?
I think it really depends on whether there really is any hardware that doesn't support the MEMGETBADBLOCK ioctl. Maybe you have some more info here. If all the hardware is supposed to support it, and it errors on the call, it may be better to just give up then.

Also, if we're going to really dig in to this one.....take a look at the mtd_write function, near the end after the comment 'Do-It'. Looks like it may be exposed as well, but it would require a little more work since you have to adjust the block pointer end on the fly to match the length of what you are writing taking into account any skips.
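The "adjust the block pointer end on the fly" part can be sketched roughly like this. It is only a planning helper under stated assumptions, not the real mtd_write: plan_write_end and the bad[] map are hypothetical, and a real implementation would ask the driver about each block instead of reading an array.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Work out where `len` bytes end up when bad erase blocks are skipped.
 * bad[i] != 0 marks erase block i as bad; nblocks is the partition size
 * in erase blocks. Returns the end offset (exclusive) of the last block
 * consumed, or -1 if the partition runs out of good blocks first.
 */
static long long plan_write_end(const unsigned char *bad, size_t nblocks,
                                long long erasesize, long long len)
{
    long long remaining = len;
    size_t blk = 0;

    assert(bad != NULL && erasesize > 0 && len >= 0);

    while (remaining > 0) {
        if (blk >= nblocks)
            return -1;          /* not enough good blocks for the data */
        if (!bad[blk])          /* good block: it takes up to one erasesize */
            remaining -= (erasesize < remaining) ? erasesize : remaining;
        blk++;                  /* bad block: skipped, so the end moves out */
    }
    return (long long)blk * erasesize;
}
```

For example, with 0x20000-byte blocks and block 1 marked bad, a 0x30000-byte write ends at 0x60000 instead of 0x40000, so each skipped block pushes the end out by one erase block.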
 
I think it really depends on whether there really is any hardware that doesn't support the MEMGETBADBLOCK ioctl. Maybe you have some more info here. If all the hardware is supposed to support it, and it errors on the call, it may be better to just give up then.

At the rate at which Asus issues new hardware revisions, I'd rather be prepared. The original code I based mine on did that check, so I decided to stay safe and also implement it.

Also, I don't think the RT-N66U supports it, while the RT-AC66U and current newer models do. So if we're going to use the same code for all models, we'll need it.

Also, if we're going to really dig in to this one.....take a look at the mtd_write function, near the end after the comment 'Do-It'. Looks like it may be exposed as well, but it would require a little more work since you have to adjust the block pointer end on the fly to match the length of what you are writing taking into account any skips.

Looks like another location that would need checking indeed.

I'm starting to feel uncomfortable with all that mtd code patching tho, as we don't have any test case to confirm that we're not causing new issues by skipping blocks like this.
 
I'm starting to feel uncomfortable with all that mtd code patching tho, as we don't have any test case to confirm that we're not causing new issues by skipping blocks like this.
Just as an FYI...I was playing a bit with my original patch on my fork....I happen to have a 68R with a couple of bad blocks. Added a couple of status messages.....
Code:
admin@RT-AC68R-BC88:/tmp/home/root# mtd-erase2
usage: mtd-erase2 [device]
admin@RT-AC68R-BC88:/tmp/home/root# mtd-erase2 brcmnand
Erase MTD brcmnand
Skipping bad block at 0x038e0000
Erase MTD brcmnand OK!
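For anyone following along, the kind of loop that produces a "Skipping bad block" line like the one above looks roughly like this. It is a sketch, not the actual patch: erase_skipping_bad, block_fn and the demo stubs are hypothetical, and on a real device the callbacks would wrap the MEMGETBADBLOCK and MEMERASE ioctls. The partition and block sizes in the usage note are arbitrary.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-block callback taking a byte offset into the partition. */
typedef int (*block_fn)(long long offset);

/*
 * Walk the partition one erase block at a time, skipping blocks that
 * is_bad() reports as bad and erasing the rest. Returns 0 on success,
 * -1 if an erase fails. *skipped receives the number of skipped blocks.
 */
static int erase_skipping_bad(long long size, long long erasesize,
                              block_fn is_bad, block_fn erase, int *skipped)
{
    long long off;

    assert(is_bad != NULL && erase != NULL && erasesize > 0);
    if (skipped)
        *skipped = 0;

    for (off = 0; off < size; off += erasesize) {
        if (is_bad(off) > 0) {
            printf("Skipping bad block at 0x%08llx\n", (unsigned long long)off);
            if (skipped)
                (*skipped)++;
            continue;           /* never erase a block marked bad */
        }
        if (erase(off) != 0)
            return -1;          /* a failed erase on a good block is fatal */
    }
    return 0;
}

/* Demo stub: pretend only the block at 0x038e0000 is bad. */
static int demo_is_bad(long long offset)
{
    return offset == 0x038e0000LL;
}

/* Demo stub: count erases instead of touching flash. */
static int demo_erased = 0;
static int demo_erase(long long offset)
{
    (void)offset;
    demo_erased++;
    return 0;
}
```

Calling erase_skipping_bad(0x04000000LL, 0x20000LL, demo_is_bad, demo_erase, &skipped) with these stubs walks 512 blocks, skips one and erases the other 511.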
 
I'm starting to feel uncomfortable with all that mtd code patching tho, as we don't have any test case to confirm that we're not causing new issues by skipping blocks like this.

In some ways - I would agree - as blocks can and will go randomly bad - that's the nature of NAND - it's one thing to look at mtd in one state - e.g. at boot or when an event might happen, but underneath it all, one could have a block going bad in any place - userland, kernel, wherever...

A patch like this, and in some ways I regret pointing out what OpenWRT has done, does help, but only at one point in time - and the risk here is that even with the patch, there might be code that just sits there and burns holes in flash - and that is the risk with straight NAND vs. eMMC (and external storage) where there is an intermediary layer of HW that works this outside of the OS...
 
A patch like this, and in some ways I regret pointing out what OpenWRT has done, does help, but only at one point in time - and the risk here is that even with the patch, there might be code that just sits there and burns holes in flash - and that is the risk with straight NAND vs. eMMC (and external storage) where there is an intermediary layer of HW that works this outside of the OS...

The patch itself is fine - it relies on ioctl(), asking the NAND driver if the block is mapped as being bad. So the whole mechanism itself is done by the nand driver, not by the code.

The OpenWRT code you pointed at was nothing new. I was using the same code before even OpenWRT did, except I only did so with the RT-AC66U (the first model which came with NAND with a badblock table). John just dug a bit deeper to locate other areas of the mtd code which also needed the same table lookups to be made.
 
Just as an FYI...I was playing a bit with my original patch on my fork....I happen to have a 68R with a couple of bad blocks. Added a couple of status messages.....

What I especially wonder is, when we skip blocks that are bad, does the driver transparently also skip them when reading the data, or are we leaving a gap in there?
 
A patch like this, and in some ways I regret pointing out what OpenWRT has done, does help, but only at one point in time
I was thinking about this as well.....what we are playing with are some 'big' functions (for lack of a better word) that take care of initialization types of events. What I don't know is how subsequent 'random' updates are handled by the kernel. I would assume these update IOs would have to take into account bad blocks?
 
The patch itself is fine - it relies on ioctl(), asking the NAND driver if the block is mapped as being bad. So the whole mechanism itself is done by the nand driver, not by the code.

The OpenWRT code you pointed at was nothing new. I was using the same code before even OpenWRT did, except I only did so with the RT-AC66U (the first model which came with NAND with a badblock table). John just dug a bit deeper to locate other areas of the mtd code which also needed the same table lookups to be made.

Only reason why I bring this up - we had an Android handset a couple of years back, and it did just that, and we found a very similar issue - device acceptance engineers knew something was wrong, the vendor in China couldn't find a problem, and it really came down to me and another guy sitting down with a JTAG and some custom code to sort it... odd thing is that the Micron flash was good, but the Toshiba flash behaved differently - and that was unexpected...

Eric - you bring up a really good point though - across the different platforms, and even within a single model, one might see different vendors in the flash department...

Before rolling this patch in - one will need a few crash tests across different platforms, and even there, note the platform, and the flash type/vendor...
 
we are playing with are some 'big' functions (for lack of a better word) that take care of initialization types of events. What I don't know is how subsequent 'random' updates are handled by the kernel. I would assume these update IOs would have to take into account bad blocks?

Code here does - and there's a fair amount of trust that things would "just work"... so it might be ok for a single product SKU - same HW, but it would have to be tested across multiple models and NAND vendors...

Or just close eyes and cross fingers - but that rarely ends well :D
 
One option would be to kick this upstream to Asus directly - and let their QA look at things, as they support builds across their product line, both active and legacy - let them review the changes, as they're closer to the HW, and if it works, then this is good, as it would reduce warranty returns.
 
Well....we have a couple of folks with unusable jffs. Maybe a test is in order (pretty much nothing to lose)
 
