Can't format JFFS partition =(

Jozek · Nov 29, 2016

RMerlin said:
I did implement bad block support a few years ago, but I can't remember if it was only in the mtd-erase code or in the other operations as well.

Only in the code, I think... because we can't erase or mount our partition.

john9527 · Nov 29, 2016

RMerlin said:
I did implement bad block support a few years ago, but I can't remember if it was only in the mtd-erase code or in the other operations as well.

I did a quick look at the code, and it looks like there's a path for the ARM routers where bad blocks aren't taken into account on an erase. I wrote a patch based on @sfx2000 's OpenWRT patch (had to change it a bit, since we have a different mtd) to add a bad block skip. It compiles, but I'm not in a position to test it for a few days. If anyone wants to try it 'untested', post here with your router type and I'll do a build for you.

RMerlin · Nov 29, 2016

john9527 said:
I did a quick look at the code, and it looks like there's a path for the ARM routers where bad blocks aren't taken into account on an erase. I wrote a patch based on @sfx2000 's OpenWRT patch (had to change it a bit, since we have a different mtd) to add a bad block skip. It compiles, but I'm not in a position to test it for a few days. If anyone wants to try it 'untested', post here with your router type and I'll do a build for you.

Can you post the patch to Gist or paste.bin? I can review it.

RMerlin · Nov 29, 2016

For reference, my original patch was with commit 52ad1c0c333a64c7cd264c998553c2b52be47044 .

john9527 · Nov 29, 2016

RMerlin said:
Can you post the patch to Gist or paste.bin? I can review it.

Thanks for the review.....
https://gist.github.com/john9527/19122380d636b617fe463c8796cae0bc

RMerlin · Nov 30, 2016

Your mtd_block_is_bad() doesn't seem to handle things entirely the same way as the code I used here:

https://github.com/RMerl/asuswrt-merlin/blob/master/release/src/router/rc/mtd.c#L127

Based on my code, if the ioctl() returns a value < 0, you need to check the secondary error code to determine if the issue is that the NAND device simply doesn't support badblock mapping (in which case you shouldn't treat it as an error and simply report that the block isn't bad), or if it's a genuine error (in which case we could handle it as if it was a bad block I suppose? I'm not 100% sure on that one).

My recommendation would be:

1) check the secondary error code like in my version of the code and deal accordingly
2) also migrate unlock_and_erase() (which contains my code) to share the same mtd_bad_block() function.

We could make the skipbb variable a global, so if for some reason we encounter an HW revision without badblock support, we will disable future checks until the next router reboot.
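
A minimal sketch of the check described above, not the code from either patch. It assumes the "not supported" case shows up as errno EOPNOTSUPP, and it replaces the real ioctl(fd, MEMGETBADBLOCK, &offset) call with a hypothetical probe callback so the logic can be run without NAND hardware. block_is_bad(), bb_probe_fn and the two fake probes are made-up names for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical probe type: returns what ioctl(fd, MEMGETBADBLOCK, &offset)
 * would return, and stores errno in *err when the result is negative.
 */
typedef int (*bb_probe_fn)(long long offset, int *err);

/* Global flag: once set, later checks are skipped (until the next reboot). */
static bool skipbb = false;

/* Returns true only when the driver positively reports a bad block. */
static bool block_is_bad(bb_probe_fn probe, long long offset)
{
    int err = 0;
    int ret;

    assert(probe != NULL);
    if (skipbb)
        return false;              /* queries already known to be unsupported */

    ret = probe(offset, &err);
    if (ret > 0)
        return true;               /* driver has the block marked bad */
    if (ret < 0 && err == EOPNOTSUPP)
        skipbb = true;             /* no bad-block support: stop asking */
    /* Any other error is ignored here; the erase that follows decides. */
    return false;
}

/* Stand-ins for the real ioctl so the logic can run without NAND hardware. */
static int probe_unsupported(long long offset, int *err)
{
    (void)offset;
    *err = EOPNOTSUPP;
    return -1;
}

static int probe_one_bad(long long offset, int *err)
{
    (void)err;
    return offset == 0x20000 ? 1 : 0;  /* pretend exactly one block is bad */
}
```

With probe_unsupported, the first call sets skipbb and every later call returns false without querying the driver again.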

I'll work on getting a patch including those changes together, then you can have a look at it.

RMerlin · Dec 1, 2016

Here's my proposed patch, based on both our code:

https://gist.github.com/RMerl/a603628bd2f6d81de5b977b86967d7a5

Changes:

1) replaced my current code with a call to mtd_is_bad_block()
2) made mtd_is_bad_block() check if bb reporting is supported, if not, set a global flag to skip future checks
3) Made the function available for MIPS as well (as it's now used for the RT-AC66U)
4) If an error occurs when requesting a BB check, silently ignore it

That fourth point, I'm kinda torn. It's how I used to handle it, as I consider that a BB check shouldn't be a fatal error - the erase attempt afterward should decide whether the action should truly fail or not. Thoughts?

john9527 · Dec 1, 2016

RMerlin said:
That fourth point, I'm kinda torn. It's how I used to handle it, as I consider that a BB check shouldn't be a fatal error - the erase attempt afterward should decide whether the action should truly fail or not. Thoughts?

I think it really depends on whether there really is any hardware that doesn't support the MEMGETBADBLOCK ioctl. Maybe you have some more info here. If all the hardware is supposed to support it, and it errors on the call, it may be better to just give up then.

Also, if we're going to really dig in to this one.....take a look at the mtd_write function, near the end after the comment 'Do-It'. Looks like it may be exposed as well, but it would require a little more work since you have to adjust the block pointer end on the fly to match the length of what you are writing taking into account any skips.
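
To illustrate the "adjust the end on the fly" idea, here is a rough sketch that works on block indices only. place_chunks() and the bad[] array are hypothetical; in real code the good/bad answer would come from the driver, and the actual mtd_write function may be structured quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: decide where each erase-block-sized chunk of an image
 * lands when bad blocks are skipped. bad[i] is true when block i is bad.
 * Stores the chosen block index for each chunk in out[] and returns how many
 * chunks were placed; a short count means the device ran out of good blocks.
 */
static size_t place_chunks(const bool *bad, size_t nblocks, size_t first_block,
                           size_t nchunks, size_t *out)
{
    size_t placed = 0;
    size_t end = first_block + nchunks;  /* end of the region if nothing is bad */
    size_t blk;

    assert(bad != NULL && out != NULL);
    for (blk = first_block; blk < end && blk < nblocks; blk++) {
        if (bad[blk]) {
            end++;                       /* push the end out by one block per skip */
            continue;
        }
        out[placed++] = blk;             /* real code would write the chunk here */
    }
    return placed;
}
```

The point of the sketch is that the loop bound is not fixed up front: every skipped block moves the end one block further, and the caller has to handle the case where the partition runs out before all the data is placed.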

RMerlin · Dec 1, 2016

john9527 said:
I think it really depends on whether there really is any hardware that doesn't support the MEMGETBADBLOCK ioctl. Maybe you have some more info here. If all the hardware is supposed to support it, and it errors on the call, it may be better to just give up then.

At the rhythm at which Asus issues new hardware revisions, I'd rather be prepared. The original code I based mine on did that check, so I decided to stay safe and also implement it.

Also, I don't think the RT-N66U supports it, while the RT-AC66U and current newer models do. So if we're going to use the same code for all models, we'll need it.

john9527 said:
Also, if we're going to really dig in to this one.....take a look at the mtd_write function, near the end after the comment 'Do-It'. Looks like it may be exposed as well, but it would require a little more work since you have to adjust the block pointer end on the fly to match the length of what you are writing taking into account any skips.

Looks like another location that would need checking indeed.

I'm starting to feel uncomfortable with all that mtd code patching tho, as we don't have any test case to confirm that we're not causing new issues by skipping blocks like this.

john9527 · Dec 1, 2016

RMerlin said:
I'm starting to feel uncomfortable with all that mtd code patching tho, as we don't have any test case to confirm that we're not causing new issues by skipping blocks like this.

Just as an FYI...I was playing a bit with my original patch on my fork....I happen to have a 68R with a couple of bad blocks. Added a couple of status messages.....

Code:

admin@RT-AC68R-BC88:/tmp/home/root# mtd-erase2
usage: mtd-erase2 [device]
admin@RT-AC68R-BC88:/tmp/home/root# mtd-erase2 brcmnand
Erase MTD brcmnand
Skipping bad block at 0x038e0000
Erase MTD brcmnand OK!
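
For anyone following along, a loop of roughly this shape would produce a status line like the one above. This is only a guess at the structure, not the code from the fork: erase_skipping_bad() is a hypothetical name, bad[] stands in for the driver's per-block answer, and the actual erase call is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical erase loop over a partition of `nblocks` erase blocks.
 * Bad blocks are reported and skipped; good blocks are counted where real
 * code would issue the erase. Returns the number of blocks "erased".
 */
static size_t erase_skipping_bad(const bool *bad, size_t nblocks,
                                 unsigned long erasesize, FILE *log)
{
    size_t erased = 0;
    size_t i;

    assert(bad != NULL && erasesize > 0);
    for (i = 0; i < nblocks; i++) {
        if (bad[i]) {
            if (log != NULL)
                fprintf(log, "Skipping bad block at 0x%08lx\n",
                        (unsigned long)i * erasesize);
            continue;
        }
        erased++;  /* real code would erase block i here */
    }
    return erased;
}
```

Unlike a write, an erase does not need to extend its end: it simply leaves the bad block alone and carries on to the end of the partition.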

sfx2000 · Dec 1, 2016

RMerlin said:
I'm starting to feel uncomfortable with all that mtd code patching tho, as we don't have any test case to confirm that we're not causing new issues by skipping blocks like this.

In some ways - I would agree - as blocks can and will go randomly bad - that's the nature of NAND - it's one thing to look at mtd in one state - e.g. at boot or when an event might happen, but underneath it all, one could have a block going bad in any place - userland, kernel, wherever...

A patch like this, and in some ways I regret pointing out what OpenWRT has done, does help, but only at one point in time - and the risk here is that even with the patch, there might be code that just sits there and burns holes in flash - and that is the risk with straight NAND vs. eMMC (and external storage) where there is an intermediary layer of HW that works this outside of the OS...

sfx2000 · Dec 1, 2016

john9527 said:
Just as an FYI...I was playing a bit with my original patch on my fork....I happen to have a 68R with a couple of bad blocks. Added a couple of status messages...

That's encouraging...

RMerlin · Dec 1, 2016

sfx2000 said:
A patch like this, and in some ways I regret pointing out what OpenWRT has done, does help, but only at one point in time - and the risk here is that even with the patch, there might be code that just sits there and burns holes in flash - and that is the risk with straight NAND vs. eMMC (and external storage) where there is an intermediary layer of HW that works this outside of the OS...

The patch itself is fine - it relies on ioctl(), asking the NAND driver if the block is mapped as being bad. So the whole mechanism itself is done by the nand driver, not by the code.

The OpenWRT code you pointed at was nothing new. I was using the same code before even OpenWRT did, except I only did so with the RT-AC66U (the first model which came with NAND with a badblock table). John just dug a bit deeper to locate other areas of the mtd code which also needed the same table lookups to be made.

RMerlin · Dec 1, 2016

john9527 said:
Just as an FYI...I was playing a bit with my original patch on my fork....I happen to have a 68R with a couple of bad blocks. Added a couple of status messages.....

What I especially wonder is, when we skip blocks that are bad, does the driver transparently also skip them when reading the data, or are we leaving a gap in there?

john9527 · Dec 1, 2016

sfx2000 said:
A patch like this, and in some ways I regret pointing out what OpenWRT has done, does help, but only at one point in time

I was thinking about this as well.....what we are playing with are some 'big' functions (for lack of a better word), that take care of initialization types of events. What I don't know is how subsequent 'random' updates are handled by the kernel. I would assume these update IOs would have to take into account bad blocks?

sfx2000 · Dec 1, 2016

RMerlin said:
The patch itself is fine - it relies on ioctl(), asking the NAND driver if the block is mapped as being bad. So the whole mechanism itself is done by the nand driver, not by the code.

The OpenWRT code you pointed at was nothing new. I was using the same code before even OpenWRT did, except I only did so with the RT-AC66U (the first model which came with NAND with a badblock table). John just dug a bit deeper to locate other areas of the mtd code which also needed the same table lookups to be made.

Only reason why I bring this up - we had an Android handset a couple of years back, and it did just that, and we found a very similar issue - device acceptance engineers knew something was wrong, the vendor in China couldn't find a problem, and it really came down to me and another guy sitting down with a JTAG and some custom code to sort it... odd thing is that the Micron flash was good, but the Toshiba flash behaved differently - and that was unexpected...

Eric - you bring up a really good point though - across the different platforms, and even within a single model, one might see different vendors in the flash department...

Before rolling this patch in - one will need a few crash tests across different platforms, and even there, note the platform, and the flash type/vendor...

sfx2000 · Dec 1, 2016

john9527 said:
what we are playing with are some 'big' functions (for lack of a better word), that take care of initialization types of events. What I don't know is how subsequent 'random' updates are handled by the kernel. I would assume these update IOs would have to take into account bad blocks?

Code here does - and there's a fair amount of trust that things would "just work"... so it might be ok for a single product SKU - same HW, but it would have to be tested across multiple models and NAND vendors...

Or just close eyes and cross fingers - but that rarely ends well

sfx2000 · Dec 1, 2016

One option would be to kick this upstream to Asus directly - and let their QA look at things, as they support builds across their product line, both active and legacy - let them review the changes, as they're closer to the HW, and if it works, then this is good, as it would reduce warranty returns.

john9527 · Dec 1, 2016

Well....we have a couple of folks with unusable jffs. Maybe a test is in order (pretty much nothing to lose)

sfx2000 · Dec 1, 2016

john9527 said:
Well....we have a couple of folks with unusable jffs. Maybe a test is in order (pretty much nothing to lose)

Yep...
