USB drive failure...or something worse?

Normally yes, but for example, I have an old laptop whose SSD finally hit its limit and died. The laptop was not worth putting more than $10 to $20 into, and I got an 850 EVO 250GB off eBay for like $9 shipped. It had 7 or 8 TB written out of the roughly 150 TBW it is rated for. That's a perfect use case for a used but not abused SSD. The laptop is still useful to me as it has a serial port and I use it for hooking up to cars and also in my lab for router console connections, so I didn't want to trash it, but I also didn't want to spend $50 on a new huge drive that would be of no benefit over an older one. I think the OP's use case fits that too: why get a brand new, super fast, big SSD when you only need a tiny, slow, older one (since the router can't keep up with even several-year-old SSDs anyway)?
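
For anyone wanting to sanity-check a used drive's wear figure, the arithmetic is simple. This is only a sketch using the approximate numbers quoted in the post (7 to 8 TB written, a rating of roughly 150 TBW); the function name is made up for illustration, and the rating for any real drive should be taken from its datasheet.

```python
def remaining_life_pct(written_tb: float, rated_tbw: float) -> float:
    """Return the percentage of rated write endurance still unused."""
    if rated_tbw <= 0:
        raise ValueError("rated_tbw must be positive")
    # Clamp at 100% used so an over-worn drive reports 0, not a negative number.
    used_fraction = min(written_tb / rated_tbw, 1.0)
    return round((1.0 - used_fraction) * 100, 1)


# Approximate figures from the post: 7 to 8 TB written, ~150 TBW rating.
for written in (7, 8):
    print(f"{written} TB written: {remaining_life_pct(written, 150)}% remaining")
```

Either figure works out to roughly 95% of rated endurance remaining.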

Or to put it another way, if cost is a factor and it comes down to spending $20 on a new USB drive or $20 on a used SSD and enclosure, I'd go for option B.
You are braver than me... I wouldn't trust getting a working used SSD off ebay that has any life left in it... ;)
 
Most of them are pulled from work laptops by recyclers, or sometimes are brand new pulls from companies that put in larger ones as soon as they get them (since you can buy the larger ones much cheaper separately). I've done it with the 2.5" SSD above and even got 2 new-pull NVMe SSDs. All were exactly as described. The seller of the 850 EVO had about 100 available, listed as having at least 90% life remaining (the one I got was more like 95%), and both NVMe drives had less than 100GB written (which was initial testing and the factory-installed software image). I even got the OPAL SED version, since that's the one Lenovo uses and that's what they came out of.

The NVMe drives were the OEM version of the 970 EVO Plus at a fraction of the price. I think I paid $30 for the 512GB over a year ago (the OEM is 512GB instead of 500GB, but it's the same thing with less forced overprovisioning).

You can tell from the listing which drives come from some gamer who may have used up half the drive's life, and which come from recyclers, whose stock is usually very gently used work-laptop drives or new pulls.

eBay has all kinds of guarantees anyway and nearly always rules in favor of the buyer (one of the reasons I stopped selling on eBay a few years back, since it was so easy to get scammed or have buyers send back their own defective part, etc).
 
Sorry... but I just love my new see-thru enclosure with pretty blue blinky light... ;)

 
Most of them are pulled from work laptops by recyclers, or sometimes are brand new pulls from companies that put in larger ones as soon as they get them (since you can buy the larger ones much cheaper separately).

I don't know about that. Must be coming from companies with lower security standards. No drive comes out of my warehouse in one piece. I had to purchase new machines when SSDs became common. They make them pretty like this:

[Attached image: 1680574977956.png]


love my new see-thru enclosure

You can detect fire starting a bit earlier. :)
 
Server drives still go through the shredder but many companies surprisingly just return their leased laptops, sometimes without even wiping them. I guess nowadays with most companies using bitlocker they are less concerned, but even before that there were plenty of recyclers selling HDs. I've even gotten ones that were not wiped or encrypted.

Then you have the ultra-paranoid financial firms I work with that either will not return your hardware (and will pay for it), or require you to wipe it while they watch, or will shred the CompactFlash card. I mean, it's a router with fairly limited storage space, so there's not a whole lot we could be storing on it. Though there is a ton of sensitive info flowing through it, and they do have capture ability now, so I guess I can see their point.
 
Destroy before it leaves the building and upload pictures with serial numbers. One of the services we provide.

Luckily most don't require it as everything they're doing is encrypted anyway. And most of the ones that do, allow us to do it remotely while they watch from the console, or send screen shots, showing everything except the IOS is deleted. The ones that want it destroyed, we say go for it, here's your bill, good luck fitting it through your HD shredder.
 
Just an update....

Nothing definitive yet, but I have had one of those Kingston Data Traveler metallic 2.0 drives in for a couple days now and so far no issue with that or the UPS seemingly falling off the USB bus. I think I need to let it run at least through the weekend before I believe that it is stable. I have had one of these Data Traveler drives totally die on the router before, but it probably lasted a couple of years.

Meanwhile, I have thrown all sorts of testing at the misbehaving drive hooked up to my Windows machine. I used several flash testing / stress test programs which continually wrote to the whole drive for hours on end, and there was not a single indication that anything is wrong with the drive. While not impossible, it seems improbable that this drive would behave perfectly fine in one machine but misbehave in another after a 10-12 month track record of working fine.
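
For reference, the core idea behind that kind of write-and-read-back test can be sketched in a few lines of Python. This is not what any of the tools mentioned above actually do internally; it is a minimal illustration, and the function name, chunk size and target path are all assumptions. Note that a read straight after a write may be served from the OS cache rather than the flash itself, so a real test would unmount/replug the drive (or bypass the cache) before verifying.

```python
import hashlib
import os


def write_and_verify(path: str, total_bytes: int, chunk_size: int = 1 << 20) -> bool:
    """Write random data to `path`, read it back, and compare SHA-256 digests."""
    written = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = os.urandom(min(chunk_size, remaining))
            f.write(chunk)
            written.update(chunk)
            remaining -= len(chunk)
        # Push the data out of Python's and the OS's write buffers.
        f.flush()
        os.fsync(f.fileno())

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            read_back.update(chunk)

    # A mismatch means the data read back differs from what was written.
    return written.digest() == read_back.digest()
```

Pointing `path` at a file on the mounted flash drive and sizing `total_bytes` close to the free space would exercise most of the drive; a `False` result or an `OSError` would be the interesting outcomes.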

Therefore I'm still not convinced that something isn't wrong with the USB interface in the router.

I have found one other USB 3.0 drive in my collection, and if the current configuration runs without issue through the weekend I will then probably replace it with the other 3.0 drive and let that run for about a week. If that seems stable I will probably put back the original drive and, assuming it fails again within a few days, I think we can safely say that it does seem to be related just to the drive.

If I can validate that it is only the drive, and the other 3.0 drive works successfully I will consider simply moving to an SSD to hopefully prevent a drive failure from happening every 2 years.
 
Your method outlined above seems reasonable.

But, for a few dollars, and many hours of saved time, I would just get the SSD+enclosure today and write off USB 'keys' after today (at least for use with Asus RMerlin-powered routers used for amtm + scripts).

I would also consider upgrading your old router to an AX class model too (you're sure you don't have any NVRAM issues with it?).
 
Your method outlined above seems reasonable.
But, for a few dollars, and many hours of saved time, I would just get the SSD+enclosure today and write off USB 'keys' after today (at least for use with Asus RMerlin-powered routers used for amtm + scripts).
I think that's the plan, but only if I can verify it is indeed the drives and not the USB interface (or something else) on the router.
If I need to replace the router, I will likely buy the equivalent or perhaps decide to upgrade to a slightly newer model, in which case I will probably opt for the SSD as well. For posterity I'd still like to determine what the root cause is here rather than just tossing things out and doing a complete replacement.
I would also consider upgrading your old router to an AX class model too (you're sure you don't have any NVRAM issues with it?).
No, at this point I can't say definitively what is wrong. What specific issues might I have with NVRAM? How would they present themselves?
I don't seem to be losing any configuration for the router (i.e. the output of any 'nvram get' command), and all the issues I have seen with this particular problem were solved with either a reinsert of the USB drive or a reboot, so I don't think NVRAM is the problem, but if you think it could be related I'm interested to hear how.

I'm not really interested in upgrading the device just to upgrade. I realize this model is coming up on 10 years, but TBH it meets my needs right now and if it is not defective, I see no reason to replace it.
 
I'm talking specifically about the low NVRAM issue with the latest RMerlin firmware which pushes the features and hardware of the router to their limits (and over).

 
Bear in mind that during your stress testing, Windows may be ignoring a bad sector (handling it better than the router does), and it may even be able to permanently mark that area as bad. SSDs do this automatically using the built-in controller but flash drives, as far as I know, do not. Once you start getting bad sectors/cells on a flash drive, though, it is toast. If Windows (or whatever you're using to test it) did manage to mark the sector as unusable, it is possible that the drive will work again in the router, at least for a while.

2 years is very good for a thumb drive that is subject to high usage.
 
Indeed - and as I mentioned - these drives are so low cost (just picked up a 128GB SanDisk for 14 bucks over at Best Buy) that if one fails, you try to erase it and drop it in the e-waste bin...

It's just not worth the trouble to try to recondition a thumb drive that has failed - if replacement is $14, what is your time worth?
 
^ This -- If anything starts going wrong with a flashdrive, it's no longer worth reconditioning... that's just the first domino that fell... it's going to just keep happening again and again. As soon as I experience any flashdrive failure, it's considered dead to me.
 
I'm talking specifically about the low NVRAM issue with the latest RMerlin firmware which pushes the features and hardware of the router to their limits (and over).

Thanks, I had never been aware of this issue.

Currently I show:
Code:
size: 62133 bytes (3403 left)

I ran the command to see my largest variables:

Code:
1345 custom_clientlist
931 nc_setting_conf
761 sshd_authkeys
549 rc_support
375 wl1_chansps
362 vts_rulelist
...
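
The exact command used for the listing above isn't shown, but the same ranking can be reproduced offline from a saved dump. This is a rough sketch that assumes the dump is plain `name=value` lines (as `nvram show` prints them) and that a variable's size is the byte length of its whole line; the function name is made up, and the counts may differ slightly from whatever the original command measured.

```python
from typing import List, Tuple


def largest_nvram_vars(dump: str, top: int = 5) -> List[Tuple[int, str]]:
    """Rank NVRAM variables by the byte length of their `name=value` line."""
    sizes = []
    for line in dump.splitlines():
        name, sep, _value = line.partition("=")
        # Skip anything that is not a name=value pair, e.g. the "size: ..." summary.
        if not sep or not name:
            continue
        sizes.append((len(line.encode("utf-8")), name))
    # Largest first; ties broken alphabetically so the order is stable.
    sizes.sort(key=lambda item: (-item[0], item[1]))
    return sizes[:top]
```

Feeding it the text of a saved dump returns `(size, name)` pairs in the same shape as the listing above.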

It appears my model (RT-AC68) doesn't have anything in the /jffs/nvram directory, so I am unable to move those top 2 offenders there. Additionally, none of the cleanup options seems to be really recommended: while you can clear variables after a reboot to free up NVRAM, a warning from RMerlin stated this could cause a problem if you then add too much back in and reboot. I could probably move the vts_rulelist out of the GUI and just create iptables rules, but for the 362 bytes it doesn't seem worth it.

In any event, while usage is a bit high, there is still enough left that it doesn't look like I am running into that issue. I still have about 3.4 KB (3403 bytes) left, which is more than twice the size of my current largest variable. But I'm glad you brought this to my attention and will keep an eye on it.
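
To put the headroom in perspective, the summary line can be turned into a percentage. This sketch assumes the reported "size" is the space in use and that used plus left gives the total; the numbers from the post add up to exactly 65536 bytes (64 KB), which fits that reading. The function name is made up for illustration.

```python
import re


def nvram_headroom(size_line: str) -> dict:
    """Parse a 'size: N bytes (M left)' line into used/left/total/percent free."""
    match = re.search(r"size:\s*(\d+)\s*bytes\s*\((\d+)\s*left\)", size_line)
    if match is None:
        raise ValueError("unrecognised size line: %r" % size_line)
    used, left = int(match.group(1)), int(match.group(2))
    total = used + left  # assumption: total capacity = used + left
    return {
        "used": used,
        "left": left,
        "total": total,
        "pct_free": round(100 * left / total, 1),
    }


info = nvram_headroom("size: 62133 bytes (3403 left)")
print(info)
```

With the figures above that is about 5% free: 3403 bytes left against a largest single variable of 1345 bytes.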

I wonder if the upgrade from 386_3.2 to the latest version will exacerbate the issue. In any event, I currently don't see a reason to update my FW at all as I don't seem to have any issues. Of course I will have to eventually, but every year this configuration works for me is one year less I need to pay for an upgrade and another year (or even just 6 months) of allowing prices to fall, etc.
 
I've run all sorts of tools on the drive, including those that look for bad sectors. I've also run some linux ones on it through WSL. There is no indication that the drive is bad or that you would be able to distinguish it from any other drive you just unwrapped.

While I agree that if you can validate that a drive failed, even once, it is a good idea to replace that drive, it is just as bad an idea to assume that a failure (especially one presenting the way I have described) is the fault of the drive. There is a good probability (strictly speaking, odds) that the drive is at fault, but based on the symptoms it could just as likely be a problem with the USB port or bus on the router.

I think blaming the drive as bad is extremely poor troubleshooting in this case given what the logs show (both on the router at fail time and the lack of any error logs when testing). It isn't that I am trying to save $15 (technically I only need a 4GB drive, so really it's probably less than $10) because I don't want to replace the drive. It's that I don't want to blame a drive when the evidence isn't clear that the drive is at fault.

Until I can determine to a reasonable degree of certainty that the router itself isn't damaged, discussing replacing the drive with whatever is putting the cart before the horse.
 
If/when you put the drive back in the router (and it fails) look for the initial error messages that we discussed earlier.
 
Yes, that is a key point and I have some additional logging going on in hopes of catching it.

At this point I'm not actually sure I'm going to wait through the weekend testing this 2.0 drive. I think I'm going to put my other 3.0 drive in tomorrow and try to kill 2 birds with one stone. If I get a failure on that, I'll either try it in 2.0 mode or put the 2.0 drive back in, but I think testing with another 3.0 drive first will save time if it doesn't fail within several days.
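
If the failure does come back, something like this could help pull the earliest USB-related lines out of a saved copy of the log. It is only a sketch: the function name and the default keyword are assumptions, and the exact wording of the router's messages is not known here, so the keyword list would need adjusting to match whatever the log actually shows.

```python
from typing import Iterable, List, Tuple


def first_matches(
    lines: Iterable[str],
    keywords: Tuple[str, ...] = ("usb",),
    limit: int = 5,
) -> List[Tuple[int, str]]:
    """Return the first `limit` lines containing any keyword (case-insensitive)."""
    if limit <= 0:
        return []
    lowered = tuple(k.lower() for k in keywords)
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(k in line.lower() for k in lowered):
            # Keep the 1-based line number so the hit can be found in the full log.
            hits.append((number, line.rstrip("\n")))
            if len(hits) >= limit:
                break
    return hits
```

Opening the saved log file and passing the file object as `lines` is enough; the first hit or two are usually the ones worth reading in context.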
 
