bengalih
I run Entware off a USB drive on my RT-AC68U, with several applications installed.
Over the past 5-6 years I have had several USB drive failures. On one or two occasions the cause was filesystem corruption, most likely from too many hard power-offs with no fsck being run. In those cases I re-partitioned, re-formatted and rebuilt the drive, and everything was fine.
In two of these cases the drives actually failed completely. When I moved them over to my PC to test, they were inaccessible and had to be replaced.
Now, however, I'm having a strange issue and don't know whether it is the USB drive or perhaps something worse.
Over the past 10 days the drive has failed 3-4 times. Each time, my Entware install (/opt) goes offline and the mount becomes inaccessible.
The first time it happened I wasn't sure what was going on, and to get things back up quickly I simply rebooted the router. It came back up and everything worked fine for about 24 hours.
The next time, all I had to do was unplug the USB drive and plug it back in. It remounted automatically and everything was running again... for about 24 hours.
After the 4th failure I connected the drive to my PC and checked it with e2fsck under WSL on Windows. It appeared to have some errors, but I initially had some trouble mounting it, so I'm not sure whether I caused them myself.
I ran some USB drive testing tools in Windows, basically writing to and verifying the entire drive several times, and didn't see a single error.
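The write/verify pass described above can be sketched in Python as two separate steps. This is a minimal illustration, not the Windows tools actually used: the function names, the 1 MiB block size and the seeded pseudo-random pattern are assumptions. Splitting the write from the verify lets the drive be unplugged and replugged (or remounted) in between, so the read-back has to come from the flash itself rather than from the operating system's file cache. Note that a test like this, run on a PC, exercises the drive's storage but not the router's USB port or its power delivery.

```python
import os
import random
from typing import List


def pattern_block(seed: int, index: int, size: int) -> bytes:
    # Deterministic pseudo-random bytes for block `index`, so the verify pass
    # can regenerate exactly what was written without keeping a copy (Python 3.9+).
    return random.Random((seed << 32) | index).randbytes(size)


def write_pattern(path: str, total_bytes: int, block_size: int = 1 << 20, seed: int = 1) -> None:
    """Fill `path` with `total_bytes` of patterned data (hypothetical helper)."""
    with open(path, "wb") as f:
        offset, index = 0, 0
        while offset < total_bytes:
            size = min(block_size, total_bytes - offset)
            f.write(pattern_block(seed, index, size))
            offset += size
            index += 1
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out to the device


def verify_pattern(path: str, total_bytes: int, block_size: int = 1 << 20, seed: int = 1) -> List[int]:
    """Return the indices of blocks whose contents differ from the written pattern."""
    bad: List[int] = []
    with open(path, "rb") as f:
        offset, index = 0, 0
        while offset < total_bytes:
            size = min(block_size, total_bytes - offset)
            # A short read (truncated file) also counts as a mismatch.
            if f.read(size) != pattern_block(seed, index, size):
                bad.append(index)
            offset += size
            index += 1
    return bad
```

In use, `write_pattern()` would be pointed at a file on the drive sized close to its free space, the drive replugged, and `verify_pattern()` called with the same size, block size and seed; an empty list means every block read back exactly as written.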
I went ahead and rebuilt the drive entirely, then restored a tar of my backed-up configuration (a similar process to the one I have used in the past after a complete drive failure).
The rebuilt drive ran fine for about 3 days and then fell offline again.
I don't have the entire syslog, as it got overrun with other errors caused by the drive being offline, but here is the bulk of it:
Code:
Apr 3 00:24:49 kernel: usb 2-1: device descriptor read/64, error -71
Apr 3 00:24:50 kernel: usb 2-1: device not accepting address 5, error -71
Apr 3 00:24:51 kernel: usb 2-1: device not accepting address 6, error -71
Apr 3 00:25:12 kernel: sd 0:0:0:0: Device offlined - not ready after error recovery
Apr 3 00:25:12 kernel: sd 0:0:0:0: [sda] Unhandled error code
Apr 3 00:25:12 kernel: end_request: I/O error, dev sda, sector 4989176
Apr 3 00:25:12 kernel: sd 0:0:0:0: [sda] Unhandled error code
Apr 3 00:25:12 kernel: end_request: I/O error, dev sda, sector 4989200
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145:
Apr 3 00:25:12 kernel: JBD2: I/O error detected when updating journal superblock for sda1-8.
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detectedcomm conn_diag:
Apr 3 00:25:12 kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Apr 3 00:25:12 kernel: reading directory lblock 0previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_journal_start_sb:252: Detected aborted journal
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm udhcpc: reading directory lblock 0
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm watchdog: reading directory lblock 0
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_journal_start_sb:252: Detected aborted journal
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #131073: comm nginx: reading directory lblock 0
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262251: comm sed: reading directory lblock 0
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm amas_lib: reading directory lblock 0
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm preinit: reading directory lblock 0
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm cp: reading directory lblock 0
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm cp: reading directory lblock 0
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm touch: reading directory lblock 0
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262251: comm grep: reading directory lblock 0
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262251: comm sh: reading directory lblock 0
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm dhcpc_lease: reading directory lblock 0
Apr 3 00:25:12 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:12 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262145: comm networkmap: reading directory lblock 0
Apr 3 00:25:12 kernel: usb 2-1: device descriptor read/64, error -71
Apr 3 00:25:13 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:13 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262251: comm chmod: reading directory lblock 0
Apr 3 00:25:13 kernel: usb 2-1: device descriptor read/64, error -71
Apr 3 00:25:13 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:13 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #2: comm [: reading directory lblock 0
...
Apr 3 00:25:19 kernel: usb 2-1: device not accepting address 17, error -71
Apr 3 00:25:20 kernel: usb 3-1: device descriptor read/64, error -62
...
Apr 3 00:25:27 kernel: EXT4-fs (sda1): previous I/O error to superblock detected
Apr 3 00:25:27 kernel: EXT4-fs error (device sda1): ext4_find_entry:921: inode #262251: comm preinit: reading directory lblock 0
Apr 3 00:25:27 ovpn-server1[2457]: Options error: --dh fails with 'dh.pem': No such file or directory (errno=2)
...
It certainly looks like failure or corruption of the EXT4 filesystem, and it could be that the drive itself is failing and causing this corruption.
My concern is that there may be something more insidious going on, such as a flaky USB port on the router.
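One rough way to probe the port-versus-filesystem question against a saved syslog is to classify each kernel line as a USB-level, block-device or EXT4/JBD2 error and see which category appears first. In the excerpt above, `usb 2-1 ... error -71` shows up at 00:24:49, before the first `sd`/`end_request` I/O error at 00:25:12, which is consistent with the device dropping off the USB bus first and the EXT4 errors following from that. The sketch below is hypothetical: the function names are made up, the regular expressions are written only against the line formats shown in this log, and the errno names assume Linux numbering (71 = EPROTO, 62 = ETIME).

```python
import re
from typing import Dict, Iterable, Optional

# Errno numbers as defined on Linux; other operating systems number these differently.
LINUX_ERRNO = {71: "EPROTO", 62: "ETIME"}

# Patterns written against the line formats in the excerpt above (assumption:
# other firmware versions may word these messages differently).
USB_ERR = re.compile(r"kernel: usb (\S+): .*error -(\d+)")
BLOCK_ERR = re.compile(r"kernel: (?:end_request: I/O error|sd \S+: .*(?:offlined|Unhandled error))")
FS_ERR = re.compile(r"kernel: (?:EXT4-fs|JBD2)")


def classify(line: str) -> Optional[str]:
    """Return 'usb', 'block' or 'fs' for an error line, or None otherwise."""
    if USB_ERR.search(line):
        return "usb"
    if BLOCK_ERR.search(line):
        return "block"
    if FS_ERR.search(line):
        return "fs"
    return None


def first_seen(lines: Iterable[str]) -> Dict[str, str]:
    """Map each category to its first matching line; dict order follows log order."""
    first: Dict[str, str] = {}
    for line in lines:
        cat = classify(line)
        if cat is not None and cat not in first:
            first[cat] = line
    return first


def usb_errno_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Count USB errors by errno name, falling back to the raw number."""
    counts: Dict[str, int] = {}
    for line in lines:
        m = USB_ERR.search(line)
        if m:
            code = int(m.group(2))
            name = LINUX_ERRNO.get(code, str(code))
            counts[name] = counts.get(name, 0) + 1
    return counts
```

A USB-first ordering does not by itself separate a bad port or cable from a failing controller inside the drive, since either can produce `error -71`; it mainly suggests that the EXT4 errors are a symptom of the device disappearing rather than the root cause.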
This was a USB 3.0 drive; I currently have a USB 2.0 drive in the router running another copy of the system. I'm hoping that if it runs fine for a week or so, I can conclude that the other drive is just bad.
However, I'm a little concerned because all the testing on my Windows system reports no issues with the 3.0 drive.
Has anyone seen anything like this before?