
R9000 running Voxel's 1.0.4.42HF rebooting randomly.

These logs are saved before rebooting (actually, they are saved continuously as soon as they change):
/proc/kmsg (dmesg)
/var/log/messages
/var/log/log-message

They are saved to your USB device in the directory /system_logs.
If no USB device is found, they are saved to the router's internal flash in /opt/kamoj/logs.

See release_notes.txt for the add-on version in which this option was added (not long ago).
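For anyone curious how "saved continuously as soon as they change" can work, here is a minimal POSIX sh sketch. It is not the add-on's code: `pick_log_dir` and `save_if_changed` are hypothetical names, and the USB mount point is assumed to be known already. It only suits regular files such as /var/log/messages; /proc/kmsg is a stream that blocks on read, so it needs different handling.

```sh
#!/bin/sh
# Hypothetical sketch, not the add-on's implementation.

# pick_log_dir USB_MOUNT: print the directory to save logs in.
# An empty USB_MOUNT means "no USB found": fall back to internal flash.
pick_log_dir() {
    if [ -n "$1" ]; then
        printf '%s/system_logs\n' "$1"
    else
        printf '%s\n' /opt/kamoj/logs
    fi
}

# save_if_changed SRC DEST_DIR: copy SRC into DEST_DIR when the saved copy
# is missing or differs. Returns 0 if it copied, 1 if nothing changed,
# 2 on error.
save_if_changed() {
    dest="$2/$(basename "$1")"
    mkdir -p "$2" || return 2
    if cmp -s "$1" "$dest" 2>/dev/null; then
        return 1    # unchanged since the last save
    fi
    cp "$1" "$dest" || return 2
}
```

A polling loop such as `while sleep 10; do save_if_changed /var/log/messages "$(pick_log_dir /mnt/sda1)"; done` would keep the copy current. Copying only on change also limits writes, which matters when the fallback target is the router's internal flash.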
 
@Voxel and @kamoj, is this something I need to be worried about?

[Screenshot]
 

I installed the 5.3b15 version and enabled this feature, but I can't find /system_logs on any of my USB drives, nor in /opt/kamoj. Is there a way to check whether your script failed to enable this feature?
 
The function will try to write to the first available partition on the first drive found (or the one labeled "optware"). Maybe you just can't see it?
(Many USB disks are formatted with several partitions.)
If you have router problems, try to strip down the functionality as much as possible while fault finding.
Please test with only a flash drive mounted, or with no USB device at all.
(There is also an early boot log you can enable if the crash happens before the router has finished booting.)
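A rough sketch of "first available partition on the first drive found", assuming USB partitions show up as /dev/sdXN in /proc/mounts (as /mnt/sda1 did on the R9000 later in this thread). `first_usb_mount` is a hypothetical helper, and it does not implement the "optware" label preference, which would need a label lookup tool that may not be present on the router.

```sh
# Hypothetical helper, not the add-on's code.
# first_usb_mount [MOUNTS_FILE]: print the mount point of the lowest-named
# /dev/sdXN partition listed in MOUNTS_FILE (default: /proc/mounts).
# Prints nothing if no such partition is mounted.
first_usb_mount() {
    awk '$1 ~ /^\/dev\/sd[a-z]+[0-9]+$/ { print $1, $2 }' "${1:-/proc/mounts}" |
        sort | head -n 1 | cut -d ' ' -f 2
}
```

An empty result (`[ -z "$(first_usb_mount)" ]`) would be the cue to fall back to internal flash. The sort is lexical, so sda10 sorts before sda2; that is fine for a sketch but worth knowing.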

I have an external HDD attached to USB1 and a flash drive in USB2. The USB2 drive is labeled optware, but neither of them has the /system_logs folder. I also checked the EFI partition of the HDD on USB1 and found nothing there either. I can't find a log anywhere in your add-on files to track down the issue; it would be great if you added something like this in the beta so we can help you with debugging :)
 

Is this related to the EEPROM memory onboard?
 
Hello,

Perhaps this has been resolved, but I have been running 1.0.4.42HF since it was released. I did have an issue with the initial install but can't remember what it was. I did a factory reset and all went well the second time. I use QoS but not the 60 GHz radio. I also run the Kamoj add-on with OpenVPN and DNSCrypt-2 active. The router hadn't had a hiccup until today.

Today I installed the 5.3b15 version of Kamoj and activated AdGuard. It's only been a few hours, but the CPU temperature hit 79°C and then the router rebooted. I seem to recall 75°C being the previous high CPU temperature.
 
After a reboot you can see the highest temperature in Router Information: CPU Temperature: Top ever.
It's not recommended to run hotter than 75 deg C / 167 deg F.
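The 75 deg C guideline and its Fahrenheit equivalent can be checked with integer shell arithmetic. The helper names are hypothetical, and where the router exposes its CPU temperature is not covered here, so the reading is passed in as whole degrees Celsius.

```sh
# Hypothetical helpers; the temperature is passed in as whole degrees C.

# c_to_f CELSIUS: convert to Fahrenheit with integer arithmetic
# (the fractional part is dropped, e.g. 79 C -> 174 F instead of 174.2).
c_to_f() {
    echo $(( $1 * 9 / 5 + 32 ))
}

# temp_status CELSIUS [LIMIT]: print HOT if CELSIUS is above LIMIT
# (default 75, the recommended maximum), otherwise OK.
temp_status() {
    if [ "$1" -gt "${2:-75}" ]; then
        echo HOT
    else
        echo OK
    fi
}
```

With these, `c_to_f 75` prints 167 and `temp_status 79` prints HOT, matching the numbers in this thread.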
 
Yes, this is beta, so it's good that you report it, thank you.
I don't own an R9000, and don't know how USB detection works on that router.
But if no USB device is found, the files should be written to internal flash... So please remove all USB devices and try again.
(It works perfectly on the R7800.)
To debug, please run this:
Code:
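# Create the add-on's debug log file
touch /var/logs/kamoj.log
# Now, in the web browser, check the box: Settings: Save the system logs to USB (or /opt/kamoj/logs if no USB is detected)
# Show the stored setting
nvram get kamoj_save_system_logs
cat /var/logs/kamoj.log
# List the three log-saving processes by the PIDs in their pid files
# (if a pid file is missing, the pattern gets an empty alternative and matches every process)
ps w | grep -v grep | grep -E "$(cat /var/run/kamoj_dmesg_log.pid)|$(cat /var/run/kamoj_messages_log.pid)|$(cat /var/run/kamoj_log-message_log.pid)"
cat /var/logs/kamoj.log
# Clean up the debug log file
\rm /var/logs/kamoj.log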
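A more direct way to check the three log-saving processes is to ask the kernel whether each stored PID still exists, using `kill -0`. This is a sketch with a hypothetical `check_pidfile` helper; it assumes it runs as root (as on the router), since `kill -0` also fails for processes you are not allowed to signal.

```sh
# Hypothetical helper, not part of the add-on.
# check_pidfile FILE: report whether the process whose PID is stored in
# FILE still exists. Returns 0 = running, 1 = not running, 2 = no pid file.
check_pidfile() {
    if [ ! -s "$1" ]; then
        echo "missing: $1"
        return 2
    fi
    pid=$(cat "$1")
    if kill -0 "$pid" 2>/dev/null; then
        echo "running: $1 (pid $pid)"
    else
        echo "not running: $1 (pid $pid)"
        return 1
    fi
}

# Example (pid file names taken from the debug commands):
# for f in /var/run/kamoj_dmesg_log.pid /var/run/kamoj_messages_log.pid \
#          /var/run/kamoj_log-message_log.pid; do
#     check_pidfile "$f"
# done
```

Unlike a `grep -E` on PIDs, this cannot match an unrelated process whose PID merely contains the same digits.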
 
Yes, that means this error did not exist at the "factory". But I don't know how many of the "worn" blocks are really bad.
Flash memories are constructed that way, and there are mechanisms to handle the wear.
I wouldn't worry about only one error.
My bad, I meant: does "worn bad:1" mean there is an issue with the flash memory built into my R9000?
 
@kamoj

I just executed all the commands and found that the logs are being saved to /mnt/sda1

[Screenshot]

but the weird thing is that my disks are not mounted as sda1. When I run fdisk I get this. It seems sda1 is my flash drive (which is labeled optware), but I have no idea why it is not mounted as sda1, and my HDD isn't mounted as sdb1 either.

[Screenshot]

[Screenshot]

At least I know it's not your code :p and it's something else :)
 
Good that the USB detection actually works on the R9000, but ...

No good. :( :eek:
It seems the dmesg log caught your culprit:
some problem with bad blocks in the UBI area.
Looking closer at your previous picture (https://www.snbforums.com/threads/r...f-rebooting-randomly.64887/page-2#post-598664),
it shows some UBI errors as well, doesn't it? It's a bit badly formatted, sorry; I've never had an R9000 to test it with.
What is UBI? I do see 1 bad block out of 37 allowed. When it died I flashed the stock FW using TFTP, then flashed Voxel's, but never had to format the flash.

[Screenshot]
 
Don't erase the flash. That would reset the counter and cause the bad block to be used again.
You can see UBI as a protective logical layer smoothing over the HW errors.

I created this part of the add-on to help people with bad flash memories.

I'm in no way an expert in this area, so I advise you to google and ask around.

Here is some basic info though:

MTD (Memory Technology Devices) are NAND/NOR-based flash memory chips
used for storing non-volatile data like boot images and configurations.

MTD devices are used for data storage, but they differ from hard disks and RAM
in several respects.

The biggest difference is that while hard disk sectors are rewritable,
MTD device sectors must be erased before they can be rewritten,
which is why they are more commonly called erase blocks.
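The erase-before-rewrite rule follows from how flash cells are programmed: an erase sets every bit to 1, and programming can only clear bits from 1 to 0. The hypothetical `program_byte` below models a single byte that way. It is a simplification: real erases work on whole erase blocks, and NAND also limits how many times a page may be programmed between erases.

```sh
# Toy model of flash programming, for illustration only.
ERASED=0xFF    # every bit of an erased byte reads as 1

# program_byte OLD DATA: programming can only clear bits (1 -> 0),
# so the stored result is OLD AND DATA, printed as two hex digits.
program_byte() {
    printf '%02X\n' $(( $1 & $2 ))
}
```

Writing 0xF0 to an erased byte stores F0, but then writing 0x0F over it stores 00 rather than 0F. Only after erasing back to 0xFF does writing 0x0F give 0F, which is why the whole block has to be erased first.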

Second, hard disk sectors can be rewritten many times without wearing out the hardware,
but MTD device sectors have a limited life and are no longer usable after about
1,000-100,000 erase operations (about 1,000 for MLC NAND, up to 100,000 for NOR and SLC NAND).
Worn-out erase blocks are called bad blocks, and the software must
take care not to use such blocks.

There is an extremely simple FTL (Flash Translation Layer) in the Linux MTD subsystem: mtdblock.
It emulates block devices on top of MTD devices.

UBI is a volume management system for raw flash devices. It manages multiple
logical volumes on a single physical flash device and spreads the I/O load
(i.e., wear leveling) across the whole flash chip.
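As a toy illustration of the wear-leveling idea (not UBI's actual algorithm, which also relocates static data when erase counters drift too far apart), the hypothetical `pick_least_worn` reads "block erase_count [bad]" lines and picks the least-erased usable block for the next write, skipping blocks marked bad.

```sh
# Toy wear-leveling illustration, not UBI's algorithm.
# pick_least_worn: read "block erase_count [bad]" lines on stdin and print
# the usable block with the lowest erase count (first one wins on ties).
pick_least_worn() {
    awk '
        $3 == "bad" { next }                 # never hand out bad blocks
        !found || $2 + 0 < min { min = $2 + 0; blk = $1; found = 1 }
        END { if (found) print blk }
    '
}
```

Always writing to the least-erased block keeps erase counts close together, so no single block reaches its erase limit long before the others.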

https://bootlin.com/blog/managing-flash-storage-with-linux/
http://www.linux-mtd.infradead.org/faq/general.html
http://www.linux-mtd.infradead.org/doc/ubi.html#L_usptools
https://opensourceforu.com/2012/01/working-with-mtd-devices/
https://openwrt.org/docs/techref/flash.layout
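The bad-block numbers discussed here can also be read as text: the Linux UBI sysfs interface exposes `bad_peb_count` and `reserved_for_bad` under /sys/class/ubi/ubiN/, and `ubinfo -a` should show the same counts. The `ubi_bad_summary` helper is a hypothetical sketch that assumes ubi0 is the device of interest.

```sh
# Hypothetical helper; reads the UBI sysfs counters of one UBI device.
# ubi_bad_summary [UBI_SYSFS_DIR]: default is /sys/class/ubi/ubi0.
ubi_bad_summary() {
    d=${1:-/sys/class/ubi/ubi0}
    bad=$(cat "$d/bad_peb_count") || return 1
    res=$(cat "$d/reserved_for_bad") || return 1
    echo "bad PEBs: $bad, PEBs reserved for bad-block handling: $res"
}
```

Running it now and then (or after a reboot) shows whether the bad count is growing, without erasing anything.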


@kamoj thanks for this incredible info. If I understood correctly, UBIFS should be able to "autocorrect" the issue by moving the data out of the bad block to a good one and marking the bad block. The thing is, I was still seeing ECC errors in the logs, as if UBIFS wasn't doing it.

[Screenshot]


After checking with the ubinfo command I found out how ubi0 is built. In short, 1 bad block is reported, and I saw device mtd17 (traffic_meter.bak) producing the ECC errors in the dmesg log. So I checked, and the Traffic Meter feature was enabled. As soon as I disabled it and restarted my router, the ECC errors were gone and the ECC error counter has stayed at zero.
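Per-partition ECC counters like the ones that pointed to mtd17 can also be read from the MTD sysfs attributes (`name`, `ecc_failures`, `corrected_bits`, `bad_blocks` under /sys/class/mtd/mtdN/, on kernels that provide them). The `mtd_ecc_report` helper is a hypothetical sketch; older kernels may lack some of these files.

```sh
# Hypothetical helper; summarizes ECC/bad-block counters per MTD device.
# mtd_ecc_report [SYSFS_MTD_ROOT]: default is /sys/class/mtd.
# Entries without ECC stats (e.g. the mtdNro read-only nodes) are skipped.
mtd_ecc_report() {
    root=${1:-/sys/class/mtd}
    for d in "$root"/mtd*; do
        [ -r "$d/ecc_failures" ] || continue
        printf '%s %s ecc_failures=%s corrected_bits=%s bad_blocks=%s\n' \
            "$(basename "$d")" "$(cat "$d/name")" \
            "$(cat "$d/ecc_failures")" "$(cat "$d/corrected_bits")" \
            "$(cat "$d/bad_blocks")"
    done
}
```

Comparing the output before and after toggling a feature (such as Traffic Meter) shows which partition's counters keep rising.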

[Screenshot]


So far dmesg hasn't reported any ECC errors in 11 hours of running.

[Screenshot]


Thank you so much for your valuable help! Crossing my fingers that this was the issue.

Btw, could you check this in your add-on 5.3b15: when you click dmesg log on the R9000 it doesn't show anything, just a blank page. Otherwise, the save-log feature you just added works well.
 
Hello,

The Kamoj add-on sure makes things easy! The highest temperature still says 79°C, but I've been nowhere near that except the one time right after installing 5.3b15 and starting AdGuard Home. I only had that one "mystery" reboot, and since then temperatures have stayed at 68 to 72°C.

I installed 5.3b16 a little while ago... everything is working, and temperatures so far are 67 to 70°C. All is good!

BL
 
