Issues with watchdog #2

aicastell · 2017-04-24T10:34:07Z

We are having watchdog issues running an Atheros AR9330 board with the kernel module "dragino2_si3217x" loaded.

When the dragino2_si3217x kernel module is unloaded (by running /etc/init.d/dragino2-si3217x stop before enabling the watchdog), a watchdog timeout works fine and reboots the board as expected (in less than 40 seconds).

However, when the dragino2_si3217x kernel module is loaded, a watchdog timeout does reset the board, but the reboot hangs for a long, variable period (up to 5 minutes). In some tests the board hung completely and never rebooted.

This module uses a binary blob, so it can't be debugged. I would really appreciate any help with this.

Thank you in advance!

tgillett · 2017-04-25T03:45:19Z

Hi Angel

Are you using a Dragino FXS telephony daughter board with your AR9330 board? If not, then you should not have the dragino2_si3217x kernel module loaded, as it is just the driver code for that hardware device.

What is the router device that you are using, and which VT firmware image have you installed? Or are you building your own firmware image?

Regards
Terry

aicastell · 2017-04-25T06:14:56Z

Hi Terry.

> Are you using a Dragino FXS telephony daughter board with your AR9330 board?

Yes, we are using a custom board based on the Dragino design, with a Dragino FXS telephony daughter board (based on the SiLabs Si3217x chip) connected to it. The datasheet for that chip is not publicly available, and the blob used by the kernel module makes debugging almost impossible. It works really well; the only issue we have found is with the watchdog when the dragino2_si3217x kernel module is loaded.

> What is the router device that you are using, and which VT firmware image have you installed?

Our rootfs is built from a standard OpenWrt 15.05 release, running the stock 3.18.36 Linux kernel built by OpenWrt.

Do you have any hardware available to test this issue?

Best regards,
-- Ivan

tgillett · 2017-04-25T08:26:55Z

Hi Ivan

The SiLabs code base is restricted by a non-disclosure agreement.
Let me follow up with Steve Song to see if he can arrange access for you.

Regards
Terry

tgillett · 2017-04-25T08:32:01Z

OK. Let me check with Steve on this.

stevesong · 2017-04-25T13:52:42Z

Hi all. Unfortunately the SiLabs code is covered by their NDA. The good news is that it is pretty easy to sign an NDA with SiLabs. If you can sign the NDA with them, we can share the code. https://www.silabs.com/

aicastell · 2017-04-26T06:21:16Z

It seems to be a bug in your kernel module. It's a lot of work to get the NDA and the source code, and then to find time to debug a kernel module. Does this problem reproduce on your hardware too? Could you please check it and give us some feedback?

tgillett · 2017-04-26T11:31:01Z

Hi Ivan

The FXS module code and hardware have been in use for a number of years without any reported issues.

If I understand correctly, the issue is that if the watchdog restarts the board, the normal boot sequence hangs for a long time, and that if the FXS kernel module is not loaded, then the boot sequence runs correctly. Is that correct?

Do you have a piece of test code that triggers the watchdog reset condition for testing? If so, I can try it on a standard MP2-FXS device and check the reboot operation.

Regards
Terry

tgillett · 2017-04-26T11:39:07Z

Also can you describe how you have implemented the watchdog function please?

aicastell · 2017-04-26T11:43:05Z

> If I understand correctly, the issue is that if the watchdog restarts the board, the normal boot sequence hangs for a long time, and that if the FXS kernel module is not loaded, then the boot sequence runs correctly. Is that correct?

Exactly:
* If the kernel module is not loaded: watchdog triggered --> boot sequence runs OK.
* If the kernel module is loaded: watchdog triggered --> boot sequence hangs for a very long time.

> Do you have a piece of test code that triggers the watchdog reset condition for testing?

Of course! :) This command triggers the watchdog after 10 seconds:

```
# echo a > /dev/watchdog
```

tgillett · 2017-04-26T11:58:16Z

Ivan

This is what I get on my device:

```
# ubus call system watchdog
{
	"status": "running",
	"timeout": 21,
	"frequency": 5
}
# echo a > /dev/watchdog
/bin/ash: can't create /dev/watchdog: Device or resource busy
# ls /dev/w*
/dev/watchdog
```

Is there some difference in the way the watchdog is implemented?

Regards
Terry

aicastell · 2017-04-26T12:00:52Z

Probably some userspace application has opened your /dev/watchdog device and is managing it. Try this command to find that application:

```
# lsof | grep "/dev/watchdog"
```

Stop that application (kill the process) before executing the "echo" test.
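Why does the same `echo` succeed on one board and fail with "Device or resource busy" on another? As a hedged sketch, the C model below mimics in user space the rules that Linux watchdog drivers such as ath79_wdt appear to follow: only one process may hold the device open, opening it arms the timer, and closing it only stops the timer if the last write contained the magic character 'V' (unless the kernel was built with nowayout). The struct and function names are hypothetical; this is a model for reasoning about the thread, not driver code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a Linux watchdog character device
 * (single opener + "magic close"), loosely mirroring ath79_wdt. */
struct wdt_model {
    bool busy;         /* some process holds /dev/watchdog open */
    bool running;      /* hardware countdown is active */
    bool expect_close; /* last write contained the magic 'V' */
    bool nowayout;     /* CONFIG_WATCHDOG_NOWAYOUT: 'V' is ignored */
};

/* open(): fails with -EBUSY if the device is already held (e.g. by procd). */
int wdt_open(struct wdt_model *w)
{
    if (w->busy)
        return -EBUSY;
    w->busy = true;
    w->expect_close = false;
    w->running = true; /* opening the device arms the watchdog */
    return 0;
}

/* write(): in the real driver every write is also a keepalive (not
 * modelled here); a 'V' anywhere in the buffer requests magic close. */
void wdt_write(struct wdt_model *w, const char *buf, size_t len)
{
    if (w->nowayout)
        return;
    w->expect_close = false;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 'V')
            w->expect_close = true;
}

/* close(): returns true when the driver would log
 * "device closed unexpectedly, watchdog timer will not stop!". */
bool wdt_release(struct wdt_model *w)
{
    bool unexpected = !w->expect_close;
    if (!unexpected)
        w->running = false; /* magic close: stop the countdown */
    w->expect_close = false;
    w->busy = false;
    return unexpected;
}
```

In this model, `echo a > /dev/watchdog` is open, write "a\n", close: the close is reported as unexpected and the timer keeps running. `echo -n V > /dev/watchdog` stops it, and any open attempted while another process keeps the device open returns -EBUSY.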

tgillett · 2017-04-26T12:14:06Z

There is nothing I can find in our SECN code that references /dev/watchdog. Is there some way to check for something that has opened /dev/watchdog on the running system?

aicastell · 2017-04-26T12:31:32Z

Try this command:

```
# lsof | grep "/dev/watchdog"
```

You will get the PID of the process that is managing the watchdog. Stop that application (kill the process) before executing the "echo" test.

tgillett · 2017-04-26T20:55:08Z

There is no lsof command available. I guess this is not a busybox command. Is it available in another package perhaps?
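If installing lsof is not an option, the same answer can be read from procfs: every open descriptor appears as a symlink under /proc/<pid>/fd. The C sketch below walks those links and reports which PIDs hold a given path. The function name is hypothetical; it assumes a Linux procfs, must run as root to see other processes' descriptors, and would need cross-compiling for the AR9331 target (for example with the OpenWrt SDK).

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Scan /proc/<pid>/fd/ for descriptors whose link target equals `target`.
 * Stores up to `max` matching PIDs in `out` and returns the count.
 * Processes we lack permission to inspect are skipped silently. */
size_t find_fd_holders(const char *target, pid_t *out, size_t max)
{
    size_t n = 0;
    DIR *proc = opendir("/proc");
    if (!proc)
        return 0;

    struct dirent *p;
    while (n < max && (p = readdir(proc)) != NULL) {
        char *end;
        long pid = strtol(p->d_name, &end, 10);
        if (*end != '\0' || pid <= 0)
            continue; /* not a numeric PID entry ("self", "sys", ...) */

        char fddir[64];
        snprintf(fddir, sizeof fddir, "/proc/%ld/fd", pid);
        DIR *fds = opendir(fddir);
        if (!fds)
            continue; /* process exited or permission denied */

        struct dirent *f;
        while ((f = readdir(fds)) != NULL) {
            char link[PATH_MAX], dest[PATH_MAX];
            snprintf(link, sizeof link, "%s/%s", fddir, f->d_name);
            ssize_t len = readlink(link, dest, sizeof dest - 1);
            if (len < 0)
                continue; /* ".", ".." or a descriptor that vanished */
            dest[len] = '\0';
            if (strcmp(dest, target) == 0) {
                out[n++] = (pid_t)pid;
                break; /* one hit per process is enough */
            }
        }
        closedir(fds);
    }
    closedir(proc);
    return n;
}
```

Calling `find_fd_holders("/dev/watchdog", pids, 8)` on the board should report the same process that lsof shows holding the device.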

tgillett · 2017-04-26T21:36:54Z

Is there some other way to trigger the watchdog to test? I tried an infinite while loop but that only consumed 50% CPU so it didn't trip the watchdog. A reboot from command line causes a normal reboot process to occur, so I am not sure what would be different about a reboot triggered by the watchdog.

aicastell · 2017-04-27T07:03:50Z

> There is no lsof command available. I guess this is not a busybox command. Is it available in another package perhaps?

The lsof tool is included in a different package:

```
# opkg search /usr/bin/lsof
lsof - 4.86-2
```

You can install it with:

```
# opkg install lsof
```

If you don't have access to this tool, I can send you the compiled binary.

> I tried an infinite while loop but that only consumed 50% CPU so it didn't trip the watchdog.

You don't need a heavy CPU load to reproduce the issue. The watchdog automatically reboots the board once it is enabled and 15 seconds pass without a keepalive being sent to it.

> Is there some other way to trigger the watchdog to test?

The easiest way to control the watchdog from userspace is with the "echo" command.

To disable the watchdog:

```
# echo -n V > /dev/watchdog
```

To enable the watchdog and send a keepalive, restarting the watchdog timer:

```
# echo a > /dev/watchdog
```

If you execute the previous command every second, the watchdog timer keeps being restarted and the board is never rebooted. If you don't execute it again within 15 seconds, the watchdog will automatically reboot the board. This makes the bug very easy to reproduce.
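The same open / keepalive / magic-close sequence can also be driven from a small C program through the standard Linux watchdog ioctl interface (WDIOC_GETTIMEOUT and WDIOC_KEEPALIVE from <linux/watchdog.h>), which avoids depending on how the shell opens and closes the device. This is an untested sketch for the AR9331: the function name is hypothetical, opening the device arms the watchdog, and the open will fail with EBUSY while procd still holds /dev/watchdog.

```c
#include <fcntl.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Open `path`, read the current timeout into *timeout_out (-1 if the
 * driver does not report it), send `beats` keepalives one second apart,
 * then request a magic close by writing 'V' before closing.
 * Returns 0 on success, -1 if the device cannot be opened or the
 * magic character cannot be written. Opening the device arms it. */
int wdt_pet_and_disarm(const char *path, int beats, int *timeout_out)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1; /* e.g. EBUSY while another process holds the device */

    if (ioctl(fd, WDIOC_GETTIMEOUT, timeout_out) < 0)
        *timeout_out = -1;

    for (int i = 0; i < beats; i++) {
        if (ioctl(fd, WDIOC_KEEPALIVE, 0) < 0)
            break; /* stop petting, but still try to disarm below */
        sleep(1);
    }

    /* Without this 'V', ath79_wdt logs "device closed unexpectedly" and
     * the countdown keeps running after close() (nowayout ignores 'V'). */
    int rc = (write(fd, "V", 1) == 1) ? 0 : -1;
    close(fd);
    return rc;
}
```

To reproduce the hang instead, leave out the 'V' and stop calling WDIOC_KEEPALIVE; the board should then be reset once the timeout reported by WDIOC_GETTIMEOUT expires.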

The kernel driver managing the watchdog is the one for the built-in watchdog timer on Atheros AR71XX/AR724X/AR913X SoCs (ath79_wdt); we have these kernel options configured:

```
CONFIG_WATCHDOG=y
CONFIG_ATH79_WDT=y
```

> A reboot from command line causes a normal reboot process to occur, so I am not sure what would be different about a reboot triggered by the watchdog.

Using the "reboot" command to reboot the board doesn't reproduce the issue; that works fine.

Please, let me know if you need something else to check this issue.

tgillett · 2017-04-27T09:32:56Z

Hi Ivan

I installed the lsof package and got this output:

```
# ls -l /dev | grep watch
crw-r--r-- 1 root root 10, 130 Jan 1 1970 watchdog
# lsof
COMMAND PID TID USER FD TYPE DEVICE SIZE/OFF NODE NAME
procd 1 root cwd DIR 0,14 0 145 /
procd 1 root rtd DIR 0,14 0 145 /
procd 1 root txt REG 31,3 42520 1791 /sbin/procd
procd 1 root mem REG 31,3 359596 98 /lib/libuClibc-0.9.33.2.so
procd 1 root mem REG 31,3 78648 300 /lib/libgcc_s.so.1
...etc....
# lsof | grep /dev/watchdog
# (No output)
# lsof|grep watch
procd 1 root 3w CHR 10,130 0t0 84 /watchdog
# echo a > /dev/watchdog
/bin/ash: can't create /dev/watchdog: Device or resource busy
# echo -n V > /dev/watchdog
/bin/ash: can't create /dev/watchdog: Device or resource busy
```

So it seems that there is no userspace app associated with the watchdog process, and that it will not accept input from the 'echo' command. Do you have a reference to documentation on the watchdog as implemented in OpenWrt / LEDE?

Regards
Terry

aicastell · 2017-04-27T09:52:56Z

```
# ls -l /dev | grep watch
crw-r--r-- 1 root root 10, 130 Jan 1 1970 watchdog
```

I get exactly the same output.

```
# lsof | grep watch
procd 1 root 3w CHR 10,130 0t0 84 watchdog
```

OK, in your case procd (PID 1) seems to be the userspace application that has taken control of the watchdog. We need to release it to be able to manage the watchdog from userspace with the "echo" command... Let me check on Google for a way to disable this behaviour...

tgillett · 2017-04-27T10:00:50Z

Can we also write some code that we can run from the command line to make the CPU so busy that the watchdog is actually triggered, as in a real event?

aicastell · 2017-04-27T10:03:44Z

Please try this:

```
# ubus call system watchdog '{ "stop": true }'
```

In theory, this stops the keepalives being sent to the watchdog, so your board will be restarted by the watchdog after 15 seconds.

If this doesn't work, we can provide a procd patch that closes /dev/watchdog after opening it... But before trying that, let me know how the first option goes...

tgillett · 2017-04-27T10:24:31Z

Can you point me to some documentation for these watchdog calls please? This is the output I got:

```
/# ubus call system watchdog '{ "stop": true }'
{
	"status": "stopped",
	"timeout": 21,
	"frequency": 5
}
```

And the device rebooted and hung!!!

aicastell · 2017-04-27T10:32:35Z

> And the device rebooted and hung!!!

So you confirm the bug, right?

OK, ubus is a standard command for sending messages to the bus. You can find more info here:

https://wiki.openwrt.org/doc/techref/ubus

The problem seems to originate in the dragino2_si3217x kernel module and some strange interaction with the watchdog driver. Try unloading the dragino2_si3217x kernel module and repeating the test; the watchdog should then work fine, rebooting the board as expected...

tgillett · 2017-04-27T11:38:31Z

To be precise: I tested with /etc/init.d/dragino2-si3217x enabled and disabled, using the command ubus call system watchdog '{ "stop": true }' to trigger the watchdog operation.

With the module enabled, the device hangs after the watchdog reboots. With the module disabled, the device reboots normally. When rebooted from the command line with a 'reboot' command, the device reboots normally.

So it does appear there is a bug in the module that affects the reboot behaviour when initiated by the watchdog. Now, how to debug it...

I would also like to find a way to trigger the watchdog other than by using the ubus command, i.e. run a process that makes the CPU too busy to reset the watchdog, and make sure that that also causes the fault.

aicastell · 2017-04-27T12:49:53Z

> I would also like to find a way to trigger the watchdog other than by using the ubus command, i.e. run a process that makes the CPU too busy to reset the watchdog, and make sure that that also causes the fault.

I suggest testing this the same way we do: patch the procd source code to get a procd binary that releases the watchdog after booting the board. Then you'll be able to control the watchdog from userspace with "echo" commands.

The patch is available here:

0002-Release-watchdog-on-stop.txt

To get around GitHub's attachment restrictions, I renamed the .patch file to .txt. Restore the .patch extension and copy the patch file to:

openwrt/package/system/procd/patches/0002-Release-watchdog-on-stop.patch

After recompiling OpenWrt, the patched procd binary will be ready.

You'll then be able to reproduce the same issue with the previously suggested "echo" commands instead of the ubus command. That's how we reproduce it, and it hangs too.

aicastell · 2017-05-04T07:42:29Z

Hi all! Are you working on this issue? Do you plan to fix it? If we can help in some way, let me know about it. Thank you!

tgillett · 2017-05-04T08:41:49Z

Hi Ivan

Yes, we are actively working on it. We will get back to you as soon as we have some information. You may get a request shortly from Vittorio for some additional information/logs.

Regards
Terry

tgillett · 2017-05-05T02:01:35Z

A progress report from the developer working on this issue:

> I was also able to reproduce the watchdog issue: almost never with my CC (Chaos Calmer), but almost always with LEDE. It seems to be a hardware bug in the AR9331 SoC. I would need to check with JTAG to be 100% sure, but it seems that if the watchdog issues the full chip reset while the CPU is writing to GPIOs (e.g. for bit-banged SPI) and the SLIC mailbox is enabled on the alternative set of SLIC pins (18...22), then the SoC never exits from the full chip reset condition. Nothing gets printed to the serial console, and U-Boot does not load. It's even possible that the bootstrap register is being corrupted, or who knows what else... There's probably a reason why the SLIC GPIO pin selection register bits are marked as reserved in the AR9331 datasheet...
>
> A workaround is to switch the watchdog action from full chip reset to interrupt, and then let the Linux kernel handle that interrupt as an emergency reboot. It seems to work quite well this way, with no hangs... Of course it won't work if the kernel itself hangs, but it seems that for some reason the watchdog won't work anyway if the CPU is stalled (e.g. the "normal" FCR watchdog is still not triggered many seconds after having executed "halt" or "echo o > /proc/sysrq-trigger" to halt the Linux kernel).
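The workaround boils down to changing the watchdog's expiry action. The actual patch was shared privately and is not reproduced in this thread, so the following is only a sketch under stated assumptions: the action-field values mirror the WDOG_CTRL_ACTION_* constants used by the mainline ath79_wdt driver (verify them against your kernel tree), and the helper name is hypothetical. In a real driver the resulting value would be written to the watchdog control register, the watchdog IRQ requested, and the handler would trigger something like emergency_restart().

```c
#include <stdint.h>

/* Assumed layout of the AR933x watchdog control register: the low two
 * bits select what happens when the countdown expires. */
#define WDOG_CTRL_ACTION_MASK 0x3u
#define WDOG_CTRL_ACTION_NONE 0x0u /* no action */
#define WDOG_CTRL_ACTION_GPI  0x1u /* general purpose interrupt */
#define WDOG_CTRL_ACTION_NMI  0x2u /* NMI */
#define WDOG_CTRL_ACTION_FCR  0x3u /* full chip reset (the hanging case) */

/* Return `ctrl` with only the action field replaced by `action`;
 * all other bits of the register value are preserved. */
static inline uint32_t wdog_set_action(uint32_t ctrl, uint32_t action)
{
    return (ctrl & ~WDOG_CTRL_ACTION_MASK) | (action & WDOG_CTRL_ACTION_MASK);
}
```

Switching from FCR to interrupt mode is then `wdog_set_action(ctrl, WDOG_CTRL_ACTION_GPI)`; masking rather than overwriting leaves any other bits of the register untouched.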

aicastell · 2017-05-05T06:51:15Z

This is a really hard issue to debug, and you are doing an excellent job. Thank you very much for that detailed progress report! :)

We will wait until the hardware bug in the AR9331 SoC is confirmed. In the meantime, we would like to test the suggested workaround. How can we test it? What should we do to switch the watchdog action from full chip reset to interrupt, so that the kernel handles that IRQ as an emergency reboot?

tgillett · 2017-05-05T08:08:46Z

Ivan

PM me and I will send you the patch.

Terry

aicastell · 2017-05-05T11:43:32Z

OK, we tested the patch and it works fine, so we have plan "B" ready :) Now we will wait for more news on this before deciding on the next steps. Thanks a lot for the help you are providing with this issue!!

VittGam · 2017-05-05T12:57:18Z

Hello Ivan,

I'm the developer of the FXS driver and of the aforementioned patch.

It's been quite a tough issue to debug, and if it's really a hardware bug in the SoC, as it seems to be, then QCA is probably already aware of it... I'm going to check whether there's a similar workaround in the QCA version of OpenWrt.

Anyway I've now seen that there are some other watchdog drivers in Linux that do exactly what my patch is doing, using the watchdog in interrupt mode and then issuing an emergency reboot when the interrupt is fired. So this is probably a "good" way to go.

By the way, you mentioned that sometimes after 5 minutes or so, the board would indeed boot up. It would be great if you could get serial console logs, or even dmesg/logread logs, from this situation, since I wasn't able to reproduce it.

Best regards,
Vittorio G

VittGam · 2017-05-05T14:36:16Z

By the way, I can confirm that stopping the watchdog userspace refresher (without disabling the watchdog itself) and then halting the kernel with echo o > /proc/sysrq-trigger (which in the end executes the equivalent of while(1); in kernel space) will not even let the normal full-chip-reset watchdog do its job: the SoC is never reset.

Instead, a kernel panic (like when forced with echo c > /proc/sysrq-trigger) will always trigger a reboot after a while, with both FCR and interrupt watchdog modes, and even if the /proc/sys/kernel/panic reboot timer is disabled (which by the way is set to 3 seconds by default in OpenWrt/LEDE).
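The panic reboot timer mentioned above is exposed as /proc/sys/kernel/panic: a positive value means reboot after that many seconds, 0 means wait forever, and a negative value means reboot immediately. The small C helpers below (names hypothetical) read and interpret that value, which can help confirm which policy a given image actually uses before repeating the sysrq experiments.

```c
#include <stdio.h>

/* Behaviour after a kernel panic, as selected by kernel.panic. */
enum panic_policy {
    PANIC_WAIT_FOREVER,         /* value == 0 */
    PANIC_REBOOT_AFTER_TIMEOUT, /* value > 0: reboot after N seconds */
    PANIC_REBOOT_NOW            /* value < 0: reboot immediately */
};

/* Read one integer from a sysctl-style file such as /proc/sys/kernel/panic.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_long_file(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Map a kernel.panic value to the corresponding behaviour. */
enum panic_policy classify_panic(long seconds)
{
    if (seconds > 0)
        return PANIC_REBOOT_AFTER_TIMEOUT;
    if (seconds < 0)
        return PANIC_REBOOT_NOW;
    return PANIC_WAIT_FOREVER;
}
```

On the board, `read_long_file("/proc/sys/kernel/panic", &v)` should yield 3 if the OpenWrt/LEDE default quoted above is in effect.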

aicastell · 2017-05-05T14:39:17Z

> Anyway I've now seen that there are some other watchdog drivers in Linux that do exactly what my patch is doing, using the watchdog in interrupt mode and then issuing an emergency reboot when the interrupt is fired. So this is probably a "good" way to go.

Yes, in fact it works fine. But that's a problem for an unattended router when the kernel crashes. So, for the moment this will be plan "B"; we'll wait until we are completely sure there is no other option...

> By the way, you mentioned that sometimes after 5 minutes or so, the board would indeed boot up. It would be great if you could get serial console logs, or even dmesg/logread logs, from this situation, since I wasn't able to reproduce it.

Ok, we'll try to reproduce it again with the debug console attached, and will come back to you with the results.

Kind regards,
-- Ivan

aicastell · 2017-05-09T07:49:13Z

Hi all.

This is the output from the console. These are the last kernel logs before the watchdog reset:

```
[   78.240000] ath79_wdt: device closed unexpectedly, watchdog timer will not stop!
[   80.440000] udevd[2165]: starting version 173
[ 1480.920000] ath79_wdt: device closed unexpectedly, watchdog timer will not stop!
```

The watchdog resets the board, and then there is no output on the console for several minutes. Eventually U-Boot starts a new boot process:

```
*********************************************
*        U-Boot 1.1.4  (Jun  3 2014)        *
*********************************************

AP121 (AR9331) U-Boot for Dragino v2 MS14

DRAM:   64 MB
FLASH:  Winbond W25Q128 (16 MB)
CLOCKS: 400/400/200/33 MHz (CPU/RAM/AHB/SPI)

LED on during eth initialization...
Hit any key to stop autobooting:  0
```

After that, the kernel starts loading as usual... Not very useful information, I know.

I hope you have made some progress with this issue and have better news. Thank you in advance! :)
