why did this esp reboot

Moderators: grovkillen, Stuntteam, TD-er

GravityRZ
Normal user
Posts: 206
Joined: 23 Dec 2019, 21:24

why did this esp reboot

#1 Post by GravityRZ » 20 Feb 2024, 14:18

I am seeing this on the main page.
What is the reason this ESP rebooted?

Boot: External Watchdog (1)
Reset Reason: Hardware Watchdog
Last Action before Reboot: PLUGIN_READ: timer, id: 3

TD-er
Core team member
Posts: 8756
Joined: 01 Sep 2017, 22:13
Location: the Netherlands
Contact:

Re: why did this esp reboot

#2 Post by TD-er » 20 Feb 2024, 14:35

What kind of task do you have running at position 3 and 4?
Which build are you running? (exact filename, see sysinfo page)

The reason I'm asking about tasks 3 and 4 is that this decoded message once had an off-by-one "bug", so depending on the build it could mean task #3 or #4 :)

Anyway, the External Watchdog means that the ESP was stuck in some loop.
This can be a programming error, but more likely it was waiting for some device which is no longer replying, or sending data to a controller which failed for whatever reason (e.g. server not responding).

PLUGIN_READ is the call that actually reads data from a sensor and sends it to a controller.
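
As a rough illustration of what "stuck in some loop" means for a watchdog, here is a small Python simulation (a sketch, not ESPEasy code). The `Watchdog` class, the `run_reads` helper and the 6000 ms timeout are all made up for the example. The idea is only that a read which blocks longer than the timeout, without the main loop getting a chance to feed the watchdog, ends in a reset.

```python
class Watchdog:
    """Toy watchdog: trips if it is not fed within timeout_ms (hypothetical value)."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.last_fed_ms = 0
        self.tripped = False

    def check(self, now_ms):
        # The watchdog trips once too much time has passed since the last feed.
        if now_ms - self.last_fed_ms > self.timeout_ms:
            self.tripped = True
        return self.tripped

    def feed(self, now_ms):
        # Feeding only helps if the watchdog has not tripped already.
        if not self.check(now_ms):
            self.last_fed_ms = now_ms


def run_reads(read_durations_ms, timeout_ms=6000):
    """Run simulated blocking reads back to back, feeding the watchdog between them.

    Returns the index of the read during which the watchdog tripped, or None.
    """
    wd = Watchdog(timeout_ms)
    now_ms = 0
    for index, duration in enumerate(read_durations_ms):
        now_ms += duration      # the read blocks; nothing feeds the watchdog meanwhile
        if wd.check(now_ms):
            return index        # this is where the "reboot" would happen
        wd.feed(now_ms)         # back in the main loop, feed the watchdog
    return None


print(run_reads([750, 750, 750]))    # all reads return in time
print(run_reads([750, 9000, 750]))   # second read hangs waiting for a reply
```

In this toy model the first call reports no reset and the second one reports a reset during the read at index 1, which is the kind of situation the "Last Action before Reboot" line points at.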

Re: why did this esp reboot

#3 Post by GravityRZ » 20 Feb 2024, 14:57

I am running this build:
Build: ESP_Easy_mega_20231225_normal_ESP8266_4M1M Dec 25 2023
Tasks 3 and 4 have DS18B20 temperature devices installed on GPIO-12.
As far as I can tell these devices always work.
They send out on a 15 second interval.

Could it be that if the WiFi connection is bad, it tries to send but fails?

Re: why did this esp reboot

#4 Post by TD-er » 20 Feb 2024, 15:02

Yep, if you're sending to a controller that requires network, then a bad WiFi signal can cause these crashes.

I've seen that DNS lookups are prone to causing crashes when those fail.

erstec
Normal user
Posts: 10
Joined: 22 Feb 2024, 17:28

Re: why did this esp reboot

#5 Post by erstec » 22 Feb 2024, 17:33

It seems that something changed in 20231225 with the 1-Wire routines.
At the moment I have 5 ESP8266 devices. All worked perfectly with 20231130 for months, but right after the update to 20231225 the tasks reading DS18B20 cause random reboots.
Disabling the Environment - 1-Wire Temperature tasks solves the issue (but no temperature readings, of course), as does downgrading to 20231130.

P.S. WiFi is perfect, around -45 dBm

Re: why did this esp reboot

#6 Post by TD-er » 22 Feb 2024, 17:54

Which exact build filename did you use?
I can't remember there being a lot of changes on ESP8266 regarding this.

Re: why did this esp reboot

#7 Post by erstec » 22 Feb 2024, 18:00

The release one, from the GitHub Releases zip, Build: ESP_Easy_mega_20231225_normal_ESP8266_4M1M Dec 25 2023
As well as today's build from the latest 'mega' branch, Build: ESP_Easy_mega_20240222_normal_ESP8266_4M1M Feb 22 2024
Both end with Last Action before Reboot: PLUGIN_READ: timer, id: 11 after a random time, anywhere from tens of seconds to an hour.
Tasks 11 and 12 are Environment - 1-Wire Temperature.

Maybe some deeper level logging via serial would show you more?

Re: why did this esp reboot

#8 Post by TD-er » 22 Feb 2024, 18:09

Do you have multiple DS18b20 on the same bus?
Do you have multiple set on the same task?
Do you have multiple 1-wire tasks with multiple sensors per task?

Can you share some of the stats as shown on the task config page?
Like this:

Code:

Address:	28-ff-2a-43-bb-22-02-f4 [DS18B20]
Resolution:	12
Parasite Powered:	false
Samples Read Success:	200076
Samples Read Init Failed:	0
Samples Read Retry:	1
Samples Read Failed:	2

This is taken from an ESP32 node running Build: ESP_Easy_mega_20231106_max_ESP32_16M8M_LittleFS Nov 6 2023
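
For anyone who wants to compare such stats between nodes or builds, a small Python sketch like the one below can turn a pasted stats block into numbers. The helper names `parse_stats` and `failure_ratio` are made up for this example, and it assumes the block is copied as "Label: value" lines exactly as shown on the task config page.

```python
def parse_stats(text):
    """Turn 'Label: value' lines into a dict; lines without a colon are skipped."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        stats[key.strip()] = value.strip()
    return stats


def failure_ratio(stats):
    """Fraction of reads that failed, out of all successful plus failed reads."""
    ok = int(stats["Samples Read Success"])
    failed = int(stats["Samples Read Failed"])
    total = ok + failed
    return failed / total if total else 0.0


# The numbers below are the ones from the ESP32 node in this post.
pasted = """Address:\t28-ff-2a-43-bb-22-02-f4 [DS18B20]
Resolution:\t12
Parasite Powered:\tfalse
Samples Read Success:\t200076
Samples Read Init Failed:\t0
Samples Read Retry:\t1
Samples Read Failed:\t2"""

stats = parse_stats(pasted)
print(stats["Address"])
print(f"failed reads: {failure_ratio(stats):.6%}")
```

A ratio near zero, as here, suggests the 1-Wire bus itself is healthy. A node that reboots after a handful of samples will not have collected enough reads for the ratio to mean much.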

Re: why did this esp reboot

#9 Post by TD-er » 22 Feb 2024, 18:19

And one other thing....
What have you set as "error value" ?
One of the few changes in those builds is how task values are being dealt with and P004_Dallas is one of the few that might have to deal with NaN values.

Re: why did this esp reboot

#10 Post by erstec » 22 Feb 2024, 18:29

Each device has two DS sensors of the same type on the same bus, each sensor in a separate task (two Environment - 1-Wire Temperature tasks per device).
Interval: 30 sec on both tasks.
In addition, there is one Light/Lux - BH1750 task and one Display - OLED SSD1306/SH1106 Framed task on the device.
Data is published to a Home Assistant (openHAB) MQTT controller.

First one:

Code:

Address:	28-10-86-27-05-00-00-d1 [DS18B20]
Resolution:	12
Parasite Powered:	false
Samples Read Success:	12
Samples Read Init Failed:	0
Samples Read Retry:	1
Samples Read Failed:	0

Second one:

Code:

Address:	28-03-95-c3-04-00-00-0d [DS18B20]
Resolution:	12
Parasite Powered:	false
Samples Read Success:	8
Samples Read Init Failed:	0
Samples Read Retry:	0
Samples Read Failed:	0

One thing I noticed: on the 20231130 firmware Samples Read Failed shows some nonzero value, while on the latest one it does not.

I can't gather more stats as it reboots...

Re: why did this esp reboot

#11 Post by erstec » 22 Feb 2024, 18:30

NaN, as it has been for years.
What should I select to not provide any data in case of an error? Ignore?

Re: why did this esp reboot

#12 Post by TD-er » 22 Feb 2024, 18:33

Maybe you can try some numerical value other than NaN.
If that doesn't make a difference please try "ignore".

Have to get dinner now, but will look into this a bit more later this evening.

Re: why did this esp reboot

#13 Post by TD-er » 22 Feb 2024, 18:35

It might be this NaN is causing some issues, however I have no clue yet why.
Can you see if it gets disconnected from the MQTT broker before it crashes?

Maybe also try to not send it to the broker, to see if this makes a difference.

Re: why did this esp reboot

#14 Post by erstec » 22 Feb 2024, 19:30

Tried Ignore, 125 and -127: no change.
It does not disconnect from the broker before the crash, it just drops.
Disabling the Controller (MQTT) makes it stable; it has been running for 45+ minutes already without restarts.

Re: why did this esp reboot

#15 Post by TD-er » 22 Feb 2024, 22:49

Could it be that some controller setting which is a bit more timing critical is now acting up?
After all, the timing critical reading of the 1Wire sensor does temporarily disable interrupts.

Re: why did this esp reboot

#16 Post by erstec » 23 Feb 2024, 07:23

Rechecked. Controller settings are mostly default, except IP address and LWT topic.

Re: why did this esp reboot

#17 Post by erstec » 23 Feb 2024, 07:41

One more thing I noticed (after the update to 20231215): the PWM frequency has changed. On one of my ESP8266 devices I use an explicit setting (in rules) like pwm,12,650,0,200 and the signal on GPIO12 is much "faster" than it was on 20231130. Of course it is a separate thing, just for info.

Re: why did this esp reboot

#18 Post by TD-er » 23 Feb 2024, 12:03

Yep, PWM with fade has changed and that's indeed not related to this.
As your unit seems to be working without the MQTT controller enabled, I think it probably doesn't have anything to do with the 1Wire implementation at all.

I guess we need to look a bit closer at the MQTT/controller settings?

I know it did work fine before with the same settings, but some small change in the code could trigger some crashes which were almost an issue in previous builds.

For example: What is the set timeout? Can you test with increasing it?
How much free RAM do you have right before it crashes?
What is the minimal send interval in the controller?

Re: why did this esp reboot

#19 Post by erstec » 23 Feb 2024, 12:20

To me it seems it does, as when I disable sending to the controller from both DS tasks, it stops rebooting. At the same time the BH1750 sends data every 15 seconds, and in addition I tried adding two dummy tasks with 10 sec intervals. It is stable with them too.

So, the tests. I changed the interval of both DS tasks to 10 sec, as that causes a reboot much faster than with 30 sec.
Client Timeout: was 100, increased to 1000 ms, result: rebooting
Free RAM: always between 7500-10000
Minimum Send Interval: was 100, increased to 1000: reboots

Additionally I tried changing Full Queue Action between Ignore New and Delete Oldest: same.
Tried using just the IP instead of DNS (hostname) for the MQTT broker: same.

Additional question: how can I revert PWM to work the same as before without visiting the remote location, and be sure the fan is running at the same speed?

Screenshots of settings attached
Attachments
Screenshot 2024-02-23 131746.png (37.75 KiB)
Screenshot 2024-02-23 131733.png (41.3 KiB)

Re: why did this esp reboot

#20 Post by TD-er » 23 Feb 2024, 12:37

The minimum send interval for MQTT controllers can be much lower, as the connection is kept open.
So 100 msec is fine, but 10 msec probably is too :)
Making it longer will only cause the controller to use more RAM in the queue.
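
To make the queue remark a bit more concrete, here is a simplified Python model (not the actual ESPEasy controller queue). The function name `peak_queue_depth` and the example timings are assumptions for illustration. It assumes at most one queued message is sent per minimum send interval and that every send succeeds right away.

```python
def peak_queue_depth(arrival_times_ms, min_send_interval_ms):
    """Return the largest number of messages waiting in the queue at any moment."""
    queue = 0
    peak = 0
    next_send_ms = 0
    for arrival in sorted(arrival_times_ms):
        # Send everything that was allowed to go out before this message arrived.
        while queue and next_send_ms <= arrival:
            queue -= 1
            next_send_ms += min_send_interval_ms
        if not queue and next_send_ms < arrival:
            next_send_ms = arrival   # queue was idle, so this message may go out at once
        queue += 1
        peak = max(peak, queue)
    return peak


# Ten messages arriving 50 ms apart (made-up numbers).
arrivals = list(range(0, 500, 50))
for interval in (10, 100, 1000):
    print(interval, peak_queue_depth(arrivals, interval))
```

With these made-up arrival times the peak depth grows as the interval gets longer, and every waiting message occupies RAM until it is sent. That is why a longer minimum send interval is not expected to help on a node that is already short on memory.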

On the latest GH Actions builds I was hoping the amount of free RAM would be higher.

The PWM should not have changed regarding the actual frequency, only that it might no longer be "blocking" when fading to a different PWM value.
So what was changed (and I have to check the code to see if it was also changed for ESP8266) is this:
Before, the command for changing PWM would not return until the fade was done.
This was blocking: nothing else could be done during the fade, as the CPU was 100% busy with just the fade.

Now the command is given and the fade reschedules itself to adjust the PWM to the next step in the fade process.
This makes the PWM command return immediately.
Thus if you were constantly giving several fade commands in a sequence, then this will for sure show a change in behavior.
But the old behavior would also have made the ESP react extremely slowly, with nothing else being done in the meantime, like taking measurements, processing messages to controllers, etc.
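
The difference between the two approaches can be sketched in Python like this (an illustration only, not the ESPEasy implementation). `Scheduler`, `start_fade` and `blocking_fade` are hypothetical names, and the step count and timings are made up.

```python
import heapq


def blocking_fade(start, target, steps):
    """Old style: compute every step in one call; nothing else runs meanwhile."""
    return [start + (target - start) * i // steps for i in range(1, steps + 1)]


class Scheduler:
    """Tiny time-ordered event queue (hypothetical, for illustration only)."""

    def __init__(self):
        self._events = []
        self._counter = 0   # tie-breaker so equal timestamps keep insertion order
        self.log = []

    def schedule(self, when_ms, callback):
        heapq.heappush(self._events, (when_ms, self._counter, callback))
        self._counter += 1

    def run(self):
        while self._events:
            when_ms, _, callback = heapq.heappop(self._events)
            callback(when_ms)


def start_fade(sched, start, target, steps, step_ms, now_ms=0):
    """New style: schedule the first step and return; each step schedules the next."""
    def step(i):
        def run(now):
            duty = start + (target - start) * i // steps
            sched.log.append((now, "pwm", duty))
            if i < steps:
                sched.schedule(now + step_ms, step(i + 1))
        return run
    sched.schedule(now_ms + step_ms, step(1))


sched = Scheduler()
start_fade(sched, start=0, target=650, steps=4, step_ms=50)
# Another job (e.g. a sensor read) gets its turn in the middle of the fade.
sched.schedule(120, lambda now: sched.log.append((now, "read", None)))
sched.run()
for entry in sched.log:
    print(entry)
```

In the log, the "read" entry lands between two fade steps, which is the point of the rescheduling approach. It also shows why several fade commands issued right after each other no longer wait for one another.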

So can you try to not perform a fade and see if this makes any difference?

Re: why did this esp reboot

#21 Post by erstec » 23 Feb 2024, 12:55

Tried a minimum send interval of 10: same...

Regarding PWM, thank you for the explanation. Now it is clear; I just tested on a bench and it is fixed.

Re: why did this esp reboot

#22 Post by TD-er » 23 Feb 2024, 13:04

Fixed as in no more reboots?

Re: why did this esp reboot

#23 Post by erstec » 23 Feb 2024, 13:05

No no :) only PWM is clear and works as before

Re: why did this esp reboot

#24 Post by TD-er » 23 Feb 2024, 13:10

Ah... (a bit "too bad" :) )
