Willow Application Server #211

stintel · 2023-06-29T14:37:39Z

No description provided.

And have it disable the power amplifier for events PAUSED, STOPPED, FINISHED and ERROR.

Fixes: dc3f443 ("main: add cancel button")

We're modifying the recording variable in one task, and we read it in another. Make the variable volatile, to be sure we don't read an optimized or cached value.

Don't send STOP on WAKE when we're already recording, just avoid sending START again, and let the STOP on WAKE END trigger everything. Don't stop streaming to WIS when reading from the recorder returns 0. This should avoid early cut-off when something can't keep up for some reason.

During a torture run with WIS over HTTPS we've seen the following crash: ***ERROR*** A stack overflow in task http_stream_writer has been detected. Increase the task stack size to avoid this.

During a torture run with heap poisoning enabled, the following crash was observed: Guru Meditation Error: Core 1 panic'ed (StoreProhibited). Exception was unhandled. Core 1 register dump: PC : 0x4038abca PS : 0x00060033 A0 : 0x8038b847 A1 : 0x3fcc9150 A2 : 0x3d800014 A3 : 0x0000000c A4 : 0x3d800038 A5 : 0x3d800274 A6 : 0x00060023 A7 : 0x00000000 A8 : 0x3d800014 A9 : 0x00000394 A10 : 0x3d9a28f4 A11 : 0x00000038 A12 : 0xabba1234 A13 : 0x00000003 A14 : 0x0000001e A15 : 0x3d80007c SAR : 0x0000001f EXCCAUSE: 0x0000001d EXCVADDR: 0xabba1240 LBEG : 0x400570e8 LEND : 0x400570f3 LCOUNT : 0x00000000 /opt/esp/idf/components/heap/heap_tlsf.c:206 /opt/esp/idf/components/heap/multi_heap.c:187 /opt/esp/idf/components/heap/multi_heap_poisoning.c:234 /opt/esp/idf/components/heap/multi_heap_poisoning.c:229 /opt/esp/idf/components/heap/heap_caps.c:175 /opt/esp/idf/components/heap/heap_caps.c:239 /opt/esp/idf/components/newlib/heap.c:24 /opt/esp/idf/components/json/cJSON/cJSON.c:243 /opt/esp/idf/components/json/cJSON/cJSON.c:1633 /opt/esp/idf/components/json/cJSON/cJSON.c:1674 /opt/esp/idf/components/json/cJSON/cJSON.c:1674 /opt/esp/idf/components/json/cJSON/cJSON.c:1116 /opt/esp/idf/components/json/cJSON/cJSON.c:1087 /opt/esp/idf/components/json/cJSON/cJSON.c:1173 /willow/main/endpoint/hass.c:58 /opt/esp/idf/components/esp_event/esp_event.c:145 /opt/esp/idf/components/esp_websocket_client/esp_websocket_client.c:155 /opt/esp/idf/components/esp_websocket_client/esp_websocket_client.c:555 Line 59 in hass.c is cJSON_Parse(resp). Let's add a few extra checks to make sure we do not call cJSON_Parse on garbage data.

We're regularly observing LVGL using 100% CPU with scrolling text. This seems to cause all kinds of random behaviour. Disable it for now.

We've observed the following crash: Guru Meditation Error: Core 0 panic'ed (Double exception). Core 0 register dump: PC : 0x4038424a PS : 0x00040936 A0 : 0x820e7897 A1 : 0x3d9a74b0 A2 : 0x00000000 A3 : 0x3d9a74f0 A4 : 0x00000000 A5 : 0x00000000 A6 : 0x00000000 A7 : 0x00000000 A8 : 0x820ed3f3 A9 : 0x3d9a7490 A10 : 0x00000003 A11 : 0x3d99ecd4 A12 : 0x00000000 A13 : 0x00000000 A14 : 0x3d9a74f0 A15 : 0x00000000 SAR : 0x00000019 EXCCAUSE: 0x00000002 EXCVADDR: 0xfffffff0 LBEG : 0x42148bf1 LEND : 0x42148bfc LCOUNT : 0x00000000 Backtrace: 0x40384247:0x3d9a74b0 0x420e7894:0x00000000 |<-CORRUPTED /opt/esp/tools/xtensa-esp32s3-elf/esp-2021r2-patch5-8.4.0/xtensa-esp32s3-elf/bin/xtensa-esp32s3-elf-addr2line -e build/willow.elf 0x40384247:0x3d9a74b0 0x420e7894:0x00000000 /opt/esp/idf/components/freertos/port/xtensa/xtensa_context.S:194 /opt/esp/idf/components/mbedtls/mbedtls/library/md.c:601 On this line, mbedtls_sha512_finish_ret is called. Let's disable SHA hardware acceleration, as we're suspecting this causes issues.

We've experienced the following crash during boot after disabling SHA hardware acceleration: ***ERROR*** A stack overflow in task websocket_task has been detected. Increase the task stack size to avoid this.

Found with clang-tidy: /willow/main/endpoint/openhab.c:70:9: warning: Null pointer passed as 1st argument to string length function [clang-analyzer-unix.cstring.NullArg]

Found with clang-tidy.

Found with clang-tidy: /willow/main/was.c:104:21: warning: 1st function call argument is an uninitialized value [clang-analyzer-core.CallAndMessage]

And define it to the same value used in sdkconfig.willow if it doesn't. Fixes the following error running clang-check: /willow/main/audio.c:655:42: error: use of undeclared identifier 'CONFIG_TASK_WDT_PANIC' [clang-diagnostic-error]

Fixes the following clang-tidy warnings: /willow/main/endpoint/hass.c:278:5: warning: Value stored to 'ret' is never read [clang-analyzer-deadcode.DeadStores] /willow/main/endpoint/hass.c:278:5: note: Value stored to 'ret' is never read /willow/main/endpoint/hass.c:319:5: warning: Value stored to 'ret' is never read [clang-analyzer-deadcode.DeadStores] /willow/main/endpoint/hass.c:319:5: note: Value stored to 'ret' is never read We print an error message in hass_set_http_auth so we don't need to print an error message in the callers. We might consider returning early in the caller when hass_set_http_auth returns false, or changing the function to void if we keep the current behaviour. Not sure which one would be best, so let's just fix the warnings for now.

The audio_board_lcd_init returns a void pointer so we need to cast it to esp_lcd_panel_handle_t.

Saying wake word when VAD_END/WAKEUP_END hasn't been trigger yet causes multiple session timers to start. Let's do like we do for VAD_START and do nothing when we're already recording.

Terminating the audio pipeline will destroy the tasks for the elements in the pipeline, so the tasks have to be recreated every time we stream to WIS. This isn't very efficient, and it doesn't appear to be needed. We also suspect this could leak memory, and sometimes causes this error: E (16:45:01.400) AUDIO_PIPELINE: audio_pipeline_resume failed After this error, the device is no longer able to stream to WIS.

The ESP_AUDIO_PREFER_MEM mode seems to destroy and recreate its element tasks on every run. This is not very efficient, and as we have plenty of free memory, we can perfectly use the ESP_AUDIO_PREFER_SPEED mode instead. TTS log of 2nd wake before this change: I (02:13:12.437) WILLOW/AUDIO: Using WIS TTS URL 'https://wis.local:19000/api/tts?format=WAV&speaker=CLB&text=0' I (02:13:12.450) AUDIO_PIPELINE: audio_pipeline_unlinked W (02:13:12.454) AUDIO_PIPELINE: There are no listener registered I (02:13:12.461) AUDIO_PIPELINE: link el->rb, el:0x3d9f7c78, tag:IN_http, rb:0x3db20950 I (02:13:12.470) AUDIO_PIPELINE: link el->rb, el:0x3d9f8484, tag:DEC_auto, rb:0x3db20988 I (02:13:12.478) AUDIO_PIPELINE: link el->rb, el:0x3db1968c, tag:Audio_forge, rb:0x3db132cc I (02:13:12.488) AUDIO_ELEMENT: [IN_http-0x3d9f7c78] Element task created I (02:13:12.495) AUDIO_ELEMENT: [IN_http] AEL_MSG_CMD_RESUME,state:1 I (02:13:12.503) AUDIO_ELEMENT: [DEC_auto-0x3d9f8484] Element task created I (02:13:12.509) AUDIO_ELEMENT: [DEC_auto] AEL_MSG_CMD_RESUME,state:1 I (02:13:13.285) AUDIO_ELEMENT: [Audio_forge-0x3db1968c] Element task created I (02:13:13.285) AUDIO_ELEMENT: [Audio_forge] AEL_MSG_CMD_RESUME,state:1 I (02:13:13.289) AUDIO_ELEMENT: [OUT_iis-0x3da17d48] Element task created I (02:13:13.297) AUDIO_ELEMENT: [OUT_iis] AEL_MSG_CMD_RESUME,state:1 I (02:13:13.308) AUDIO_ELEMENT: IN-[IN_http] AEL_IO_DONE,0 I (02:13:13.310) AUDIO_ELEMENT: IN-[DEC_auto] AEL_IO_DONE,-2 I (02:13:13.785) AUDIO_ELEMENT: IN-[Audio_forge] AEL_IO_DONE,-2 I (02:13:13.933) AUDIO_ELEMENT: IN-[OUT_iis] AEL_IO_DONE,-2 I (02:13:14.340) AUDIO_ELEMENT: [http_stream_writer] AEL_MSG_CMD_PAUSE TTS log of 2nd wake after this change: I (02:18:19.707) WILLOW/AUDIO: Using WIS TTS URL 'https://wis.local:19000/api/tts?format=WAV&speaker=CLB&text=0' I (02:18:19.718) AUDIO_ELEMENT: [IN_http] AEL_MSG_CMD_RESUME,state:1 I (02:18:19.725) AUDIO_ELEMENT: [DEC_auto] AEL_MSG_CMD_RESUME,state:1 I (02:18:20.513) AUDIO_ELEMENT: [Audio_forge] AEL_MSG_CMD_RESUME,state:1 I (02:18:20.514) AUDIO_ELEMENT: [OUT_iis] AEL_MSG_CMD_RESUME,state:1 I (02:18:20.525) AUDIO_ELEMENT: IN-[IN_http] AEL_IO_DONE,0 I (02:18:20.526) AUDIO_ELEMENT: IN-[DEC_auto] AEL_IO_DONE,-2 I (02:18:20.996) AUDIO_ELEMENT: IN-[Audio_forge] AEL_IO_DONE,-2 I (02:18:21.146) AUDIO_ELEMENT: IN-[OUT_iis] AEL_IO_DONE,-2 I (02:18:21.616) AUDIO_ELEMENT: [http_stream_writer] AEL_MSG_CMD_PAUSE This also appears to work around the following observed behavior with task heap debugging using esp_dump_per_task_heap_info: Task: main -> CAP_8BIT: 1392004 CAP_32BIT: 0 Task: esp_periph -> CAP_8BIT: 25296 CAP_32BIT: 0 Task: wifi -> CAP_8BIT: 105068 CAP_32BIT: 0 Task: Pre-Scheduler allocs -> CAP_8BIT: 4068 CAP_32BIT: 0 Task: ipc0 -> CAP_8BIT: 30028 CAP_32BIT: 0 Task: tiT -> CAP_8BIT: 2908 CAP_32BIT: 0 Task: sys_evt -> CAP_8BIT: 100 CAP_32BIT: 0 Task: websocket_task -> CAP_8BIT: 724 CAP_32BIT: 0 Task: websocket_task -> CAP_8BIT: 25708 CAP_32BIT: 0 Task: i2s_stream_read -> CAP_8BIT: 2048 CAP_32BIT: 0 Task: at_read -> CAP_8BIT: 10584 CAP_32BIT: 0 Task: feed_task -> CAP_8BIT: 8192 CAP_32BIT: 0 Task: ipc1 -> CAP_8BIT: 24 CAP_32BIT: 0 Task: btm_rrm_t -> CAP_8BIT: 100 CAP_32BIT: 0 Task: esp_timer -> CAP_8BIT: 208 CAP_32BIT: 0 Task: media_task -> CAP_8BIT: 41456 CAP_32BIT: 0 Task: recorder_task -> CAP_8BIT: 24 CAP_32BIT: 0 Task: http_stream_wri -> CAP_8BIT: 4904 CAP_32BIT: 0 Task: �� Tasks like the latter would appear randomly, sometimes with a short sequence of binary data for task name, but regularly with multiple lines (100+) full of it, eventually slowing things down so badly that the TWDT triggers a restart.

We're still able to lock up the display. When this happens, the audio being sent to WIS is often incomplete. Enable SPIRAM_RODATA and SPIRAM_FETCH_INSTRUCTIONS to hopefully finally fix this.

We only stop the audio pipeline to stream to WIS when the session timer expires. VAD_END and WAKE_END will potentially trigger afterwards. If this happens when a new stream to WIS was already started, all kinds of weirdness can occur. Force stop the recorder when the session timer expires to avoid this.

We were still able to lock up the display, with both our custom font and montserat. Enable LCD_RGB_ISR_IRAM_SAFE to test if this helps.

During several torture runs, the following error appeared sporadically: E (23:45:27.704) esp-tls-mbedtls: read error :-0x7180: E (23:45:27.705) TRANSPORT_BASE: esp_tls_conn_read error, errno=Success Disabling AES hardware acceleration seems to fix those.

stintel force-pushed the feature/was branch 13 times, most recently from 410a269 to 0fdc19c Compare June 30, 2023 22:49

stintel force-pushed the feature/was branch 5 times, most recently from f511bcf to 2b6a9da Compare July 14, 2023 13:08

stintel force-pushed the feature/was branch 8 times, most recently from fc05edf to ff7ec43 Compare July 19, 2023 10:26

kristiankielhofner force-pushed the feature/was branch from 527150e to c7ea431 Compare July 21, 2023 06:33

stintel force-pushed the feature/was branch 3 times, most recently from d7101f2 to aa4ee65 Compare July 23, 2023 19:24

stintel and others added 27 commits August 17, 2023 00:37

audio: add ESP Audio event callback

92ff8bc

And have it disable the power amplifier for events PAUSED, STOPPED, FINISHED and ERROR.

audio: use ESP Audio's async play

ffc10e1

Update torture, logging, and add reset

67be27e

treewide: get LVGL lock timeout from config

eb7c489

treewide: free cJSON_Print memory

9d33856

audio: get LVGL lock before hiding cancel button

279b8d0

Fixes: dc3f443 ("main: add cancel button")

audio: make recording variable volatile

0c0531f

We're modifying the recording variable in one task, and we read it in another. Make the variable volatile, to be sure we don't read an optimized or cached value.

audio: increase HTTP stream task stack size

c1cfea6

During a torture run with WIS over HTTPS we've seen the following crash: ***ERROR*** A stack overflow in task http_stream_writer has been detected. Increase the task stack size to avoid this.

treewide: disable text LVGL text scrolling

8249106

We're regularly observing LVGL using 100% CPU with scrolling text. This seems to cause all kinds of random behaviour. Disable it for now.

hass: increase WebSocket client task stack size

31f2b3c

We've experienced the following crash during boot after disabling SHA hardware acceleration: ***ERROR*** A stack overflow in task websocket_task has been detected. Increase the task stack size to avoid this.

openhab: fix potential NULL pointer dereference

61baad0

Found with clang-tidy: /willow/main/endpoint/openhab.c:70:9: warning: Null pointer passed as 1st argument to string length function [clang-analyzer-unix.cstring.NullArg]

openhab: fix memory leak

d0319b7

Found with clang-tidy.

was: prevent nvs_commit call with uninitialized value

fc143c9

Found with clang-tidy: /willow/main/was.c:104:21: warning: 1st function call argument is an uninitialized value [clang-analyzer-core.CallAndMessage]

audio: check if CONFIG_TASK_WDT_PANIC is defined

4db0cb2

And define it to the same value used in sdkconfig.willow if it doesn't. Fixes the following error running clang-check: /willow/main/audio.c:655:42: error: use of undeclared identifier 'CONFIG_TASK_WDT_PANIC' [clang-diagnostic-error]

display: cast audio_board_lcd_init return value

9f3f86e

The audio_board_lcd_init returns a void pointer so we need to cast it to esp_lcd_panel_handle_t.

audio: check recording for WAKEUP_START event

c8ff9f3

Saying wake word when VAD_END/WAKEUP_END hasn't been trigger yet causes multiple session timers to start. Let's do like we do for VAD_START and do nothing when we're already recording.

audio: properly deinit audio

260ae64

sdkconfig: enable SPIRAM_RODATA and SPIRAM_FETCH_INSTRUCTIONS

4f1765c

We're still able to lock up the display. When this happens, the audio being sent to WIS is often incomplete. Enable SPIRAM_RODATA and SPIRAM_FETCH_INSTRUCTIONS to hopefully finally fix this.

sdkconfig: enable LCD_RGB_ISR_IRAM_SAFE

95e976b

We were still able to lock up the display, with both our custom font and montserat. Enable LCD_RGB_ISR_IRAM_SAFE to test if this helps.

stintel force-pushed the feature/was branch from ed46442 to b15610b Compare August 16, 2023 21:38

stintel merged commit 3f8cb2d into main Aug 17, 2023
11 checks passed

stintel deleted the feature/was branch August 17, 2023 05:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Willow Application Server #211

Willow Application Server #211

stintel commented Jun 29, 2023

Willow Application Server #211

Willow Application Server #211

Conversation

stintel commented Jun 29, 2023