Possible memory leak? #61

Open
mStirner opened this issue Oct 23, 2024 · 8 comments
Labels: bug (Something isn't working), investigate

Comments

@mStirner
Member

Local "production" system produces OOM, killing the connector:

[Wed Oct 23 20:50:22 2024] in:imjournal invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
[Wed Oct 23 20:50:22 2024] CPU: 1 PID: 839 Comm: in:imjournal Kdump: loaded Not tainted 5.14.0-427.37.1.el9_4.x86_64 #1
[Wed Oct 23 20:50:22 2024] Hardware name: Red Hat KVM, BIOS 1.16.0-4.module+el8.9.0+1408+7b966129 04/01/2014
[Wed Oct 23 20:50:22 2024] Call Trace:
[Wed Oct 23 20:50:22 2024]  <TASK>
[Wed Oct 23 20:50:22 2024]  dump_stack_lvl+0x34/0x48
[Wed Oct 23 20:50:22 2024]  dump_header+0x4a/0x201
[Wed Oct 23 20:50:22 2024]  oom_kill_process.cold+0xb/0x10
[Wed Oct 23 20:50:22 2024]  out_of_memory+0xed/0x2e0
[Wed Oct 23 20:50:22 2024]  __alloc_pages_slowpath.constprop.0+0x6e8/0x960
[Wed Oct 23 20:50:22 2024]  __alloc_pages+0x21d/0x250
[Wed Oct 23 20:50:22 2024]  folio_alloc+0x17/0x50
[Wed Oct 23 20:50:22 2024]  __filemap_get_folio+0x1cd/0x330
[Wed Oct 23 20:50:22 2024]  filemap_fault+0x404/0x700
[Wed Oct 23 20:50:22 2024]  __do_fault+0x37/0x1f0
[Wed Oct 23 20:50:22 2024]  do_read_fault+0xf4/0x1d0
[Wed Oct 23 20:50:22 2024]  do_pte_missing+0x1a9/0x400
[Wed Oct 23 20:50:22 2024]  __handle_mm_fault+0x32b/0x670
[Wed Oct 23 20:50:22 2024]  handle_mm_fault+0xcd/0x290
[Wed Oct 23 20:50:22 2024]  do_user_addr_fault+0x1b4/0x6a0
[Wed Oct 23 20:50:22 2024]  exc_page_fault+0x62/0x150
[Wed Oct 23 20:50:22 2024]  asm_exc_page_fault+0x22/0x30
[Wed Oct 23 20:50:22 2024] RIP: 0033:0x7f088d173850
[Wed Oct 23 20:50:22 2024] Code: Unable to access opcode bytes at RIP 0x7f088d173826.
[Wed Oct 23 20:50:22 2024] RSP: 002b:00007f088c4ae9b8 EFLAGS: 00010206
[Wed Oct 23 20:50:22 2024] RAX: 0000000000000005 RBX: 00007f0880000c10 RCX: 00007f088d389830
[Wed Oct 23 20:50:22 2024] RDX: 0000000000000004 RSI: 00007f088d20f21d RDI: 00007f088c996068
[Wed Oct 23 20:50:22 2024] RBP: 00007f088c996068 R08: 0000000000000000 R09: 00007f0880000c48
[Wed Oct 23 20:50:22 2024] R10: 00007f0880039ad0 R11: 0000000000000202 R12: 00007f088d20f21d
[Wed Oct 23 20:50:22 2024] R13: 0000000000000004 R14: 000000000000000a R15: 00007f088c4aeb30
[Wed Oct 23 20:50:22 2024]  </TASK>
[Wed Oct 23 20:50:22 2024] Mem-Info:
[Wed Oct 23 20:50:22 2024] active_anon:208048 inactive_anon:173121 isolated_anon:0
                            active_file:312 inactive_file:32 isolated_file:0
                            unevictable:0 dirty:0 writeback:3
                            slab_reclaimable:6335 slab_unreclaimable:11087
                            mapped:195 shmem:2860 pagetables:27659
                            sec_pagetables:0 bounce:0
                            kernel_misc_reclaimable:0
                            free:14025 free_pcp:110 free_cma:0
[Wed Oct 23 20:50:22 2024] Node 0 active_anon:498036kB inactive_anon:1026640kB active_file:64kB inactive_file:1296kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:904kB dirty:0kB writeback:12kB shmem:11440kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 206848kB writeback_tmp:0kB kernel_stack:11120kB pagetables:110636kB sec_pagetables:0kB all_unreclaimable? no
[Wed Oct 23 20:50:22 2024] Node 0 DMA free:7264kB boost:0kB min:388kB low:484kB high:580kB reserved_highatomic:0KB active_anon:1096kB inactive_anon:5888kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed Oct 23 20:50:22 2024] lowmem_reserve[]: 0 1720 1720 1720 1720
[Wed Oct 23 20:50:22 2024] Node 0 DMA32 free:48696kB boost:0kB min:44664kB low:55828kB high:66992kB reserved_highatomic:4096KB active_anon:709812kB inactive_anon:807880kB active_file:68kB inactive_file:1048kB unevictable:0kB writepending:260kB present:2080608kB managed:1802104kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed Oct 23 20:50:22 2024] lowmem_reserve[]: 0 0 0 0 0
[Wed Oct 23 20:50:22 2024] Node 0 DMA: 15*4kB (UM) 2*8kB (UM) 1*16kB (U) 3*32kB (UM) 3*64kB (M) 2*128kB (UM) 2*256kB (UM) 2*512kB (UM) 1*1024kB (U) 2*2048kB (ME) 0*4096kB = 7292kB
[Wed Oct 23 20:50:22 2024] Node 0 DMA32: 1294*4kB (UME) 203*8kB (UME) 63*16kB (UME) 117*32kB (UME) 65*64kB (UME) 41*128kB (UME) 23*256kB (UME) 13*512kB (UME) 10*1024kB (UME) 3*2048kB (M) 0*4096kB = 49888kB
[Wed Oct 23 20:50:22 2024] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Wed Oct 23 20:50:22 2024] 3468 total pagecache pages
[Wed Oct 23 20:50:22 2024] 160 pages in swap cache
[Wed Oct 23 20:50:22 2024] Free swap  = 232kB
[Wed Oct 23 20:50:22 2024] Total swap = 2097148kB
[Wed Oct 23 20:50:22 2024] 524150 pages RAM
[Wed Oct 23 20:50:22 2024] 0 pages HighMem/MovableOnly
[Wed Oct 23 20:50:22 2024] 69784 pages reserved
[Wed Oct 23 20:50:22 2024] 0 pages cma reserved
[Wed Oct 23 20:50:22 2024] 0 pages hwpoisoned
[Wed Oct 23 20:50:22 2024] Tasks state (memory values in pages):
[Wed Oct 23 20:50:22 2024] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Wed Oct 23 20:50:22 2024] [    596]     0   596     8980      173    90112      224          -250 systemd-journal
[Wed Oct 23 20:50:22 2024] [    609]     0   609     8214       64    81920      448         -1000 systemd-udevd
[Wed Oct 23 20:50:22 2024] [    679]     0   679     4539       36    61440      672         -1000 auditd
[Wed Oct 23 20:50:22 2024] [    701]    81   701     2699       96    65536      128          -900 dbus-broker-lau
[Wed Oct 23 20:50:22 2024] [    702]    81   702     1249       96    49152       64          -900 dbus-broker
[Wed Oct 23 20:50:22 2024] [    703]     0   703    63835      128   139264      672             0 NetworkManager
[Wed Oct 23 20:50:22 2024] [    706]     0   706    19799      112    61440      576             0 irqbalance
[Wed Oct 23 20:50:22 2024] [    707]     0   707     4792       96    73728      256             0 systemd-logind
[Wed Oct 23 20:50:22 2024] [    722]   997   722    21229      110    65536      128             0 chronyd
[Wed Oct 23 20:50:22 2024] [    741]     0   741 57404265   352114 109527040   460573             0 node
[Wed Oct 23 20:50:22 2024] [    751]     0   751     3965       96    69632      288         -1000 sshd
[Wed Oct 23 20:50:22 2024] [    754]     0   754     1521       64    53248      192             0 crond
[Wed Oct 23 20:50:22 2024] [    755]     0   755      761       64    40960       32             0 agetty
[Wed Oct 23 20:50:22 2024] [    759]   988   759     4126       64    69632      288             0 mosquitto
[Wed Oct 23 20:50:22 2024] [    785]     0   785    48249      232   163840     1920             0 rsyslogd
[Wed Oct 23 20:50:22 2024] [    786]     0   786   123677        0   126976      768             0 system-mqtt
[Wed Oct 23 20:50:22 2024] [    844]     0   844     2745       84    65536      288             0 nginx
[Wed Oct 23 20:50:22 2024] [    845]   989   845     3708       84    73728      448             0 nginx
[Wed Oct 23 20:50:22 2024] [    846]   989   846     3708      148    73728      416             0 nginx
[Wed Oct 23 20:50:22 2024] [    847]   989   847     3906       84    73728      672             0 nginx
[Wed Oct 23 20:50:22 2024] [    848]   989   848     3708      116    73728      448             0 nginx
[Wed Oct 23 20:50:22 2024] [   1217]   991  1217   755365    14039  1007616    37382             0 mongod
[Wed Oct 23 20:50:22 2024] [   7675]     0  7675   324195    11815  2265088    10864             0 node
[Wed Oct 23 20:50:22 2024] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/connector.service,task=node,pid=741,uid=0
[Wed Oct 23 20:50:22 2024] Out of memory: Killed process 741 (node) total-vm:229617060kB, anon-rss:1407944kB, file-rss:512kB, shmem-rss:0kB, UID:0 pgtables:106960kB oom_score_adj:0

  • test/setup cgroups memory limit (see the systemd sketch below)
  • investigate memory leak
    • valgrind
    • heaptrack
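A minimal sketch of the cgroups memory limit via a systemd drop-in (the values are assumptions; pick a cap that fits the host):

    # /etc/systemd/system/connector.service.d/memory.conf
    # MemoryHigh throttles/reclaims first, MemoryMax is the hard OOM cap
    [Service]
    MemoryHigh=448M
    MemoryMax=512M

Then systemctl daemon-reload && systemctl restart connector. With a cap in place, the leak kills only the connector at a known threshold instead of driving the whole host into the kernel OOM killer.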
mStirner added the bug and investigate labels on Oct 23, 2024
@mStirner
Member Author

[root@open-haus ~]# ps -T -p 28664
    PID    SPID TTY          TIME CMD
  28664   28664 ?        00:06:06 node
  28664   28668 ?        00:00:00 node
  28664   28674 ?        00:00:14 node
  28664   28675 ?        00:00:13 node
  28664   28676 ?        00:00:13 node
  28664   28677 ?        00:00:13 node
  28664   28678 ?        00:00:00 node
  28664   28682 ?        00:00:05 node
  28664   28683 ?        00:00:05 node
  28664   28684 ?        00:00:05 node
  28664   28685 ?        00:00:05 node
  28664   45607 ?        00:01:31 node
  28664   45608 ?        00:00:17 node
  28664   45609 ?        00:00:13 node
  28664   45610 ?        00:00:02 node
  28664   45611 ?        00:00:03 node
  28664   45612 ?        00:00:02 node
  28664   45613 ?        00:00:02 node
  28664   45614 ?        00:00:02 node
  28664   45615 ?        00:00:02 node
  28664   45616 ?        00:00:02 node
  28664   45617 ?        00:00:02 node
  28664   45618 ?        00:00:02 node
  28664   45619 ?        00:00:02 node
  28664   45620 ?        00:00:02 node
  28664   45621 ?        00:00:02 node
  28664   45622 ?        00:00:02 node
  28664   45623 ?        00:00:02 node
  28664   45624 ?        00:00:02 node
  28664   45627 ?        00:00:03 node
  28664   45691 ?        00:00:02 node
  28664   45873 ?        00:00:02 node
  28664   46298 ?        00:00:02 node
  28664   46435 ?        00:00:01 node
  28664   46438 ?        00:00:01 node
  28664   46440 ?        00:00:02 node
  28664   46682 ?        00:00:01 node
  28664   46968 ?        00:00:01 node
  28664   46973 ?        00:00:01 node
  28664   47063 ?        00:00:01 node
  28664   47066 ?        00:00:01 node
  28664   47106 ?        00:00:01 node
  28664   47110 ?        00:00:01 node
  28664   47112 ?        00:00:01 node
  28664   47307 ?        00:00:01 node
  28664   47361 ?        00:00:01 node
  28664   47364 ?        00:00:01 node
  28664   47513 ?        00:00:01 node
  28664   47592 ?        00:00:01 node
  28664   47595 ?        00:00:01 node
  28664   47632 ?        00:00:00 node
  28664   47634 ?        00:00:00 node
  28664   47637 ?        00:00:00 node
  28664   47717 ?        00:00:00 node
  28664   47817 ?        00:00:00 node
  28664   47819 ?        00:00:00 node
  28664   47821 ?        00:00:00 node
  28664   47824 ?        00:00:00 node
  28664   47826 ?        00:00:00 node
  28664   47830 ?        00:00:00 node
  28664   47833 ?        00:00:00 node
  28664   47921 ?        00:00:00 node
  28664   47923 ?        00:00:00 node
  28664   47925 ?        00:00:00 node
  28664   47927 ?        00:00:00 node
  28664   47931 ?        00:00:00 node
  28664   47957 ?        00:00:00 node
  28664   47962 ?        00:00:00 node
  28664   47964 ?        00:00:00 node
  28664   47966 ?        00:00:00 node
  28664   47968 ?        00:00:00 node
  28664   47975 ?        00:00:00 node
  28664   47978 ?        00:00:00 node
  28664   47980 ?        00:00:00 node
  28664   47982 ?        00:00:00 node
  28664   47985 ?        00:00:00 node
  28664   47988 ?        00:00:00 node
  28664   47992 ?        00:00:00 node
  28664   48135 ?        00:00:00 node
  28664   48137 ?        00:00:00 node
  28664   48541 ?        00:00:00 node
[root@open-haus ~]# ps -T -p 28664 | wc -l
81
[root@open-haus ~]# systemctl status connector
● connector.service - OpenHaus SmartHome/IoT connector
     Loaded: loaded (/usr/lib/systemd/system/connector.service; enabled; preset: disabled)
     Active: active (running) since Wed 2024-10-23 20:50:24 CEST; 2 days ago
       Docs: https://open-haus.io
   Main PID: 28664 (node)
      Tasks: 80 (limit: 11107)
     Memory: 754.6M
        CPU: 21min 41.849s
     CGroup: /system.slice/connector.service
             └─28664 /usr/bin/node index.js

Oct 25 23:18:28 open-haus.lan node[28664]:   args: [
Oct 25 23:18:28 open-haus.lan node[28664]:     {
Oct 25 23:18:28 open-haus.lan node[28664]:       _id: '65b2874b98c30b71f603a07a',
Oct 25 23:18:28 open-haus.lan node[28664]:       description: 'ESP8266/ESPHome IKEA VINDSTYRKA Particle sensor',
Oct 25 23:18:28 open-haus.lan node[28664]:       topic: 'air-sensor/sensor/particulate_matter_25m_concentration/state',
Oct 25 23:18:28 open-haus.lan node[28664]:       timestamps: [Object],
Oct 25 23:18:28 open-haus.lan node[28664]:       labels: []
Oct 25 23:18:28 open-haus.lan node[28664]:     }
Oct 25 23:18:28 open-haus.lan node[28664]:   ]
Oct 25 23:18:28 open-haus.lan node[28664]: }
  • 25.10.2024 - 23:18

@mStirner
Member Author

mStirner commented Oct 26, 2024

[root@open-haus ~]# systemctl status connector
● connector.service - OpenHaus SmartHome/IoT connector
     Loaded: loaded (/usr/lib/systemd/system/connector.service; enabled; preset: disabled)
     Active: active (running) since Sat 2024-10-26 12:46:59 CEST; 11h ago
       Docs: https://open-haus.io
   Main PID: 51700 (node)
      Tasks: 60 (limit: 11107)
     Memory: 418.3M
        CPU: 3min 28.653s
     CGroup: /system.slice/connector.service
             └─51700 /usr/bin/node index.js

Oct 26 22:42:11 open-haus.lan node[51700]:       'led-strip=ws2801',
Oct 26 22:42:11 open-haus.lan node[51700]:       'magic=8d89fe9e-1f86-480b-bd25-005b8cadaf19'
Oct 26 22:42:11 open-haus.lan node[51700]:     ],
Oct 26 22:42:11 open-haus.lan node[51700]:     timestamps: { created: 1726259422761, updated: null },
Oct 26 22:42:11 open-haus.lan node[51700]:     room: null,
Oct 26 22:42:11 open-haus.lan node[51700]:     enabled: true,
Oct 26 22:42:11 open-haus.lan node[51700]:     icon: null
Oct 26 22:42:11 open-haus.lan node[51700]:   }
Oct 26 22:42:11 open-haus.lan node[51700]: Worker spawend for url http://127.0.0.1:8080/api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
Oct 26 22:42:11 open-haus.lan node[51700]: [2024.10.26 - 22:42.11.565][info][system] [connected] tcp://192.168.2.178:8080
[root@open-haus ~]#  ps -T -p 51700 | wc -l
61
  • 27.10.2024 - 00:16

@mStirner
Member Author

mStirner commented Oct 27, 2024

Thoughts:

Maybe backend related?!
After restarting the backend & disabling some plugins, the number of node processes dropped to 25/28/33:

27.10.2024 - 01:58
Plugins disabled:
- onky/pioneer
- shelly integration
- smartmeter
- "worker thread demo" removed
- me-ac-mbs-1 integration


Before backend restart: 61 connector worker threads (ps -T -p 51700 | wc -l)
After restart: 25/28/33 connector worker threads

Perhaps the backend is not correctly signaling a broken/collapsed bridge.
But why is this not caught by the WebSocket ping/pong? (Maybe because that is not its purpose, and it only applies to "disconnected wire" type errors, where no ACK packet etc. is sent.)


A plain node REPL process alone accounts for 15 node threads:

marc@workstation:~$ ps -T aux | grep -i node | wc -l
2
marc@workstation:~$ ps -T aux | grep -i node | wc -l
17

2 before starting node, 17 after.
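For reference, a minimal sketch of why the thread count climbs per worker, assuming worker_threads is used (the inline script is hypothetical, not connector code):

    // worker-count-demo.js — each worker_threads Worker adds OS-level
    // threads that show up in `ps -T` and in the systemd "Tasks" count
    const { Worker } = require("worker_threads");

    for (let i = 0; i < 5; i++) {
        // keep each worker alive so its threads stay visible
        new Worker("setInterval(() => {}, 1000);", { eval: true });
    }

    console.log(`check with: ps -T -p ${process.pid} | wc -l`);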

@mStirner
Member Author

[root@open-haus ~]# systemctl status connector
● connector.service - OpenHaus SmartHome/IoT connector
     Loaded: loaded (/usr/lib/systemd/system/connector.service; enabled; preset: disabled)
     Active: active (running) since Sat 2024-10-26 12:46:59 CEST; 1 day 11h ago
       Docs: https://open-haus.io
   Main PID: 51700 (node)
      Tasks: 68 (limit: 11107)
     Memory: 486.4M
        CPU: 11min 12.142s
     CGroup: /system.slice/connector.service
             └─51700 /usr/bin/node index.js

Oct 27 22:17:01 open-haus.lan node[51700]:       'led-strip=ws2801',
Oct 27 22:17:01 open-haus.lan node[51700]:       'magic=8d89fe9e-1f86-480b-bd25-005b8cadaf19'
Oct 27 22:17:01 open-haus.lan node[51700]:     ],
Oct 27 22:17:01 open-haus.lan node[51700]:     timestamps: { created: 1726259422761, updated: null },
Oct 27 22:17:01 open-haus.lan node[51700]:     room: null,
Oct 27 22:17:01 open-haus.lan node[51700]:     enabled: true,
Oct 27 22:17:01 open-haus.lan node[51700]:     icon: null
Oct 27 22:17:01 open-haus.lan node[51700]:   }
Oct 27 22:17:01 open-haus.lan node[51700]: Worker spawend for url http://127.0.0.1:8080/api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
Oct 27 22:17:02 open-haus.lan node[51700]: [2024.10.27 - 22:17.02.039][info][system] [connected] tcp://192.168.2.178:8080
[root@open-haus ~]# ps -T -p 51700 | wc -l
69
  • 27.10.2024 - 23:30

@mStirner
Member Author

Added some debug messages in the backend to see if the ws is closed properly.
In router.api.devices.js:

    ["end", "close"].forEach((event) => {
        stream.once(event, () => {
            console.log(`>> stream fired "${event}" event in router.api.devices.js`);
            // 1000 = normal closure; the ws library rejects 0 as an invalid close code
            ws.close(1000, `event "${event}" fired on stream`);
        });
    });

In class.interface.js, .pipe(..., { end: true }) was added.
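For context, roughly what that looks like (a sketch; upstream/downstream are placeholder names for the two bridged streams, not the actual identifiers in class.interface.js):

    // with { end: true }, ending one side of the bridge also
    // ends the other, instead of leaving a half-open pipe behind
    upstream.pipe(downstream, { end: true });
    downstream.pipe(upstream, { end: true });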

@mStirner
Member Author

Oct 28 15:50:02 open-haus.lan node[61996]: [2024.10.28 - 15:50.02.263][debug][http] 192.168.2.195 - [POST] /api/endpoints/66fe7afdcb05fb46b4c69502/states/66fe7afdcb05fb46b4c69503
Oct 28 15:51:02 open-haus.lan node[61996]: [2024.10.28 - 15:51.02.438][debug][http] 192.168.2.195 - [POST] /api/endpoints/66fe7afdcb05fb46b4c69502/states/66fe7afdcb05fb46b4c69503
Oct 28 15:52:02 open-haus.lan node[61996]: [2024.10.28 - 15:52.02.464][debug][http] 192.168.2.195 - [POST] /api/endpoints/66fe7afdcb05fb46b4c69502/states/66fe7afdcb05fb46b4c69503
Oct 28 15:53:02 open-haus.lan node[61996]: [2024.10.28 - 15:53.02.495][debug][http] 192.168.2.195 - [POST] /api/endpoints/66fe7afdcb05fb46b4c69502/states/66fe7afdcb05fb46b4c69503
Oct 28 15:54:02 open-haus.lan node[61996]: [2024.10.28 - 15:54.02.521][debug][http] 192.168.2.195 - [POST] /api/endpoints/66fe7afdcb05fb46b4c69502/states/66fe7afdcb05fb46b4c69503
Oct 28 15:55:02 open-haus.lan node[61996]: [2024.10.28 - 15:55.02.533][debug][http] 192.168.2.195 - [POST] /api/endpoints/66fe7afdcb05fb46b4c69502/states/66fe7afdcb05fb46b4c69503
Oct 28 15:55:56 open-haus.lan node[61996]: [2024.10.28 - 15:55.56.242][warn][plugins/32ae791a-a073-4c2c-a2e9-1a9ca0d572ed] WebSocket is dead, reconnect ws://192.168.2.178:8080/
Oct 28 15:55:56 open-haus.lan node[61996]: [2024.10.28 - 15:55.56.243][debug][devices] Bridge closed: iface 66e4a0de72e58e7085f2a2dc <-> tcp://192.168.2.178:8080 (b6c74864-9c1b-4cf3-88cf-f3110356e5a5)
Oct 28 15:55:56 open-haus.lan node[61996]: [2024.10.28 - 15:55.56.244][error][plugins/32ae791a-a073-4c2c-a2e9-1a9ca0d572ed] WebSocket closed from "ws://192.168.2.178:8080/"
Oct 28 15:56:01 open-haus.lan node[61996]: [2024.10.28 - 15:56.01.346][debug][plugins/32ae791a-a073-4c2c-a2e9-1a9ca0d572ed] Try to connecto to WebSocket "ws://192.168.2.178:8080/"
Oct 28 15:56:01 open-haus.lan node[61996]: [2024.10.28 - 15:56.01.432][debug][http] 127.0.0.1 - [GET] /api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
Oct 28 15:56:01 open-haus.lan node[61996]: [2024.10.28 - 15:56.01.436][debug][devices] Bridge created for interface "66e4a0de72e58e7085f2a2dc" (76a6bf95-f503-4d1b-91e2-461da9ffc420)
Oct 28 15:56:02 open-haus.lan node[61996]: [2024.10.28 - 15:56.02.604][debug][http] 192.168.2.195 - [POST] /api/endpoints/66fe7afdcb05fb46b4c69502/states/66fe7afdcb05fb46b4c69503
Oct 28 15:56:31 open-haus.lan node[61996]: [2024.10.28 - 15:56.31.346][debug][plugins/32ae791a-a073-4c2c-a2e9-1a9ca0d572ed] Could not create WebSocket connection in 30s to 192.168.2.178:8080; Redo...
Oct 28 15:56:36 open-haus.lan node[61996]: [2024.10.28 - 15:56.36.448][debug][plugins/32ae791a-a073-4c2c-a2e9-1a9ca0d572ed] Try to connecto to WebSocket "ws://192.168.2.178:8080/"
Oct 28 15:56:36 open-haus.lan node[61996]: [2024.10.28 - 15:56.36.535][debug][http] 127.0.0.1 - [GET] /api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
Oct 28 15:56:36 open-haus.lan node[61996]: [2024.10.28 - 15:56.36.540][debug][devices] Bridge created for interface "66e4a0de72e58e7085f2a2dc" (a9f009be-16c4-4c71-959a-555e9830e910)
Oct 28 15:56:36 open-haus.lan node[61996]: [2024.10.28 - 15:56.36.676][info][plugins/32ae791a-a073-4c2c-a2e9-1a9ca0d572ed] WebSocket connected to: ws://192.168.2.178:8080/

"Websocket is dead": But no debug message ">> stream fired "end" event in router.api.devices.js" is printed.
Is the websocket between backend & connector keept alive and just a new bridged worker spawenen in the connector?
Before this message, there where 30 tasks shown systemctl status connector, now 31...

.178 is the lowboard custom LED Strip.
Due to a cheap PSU the Pi is undervoltaged and random restarts.

@mStirner
Member Author

Oct 28 16:10:07 open-haus.lan node[51700]: Worker spawend for url http://127.0.0.1:8080/api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
Oct 28 16:10:07 open-haus.lan node[51700]: [2024.10.28 - 16:10.07.746][info][system] [connected] tcp://192.168.2.178:8080
...
Oct 28 16:13:18 open-haus.lan node[51700]: Message from backend for bidrigin interface {"iface":"66e4a0de72e58e7085f2a2dc","type":"request","uuid":"3ddf6291-64f8-4139-b705-824dbf31048f","socket":true}
Oct 28 16:13:18 open-haus.lan node[51700]: bridge request
...
Oct 28 16:13:18 open-haus.lan node[51700]: Worker spawend for url http://127.0.0.1:8080/api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
...
Oct 28 16:15:38 open-haus.lan node[51700]: Message from backend for bidrigin interface {"iface":"66e4a0de72e58e7085f2a2dc","type":"request","uuid":"502a72d9-4102-4fa2-bdf9-c10a0e538c5f","socket":true}
Oct 28 16:15:38 open-haus.lan node[51700]: bridge request
...
Oct 28 16:15:38 open-haus.lan node[51700]: Worker spawend for url http://127.0.0.1:8080/api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
Oct 28 16:15:38 open-haus.lan node[51700]: [2024.10.28 - 16:15.38.544][info][system] [connected] tcp://192.168.2.178:8080
...
Oct 28 16:18:59 open-haus.lan node[51700]: Message from backend for bidrigin interface {"iface":"66e4a0de72e58e7085f2a2dc","type":"request","uuid":"e3c3b57c-c022-421c-876f-b3b538acc1ae","socket":true}
Oct 28 16:18:59 open-haus.lan node[51700]: bridge request
...
Oct 28 16:18:59 open-haus.lan node[51700]: Worker spawend for url http://127.0.0.1:8080/api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
Oct 28 16:18:59 open-haus.lan node[51700]: [2024.10.28 - 16:18.59.225][info][system] [connected] tcp://192.168.2.178:8080

Turning the Raspberry Pi on/off leads to a broken WebSocket connection.
So far, so clear. But it seems the underlying bridging WebSocket is not closed properly, which leads to spawning a new worker over and over again each time the plugin retries establishing a ws connection to the Pi.

Is this an issue in the plugin, the backend, or the connector?
Or a combination of all of them?

The backend/connector should handle that edge case.
When a bridge is no longer needed, it should be torn down (see the sketch below).
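A minimal teardown sketch, assuming worker bookkeeping in the connector (the map, spawnBridgeWorker, and ./bridge-worker.js are hypothetical names, not the actual connector API): keep at most one bridge worker per interface id and terminate the stale one before spawning a replacement.

    const { Worker } = require("worker_threads");
    const workers = new Map(); // ifaceId -> Worker

    function spawnBridgeWorker(ifaceId, url) {
        // tear down the stale bridge before creating a new one
        workers.get(ifaceId)?.terminate();

        const worker = new Worker("./bridge-worker.js", { workerData: { url } });
        worker.once("exit", () => {
            // forget the worker once it has really exited
            if (workers.get(ifaceId) === worker) workers.delete(ifaceId);
        });
        workers.set(ifaceId, worker);
    }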

@mStirner
Member Author

This could be a wider problem than thought.
When a connection attempt is made to a non-existent TCP socket/server, the same error is raised.

I think the backend should handle this and terminate the WebSocket connection between connector & backend when no data is transmitted for a certain time (see the heartbeat sketch below).
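A minimal heartbeat sketch on the backend side using the ws package (the port and the 30s interval are assumptions): ping every client periodically and terminate any connection that did not answer the previous ping.

    const { WebSocketServer } = require("ws");
    const wss = new WebSocketServer({ port: 8080 }); // port is an assumption

    wss.on("connection", (ws) => {
        ws.isAlive = true;
        ws.on("pong", () => { ws.isAlive = true; });
    });

    setInterval(() => {
        wss.clients.forEach((ws) => {
            // no pong since the last round: peer is gone, kill the socket
            if (ws.isAlive === false) return ws.terminate();
            ws.isAlive = false;
            ws.ping();
        });
    }, 30000);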

mStirner added a commit to mStirner/connector that referenced this issue Nov 13, 2024