harbor-db Always in recovery mode #1804

Tebby0753 · 2024-08-05T10:19:23Z

Hello,
I deployed Harbor with Helm and it has been running for more than a year, but recently harbor-db frequently logs "the database system is in recovery mode". Resource monitoring looks normal. Here is the log output:

2024-08-05 10:07:43.816 UTC [2480] FATAL: the database system is in recovery mode
2024-08-05 10:07:53.812 UTC [2490] FATAL: the database system is in recovery mode
2024-08-05 10:08:02.040 UTC [1635] LOG: database system was not properly shut down; automatic recovery in progress
2024-08-05 10:08:03.465 UTC [1635] LOG: invalid record length at 1/41B81728: wanted 24, got 0
2024-08-05 10:08:03.465 UTC [1635] LOG: redo is not required
2024-08-05 10:08:03.825 UTC [2501] FATAL: the database system is in recovery mode
2024-08-05 10:08:07.401 UTC [1] LOG: database system is ready to accept connections
2024-08-05 10:08:17.371 UTC [1] LOG: server process (PID 2514) exited with exit code 141
2024-08-05 10:08:17.371 UTC [1] LOG: terminating any other active server processes
2024-08-05 10:08:17.371 UTC [2517] WARNING: terminating connection because of crash of another server process
2024-08-05 10:08:17.371 UTC [2517] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-08-05 10:08:17.371 UTC [2517] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2024-08-05 10:08:17.372 UTC [2505] WARNING: terminating connection because of crash of another server process
2024-08-05 10:08:17.372 UTC [2505] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-08-05 10:08:17.372 UTC [2505] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2024-08-05 10:08:18.251 UTC [1] LOG: all server processes terminated; reinitializing
2024-08-05 10:08:18.548 UTC [2518] LOG: database system was interrupted; last known up at 2024-08-05 10:08:07 UTC
2024-08-05 10:08:23.812 UTC [2528] FATAL: the database system is in recovery mode
2024-08-05 10:08:33.811 UTC [2538] FATAL: the database system is in recovery mode
2024-08-05 10:08:43.813 UTC [2548] FATAL: the database system is in recovery mode
2024-08-05 10:08:44.191 UTC [2558] FATAL: the database system is in recovery mode
2024-08-05 10:08:53.810 UTC [2568] FATAL: the database system is in recovery mode
2024-08-05 10:09:03.813 UTC [2578] FATAL: the database system is in recovery mode
2024-08-05 10:09:13.816 UTC [2587] FATAL: the database system is in recovery mode
2024-08-05 10:09:23.817 UTC [2597] FATAL: the database system is in recovery mode
2024-08-05 10:09:33.813 UTC [2607] FATAL: the database system is in recovery mode
2024-08-05 10:09:43.810 UTC [2617] FATAL: the database system is in recovery mode
2024-08-05 10:09:50.197 UTC [2627] FATAL: the database system is in recovery mode
2024-08-05 10:09:53.812 UTC [2637] FATAL: the database system is in recovery mode
2024-08-05 10:10:03.816 UTC [2647] FATAL: the database system is in recovery mode
2024-08-05 10:10:13.809 UTC [2656] FATAL: the database system is in recovery mode
2024-08-05 10:10:23.816 UTC [2666] FATAL: the database system is in recovery mode

The log keeps looping like this.
I would like to know what is happening. Thanks!

Tebby0753 · 2024-08-05T10:23:47Z

harbor-db resource monitoring:

zyyw · 2024-08-07T08:44:50Z

According to these resources:

this error:

2024-08-05 10:07:53.812 UTC [2490] FATAL: the database system is in recovery mode
2024-08-05 10:08:02.040 UTC [1635] LOG: database system was not properly shut down; automatic recovery in progress

2024-08-05 10:08:17.371 UTC [2517] WARNING: terminating connection because of crash of another server process
2024-08-05 10:08:17.371 UTC [2517] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-08-05 10:08:17.371 UTC [2517] HINT: In a moment you should be able to reconnect to the database and repeat your command.

may be related to disk space utilization and memory allocation in your environment.

abinet · 2024-09-10T13:04:10Z

We are experiencing the same problem now. Harbor version 2.10, internal PostgreSQL database. The following pattern shows up multiple times per day:

2024-09-09 16:59:39.372 UTC [1] LOG:  server process (PID 1129079) exited with exit code 141
2024-09-09 16:59:39.372 UTC [1] LOG:  terminating any other active server processes
2024-09-09 16:59:39.410 UTC [1] LOG:  all server processes terminated; reinitializing
2024-09-09 16:59:39.473 UTC [1129087] LOG:  database system was interrupted; last known up at 2024-09-09 16:02:59 UTC
2024-09-09 16:59:41.136 UTC [1129087] LOG:  database system was not properly shut down; automatic recovery in progress
2024-09-09 16:59:41.159 UTC [1129087] LOG:  redo starts at 1/36EC9F70
2024-09-09 16:59:41.160 UTC [1129087] LOG:  invalid record length at 1/36ECA070: wanted 24, got 0
2024-09-09 16:59:41.160 UTC [1129087] LOG:  redo done at 1/36ECA038 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-09-09 16:59:41.225 UTC [1] LOG:  database system is ready to accept connections
2024-09-09 16:59:47.355 UTC [1129115] LOG:  PID 1128695 in cancel request did not match any process
2024-09-09 16:59:49.162 UTC [1129116] LOG:  PID 1128696 in cancel request did not match any process
2024-09-09 16:59:49.163 UTC [1129117] LOG:  PID 1128718 in cancel request did not match any process
2024-09-09 16:59:58.588 UTC [1129139] LOG:  PID 1129004 in cancel request did not match any process
2024-09-09 17:00:07.196 UTC [1129164] LOG:  PID 1128841 in cancel request did not match any process

Increasing the memory resources for the database pod does not help. I do not see any anomalies in the Harbor Grafana dashboards (although no PostgreSQL-specific dashboard is provided).
There are neither OOM killer logs nor Kubernetes OOM events.

I found similar issues, goharbor/harbor#20272 and #1184, but they were closed without a solution being provided.

The general PostgreSQL advice in such cases is to investigate core dumps. I am not sure whether the Harbor database pod is configured to write core dumps, or whether it allows such a configuration.
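
Short of core dumps, the exit code in the log might already offer a hint. Below is a minimal sketch, assuming the shell convention that an exit code above 128 means 128 plus a signal number. PostgreSQL does not state this in the log, so treat the decoded value as a guess. The helper names `describe_exit_code` and `summarize_crashes` are hypothetical.

```python
import re
import signal

# Matches PostgreSQL postmaster lines such as:
#   "... LOG:  server process (PID 2514) exited with exit code 141"
EXIT_RE = re.compile(r"server process \(PID (\d+)\) exited with exit code (\d+)")


def describe_exit_code(code: int) -> str:
    """Guess what an exit code means, using the shell's 128+N convention.

    Assumption: a code above 128 encodes signal N. The PostgreSQL log
    does not confirm this, so the result is a hint, not a diagnosis.
    """
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return f"unknown signal {code - 128}"
    return "ordinary exit status"


def summarize_crashes(log_text: str) -> dict[str, int]:
    """Count backend exits in the log, grouped by decoded reason."""
    counts: dict[str, int] = {}
    for match in EXIT_RE.finditer(log_text):
        reason = describe_exit_code(int(match.group(2)))
        counts[reason] = counts.get(reason, 0) + 1
    return counts


if __name__ == "__main__":
    sample = (
        "2024-09-09 16:59:39.372 UTC [1] LOG:  server process (PID 1129079) "
        "exited with exit code 141\n"
    )
    print(summarize_crashes(sample))
```

Under that assumption, 141 - 128 = 13, which is SIGPIPE on Linux, so it may be worth checking for a client or pipe that closes its end early. The function can be fed the database pod's log, for example the output of `kubectl logs`.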

Any suggestions or ideas on how to troubleshoot this would be much appreciated.

antwacky · 2024-10-14T16:11:50Z

I am also seeing this. I previously resolved it by running some cleanup of the DB as mentioned here, although I wish it would do this itself.

valkiriaaquatica · 2024-11-16T20:53:09Z

Hi, in my case I was using a storage class backed by NFS storage on HDD. I changed it to Longhorn + HDD and it seems better: I no longer see the "Always in recovery mode" messages.
