bottomless: recover when a checkpoint happens outside of bottomless replication control #597

psarna · 2023-08-11T08:47:09Z

The current bottomless replication implementation depends heavily on the fact that we control checkpoints - it replicates data straight from the WAL file, so it needs to know when a checkpoint happens in order to make sure everything gets replicated and its own metadata gets updated.
If a checkpoint happens outside of bottomless replication control, e.g. by another database connection that doesn't use bottomless virtual WAL methods, we can see a log entry like this:

2023-08-11T08:42:25.917655Z ERROR bottomless::replicator: [BUG] Local max valid frame is 0, while replicator thinks it's 10

Right now bottomless just logs the error and continues, but perhaps we should consider a more robust mechanism, e.g. marking the current generation as potentially corrupt/partial and creating a new one ASAP, so that we can always restore the state safely.
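To make the proposal concrete, here is a minimal sketch of the "mark as partial, start a new generation" idea. It is written in Python for brevity and is not the bottomless code: `Generation`, `Replicator`, `reconcile` and the `possibly_partial` flag are all hypothetical names, and the only thing taken from the report is the condition from the log line (the local max valid frame being lower than what the replicator thinks it has replicated).

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Generation:
    # Hypothetical stand-in for a bottomless generation, not the real type.
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    replicated_frames: int = 0
    possibly_partial: bool = False


class Replicator:
    # Hypothetical replicator state: one active generation plus retired ones.
    def __init__(self):
        self.current = Generation()
        self.retired = []

    def on_frames_replicated(self, count):
        # Track how many WAL frames we believe were shipped for this generation.
        self.current.replicated_frames += count

    def reconcile(self, local_max_valid_frame):
        """Compare the local WAL with our bookkeeping.

        Returns True if a mismatch was found and a new generation was started.
        """
        if local_max_valid_frame >= self.current.replicated_frames:
            return False  # the WAL still contains everything we counted
        # The WAL shrank behind our back, most likely due to an external
        # checkpoint. Flag the generation instead of only logging the error.
        self.current.possibly_partial = True
        self.retired.append(self.current)
        self.current = Generation()
        return True
```

With the numbers from the log above, `reconcile(0)` after `on_frames_replicated(10)` retires the current generation with the flag set and starts a fresh one. A real implementation would presumably also need to snapshot the main database file into the new generation, which this sketch leaves out.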

Opinions? cc @Horusiath @MarinPostma

NOTE: There's a separate sub-issue here: we saw the log error above in sqld, which wasn't supposed to happen -- perhaps we have a connection somewhere that didn't properly disable wal_autocheckpoint?


psarna · 2023-08-11T08:48:30Z

One way to trigger such a state manually is to run sqld --enable-bottomless-replication, inject some data, and then create a shell connection on the side, straight on the data file, e.g. sqlite3 data.sqld/dbs/default/data, and perform a PRAGMA wal_checkpoint(TRUNCATE) on it.
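The same effect can be seen without sqld at all. The snippet below is only an illustration using Python's standard sqlite3 module on a throwaway database (the function name and the table are made up for the example): one connection writes in WAL mode, then a second, unrelated connection runs PRAGMA wal_checkpoint(TRUNCATE) and empties the WAL behind the writer's back.

```python
import os
import sqlite3


def checkpoint_from_side_connection(db_path):
    """Write some data in WAL mode, then checkpoint from a second connection.

    Returns (wal_size_before, busy_flag, wal_size_after).
    """
    # "Main" connection, playing the role of the process that owns the database.
    writer = sqlite3.connect(db_path)
    writer.execute("PRAGMA journal_mode=WAL")
    writer.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    writer.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
    writer.commit()

    wal_path = db_path + "-wal"
    before = os.path.getsize(wal_path)

    # Side connection, like a sqlite3 shell opened straight on the data file.
    side = sqlite3.connect(db_path)
    busy, _log, _checkpointed = side.execute(
        "PRAGMA wal_checkpoint(TRUNCATE)"
    ).fetchone()
    side.close()

    after = os.path.getsize(wal_path)
    writer.close()
    return before, busy, after
```

When nothing blocks the checkpoint, the WAL file is non-empty before the side connection runs and zero bytes afterwards, which is exactly the situation where a replicator reading frames from the WAL finds fewer valid frames than it expects.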

psarna · 2023-08-11T09:36:10Z

Ok, update: since neither #547 nor #574 is applied yet, we don't really disable autocheckpoint on connections. That means we often perform a checkpoint outside of bottomless control, which explains why we see the error in the logs from time to time.
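For reference, autocheckpoint is a per-connection setting in SQLite, so it has to be turned off on every connection that can write. The sketch below shows this with Python's standard sqlite3 module; it is not the sqld code, and the helper names are invented for the example.

```python
import sqlite3


def open_without_autocheckpoint(db_path):
    """Open a WAL-mode connection with automatic checkpoints disabled."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    # 0 (or a negative value) turns automatic checkpointing off, but only
    # for this connection.
    conn.execute("PRAGMA wal_autocheckpoint=0")
    return conn


def autocheckpoint_threshold(conn):
    # Read back the autocheckpoint setting of the given connection.
    return conn.execute("PRAGMA wal_autocheckpoint").fetchone()[0]
```

A second connection opened with plain `sqlite3.connect(db_path)` still reports the library default from `autocheckpoint_threshold`, so a single connection that misses the pragma is enough to trigger a checkpoint outside of bottomless control.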

It's also very important to remember that the database connection that performs our periodic checkpoint must use bottomless WAL methods -- a regular db connection is not enough, since such a checkpoint won't trigger our custom replication code that runs in the on_checkpoint callback.
