This repository has been archived by the owner on Oct 18, 2023. It is now read-only.
The current bottomless replication implementation depends heavily on the fact that we control checkpoints: it replicates data straight from the WAL file, so it needs to be aware of when a checkpoint happens in order to make sure everything gets replicated and its own metadata gets updated.
If a checkpoint happens outside of bottomless replication control, e.g. by another database connection that doesn't use bottomless virtual WAL methods, we can see a log entry like this:
2023-08-11T08:42:25.917655Z ERROR bottomless::replicator: [BUG] Local max valid frame is 0, while replicator thinks it's 10
Right now bottomless just logs the error and continues, but perhaps we should consider a more robust mechanism, e.g. marking the current generation as potentially corrupt/partial and creating a new one ASAP, so that we can always restore the state safely.
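To make the proposal concrete, here's a minimal sketch of that recovery policy. All names (`Replicator`, `check_wal`, `on_frames_replicated`) are invented for illustration and are not bottomless's actual API; the point is just the comparison between the WAL's local max frame and the frame the replicator last recorded, matching the log message above.

```python
import uuid

class Replicator:
    """Illustrative sketch only -- not the real bottomless replicator."""

    def __init__(self):
        self.generation = uuid.uuid4()  # current backup generation id
        self.max_valid_frame = 0        # highest WAL frame we replicated

    def on_frames_replicated(self, frame_no):
        # Called after frames up to frame_no were uploaded.
        self.max_valid_frame = frame_no

    def check_wal(self, local_max_frame):
        # If the WAL's actual max frame is behind what we recorded, an
        # out-of-band checkpoint truncated the WAL under us: mark this
        # generation as suspect and rotate to a fresh one.
        if local_max_frame < self.max_valid_frame:
            suspect = self.generation
            self.generation = uuid.uuid4()
            self.max_valid_frame = local_max_frame
            return suspect
        return None
```

With this policy, the situation from the log ("local max valid frame is 0, while replicator thinks it's 10") would rotate the generation instead of only logging:

```python
r = Replicator()
r.on_frames_replicated(10)
suspect = r.check_wal(0)   # out-of-band checkpoint detected
```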
NOTE: there's a separate sub-issue here: we saw the log error above in sqld, which wasn't supposed to happen -- perhaps we have a connection somewhere that didn't properly disable wal_autocheckpoint?
One way to trigger such a state manually is to run `sqld --enable-bottomless-replication`, inject some data, then create a shell connection on the side, straight on the data file (e.g. `sqlite3 data.sqld/dbs/default/data`), and perform a `PRAGMA wal_checkpoint(TRUNCATE)` on it.
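The underlying SQLite behavior can be reproduced with plain `sqlite3` and no sqld/bottomless at all: a second connection on the same database file can checkpoint and truncate the WAL behind the first connection's back, leaving a replicator that tails the WAL looking at frame 0.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.db")

# "Main" connection: switch to WAL mode and write some data.
writer = sqlite3.connect(path)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE t (x)")
writer.execute("INSERT INTO t VALUES (1)")
writer.commit()
assert os.path.getsize(path + "-wal") > 0  # frames pending in the WAL

# "Side" connection, analogous to running sqlite3 straight on the data file:
side = sqlite3.connect(path)
busy, _, _ = side.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
assert busy == 0

# The WAL file is now truncated to zero bytes; a replicator tracking it
# would see a local max frame of 0, as in the log entry above.
assert os.path.getsize(path + "-wal") == 0
```

The data itself is safe (it was checkpointed into the main database file); what's lost is the replicator's view of which frames still need uploading.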
OK, update: since neither #547 nor #574 has been applied yet, we don't actually disable autocheckpoint on connections. That means we often perform a checkpoint outside of bottomless control, which explains why we see the error in the logs from time to time.
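For context on what those PRs need to do: SQLite checkpoints the WAL automatically once it grows past a threshold (1000 pages by default), and that has to be switched off on every connection so that only explicit, bottomless-controlled checkpoints run. A minimal sketch of the per-connection setting:

```python
import os
import sqlite3
import tempfile

conn = sqlite3.connect(os.path.join(tempfile.mkdtemp(), "data.db"))
conn.execute("PRAGMA journal_mode=WAL")

# 0 disables SQLite's automatic checkpointing for this connection,
# leaving checkpoints entirely under the application's control.
conn.execute("PRAGMA wal_autocheckpoint = 0")

# Queried without a value, the pragma reports the current setting.
assert conn.execute("PRAGMA wal_autocheckpoint").fetchone()[0] == 0
```

Note this is per connection, which is exactly why a single stray connection without it can still trigger the error.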
It's also very important to remember that the database connection performing our periodic checkpoint uses bottomless WAL methods -- a regular db connection is not enough, since such a checkpoint won't trigger the custom replication code that runs in the `on_checkpoint` callback.
Opinions? cc @Horusiath @MarinPostma