-
Notifications
You must be signed in to change notification settings - Fork 38
turn off sqlite auto_checkpoint and introduce period-based checkpoint calls #547
turn off sqlite auto_checkpoint and introduce period-based checkpoint calls #547
Conversation
2216505
to
ce6b596
Compare
@MarinPostma this commit makes sqld take over checkpointing, which is what we discussed a few times in context of deduplicating WAL. I'd appreciate you take a look to see if it doesn't interfere with replication in any way (it shouldn't, but worth double-checking) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Horusiath also, we should turn off auto_checkpoint in every single database connection that opens, because by default, each connection opens with autocheckpoint set to 1000 pages. So, all places that call libsql_open, like
sqld/sqld-libsql-bindings/src/lib.rs
Line 90 in f50803f
let rc = rusqlite::ffi::libsql_open_v2( |
sqlite3_wal_autocheckpoint(db, 0)
as well. If we miss even one, it will cause a checkpoint to happen eventually
we should limit the WAL size as well. The disk size on the platform is rather, and the WAL can grow very fast |
"Checkpoint failed: database journal_mode is not WAL" | ||
)) | ||
} else { | ||
conn.execute("VACUUM", ())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure a vacuum is necessary on every checkpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the original code. I'm not sure, what would be the best course of action here. We can leave VACUUM as an explicit request from the user side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent was different, it was meant for replication log recovery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'll just remove it and let users perform it explicitly. It shouldn't be a problem most of the time.
eaacb8c
to
075c370
Compare
@Horusiath was the comment about optionally turning off autocheckpoint for every single connection that we open already applied, or not yet? |
@psarna question would be - should we turn off auto_checkpoint by default on connection open? In general this only happens if checkpoint interval and bottomless are set up. Given that these can be set either as env vars or command line params, it gets extremely tedious to pass that information down to connection open. |
@Horusiath to the best of my knowledge we open db connections in two main places for user traffic:
and two places for our own maintenance ops:
, and moreover, I think we only ever intend to replicate to S3 from the primary, so the connection open in That leaves 3 places to handle, and I think all of them at some point have access to our Since autocheckpoint is a per-connection option, and we want to control how often checkpoints happen, I don't see any other option rather than consistently disable it if needed for each connection that we open. There's also an SQLite compilation flag that allows changing the default threshold, but it's not that interesting from our point of view, since we need to be able to determine the autocheckpoint threshold from command-line parameters. Another option to consider would be to contribute to libSQL and add a global option that sets the default WAL autocheckpoint for new connections, but that sounds a little too specific to be generally useful, so I'm not sure. In any case, we need to be able to consistently disable autocheckpointing for all connections that we create, otherwise we'd have spurious checkpoints that we don't want. /cc @haaawk |
Generally,
|
37d3ce2
to
f5e8cc6
Compare
@Horusiath after giving it some more thought, I think the following makes sense:
yes, replicas don't use bottomless anyway
yes
does it only ever get called on startup, before bottomless is initialized? If so, we can leave autocheckpoint on, or always turn it off, I don't care much either way
here's the only place where we actually need to push the configuration option, so that we know if we should enable or disable autocheckpoint, and to which value ... and let me also reiterate my mantra that the most important bit is to make sure that the checkpointing fiber uses a connection with our WAL methods from primary/logger.rs registered, so that it calls bottomless code on checkpoint. |
actually, we probably shouldn't even for that, maybe a handle to manually trigger checkpoints that we are sure bottomless will pick up. But we can't really completely turn it off either, because a dump can be very large |
@psarna could you review again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good except for one place when we're using QueryBuilderConfig::default in libsql.rs
bb890d6
to
c2185e6
Compare
@@ -192,7 +192,7 @@ impl Default for Options { | |||
Options { | |||
create_bucket_if_not_exists: true, | |||
verify_crc: true, | |||
use_compression: CompressionKind::None, | |||
use_compression: CompressionKind::Gzip, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this change related to auto checkpointing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, I've left it when started running comparisons with Litestream. Keeping non-compressed WAL frames around is extremely expensive and we definitely shouldn't do that by default.
Changes were made by Bartosz to address the comments
This PR turns off auto_checkpoint in sqld-embed SQLite instance (which by default is every 1000 pages) and introduces a fiber loop, which checkpoints given configured time interval (by default it's every 1h).