turn off sqlite auto_checkpoint and introduce period-based checkpoint calls #547

Horusiath · 2023-07-24T11:38:42Z

This PR turns off auto_checkpoint in sqld-embed SQLite instance (which by default is every 1000 pages) and introduces a fiber loop, which checkpoints given configured time interval (by default it's every 1h).

sqld/src/lib.rs

sqld/src/replication/primary/logger.rs

psarna · 2023-07-25T07:16:30Z

@MarinPostma this commit makes sqld take over checkpointing, which is what we discussed a few times in context of deduplicating WAL. I'd appreciate you take a look to see if it doesn't interfere with replication in any way (it shouldn't, but worth double-checking)

psarna

@Horusiath also, we should turn off auto_checkpoint in every single database connection that opens, because by default, each connection opens with autocheckpoint set to 1000 pages. So, all places that call libsql_open, like

sqld/sqld-libsql-bindings/src/lib.rs

Line 90 in f50803f

let rc = rusqlite::ffi::libsql_open_v2(

, need to run sqlite3_wal_autocheckpoint(db, 0) as well. If we miss even one, it will cause a checkpoint to happen eventually

MarinPostma · 2023-07-25T07:30:46Z

we should limit the WAL size as well. The disk size on the platform is rather, and the WAL can grow very fast

sqld/src/lib.rs

sqld/src/replication/primary/logger.rs

MarinPostma · 2023-07-25T07:35:13Z

sqld/src/replication/primary/logger.rs

+                    "Checkpoint failed: database journal_mode is not WAL"
+                ))
+            } else {
+                conn.execute("VACUUM", ())?;


not sure a vacuum is necessary on every checkpoint

That's the original code. I'm not sure, what would be the best course of action here. We can leave VACUUM as an explicit request from the user side.

The intent was different, it was meant for replication log recovery

Maybe I'll just remove it and let users perform it explicitly. It shouldn't be a problem most of the time.

psarna · 2023-07-31T09:34:29Z

@Horusiath also, we should turn off auto_checkpoint in every single database connection that opens, because by default, each connection opens with autocheckpoint set to 1000 pages. So, all places that call libsql_open, like

sqld/sqld-libsql-bindings/src/lib.rs

Line 90 in f50803f

let rc = rusqlite::ffi::libsql_open_v2(

, need to run sqlite3_wal_autocheckpoint(db, 0) as well. If we miss even one, it will cause a checkpoint to happen eventually

@Horusiath was the comment about optionally turning off autocheckpoint for every single connection that we open already applied, or not yet?

Horusiath · 2023-08-01T06:26:47Z

@psarna question would be - should we turn off auto_checkpoint by default on connection open? In general this only happens if checkpoint interval and bottomless are set up. Given that these can be set either as env vars or command line params, it gets extremely tedious to pass that information down to connection open.

psarna · 2023-08-01T06:47:17Z

@Horusiath to the best of my knowledge we open db connections in two main places for user traffic:

[sarna@sarna-pc sqld]$ git grep sqld_libsql_bindings::Connection::open\(
sqld/src/database/libsql.rs:    sqld_libsql_bindings::Connection::open(path, flags, wal_methods, hook_ctx)
sqld/src/replication/replica/injector.rs:        let conn = sqld_libsql_bindings::Connection::open(

and two places for our own maintenance ops:

[sarna@sarna-pc sqld]$ git grep rusqlite::Connection::open\(
sqld/src/main.rs:    let conn = rusqlite::Connection::open(db_path.join("data"))?;
sqld/src/replication/primary/logger.rs:        let conn = rusqlite::Connection::open(data_path)?;

, and moreover, I think we only ever intend to replicate to S3 from the primary, so the connection open in replica/injector.rs can be ignored for now.

That leaves 3 places to handle, and I think all of them at some point have access to our Config instance, which contains all the information we need to determine if we should disable autocheckpoint or not.

Since autocheckpoint is a per-connection option, and we want to control how often checkpoints happen, I don't see any other option rather than consistently disable it if needed for each connection that we open. There's also an SQLite compilation flag that allows changing the default threshold, but it's not that interesting from our point of view, since we need to be able to determine the autocheckpoint threshold from command-line parameters. Another option to consider would be to contribute to libSQL and add a global option that sets the default WAL autocheckpoint for new connections, but that sounds a little too specific to be generally useful, so I'm not sure. In any case, we need to be able to consistently disable autocheckpointing for all connections that we create, otherwise we'd have spurious checkpoints that we don't want. /cc @haaawk

Horusiath · 2023-08-01T14:06:10Z

Generally, sqld_libsql_bindings::Connection::open is used by:

replica::injector::FrameInjector - leave auto_checkpoint to default?
As part of libsql::open_db which is used in few different places:
- storage_monitor which is read only (so we can leave autocheckpoint untouched)
- DumpLoader which can execute any arbitrary scripts.
- libsql::Connection<'a> which can be called from anywhere and doesn't seem to have access to bottomless config

psarna · 2023-08-11T10:06:44Z

@Horusiath after giving it some more thought, I think the following makes sense:

replica::injector::FrameInjector - leave auto_checkpoint to default?

yes, replicas don't use bottomless anyway

As part of libsql::open_db which is used in few different places:
storage_monitor which is read only (so we can leave autocheckpoint untouched)

yes

DumpLoader which can execute any arbitrary scripts.

does it only ever get called on startup, before bottomless is initialized? If so, we can leave autocheckpoint on, or always turn it off, I don't care much either way

libsql::Connection<'a> which can be called from anywhere and doesn't seem to have access to bottomless config

here's the only place where we actually need to push the configuration option, so that we know if we should enable or disable autocheckpoint, and to which value

... and let me also reiterate my mantra that the most important bit is to make sure that the checkpointing fiber uses a connection with our WAL methods from primary/logger.rs registered, so that it calls bottomless code on checkpoint.

MarinPostma · 2023-08-11T10:12:24Z

does it only ever get called on startup, before bottomless is initialized? If so, we can leave autocheckpoint on, or always turn it off, I don't care much either way

actually, we probably shouldn't even for that, maybe a handle to manually trigger checkpoints that we are sure bottomless will pick up. But we can't really completely turn it off either, because a dump can be very large

Horusiath · 2023-08-21T08:16:31Z

@psarna could you review again?

sqld/src/main.rs

sqld/src/lib.rs

sqld/src/replication/replica/injector.rs

sqld/src/connection/libsql.rs

bottomless/src/replicator.rs

haaawk

Looks mostly good except for one place when we're using QueryBuilderConfig::default in libsql.rs

… calls

haaawk · 2023-08-22T12:15:29Z

bottomless/src/replicator.rs

@@ -192,7 +192,7 @@ impl Default for Options {
        Options {
            create_bucket_if_not_exists: true,
            verify_crc: true,
-            use_compression: CompressionKind::None,
+            use_compression: CompressionKind::Gzip,


How is this change related to auto checkpointing?

I think, I've left it when started running comparisons with Litestream. Keeping non-compressed WAL frames around is extremely expensive and we definitely shouldn't do that by default.

Changes were made by Bartosz to address the comments

Horusiath requested a review from psarna July 24, 2023 11:38

Horusiath force-pushed the bart/sqld-interval-checkpoint branch from 2216505 to ce6b596 Compare July 24, 2023 12:45

psarna reviewed Jul 25, 2023

View reviewed changes

sqld/src/lib.rs Show resolved Hide resolved

sqld/src/replication/primary/logger.rs Outdated Show resolved Hide resolved

sqld/src/replication/primary/logger.rs Outdated Show resolved Hide resolved

psarna previously requested changes Jul 25, 2023

View reviewed changes

MarinPostma reviewed Jul 25, 2023

View reviewed changes

Horusiath requested a review from psarna July 25, 2023 12:13

Horusiath force-pushed the bart/sqld-interval-checkpoint branch from eaacb8c to 075c370 Compare July 26, 2023 12:17

Horusiath mentioned this pull request Aug 2, 2023

Asynchronous snapshot upload #574

Merged

Horusiath force-pushed the bart/sqld-interval-checkpoint branch 2 times, most recently from 37d3ce2 to f5e8cc6 Compare August 2, 2023 07:24

psarna mentioned this pull request Aug 11, 2023

bottomless: recover when a checkpoint happen outside of bottomless replication control #597

Open

haaawk reviewed Aug 22, 2023

View reviewed changes

sqld/src/main.rs Outdated Show resolved Hide resolved

haaawk reviewed Aug 22, 2023

View reviewed changes

sqld/src/main.rs Outdated Show resolved Hide resolved

haaawk reviewed Aug 22, 2023

View reviewed changes

sqld/src/lib.rs Outdated Show resolved Hide resolved

haaawk reviewed Aug 22, 2023

View reviewed changes

sqld/src/lib.rs Outdated Show resolved Hide resolved

haaawk reviewed Aug 22, 2023

View reviewed changes

sqld/src/lib.rs Show resolved Hide resolved

haaawk reviewed Aug 22, 2023

View reviewed changes

sqld/src/replication/replica/injector.rs Show resolved Hide resolved

haaawk reviewed Aug 22, 2023

View reviewed changes

sqld/src/connection/libsql.rs Outdated Show resolved Hide resolved

haaawk reviewed Aug 22, 2023

View reviewed changes

bottomless/src/replicator.rs Outdated Show resolved Hide resolved

haaawk suggested changes Aug 22, 2023

View reviewed changes

turn off sqlite auto_checkpoint and introduce period-based checkpoint…

3350d7f

… calls

Horusiath added 11 commits August 22, 2023 12:13

bottomless: added measurements to critical ops

831b906

fixed review remarks

9f2e1da

turn off vacuum on each checkpoint

cf9b2fd

made auto_checkpoint configurable at Connection::open level

00d2113

Connection::open fixed data_path for auto_checkpoint setup

66c9c31

fix clippy errors

2d5e09d

fixing mess after merge with main

e6a536f

reattach rust-toolchain.toml

d7e61a3

put enough updates to enforce auto_checkpoint in bottomless tests

7e4cc61

post-merge fixes

0b02b1c

libsqld factory: propagate auto_checkpoint from config

c2185e6

Horusiath force-pushed the bart/sqld-interval-checkpoint branch from bb890d6 to c2185e6 Compare August 22, 2023 10:45

haaawk reviewed Aug 22, 2023

View reviewed changes

haaawk approved these changes Aug 22, 2023

View reviewed changes

Horusiath enabled auto-merge August 22, 2023 12:31

Horusiath added this pull request to the merge queue Aug 22, 2023

Merged via the queue into libsql:main with commit a61dffa Aug 22, 2023
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

turn off sqlite auto_checkpoint and introduce period-based checkpoint calls #547

turn off sqlite auto_checkpoint and introduce period-based checkpoint calls #547

Horusiath commented Jul 24, 2023

psarna commented Jul 25, 2023

psarna left a comment

MarinPostma commented Jul 25, 2023

MarinPostma Jul 25, 2023

Horusiath Jul 25, 2023

MarinPostma Jul 25, 2023

Horusiath Jul 25, 2023

psarna commented Jul 31, 2023

Horusiath commented Aug 1, 2023

psarna commented Aug 1, 2023

Horusiath commented Aug 1, 2023 •

edited

Loading

psarna commented Aug 11, 2023

MarinPostma commented Aug 11, 2023 •

edited

Loading

Horusiath commented Aug 21, 2023

haaawk left a comment

haaawk Aug 22, 2023 •

edited

Loading

Horusiath Aug 22, 2023

turn off sqlite auto_checkpoint and introduce period-based checkpoint calls #547

turn off sqlite auto_checkpoint and introduce period-based checkpoint calls #547

Conversation

Horusiath commented Jul 24, 2023

psarna commented Jul 25, 2023

psarna left a comment

Choose a reason for hiding this comment

MarinPostma commented Jul 25, 2023

MarinPostma Jul 25, 2023

Choose a reason for hiding this comment

Horusiath Jul 25, 2023

Choose a reason for hiding this comment

MarinPostma Jul 25, 2023

Choose a reason for hiding this comment

Horusiath Jul 25, 2023

Choose a reason for hiding this comment

psarna commented Jul 31, 2023

Horusiath commented Aug 1, 2023

psarna commented Aug 1, 2023

Horusiath commented Aug 1, 2023 • edited Loading

psarna commented Aug 11, 2023

MarinPostma commented Aug 11, 2023 • edited Loading

Horusiath commented Aug 21, 2023

haaawk left a comment

Choose a reason for hiding this comment

haaawk Aug 22, 2023 • edited Loading

Choose a reason for hiding this comment

Horusiath Aug 22, 2023

Choose a reason for hiding this comment

Horusiath commented Aug 1, 2023 •

edited

Loading

MarinPostma commented Aug 11, 2023 •

edited

Loading

haaawk Aug 22, 2023 •

edited

Loading