kvdb: add sqlite #7251
Conversation
Nice work with the quick turnaround here! Just a few things I need to dig a bit deeper into, but I get the extra step to extract out some of the SQL logic, though I think a bit more consolidation is possible.
kvdb/sqlite/db.go
Outdated
// postgresReplacements defines a set of postgres keywords that should be swapped
// out with certain other sqlite keywords in any queries.
var postgresReplacements = common_sql.PostgresCmdReplacements{
	"BYTEA": "BLOB",
See the other comment about potentially reversing this operation (also name inconsistency here).
Noting that we'll need to tag a new …
Thanks for the quick review on this @Roasbeef 💪
@Roasbeef: review reminder
Very nice and straightforward! LGTM, only a few nits or suggestions from my end 🎉
@@ -0,0 +1,21 @@
//go:build !kvdb_sqlite || (windows && (arm || 386)) || (linux && (ppc64 || mips || mipsle || mips64))
So SQLite won't be available on Windows i386/ARM and any PPC/MIPS platform? Not sure if we need to adjust the error message below to be a bit more verbose about why SQLite isn't available (e.g. "sqlite backend not available for this operating system and/or architecture").
Found a cool blog post w/ some more options we might want to mess around with: https://phiresky.github.io/blog/2020/sqlite-performance-tuning/ Stand outs there include the auto vacuum and the change to the …
This PR is really shaping up well! Just a few comments, moving on to some hands-on testing now; also have some ideas w.r.t other params we can tune.
if readOnly {
	locker = db.lock.RLocker()
locker := newNoopLocker()
if db.cfg.WithTxLevelLock {
Was this in response to the comment above about postgres' concurrency control? So then this would be on by default for postgres?
Yeah, it was in response to the comment noting that this tx-level locking is not necessary for sqlite since sqlite handles it itself. But I did not want to make changes to the postgres logic (I also ran into itest failures when removing this for postgres).
It is not on by default: in the postgres config, we now set WithTxLevelLock: true
Looking pretty good now, though it looks like one of the tests fails due to the new build tags:
LGTM 🍭
That's weird... The coverage reporter is attempting to read a file that was renamed/moved. Maybe something weird with Go submodules and coverage? Going to re-run to see if that fixes it. Otherwise it will be fixed as soon as we tag the submodule and update it in the follow-up PR.
Yeah, I don't think it is because of the build tags but rather that the reporter seems not to know how to deal with deleted/moved files. I wasn't worried about this since the test passes in the follow-up PR that points to this one.
Okay, but let's hold off on merging this PR until the follow-up PR is close to being merged as well. Otherwise the coverage step will be broken on master for a while.
Latest update just adds … Also removed the DBDir option from the sqlite config since we now place it in LND's chain dir.
LGTM 🎄
Great to see sqlite being added to lnd!
I did have some trouble understanding the split of the work between this PR and the next #7252. I'd expect this PR to only do prep without any mention of sqlite really yet.
for from, to := range replacements {
	finalCmd = strings.Replace(finalCmd, from, to, -1)
An alternative to string replacements after the create definition could perhaps be to have helper functions that output "BLOB" and "BIGSERIAL" based on the requested dialect and use those in the create definition above?
The helper functions could be provided through an interface so that each backend can expose its own keywords.
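The interface idea floated above could look something like the sketch below. All names here (`dialect`, `BlobType`, `SerialType`, `createTableStmt`) are hypothetical, invented for illustration; they are not the PR's actual API, and the sqlite serial type shown is only one plausible mapping.

```go
package main

import "fmt"

// dialect lets each backend expose its own type keywords, so DDL can be
// assembled per dialect instead of string-replacing a postgres definition
// after the fact.
type dialect interface {
	BlobType() string   // e.g. BYTEA vs BLOB
	SerialType() string // e.g. BIGSERIAL vs INTEGER PRIMARY KEY
}

type postgresDialect struct{}

func (postgresDialect) BlobType() string   { return "BYTEA" }
func (postgresDialect) SerialType() string { return "BIGSERIAL" }

type sqliteDialect struct{}

func (sqliteDialect) BlobType() string   { return "BLOB" }
func (sqliteDialect) SerialType() string { return "INTEGER PRIMARY KEY" }

// createTableStmt builds the same CREATE TABLE definition for any dialect.
func createTableStmt(d dialect, table string) string {
	return fmt.Sprintf(
		"CREATE TABLE IF NOT EXISTS %s (id %s, value %s)",
		table, d.SerialType(), d.BlobType(),
	)
}

func main() {
	fmt.Println(createTableStmt(sqliteDialect{}, "kv"))
	fmt.Println(createTableStmt(postgresDialect{}, "kv"))
}
```

The tradeoff versus the replacement map is that the DDL is stitched together in code, which the later comments in this thread argue against for migration files.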
This will allow us to write migration *.sql files with the DDL statements in one dialect and then translate them at migration time into other dialects. That avoids needing to stitch together the DDL statements in the code, which I think is nice.
Are you sure you want *.sql files that need to be packaged into the lnd binary?
For other projects I switched to a different migration lib that can also use variables that contain sql code (#6176 (comment)). In that case, you could still use those helper functions.
Yes, that works very well with the virtual file system added in a recent Go version. I saw your comment about the migration library. We didn't run into the described problem yet, but a thing to keep an eye on in the upcoming invoice database migration PR.
I assume you mean you didn't run into it yet with golang-migrate/migrate. It is very easy to trigger: just add a panic somewhere in the migration. Then you need the cli tool to clear the dirty flag again.
Hi @joostjager, thanks for your review!
The reason for the split is that this PR makes changes to the … I will take a look at the rest of your comments some time tomorrow 👍
kvdb/sqlite/db.go
Outdated
// sqliteDB contains two sqlite connections, one is optimised for reads as it
// uses the default BEGIN DEFERRED directive for starting a db transaction and
// the other is optimised for writes as it uses the BEGIN IMMEDIATE directive
// for starting a db transaction.
I was reading up on sqlite transactions on https://www.sqlite.org/lang_transaction.html and saw this:
While a read transaction is active, any changes to the database that are implemented by separate database connections will not be seen by the database connection that started the read transaction. If database connection X is holding a read transaction, it is possible that some other database connection Y might change the content of the database while X's transaction is still open, however X will not be able to see those changes until after the transaction ends. While its read transaction is active, X will continue to see an historic snapshot of the database prior to the changes implemented by Y.
To what extent does this apply to your setup with two connections? It seems different from bbolt where you're never reading an historic snapshot afaik?
Good question re how connections work under the hood. I wish both sqlite and bbolt used more formal terminology here such as "snapshot isolation", or "linearizability", etc. AFAICT, this snapshot behavior seems to be identical to how bolt functions: read transactions can happen concurrently, they operate based on a snapshot of the database (the B+Tree is cloned from leaf up to root), write transactions can happen while read transactions are active.
Bolt allows only one read-write transaction at a time but allows as many read-only transactions as you want at a time. Each transaction has a consistent view of the data as it existed when the transaction started.
I see, yes that sounds mostly identical. What I don't think happens in bbolt is that read and write transactions actually run concurrently? So in bbolt the write only starts after all the reads have finished. For sqlite, it does sound a bit as if they can run in parallel, with the read txes having their snapshot. In lnd, there are sometimes non-database things happening inside a db tx block. Not sure if this may lead to differences in behavior with bbolt vs sqlite, given the potentially parallel transaction execution?
What I don't think happens in bbolt is that read and write transactions actually run concurrently?
My understanding is that there can only be a single writer at a time (there's a write mutex in the main db struct). Read transactions grab the read side of a RWMutex (for the mmap), while write transactions grab a write mutex (distinct from the mmap mutex, which needs to be obtained to update the memory map to commit). The read transaction then operates on a snapshot of the DB at that time. Readers also don't block other readers. If a write transaction is created, then readers can still be created (they copy the B+Tree root with the existing pointers). All other write transactions are blocked until that write finishes.
An example trace through bolt DB transactions:
- For read transactions, a RWMutex is held for the mmap
- Root bucket gets copied, which effectively copies the B+Tree root along w/ the other pointers (for both)
- For write transactions, the write lock is obtained
So with the above flow, a writer can start (grab the write block) and then get a snapshot copy, while several other readers also start and obtain a snapshot copy.
For sqlite, it does sound a bit as if they can run in parallel, with the read txes having their snapshot.
So this is basically the goal of the WAL: readers can look at the WAL, then start a transaction and ensure they apply the changes in the WAL. Writers just append to the end of the WAL. There's an index that has an exclusive lock on it, which needs to be consulted so both sides know what the latest state of the WAL is.
My interpretation of the "database connection" terminology in the sqlite docs is the object returned from the sqlite.Open call in C. sqlite has a few diff flags that can be passed to open. Our library defaults to SQLITE_OPEN_FULLMUTEX.
I think this is the clearest set of docs from sqlite about their isolation model: https://www.sqlite.org/isolation.html. The WAL docs explain the read/write interaction (section 2.2): https://www.sqlite.org/wal.html.
If the same database is being read and written using two different database connections (two different sqlite3 objects returned by separate calls to sqlite3_open()) and the two database connections do not have a shared cache, then the reader is only able to see complete committed transactions from the writer. Partial changes by the writer that have not been committed are invisible to the reader. This is true regardless of whether the two database connections are in the same thread, in different threads of the same process, or in different processes. This is the usual and expected behavior for SQL database systems.
Thanks for all these pointers. It indeed seems that the original concern that I had is ungrounded.
kvdb/sqlite/db.go
Outdated
// sqliteDB contains two sqlite connections, one is optimised for reads as it
// uses the default BEGIN DEFERRED directive for starting a db transaction and
// the other is optimised for writes as it uses the BEGIN IMMEDIATE directive
Perhaps explain why BEGIN IMMEDIATE optimizes for writes?
I saw #7252 (comment) and that DEFERRED creates busy errors. It seems that IMMEDIATE makes this less likely to happen, but is it true that it still can happen when a write tx runs for longer than the busy timeout while another write is waiting for that duration?
So immediate will immediately grab the exclusive lock, which means no other transactions can start. Without this, it's possible that with deferred, a transaction starts as a read, then needs to upgrade to a write and gets locked out, which triggers the busy timeout. With this we're trying to mark all write transactions as such early, so they don't need to upgrade at all.
An EXCLUSIVE lock is needed in order to write to the database file. Only one EXCLUSIVE lock is allowed on the file and no other locks of any kind are allowed to coexist with an EXCLUSIVE lock. In order to maximize concurrency, SQLite works to minimize the amount of time that EXCLUSIVE locks are held.
It's still possible for this to return sqlite_busy though if another write is started on another db connection:
IMMEDIATE cause the database connection to start a new write immediately, without waiting for a write statement. The BEGIN IMMEDIATE might fail with SQLITE_BUSY if another write transaction is already active on another database connection.
As is, we have a distinct DB connection for writes, so this doesn't appear to apply.
If it's possible to instead have the library let us set the txn type at the single-connection level, then we'd just switch over to that, and we wouldn't need to worry about having multiple db connections.
(also note that this is part of the other PR, we need to merge that and tag in order to move forward)
it's possible that with deferred, a transaction starts as a read, then needs to upgrade to a write and gets locked out, which triggers the busy timeout.
Thinking about this more - if there is one dedicated connection for reads and one for writes ("As is, we have a distinct DB connection for writes, so this doesn't appear to apply."), how can a write get locked out really? That one write connection can only handle one write tx, so there can never be another one? I am probably still missing a part of the model, perhaps related to the threading model that sqlite runs in?
(also note that this is part of the other PR, we need to merge that and tag in order to move forward)
I thought the code is introduced in this PR, so I comment on it here?
Btw, by my original comment starting with 'perhaps explain', I meant capturing this rationale in the code comment.
Thinking about this more - if there is one dedicated connection for reads and one for writes ("As is, we have a distinct DB connection for writes, so this doesn't appear to apply."), how can a write get locked out really?
The reason we added the two connections was so we can use BEGIN IMMEDIATE via a roundabout way. We're trying to allow the following situation to be possible:
If X starts a transaction that will initially only read but X knows it will eventually want to write and does not want to be troubled with possible SQLITE_BUSY_SNAPSHOT errors that arise because another connection jumped ahead of it in line, then X can issue BEGIN IMMEDIATE to start its transaction instead of just an ordinary BEGIN. The BEGIN IMMEDIATE command goes ahead and starts a write transaction, and thus blocks all other writers. If the BEGIN IMMEDIATE operation succeeds, then no subsequent operations in that transaction will ever fail with an SQLITE_BUSY error.
A write can get locked out simply because another writer is active. A reader can get kicked out (SQLITE_BUSY) if it starts a read, then attempts an update while another write transaction exists concurrently that's also updating.
That one write connection can only handle one write tx, so there can never be another one?
Yeah, there can only be a single write transaction active per connection, where "connection" is defined as the object returned from sqlite.Open. While that is active, the others will block, iiuc.
Also I got confused myself w.r.t which PR I was commenting on 😅
Confusion level isn't going down with these threads, but I'll try to not miss anything. Maybe a single PR just for review and then splitting up right before merge to get the tagging right wouldn't have been a bad option either.
The reason we added the two connections, was so we can use BEGIN IMMEDIATE via a round about way.
I found this, but not sure if it is a good idea. Maybe it can realize tx-level IMMEDIATE: mattn/go-sqlite3#400 (comment)
It's still possible for this to return sqlite_busy though if another write is started on another db connection
As is, we have a distinct DB connection for writes, so this doesn't appear to apply.
That single write connection is used by all kinds of goroutines in lnd. So I do think this still applies. I ran a quick test with multiple goroutines writing to the same sqlite connection, and if one of those is slow, the others will happily return sqlite_busy.
Also as you mentioned in #7252 (comment), it isn't really a single connection, because database/sql is still making a pool internally.
review and then splitting up right before merge to get the tagging right wouldn't have been a bad option either.
the other PR does contain all the commits. So could just comment on there?
Not sure if I am way off here, but I have opened an issue on the modernc.org/sqlite repo to see if they can set/not set the begin directives based on the passed-in driver.TxOptions: https://gitlab.com/cznic/sqlite/-/issues/127
If I'm on the right track there, then that would remove the need for the two conns on our end.
the other PR does contain all the commits. So could just comment on there?
Yes, probably should have done that from the start. But then I didn't understand how they were related. And now, this is where we are 😄
I think that issue that you opened is a really smart solution to the problem.
I think that issue that you opened is a really smart solution to the problem.
Awesome! I have opened a PR on their repo. Hopefully we can have a fast turnaround on that side. It does look like they create a new tag after almost every PR they merge.
The MR on the gitlab sqlite repo was merged and a new tag was released 🎉 so I have removed this two-connection workaround now.
So re the thread in the other PR, I think we should just go back to …
Even with WAL mode?
Yes, safe from corruption, but it may still lose data in case of power failure. So not suitable for channel state data.
Yeah, my understanding matches @joostjager's above. Those that want to make a different tradeoff can set the sync-normal field on start-up.
Bump the go version to go1.18 so that we can get rid of the '// +build' comments in the upcoming commits where some of the '//go:build' comments will become too complex to write as '// +build' comments.
In this commit, a mutex is added to guard access to the global dbConns variable in the kvdb/postgres package.
Due to the update of the kvdb package to go1.18, the `// +build` directives are no longer required. They are removed in this commit in order to simplify upcoming commits. Along the way, a few typos are fixed and an unused struct is removed.
In this commit, changes are made to the `kvdb/postgres` package so that all the non-postgres-specific code is generalised to be applicable for all sql code. A follow up commit will move all the general sql code into its own package.
In this commit, all the sql, non-postgres-specific, code is moved out of the postgres package and into a new sqlbase package. This will make it more easily reusable for future sql integrations.
Since the two-connection workaround has now been removed here, I think this PR is ok to merge since we can continue playing with the various pragma options in the other PR. This one only hardcodes …
Ok, based on the above, merging this (then will tag), since all the param tuning is now in the other PR and we use a single connection here.
In this PR, the kvdb package is first adapted so that all the non-postgres-specific code in the `kvdb/postgres` package is moved into a new `kvdb/sqlbase` package. Then a new `kvdb/sqlite` package is added which also makes use of the `sqlbase` package.

Since the `kvdb` package is a module, the code that will make use of the new sqlite code will be added in a follow-up PR.