Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use SQL functions in Rust for checkpointing in SQLite bagfile databases #9

Open
ruffsl opened this issue Oct 8, 2019 · 3 comments
Open

Comments

@ruffsl
Copy link
Member

ruffsl commented Oct 8, 2019

Diesel, the Rust crate being migrated to for interfacing with recording in SQLite .db3, enables the use of custom sql_functions which can also be implemented in rust:
http://docs.diesel.rs/diesel/macro.sql_function.html#use-with-sqlite

I'm still unsure how to formulate a SQL query that could loop over all rows given a topic_id (A vs B) in the messages (1,2,...,N) table such that it could replicate a hash chain as follows in (9) and (10):

image

An example of specifying a .filter() query using the topic_id as a criteria is as follows:

use crate::schema::messages::dsl::topic_id;
let message_results = messages::table
.filter(topic_id.eq(topic_result.id))
.load::<message::Message>(&conn)
.expect("Error loading messages");

Would it be possible to replicate something like a LAG function (guessing here?) over filter query and feed that into a custom function in SQL sequentially? Is this even possible in SQLite?

https://www.sqlitetutorial.net/sqlite-window-functions/sqlite-lag/

CREATE TABLE messages(
id INTEGER PRIMARY KEY NOT NULL,
topic_id INTEGER NOT NULL,
timestamp INTEGER NOT NULL,
data BLOB NOT NULL,
bbr_digest BLOB);

Relates to: diesel-rs/diesel#1775 , cc @weiznich

@weiznich
Copy link

weiznich commented Oct 8, 2019

So if I understand the calculations above correctly each step needs the result of the previous one, right? I should be possible to do that as custom aggregate function in sql, but diesel does currently not provide a interface for custom aggregate functions with the sqlite backend. (It is possible to implement that and it shouldn't be to hard to implement that, If you are interested in using that approach I can provide a few pointers what needs to be done exactly)

That said: The easiest solution here is to just load data batch wise, do the calculation and after that load and process the next batch.

@ruffsl
Copy link
Member Author

ruffsl commented Oct 8, 2019

each step needs the result of the previous one, right?

Correct, digest bytes from the previous message (of the same topic) is used in the current row's hmac.

should be possible to do that as custom aggregate function in sql

Ah, this is termed as a "custom aggregate function" in SQL... TDIL!
Is this what you are referring to with respect to SQLite?:

If you are interested in using that approach I can provide a few pointers what needs to be done

Certainly, I'm still a new rustacean, but I could look into it. Perhaps you'd like to write those pointers and todo items under a new ticket back over at diesel-rs/diesel to proceed further.

That said: The easiest solution here is to just load data batch wise, do the calculation and after that load and process the next batch.

Thats seem like a reasonable workaround for now. Would you suggest to add a partial struct with only the id and topic_id fields to make a filter query and load all the array of indexes to iterate/stride over, or are there other preferred methods for loading? From the FormStruct pattern for partial updates, that would seem to mimic this pattern.

#[derive(Debug, diesel::Queryable)]
pub struct Message {
pub id: i64,
pub topic_id: i64,
pub timestamp: i64,
pub data: Vec<u8>,
pub bbr_digest: Option<Vec<u8>>,
}

@weiznich
Copy link

weiznich commented Oct 9, 2019

Ah, this is termed as a "custom aggregate function" in SQL... TDIL!
Is this what you are referring to with respect to SQLite?:

I refer to this functionality.

Certainly, I'm still a new rustacean, but I could look into it. Perhaps you'd like to write those pointers and todo items under a new ticket back over at diesel-rs/diesel to proceed further.

I've opened a issue for this: diesel-rs/diesel#2191 If you try to implement this and hit a problem just ping me there or in our gitter channel.

Thats seem like a reasonable workaround for now. Would you suggest to add a partial struct with only the id and topic_id fields to make a filter query and load all the array of indexes to iterate/stride over, or are there other preferred methods for loading? From the FormStruct pattern for partial updates, that would seem to mimic this pattern.

In general, you don't need a struct to load values. You could just use tuples there. With big amounts of data it is also preferable to select only those fields that are really required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants