
Streaming queries? #1775

Closed
spease opened this issue Jul 5, 2018 · 7 comments
Comments

spease commented Jul 5, 2018

Is there a way to process rows as they're received, i.e. a way to get an iterator for a query rather than a vector? Or is this technically impossible due to how the low-level interface works? Thanks.


weiznich commented Jul 5, 2018

It is currently not possible. A later version of Diesel may add support for this feature.
I will close this issue now, because there is nothing actionable for the Diesel team here.
(To implement this, `impl Trait` in trait functions needs to be supported in Rust.)
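A minimal plain-Rust sketch of that language limitation (hypothetical names, not Diesel's real API): a free function can return `impl Iterator` today, but at the time a *trait* method could not, and that is exactly what a streaming `load` variant would need.

```rust
// Hypothetical sketch -- `LoadStreaming` is a made-up trait standing in
// for Diesel's query-execution traits.
trait LoadStreaming {
    // What a streaming API would like to write, but `impl Trait` in
    // trait methods was not supported in Rust at the time:
    // fn load_iter(&self) -> impl Iterator<Item = String>;
}

// By contrast, a free function may return `impl Iterator`:
fn fake_rows() -> impl Iterator<Item = String> {
    (1..=3).map(|id| format!("row {}", id))
}

fn main() {
    // Rows are produced lazily, one at a time, instead of as one Vec.
    for row in fake_rows() {
        println!("{}", row);
    }
}
```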

@weiznich weiznich closed this as completed Jul 5, 2018

perzanko commented Apr 2, 2019

Has something changed in this area?


ruffsl commented Oct 8, 2019

As I'm new to both Rust and SQL: are there any patterns in Diesel for efficiently iterating over large SQL tables to post-process or update rows? I'm thinking streaming queries would serve this purpose, but I haven't used databases enough to know what design patterns exist for this task.

Before Diesel, I was using `query_map` in rusqlite, though I think it still loads the entire `Vec` into memory.
https://docs.rs/rusqlite/0.20.0/rusqlite/struct.Statement.html#method.query_map

> (To implement this, `impl Trait` in trait functions needs to be supported in Rust.)

@weiznich, could you please link to any open tickets or Rust design PRs so I can read more on this?


weiznich commented Oct 8, 2019

Iterating again over a result set that is already loaded is much cheaper than loading the data from the database in the first place, so adding specific functionality to map results there is not worthwhile in my opinion.

That said, for update statements it is not required to load the data from the database, modify it, and write it back; you can run those queries fully on the SQL side. Basically, something like `diesel::update(users::table).set(users::follower_count.eq(users::follower_count + 1)).execute(&conn)?` will result in the following SQL: `UPDATE users SET follower_count = follower_count + 1;`

Streaming queries are blocked on an async interface. So basically #1399


ruffsl commented Oct 8, 2019

> Streaming queries are blocked on an async interface. So basically #1399

Hmm, I could see how the async interface could improve things, but I don't understand the relation to the ticket for changing .gitkeep to .keep. Did you mistype the issue number there?

> Iterating again over a result set that is already loaded is much cheaper than loading the data from the database in the first place, so adding specific functionality to map results there is not worthwhile in my opinion.

I was more concerned with RAM than disk latency, as my use case may encounter .db3 files with tables or queries that could easily exceed the total RAM available on the machine. When working with datasets around 10 GB to 50 GB, I'd like to avoid loading everything into memory at once, given the pipeline is sequential; i.e. it is more or less computing/updating a SHA-256 hash chain over the rows.
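The sequential pipeline described here can be sketched in plain Rust: consume rows one at a time from an iterator, carrying only the previous digest forward. In this sketch `DefaultHasher` stands in for SHA-256 (a real implementation would use an external crate such as `sha2`), and the array of strings stands in for a streaming row source.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Fold each row into a hash chain: every digest depends on the previous
// one, so rows can be consumed strictly one at a time without holding
// the whole table in memory.
fn chain_rows<'a, I: Iterator<Item = &'a str>>(rows: I) -> Vec<u64> {
    let mut prev: u64 = 0;
    rows.map(|row| {
        let mut h = DefaultHasher::new();
        prev.hash(&mut h); // link to the previous row's digest
        row.hash(&mut h);  // mix in this row's contents
        prev = h.finish();
        prev
    })
    .collect()
}

fn main() {
    let digests = chain_rows(["row a", "row b", "row c"].iter().copied());
    // Reordering the rows changes every digest from that point on.
    let reordered = chain_rows(["row b", "row a", "row c"].iter().copied());
    assert_eq!(digests.len(), 3);
    assert_ne!(digests, reordered);
    println!("chain ok: {:?}", digests);
}
```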


weiznich commented Oct 8, 2019

> Hmm, I could see how the async interface could improve things, but I don't understand the relation to the ticket for changing .gitkeep to .keep. Did you mistype the issue number there?

Yes, I mistyped the number; it should be #399.

> I was more concerned with RAM than disk latency, as my use case may encounter .db3 files with tables or queries that could easily exceed the total RAM available on the machine. When working with datasets around 10 GB to 50 GB, I'd like to avoid loading everything into memory at once, given the pipeline is sequential; i.e. it is more or less computing/updating a SHA-256 hash chain over the rows.

If you are using SQLite it is even simpler than I suggested above. You can register an arbitrary Rust function as a `sql_function!` there. SQLite will then call your provided function when you execute the corresponding SQL.
So you would do something like `UPDATE your_table SET column = calculate_sha256(column)`, where `calculate_sha256` is your provided function. You don't need to load anything then; SQLite will handle the rest for you.
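A rough, uncompiled sketch of that approach with Diesel's SQLite backend. Treat the names here as assumptions: `your_table`, `column`, and `hash_hex` are made up for illustration, and the exact `sql_function!` syntax and the generated `register_impl` registration hook depend on having a Diesel version with custom SQLite function support.

```rust
use diesel::prelude::*;
use diesel::sqlite::SqliteConnection;

// Declare a SQL function that SQLite will resolve to a Rust closure.
sql_function!(fn calculate_sha256(x: Text) -> Text);

fn rehash_all(conn: &SqliteConnection) -> QueryResult<usize> {
    // Provide the Rust implementation; `hash_hex` is a hypothetical
    // SHA-256 helper (e.g. built on the `sha2` crate).
    calculate_sha256::register_impl(conn, |x: String| hash_hex(&x))?;

    // SQLite calls `calculate_sha256` once per row; no row data is
    // loaded into the application beyond what the closure receives.
    diesel::sql_query("UPDATE your_table SET column = calculate_sha256(column)")
        .execute(conn)
}
```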

ruffsl commented Oct 8, 2019

Thanks @weiznich! I don't want to derail this issue, so I'll continue iterating over at dledr/bbr_ros2#9.
Feel free to drop a suggestion on an SQL query, or the terminology to describe lagging over the rows.
