
Streaming queries? #1775

Closed
spease opened this issue Jul 5, 2018 · 7 comments
Comments

spease commented Jul 5, 2018

Is there a way to process rows as they're received, i.e. a way to get an iterator for a query rather than a vector? Or is this technically impossible due to how the low-level interface works? Thanks.


weiznich commented Jul 5, 2018

It is currently not possible. A later version of Diesel may add support for this feature.
I will close this issue now, because there is nothing actionable for the Diesel team here.
(To implement this, `impl Trait` in trait functions needs to be supported in Rust.)
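A minimal plain-Rust sketch of that language limitation (hypothetical names, not Diesel's real API): a free function can return `impl Iterator` today, but at the time a *trait* method could not, and that is exactly what a streaming `load` variant would need.

```rust
// Hypothetical sketch -- `LoadStreaming` is a made-up trait standing in
// for Diesel's query-execution traits.
trait LoadStreaming {
    // What a streaming API would like to write, but `impl Trait` in
    // trait methods was not supported in Rust at the time:
    // fn load_iter(&self) -> impl Iterator<Item = String>;
}

// By contrast, a free function may return `impl Iterator`:
fn fake_rows() -> impl Iterator<Item = String> {
    (1..=3).map(|id| format!("row {}", id))
}

fn main() {
    // Rows are produced lazily, one at a time, instead of as one Vec.
    for row in fake_rows() {
        println!("{}", row);
    }
}
```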

@weiznich weiznich closed this as completed Jul 5, 2018

perzanko commented Apr 2, 2019

Has something changed in this area?


ruffsl commented Oct 8, 2019

As I'm new to both Rust and SQL: are there any patterns in Diesel for efficiently iterating over large SQL tables to post-process or update rows? I'm thinking streaming queries would serve this purpose, but I haven't used databases enough to know what design patterns exist for this task.

Before Diesel, I was using `query_map` in rusqlite, though I think it still loads the entire `Vec` into memory.
https://docs.rs/rusqlite/0.20.0/rusqlite/struct.Statement.html#method.query_map

> (To implement this, `impl Trait` in trait functions needs to be supported in Rust.)

@weiznich, could you please link to any open tickets or Rust design PRs so I can read more on this?


weiznich commented Oct 8, 2019

Iterating again over a result set that is already loaded is much cheaper than loading the data from the database in the first place, so adding specific functionality to map results there is not worthwhile in my opinion.

That said, for update statements it is not required to load the data from the database, modify it, and write it back; you can run those queries fully on the SQL side. Basically, something like `diesel::update(users::table).set(users::follower_count.eq(users::follower_count + 1)).execute(&conn)?` will result in the following SQL: `UPDATE users SET follower_count = follower_count + 1;`

Streaming queries are blocked on an async interface. So basically #1399


ruffsl commented Oct 8, 2019

> Streaming queries are blocked on an async interface. So basically #1399

Hmm, I could see how the async interface could improve things, but I don't understand the relation to the ticket for changing .gitkeep to .keep. Did you mistype the issue number there?

> Iterating again over a result set that is already loaded is much cheaper than loading the data from the database in the first place, so adding specific functionality to map results there is not worthwhile in my opinion.

I was more concerned with RAM than disk latency, as my use case may encounter .db3 files with tables or queries that could easily exceed the total RAM available on the machine. When working with datasets around 10 GB to 50 GB, I'd like to avoid loading everything into memory at once, given the pipeline is sequential; i.e. it is more or less computing/updating a SHA-256 hash chain over the rows.
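The sequential pipeline described here can be sketched in plain Rust: consume rows one at a time from an iterator, carrying only the previous digest forward. In this sketch `DefaultHasher` stands in for SHA-256 (a real implementation would use an external crate such as `sha2`), and the array of strings stands in for a streaming row source.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Fold each row into a hash chain: every digest depends on the previous
// one, so rows can be consumed strictly one at a time without holding
// the whole table in memory.
fn chain_rows<'a, I: Iterator<Item = &'a str>>(rows: I) -> Vec<u64> {
    let mut prev: u64 = 0;
    rows.map(|row| {
        let mut h = DefaultHasher::new();
        prev.hash(&mut h); // link to the previous row's digest
        row.hash(&mut h);  // mix in this row's contents
        prev = h.finish();
        prev
    })
    .collect()
}

fn main() {
    let digests = chain_rows(["row a", "row b", "row c"].iter().copied());
    // Reordering the rows changes every digest from that point on.
    let reordered = chain_rows(["row b", "row a", "row c"].iter().copied());
    assert_eq!(digests.len(), 3);
    assert_ne!(digests, reordered);
    println!("chain ok: {:?}", digests);
}
```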


weiznich commented Oct 8, 2019

> Hmm, I could see how the async interface could improve things, but I don't understand the relation to the ticket for changing .gitkeep to .keep. Did you mistype the issue number there?

Yes, I mistyped the number; it should be #399.

> I was more concerned with RAM than disk latency, as my use case may encounter .db3 files with tables or queries that could easily exceed the total RAM available on the machine. When working with datasets around 10 GB to 50 GB, I'd like to avoid loading everything into memory at once, given the pipeline is sequential; i.e. it is more or less computing/updating a SHA-256 hash chain over the rows.

If you are using SQLite it is even simpler than I suggested above. You can register an arbitrary Rust function as a `sql_function!` there. SQLite will then call your provided function when you execute the corresponding SQL.
So you would do something like `UPDATE your_table SET column = calculate_sha256(column)`, where `calculate_sha256` is your provided function. You don't need to load anything then; SQLite will handle the rest for you.
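A rough, uncompiled sketch of that approach with Diesel's SQLite backend. Treat the names here as assumptions: `your_table`, `column`, and `hash_hex` are made up for illustration, and the exact `sql_function!` syntax and the generated `register_impl` registration hook depend on having a Diesel version with custom SQLite function support.

```rust
use diesel::prelude::*;
use diesel::sqlite::SqliteConnection;

// Declare a SQL function that SQLite will resolve to a Rust closure.
sql_function!(fn calculate_sha256(x: Text) -> Text);

fn rehash_all(conn: &SqliteConnection) -> QueryResult<usize> {
    // Provide the Rust implementation; `hash_hex` is a hypothetical
    // SHA-256 helper (e.g. built on the `sha2` crate).
    calculate_sha256::register_impl(conn, |x: String| hash_hex(&x))?;

    // SQLite calls `calculate_sha256` once per row; no row data is
    // loaded into the application beyond what the closure receives.
    diesel::sql_query("UPDATE your_table SET column = calculate_sha256(column)")
        .execute(conn)
}
```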

ruffsl commented Oct 8, 2019

Thanks @weiznich! I don't want to derail this issue, so I'll continue iterating over at dledr/bbr_ros2#9.
Feel free to drop a suggestion on an SQL query, or the terminology to describe lagging over the rows.
