How to use Postgres MERGE Command with Diesel? #4323
-
Hi, I have a metadata store in Postgres which I fully manage with Diesel. So far it has been great, but I am currently working towards adding a continuous update mechanism that runs every 12 hours. By observation, the change set is rather small, i.e. adding missing fields or some new metadata, but overall nothing too crazy, and I estimate that about 90% of the existing data remains unaffected by each update. Unfortunately, the metadata source is external and does not provide any CDC, which means I don't know which of the incoming data have changed, were added, or were removed. Therefore, I am left with three options:
MERGE was added in PG 15 and allows you to take data with the same shape as the target table, merge it into that table, and then update, insert, or delete rows depending on some rules. It is exactly what I need to update my metadata. I did not see any direct support for MERGE in Diesel, so I suppose I have two options. A) Create a temp table with the new metadata, merge the temp table into the actual table, and then just delete the temp table. This looks sensible to me because I can verify that the new metadata has been correctly inserted (just count), and then verify that the row count of the updated target table matches the count of the temp table before dropping it. Also, if something goes really wrong the hard way, i.e. a power outage, I can still resort to deleting everything in the target table and inserting the new metadata directly from the temp table. This should never ever happen, but it is always good to have a contingency plan on file. B) Generate a giant parametrized raw query and send it to Postgres. A raw query is quite problematic since some tables have tens of thousands of metadata rows, and if the raw query gets stuck or encounters an error during execution, I end up wrapping everything into transactions, 99% of which aren't really needed since there is zero chance of concurrent writes: the metadata store is read-only by default and only the import/update agent is allowed to write. Not sure what the best way forward is, but on paper Postgres MERGE looks great to me. Any words of wisdom anyone can share?
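For reference, option A can be sketched in plain SQL. The table and column names below (`metadata`, `metadata_staging`, `id`, `value`) are hypothetical stand-ins for your schema:

```sql
-- Hypothetical schema: metadata(id, value). Stage the incoming snapshot,
-- merge it into the target table, then drop the staging table.
CREATE TEMP TABLE metadata_staging (LIKE metadata INCLUDING ALL);
-- COPY/INSERT the incoming snapshot into metadata_staging here ...

MERGE INTO metadata AS t
USING metadata_staging AS s
ON t.id = s.id
WHEN MATCHED AND t.value IS DISTINCT FROM s.value THEN
    UPDATE SET value = s.value
WHEN NOT MATCHED THEN
    INSERT (id, value) VALUES (s.id, s.value);

-- Note: rows that disappeared from the new snapshot are not touched by the
-- statement above; handling removals needs a separate DELETE (or the
-- WHEN NOT MATCHED BY SOURCE clause, which only landed in PG 17).
DROP TABLE metadata_staging;
```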
-
In addition to the solutions you already listed, it's also possible to build your own DSL extension outside of diesel. In contrast to upstreaming the implementation to diesel, this has the advantage that you only need to implement the parts of the SQL that you really need, and you can play fast and loose with the type constraints there. That would likely be a bit safer than having a "giant parametrized raw query".
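The DSL-extension idea can be sketched roughly like this. The struct below just renders the MERGE SQL as a string so the sketch is self-contained; all names are illustrative, and a real diesel extension would instead implement the `QueryFragment` trait and emit bind parameters through its AST pass rather than formatting strings:

```rust
// Illustrative sketch only: a tiny "DSL" for a MERGE upsert that renders SQL
// text. A real diesel extension would implement QueryFragment/QueryId and
// write bound parameters via AstPass instead of string formatting.
struct MergeUpsert<'a> {
    target: &'a str,
    source: &'a str,
    key: &'a str,
}

impl<'a> MergeUpsert<'a> {
    fn to_sql(&self) -> String {
        format!(
            "MERGE INTO {t} AS t USING {s} AS s ON t.{k} = s.{k} \
             WHEN MATCHED THEN UPDATE SET value = s.value \
             WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)",
            t = self.target,
            s = self.source,
            k = self.key
        )
    }
}

fn main() {
    let q = MergeUpsert {
        target: "metadata",
        source: "metadata_staging",
        key: "id",
    };
    println!("{}", q.to_sql());
}
```

The point of owning the DSL is that you only model the one MERGE shape you actually need, instead of covering the full grammar.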
-
Thank you, that was super helpful. I built an initial version of the update process. Basically, I implemented all the Diesel CRUD ops over types in one crate, i.e. pg_metadata, and then wrote a manager with a connection pool in another crate, i.e. pg_metadata_manager. The thing is, those manager crates are embedded in my microservices, hence the connection pooling. Also, as a nice byproduct, the bulk of the API signature is immutable, since you can get a mutable reference from a pooled connection, but not so easily from a singular PgConnection. Testing the low-level implementation in pg_metadata is trivial thanks to the dangling-transaction pattern; I really can't stress enough how easy the generated Diesel DSL is to test. However, testing the entire update process that uses pg_metadata_manager has turned out to be tricky. The core problem is that I can't figure out how to inject a single test connection with a test transaction. It is not quite working yet: the regular connection pooling is working fine, but the test connection is not. Any idea how to make pooled connections testable? Here is what I did so far:

pub struct PostgresMetaDataManager {
pool: Pool<ConnectionManager<PgConnection>>,
test: bool,
test_conn: Option<PooledConnection<ConnectionManager<PgConnection>>>,
}
// Normal constructor building a connection pool
pub async fn new(url: &str) -> Result<Self, PostgresDBError> {
Self::build(false, url).await
}
// Testing constructor
pub async fn with_test_transaction(url: &str) -> Result<Self, PostgresDBError> {
Self::build(true, url).await
}
async fn build(
test: bool,
url: &str,
) -> Result<Self, PostgresDBError> {
// create a new connection pool
let pool = match Pool::builder()
.test_on_check_out(true)
.max_size(10)
.build(ConnectionManager::<PgConnection>::new(url))
{
Ok(pool) => pool,
Err(e) => {
return Err(PostgresDBError::ConnectionFailed(e.to_string()));
}
};
// However, if we have a test case, create a new connection and begin a test transaction.
let test_conn = if test {
let mut conn = pool.get().expect("Failed to get connection from pool");
conn.begin_test_transaction()
.expect("[PostgresMDDBManager]: Failed to begin test transaction");
Some(conn)
} else {
// If non-test case, return None
None
};
Ok(Self { pool, test, test_conn })
}
impl PostgresMetaDataManager {
pub(crate) fn get_connection(&self) -> impl Connection {
if self.test {
// This doesn't work b/c self cannot move
self.test_conn.unwrap();
// This doesn't work either b/c PooledConnection does not implement clone
self.test_conn.unwrap().clone();
} else {
// In a normal context, just return a connection from the pool
self.pool.get().expect("Failed to get connection from pool")
}
}
}
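One way around the move/clone problem, sketched with a stand-in type since diesel's PooledConnection is neither Clone nor Copy: keep the test connection behind Arc&lt;Mutex&lt;...&gt;&gt; and hand out lock guards, so every caller shares the single connection that holds the test transaction. All names here are illustrative, not your actual API:

```rust
use std::sync::{Arc, Mutex, MutexGuard};

// Stand-in for a diesel PooledConnection; the real type is not Clone,
// which is exactly why it is shared behind Arc<Mutex<...>> here.
struct StubConnection {
    in_test_transaction: bool,
}

struct Manager {
    test_conn: Option<Arc<Mutex<StubConnection>>>,
}

impl Manager {
    // Every caller locks the same shared connection, so all test queries
    // run inside the single test transaction begun at construction time.
    fn get_test_connection(&self) -> MutexGuard<'_, StubConnection> {
        self.test_conn
            .as_ref()
            .expect("manager was not built in test mode")
            .lock()
            .expect("connection mutex poisoned")
    }
}

fn main() {
    let mgr = Manager {
        test_conn: Some(Arc::new(Mutex::new(StubConnection {
            in_test_transaction: true,
        }))),
    };
    let conn = mgr.get_test_connection();
    println!("{}", conn.in_test_transaction); // prints "true"
}
```

The trade-off is that test mode and normal mode return different connection types, so the manager's query methods need to accept `&mut` to the underlying connection rather than returning one concrete type.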
For tests with a pool you want to call Connection::begin_test_transaction on checkout. The diesel master branch provides a test customizer implementation for this: https://docs.diesel.rs/master/diesel/r2d2/struct.TestCustomizer.html
That's not released yet, but you can likely just copy the implementation and look at it to understand how it works.
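The customizer idea can be sketched generically with stand-in types. In real code you would implement diesel's r2d2 connection-customizer hook and register it on the pool builder; the trait and type names below are illustrative only:

```rust
// Sketch of the "customize on checkout" idea with stand-in types. In real
// code the pool invokes the customizer for each newly created connection,
// so every connection handed to a test already sits in a test transaction.
struct StubConnection {
    in_test_transaction: bool,
}

impl StubConnection {
    // Stand-in for diesel's Connection::begin_test_transaction, which opens
    // a transaction that is never committed, so tests always roll back.
    fn begin_test_transaction(&mut self) {
        self.in_test_transaction = true;
    }
}

trait Customizer {
    // Called by the pool every time a fresh connection is acquired.
    fn on_acquire(&self, conn: &mut StubConnection);
}

struct TestTransactionCustomizer;

impl Customizer for TestTransactionCustomizer {
    fn on_acquire(&self, conn: &mut StubConnection) {
        conn.begin_test_transaction();
    }
}

fn main() {
    let mut conn = StubConnection {
        in_test_transaction: false,
    };
    TestTransactionCustomizer.on_acquire(&mut conn);
    println!("{}", conn.in_test_transaction); // prints "true"
}
```

With this approach the manager no longer needs a separate test_conn field: the pool itself produces test-transaction connections, so get_connection can stay a single code path.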