UPSERT Support #505

Open
miramar-labs opened this issue Oct 26, 2022 · 4 comments

Comments

@miramar-labs

I'm using the JDBC Source connector to bring an MSSQL table into Kafka as a topic.
I have two sink connectors configured on that topic: the JDBC Sink connector (pushing the data back out to an identical table in MSSQL) and the Snowflake Sink Connector (trying to do the same, but into a Snowflake table).
The JDBC sink handles upserts just fine: if the source table contains 3 rows and I update one of them, the destination table still contains exactly 3 rows, with that one row updated.
The Snowflake sink does not work this way: updates appear in the destination table as new rows (with the same key but different timestamps).
Q: How can I do a proper UPSERT into a Snowflake table? Or is this not supported?

@miramar-labs
Author

I have also tried creating a stream on the raw VARIANT Snowflake table, with the idea that a task could periodically run a MERGE into a target table. But no matter whether I INSERT or UPDATE rows in the source database, the stream always shows the records as INSERTs, so the merge won't work in that case.
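
In hindsight, that stream behavior seems expected: the connector only ever appends, so every Kafka record lands as a brand-new row in the VARIANT table, and the stream correctly reports an INSERT even when the upstream operation was an UPDATE. A task-driven MERGE can still work if it dedupes on the business key inside the payload rather than relying on the stream's change type. A rough sketch of the idea (RAW_EVENTS, RAW_EVENTS_STREAM, TARGET_TABLE and the id/name fields are hypothetical names; RECORD_METADATA and RECORD_CONTENT are the connector's default columns):

    -- Consume the connector's landing table through a stream and fold the
    -- latest version of each key into the target table.
    CREATE OR REPLACE STREAM raw_events_stream ON TABLE raw_events;

    MERGE INTO target_table t
    USING (
        -- Keep only the newest record per key, using the Kafka offset as the tiebreaker.
        SELECT record_content:id::NUMBER   AS id,
               record_content:name::STRING AS name
        FROM raw_events_stream
        QUALIFY ROW_NUMBER() OVER (
            PARTITION BY record_content:id
            ORDER BY record_metadata:offset::NUMBER DESC) = 1
    ) s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET t.name = s.name
    WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name);

Wrapping that MERGE in a CREATE TASK ... WHEN SYSTEM$STREAM_HAS_DATA('RAW_EVENTS_STREAM') clause would handle the scheduling, and because the MERGE reads from the stream as a DML source, it also advances the stream's offset.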

@lyuboxa

lyuboxa commented Jan 16, 2024

Are there any plans to support update/upsert functionality in the near future?
I'm also curious how that would interact with Snowpipe Streaming.

@airbots

airbots commented Aug 5, 2024

Could this be resolved with a delete followed by an insert?
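
In principle a delete-then-insert inside a single transaction behaves like an upsert, assuming the new row versions have already landed (and been deduped) in some queryable staging table. A sketch with hypothetical table names:

    -- Remove any existing versions of the incoming keys, then insert the new versions.
    BEGIN;
    DELETE FROM target_table
      USING staging_table
      WHERE target_table.id = staging_table.id;
    INSERT INTO target_table (id, name)
      SELECT id, name FROM staging_table;
    COMMIT;

It does roughly double the DML against the target compared to a single MERGE, though, which feeds straight into the cost question below.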

@lyuboxa

lyuboxa commented Aug 6, 2024

One of the unanswered questions is cost effectiveness. A MERGE requires compute time, but so does Snowpipe processing. So what would be the most cost-effective way to get upserts:

  1. Stage files, import them via Snowpipe into a staging table, then merge the staging table into the live table
  2. Stage files, then perform the merge from the staged files directly into the live table
  3. Some other option?

Additionally, if order is to be preserved between creates/updates of the same record instance (say, merging over id), these merges have to be applied in a particular order, so this is more of a stack than a queue.
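
On the ordering point: if the merge source collapses each batch to the latest record per key (highest Kafka offset wins), within-batch ordering stops mattering, and only the batches themselves need to be applied in arrival order. A sketch of option 2 along those lines; the stage, file format, and columns are all hypothetical, and I believe a staged-file SELECT is accepted as a MERGE source, but if not, routing through a staging table (option 1) is the fallback:

    -- Merge directly from staged JSON files, keeping only the newest version per key.
    MERGE INTO live_table t
    USING (
        SELECT $1:id::NUMBER       AS id,
               $1:payload::VARIANT AS payload
        FROM @kafka_stage (FILE_FORMAT => 'json_fmt')
        QUALIFY ROW_NUMBER() OVER (
            PARTITION BY $1:id
            ORDER BY $1:offset::NUMBER DESC) = 1
    ) s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET t.payload = s.payload
    WHEN NOT MATCHED THEN INSERT (id, payload) VALUES (s.id, s.payload);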
