JDBC bridge + bigquery #154
Comments
You're not crazy :) As long as the datasource has a fully functional JDBC driver, you should be able to access it from ClickHouse via the JDBC bridge.
The overhead of the JDBC bridge is around 10% to 20% according to my earlier tests, not counting the initial query for type inference. However, I found it acceptable for processing millions of rows, or even for near-realtime monitoring (Grafana + distributed queries against various databases). For cross-region data sync, it's actually faster than a direct connection for some databases because of LZ4 compression.
Unfortunately, I'd suggest limiting its usage, for the reasons below:
Thanks for the detailed writeup! Can you link me to some more information on clickhouse-data-service? Sounds very worth checking out.
Sorry, it does not exist; I only mentioned it in ClickHouse/clickhouse-java#784 :p An alternative and more generic implementation, as far as I know, is trinodb/trino#1839.
Alright, thanks for your help! I'll leave it up to you whether or not to leave this ticket as documentation or to close it.
So, I'd love some official documentation on how to combine this with BigQuery's JDBC driver.
I just tested this... and surprisingly it works. I essentially just dumped all the JAR files from the BigQuery connector (there's a pending issue to publish it to Maven; currently it's a zip download) into the drivers directory and set up a datasource for it.
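For anyone wanting to reproduce the setup, here is a rough sketch of what the datasource definition could look like, written as a small Python script that generates the JSON file. Treat everything specific in it as an assumption to check against the JDBC bridge README and the BigQuery driver docs: the `driverUrls` / `driverClassName` / `jdbcUrl` keys, the Simba driver class name, the connection-string properties, and the `config/datasources` location. The project ID, service account, and key path are placeholders.

```python
import json
from pathlib import Path


def bigquery_datasource(name, project_id, service_account_email, key_path,
                        drivers_dir="drivers"):
    """Build a named-datasource definition for the JDBC bridge (sketch only).

    Assumptions: the key names follow the bridge's named-datasource format,
    and the driver class / URL properties follow the Simba BigQuery driver.
    """
    jdbc_url = (
        "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;"
        f"ProjectId={project_id};"
        "OAuthType=0;"  # assumed: 0 = service account authentication
        f"OAuthServiceAcctEmail={service_account_email};"
        f"OAuthPvtKeyPath={key_path}"
    )
    return {
        name: {
            # Directory holding the unzipped BigQuery connector JAR files.
            "driverUrls": [drivers_dir],
            "driverClassName": "com.simba.googlebigquery.jdbc42.Driver",
            "jdbcUrl": jdbc_url,
        }
    }


def write_datasource(definition, config_dir="config/datasources"):
    """Write the definition to <config_dir>/<name>.json and return the path."""
    name = next(iter(definition))
    path = Path(config_dir) / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(definition, indent=2))
    return path


if __name__ == "__main__":
    definition = bigquery_datasource(
        name="bigquery",                      # hypothetical datasource name
        project_id="my-project",              # placeholder
        service_account_email="sync@my-project.iam.gserviceaccount.com",
        key_path="/secrets/bigquery-key.json",
    )
    print(write_datasource(definition))
```

Once the bridge has picked up the file, the datasource name (`bigquery` here) is what you would reference from the ClickHouse side.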
For us, this might be an interesting option. We're looking into some data sync jobs between BigQuery and ClickHouse, and it would allow us to use dbt materialized views for this, for example. It probably wouldn't give optimal performance, but in terms of ease of use for our developers it would be a pretty interesting option.
How... insane is this? It feels... wrong. At the same time, it did work fairly quickly. Is it worth spending the time to tune timeouts and such so we can use this to materialize BigQuery data in ClickHouse? I'd love to avoid our teams needing Spark etc. for this.
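To make the "materialize BigQuery data in ClickHouse" idea concrete, here is a minimal sketch of the kind of statement a sync job might issue, using ClickHouse's `jdbc` table function with a datasource name and a query. The helper only builds the SQL string; the target table `analytics.events`, the datasource name `bigquery`, and the remote query are all hypothetical.

```python
def materialize_sql(target_table, datasource, remote_query):
    """Build an INSERT ... SELECT that pulls rows through the JDBC bridge.

    `remote_query` is sent to the remote database as-is, so it is embedded
    as a ClickHouse string literal with backslashes and single quotes escaped.
    """
    escaped = remote_query.replace("\\", "\\\\").replace("'", "\\'")
    return (
        f"INSERT INTO {target_table} "
        f"SELECT * FROM jdbc('{datasource}', '{escaped}')"
    )


if __name__ == "__main__":
    print(
        materialize_sql(
            "analytics.events",  # hypothetical ClickHouse table
            "bigquery",          # hypothetical named datasource
            "SELECT id, ts FROM ds.events WHERE country = 'NL'",
        )
    )
```

The resulting string can be run with whatever ClickHouse client you already use. The target table's columns need to line up with what the remote query returns, since the statement relies on `SELECT *`.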