A library for reading data from and transferring data to Greenplum databases with Apache Spark, for Spark SQL and DataFrames.
This library is 100x faster than Apache Spark's JDBC DataSource when transferring data from Spark to Greenplum databases.
Also, this library is fully transactional.
CREATE TABLE tbl
USING greenplum
options (
url "jdbc:postgresql://greenplum:5432/",
delimiter "\t",
dbschema "gptest",
dbtable "store_sales",
user 'gptest',
password 'test')
AS
SELECT * FROM tpcds_100g.store_sales WHERE ss_sold_date_sk <= 2451537 AND ss_sold_date_sk > 2451520;
CREATE TEMPORARY TABLE tbl
USING greenplum
options (
url "jdbc:postgresql://greenplum:5432/",
delimiter "\t",
dbschema "gptest",
dbtable "store_sales",
user 'gptest',
password 'test');
INSERT INTO TABLE tbl SELECT * FROM tpcds_100g.store_sales WHERE ss_sold_date_sk <= 2451537 AND ss_sold_date_sk > 2451520;
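The examples above cover writing to Greenplum. Since the library is also described as reading data from Greenplum, a read might look like the following sketch. It assumes the same options apply when reading, which the text above does not state explicitly; the table name `gp_store_sales` is a hypothetical name chosen for illustration.

```sql
-- Sketch only: assumes the write options shown above are also valid for reads.
-- gp_store_sales is a hypothetical Spark-side name, not defined by the library.
CREATE TEMPORARY TABLE gp_store_sales
USING greenplum
OPTIONS (
  url "jdbc:postgresql://greenplum:5432/",
  delimiter "\t",
  dbschema "gptest",
  dbtable "store_sales",
  user "gptest",
  password "test");

-- Query the Greenplum-backed table like any other Spark SQL table.
SELECT COUNT(*) FROM gp_store_sales WHERE ss_sold_date_sk > 2451520;
```

Each statement ends with a semicolon so the two can be run together as one script.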
For similar usage, see the "JDBC To Other Databases" section of the Spark SQL Guide.