Releases: snowflakedb/spark-snowflake

v3.1.0

19 Nov 00:23
3c85815

Improvements

  • Upgraded JDBC to 3.19.0.
  • Changed the internal file format from JSON to Parquet when loading structured data.
    • Introduced a new parameter use_json_in_structured_data, which defaults to false. When enabled, the connector reverts to using JSON.

New Features

  • Added support for the Parquet file format when loading data from Spark to Snowflake.
    • Introduced a new parameter use_parquet_in_write, which defaults to false. When enabled, the Spark connector uses only the Parquet file format when loading data from Spark to Snowflake (see the sketch after this list).
    • Introduced a new dependency, parquet-avro, with a default version of 1.13.1. Because its dependency parquet-column is also a Spark built-in library, a version incompatibility may occur at runtime. If it does, manually adjust the version of parquet-avro to resolve the conflict.
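
A minimal sketch of opting in to the new Parquet write path. The connection values, table name, and sample DataFrame are placeholders, not part of the release:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ParquetWriteExample")
  .getOrCreate()

// Placeholder connection options; replace with real values.
val sfOptions = Map(
  "sfURL" -> "<account>.snowflakecomputing.com",
  "sfUser" -> "<user>",
  "sfPassword" -> "<password>",
  "sfDatabase" -> "<database>",
  "sfSchema" -> "<schema>",
  "sfWarehouse" -> "<warehouse>"
)

val df = spark.range(10).toDF("ID")

// Opt in to the Parquet staging format introduced in v3.1.0.
df.write
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("use_parquet_in_write", "true")
  .option("dbtable", "TARGET_TABLE")
  .mode("append")
  .save()
```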

v3.0.0

31 Jul 19:31
384dccf
Improvements

  • Upgraded JDBC to 3.17.0 to support large objects (LOBs).
  • Added support for Spark 3.5.0.
  • Removed the Advanced Query Pushdown feature.
    • Starting with version 3.0.0, each Spark connector release ships a single artifact that is compatible with most Spark versions.
    • The old version of the Spark connector (2.x.x) will continue to be supported for up to two years.
    • A conversion tool that converts DataFrames between Spark and Snowpark will be introduced in a future Spark connector release as an alternative to the Advanced Query Pushdown feature.

Bug Fixes

  • Removed the requirement for the SFUSER parameter when using OAuth (see the sketch below).
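
A hedged sketch of an OAuth read that no longer supplies SFUSER. It assumes the connector's sfAuthenticator and sfToken options; obtaining the token itself is out of scope, and `spark` is a SparkSession as in the v3.1.0 sketch above:

```scala
// Hedged sketch: sfAuthenticator/sfToken are the assumed option names
// for OAuth; <oauth_token> is a placeholder for an externally acquired token.
val df = spark.read
  .format("net.snowflake.spark.snowflake")
  .option("sfURL", "<account>.snowflakecomputing.com")
  .option("sfAuthenticator", "oauth")
  .option("sfToken", "<oauth_token>")
  .option("sfDatabase", "<database>")
  .option("sfSchema", "<schema>")
  .option("sfWarehouse", "<warehouse>")
  .option("dbtable", "MY_TABLE")
  .load()
```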

Release Spark Connector 2.16.0

03 Jun 17:10
0468f3e

Bug Fixes

  1. Fixed an issue where the proxy protocol setting accidentally affected the S3 protocol.

Improvements

  1. Upgraded JDBC to 3.16.1.
  2. Cleaned up legacy Spark Streaming code.
  3. Disabled abort_detached_query at the session level by default (see the sketch below).
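
For jobs that depend on the previous behavior, one plausible workaround is to restore the session parameter through the connector's preactions option; a sketch, reusing the placeholder `df` and `sfOptions` from the v3.1.0 sketch above:

```scala
// Restores ABORT_DETACHED_QUERY for this job's session, since v2.16.0
// disables it at the session level by default. This workaround is an
// assumption, not part of the release notes.
df.write
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("preactions", "ALTER SESSION SET ABORT_DETACHED_QUERY = TRUE")
  .option("dbtable", "TARGET_TABLE")
  .mode("append")
  .save()
```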

Release Spark Connector 2.15.0

23 Feb 18:47
3a26f61

Bug Fixes

  • Fixed an issue where cancelled queries could be restarted by Spark retries after the application had closed.

New Features

  • Introduced a new parameter trim_space, which defaults to false. When enabled, the Spark connector automatically trims values of StringType columns when saving to a Snowflake table (see the sketch below).
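
A minimal sketch of enabling trim_space on write; `spark` and the placeholder `sfOptions` are as in the v3.1.0 sketch above, and the table name is a stand-in:

```scala
import spark.implicits._

// A StringType column with surrounding whitespace to be trimmed on save.
val padded = Seq("  value with spaces  ").toDF("NAME")

padded.write
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("trim_space", "true") // default is false
  .option("dbtable", "TARGET_TABLE")
  .mode("append")
  .save()
```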

Release Spark Connector 2.14.0

18 Jan 00:04
d7f8e98

Improvements

  • Upgraded JDBC to 3.14.4

New Features

  • New parameter string_timestamp_format
    • Specifies the timestamp format used when saving string columns of a Spark DataFrame to timestamp columns of a Snowflake table.
    • The default value is TZHTZM YYYY-MM-DD HH24:MI:SS.FF9.
    • Details of the supported timestamp formats can be found here.
    • If the source DataFrame contains timestamp columns, this parameter is reset to the default value and cannot be overridden (see the sketch below).
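
A sketch of overriding string_timestamp_format on a write whose source column is a string (so the override is not reset); the format string, table name, and sample data are only examples, with `spark` and `sfOptions` as in the v3.1.0 sketch above:

```scala
import spark.implicits._

// The source column is a string; the target Snowflake column is TIMESTAMP.
val events = Seq("2024-01-15 08:30:00").toDF("EVENT_TS")

events.write
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("string_timestamp_format", "YYYY-MM-DD HH24:MI:SS") // example format
  .option("dbtable", "EVENTS")
  .mode("append")
  .save()
```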

Release Spark Connector 2.13.0

19 Sep 18:10
9428e03

Bug Fixes

  • Fixed an issue where binary data could not be uploaded from Spark to Snowflake if the source DataFrame contained struct columns.

Release Spark Connector 2.12.0

23 May 20:30
fedb7a3

Support Spark 3.4:

  • Added support for Spark 3.4

NOTE:

  • Starting with version 2.12.0, the Snowflake Connector for Spark supports Spark 3.2, 3.3, and 3.4.
    Version 2.12.0 of the Snowflake Connector for Spark does not support Spark 3.1; note that previous versions of the connector continue to support Spark 3.1.

Release Spark Connector 2.11.3

21 Apr 16:23

Updated the mechanism for writing DataFrames to accounts on GCP:

  1. Updated the mechanism for writing DataFrames to accounts on GCP. After December 2023, previous versions of the Spark Connector will no longer be able to write DataFrames, due to changes in GCP.
  2. Added the option to disable preactions and postactions validation for session sharing (see the sketch after this list).
    • To disable validation, set the option FORCE_SKIP_PRE_POST_ACTION_CHECK_FOR_SHARED_SESSION to true. The default is false.
    • Important: Before setting this option, make sure that the queries in preactions and postactions do not affect the session settings. Otherwise, you may encounter issues with results.
  3. Fixed an issue when performing a join or union across different schemas, where the two DataFrames access tables with different sfSchema settings and a table with the same name as the one in the left DataFrame also exists in the left DataFrame's sfSchema.
  4. Updated the connector to use the Snowflake JDBC driver 3.13.30.
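
A hedged sketch of disabling the validation for a write whose preactions do not touch session settings; the DELETE statement and table name are stand-ins, with `df`, `spark`, and `sfOptions` as in the v3.1.0 sketch above:

```scala
// Opt out of preactions/postactions validation for session sharing.
// Only safe when those queries do not alter session settings.
df.write
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("FORCE_SKIP_PRE_POST_ACTION_CHECK_FOR_SHARED_SESSION", "true")
  .option("preactions", "DELETE FROM TARGET_TABLE WHERE ID < 0") // stand-in query
  .option("dbtable", "TARGET_TABLE")
  .mode("append")
  .save()
```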

Release Spark Connector 2.11.2

20 Mar 22:54

Added support for sharing JDBC connections:

  1. Added support for using the same JDBC connection for different jobs and actions when the same Spark Connector options are used to access Snowflake (see the sketch after this list).
    In previous versions, the Spark Connector created a new JDBC connection for each job or action.
    The Spark Connector supports the following options and API methods for enabling and disabling this feature:

    • To specify that the connector should not use the same JDBC connection, set the support_share_connection connector option to false. (The default value is true, which means that the feature is enabled.)
    • To enable or disable the feature programmatically, call one of the following global static functions: SparkConnectorContext.disableSharedConnection() / SparkConnectorContext.enableSharingJDBCConnection().
    • Note: In the following special cases, the Spark Connector will not use the shared connection:
      • If preactions or postactions are set, and those preactions or postactions are not CREATE TABLE, DROP TABLE, or MERGE INTO, the Spark Connector will not use the shared connection.
      • Utility functions in Utils, such as Utils.runQuery() and Utils.getJDBCConnection(), will not use the shared connection.
  2. Updated the connector to use the Snowflake JDBC driver 3.13.29.
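
A sketch of both approaches. The static functions are named in the notes above; the import path is assumed to be the connector's root package, and `spark` and `sfOptions` are as in the v3.1.0 sketch:

```scala
// Assumed import location for the connector's context object.
import net.snowflake.spark.snowflake.SparkConnectorContext

// Globally disable connection sharing, then re-enable it.
SparkConnectorContext.disableSharedConnection()
SparkConnectorContext.enableSharingJDBCConnection()

// Alternatively, opt a single job out via the connector option.
val df = spark.read
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("support_share_connection", "false") // default is true
  .option("dbtable", "MY_TABLE")
  .load()
```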

Release Spark Connector 2.11.1

13 Dec 21:28

Added support for AWS VPCE deployments and fixed some bugs:

  1. Added support for AWS VPCE. Added the configuration option S3_STAGE_VPCE_DNS_NAME for specifying the VPCE DNS name at the session level.
  2. Updated the connector to close JDBC connections to avoid connection leakage.
  3. Fixed a NullPointerException issue when sending telemetry messages.
  4. Added a new configuration option treat_decimal_as_long to enable the Spark Connector to return Long values instead of BigDecimal values when a query returns Decimal(<any_precision>, 0) (see the sketch after this list). WARNING: If the value is greater than the maximum value of Long, an error will be raised.
  5. Added a new option proxy_protocol for specifying the proxy protocol (http or https) with AWS deployments. (The option has no effect on Azure and GCP deployments.)
  6. Added support for counting rows in a table where the row count is greater than the maximum value of Integer.
  7. Updated the connector to use the Snowflake JDBC driver 3.13.24.
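
A sketch combining two of the new options on a read; the query and table name are stand-ins, with `spark` and the placeholder `sfOptions` as in the v3.1.0 sketch above:

```scala
val counts = spark.read
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("treat_decimal_as_long", "true") // Decimal(p, 0) -> Long; errors past Long.MaxValue
  .option("proxy_protocol", "http")        // AWS deployments only; no effect on Azure/GCP
  .option("query", "SELECT COUNT(*) AS ROW_COUNT FROM MY_TABLE")
  .load()
```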