Releases: snowflakedb/spark-snowflake
v3.1.0
Improvements
- Upgraded JDBC to 3.19.0.
- Changed the internal file format from JSON to Parquet when loading structured data.
- Introduced a new parameter use_json_in_structured_data, which defaults to false. When enabled, the Parquet change is reverted and JSON is used again for structured data.
New Features
- Supported the Parquet file format when loading data from Spark to Snowflake.
- Introduced a new parameter use_parquet_in_write, which defaults to false. When enabled, the Spark connector only uses the Parquet file format when loading data from Spark to Snowflake (see the sketch below).
- Introduced a new dependency, parquet-avro. The default version is 1.13.1. Because its dependency, parquet-column, is a Spark built-in library, an incompatibility issue may occur at runtime. Manually adjust the version of parquet-avro to fix this issue.
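A minimal sketch of opting in to the Parquet write path via use_parquet_in_write, assuming it is passed like any other connector option; the connection values are placeholders.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("sfc-parquet-write").getOrCreate()
import spark.implicits._

// Placeholder connection options; substitute real account values.
val sfOptions = Map(
  "sfURL"       -> "<account>.snowflakecomputing.com",
  "sfUser"      -> "<user>",
  "sfPassword"  -> "<password>",
  "sfDatabase"  -> "<database>",
  "sfSchema"    -> "<schema>",
  "sfWarehouse" -> "<warehouse>"
)

val df = Seq((1, "a"), (2, "b")).toDF("id", "val")

df.write
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("use_parquet_in_write", "true") // write via Parquet instead of the default format
  .option("dbtable", "TARGET_TABLE")
  .mode("append")
  .save()
```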
v3.0.0
Improvements
- Upgraded JDBC to 3.17.0 to support LOB.
- Supports Spark 3.5.0.
- Removed the Advanced Query Pushdown feature.
- Since version 3.0.0, the Spark connector has only one artifact per release, which is compatible with most Spark versions.
- The old version of the Spark connector (2.x.x) will continue to be supported for up to 2 years.
- A conversion tool that converts DataFrames between Spark and Snowpark will be introduced in a future Spark connector release as an alternative to the Advanced Query Pushdown feature.
Bug Fixes
- Removed the requirement of the SFUSER parameter when using OAUTH (see the sketch below).
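A minimal sketch of a read that authenticates with OAuth and omits SFUSER, assuming the standard sfAuthenticator and sfToken connector options; the token and connection values are placeholders, and spark is an existing SparkSession.

```scala
// OAuth connection options without sfUser (placeholder values).
val oauthOptions = Map(
  "sfURL"           -> "<account>.snowflakecomputing.com",
  "sfDatabase"      -> "<database>",
  "sfSchema"        -> "<schema>",
  "sfWarehouse"     -> "<warehouse>",
  "sfAuthenticator" -> "oauth",
  "sfToken"         -> "<oauth_access_token>"
)

val oauthDf = spark.read
  .format("net.snowflake.spark.snowflake")
  .options(oauthOptions)
  .option("dbtable", "SOURCE_TABLE")
  .load()
```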
Release Spark Connector 2.16.0
Bug Fixes
- Fixed an issue where the proxy protocol setting accidentally impacted the S3 protocol.
Improvements
- Upgrade JDBC to 3.16.1
- Clean up legacy Spark streaming code
- Disable abort_detached_query at the session level by default
Release Spark Connector 2.15.0
Bug Fixes
- Fixed "cancelled queries can be restarted in the Spark retries after application closed"
New Features
- Introduced a new parameter trim_space, which defaults to false. When enabled, the Spark connector automatically trims the values of StringType columns when saving to a Snowflake table (see the sketch below).
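A minimal sketch of enabling trim_space on a write; sfOptions and df are the placeholder connection map and DataFrame from the earlier sketch.

```scala
df.write
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("trim_space", "true") // trim values of StringType columns on save
  .option("dbtable", "TARGET_TABLE")
  .mode("append")
  .save()
```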
Release Spark Connector 2.14.0
Improvement
- Upgraded JDBC to 3.14.4
New Features
- New parameter string_timestamp_format (see the sketch below)
  - Specifies the timestamp format used when saving string columns of a Spark DataFrame to timestamp columns of a Snowflake table.
  - The default value is TZHTZM YYYY-MM-DD HH24:MI:SS.FF9.
  - Details on supported timestamp formats can be found in the Snowflake documentation.
  - If the source DataFrame contains timestamp columns, this parameter is reset to the default value and can't be overwritten.
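A minimal sketch of overriding string_timestamp_format for a write whose string column holds timestamps; the format value is illustrative, and sfOptions is the placeholder connection map from the earlier sketch. Recall that the override is reset to the default when the source DataFrame already contains timestamp columns.

```scala
// "created_at" is a string column that lands in a Snowflake TIMESTAMP column.
val events = Seq(("2024-01-15 09:30:00.000", 42L)).toDF("created_at", "metric")

events.write
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("string_timestamp_format", "YYYY-MM-DD HH24:MI:SS.FF3") // illustrative format
  .option("dbtable", "EVENTS")
  .mode("append")
  .save()
```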
Release Spark Connector 2.13.0
Bug Fixes
- Fixed an issue where binary data could not be uploaded from Spark to Snowflake if the source DataFrame contained structured columns.
Release Spark Connector 2.12.0
Support Spark 3.4:
- Added support for Spark 3.4
NOTE:
- Starting from version 2.12.0, the Snowflake Connector for Spark supports Spark 3.2, 3.3, and 3.4.
- Version 2.12.0 of the Snowflake Connector for Spark does not support Spark 3.1. Note that previous versions of the connector continue to support Spark 3.1.
Release Spark Connector 2.11.3
Updated the mechanism for writing DataFrames to accounts on GCP:
- Updated the mechanism for writing DataFrames to accounts on GCP. After December 2023, previous versions of the Spark Connector will no longer be able to write DataFrames, due to changes in GCP.
- Added the option to disable preactions and postactions validation for session sharing.
- To disable validation, set the option FORCE_SKIP_PRE_POST_ACTION_CHECK_FOR_SHARED_SESSION to true. The default is false. (See the sketch after this list.)
- Important: Before setting this option, make sure that the queries in preactions and postactions don't affect the session settings. Otherwise, you may encounter issues with results.
- Fixed an issue when performing a join or union across different schemas, where the two DataFrames access tables with different sfSchema values and a table with the same name exists in the sfSchema of the left DataFrame.
- Updated the connector to use the Snowflake JDBC driver 3.13.30.
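A minimal sketch of disabling the preactions/postactions validation for session sharing; the preactions query is illustrative, and sfOptions/df are the placeholders from the earlier sketch. Only skip the check when the preactions/postactions do not change session settings.

```scala
df.write
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  // Illustrative preaction that does not alter session settings.
  .option("preactions", "DELETE FROM TARGET_TABLE WHERE LOAD_DATE = CURRENT_DATE")
  // Skip the session-sharing validation of preactions/postactions.
  .option("FORCE_SKIP_PRE_POST_ACTION_CHECK_FOR_SHARED_SESSION", "true")
  .option("dbtable", "TARGET_TABLE")
  .mode("append")
  .save()
```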
Release Spark Connector 2.11.2
Added support for sharing the JDBC connection:
- Added support for using the same JDBC connection for different jobs and actions when the same Spark Connector options are used to access Snowflake. In previous versions, the Spark Connector created a new JDBC connection for each job or action.
- The Spark Connector supports the following options and API methods for enabling and disabling this feature (see the sketch after this list):
  - To specify that the connector should not use the same JDBC connection, set the support_share_connection connector option to false. (The default value is true, which means that the feature is enabled.)
  - To enable or disable the feature programmatically, call one of the following global static functions: SparkConnectorContext.disableSharedConnection() / SparkConnectorContext.enableSharingJDBCConnection().
- Note: In the following special cases, the Spark Connector will not use the shared connection:
  - If preactions or postactions are set, and those preactions or postactions are not CREATE TABLE, DROP TABLE, or MERGE INTO, the Spark Connector will not use the shared connection.
  - Utility functions in Utils, such as Utils.runQuery() and Utils.getJDBCConnection(), will not use the shared connection.
- Updated the connector to use the Snowflake JDBC driver 3.13.29.
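A minimal sketch of the two ways to control connection sharing, assuming SparkConnectorContext lives in the connector's usual net.snowflake.spark.snowflake package; sfOptions and df are the placeholders from the earlier sketch.

```scala
import net.snowflake.spark.snowflake.SparkConnectorContext

// Option 1: per job, via the connector option.
df.write
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("support_share_connection", "false") // force a dedicated JDBC connection
  .option("dbtable", "TARGET_TABLE")
  .mode("append")
  .save()

// Option 2: globally, via the static functions.
SparkConnectorContext.disableSharedConnection()
// ... jobs run here each get their own JDBC connection ...
SparkConnectorContext.enableSharingJDBCConnection()
```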
Release Spark Connector 2.11.1
Added support for AWS VPCE deployments and fixed some bugs:
- Added support for AWS VPCE. Added the configuration option S3_STAGE_VPCE_DNS_NAME for specifying the VPCE DNS name at the session level.
- Updated the connector to close JDBC connections to avoid connection leakage.
- Fixed a NullPointerException issue when sending telemetry messages.
- Added a new configuration option treat_decimal_as_long to enable the Spark Connector to return Long values instead of BigDecimal values if the query returns Decimal(<any_precision>, 0). WARNING: If the value is greater than the maximum value of Long, an error will be raised.
- Added a new option proxy_protocol for specifying the proxy protocol (http or https) with AWS deployments. (The option has no effect on Azure and GCP deployments.) Both options appear in the sketch after this list.
- Added support for counting rows in a table where the row count is greater than the maximum value of Integer.
- Updated the connector to use the Snowflake JDBC driver 3.13.24.
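A minimal sketch combining treat_decimal_as_long and proxy_protocol on a read; the query is illustrative, sfOptions is the placeholder connection map from the earlier sketch, and proxy_protocol only matters on AWS deployments behind a proxy.

```scala
val orders = spark.read
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("treat_decimal_as_long", "true") // Decimal(p, 0) results come back as Long
  .option("proxy_protocol", "https")       // AWS deployments only; ignored on Azure/GCP
  .option("query", "SELECT ID, QUANTITY FROM ORDERS")
  .load()
```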