Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1055561: Check whether SMT returning null values no longer stops a data ingestion. #816

Merged
merged 8 commits into from
Apr 12, 2024

Conversation

sfc-gh-akowalczyk
Copy link
Contributor

@sfc-gh-akowalczyk sfc-gh-akowalczyk commented Apr 2, 2024

Overview

SNOW-1055561

The scenario being tested is:

  1. Configure a a connector with a SMT extracting a single field.
  2. Produce a few messages, half of them contain the extracted field while the others do not (returns null in that case).
  3. Check whether all the messages were successfully ingested.

Pre-review checklist

  • This change should be part of a Behavior Change Release. See go/behavior-change.
  • This change has passed Merge gate tests
  • Snowpipe Changes
  • Snowpipe Streaming Changes
  • This change is TEST-ONLY
  • This change is README/Javadocs only
  • This change is protected by a config parameter <PARAMETER_NAME> eg snowflake.ingestion.method.
    • Yes - Added end to end and Unit Tests.
    • No - Suggest why it is not param protected
  • Is his change protected by parameter <PARAMETER_NAME> on the server side?
    • The parameter/feature is not yet active in production (partial rollout or PrPr, see Changes for Unreleased Features and Fixes).
    • If there is an issue, it can be safely mitigated by turning the parameter off. This is also verified by a test (See go/ppp).

Copy link
Contributor

@sfc-gh-lshcharbaty sfc-gh-lshcharbaty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, use a real SnowflakeSinkConnector with a fake streamingClientHandler in tests.
Blocking until then.

Copy link
Contributor

@sfc-gh-lshcharbaty sfc-gh-lshcharbaty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice progress!
Added a few comments to make the things even neater.

import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

public class SmtIT extends ConnectClusterBaseIT {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we setup logging levels for this test? We are flooded with TRACE right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any particular solution in mind?

Copy link
Collaborator

@sfc-gh-rcheng sfc-gh-rcheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! lgtm, lets get the IT passing

Copy link
Contributor

@sfc-gh-lshcharbaty sfc-gh-lshcharbaty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job!
As discussed offline, in the future we can consider injecting FakeStreamingClientHandler as a dependency that will allow us to isolate connectors and parallelize test execution.

Comment on lines 65 to 66
connectCluster.kafka().deleteTopic(smtTopic);
connectCluster.deleteConnector(smtConnector);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we can create a topic and generate a connector name as well as clean them up in @BeforeEach/@AfterEach.

@sfc-gh-xhuang
Copy link
Collaborator

While you're testing SMT, can you also take a look at this PR: #711

and merge as needed?

@sfc-gh-akowalczyk sfc-gh-akowalczyk merged commit 255055f into master Apr 12, 2024
32 of 33 checks passed
@sfc-gh-akowalczyk sfc-gh-akowalczyk deleted the akowalczyk/snow-1055561 branch April 12, 2024 06:08
sudeshwasnik pushed a commit to confluentinc/snowflake-kafka-connector that referenced this pull request May 30, 2024
sudeshwasnik pushed a commit to confluentinc/snowflake-kafka-connector that referenced this pull request May 31, 2024
sudeshwasnik added a commit to confluentinc/snowflake-kafka-connector that referenced this pull request May 31, 2024
* SNOW-922795 Added third-party-licenses to .zip distribution (snowflakedb#784)

* PROD-39429 Add PR template (snowflakedb#786)

* Run subset of e2e tests (snowflakedb#788)

* SNOW-979849: Add offset verification logic (snowflakedb#795)

SNOW-979849

Add offset verification logic as part of channel open. We're verifying that the current start offset = previous end offset +1, if not true, then it potentially means missing or duplicate data. Note that there are some false positives that we can't avoid like when SMT is used, but this is good enough to do blast radius analysis for any corruption issue.

* SNOW-1117446 Snowpipe Streaming client provider map (snowflakedb#794)

* NO-SNOW: upgrade JDBC to 3.14.5 (snowflakedb#799)

NO-SNOW

upgrade JDBC to 3.14.5

* SNOW-1049869 Cleanup streaming ingest threads when SinkTask stop() is called (snowflakedb#800)

* [NO-SNOW] Upgrade dependency versions (snowflakedb#802)

* Upgrade connect-api to 3.7.0 (snowflakedb#803)

* SNOW-1057151: Add poc test setup for EmbeddedConnectCluster (snowflakedb#801)

* SNOW-1049869 Rewrite client optimization unit tests (snowflakedb#804)

* Release v2.2.1 (snowflakedb#805)

* SNOW-1055524 E2E test with nullable SMT (snowflakedb#808)

* SNOW-1057165  - Test-doubles for snowflake-ingest-java (snowflakedb#809)

* SNOW-1055524 E2E SMT without schema evolution (snowflakedb#811)

* SNOW-1055524 Check table schema in e2e smt (snowflakedb#813)

* SNOW-1019628 update file cleaner for snowpipe ingest (snowflakedb#807)

* SNOW-1055524 E2E SMT for a Snowpipe based connector (snowflakedb#812)

* NO-SNOW Remove performance tests (snowflakedb#817)

* SNOW-1055561: Check whether SMT returning null values no longer stops a data ingestion. (snowflakedb#816)

* Update to release 2.2.2 (snowflakedb#819)

---------

Co-authored-by: Michał Bobowski <[email protected]>
Co-authored-by: Jay Patel <[email protected]>
Co-authored-by: Toby Zhang <[email protected]>
Co-authored-by: Xin Huang <[email protected]>
Co-authored-by: Wojciech Trefon <[email protected]>
Co-authored-by: revi cheng <[email protected]>
Co-authored-by: Lex Shcharbaty <[email protected]>
Co-authored-by: Artur Chyży <[email protected]>
Co-authored-by: Greg Jachimko <[email protected]>
Co-authored-by: Adrian Kowalczyk <[email protected]>
sudeshwasnik pushed a commit to confluentinc/snowflake-kafka-connector that referenced this pull request Jun 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants