[SPARKNLP-1059] Adding aggressiveMatching parameter to DocumentSimilarityRanker #14370

Conversation

danilojsl
Contributor

@danilojsl danilojsl commented Aug 16, 2024

Description

This pull request introduces the aggregationMethod parameter to the DocumentSimilarityRanker annotator. The new parameter allows users to specify the method used to aggregate multiple sentence embeddings into a single vector representation.

Motivation and Context

The parameter lets users tailor the aggregation method to their use case: AVERAGE for a general overview, FIRST to focus on the initial context, or MAX to emphasize the strongest signals.
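
To illustrate what the three options mean, here is a minimal, library-independent sketch in plain Python. It is not the annotator's implementation: the function name `aggregate_embeddings` is hypothetical, and reading MAX as an element-wise maximum is an assumption based only on the description above.

```python
from typing import List


def aggregate_embeddings(vectors: List[List[float]], method: str = "AVERAGE") -> List[float]:
    """Collapse several sentence embeddings into one vector.

    Hypothetical helper for illustration only; it does not call Spark NLP.
    """
    if not vectors:
        raise ValueError("at least one embedding is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embeddings must have the same dimension")

    method = method.upper()
    if method == "AVERAGE":
        # Element-wise mean: a general overview of all sentences.
        return [sum(column) / len(vectors) for column in zip(*vectors)]
    if method == "FIRST":
        # Keep only the first sentence embedding: the initial context.
        return list(vectors[0])
    if method == "MAX":
        # Element-wise maximum (assumed meaning of "strongest signals").
        return [max(column) for column in zip(*vectors)]
    raise ValueError(f"unsupported aggregation method: {method}")


sentence_embeddings = [[1.0, 4.0], [3.0, 2.0]]
print(aggregate_embeddings(sentence_embeddings, "AVERAGE"))  # [2.0, 3.0]
print(aggregate_embeddings(sentence_embeddings, "FIRST"))    # [1.0, 4.0]
print(aggregate_embeddings(sentence_embeddings, "MAX"))      # [3.0, 4.0]
```

With two sentences, AVERAGE blends both vectors, FIRST ignores everything after the first sentence, and MAX keeps the largest value seen in each dimension.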

This change solves the following issues:

How Has This Been Tested?

  • Local Tests
  • Google Colab notebook

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@danilojsl danilojsl added the bug-fix and DON'T MERGE (Do not merge this PR) labels Aug 16, 2024
@danilojsl danilojsl force-pushed the bug/SPARKNLP-1059-DocumentSimilarityRanker-columns-issue branch from 3e53d4d to d20c4de Compare August 16, 2024 21:47
@danilojsl danilojsl force-pushed the bug/SPARKNLP-1059-DocumentSimilarityRanker-columns-issue branch from d20c4de to 81fcbea Compare August 16, 2024 21:49
@maziyarpanahi maziyarpanahi changed the base branch from master to release/542-release-candidate August 28, 2024 09:53
@maziyarpanahi maziyarpanahi merged commit 3a05001 into release/542-release-candidate Aug 28, 2024
6 checks passed
maziyarpanahi added a commit that referenced this pull request Aug 28, 2024
* Adding demo notebook for Image Classification Annotators (#14360)

* Upload SwinForImageClassification.ipynb

* Uploading ConvNextForImageClassification

* [SPARKNLP-1058] Adding aggressiveMatching parameter (#14365)

* [SPARKNLP-1059] Adding aggressiveMatching parameter to DocumentSimilarityRanker (#14370)

* [SPARKNLP-1059] Adding aggressiveMatching parameter to DocumentSimilarityRanker

* [SPARKNLP-1059] Updates Document Similarity Ranker notebook

* Bump to 5.4.2 [run doc]

* Update Scala and Python APIs

* update conda to 5.4.2 [skip test]

---------

Co-authored-by: Abdullah mubeen <[email protected]>
Co-authored-by: Danilo Burbano <[email protected]>
Co-authored-by: github-actions <[email protected]>
Labels
bug-fix DON'T MERGE Do not merge this PR
2 participants