feat: added split by line to DocumentSplitter #8525

Merged: 2 commits, Nov 14, 2024

Conversation

srini047
Contributor

@srini047 srini047 commented Nov 8, 2024

Related Issues

fixes #8519

Proposed Changes:

A new split_by value, line, has been added to the DocumentSplitter so that users can split documents on the \n character. This comes in handy in several cases, such as splitting CSV files.
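To illustrate the idea, here is a minimal standalone sketch of line-based splitting. It is not the DocumentSplitter implementation: the helper name split_by_line and its parameters split_length and split_overlap are assumptions made for this example, and the sketch assumes each chunk keeps its trailing \n so that the chunks can be joined back into the original text.

```python
from typing import List


def split_by_line(text: str, split_length: int = 1, split_overlap: int = 0) -> List[str]:
    """Group the lines of `text` into chunks of up to `split_length` lines.

    Hypothetical helper for illustration only; not part of Haystack.
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")

    # Split strictly on "\n" and re-attach the delimiter to every line
    # except a trailing remainder that had no newline after it.
    parts = text.split("\n")
    lines = [part + "\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])

    step = split_length - split_overlap
    chunks: List[str] = []
    for start in range(0, len(lines), step):
        chunks.append("".join(lines[start:start + split_length]))
        # Stop once a chunk has reached the last line, so overlapping
        # windows do not emit a redundant tail chunk.
        if start + split_length >= len(lines):
            break
    return chunks


csv_text = "id,name\n1,Ada\n2,Grace\n3,Linus\n"
chunks = split_by_line(csv_text, split_length=2)
for chunk in chunks:
    print(repr(chunk))
```

With split_length=2 and no overlap, each chunk holds two CSV rows, and joining the chunks reproduces the input unchanged.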

@srini047 srini047 requested review from a team as code owners November 8, 2024 15:33
@srini047 srini047 requested review from dfokina and Amnah199 and removed request for a team November 8, 2024 15:33
@github-actions github-actions bot added topic:tests type:documentation Improvements on the docs labels Nov 8, 2024
@coveralls
Collaborator

coveralls commented Nov 8, 2024

Pull Request Test Coverage Report for Build 11802880844

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.04%) to 90.18%

Files with Coverage Reduction | New Missed Lines | %
components/preprocessors/document_splitter.py | 1 | 98.35%

Totals (Coverage Status)
Change from base Build 11660533312: 0.04%
Covered Lines: 7778
Relevant Lines: 8625

💛 - Coveralls

@srini047
Contributor Author

@sjrl Can you review the PR?

@sjrl
Contributor

sjrl commented Nov 12, 2024

Left one comment, otherwise this looks good to me! But let's have @Amnah199 do the final review :)

Co-authored-by: Sebastian Husch Lee <[email protected]>
@silvanocerza silvanocerza merged commit a045c0e into deepset-ai:main Nov 14, 2024
18 checks passed
Labels
topic:tests type:documentation Improvements on the docs
Development

Successfully merging this pull request may close these issues.

Support splitting of CSV documents
4 participants