Version 1.0.0 release PR - "Ardent adenine" #10

MatthiasZepper · 2023-07-18T16:05:06Z

This pull request represents an essentially full rewrite of umi-transfer by Johannes Alneberg (@alneberg) and me (@MatthiasZepper).

New and Improved Features:

Code organization: The code base has been split into separate files, with each file representing a subcommand and its associated CLI configuration. This improves clarity and allows for easy integration of additional subcommands and functionalities in the future.
Enhanced CLI options: The CLI arguments have been revamped for improved usability. Previously, specifying the output directly was not possible, hindering the creation of an nf-core Nextflow module. Specifying an output is still optional, but now the output file names are derived from the input file names rather than from a constant base provided as a CLI argument. Furthermore, the delimiter used to join the UMIs can now be customized. The --edit_nr flag has been renamed to --correct_numbers and applies to both files for better consistency.
Improved output file handling: The output file name will automatically include a .gz suffix if the -z/--compress flag is enabled. Conversely, an existing .gz suffix will be removed if no compression was requested. Additionally, the tool verifies that the output file does not exist yet and prompts for overwrite confirmation (unless -f/--force is specified).
Enhanced error handling: Functions have been rewritten to utilize Results and Options, enabling proper error handling. Previously, many functions simply panicked and crashed the program, for example when a non-existent input file was specified.
UMI ID validation: The tool now compares the ID of the UMI to that of the read, ensuring that the tool terminates upon encountering a mismatch. This prevents incorrect UMIs from being added to the read IDs due to differently sorted files.
Automated tests: Several unit tests and extensive integration tests have been implemented to enhance the reliability of the tool.
Continuous integration pipelines: The CI pipelines have been refactored, and a new release pipeline builds the tool for seven common architectures.
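
To illustrate the UMI ID validation and the customizable delimiter described above, here is a minimal sketch. It is not the code from this PR: the helper names `split_header` and `transfer_umi`, the assumption that the ID ends at the first space of the header, and the example headers are all hypothetical and only meant to show the idea of refusing to transfer a UMI when the IDs differ.

```rust
/// Split a FASTQ header into the read ID and an optional description.
/// Assumption (not taken from the PR): the ID ends at the first space.
fn split_header(header: &str) -> (&str, Option<&str>) {
    match header.split_once(' ') {
        Some((id, desc)) => (id, Some(desc)),
        None => (header, None),
    }
}

/// Hypothetical helper: append `umi` to the read ID using `delim`,
/// but only if the read and the UMI record carry the same ID.
fn transfer_umi(
    read_header: &str,
    umi_header: &str,
    umi: &str,
    delim: &str,
) -> Result<String, String> {
    let (read_id, desc) = split_header(read_header);
    let (umi_id, _) = split_header(umi_header);
    if read_id != umi_id {
        // Differently sorted files would end up here instead of
        // silently attaching the wrong UMI.
        return Err(format!("ID mismatch: read '{read_id}' vs. UMI '{umi_id}'"));
    }
    Ok(match desc {
        Some(d) => format!("{read_id}{delim}{umi} {d}"),
        None => format!("{read_id}{delim}{umi}"),
    })
}

fn main() {
    // Matching IDs: the UMI is joined to the ID with the chosen delimiter.
    let ok = transfer_umi("M01:23:000 1:N:0:1", "M01:23:000 2:N:0:1", "ACGTAC", ":");
    println!("{ok:?}");

    // Mismatching IDs: an error is returned instead of a modified header.
    let bad = transfer_umi("M01:23:000 1:N:0:1", "M01:23:999 2:N:0:1", "ACGTAC", ":");
    println!("{bad:?}");
}
```

Returning a `Result` rather than panicking mirrors the error handling approach mentioned in the list, so the caller can decide how to terminate.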

Discontinued Previous Features:

Support for inline UMIs: The previous inline functionality for transferring fixed-length UMIs was limited and did not support offsets or regular expressions. Since existing tools like UMI-tools already serve this purpose, we decided to prioritize the development of novel functionality. However, the new subcommand structure in the code paves the way for future support of inline UMIs.
Progress bar: The progress bar provided a helpful visual aid, but it required counting the records in one of the files to determine the total number of reads, which meant reading that file twice. For performance reasons, we removed this feature, especially since most runs are expected to be non-interactive in workflow systems like Nextflow.
Multi-threading: In the previous version (0.1) of umi-transfer, it was possible to run the tool on two cores when processing paired FastQ files, with each file assigned to a separate thread. However, the tool's performance was primarily limited by output compression, and multi-threading caused significant overhead. A future version of umi-transfer will be designed to run fully asynchronously and scale efficiently over multiple threads. In the meantime, we recommend utilizing FIFOs and external compression with tools like pigz.
Support for singletons: To simplify the code structure, we made the second FastQ file mandatory. For running on singletons, you can provide the same input twice and redirect one of the output files to /dev/null using a FIFO.
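
The singleton workaround only works smoothly if writing to /dev/null or a FIFO does not trigger the overwrite prompt, which a later commit in this PR addresses ("Implement exceptions to the prompts for /dev/null and FIFOs"). A rough sketch of how such a check could look on Unix follows. The function name `needs_overwrite_prompt` is hypothetical and the logic is an assumption about the approach, not the code in src/file_io.rs.

```rust
use std::fs;
use std::os::unix::fs::FileTypeExt; // Unix-only: is_fifo(), is_char_device()
use std::path::Path;

/// Hypothetical helper: decide whether the user should be asked before
/// writing to `path`. Assumption: any path whose metadata cannot be read
/// is treated as not existing yet.
fn needs_overwrite_prompt(path: &Path) -> bool {
    match fs::metadata(path) {
        // Nothing readable exists at this path, so nothing is overwritten.
        Err(_) => false,
        Ok(meta) => {
            let file_type = meta.file_type();
            // FIFOs and character devices such as /dev/null hold no data
            // that could be lost, so they are exempt from the prompt.
            !(file_type.is_fifo() || file_type.is_char_device())
        }
    }
}

fn main() {
    for candidate in ["/dev/null", "/definitely/not/here.fastq"] {
        println!(
            "{candidate}: prompt = {}",
            needs_overwrite_prompt(Path::new(candidate))
        );
    }
}
```

A regular file that already exists would still return `true` here, so the confirmation (or -f/--force) remains necessary for ordinary outputs.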


github-advanced-security · 2023-07-18T16:09:47Z

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.


MatthiasZepper · 2023-08-01T15:11:14Z

During code review, I'd specifically appreciate feedback on the arguments --in and --in2. Should it rather be --in1 instead of --in for consistency?

--in and --in2 were adapted from BBTools, still one of my favourite tool suites. For a tool that supports singletons and may take only one input, --in seems preferable, but to simplify the code structure, we made the second FastQ file mandatory in umi-transfer. Accordingly, --in1 and --in2 seem the more appropriate choice.

On the other hand, there are vague plans to rewrite it in an entirely asynchronous manner for better performance and with an optional --in2. In that sense, --in appears more future-proof.

alneberg

Great work! Especially impressed that you got around to writing so many tests!

MatthiasZepper · 2023-08-16T12:11:02Z

Version 1.0.0 - "Ardent adenine" ready to be released!

alneberg and others added 30 commits April 13, 2023 11:05

Johannes over-simplification

3eae330

Tidying up a bit

83c8612

Added back option to gzip output

3d6a18f

Started adjusting the readme

b862026

Readme table formatting

bfe147b

more readme changes

743ca2d

Wrote the performance guide

2ff5e62

Version 0.2.0

532faac

Added background and for developers sections to README

94bd11f

Use file-format crate to check for gzipped file

86135d8

Added 'written by' to authors list in clap

b379998

Failed splitting, cannot reach file_io from umi_external

6d3fe05

Managed to use modules

c1f4dcb

Added a gitignore

d00f40f

Adding some basic error handling.

723b08b

Implemented some basic error handling.

b2c502b

Merge pull request #5 from alneberg/johannes_simple

abc07d0

Make Johannes' simplification the base for the new dev.

Merge pull request #6 from MatthiasZepper/johannes-simplification

2202ab3

Some amendments to Johannes simplification

Further refactor: Subcommand structure to allow for easy addition of …

aef2bac

…further subcommands in later versions.

Implemented a simple counter for the records.

c62c17a

Switching the CLI arguments from strings to Option<PathBuf> and imple…

029c087

…mented an output overwrite check and prompt.

Cleaner code for output checks and suffix updates.

e1c7402

file_io::append_to_path() must work without owning the provided PathB…

18323d3

…uf. Had to clone twice :-(

Finished autogeneration of the file name extension, used a Regex to m…

01a2db5

…odify the input file names if no output file names were given.

Implemented fixing the read numbers for both reads and using a custom…

71ec566

… delimiter.

Readme updates.

aa2ab19

Fixing issues highlighted by Clippy.

eba0854

Addressed the clippy warning 'Large size difference between variants'…

0d5c6d4

… for the ReadFile enum by introducing a <Box> around the compressed input. Also used a BufReader for the plain text file.

Readme updates.

b3a8890

Github unfortunately doesn't render simple style commands in the Readme.

a11c3d4

MatthiasZepper and others added 12 commits July 13, 2023 23:27

Finished the integration tests to test file output.

7a84a3a

Merge pull request #8 from MatthiasZepper

96db94d

This PR adds extensive unit and integration tests and enables removal of an input `*.gz` extension if the output is not compressed.

Create new Testing workflow comprising Clippy and Tarpaulin.

053f147

Devise a new release action that includes cross-platform builds. Hea…

a2e4785

…vily inspired by/copied from Alex Hallam's tidy-viewer release action.

Refurbish the Docker image build action.

734c2c7

Change EventTriggers to test workflows in my fork.

16099a2

Bugfixes in the GithubAction workflows. Fingers crossed...

9b7ccdc

Slight tweaks to the GithubActions and the Dockerfile.

c71aeb2

Switch to ructions from actions-rs, since the latter seem to be unmai…

dc56395

…ntained.

Extract version from Cargo.toml with grep to have it in the filenames.

282db79

Modify tarpaulin command to include integration tests

e855e1c

Update the Github Actions...

e69f639

...and unlock the YOLO batch from Github.

github-advanced-security bot found potential problems Jul 18, 2023

src/umi_errors.rs: Fixed

MatthiasZepper changed the title ~~Version 0.2 release~~ Version 0.2.0 release PR - "Ardent adenine" Jul 18, 2023

Readme updates to include binaries and Docker.

5a77497

MatthiasZepper mentioned this pull request Jul 27, 2023

Add recipe for umi-transfer bioconda/bioconda-recipes#42034

Merged

MatthiasZepper changed the title ~~Version 0.2.0 release PR - "Ardent adenine"~~ Version 1.0.0 release PR - "Ardent adenine" Jul 27, 2023

MatthiasZepper added 4 commits July 31, 2023 21:14

Implement exceptions to the prompts for /dev/null and FIFOs.

7b1f5e0

Fix the 'all variants have the same postfix' warning. I appreciated t…

eff67b2

…he consistency, though.

Include instructions for singletons in README.md and change version t…

c95b357

…o 1.0.0

Dropping support for implicit compressed output by specifying .gz exte…

7d99e79

…nsion. This will result in a more consistent behaviour.

github-advanced-security bot found potential problems Aug 1, 2023

src/file_io.rs: Fixed

Match on Ok() instead of Some() in rectify_extension() to avoid going…

da8cff3

… from Result to Option and back.

alneberg reviewed Aug 14, 2023

README.md: Outdated, resolved

Typofix in README.md

3213396

alneberg approved these changes Aug 14, 2023

MatthiasZepper merged commit 1835689 into main Aug 16, 2023
6 checks passed
