A tool to identify full-length cDNA reads. Primers must be specified as they appear on the forward strand.
Install via pip:
pip install git+https://github.com/nanoporetech/pychopper.git
Or clone the repository:
git clone https://github.com/nanoporetech/pychopper.git
Then install the package from the cloned directory:
cd pychopper
python setup.py install
Install the package in developer mode:
python setup.py develop
Run the tests:
make test
Build the documentation:
make docs
Issue make help to get a list of make targets.
usage: cdna_classifier.py [-h] -b primers [-i input_format] [-g aln_params]
[-t target_length] [-s score_percentile]
[-n sample_size] [-r report_pdf] [-u unclass_output]
[-S stats_output] [-A scores_output] [-x]
[-l heu_stringency]
input_fastx output_fastx
Tool to identify full length cDNA reads. Primers have to be specified as they
are on the forward strand.
positional arguments:
input_fastx Input file.
output_fastx Output file.
optional arguments:
-h, --help show this help message and exit
-b primers Primers fasta.
-i input_format Input/output format (fastq).
-g aln_params Alignment parameters (match,
mismatch,gap_open,gap_extend).
-t target_length Number of bases to scan at each end (200).
-s score_percentile Score cutoff percentile (98).
-n sample_size Number of samples when calculating score cutoff
(100000).
-r report_pdf Report PDF.
-u unclass_output Write unclassified reads to this file.
-S stats_output Write statistics to this file.
-A scores_output Write alignment scores to this file.
-x Use more sensitive (and error prone) heuristic mode
(False).
-l heu_stringency Stringency in heuristic mode (0.25).
Example usage:
cdna_classifier.py -b cdna_barcodes.fas -r report.pdf -u unclassified.fq input.fq full_length_output.fq
Example usage in heuristic mode, which is more sensitive (and more error-prone):
cdna_classifier.py -x -b cdna_barcodes.fas -r report.pdf -u unclassified.fq input.fq full_length_output.fq
The primers must be specified as they appear on the forward strand (see data/cdna_barcodes.fas for an example).
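Because primers are given in forward-strand orientation, a read sequenced from the opposite strand carries them as reverse complements. The sketch below shows the two basic operations involved: reverse-complementing a primer and taking the first and last -t bases of a read (200 by default). It assumes the ends of each read are searched in both orientations; this README does not describe the search itself, and the function names reverse_complement and read_ends are hypothetical, not part of the pychopper API.

```python
# Hypothetical helpers, not the pychopper API.
# Translation table for DNA complements; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_ends(read, target_length=200):
    """Return the first and last target_length bases of a read (cf. the -t option).

    Reads shorter than target_length are returned whole for both ends.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return read[:target_length], read[-target_length:]
```

For example, reverse_complement("ACGTTG") returns "CAACGT", which is the form a forward-strand primer ACGTTG would take in a read from the other strand.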
The score cutoff for each primer is calculated by aligning the primer against random sequences and taking the percentile of the resulting score distribution given by -s (98 by default). The number of random sequences is set by -n (100000 by default).
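The cutoff procedure described above can be sketched as follows. This is a minimal illustration, assuming Smith-Waterman local alignment with affine gap penalties, uniformly random A/C/G/T background sequences of length target_length, and a nearest-rank percentile. The function names and the default scoring values (match=1, mismatch=-2, gap_open=2, gap_extend=1) are hypothetical and are not taken from pychopper; the tool's actual aligner and defaults may differ.

```python
import math
import random


def local_score(query, target, match=1, mismatch=-2, gap_open=2, gap_extend=1):
    """Best Smith-Waterman local alignment score with affine gaps (Gotoh).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Scoring defaults are illustrative assumptions, not pychopper's.
    """
    neg_inf = float("-inf")
    cols = len(target) + 1
    h_prev = [0] * cols        # H for row i-1: best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * cols  # F for row i-1: best score ending in a gap within the target
    best = 0
    for i in range(1, len(query) + 1):
        h_cur = [0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf            # E: best score ending in a gap within the query (same row)
        for j in range(1, cols):
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            s = match if query[i - 1] == target[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def score_cutoff(primer, target_length=200, percentile=98, sample_size=1000, seed=None):
    """Score at the given percentile when aligning primer to random sequences."""
    rng = random.Random(seed)
    scores = []
    for _ in range(sample_size):
        background = "".join(rng.choice("ACGT") for _ in range(target_length))
        scores.append(local_score(primer, background))
    scores.sort()
    # Nearest-rank percentile: smallest score with at least `percentile`% of samples <= it.
    rank = max(1, math.ceil(percentile / 100 * len(scores)))
    return scores[rank - 1]
```

A read end whose alignment score for a primer exceeds this cutoff would then be treated as a likely primer hit rather than a chance match. The pure-Python aligner is slow, so a small sample_size (a few hundred) is advisable for experimentation; the tool's default of 100000 samples assumes a much faster aligner.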
- Please fork the repository and create a merge request to contribute.
- Use bumpversion to manage package versioning.
- The code should be PEP8 compliant, which can be checked by running make lint.
(c) 2018 Oxford Nanopore Technologies Ltd.
This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
See the post announcing the tool in the Oxford Nanopore Technologies community.