MinEncCanKmer

Minimal encoding of canonical k-mers

Compilation

gcc -O3 -D_FILE_OFFSET_BITS=64 -pthread -mbmi -o canonical fasta.c canonical.c -lm

Usage

./canonical <fasta> <k> <b>

where

fasta is a FASTA file, which may contain multiple sequences
k is the k-mer length (optional, default 5, maximum 31)
b is the number of buckets (optional, default 4)

Output: distribution of k-mers to buckets

2-bit encoding

To switch to the standard 2-bit encoding, comment out the call to process_string and uncomment the call to process_string_std, so that the lines read:

// process_string(seq,k,threads,t)
   process_string_std(seq,k,threads,t)
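
For orientation, the following is a minimal sketch of what "standard 2-bit encoding" of canonical k-mers usually means: each base is mapped to two bits, and the canonical form is the smaller of the k-mer and its reverse complement. It is not taken from canonical.c; the function names (base_code, encode_2bit, revcomp_2bit, canonical_2bit) and the mapping A=0, C=1, G=2, T=3 are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: map a nucleotide to an assumed 2-bit code
 * (A=0, C=1, G=2, T=3). Returns -1 for any other character. */
static int base_code(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return -1;
    }
}

/* Encode s[0..k-1] into 2k bits, first base in the highest bits.
 * Requires k <= 32. Returns 0 on success, -1 on a non-ACGT character. */
static int encode_2bit(const char *s, int k, uint64_t *out) {
    uint64_t v = 0;
    for (int i = 0; i < k; i++) {
        int c = base_code(s[i]);
        if (c < 0) return -1;
        v = (v << 2) | (uint64_t)c;
    }
    *out = v;
    return 0;
}

/* Reverse complement of an encoded k-mer. With the mapping above,
 * the complement of a code c is 3 - c. */
static uint64_t revcomp_2bit(uint64_t v, int k) {
    uint64_t r = 0;
    for (int i = 0; i < k; i++) {
        r = (r << 2) | (3u - (v & 3u));
        v >>= 2;
    }
    return r;
}

/* Canonical form: the smaller of the k-mer and its reverse complement. */
static uint64_t canonical_2bit(uint64_t v, int k) {
    uint64_t r = revcomp_2bit(v, k);
    return v < r ? v : r;
}
```

With this scheme a k-mer and its reverse complement share one code, but the codes still range over all 2k-bit values, so roughly half of the code space is never used. There are only 4^k/2 canonical k-mers for odd k and (4^k + 4^(k/2))/2 for even k.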

General alphabets

To encode canonical k-mers over general (non-DNA) alphabets, Python scripts are provided in the general_alphabet subfolder: minenc.py outputs the encoding of all k-mers for a given alphabet size, and minenc_rc.py computes the encoding with reverse complementation taken into account.
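
As a point of comparison only, the sketch below shows the plain (non-minimal) way to number canonical k-mers over an alphabet of size sigma: read the k-mer as a base-sigma number and take the minimum over both orientations. It does not reproduce what minenc.py or minenc_rc.py compute. The function names and the complement rule c -> sigma - 1 - c are assumptions chosen for illustration, and sigma^k is assumed to fit into 64 bits.

```c
#include <stdint.h>

/* Read a k-mer, given as symbol ranks sym[0..k-1] with each rank < sigma,
 * as a base-sigma number (first symbol most significant). */
static uint64_t encode_base_sigma(const uint8_t *sym, int k, unsigned sigma) {
    uint64_t v = 0;
    for (int i = 0; i < k; i++)
        v = v * sigma + sym[i];
    return v;
}

/* Encode the reverse complement, under the ASSUMED complement rule
 * c -> sigma - 1 - c (for sigma = 4 this matches A<->T, C<->G). */
static uint64_t encode_revcomp_base_sigma(const uint8_t *sym, int k, unsigned sigma) {
    uint64_t v = 0;
    for (int i = k - 1; i >= 0; i--)
        v = v * sigma + (sigma - 1 - sym[i]);
    return v;
}

/* Naive canonical code: the smaller of the two orientations. */
static uint64_t canonical_base_sigma(const uint8_t *sym, int k, unsigned sigma) {
    uint64_t f = encode_base_sigma(sym, k, sigma);
    uint64_t r = encode_revcomp_base_sigma(sym, k, sigma);
    return f < r ? f : r;
}
```

For sigma = 4 this reduces to the usual 2-bit scheme for DNA. As there, the resulting codes are spread over the full range of sigma^k values rather than packed into a contiguous range.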

C++ implementation

A C++ implementation of this functionality is available in the genesis library, which also offers other useful functionality for working with sequences and k-mers.

Citation

Please cite: Wittler R. General encoding of canonical k-mers. Peer Community Journal. 2023;3: e87. https://doi.org/10.24072/pcjournal.323

License

fasta.c and fasta.h are borrowed from FragGeneScan-Plus.
MinEncCanKmer is licensed under the GNU General Public License.
