Commit

Update pipeline_details.md
campanam authored Jan 27, 2023
1 parent 5bac9a6 commit 23b45b3
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions doc/pipeline_details.md
@@ -78,13 +78,13 @@ The genMapIndex process generates the GenMap [7] index for the reference sequenc
The genMapMap process calculates the mappability for the reference sequence using GenMap (`genmap map -K 30 -E 2 -b`). It then filters the GenMap results using [`filterGM.rb`](ruby_r_scripts.md#filterGMrb) to exclude any sequences with mappability < 1.0 (`filterGM.rb <raw_genmap.bed> 1.0 exclude > <genmap.1.0.bed>`).

## repeatMask
-The repeatMask process uses RepeatMasker [8] to soft-mask repeat regions in the reference sequence based on a pre-defined target species (`RepeatMasker -gccalc -nolow -species <specified species>`). The soft-masked reference sequence is passed to the repeatModeler process.
+The repeatMask process uses RepeatMasker [8] to soft-mask repeat regions in the reference sequence based on a pre-defined target species (`RepeatMasker -gccalc -nolow -xsmall -species <specified species>`). The soft-masked reference sequence is passed to the repeatModeler process.

## repeatModeler
The repeatModeler process uses RepeatModeler [9] and the RepeatMasker soft-masked reference sequence to generate a species-specific repeat library for the reference sequence. First, a database is built using `BuildDatabase`, then the library is built using `RepeatModeler`. The resulting library (`consensi.fa.classified`) is passed to the repeatMaskRM process (`RepeatMasker -pa ${rm_pa} -gccalc -nolow -lib consensi.fa.classified`).

## repeatMaskRM
-The repeatMaskRM process uses RepeatMasker and the RepeatModeler custom repeat library to soft mask the reference sequence (`RepeatMasker -gccalc -nolow -lib consensi.fa.classified`). The RepeatMasker out file (.out) is then converted to bed using [`RM2bed.rb`](ruby_r_scripts.md#RM2bedrb).
+The repeatMaskRM process uses RepeatMasker and the RepeatModeler custom repeat library to soft mask the reference sequence (`RepeatMasker -gccalc -nolow -xsmall -lib consensi.fa.classified`). The RepeatMasker out file (.out) is then converted to bed using [`RM2bed.rb`](ruby_r_scripts.md#RM2bedrb).
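
The `-xsmall` flag added in this commit is what makes the documented "soft-mask" wording accurate: with it, RepeatMasker writes repeat regions in lowercase, whereas its default output replaces them with `N`. The difference can be illustrated with a minimal Python sketch. The function names and the toy sequence are hypothetical, and the intervals are assumed to be 0-based, half-open; this is not code from the pipeline.

```python
def soft_mask(sequence, intervals):
    """Lowercase the bases inside each (start, end) interval (0-based, half-open)."""
    bases = list(sequence.upper())
    for start, end in intervals:
        for i in range(max(0, start), min(end, len(bases))):
            bases[i] = bases[i].lower()
    return "".join(bases)


def hard_mask(sequence, intervals):
    """Replace the bases inside each (start, end) interval with N."""
    bases = list(sequence.upper())
    for start, end in intervals:
        for i in range(max(0, start), min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


if __name__ == "__main__":
    seq = "ACGTACGTACGT"            # toy sequence, for illustration only
    repeats = [(2, 5), (8, 10)]     # hypothetical repeat intervals
    print(soft_mask(seq, repeats))  # ACgtaCGTacGT
    print(hard_mask(seq, repeats))  # ACNNNCGTNNGT
```

Soft-masking keeps the underlying bases available to downstream tools, which is presumably why the masked reference can still be handed to later processes.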

## maskIndels
Using [`indels2bed.rb`](ruby_r_scripts.md#indels2bedrb), the maskIndels process scans the all-sites VCF generated by the genotypeGVCFs process for indels and generates a bed file excluding all sites a set number of bp upstream/downstream of the identified indels (`indels2bed.rb <all-sites.vcf> <indel pad in bp> > <indels.bed>`).
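
The padding logic described for maskIndels can be sketched as follows. This is a hypothetical Python illustration, not the `indels2bed.rb` implementation: the function name is invented, and it assumes that an indel is any record whose ALT allele differs in length from REF, that VCF positions are 1-based, and that BED intervals are 0-based and half-open.

```python
def indel_to_bed(chrom, pos, ref, alts, pad):
    """Return a padded (chrom, start, end) BED interval for an indel, else None.

    pos is the 1-based VCF POS; the returned interval is 0-based, half-open.
    """
    # Ignore missing, spanning-deletion, and symbolic ALT alleles.
    real_alts = [a for a in alts if a not in (".", "*") and not a.startswith("<")]
    if not any(len(a) != len(ref) for a in real_alts):
        return None  # invariant site or SNP: nothing to mask
    start = max(0, pos - 1 - pad)        # clamp at the start of the sequence
    end = pos - 1 + len(ref) + pad       # extend past the end of the REF allele
    return (chrom, start, end)


if __name__ == "__main__":
    print(indel_to_bed("chr1", 100, "AT", ["A"], 5))  # ('chr1', 94, 106)
    print(indel_to_bed("chr1", 50, "A", ["G"], 5))    # None
```

The sketch does not clamp the interval end to the sequence length, since that would need the contig lengths from the VCF header or a reference index.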