Python code for an implementation of G4-Hunter algorithm #2

Open: wants to merge 4 commits into master

@JocelynSP JocelynSP commented Apr 26, 2017

This is an implementation of the algorithm in Bedrat, Lacroix & Mergny, "Re-evaluation of G-quadruplex propensity with G4Hunter" (2016).

It merges windows into regions more sensibly than the supplied binary executable, so that regions do not overlap. Merged regions reflect the published algorithm in that terminal As and Ts are not shown.
When run on the supplied Mitochondria_NC_012920_1.fasta, the window scores agree with those of the supplied binary executable.
It does not currently output a Score_plot.pdf.
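The merging step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this pull request: `merge_windows` and `trim_terminal_at` are hypothetical names, window hits are assumed to be given as sorted 0-based start positions, and positive-scoring and negative-scoring windows are not distinguished.

```python
def merge_windows(hits, window):
    """Merge window hits into non-overlapping regions.

    hits: sorted 0-based start positions of windows that passed the threshold
    (an assumed input format). Returns half-open (start, end) intervals.
    """
    regions = []
    for start in hits:
        end = start + window
        if regions and start <= regions[-1][1]:
            # Overlaps or abuts the previous region: extend it.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(r) for r in regions]


def trim_terminal_at(seq, start, end):
    """Shrink [start, end) until it neither starts nor ends with A or T."""
    while start < end and seq[start] in "ATat":
        start += 1
    while end > start and seq[end - 1] in "ATat":
        end -= 1
    return start, end
```

For example, `merge_windows([0, 1, 2, 10], 5)` joins the first three windows into one region and keeps the fourth separate; each region can then be passed through `trim_terminal_at` before it is reported.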

This is a naive implementation of the algorithm in Bedrat, Lacroix & Mergny, "Re-evaluation of G-quadruplex propensity with G4Hunter" (2016).
It is inefficient, and it does not fully match the published scoring system: nucleotides at the ends of a window are scored without regard to any extension of their run of matching nucleotides beyond the window.
It merges windows into regions more sensibly, so that regions do not overlap, and the regions reflect the published algorithm more accurately in that terminal As and Ts are removed.
It does not output a Score_plot.pdf.
Add option to ScoreSeq function to adjust score based on runs extending outside window
Correct errors in how scores were adjusted for runs commencing before the window being scored
@JocelynSP
Author

I have now matched the scoring system, so each window's score is adjusted for the length of any run that extends outside the window. For the file Mitochondria_NC_012920_1.fasta with a window of 25 nt and a threshold of 1.5, this gives the same output as the original, except that it is tab-separated instead of space-separated.
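As a rough illustration of what adjusting for runs outside the window means, the sketch below scores every base against the whole sequence first and only then averages over windows, so a run cut by a window edge keeps its full-run score. It assumes the published scoring rule (each G in a run of n Gs scores min(n, 4), each C scores the negative of that, and A and T score 0). `base_scores` and `window_scores` are hypothetical names, not functions from this pull request, and the `ScoreSeq` option may be implemented differently.

```python
def base_scores(seq):
    """Per-base scores: G runs positive, C runs negative, capped at 4."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        # Find the end of the run of identical bases starting at i.
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        run = min(j - i, 4)
        if seq[i] == "G":
            scores[i:j] = [run] * (j - i)
        elif seq[i] == "C":
            scores[i:j] = [-run] * (j - i)
        i = j
    return scores


def window_scores(seq, window=25):
    """Mean per-base score of every full window, using a running sum."""
    scores = base_scores(seq)
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    means = [total / window]
    for k in range(window, len(scores)):
        # Slide the window one base to the right.
        total += scores[k] - scores[k - window]
        means.append(total / window)
    return means
```

With `window_scores("GGGGA", window=2)`, the first window contains only two Gs but still averages 4.0, because both belong to a run of four.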

@JocelynSP JocelynSP changed the title Python code for a naive implementation of G4-Hunter algorithm Python code for an implementation of G4-Hunter algorithm May 1, 2017
@mahzer

mahzer commented Jan 18, 2018

Hi Jocelyn,
Nice work!
Is it possible to get the result as a BED file when using a reference genome as input? I know it's possible with the original R script, but I'd like to try the new feature that you have implemented.
Thanks,
MZ

@AnimaTardeb
Owner

AnimaTardeb commented Jan 18, 2018 via email

@JocelynSP
Author

Hi Mahzer,
I don't know which original R code you are referring to. Do you mean the original Python script / binary, or might you be on the wrong post?

I am not interested in doing more work on this script, but it would not be hard to add a BED-format output, or to convert the _merged.tsv file to BED format.
BED files have no column headers. They have 3 to 12 tab-separated fields, with chrom, start and end being required, as Amina said above. See: https://genome.ucsc.edu/FAQ/FAQformat.html#format1
In the _merged.tsv file, the chrom appears as a section heading and would have to be written in field 1 of each line instead; Start and End can then go in fields 2 and 3. The Sequence could go in field 4 (name), or name could just be '.'. Field 5 (score) would then be Score, and no other optional fields would be used.
Jocelyn
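A minimal sketch of the conversion described above, under several assumptions about the _merged.tsv layout that are not confirmed in this thread: section headings are lines starting with ">", a tab-separated header row names the columns Start, End, Sequence and Score, and coordinates are 1-based inclusive. `merged_tsv_to_bed` and its `one_based` parameter are hypothetical and not part of the script.

```python
def merged_tsv_to_bed(lines, one_based=True):
    """Yield 5-field BED lines (chrom, start, end, name, score)."""
    chrom = None
    columns = None
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split("\t")]
        if not any(fields):
            continue  # skip blank lines
        if fields[0].startswith(">"):
            # Assumed section heading, e.g. ">chrM": becomes BED field 1.
            tokens = fields[0][1:].split()
            chrom = tokens[0] if tokens else "."
        elif "Start" in fields and "End" in fields:
            # Column header row: remember where each named column sits.
            columns = {name: i for i, name in enumerate(fields)}
        elif chrom is not None and columns is not None:
            start = int(fields[columns["Start"]])
            end = int(fields[columns["End"]])
            if one_based:
                # BED starts are 0-based and ends are exclusive.
                start -= 1
            name = fields[columns["Sequence"]] if "Sequence" in columns else "."
            # The UCSC description asks for an integer score from 0 to 1000;
            # the raw Score is written here, so strict tools may need rescaling.
            score = fields[columns["Score"]] if "Score" in columns else "0"
            yield "\t".join([chrom, str(start), str(end), name, score])
```

Usage would be along the lines of opening the _merged.tsv file, passing the file object to `merged_tsv_to_bed`, and writing each yielded line to a .bed file. If the coordinates in the file turn out to be 0-based already, pass `one_based=False`.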

@mahzer

mahzer commented Jan 19, 2018

Thanks, Amina and Jocelyn.

I was referring to the R scripts included in the supplementary material of the paper. I did not look at all of them and thought one of them was the actual code in R.

MZ
