Python code for an implementation of G4-Hunter algorithm #2

JocelynSP · 2017-04-26T23:41:11Z

This is an implementation of the algorithm in Bedrat, Lacroix & Mergny, "Re-evaluation of G-quadruplex propensity with G4Hunter" (2016).

It merges windows into regions more sensibly than the supplied binary executable, so that regions do not overlap. Merged regions reflect the published algorithm in that terminal As and Ts are not shown.
When run on the supplied Mitochondria_NC_012920_1.fasta, the window scores agree with those of the supplied binary executable.
It does not currently output a Score_plot.pdf.
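
The merging step described above can be sketched as follows. This is an illustration, not the code in this pull request: the function names `merge_windows` and `trim_terminal_at` are hypothetical, coordinates are assumed to be 0-based and half-open, and windows that merely touch are assumed to merge as well.

```python
def merge_windows(starts, window):
    """Merge overlapping or touching windows into non-overlapping regions.

    `starts` holds 0-based start positions of windows that passed the
    threshold; regions are returned as half-open (start, end) tuples.
    """
    regions = []
    for start in sorted(starts):
        end = start + window
        if regions and start <= regions[-1][1]:
            # This window overlaps (or touches) the previous region: extend it.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]


def trim_terminal_at(seq, start, end):
    """Shrink a half-open region so it neither starts nor ends with A or T."""
    while start < end and seq[start] in "ATat":
        start += 1
    while end > start and seq[end - 1] in "ATat":
        end -= 1
    return start, end
```

For example, `merge_windows([0, 1, 2, 10], 5)` yields two regions, one covering the three overlapping windows and one for the isolated window.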

This is a naive implementation of the algorithm in Bedrat, Lacroix & Mergny, "Re-evaluation of G-quadruplex propensity with G4Hunter" (2016). It is inefficient, and doesn't fully match the scoring system: nucleotides at the ends of a window are given a score that does not reflect any extension of the run of matching nucleotides outside the window. It merges windows into regions more sensibly, so that regions do not overlap, and regions more accurately reflect the published algorithm in that terminal As and Ts are removed. It does not output a Score_plot.pdf.

Add files via upload

Add option to ScoreSeq function to adjust score based on runs extending outside window

Correct errors in how scores were adjusted for runs commencing before the window being scored

JocelynSP · 2017-05-01T07:14:21Z

I have now matched the scoring system, so windows have the score adjusted for the length of run outside the window. For the file Mitochondria_NC_012920_1.fasta, with a window of 25 nt and a threshold of 1.5, this gives the same output as the original (except for being tab-separated instead of space-separated).
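
To illustrate why runs extending outside the window matter, here is a minimal sketch of G4Hunter-style scoring. It is not the `ScoreSeq` function from this pull request; `base_scores` and `window_scores` are hypothetical names. It assumes the published scheme in which each G in a run of n Gs scores min(n, 4), each C scores the negative equivalent, and A/T score 0. Scoring the whole sequence before windowing means a run cut by a window edge keeps its full-run score.

```python
from itertools import groupby


def base_scores(seq):
    """Score every base from the length of the G or C run it belongs to."""
    scores = []
    for base, run in groupby(seq.upper()):
        n = len(list(run))
        if base == "G":
            value = min(n, 4)
        elif base == "C":
            value = -min(n, 4)
        else:
            value = 0
        scores.extend([value] * n)
    return scores


def window_scores(seq, window=25):
    """Mean per-base score of every full-length window along the sequence."""
    scores = base_scores(seq)
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

With `window_scores("GGGGA", 4)`, the second window `GGGA` scores 3.0 because its three Gs belong to a run of four; scoring that window in isolation would give 2.25.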

mahzer · 2018-01-18T03:16:20Z

Hi Jocelyn,
Nice work!
Is it possible to get the result as a BED file when using a reference genome as input? I know it's possible with the original R script, but I'd like to try the new feature that you have implemented.
Thanks,
MZ

AnimaTardeb · 2018-01-18T10:11:09Z

Hey,

Jocelyn did nice work, but I still don't understand why the flanking bases bother so many people, given that they can play an important role in G4 folding in vitro. Having done a lot of experiments, I needed to know every base that might or might not play a role in G4 folding. It is nice to have a 60-base sequence containing 3 potential G4 sequences, but it is also useful to know that, when they are reported separately, it is because the bases between two sequences weakened the score; those bases should be the best positions at which to separate the three G4s for in vitro or in vivo testing.

For your question, M.Z.: if I am not wrong, BED files look like chromosome name \t start \t end. I think you can add the chromosome name column in Excel and save the file as .bed. I have never heard of an original G4 program in R; I always coded G4-hunter in Python and did the statistics in R.

Again, Jocelyn, nice code. I am not one who favors merging the sequences, but it is nice to have another version of the code.

Have a nice day,
Amina B
https://github.com/AnimaTardeb/


JocelynSP · 2018-01-19T02:28:09Z

Hi Mahzer,
I don't know what original R code you are referring to. Do you mean the original Python / binary, or might you be on the wrong post?

I am not interested in doing more work on this script, but it would not be hard to add a bed-format output, or to convert the _merged.tsv file to bed-format.
BED files have no column headers. They have 3 to 12 tab-separated fields, with chrom, start and end being required, as Amina said above. See: https://genome.ucsc.edu/FAQ/FAQformat.html#format1
In the merged.tsv file, the chrom is a section heading and would have to be written in field 1 instead; then Start and End can go in fields 2 and 3. The Sequence could go in field 4 (name), or name could just be '.'. Then field 5 (score) is Score, and no other optional fields would be used.
Jocelyn
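
A minimal sketch of the conversion described above, under several assumptions that are not confirmed by the script itself: that each section heading line starts with `>` followed by the sequence name, that a tab-separated header row names the `Start`, `End` and `Score` columns, and that Start is 1-based while BED expects 0-based starts. The function name `merged_tsv_to_bed` is hypothetical.

```python
def merged_tsv_to_bed(lines, one_based=True):
    """Convert merged.tsv-style lines to 5-field BED lines (name set to '.')."""
    bed = []
    chrom = None
    cols = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith(">"):
            # Assumed heading format: '>' then the name, then optional text.
            chrom = line[1:].split()[0]
            continue
        fields = [field.strip() for field in line.split("\t")]
        if "Start" in fields and "End" in fields:
            # Header row: remember which column holds which value.
            cols = {name: i for i, name in enumerate(fields)}
            continue
        if chrom is None or cols is None:
            continue  # data seen before any heading or header row; skip it
        start = int(fields[cols["Start"]]) - (1 if one_based else 0)
        end = int(fields[cols["End"]])
        score = fields[cols["Score"]]
        bed.append(f"{chrom}\t{start}\t{end}\t.\t{score}")
    return bed
```

The UCSC format description gives the score field as a value between 0 and 1000, so tools that validate BED strictly may need the G4Hunter score rescaled or moved to another field.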

mahzer · 2018-01-19T04:20:03Z

Thanks, Amina and Jocelyn.

I was referring to the R scripts included in the supplementary of the paper. I did not look at all of them and thought one of them is the actual code in R.

MZ

JocelynSP added 4 commits April 27, 2017 09:30

Merge pull request #1 from PapenfussLab/JocelynSP-code-1

e2e7865

Add files via upload

Correct score

5936ad4

Add option to ScoreSeq function to adjust score based on runs extending outside window

Correct mis-handling of runs outside window

30af38d

Correct errors in how scores were adjusted for runs commencing before the window being scored

JocelynSP changed the title ~~Python code for a naive implementation of G4-Hunter algorithm~~ Python code for an implementation of G4-Hunter algorithm May 1, 2017
