Python code for an implementation of G4-Hunter algorithm #2
This is a naive implementation of the algorithm in Bedrat, Lacroix & Mergny, "Re-evaluation of G-quadruplex propensity with G4Hunter" (2016). It is inefficient, and it does not fully match the published scoring system: nucleotides at the ends of a window are given a score that does not reflect any extension of the run of matching nucleotides outside the window. It merges windows into regions more sensibly, so that regions do not overlap, and the regions reflect the published algorithm more accurately in that terminal As and Ts are removed. It does not output a Score_plot.pdf.
Add files via upload
Add option to ScoreSeq function to adjust score based on runs extending outside window
Correct errors in how scores were adjusted for runs commencing before the window being scored
I have now matched the scoring system, so windows have their scores adjusted for the length of any run that extends outside the window. For the file Mitochondria_NC_012920_1.fasta, with a window of 25 nt and a threshold of 1.5, this gives the same output as the original (except that it is tab-separated instead of space-separated).
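The effect of counting runs that extend outside the window can be illustrated with a minimal sketch. It assumes the per-base scoring described in the G4Hunter paper (each G in a run of n Gs scores min(n, 4), each C scores the negative of that, and A/T score 0). The function names `base_scores` and `window_score` and the `extend_runs` flag are hypothetical and are not taken from this pull request's code.

```python
def base_scores(seq):
    """Score each base: G runs are positive, C runs negative, capped at 4."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        # Find the end of the run of identical bases starting at i.
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        run = min(j - i, 4)
        if seq[i] == "G":
            scores[i:j] = [run] * (j - i)
        elif seq[i] == "C":
            scores[i:j] = [-run] * (j - i)
        i = j
    return scores


def window_score(seq, start, width, extend_runs=True):
    """Mean score of seq[start:start + width].

    Assumes the window lies entirely inside seq.
    extend_runs=True  scores runs on the whole sequence, then slices.
    extend_runs=False scores the window in isolation (the naive variant).
    """
    if extend_runs:
        scores = base_scores(seq)[start:start + width]
    else:
        scores = base_scores(seq[start:start + width])
    return sum(scores) / width


# The window "GGTT" cuts a run of four Gs in half.
print(window_score("GGGGTTAA", 2, 4, extend_runs=True))   # 2.0
print(window_score("GGGGTTAA", 2, 4, extend_runs=False))  # 1.0
```

Scoring the whole sequence once and slicing also avoids rescoring each window, which is one way the naive version could be made less inefficient.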
Hi Jocelyn,
Hey,
Jocelyn did nice work, but I still don't understand why the flanking bases bother so many people, since they can play an important role in G4 folding in vitro.
Because I have done a lot of experiments, I needed to know all the bases that may or may not play a role in G4 folding.
It is nice to have a 60-base sequence containing three potential G4 sequences, but it is also nice to know that, if I separate them, it is because there are bases between two sequences that weaken the score. Those bases should be the best positions at which to separate the three G4s for in vitro or in vivo testing.
For your question, M.Z.: if I am not wrong, BED files look like this:
chromosome name \t start \t end
I think you can add the chromosome name column in Excel and save the file as .bed.
I have never heard of a G4 program written in R; I always coded G4-hunter in Python and did the statistics in R.
Again, Jocelyn, nice code. I am not one who favors merging the sequences, but it is nice to have another version of the code.
Have a nice day
Amina B
https://github.com/AnimaTardeb/
…On Thu, Jan 18, 2018 at 4:16 AM, mahzer ***@***.***> wrote:
Hi Jocelyn,
Nice work!
Is it possible to get the result as a BED file when using a reference
genome as input? I know it's possible with the original R script, but I'd
like to try the new feature that you have implemented.
Thanks,
MZ
Hi Mahzer, I am not interested in doing more work on this script, but it would not be hard to add a BED-format output, or to convert the _merged.tsv file to BED format.
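As a rough illustration of such a conversion, the sketch below turns a list of region coordinates into BED lines. The column layout of the _merged.tsv file is not described in this thread, so the function takes plain (start, end) pairs rather than parsing that file. The function name `to_bed_lines`, the chromosome label `chrM`, and the assumption that the input coordinates are 1-based and inclusive are all hypothetical. BED itself uses 0-based starts and exclusive ends, so only the start needs shifting in that case.

```python
def to_bed_lines(chrom, regions, one_based_inclusive=True):
    """Format (start, end) pairs as tab-separated BED lines.

    BED coordinates are 0-based with an exclusive end. If the input is
    1-based and inclusive (an assumption about the script's output),
    subtract 1 from the start and keep the end unchanged.
    """
    lines = []
    for start, end in regions:
        if one_based_inclusive:
            start -= 1
        lines.append(f"{chrom}\t{start}\t{end}")
    return lines


# "chrM" is a placeholder name; use whatever the reference genome uses.
for line in to_bed_lines("chrM", [(1, 25), (100, 130)]):
    print(line)
```

If the script's coordinates turn out to be 0-based already, passing `one_based_inclusive=False` leaves them untouched.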
Thanks, Amina and Jocelyn. I was referring to the R scripts included in the supplementary material of the paper. I did not look at all of them and thought one of them was the actual code in R. MZ
This is an implementation of the algorithm in Bedrat, Lacroix & Mergny, "Re-evaluation of G-quadruplex propensity with G4Hunter" (2016).
It merges windows into regions more sensibly than the supplied binary executable, so that regions do not overlap. Merged regions reflect the published algorithm in that terminal As and Ts are not shown.
When run on the supplied Mitochondria_NC_012920_1.fasta, the window scores agree with those of the supplied binary executable.
It does not currently output a Score_plot.pdf.
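The merging step described above can be sketched roughly as follows. This is not the code from this pull request: the function name `merge_windows` is hypothetical, positions are 0-based with exclusive ends, and treating windows that merely touch as one region is an assumption.

```python
def merge_windows(seq, starts, width):
    """Merge windows that passed the threshold into non-overlapping regions.

    starts: 0-based start positions of the passing windows.
    Returns (start, end) pairs with exclusive ends, after trimming
    A and T bases from both ends of each merged region.
    """
    seq = seq.upper()
    regions = []
    for s in sorted(starts):
        e = min(s + width, len(seq))
        if regions and s <= regions[-1][1]:
            # Overlaps (or touches) the previous region: extend it.
            regions[-1][1] = max(regions[-1][1], e)
        else:
            regions.append([s, e])

    trimmed = []
    for s, e in regions:
        # Drop terminal As and Ts from each end of the region.
        while s < e and seq[s] in "AT":
            s += 1
        while e > s and seq[e - 1] in "AT":
            e -= 1
        if s < e:
            trimmed.append((s, e))
    return trimmed


seq = "TTGGGAGGGTTTTTTTTTTCCCTT"
print(merge_windows(seq, [0, 2, 17], 7))  # [(2, 9), (19, 22)]
```

In this example the first two windows overlap and collapse into one region, which is then trimmed to `GGGAGGG`; the third window stays separate and is trimmed to `CCC`.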