Memory leakage when using abundance file #208
@cimendes I am trying to replicate what you did.
I hope this helps some. Let me know if you want to keep troubleshooting.
Oops, I think something got lost in translation when trying to link things from Google Sheets, as I only have a Linux workstation... :P And indeed I linked the wrong reference file! I'm sorry about that! (I forgot to update the file in the Zenodo repository.) But here it is. Thank you for the help in debugging this! 😍
@cimendes I am still waiting for it to finish running, but I think the issue is BioPython 1.79 (issue #207). By downgrading to BioPython 1.78, using the FASTA you posted in the last message, and removing the Windows-style line endings, this is as far as it has gotten:
I am waiting for it to finish, but this is the furthest I have been able to get with it so far. BioPython 1.79 introduced a number of deprecations and changes that seem to be breaking a lot of scripts; it should have been a major release (at least 1.80) to signal these issues. To downgrade, you can pin BioPython to version 1.78. My best guess at what is happening is that BioPython 1.79 changed the defaults of how Seq objects store their data (they are now held as bytes rather than strings).
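For reference, the downgrade itself is a one-liner. A minimal sketch, assuming a pip-managed environment (use the conda equivalent if BioPython came from conda):

```shell
# Pin BioPython back to the 1.78 release (before the Seq internals change)
pip install "biopython==1.78"

# Or, in a conda environment:
# conda install "biopython=1.78"
```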
@andersgs name your price! I'll ship it to Australia! I'll totally try that. Thank you so much!
Hahaha... You are welcome. But, don't thank me yet. Make sure it works. :)
@cimendes, I am sorry to say, it has now run long enough that I have reached the same error.
I am trying to downgrade joblib to 0.17.0, as per the Pipfile.lock file.
Hummm... @cimendes still no joy for me. Any luck for you? I am trying a slightly different approach: I cloned the repo and created an environment using pipenv and the Pipfile.lock, and I will run within this environment. Something about the dependencies might be causing problems. I think we also need a way of either capturing errors in the parallel subprocesses or running in series to see what is causing the bug.
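For anyone wanting to reproduce that locked environment, the steps might look like this (a sketch, assuming pipenv is installed and the repo lives at the usual HadrienG/InSilicoSeq location):

```shell
git clone https://github.com/HadrienG/InSilicoSeq.git
cd InSilicoSeq
# Install exactly the dependency versions recorded in Pipfile.lock
pipenv sync
# Run iss inside that pinned environment
pipenv run iss generate --help
```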
I canceled my job after your post. :( I did manage to create a sample using the coverage file without hitting this memory-leak error, but that option doesn't allow me to set the total read number, so I need to do some math to compensate. If this works, the problem is isolated to using an abundance file, or to the combination of an abundance file and a very high read number. I'll keep you posted!
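The compensating math is the standard coverage relation (reads × read length ≈ coverage × genome length). A quick helper, with purely hypothetical numbers:

```python
def reads_for_coverage(coverage, genome_length, read_length):
    """Approximate read count N from C = N * L / G (Lander-Waterman)."""
    return round(coverage * genome_length / read_length)

# Hypothetical example: 20x coverage of a 5 Mb genome with 300 bp reads
n = reads_for_coverage(20, 5_000_000, 300)  # ≈ 333,333 reads
```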
Hej folks! Thanks for reporting and trying to debug this. I just came back from vacation and will take a look. /Hadrien |
Hi guys, any news on this issue? I'm hitting the same trouble for the first time, despite having used your great tool for the past two years. I have an abundance file with 5 genomes and would like to generate 10 million reads.
RAM usage blows up and crashes the run, even though I am working on a server with 1 TB of RAM.
Here is my setup:

Any ideas, please?
Hi, I'm wondering if there's a way to clear memory with some kind of garbage collector between chromosomes. I notice that it creates files for each chromosome, and when generating reads for the next record, RAM usage doesn't decrease. Alternatively, is there a way to generate reads for each chromosome separately and then combine them? All my code is stored here:
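As far as I know InSilicoSeq doesn't expose such an option directly, but the pattern being asked about — forcing a collection pass between records — would look roughly like this in Python (generate_reads_for is a hypothetical stand-in for the per-chromosome work):

```python
import gc

def simulate_per_record(records, generate_reads_for):
    # Process one chromosome/record at a time, forcing a garbage-collection
    # pass after each so per-record buffers can be reclaimed before the next.
    outputs = []
    for record in records:
        outputs.append(generate_reads_for(record))
        gc.collect()  # reclaim cyclic garbage left by the previous record
    return outputs
```

Combining per-chromosome outputs afterwards would then just be concatenating the FASTQ files (e.g. `cat part_*.fastq > all.fastq`).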
Hi! Did you try the latest release? The new 2.0.0 version has a complete rework of the multiprocessing pipeline, which includes a memory-leak fix.
That sounds great! I haven't tried the latest release yet, but I'll definitely give it a go!
Hello!
I've been using InSilicoSeq to generate mock communities for a project of my own to assess assembly quality (https://github.com/cimendes/LMAS).
To match the distribution of a real community, I've computed an abundance file to use with InSilicoSeq. Unfortunately, when using this option, I hit the following issue:
UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
The iss execution never progresses. I've tried running it on a compute node with 250 GB of available memory and the issue still persists. Any assistance is very much appreciated.
The command that I'm running:
iss generate --genomes ZymoBIOMICS_genomes.fasta --output LMS --abundance_file Zymos\ mock\ Log\ Samples\ Abundance\ -\ Abundance\ file\ LOG.tsv --cpus 40 -n 95665106 --model miseq
The abundance file passed is available here. The complete genomes are available here
Thank you very much for your assistance!