How to run this tool? #13

Open
jolespin opened this issue Aug 11, 2021 · 9 comments

@jolespin

jolespin commented Aug 11, 2021

I'm working on an institute-wide pipeline for JCVI and had some trouble running your tool.

Here's my version installed via pip:

 viral_verify --version
viral_verify, version 0.1.1

Here's my command:

viral_verify -i veba_output/binning/47-Drifterexpttime4punches_S40/tmp/unbinned.fasta -o veba_output/binning/47-Drifterexpttime4punches_S40/intermediate/viral_viralverify_output -H /usr/local/scratch/CORE/jespinoz/db/pfam/v33.1/Pfam-A.hmm -t 16

Edit: I had to decompress the PFAM database which was the error in the original post that I've edited since then.
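
For anyone hitting the same problem, here is a minimal sketch of a pre-flight check that decompresses a gzipped HMM file before handing it to the tool. It assumes the database was downloaded as a gzip archive (for example `Pfam-A.hmm.gz`); the helper names `is_gzipped` and `ensure_decompressed` are made up for illustration and are not part of viral_verify or viralVerify.

```python
import gzip
import shutil
from pathlib import Path

# First two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def ensure_decompressed(path):
    """Return a path to an uncompressed copy of `path` (hypothetical helper).

    If the file is not gzipped, the original path is returned unchanged.
    """
    path = Path(path)
    if not is_gzipped(path):
        return path
    # Drop a trailing ".gz" if present, otherwise append a distinct suffix.
    if path.suffix == ".gz":
        target = path.with_suffix("")
    else:
        target = path.with_name(path.name + ".decompressed")
    with gzip.open(path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

The returned path is what would then be passed to the `-H` option in the command above.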

Should I be using the PFAM database or the database from FigShare?

Can you update the Usage on your GitHub?

This is the results output:

veba_output/binning/47-Drifterexpttime4punches_S40/intermediate/viral_viralverify_output/
├── classified-fasta-output
│   ├── unbinned-chromosome.fasta
│   └── unbinned-unclassified.fasta
├── unbinned-circularized.fasta
├── unbinned-genes.fa
├── unbinned-hmmsearch.domtblout
├── unbinned-hmmsearch.output
├── unbinned-proteins-circularized.fa
├── unbinned-proteins.fa
└── unbinned-results.csv

I ran the version from GitHub on a different dataset and got the following output:

testing/viralverify_output/
├── oral_viruses_domtblout
├── oral_viruses_feature_table.txt
├── oral_viruses_genes.fa
├── oral_viruses_input_with_circ.fasta
├── oral_viruses_out_pfam
├── oral_viruses_prodigal.log
├── oral_viruses_proteins_circ.fa
├── oral_viruses_proteins.fa
├── oral_viruses_result_table.csv
├── Prediction_results_fasta
│   ├── oral_viruses_chromosome.fasta
│   ├── oral_viruses_plasmid.fasta
│   ├── oral_viruses_plasmid_uncertain.fasta
│   ├── oral_viruses_virus.fasta
│   └── oral_viruses_virus_uncertain.fasta
└── viralverify.log

1 directory, 15 files

How come the output is so different between the pip and GitHub versions?

@mikeraiko
Collaborator

That's funny. Someone else forked this repo about a year ago, refactored it, and submitted it to PyPI as viral_verify (https://github.com/peterk87/viral_verify). That's why the output is so different. Thanks for pointing that out!

Meanwhile, our current GitHub version is awaiting approval for the bioconda channel. As soon as that happens, I'll update accordingly.
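
Since both implementations can end up installed side by side, a pipeline may want to check which one produced a given output directory. The sketch below guesses that purely from the file and folder names shown in the two listings earlier in this thread. The function name `guess_implementation` and the choice of marker names are assumptions for illustration, and other versions of either tool may use different names.

```python
from pathlib import Path

# Marker names copied from the two directory listings in this thread.
# They are assumptions about these specific versions, not a documented contract.
GITHUB_MARKERS = {"viralverify.log", "Prediction_results_fasta"}
PIP_FORK_MARKERS = {"classified-fasta-output"}


def guess_implementation(output_dir):
    """Guess which tool wrote `output_dir` from its top-level entry names."""
    names = {entry.name for entry in Path(output_dir).iterdir()}
    if names & GITHUB_MARKERS:
        return "github"
    if names & PIP_FORK_MARKERS:
        return "pip-fork"
    return "unknown"
```

A pipeline could call this on the output directory and then pick the matching results file, for example a file ending in `_result_table.csv` for the GitHub layout or `-results.csv` for the pip fork layout.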

@jolespin
Author

That is so weird. They took the namespace too?

What's the process like for getting something on bioconda?

@mikeraiko
Collaborator

That's open source, after all...
The Bioconda submission turned out to be pretty straightforward. Create a recipe (meta.yaml and build.sh files) with all the metadata and dependencies, test it, and commit it to the bioconda-recipes repository: https://bioconda.github.io/contributor/workflow.html
Then, after all the CI tests pass, it needs to be reviewed by one of the Bioconda members. No idea how long that takes :)
github.com/bioconda/bioconda-recipes/pull/30186

@AndAvia

AndAvia commented Jul 23, 2024

Hi there, in the end which database did you use, and which one gives the more accurate annotations?

@jolespin
Author

I use geNomad now.

@AndAvia

AndAvia commented Jul 23, 2024

I use geNomad now.

All right, thanks.

@jolespin
Author

jolespin commented Jul 23, 2024

Apologies @AndAvia, I wrote that from my phone and should have given more context. Here's the geNomad publication: https://www.nature.com/articles/s41587-023-01953-y and here's the GitHub: https://github.com/apcamargo/genomad

I developed a wrapper around geNomad for the "binning-viral" module in my VEBA package (though it doesn't really bin; it identifies contigs that are viral). Here's the publication for VEBA (https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkae528/7697622) and here's the GitHub (https://github.com/jolespin/veba).

If you only want to perform viral analysis, I would recommend just using geNomad, because VEBA has a lot of functionality in other modules (e.g., assembly with SPAdes, rnaSPAdes, or Flye, eukaryotic binning/gene modeling, etc.) and requires more dependencies and databases.

@AndAvia

AndAvia commented Jul 24, 2024

@jolespin Thank you so much for your patience in replying! I only need to do virus identification at the moment. Because there are so many virus identification tools, I'm going to use geNomad, VIBRANT, VirFinder, DeepVirFinder, VirSorter, VirSorter2, and viralVerify. But you said that the two viralVerify databases give different results, and I don't know which database to choose. I had a chance to look at your VEBA, and I found it very impressive! Wishing you a wonderful day!

@Dmitry-Antipov
Contributor

Hi,
Since this tool was released in 2021 and hasn't received significant updates since, I'd also recommend checking newer alternatives. If you still want to run viralVerify specifically, I'd definitely retrain the database with an updated Pfam-A and current GenBank viral, plasmid, and chromosomal sequences.
