what configurations do I need to change? #40

mfazel · 2022-09-26T15:23:22Z

Hi,

I'm trying Pore-C data analysis and have a couple of questions that are probably trivial to you, but I could not figure them out from the README documentation here.
I installed Pore-C-Snakemake, followed the documentation, and ran the test, which finished successfully. Now my question is: how do I run it on my own data? Precisely, what configurations (files, paths, etc.) do I have to change?
Also, is it possible to use, for example, hg19 instead of the hg38 used in the example test, and also a different enzyme? What changes do I have to make? (I modified basecalls.tsv for my data, but it failed.) I'm guessing there are more changes to make, but I'm not sure where.

Thanks

wenluo711 · 2022-09-29T18:31:21Z

I guess you'll have to add an hg19 reference file to the config/references.tsv file and modify basecalls.tsv accordingly?

Oksanak22 · 2022-10-01T02:05:23Z

Hello,
I have also had trouble running the Pore-C pipeline.
What else do I have to install to be able to run at least "pore_c refgenome virtual-digest"?

Thank you.

eharr · 2022-10-02T14:39:11Z

@mfazel all of the configuration you need is in the files referenced in the README. There are comments in the header of each file that describe what each field means. Let me know if you have any questions.

*  `config/config.yaml` - A yaml file containing settings for the pipeline. Input data is specified in the following tab-delimited files.
*  `config/basecalls.tsv` - Metadata and locations of the Pore-C sequencing run fastqs.
*  `config/references.tsv` - Locations of the draft/scaffold/reference assemblies that the pore-c reads will be mapped to.
*  `config/phased_vcfs.tsv` - [Optional] The location of phased VCF files that can be used to haplotag Pore-C reads.
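
Since a wrong or stale path in one of these tab-delimited files is an easy mistake to make when switching to your own data, a quick check before launching the pipeline may help. The following is only a rough sketch, not part of the pipeline: the function name `missing_paths` is made up for illustration, and it assumes that comment lines start with `#`, that the first non-comment line is the column header, and that columns holding file locations have names ending in `path`. Check the header comments in your own files and adjust accordingly.

```python
import csv
import os
import sys


def missing_paths(tsv_file, path_suffix="path"):
    """Return (row_number, column, value) for path-like cells that do not exist on disk.

    Assumptions (not taken from the pipeline itself): comment lines start
    with '#', and columns that hold file locations have names ending in
    `path_suffix`.
    """
    with open(tsv_file, newline="") as handle:
        # Drop blank lines and '#' comment lines before parsing the table.
        lines = [line for line in handle if line.strip() and not line.startswith("#")]

    problems = []
    reader = csv.DictReader(lines, delimiter="\t")
    for row_number, row in enumerate(reader, start=1):
        for column, value in row.items():
            # Skip overflow cells (column is None) and empty cells.
            if column and column.endswith(path_suffix) and value:
                if not os.path.exists(value):
                    problems.append((row_number, column, value))
    return problems


if __name__ == "__main__":
    # Pass the config TSV files to check as command-line arguments.
    for name in sys.argv[1:]:
        for row_number, column, value in missing_paths(name):
            print(f"{name}: data row {row_number}, column {column!r}: {value} not found")
```

Relative paths are resolved against the directory you run the script from, so run it from the same directory you launch the pipeline in.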

eharr · 2022-10-02T14:42:27Z

@Oksanak22 if you're having an issue installing the pipeline would you mind opening a separate issue with the error log?

mfazel · 2023-02-13T16:11:38Z

Hi Eoghan,
I have a couple of questions about Pore-C-Snakemake and hope they are not too basic; I'm using this pipeline for the first time.
I cloned the repo and it ran smoothly, without errors, on the toy example from chr21 and chr22 described on GitHub.
Now I have some real whole-genome Pore-C sequencing data and want to run the pipeline. I have to say, some parts of the GitHub README.md are not clear (at least to a first-time user) or not comprehensive.

I modified the four files in the config folder as mentioned on GitHub and ran the pipeline with a few of my files. After it finished, some parts of the results were not the same as in the test example, i.e. the Juicebox, assembly (visualization formats), matrix, and pairs folders are missing.
Also, there are a couple of files in the .test/resources folder whose role in the pipeline is not clear to me, nor how they should be generated if they are necessary (read_ids.txt, GM12878.conf.bed, GM12878_NlaIII.sequencing_summary.txt).
I removed these files after the test run, but I'm not sure whether this caused the missing folders in the results.
Also, I did not understand what the fast5 folder in the resources folder is for (or where it comes from) when I run the pipeline on my own data.

Thanks,
M.
