My final thesis project work from my Biomedical Genomics MSc undertaken at NUIG, 2021. Includes Nextflow, Snakemake, Bash, Python and R scripts.

gavinf97/thesisreal

Thesis Github: Gavin Farrell

Updated: 3/9/21

Status: being updated and cleaned -> GitHub Pages and a GitHub Wiki will be set up for ease of access

-> Contact [email protected] for any specific files or details before then

Certain aspects of the project are also being broken down into their own repositories for ease of access,
e.g. Nextflow primary processing pipelines: https://github.com/gavinf97/Nextflow_AntiSMASH

Title: Is There a Loss of BGC Diversity in the Colorectal Microbiome

Description

Given the known dysbiosis of the gut microbiome in colorectal cancer patients compared to normal controls, this thesis aimed to determine whether there was a further correlation with the number and types of biosynthetic gene clusters (BGCs) present in the microbes of the gut microbiome between colorectal cancer patients and normal controls. The data used were 100 bp paired-end short reads. Reads were assembled into contigs with Megahit and MetaSpades, and searched for BGCs using AntiSMASH and Biosynthetic MetaSpades. Further analysis and data interpretation were done using Bash, Python and R scripts.

0. Additional files

0.1. Sample Accessions and Status

Available on ENA at: https://www.ebi.ac.uk/ena/browser/view/PRJEB12449?show=reads
Accession: PRJEB12449

Requires updating, as a mix of a Conda environment and a Singularity (Docker-based) container was used for the pipelines

0.3. List of tools used (unavailable)


1. Primary Processing Pipelines:

Snakemake: Test Pipeline (available)

Test pipeline used for the initial decision on which pipeline development tool to use. Discontinued early in the project, as Nextflow's functionality was better suited to the project's demands than Snakemake's.

Nextflow: Megahit Pipeline (available)

This pipeline assembled contigs using Megahit, with input samples downloaded directly from ENA by accession.
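As an illustration of how run-level FASTQ links for a study accession can be listed before download, here is a minimal Python sketch. It is not the code used in the pipeline: it assumes the ENA Portal API `filereport` endpoint with the `run_accession` and `fastq_ftp` fields, and the function names `filereport_url` and `parse_filereport` are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode

# Assumption: ENA Portal API file report endpoint.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """Build a URL that lists the FASTQ files of every run in a study."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Map each run accession to its FASTQ links (two for paired-end runs)."""
    runs = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Multiple files for one run are separated by ';' in the report.
        runs[row["run_accession"]] = [u for u in row["fastq_ftp"].split(";") if u]
    return runs


if __name__ == "__main__":
    print(filereport_url("PRJEB12449"))
```

The returned table can be fetched with any HTTP client and passed to `parse_filereport`; each run's links can then be handed to a download step.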

Nextflow: MetaSpades Pipeline (available)

Later-stage pipelines. Switched from direct ENA downloads to using the nf-core pipeline 'Fetch NGS' to download the data first. Biosynthetic MetaSpades was then used to form contigs and extend them into larger scaffolds, which are better for retrieving large BGCs.

nf-core Fetch NGS:
https://nf-co.re/fetchngs

(.jpg files display the tools used in the pipelines and methods undertaken to process the data)

2. Bash - File Management (available)

Bash scripts were used to take the output BGC data from the Nextflow Megahit and MetaSpades pipelines and perform QC and merging steps.

3. Python - File Parsing (available)

Merged BGC data was parsed using Python into basic tsv/csv files for use in R.
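A minimal sketch of this kind of parsing step follows. It is not the thesis script: it assumes a hypothetical merged input in which each line holds a sample ID and a BGC type separated by a tab, and the function names `count_bgcs` and `to_tsv` are illustrative only.

```python
import csv
import io
from collections import Counter


def count_bgcs(lines):
    """Count BGCs per (sample, bgc_type) from 'sample<TAB>bgc_type' lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        sample, bgc_type = line.split("\t")[:2]
        counts[(sample, bgc_type)] += 1
    return counts


def to_tsv(counts):
    """Render the counts as a tsv table with a header row, sorted by key."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "bgc_type", "count"])
    for (sample, bgc_type), n in sorted(counts.items()):
        writer.writerow([sample, bgc_type, n])
    return out.getvalue()


if __name__ == "__main__":
    # Made-up records for illustration only.
    merged = ["S1\tNRPS", "S1\tNRPS", "S2\tRiPP"]
    print(to_tsv(count_bgcs(merged)), end="")
```

A long-format table like this (one row per sample and BGC type) is straightforward to load in R with `read.delim`.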

4. R - Graphing Data (available)

R scripts took the basic Python-parsed data files and cleaned up the csv/tsv files. The data was then normalised and graphed for interpretation, to determine whether gut microbiome BGCs correlate with colorectal cancer.
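The normalisation method is not specified here. As one possible illustration, written in Python rather than R, the sketch below converts per-sample BGC counts to proportions so that samples with different totals can be compared. Both the choice of proportion normalisation and the function name `normalise_per_sample` are assumptions, not the thesis method.

```python
def normalise_per_sample(counts):
    """Convert {(sample, bgc_type): count} into per-sample proportions.

    Assumption: each count is divided by the total BGC count of its sample.
    """
    totals = {}
    for (sample, _bgc_type), n in counts.items():
        totals[sample] = totals.get(sample, 0) + n
    # Samples with a zero total are dropped to avoid division by zero.
    return {
        key: n / totals[key[0]]
        for key, n in counts.items()
        if totals[key[0]] > 0
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    example = {("S1", "NRPS"): 3, ("S1", "RiPP"): 1, ("S2", "NRPS"): 2}
    for key, value in sorted(normalise_per_sample(example).items()):
        print(key, value)
```

Other schemes, such as scaling by sequencing depth or assembly size, would follow the same shape with a different denominator.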
