PROFUNGIS post processing uploads ZOTUs sample fasta files into a reference ZOTU table

imartorelli/PROFUNGIS_post_processing

PROFUNGIS_post_processing

PROFUNGIS post processing uploads ZOTUs sample fasta files into a reference ZOTU table

The following scripts post-process ZOTU sequence data generated by the PROFUNGIS pipeline.

Script 1: generate_zotu_ref1.py. Generates a reference file from a FASTA file and appends attributes, so that the result can be used as a template for uploading to the ZOTU entity of MDDB.

Script 2: update_ref_map.py. Updates the ZOTU reference file with new ZOTUs and keeps track of ZOTUs that are already present in the reference.

Please note that PROFUNGIS must be run before these post-processing scripts can be used: the FASTA files that PROFUNGIS outputs are the inputs for updating the MDDB tables refseq ZOTU and contains (the ZOTU - sample mapping).

How it works

generate_zotu_ref1.py

The script takes two arguments from the user: a FASTA file, and a flag ('Y'/'N') stating whether primary keys should be generated.

If the second argument is 'N', a CSV file is generated with the column 'seq_id', taken from the original FASTA sequence label, followed by the column 'sequence', which contains the sequence itself (the ITS2 marker in this case).

If the second argument is 'Y', a CSV file is generated that includes a 'refseq_pk' column, whose values are built with a specific suffix that refers to the database table holding the ZOTU reference set.

A relationship table is also created, which stores the following mapping: SRR name | Zotu label | Primary key assigned to the Zotu label
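The steps above can be sketched as follows. This is a minimal illustration, not the actual code of generate_zotu_ref1.py: the function names, the small stand-in FASTA reader (the real script lists Bio as a requirement) and the primary key format `ZOTU0000001` are all assumptions made for the example, since the real key format is not documented here.

```python
import csv
import io


def read_fasta(handle):
    """Yield (label, sequence) pairs from a FASTA text handle.

    Stand-in reader so the sketch has no third-party dependency.
    """
    label, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        elif label is not None:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def build_reference_rows(records, with_pk, pk_tag="ZOTU"):
    """Build one row per ZOTU; add 'refseq_pk' only when with_pk is True.

    The key format (tag + zero-padded counter) is hypothetical.
    """
    rows = []
    for n, (label, seq) in enumerate(records, start=1):
        row = {"seq_id": label, "sequence": seq}
        if with_pk:
            row = {"refseq_pk": f"{pk_tag}{n:07d}", **row}
        rows.append(row)
    return rows


def build_mapping_rows(srr_name, keyed_rows):
    """Relationship table rows: SRR name | Zotu label | primary key."""
    return [(srr_name, r["seq_id"], r["refseq_pk"]) for r in keyed_rows]


def write_csv(rows, handle):
    """Write the rows as CSV, using the keys of the first row as header."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

Calling `build_reference_rows(records, with_pk=False)` corresponds to the 'N' case (columns 'seq_id' and 'sequence' only), while `with_pk=True` corresponds to the 'Y' case and makes `build_mapping_rows` usable.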

update_ref_map.py

This script can be run after a reference ZOTU file has been generated (for example by generate_zotu_ref1.py): it updates the reference and traces ZOTU sequences that already exist in it (ZOTUs detected). This version of the script takes one FASTA argument from the user, given as a string. Update: the argument can be either a single FASTA file name (<fasta.fa filename>) or the name of a text file listing several FASTA files (<fasta_list.txt filename>).

As input, a ZOTU reference file is provided; this is usually a dump of the existing MDDB ZOTU reference table. The previous tracker is also provided, so that the script can update the count of reference ZOTU sequences generated so far.

The script also generates a RefZOTU - NewZOTU mapping in the following format:
SRR id | Zotu label | Primary key of ref ZOTU
ZOTUs that are not found in the reference are assigned new primary keys; ZOTUs that match an existing reference ZOTU are mapped to the primary key of that reference ZOTU.
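The update step can be sketched as below. This is an illustration under stated assumptions rather than the script's actual logic: it assumes that a new ZOTU matches a reference ZOTU when the two sequences are identical (case-insensitively), that the tracker is a plain counter of reference ZOTUs, and that keys follow the hypothetical `ZOTU0000001` format. The function name `update_reference` is also made up for the example.

```python
def update_reference(reference, tracker, zotus, srr_id, pk_tag="ZOTU"):
    """Map the ZOTUs of one sample onto the reference set.

    reference: dict mapping sequence -> primary key (updated in place)
    tracker:   number of reference ZOTUs generated so far
    zotus:     iterable of (label, sequence) pairs from the new FASTA file
    Returns (mapping_rows, new_rows, updated_tracker).
    """
    mapping_rows, new_rows = [], []
    for label, seq in zotus:
        key = seq.upper()  # assumption: matching is exact, ignoring case
        pk = reference.get(key)
        if pk is None:
            # Not in the reference yet: assign the next primary key.
            tracker += 1
            pk = f"{pk_tag}{tracker:07d}"
            reference[key] = pk
            new_rows.append((pk, label, key))
        # Row format: SRR id | Zotu label | primary key of the ref ZOTU
        mapping_rows.append((srr_id, label, pk))
    return mapping_rows, new_rows, tracker
```

Because the reference dictionary and the tracker are carried from one call to the next, the function can be applied to one sample after another, which mirrors running the script sequentially on new FASTA files.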

This mapping provides two types of information:

  • which ZOTUs are new,
  • which ZOTUs are shared among different samples.
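As an illustration of the second point, shared ZOTUs can be read off the mapping table by grouping its rows by primary key. The sketch below assumes the rows are available as (SRR id, Zotu label, primary key) tuples; the function names and the sample values in the usage note are hypothetical.

```python
from collections import defaultdict


def samples_per_zotu(mapping_rows):
    """Group sample ids by the primary key of the reference ZOTU."""
    samples = defaultdict(set)
    for srr_id, _label, pk in mapping_rows:
        samples[pk].add(srr_id)
    return samples


def shared_zotus(mapping_rows):
    """Return {primary key: sorted sample ids} for ZOTUs seen in 2+ samples."""
    return {
        pk: sorted(srr_ids)
        for pk, srr_ids in samples_per_zotu(mapping_rows).items()
        if len(srr_ids) > 1
    }
```

For example, if two samples `SRR_A` and `SRR_B` both have a row pointing to the same primary key, that key appears in the result of `shared_zotus` with both sample ids listed.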

How to run the scripts

generate_zotu_ref1.py

generate_zotu_ref1.py <srr_filename.fa>

where <srr_filename.fa> is in FASTA format -- PROFUNGIS names each ZOTU FASTA file after the SRA run accession (SRR) of the sample
(ex: SRR1502226_zotus_final.fa)
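Since the sample identifier is carried by the file name, it can be recovered with a regular expression. The sketch below assumes that all input files follow the `<SRR>_zotus_final.fa` pattern of the example above; the helper name `srr_from_filename` is hypothetical and is not part of the scripts.

```python
import re
from pathlib import Path

# Assumed naming pattern, based on the example SRR1502226_zotus_final.fa
SRR_PATTERN = re.compile(r"^(SRR\d+)_zotus_final\.fa$")


def srr_from_filename(path):
    """Return the SRR accession encoded in the file name, or None."""
    match = SRR_PATTERN.match(Path(path).name)
    return match.group(1) if match else None
```

Returning None for file names that do not match lets the caller decide whether to skip the file or stop with an error.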

update_ref_map.py

update_ref_map.py <srr_filename.fa> <RefZOTU.csv> <mapping.csv>

where
	<srr_filename.fa> -- the next (new) processed ZOTU file (FASTA format) generated by PROFUNGIS
	<RefZOTU.csv> -- the ZOTU reference file (unique ZOTUs)
	<mapping.csv> -- the mapping table which keeps track of which ZOTU sequence belongs to which sample
	HINT: for testing you can use the output files generated by generate_zotu_ref1.py
	(ex: update_ref_map.py <newfasta.fa> <refseq_table_pk.csv> <mapping_table_pk_zotu_srr.csv>)

Requirements

generate_zotu_ref1.py

  • csv (Python standard library)
  • pandas
  • re (Python standard library)
  • Bio (Biopython)

update_ref_map.py

  • csv (Python standard library)
  • pandas
  • re (Python standard library)
  • Bio (Biopython)

Outputs

  • generate_zotu_ref1.py -> ZOTU reference file with extended annotation

    outputs:

    mapping_table_pk_zotu_srr.csv -> traces the mapping of the original FASTA label to the assigned PK
    otu_seq_mapping_to_update.csv -> a simple reference ZOTU table generated from the given FASTA
    record_track.csv -> tracker of how many reference ZOTUs have been generated
    refseq_table_pk.csv -> the ZOTU list with extended annotation used as reference

  • update_ref_map.py -> update ZOTU reference table and mappings

    outputs:

    mapping_table_pk_zotu_srr.csv -> updates the mapping table of the original FASTA label to the assigned PK, or to a newly generated PK if the ZOTU was not found
    otu_seq_mapping_to_update.csv -> provides the table format of the new ZOTUs coming in for update
    record_track.csv -> updates the tracker of how many reference ZOTUs have been generated from the new FASTA
    refseq_table_pk.csv -> the updated ZOTU list, with a new PK generated for each new ZOTU detected
