The SG-NEx data source contains long-read (Oxford Nanopore) RNA sequencing data for commonly used cell lines. The data are hosted by AWS on S3 and can be accessed using direct links or the AWS CLI.
The SG-NEx S3 bucket contains the following types of data:
- Raw sequencing signal (fast5)
- Basecalled sequences (fastq)
- Aligned sequences (bam)
- Data visualisation tracks (bigwig/bigbed)
- Annotations
- Processed data for RNA modification detection
- Sample and experiment information
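Every command on this page points at one of a small number of fixed prefixes in the two buckets. For convenience, those prefixes can be kept in a single lookup function. The sketch below uses a hypothetical function name, `sgnex_prefix`, and our own short labels (`fast5`, `bam_genome`, and so on), which are not part of the bucket layout. Only the prefixes themselves are copied from the commands documented here.

```shell
# Hypothetical helper: map a short data-type label (our own naming) to the
# S3 prefix used in the aws commands on this page.
sgnex_prefix() {
    base="s3://sg-nex-data"
    ont="$base/data/sequencing_data_ont"
    case "$1" in
        fast5)               echo "$ont/fast5/" ;;
        fastq)               echo "$ont/fastq/" ;;
        bam_genome)          echo "$ont/bam/genome/" ;;
        bam_transcriptome)   echo "$ont/bam/transcriptome/" ;;
        bigbed)              echo "$ont/genome_browser_data/bigbed/" ;;
        bigwig)              echo "$ont/genome_browser_data/bigwig/" ;;
        genome_fasta)        echo "$base/data/annotations/genome_fasta/" ;;
        transcriptome_fasta) echo "$base/data/annotations/transcriptome_fasta/" ;;
        gtf)                 echo "$base/data/annotations/gtf_file/" ;;
        xpore)               echo "$base/data/processed_data/xpore/" ;;
        m6anet)              echo "$base/data/processed_data/m6Anet/" ;;
        metadata)            echo "$base/metadata/" ;;
        blow5)               echo "s3://sg-nex-data-blow5/" ;;  # separate BLOW5 bucket
        *) echo "unknown data type: $1" >&2; return 1 ;;
    esac
}

# Example: list the samples that have basecalled reads.
# aws s3 ls --no-sign-request "$(sgnex_prefix fastq)"
```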
The SG-NEx S3 BLOW5 bucket contains the following types of data:
- Raw sequencing signal (blow5) with the corresponding index files
Below is the folder index for the open data buckets:
To access raw sequencing (fast5) files:
aws s3 ls --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/fast5/ # list samples
aws s3 sync --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/fast5/sample_name . # download fast5 files to your local directory
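`aws s3 ls` prints each sample folder as a line of the form `PRE sample_name/`, while individual files appear as date, time, size and key. The sketch below, a hypothetical helper named `sgnex_sample_names`, keeps only the folder entries so the sample list can be saved or fed into a loop. It assumes sample names contain no spaces.

```shell
# Hypothetical helper: read `aws s3 ls` output on stdin and print the folder
# (sample) names. Folder lines look like "   PRE sample_name/"; file lines
# (date, time, size, key) are ignored. Assumes names contain no spaces.
sgnex_sample_names() {
    awk '$1 == "PRE" { name = $2; sub(/\/$/, "", name); print name }'
}

# Example: save the list of fast5 samples to a text file.
# aws s3 ls --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/fast5/ \
#     | sgnex_sample_names > fast5_samples.txt
```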
To access raw sequencing (blow5) files:
aws s3 ls --no-sign-request s3://sg-nex-data-blow5/ # list samples
aws s3 sync --no-sign-request s3://sg-nex-data-blow5/sample_name . # download the blow5 file and its index to your local directory
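Raw signal files are large, so it can be worth checking the size of a sample folder before syncing it. `aws s3 ls` accepts `--recursive` and `--summarize`, and the summary ends with a `Total Size:` line, reported in bytes when `--human-readable` is not given. The sketch below uses two hypothetical helpers: `s3_total_bytes` extracts that number, and `has_free_space` compares it with the space that `df` reports for the target directory.

```shell
# Hypothetical helper: read `aws s3 ls --recursive --summarize` output on stdin
# (without --human-readable) and print the "Total Size" value in bytes.
s3_total_bytes() {
    awk -F': *' '/Total Size:/ { print $2 }'
}

# Hypothetical helper: succeed if the filesystem holding directory $2 has at
# least $1 bytes available, as reported by POSIX `df -P -k` (1024-byte blocks).
has_free_space() {
    need_bytes=$1
    avail_kb=$(df -P -k "$2" | awk 'NR == 2 { print $4 }')
    [ $((avail_kb * 1024)) -ge "$need_bytes" ]
}

# Example: check the space needed for one sample before downloading it.
# size=$(aws s3 ls --no-sign-request --recursive --summarize \
#     s3://sg-nex-data-blow5/sample_name/ | s3_total_bytes)
# has_free_space "$size" . &&
#     aws s3 sync --no-sign-request s3://sg-nex-data-blow5/sample_name .
```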
To access basecalled sequencing (fastq) files:
aws s3 ls --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/fastq/ # list samples
aws s3 sync --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/fastq/sample_name . # download fastq files to your local directory
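`aws s3 sync ... sample_name .` writes into the current directory, so downloading several samples one after another mixes their files together. A minimal sketch, assuming the sample names come from the listing above and contain no spaces, is a hypothetical `sgnex_sync_samples` function that gives each sample its own local directory. It works for any of the per-sample prefixes on this page, not only fastq.

```shell
# Hypothetical helper: sync several samples from one S3 prefix, giving each
# sample its own local directory (./<sample>/) so files are not mixed.
# Usage: sgnex_sync_samples PREFIX SAMPLE [SAMPLE...]
sgnex_sync_samples() {
    prefix=$1
    shift
    for sample in "$@"; do
        # Strip a trailing "/" from the prefix before appending the sample name.
        aws s3 sync --no-sign-request "${prefix%/}/$sample" "$sample" || return 1
    done
}

# Example (sample names are placeholders):
# sgnex_sync_samples s3://sg-nex-data/data/sequencing_data_ont/fastq/ sample_a sample_b
```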
We provide both genome- and transcriptome-aligned BAM files:
aws s3 ls --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/bam/genome/ # list samples inside this folder
aws s3 sync --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/bam/genome/sample_name . # download bam files that are aligned to genome
aws s3 ls --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/bam/transcriptome/ # list samples inside this folder
aws s3 sync --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/bam/transcriptome/sample_name . # download bam files that are aligned to transcriptome
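When both alignments of the same sample are needed, syncing them into separate directories keeps the genome- and transcriptome-aligned BAM files apart. The hypothetical helper `sgnex_sync_bam` below writes to `bam/genome/<sample>` and `bam/transcriptome/<sample>`. This local layout is our own choice, not something the bucket prescribes.

```shell
# Hypothetical helper: download the genome- and transcriptome-aligned BAM
# folders of one sample into separate local directories:
#   <dest>/genome/<sample> and <dest>/transcriptome/<sample> (dest defaults to "bam").
sgnex_sync_bam() {
    sample=$1
    dest=${2:-bam}
    for ref in genome transcriptome; do
        aws s3 sync --no-sign-request \
            "s3://sg-nex-data/data/sequencing_data_ont/bam/$ref/$sample" \
            "$dest/$ref/$sample" || return 1
    done
}

# Example: sgnex_sync_bam sample_name
```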
We provide bigbed and bigwig files that can be visualised directly in any genome browser. These files follow the UCSC chromosome naming convention and can be loaded directly into the UCSC Genome Browser.
The files can be accessed and downloaded through S3 as well:
aws s3 ls --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/genome_browser_data/bigbed/ # list all bigbed files
aws s3 ls --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/genome_browser_data/bigwig/ # list all bigwig files
aws s3 cp --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/genome_browser_data/bigbed/sample_name.bigbed . # download the bigbed file for a specific sample
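To load one of these files into the UCSC Genome Browser as a custom track, a `track` line with a `bigDataUrl` pointing at an HTTPS address is typically used. The sketch below assumes the public objects are reachable at the standard virtual-hosted S3 address `https://<bucket>.s3.amazonaws.com/<key>`. That URL form is an assumption, not a link published on this page, so compare it with the direct links before relying on it. Both function names are hypothetical.

```shell
# Hypothetical helper: turn s3://bucket/key into https://bucket.s3.amazonaws.com/key
# (assumed virtual-hosted-style URL for a public object).
s3_to_https() {
    rest=${1#s3://}        # bucket/key
    bucket=${rest%%/*}     # text before the first "/"
    key=${rest#*/}         # text after the first "/"
    printf 'https://%s.s3.amazonaws.com/%s\n' "$bucket" "$key"
}

# Hypothetical helper: print a UCSC custom track line for a bigBed/bigWig file.
ucsc_track_line() {
    uri=$1
    case "$uri" in
        *.bigbed|*.bigBed|*.bb) track_type=bigBed ;;
        *.bigwig|*.bigWig|*.bw) track_type=bigWig ;;
        *) echo "not a bigbed/bigwig file: $uri" >&2; return 1 ;;
    esac
    name=$(basename "$uri")
    printf 'track type=%s name="%s" bigDataUrl=%s\n' \
        "$track_type" "$name" "$(s3_to_https "$uri")"
}

# Example: paste the printed line into the UCSC "Add Custom Tracks" form.
# ucsc_track_line s3://sg-nex-data/data/sequencing_data_ont/genome_browser_data/bigbed/sample_name.bigbed
```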
The genome and transcriptome fasta files and the gtf annotation file that were used to process the data can also be accessed. The latest SG-NEx data release used Ensembl version 91 (see here for links to the original data). Two sets of annotations are provided in the bucket:
- GRCh38 Ensembl annotations (without spike-in RNAs)
- GRCh38 Ensembl + Sequin + SIRV + ERCC spike-in annotations
aws s3 ls --no-sign-request s3://sg-nex-data/data/annotations/genome_fasta/ # list included genome fasta files used for processing the sequencing data
aws s3 sync --no-sign-request s3://sg-nex-data/data/annotations/genome_fasta . # download genome fasta files used for processing the sequencing data
aws s3 ls --no-sign-request s3://sg-nex-data/data/annotations/transcriptome_fasta/ # list included transcriptome fasta files used for processing the sequencing data
aws s3 sync --no-sign-request s3://sg-nex-data/data/annotations/transcriptome_fasta . # download transcriptome fasta files used for processing the sequencing data
aws s3 ls --no-sign-request s3://sg-nex-data/data/annotations/gtf_file/ # list included annotation gtf files used in processing the sequencing data
aws s3 sync --no-sign-request s3://sg-nex-data/data/annotations/gtf_file . # download annotation gtf files used for processing the sequencing data
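Syncing each annotation folder into `.` places all reference files in one directory. A minimal sketch that fetches the three folders into a single local tree instead, one subdirectory per folder, is the hypothetical `sgnex_sync_annotations` function below. The default destination `annotations/` is our own choice.

```shell
# Hypothetical helper: download the genome fasta, transcriptome fasta and gtf
# folders into one local tree: <dest>/genome_fasta, <dest>/transcriptome_fasta
# and <dest>/gtf_file (dest defaults to "annotations").
sgnex_sync_annotations() {
    dest=${1:-annotations}
    for folder in genome_fasta transcriptome_fasta gtf_file; do
        aws s3 sync --no-sign-request \
            "s3://sg-nex-data/data/annotations/$folder" "$dest/$folder" || return 1
    done
}

# Example: sgnex_sync_annotations reference_files
```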
Long-read direct RNA sequencing allows the detection of RNA modifications with dedicated tools such as xPore and m6Anet. To simplify the analysis of RNA modifications using the SG-NEx datasets, you can download processed files ready to use with xPore and m6Anet. To download the processed data for differential RNA modification analysis with xPore:
aws s3 ls --no-sign-request s3://sg-nex-data/data/processed_data/xpore/ # list all samples that have processed data for RNA modification detection using xPore
aws s3 sync --no-sign-request s3://sg-nex-data/data/processed_data/xpore/sample_name . # download the json and index file needed for running xPore
To download the processed data for detection of m6A using m6Anet:
aws s3 ls --no-sign-request s3://sg-nex-data/data/processed_data/m6Anet/ # list all samples that have processed data for RNA modification detection using m6Anet
aws s3 sync --no-sign-request s3://sg-nex-data/data/processed_data/m6Anet/sample_name . # download the json and index file needed for running m6Anet
These files are provided for a subset of samples only; please see here for the list of samples with matched processed data for xPore and m6Anet.
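Because the processed data cover only a subset of samples, a download script can check each tool's listing before syncing. The sketch below, a hypothetical `sgnex_sync_modification_inputs` function, looks for `PRE <sample>/` in the xPore and m6Anet folder listings and syncs only the tools that list the sample, into `processed_data/<tool>/<sample>`. The local layout is our own choice.

```shell
# Hypothetical helper: fetch the xPore and m6Anet input files for one sample.
# Each tool's folder is listed first; the sync runs only if "PRE <sample>/"
# appears in the listing (a failed listing is reported the same way).
# Files go to <dest>/<tool>/<sample> (dest defaults to "processed_data").
sgnex_sync_modification_inputs() {
    sample=$1
    dest=${2:-processed_data}
    for tool in xpore m6Anet; do
        prefix="s3://sg-nex-data/data/processed_data/$tool/"
        if aws s3 ls --no-sign-request "$prefix" | grep -qF "PRE $sample/"; then
            aws s3 sync --no-sign-request "$prefix$sample" "$dest/$tool/$sample" || return 1
        else
            echo "no $tool data listed for $sample" >&2
        fi
    done
}

# Example: sgnex_sync_modification_inputs sample_name
```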
Detailed information for each sequencing sample is provided here. The data also include multiplexed samples that share the same fast5/blow5 files; information about the multiplexed samples can be found here. The metadata files can also be accessed directly on S3:
aws s3 ls --no-sign-request s3://sg-nex-data/metadata/ # list metadata files
aws s3 sync --no-sign-request s3://sg-nex-data/metadata/ . # download the metadata files