guo-cheng/awosome-bioinformatics

 
 

awosome-bioinformatics

Abstract: A curated list of resources for learning bioinformatics. Some of the resources in this repo were collected by the BioInstaller project. You can use BioInstaller to download the source code or database files directly, or fetch the meta information with BioInstaller::get.meta()$item.

Purpose:

  • Provide bioinformatics learning resources for beginners
  • Provide an overview of the bioinformatics field

Field:

  • Next generation sequencing (NGS)
  • Bioinformatics Data Analysis

Table of contents


Resources

General

Journal

Sequencing Technology

This section is mainly copied from enseqlopedia.

Thanks to this work: Hadfield, J. & Retief, J. A profusion of confusion in NGS methods naming. Nat Methods 15, 7-8 (2018).

RNA Sequencing Methods

DNA Sequencing Methods

Tools

Package management

Web Application Development Framework

Web-based Service

  • UCSC
  • NCBI
  • ExPASy
  • EMBL-EBI
  • TCGA
  • COSMIC
    • COSMIC-3D: a comprehensive integration of cancer mutations with protein structure across the human genome and structural proteome, seeking to support the identification and characterization of protein targets for novel drug design in precision oncology
  • St. Jude PeCan Data Portal
  • BIG Data Center
  • DAVID Bioinformatics Resources
  • cBioPortal
  • Oncotator
  • QIAGEN Analysis Platform
  • Wordcloud
  • Omictools
  • iCoMut
  • UniProt
  • Pfam
  • SMART
  • STRING
  • DiseaseEnhancer
  • SEECancer
  • eQTL Browser
  • Cistrome Project
  • VarCards
  • superdrug2
  • MeDReaders
  • ECOdrug
  • rSNPBase3.0
  • MNDR
  • MSDD
  • funcoup
  • proteinatlas
  • DGIdb
  • Drugbank
  • InterPro
  • ncbi-biosystems
  • denovo-db
  • The Human Phenotype Ontology (HPO)
  • FANTOM
  • dbNSFP
  • regSNP-intron
  • RADAR
  • DARNED
  • REDIportal
  • LNCediting
  • EggNOG
  • MiSTIC
  • DTMiner
  • PDBFlex
  • Cancer3d
  • Dsysmap
  • CBS Prediction Servers
  • wANNOVAR: Public web service of ANNOVAR
  • Harmonizome: Search for genes or proteins and their functional terms extracted and organized from over a hundred publicly available resources
  • GDA: A web-based tool that combines the NCI60's uniquely large collection of drug sensitivity data with CCLE and NCI60 gene mutation and expression profiles
  • CLUE: Unravel biology with the world’s largest perturbation-driven gene expression dataset
  • CMAP: The Connectivity Map (also known as cmap) is a collection of genome-wide transcriptional expression data from cultured human cells treated with bioactive small molecules and simple pattern-matching algorithms that together enable the discovery of functional connections between drugs, genes and diseases through the transitory feature of common gene-expression changes.
  • pssmsearch: a web application to discover novel protein motifs (SLiMs, mORFs, miniMotifs) and PTM sites
  • bammmotif: Bayesian Markov Models (BaMMs), a web server for de-novo motif discovery and regulatory sequence analysis
  • LOLAweb: a containerized web server for interactive genomic locus overlap enrichment analysis
  • GeNets: a unified web platform for network-based genomic analyses
  • HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization
  • paintomics: a web resource for the pathway analysis and visualization of multi-omics data
  • kinact: a computational approach for predicting activating missense mutations in protein kinases
  • VAReporter: VAReporter can provide comprehensive annotation by integrating a wide variety of biomedical databases
  • SNPnexus: SNPnexus was designed to simplify and assist in the selection of functionally relevant Single Nucleotide Polymorphisms (SNP) for large-scale genotyping studies of multifactorial disorders
  • Oncoscape: an online open-access data analysis and visualization platform that empowers researchers and clinicians to discover novel patterns and relationships between linked clinical and molecular data
  • cellmarker: a manually curated resource of cell markers in human and mouse
  • awesome: a database of SNPs that affect protein post-translational modifications
  • hmdb: an online database of small molecule metabolites found in the human body, which facilitates human metabolomics research including the identification and characterization of human metabolites using NMR and MS
  • redoxdb: a curated database of protein oxidative modification
  • instruct: a database of 3D protein interactome networks with structural resolution
  • consensuspathdb: integrates interaction networks in Homo sapiens including binary and complex protein-protein, genetic, metabolic, signaling, gene regulatory and drug-target interactions, as well as biochemical pathways
  • phosphonetworks: a database for experimentally determined kinase-substrate relationships
  • dbsno: protein S-nitrosylation (SNO) is a reversible post-translational modification (PTM) and involves the covalent attachment of nitric oxide (NO) to the thiol group of cysteine (Cys) residues. Given the increasing number of proteins reported to be regulated by this modification, S-nitrosylation is considered to act, in a manner analogous to phosphorylation, as a pleiotropic regulator that elicits dual effects to regulate diverse pathophysiological processes by altering protein function, stability, and conformation change in various cancers and human disorders
  • hpdi: Human Protein-DNA Interactome (hPDI)
  • islandviewer: an integrated interface for computational identification and visualization of genomic islands
  • appris: a system that deploys a range of computational methods to provide annotations of alternative splice isoforms and identify principal isoforms for vertebrate species
  • rbpdb: a collection of RNA-binding proteins linked to a curated database of published observations of RNA binding
  • type2diabetesgenetics: providing data and tools to promote understanding and treatment of type 2 diabetes and its complications
Clinical Annotation
  • CIViC
  • DoCM
  • ClinVar
  • Intogen
  • Cancer Hotspots
  • DisGeNET
  • Cancer Biomarkers database
  • OncoKB: Precision Oncology Knowledge Base
  • LncRNADisease: both a resource that curates experimentally supported lncRNA-disease association data and a platform that integrates tools for predicting novel lncRNA-disease associations
  • fusiongdb: Fusion Gene annotation DataBase, which collects 48,117 fusion genes (FGs) across pan-cancer from three representative fusion gene resources: the improved database of chimeric transcripts and RNA-seq data (ChiTaRS 3.1), an integrative resource for cancer-associated transcript fusions (TumorFusions), and The Cancer Genome Atlas (TCGA) fusions by Gao et al.
  • sedb: the comprehensive human Super-Enhancer database.
  • pmkb: the cancer precision medicine knowledge base for structured clinical-grade mutations and interpretations
  • ewasdb: epigenome-wide association study database
  • dcdb: the Drug Combination Database (DCDB). Accumulating scientific and clinical evidence suggests that drug combinations are a safe and effective approach to treating complicated and refractory diseases. DCDB is devoted to the research and development of multi-component drugs. The current version of DCDB collects 1363 drug combinations (330 approved and 1033 investigational, including 237 unsuccessful usages), involving 904 individual drugs and 805 targets
Noncoding RNA Related Database
  • CSCD
  • AtCircDB
  • CircNet
  • circBase
  • circRNADb
  • exoRBase
  • EVLncRNAs
  • NONCODE: an integrated knowledge database dedicated to non-coding RNAs (excluding tRNAs and rRNAs)
  • MiTranscriptome: a catalog of human long poly-adenylated RNA transcripts derived from computational analysis of high-throughput RNA sequencing (RNA-Seq) data from over 6,500 samples spanning diverse cancer and tissue types
  • FANTOM CAT: an atlas of human long non-coding RNAs with accurate 5’ ends
  • lnc2cancer2: an updated database that provides comprehensive experimentally supported associations between lncRNAs and human cancers
  • sm2mir: a manually curated database that collects experimentally validated effects of small molecules on miRNA expression in 20 species from published papers. Each entry contains detailed information about the small molecules, the miRNAs, and their relationships, including species, small molecule name, DrugBank accession number, PubChem CID, FDA approval status, miRNA name, miRBase accession number, expression pattern of the miRNA, experimental detection method, tissues or conditions for detection, evidence in the reference, PubMed ID, and the publication year of the reference
  • oncomirdb: a Database for Oncogenic & Tumor-Suppressive MicroRNAs
  • mircancer: provides a comprehensive collection of microRNA (miRNA) expression profiles in various human cancers, automatically extracted from the published literature in PubMed using text mining techniques. Manual revision is applied after auto-extraction to provide 100% precision
  • lncipedia: a public database for long non-coding RNA (lncRNA) sequence and annotation. The current release contains 127,802 transcripts and 56,946 genes
  • mirnest: an integrative collection of animal, plant and virus microRNA data
  • mirtarbase: the experimentally validated microRNA-target interactions database
  • mirdb: an online resource for microRNA target prediction and functional annotations
eQTL Related Database

Sequencing Data Portal

Local tools

Quality Control
Alignment And Assembly
Variant Detection (SNVs, INDELs, SVs)
  • GATK
  • MuTect
  • lofreq
  • VarScan2
  • freebayes
  • TVC
  • SomaticSniper
  • speedseq
  • FusionCatcher
  • svtoolkit
  • pindel
  • breakdancer
  • delly
  • CNVkit
  • GRIDSS
  • PancanQTL
  • TumorFusions
  • SVScore
  • SVTools
  • RDDpred
  • iseq
  • deepvariant
  • SV2
  • facets
  • MutScan
  • svaba: structural variation and indel detection by local assembly
  • manta: structural variant and indel caller using mapped sequencing data
  • JAFFA: a multi-step pipeline that takes either raw RNA-Seq reads, or pre-assembled transcripts, then searches for gene fusions
  • Picky: structural variants pipeline for long reads
  • CREST: an algorithm for detecting genomic structural variations at base-pair resolution using next-generation sequencing data
  • Control-FREEC: a tool for detection of copy-number changes and allelic imbalances (including LOH) using deep-sequencing data
  • Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs
  • GISTIC2: facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers
  • BreaKmer: A method to identify structural variation from sequencing data in target regions
  • deTiN: DeTiN is designed to measure tumor-in-normal contamination and improve somatic variant detection sensitivity when using a contaminated matched control.
  • vadir: an integrated approach to Variant Detection in RNA
Variant Annotation
Variant Visualization (SNVs, INDELs, SVs)
Variant Screen
Alternative Splicing
Gene Expression Data Analysis
Virus Related
  • viral-ngs
  • qap
  • ROP: discovering the source of all RNA-seq reads, including those originating from repeat sequences, recombinant B and T cell receptors, and microbial communities
  • ViFi: pipeline for identifying viral integration and fusion mRNA reads from NGS data
  • hgtid: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data
Single Cell
  • seurat
  • SCnorm
  • dropClust
  • scran: batch effect adjustment
  • trendsceek: spatial expression trends in single-cell gene expression data
  • scRNA-tools: a database of software tools for the analysis of single-cell RNA-seq data.
  • awesome-single-cell: list of software packages (and the people developing these methods) for single-cell data analysis, including RNA-seq, ATAC-seq, etc.
  • SAVER: SAVER (Single-cell Analysis Via Expression Recovery) implements a regularized regression prediction and empirical Bayes method to recover the true gene expression profile in noisy and sparse single-cell RNA-seq data.
Protein Data Related
Expression Quantitative Trait Loci, eQTL
ChIP-seq analysis
Primer Design
Workflow
Unclassified
Statistical and Visualization
Text editor and IDE
Remote Connection (SSH)
Remote Connection (Desktop)
Other

Books&Tutorial

R

Linux&Shell

Python

C/C++

Java

Statistics and Deep learning

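│  李航.统计学习方法.pdf (Li Hang, Statistical Learning Methods)
│  机器学习及其应用.pdf (Machine Learning and Its Applications)
│  All of Statistics - A Concise Course in Statistical Inference - Larry Wasserman - Springer.pdf
│  Machine Learning - Tom Mitchell.pdf
│  PRML.pdf
│  PRML读书会合集打印版.pdf (PRML reading group collection, print edition)
│  Programming Collective Intelligence.pdf
│  [奥莱理] Machine Learning for Hackers.pdf ([O'Reilly] Machine Learning for Hackers)
│  [机器学习]Tom.Mitchell.pdf ([Machine Learning] Tom Mitchell)
│  《大数据:互联网大规模数据挖掘与分布式处理》迷你书.pdf ("Big Data: Large-Scale Internet Data Mining and Distributed Processing", mini-book)
│  推荐系统实践.pdf (Recommender Systems in Practice)
│  数据挖掘-实用机器学习技术(中文第二版).pdf (Data Mining: Practical Machine Learning Techniques, Chinese second edition)
│  数据挖掘_概念与技术.pdf (Data Mining: Concepts and Techniques)
│  机器学习-Mitchell-中文-清晰版.pdf (Machine Learning, Mitchell, Chinese, clear version)
│  机器学习导论.pdf (Introduction to Machine Learning)
│  模式分类第二版中文版Duda.pdf(全).pdf (Pattern Classification, second edition, Chinese edition, Duda, complete)
│  深入搜索引擎--海量信息的压缩、索引和查询.pdf (Inside Search Engines: Compressing, Indexing, and Querying Massive Information)
│  矩阵分析.美国 Roger.A.Horn.扫描版.pdf (Matrix Analysis, Roger A. Horn, USA, scanned edition)
│  统计学习基础 数据挖掘、推理与预测.pdf (The Elements of Statistical Learning: Data Mining, Inference, and Prediction)
│  
├─机器学习实战 (Machine Learning in Action)
│      machinelearninginaction.zip
│      机器学习实战 单页.pdf (Machine Learning in Action, single-page layout)
│      机器学习实战.pdf (Machine Learning in Action)
│      
└─论文文集 (Paper collection)
    └─LDA
            LDA-wangyi.pdf
            LDA数学八卦.pdf (LDA Math Gossip)
            text-est.pdf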

Git

Cloud

Bioinformatics

Skills

Programming language

Statistics

Code Management

Organization

Google Summer of Code Registered

  • Open Bioinformatics Foundation: Promoting practice & philosophy of OSS & Open Science in biological research.
  • National Resource for Network Biology (NRNB): The National Resource for Network Biology (NRNB) organizes the development of free, open source software to enable network-based visualization, analysis, and biomedical discovery.
  • INCF: INCF advances data reuse and reproducibility in brain research by coordinating the development of Open, FAIR, and Citable tools and resources for neuroscience.
  • Computational Biology @ University of Nebraska-Lincoln: Our organization develops tools for bioinformatics and computational biology research. Our goal is to further knowledge in health through data visualization and analysis.
  • Biomedical Informatics, Emory University: Big Data for Healthcare and Biomedical Research
  • Ensembl: The Ensembl project maintains and updates databases that annotate a wide number of genome sequences and distributes them freely to the worldwide research community.
  • R project for statistical computing: R provides a wide variety of statistical and graphical techniques, and is highly extensible. R is often the tool of choice for research in statistical methodology.
  • InterMine: InterMine integrates biological data sources and makes it easy to query, visualise, and analyse the data via a graphical user interface or via APIs in Python, R, Perl, and more.
  • NumFOCUS: NumFOCUS supports and promotes world-class, innovative, open source scientific software.
  • PEcAn Project: PEcAn is an integrated ecoinformatics toolbox that consists of a set of scientific workflows that wrap around ecosystem models and manage flow of information in and out of models

Project-based community

  • galaxyproject: Galaxy is an open, web-based platform for accessible, reproducible, and transparent computational biomedical research.
  • bioconda: A channel for the conda package manager specializing in bioinformatics software.
  • biopython: An international association of developers of freely available Python tools for computational molecular biology.
  • samtools: Tools (written in C using htslib) for manipulating next-generation sequencing data.
  • opengene: Open source tools for NGS data analysis.
  • MultiQC: Aggregate results from bioinformatics analyses across many samples into a single report.
  • GATK: GATK4 aims to bring together well-established tools from the GATK and Picard codebases under a streamlined framework, and to enable selected tools to be run in a massively parallel way on local clusters or in the cloud using Apache Spark. It also contains many newly developed tools not present in earlier releases of the toolkit.
  • nextflow: A bioinformatics workflow manager that enables the development of portable and reproducible workflows.
  • spack: A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
  • omicX: Reap the rewards of a biological insight engine.

Communication-based community

Institute or business company

People

Blog

Contributors
