To build this README, run build_readme.R. The talks data is in the CSV file talks_table.csv.
Joe Cheng (RStudio), Shiny Reproducibility
Andy Nicholls (GSK), R Validation Hub (past, current and future state)
Leon Eyrich Jessen (Technical University of Denmark), Artificial neural networks in R with Keras and TensorFlow
Abstract
Workshop: Artificial neural networks in R with Keras and TensorFlow
Link to Workshop Material
Max Kuhn (RStudio), Machine Learning
Carson Sievert (RStudio), plotly
Will Landau (Eli Lilly), Machine learning workflow management with drake
In alphabetical order.
Abhik Seal (Abbvie), Democratizing Natural Language Processing with I2E and R Shiny
Abstract
The primary objective of this presentation is to share insights on democratizing powerful natural language processing tools, such as Linguamatics I2E, together with open-source R and Shiny. The talk will focus on how we can leverage the I2E Python SDK to perform natural language processing and visualize text-mining results with R and Shiny. We will present our R Shiny platform, pharmine, and several use cases we developed for mining biomedical data.
Slides
Aedin Culhane (Dana-Farber Cancer Institute), Multi-modal data integration
Andy Nicholls (GSK), Making Better Decisions
Abstract
In the early phases of clinical development, the future of a compound depends on more than just the result of a hypothesis test on a single endpoint in a single phase 2 study. We think a lot about how design choices affect immediate outcomes. GSK's Quantitative Decision Making (QDM) framework focuses on the question, "How do we design our study in order to increase the chances that it will deliver data that will allow us to decide whether the drug should continue in development, or stop?" The QDM framework has been developed in R and takes advantage of the Biostatistics HPC environment, running thousands of hypothetical scenarios in close to real time. The initiative is changing the way we plan and deliver clinical trials. Thanks to a Shiny front end, statisticians are able to walk clinical teams through key trial design decisions in order to estimate the Probability of Success, a key component of the QDM framework. This presentation will cover the core QDM concepts and present the key communication outputs created to support the process.
Slides
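A Probability of Success of the kind described above is commonly estimated by simulation: draw a plausible true effect, simulate the trial, apply the decision rule, and repeat. The sketch below is a generic illustration under stated assumptions, not GSK's QDM implementation. It assumes a normal prior on the true treatment effect, a two-arm trial with a normally distributed endpoint of known standard deviation, and a hypothetical success rule of a one-sided z-test at roughly the 2.5% level. The function name and all parameter values are illustrative.

```python
import math
import random

def probability_of_success(prior_mean, prior_sd, sigma, n_per_arm,
                           z_crit=1.96, n_sims=10_000, seed=1):
    """Monte Carlo estimate of the chance that a two-arm trial succeeds.

    Illustrative assumptions: the true effect ~ Normal(prior_mean, prior_sd),
    the endpoint has known SD `sigma` in both arms, and "success" means the
    one-sided z-statistic for the difference in means exceeds z_crit.
    """
    rng = random.Random(seed)
    se = sigma * math.sqrt(2.0 / n_per_arm)  # standard error of the difference in means
    successes = 0
    for _ in range(n_sims):
        true_effect = rng.gauss(prior_mean, prior_sd)  # one hypothetical scenario
        observed = rng.gauss(true_effect, se)          # simulated trial estimate
        if observed / se > z_crit:                     # apply the decision rule
            successes += 1
    return successes / n_sims

# Compare candidate sample sizes by their estimated probability of success.
for n in (50, 100, 200):
    print(n, round(probability_of_success(0.3, 0.2, 1.0, n), 3))
```

Simulating the trial's summary estimate from its sampling distribution, rather than individual patients, keeps each scenario cheap, which is what makes running thousands of them interactively feasible.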
Becca Krouse (Rho), Building Open Source Tools for Safety Monitoring: Advancing Research Through Community Collaboration
Abstract
The Interactive Safety Graphics (ISG) workstream of the ASA-DIA Biopharm Safety Working Group is excited to introduce the safetyGraphics package: an interactive framework for evaluating clinical trial safety in R using a flexible data pipeline. Our group seeks to modernize clinical trial safety monitoring by building tools for data exploration and reporting in a highly collaborative open source environment. At present, our team includes clinical and technical representatives from the pharmaceutical industry, academia, and the FDA, and additional contributors are always welcome. The current release of the safetyGraphics R package includes graphics related to drug-induced liver injury. The R package is paired with an in-depth clinical workflow for monitoring liver function created by expert clinicians based on medical literature. safetyGraphics features interactive visualizations built using htmlwidgets, a Shiny application, and the ability to export a fully reproducible instance of the charts with associated source code. To ensure quality and accuracy, the package includes more than 300 unit tests, and it has been vetted through a beta testing process that included feedback from more than 20 clinicians and analysts. The Shiny application can easily be extended to include new charts or applied to other disease areas due to its modular design and generalized charting framework. Several companies have adapted the tool for their own use, leading to interesting discussions and paving the way for enhancements, which demonstrates the power of open source and community collaboration.
Slides
Carson Sievert (RStudio), Reproducible shiny apps with shinymeta
Abstract
Shiny makes it easy to take domain logic from an existing R script and wrap some reactive logic around it to produce an interactive webpage where others can quickly explore different variables, parameter values, models/algorithms, etc. Although the interactivity is great for many reasons, once an interesting result is found, it's more difficult to prove the correctness of the result since: (1) the result can only be (easily) reproduced via the Shiny app and (2) the relevant domain logic which produced the result is obscured by Shiny's reactive logic. The R package shinymeta provides tools for capturing and exporting domain logic for execution outside of a Shiny runtime (so that others can reproduce Shiny-based result(s) from a new R session).
Slides
Chase Clark (University of Illinois), Your Missing Step in Reproducible R Programming: Continuous Deployment
Abstract
The past few years have shown vast improvements in workflows for reproducible and distributable research within the R ecosystem. At satRday Chicago, everyone in the audience said they used R Markdown; however, only one person raised their hand when asked if they could associate their reports back to the code version that generated them. Since continuous integration is quickly becoming commonplace in the R community, continuous deployment (CD) is a logical and easy step to add to your workflow to enhance reproducibility. I will demo associating R Markdown to the code version that produced it and automating the build and release of both executable and cloud-based Shiny apps. Finally, an announcement of the electricShine package for creating Electron-based Shiny apps will highlight the power of using CD with production-level Shiny apps.
Slides
David Cooper (GlaxoSmithKline), Using Machine Learning and Interactive Graphics to Find New Cancer Targets
Abstract
GlaxoSmithKline is searching for new oncology drug targets. We have CRISPR knockout data for many cancer cell lines and many genes. For these same cell lines, we also have genomic data: somatic mutations, copy number variants, and gene expression. We use machine learning (random forests) to find predictive relationships between genomic features and cell line growth under knockout. Then we use GLASSES, a Shiny app, to share the results with biologists. GLASSES lets scientists interactively explore key relationships and discover novel cancer vulnerabilities.
Slides
Doug Kelkhoff (Roche/Genentech), Re-envisioning Clinical Content Delivery in the Open Source World
Abstract
Content delivery in preparation for filing a clinical study report requires robust tooling for quickly and reproducibly compiling analysis of study data. Traditionally, this reproducibility has stemmed from one-time, rigorous validation of a development environment and analytic workflow. More recently, this paradigm has shifted to match modern software development principles, transitioning toward continuous monitoring of software validation and quality. I'll share our developing perspectives on validation and reproducibility, driven by a need to leverage open source tools. This vision leans on open source software such as R and its package ecosystem, publicly maintained containerized environments like the Rocker project, and cross-industry risk assessment via the R Validation Hub. By treating analysis as a software process in the content pipeline that transforms raw data into analytic results, we can take advantage of the continuous deployment workflows prevalent in the software development world to shorten our filing timelines while simultaneously delivering a more reproducible product to our health authority partners.
Slides
Elena Rantou (FDA), Using R for Generic Drug Evaluation and SABE R-package for Assessing Bioequivalence of Topical Dermatological Products
Abstract
Determination of bioequivalence (BE), a crucial part of the evaluation of generic drugs, may depend on clinical endpoint studies, pharmacokinetic (PK) studies of bioavailability, and in vitro tests, among others. Additionally, in reviewing Abbreviated New Drug Applications (ANDAs), FDA reviewers often analyze safety studies and perform various kinds of simulations. A growing, vibrant group of statisticians in the Office of Biostatistics, CDER/FDA, has adopted R both for their routine tasks and to address numerous scientific questions that are received in the form of internal consults. During the past 5 years, we have used R to run power simulations; generate the distribution of certain statistics of interest; assess the similarity of, and cluster, amino-acid sequences, as well as derive the distribution of the molecular weight of such sequences of a certain length; and determine the validity of data sets categorized for genotoxicity. The R package SABE was developed to accompany a new statistical test used to assess BE of topical dermatological products when data for evaluation come from the In Vitro Permeation Test (IVPT) [1]. BE tests consider comparisons between a Test (usually generic) and a Reference (RLD) product under a replicate study design. A function that assesses BE of a Test and a Reference formulation uses a mixed scaled criterion for the PK metrics AUC (Area Under the Curve) and Cmax (maximum concentration).
Slides
Ellis Hughes (Fred Hutch), Validation Framework for Assay Processing Pipelines
Abstract
In this talk I will discuss the steps that have been created for validating internally generated R packages at SCHARP (Statistical Center for HIV/AIDS Research and Prevention) and the lessons learned while creating packages as a team. Housed within Fred Hutch, SCHARP is an instrumental partner in the research and clinical trials surrounding HIV prevention and vaccine development. Part of SCHARP's work involves analyzing experimental biomarkers and endpoints, which change as the experimental question, analysis methods, antigens measured, and assays evolve. Maintaining a validated code base that is rigid in its output format, but flexible enough to cater to a variety of inputs with minimal custom coding, has proven to be important for reproducibility and scalability.
SCHARP has developed several key steps in the creation, validation, and documentation of R packages that take advantage of R's packaging functionality. First, the programming team works with leadership to define specifications and lay out a roadmap of the package at the functional level. Next, statistical programmers work together and approach the task from a software development view. Once the code has been developed, the package is validated according to procedures that comply with 21 CFR Part 11 and leverage software development life cycle (SDLC) methodology. Finally, the package is made available for use across the team on live data. These procedures set up a framework for validating assay processing packages that furthers the ability of Fred Hutch to provide world-class support for our clinical trials.
Slides
Eric Nantz (Eli Lilly), Creating and reviving Shiny apps with {golem}
Abstract
Developing Shiny applications that meet design goals, easily deploy to multiple platforms, and contain easily maintainable components (all while adhering to best practices) is typically a difficult endeavor. Until recently, there has not been a tool addressing the optimal development workflow and construction of Shiny apps. The {golem} package by ThinkR offers an opinionated framework for creating a Shiny app as a package, with {usethis}-like functionality to add a diverse set of capabilities. In this presentation, I will share how {golem} enables a robust standard for Shiny development and how it magically brought a dormant application back to life.
Slides
Jeannine Fisher (Metrum), Leveraging multiple R tools to make effective pediatric dosing decisions
Abstract
R Shiny apps allow for dynamic, interactive, real-time integration of knowledge within a drug-development program to support decision making. Here, an R Shiny app was used to explore the pharmacokinetic and pharmacodynamic effects of different dosing regimens of the anti-IL-17 human mAb Cosentyx® (secukinumab) in pediatric patients. Secukinumab has been studied and approved to treat psoriasis in adult patients. Models that describe the dose-exposure-response relationships in adults (Lee et al., Clin Pharmacol Ther, 2019 and FDA, Medical Reviews BLA 125504, 2015) were used in the mrgsolve simulation package to explore these relationships in pediatric patients. The prior adult knowledge, used in conjunction with the computational infrastructure leveraged through R, the Shiny app, mrgsolve, and Rcpp, allows researchers to explore various dosing regimens in a difficult-to-study patient population. The tools and approaches described here have been routinely used to support regulatory interactions (e.g., PIP) involving pediatric dosing.
Slides
Jessica Franklin (Harvard Medical School), Evaluating the performance of advanced causal inference methods applied to healthcare claims data
Abstract
Cohort studies of treatments developed from healthcare claims often have hundreds of thousands of patients and up to several thousand measured covariates. Therefore, new causal inference methods that combine ideas from machine learning and causal inference may improve analysis of these studies by taking advantage of the wealth of information measured in claims. In order to evaluate the performance of these methods as applied to claims-based studies, we use a combination of real data examples and plasmode simulation, implemented in the R package 'plasmode', which creates realistic simulated datasets based on a real cohort study. In this talk, I will give an overview of our progress so far and what is left to be done.
Slides
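The essence of plasmode simulation is to keep the real covariates and treatment assignments, and to simulate only the outcome from a model with a treatment effect chosen by the analyst, so an estimator's bias can be measured against a known truth. The sketch below is a simplified plain-Python illustration of that idea, not the API of the R package 'plasmode'. The function name, the tiny cohort, and the outcome-model coefficients are hypothetical; in practice the coefficients would be estimated from the real cohort's outcome.

```python
import math
import random

def plasmode_datasets(cohort, beta0, beta_x, log_or_treat, n_datasets, seed=1):
    """Yield simulated datasets: real (resampled) rows, synthetic binary outcomes.

    cohort:       list of (covariates, treated) pairs from a real study.
    beta0/beta_x: outcome-model intercept and coefficients (hypothetical here;
                  normally fitted to the real data).
    log_or_treat: treatment effect (log odds ratio) injected as the known truth.
    """
    rng = random.Random(seed)
    for _ in range(n_datasets):
        # Resample patients with replacement, preserving real covariate structure.
        rows = [rng.choice(cohort) for _ in range(len(cohort))]
        data = []
        for x, treated in rows:
            lin = beta0 + sum(b * xi for b, xi in zip(beta_x, x)) + log_or_treat * treated
            p = 1.0 / (1.0 + math.exp(-lin))   # logistic outcome model
            y = 1 if rng.random() < p else 0   # simulated outcome
            data.append((x, treated, y))
        yield data

# Hypothetical toy cohort: two binary covariates and a treatment flag per patient.
cohort = [((1.0, 0.0), 1), ((0.0, 1.0), 0), ((1.0, 1.0), 1), ((0.0, 0.0), 0)]
sims = list(plasmode_datasets(cohort, -1.0, (0.5, 0.3), math.log(2.0), n_datasets=3))
print(len(sims), len(sims[0]))  # 3 4
```

Each causal inference method under evaluation would then be fitted to every simulated dataset, and its estimates compared with the injected log odds ratio.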
Keaven Anderson (Merck), Teaching an old dog new tricks: modernizing gsDesign
Abstract
The gsDesign package for group sequential design is widely used, with more than 30,000 downloads. The package was originally written in 2007, with substantial documentation and RUnit testing created before 2010. A Shiny interface was created in about 2015 to make the package more approachable. Recent efforts have focused on updating the package to use roxygen2, pkgdown, covr/covrpage, and testthat, as well as changing vignettes from Sweave to R Markdown. The learning curve for this modernization will be discussed, as well as usage in a regulated environment.
Slides
Kelly O'Briant (RStudio), Shiny in Production: Building bridges from data science to IT
Abstract
We know that adopting documentation, testing, and version control mechanisms is important for creating a culture of reproducibility in data science. But once you've embraced some basic development best practices, what comes next? What does it take to feel confident that our data products will make it to production? This talk will cover case studies in how I work with R users at various organizations to bridge the gaps that form between development and production. I'll cover reasons why CI/CD tools can enhance reproducibility for R and data science, showcase practical examples like automated testing and push-based application deployment, and point to simple resources for getting started with these tools in a number of different environments.
Slides
Kevin Snyder (FDA), Interactive Visualization of Standardized CDISC-SEND-Formatted Toxicology Study Data Using R Shiny
Abstract
The standardization of nonclinical study data by the Clinical Data Interchange Standards Consortium (CDISC) via the Standard for Exchange of Nonclinical Data (SEND) has created an opportunity for the collaborative development and use of open source software solutions to analyze and visualize toxicology study data. Shiny is an open source R package that facilitates the development of user-friendly, web-based applications. The Pharmaceutical Users Software Exchange (PhUSE) consortium has provided a platform for stakeholders throughout the pharmaceutical industry to collaboratively build and share tools, e.g. R Shiny applications, to enhance the effectiveness and efficiency of drug development. The modeling of standard repeat-dose toxicology study endpoints, e.g. body weights, clinical signs, clinical pathology, histopathology, toxicokinetics, etc., in SEND has created new opportunities for dynamic, interactive visualization of study data above and beyond the static tables and figures typically included in study reports. For example, clinical pathology data from nonclinical toxicology studies can be difficult to digest when presented as group means in data tables, due to the large number of potentially correlated analytes collected across treatment groups, sexes, and potentially multiple timepoints. An R Shiny application has been developed to allow end users to comprehensively examine these datasets, using a variety of analytical and visualization methods, with relative ease. The application is publicly hosted on shinyapps.io, and the source code can be found on the PhUSE GitHub website.
Slides
Leon Eyrich Jessen (Technical University of Denmark), Tidysq for Working with Biological Sequence Data in ML Driven Epitope Prediction in Cancer Immunotherapy
Abstract
We are amidst a data revolution. In just the past 5 years, the cost of sequencing a human genome has gone down approximately 10-fold. Development is moving equally fast in areas such as mass spectrometry and in vitro immunopeptide screening, among others. This facilitates the search for biomarkers, biologics, therapeutics, etc., but also redefines the requirements for storing, accessing and working with data, as well as the skill set of bio data scientists. In this talk I will present tidysq, an R package that aims to extend the tidyverse framework to include (tidy) bio-data-science / bioinformatics. Tidysq will be presented in the context of the current status of ML-driven (neo)epitope prediction within cancer immunotherapy.
Slides
Madeleine S. Gastonguay (Metrum), Prediction of maternal-fetal exposures of CYP450-metabolized drugs using physiologic pharmacokinetic modeling implemented in R and mrgsolve
Abstract
Physiologically based pharmacokinetic (PBPK) models are used extensively in drug development to address a number of problems. However, most PBPK applications have limited knowledge-sharing impact because they are implemented in closed, proprietary software. Much of the physiologic data and knowledge required for these models is publicly available or available in the pre-competitive space. To this end, we've engaged in the development of open science PBPK models, using R as the scaffolding for this work. In particular, our group has developed the mrgsolve R package, which utilizes Rcpp to compile models of systems of ordinary differential equations. One example is the development of a PBPK model to predict maternal/fetal exposures for drugs that are primarily metabolized by liver CYP450 enzymes throughout pregnancy. This model aims to utilize a quantitative understanding of the physiological and biochemical changes that occur throughout pregnancy to inform clinical pharmacology decisions where clinical trials cannot. The model was validated against the observed data of 5 different drugs: midazolam, metoprolol, caffeine, nevirapine, and artemether. A series of local sensitivity analyses followed by parameter optimization further improved model predictions using the mrgsolve and nloptr R packages. The developed maternal-fetal PBPK model, in its flexible open-source implementation, provides a transparent, platform-independent, and reproducible system for model-informed decision support while developing exposure-based dosing recommendations in maternal/fetal patient populations.
Slides
Marianna Foos (Bluebird Bio), Breaking the Speed Limit: How R Gets Faster
Mark Rothe (Sanofi), Shinytized R Markdown: A Potent OTC Alternative to 1,3,7-Trimethylxanthine & Currently Indicated for NDA Document Generation, Among Others
Abstract
Providing a Study Data Reviewer's Guide for Clinical Data to accompany the SDTM datasets, define.xml, and annotated CRF in a submission gives additional information to help the FDA review team. The guide is traditionally authored using MS Word, an entirely manual and labor-intensive process whose inherent shortcomings are often exposed and aggravated during the usually frenzied sponsor submission process. R offers a more efficient solution with greater reproducibility: programmatic document generation facilitated by Shiny and R Markdown. Shiny not only manages R Markdown knitting but also gives the sponsor staff, who are often unfamiliar with R, the ability to quickly leverage R with just a crash course in Markdown. An example of applying Shiny and R Markdown to generate the Study Data Reviewer's Guide for Clinical Data will be presented.
Slides
Max Kuhn (RStudio), This one is not like the others: Applicability Domain methods in R
Abstract
Even though a model prediction can be made, there are times when it should be taken with some skepticism. For example, if a new data point is substantially different from the training set, its predicted value may be suspect. In chemistry, it is not uncommon to create an "applicability domain" model that measures the amount of potential extrapolation from the training set. The applicable package will be used to demonstrate different methods to measure how much a new data point is an extrapolation from the original data (if at all).
Slides
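One simple way to quantify the extrapolation described above is to compare a new point's distance to the training data with the distances observed within the training data itself. The sketch below illustrates that idea using nearest-neighbour distances on standardized predictors, reported as a percentile. It is an assumption-laden illustration, not the applicable package's API or its exact method, and the function names are hypothetical.

```python
import math
import statistics

def fit_domain(train):
    """Standardize training rows and record each row's nearest-neighbour distance."""
    cols = list(zip(*train))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) or 1.0 for c in cols]  # guard against constant columns
    z = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in train]
    ref = [min(math.dist(a, b) for j, b in enumerate(z) if j != i)
           for i, a in enumerate(z)]
    return {"means": means, "sds": sds, "z": z, "ref": sorted(ref)}

def extrapolation_percentile(domain, new_row):
    """Fraction of training nearest-neighbour distances smaller than new_row's.

    Values near 1.0 suggest that a prediction for new_row is an extrapolation.
    """
    zn = [(v - m) / s for v, m, s in zip(new_row, domain["means"], domain["sds"])]
    d = min(math.dist(zn, b) for b in domain["z"])
    below = sum(1 for r in domain["ref"] if r < d)
    return below / len(domain["ref"])

train = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.2], [1.1, 2.1], [0.9, 1.8]]
dom = fit_domain(train)
print(extrapolation_percentile(dom, [1.0, 2.0]))   # 0.0: identical to a training row
print(extrapolation_percentile(dom, [5.0, -3.0]))  # 1.0: far outside the training data
```

Standardizing first keeps a single large-scale predictor from dominating the distance; the percentile makes the score comparable across data sets with different spreads.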
Mercè Crosas (Chief Data Science and Technology Officer, IQSS, Harvard), Opening Remarks - Day 1
Mirjam Trame (Novartis), nlmixr: an R package for population PKPD modeling
Abstract
nlmixr is a free and open source R package for fitting nonlinear pharmacokinetic (PK), pharmacodynamic (PD), joint PK/PD and quantitative systems pharmacology (QSP) mixed-effects models. Currently, nlmixr is capable of fitting both traditional compartmental PK models as well as more complex models implemented using ordinary differential equations (ODEs). It is under intensive development and has succeeded in attracting extensive attention and a willingness to make contributions from the pharmaceutical modeling community. We believe that, over time, it will become a capable, credible alternative to commercial software tools, such as NONMEM, Monolix, and Phoenix NLME.
Slides
Ning Leng (Roche/Genentech), Embrace R in Pharma - building internal R community and establishing fit-for-purpose R pilots
Abstract
R is the dominant language in modern quantitative science; however, it is still not widely used in the pharma industry. In this talk I will share learnings in building an internal R user community in a large global organization, via efforts including cataloging existing work, coordinating R adoption pilots and training, etc. In addition, I will share our experiences and challenges in building a streamlined workflow with an automated writing component to enhance efficiency and reproducibility in a recent health authority interaction, toward our mission of bringing therapies to patients faster.
Slides
Paul Schuette (FDA), Simulations, and Complex Innovative Trial Designs
Paulo Bargo (Janssen), Using RStudio.Cloud to advance R proficiency: a crowdsourcing training experience
Abstract
As the pharmaceutical sector increases its interactions with regulatory agencies using R programming as one key instrument for drug development submissions, we face a dilemma: several members of statistics and statistical programming teams are not currently advanced R programmers. For many years, SAS has been a powerful tool in the data analysis repertoire of pharma statisticians; however, the recent development of automation capabilities such as R Markdown and R/Shiny has created a new avenue to expedite access to consumable information in the form of reports, presentations or interactive graphics that can be produced efficiently and in a standard format for all phases of a drug development or submission process. At Janssen, we aim to improve literacy in R programming and achieve nearly 100% adoption by statistics and statistical programming teams in the coming 2-3 years. To achieve this goal, we are leveraging all types of training formats, from online training, to in-house instructor-led seminars, to one-on-one mentoring. One of the key methods we have been developing is the use of RStudio.Cloud as a platform for internal crowd-led hands-on workshops where statisticians/programmers are taught to solve real on-the-job problems ranging from visualization to automated reports. In this presentation we will discuss our experience creating this program and share lessons learned, mistakes and successes.
Slides
Rena Yang (Roche/Genentech), Collaborating at scale: managing an enterprise analytical computing ecosystem
Abstract
In a large organization, collaboration faces many obstacles. Groups may inadvertently reinvent functionality and expend redundant effort. Siloing may impede aggregation and comparison of results. Analysts may not be aware of potential collaborators. However, a shared computational analysis environment, supported by centrally developed infrastructure and well-defined policies, enables discoverability, facilitates reuse, promotes communication between analysts, and improves comparability of results. We will present how we are pursuing this vision at Genentech.
Slides
Volha Tryputsen (Janssen), From playing in the backyard to designing one: Shiny transforms study designs, data analyses and statistical thinking of oncology in vivo group at Janssen
Abstract
In vivo studies are crucial to the discovery and development of novel drugs and are conducted for proof-of-concept validation, FDA applications and to support clinical trials. Appropriate study design, data analysis and interpretation are essential in providing knowledge about drug efficacy and safety within a living organism. With drug discovery science moving forward at an ever-accelerating rate, data analysis software is not always able to offer an appropriate toolset. In the absence of a proper tool, oncology in vivo scientists at Janssen R&D needed a comprehensive analysis platform to conduct appropriate and efficient analyses of in vivo data to ensure quality and speed of decision-making. The INVIVOLDA Shiny application was developed to fill this gap. INVIVOLDA offers interactive and animated graphics for data exploration and a powerful linear mixed-effects modeling framework for longitudinal data analysis. With implemented decision trees and statistical report generation, it streamlines statistical analyses of in vivo longitudinal data.
INVIVOLDA's success led to more requests for Shiny applications for analysis and design of experiments in the oncology in vivo group. Multiple statistical trainings were subsequently conducted to educate biologists on the statistical methods implemented in the Shiny applications. Once completed, a comprehensive framework of Shiny apps will enhance statistical knowledge and thinking, transform the way experiments are designed and analyzed, and ensure traceable and reproducible research and efficient decision-making in the oncology in vivo group at Janssen.
Slides
Will Landau (Eli Lilly), Machine learning workflow management with drake
Abstract
Machine learning workflows can be difficult to manage. A single round of computation can take several hours to complete, and routine updates to the code and data tend to invalidate hard-earned results. You can enhance the maintainability, hygiene, speed, scale, and reproducibility of such projects with the drake R package. drake resolves the dependency structure of your analysis pipeline, skips tasks that are already up to date, executes the rest with optional distributed computing, and organizes the output so you rarely have to think about data files. This talk demonstrates a deep learning project with drake-powered automation.
Slides
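The core idea behind skipping up-to-date tasks can be illustrated without drake: fingerprint each target's code together with the values of its dependencies, and re-run the target only when that fingerprint changes. The sketch below is a toy, in-memory Python illustration of this idea. It is not drake's implementation or API, and the class, method, and target names are hypothetical.

```python
import hashlib

class Pipeline:
    """Toy make-like pipeline (hypothetical; not drake's API)."""

    def __init__(self):
        self.plan = {}    # target name -> (function, list of dependency names)
        self.cache = {}   # target name -> (fingerprint, stored value)
        self.built = []   # targets actually executed during the last make()

    def target(self, name, func, deps=()):
        self.plan[name] = (func, list(deps))

    @staticmethod
    def _fingerprint(func, dep_values):
        # Fingerprint = the function's bytecode, constants and referenced names,
        # plus the repr of every upstream value it receives.
        code = func.__code__
        h = hashlib.sha256(code.co_code)
        h.update(repr(code.co_consts).encode())
        h.update(repr(code.co_names).encode())
        for value in dep_values:
            h.update(repr(value).encode())
        return h.hexdigest()

    def _build(self, name):
        func, deps = self.plan[name]
        values = [self._build(d) for d in deps]  # bring dependencies up to date first
        fp = self._fingerprint(func, values)
        cached = self.cache.get(name)
        if cached is not None and cached[0] == fp:
            return cached[1]                     # unchanged: skip the work
        value = func(*values)
        self.cache[name] = (fp, value)
        self.built.append(name)
        return value

    def make(self):
        self.built = []
        return {name: self._build(name) for name in self.plan}

def load():
    return [3, 1, 2]

def mean(data):
    return sum(data) / len(data)

p = Pipeline()
p.target("data", load)
p.target("model", mean, deps=["data"])
p.make()
print(p.built)  # ['data', 'model']
p.make()
print(p.built)  # []: everything is up to date
```

Because a downstream fingerprint includes its inputs' values, changing an upstream target's code only triggers downstream rebuilds if that target's output actually changes. A real tool also has to track external files and other sources of change, which this sketch ignores.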
Xiao-Li Meng (Professor of Statistics and Founding Editor in Chief of the Harvard Data Science Review, Harvard), Opening Remarks - Day 2