Skip to content

Commit

Permalink
revise & add multi-data example to readme + add n_levels param to lan…
Browse files Browse the repository at this point in the history
…dscape
  • Loading branch information
corybrunson committed Jan 26, 2024
1 parent 23c5861 commit 36d4250
Show file tree
Hide file tree
Showing 15 changed files with 331 additions and 105 deletions.
17 changes: 15 additions & 2 deletions R/landscape.R
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@
#' @inheritParams ggplot2::layer
#' @inheritParams ggplot2::geom_path
#' @inheritParams persistence
#' @param n_levels The number of levels to compute and plot. If `Inf` (the
#' default), determined to be all levels.
#' @example inst/examples/ex-landscape.R
#' @example inst/examples/ex-persistence-dataset.R
NULL
Expand Down Expand Up @@ -75,7 +77,8 @@ StatLandscape <- ggproto(
diagram = "landscape",
# 'dataset' aesthetic
diameter_max = Inf, radius_max = NULL, dimension_max = 1L,
field_order = 2L, complex = "Rips"
field_order = 2L, complex = "Rips",
n_levels = Inf
) {

# empty case
Expand All @@ -93,7 +96,7 @@ StatLandscape <- ggproto(
pd <- as.matrix(data[, c("start", "end"), drop = FALSE])
pl <- list()
k <- 0L
while (nrow(pd) > 0L) {
while (k < n_levels && nrow(pd) > 0L) {
k <- k + 1L

# identify frontier points
Expand Down Expand Up @@ -158,7 +161,12 @@ stat_landscape <- function(mapping = NULL,
data = NULL,
geom = "landscape",
position = "identity",
diameter_max = Inf, radius_max = NULL,
dimension_max = 1L,
field_order = 2L,
complex = "Rips",
diagram = "landscape",
n_levels = Inf,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE,
Expand All @@ -172,7 +180,12 @@ stat_landscape <- function(mapping = NULL,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
diameter_max = diameter_max, radius_max = radius_max,
dimension_max = dimension_max,
field_order = field_order,
complex = complex,
diagram = diagram,
n_levels = n_levels,
na.rm = na.rm,
...
)
Expand Down
149 changes: 115 additions & 34 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ output: github_document

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r setup, include = FALSE}
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
Expand All @@ -15,67 +15,78 @@ knitr::opts_chunk$set(

# ggtda

[![Travis-CI Build Status](https://travis-ci.org/rrrlw/ggtda.svg?branch=master)](https://travis-ci.org/rrrlw/ggtda)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/rrrlw/ggtda?branch=master&svg=true)](https://ci.appveyor.com/project/rrrlw/ggtda)
[![Coverage Status](https://img.shields.io/codecov/c/github/rrrlw/ggtda/master.svg)](https://codecov.io/github/rrrlw/ggtda?branch=master)

[![Coverage Status](https://img.shields.io/codecov/c/github/tdaverse/ggtda/master.svg)](https://codecov.io/github/tdaverse/ggtda?branch=master)
[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)

[![CRAN version](http://www.r-pkg.org/badges/version/ggtda)](https://CRAN.R-project.org/package=ggtda)
[![CRAN Downloads](http://cranlogs.r-pkg.org/badges/grand-total/ggtda)](https://CRAN.R-project.org/package=ggtda)

## Overview

The **ggtda** package provides **ggplot2** layers for the visualization of persistence data and other summary data arising from topological data analysis.
The **ggtda** package provides **ggplot2** layers for the visualization of constructions and statistics arising from topological data analysis.

## Installation

The development version of **ggtda** can be installed used the **remotes** package:
The development version can be installed used the **remotes** package:

```r
# install from GitHub
remotes::install_github("rrrlw/ggtda", vignettes = TRUE)
remotes::install_github("tdaverse/ggtda", vignettes = TRUE)
```

For an introduction to **ggtda** functionality, read the vignettes:
For an introduction to package functionality, read the vignettes:

```r
# read vignettes
vignette(topic = "intro-ggtda", package = "ggtda")
vignette(topic = "visualize-persistence", package = "ggtda")
vignette(topic = "illustrate-constructions", package = "ggtda")
vignette(topic = "grouped-list-data", package = "ggtda")
```

We aim to submit ggtda to [CRAN](https://CRAN.R-project.org) soon.
We aim to submit to [CRAN](https://CRAN.R-project.org) in Spring 2024!

## Example

**ggtda** visualizes persistence data but also includes stat layers for common TDA constructions. This example illustrates them together. For some artificial data, generate a "noisy circle" and calculate its persistent homology (PH) using [the **ripserr** package](https://cran.r-project.org/package=ripserr), which ports [the `ripser` implementation](https://github.com/Ripser/ripser) into R:

```{r}
# attach {ggtda}
library(ggtda)
```

**ggtda** visualizes persistence data but also includes stat layers for common TDA constructions. This example illustrates them using an artificial point cloud $X$ and its persistent homology (PH) computed with [**ripserr**](https://cran.r-project.org/package=ripserr):

```{r, fig.height=3.5}
# generate a noisy circle
n <- 36; sd <- .2
n <- 36
set.seed(0)
t <- stats::runif(n = n, min = 0, max = 2*pi)
d <- data.frame(
x = cos(t) + stats::rnorm(n = n, mean = 0, sd = sd),
y = sin(t) + stats::rnorm(n = n, mean = 0, sd = sd)
x = cos(t) + stats::rnorm(n = n, mean = 0, sd = .2),
y = sin(t) + stats::rnorm(n = n, mean = 0, sd = .2)
)
# plot the data
ggplot(d, aes(x, y)) + geom_point() + coord_equal() + theme_bw()
# compute the persistent homology
ph <- as.data.frame(ripserr::vietoris_rips(as.matrix(d), dim = 1))
print(head(ph, n = 12))
ph <- transform(ph, dim = as.factor(dimension))
```

Now pick an example proximity at which to recognize features in the persistence data. This choice corresponds to a specific Vietoris complex in the filtration underlying the PH calculation. (This is not a good way to identify features, since it ignores persistence entirely, but it intuitively links back to the geometric construction.)
### Topological constructions

To first illustrate constructions, pick a proximity, or threshold, to consider points in the cloud to be neighbors:

```{r}
# fix a proximity for a Vietoris complex
# choose a proximity threshold
prox <- 2/3
```

Geometrically, the Vietoris complex is constructed based on balls of fixed radius around the data points --- in the 2-dimensional setting, disks (left). The simplicial complex itself consists of a simplex at each subset of points having diameter at most `prox` --- that is, each pair of which are within `prox` of each other.
The homology $H_k(X)$ of a point cloud is uninteresting ($H_0(X) = \lvert X \rvert$ and $H_k(X) = 0$ for $k > 0$). The most basic space of interest to the topological data analyst is the union of a _ball cover_ $B_r(X)$ of $X$---a ball of common radius $r$ around each point. The common radius will be $r =$ `prox / 2`.

The figure below compares the ball cover (left) with the _Vietoris_ (or _Rips_) _complex_ ${VR}_r(X)$ constructed using the same proximity (right).
The complex comprises a simplex at each subset of points having diameter at most `prox`---that is, each pair of which are within `prox` of each other.
A key result in TDA is that the homology of the ball union is "very close" to that of the complex.

```{r, fig.height=3.5}
# attach *ggtda*
library(ggtda)
# visualize disks of fixed radii and the Vietoris complex for this proximity
p_d <- ggplot(d, aes(x = x, y = y)) +
coord_fixed() +
Expand All @@ -94,26 +105,33 @@ gridExtra::grid.arrange(
)
```

We can visualize the persistence data using a barcode (left) and a flat persistence diagram (right). In the barcode plot, the dashed line indicates the cutoff at the proximity `prox` (= `r prox`); in the persistence diagram plot, the fundamental box contains the features that are detectable at this cutoff.
This cover and simplex clearly contain a non-trivial 1-cycle (loop), which makes $H_1(B_r(X)) = H_1({VR}_r(X)) = 1$.
Persistent homology encodes the homology group ranks across the full range $0 \leq r < \infty$, corresponding to the full filtration of simplicial complexes constructed on the point cloud.

### Persistence data

We can visualize the persistence data using a barcode (left) and a persistence diagram (right).
In the barcode plot, the dashed line indicates the cutoff at the proximity `prox`; in the persistence diagram plot, the fundamental box contains the features that are detectable at this cutoff.

```{r, fig.height=3.5}
# visualize the persistence data, indicating cutoffs at this proximity
p_bc <- ggplot(ph,
aes(start = birth, end = death, colour = dim)) +
geom_barcode(linewidth = 1) +
labs(x = "Diameter", y = "Homological features") +
geom_vline(xintercept = prox, color = "darkgoldenrod", linetype = "dashed") +
p_bc <- ggplot(ph, aes(start = birth, end = death)) +
geom_barcode(linewidth = 1, aes(color = dim, linetype = dim)) +
labs(x = "Diameter", y = "Homological features",
color = "Dimension", linetype = "Dimension") +
geom_vline(xintercept = prox, color = "darkgoldenrod", linetype = "dotted") +
theme_barcode()
max_prox <- max(ph$death)
p_pd <- ggplot(ph) +
coord_fixed() +
stat_persistence(aes(start = birth, end = death, colour = dim, shape = dim)) +
geom_abline(slope = 1) +
labs(x = "Birth", y = "Death") +
labs(x = "Birth", y = "Death", color = "Dimension", shape = "Dimension") +
lims(x = c(0, max_prox), y = c(0, max_prox)) +
geom_fundamental_box(t = prox,
color = "darkgoldenrod", fill = "darkgoldenrod",
linetype = "dashed") +
geom_fundamental_box(
t = prox,
fill = "darkgoldenrod", color = "transparent"
) +
theme_persist()
# combine the plots
gridExtra::grid.arrange(
Expand All @@ -122,12 +140,75 @@ gridExtra::grid.arrange(
)
```

The barcode shows red lines of varying persistence for the 0-dimensional features, i.e. the gaps between connected components, and one blue line for the 1-dimensional feature, i.e. the loop. These groups of lines do not overlap, which means that the loop exists only in the persistence domain where all the data points are part of the same connected component. Because our choice of `prox` is between the birth and death of the loop, the simplicial complex above recovers it.
The barcode lines are color- and linetype-coded by feature dimension: the 0-dimensional features, i.e. the gaps between connected components, versus the 1-dimensional feature, i.e. the loop.
These groups of lines do not overlap, which means that the loop exists only in the persistence domain where all the data points are part of the same connected component. Our choice of `prox` is between the birth and death of the loop, which is why the complex above recovers it.

The persistence diagram shows that the loop persists for longer than any of the gaps. This is consistent with the gaps being artifacts of the sampling procedure but the loop being an intrinsic property of the underlying space.

### Composite plots

TDA almost always involves comparisons of topological data between spaces.
To illustrate such a comparison, we construct a larger sample but then examine the persistence of its cumulative subsets:

```{r}
# larger point cloud sampled from a noisy circle
set.seed(0)
n <- 180
t <- stats::runif(n = n, min = 0, max = 2*pi)
d <- data.frame(
x = cos(t) + stats::rnorm(n = n, mean = 0, sd = .2),
y = sin(t) + stats::rnorm(n = n, mean = 0, sd = .2)
)
# list of cumulative point clouds
ns <- c(12, 36, 60, 180)
dl <- lapply(ns, function(n) d[seq(n), ])
```

First we construct a nested data frame containing these subsets and plot their Vietoris complexes.
(We specify the 'ripserr' engine to reduce runtime.)

```{r, fig.height=5.5}
# formatted as grouped data
dg <- do.call(rbind, dl)
dg$n <- rep(ns, vapply(dl, nrow, 0L))
# faceted plots of cumulative simplicial complexes
ggplot(dg, aes(x, y)) +
coord_fixed() +
facet_wrap(facets = vars(n), labeller = label_both) +
stat_simplicial_complex(
diameter = prox, fill = "darkgoldenrod",
engine = "ripserr"
) +
theme_bw() +
theme(legend.position = "none")
```

The Vietoris complexes on these subsets for the fixed proximity are not a filtration; instead they show us how increasing the sample affects the detection of homology at that threshold.
Notice that, while a cycle exists at $n = 36$, the "true" cycle is only detected at $n = 60$.

We can also conveniently plot the persistence diagrams from all four cumulative subsets, this time using a list-column of data sets passed to the `dataset` aesthetic:

```{r, fig.height=5.5}
# nested data frame of samples of different cumulative sizes
ds <- data.frame(n = ns, d = I(dl))
print(ds)
# faceted plot of persistence diagrams
ggplot(ds, aes(dataset = d)) +
coord_fixed() +
facet_wrap(facets = vars(n), labeller = label_both) +
stat_persistence(aes(colour = after_stat(factor(dimension)),
shape = after_stat(factor(dimension)))) +
geom_abline(slope = 1) +
labs(x = "Birth", y = "Death", color = "Dimension", shape = "Dimension") +
lims(x = c(0, max_prox), y = c(0, max_prox)) +
theme_persist()
```

The persistence diagram clearly shows that the loop persists for longer than any of the gaps. This is consistent with the gaps being artifacts of the sampling procedure while the loop being an intrinsic property of the underlying distribution.
The diagrams reveal that a certain sample is necessary to distinguish bona fide features from noise, as only occurs here at $n = 36$.
While the true feature retains about the same persistence (death value less birth value) from diagram to diagram, the persistence of the noise gradually lowers.

## Contribute

To contribute to **ggtda**, you can create issues for any bugs you find or any suggestions you have on the [issues page](https://github.com/rrrlw/ggtda/issues).
To contribute to **ggtda**, you can create issues for any bugs you find or any suggestions you have on the [issues page](https://github.com/tdaverse/ggtda/issues).

If you have a feature in mind you think will be useful for others, you can also [fork this repository](https://help.github.com/en/articles/fork-a-repo) and [create a pull request](https://help.github.com/en/articles/creating-a-pull-request).
Loading

0 comments on commit 36d4250

Please sign in to comment.