update readme; unpublish site
cole-brokamp committed Nov 1, 2023
1 parent 3374f15 commit b0bb4bc
Showing 5 changed files with 293 additions and 265 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -12,3 +12,4 @@ auditor_online_cache.zip
^README_cache$
^.*\.Rproj$
^\.Rproj\.user$
auditor_online_scrape_2023-08-10.rds
1 change: 1 addition & 0 deletions R/zzz.R
@@ -19,3 +19,4 @@ utils::globalVariables("gaz")
utils::globalVariables("f")
utils::globalVariables("high_score")
utils::globalVariables("score")
utils::globalVariables("apt_id")
170 changes: 92 additions & 78 deletions README.Rmd
@@ -5,6 +5,8 @@ output: github_document
<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
library(parcel)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
@@ -20,14 +22,25 @@ knitr::opts_chunk$set(
<!-- badges: end -->


The goal of parcel is to provide tools for matching real-world addresses to reference sets of addresses; e.g., "352 Helen Street", "352 Helen St." or "352 helen st". This package is motivated by the included example data resources of auditor parcel tax data from Hamilton County, Ohio.
The goal of parcel is to provide tools for matching real-world addresses to reference sets of addresses; e.g., "352 Helen Street", "352 Helen St." or "352 helen st". This package is motivated by the included example data resources of auditor parcel tax data from Hamilton County, Ohio. Use `get_parcel_data()` to get the corresponding parcel data for a vector of addresses:

```{r}
get_parcel_data(
c("1069 Overlook Avenue Cincinnati OH 45238",
"419 Elm St. Cincinnati OH 45238",
"3333 Burnet Ave Cincinnati OH 45219",
"3830 President Drive Cincinnati Ohio 45225",
"3544 Linwood Av Cincinnati OH 45226")
)
```

With this specific goal in mind, parcel includes:

- functions for cleaning and tagging components of addresses: **`clean_address()`**, **`tag_address()`**, and **`create_address_stub()`**
- the `cagis_parcels` tabular-data-resource, which contains parcel identifiers, parcel addresses, and parcel characteristics downloaded from the [Cincinnati Area Geographic Information System (CAGIS)](https://cagismaps.hamilton-co.org/cagisportal/mapdata/download)
- the `hamilton_online_parcels` tabular-data-resource, which contains parcel characteristics scraped from [Hamilton County Auditor Online](https://wedge1.hcauditor.org/)
- functions for joining addresses to parcel identifiers based on an included model pretrained on electronic health record addresses in Hamilton County, OH: **`link_parcel()`**
- functions for joining addresses to parcel identifiers based on an included model pretrained on electronic health record addresses in Hamilton County, OH and a list of custom pseudo-identifiers for multi-building apartment complexes: **`link_parcel()`**, **`link_apt()`**
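
To illustrate why cleaning matters before matching, here is a minimal base R sketch of address normalization. It is not the implementation of `clean_address()`; the function name `normalize_address_sketch()` and the abbreviation rules are assumptions made for this example only.

```r
# Hypothetical helper (not part of parcel): lowercases, strips punctuation,
# and abbreviates a couple of street suffixes so that variants of the same
# address collapse to one string.
normalize_address_sketch <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", "", x)
  x <- gsub("\\bstreet\\b", "st", x, perl = TRUE)
  x <- gsub("\\bavenue\\b", "ave", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x))
}

normalize_address_sketch(c("352 Helen Street", "352 Helen St.", "352 helen st"))
```

All three spellings from the introduction reduce to the same string, which is the property a reference-set match depends on.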


## Installation

@@ -47,18 +60,7 @@ The development version of parcel can be installed with:
pak::pak("geomarker-io/parcel")
```

## Example Usage

Use `get_parcel_data()` to get the corresponding parcel data for a vector of addresses:

```{r}
library(parcel)
get_parcel_data(c("1069 Overlook Avenue Cincinnati OH 45238",
"419 Elm St. Cincinnati OH 45238",
"3544 Linwood Av Cincinnati OH 45226"))
```

## Python, `miniconda`, and `virtualenv`
### Python, `miniconda`, and `virtualenv`

`reticulate::py_install()` assumes a non-system version of Python is already installed and will offer to install Miniconda and create an environment specifically for R and the reticulate package.

@@ -82,34 +84,11 @@ reticulate::py_config()
reticulate::py_list_packages()
```

## CAGIS Parcels Data

The `cagis_parcels` tabular data resource (TDR) is created using the R scripts in `/inst` and stored within the package. It can be loaded using {[`codec`](https://geomarker.io/codec)}:

```{r}
d_parcel <- codec::read_tdr_csv(fs::path_package("parcel", "cagis_parcels"))
head(d_parcel)
# without codec:
# read.csv(fs::path_package("parcel", "cagis_parcels"))
```

```{r}
#| results: asis
options(knitr.kable.NA = '')
codec::glimpse_attr(d_parcel) |>
knitr::kable()
```
## Identifiers for Parcels and Properties

```{r}
#| results: asis
options(knitr.kable.NA = '')
codec::glimpse_schema(d_parcel) |>
knitr::kable()
```
A `parcel_id` refers to the Hamilton County Auditor's "Parcel Number", which is referred to as the "Property Number" within the CAGIS Open Data and uniquely identifies properties. In rare cases, multiple addresses can share the same parcel boundaries but have unique `parcel_id`s; in these cases, their resulting centroid coordinates would also be identical.

Some of the parcel characteristics do not make sense in certain contexts and should be interpreted with care; for example, the value of a parcel for a multi-family or multi-unit housing structure shouldn't be compared to the value of a parcel for a single-family household for the purposes of assessing individual-level SES. Within the process of matching to a parcel, an individual address could be merged with differing types and resolutions of data:
Within the process of matching to a parcel, an individual address could be merged with differing types and resolutions of data:

```mermaid
%%{init: { "fontFamily": "arial" } }%%
@@ -136,38 +115,74 @@ res -- multi-family dwelling --> lu("auditor land use type \n (e.g., two family
hc --> npm(not matched \nto a parcel):::tool
```

## Hamilton County Auditor Online Data
### Non-Residential Parcels

The `hamilton_online_parcels` TDR is created by linking a saved scraping of the [auditor's website](https://wedge1.hcauditor.org/) to the parcel identifiers in the `cagis_parcels` TDR.
Known non-residential addresses (e.g., Cincinnati Children's Hospital Medical Center, Jobs and Family Services, Ronald McDonald House) will be matched and returned with a special parcel identifier denoting that the matched parcel is non-residential:

Similarly, the `hamilton_online_parcels` TDR is created using the R scripts in `/inst` and stored within the package. It can be loaded using {[`codec`](https://geomarker.io/codec)}:
```{r}
get_parcel_data(
c("222 E Central Parkway Cincinnati Ohio 45220",
"3333 Burnet Ave Cincinnati Ohio 45219",
"3333 Burnet Avenue Cincinnati Ohio 45219",
"350 Erkenbrecher Ave Cincinnati Ohio 45219")
) |>
dplyr::select(input_address, parcel_id)
```

### Condominiums

Because "second line" address components (e.g., "Unit 2B") are not captured, a single address can refer to multiple parcels in the case of condos or otherwise shared building ownership. For example, the address "323 Fifth St" has six distinct `parcel_id`s, each with different home values and land uses:

|parcel_id | market_total_value|land_use |
|:-----------|------------------:|:---------------------------|
|14500010321 | 397500|condominium unit |
|14500010317 | 123000|condominium office building |
|14500010320 | 180000|condominium unit |
|14500010319 | 255000|condominium unit |
|14500010322 | 388230|condominium unit |
|14500010318 | 239500|condominium unit |

In this case, a special parcel identifier `TIED_MATCH` is returned to denote that the address matched more than one parcel:

```{r}
d_online <- codec::read_tdr_csv(fs::path_package("parcel", "hamilton_online_parcels"))
get_parcel_data("323 Fifth St W Cincinnati OH 45202")$parcel_id
```
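
The tie rule can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy `reference` table, its identifiers, and `resolve_match_sketch()` are hypothetical and do not reflect how `get_parcel_data()` is implemented internally.

```r
# Toy reference table (illustrative values): two parcels share one address stub.
reference <- data.frame(
  address_stub = c("323 fifth", "323 fifth", "419 elm"),
  parcel_id = c("A1", "A2", "B1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the single matching parcel_id,
# "TIED_MATCH" when more than one parcel matches, and NA when none do.
resolve_match_sketch <- function(stub, ref = reference) {
  ids <- unique(ref$parcel_id[ref$address_stub == stub])
  if (length(ids) == 0) return(NA_character_)
  if (length(ids) > 1) return("TIED_MATCH")
  ids
}

vapply(c("323 fifth", "419 elm", "1 main"),
       resolve_match_sketch, character(1), USE.NAMES = FALSE)
```

Returning a sentinel value rather than picking one parcel arbitrarily keeps ambiguous matches visible to downstream analyses.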

head(d_online)
### Large Apartment Complexes

Large apartment complexes often use multiple mailing addresses that are not the same as the parcel address(es). In these special cases, `link_apt()` is used to match addresses exactly based on their street name if the street number falls within a certain range:

```{r}
str(parcel:::apt_defs)
```
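
The range-based rule described above might look like the following base R sketch. The structure of `apt_defs_sketch`, the identifiers in it, and the function `link_apt_sketch()` are all assumptions for illustration; the real `parcel:::apt_defs` object and `link_apt()` may be organized differently.

```r
# Hypothetical complex definitions: one street name and one street number
# range per pseudo-identifier.
apt_defs_sketch <- list(
  apt_example_a = list(street_name = "example court", range = c(100, 199)),
  apt_example_b = list(street_name = "sample lane", range = c(2000, 2050))
)

# Returns the pseudo-identifier of the first complex whose street name matches
# exactly (case-insensitively) and whose range contains the street number;
# returns NA when no complex qualifies.
link_apt_sketch <- function(street_number, street_name, defs = apt_defs_sketch) {
  for (id in names(defs)) {
    d <- defs[[id]]
    if (identical(tolower(street_name), d$street_name) &&
        street_number >= d$range[1] && street_number <= d$range[2]) {
      return(id)
    }
  }
  NA_character_
}

link_apt_sketch(150, "Example Court")  # name and range both match
link_apt_sketch(150, "Sample Lane")    # name matches, number out of range
```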

## CAGIS Parcels Data

The `cagis_parcels` tabular data resource (TDR) is created using the R scripts in `/inst` and stored within the package. It can be loaded using {[`codec`](https://geomarker.io/codec)}:

```{r}
d_parcel <- codec::read_tdr_csv(fs::path_package("parcel", "cagis_parcels"))
head(d_parcel)
# without codec:
# read.csv(fs::path_package("parcel", "hamilton_online_parcels"))
# read.csv(fs::path_package("parcel", "cagis_parcels"))
```

```{r}
#| results: asis
options(knitr.kable.NA = '')
codec::glimpse_attr(d_online) |>
codec::glimpse_attr(d_parcel) |>
knitr::kable()
```

```{r}
#| results: asis
options(knitr.kable.NA = '')
codec::glimpse_schema(d_online) |>
codec::glimpse_schema(d_parcel) |>
knitr::kable()
```


## Inclusion/Exclusion Criteria for Parcel Data

Auditor parcel-level data were excluded if they (1) did not contain a parcel identifier, (2) did not contain a property address number/name, or (3) had a duplicated parcel identifier.

Parcels with the following land use categories are included in the data resource and others are excluded. These were selected to reflect *residential* usages of parcels.
@@ -182,16 +197,35 @@ d_parcel |>
knitr::kable()
```

## Non-Residential Parcels
Some of the parcel characteristics do not make sense in certain contexts and should be interpreted with care; for example, the value of a parcel for a multi-family or multi-unit housing structure shouldn't be compared to the value of a parcel for a single-family household for the purposes of assessing individual-level SES.

Known non-residential addresses (e.g., Cincinnati Children's Hospital Medical Center, Jobs and Family Services, Ronald McDonald House) will be matched and returned with a special parcel identifier denoting that the matched parcel is non-residential:
## Hamilton County Auditor Online Data

The `hamilton_online_parcels` TDR is created by linking a saved scraping of the [auditor's website](https://wedge1.hcauditor.org/) to the parcel identifiers in the `cagis_parcels` TDR.

Similarly, the `hamilton_online_parcels` TDR is created using the R scripts in `/inst` and stored within the package. It can be loaded using {[`codec`](https://geomarker.io/codec)}:

```{r}
d_online <- codec::read_tdr_csv(fs::path_package("parcel", "hamilton_online_parcels"))
head(d_online)
# without codec:
# read.csv(fs::path_package("parcel", "hamilton_online_parcels"))
```

```{r}
#| results: asis
options(knitr.kable.NA = '')
codec::glimpse_attr(d_online) |>
knitr::kable()
```

```{r}
c("222 E Central Parkway Cincinnati Ohio 45220",
"3333 Burnet Ave Cincinnati Ohio 45219",
"3333 Burnet Avenue Cincinnati Ohio 45219",
"350 Erkenbrecher Ave Cincinnati Ohio 45219") |>
get_parcel_data()
#| results: asis
options(knitr.kable.NA = '')
codec::glimpse_schema(d_online) |>
knitr::kable()
```

## Estimating the number of households per parcel
@@ -223,23 +257,3 @@ Certain calculations needs to be weighted by households instead of parcel; e.g.
|other residential structure |0|
|boataminium |0|

## Identifiers for Parcels and Properties

A `parcel_id` refers to the Hamilton County Auditor's "Parcel Number", which is referred to as the "Property Number" within the CAGIS Open Data and uniquely identifies properties. In rare cases, multiple addresses can share the same parcel boundaries but have unique `parcel_id`s; in these cases, their resulting centroid coordinates would also be identical.

Because "second line" address components (e.g., "Unit 2B") are not captured, a single address can refer to multiple parcels in the case of condos or otherwise shared building ownership. For example, the address "323 Fifth St" has six distinct `parcel_id`s, each with different home values and land uses:

|parcel_id | market_total_value|land_use |
|:-----------|------------------:|:---------------------------|
|14500010321 | 397500|condominium unit |
|14500010317 | 123000|condominium office building |
|14500010320 | 180000|condominium unit |
|14500010319 | 255000|condominium unit |
|14500010322 | 388230|condominium unit |
|14500010318 | 239500|condominium unit |

In this case, a special parcel identifier `TIED_MATCH` is returned to denote that the address matched more than one parcel:

```{r}
get_parcel_data("323 Fifth St W Cincinnati OH 45202")
```