The data this week comes from the Alcohol and Tobacco Tax and Trade Bureau (TTB). H/t to Bart Watson for sharing the source of the data.
There's a literal treasure trove of data here:
- State-level beer production by year (2008-2019)
- Number of brewers by production size by year (2008-2019)
- Monthly beer stats aggregated across the US (2008-2019)
Some considerations:
- A barrel of beer for this data is 31 gallons
- Most data is in barrels removed/taxed or produced
- Removals = "Total barrels removed subject to tax by the breweries comprising the named strata of data", essentially how much was produced and removed for consumption.
- A LOT of data came from PDFs - I included all the code I used to grab data and tidy it up, take a peek and try out your own mechanism for getting the tables out.
Massive shoutout to pdftools
by ROpenSci and stringr
for doing a lot of heavy lifting with the datacleaning and prep here.
# Get the Data
brewing_materials <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/brewing_materials.csv')
beer_taxed <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/beer_taxed.csv')
brewer_size <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/brewer_size.csv')
beer_states <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/beer_states.csv')
# Or read in with tidytuesdayR package (https://github.com/dslc-io/tidytuesdayR)
# PLEASE NOTE TO USE 2020 DATA YOU NEED TO USE tidytuesdayR version ? from GitHub
# Either ISO-8601 date or year/week works!
# Install via pak::pak("dslc-io/tidytuesdayR")
tuesdata <- tidytuesdayR::tt_load('2020-03-31')
tuesdata <- tidytuesdayR::tt_load(2020, week = 14)
brewing_materials <- tuesdata$brewing_materials
variable | class | description |
---|---|---|
data_type | character | Pounds of Material - this is a sub-table from beer_taxed |
material_type | character | Grain product, Totals, Non-Grain Product (basically hops vs grains) |
year | double | Year |
month | integer | Month |
type | character | Actual line-item from material type |
month_current | double | Current number of barrels for this year/month |
month_prior_year | double | Prior year number of barrels for same month |
ytd_current | double | Cumulative year to date of current year |
ytd_prior_year | double | Cumulative year to date for prior year |
variable | class | description |
---|---|---|
state | character | State abbreviated |
year | integer | Year |
barrels | double | Barrels produced within each type |
type | character | Type of production/use (On premise, Bottles/Cans, Kegs/Barrels) |
variable | class | description |
---|---|---|
data_type | character | Barrels Produced |
tax_status | character | The Tax Status, factor with Totals, Taxable, Sub Total Taxable, Tax Free, Sub Total Tax-Free |
year | double | Year |
month | integer | Month |
type | character | Type of production, either Total Production (Production) or specific sub-category and sub-totals |
month_current | double | Current number of barrels for this year/month |
month_prior_year | double | Prior year number of barrels for same month |
ytd_current | double | Cumulative year to date of current year |
ytd_prior_year | double | Cumulative year to date for prior year |
variable | class | description |
---|---|---|
year | integer | Year |
brewer_size | character | Range of production for brewer size, number of barrels produced |
n_of_brewers | double | Number of brewers at that brewer size |
total_barrels | double | Total barrels of beer produced at that brewer size |
taxable_removals | double | Taxable barrels for removals - removals for consumption under taxation |
total_shipped | double | Total barrels shipped - produced beer that is not taxed |
Please see download script via download_beer.R
Please see cleaning scripts via scrape_beers.R