Matthew Grinnell edited this page Dec 14, 2022 · 18 revisions

Overview

This document describes the steps to make the Pacific Herring data summaries. After you install the necessary programs and get the data, there are two main parts to creating the data summaries.

  1. Grab the data from the databases and create the tables, figures, and data for each stock assessment region (SAR). These outputs are used in the next step to make the data summary reports. In addition, this step creates input data files for the Pacific Herring statistical catch-age (SCA) models. This is done in the DataSummaries repo.

  2. Compile the data summary report as a PDF for each SAR. This is done in the Reports repo.

A third repository, HerringFunctions, is also required. Fork or clone all three repositories to your machine to get started, and put them in the same root folder, for example "C:/[path/to/repos]/". In the following text, "./" is equivalent to "C:/[path/to/repos]/". You will need one more folder in this root called "Data", which is not under version control because it contains data with privacy restrictions and some large files. Get the files and folders that go in the "Data" folder from someone (e.g., me).

Thus you should have the following folders:

  1. "./DataSummaries",

  2. "./Reports",

  3. "./HerringFunctions", and

  4. "./Data".

Note: if you make the Pacific Herring stock assessment Science Response (SR) document, you should also have the herringsr repository in this same root folder (i.e., "./herringsr"). If this repository is missing, you may get an error because the data summaries update data files in "./herringsr/data" (e.g., see the variable srLoc in the file "Summary.R").
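Before going further, it can help to confirm the folder layout. The sketch below is a hypothetical helper (`check_layout()` is not part of any of these repositories); `root` stands for your "C:/[path/to/repos]/" folder, and herringsr is only checked when you ask for it.

```r
# Hypothetical helper: report which of the expected folders exist in `root`.
check_layout <- function(root, need_sr = FALSE) {
  required <- c("DataSummaries", "Reports", "HerringFunctions", "Data")
  # herringsr is only needed if you also make the Science Response (SR)
  if (need_sr) required <- c(required, "herringsr")
  present <- dir.exists(file.path(root, required))
  names(present) <- required
  present
}

# Usage: list any missing folders
# status <- check_layout("C:/path/to/repos", need_sr = TRUE)
# names(status)[!status]
```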

Once you have these folders, get set up to make the data summaries. Note that the set-up process can take a while, usually because of issues with MS Access drivers and 32- vs 64-bit programs: the databases are 32-bit, but most programs are now 64-bit by default.

Set-up: programs and data

Get 32-bit MS Office

This is required to link R with the databases because the herring databases use 32-bit MS Access. To open them via R you need 32-bit drivers, which are included in 32-bit MS Office. Get 32-bit MS Office 365 from the Software Center; you may need to request this from IT.

Get R-4.1.3 (32-bit)

Install R-4.1.3 on your machine. This is the last version of R to include a 32-bit build for Windows, and 32-bit R is required to link with the 32-bit MS Access databases. Use devtools::install_github() to install the pbs-assess packages and the SpawnIndex package. Alternatively, clone the repositories and then build them on your machine.
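A quick way to confirm which build you are running: in 32-bit R, pointers are 4 bytes, and in 64-bit R they are 8 bytes. This is a minimal sketch; `is_32bit_r()` is a hypothetical name, not a function from the herring packages.

```r
# Hypothetical helper: TRUE if the current R session is 32-bit.
is_32bit_r <- function() {
  .Machine$sizeof.pointer == 4
}

if (!is_32bit_r()) {
  message("This R session is 64-bit (", R.version$arch, "); ",
          "switch to 32-bit R before linking to the MS Access databases.")
}
```

Run it in the same session (RStudio or R GUI) that you will use for the data summaries.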

RStudio

I am able to run the data summaries using RStudio version 2022.07.1+554. You can tell RStudio to use 32-bit R in Tools > Global Options > General > R version > Choose a specific version > Browse. Some versions of RStudio have issues connecting with the databases; if you have issues, run the data summaries in the R GUI instead.

Get the databases

Get the Pacific Herring databases from "H:/" and copy them to "./Data/Local". Note that you will need to request permission to use this drive. Then open the frontend database (i.e., "HSA_Program_v*.mdb"), and link the backend databases: biosamples, catch, locations, and spawn. MS Access will give a message that the backends are not linked, and ask if you want to link them all automatically. Click "No", and then MS Access will ask you to find the backend in your folders. Find and link the respective backend databases on your machine, and click through the options (it can take a while).
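To check that the frontend can be found and opened from R, something like the sketch below may help. It assumes the RODBC package and the ODBC driver name "Microsoft Access Driver (*.mdb, *.accdb)" that ships with 32-bit MS Office; the DataSummaries code may connect differently, and both helper functions are hypothetical.

```r
# Hypothetical helper: locate the frontend database (HSA_Program_v*.mdb).
find_frontend <- function(dir = "./Data/Local") {
  hits <- list.files(dir, pattern = "^HSA_Program_v.*\\.mdb$", full.names = TRUE)
  if (length(hits) == 0) stop("No HSA_Program_v*.mdb found in ", dir)
  # If several versions are present, use the alphabetically last one
  sort(hits, decreasing = TRUE)[1]
}

# Hypothetical helper: ODBC connection string for MS Access (assumed driver name)
access_conn_string <- function(path) {
  paste0("Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=",
         normalizePath(path, winslash = "\\", mustWork = TRUE))
}

# Usage (32-bit R with RODBC installed):
# con <- RODBC::odbcDriverConnect(access_conn_string(find_frontend()))
# RODBC::sqlTables(con)  # linked backend tables should be listed
# RODBC::odbcClose(con)
```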

Data summaries

Run the data summaries

Run the data summaries in the DataSummaries repo. Note: make a DataSummaries project in the root folder. This step links to the databases and grabs the data: biological data such as weight- and length-at-age, catch data, location information, and spawn data. It also calculates the spawn index (i.e., relative abundance) using the SpawnIndex package. Output includes tables ("*.tex"), figures ("*.png"), and data ("*.RData") that are used to create the report in the next step. In addition, it creates input data files for the SCA model ("*.dat"). This output is not in the GitHub repo, so you have to run the data summaries on your machine before you can do the next step. Finally, it creates some data files in the herringsr repository ("./herringsr/data/*.csv"). For all of these outputs, it is good practice to compare the updated and previous versions when you run the data summaries.
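One way to compare an updated output with its previous version is a line-level check like this hypothetical `compare_versions()` sketch. It treats each file as a set of lines, so it ignores line order and duplicate lines.

```r
# Hypothetical helper: compare a previous and an updated text output
# (e.g., a "*.dat" or "*.csv" file) line by line.
compare_versions <- function(old_file, new_file) {
  old <- readLines(old_file, warn = FALSE)
  new <- readLines(new_file, warn = FALSE)
  list(
    identical = identical(old, new),
    removed = setdiff(old, new),  # lines only in the previous version
    added = setdiff(new, old)     # lines only in the updated version
  )
}

# Usage: keep a copy of the previous file before rerunning the summaries
# compare_versions("previous/HG.dat", "current/HG.dat")
```

For files that are under version control, such as "./herringsr/data/*.csv", `git diff` does the same job.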

  1. Test run.

    1. Test run with the Haida Gwaii SAR (i.e., set region <- "HG" in "Summary.R").
    2. Source the file "Summary.R".
  2. Make the spawn animation.

    1. Set makeAnimation <- TRUE in "Summary.R".
    2. Source the file "Summary.R".
    3. Note that usually makeAnimation <- FALSE because it takes a while to make the animation. You only have to make the animation once when you update spawn data (if makeAnimation <- FALSE, the script uses the animation from the previous run, which is saved in "./DataSummaries/Animations").
  3. Production.

    1. Comment out the line rm(list = ls()) in "Summary.R".
    2. Set makeAnimation <- TRUE in "Summary.R".
    3. Source the file "RunSummaries.R".
    4. Note that "HG" is run twice so that the animation is correct. There is a weird bug that adds an empty first page to the animation in the first run; running HG twice fixes this.
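Because makeAnimation <- FALSE reuses the animation saved in "./DataSummaries/Animations", a quick pre-flight check can catch a missing animation before a long run. This is a hypothetical sketch; it only checks that the folder exists and is non-empty, not which files it contains.

```r
# Hypothetical helper: is there a saved animation from a previous run?
animation_available <- function(dir = "./DataSummaries/Animations") {
  dir.exists(dir) && length(list.files(dir)) > 0
}

# Usage, before sourcing "Summary.R":
# makeAnimation <- FALSE
# if (!makeAnimation && !animation_available()) {
#   warning("No saved animation found; set makeAnimation <- TRUE for this run.")
# }
```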

Compile the reports

Compile the data summary reports in the Reports repo. Note: make a Reports project in the root folder. This step creates the PDFs that contain information from the previous step. Note that the default "year" used in the data summary report is the maximum value of yrRange from "Summary.R". Reports are not the same for each SAR: some SARs have additional tables and figures, and the text in some sections varies for each SAR. Thus, most of the report is generated automatically but there are a few sections that need to be written (by hand) each year. These are found in "./Reports/RegionalText/[Year]/[SAR]/". Look in the report to see where this custom text appears, and use the "*.tex" files in "./Reports/RegionalText/Templates" to get started.
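To start a new year's custom text, you could copy the templates into the new folder as sketched below. The helper `setup_regional_text()` is hypothetical, and it assumes the "*.tex" templates sit directly in "./Reports/RegionalText/Templates" (no subfolders). Existing files are not overwritten, so text you have already written is kept.

```r
# Hypothetical helper: copy the "*.tex" templates into
# "./Reports/RegionalText/[Year]/[SAR]/".
setup_regional_text <- function(year, sar, base = "./Reports/RegionalText") {
  templates <- list.files(file.path(base, "Templates"), pattern = "\\.tex$",
                          full.names = TRUE)
  dest <- file.path(base, year, sar)
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  # overwrite = FALSE keeps any custom text already in `dest`
  file.copy(templates, dest, overwrite = FALSE)
  dest
}

# The report year defaults to the maximum of yrRange from "Summary.R", e.g.
# year <- max(yrRange)
# setup_regional_text(year, "HG")
```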

Because we are no longer linking with the 32-bit MS Access databases, you can probably use RStudio and 64-bit R in this step. For help with debugging tinytex, check this help page.

  1. Test run.

    1. Test run with HG (i.e., set regName <- "HG" in "DataSummary.rnw").
    2. Click the "Compile PDF" button for "DataSummary.rnw".
    3. This will generate a PDF in "./Reports" called "DataSummary.pdf".
  2. Production.

    1. Source the file "MakeSummaries.R".
    2. This will generate one PDF for each SAR in the corresponding "./Reports/[Year]" folder.

Privacy

Do not add the new PDFs to the repository until they have been checked.

  1. Data (e.g., catch data) must be checked for privacy concerns.
  2. Content and text should be checked by Fish Management.
  3. The Reports repository is public, but you can control what goes on GitHub using the ".gitignore" file. For example, special area A10 does not go on the GitHub repo. Check the GitHub repository before you push if you are in doubt.
  4. Do not add PDFs to the GitHub repository until you are sure they are final, because keeping PDFs in version control uses a lot of data. For example, add the line "[Year]/" to ".gitignore" until the PDFs are final, where "[Year]" is, say, "2023" for the 2023 data summary reports (check the file to see an example).
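Adding the "[Year]/" line can be scripted so it is never duplicated. This is a hypothetical sketch that assumes ".gitignore" is at the root of the Reports repo; remember to remove the line once the PDFs are final.

```r
# Hypothetical helper: add "[Year]/" to ".gitignore" if it is not already there.
ignore_year <- function(year, gitignore = "./Reports/.gitignore") {
  entry <- paste0(year, "/")
  lines <- if (file.exists(gitignore)) readLines(gitignore, warn = FALSE) else character(0)
  if (!(entry %in% lines)) {
    # Rewrite the whole file so the new entry always lands on its own line
    writeLines(c(lines, entry), gitignore)
  }
  invisible(entry)
}

# Usage: keep draft 2023 PDFs out of version control
# ignore_year(2023)
```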