This repo has been replaced by the new pudl-archiver repo, which combines both the scraping and archiving processes.
We recommend using conda to create and manage your environment.
Run:
conda env create -f environment.yml
conda activate pudl-scrapers
Logs are collected in:
[your home]/Downloads/pudl_scrapers/scraped/
Data from the scrapers is stored in:
[your home]/Downloads/pudl_scrapers/scraped/[source_name]/[today #]
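To check what a scraper has written so far, a small helper along these lines may be convenient. This is only a sketch: the function name list_scraped is hypothetical and not part of this repo, and it assumes that "[your home]" corresponds to $HOME.

```shell
#!/bin/sh
# Base directory taken from the paths above; "[your home]" is assumed to be $HOME.
scraped_dir="${HOME}/Downloads/pudl_scrapers/scraped"

# Hypothetical helper (not part of this repo): list whatever a given
# source has written under the scraped directory so far.
list_scraped() {
    src_dir="${scraped_dir}/$1"
    if [ -d "$src_dir" ]; then
        ls -1 "$src_dir"
    else
        echo "no output yet for $1 in ${scraped_dir}" >&2
        return 1
    fi
}
```

For example, list_scraped eia860 prints one line per entry found under the eia860 directory, or reports on stderr that nothing has been scraped yet.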
The general pattern is scrapy crawl [source_name] for one of the supported sources. Typically an additional "year" argument is available, in the form scrapy crawl [source_name] -a year=[year].
See below for exact commands and available arguments.
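As a minimal sketch of the general pattern, the shell helper below assembles the command for a source and an optional year, and prints it rather than running it. The function name crawl_cmd is hypothetical and not part of this repo, and the years in the loop are chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): print the scrapy command
# for a source and an optional year, following the general pattern.
crawl_cmd() {
    source_name="$1"
    year="${2:-}"
    if [ -n "$year" ]; then
        printf 'scrapy crawl %s -a year=%s\n' "$source_name" "$year"
    else
        printf 'scrapy crawl %s\n' "$source_name"
    fi
}

# Dry run: show the commands for a few years of eia923
# (illustrative years; availability is not checked here).
for y in 2005 2006 2007; do
    crawl_cmd eia923 "$y"
done
```

To actually run the printed commands, pipe the output to sh with the conda environment active, which assumes scrapy is on the PATH.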
scrapy crawl censusdp1tract
No other options.
For full instructions:
epacems --help
eia_bulk_elec
No other options.
To collect the data and field descriptions:
scrapy crawl epacamd_eia
To collect all the data:
scrapy crawl eia860
To collect a specific year (e.g., 2007):
scrapy crawl eia860 -a year=2007
To collect all the data:
scrapy crawl eia860m
To collect a specific month and year (e.g., August 2020):
scrapy crawl eia860m -a month=August -a year=2020
To collect all the data:
scrapy crawl eia861
To collect a specific year (e.g., 2007):
scrapy crawl eia861 -a year=2007
To collect all the data:
scrapy crawl eia923
To collect a specific year (e.g., 2007):
scrapy crawl eia923 -a year=2007
To collect all the data:
scrapy crawl ferc1
scrapy crawl ferc2
scrapy crawl ferc6
scrapy crawl ferc60
There are no subsets enabled.
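Since the FERC forms are each collected with a separate command, they can be run in sequence with a short loop. The sketch below is a dry run by default; the function name crawl_all_ferc and the RUN variable are hypothetical conveniences, not part of this repo.

```shell
#!/bin/sh
# The four FERC sources listed above.
ferc_sources="ferc1 ferc2 ferc6 ferc60"

# Hypothetical helper (not part of this repo). Prints each command by
# default; set RUN=1 (an assumed variable name) to invoke scrapy instead.
crawl_all_ferc() {
    for src in $ferc_sources; do
        if [ "${RUN:-0}" = "1" ]; then
            # Stop at the first failing crawl.
            scrapy crawl "$src" || return 1
        else
            echo "scrapy crawl $src"
        fi
    done
}

crawl_all_ferc
```

Running with RUN=1 assumes the conda environment is active so that scrapy is available.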
To collect the data:
scrapy crawl ferc714
There are no subsets; that's it.