PUDL v2023.12.01 Data Release #3152
zaneselvans announced in Announcements
Whew, it's been almost an entire year, but we've just put out a new PUDL data release!
Who should use PUDL v2023.12.01?
[...] pudl.sqlite database generated by our nightly builds) and you do not want to migrate to the new database structure immediately, you should switch to using this data release for the time being.
[...] dev in the next couple of weeks and the database structure changes, it will probably break whatever you've built on top of the old database structure.
The PudlTabl object remains available, but is now deprecated, and will be removed some time in 2024. The class now just queries the database rather than constructing dataframes itself. If you rely only on the software output layer, it will continue working after the database rename happens, but no new database tables will be made available via the PudlTabl output class. If you need to continue using this software interface for the time being, you should install catalystcoop.pudl v2023.12.1, which corresponds to the version of the software that was used to create this release, or check out the v2023.12.01 tag from the PUDL repository and install it locally.
Who should not use PUDL v2023.12.01?
What's New?
Data only and lots of it!
[...] pudl.sqlite which is now about 12GB (previously it was about 1GB). As a result, we've started compressing the SQLite files for distribution using gzip, which shrinks them by about 75%.
For an exhaustive list of the tables available in the database in this release, see the PUDL Data Dictionary for v2023.12.01.
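Since the distributed SQLite files are gzip-compressed, they need to be decompressed before SQLite can open them. Here is a minimal sketch using only the Python standard library; the helper name decompress_sqlite is hypothetical and not part of PUDL, and the file names in the usage note are assumptions.

```python
import gzip
import shutil
import sqlite3
from pathlib import Path


def decompress_sqlite(gz_path: Path, out_path: Path) -> Path:
    """Stream-decompress a gzipped SQLite file to out_path.

    Hypothetical helper, not part of PUDL. Copying in chunks keeps memory
    use flat even when the decompressed database is many gigabytes.
    """
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst, length=16 * 1024 * 1024)
    return out_path
```

Assuming the download is saved as pudl.sqlite.gz, calling decompress_sqlite(Path("pudl.sqlite.gz"), Path("pudl.sqlite")) and then sqlite3.connect("pudl.sqlite") should give a usable connection. Make sure there is enough free disk space for the decompressed file first.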
New data:
Some of the newly available data sources & tables in the pudl.sqlite database include:
Derived tables that are directly available in the database which you may not have seen before if you weren't using the software layer:
Other databases:
We are also distributing SQLite versions of several FERC data sources previously only available as Visual FoxPro DBF files or XBRL, including:
These relatively unprocessed data sources are nonetheless far more accessible as SQLite databases than in their originally published formats.
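One practical benefit of the SQLite format is that any of these databases can be inspected with standard tooling. As a rough sketch, the following lists the tables in a SQLite file using Python's built-in sqlite3 module; list_tables is a hypothetical helper, not part of PUDL, and no particular database file name is assumed.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the sorted names of all user tables in a SQLite database.

    Hypothetical helper, not part of PUDL. The database is opened
    read-only so a mistyped path raises an error instead of silently
    creating an empty file.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    # Skip SQLite's own bookkeeping tables such as sqlite_sequence.
    return [name for (name,) in rows if not name.startswith("sqlite_")]
```

Calling list_tables on a path to one of the downloaded and decompressed databases returns a plain list of table names, which can then be used in ordinary SQL queries.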
We are treating PUDL as an application not a library
[...] catalystcoop.pudl software package, for now we are doing software releases alongside any data release, to archive the software that was used to produce the data release.
How do I access this data release?
There are several distribution points for our data releases. Which one is appropriate for you will depend on your use case:
10.5281/zenodo.10275052
The v2023.12.01 data release corresponds to v9 of the Kaggle dataset.
s3://pudl.catalyst.coop/v2023.12.01
gs://pudl.catalyst.coop/v2023.12.01
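The s3:// and gs:// locations are bucket URIs rather than web links, so most cloud clients want the bucket name and the key prefix as separate values. Here is a minimal sketch of splitting such a URI with the Python standard library; split_bucket_uri is a hypothetical helper, not part of PUDL, and nothing is assumed about which objects live under the prefix.

```python
from urllib.parse import urlsplit


def split_bucket_uri(uri: str) -> tuple[str, str, str]:
    """Split a bucket URI into (scheme, bucket, key_prefix).

    Hypothetical helper, not part of PUDL. Only s3:// and gs:// URIs
    are accepted; anything else raises ValueError.
    """
    parts = urlsplit(uri)
    if parts.scheme not in {"s3", "gs"}:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # The path component starts with "/", which is not part of the key.
    return parts.scheme, parts.netloc, parts.path.lstrip("/")
```

For example, split_bucket_uri("s3://pudl.catalyst.coop/v2023.12.01") yields the bucket pudl.catalyst.coop and the prefix v2023.12.01, which can be passed to whichever cloud storage client you prefer.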
Where's the code?
The software used to produce this data release can be obtained in several ways:
[...] v2023.12.01 release tag.
pip install catalystcoop.pudl==2023.12.1
10.5281/zenodo.10266492
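Note that the release tag is v2023.12.01 while the pip version is 2023.12.1. Python package versions are normalized so that numeric segments carry no leading zeros. The sketch below converts a tag to the pip-style version and checks whether the matching package is installed; both function names are hypothetical and not part of PUDL.

```python
from importlib import metadata


def tag_to_pip_version(tag: str) -> str:
    """Convert a release tag like 'v2023.12.01' to the pip form '2023.12.1'.

    Hypothetical helper; handles only simple dotted numeric tags.
    """
    parts = tag.removeprefix("v").split(".")
    # int() drops leading zeros, mirroring Python version normalization.
    return ".".join(str(int(part)) for part in parts)


def installed_matches(tag: str, dist_name: str = "catalystcoop.pudl") -> bool:
    """Return True if dist_name is installed at the version implied by tag."""
    try:
        return metadata.version(dist_name) == tag_to_pip_version(tag)
    except metadata.PackageNotFoundError:
        return False
```

In an environment created with the pip command above, installed_matches("v2023.12.01") should return True; it returns False if the package is missing or a different version is installed.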