-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use serial update for each upgrades data frame #186
Conversation
…into rHorsey/sampling-v2
assert isinstance(self.monthly_data, pl.LazyFrame) | ||
|
||
# self.data = pl.concat(annual_dfs_to_concat, join='inner', ignore_index=True) | ||
common_columns = set(annual_dfs_to_concat[0].columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's my approach to implement the join='inner'
, find all the shared columns and select them out.
80f16af
to
3a0c12b
Compare
@@ -2571,7 +2643,7 @@ def export_data_and_enumeration_dictionary(self): | |||
col_enums = [] | |||
if col_def['data_type'] == 'string': | |||
str_enums = [] | |||
for enum in self.data.select(col).unique().to_series().to_list(): | |||
for enum in self.data.columns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code isn't looping through column names, it's getting the unique set of values across all rows within the column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wenyikuang let's have a discussion about this, I'm not convinced that the scaling weights are working as expected.
|
||
self.add_metadata_index_col(upgradIdcount) | ||
self.get_comstock_unscaled_monthly_energy_consumption() | ||
self.add_weighted_energy_savings_columns() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method requires the self.BLDG_WEIGHT
column to exist...but that column is not added until self.add_national_scaling_weights()
is called. How is the code working now?
|
||
color_map = {'Baseline': self.COLOR_COMSTOCK_BEFORE, upgrade_name: self.COLOR_COMSTOCK_AFTER} | ||
df_upgrade = df_upgrade.collect().to_pandas() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we really collect the dataframe at this point without first downselecting to only the columns actually used in the comparison plots? This needs to be tested with a full-sized dataset to make sure the memory usage is reasonable. Otherwise, it seems like it should be kept as a pl.LazyFrame and only collected inside the plotting functions.
@@ -0,0 +1,23 @@ | |||
FUNC __main__.main 227.7188 1717631807.5411 254.3594 1717631808.2755 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete this file, seems like accidentally committed from memory profiling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
2742465
to
36c1893
Compare
…comstock comparison.
This now also includes sampling v2. I'm pulling in changes directly here. |
Input side of sampling v2
@ChristopherCaradonna This is the branch to test the 10k with! I haven't merged main into this however, so I'm unsure if there are conflicts esp w/ options lookup. |
Big (and somewhat breaking) merge to speed work towards Nov EUSS Release with new sampling and much better postprocessing. Huge thanks @wenyikuang !!! |
Pull request overview
This PR is intended to fix the #149 .
How do I test it works?
With
cycle3_euss_10k_df_2
dataset and there are 3 upgrades. I confirmed the results in output parquetsare matching without the change.
With
cycle3_euss_full_350k_combined
dataset and there are 32 upgrades. I could finished the run in my local laptop.This pull request makes changes to (select all the apply):
Author pull request checklist:
comstock_column_definitions.csv
options_lookup.tsv
files updated.yml
filesREADME.md
filestest/measure_tests.txt
or/andtest/resource_measure_tests.txt
Review Checklist
This will not be exhaustively relevant to every PR.
ComStock Licensing Language - Add to Beginning of Each Code File
TODO: