Improve forecast etl performance #28

Merged
merged 6 commits into from
May 21, 2024

Collaborator

@amehta-scottlogic amehta-scottlogic commented May 20, 2024

Description

Performance was particularly slow because xarray loads data from a dataset lazily. We access each dataset once per city, and without eager loading each of those accesses is slow.

By explicitly calling load, we use more memory but performance improves. Introducing threads also significantly improves the runtime.
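As a minimal sketch of the eager-load idea (not the code from this PR), the snippet below calls `load()` once and then does a nearest-grid-point lookup per city. The function name `extract_city_values`, the variable name `pm2p5`, the coordinate names and the city entries are all assumptions made for illustration; the small in-memory dataset stands in for one opened from a GRIB file, where `load()` would actually read from disk.

```python
import numpy as np
import xarray as xr


def extract_city_values(dataset: xr.Dataset, cities: dict) -> dict:
    """Return the nearest-grid-point value of every variable for each city."""
    # Read the whole dataset into memory once. For a lazily opened file this
    # avoids a separate read from disk on every per-city lookup below.
    dataset = dataset.load()

    results = {}
    for name, (lat, lon) in cities.items():
        point = dataset.sel(latitude=lat, longitude=lon, method="nearest")
        results[name] = {var: point[var].item() for var in point.data_vars}
    return results


# Hypothetical stand-in for a dataset opened from a GRIB file.
grid = xr.Dataset(
    {"pm2p5": (("latitude", "longitude"), np.arange(12.0).reshape(3, 4))},
    coords={"latitude": [50.0, 51.0, 52.0], "longitude": [-1.0, 0.0, 1.0, 2.0]},
)
# Hypothetical city coordinates as (latitude, longitude).
cities = {"A": (51.2, 0.1), "B": (50.1, 1.8)}
values = extract_city_values(grid, cities)
```

The trade-off is the one described above: the full dataset is held in memory, in exchange for cheap repeated lookups.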

We now process all 153 cities in around 2 minutes instead of 10.
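A thread pool over cities could look roughly like the sketch below. It uses only the standard library; `transform_city`, `transform_all`, the `max_workers` value and the city dictionaries are hypothetical placeholders, not names taken from this PR.

```python
from concurrent.futures import ThreadPoolExecutor


def transform_city(city: dict) -> dict:
    """Hypothetical per-city transform standing in for the real forecast lookup."""
    return {"name": city["name"], "grid_latitude": round(city["latitude"])}


def transform_all(cities: list, max_workers: int = 8) -> list:
    """Run transform_city for every city on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the input list.
        return list(executor.map(transform_city, cities))


# Hypothetical input data.
sample_cities = [
    {"name": "A", "latitude": 51.4},
    {"name": "B", "latitude": 48.9},
    {"name": "C", "latitude": 40.7},
]
transformed = transform_all(sample_cities)
```

Threads tend to help most when the per-city work spends its time in I/O or in library code that releases the GIL; for pure-Python CPU-bound work the gain is usually smaller.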

Default

15 cities per minute

With eager load

25 cities per minute

With eager load and thread pool

76 cities per minute
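As a quick check, the throughput figures above line up with the quoted runtimes for 153 cities. The snippet only divides the city count by each rate; the dictionary labels are shorthand for the three configurations.

```python
CITY_COUNT = 153

# Cities per minute, as reported above.
rates = {"default": 15, "eager load": 25, "eager load and thread pool": 76}

# Estimated minutes to process every city at each rate.
minutes = {label: round(CITY_COUNT / rate, 1) for label, rate in rates.items()}

for label, value in minutes.items():
    print(f"{label}: ~{value} minutes")
```

This gives roughly 10 minutes for the default configuration and roughly 2 minutes with eager loading and the thread pool.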

Output

2024-05-20 17:07:46,450 - INFO - Finding data for 153 cities
2024-05-20 17:07:46,450 - INFO - Extracting pollutant forecast data
2024-05-20 17:07:46,462 - INFO - Loading data from CAMS to file single_level_2024-05-20_00.grib
2024-05-20 17:07:47,403 - INFO - Loading data from CAMS to file multi_level_2024-05-20_00.grib
2024-05-20 17:07:49,678 - INFO - Transforming forecast data
2024-05-20 17:09:00,256 - INFO - Persisting forecast data
2024-05-20 17:09:12,220 - INFO - 5049 documents upserted, 0 modified


github-actions bot commented May 20, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|---|---|---|---|---|
| 266 | 200 | 75% | 0% | 🟢 |

New Files

| File | Coverage | Status |
|---|---|---|
| air-quality-backend/src/database/location.py | 100% | 🟢 |
| TOTAL | 100% | 🟢 |

Modified Files

| File | Coverage | Status |
|---|---|---|
| air-quality-backend/src/database/air_quality_dashboard_dao.py | 0% | 🟢 |
| air-quality-backend/src/etl/forecast/forecast_adapter.py | 100% | 🟢 |
| air-quality-backend/src/etl/forecast/forecast_dao.py | 100% | 🟢 |
| air-quality-backend/src/etl/forecast/forecast_data.py | 93% | 🟢 |
| TOTAL | 73% | 🟢 |

updated for commit: 69e77c2 by action🐍

Collaborator

@mwalker-scottlogic mwalker-scottlogic left a comment

looks good to me, probably worth a dev review too

f"single_level_{model_base_date_str}_{model_base_time}.grib",
),
(
get_multi_level_request_body(model_base_date_str, model_base_time),
Collaborator

Is there a possibility that creating new files each time this is run could cause issues for users of the application?

@amehta-scottlogic amehta-scottlogic merged commit 2ba3a37 into main May 21, 2024
1 check passed
@amehta-scottlogic amehta-scottlogic deleted the feature/forecast-performance branch May 21, 2024 08:09
2 participants