Introducing MPIEvaluator: Run on multi-node HPC systems using mpi4py #299
Conversation
Just a few minor things at the code level
Thanks for the initial review! I updated the loglevel in 60f1f9c and added example logs (on INFO and DEBUG level) to the PR. Tomorrow I will be benchmarking and working on the documentation. If you have any specific requests (models or scenarios to test, or specific documentation to write), please let me know!
It would be great to have a simple lake model example that can be run on an HPC. I guess this would need to take the form of a notebook, because it needs to cover both the Python code and the batch script. Also, not essential for this PR, but for my understanding of what functionality is available: have you run any tests with a FileModel or any of its subclasses (e.g., NetLogoModel)?
It's good that I decided to do extensive performance testing, because I ran into a nasty bug that I likely wouldn't have noticed otherwise. Somehow, every time, the second initialization of an MPIEvaluator broke down, with errors like:

Which indicated no workers available. My first hypothesis was that the MPI pool wasn't properly shut down, so I tried about 138 different ways to shut it down. Apparently, that wasn't the problem. So I decided to break it down to its simplest form, try to get it working, and then bisect between our then-current, broken implementation and the working one. The minimal working example I initially found was this:

```python
import time
from mpi4py.futures import MPIPoolExecutor


def simple_task(x):
    time.sleep(0.2)
    return x * x


def main():
    # First use of the MPI Pool
    with MPIPoolExecutor() as pool:
        results = list(pool.map(simple_task, range(10)))
    print("First pool completed")

    # Explicitly try to shut it down (though it should be shut down by the context manager)
    pool.shutdown(wait=True)

    # Second use of the MPI Pool
    with MPIPoolExecutor() as pool:
        results = list(pool.map(simple_task, range(20)))
    print("Second pool completed")


if __name__ == "__main__":
    main()
```

After 6 iterations I settled on the maximal working example, using an MPIEvaluator class:

```python
import time
from mpi4py.futures import MPIPoolExecutor


def simple_task(x):
    time.sleep(0.2)
    return x * x


class MPIEvaluator:
    def __init__(self, n_workers=None):
        self._pool = None
        self.n_workers = n_workers

    def __enter__(self):
        self._pool = MPIPoolExecutor(max_workers=self.n_workers)
        print(f"MPI pool started with {self._pool._max_workers} workers")
        return self._pool

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self._pool:
            self._pool.shutdown(wait=True)
            print("MPI pool has been shut down")
            self._pool = None


def main():
    # First use of the MPIEvaluator
    with MPIEvaluator(n_workers=4) as pool:
        results1 = list(pool.map(simple_task, range(10)))
    print("First pool completed")

    # Second use of the MPIEvaluator
    with MPIEvaluator(n_workers=4) as pool:
        results2 = list(pool.map(simple_task, range(20)))
    print("Second pool completed")


if __name__ == "__main__":
    main()
```

And the minimal breaking one:

```python
import time
from mpi4py.futures import MPIPoolExecutor


def simple_task(x):
    time.sleep(0.2)
    return x * x


def common_initializer():
    # A basic initializer, doing almost nothing for now.
    pass


class MPIEvaluator:
    def __init__(self, n_workers=None):
        self._pool = None
        self.n_workers = n_workers

    def __enter__(self):
        self._pool = MPIPoolExecutor(max_workers=self.n_workers, initializer=common_initializer)
        print(f"MPI pool started with {self._pool._max_workers} workers")
        return self._pool

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self._pool:
            self._pool.shutdown(wait=True)
            print("MPI pool has been shut down")
            self._pool = None


def main():
    # First use of the MPIEvaluator
    with MPIEvaluator(n_workers=4) as pool:
        results1 = list(pool.map(simple_task, range(10)))
    print("First pool completed")

    # Second use of the MPIEvaluator
    with MPIEvaluator(n_workers=4) as pool:
        results2 = list(pool.map(simple_task, range(20)))
    print("Second pool completed")


if __name__ == "__main__":
    main()
```

The problem seemed to be the common, global initializer. So I refactored the MPIEvaluator in the workbench to not need one, and that solved it instantly. With this effort, the logger configuration has changed and now shows slightly different behaviour. I have to determine whether that's desired behaviour or not. The mocked tests still pass, so that's great, I guess.

TL;DR: You couldn't use the MPIEvaluator multiple times successively in a single script. Now you can. And I know what a Singleton Pattern is. It was a fun day.
In e24cd08 I fixed multiple sequential calls (and probably proper shutdown in general), but broke logging. Logging was difficult, and not being allowed a global initializer made it even more challenging. In the end I found a somewhat elegant solution, implemented in e2ff35d. I also added some performance scaling tests. Those required multiple runs in a machine-scalable fashion, so in the process they also made setting up an environment on DelftBlue more robust (and improved my bash skills). Performance graphs are in the main post. As expected, more complex models (flu) scale better than less complex/faster ones (lake model). All tested models have diminishing returns at some point. All the code, figures and data for this is available on the MPIEvaluator-benchmarks branch.
A tutorial for the MPIEvaluator will be added in #308.
@quaquel I know you're quite busy, but would you have time to review this sooner rather than later? The code and all the ideas behind it are still relatively fresh in my head, so I can make changes quickly without much overhead.
This can be merged as long as we make clear it is experimental and somewhat WIP
Would UserWarning: "The MPIEvaluator is experimental and may change without notice." suffice?
yes, in combination with my feedback on the tutorial in #308
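For reference, a minimal sketch of raising such a warning (the exact placement inside the evaluator may differ):

```python
import warnings

# Emitted when the MPIEvaluator is instantiated, using the message quoted above.
warnings.warn(
    "The MPIEvaluator is experimental and may change without notice.",
    UserWarning,
    stacklevel=2,
)
```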
Adds a new MPIEvaluator to the EMAworkbench, enabling experiments to be executed on multi-node High-Performance Computing (HPC) systems leveraging the mpi4py library. This evaluator optimizes performance for distributed computing environments by parallelizing experiments across multiple nodes and processors.

Changes include:
- Definition of the MPIEvaluator class.
- An initialization function to set up the global ExperimentRunner for worker processes.
- Proper handling to pack and unpack experiments for efficient data transfer between nodes.

Note: This addition requires the mpi4py package only when the MPIEvaluator is explicitly used, preventing unnecessary dependencies for users not requiring this feature.
Introduced detailed logging capabilities for the MPIEvaluator to facilitate debugging and performance tracking in distributed environments.

Key changes include:
- Configured a logger specifically for the MPIEvaluator.
- Passed the logger's level to each worker process to ensure consistent logging verbosity across all nodes.
- Added specific log messages to track the progress of experiments on individual MPI ranks.
- Improved the log format to display the MPI process name alongside the log level, making it easier to identify logs from different nodes.
- Modified `log_to_stderr` in `ema_logging` to adjust log levels for the root logger based on an optional flag.

With this enhancement, users can now get clearer insight into the functioning and performance of the MPIEvaluator on HPC systems, helping in both development and operational phases.
Add mocked tests to the MPIEvaluator and include these in a single CI run.

1. Integrated the MPIEvaluator into the test suite. This involves adding unit tests that ensure the new evaluator behaves as expected, with mocks simulating its interaction with `mpi4py`.
2. Enhanced the CI pipeline (in `.github/workflows/ci.yml`) to include MPI testing. This includes:
   - Adjustments to the matrix build, adding a configuration for testing with MPI on Ubuntu with Python 3.10.
   - Steps to install the necessary MPI libraries and the `mpi4py` package.

The MPI tests are designed to be skipped when run on non-Linux platforms or when `mpi4py` isn't available, ensuring compatibility with various testing environments. The use of mocking ensures that the MPIEvaluator logic is tested in isolation, focusing solely on its behavior and interaction with its dependencies, without the overhead or side effects of real MPI operations. This provides faster test execution and better control over the testing environment. mpi4py 4.0 will be released at some point; if anything breaking changes, these mocked tests might help catch that.

Please note: these tests don't cover actual (internal) MPI functionality and its integrations.
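For illustration, the mocking approach looks roughly like this (a standalone sketch, not the actual test code; the `evaluate` helper is a stand-in for the evaluator logic, and the patch target assumes `mpi4py` is importable, matching the skip conditions described above):

```python
import unittest
from unittest import mock


def evaluate(tasks, n_workers=None):
    """Stand-in for the evaluator logic under test."""
    from mpi4py.futures import MPIPoolExecutor

    with MPIPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda x: x * x, tasks))


class TestMPIEvaluatorMocked(unittest.TestCase):
    @mock.patch("mpi4py.futures.MPIPoolExecutor")
    def test_pool_is_used(self, mock_executor):
        # The context manager yields a pool whose map() returns canned results,
        # so no real MPI processes are ever started.
        mock_pool = mock_executor.return_value.__enter__.return_value
        mock_pool.map.return_value = [1, 4, 9]

        results = evaluate([1, 2, 3], n_workers=2)

        self.assertEqual(results, [1, 4, 9])
        mock_executor.assert_called_once_with(max_workers=2)


if __name__ == "__main__":
    unittest.main()
```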
Addressed an issue where initializing the MPIEvaluator pool multiple times with a common initializer was causing a 'BrokenExecutor' error.

Details:
- Observed that using the MPIPoolExecutor twice in a row with an initializer function would lead to a 'BrokenExecutor: cannot run initializer' error on the second run.
- Reproduced the issue with simplified examples to confirm that the problem was due to the initializer function in conjunction with MPIPoolExecutor.
- Decided to remove the common initializer function from the MPIEvaluator to prevent this error.

Changes:
- Removed the global `experiment_runner` and the `mpi_initializer` function.
- Modified the MPIEvaluator's `initialize` method to not use the initializer arguments.
- Updated the `run_experiment_mpi` function to create the `ExperimentRunner` directly, ensuring each experiment execution has its own fresh instance.

Examples:

Before:
```python
with MPIEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(scenarios=24)

with MPIEvaluator(model) as evaluator:
    results2 = evaluator.perform_experiments(scenarios=48)
```
This would fail on the second invocation with a 'BrokenExecutor' error.

After:
```python
with MPIEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(scenarios=24)

with MPIEvaluator(model) as evaluator:
    results2 = evaluator.perform_experiments(scenarios=48)
```
Now, both invocations run successfully without errors.

TL;DR: By removing the common initializer, we have resolved the issue with re-initializing the MPIEvaluator pool. Users can now confidently use the MPIEvaluator multiple times in their workflows without encountering the 'BrokenExecutor' error.
Add a warning to the MPIEvaluator that it's still experimental and its interface and functionality might change in future releases. Feedback is welcome at: #311
Merged! It was quite a journey; happy that it's in. @quaquel, thanks for all the help along the way! The SEN1211 students will be developing quite heavy (geospatial) ABM models, primarily in Mesa. Since Mesa is pure Python, it should work with this implementation, so it could be an interesting test case for the MPIEvaluator! I made a new discussion for feedback and future development: #311.
This PR adds a new experiment evaluator to the EMAworkbench, the `MPIEvaluator`. This evaluator allows experiments to be conducted on multi-node systems, including High-Performance Computers (HPC) such as DelftBlue. Internally, it uses the `MPIPoolExecutor` from `mpi4py.futures`.

Additionally, logging has been integrated to facilitate debugging and performance tracking in distributed setups. As a robustness measure, mocked tests have been added to ensure consistent behavior, and they have been incorporated into the CI pipeline. This might help catch future breaking changes in mpi4py, such as with the upcoming 4.0 release (mpi4py/mpi4py#386).
This PR follows from the discussions in #266 and succeeds the development PR #292.
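To give a feel for the end-user workflow, here is a minimal usage sketch (the toy model is illustrative, and the top-level import of `MPIEvaluator` assumes the package-`__init__` change described later in this PR):

```python
from ema_workbench import Model, RealParameter, ScalarOutcome, ema_logging
from ema_workbench import MPIEvaluator  # assumed export, per this PR


def some_model(x=None):
    # A stand-in for a real simulation model.
    return {"y": x * x}


if __name__ == "__main__":
    ema_logging.log_to_stderr(ema_logging.INFO)

    model = Model("simpleModel", function=some_model)
    model.uncertainties = [RealParameter("x", 0, 10)]
    model.outcomes = [ScalarOutcome("y")]

    # Usage mirrors the existing evaluators (e.g., MultiprocessingEvaluator).
    with MPIEvaluator(model) as evaluator:
        results = evaluator.perform_experiments(scenarios=24)
```

On a cluster, such a script is typically launched through mpi4py's futures runner, e.g. `mpiexec -n 16 python -m mpi4py.futures my_script.py` (the exact invocation depends on the scheduler and batch script).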
Conceptual design
1. MPIEvaluator Class
The `MPIEvaluator` class is the main component of this design. Its primary role is to initiate a pool of workers across multiple nodes, evaluate experiments in parallel, and finalize resources when done.

- Initialization: imports `mpi4py` only when instantiated, preventing unnecessary dependencies for users who do not use the MPIEvaluator (see the sketch after this list).
- Evaluation: distributes experiments over the worker pool with `MPIPoolExecutor.map()`.
- Finalization: shuts down the worker pool and releases its resources.
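A minimal sketch of the deferred-import idea (illustrative only; the workbench's actual class derives from its base evaluator and does more bookkeeping):

```python
class MPIEvaluator:
    """Sketch: only import mpi4py when the evaluator is actually used."""

    def __init__(self, n_processes=None):
        self.n_processes = n_processes
        self._pool = None

    def initialize(self):
        # Importing here (not at module level) keeps mpi4py an optional
        # dependency for users who never touch the MPIEvaluator.
        from mpi4py.futures import MPIPoolExecutor

        self._pool = MPIPoolExecutor(max_workers=self.n_processes)
        return self._pool

    def finalize(self):
        # Shut down the worker pool and release its resources.
        if self._pool is not None:
            self._pool.shutdown(wait=True)
            self._pool = None
```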
2. run_experiment_mpi Function
This helper function is designed to unpack experiment data, set up the necessary logging configurations, run the experiment on the designated MPI rank (node), and return the results. This is the worker function that runs on each of the MPI ranks.
- Logging: each worker configures its logging based on the level passed from the main process, ensuring uniform verbosity across all ranks (a schematic sketch follows below).
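Schematically, the worker function can be pictured like this (a sketch only; the packing format, import path, and `ExperimentRunner` construction in the real `evaluators.py` differ):

```python
import logging

from mpi4py import MPI

from ema_workbench.em_framework.experiment_runner import ExperimentRunner  # assumed path


def run_experiment_mpi(packed):
    # Unpack the experiment, the models, and the log level sent from the
    # main process (shown schematically).
    experiment, models, log_level = packed

    # Match this rank's logging verbosity to the main process.
    logging.basicConfig(level=log_level)
    rank = MPI.COMM_WORLD.Get_rank()
    logging.debug("running experiment %s on rank %s", experiment.experiment_id, rank)

    # Build a fresh ExperimentRunner per call (no global initializer),
    # run the experiment, and ship the outcomes back to the main process.
    runner = ExperimentRunner(models)
    outcomes = runner.run_experiment(experiment)
    return experiment, outcomes
```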
3. Logging Enhancements
A dedicated logger for the `MPIEvaluator` was introduced to provide clarity during debugging and performance tracking. Several measures were taken to ensure uniform logging verbosity across nodes and improve log readability. A `pass_root_logger_level` argument has been added to `ema_logging.log_to_stderr`. This ensures that the root logger level is passed to all modules, so that they all log at identical levels.
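Example (a minimal sketch; `pass_root_logger_level` is the new argument described above, and the exact keyword usage may differ slightly in the final API):

```python
from ema_workbench import ema_logging

# Pass the root logger's level on to all module loggers, so that the main
# process and every MPI rank log at the same verbosity.
ema_logging.log_to_stderr(ema_logging.INFO, pass_root_logger_level=True)
```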
PR structure
This PR is structured in five commits:
MPIEvaluator for HPC Systems (0bc9e15)
- Introduces the `MPIEvaluator` class.
- Adds an initialization function to set up the global `ExperimentRunner` for worker processes.
- While the evaluator builds on the `mpi4py` library, it's necessary to note that the dependency on `mpi4py` only applies when the `MPIEvaluator` is utilized, thus not imposing unnecessary packages on other users.

Enhanced Logging for MPIEvaluator (59a2b7a)

- Introduces detailed logging capabilities for the `MPIEvaluator`.

Integration of MPIEvaluator Tests into CI (f51c29f)

- Ensures the robustness of the `MPIEvaluator` through continuous integration testing.
- Integrates the `MPIEvaluator` into the test suite using mock tests simulating its interaction with `mpi4py`.
- Enhances the CI pipeline (`.github/workflows/ci.yml`) to encompass MPI testing, specifically for Ubuntu with Python 3.10, including installation of the MPI libraries and `mpi4py`.
- With the upcoming `mpi4py 4.0` release, potential breaking changes can be caught early through these mocked tests.
- The mocked tests cover only the `MPIEvaluator` logic and its interactions, and do not delve into the actual internal workings of MPI.

However, a global initializer had issues with re-initializing the MPIEvaluator pool, where the second attempt would consistently throw a 'BrokenExecutor: cannot run initializer' error. This behavior was particularly evident when invoking the MPIEvaluator consecutively in sequence.
After reproducing the issue with simplified examples and confirming its origin, the most robust approach to address this was to eliminate the common initializer function from the MPIEvaluator. This is done in dff46cd. Since the initializer also contained the logger configuration, that part was restored in f67b194. These commits are kept separate to provide insight into the development process and design considerations of the MPIEvaluator.
Refinement of MPIEvaluator Initialization and Experiment Handling (dff46cd)
- Refines the initialization of the `MPIEvaluator`.
- Removes the global `ExperimentRunner` and associated initializer function.
- Updates the `MPIEvaluator` constructor to optionally accept the number of processes (`n_processes`).
- Moves the `ExperimentRunner` instantiation into the `run_experiment_mpi` function to handle experiments.

Logging Configuration Enhancement (f67b194)

- Updates the `run_experiment_mpi` function to configure logging based on the passed level, ensuring uniformity across all worker processes.

Technical changes per file
CI Configuration (`ci.yml`):
- Installs the necessary MPI libraries and the `mpi4py` package.

EMAworkbench Initialization Files:
- Expose the new `MPIEvaluator`.

Evaluator Logic Enhancements (`evaluators.py`):
- Set up the `ExperimentRunner` for worker processes.
- Add the `MPIEvaluator` class with its initialization, finalization, and experiment evaluation logic.

Logging Improvements (`ema_logging.py`):
- Add the `pass_root_logger_level` argument to `log_to_stderr`.

Test Enhancements (`test_evaluators.py`):
- Add mocked tests for the `MPIEvaluator`, conditional on the availability of `mpi4py` and a Linux environment.

Logging
Logging is a big part of this PR. Being able to debug failures and errors on HPC systems effectively is important, because the iteration speed on these systems is often low (since you have to queue jobs).
This is how the current logs work for a simple model example run:
INFO level (20)
DEBUG level
Performance
Overhead
Divided by 10 cores:
Undivided:
Scaling
The first experiment used shared nodes, which shows inconsistent performance:
The second two experiments claimed full nodes exclusively. While normally this isn't best practice, it was useful for getting insight into performance scaling.
Both models are relatively simple with large communication overhead. Testing the performance on a more compute-intensive model would be interesting for future research. The MPIEvaluator-benchmarks branch can be used for this.
Documentation
A tutorial for the MPIEvaluator will be added in PR #308.
Limitations & future enhancements
There are two main limitations currently:
Some other future improvements could be:
Review
This PR is ready for review.
When merging, the preferred method would be a fast-forward merge, to keep the individual commit messages and be able to revert one if necessary.