
Introducing MPIEvaluator: Run on multi-node HPC systems using mpi4py #299

Merged · 5 commits · Nov 15, 2023

Conversation

@EwoutH (Collaborator) commented Oct 29, 2023

This PR adds a new experiment evaluator to the EMAworkbench, the MPIEvaluator. This evaluator allows experiments to be conducted on multi-node systems, including High-Performance Computers (HPC) such as DelftBlue. Internally, it uses the MPIPoolExecutor from mpi4py.futures.

Additionally, logging has been integrated to facilitate debugging and performance tracking in distributed setups. As a robustness measure, mocked tests have been added to ensure consistent behavior and they have been incorporated into the CI pipeline. This might help catch future breaking changes in mpi4py, such as with the upcoming 4.0 release (mpi4py/mpi4py#386).

This PR follows from the discussions in #266 and succeeds the development PR #292.

Conceptual design

1. MPIEvaluator Class

The MPIEvaluator class is the main component of this design. Its primary role is to initiate a pool of workers across multiple nodes, evaluate experiments in parallel, and finalize resources when done (a usage sketch follows at the end of this subsection).

Initialization:

  • It imports mpi4py only when instantiated, preventing unnecessary dependencies for users who do not use the MPIEvaluator.
  • The number of processes (nodes) is optionally accepted during initialization.
  • The MPI pool of workers is started, with a warning given if the number of workers is low (indicating that the evaluator might be slower than its sequential or multiprocessing counterparts).

Evaluation:

  • Experiments are first packed with the necessary information for processing across nodes, including the model name and the experiment details.
    • Note that currently the model is included in this package. This simplifies the implementation substantially, but for larger models there might be potential performance gains if the model isn't sent with each experiment, but only once to each worker.
  • Experiments are then dispatched to worker nodes for parallel processing using MPIPoolExecutor.map().
  • Once all experiments are done, outcomes are passed to a callback for post-processing.
    • Note: models that use a lot of memory could run out of it before the (single) callback is reached. A new streaming-to-disk Callback class would allow handling models that gather data exceeding the available memory.

Finalization:

  • The MPI pool of workers is shut down.
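
Taken together, these three phases map onto user code roughly as follows. This is a minimal sketch: the toy model, its parameters, and the launch command are illustrative, and the exact call signatures may differ slightly from the final implementation.

from ema_workbench import Model, RealParameter, ScalarOutcome, MPIEvaluator, ema_logging

def some_model(x1=0.0, x2=0.0, x3=0.0):
    # toy model used only for illustration
    return {"y": x1 + x2 + x3}

if __name__ == "__main__":
    # pass_root_logger_level ensures the workers log at the same level (see Logging Enhancements below)
    ema_logging.log_to_stderr(level=20, pass_root_logger_level=True)

    model = Model("simpleModel", function=some_model)
    model.uncertainties = [
        RealParameter("x1", 0, 10),
        RealParameter("x2", -0.01, 0.01),
        RealParameter("x3", -0.01, 0.01),
    ]
    model.outcomes = [ScalarOutcome("y")]

    # Initialization: entering the context starts the MPI worker pool;
    # Finalization: leaving it shuts the pool down again.
    with MPIEvaluator(model) as evaluator:
        # Evaluation: experiments are packed, dispatched via MPIPoolExecutor.map(),
        # and the outcomes are handed to the callback.
        results = evaluator.perform_experiments(scenarios=25)

    # on an HPC this script would typically be launched along the lines of:
    #   mpiexec -n <N> python -m mpi4py.futures this_script.py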

2. run_experiment_mpi Function

This helper function is designed to unpack experiment data, set up the necessary logging configurations, run the experiment on the designated MPI rank (node), and return the results. This is the worker function that runs on each of the MPI ranks.

Logging:

  • Logging configurations are set up based on the level passed during experiment packing. This ensures uniformity in logging verbosity across nodes.
  • Messages include MPI rank details for easier debugging.
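
In rough outline, the worker side does something like the following. This is a sketch of the idea only, not the code in this PR; the layout of the packed tuple and the ExperimentRunner call are assumptions.

from mpi4py import MPI
from ema_workbench.em_framework.experiment_runner import ExperimentRunner
from ema_workbench.util import ema_logging

def run_experiment_mpi(packed):
    # unpack what the evaluator sent along: the experiment, the model it belongs
    # to, and the logging level chosen on the main process (layout assumed)
    experiment, model, log_level = packed

    # configure logging on this rank so its verbosity matches the main process
    ema_logging.log_to_stderr(level=log_level)
    logger = ema_logging.get_module_logger(__name__)

    rank = MPI.COMM_WORLD.Get_rank()
    logger.debug(f"MPI Rank {rank}: starting {experiment}")

    # run the single experiment with a locally instantiated ExperimentRunner
    runner = ExperimentRunner({model.name: model})
    outcomes = runner.run_experiment(experiment)

    logger.debug(f"MPI Rank {rank}: completed experiment {experiment.experiment_id}")
    return experiment, outcomes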

3. Logging Enhancements

A dedicated logger for the MPIEvaluator was introduced to provide clarity during debugging and performance tracking. Several measures were taken to ensure uniform logging verbosity across nodes and improve log readability:

  • The MPI process name and rank are displayed alongside the log level.
  • An optional flag to adjust root logger levels was introduced, ensuring uniformity across different modules.
    • A pass_root_logger_level argument has been added to ema_logging.log_to_stderr. This ensures that the root logger level is passed to all modules, so that they all log at the same level. Example:
      ema_logging.log_to_stderr(level=20, pass_root_logger_level=True)

PR structure

This PR is structured in five commits:

  1. MPIEvaluator for HPC Systems (0bc9e15)

    • Purpose: To extend the capabilities of the EMAworkbench to multi-node HPC systems.
    • Changes:
      • Introduced the MPIEvaluator class.
      • Added an initialization function to set up the global ExperimentRunner for worker processes.
      • Included proper handling for packing and unpacking experiments for efficient data transfer between nodes.
    • Dependencies: While the addition leverages the mpi4py library, note that mpi4py is only required when the MPIEvaluator is actually used, so no unnecessary dependencies are imposed on other users.
  2. Enhanced Logging for MPIEvaluator (59a2b7a)

    • Purpose: To provide clear and detailed logs for debugging and performance tracking in distributed environments.
    • Changes:
      • Set up a dedicated logger for the MPIEvaluator.
      • Ensured uniform logging verbosity across nodes by passing the logger's level to each worker process.
      • Introduced log messages for tracking progress on individual MPI ranks.
      • Refined log format for better readability by displaying the MPI process name alongside the log level.
  3. Integration of MPIEvaluator Tests into CI (f51c29f)

    • Purpose: To ensure the reliable functioning of the MPIEvaluator through continuous integration testing.
    • Changes:
      • Incorporated the MPIEvaluator into the test suite using mock tests simulating its interaction with mpi4py.
      • Enriched the CI pipeline (.github/workflows/ci.yml) to encompass MPI testing, specifically for Ubuntu with Python 3.10.
      • Included conditional logic to skip MPI tests when not on Linux platforms or in the absence of mpi4py.
    • Importance: With the upcoming mpi4py 4.0 release, potential breaking changes can be caught early through these mocked tests.
    • Note: these tests focus on the MPIEvaluator logic and its interactions; they do not exercise the actual internal workings of MPI.

However, a global initializer had issues with re-initializing the MPIEvaluator pool, where the second attempt would consistently throw a 'BrokenExecutor: cannot run initializer' error. This behavior was particularly evident when invoking the MPIEvaluator consecutively in a sequence.

After reproducing the issue with simplified examples and confirming its origin, the most robust approach to address this was to eliminate the common initializer function from the MPIEvaluator. This is done in dff46cd. Since the initializer also contained the logger configuration, that part was restored in f67b194. These commits are kept separate to provide insight into the development process and design considerations of the MPIEvaluator.

  4. Refinement of MPIEvaluator Initialization and Experiment Handling (dff46cd)

    • Purpose: To streamline the initialization and experiment execution process within the MPIEvaluator.
    • Changes:
      • Removed the global ExperimentRunner and associated initializer function.
      • Adjusted the MPIEvaluator constructor to optionally accept the number of processes (n_processes).
      • Simplified the experiment packing process by including only the model name and the experiment itself.
      • Introduced an ExperimentRunner instantiation within the run_experiment_mpi function to handle experiments.
  5. Logging Configuration Enhancement (f67b194)

    • Purpose: To ensure consistent logging levels across all MPI processes.
    • Changes:
      • Modified the experiment packing process to include the effective logging level.
      • Updated the run_experiment_mpi function to configure logging based on the passed level, ensuring uniformity across all worker processes.

Technical changes per file

  1. CI Configuration (ci.yml):

    • Added an MPI testing flag to the matrix build.
    • Defined steps to set up necessary MPI libraries and the mpi4py package.
  2. EMAworkbench Initialization Files:

    • Imported and initialized the MPIEvaluator.
  3. Evaluator Logic Enhancements (evaluators.py):

    • Defined global ExperimentRunner for worker processes.
    • Added the MPIEvaluator class with its initialization, finalization, and experiment evaluation logic.
    • Implemented logic to handle experiments in an MPI environment.
  4. Logging Improvements (ema_logging.py):

    • Introduced an optional flag to adjust log levels for the root logger.
  5. Test Enhancements (test_evaluators.py):

    • Incorporated mocked tests for the MPIEvaluator, conditional on the availability of mpi4py and a Linux environment (a sketch of such a test follows below).
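
As an impression of what such a mocked test can look like, here is a sketch; the patch target, the context-manager usage, and the attributes set on the mock are assumptions, and the actual tests in test_evaluators.py may differ.

import unittest
from unittest import mock

from ema_workbench import Model, MPIEvaluator

class TestMPIEvaluator(unittest.TestCase):
    @mock.patch("mpi4py.futures.MPIPoolExecutor")
    def test_pool_started_and_shut_down(self, mocked_executor):
        # the executor class is replaced by a mock, so no real MPI runtime is needed
        mocked_executor.return_value._max_workers = 4  # assumed attribute, checked by the evaluator's warning

        model = Model("simpleModel", function=lambda x=0: {"y": x})
        with MPIEvaluator(model):
            pass  # entering the context should start the (mocked) pool

        mocked_executor.assert_called_once()
        mocked_executor.return_value.shutdown.assert_called_once()

if __name__ == "__main__":
    unittest.main()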

Logging

Logging is a big part of this PR. Being able to debug failures and errors on HPC systems effectively is important, because the iteration speed on these systems is often low (since you have to queue jobs).

This is how the current logs work for a simple model example run:

INFO level (20)

[MainProcess/INFO] MPI pool started with 9 workers
[MainProcess/WARNING] With only a few workers (9), the MPIEvaluator may be slower than the Sequential- or MultiprocessingEvaluator
[MainProcess/INFO] performing 25 scenarios * 1 policies * 1 model(s) = 25 experiments
  0%|                                                   | 0/25 [00:00<?, ?it/s][MainProcess/INFO] MPIEvaluator: Starting 25 experiments using MPI pool with 9 workers
[MainProcess/INFO] MPIEvaluator: Completed all 25 experiments
100%|██████████████████████████████████████████| 25/25 [00:00<00:00, 39.04it/s]
[MainProcess/INFO] MPIEvaluator: Callback completed for all 25 experiments
[MainProcess/INFO] experiments finished
[MainProcess/INFO] MPI pool has been shut down
DEBUG level
[MainProcess/INFO] MPI pool started with 9 workers
[MainProcess/WARNING] With only a few workers (9), the MPIEvaluator may be slower than the Sequential- or MultiprocessingEvaluator
[MainProcess/INFO] performing 25 scenarios * 1 policies * 1 model(s) = 25 experiments
  0%|                                                   | 0/25 [00:00<?, ?it/s][MainProcess/INFO] MPIEvaluator: Starting 25 experiments using MPI pool with 9 workers
[MainProcess/INFO] MPIEvaluator: Completed all 25 experiments
[MainProcess/DEBUG] MPI Rank 1: starting Experiment(name='simpleModel None 0', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 2.0829145226976835, 'x2': -0.007878333224039757, 'x3': -0.0009740476752629831}), experiment_id=0)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 2: starting Experiment(name='simpleModel None 1', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 0.6097172301360301, 'x2': -0.007528201393234191, 'x3': 0.007709262322100828}), experiment_id=1)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 0 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] running scenario 1 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] MPI Rank 5: starting Experiment(name='simpleModel None 4', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 6.266259219423759, 'x2': -0.00337617120143388, 'x3': -0.008074954689949926}), experiment_id=4)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 7: starting Experiment(name='simpleModel None 6', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 1.6937743192292463, 'x2': -0.004134864569602595, 'x3': 0.0011194805732659095}), experiment_id=6)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 4 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] running scenario 6 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] MPI Rank 6: starting Experiment(name='simpleModel None 5', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 4.304935288926784, 'x2': 0.00061753325715346, 'x3': -0.0030157106915799856}), experiment_id=5)
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] running scenario 5 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] MPI Rank 9: starting Experiment(name='simpleModel None 8', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 9.958653269070073, 'x2': -0.002413434028158994, 'x3': 0.0023440265846300205}), experiment_id=8)
[MainProcess/DEBUG] MPI Rank 4: starting Experiment(name='simpleModel None 3', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 1.1378714219956574, 'x2': 0.006880507077623674, 'x3': 0.005504065805926132}), experiment_id=3)
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] MPI Rank 8: starting Experiment(name='simpleModel None 7', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 0.1727896233646382, 'x2': -0.000982020382601088, 'x3': 0.003327952320623955}), experiment_id=7)
[MainProcess/DEBUG] MPI Rank 3: starting Experiment(name='simpleModel None 2', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 8.7962163908074, 'x2': -0.009981449292094727, 'x3': -0.0038280997193846098}), experiment_id=2)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] running scenario 3 for policy None on model simpleModel
[MainProcess/DEBUG] running scenario 8 for policy None on model simpleModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 7 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] running scenario 2 for policy None on model simpleModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 1: completed Experiment 0 (model: simpleModel, policy: None, scenario: 0)
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 2: completed Experiment 1 (model: simpleModel, policy: None, scenario: 1)
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 6: completed Experiment 5 (model: simpleModel, policy: None, scenario: 5)
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 7: completed Experiment 6 (model: simpleModel, policy: None, scenario: 6)
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 5: completed Experiment 4 (model: simpleModel, policy: None, scenario: 4)
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 8: completed Experiment 7 (model: simpleModel, policy: None, scenario: 7)
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 3: completed Experiment 2 (model: simpleModel, policy: None, scenario: 2)
[MainProcess/DEBUG] MPI Rank 6: starting Experiment(name='simpleModel None 12', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 1.635146092974492, 'x2': -0.0062798465407033904, 'x3': -0.008594200709001644}), experiment_id=12)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 12 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 9: completed Experiment 8 (model: simpleModel, policy: None, scenario: 8)
[MainProcess/DEBUG] MPI Rank 1: starting Experiment(name='simpleModel None 9', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 3.9784232294826705, 'x2': 0.008577070681243982, 'x3': 0.00015405200963791027}), experiment_id=9)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 9 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 4: completed Experiment 3 (model: simpleModel, policy: None, scenario: 3)
[MainProcess/DEBUG] MPI Rank 4: starting Experiment(name='simpleModel None 11', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 3.192582283926509, 'x2': 0.0048337454591713055, 'x3': 0.0039418271321421654}), experiment_id=11)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 11 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] MPI Rank 5: starting Experiment(name='simpleModel None 16', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 7.936943878171319, 'x2': 0.004366194761901594, 'x3': 0.001489162880869744}), experiment_id=16)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 16 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] MPI Rank 3: starting Experiment(name='simpleModel None 14', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 9.321454486239302, 'x2': 0.001562762240739304, 'x3': 0.008992833791031921}), experiment_id=14)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 14 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] MPI Rank 2: starting Experiment(name='simpleModel None 10', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 3.5394552051248, 'x2': 0.006780661665895806, 'x3': -0.006355045440043552}), experiment_id=10)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 10 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
  4%|█▋                                         | 1/25 [00:00<00:05,  4.30it/s][MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] MPI Rank 7: starting Experiment(name='simpleModel None 15', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 6.505963595145575, 'x2': 0.009669568131733563, 'x3': 0.009852481211068803}), experiment_id=15)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 15 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] MPI Rank 8: starting Experiment(name='simpleModel None 13', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 8.15536364639159, 'x2': 0.0020707297931449892, 'x3': 0.0067288601313206745}), experiment_id=13)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 13 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] MPI Rank 9: starting Experiment(name='simpleModel None 17', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 4.878475541405726, 'x2': 0.007665523368552666, 'x3': 0.007371176206485841}), experiment_id=17)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 17 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 1: completed Experiment 9 (model: simpleModel, policy: None, scenario: 9)
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 4: completed Experiment 11 (model: simpleModel, policy: None, scenario: 11)
[MainProcess/DEBUG] MPI Rank 4: starting Experiment(name='simpleModel None 19', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 9.17134521547357, 'x2': -0.008524107715765134, 'x3': -0.005158661354858541}), experiment_id=19)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 1: starting Experiment(name='simpleModel None 18', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 5.416240399907894, 'x2': -0.0012159298484620846, 'x3': -0.009228811770848973}), experiment_id=18)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 18 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 5: completed Experiment 16 (model: simpleModel, policy: None, scenario: 16)
 40%|████████████████▊                         | 10/25 [00:00<00:00, 27.30it/s][MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 19 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] MPI Rank 6: completed Experiment 12 (model: simpleModel, policy: None, scenario: 12)
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 8: completed Experiment 13 (model: simpleModel, policy: None, scenario: 13)
[MainProcess/DEBUG] MPI Rank 7: completed Experiment 15 (model: simpleModel, policy: None, scenario: 15)
[MainProcess/DEBUG] MPI Rank 7: starting Experiment(name='simpleModel None 23', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 2.552098964515078, 'x2': 0.00024019172488336585, 'x3': -0.0020777525992340135}), experiment_id=23)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 23 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] MPI Rank 6: starting Experiment(name='simpleModel None 21', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 5.791919188785238, 'x2': -0.0056467729260636845, 'x3': -0.007365480953875257}), experiment_id=21)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 21 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] MPI Rank 5: starting Experiment(name='simpleModel None 20', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 7.119138469408275, 'x2': 0.0033181138303110536, 'x3': 0.004893059686998311}), experiment_id=20)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 20 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 3: completed Experiment 14 (model: simpleModel, policy: None, scenario: 14)
[MainProcess/DEBUG] MPI Rank 8: starting Experiment(name='simpleModel None 22', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 4.698277666358151, 'x2': 0.005579333312602821, 'x3': -0.0017796670394285476}), experiment_id=22)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 2: completed Experiment 10 (model: simpleModel, policy: None, scenario: 10)
[MainProcess/DEBUG] running scenario 22 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] MPI Rank 3: starting Experiment(name='simpleModel None 24', model_name='simpleModel', policy=Policy({}), scenario=Policy({'x1': 7.364100121292793, 'x2': -0.00503473305035102, 'x3': -0.0059506479282402}), experiment_id=24)
[MainProcess/DEBUG] calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] running scenario 24 for policy None on model simpleModel
[MainProcess/DEBUG] calling run_model on SingleReplication
[MainProcess/DEBUG] calling run_model on AbstractModel
[MainProcess/DEBUG] calling initialized on AbstractModel
[MainProcess/DEBUG] completed calling initialized on AbstractModel
[MainProcess/DEBUG] calling model_init on AbstractModel
[MainProcess/DEBUG] completed calling model_init on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling _transform on AbstractModel
[MainProcess/DEBUG] completed calling run_model on AbstractModel
[MainProcess/DEBUG] calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 9: completed Experiment 17 (model: simpleModel, policy: None, scenario: 17)
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 1: completed Experiment 18 (model: simpleModel, policy: None, scenario: 18)
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 4: completed Experiment 19 (model: simpleModel, policy: None, scenario: 19)
 76%|███████████████████████████████▉          | 19/25 [00:00<00:00, 34.68it/s][MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] MPI Rank 6: completed Experiment 21 (model: simpleModel, policy: None, scenario: 21)
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 5: completed Experiment 20 (model: simpleModel, policy: None, scenario: 20)
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 7: completed Experiment 23 (model: simpleModel, policy: None, scenario: 23)
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 8: completed Experiment 22 (model: simpleModel, policy: None, scenario: 22)
[MainProcess/DEBUG] completed calling run_experiment on BaseModel
[MainProcess/DEBUG] completed calling run_model on SingleReplication
[MainProcess/DEBUG] calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling reset_model on AbstractModel
[MainProcess/DEBUG] completed calling run_experiment on ExperimentRunner
[MainProcess/DEBUG] MPI Rank 3: completed Experiment 24 (model: simpleModel, policy: None, scenario: 24)
100%|██████████████████████████████████████████| 25/25 [00:00<00:00, 38.94it/s]
[MainProcess/INFO] MPIEvaluator: Callback completed for all 25 experiments
[MainProcess/INFO] experiments finished
[MainProcess/INFO] MPI pool has been shut down

Performance

Overhead

[figure: boxplot]

Divided by 10 cores:

| Model  | SequentialEvaluator | MultiprocessingEvaluator | MPIEvaluator |
| ------ | ------------------- | ------------------------ | ------------ |
| python | 1                   | 0.0135                   | 0.0101       |
| lake   | 1                   | 0.4296                   | 0.3037       |
| flu    | 1                   | 0.4041                   | 0.3976       |

Undivided:

| Model  | SequentialEvaluator | MultiprocessingEvaluator | MPIEvaluator |
| ------ | ------------------- | ------------------------ | ------------ |
| python | 1                   | 0.135                    | 0.101        |
| lake   | 1                   | 4.296                    | 3.037        |
| flu    | 1                   | 4.041                    | 3.976        |

Scaling

The first one used shared nodes, which shows inconsistent performance:

[figure: boxplot_lake]

The second two experiments claimed full nodes exclusively. While normally this isn't best practice, it was useful to get insight into performance scaling.
[figure: boxplot_lake_exclusive]
[figure: boxplot_flu]

Both models are relatively simple, with large communication overhead. Testing the performance on a more compute-intensive model would be interesting for future research. The MPIEvaluator-benchmarks branch can be used for this.

Documentation

A tutorial for the MPIEvaluator will be added in PR #308.

Limitations & future enhancements

There are two main limitations currently:

  • The MPIEvaluator is not tested with file-based models, such as NetLogo and Vensim. It might still work, but it's not tested. Originally this was in scope for this PR, but due to difficulties in creating the proper environment, it was cut from the scope of this effort.
  • The model object is currently passed to the worker for each experiment. For large models with a relatively short runtime, this introduces significant performance overhead. Therefore, an optimization could be made to send the model only once to each worker at initialization.
    • Building on this, submitting experiment parameter sets to the workers in batches, instead of one by one, could also help increase performance (see the sketch after this list).
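
One low-effort direction for such batching is the chunksize argument of MPIPoolExecutor.map, which mirrors its concurrent.futures counterpart and ships tasks to the workers in groups rather than one at a time. A minimal sketch, independent of the workbench code:

from mpi4py.futures import MPIPoolExecutor

def square(x):
    # stand-in for the per-experiment worker function
    return x * x

if __name__ == "__main__":
    with MPIPoolExecutor() as pool:
        # chunksize > 1 reduces the number of round trips between the
        # main process and the ranks
        results = list(pool.map(square, range(1000), chunksize=50))
    print(len(results))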

Some other future improvements could be:

  • A new Callback class could be implemented that streams results to disk instead of keeping them all in memory. This would allow handling very large models that gather lots of data, probably at the cost of some performance; see issue Storing results and memory errors #304 for more details. A rough sketch follows after this list.
  • Further performance profiling could be done on the current design, to see if any components can be sped up, such as the distribution of experiments and models to workers, or the logging.
    • It would be interesting to see how larger models perform on many-node systems, and if the scaling is better than with the small lake and flu models.
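
As a rough illustration of the streaming idea, a callback could append each completed experiment's outcomes to a CSV file instead of accumulating them in memory. The (experiment, outcomes) call signature is assumed to mirror how the workbench invokes its callbacks; this class is not part of the workbench API.

import csv

class CSVStreamingCallback:
    def __init__(self, path, outcome_names):
        self.path = path
        self.outcome_names = outcome_names
        # write the header once, up front
        with open(self.path, "w", newline="") as f:
            csv.writer(f).writerow(["experiment_id", *outcome_names])

    def __call__(self, experiment, outcomes):
        # append one row per completed experiment; nothing accumulates in memory
        with open(self.path, "a", newline="") as f:
            csv.writer(f).writerow(
                [experiment.experiment_id, *(outcomes[name] for name in self.outcome_names)]
            )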

Review

This PR is ready for review.

When merging, the preferred method would be a fast-forward merge, to keep the individual commit messages and be able to revert a single one if necessary.

@coveralls commented Oct 29, 2023

Coverage Status: 80.585% (-0.07%) from 80.655% when pulling 65e0fc0 on MPIEvaluator into c9049bb on master.

@quaquel (Owner) left a comment:

Just a few minor things at the code level

@EwoutH (Collaborator, Author) commented Oct 29, 2023

Thanks for the initial review! I updated the loglevel in 60f1f9c and added example logs (at INFO and DEBUG level) to the PR.

Tomorrow I will be benchmarking and working on the documentation. If you have any specific requests (for models or scenarios to test, or specific documentation to write), please let me know!

@quaquel (Owner) commented Oct 30, 2023

It would be great to have a simple lake model example that can be run on an HPC. I guess this would need to take the form of a notebook, because it needs to cover both the Python code and the batch script.

Also, not essential for this PR, but for my understanding of what functionality is available: have you run any tests with a FileModel or any of its subclasses (e.g., NetLogoModel)?

@EwoutH (Collaborator, Author) commented Oct 31, 2023

It's good that I decided to do extensive performance testing, because I ran into a nasty bug that I likely wouldn't have noticed otherwise.

Somehow every time the second initialization of a MPIEvaluator broke down, with errors like:

"/mnt/c/Users/Ewout/Documents/GitHub/EMAworkbench/ema_workbench/em_framework/evaluators.py", line 450, in initialize
    if self._pool._max_workers <= 10:
TypeError: '<=' not supported between instances of 'NoneType' and 'int'
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

This indicated that no workers were available. My first hypothesis was that the MPI pool wasn't properly shut down, so I tried about 138 different ways to shut it down.

Apparently, that wasn't the problem.

So I decided to break it down to its simplest form, try to get it working, and then bisect between our then-current, broken implementation and the working one.

The minimal working example I initially found was this:

import time
from mpi4py.futures import MPIPoolExecutor

def simple_task(x):
    time.sleep(0.2)
    return x * x

def main():
    # First use of the MPI Pool
    with MPIPoolExecutor() as pool:
        results = list(pool.map(simple_task, range(10)))
        print("First pool completed")

    # Explicitly try to shut it down (though it should be shut down by the context manager)
    pool.shutdown(wait=True)

    # Second use of the MPI Pool
    with MPIPoolExecutor() as pool:
        results = list(pool.map(simple_task, range(20)))
        print("Second pool completed")

if __name__ == "__main__":
    main()

After six iterations I settled on the maximal working example, which has an MPIEvaluator class:

import time
from mpi4py.futures import MPIPoolExecutor

def simple_task(x):
    time.sleep(0.2)
    return x * x

class MPIEvaluator:
    def __init__(self, n_workers=None):
        self._pool = None
        self.n_workers = n_workers

    def __enter__(self):
        self._pool = MPIPoolExecutor(max_workers=self.n_workers)
        print(f"MPI pool started with {self._pool._max_workers} workers")
        return self._pool

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self._pool:
            self._pool.shutdown(wait=True)
            print("MPI pool has been shut down")
            self._pool = None

def main():
    # First use of the MPIEvaluator
    with MPIEvaluator(n_workers=4) as pool:
        results1 = list(pool.map(simple_task, range(10)))
        print("First pool completed")

    # Second use of the MPIEvaluator
    with MPIEvaluator(n_workers=4) as pool:
        results2 = list(pool.map(simple_task, range(20)))
        print("Second pool completed")

if __name__ == "__main__":
    main()

And the minimal breaking one:

```python
import time
from mpi4py.futures import MPIPoolExecutor

def simple_task(x):
    time.sleep(0.2)
    return x * x

def common_initializer():
    # A basic initializer, doing almost nothing for now.
    pass

class MPIEvaluator:
    def __init__(self, n_workers=None):
        self._pool = None
        self.n_workers = n_workers

    def __enter__(self):
        self._pool = MPIPoolExecutor(max_workers=self.n_workers, initializer=common_initializer)
        print(f"MPI pool started with {self._pool._max_workers} workers")
        return self._pool

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self._pool:
            self._pool.shutdown(wait=True)
            print("MPI pool has been shut down")
            self._pool = None

def main():
    # First use of the MPIEvaluator
    with MPIEvaluator(n_workers=4) as pool:
        results1 = list(pool.map(simple_task, range(10)))
        print("First pool completed")

    # Second use of the MPIEvaluator
    with MPIEvaluator(n_workers=4) as pool:
        results2 = list(pool.map(simple_task, range(20)))
        print("Second pool completed")

if __name__ == "__main__":
    main()
```

The problem seemed to be the common, global initializer. So I refactored the MPIEvaluator in the workbench to not need one, and that solved it instantly.
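For illustration, here is a rough sketch of the direction of that refactor (not the actual workbench code; `make_runner` and `run_task` are illustrative stand-ins): the per-task setup simply moves into the worker function, so the pool no longer needs an `initializer=` argument.

```python
import time

from mpi4py.futures import MPIPoolExecutor


def make_runner():
    # Stand-in for whatever the initializer used to build globally
    # (in the workbench: an ExperimentRunner); here it's just a tiny callable.
    return lambda x: x * x


def run_task(x):
    # Each call sets up what it needs itself, instead of relying on a
    # global object created by a pool-level initializer.
    runner = make_runner()
    time.sleep(0.2)
    return runner(x)


def main():
    # Two successive pools, mirroring the breaking example above,
    # but without a common initializer.
    with MPIPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(run_task, range(10))))

    with MPIPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(run_task, range(20))))


if __name__ == "__main__":
    main()
```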

With this effort, the logger configuration has changed and now shows slightly different behaviour. I still have to determine whether that's the desired behaviour or not.

The mocked tests still pass, so that's great I guess.

TL;DR: You couldn't use the MPIEvaluator multiple times successively in a single script. Now you can. And I know what a Singleton Pattern is. It was a fun day.

@EwoutH
Copy link
Collaborator Author

EwoutH commented Nov 1, 2023

In e24cd08 I fixed multiple sequential calls (and probably proper shutdown in general), but broke logging. Logging was already difficult, and not being allowed a global initializer made it even more challenging. In the end I found a somewhat elegant solution, implemented in e2ff35d.
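The gist of that solution, as a simplified sketch rather than the exact code in e2ff35d (the logger name, format, and payload layout here are illustrative): the log level travels with each packed experiment, and every worker configures its own logger lazily, so no pool-level initializer is needed.

```python
import logging

from mpi4py import MPI


def run_experiment_mpi(packed_experiment):
    # Sketch only: in the real evaluator the payload also carries the model;
    # here it's just (experiment, log_level) to keep the example small.
    experiment, log_level = packed_experiment

    # Configure logging on the worker itself, using the level that was packed
    # with the experiment on the main rank, instead of a pool initializer.
    rank = MPI.COMM_WORLD.Get_rank()
    logger = logging.getLogger(f"MPI-rank-{rank}")
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("[%(name)s] %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(log_level)

    logger.info("starting experiment %s", experiment)
    result = experiment * experiment  # stand-in for running the actual model
    logger.info("finished experiment %s", experiment)
    return result
```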

I also added some performance scaling tests. Those required multiple runs in a machine-scalable fashion, which also made setting up an environment on DelftBlue more robust in the process (and improved my bash skills).

Performance graphs are in the main post. As expected, more complex models (flu) scale better than less complex/faster ones (lake model). All tested models have diminishing returns at some point.

All the code, figures and data for this is available on the MPIEvaluator-benchmarks branch.

@EwoutH EwoutH force-pushed the MPIEvaluator branch 2 times, most recently from 7a5b073 to 2eb434e Compare November 2, 2023 19:19
@EwoutH EwoutH requested a review from quaquel November 2, 2023 19:20
@EwoutH
Copy link
Collaborator Author

EwoutH commented Nov 2, 2023

Restructured the code; it can now be reviewed. I decided to leave 0783435 and 2eb434e in as distinct commits, instead of squashing them, because they show the design considerations and restrictions made in the process.

Docs will follow in a separate PR, because this one is large enough as is.

@EwoutH
Copy link
Collaborator Author

EwoutH commented Nov 3, 2023

A tutorial for the MPIEvaluator will be added in #308.

@EwoutH
Copy link
Collaborator Author

EwoutH commented Nov 10, 2023

@quaquel I know you're quite busy, but would you have time to review this sooner rather than later? Right now the code and all the ideas behind it are still relatively fresh in my head, so I can make changes quickly without much overhead.

Copy link
Owner

@quaquel quaquel left a comment


This can be merged as long as we make clear it is experimental and somewhat WIP

@EwoutH
Copy link
Collaborator Author

EwoutH commented Nov 15, 2023

Would

UserWarning: "The MPIEvaluator is experimental and may change without notice."

suffice?

@quaquel
Copy link
Owner

quaquel commented Nov 15, 2023

yes, in combination with my feedback on the tutorial in #308

Adds a new MPIEvaluator to the EMAworkbench, enabling experiments to be executed on multi-node High-Performance Computing (HPC) systems leveraging the mpi4py library. This evaluator optimizes performance for distributed computing environments by parallelizing experiments across multiple nodes and processors.

Changes include:
- Definition of the MPIEvaluator class.
- Initialization function to set up the global ExperimentRunner for worker processes.
- Proper handling to pack and unpack experiments for efficient data transfer between nodes.

Note: This addition requires the mpi4py package only when the MPIEvaluator is explicitly used, preventing unnecessary dependencies for users not requiring this feature.

Introduced detailed logging capabilities for the MPIEvaluator to facilitate debugging and performance tracking in distributed environments.

Key changes include:
- Configured a logger specifically for the MPIEvaluator.
- Passed logger's level to each worker process to ensure consistent logging verbosity across all nodes.
- Added specific log messages to track the progress of experiments on individual MPI ranks.
- Improved the log format to display the MPI process name alongside the log level, making it easier to identify logs from different nodes.
- Modified `log_to_stderr` in `ema_logging` to adjust log levels for root logger based on an optional flag.

With this enhancement, users can now get a clearer insight into the functioning and performance of the MPIEvaluator in HPC systems, helping in both development and operational phases.

Add mocked tests to the MPIEvaluator and include these in a single CI run

1. Integrated the MPIEvaluator into the test suite. This involves adding unit tests that ensure the new evaluator behaves as expected, with mocks simulating its interaction with `mpi4py`.
2. Enhanced the CI pipeline (in `.github/workflows/ci.yml`) to include MPI testing. This includes:
   - Adjustments to the matrix build, adding a configuration for testing with MPI on Ubuntu with Python 3.10.
   - Steps to install necessary MPI libraries and the `mpi4py` package.

The MPI tests are designed to skip when run on non-Linux platforms or when `mpi4py` isn't available, ensuring compatibility with various testing environments.

The use of mocking ensures that the MPIEvaluator logic is tested in isolation, focusing solely on its behavior and interaction with its dependencies, without the overhead or side effects of real MPI operations. This provides faster test execution and better control over the testing environment.
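For illustration, a sketch of what that mocking can look like; the patch target, skip conditions, and assertions are illustrative rather than the exact tests in the suite:

```python
import sys
from unittest import mock

import pytest

# Mirrors the CI constraints: only run on Linux and only when mpi4py is installed.
pytestmark = pytest.mark.skipif(sys.platform != "linux", reason="MPI tests run on Linux only")
pytest.importorskip("mpi4py")


@mock.patch("mpi4py.futures.MPIPoolExecutor")
def test_pool_created_and_shut_down(mocked_executor):
    mocked_pool = mock.Mock()
    mocked_pool._max_workers = 4
    mocked_executor.return_value = mocked_pool

    # Stand-in for MPIEvaluator.initialize()/finalize(): the real tests drive
    # the evaluator itself, but the mocking pattern is the same.
    from mpi4py.futures import MPIPoolExecutor

    pool = MPIPoolExecutor(max_workers=4)
    pool.shutdown(wait=True)

    mocked_executor.assert_called_once_with(max_workers=4)
    mocked_pool.shutdown.assert_called_once_with(wait=True)
```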

mpi4py 4.0 will be released at some point; if anything breaking changes, these mocked tests might help catch it.

Please note: these tests don't cover actual (internal) MPI functionality and its integrations.

Addressed an issue where initializing the MPIEvaluator pool multiple times with a common initializer was causing a 'BrokenExecutor' error.

Details:
- Observed that using the MPIPoolExecutor twice in a row with an initializer function would lead to a 'BrokenExecutor: cannot run initializer' error on the second run.
- Reproduced the issue with simplified examples to confirm that the problem was due to the initializer function in conjunction with MPIPoolExecutor.
- Decided to remove the common initializer function from the MPIEvaluator to prevent this error.

Changes:
- Removed the global `experiment_runner` and the `mpi_initializer` function.
- Modified the MPIEvaluator's `initialize` method to not use the initializer arguments.
- Updated the `run_experiment_mpi` function to create the `ExperimentRunner` directly, ensuring each experiment execution has its fresh instance.

Examples:

Before:
```python
with MPIEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(scenarios=24)
with MPIEvaluator(model) as evaluator:
    results2 = evaluator.perform_experiments(scenarios=48)
```
This would fail on the second invocation with a 'BrokenExecutor' error.

After:
```python
with MPIEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(scenarios=24)
with MPIEvaluator(model) as evaluator:
    results2 = evaluator.perform_experiments(scenarios=48)
```
Now, both invocations run successfully without errors.

TL;DR:
By removing the common initializer, we have resolved the issue with re-initializing the MPIEvaluator pool. Users can now confidently use the MPIEvaluator multiple times in their workflows without encountering the 'BrokenExecutor' error.

Add a warning to the MPIEvaluator that it's still experimental and its interface and functionality might change in future releases.

Feedback is welcome at: #311
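For reference, such a warning can be raised when the evaluator is constructed; a minimal sketch (the constructor signature here is illustrative, not necessarily the merged code):

```python
import warnings


class MPIEvaluator:
    def __init__(self, msis, n_processes=None):
        # Flag the evaluator as experimental on construction.
        warnings.warn(
            "The MPIEvaluator is experimental and may change without notice.",
            UserWarning,
            stacklevel=2,
        )
        self._msis = msis
        self._n_processes = n_processes
```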
@EwoutH EwoutH merged commit 65e0fc0 into master Nov 15, 2023
18 checks passed
@EwoutH
Copy link
Collaborator Author

EwoutH commented Nov 15, 2023

Merged! It was quite a journey, happy that it's in. @quaquel, thanks for all the help along the way!

The SEN1211 students will be developing quite heavy (geospatial) ABM models, primarily in Mesa. Since Mesa is pure Python, it should work with this implementation; it could be an interesting test case for the MPIEvaluator!

I made a new discussion for feedback and future development: #311.
