This project is part of the course CS-433 Machine Learning at EPFL.
The goal of this project is to design a wind turbine pitching control system using reinforcement learning. It uses a surrogate model to simulate the wind turbine system, and a Deep Deterministic Policy Gradient (DDPG) agent to optimize the control actions. A Soft Actor-Critic (SAC) agent is also tested for comparison.
```
ML_project2/
├── matlab/                       # not tracked by git - dataset files should be placed here
│   └── ...
├── notebooks/
│   ├── outdated-tests/           # these notebooks are no longer used and not supposed to work, but kept for reference
│   │   ├── ddpg custom implementation.ipynb
│   │   └── offline ddpg.ipynb
│   ├── prepare dataset.ipynb
│   ├── surrogate model.ipynb
│   ├── test environment.ipynb
│   └── ddpg stablebaselines3.ipynb
├── outputs/                      # not tracked by git - generated files are saved here
│   └── ...
├── public models/                # pretrained models are placed here; copy them to outputs/models
│   └── ...
├── environment.py                # the simulation environment, along with useful functions
├── mlps.py                       # functions to generate, load and save MLPs
├── parameters.py                 # parameters for preprocessing, surrogate model and DDPG agent
├── README.md                     # this file
├── report.pdf                    # the report
└── requirements.txt              # python dependencies
```
To set up the project, you need to download the MATLAB files that contain the wind turbine data.
These files should be placed in the `matlab/` directory, as shown in the directory structure below.
```
ML_project2/
└── matlab/
    ├── case 1/
    │   ├── ms008mpt001.mat
    │   ├── ms008mpt002.mat
    │   └── ...
    ├── case 2/
    │   ├── ms006mpt001.mat
    │   ├── ms006mpt002.mat
    │   └── ...
    └── case 3/
        └── vawt_data.mat
```
The dataset used in this project is derived from the MATLAB files. The preprocessing is done in the `prepare dataset` notebook. The dataset is used only to train the surrogate model; it is not used in the reinforcement learning algorithm. The preprocessing steps are as follows:

- The data is loaded from the MATLAB files and converted into pandas DataFrames with the same column names. The dataframes are saved as pickle files in the `outputs/datasets/{case}/dataframes` directory.
- The data are resampled to 50 steps per revolution using linear interpolation.
- Angles are converted to the same units across dataset cases.
- The state-action matrix `X` is created by taking the following columns from the dataframes of the same dataset case:
  - `df[Phase]` → compute `X[cos_phase]` and `X[sin_phase]`
  - `df[Ct]` → `X[Ct]`
  - `df[Cr]` → `X[Cr]`
  - `df[Pitch]` → `X[pitch]`, and compute `X[dpitch]`: dpitch is the pitch increment from the previous state to the current state, corresponding approximately to the pitch angular speed.
  - `X[action]`: the action is the pitch increment from the current state to the next state.
- The labels `Y` are created by taking the next state from the dataframes of the same dataset case. Some features of the next state can be computed at simulation time, so they are not predicted by the surrogate model and thus not included in the labels. The only features included in the labels are:
  - `df[Ct+1]` → `Y[Ct]`
  - `df[Cr+1]` → `Y[Cr]`
- The data are normalized and shuffled, then saved as a single pickle file in the `outputs/datasets/{case}/array` directory.
- The data from multiple dataset cases are mixed together by randomly selecting experiences across dataset cases using a slightly adjusted probability distribution.
- The mixed data are normalized and shuffled, then saved as a single pickle file in the `outputs/datasets/full/array` directory.

The dataset includes features such as the phase, the pitch, and the coefficients of thrust and power.
The surrogate model is a multi-layer perceptron (MLP) neural network that predicts the evolution of the force coefficients `Ct` and `Cr` based on the current state and the control action. This is enough to predict the next state of the system, which is used in the simulation environment. The surrogate model allows us to simulate the system without needing real access to the wind turbine, and without having to solve the Navier-Stokes equations that describe the system.

The surrogate model is trained in the `surrogate model` notebook, which also contains a small analysis of the model and the dataset, such as a SHAP feature-importance analysis. The model's architecture and training parameters can be found in the parameters file. The model is saved in the `outputs/models` directory.
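As a rough sketch of what the surrogate does: it maps a normalized state-action row to the next-step `(Ct, Cr)` pair. The project builds its MLPs through `mlps.py`; the example below uses scikit-learn's `MLPRegressor` instead, with placeholder layer sizes and synthetic data, purely to illustrate the input/output shapes.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical dimensions: 7 state-action features in
# (cos_phase, sin_phase, Ct, Cr, pitch, dpitch, action),
# 2 force coefficients (Ct, Cr) out. Real sizes live in parameters.py.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 7))                        # normalized state-action rows
Y = np.column_stack([0.9 * X[:, 2], 0.9 * X[:, 3]])  # synthetic stand-in targets

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=200, random_state=0)
surrogate.fit(X, Y)

# One simulation step: predict the next (Ct, Cr) from the current row.
next_ct_cr = surrogate.predict(X[:1])
```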
The environment is a simulation of the wind turbine system. It uses the surrogate model to predict the next state of the system based on the current state and the control action. The environment is implemented as a Gymnasium environment, making it compatible with reinforcement learning algorithms from libraries like Stable Baselines 3.

The environment is defined in the environment file as the `TurbineEnvironment` class, which extends the `gymnasium.Env` class. The `simulate_open_loop_episode` function in the environment file simulates an episode of the system operation with a given sequence of actions, and returns the states and rewards for the entire episode. Giving the same sequence of actions to the environment should produce the same states and rewards; this is tested in the `test environment` notebook.
The DDPG agent is a reinforcement learning agent that learns the optimal control policy for the wind turbine system. The agent is trained in the environment using the DDPG algorithm. The agent's architecture and training parameters can be found in the parameters file. The training process can be monitored using TensorBoard, with the logs saved in the `outputs/tensorboard` directory.

To train the agent, run the `ddpg stablebaselines3` notebook. A small agent evaluation is also done in this notebook. The agent is saved in the `outputs/models` directory.
The SAC agent is a reinforcement learning agent that learns the optimal control policy for the wind turbine system. The agent is trained in the environment using the SAC algorithm. The agent's architecture and training parameters can be found in the parameters file. The training process can be monitored using TensorBoard, with the logs saved in the `outputs/tensorboard` directory.

To train the agent, run the `sac stablebaselines3` notebook. A small agent evaluation is also done in this notebook. The agent is saved in the `outputs/models` directory.
The report is available in the report file.