- Migrated from gym to Gymnasium. `gym==0.26.3` is still required for the dm_control and pybullet-gym environments.
- `Transition` and `TransitionBatch` now support the `terminated` and `truncated` booleans instead of the single `done` boolean previously used by gym.
- Migrated calls to `env.reset()`, which now returns a tuple of `obs, info` instead of just `obs`.
- Migrated calls to `env.step()`, which now returns `observation, reward, terminated, truncated, info`.
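
A minimal sketch of the migrated interaction loop (the environment name is just an example):

```python
import gymnasium as gym

env = gym.make("Hopper-v4")
obs, info = env.reset(seed=0)  # reset() now returns (obs, info)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()
    # step() now returns five values instead of (obs, reward, done, info)
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```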
- Migrated to the Gymnasium render API; environments are instantiated with `render_mode=None` by default (see the example after this group of changes).
- DMC and PyBullet envs use the original gym wrappers to turn them into gym environments, then are wrapped by `gymnasium.envs.GymV20Environment`.
- All Mujoco envs use the DeepMind Mujoco bindings; mujoco-py is deprecated as a dependency.
- Custom Mujoco envs, e.g. `AntTruncatedObsEnv`, inherit from `gymnasium.envs.mujoco_env.MujocoEnv` and access data through `self.data` instead of `self.sim.data`.
- Mujoco environment versions have been updated to `v4` from `v2`, e.g. `Hopper-v4`.
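
Under the new render API the render mode is fixed when the environment is constructed; for example:

```python
import gymnasium as gym

# render_mode must be chosen at construction time (default is None, i.e. no rendering)
env = gym.make("HalfCheetah-v4", render_mode="rgb_array")
obs, info = env.reset()
frame = env.render()  # returns an RGB array because of the render_mode above
```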
- Fixed PlaNet to save the model to a directory instead of a file name.
- Added `follow-imports=skip` to the `mypy` CI test to allow for gymnasium/gym wrapper compatibility.
- Bumped `black` to version `23.1.0` in CI.
- Added PlaNet implementation.
- Added support for PyBullet environments.
- Changed SAC library used by MBPO (now based on Pranjal Tandon's).
- `Model.reset()` and `Model.sample()` signatures have changed. They no longer receive `TransitionBatch` objects, and they both return a dictionary of strings to tensors representing a model state that should be passed to `sample()` to simulate transitions. This dictionary can contain things like previous actions, predicted observations, latent states, beliefs, and any other quantity that the model needs to maintain to simulate trajectories when using `ModelEnv` (a toy illustration follows below).
- `Ensemble` class and subclasses are assumed to operate on 1-D models.
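
A toy illustration of the state-dictionary protocol described above; the class, names, and shapes here are invented for the example and are not mbrl-lib's exact API:

```python
import torch


class ToyModel:
    """Mimics the new protocol: reset() builds a state dict, sample() carries it forward."""

    def reset(self, obs: torch.Tensor) -> dict:
        # The model decides what it needs to carry across simulated steps.
        return {"obs": obs, "prev_act": torch.zeros(obs.shape[0], 1)}

    def sample(self, act: torch.Tensor, state: dict):
        next_obs = state["obs"] + act  # stand-in for a learned prediction
        reward = -next_obs.abs().sum(dim=1)
        return next_obs, reward, {"obs": next_obs, "prev_act": act}


model = ToyModel()
state = model.reset(torch.zeros(8, 1))  # batch of 8 initial observations
for _ in range(5):
    act = torch.randn(8, 1)
    obs, reward, state = model.sample(act, state)  # state dict is threaded through
```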
- Checkpointing format used by `save()` and `load()` in classes `GaussianMLP` and `OneDTransitionRewardModel` changed, making old checkpoints incompatible with the new version.
- `use_silu` argument to `GaussianMLP` has been replaced by `activation_fn_cfg`, which is an `omegaconf.DictConfig` specifying the class to use for the activation functions, thus giving more flexibility (see the example after this list).
- Removed unnecessary nesting inside the `dynamics_model` Hydra configuration.
- SAC agents prior to v0.2.0 cannot be loaded anymore.
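
As an example, a config selecting SiLU activations might look like the snippet below; the `_target_` key follows Hydra's usual instantiation convention, but check the `GaussianMLP` docs for the exact expected fields:

```python
from omegaconf import OmegaConf

# DictConfig naming the activation class to instantiate (assumed schema)
activation_fn_cfg = OmegaConf.create({"_target_": "torch.nn.SiLU"})
```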
- Added `add_batch()` method to `mbrl.util.ReplayBuffer`.
- Added functions to `mbrl.util.models` to easily create convolutional encoders/decoders with a desired configuration.
- `mbrl.util.common.rollout_agent_trajectories` now allows rolling out a pixel-based environment using a policy trained on its corresponding non-pixel environment version.
- `ModelTrainer` can be given `eps` for the `Adam` optimizer. It now also includes a progress bar using `tqdm` (can be turned off).
- CEM optimizer can now be toggled between using a clipped normal distribution or a truncated normal distribution.
- `mbrl.util.mujoco.make_env` can now create an environment specified via an `omegaconf` configuration and `hydra.utils.instantiate`, which takes precedence over the old mechanism if both are present (a generic sketch follows below).
- Fixed a bug that assigned the wrong termination function to the `humanoid_truncated_obs` env.
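
The Hydra mechanism referenced here is the standard `hydra.utils.instantiate` pattern, sketched generically below; the exact config keys that `mbrl.util.mujoco.make_env` reads are not shown, so treat this as the general idea rather than the library's schema:

```python
import hydra.utils
from omegaconf import OmegaConf

# A config names the class (or factory) to call plus its keyword arguments.
env_cfg = OmegaConf.create({
    "_target_": "gymnasium.envs.classic_control.pendulum.PendulumEnv",
    "g": 9.81,
})
env = hydra.utils.instantiate(env_cfg)  # builds PendulumEnv(g=9.81)
```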
- Added MPPI optimizer.
- Added iCEM optimizer.
- `control_env.py` now works with CEM, iCEM and MPPI.
- Changed algorithm configuration so that the action optimizer is passed as another config file.
- Added an option to quantize pixel observations of the gym mujoco and dm_control env wrappers.
- Added a sequence iterator, `SequenceTransitionSampler`, that always returns a fixed number of random batches.
- Methods `loss`, `eval_score` and `update` of the `Model` class now return a tuple of loss/score and metadata (a small snippet follows this list). The old single-value version is currently supported as well, but it will be deprecated in v0.2.0.
- `ModelTrainer` now accepts a callback that will be called after every batch, both during training and evaluation.
- `Normalizer` in `util.math` can now operate using double precision. Utilities now allow specifying whether the replay buffer and normalizer should use double or float via the Hydra config.
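
In a nutshell, callers now unpack two values; the stand-in below illustrates the convention (it is not the real `Model` implementation, and the metadata keys are hypothetical):

```python
# Stand-in for Model.loss()/eval_score(): returns (value, metadata)
# instead of just the value.
def loss(batch):
    value = 0.5
    return value, {"example_diagnostic": value}

loss_value, meta = loss(batch=None)  # callers now unpack two values
```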
- Multiple bug fixes
- Added a training browser to compare results of multiple runs
- Deprecated `ReplayBuffer.get_iterators()` and replaced it with `mbrl.util.common.get_basic_iterators()`
- Added an iterator that returns batches of sequences of transitions of a given length
- Multiple bug fixes
- Added `third_party` folder for `pytorch_sac` and `dmc2gym`
- Library now available in `pypi`
- Moved example configurations to package `mbrl.examples`, which can now be run as `python -m mbrl.examples.main` after `pip` installation
Initial release