Universe RL trainer platform. Simple. Supple. Scalable.
tinyverse is a reinforcement learning platform for gym/universe/custom environments that lets you use whatever resources you have to train reinforcement learning agents.
- Simple: the core is currently under 400 lines, consisting of code (~50%), comments (~40%) and blank lines (~10%).
- Supple: tinyverse assumes almost nothing about your agent and environment. The environment does not have to be interruptible, and the agent may use any algorithm or structure. The agent [will soon](#14) support any framework, from numpy to pure tensorflow/theano to keras/lasagne+agentnet.
- Scalable: you can train and play 10 parallel games on your GPU desktop/server, 20 more sessions on your MacBook, and another 5 on your friend's laptop when he isn't looking (and 1000 more games and 10 trainers in the cloud, of course).
The core idea is to have two types of processes:
- play-er - interacts with the environment, records sessions to the database, periodically loads new params
- train-er - reads sessions from the database, trains the agent via experience replay, sends updated params to the database
These processes revolve around a database that stores experience sessions and agent weights. The database is currently implemented with Redis, since it is simple to set up and swift with key-value operations. You can, however, implement the database interface on top of whatever database you prefer.
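For illustration, here is a minimal sketch of the two loops. This is not tinyverse's actual API: the `agent`/`db` method names below are hypothetical placeholders.

```python
import itertools

def player_loop(agent, env, db, reload_every=10):
    """Interact with the environment, record sessions, periodically reload params."""
    for i in itertools.count():
        if i % reload_every == 0:
            agent.set_params(db.load_params())   # pick up the trainer's latest weights
        session = agent.play(env)                # record one game session
        db.record_session(session)               # store it for experience replay

def trainer_loop(agent, db, batch_size=10):
    """Train the agent by experience replay on sessions sampled from the database."""
    while True:
        batch = db.sample_sessions(batch_size)   # random batch of recorded sessions
        agent.train_on_sessions(batch)           # one experience replay update
        db.save_params(agent.get_params())       # publish new weights for the players
```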
- install redis server
- (Ubuntu)
sudo apt-get install redis-server
- Mac OS: `brew install redis` (via Homebrew).
- Otherwise search "Install redis your_OS" or ask on gitter.
- If you want to run on multiple machines, configure redis-server to listen on 0.0.0.0 (and consider setting a password); see the example below.
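A minimal `redis.conf` change might look like this (the password is a placeholder):

```
# listen on all interfaces instead of localhost only
bind 0.0.0.0
# optional, but strongly recommended if the port is reachable from other machines
requirepass your_password_here
```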
- install python packages
- gym and universe
pip install gym[atari]
pip install universe
- these most likely need extra dependencies; see the gym and universe installation docs.
- install bleeding-edge theano, lasagne and agentnet for the agentnet examples to work (see the snippet after this list).
- Preferably set up theano to use floatX=float32 in .theanorc (also shown below).
pip install joblib redis prefetch_generator six
- examples require opencv:
conda install -y -c https://conda.binstar.org/menpo opencv3
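As referenced in the list above, one common way to get bleeding-edge theano/lasagne/agentnet is to install them straight from GitHub. These are the generic GitHub archive installs; check each project's README for the currently recommended command:

```
pip install --upgrade https://github.com/Theano/Theano/archive/master.zip
pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip
pip install --upgrade https://github.com/yandexdataschool/AgentNet/archive/master.zip
```

And a minimal `~/.theanorc` that sets float32:

```
[global]
floatX = float32
```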
- Spawn several player processes. Each process simply interacts with the environment and saves the results; `-b` stands for batch size.
# spawn 10 background player processes
for i in `seq 1 10`; do
    python tinyverse atari.py play -b 3 &
done
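Since the players run in the background, you can stop them all later with something like `pkill -f "atari.py play"`.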
- Spawn a trainer process. (The demo below runs on GPU; change to CPU if you have to.)
THEANO_FLAGS=device=gpu python tinyverse atari.py train -b 10 &
- Evaluate results at any time (records video to ./records):
python tinyverse atari.py eval -n 5
Devs: see workbench.ipynb