# Torchy Baselines

PyTorch version of Stable Baselines, a set of improved implementations of reinforcement learning algorithms.

## Implemented Algorithms

  • A2C
  • CEM-RL (with TD3)
  • PPO
  • SAC
  • TD3

## Roadmap

TODO:

  • Improve the `predict()` method
  • Complete the logger
  • Refactor: back the replay buffer with NumPy arrays instead of PyTorch tensors
  • Refactor: remove duplicated evaluation code
  • Double-check the shape of the log probability
  • Try squashing both the mean and the output when using SAC + SDE
  • Plotting? -> zoo
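To illustrate the buffer refactor listed above, here is a minimal sketch of a replay buffer backed by pre-allocated NumPy arrays. The class and attribute names are illustrative, not the actual torchy_baselines API; a real implementation would convert sampled batches to PyTorch tensors before returning them.

```python
import numpy as np

class NumpyReplayBuffer:
    """Sketch of a replay buffer using pre-allocated NumPy storage
    (names are illustrative, not the torchy_baselines API)."""

    def __init__(self, buffer_size, obs_dim, action_dim):
        self.buffer_size = buffer_size
        self.pos = 0        # next write index
        self.full = False   # whether the buffer has wrapped around
        # Pre-allocated NumPy arrays; tensors would be created only at sampling time
        self.observations = np.zeros((buffer_size, obs_dim), dtype=np.float32)
        self.actions = np.zeros((buffer_size, action_dim), dtype=np.float32)
        self.rewards = np.zeros((buffer_size, 1), dtype=np.float32)
        self.dones = np.zeros((buffer_size, 1), dtype=np.float32)

    def add(self, obs, action, reward, done):
        # Overwrite the oldest entry once the buffer is full (circular buffer)
        self.observations[self.pos] = obs
        self.actions[self.pos] = action
        self.rewards[self.pos] = reward
        self.dones[self.pos] = done
        self.pos = (self.pos + 1) % self.buffer_size
        if self.pos == 0:
            self.full = True

    def sample(self, batch_size):
        # Sample uniformly from the filled portion of the buffer
        upper = self.buffer_size if self.full else self.pos
        indices = np.random.randint(0, upper, size=batch_size)
        return (self.observations[indices], self.actions[indices],
                self.rewards[indices], self.dones[indices])
```

Keeping the storage in NumPy avoids holding the whole buffer on the GPU and makes saving/loading straightforward; only the sampled minibatch needs to become a tensor.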

Later:

  • get_parameters / set_parameters
  • SDE: use an affine transform to scale the noise after a tanh transform?
  • Use MultivariateNormal with a full covariance matrix?
  • CNN policies + normalization
  • Tensorboard support
  • DQN
  • TRPO
  • ACER
  • DDPG
  • HER -> reuse the stable-baselines implementation, since it does not depend on tf?
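The planned `get_parameters` / `set_parameters` pair could look like the following hypothetical sketch. The parameter names and the dict-of-arrays format are assumptions for illustration; a real implementation would read from and write to the policy network's weights.

```python
import numpy as np

class ParameterMixin:
    """Hypothetical get_parameters/set_parameters sketch
    (storage format and parameter names are assumptions)."""

    def __init__(self):
        # Illustrative parameter store; a real model would expose
        # its network weights here instead.
        self._params = {
            "policy/weight": np.zeros((2, 2)),
            "policy/bias": np.zeros(2),
        }

    def get_parameters(self):
        # Return copies so callers cannot mutate internal state through views
        return {name: value.copy() for name, value in self._params.items()}

    def set_parameters(self, params):
        # Reject unknown names to catch typos and mismatched checkpoints early
        for name, value in params.items():
            if name not in self._params:
                raise KeyError(f"Unknown parameter: {name}")
            self._params[name] = np.array(value, dtype=self._params[name].dtype)
```

Returning copies (rather than views) keeps saved checkpoints stable even if the model keeps training afterwards.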