saymrwulf/stable-baselines3

mirror of https://github.com/saymrwulf/stable-baselines3.git synced 2026-05-19 21:40:19 +00:00

Antonin Raffin d63cef7693 Add gradient clipping for SAC

2019-12-06 18:32:57 +01:00

Torchy Baselines

PyTorch version of Stable Baselines, a set of improved implementations of reinforcement learning algorithms.

Implemented Algorithms

A2C
CEM-RL (with TD3)
PPO
SAC
TD3

Roadmap

TODO:

improve the predict method
complete logger
Refactor: store buffers as NumPy arrays instead of PyTorch tensors
Refactor: remove duplicated code for evaluation
double-check the shape of the log probabilities
try squashing both mean and output when using SAC + SDE
plotting? -> zoo
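
As a rough illustration of the buffer refactor listed above, here is a minimal sketch of a replay buffer that stores transitions in preallocated NumPy arrays. The class name `NumpyReplayBuffer`, its fields, and its method signatures are hypothetical and are not taken from this repository. It assumes flat (1-D) observations and continuous actions.

```python
import numpy as np


class NumpyReplayBuffer:
    """Hypothetical fixed-size ring buffer backed by NumPy arrays."""

    def __init__(self, buffer_size, obs_dim, action_dim):
        self.buffer_size = buffer_size
        self.pos = 0        # index of the next slot to write
        self.full = False   # True once the buffer has wrapped around
        # Preallocate storage once; no tensors are created while collecting data.
        self.observations = np.zeros((buffer_size, obs_dim), dtype=np.float32)
        self.next_observations = np.zeros((buffer_size, obs_dim), dtype=np.float32)
        self.actions = np.zeros((buffer_size, action_dim), dtype=np.float32)
        self.rewards = np.zeros((buffer_size, 1), dtype=np.float32)
        self.dones = np.zeros((buffer_size, 1), dtype=np.float32)

    def __len__(self):
        return self.buffer_size if self.full else self.pos

    def add(self, obs, next_obs, action, reward, done):
        """Store one transition, overwriting the oldest one when full."""
        self.observations[self.pos] = obs
        self.next_observations[self.pos] = next_obs
        self.actions[self.pos] = action
        self.rewards[self.pos] = reward
        self.dones[self.pos] = float(done)
        self.pos += 1
        if self.pos == self.buffer_size:
            self.full = True
            self.pos = 0

    def sample(self, batch_size, rng=None):
        """Sample transitions uniformly with replacement, as NumPy arrays."""
        if len(self) == 0:
            raise ValueError("Cannot sample from an empty buffer")
        rng = np.random.default_rng() if rng is None else rng
        idx = rng.integers(0, len(self), size=batch_size)
        return (
            self.observations[idx],
            self.next_observations[idx],
            self.actions[idx],
            self.rewards[idx],
            self.dones[idx],
        )
```

With this layout, conversion to PyTorch tensors would happen only on the sampled batch (for example with `torch.as_tensor`), not on every stored transition.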

Later:

get_parameters / set_parameters
SDE: use affine transform to scale the noise after a tanh transform?
Use MultivariateNormal with full covariance matrix?
CNN policies + normalization
tensorboard support
DQN
TRPO
ACER
DDPG
HER -> use the stable-baselines implementation, since it does not depend on TF?
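
To make the full-covariance question in the list above more concrete, here is a small sketch comparing the log density of a Gaussian with a full covariance matrix against the diagonal case. It uses plain NumPy to stay dependency-light; in PyTorch the corresponding distribution would presumably be `torch.distributions.MultivariateNormal`. The function names are hypothetical and not part of this repository. Both functions return one value per sample, shape `(batch,)`, which is the shape to check for when auditing log probabilities.

```python
import numpy as np


def full_cov_gaussian_log_prob(x, mean, cov):
    """Log density of N(mean, cov) for each row of x; returns shape (batch,)."""
    x = np.atleast_2d(x)
    dim = mean.shape[0]
    # cov = L @ L.T with L lower triangular.
    chol = np.linalg.cholesky(cov)
    # Solve L z = (x - mean)^T, so that sum(z**2) is the Mahalanobis distance.
    z = np.linalg.solve(chol, (x - mean).T)  # shape (dim, batch)
    maha = np.sum(z ** 2, axis=0)
    # log det(cov) = 2 * sum(log(diag(L)))
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (dim * np.log(2.0 * np.pi) + log_det + maha)


def diag_gaussian_log_prob(x, mean, std):
    """Log density of a diagonal Gaussian, summed over the action dimension."""
    x = np.atleast_2d(x)
    per_dim = -0.5 * (
        ((x - mean) / std) ** 2 + 2.0 * np.log(std) + np.log(2.0 * np.pi)
    )
    # Summing over the last axis gives one log probability per sample.
    return per_dim.sum(axis=1)
```

When `cov` is diagonal, the two functions agree, so the diagonal version can serve as a sanity check for a full-covariance implementation.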