Full Documentation | README_CN.md (Chinese documentation)
XuanCe is an open-source ensemble of Deep Reinforcement Learning (DRL) algorithm implementations.
We call it Xuan-Ce (ηη) in Chinese: "Xuan (η)" suggests something incredible and magical, and "Ce (η)" means policy.
DRL algorithms are sensitive to hyperparameter tuning, vary in performance with different implementation tricks, and suffer from unstable training; as a result, they can sometimes seem elusive and "Xuan". This project provides a thorough, high-quality, and easy-to-understand implementation of DRL algorithms, and we hope it sheds some light on the magic of reinforcement learning.
We expect it to be compatible with multiple deep learning toolboxes (PyTorch, TensorFlow, and MindSpore), and hope it can really become a zoo full of DRL algorithms.
Paper link: https://arxiv.org/pdf/2312.16248.pdf
- π Highly modularized.
- π Easy to learn, easy for installation, and easy for usage.
- π Flexible model combination.
- π Abundant algorithms for various tasks.
- π« Supports both DRL and MARL tasks.
- π High compatibility across users and platforms (PyTorch, TensorFlow 2, MindSpore; CPU and GPU; Linux, Windows, macOS, etc.).
- β‘ Fast running speed with parallel environments.
- π» Distributed training with multiple GPUs.
- ποΈ Supports automatic hyperparameter tuning.
- π Good visualization with TensorBoard or Weights & Biases (wandb).
- DQN: Deep Q Network [Paper]
- Double DQN: DQN with Double Q-learning [Paper]
- Dueling DQN: DQN with Dueling Network [Paper]
- PER: DQN with Prioritized Experience Replay [Paper]
- NoisyDQN: DQN with Parameter Space Noise for Exploration [Paper]
- DRQN: Deep Recurrent Q-Network [Paper]
- QRDQN: DQN with Quantile Regression [Paper]
- C51: Distributional Reinforcement Learning [Paper]
- PG: Vanilla Policy Gradient [Paper]
- NPG: Natural Policy Gradient [Paper]
- PPG: Phasic Policy Gradient [Paper] [Code]
- A2C: Advantage Actor Critic [Paper] [Code]
- SAC: Soft Actor-Critic [Paper] [Code]
- SAC-Discrete: Soft Actor-Critic for Discrete Actions [Paper] [Code]
- PPO-Clip: Proximal Policy Optimization with Clipped Objective [Paper] [Code]
- PPO-KL: Proximal Policy Optimization with KL Divergence [Paper] [Code]
- DDPG: Deep Deterministic Policy Gradient [Paper] [Code]
- TD3: Twin Delayed Deep Deterministic Policy Gradient [Paper][Code]
- P-DQN: Parameterised Deep Q-Network [Paper]
- MP-DQN: Multi-pass Parameterised Deep Q-network [Paper] [Code]
- SP-DQN: Split Parameterised Deep Q-Network [Paper]
- IQL: Independent Q-learning [Paper] [Code]
- VDN: Value Decomposition Networks [Paper] [Code]
- QMIX: Q-mixing networks [Paper] [Code]
- WQMIX: Weighted Q-mixing networks [Paper] [Code]
- QTRAN: Q-transformation [Paper] [Code]
- DCG: Deep Coordination Graphs [Paper] [Code]
- IDDPG: Independent Deep Deterministic Policy Gradient [Paper]
- MADDPG: Multi-agent Deep Deterministic Policy Gradient [Paper] [Code]
- IAC: Independent Actor-Critic [Paper] [Code]
- COMA: Counterfactual Multi-agent Policy Gradient [Paper] [Code]
- VDAC: Value-Decomposition Actor-Critic [Paper] [Code]
- IPPO: Independent Proximal Policy Optimization [Paper] [Code]
- MAPPO: Multi-agent Proximal Policy Optimization [Paper] [Code]
- MFQ: Mean-Field Q-learning [Paper] [Code]
- MFAC: Mean-Field Actor-Critic [Paper] [Code]
- ISAC: Independent Soft Actor-Critic
- MASAC: Multi-agent Soft Actor-Critic [Paper]
- MATD3: Multi-agent Twin Delayed Deep Deterministic Policy Gradient [Paper]
- IC3Net: Individualized Controlled Continuous Communication Model [Paper] [Code]
Supported environments include:
- Classic Control / Box2D / MuJoCo: Cart Pole, Pendulum, Acrobot, MountainCar, Bipedal Walker, Car Racing, Lunar Lander, Ant, HalfCheetah, Hopper, HumanoidStandup, Humanoid, InvertedPendulum, ...
- Atari: Adventure, Air Raid, Alien, Amidar, Assault, Asterix, Asteroids, ...
- MiniGrid: GoToDoorEnv, LockedRoomEnv, MemoryEnv, PlaygroundEnv, ...
- Drones: Helix, Single-Agent Hover, Multi-Agent Hover, ...
- MPE: Simple Push, Simple Reference, Simple Spread, Simple Adversary, ...
- Examples: Example 1, Example 2, Example 3, Example 4, ...
π» The library can be run on Linux, Windows, macOS, EulerOS, etc.
Before installing XuanCe, you should install Anaconda to prepare a Python environment. (Note: select a proper version of Anaconda from here.)
After that, open a terminal and install XuanCe by the following steps.
Step 1: Create and activate a new conda environment (Python >= 3.7 is suggested):

`conda create -n xuance_env python=3.8 && conda activate xuance_env`

Step 2: Install the mpi4py dependency:

`conda install mpi4py`

Step 3: Install the library:

`pip install xuance`
This command does not include the dependencies of the deep learning toolboxes. To install XuanCe together with a deep learning backend, use one of the following:
- `pip install xuance[torch]` for PyTorch,
- `pip install xuance[tensorflow]` for TensorFlow 2,
- `pip install xuance[mindspore]` for MindSpore,
- `pip install xuance[all]` for all of the above.
Note: Some extra packages should be installed manually for further usage. Click here to see more details for installation.
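As a quick sanity check (a minimal sketch that only assumes the `xuance` package installed above and the Python standard library), you can verify the installation from a Python shell; the snippet that follows it then trains a DQN agent on CartPole-v1 with the built-in runner.

```python
# Minimal post-install check: import the package and print its installed version.
from importlib.metadata import version  # standard library (Python >= 3.8)

import xuance  # should import without errors if the installation succeeded

print(version("xuance"))  # prints the installed XuanCe version string
```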
import xuance
runner = xuance.get_runner(method='dqn',
env='classic_control',
env_id='CartPole-v1',
is_test=False)
runner.run()
import xuance
runner_test = xuance.get_runner(method='dqn',
env='classic_control',
env_id='CartPole-v1',
is_test=True)
runner_test.run()
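The same `get_runner` interface selects other algorithms and environments through the `method`, `env`, and `env_id` arguments. The sketch below is only an illustration: the method key `'ppoclip'` is an assumption, and the supported method/env/env_id combinations are determined by XuanCe's bundled configuration files.

```python
import xuance

# Hypothetical combination for illustration: PPO (clipped objective) on CartPole.
# The method key 'ppoclip' is assumed; check XuanCe's configs for the exact names.
runner = xuance.get_runner(method='ppoclip',
                           env='classic_control',
                           env_id='CartPole-v1',
                           is_test=False)
runner.run()
```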
You can use TensorBoard to visualize the training process. After training, log files are automatically generated under the `./logs/` directory, and you should see the training curves after running the command:
$ tensorboard --logdir ./logs/dqn/torch/CartPole-v1
XuanCe also supports the Weights & Biases (wandb) tool for visualizing the results of a run.
How to use wandb online? β‘οΈ https://github.com/wandb/wandb.git/
How to use wandb offline? β‘οΈ https://github.com/wandb/server.git/
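For reference, the core wandb logging API (independent of XuanCe) is small. The sketch below only assumes the `wandb` package and a hypothetical project name; within XuanCe itself the logger backend is normally selected through its configuration files rather than called by hand.

```python
# Standalone wandb usage sketch (not XuanCe-specific): log one scalar per step.
import wandb

run = wandb.init(project="xuance-demo",  # hypothetical project name
                 config={"method": "dqn", "env_id": "CartPole-v1"})

for step in range(100):
    # In a real experiment this would be, e.g., the episode return from the runner.
    wandb.log({"episode_return": float(step)}, step=step)

run.finish()
```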
- GitHub issues: https://github.com/agi-brain/xuance/issues
- GitHub discussions: https://github.com/orgs/agi-brain/discussions
- Discord invite link: https://discord.gg/HJn2TBQS7y
- Slack invite link: https://join.slack.com/t/xuancerllib/
- QQ group numbers: 552432695 (full), 153966755
- WeChat account: "ηη RLlib"
(Note: You can also post your questions on Stack Overflow.)
If you use XuanCe in your research or development, please cite the paper:
@article{liu2023xuance,
title={XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library},
author={Liu, Wenzhang and Cai, Wenzhe and Jiang, Kun and Cheng, Guangran and Wang, Yuanda and Wang, Jiawei and Cao, Jingyu and Xu, Lele and Mu, Chaoxu and Sun, Changyin},
journal={arXiv preprint arXiv:2312.16248},
year={2023}
}