RL Online Resources

Reinforcement learning online resources
- OpenAI Spinning Up: simple derivations and hands-on practice
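- Berkeley CS285 reinforcement learning course, mainly the policy gradient (PG) school
- Hung-yi Lee's RL course: clearly organized and relatively easy
- Multi-agent RL framework pymarl, including implementations of QMIX and COMA under the CTDE paradigm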
Open-source implementations
- An RL acceleration framework reported to outperform IMPALA and SEED RL
- PyTorch implementation of IMPALA
- DRL frameworks: GARAGE, RLlib, Catalyst, rlpyt
- SMAC - StarCraft Multi-Agent Challenge
- Fully Cooperative Multiagent Object Transportation Problems (CMOTPs)
- The Apprentice Firemen Game
- Pommerman
- The Multi-Agent Reinforcement Learning in Malmo (MARLO)
- Hanabi: a cooperative multiplayer card game (two to five players)
- Arena
- MuJoCo Multiagent Soccer
- Neural MMO
Game theory mechanism experiments
- Keynes Beauty Contest
- Auction
- Rock Paper Scissors
- StarCraft II
- two didactic
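As a small illustration of the Keynes Beauty Contest item above, here is a minimal sketch assuming the common "p-beauty contest" formalization: each player picks a number in [0, 100] and the winner is whoever is closest to p times the average guess. The level-k reasoning model, the function names, and the defaults (p = 2/3, a level-0 guess of 50) are assumptions made for this example, not something taken from the listed experiments.

```python
def level_k_guess(k, p=2/3, level0_guess=50.0):
    """Guess of a level-k player (hypothetical model).

    A level-0 player guesses level0_guess; a level-k player best-responds
    to a population it believes is entirely level-(k-1).
    """
    return level0_guess * p ** k


def play_round(guesses, p=2/3):
    """Return (target, winner_index) for one round of the contest."""
    target = p * sum(guesses) / len(guesses)
    winner = min(range(len(guesses)), key=lambda i: abs(guesses[i] - target))
    return target, winner


# Four players reasoning at levels 0, 1, 2 and 3.
guesses = [level_k_guess(k) for k in range(4)]
target, winner = play_round(guesses)
print(f"guesses={guesses}, target={target:.2f}, winner=level-{winner}")
```

Deeper levels of reasoning push the guess toward 0, which is the Nash equilibrium of this game when p < 1, yet in a mixed population the deepest reasoner does not necessarily win.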
Groups and researchers to follow
Anuj Mahajan -- OATML
Chongjie Zhang -- Tsinghua University
Current challenges in RL
- Scalability: CTDE (Centralized Training and Decentralized Execution)
- Credit Assignment: each agent's contribution to the team
- Uncertainty (non-stationarity): observations are partial and noisy; communication can reduce uncertainty coming from the environment, and agents also influence one another
- Heterogeneity: requiring diverse behaviors of agents / role-based
- Hierarchy: hierarchical agents, and the models that multi-level agents face
- Coordination
- Generality: generalization of RL, e.g. off-policy learning and data at inference time differing from training data
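To make the CTDE and credit assignment items above more concrete, here is a minimal sketch of monotonic value mixing in the spirit of QMIX. It is a heavily simplified stand-in: a single linear layer with absolute-valued weights replaces QMIX's state-conditioned hypernetwork, and every name and number below (mix_q_values, per_agent_q, the random utilities) is hypothetical. The point it illustrates is that when the joint value is monotonic in each agent's utility, each agent acting greedily on its own utility recovers the joint greedy action.

```python
import itertools
import random


def mix_q_values(agent_qs, weights, bias):
    """Q_tot = sum_i |w_i| * Q_i + b; the absolute value keeps the mixing monotonic."""
    return sum(abs(w) * q for w, q in zip(weights, agent_qs)) + bias


rng = random.Random(0)
n_agents, n_actions = 3, 4

# Hypothetical per-agent utilities Q_i(o_i, a_i): one row per agent.
per_agent_q = [[rng.gauss(0, 1) for _ in range(n_actions)] for _ in range(n_agents)]
# Random stand-ins for mixing parameters that would normally depend on the global state.
weights = [rng.gauss(0, 1) for _ in range(n_agents)]
bias = rng.gauss(0, 1)

# Decentralized execution: each agent maximizes only its own utility.
decentralized = tuple(max(range(n_actions), key=q.__getitem__) for q in per_agent_q)


def q_tot(joint_action):
    """Centralized value of a joint action (one action index per agent)."""
    chosen = [per_agent_q[i][a] for i, a in enumerate(joint_action)]
    return mix_q_values(chosen, weights, bias)


# Centralized check: brute-force search over all joint actions.
centralized = max(itertools.product(range(n_actions), repeat=n_agents), key=q_tot)
print(decentralized, centralized)
```

The brute-force search grows as n_actions ** n_agents, while the decentralized argmax stays linear in the number of agents, which is the scalability argument for this kind of factorization. The magnitude of each weight can also be read as a rough indication of how much that agent's utility contributes to the team value.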