Introduction
While Reinforcement Learning (RL) has shown impressive performance in games (e.g., Go and chess [14,13], DoTA2 [2], Starcraft II [18,17], etc.), it remains a challenging problem to use RL in real-world scenarios due to the gigantic sample complexity and limited ability to generalize to unknown environments. To train
an RL agent with traditional online methods, millions of interactions with the environments are often needed, which is definitely...