Using Deep Reinforcement Learning to Generalize Search in Games

Search methods have been instrumental in computing superhuman strategies for large-scale games [1,2,3]. However, existing search techniques are tabular and therefore struggle to search far into the future. This is a particular problem in games with high stochasticity and/or imperfect information. For example, existing search techniques in Hanabi, which the AI community considers an interesting research problem [4], can search only one move ahead; even searching two moves ahead is considered intractable. Since real-world situations are highly stochastic and commonly involve hidden information, it is important to develop more scalable online search methods for real-world applications.


  • Arnaud Fickinger, UC Berkeley
  • Stuart Russell, UC Berkeley
  • Noam Brown, FAIR


We propose using deep reinforcement learning to search further ahead than existing tabular techniques can. We would accomplish this by first training a policy network for the entire game. Then, at test time, whenever a decision must be made, we would fine-tune the policy for the particular situation the agent is in.
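
As a rough illustration of the test-time fine-tuning step described above, the following is a minimal sketch and not the proposed method itself. It makes several simplifying assumptions: the "policy network" is reduced to a softmax over three logits, the fine-tuning rule is a plain REINFORCE update with a batch-mean baseline, and the simulator `simulate_current_situation` with its return values is entirely hypothetical. In the proposal, the policy would be a deep network trained on the whole game, and the rollouts would come from the actual game situation.

```python
import math
import random


def softmax(logits):
    """Convert logits to action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_current_situation(action, rng):
    """Hypothetical stochastic rollout from the agent's current situation.

    The mean returns below are made-up numbers for illustration only.
    """
    mean_return = [0.2, 0.5, 0.9][action]
    return mean_return + rng.gauss(0.0, 0.1)


def fine_tune(blueprint_logits, simulate, num_updates=200, batch_size=32,
              lr=1.0, seed=0):
    """Fine-tune a copy of the blueprint policy for one specific situation.

    Uses REINFORCE with the batch-mean return as a baseline. The blueprint
    itself is left unchanged so it can be reused at the next decision point.
    """
    rng = random.Random(seed)
    logits = list(blueprint_logits)  # copy: never modify the blueprint
    n = len(logits)
    for _ in range(num_updates):
        probs = softmax(logits)
        actions = rng.choices(range(n), weights=probs, k=batch_size)
        returns = [simulate(a, rng) for a in actions]
        baseline = sum(returns) / batch_size
        grad = [0.0] * n
        for a, ret in zip(actions, returns):
            advantage = ret - baseline
            for i in range(n):
                # Gradient of log softmax(logits)[a] with respect to logit i.
                indicator = 1.0 if i == a else 0.0
                grad[i] += advantage * (indicator - probs[i])
        logits = [l + lr * g / batch_size for l, g in zip(logits, grad)]
    return logits


# The blueprint (trained for the "entire game") prefers action 0 here.
blueprint = [1.0, 0.5, 0.0]
tuned = fine_tune(blueprint, simulate_current_situation)
best_action = max(range(len(tuned)), key=tuned.__getitem__)
```

In this toy setting the blueprint's preferred action is poor for the situation at hand, and the fine-tuned copy shifts probability toward the action with the highest simulated return. Because only a copy is updated, the blueprint remains the starting point for every later decision.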

We aim to develop a method that outperforms existing search techniques in the benchmark game of Hanabi, and potentially in other benchmark games as well. Specifically, we aim to show that two-ply search in Hanabi is intractable with existing state-of-the-art techniques [3], but that our deep RL approach makes it both tractable and effective.


[1] Brown, Noam, and Tuomas Sandholm. "Superhuman AI for multiplayer poker." Science 365.6456 (2019): 885-890.

[2] Silver, David, et al. "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play." Science 362.6419 (2018): 1140-1144.

[3] Lerer, Adam, et al. "Improving Policies via Search in Cooperative Partially Observable Games." AAAI. 2020.

[4] Bard, Nolan, et al. "The Hanabi challenge: A new frontier for AI research." Artificial Intelligence 280 (2020): 103216.