Differentiable Optimization for Game Theoretic Formulations

Game theory is an effective tool for formulating interactions among agents and is used in many real-world challenges, including human-robot interaction scenarios such as self-driving. Recently, such tools have also found applications in machine learning [1,2] and reinforcement learning [3]. Examples include the optimizer and the uncertainty acting as virtual agents in robust optimization, the E-step and M-step in the EM algorithm [4,5], and the model adaptation and policy update in model-based RL [6]. Real-world problems like self-driving are typically represented as general-sum games, which are particularly difficult to solve due to the infinite regress of mutual best responses and the nonuniqueness of Nash solutions. To address this, Stackelberg strategies have been widely adopted; they reduce games to bi-level optimization problems in which one agent leads and the other follows, and the leader accounts for the follower's potential responses when choosing its policy. Such methods, however, are not yet equipped to handle high-dimensional inputs or to exploit end-to-end learning. To bridge this gap, we will study the foundations and applications of game theory toward building new tools for machine learning.
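To make the bi-level structure concrete, the following sketch solves a toy quadratic Stackelberg game: the follower's best response is computed in closed form, and the leader optimizes its own cost with that response anticipated. The cost functions and names here are illustrative assumptions, not taken from any specific problem in this proposal.

```python
import numpy as np

# Toy quadratic Stackelberg game (illustrative costs, chosen for a closed form):
#   follower: min_y f2(x, y) = (y - x)^2   ->  best response y*(x) = x
#   leader:   min_x f1(x, y) = (x - 1)^2 + y^2, anticipating y*(x)

def follower_best_response(x):
    # argmin_y (y - x)^2 has the closed-form solution y = x
    return x

def leader_cost(x):
    y = follower_best_response(x)  # leader anticipates the follower's reaction
    return (x - 1.0) ** 2 + y ** 2

# The leader searches over its action space with the follower's response baked in.
xs = np.linspace(-2.0, 2.0, 4001)
x_star = xs[np.argmin([leader_cost(x) for x in xs])]
y_star = follower_best_response(x_star)
print(x_star, y_star)  # analytic Stackelberg solution: x = 0.5, y = 0.5
```

Substituting y*(x) = x into the leader's cost gives (x - 1)^2 + x^2, minimized at x = 0.5, which the grid search recovers.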



Novelty and Innovation

We will derive an end-to-end differentiable bi-level optimizer for general-sum games by leveraging differentiable optimization, which has shown promising results in single-level optimization problems [7,8] and in normal- and extensive-form games [9]. The main risks are computational intractability introduced by the additional machinery, and the possibility that applications will need other methodological improvements before the power of differentiable optimization comes through.
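One standard ingredient such a differentiable bi-level optimizer can rely on is implicit differentiation of the follower's first-order optimality condition, so the leader's gradient flows through the follower's best response. The sketch below applies this to a toy quadratic game; all costs and names are illustrative assumptions, not the proposed method itself.

```python
# Toy bi-level problem with an implicit-function-theorem gradient:
#   follower: min_y f2(x, y) = (y - x)^2, stationarity g(x, y) = 2(y - x) = 0
#   leader:   min_x f1(x, y*(x)) = (x - 1)^2 + y*(x)^2

def follower_argmin(x):
    return x  # closed-form best response for this toy game

def leader_total_grad(x):
    y = follower_argmin(x)
    # Implicit function theorem on g(x, y*) = 0:
    #   dy*/dx = -(dg/dy)^(-1) (dg/dx) = -(2)^(-1) (-2) = 1
    dydx = 1.0
    # Total derivative of f1, including the path through the follower's response
    return 2.0 * (x - 1.0) + 2.0 * y * dydx

x = 0.0
for _ in range(200):  # plain gradient descent on the leader's loss
    x -= 0.1 * leader_total_grad(x)
print(x)  # converges to the Stackelberg leader action x = 0.5
```

In higher dimensions the same pattern holds with the Hessian of the follower's cost in place of the scalar dg/dy, which is where the computational-cost risk noted above arises.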

Technical Objective

  • We will build the differentiable bi-level optimizer and apply it to efficiently solve problems such as interactive driving around human drivers and pedestrians in crowded environments, benchmarking against the state of the art.

  • We will explore the influence of different risk metrics and of bounded rationality [10] on the solutions of games. For example, when uncertainties exist, should we minimize the expected reward, the conditional value at risk (CVaR) [11], or the cumulative prospect value [12]? And how will bounded computational resources shift the game solutions?

  • We will also explore how humans make decisions, using real driving data, with the goal of learning explainable and expressive human behavior models.
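The contrast between risk metrics in the second objective can be seen on sampled rewards: the expectation weights all outcomes equally, while CVaR averages only the worst tail, following the expected-shortfall definition in [11]. The distribution and tail level below are illustrative assumptions.

```python
import numpy as np

# Toy comparison of risk metrics over sampled rewards (illustrative numbers).
rng = np.random.default_rng(0)
rewards = rng.normal(loc=1.0, scale=2.0, size=100_000)

def cvar(samples, alpha=0.1):
    # Mean reward conditional on falling in the worst alpha-fraction of outcomes
    cutoff = np.quantile(samples, alpha)
    return samples[samples <= cutoff].mean()

mean_reward = rewards.mean()   # roughly 1.0 for this distribution
cvar_reward = cvar(rewards)    # well below the mean: dominated by bad outcomes
print(mean_reward, cvar_reward)
```

A policy chosen by maximizing CVaR will trade average performance for protection against the worst outcomes, which is exactly the kind of shift in game solutions this objective will study.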


References

  • [1] Goodfellow, I., et al. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems.
  • [2] Schäfer, F., & Anandkumar, A. (2019). Competitive gradient descent. In Advances in Neural Information Processing Systems.
  • [3] Wen, Y., et al. (2019). Probabilistic recursive reasoning for multi-agent reinforcement learning. arXiv preprint arXiv:1901.09207.
  • [4] Moon, T. K. (1996). The expectation-maximization algorithm. IEEE Signal Processing Magazine, 13(6), 47-60.
  • [5] Greff, K., van Steenkiste, S., & Schmidhuber, J. (2017). Neural expectation maximization. In Advances in Neural Information Processing Systems.
  • [6] Rajeswaran, A., Mordatch, I., & Kumar, V. (2020). A game theoretic framework for model based reinforcement learning. arXiv preprint arXiv:2004.07804.
  • [7] Amos, B. (2019). Differentiable Optimization-Based Modeling for Machine Learning. Doctoral dissertation, Carnegie Mellon University.
  • [8] Agrawal, A., Amos, B., Barratt, S., Boyd, S., Diamond, S., & Kolter, J. Z. (2019). Differentiable convex optimization layers. In Advances in Neural Information Processing Systems.
  • [9] Ling, C. K., Fang, F., & Kolter, J. Z. (2018). What game are we playing? End-to-end learning in normal and extensive form games. arXiv preprint arXiv:1805.02777.
  • [10] Simon, H. A. (1972). Theories of bounded rationality. Decision and Organization, 1(1), 161-176.
  • [11] Acerbi, C., & Tasche, D. (2002). Expected shortfall: A natural coherent alternative to value at risk. Economic Notes, 31(2), 379-388.
  • [12] Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297-323.