References
[1] John Ruggles. Locomotive steam-engine for rail and other roads. US Patent 1. July 1836.
[2] Jeremy Rifkin. The end of work: The decline of the global labor force and the dawn of the post-market era. ERIC, 1995.
[3] William M Siebert. "Frequency discrimination in the auditory system: Place or periodicity mechanisms?" In: Proceedings of the IEEE 58.5 (1970), pp. 723–730.
[4] Donald Waterman. "A guide to expert systems". In: (1986).
[5] Marti A. Hearst et al. "Support vector machines". In: IEEE Intelligent Systems and their Applications 13.4 (1998), pp. 18–28.
[6] Carl Edward Rasmussen. "Gaussian processes in machine learning". In: Advanced Lectures on Machine Learning. Springer, 2004, pp. 63–71.
[7] Yann LeCun et al. "Gradient-based learning applied to document recognition". In: Proceedings of the IEEE 86.11 (1998), pp. 2278–2324.
[8] Geoffrey Hinton et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups". In: IEEE Signal Processing Magazine 29.6 (2012), pp. 82–97.
[9] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. "ImageNet classification with deep convolutional neural networks". In: Advances in Neural Information Processing Systems. 2012, pp. 1097–1105.
[10] Brendan Shillingford et al. "Large-scale visual speech recognition". In: arXiv preprint arXiv:1807.05162 (2018).
[11] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. "Sequence to sequence learning with neural networks". In: Advances in Neural Information Processing Systems. 2014, pp. 3104–3112.
[12] Richard S Sutton. "Learning to predict by the methods of temporal differences". In: Machine Learning 3.1 (1988), pp. 9–44.
[13] Volodymyr Mnih et al. "Human-level control through deep reinforcement learning". In: Nature 518.7540 (2015), pp. 529–533.
[14] David Silver et al. "Mastering the game of Go with deep neural networks and tree search". In: Nature 529.7587 (2016), pp. 484–489.
[15] Robin IM Dunbar. "Neocortex size as a constraint on group size in primates". In: Journal of Human Evolution 22.6 (1992), pp. 469–493.
[16] Robert M Axelrod. The evolution of cooperation: revised edition. Basic books, 2006.
[17] Erik Zawadzki, Asher Lipson, and Kevin Leyton-Brown. "Empirically evaluating multiagent learning algorithms". In: arXiv preprint arXiv:1401.8074 (2014).
[18] Kagan Tumer and Adrian Agogino. "Distributed agent-based air traffic flow management". In: Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems. ACM. 2007, p. 255.
[19] Lloyd S Shapley. "Stochastic games". In: Proceedings of the National Academy of Sciences 39.10 (1953), pp. 1095–1100.
[20] Frans A. Oliehoek, Matthijs T. J. Spaan, and Nikos Vlassis. "Optimal and Approximate Q-value Functions for Decentralized POMDPs". In: Journal of Artificial Intelligence Research 32 (2008), pp. 289–353.
[21] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. Vol. 1. 1. MIT Press, Cambridge, 1998.
[22] Yoav Shoham, Rob Powers, Trond Grenager, et al. "If multi-agent learning is the answer, what is the question?" In: Artificial Intelligence 171.7 (2007), pp. 365–377.
[23] John F Nash et al. "Equilibrium points in n-person games". In: Proceedings of the National Academy of Sciences 36.1 (1950), pp. 48–49.
[24] Ian Goodfellow et al. Deep Learning. Vol. 1. MIT Press, Cambridge, 2016.
[25] Ming Tan. "Multi-agent reinforcement learning: Independent vs. cooperative agents". In: Proceedings of the Tenth International Conference on Machine Learning. 1993, pp. 330–337.
[26] Ardi Tampuu et al. "Multiagent cooperation and competition with deep reinforcement learning". In: arXiv preprint arXiv:1511.08779 (2015).
[27] Y. Shoham and K. Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. New York: Cambridge University Press, 2009.
[28] Matthew Hausknecht and Peter Stone. "Deep recurrent Q-learning for partially observable MDPs". In: arXiv preprint arXiv:1507.06527 (2015).
[29] Richard S Sutton et al. "Policy gradient methods for reinforcement learning with function approximation". In: NIPS. Vol. 99. 1999, pp. 1057–1063.
[30] Ronald J Williams. "Simple statistical gradient-following algorithms for connectionist reinforcement learning". In: Machine Learning 8.3-4 (1992), pp. 229–256.
[31] John Schulman et al. "Gradient Estimation Using Stochastic Computation Graphs". In: Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada. 2015, pp. 3528–3536.
[32] Hajime Kimura, Shigenobu Kobayashi, et al. "An analysis of actor-critic algorithms using eligibility traces: reinforcement learning with imperfect value functions". In: Journal of Japanese Society for Artificial Intelligence 15.2 (2000), pp. 267–275.
[34] Ziyu Wang et al. "Sample Efficient Actor-Critic with Experience Replay". In: arXiv preprint arXiv:1611.01224 (2016).
[35] Roland Hafner and Martin Riedmiller. "Reinforcement learning in feedback control". In: Machine Learning 84.1 (2011), pp. 137–169.
[36] Lex Weaver and Nigel Tao. "The optimal reward baseline for gradient-based reinforcement learning". In: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc. 2001, pp. 538–545.
[37] Vijay R Konda and John N Tsitsiklis. "Actor-Critic Algorithms". In: NIPS. Vol. 13. 2000, pp. 1008–1014.
[38] Kyunghyun Cho et al. "On the properties of neural machine translation: Encoder-decoder approaches". In: arXiv preprint arXiv:1409.1259 (2014).
[39] Yu-Han Chang, Tracey Ho, and Leslie Pack Kaelbling. "All learning is Local: Multi-agent Learning in Global Reward Games". In: NIPS. 2003, pp. 807–814.
[40] Nicolas Usunier et al. "Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks". In: arXiv preprint arXiv:1609.02993 (2016).
[41] Peng Peng et al. "Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games". In: arXiv preprint arXiv:1703.10069 (2017).
[42] Lucian Busoniu, Robert Babuska, and Bart De Schutter. "A comprehensive survey of multiagent reinforcement learning". In: IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 38.2 (2008), p. 156.
[43] Erfu Yang and Dongbing Gu. Multiagent reinforcement learning for multi-robot systems: A survey. Tech. rep. 2004.
[44] Joel Z Leibo et al. "Multi-agent Reinforcement Learning in Sequential Social Dilemmas". In: arXiv preprint arXiv:1702.03037 (2017).
[45] Abhishek Das et al. "Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning". In: arXiv preprint arXiv:1703.06585 (2017).
[46] Igor Mordatch and Pieter Abbeel. "Emergence of Grounded Compositional Language in Multi-Agent Populations". In: arXiv preprint arXiv:1703.04908 (2017).
[47] Angeliki Lazaridou, Alexander Peysakhovich, and Marco Baroni. "Multi-agent cooperation and the emergence of (natural) language". In: arXiv preprint arXiv:1612.07182 (2016).
[48] Sainbayar Sukhbaatar, Rob Fergus, et al. "Learning multiagent communication with backpropagation". In: Advances in Neural Information Processing Systems. 2016, pp. 2244–2252.
[49] Jayesh K Gupta, Maxim Egorov, and Mykel Kochenderfer. "Cooperative Multi-Agent Control Using Deep Reinforcement Learning". In: (2017).
[50] Shayegan Omidshafiei et al. "Deep Decentralized Multi-task Multi-Agent RL under Partial Observability". In: arXiv preprint arXiv:1703.06182 (2017).
[51] Tabish Rashid et al. "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning". In: Proceedings of the 35th International Conference on Machine Learning. 2018.
[52] Peter Sunehag et al. "Value-Decomposition Networks For Cooperative Multi-Agent Learning". In: arXiv preprint arXiv:1706.05296 (2017).
[53] Ryan Lowe et al. "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments". In: arXiv preprint arXiv:1706.02275 (2017).
[54] Danny Weyns, Alexander Helleboogh, and Tom Holvoet. "The packet-world: A test bed for investigating situated multi-agent systems". In: Software Agent-Based Applications, Platforms and Development Kits. Springer, 2005, pp. 383–408.
[55] David H Wolpert and Kagan Tumer. "Optimal payoff functions for members of collectives". In: Modeling Complexity in Economic and Social Systems. World Scientific, 2002, pp. 355–369.
[56] Scott Proper and Kagan Tumer. "Modeling difference rewards for multiagent learning". In: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 3. International Foundation for Autonomous Agents and Multiagent Systems. 2012, pp. 1397–1398.
[57] Mitchell K Colby, William Curran, and Kagan Tumer. "Approximating difference evaluations with local information". In: Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems. 2015, pp. 1659–1660.
[58] Gabriel Synnaeve et al. "TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games". In: arXiv preprint arXiv:1611.00625 (2016).
[59] R. Collobert, K. Kavukcuoglu, and C. Farabet. "Torch7: A Matlab-like Environment for Machine Learning". In: BigLearn, NIPS Workshop. 2011.
[60] Landon Kraemer and Bikramjit Banerjee. "Multi-agent reinforcement learning as a rehearsal for decentralized planning". In: Neurocomputing 190 (2016), pp. 82–94.
[61] Emilio Jorge, Mikael Kågebäck, and Emil Gustavsson. "Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence". In: arXiv preprint arXiv:1611.03218 (2016).
[62] Martin J Osborne and Ariel Rubinstein. A course in game theory. MIT press, 1994.
[63] Katie Genter, Tim Laue, and Peter Stone. "Three years of the RoboCup standard platform league drop-in player competition". In: Autonomous Agents and Multi-Agent Systems 31.4 (2017), pp. 790–820.
[64] Carlos Guestrin, Daphne Koller, and Ronald Parr. "Multiagent planning with factored MDPs". In: Advances in Neural Information Processing Systems. 2002, pp. 1523–1530.
[65] Jelle R Kok and Nikos Vlassis. "Sparse cooperative Q-learning". In: Proceedings of the Twenty-First International Conference on Machine Learning. ACM. 2004, p. 61.
[66] Katie Genter, Noa Agmon, and Peter Stone. "Ad hoc teamwork for leading a flock". In: Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems. International Foundation for Autonomous Agents and Multiagent Systems. 2013, pp. 531–538.
[67] Samuel Barrett, Peter Stone, and Sarit Kraus. "Empirical evaluation of ad hoc teamwork in the pursuit domain". In: The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 2. International Foundation for Autonomous Agents and Multiagent Systems. 2011, pp. 567–574.
[68] Stefano V Albrecht and Peter Stone. "Reasoning about hypothetical agent behaviours and their parameters". In: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems. 2017, pp. 547–555.
[69] Alessandro Panella and Piotr Gmytrasiewicz. "Interactive POMDPs with finite-state models of other agents". In: Autonomous Agents and Multi-Agent Systems 31.4 (2017), pp. 861–904.
[70] Takaki Makino and Kazuyuki Aihara. "Multi-agent reinforcement learning algorithm to handle beliefs of other agents' policies and embedded beliefs". In: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems. ACM. 2006, pp. 789–791.
[71] Kyle A Thomas et al. "The psychology of coordination and common knowledge". In: Journal of Personality and Social Psychology 107.4 (2014), p. 657.
[72] Ariel Rubinstein. "The Electronic Mail Game: Strategic Behavior Under 'Almost Common Knowledge'". In: The American Economic Review (1989), pp. 385–391.
[73] Gizem Korkmaz et al. "Collective action through common knowledge using a Facebook model". In: Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems. International Foundation for Autonomous Agents and Multiagent Systems. 2014, pp. 253–260.
[74] Ronen I. Brafman and Moshe Tennenholtz. "Learning to Coordinate Efficiently: A Model-based Approach". In: Journal of Artificial Intelligence Research. Vol. 19. 2003, pp. 11–23.
[75] Robert J Aumann et al. "Subjectivity and correlation in randomized strategies". In: Journal of Mathematical Economics 1.1 (1974), pp. 67–96.
[76] Ludek Cigler and Boi Faltings. "Decentralized anti-coordination through multi-agent learning". In: Journal of Artificial Intelligence Research 47 (2013), pp. 441–473.
[77] Craig Boutilier. "Sequential optimality and coordination in multiagent systems". In: IJCAI. Vol. 99. 1999, pp. 478–485.
[78] Christopher Amato, George D Konidaris, and Leslie P Kaelbling. "Planning with macro-actions in decentralized POMDPs". In: Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems. International Foundation for Autonomous Agents and Multiagent Systems. 2014, pp. 1273–1280.
[79] Miao Liu et al. "Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions". In: arXiv preprint arXiv:1707.07399 (2017).
[80] Rajbala Makar, Sridhar Mahadevan, and Mohammad Ghavamzadeh. "Hierarchical multi-agent reinforcement learning". In: Proceedings of the Fifth International Conference on Autonomous Agents. ACM. 2001, pp. 246–253.
[81] Thomas G. Dietterich. "Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition". In: Journal of Artificial Intelligence Research 13.1 (Nov. 2000), pp. 227–303.
[82] Saurabh Kumar et al. "Federated Control with Hierarchical Multi-Agent Deep Reinforcement Learning". In: arXiv preprint arXiv:1712.08266 (2017).
[83] Laetitia Matignon, Guillaume J Laurent, and Nadine Le Fort-Piat. "Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems". In: The Knowledge Engineering Review 27.01 (2012), pp. 1–31.
[84] Kamil Ciosek and Shimon Whiteson. "OFFER: Off-Environment Reinforcement Learning". In: (2017).
[85] Gerald Tesauro. "Extending Q-Learning to General Adaptive Multi-Agent Systems". In: NIPS. Vol. 4. 2003.
[86] Tom Schaul et al. "Prioritized Experience Replay". In: CoRR abs/1511.05952 (2015).
[87] Vincent Conitzer and Tuomas Sandholm. "AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents". In: Machine Learning 67.1-2 (2007), pp. 23–43.
[88] Bruno C Da Silva et al. "Dealing with non-stationary environments using context detection". In: Proceedings of the 23rd International Conference on Machine Learning. ACM. 2006, pp. 217–224.
[89] Jelle R Kok and Nikos Vlassis. "Collaborative multiagent reinforcement learning by payoff propagation". In: Journal of Machine Learning Research 7.Sep (2006), pp. 1789–1828.
[90] Martin Lauer and Martin Riedmiller. "An algorithm for distributed reinforcement learning in cooperative multi-agent systems". In: Proceedings of the Seventeenth International Conference on Machine Learning. Citeseer. 2000.
[91] Maja J Mataric. "Using communication to reduce locality in distributed multiagent learning". In: Journal of Experimental & Theoretical Artificial Intelligence 10.3 (1998), pp. 357–369.
[92] C. P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer, New York, 2004.
[93] F. S. Melo, M. Spaan, and S. J. Witwicki. "QueryPOMDP: POMDP-based communication in multiagent systems". In: Multi-Agent Systems. 2011, pp. 189–204.
[94] L. Panait and S. Luke. "Cooperative multi-agent learning: The state of the art". In: Autonomous Agents and Multi-Agent Systems 11.3 (2005), pp. 387–434.
[95] C. Zhang and V. Lesser. "Coordinating multi-agent reinforcement learning with limited communication". In: vol. 2. 2013, pp. 1101–1108.
[96] T. Kasai, H. Tenmoto, and A. Kamiya. "Learning of communication codes in multi-agent reinforcement learning problem". In: IEEE Soft Computing in Industrial Applications. 2008, pp. 1–6.
[97] C. L. Giles and K. C. Jim. "Learning communication for multi-agent systems". In: Innovative Concepts for Agent-Based Systems. Springer, 2002, pp. 377–390.
[98] Karol Gregor et al. "DRAW: A recurrent neural network for image generation". In: arXiv preprint arXiv:1502.04623 (2015).
[99] Matthieu Courbariaux and Yoshua Bengio. "BinaryNet: Training deep neural networks with weights and activations constrained to +1 or -1". In: arXiv preprint arXiv:1602.02830 (2016).
[100] Geoffrey Hinton and Ruslan Salakhutdinov. "Discovering binary codes for documents by learning deep generative models". In: Topics in Cognitive Science 3.1 (2011), pp. 74–91.
[101] Karthik Narasimhan, Tejas Kulkarni, and Regina Barzilay. "Language understanding for text-based games using deep reinforcement learning". In: arXiv preprint arXiv:1506.08941 (2015).
[102] Sepp Hochreiter and Jürgen Schmidhuber. "Long short-term memory". In: Neural Computation 9.8 (1997), pp. 1735–1780.
[103] Junyoung Chung et al. "Empirical evaluation of gated recurrent neural networks on sequence modeling". In: arXiv preprint arXiv:1412.3555 (2014).
[104] Rafal Jozefowicz, Wojciech Zaremba, and Ilya Sutskever. "An empirical exploration of recurrent network architectures". In: Proceedings of the 32nd International Conference on Machine Learning (ICML-15). 2015, pp. 2342–2350.
[105] Sergey Ioffe and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift". In: arXiv preprint arXiv:1502.03167 (2015).
[106] W. Wu. 100 prisoners and a lightbulb. Tech. rep. OCF, UC Berkeley, 2002.
[107] Michael Studdert-Kennedy. "How Did Language Go Discrete?" In: Language Origins: Perspectives on Evolution. Ed. by Maggie Tallerman. Oxford University Press, 2005. Chap. 3.
[109] Michael C. Frank and Noah D. Goodman. "Predicting pragmatic reasoning in language games". In: Science 336.6084 (2012), p. 998.
[113] Piotr J. Gmytrasiewicz and Prashant Doshi. "A framework for sequential planning in multi-agent settings". In: Journal of Artificial Intelligence Research 24 (2005), pp. 49–79. arXiv: 1109.2135.
[118] Jakob N Foerster et al. "Learning to communicate to solve riddles with deep distributed recurrent Q-networks". In: arXiv preprint arXiv:1602.02672 (2016).
[119] Jean-François Baffier et al. "Hanabi is NP-complete, even for cheaters who look at their cards". In: (2016).
[121] Bruno Bouzy. "Playing Hanabi Near-Optimally". In: Advances in Computer Games. Springer. 2017, pp. 51–62.
[122] Joseph Walton-Rivers et al. "Evaluating and modelling Hanabi-playing agents". In: 2017 IEEE Congress on Evolutionary Computation (CEC). IEEE. 2017, pp. 1382–1389.
[123] Hirotaka Osawa. "Solving Hanabi: Estimating Hands by Opponent's Actions in Cooperative Game with Incomplete Information". In: AAAI Workshop: Computer Poker and Imperfect Information. 2015, pp. 37–43.
[124] Markus Eger, Chris Martens, and Marcela Alfaro Cordoba. "An intentional AI for Hanabi". In: 2017 IEEE Conference on Computational Intelligence and Games (CIG 2017). 2017, pp. 68–75.
[125] Matěj Moravčík et al. "DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker". In: arXiv preprint arXiv:1701.01724 (2017).
[126] Noam Brown and Tuomas Sandholm. "Superhuman AI for heads-up no-limit poker: Libratus beats top professionals". In: Science 359.6374 (2018), pp. 418–424.
[127] Pablo Hernandez-Leal et al. "A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity". In: arXiv preprint arXiv:1707.09183 (2017).
[128] Tuomas W Sandholm and Robert H Crites. "Multiagent reinforcement learning in the iterated prisoner's dilemma". In: Biosystems 37.1-2 (1996), pp. 147–166.
[129] Michael Bowling and Manuela Veloso. "Multiagent learning using a variable learning rate". In: Artificial Intelligence 136.2 (2002), pp. 215–250.
[130] William Uther and Manuela Veloso. Adversarial reinforcement learning. Tech. rep. Carnegie Mellon University, 1997. Unpublished.
[131] C. Claus and C. Boutilier. "The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems". In: Proceedings of the Fifteenth National Conference on Artificial Intelligence. June 1998, pp. 746–752.
[132] Michael Wunder, Michael L Littman, and Monica Babes. "Classes of multiagent Q-learning dynamics with epsilon-greedy exploration". In: Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010, pp. 1167–1174.
[133] Martin Zinkevich, Amy Greenwald, and Michael L Littman. "Cyclic equilibria in Markov games". In: Advances in Neural Information Processing Systems. 2006, pp. 1641–1648.
[134] Michael L Littman. "Friend-or-foe Q-learning in general-sum games". In: ICML. Vol. 1. 2001, pp. 322–328.
[135] Doran Chakraborty and Peter Stone. "Multiagent learning in the presence of memory-bounded agents". In: Autonomous Agents and Multi-Agent Systems 28.2 (2014), pp. 182–213.
[136] Ronen I. Brafman and Moshe Tennenholtz. "Efficient Learning Equilibrium". In: Advances in Neural Information Processing Systems. Vol. 9. 2003, pp. 1635–1643.
[137] Marc Lanctot et al. "A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning". In: Advances in Neural Information Processing Systems (NIPS). 2017.
[138] Johannes Heinrich and David Silver. "Deep reinforcement learning from self-play in imperfect-information games". In: arXiv preprint arXiv:1603.01121 (2016).
[139] Adam Lerer and Alexander Peysakhovich. "Maintaining cooperation in complex social dilemmas using deep reinforcement learning". In: arXiv preprint arXiv:1707.01068 (2017).
[141] Jacob W Crandall and Michael A Goodrich. "Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning". In: Machine Learning 82.3 (2011), pp. 281–314.
[142] George W Brown. "Iterative solution of games by fictitious play". In: (1951).
[143] Richard Mealing and Jonathan Shapiro. "Opponent Modelling by Expectation-Maximisation and Sequence Prediction in Simplified Poker". In: IEEE Transactions on Computational Intelligence and AI in Games (2015).
[144] Neil C Rabinowitz et al. "Machine Theory of Mind". In: arXiv preprint arXiv:1802.07740 (2018).
[145] Richard Mealing and Jonathan L Shapiro. "Opponent Modelling by Sequence Prediction and Lookahead in Two-Player Games". In: ICAISC (2). 2013, pp. 385–396.
[146] Pablo Hernandez-Leal and Michael Kaisers. "Learning against sequential opponents in repeated stochastic games". In: (2017).
[147] Chongjie Zhang and Victor R Lesser. "Multi-Agent Learning with Policy Prediction". In: AAAI. 2010.
[148] Luke Metz et al. "Unrolled generative adversarial networks". In: arXiv preprint arXiv:1611.02163 (2016).
[149] Max Kleiman-Weiner et al. "Coordinate to cooperate or compete: abstract goals and joint intentions in social interaction". In: COGSCI. 2016.
[150] Stéphane Ross, Geoffrey J Gordon, and J Andrew Bagnell. "No-regret reductions for imitation learning and structured prediction". In: AISTATS. Citeseer. 2011.
[151] Mariusz Bojarski et al. "End to end learning for self-driving cars". In: arXiv preprint arXiv:1604.07316 (2016).
[152] R Duncan Luce and Howard Raiffa. "Games and Decisions: Introduction and Critical Survey". In: (1957).
[153] King Lee and K Louis. "The Application of Decision Theory and Dynamic Programming to Adaptive Control Systems". PhD thesis. 1967.
[154] Drew Fudenberg and Jean Tirole. Game Theory. Cambridge, MA: MIT Press, 1991, p. 12.
[155] Roger B Myerson. Game Theory: Analysis of Conflict. Harvard University Press, 1991.
[156] Robert Gibbons. Game theory for applied economists. Princeton University Press, 1992.
[157] William H Press and Freeman J Dyson. "Iterated Prisoner's Dilemma contains strategies that dominate any evolutionary opponent". In: Proceedings of the National Academy of Sciences 109.26 (2012), pp. 10409–10413.
[158] John E Dennis Jr and Jorge J Moré. "Quasi-Newton methods, motivation and theory". In: SIAM Review 19.1 (1977), pp. 46–89.
[159] Chelsea Finn, Pieter Abbeel, and Sergey Levine. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks". In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. 2017, pp. 1126–1135.
[160] Maruan Al-Shedivat et al. "Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments". In: CoRR abs/1710.03641 (2017). arXiv: 1710.03641.
[161] Zhenguo Li et al. "Meta-SGD: Learning to Learn Quickly for Few Shot Learning". In: CoRR abs/1707.09835 (2017). arXiv: 1707.09835.
[162] Martin Abadi et al. "TensorFlow: A System for Large-Scale Machine Learning". In: 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2-4, 2016. 2016, pp. 265–283.
[163] Adam Paszke et al. "Automatic differentiation in PyTorch". In: (2017).
[164] John Schulman, Pieter Abbeel, and Xi Chen. "Equivalence Between Policy Gradients and Soft Q-Learning". In: CoRR abs/1704.06440 (2017). arXiv: 1704.06440.
[165] John Schulman et al. "Trust region policy optimization". In: International Conference on Machine Learning. 2015, pp. 1889–1897.
[166] Barak A Pearlmutter. "Fast exact multiplication by the Hessian". In: Neural Computation 6.1 (1994), pp. 147–160.
[168] Michael C Fu. "Gradient estimation". In: Handbooks in Operations Research and Management Science 13 (2006), pp. 575–616.
[169] Ivo Grondman et al. "A survey of actor-critic reinforcement learning: Standard and natural policy gradients". In: IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42.6 (2012), pp. 1291–1307.
[170] Peter W Glynn. "Likelihood ratio gradient estimation for stochastic systems". In: Communications of the ACM 33.10 (1990), pp. 75–84.
[171] David Wingate and Theophane Weber. "Automated Variational Inference in Probabilistic Programming". In: CoRR abs/1301.1299 (2013). arXiv: 1301.1299.
[172] Rajesh Ranganath, Sean Gerrish, and David M. Blei. "Black Box Variational Inference". In: Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, AISTATS 2014, Reykjavik, Iceland, April 22-25, 2014. 2014, pp. 814–822.
[173] Diederik P. Kingma and Max Welling. "Auto-Encoding Variational Bayes". In: CoRR abs/1312.6114 (2013). arXiv: 1312.6114.
[174] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. "Stochastic Backpropagation and Approximate Inference in Deep Generative Models". In: (2014), pp. 1278–1286.
[175] Atılım Güneş Baydin, Barak A. Pearlmutter, and Alexey Andreyevich Radul. "Automatic differentiation in machine learning: a survey". In: CoRR abs/1502.05767 (2015). arXiv: 1502.05767.