A number of experimental studies have investigated whether cooperative behavior may emerge in multi-agent Q-learning. In some of these studies cooperative behavior did emerge; in others it did not. This report provides a theoretical analysis of this issue, focusing on multi-agent Q-learning in iterated prisoner's dilemmas. It is shown that, under certain assumptions, cooperative behavior may emerge when multi-agent Q-learning is applied in an iterated prisoner's dilemma. An important consequence of the analysis is that multi-agent Q-learning may result in non-Nash behavior. Experiments show that the theoretical results derived in this report are quite robust to violations of the underlying assumptions.
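The setting described in the abstract can be illustrated with a minimal simulation sketch. It is not the report's model. It assumes two independent, stateless Q-learners that choose between cooperating and defecting with Boltzmann (softmax) exploration. It also assumes textbook payoffs (T=5, R=3, P=1, S=0) and illustrative values for the learning rate `alpha`, discount factor `gamma` and `temperature`. The names `PAYOFF`, `boltzmann_choice` and `simulate` are hypothetical and exist only for this example.

```python
import math
import random

# Hypothetical textbook payoffs (T=5, R=3, P=1, S=0); assumed, not taken from the report.
# Actions: 0 = cooperate, 1 = defect. PAYOFF[(own, other)] is the reward of the acting agent.
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}


def boltzmann_choice(q, temperature, rng):
    """Pick an action with probability proportional to exp(Q / temperature)."""
    m = max(q)  # subtract the maximum for numerical stability
    weights = [math.exp((v - m) / temperature) for v in q]
    r = rng.random() * sum(weights)
    cumulative = 0.0
    for action, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return action
    return len(q) - 1


def simulate(rounds=10000, alpha=0.1, gamma=0.9, temperature=0.5, seed=0):
    """Two independent stateless Q-learners repeatedly playing the prisoner's dilemma.

    Returns both agents' final Q-values and the fraction of rounds in which
    both agents cooperated. All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    mutual_coop = 0
    for _ in range(rounds):
        a1 = boltzmann_choice(q1, temperature, rng)
        a2 = boltzmann_choice(q2, temperature, rng)
        r1, r2 = PAYOFF[(a1, a2)], PAYOFF[(a2, a1)]
        # Stateless Q-learning: there is a single state, so each agent
        # bootstraps from the maximum of its own Q-values.
        q1[a1] += alpha * (r1 + gamma * max(q1) - q1[a1])
        q2[a2] += alpha * (r2 + gamma * max(q2) - q2[a2])
        mutual_coop += (a1 == 0 and a2 == 0)
    return q1, q2, mutual_coop / rounds
```

Calling `simulate()` returns the learned Q-values and the share of mutually cooperative rounds. Varying `gamma` and `temperature` shows how sensitive the outcome is to these assumed settings. Each agent treats the other as part of its environment. That independent-learner design is one common way to apply single-agent Q-learning in a multi-agent game.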

Additional Metadata
Keywords: Cooperation, Multi-Agent Q-Learning, Multi-Agent Reinforcement Learning, Nash Equilibrium, Prisoner's Dilemma
JEL: Model Construction and Estimation (jel C51), Information and Product Quality; Standardization and Compatibility (jel L15), Business Administration and Business Economics; Marketing; Accounting (jel M), Management of Technological Innovation and R&D (jel O32)
Publisher: Erasmus Research Institute of Management
Persistent URL: hdl.handle.net/1765/7323
Series: ERIM Report Series Research in Management
Journal: ERIM report series research in management Erasmus Research Institute of Management
Citation
Waltman, L., & Kaymak, U. (2006). A Theoretical Analysis of Cooperative Behavior in Multi-Agent Q-learning (No. ERS-2006-006-LIS). ERIM Report Series Research in Management. Erasmus Research Institute of Management. Retrieved from http://hdl.handle.net/1765/7323