import numpy as np
import matplotlib.pyplot as plt
Ex | Learning Dynamics
Task 2 | Critical transition
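We consider the following model: two agents can each either cooperate or defect. A cooperator contributes a benefit \(b\), which all agents receive (including the cooperator). However, a cooperator must pay a cost \(c\) for the contribution. A defector does not contribute and does not pay a cost. Thus, the payoff matrix is as follows, where each cell lists the payoff of the row agent first and the payoff of the column agent second: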
| | Cooperate | Defect |
|---|---|---|
| Cooperate | \(2b-c\), \(2b-c\) | \(b-c\), \(b\) |
| Defect | \(b\), \(b-c\) | \(0\), \(0\) |
Let us renormalize the payoffs: divide all payoffs by \(b\) and express them in terms of the cost-to-benefit ratio \(r = c/b\).
| | Cooperate | Defect |
|---|---|---|
| Cooperate | \(2-r\), \(2-r\) | \(1-r\), \(1\) |
| Defect | \(1\), \(1-r\) | \(0\), \(0\) |
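As a quick sanity check on the normalized payoffs, the sketch below compares the payoff of cooperating against the payoff of defecting for each possible action of the other agent. The helper names `payoff_matrix` and `cooperation_advantage` are illustrative and not part of the exercise. The difference is \(1-r\) in both columns, so cooperation dominates for \(r < 1\), defection dominates for \(r > 1\), and the agents are indifferent at \(r = 1\), which is where the transition is expected.

```python
import numpy as np

def payoff_matrix(r):
    """Normalized payoffs of one agent (hypothetical helper).

    Rows are the agent's own action, columns the other agent's action,
    both ordered as (Cooperate, Defect).
    """
    return np.array([[2.0 - r, 1.0 - r],
                     [1.0,     0.0]])

def cooperation_advantage(r):
    """Payoff of cooperating minus payoff of defecting, per opponent action."""
    R = payoff_matrix(r)
    return R[0] - R[1]

for r in (0.5, 1.0, 1.5):
    print(f"r = {r}: advantage of cooperating = {cooperation_advantage(r)}")
```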
Simulate the reinforcement learning dynamics in the game from 25 random initial joint policies for values of \(r\) in the range \([0.5, 1.5]\). Record the final joint policy for each initial policy and plot the critical transition from defection to cooperation as a function of \(r\). Also, visualize how long, on average, it takes for the agents to reach the final joint policy. Show a critical slowing down.
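One possible way to approach the task is sketched below. The exercise does not specify which learning dynamics to use, so this sketch substitutes a simple Euler-discretized, replicator-style update for the reinforcement learning dynamics; that substitution is an assumption, as are the names `simulate` and `plot_results` and the parameters `step`, `eps`, and `max_steps`. A run counts as converged once both agents' cooperation probabilities are within `eps` of 0 or 1, and runs that never converge are assigned `max_steps`.

```python
import numpy as np

def simulate(r, x0, step=0.1, eps=1e-3, max_steps=5000):
    """Run replicator-style updates (a stand-in for the RL dynamics).

    x0 has shape (n_runs, 2); x[k, i] is the probability that agent i
    cooperates in run k. Returns final policies and steps to convergence.
    """
    R = np.array([[2.0 - r, 1.0 - r], [1.0, 0.0]])  # own payoffs, rows = own action
    x = np.array(x0, dtype=float)
    steps = np.full(len(x), max_steps)  # stays at max_steps if a run never converges
    for t in range(max_steps):
        done = np.all(np.minimum(x, 1.0 - x) < eps, axis=1)
        steps = np.where(done & (steps == max_steps), t, steps)
        if done.all():
            break
        opp = x[:, ::-1]  # cooperation probability of the other agent
        f_c = opp * R[0, 0] + (1.0 - opp) * R[0, 1]  # expected payoff of cooperating
        f_d = opp * R[1, 0] + (1.0 - opp) * R[1, 1]  # expected payoff of defecting
        x = np.clip(x + step * x * (1.0 - x) * (f_c - f_d), 0.0, 1.0)
    return x, steps

rng = np.random.default_rng(0)
r_values = np.linspace(0.5, 1.5, 21)
inits = rng.uniform(0.05, 0.95, size=(25, 2))  # 25 random initial joint policies

final_coop, mean_steps = [], []
for r in r_values:
    x_final, steps = simulate(r, inits)
    final_coop.append(x_final.mean())  # average final cooperation probability
    mean_steps.append(steps.mean())    # average time to reach the final policy

def plot_results(r_values, final_coop, mean_steps):
    """Plot final cooperation and mean convergence time against r."""
    import matplotlib.pyplot as plt
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))
    ax1.plot(r_values, final_coop, "o-")
    ax1.set_xlabel("cost-to-benefit ratio r")
    ax1.set_ylabel("final cooperation probability")
    ax2.plot(r_values, mean_steps, "o-")
    ax2.set_xlabel("cost-to-benefit ratio r")
    ax2.set_ylabel("mean steps to convergence")
    ax2.set_yscale("log")
    fig.tight_layout()
    return fig
```

Calling `plot_results(r_values, final_coop, mean_steps)` draws both panels. Under these assumed dynamics the update is proportional to \(1-r\), so the convergence time grows as \(r\) approaches 1 and is capped at `max_steps` at \(r = 1\) itself, where the policies do not move at all.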
# ...