import sympy as sp
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.style as style; style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (7.8, 2.5); plt.rcParams['figure.dpi'] = 300
plt.rcParams['axes.facecolor'] = 'white'; plt.rcParams['grid.color'] = 'gray'; plt.rcParams['grid.linewidth'] = 0.25
color = plt.rcParams['axes.prop_cycle'].by_key()['color'][0]  # get the first color of the default color cycle
8 Dynamic Interactions
Wolfram Barfuss | University of Bonn | 2024/2025
▶ Complex Systems Modeling of Human-Environment Interactions
8.1 Motivation | Futures and environments
The prototypical models of strategic interactions considered so far did not explicitly account for their future environmental consequences (Figure 8.1).

However, many real-world scenarios involve strategic interactions with environmental consequences. For example, the tragedy of the commons, agreements, and threshold public goods can be extended to include ecological consequences. In this lecture, we will introduce dynamic games, particularly stochastic or Markov games, to model strategic interactions with environmental consequences.
Stochastic games integrate Markov chains, Markov decision processes, and game theory to model strategic interactions in dynamic environments. They are particularly useful for modeling human-environment interactions, where the environment is affected by human actions and, in turn, affects human behavior.
Advantages of dynamic games
Using dynamic games, particularly stochastic games, to model strategic interactions with environmental consequences has several advantages:
- inherently stochastic - to account for uncertainty
- nonlinear - to account for structural changes
- agency - to account for human behavior
- interactions - to account for strategic interactions
- future-looking - to account for the trade-off between short-term and long-term
- feedback - between multiple agents and the environment
Stochastic games also have structural benefits that make them well suited for numerical computer modeling: their action and state sets are discrete, and they advance in discrete time steps.
Learning goals
After this lecture, students will be able to:
- Describe the elements of a stochastic game
- Apply stochastic games to model human-environment interactions
- Analyze the strategic interactions in stochastic games
8.2 Dynamic games | Strategic interactions with environmental consequences

Here, we focus on environments with a discrete state set in discrete time. These specifications are commonly called stochastic or Markov games. They consist of the following elements:
- A discrete set of environmental contexts or states \(\mathcal S = \{S_1, \dots, S_Z\}\).
- We denote an environmental state by \(s \in \mathcal S\).
- A finite set of \(N\) agents \(\mathcal I = \{1,\dots, N\}\) participating in an interaction.
- For each agent \(i\in\mathcal I\), a discrete set of options or actions \(\mathcal A^i = \{A^i_1, \dots, A^i_M\}\).
- Let’s denote the joint action set by \(\mathbf{\mathcal A} = \mathcal A^1 \times \dots \times \mathcal A^N\).
- An action profile \(\mathbf a = (a^1,\dots,a^N) \in \mathbf{\mathcal A}\) is a joint action of all agents.
Time \(t\) advances in discrete steps \(t=0,1,2,\dots\), and agents choose their actions simultaneously.
- We denote the state at time \(t\) by \(s_t\) and the joint action by \(\mathbf a_t\).
The transition tensor \(T: \mathcal S \times \mathbf{\mathcal A} \times \mathcal S \to [0,1]\) defines the environmental dynamics.
- \(T(s, \mathbf a, s')\) is the probability of transitioning from state \(s\in\mathcal S\) to \(s'\in\mathcal S\) given the joint action \(\mathbf a\).
- Thus, \(\sum_{s'} T(s, \mathbf a, s') = 1\) must hold for all \(s\in\mathcal S\) and \(\mathbf a\in\mathbf{\mathcal A}\).
The reward tensor \(\mathbf R: \mathcal I \times \mathcal S \times \mathbf{\mathcal A} \times \mathcal S \to \mathbb R\) defines the agents’ short-term welfare, utility, rewards, or payoffs.
- \(R^i(s, \mathbf a, s')\) is the reward agent \(i\) receives for transitioning from state \(s\) to \(s'\) given the joint action \(\mathbf a\).
- We assume that it is each agent \(i\)’s goal to maximize their expected discounted sum of future rewards, \(G^i = \sum_{t=0}^\infty (\gamma^i)^t R^i(s_t, \mathbf a_t, s_{t+1})\), where \(\gamma^i\in[0,1)\) is the discount factor.
We assume that agents can condition their probabilities of choosing actions on the current state \(s_t\), yielding so-called Markov policies or strategies, \(\mathbf x: \mathcal I \times \mathcal S \times \mathcal A^i \to [0, 1]\).
- \(x^i(s, a)\) is the probability agent \(i\) chooses action \(a\) in state \(s\).
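To make these definitions concrete, here is a minimal simulation sketch (not part of the lecture code) that samples a single trajectory of a two-agent stochastic game and accumulates each agent’s discounted return \(G^i\). It assumes numerical (non-symbolic) transition and reward tensors, a common discount factor, and truncates the infinite sum at a finite horizon; the function name and arguments are our own choices for illustration.
def sample_discounted_returns(T, R, X, gamma, s0, horizon=1000, rng=None):
    """Sample one trajectory of a 2-agent stochastic game (illustrative sketch).

    T: transitions of shape (Z, M, M, Z), R: rewards of shape (N, Z, M, M, Z),
    X: joint Markov policy of shape (N, Z, M), gamma: common discount factor,
    s0: start state index.
    """
    rng = np.random.default_rng() if rng is None else rng
    N, Z, M = X.shape
    s, G = s0, np.zeros(N)
    for t in range(horizon):
        # each agent i samples its action from its Markov policy x^i(s, .)
        a = tuple(rng.choice(M, p=X[i, s]) for i in range(N))
        # the environment transitions according to T(s, a, .)
        s_next = rng.choice(Z, p=T[(s,) + a])
        # each agent accumulates its discounted reward R^i(s, a, s')
        G += gamma**t * R[(slice(None), s) + a + (s_next,)]
        s = s_next
    return G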
8.3 Application | Ecological public good
We apply the stochastic game framework to integrate the fact that we are embedded in a shared environment and care about the future to some extent. We do so by considering a public good game with ecological consequences (Figure 8.3), which allows us to ask how the strategic incentives depend on the agents’ level of care for future rewards.

States, agents and actions
The environment consists of two states, \(\mathcal S = \{\mathsf{p}, \mathsf{g}\}\), representing a prosperous and a degraded state of the environment.
state_set = ['prosperous', 'degraded']; p=0; g=1; Z=2
We also define two Python variables, p=0 and g=1, to serve as readable and memorable indices for the environmental states.
There are \(2\) identical agents. In each state \(s \in \mathcal S\), each agent \(i \in \{1, 2\}\) can choose within their action set between either cooperation or defection, \(\mathcal A^i = \{\mathsf{c,d}\}\).
action_sets = ['cooperate', 'defect']; c=0; d=1; M=2
Likewise, we define two Python variables, c=0 and d=1, to serve as readable and memorable indices to represent the agents’ actions.
We denote the number of cooperating agents by \(N_c\). The number of defecting agents is \(N_d = N - N_c\).
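For illustration (a hypothetical helper, not part of the lecture code), these counts can be read off an action profile using the index convention c=0, d=1 defined above.
# Count cooperators in an action profile, using c=0 for cooperation.
def count_cooperators(action_profile):
    return sum(1 for a in action_profile if a == c)

N_c = count_cooperators((c, d))  # one cooperator, one defector -> N_c = 1
N_d = 2 - N_c                    # N_d = N - N_c = 1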
Transitions | Environmental dynamics
We represent the environmental dynamics, i.e., the transitions between environmental state contexts, in a \(Z \times M \times M \times Z\) tensor, where \(Z\) is the number of states and \(M\) is the number of actions. In this representation,
- the first dimension corresponds to the current state,
- the second to the action of the first agent,
- the third to the action of the second agent, and
- the fourth and last dimension corresponds to the next state.
TransitionTensor = np.zeros((Z, M, M, Z), dtype=object)
The environmental dynamics are then governed by two parameters: a collapse leverage, \(q_c\), and a recovery probability, \(p_r\).
qc, pr = sp.symbols('q_c p_r')
A collapse from the prosperous to the degraded state occurs with a transition probability
\[T(\mathsf p, \mathbf a, \mathsf g) = \frac{N_d}{N} q_c. \]
TransitionTensor[p, c, c, g] = 0     # no agent defects
TransitionTensor[p, d, c, g] = qc/2  # one agent defects
TransitionTensor[p, c, d, g] = qc/2  # other agent defects
TransitionTensor[p, d, d, g] = qc    # all agents defect
Thus, if all agents defect, the environment collapses with probability \(q_c\). The collapse leverage indicates how much impact a defecting agent exerts on the environment. The environment remains in the prosperous state with probability \(T(\mathsf p, \mathbf a, \mathsf p) = 1 - \frac{N_d}{N} q_c\).
TransitionTensor[p, :, :, p] = 1 - TransitionTensor[p, :, :, g]
In the degraded state, we set the recovery to occur with probability,
\[T(\mathsf g, \mathbf a, \mathsf p) = p_r,\]
independent of the agents’ actions.
TransitionTensor[g, :, :, p] = pr
The probability that the environment remains in the degraded state is then \(T(\mathsf g, \mathbf a, \mathsf g) = 1 - p_r\).
TransitionTensor[g, :, :, g] = 1 - pr
Together, our transition tensor is then given by
sp.Array(TransitionTensor)
\(\displaystyle \left[\begin{matrix}\left[\begin{matrix}1 & 0\\1 - \frac{q_{c}}{2} & \frac{q_{c}}{2}\end{matrix}\right] & \left[\begin{matrix}1 - \frac{q_{c}}{2} & \frac{q_{c}}{2}\\1 - q_{c} & q_{c}\end{matrix}\right]\\\left[\begin{matrix}p_{r} & 1 - p_{r}\\p_{r} & 1 - p_{r}\end{matrix}\right] & \left[\begin{matrix}p_{r} & 1 - p_{r}\\p_{r} & 1 - p_{r}\end{matrix}\right]\end{matrix}\right]\)
Last, we make sure that our transition tensor is normalized, i.e., the sum of all transition probabilities from a state-joint-action pair to all possible next states equals one, \(\sum_{s'} T(s, \mathbf a, s') = 1\).
TransitionTensor.sum(-1)
array([[[1, 1],
[1, 1]],
[[1, 1],
[1, 1]]], dtype=object)
Rewards | Short-term welfare
RewardTensor = np.zeros((2, Z, M, M, Z), dtype=object)
Rewards in the prosperous state follow the standard public goods game with a synergy factor \(r\) and a cost of cooperation \(c\).
r, cost = sp.symbols('r c')
The rewards for cooperating and defecting agents are then given by
\[ R^i(\mathsf p, a^i, a^{-i}, \mathsf p) = \begin{cases} \frac{r c (N_c +1)}{N} - c, & \text{if } a^i = \mathsf c, \\ \frac{r c N_c}{N}, & \text{if } a^i = \mathsf d, \end{cases} \]
where \(N_c\) here counts the cooperators among the co-players \(a^{-i}\).
RewardTensor[:, p, c, c, p] = r*cost - cost
RewardTensor[:, p, d, d, p] = 0
RewardTensor[0, p, c, d, p] = RewardTensor[1, p, d, c, p] = r*cost/2 - cost
RewardTensor[0, p, d, c, p] = RewardTensor[1, p, c, d, p] = r*cost/2
When a state transition involves the degraded state, \(\mathsf{g}\), the agents only receive an environmental collapse impact \(m\):
m = sp.symbols('m')
\[R^i(\mathsf p, \mathbf a, \mathsf g) = R^i(\mathsf g, \mathbf a, \mathsf g) = R^i(\mathsf g, \mathbf a, \mathsf p) = m, \quad \text{for all} \ \mathbf a, i.\]
RewardTensor[:, p, :, :, g] = RewardTensor[:, g, :, :, g] = RewardTensor[:, g, :, :, p] = m
Together, our reward tensor is then given by
sp.Array(RewardTensor)
\(\displaystyle \left[\begin{matrix}\left[\begin{matrix}\left[\begin{matrix}c r - c & m\\\frac{c r}{2} - c & m\end{matrix}\right] & \left[\begin{matrix}\frac{c r}{2} & m\\0 & m\end{matrix}\right]\\\left[\begin{matrix}m & m\\m & m\end{matrix}\right] & \left[\begin{matrix}m & m\\m & m\end{matrix}\right]\end{matrix}\right] & \left[\begin{matrix}\left[\begin{matrix}c r - c & m\\\frac{c r}{2} & m\end{matrix}\right] & \left[\begin{matrix}\frac{c r}{2} - c & m\\0 & m\end{matrix}\right]\\\left[\begin{matrix}m & m\\m & m\end{matrix}\right] & \left[\begin{matrix}m & m\\m & m\end{matrix}\right]\end{matrix}\right]\end{matrix}\right]\)
DeepDive | Substituting parameter values
In this chapter, we defined the transition and reward tensors as general numpy arrays with data type object, which we filled with symbolic expressions from sympy. To manipulate and substitute these expressions, we can use sympy's subs method; however, it cannot be applied directly to the numpy array. Instead, we define a helper function substitute_in_array that takes a numpy array and a dictionary of substitutions and returns a new array with the substitutions applied.
def substitute_in_array(array, subs_dict):
    result = array.copy()
    for index, _ in np.ndenumerate(array):       # iterate over all entries
        if isinstance(array[index], sp.Basic):   # only substitute into sympy expressions
            result[index] = array[index].subs(subs_dict)
    return result
To make this work, it is important that the substitution dictionary is given in the form {<symbol_variable>: <substitution>, ...} and not as dict(<symbol_variable>=<substitution>, ...), because the dict(...) constructor would use the Python variable names as string keys (e.g., 'qc'), which do not match the sympy symbol names (e.g., 'q_c'). For example,
substitute_in_array(TransitionTensor, {pr: 0.01, qc: 0.2}).astype(float)
array([[[[1. , 0. ],
[0.9 , 0.1 ]],
[[0.9 , 0.1 ],
[0.8 , 0.2 ]]],
[[[0.01, 0.99],
[0.01, 0.99]],
[[0.01, 0.99],
[0.01, 0.99]]]])
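The same helper also works for the reward tensor; the parameter values below are arbitrary and chosen here only for illustration.
# Illustrative substitution into the reward tensor (arbitrary parameter values).
substitute_in_array(RewardTensor, {r: 1.2, cost: 5, m: -10}).astype(float)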
Policies | Strategic choices
The crucial question is whether the agents should cooperate or defect in the prosperous state, \(\mathsf p\), under the assumption that agents are anonymous. The answer depends on the parameters \(q_c\), \(p_r\), \(\gamma\), \(r\), \(c\), and \(m\).
Anonymity means that agents do not consider the game’s history, i.e., they behave according to a Markov policy. We analyze the two extreme cases: an agent can either cooperate or defect in the prosperous state, \(\mathsf p\).
Generally, a single agent’s policy is represented by a \(Z \times M\) tensor. The cooperative policy is then given by
Xsa_coo = 0.5 * np.ones((Z, M))
Xsa_coo[p, c] = 1
Xsa_coo[p, d] = 0
The defective policy is given by
Xsa_def = 0.5 * np.ones((Z, M))
Xsa_def[p, c] = 0
Xsa_def[p, d] = 1
To obtain the incentive regimes, we need to calculate the long-term values of the four policy combinations:
- mutual cooperation \((\mathsf c \mathsf c)\), i.e. both agents cooperate,
- unilateral defection \((\mathsf d \mathsf c)\), i.e. one agent defects, and the other cooperates,
- unilateral cooperation \((\mathsf c \mathsf d)\), i.e. one agent cooperates, and the other defects, and
- mutual defection \((\mathsf d \mathsf d)\), i.e. both agents defect.
They can also be represented by a meta-game payoff matrix, where the rows represent the focal agent’s policies, and the columns represent the opponent’s policies,
| | \(\mathsf c\) | \(\mathsf d\) |
|---|---|---|
| \(\mathsf c\) | \(v_{\mathsf c \mathsf c}\) | \(v_{\mathsf c \mathsf d}\) |
| \(\mathsf d\) | \(v_{\mathsf d \mathsf c}\) | \(v_{\mathsf d \mathsf d}\) |
We summarize these respective joint policies in four \(N \times Z \times M\) tensors,
Xisa_cc = np.array([Xsa_coo, Xsa_coo])
Xisa_cd = np.array([Xsa_coo, Xsa_def])
Xisa_dc = np.array([Xsa_def, Xsa_coo])
Xisa_dd = np.array([Xsa_def, Xsa_def])
State values | Long-term well-being
Long-term values are defined precisely as in the single-agent case (03.01-SequentialDecisions), except that they now depend on the joint policy \(\mathbf x\) and each agent \(i\) holds their own values.
Given a joint policy \(\mathbf x\), we define the state value for agent \(i\), \(v^i_{\mathbf x}(s)\), as the expected gain, \(\mathbb E_\mathbf{x}[ G^i_t | S_t = s]\), when starting in state \(s\) and then following the joint policy \(\mathbf x\),
\[ v^i_\mathbf{x}(s) := \mathbb E_\mathbf{x}[ G^i_t | S_t = s] = (1-\gamma^i) \mathbb E_\mathbf{x}\left[\sum_{\tau=0}^\infty (\gamma^i)^\tau R^i_{t+\tau+1} \,\Big|\, S_t = s\right], \quad \text{for all } s \in \mathcal S. \]
They are also computable like in the single-agent case (03.01-SequentialDecisions) by solving the Bellman equations in matrix form,
\[\mathbf v^i_\mathbf{x} = (1-\gamma^i) (\mathbf 1_Z - \gamma^i\underline{\mathbf T}_\mathbf{x})^{-1} \mathbf R^i_\mathbf{x}.\]
Before we solve this equation, we focus on computing the effective transition matrix \(\underline{\mathbf T}_\mathbf{x}\) and the average reward \(\mathbf R^i_\mathbf{x}\), given a joint policy \(\mathbf x\).
The transition matrix \(\underline{\mathbf T}_\mathbf{x}\) is a \(Z \times Z\) matrix, where the element \(T_\mathbf{x}(s,s')\) is the probability of transitioning from state \(s\) to \(s'\) under the joint policy \(\mathbf x\). It is computed as
\[T_\mathbf{x}(s,s') = \sum_{a^1 \in \mathcal A^1} \dots \sum_{a^N \in \mathcal A^N} x^1(s, a^1) \dots x^N(s, a^N) T(s, a^1, \dots, a^N, s').\]
For \(N=2\), we implement this in Python as follows:
def compute_symbolic_TransitionMatrix_Tss(policy_Xisa,
                                          transitions_Tsas):
    s, a, b, s_ = 0, 1, 2, 3  # defining indices for convenience

    Tss = sp.Matrix(np.einsum(policy_Xisa[0], [s, a],
                              policy_Xisa[1], [s, b],
                              transitions_Tsas, [s, a, b, s_],
                              [s, s_]))
    return sp.simplify(Tss)
For example, the transition matrix for the joint policy \(\mathbf x = (\mathsf d, \mathsf c)\) is then given by
compute_symbolic_TransitionMatrix_Tss(Xisa_dc, TransitionTensor)
\(\displaystyle \left[\begin{matrix}1.0 - 0.5 q_{c} & 0.5 q_{c}\\1.0 p_{r} & 1.0 - 1.0 p_{r}\end{matrix}\right]\)
The average reward \(\mathbf R^i_\mathbf{x}\) is an \(N \times Z\) matrix, where the element \(R_\mathbf{x}^i(s)\) is the expected reward agent \(i\) receives in state \(s\) under the joint policy \(\mathbf x\). It is computed as
\[ R_\mathbf{x}^i(s) = \sum_{a^1 \in \mathcal A^1} \dots \sum_{a^N \in \mathcal A^N} \sum_{s' \in \mathcal S} x^1(s, a^1) \dots x^N(s, a^N) T(s, a^1, \dots, a^N, s') R^i(s, a^1, \dots, a^N, s').\]
For \(N=2\), we implement this in Python as follows:
def compute_symbolic_AverageReward_Ris(policy_Xisa,
                                       transitions_Tsas,
                                       rewards_Risas):
    i, s, a, b, s_ = 0, 1, 2, 3, 4  # defining indices for convenience

    Ris = sp.Array(np.einsum(policy_Xisa[0], [s, a],
                             policy_Xisa[1], [s, b],
                             transitions_Tsas, [s, a, b, s_],
                             rewards_Risas, [i, s, a, b, s_],
                             [i, s]))
    return sp.simplify(Ris)
For example, the average reward under the joint policy \(\mathbf x = (\mathsf d, \mathsf c)\) is then given by
compute_symbolic_AverageReward_Ris(Xisa_dc, TransitionTensor, RewardTensor)
\(\displaystyle \left[\begin{matrix}- \frac{c r \left(0.5 q_{c} - 1.0\right)}{2} + 0.5 m q_{c} & 1.0 m\\- \frac{c \left(0.5 q_{c} - 1.0\right) \left(r - 2\right)}{2} + 0.5 m q_{c} & 1.0 m\end{matrix}\right]\)
With the transition matrix \(\underline{\mathbf T}_\mathbf{x}\) and the average reward \(\mathbf R^i_\mathbf{x}\), we can now solve the Bellman equation for the state values \(\mathbf v^i_\mathbf{x}\). For convenience, we assume a homogeneous discount factor \(\gamma^i = \gamma\) for all agents \(i\).
dcf = sp.symbols('gamma')

def compute_symbolic_statevalues(policy_Xisa,
                                 transitions_Tsas,
                                 rewards_Risas,
                                 discountfactor=dcf):
    i, s, a, s_ = 0, 1, 2, 3  # defining indices for convenience

    Tss = compute_symbolic_TransitionMatrix_Tss(
        policy_Xisa, transitions_Tsas)
    Ris = compute_symbolic_AverageReward_Ris(
        policy_Xisa, transitions_Tsas, rewards_Risas)

    inv = (sp.eye(2) - discountfactor*Tss).inv()
    inv.simplify()  # sp.simplify() often helps

    Vis = (1-discountfactor) * \
        sp.Matrix(np.einsum(inv, [s, s_], Ris, [i, s_], [i, s]))
    return sp.simplify(Vis)
With the help of the function compute_symbolic_statevalues, we can now compute the state values for all four joint policies.
Mutual cooperation
The state value of the prosperous state, \(\mathsf p\), for the joint policy \((\mathsf c, \mathsf c)\) is given by
statevalues_Vis_cc = compute_symbolic_statevalues(
    Xisa_cc, TransitionTensor, RewardTensor)
Vcc_p = statevalues_Vis_cc[0, p]
Vcc_g = statevalues_Vis_cc[0, g]
Vcc_p
\(\displaystyle 1.0 c \left(r - 1\right)\)
Unilateral defection
The state value of the prosperous state, \(\mathsf p\), for the joint policy \((\mathsf d, \mathsf c)\) is given by
statevalues_Vis_dc = compute_symbolic_statevalues(
    Xisa_dc, TransitionTensor, RewardTensor)
Vdc_p = statevalues_Vis_dc[0, p]
Vdc_g = statevalues_Vis_dc[0, g]
Vdc_p
\(\displaystyle \frac{- 0.5 c \gamma p_{r} q_{c} r + 1.0 c \gamma p_{r} r + 0.5 c \gamma q_{c} r - 1.0 c \gamma r - 0.5 c q_{c} r + 1.0 c r + 1.0 \gamma m p_{r} q_{c} + 1.0 m q_{c}}{2.0 \gamma p_{r} + 1.0 \gamma q_{c} - 2.0 \gamma + 2.0}\)
Unilateral cooperation
The state value of the prosperous state, \(\mathsf p\), for the joint policy \((\mathsf c, \mathsf d)\) is given by
statevalues_Vis_cd = compute_symbolic_statevalues(
    Xisa_cd, TransitionTensor, RewardTensor)
Vcd_p = statevalues_Vis_cd[0, p]
Vcd_g = statevalues_Vis_cd[0, g]
Vcd_p
\(\displaystyle \frac{- 0.5 c \gamma p_{r} q_{c} r + 1.0 c \gamma p_{r} q_{c} + 1.0 c \gamma p_{r} r - 2.0 c \gamma p_{r} + 0.5 c \gamma q_{c} r - 1.0 c \gamma q_{c} - 1.0 c \gamma r + 2.0 c \gamma - 0.5 c q_{c} r + 1.0 c q_{c} + 1.0 c r - 2.0 c + 1.0 \gamma m p_{r} q_{c} + 1.0 m q_{c}}{2.0 \gamma p_{r} + 1.0 \gamma q_{c} - 2.0 \gamma + 2.0}\)
Mutual defection
The state value of the prosperous state, \(\mathsf p\), for the joint policy \((\mathsf d, \mathsf d)\) is given by
statevalues_Vis_dd = compute_symbolic_statevalues(
    Xisa_dd, TransitionTensor, RewardTensor)
Vdd_p = statevalues_Vis_dd[0, p]
Vdd_g = statevalues_Vis_dd[0, g]
Vdd_p
\(\displaystyle \frac{1.0 m q_{c} \left(\gamma p_{r} + 1\right)}{\gamma p_{r} + \gamma q_{c} - \gamma + 1}\)
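For an overview, we can collect the four prosperous-state values of the focal agent into the meta-game payoff matrix sketched above. This is a convenience step, not part of the original derivation.
# Focal agent's meta-game payoff matrix in the prosperous state:
# rows = own policy (c, d), columns = the other agent's policy (c, d).
metagame_Vp = sp.Matrix([[Vcc_p, Vcd_p],
                         [Vdc_p, Vdd_p]])
sp.simplify(metagame_Vp)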
Critical curves
Finally, we can compute the critical conditions on the model parameters where the agents’ incentives change. The three conditions are:
- Dilemma: The agents are indifferent between all cooperating and all defecting, \(v_{cc} = v_{dd}\),
- Greed: Each individual agent is indifferent between cooperating and defecting, given all others cooperate, \(v_{cc} = v_{dc}\), and
- Fear: Each individual agent is indifferent between cooperating and defecting, given all agents defect, \(v_{cd} = v_{dd}\).
Without greed, the action situation becomes a coordination challenge between the two pure equilibria of mutual cooperation and mutual defection. Without greed and fear, the only remaining Nash equilibrium is mutual cooperation.
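As a minimal sketch (not from the lecture code), the following helper classifies the incentive regime from numerical values of the four prosperous-state values; the regime names follow the figure at the end of this chapter.
def classify_regime(vcc, vcd, vdc, vdd):
    greed = vdc > vcc  # defecting pays against a cooperator
    fear = vdd > vcd   # defecting pays against a defector
    if greed and fear:
        return "Tragedy"            # mutual defection is the only equilibrium
    if fear:
        return "Coordination"       # two pure equilibria: all-cooperate and all-defect
    if greed:
        return "Anti-coordination"  # remaining logical case
    return "Comedy"                 # mutual cooperation is the only equilibrium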
Dilemma
The critical curve at which collapse avoidance becomes collectively optimal is obtained by setting \(v_{cc} = v_{dd}\). Solving for the collapse impact \(m\) yields,
dilemma_m = sp.solve(Vdd_p - Vcc_p, m)[0]
dilemma_m
\(\displaystyle \frac{c \left(\gamma p_{r} r - \gamma p_{r} + \gamma q_{c} r - \gamma q_{c} - \gamma r + \gamma + r - 1.0\right)}{q_{c} \left(\gamma p_{r} + 1.0\right)}\)
We verify that the dilemma condition does not depend on the environmental state by computing it also for the degraded state.
sp.solve(Vdd_g - Vcc_g, m)[0] - dilemma_m
\(\displaystyle 0\)
Greed
The critical curve at which greed vanishes, i.e., where a cooperator is indifferent between cooperating and defecting, given all other agents cooperate, is obtained by setting \(v_{cc} = v_{dc}\). Solving for the collapse impact \(m\) yields,
greed_m = sp.solve(Vdc_p - Vcc_p, m)[0]
greed_m
\(\displaystyle \frac{0.5 c \left(\gamma p_{r} q_{c} r + 2.0 \gamma p_{r} r - 4.0 \gamma p_{r} + \gamma q_{c} r - 2.0 \gamma q_{c} - 2.0 \gamma r + 4.0 \gamma + q_{c} r + 2.0 r - 4.0\right)}{q_{c} \left(\gamma p_{r} + 1.0\right)}\)
We verify that the greed condition does not depend on the environmental state by computing it also for the degraded state.
sp.solve(Vdc_g - Vcc_g, m)[0] - greed_m
\(\displaystyle 0\)
Fear
The critical curve at which fear vanishes, i.e., where a defector is indifferent between cooperating and defecting, given all other agents defect, is obtained by setting \(v_{cd} = v_{dd}\). Solving for the collapse impact \(m\) yields,
fear_m = sp.solve(Vcd_p - Vdd_p, m)[0]
fear_m
\(\displaystyle \frac{0.5 c \left(- \gamma p_{r} q_{c} r + 2.0 \gamma p_{r} q_{c} + 2.0 \gamma p_{r} r - 4.0 \gamma p_{r} - \gamma q_{c}^{2} r + 2.0 \gamma q_{c}^{2} + 3.0 \gamma q_{c} r - 6.0 \gamma q_{c} - 2.0 \gamma r + 4.0 \gamma - q_{c} r + 2.0 q_{c} + 2.0 r - 4.0\right)}{q_{c} \left(\gamma p_{r} + 1.0\right)}\)
We verify that the fear condition, too, does not depend on the environmental state by computing it also for the degraded state.
sp.solve(Vcd_g - Vdd_g, m)[0] - fear_m
\(\displaystyle 0\)
Visualization
Having obtained symbolic expressions for the critical curves, we can now visualize them as a function of the discount factor \(\gamma\), indicating how much the agents value future rewards.
Parameter values
Let us apply the model to the case of global sustainability. We set the public goods synergy factor to \(r=1.2\) and the cost of cooperation to \(c=5\).
vr = 1.2
vc = 5
Regarding the transition probabilities, we apply the conversion rule developed in 02.04-StateTransitions to set the collapse leverage \(q_c\) and the recovery probability \(p_r\) in terms of typical timescales. Under current business-as-usual policies, there are about 50 years left to avert the collapse (Rockström et al., 2017). After a collapse, there is a potential lock-in into an unfavorable Earth system state on a timescale of up to millennia (Steffen et al., 2018). Interpreting a time step as one year, we set the collapse leverage to \(q_c = 1/50 = 0.02\) and the recovery probability to \(p_r = 1/10000 = 0.0001\).
vqc = 0.02
vqr = 0.0001
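As a quick numeric check (not part of the original notebook), we can evaluate the three critical collapse impacts at an illustrative discount factor, here \(\gamma = 0.99\) (an assumed value).
# Substitute the chosen parameter values and an illustrative discount factor.
numeric_values = {r: vr, cost: vc, qc: vqc, pr: vqr, dcf: 0.99}
[float(expr.subs(numeric_values)) for expr in (dilemma_m, greed_m, fear_m)]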
Final graphic
We convert the symbolic expressions to numerical functions using lambdify.
Fdilemma_m = sp.lambdify((r, cost, qc, pr, dcf), dilemma_m, 'numpy')
Fgreed_m = sp.lambdify((r, cost, qc, pr, dcf), greed_m, 'numpy')
Ffear_m = sp.lambdify((r, cost, qc, pr, dcf), fear_m, 'numpy')
dcf_values = np.linspace(0.95, 1.0, 100)

plt.plot(dcf_values, Fdilemma_m(vr, vc, vqc, vqr, dcf_values),
         c='k', label='Dilemma')
plt.plot(dcf_values, Fgreed_m(vr, vc, vqc, vqr, dcf_values),
         c='b', label='Greed')
plt.plot(dcf_values, Ffear_m(vr, vc, vqc, vqr, dcf_values),
         c='g', label='Fear')
plt.fill_between(dcf_values, 0, 4, color='gray', alpha=0.4)

plt.annotate("Tragedy", (0.975, -1.6), ha='center', va='center')
plt.annotate("Coordination", (0.987, -4), ha='center', va='center')
plt.annotate("Comedy", (0.996, -6), ha='center', va='center')
plt.annotate("Collapse avoidance suboptimal",
             (0.9999, 2.5), ha='right', va='center')

plt.legend(loc='center left'); plt.ylim(-7, 3); plt.xlim(0.955, 1.0);
plt.xlabel('Discount factor $\gamma$'); plt.ylabel('Collapse impact $m$');
This plot is a precise reproduction of the result by Barfuss et al. (2020). It highlights that
- the same care for the future that makes individual decision-making apt for the long-term also positively impacts collective decision-making.
- This care for the future alone can turn a tragedy into a comedy, where individual incentives are fully aligned with the collective interest - completely resolving the social dilemma.
- This holds provided the anticipated collapse impact is sufficiently severe. Thus, agents need to consider the catastrophic impact and, at the same time, be immensely future-oriented.
- No other mechanism is required: Agents are anonymous, cannot reciprocate, and cannot make any agreements.
8.4 Learning goals revisited
In this chapter,
- we introduced the concept of dynamic games and described the elements of a stochastic game.
- We applied the stochastic game framework to model human-environment interactions, asking how individual care for future consequences and the fact that we are all embedded in a shared environment affect collective decision-making.
- We analyzed the strategic interactions in this stochastic game model and found that caring for the future can turn a tragedy into a comedy of the commons.