6 Hysteresis
In this section, we illustrate the complex phenomenon of hysteresis in CRLD. Hysteresis means that the system’s state depends on the history of external parameter changes. Here, we show that hysteresis exists by varying the discount factor, which indicates how much the agents care about future rewards. We let the discount factor increase and then decrease again while the CRLD keeps running.
First, we import everything we need:
import numpy as np
import matplotlib.pyplot as plt
from pyCRLD.Environments.EcologicalPublicGood import EcologicalPublicGood as EPG
from pyCRLD.Agents.StrategySARSA import stratSARSA as stratS
In contrast to the previous examples, where we used stratAC, i.e., actor-critic learning agents in strategy space, we here use stratSARSA agents, as seen in the imports above. The SARSA agents differ from the actor-critic learners in their exploration terms: they keep a constant exploration term, which prevents them from converging too close to the edges of the strategy phase space; they are constantly exploring to some extent. Keeping a small distance from the edges of the strategy phase space is required for hysteresis. When the external parameter changes while CRLD keeps running, the agents need to be able to leave their current equilibrium; otherwise, no change of equilibrium is observable.
Being able to change their current equilibrium requires the agents to keep a small distance from the edges of the strategy phase space, as one can also see in the learning update equation,
\[ X^i_{t+1}(s, a) = \frac{1}{\bar{\mathfrak{Z}}^i(s)} X^i_t(s, a) \exp\big(\eta^i \cdot \bar \delta^i(s, a) \big). \]
If \(X^i_t(s, a)\) is too close to zero or one, no update can happen, regardless of the strategy-average reward-prediction error \(\bar \delta^i_t(s, a)\). See Barfuss et al. (2019) for a detailed comparison between the CRLD of SARSA and actor-critic learning.
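To make the effect of the edges concrete, here is a minimal numerical sketch of the multiplicative update for a single state with two actions. The learning rate and reward-prediction errors below are arbitrary illustrative values, not the ones used in the experiment:
import numpy as np

def multiplicative_update(X, delta, eta=0.1):
    # One update step for a single state with two actions;
    # the normalization plays the role of 1/Z in the equation above.
    Xnew = X * np.exp(eta * delta)
    return Xnew / Xnew.sum()

delta = np.array([-1.0, +1.0])  # prediction error favors the second action

print(multiplicative_update(np.array([0.5, 0.5]), delta))
# -> roughly [0.45, 0.55]: an interior strategy responds to the error

print(multiplicative_update(np.array([1 - 1e-12, 1e-12]), delta))
# -> still essentially [1, 0]: a strategy at the edge is effectively frozen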
By trial and error, we set the choice intensity of SARSA learning to 60 (in units of log-probability per util).
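To get a feeling for this magnitude, here is an illustrative sketch (assuming the usual two-action softmax form of the policy, with probabilities proportional to exp(choice intensity times value); the value difference of 0.1 utils is a made-up number): a choice intensity of 60 yields almost, but never fully, deterministic action probabilities.
beta = 60   # choice intensity (illustrative)
dQ = 0.1    # hypothetical difference in state-action values, in utils
p = 1 / (1 + np.exp(-beta * dQ))   # two-action softmax probability
print(p)    # ≈ 0.9975: sharp, yet strictly inside the strategy simplex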
Compute data
First, we compute the data for the hysteresis curve.
# Set up the ecological public goods environment
env = EPG(N=2, f=1.2, c=5, m=-5, qc=0.2, qr=0.01, degraded_choice=False)

# Compile the list of discount factors
dcfs = list(np.arange(0.6, 0.9, 0.005))

# Hysteresis curve: parameters first increase and then decrease again
hystcurve = dcfs + dcfs[::-1]
coops = []  # for storing the cooperation probabilities
for i, dcf in enumerate(hystcurve):
    # Adjust multi-agent environment interface with discount factor
    MAEi = stratS(env=env, discount_factors=dcf, use_prefactor=True,
                  learning_rates=0.01, choice_intensities=60)

    if i == 0:  # Choose random initial policy
        X = MAEi.random_softmax_strategy()

    # Compute trajectory
    trj, fpr = MAEi.trajectory(X, Tmax=2500, tolerance=10e-12)
    print('\r ', dcf, fpr, end=' ')

    X = trj[-1]  # select last strategy
    coops.append(X[:, 1, 0])  # append to storage container
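As a quick sanity check, one can read off the approximate tipping points of the two sweeps directly from the computed data. This is only a sketch, assuming the coops list filled by the loop above; note that np.argmax returns index 0 if the threshold is never crossed:
coop = np.array(coops).mean(-1)       # average cooperation over the two agents
half = len(coop) // 2                 # first half: increasing discount factors
dcf_up, dcf_down = np.array(hystcurve[:half]), np.array(hystcurve[half:])
up_tip = dcf_up[np.argmax(coop[:half] > 0.5)]      # first upward crossing
down_tip = dcf_down[np.argmax(coop[half:] < 0.5)]  # first downward crossing
print(f"upward transition near {up_tip:.3f}, downward transition near {down_tip:.3f}")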
Plot curve
Now, we plot the computed data. We use the points’ sizes and colors to indicate the time dimension of the discount-factor changes: time flows from big to small data points and from dark to light ones.
# Create the canvas
fsf = 0.75  # figure size factor
plt.figure(figsize=(fsf*6, fsf*3))

# Plot background line
plt.plot(hystcurve, np.array(coops).mean(-1), '-', alpha=0.5, color='k', zorder=-1)

# Plot data points with size and color indicating the time dimension
plt.scatter(hystcurve, np.array(coops).mean(-1), alpha=0.9,
            s=np.arange(len(hystcurve))[::-1]+1, c=np.arange(len(hystcurve)))

# Make labels and axes nice
plt.ylabel('Cooperation')
plt.xlabel('Discount Factor')
plt.gca().spines.right.set_visible(False)
plt.gca().spines.top.set_visible(False)

# Legend
ax = plt.gcf().add_axes([0.85, 0.22, 0.12, 0.6])
# ax = plt.gcf().add_axes([0.135, 0.38, 0.12, 0.6])
ax.scatter(np.ones_like(hystcurve)[::4], np.arange(len(hystcurve))[::4], alpha=0.9,
           s=0.75*np.arange(len(hystcurve))[::-1][::4]+1, c=np.arange(len(hystcurve))[::4])
# ax.annotate('Time', xy=(0.5, 1.07), xycoords='axes fraction', va='center', ha='center', fontsize=9)
ax.annotate('Start', xy=(1.6, 0), xycoords='data', va='center', ha='left', fontsize=8)
ax.annotate('End', xy=(1.6, len(hystcurve)-5), xycoords='data', va='center', ha='left', fontsize=8)
ax.set_ylim(-10,); ax.set_xlim(0, 4)
ax.set_yticks([]); ax.set_xticks([])
for spine in ax.spines.values(): spine.set_edgecolor('grey')

# Save plot
plt.subplots_adjust(left=0.125, right=0.98, top=0.98, bottom=0.2)
plt.savefig("_figs/fig_03Hysteresis.png", dpi=150)
As one can see, while the discount factor increases, the learners remain close to full defection up to the critical point of about 0.83, where they suddenly switch to complete cooperation. However, when the discount factor decreases again, they remain at almost full cooperation down to a much smaller value of approximately 0.71. Only then do the agents suddenly become complete defectors again.