import numpy as np
import sympy as sp
import pandas as pd
import matplotlib.pyplot as plt
from copy import deepcopy
Ex | Individual Learning
Task 1 | Learning the risky policy
In the lecture, we explored how the agent learns a cautious policy within the risk-reward dilemma. Investigate the learning process for parameter combinations that make the risky policy optimal (DiscountFactor=0.6, CollapseProbability=0.1, RecoveryProbability=0.1, SafeReward=0.5, RiskyReward=1.0, DegradedReward=0.0). What parameters of the learning process, such as learning rate and choice intensity, allow the agent to consistently learn the risky policy?
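One possible starting point is sketched below. It is not the learner class from the lecture: it is a plain tabular Q-learning agent with softmax (Boltzmann) action selection, where `alpha` plays the role of the learning rate and `beta` the choice intensity. The two-state layout (prosperous/degraded), the transition and reward tensors, and the names `make_risk_reward` and `run_q_learning` are assumptions made for illustration; adapt them to the environment class used in the lecture.

```python
import numpy as np

def make_risk_reward(pc=0.1, pr=0.1, rs=0.5, rr=1.0, rd=0.0):
    """Assumed layout: states 0=prosperous, 1=degraded; actions 0=safe, 1=risky."""
    T = np.zeros((2, 2, 2))        # T[s, a, s_next]
    T[0, 0, 0] = 1.0               # safe action keeps the prosperous state
    T[0, 1, 1] = pc                # risky action may cause a collapse
    T[0, 1, 0] = 1.0 - pc
    T[1, 0, 0] = pr                # safe action may lead to recovery
    T[1, 0, 1] = 1.0 - pr
    T[1, 1, 1] = 1.0               # risky action keeps the degraded state
    R = np.full((2, 2, 2), rd)     # assumption: any transition touching degraded pays rd
    R[0, 0, 0] = rs
    R[0, 1, 0] = rr
    return T, R

def softmax(q, beta):
    """Boltzmann choice probabilities with choice intensity beta."""
    z = beta * (q - q.max())       # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def run_q_learning(T, R, gamma=0.6, alpha=0.05, beta=20.0, steps=5000, seed=0):
    """Return the final Q-table and P(risky | prosperous) after every step."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((2, 2))           # Q[s, a]
    s = 0                          # start in the prosperous state
    history = np.empty(steps)
    for t in range(steps):
        a = rng.choice(2, p=softmax(Q[s], beta))
        s_next = rng.choice(2, p=T[s, a])
        r = R[s, a, s_next]
        # Temporal-difference update toward the bootstrapped target
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        history[t] = softmax(Q[0], beta)[1]
        s = s_next
    return Q, history

T, R = make_risk_reward()
Q, history = run_q_learning(T, R)
```

Plotting `history` for several seeds and several `(alpha, beta)` pairs is one way to judge how consistently the risky policy is learned; the default values above are illustrative, not recommended settings.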
# ...
How does the learning process change if you change the transition probabilities to CollapseProbability=0.05, RecoveryProbability=0.005?
# ...
Task 2 | Ecological public good
Implement the ecological public good from Lecture 03.03 as a reinforcement learning environment. Ensure your EcologicalPublicGood class inherits from the base Environment class.
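The inheritance pattern could look roughly like the sketch below. Everything in it is an assumption: the `Environment` class here is a hypothetical stand-in for the course's base class (whose real interface is not shown on this sheet), and the model details (two states, collapse probability proportional to the share of defectors, a public goods payoff in the prosperous state, and the parameter names `f`, `c`, `m`, `qc`, `qr` with their default values) are one plausible reading of an ecological public good, not a transcript of Lecture 03.03.

```python
import numpy as np

class Environment:
    """Hypothetical stand-in for the course's base class (interface assumed)."""
    def __init__(self, n_agents, n_states, n_actions, seed=None):
        self.N, self.Z, self.M = n_agents, n_states, n_actions
        self.rng = np.random.default_rng(seed)
        self.state = 0

    def transition_probs(self, state, actions):
        raise NotImplementedError

    def rewards(self, state, actions, next_state):
        raise NotImplementedError

    def step(self, actions):
        """Sample the next state and return it with one reward per agent."""
        p = self.transition_probs(self.state, actions)
        next_state = int(self.rng.choice(self.Z, p=p))
        r = self.rewards(self.state, actions, next_state)
        self.state = next_state
        return next_state, r


class EcologicalPublicGood(Environment):
    """Assumed layout: states 0=degraded, 1=prosperous; actions 0=cooperate, 1=defect."""
    def __init__(self, n_agents=2, f=1.2, c=5.0, m=-5.0, qc=0.2, qr=0.01, seed=None):
        super().__init__(n_agents, n_states=2, n_actions=2, seed=seed)
        # Illustrative values: synergy factor, cooperation cost, collapse impact,
        # collapse leverage, recovery probability
        self.f, self.c, self.m, self.qc, self.qr = f, c, m, qc, qr
        self.state = 1  # start prosperous

    def transition_probs(self, state, actions):
        actions = np.asarray(actions)
        if state == 1:
            # Collapse risk grows with the fraction of defectors
            p_collapse = self.qc * np.mean(actions == 1)
            return np.array([p_collapse, 1.0 - p_collapse])
        return np.array([1.0 - self.qr, self.qr])

    def rewards(self, state, actions, next_state):
        actions = np.asarray(actions)
        if state == 1 and next_state == 1:
            coop = actions == 0
            share = self.f * self.c * coop.sum() / self.N  # equal share of the public good
            return share - self.c * coop                   # cooperators also pay the cost
        return np.full(self.N, self.m)                     # degraded state: collapse impact
```

Keeping the transition and reward logic in separate methods makes it easy to check each against the lecture's definitions before plugging the class into a learner.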
# ...
Let two agents learn in it and visualize the learning process.
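A compact, self-contained sketch of such an experiment follows. It uses two independent tabular Q-learners with softmax choice rather than the lecture's learner class, and a function-based version of the environment whose dynamics and default parameters (`f`, `c`, `m`, `qc`, `qr`) are assumptions. The names `epg_step`, `simulate_two_learners`, and `plot_learning` are hypothetical helpers introduced here.

```python
import numpy as np

def epg_step(state, actions, rng, f=1.2, c=5.0, m=-5.0, qc=0.2, qr=0.01):
    """One step of an assumed ecological public good.
    state: 0=degraded, 1=prosperous; actions: 0=cooperate, 1=defect."""
    actions = np.asarray(actions)
    n = len(actions)
    if state == 1:
        p_collapse = qc * np.mean(actions == 1)
        next_state = 0 if rng.random() < p_collapse else 1
    else:
        next_state = 1 if rng.random() < qr else 0
    if state == 1 and next_state == 1:
        coop = actions == 0
        rewards = f * c * coop.sum() / n - c * coop
    else:
        rewards = np.full(n, m)
    return next_state, rewards

def softmax(q, beta):
    z = beta * (q - q.max())
    p = np.exp(z)
    return p / p.sum()

def simulate_two_learners(steps=2000, alpha=0.05, beta=1.0, gamma=0.9, seed=0):
    """Return P(cooperate | prosperous) for both agents after every step."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((2, 2, 2))  # Q[agent, state, action]
    state = 1
    coop_prob = np.empty((steps, 2))
    for t in range(steps):
        actions = [rng.choice(2, p=softmax(Q[i, state], beta)) for i in range(2)]
        next_state, rewards = epg_step(state, actions, rng)
        for i in range(2):
            # Each agent updates independently from its own reward
            target = rewards[i] + gamma * Q[i, next_state].max()
            Q[i, state, actions[i]] += alpha * (target - Q[i, state, actions[i]])
        coop_prob[t] = [softmax(Q[i, 1], beta)[0] for i in range(2)]
        state = next_state
    return coop_prob

def plot_learning(coop_prob):
    """Plot each agent's cooperation probability over time."""
    import matplotlib.pyplot as plt  # imported here so the simulation runs without it
    fig, ax = plt.subplots()
    for i in range(coop_prob.shape[1]):
        ax.plot(coop_prob[:, i], label=f"agent {i}")
    ax.set_xlabel("time step")
    ax.set_ylabel("P(cooperate | prosperous)")
    ax.set_ylim(0, 1)
    ax.legend()
    return fig

coop_prob = simulate_two_learners()
```

Calling `plot_learning(coop_prob)` draws one trajectory per agent. Repeating the run over several seeds and parameter settings gives a better basis for the discussion than a single trajectory.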
# ...
Briefly discuss your findings.