import numpy as np
import sympy as sp
import pandas as pd
import matplotlib.pyplot as plt
from copy import deepcopy
Ex | Individual Learning
Task 1 | Learning the risky policy
In the lecture, we explored how the agent learns a cautious policy within the risk-reward dilemma. Investigate the learning process for a parameter combination that makes the risky policy optimal (DiscountFactor=0.6, CollapseProbability=0.1, RecoveryProbability=0.1, SafeReward=0.5, RiskyReward=1.0, DegradedReward=0.0). Which parameters of the learning process, such as the learning rate and the choice intensity, allow the agent to learn the risky policy consistently?
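As a possible starting point, the following is a minimal, self-contained sketch of a temporal-difference learner with a softmax (Boltzmann) policy. It does not use the lecture's classes. The function names (`risk_reward_tensors`, `softmax_policy`, `learn`), the state and action ordering, the reward layout, and the plain Q-learning update are all assumptions made for illustration, and the lecture's learner may normalize or update its values differently. The values `alpha=0.05` and `beta=5.0` are arbitrary placeholders to vary, not recommended settings.

```python
import numpy as np

# Assumed layout: states 0 = prosperous, 1 = degraded; actions 0 = safe, 1 = risky.
def risk_reward_tensors(pc, pr, rs, rr, rd):
    """Return transition tensor T[s, a, s'] and reward tensor R[s, a, s']."""
    T = np.zeros((2, 2, 2))
    T[0, 0, 0] = 1.0            # safe action keeps the system prosperous
    T[0, 1, 1] = pc             # risky action collapses with probability pc
    T[0, 1, 0] = 1.0 - pc
    T[1, :, 0] = pr             # assumed: recovery is independent of the action
    T[1, :, 1] = 1.0 - pr
    R = np.full((2, 2, 2), rd)  # assumed: every step touching the degraded state pays rd
    R[0, 0, 0] = rs
    R[0, 1, 0] = rr
    return T, R

def softmax_policy(Q, beta):
    """Boltzmann policy X[s, a]; beta plays the role of the choice intensity."""
    z = beta * (Q - Q.max(axis=1, keepdims=True))  # shift for numerical stability
    expz = np.exp(z)
    return expz / expz.sum(axis=1, keepdims=True)

def learn(T, R, gamma, alpha, beta, steps, seed=0):
    """Run one learning trajectory; alpha plays the role of the learning rate."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((2, 2))
    history = np.empty((steps, 2, 2))   # policy at every time step
    s = 0                               # start in the prosperous state
    for t in range(steps):
        X = softmax_policy(Q, beta)
        history[t] = X
        a = rng.choice(2, p=X[s])
        s_next = rng.choice(2, p=T[s, a])
        r = R[s, a, s_next]
        # Plain Q-learning temporal-difference error (an assumption, see lead-in).
        td_error = r + gamma * Q[s_next].max() - Q[s, a]
        Q[s, a] += alpha * td_error
        s = s_next
    return Q, history

T, R = risk_reward_tensors(pc=0.1, pr=0.1, rs=0.5, rr=1.0, rd=0.0)
Q, history = learn(T, R, gamma=0.6, alpha=0.05, beta=5.0, steps=5000)
print("P(risky | prosperous) at the end:", history[-1, 0, 1])
```

Repeating `learn` over several seeds and over a grid of `alpha` and `beta` values, then plotting `history[:, 0, 1]`, is one way to judge whether the risky policy is learned consistently rather than in a single lucky run.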
# ...
How does the learning process change if you change the transition probabilities to CollapseProbability=0.05, RecoveryProbability=0.005?
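Before comparing learning runs, it can help to look at what the two parameter sets imply in expectation. The sketch below evaluates the "always safe" and "always risky" policies exactly by solving the linear Bellman equation, and reports the mean waiting times 1/CollapseProbability and 1/RecoveryProbability. The helper `policy_values` is hypothetical, and it assumes the same model structure as in the task description: the risky action collapses the prosperous state with the collapse probability, recovery does not depend on the action, and every step that touches the degraded state pays DegradedReward.

```python
import numpy as np

def policy_values(action, pc, pr, rs, rr, rd, gamma):
    """Exact values [V_prosperous, V_degraded] of always playing `action`.

    Solves (I - gamma * P) V = r for the two-state Markov chain induced
    by the policy. `action` is either "safe" or "risky".
    """
    if action == "safe":
        P = np.array([[1.0, 0.0], [pr, 1.0 - pr]])
        r = np.array([rs, rd])
    else:
        P = np.array([[1.0 - pc, pc], [pr, 1.0 - pr]])
        r = np.array([(1.0 - pc) * rr + pc * rd, rd])  # expected one-step reward
    return np.linalg.solve(np.eye(2) - gamma * P, r)

settings = {"original": (0.1, 0.1), "modified": (0.05, 0.005)}
results = {}
for name, (pc, pr) in settings.items():
    v_safe = policy_values("safe", pc, pr, 0.5, 1.0, 0.0, gamma=0.6)
    v_risky = policy_values("risky", pc, pr, 0.5, 1.0, 0.0, gamma=0.6)
    results[name] = (v_safe, v_risky)
    print(name, "| V_safe(prosperous) =", round(v_safe[0], 3),
          "| V_risky(prosperous) =", round(v_risky[0], 3),
          "| mean steps to collapse =", 1 / pc,
          "| mean steps to recovery =", 1 / pr)
```

Under these assumptions the risky policy has the higher value in the prosperous state for both settings, but in the modified setting collapses are rarer and the degraded state lasts much longer on average, so the learner observes each kind of transition on very different time scales.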
# ...
Task 2 | Ecological public good
Implement the ecological public good from Lecture 03.03 as a reinforcement learning environment. Ensure your EcologicalPublicGood class inherits from the base Environment class.
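The inheritance structure could look roughly like the sketch below. The `Environment` class here is only a hypothetical stand-in so that the example runs on its own; the real base class from the lecture has its own interface, and your subclass should override whatever methods it actually requires. The parameter names (`f`, `c`, `m`, `qc`, `qr`), their default values, the state and action encoding, and the action-independent recovery are assumptions for illustration and may differ from the model in Lecture 03.03.

```python
import numpy as np

class Environment:
    """Hypothetical stand-in for the course's base class."""
    def __init__(self, n_agents, n_states, n_actions, seed=None):
        self.n_agents, self.n_states, self.n_actions = n_agents, n_states, n_actions
        self.rng = np.random.default_rng(seed)
        self.state = None

    def transition_probs(self, state, actions):
        raise NotImplementedError

    def rewards(self, state, actions, next_state):
        raise NotImplementedError

    def step(self, actions):
        """Sample the next state and return it with one reward per agent."""
        p = self.transition_probs(self.state, actions)
        next_state = self.rng.choice(self.n_states, p=p)
        r = self.rewards(self.state, actions, next_state)
        self.state = next_state
        return next_state, r


class EcologicalPublicGood(Environment):
    DEGRADED, PROSPEROUS = 0, 1   # assumed state encoding
    COOPERATE, DEFECT = 0, 1      # assumed action encoding

    def __init__(self, n_agents=2, f=1.2, c=5.0, m=-5.0, qc=0.2, qr=0.01, seed=None):
        # Placeholder defaults: f = synergy factor, c = cooperation cost,
        # m = collapse impact, qc = collapse leverage, qr = recovery probability.
        super().__init__(n_agents, n_states=2, n_actions=2, seed=seed)
        self.f, self.c, self.m, self.qc, self.qr = f, c, m, qc, qr
        self.state = self.PROSPEROUS

    def transition_probs(self, state, actions):
        """Return [P(next = degraded), P(next = prosperous)]."""
        actions = np.asarray(actions)
        if state == self.PROSPEROUS:
            # Collapse probability grows with the fraction of defectors.
            p_collapse = self.qc * np.mean(actions == self.DEFECT)
            return np.array([p_collapse, 1.0 - p_collapse])
        # Assumption: recovery does not depend on the agents' actions.
        return np.array([1.0 - self.qr, self.qr])

    def rewards(self, state, actions, next_state):
        actions = np.asarray(actions)
        if state == self.PROSPEROUS and next_state == self.PROSPEROUS:
            coop = actions == self.COOPERATE
            share = self.f * self.c * coop.sum() / self.n_agents
            return share - self.c * coop      # cooperators pay the cost c
        # Any transition involving the degraded state yields the collapse impact.
        return np.full(self.n_agents, self.m)

env = EcologicalPublicGood(seed=1)
next_state, r = env.step([EcologicalPublicGood.COOPERATE, EcologicalPublicGood.DEFECT])
print(next_state, r)
```

Keeping the transition and reward logic in separate methods makes it easy to check each against the lecture's definitions before any learner is attached.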
# ...
Let two agents learn in it and visualize the learning process.
# ...
Briefly discuss your findings.