Ex | Individual Learning

import numpy as np
import sympy as sp
import pandas as pd
import matplotlib.pyplot as plt
from copy import deepcopy

Task 1 | Learning the risky policy

In the lecture, we explored how the agent learns a cautious policy within the risk-reward dilemma. Investigate the learning process for a parameter combination that makes the risky policy optimal (DiscountFactor=0.6, CollapseProbability=0.1, RecoveryProbability=0.1, SafeReward=0.5, RiskyReward=1.0, DegradedReward=0.0). Which settings of the learning parameters, such as the learning rate and the choice intensity, allow the agent to learn the risky policy consistently?
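
As a reminder of how the two learning parameters enter, here is a minimal sketch. It assumes the learner from the lecture is a temporal-difference learner with a softmax (Boltzmann) policy. The function names softmax_policy and td_update and the example value estimates are hypothetical, and the lecture's update rule may scale rewards differently.

```python
import numpy as np

def softmax_policy(q_values, choice_intensity):
    """Boltzmann action probabilities; a higher intensity gives greedier choices."""
    # Subtract the maximum before exponentiating for numerical stability.
    z = choice_intensity * (q_values - np.max(q_values))
    weights = np.exp(z)
    return weights / weights.sum()

def td_update(q_values, action, reward, next_value, learning_rate, discount_factor):
    """Return a copy of q_values after one temporal-difference step (assumed form)."""
    updated = q_values.copy()
    td_error = reward + discount_factor * next_value - q_values[action]
    updated[action] += learning_rate * td_error
    return updated

# Hypothetical value estimates for the actions [safe, risky].
q = np.array([0.5, 1.0])
for choice_intensity in (0.1, 1.0, 10.0, 100.0):
    print(choice_intensity, softmax_policy(q, choice_intensity))
```

With a low choice intensity the agent keeps exploring both actions almost uniformly, whereas a high intensity commits it quickly to whichever action currently looks better. The learning rate controls how strongly a single experienced reward shifts the value estimate.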

# ...

How does the learning process change if you change the transition probabilities to CollapseProbability=0.05, RecoveryProbability=0.005?
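
To get a feeling for the time scales involved, the following sketch compares the expected waiting times for both settings. It assumes that collapse and recovery each occur with a fixed probability per time step (collapse only while the agent plays the risky action in the prosperous state), so that waiting times are geometrically distributed with mean 1/p. The helper expected_steps is a hypothetical name.

```python
def expected_steps(probability):
    """Mean number of steps until an event with a fixed per-step probability occurs."""
    return 1.0 / probability

settings = {
    "Task 1 baseline": {"CollapseProbability": 0.1, "RecoveryProbability": 0.1},
    "rare transitions": {"CollapseProbability": 0.05, "RecoveryProbability": 0.005},
}

for name, params in settings.items():
    steps_to_collapse = expected_steps(params["CollapseProbability"])
    steps_to_recover = expected_steps(params["RecoveryProbability"])
    print(f"{name}: ~{steps_to_collapse:.0f} risky steps until collapse, "
          f"~{steps_to_recover:.0f} steps until recovery")
```

Under these assumptions, the second setting keeps the agent in the degraded state for roughly 200 steps on average, compared with about 10 in the baseline, which is worth keeping in mind when choosing the length of a learning run.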

# ...

Task 2 | Ecological public good

Implement the ecological public good from Lecture 03.03 as a reinforcement learning environment. Ensure your EcologicalPublicGood class inherits from the base Environment class.
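
As a possible starting point, here is a rough sketch of the transition probabilities only. It rests on several assumptions that should be checked against Lecture 03.03: two states (degraded, prosperous), two actions per agent (cooperate, defect), a collapse probability proportional to the share of defectors, and a recovery probability that is independent of the actions. The function name, the parameter names, their default values, and the index conventions are all hypothetical, and the sketch deliberately leaves out the rewards and the Environment base class interface.

```python
import itertools
import numpy as np

DEGRADED, PROSPEROUS = 0, 1   # assumed state indices
COOPERATE, DEFECT = 0, 1      # assumed action indices

def transition_tensor(n_agents=2, collapse_leverage=0.2, recovery_probability=0.1):
    """Build T[s, a_1, ..., a_N, s'] for the assumed two-state model."""
    shape = (2,) + (2,) * n_agents + (2,)
    T = np.zeros(shape)
    for joint_action in itertools.product((COOPERATE, DEFECT), repeat=n_agents):
        # Assumption: collapse becomes more likely the more agents defect.
        defector_share = sum(a == DEFECT for a in joint_action) / n_agents
        p_collapse = collapse_leverage * defector_share
        T[(PROSPEROUS, *joint_action, DEGRADED)] = p_collapse
        T[(PROSPEROUS, *joint_action, PROSPEROUS)] = 1.0 - p_collapse
        # Assumption: recovery does not depend on the agents' actions.
        T[(DEGRADED, *joint_action, PROSPEROUS)] = recovery_probability
        T[(DEGRADED, *joint_action, DEGRADED)] = 1.0 - recovery_probability
    return T

T = transition_tensor()
# Every (state, joint action) pair must define a probability distribution.
assert np.allclose(T.sum(axis=-1), 1.0)
```

A tensor of this shape could then be returned by the corresponding method of your EcologicalPublicGood class, alongside a reward tensor of matching dimensions.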

# ...

Let two agents learn in it and visualize the learning process.

# ...

Briefly discuss your findings.