Ex | Individual Learning

Open the latest version on the web, on GitHub, or in Google Colab

import numpy as np
import sympy as sp
import pandas as pd
import matplotlib.pyplot as plt
from copy import deepcopy

Learning the risky policy

In the lecture, we explored how the agent learns a cautious policy in the risk-reward dilemma. Investigate the learning process for a parameter combination that makes the risky policy optimal (DiscountFactor=0.6, CollapseProbability=0.1, RecoveryProbability=0.1, SafeReward=0.5, RiskyReward=1.0, DegradedReward=0.0). Which parameters of the learning process, such as the learning rate and the choice intensity, allow the agent to learn the risky policy consistently?

# ...
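
As one possible starting point, the sketch below runs tabular Q-learning with softmax (Boltzmann) action selection on a two-state version of the risk-reward dilemma. It is an illustration under stated assumptions, not necessarily the learner from the lecture: the state and action layout, the reward assignment, the helper names `build_env`, `softmax_policy` and `run_q_learning`, and the choice of Q-learning itself are all assumptions. Here `alpha` plays the role of the learning rate and `beta` the role of the choice intensity.

```python
import numpy as np

def build_env(pc, pr, rs, rr, rd):
    """Return transition tensor T[s, a, s'] and reward tensor R[s, a, s'].

    Assumed layout (not specified in this exercise):
    states 0 = prosperous, 1 = degraded; actions 0 = safe, 1 = risky.
    """
    T = np.zeros((2, 2, 2))
    T[0, 0] = [1.0, 0.0]        # safe action keeps the system prosperous
    T[0, 1] = [1.0 - pc, pc]    # risky action may trigger a collapse
    T[1, 0] = [pr, 1.0 - pr]    # safe action may lead to recovery
    T[1, 1] = [0.0, 1.0]        # risky action keeps the system degraded
    # Assumption: every transition that touches the degraded state pays rd.
    R = np.full((2, 2, 2), float(rd))
    R[0, 0, 0] = rs
    R[0, 1, 0] = rr
    return T, R

def softmax_policy(q_row, beta):
    """Action probabilities proportional to exp(beta * Q)."""
    z = beta * (q_row - q_row.max())  # shift by the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def run_q_learning(T, R, gamma, alpha, beta, steps=2000, seed=0):
    """Tabular Q-learning with softmax exploration, starting in the prosperous state."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((2, 2))
    s = 0
    for _ in range(steps):
        # With two actions and two states, a single uniform draw suffices.
        a = int(rng.random() < softmax_policy(Q[s], beta)[1])
        s_next = int(rng.random() < T[s, a, 1])
        r = R[s, a, s_next]
        td_error = r + gamma * Q[s_next].max() - Q[s, a]
        Q[s, a] += alpha * td_error
        s = s_next
    return Q

T, R = build_env(pc=0.1, pr=0.1, rs=0.5, rr=1.0, rd=0.0)

# Arbitrary illustrative grid of (learning rate, choice intensity) pairs.
for alpha, beta in [(0.05, 1.0), (0.05, 20.0), (0.5, 20.0)]:
    risky_prob = []
    for seed in range(10):
        Q = run_q_learning(T, R, gamma=0.6, alpha=alpha, beta=beta, seed=seed)
        risky_prob.append(softmax_policy(Q[0], beta)[1])
    print(f"alpha={alpha}, beta={beta}: "
          f"mean P(risky | prosperous) = {np.mean(risky_prob):.2f}, "
          f"min over seeds = {np.min(risky_prob):.2f}")
```

Comparing the mean and the minimum over seeds gives a first impression of how consistently the risky action is learned for each pair. The grid, the number of seeds, and the number of steps are placeholders meant to be extended.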

How does the learning process change when the transition probabilities are set to CollapseProbability=0.05 and RecoveryProbability=0.005?

# ...
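
Before rerunning any learning experiment, it can help to see what the new transition probabilities do to the decision problem itself. The sketch below uses value iteration to compute optimal action values for both parameter sets, together with the expected number of steps until a collapse (under the risky action) and until a recovery (under the safe action). The two-state, two-action layout, the reward assignment, and the helper name `optimal_q` are assumptions made for illustration and may differ from the model used in the lecture.

```python
import numpy as np

def optimal_q(pc, pr, rs, rr, rd, gamma, iters=500):
    """Value iteration on the assumed two-state, two-action model.

    States: 0 = prosperous, 1 = degraded. Actions: 0 = safe, 1 = risky.
    """
    # T[s, a, s']: transition probabilities
    T = np.array([[[1.0, 0.0], [1.0 - pc, pc]],
                  [[pr, 1.0 - pr], [0.0, 1.0]]])
    # Assumption: every transition that touches the degraded state pays rd.
    R = np.full((2, 2, 2), float(rd))
    R[0, 0, 0], R[0, 1, 0] = rs, rr
    Q = np.zeros((2, 2))
    for _ in range(iters):
        V = Q.max(axis=1)  # greedy state values
        Q = (T * (R + gamma * V[None, None, :])).sum(axis=2)
    return Q

results = {}
for pc, pr in [(0.1, 0.1), (0.05, 0.005)]:
    Q = optimal_q(pc, pr, rs=0.5, rr=1.0, rd=0.0, gamma=0.6)
    results[(pc, pr)] = Q
    print(f"pc={pc}, pr={pr}")
    print(f"  Q(prosperous, safe)  = {Q[0, 0]:.3f}")
    print(f"  Q(prosperous, risky) = {Q[0, 1]:.3f}")
    print(f"  expected steps until collapse (risky action): {1 / pc:.0f}")
    print(f"  expected steps until recovery (safe action):  {1 / pr:.0f}")
```

Under these assumptions, the expected waiting time for a recovery rises from 10 to 200 steps, so a learning agent that collapses once may spend most of a short run in the degraded state. Whether this slows or destabilizes learning in practice is what the exercise asks you to investigate.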