import numpy as np
import sympy as sp
import pandas as pd
import matplotlib.pyplot as plt
from copy import deepcopy
Ex | Individual Learning
Learning the risky policy
In the lecture, we explored how the agent learns a cautious policy within the risk-reward dilemma. Investigate the learning process for a parameter combination that makes the risky policy optimal (DiscountFactor=0.6, CollapseProbability=0.1, RecoveryProbability=0.1, SafeReward=0.5, RiskyReward=1.0, DegradedReward=0.0). Which parameters of the learning process, such as the learning rate and the choice intensity, allow the agent to learn the risky policy consistently?
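Before experimenting with the learner, it can help to check that the risky policy really is optimal for these parameter values. The following is a minimal sketch, not the lecture's implementation. The state and action indexing, the transition tensor `T`, the reward tensor `R`, and the helper `state_values` are all assumptions made for illustration: the risky action in the prosperous state collapses the environment with probability CollapseProbability, only the cautious action in the degraded state can trigger recovery, and every transition into or out of the degraded state yields DegradedReward.

```python
import numpy as np

# Parameter values taken from the exercise text.
discount_factor = 0.6
collapse_probability = 0.1
recovery_probability = 0.1
safe_reward, risky_reward, degraded_reward = 0.5, 1.0, 0.0

# Assumed indexing (hypothetical, not confirmed by the exercise text).
PROSPEROUS, DEGRADED = 0, 1
CAUTIOUS, RISKY = 0, 1

# T[s, a, s_next]: assumed transition probabilities.
T = np.zeros((2, 2, 2))
T[PROSPEROUS, CAUTIOUS, PROSPEROUS] = 1.0
T[PROSPEROUS, RISKY, DEGRADED] = collapse_probability
T[PROSPEROUS, RISKY, PROSPEROUS] = 1.0 - collapse_probability
T[DEGRADED, CAUTIOUS, PROSPEROUS] = recovery_probability
T[DEGRADED, CAUTIOUS, DEGRADED] = 1.0 - recovery_probability
T[DEGRADED, RISKY, DEGRADED] = 1.0

# R[s, a, s_next]: assumed rewards; anything touching the degraded state
# yields the degraded reward.
R = np.full((2, 2, 2), degraded_reward)
R[PROSPEROUS, CAUTIOUS, PROSPEROUS] = safe_reward
R[PROSPEROUS, RISKY, PROSPEROUS] = risky_reward


def state_values(policy):
    """Solve V = r_pi + gamma * T_pi @ V for a deterministic policy.

    `policy` holds one action index per state. Values are not
    normalized by (1 - gamma).
    """
    states = np.arange(2)
    T_pi = T[states, policy]                       # rows: s, columns: s_next
    r_pi = (T_pi * R[states, policy]).sum(axis=1)  # expected reward per state
    return np.linalg.solve(np.eye(2) - discount_factor * T_pi, r_pi)


# Both policies are cautious in the degraded state so that recovery is possible.
v_cautious = state_values(np.array([CAUTIOUS, CAUTIOUS]))
v_risky = state_values(np.array([RISKY, CAUTIOUS]))
print("cautious policy values:", v_cautious)
print("risky policy values:   ", v_risky)
```

Under these assumptions, the prosperous-state value of the risky policy exceeds that of the cautious policy, which is consistent with the exercise's claim that the risky policy is optimal for this parameter combination.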
# ...
How does the learning process change if you change the transition probabilities to CollapseProbability=0.05 and RecoveryProbability=0.005?
# ...
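As a possible starting point for both questions, here is a minimal sketch of a tabular learner whose learning rate and choice intensity can be varied. It swaps in plain Q-learning with softmax action selection, which may differ from the learner used in the lecture. The function `simulate_learning`, its arguments, and the environment dynamics inside it are hypothetical: they assume that the risky action collapses the prosperous state with CollapseProbability and that only the cautious action can trigger recovery.

```python
import numpy as np

PROSPEROUS, DEGRADED = 0, 1   # assumed state indices
CAUTIOUS, RISKY = 0, 1        # assumed action indices


def simulate_learning(collapse_probability, recovery_probability,
                      learning_rate=0.1, choice_intensity=5.0,
                      discount_factor=0.6, n_steps=5000, seed=0):
    """Run one Q-learning agent and return (Q, P(risky | prosperous))."""
    rng = np.random.default_rng(seed)
    safe_reward, risky_reward, degraded_reward = 0.5, 1.0, 0.0
    Q = np.zeros((2, 2))      # Q[state, action]
    state = PROSPEROUS

    def softmax(values):
        # Higher choice intensity makes the choice more deterministic.
        logits = choice_intensity * values
        weights = np.exp(logits - logits.max())
        return weights / weights.sum()

    for _ in range(n_steps):
        action = rng.choice(2, p=softmax(Q[state]))

        # Assumed environment dynamics of the risk-reward dilemma.
        if state == PROSPEROUS:
            if action == CAUTIOUS:
                next_state, reward = PROSPEROUS, safe_reward
            elif rng.random() < collapse_probability:
                next_state, reward = DEGRADED, degraded_reward
            else:
                next_state, reward = PROSPEROUS, risky_reward
        else:
            recovered = (action == CAUTIOUS
                         and rng.random() < recovery_probability)
            next_state = PROSPEROUS if recovered else DEGRADED
            reward = degraded_reward

        # Q-learning temporal-difference update.
        td_error = (reward + discount_factor * Q[next_state].max()
                    - Q[state, action])
        Q[state, action] += learning_rate * td_error
        state = next_state

    return Q, softmax(Q[PROSPEROUS])[RISKY]


# Compare the two transition-probability settings from the exercise.
for pc, pr in [(0.1, 0.1), (0.05, 0.005)]:
    for seed in range(3):
        _, p_risky = simulate_learning(pc, pr, seed=seed)
        print(f"pc={pc}, pr={pr}, seed={seed}: "
              f"P(risky | prosperous) = {p_risky:.2f}")
```

Running several seeds per setting gives a rough impression of how consistently the risky policy is learned. Sweeping `learning_rate` and `choice_intensity` in the same loop is one way to approach the first question.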