Intrinsic Motivation — Curiosity & Empowerment
Two complementary drives for self-motivated learning, independent of external reward.
Curiosity (Prediction Error)
The curiosity reward is the normalized prediction error (PE): the creature is rewarded for encountering surprising states. Normalizing by a running mean and variance prevents habituation, so only unusually large errors register as surprising. Boredom detection triggers a forced exploration burst after 200 consecutive steps without novelty.
```
normalized_PE = (PE - running_mean) / sqrt(running_var)

if normalized_PE > 0.1:
    intrinsic_reward = min(PE, 1.0)

if boredom_counter > 200:
    intrinsic_reward = 0.5  # forced exploration
```
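The pseudocode above can be fleshed out into a runnable sketch. The exponential-moving-average update rate (0.01), the variance floor (1e-8), and the 0.1 novelty threshold are assumptions; only the normalization formula, the reward cap, and the 200-step boredom rule come from the text.

```python
import math

class CuriosityDrive:
    """Sketch of the normalized-prediction-error curiosity reward."""

    def __init__(self, boredom_steps=200, max_reward=1.0, threshold=0.1):
        self.running_mean = 0.0
        self.running_var = 1.0
        self.boredom_counter = 0
        self.boredom_steps = boredom_steps
        self.max_reward = max_reward
        self.threshold = threshold  # assumed novelty cutoff from the 0.1 above

    def compute_intrinsic_reward(self, pe: float) -> float:
        # Normalize against running statistics so persistent surprise habituates.
        normalized_pe = (pe - self.running_mean) / math.sqrt(self.running_var + 1e-8)
        # Exponential moving average keeps the statistics fresh (assumed scheme).
        self.running_mean += 0.01 * (pe - self.running_mean)
        self.running_var += 0.01 * ((pe - self.running_mean) ** 2 - self.running_var)
        if normalized_pe > self.threshold:
            self.boredom_counter = 0  # novelty found: reset boredom
            return min(pe, self.max_reward)
        self.boredom_counter += 1
        if self.boredom_counter > self.boredom_steps:
            return 0.5  # forced exploration burst
        return 0.0
```

A surprising first observation earns the capped reward; a long run of unsurprising ones eventually triggers the boredom burst.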
Empowerment (Action→State MI)
Empowerment measures how much the creature’s actions influence the world. High empowerment means “my actions have predictable consequences.” It is approximated here by comparing the variance of resulting states under high versus low action magnitudes.
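One way to realize this approximation is sketched below. Splitting the recorded actions at the median magnitude and scoring the clipped variance gap are assumptions; the text only specifies comparing state variance under high versus low action magnitudes.

```python
import statistics

class EmpowermentDrive:
    """Sketch of the variance-comparison empowerment approximation."""

    def __init__(self):
        self.pairs = []  # (action_magnitude, next_state) history

    def record(self, action_magnitude: float, next_state: float) -> None:
        """Store one action-state pair for later scoring."""
        self.pairs.append((action_magnitude, next_state))

    def compute_reward(self) -> float:
        if len(self.pairs) < 4:
            return 0.0  # not enough history to compare groups
        mags = sorted(m for m, _ in self.pairs)
        median = mags[len(mags) // 2]  # assumed split point
        high = [s for m, s in self.pairs if m >= median]
        low = [s for m, s in self.pairs if m < median]
        if len(high) < 2 or len(low) < 2:
            return 0.0
        # If strong actions spread the resulting states more than weak ones
        # do, the creature's actions are shaping the world: high empowerment.
        gap = statistics.pvariance(high) - statistics.pvariance(low)
        return max(0.0, min(gap, 1.0))  # clip into [0, 1]
```

When high-magnitude actions scatter the next states while low-magnitude actions leave them unchanged, the score saturates at 1.0.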
Combined Reward
```
total = (1 - α) × extrinsic + α × (curiosity + empowerment)
```

α = 0.3 by default (30% intrinsic).
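The blend is a one-liner; the function name here is illustrative, but the formula and the 0.3 default follow the text.

```python
def combined_reward(extrinsic: float, curiosity: float,
                    empowerment: float, alpha: float = 0.3) -> float:
    """Blend extrinsic and intrinsic reward; alpha weights the intrinsic share."""
    return (1 - alpha) * extrinsic + alpha * (curiosity + empowerment)

# With extrinsic=1.0, curiosity=0.5, empowerment=0.2 and the default alpha:
# 0.7 * 1.0 + 0.3 * 0.7, i.e. roughly 0.91
```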
References
- Schmidhuber (1991). Curious model-building control systems. IJCNN
- Oudeyer & Kaplan (2007). Intrinsic motivation systems. Frontiers in Neurorobotics
- Klyubin, Polani & Nehaniv (2005). Empowerment. Adaptive Behavior
API Reference
CuriosityDrive(config: CuriosityConfig)
compute_intrinsic_reward(prediction_error) → float
Normalized PE reward with boredom detection.
get_neuromodulator_signals() → dict
Returns novelty (0–1), boredom (0–1) for NE/DA modulation.
EmpowermentDrive(config: EmpowermentConfig)
record(action, next_state)
Store action-state pair.
compute_reward() → float
Weighted empowerment score.
CuriosityConfig
```
alpha: 0.3
boredom_steps: 200
max_reward: 1.0
```
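A minimal sketch of how these defaults could be carried as a config object; the field types and the use of a dataclass are assumptions, while the names and values mirror the list above.

```python
from dataclasses import dataclass

@dataclass
class CuriosityConfig:
    """Defaults for the curiosity drive (types assumed)."""
    alpha: float = 0.3        # intrinsic weight in the combined reward
    boredom_steps: int = 200  # steps without novelty before forced exploration
    max_reward: float = 1.0   # cap on the curiosity reward
```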