🎭 Actor-Critic Architectures
Combining policy and value learning for powerful reinforcement learning
Introduction to Actor-Critic
🎯 What is Actor-Critic?
Actor-Critic combines the best of policy gradient and value-based methods. The actor learns the policy (what to do), while the critic evaluates actions by estimating value functions. This synergy reduces variance and accelerates learning.
The critic provides a baseline that reduces the variance of policy gradient updates, making learning more stable and sample-efficient than pure policy gradients.
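To see why a baseline helps, here is a toy numerical sketch (the numbers and variable names are made up for illustration, not from the lesson): the policy-gradient term is ∇log π multiplied by either the raw return G or the advantage G − b, and subtracting a critic-style baseline b ≈ E[G] shrinks the multiplier's magnitude.

```python
import random
import statistics

random.seed(0)

# Illustrative sampled return estimates G for one state: positive and noisy.
returns = [random.gauss(10.0, 2.0) for _ in range(1000)]

# Critic-style baseline b ~= E[G]; the gradient multiplier becomes G - b.
baseline = statistics.fmean(returns)
advantages = [g - baseline for g in returns]

# The update magnitude is governed by the multiplier's second moment,
# which is far smaller for the advantage than for the raw return.
sm_raw = statistics.fmean(g * g for g in returns)
sm_adv = statistics.fmean(a * a for a in advantages)
print(sm_adv < sm_raw)  # True: the baseline shrinks the update magnitude
```

Because the baseline does not depend on the action, subtracting it leaves the gradient unbiased while reducing its variance.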
Actor
Learns the policy π(a|s), mapping states to action probabilities
- Outputs an action distribution
- Updated via the policy gradient
- Guided by the critic's feedback
Critic
Estimates a value function, V(s) or Q(s,a), to judge action quality
- Evaluates states or state-action pairs
- Updated via TD learning
- Provides advantage estimates
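The two components can be sketched for a small discrete problem as follows; this is a minimal illustration, and the names `n_states`, `n_actions`, `prefs`, and `values` are assumptions, not from the lesson.

```python
import math
import random

# Actor parameters: one preference per (state, action) pair.
n_states, n_actions = 3, 2
prefs = [[0.0] * n_actions for _ in range(n_states)]
# Critic parameters: a tabular estimate of V(s).
values = [0.0] * n_states

def policy(s):
    """Actor: softmax over action preferences gives pi(a|s)."""
    exps = [math.exp(p) for p in prefs[s]]
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(s):
    """Draw an action from the actor's current distribution."""
    return random.choices(range(n_actions), weights=policy(s))[0]

print(policy(0))  # untrained actor is uniform: [0.5, 0.5]
```

In deep RL the table of preferences and the value table are replaced by neural networks, but the division of labor is the same: the actor outputs a distribution, the critic outputs a scalar value.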
🔄 The Actor-Critic Loop
1. Sample an action a from the policy π(a|s) given the current state s
2. Execute the action; observe reward r and next state s'
3. Compute the TD error: δ = r + γV(s') − V(s)
4. Update both components: the critic learns V(s), and the actor improves the policy using the advantage A(s,a) ≈ δ
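The four steps above can be sketched as one-step actor-critic on a hypothetical two-state chain; the environment, learning rates, and seed are all assumptions made for illustration.

```python
import math
import random

random.seed(1)

# Illustrative environment: action 1 in state 0 moves to state 1 and
# pays reward 1; every other transition pays 0 and returns to state 0.
GAMMA, ALPHA_V, ALPHA_PI = 0.9, 0.1, 0.1
prefs = [[0.0, 0.0], [0.0, 0.0]]  # actor: action preferences per state
values = [0.0, 0.0]               # critic: tabular V(s)

def policy(s):
    """Softmax over action preferences: pi(a|s)."""
    exps = [math.exp(p) for p in prefs[s]]
    z = sum(exps)
    return [e / z for e in exps]

s = 0
for _ in range(2000):
    probs = policy(s)
    a = random.choices([0, 1], weights=probs)[0]      # 1. sample action
    if s == 0 and a == 1:                             # 2. act, observe r, s'
        r, s_next = 1.0, 1
    else:
        r, s_next = 0.0, 0
    delta = r + GAMMA * values[s_next] - values[s]    # 3. TD error
    values[s] += ALPHA_V * delta                      # 4a. critic update
    for b in range(2):                                # 4b. actor update
        grad = (1.0 if b == a else 0.0) - probs[b]    # d log pi / d pref
        prefs[s][b] += ALPHA_PI * delta * grad
    s = s_next

print(policy(0)[1] > 0.5)  # True: actor now prefers the rewarding action
```

Note that the TD error δ serves double duty: it is the critic's learning signal and, used as an advantage estimate, the weight on the actor's policy-gradient step.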
✅ Advantages
- Lower variance than REINFORCE
- More sample-efficient learning
- Online, incremental updates
- Works with continuous action spaces
⚠️ Challenges
- Two networks to train simultaneously
- The critic's value estimates introduce bias
- Sensitive to hyperparameters
- Training can be unstable