Value functions estimate the expected return, i.e. the expected cumulative (usually discounted) reward.
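A minimal sketch of the "return" being averaged, assuming a discount factor gamma in [0, 1] and a finite list of rewards from one episode. The function name is illustrative, not from any library.

```python
def discounted_return(rewards, gamma):
    """Return G = r_1 + gamma*r_2 + gamma^2*r_3 + ... for one episode."""
    g = 0.0
    # Accumulate backwards using G_t = r_{t+1} + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, discounted_return([1, 1, 1], 0.5) gives 1 + 0.5 + 0.25 = 1.75. A value function is the expectation of this quantity over the randomness in the policy and the environment.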
State value V(s): Expected return starting from state s, following policy π.
Action value Q(s,a): Expected return when taking action a in state s and following π thereafter.
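The two definitions are linked: averaging Q over the policy's action probabilities recovers V, i.e. V(s) = Σ_a π(a|s) Q(s,a). Below is a small sketch of that identity; the states, actions, Q-values and uniform policy are hypothetical numbers chosen only for illustration.

```python
# Hypothetical Q-values for a two-state, two-action example
Q = {
    ("A", "left"): 6.525, ("A", "right"): 7.975,
    ("B", "left"): 6.525, ("B", "right"): 8.975,
}
# Uniform random policy: pi(a|s) = 0.5 for every action
POLICY = {s: {"left": 0.5, "right": 0.5} for s in ("A", "B")}


def v_from_q(q, policy, state):
    """V(s) = sum over actions a of pi(a|s) * Q(s, a)."""
    return sum(prob * q[(state, a)] for a, prob in policy[state].items())
```

Here v_from_q(Q, POLICY, "A") is approximately 7.25 and v_from_q(Q, POLICY, "B") approximately 7.75.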
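Bellman equation: Recursive relationship. A state's value equals the expected immediate reward plus the discounted value of the next state: V(s) = Σ_a π(a|s) Σ_{s',r} p(s',r|s,a) [r + γ V(s')], where p is the environment's transition model and γ is the discount factor.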
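One way to use the Bellman equation is iterative policy evaluation: start from V = 0 and repeatedly apply the right-hand side as an update until the values stop changing. A minimal sketch, assuming a hypothetical deterministic two-state MDP (so the sum over s' and r collapses to one term), a uniform random policy, and γ = 0.9.

```python
GAMMA = 0.9
STATES = ["A", "B"]
ACTIONS = ["left", "right"]
# Hypothetical deterministic MDP: (state, action) -> (next_state, reward)
TRANSITIONS = {
    ("A", "left"): ("A", 0.0),
    ("A", "right"): ("B", 1.0),
    ("B", "left"): ("A", 0.0),
    ("B", "right"): ("B", 2.0),
}
# Uniform random policy pi(a|s)
POLICY = {s: {a: 0.5 for a in ACTIONS} for s in STATES}


def policy_evaluation(gamma=GAMMA, theta=1e-10):
    """Apply the Bellman expectation update until the largest change < theta."""
    v = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            new_v = 0.0
            for a in ACTIONS:
                s_next, r = TRANSITIONS[(s, a)]
                new_v += POLICY[s][a] * (r + gamma * v[s_next])
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v  # in-place update; still converges for gamma < 1
        if delta < theta:
            return v
```

Solving the two linear Bellman equations by hand gives V(A) = 7.25 and V(B) = 7.75, which is where policy_evaluation() converges.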
Interview question: "Difference between V and Q?"
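V evaluates states. Q evaluates state-action pairs. Q is more useful for action selection: the greedy action is simply argmax_a Q(s,a), whereas picking an action from V alone requires a model of the environment (transitions and rewards) to look one step ahead.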
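A sketch of that difference, assuming a hypothetical deterministic two-state MDP with made-up V and Q values for a uniform random policy (γ = 0.9). Greedy selection from Q needs only the table; greedy selection from V also needs TRANSITIONS, the model.

```python
GAMMA = 0.9
ACTIONS = ["left", "right"]
# Hypothetical model: (state, action) -> (next_state, reward)
TRANSITIONS = {
    ("A", "left"): ("A", 0.0),
    ("A", "right"): ("B", 1.0),
    ("B", "left"): ("A", 0.0),
    ("B", "right"): ("B", 2.0),
}
# Hypothetical values for a uniform random policy on this MDP
V = {"A": 7.25, "B": 7.75}
Q = {
    ("A", "left"): 6.525, ("A", "right"): 7.975,
    ("B", "left"): 6.525, ("B", "right"): 8.975,
}


def greedy_from_q(q, state):
    # Model-free: just take argmax_a Q(s, a)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def greedy_from_v(v, state, gamma=GAMMA):
    # Needs the model to compute r + gamma * V(s') for each action
    def lookahead(a):
        s_next, r = TRANSITIONS[(state, a)]
        return r + gamma * v[s_next]
    return max(ACTIONS, key=lookahead)
```

Both functions return "right" in each state here, but only greedy_from_q works when the transitions are unknown, which is why model-free methods such as Q-learning learn Q rather than V.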