On Boltzmann Rationality
Modeling Decision-Making
A decision-maker selects an action \(a_i\) from a set of \(n\) actions \(\{a_1,\dots,a_n\}\), where each action has an associated utility \(U(a_i)\).
A driver is approaching an intersection where the traffic light has just turned yellow. Accelerating gives a 98% chance of crossing safely (+100) and a 2% chance of causing an accident (-200); stopping is safe with certainty (+80). The driver must choose between:
- \(a_1\): Accelerate to make it through the light: \[ \mathbb{E}[U(a_1)] = (0.98 \cdot 100) + (0.02 \cdot -200) = 94. \]
- \(a_2\): Decelerate and stop at the light: \[ \mathbb{E}[U(a_2)] = 1.0 \cdot 80 = 80. \]
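The two expected utilities can be checked with a few lines of Python (a minimal sketch using the probabilities and payoffs from the example above):

```python
# Expected utility of accelerating: 98% safe crossing (+100), 2% accident (-200)
eu_accelerate = 0.98 * 100 + 0.02 * -200

# Expected utility of stopping: safe with certainty (+80)
eu_stop = 1.0 * 80

print(eu_accelerate)  # ≈ 94
print(eu_stop)        # ≈ 80
```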
Perfect Rationality
In perfect rationality, one always chooses the action with the highest utility: \[ p(a_i) = \begin{cases} 1 & \text{if } i = \arg\max_{j} U(a_j), \\ 0 & \text{otherwise.} \end{cases} \]
Under perfect rationality, the driver would always choose \(a_1\) (Accelerate). However, human decision-making often involves uncertainty, emotions, or incomplete information, leading to different choices.
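This hard-max rule can be sketched in a couple of lines (the function name is illustrative, not from the source):

```python
def perfectly_rational_choice(utilities):
    """Deterministically pick the index of the highest-utility action."""
    return max(range(len(utilities)), key=lambda i: utilities[i])

# The driver example: U(a1) = 94, U(a2) = 80
print(perfectly_rational_choice([94, 80]))  # 0, i.e. a1 (Accelerate)
```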
Boltzmann Rationality
The action probability is given by: \[ p(a_i) = \frac{\exp(\beta\cdot U(a_i))}{\sum_{j=1}^n\exp(\beta\cdot U(a_j))}. \] Here \(\beta \ge 0\) is the rationality coefficient:
- \(\beta\to\infty\): Perfect rationality.
- \(\beta\to 0\): Uniform random choice.
The probability of the driver choosing \(a_1\) (Accelerate) grows exponentially with its utility, but \(a_2\) (Stop) retains a nonzero probability for any finite \(\beta\).
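To make the role of \(\beta\) concrete, the sketch below computes the driver's choice probabilities for a few values of \(\beta\) (the specific values are illustrative, not from the source):

```python
import math

def boltzmann(utilities, beta):
    """Boltzmann (softmax) choice probabilities: p_i proportional to exp(beta * U_i)."""
    m = max(beta * u for u in utilities)                  # subtract max for numerical stability
    weights = [math.exp(beta * u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

U = [94, 80]  # expected utilities of accelerate / stop from the example
for beta in (0.0, 0.1, 1.0):
    p = boltzmann(U, beta)
    print(f"beta={beta}: p(accelerate)={p[0]:.3f}, p(stop)={p[1]:.3f}")
# beta=0.0: p(accelerate)=0.500, p(stop)=0.500   (uniform random)
# beta=0.1: p(accelerate)=0.802, p(stop)=0.198
# beta=1.0: p(accelerate)=1.000, p(stop)=0.000   (near-perfect rationality)
```

Note how the limits match the two bullet points above: \(\beta \to 0\) gives a uniform choice, and large \(\beta\) approaches the deterministic maximizer.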
Analogy to Boltzmann Distribution
Boltzmann rationality is analogous to the Boltzmann distribution in statistical mechanics: \[ p_i = \frac{\exp\left(-\frac{\epsilon_i}{\kappa T}\right)}{\sum_{j=1}^n\exp\left(-\frac{\epsilon_j}{\kappa T}\right)}, \] where \(\epsilon_i\) is the energy of state \(i\), \(\kappa\) is the Boltzmann constant, and \(T\) is the temperature. The rationality coefficient \(\beta\) plays the role of the inverse temperature \(1/(\kappa T)\); the sign differs because physical systems favor low energy, whereas decision-makers favor high utility.
History
- Duncan Luce (1959) derived the logit formula from assumptions about the characteristics of choice probabilities, namely the "independence of irrelevant alternatives" property [1–2].
- Jacob Marschak (1960) showed that these axioms imply the model is consistent with utility maximization.
- Tony Marley developed the relation of the logit formula to the distribution of unobserved utility.
- Luce and Suppes (1965) showed that the extreme value distribution leads to the logit formula.
- Daniel McFadden (1974) completed the analysis by showing that the logit formula implies that unobserved utility is distributed extreme value [3].
- Ziebart et al. proposed the Boltzmann model of noisily rational behavior in trajectory space [4].
See Chapter 3 (Logit) of [5] for an overview.
Interpretation 1: Solving Max-Entropy Problem
\[ \max_{p(a_i)} \quad \mathbb{E}_{p}[U(a_i)] + \frac{1}{\beta} H(p) \] \[ \text{s.t.} \quad \sum_{i=1}^n p(a_i) = 1, \] where:
- \(\mathbb{E}_{p}[U(a_i)] = \sum_{i=1}^n p(a_i) U(a_i)\): Expected utility under the action distribution.
- \(H(p) = -\sum_{i=1}^n p(a_i) \log p(a_i)\): Entropy of the probability distribution \(p(a_i)\).
This optimization balances exploitation (maximizing expected utility) against exploration (keeping the distribution spread out), with \(1/\beta\) weighting the entropy term. Introducing a Lagrange multiplier for the normalization constraint and setting the derivative to zero recovers the Boltzmann rationality equation.
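As a sketch of that derivation, form the Lagrangian with multiplier \(\lambda\) for the normalization constraint (entropy weighted by \(1/\beta\)) and find its stationary point:

\[
\begin{aligned}
\mathcal{L}(p,\lambda)
  &= \sum_{i=1}^n p(a_i)\,U(a_i)
   - \frac{1}{\beta}\sum_{i=1}^n p(a_i)\log p(a_i)
   + \lambda\Big(\sum_{i=1}^n p(a_i) - 1\Big),\\
\frac{\partial \mathcal{L}}{\partial p(a_i)}
  &= U(a_i) - \frac{1}{\beta}\big(\log p(a_i) + 1\big) + \lambda = 0
  \quad\Longrightarrow\quad
  p(a_i) \propto \exp\big(\beta\,U(a_i)\big),\\
\text{normalizing:}\quad
  p(a_i) &= \frac{\exp\big(\beta\,U(a_i)\big)}{\sum_{j=1}^n \exp\big(\beta\,U(a_j)\big)}.
\end{aligned}
\]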
Interpretation 2: Utility Perturbed by Gumbel Noise
Let each action's utility \(U(a_i)\) be perturbed by independent and identically distributed (i.i.d.) Gumbel noise \(\varepsilon_i\): \[ \tilde{U}(a_i) = U(a_i) + \varepsilon_i, \quad \varepsilon_i \sim \text{Gumbel}(0, 1). \]
Recall the Gumbel distribution \(\varepsilon_i \sim \text{Gumbel}(0,1)\): \[ \text{PDF: } f(\varepsilon_i) = e^{-\varepsilon_i} e^{-e^{-\varepsilon_i}}, \] \[ \text{CDF: } F(\varepsilon_i) = e^{-e^{-\varepsilon_i}}. \]
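A classical result is that choosing \(\arg\max_i \tilde{U}(a_i)\) under i.i.d. Gumbel(0, 1) noise yields exactly the softmax/logit probabilities. The Monte Carlo sketch below (an illustrative check, not from the source; it reuses the driver example's utilities and an arbitrary \(\beta = 0.1\)) compares the empirical choice frequencies against the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
U = np.array([94.0, 80.0])  # expected utilities of accelerate / stop
beta = 0.1                  # illustrative rationality coefficient

# Closed-form Boltzmann (softmax) probabilities: p_i proportional to exp(beta * U_i)
logits = beta * U
p_softmax = np.exp(logits - logits.max())
p_softmax /= p_softmax.sum()

# Monte Carlo: perturb each scaled utility with i.i.d. Gumbel(0, 1) noise,
# pick the argmax, and tally the choice frequencies.
n_samples = 200_000
noise = rng.gumbel(loc=0.0, scale=1.0, size=(n_samples, 2))
choices = np.argmax(beta * U + noise, axis=1)
p_empirical = np.bincount(choices, minlength=2) / n_samples

print(p_softmax)    # analytic probabilities
print(p_empirical)  # should agree to about two decimal places
```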