On Boltzmann Rationality
Modeling Decision-Making
A decision-maker selects an action $a_i$ from a set of $n$ actions $\{a_1,\dots,a_n\}$, where each action has an associated utility $U(a_i)$.
A driver is approaching an intersection where the traffic light has just turned yellow. If the driver accelerates, there is a 98% chance of safely crossing (+100) and a 2% chance of causing an accident (-200); stopping is safe with certainty (+80). The driver must choose between:
- $a_1$: Accelerate to make it through the light: $$ \mathbb{E}[U(a_1)] = (0.98 \cdot 100) + (0.02 \cdot -200) = 94. $$
- $a_2$: Decelerate and stop at the light: $$ \mathbb{E}[U(a_2)] = 1.0 \cdot 80 = 80. $$
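The two expected utilities above can be checked numerically; the probabilities and payoffs are taken directly from the example:

```python
# Numerical check of the expected utilities in the yellow-light example.

def expected_utility(outcomes):
    """Expected utility of an action given (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

# a1: accelerate -- 98% cross safely (+100), 2% accident (-200)
eu_accelerate = expected_utility([(0.98, 100), (0.02, -200)])
# a2: stop -- certain safe stop (+80)
eu_stop = expected_utility([(1.0, 80)])

print(round(eu_accelerate, 6))  # 94.0
print(eu_stop)                  # 80.0
```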
Perfect Rationality
In perfect rationality, one always chooses the action with the highest utility: \(p(a_i) = \mathbb{1}\left[a_i = \arg\max_{j} U(a_j)\right].\)
Under perfect rationality, the driver would always choose $a_1$ (Accelerate). However, human decision-making often involves uncertainty, emotions, or incomplete information, leading to different choices.
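A minimal sketch of the perfectly rational policy, applied to the expected utilities computed above (94 for accelerating, 80 for stopping):

```python
def perfectly_rational_policy(utilities):
    """Deterministic policy: probability 1 on the highest-utility action."""
    best = max(range(len(utilities)), key=lambda i: utilities[i])
    return [1.0 if i == best else 0.0 for i in range(len(utilities))]

# Index 0 = accelerate (94), index 1 = stop (80).
print(perfectly_rational_policy([94.0, 80.0]))  # [1.0, 0.0]
```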
Boltzmann Rationality
The action probability is given by: $$ p(a_i) = \frac{\exp(\beta\cdot U(a_i))}{\sum_{j=1}^n\exp(\beta\cdot U(a_j))}. $$ Here $\beta \ge 0$ is the rationality coefficient:
- $\beta\to\infty$: Perfect rationality.
- $\beta\to 0$: Uniform random choice.
The probability of the driver choosing $a_1$ (Accelerate) grows with its utility, but $a_2$ (Stop) always retains nonzero probability.
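The Boltzmann policy can be sketched directly from the formula; the example below evaluates it on the yellow-light utilities for a few values of $\beta$ (shifting utilities by their maximum before exponentiating is a standard numerical-stability trick that leaves the distribution unchanged):

```python
import math

def boltzmann_policy(utilities, beta):
    """Boltzmann-rational distribution: p(a_i) proportional to exp(beta * U(a_i)).

    Utilities are shifted by their max before exponentiating for
    numerical stability; the distribution is unchanged.
    """
    m = max(utilities)
    weights = [math.exp(beta * (u - m)) for u in utilities]
    z = sum(weights)
    return [w / z for w in weights]

# Yellow-light example: U(a1) = 94 (accelerate), U(a2) = 80 (stop).
for beta in [0.0, 0.1, 1.0]:
    print(beta, boltzmann_policy([94.0, 80.0], beta))
```

At $\beta = 0$ the policy is uniform; as $\beta$ grows, the probability mass concentrates on the accelerate action, approaching the perfectly rational choice.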
Analogy to Boltzmann Distribution
Boltzmann rationality is analogous to the Boltzmann distribution in statistical mechanics: \(p_i = \frac{\exp\left(-\frac{\epsilon_i}{\kappa T}\right)}{\sum_{j=1}^n\exp\left(-\frac{\epsilon_j}{\kappa T}\right)},\) where $\epsilon_i$ is energy, $\kappa$ is the Boltzmann constant, and $T$ is temperature. The correspondence is $U(a_i) \leftrightarrow -\epsilon_i$ and $\beta \leftrightarrow \frac{1}{\kappa T}$: low temperature corresponds to high rationality.
History
- Duncan Luce (1959) derived the logit formula from assumptions about the characteristics of choice probabilities, namely the “independence of irrelevant alternatives” property [1–2].
- Jacob Marschak (1960) showed that these axioms implied that the model is consistent with utility maximization.
- Tony Marley developed the relation of the logit formula to the distribution of unobserved utility.
- Luce and Suppes (1965) showed that the extreme value distribution leads to the logit formula.
- Daniel McFadden (1974) completed the analysis by showing that the logit formula implies that unobserved utility is distributed extreme value [3].
- Ziebart et al. proposed the Boltzmann model of noisily rational behavior in trajectory space [4].
See Chapter 3 (Logit) of [5] for an overview.
Interpretation 1: Solving Max-Entropy Problem
$$ \max_{p(a_i)} \quad \mathbb{E}_{p}[U(a_i)] + \frac{1}{\beta} H(p) $$ $$ \text{s.t.} \quad \sum_{i=1}^n p(a_i) = 1, $$ where:
- $\mathbb{E}_{p}[U(a_i)] = \sum_{i=1}^n p(a_i) U(a_i)$: Expected utility of the action distribution.
- $H(p) = -\sum_{i=1}^n p(a_i) \log p(a_i)$: Entropy of the probability distribution $p(a_i)$.
This optimization balances exploitation (maximizing utility) and exploration (maximizing entropy). Using Lagrange multipliers, we derive the Boltzmann rationality equation.
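The Lagrange-multiplier step can be written out explicitly. Taking the objective $\mathbb{E}_{p}[U(a_i)] + \frac{1}{\beta}H(p)$ (with unit entropy weight the same derivation goes through and yields $\beta = 1$):

```latex
\mathcal{L} = \sum_{i=1}^n p(a_i)\,U(a_i)
  - \frac{1}{\beta}\sum_{i=1}^n p(a_i)\log p(a_i)
  + \lambda\left(\sum_{i=1}^n p(a_i) - 1\right).
% Stationarity in each p(a_i):
\frac{\partial \mathcal{L}}{\partial p(a_i)}
  = U(a_i) - \frac{1}{\beta}\left(\log p(a_i) + 1\right) + \lambda = 0
\quad\Longrightarrow\quad
p(a_i) = \exp\!\big(\beta\, U(a_i)\big)\,\exp(\beta\lambda - 1).
% The normalization constraint fixes the second factor:
p(a_i) = \frac{\exp\!\big(\beta\, U(a_i)\big)}{\sum_{j=1}^n \exp\!\big(\beta\, U(a_j)\big)}.
```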
Interpretation 2: Utility Perturbed by Gumbel Noise
Let each action's utility $U(a_i)$ be perturbed by independent and identically distributed (i.i.d.) Gumbel noise $\varepsilon_i$: $$ \tilde{U}(a_i) = U(a_i) + \varepsilon_i, \quad \varepsilon_i \sim \text{Gumbel}(0, 1). $$
Recall the Gumbel distribution $\varepsilon_i \sim \text{Gumbel}(0,1)$: \(\text{PDF: } f(\varepsilon_i) = e^{-\varepsilon_i} e^{-e^{-\varepsilon_i}},\) \(\text{CDF: } F(\varepsilon_i) = e^{-e^{-\varepsilon_i}}.\)
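Under this noise model, the probability that $a_i$ attains the maximum perturbed utility $\tilde{U}(a_i)$ is exactly the Boltzmann distribution with $\beta = 1$ (scaling the noise by $1/\beta$ recovers general $\beta$). A Monte Carlo sketch checking this claim, using small illustrative utilities (not the yellow-light values, which would make one action nearly certain):

```python
import math
import random

def gumbel_sample(rng):
    """Draw from Gumbel(0, 1) via the inverse CDF: -log(-log(u))."""
    return -math.log(-math.log(rng.random()))

def simulate_choice_probs(utilities, trials, seed=0):
    """Estimate p(a_i) = Pr(a_i maximizes U(a_i) + Gumbel noise)."""
    rng = random.Random(seed)
    counts = [0] * len(utilities)
    for _ in range(trials):
        perturbed = [u + gumbel_sample(rng) for u in utilities]
        counts[max(range(len(utilities)), key=lambda i: perturbed[i])] += 1
    return [c / trials for c in counts]

# Illustrative utilities (an assumption for this sketch, not from the text).
utilities = [1.0, 0.5]
empirical = simulate_choice_probs(utilities, trials=200_000)
softmax = [math.exp(u) / sum(math.exp(v) for v in utilities) for u in utilities]
print(empirical)  # close to softmax, roughly [0.62, 0.38]
print(softmax)
```

The empirical choice frequencies match the softmax probabilities up to Monte Carlo error, which is the content of the Gumbel-max result credited to Luce, Suppes, and McFadden above.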