Introduction
A generalized linear model for binary response data has the form
\Pr\left(y=1\mid x\right)=g^{-1}\left(x^{\prime}\beta\right)
where y is the 0/1 response variable, x is the n-vector of predictor variables, \beta is the vector of regression coefficients, and g is the link function. In the Stan modeling language this would be written as
p = inv_g(dot_product(x, beta)); y ~ bernoulli(p);
with inv_g replaced by the name of the inverse link function (inv_logit, Phi, or inv_cloglog in Stan), and similarly for the BUGS modeling language.
The most common choices for the link function are
- logit: g(p)=\log\left(\frac{p}{1-p}\right);
- probit:
g^{-1}(\eta)=\Phi(\eta)
where \Phi is the cumulative distribution function for the standard normal distribution; and
- complementary log-log (cloglog): g(p)=\log\left(-\log\left(1-p\right)\right).
All three of these are strictly increasing, continuous functions on (0,1), with g(p)\to-\infty as p\to0 and g(p)\to+\infty as p\to1.
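As a concrete illustration, the three links and their inverses can be sketched in plain Python. This is a minimal sketch using only the standard library; the function names (logit, inv_logit, probit, inv_probit, cloglog, inv_cloglog) and the LINKS table are chosen here for illustration and are not tied to any particular package.

```python
import math
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1), used for probit.
_STD_NORMAL = NormalDist()


def logit(p):
    """g(p) = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse of logit: 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit(p):
    """g(p) = Phi^{-1}(p), the standard normal quantile function."""
    return _STD_NORMAL.inv_cdf(p)


def inv_probit(eta):
    """Inverse of probit: the standard normal CDF Phi(eta)."""
    return _STD_NORMAL.cdf(eta)


def cloglog(p):
    """g(p) = log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


def inv_cloglog(eta):
    """Inverse of cloglog: 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


# Hypothetical lookup table pairing each link with its inverse.
LINKS = {
    "logit": (logit, inv_logit),
    "probit": (probit, inv_probit),
    "cloglog": (cloglog, inv_cloglog),
}

for name, (g, g_inv) in LINKS.items():
    # Evaluate each link at a few probabilities and confirm the round trip.
    values = [g(p) for p in (0.1, 0.5, 0.9)]
    round_trip = [g_inv(v) for v in values]
    print(name, [round(v, 3) for v in values], [round(p, 3) for p in round_trip])
```

Running the loop shows one qualitative difference: logit and probit are symmetric about p = 0.5 (both map 0.5 to 0), whereas cloglog is not.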
In this note we’ll discuss when to use each of these link functions.
Probit
The probit link function is appropriate when it makes sense to think of y as obtained by thresholding a normally distributed latent variable z:
\begin{array}{rcl} z & = & x^{\prime}\beta^{*}+\varepsilon\\ \varepsilon & \sim & \text{Normal}\left(0,\sigma\right)\\ y & = & \begin{cases} 1 & \text{if }z\geq0\\ 0 & \text{otherwise}. \end{cases} \end{array}
Defining \beta=\beta^{*}/\sigma, this yields
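\begin{array}{rcl} \Pr\left(y=1\mid x\right) & = & \Pr\left(x^{\prime}\beta^{*}+\varepsilon\geq0\right)\\ & = & \Pr\left(-\varepsilon\leq x^{\prime}\beta^{*}\right)\\ & = & \Pr\left(\varepsilon\leq x^{\prime}\beta^{*}\right)\\ & = & \Pr\left(\varepsilon/\sigma\leq x^{\prime}\beta^{*}/\sigma\right)\\ & = & \Phi\left(x^{\prime}\beta\right). \end{array}
The third line uses the symmetry of the distribution of \varepsilon about zero, and the last line uses the fact that \varepsilon/\sigma has a standard normal distribution, taking \sigma to be the standard deviation of \varepsilon.
Logit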
Logit is the default link function to use when you have no specific reason to choose one of the others. There is a specific technical sense in which use of logit corresponds to minimal assumptions about the relationship between y and x. Suppose that we describe the joint distribution for x and y by giving
- the marginal distribution for x, and
- the expected value of x_{i}y for each predictor variable x_{i}.
Then the maximum-entropy (most spread-out, diffuse, least concentrated) joint distribution for x and y satisfying the above description has a pdf of the form
p\left(x,y\right)=\frac{1}{Z}f(x)\exp\left(\sum_{i=1}^{n}\beta_{i}x_{i}y\right)
for some function f, coefficient vector \beta and normalizing constant Z. The conditional distribution for y is then
\begin{array}{rcl} p\left(y\mid x\right) & = & \frac{p(x,y)}{p(x,0)+p(x,1)}\\ & = & \frac{\exp\left(\left(x^{\prime}\beta\right)y\right)}{1+\exp\left(x^{\prime}\beta\right)} \end{array}
and so
\begin{array}{rcl} \Pr\left(y=1\mid x\right) & = & \frac{\exp\left(x^{\prime}\beta\right)}{1+\exp\left(x^{\prime}\beta\right)}\\ & = & \text{logit}^{-1}\left(x^{\prime}\beta\right). \end{array}
Cloglog
The complementary log-log link function arises when
y=\begin{cases} 1 & \text{if }z > 0\\ 0 & \text{if }z=0 \end{cases}
where z is a count having a Poisson distribution:
\begin{array}{rcl} z & \sim & \text{Poisson}\left(\lambda\right)\\ \lambda & = & \exp\left(x^{\prime}\beta\right). \end{array}
To see this, let
p=\Pr\left(z > 0\mid x\right).
Then
\begin{array}{rcl} p & = & 1-\text{Poisson}\left(0\mid\lambda\right)\\ & = & 1-\exp\left(-\lambda\right)\\ & = & 1-\exp\left(-\exp\left(x^{\prime}\beta\right)\right) \end{array}
and so
\begin{array}{rcl} \text{cloglog}\left(p\right) & = & \log\left(-\log\left(1-p\right)\right)\\ & = & \log\left(-\log\left(\exp\left(-\exp\left(x^{\prime}\beta\right)\right)\right)\right)\\ & = & x^{\prime}\beta. \end{array}
Conclusion
In summary, here is when to use each of the link functions:
- Use probit when you can think of y as obtained by thresholding a normally distributed latent variable.
- Use cloglog when y indicates whether a count is nonzero, and the count can be modeled with a Poisson distribution.
- Use logit if you have no specific reason to choose some other link function.
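The two latent-variable constructions in this summary can be checked by simulation. The sketch below is illustrative only: the helper names (sample_poisson, simulate_probit, simulate_cloglog), the seed, and the parameter values are assumptions made for this example, \sigma is taken to be the standard deviation of the noise, and the Poisson sampler is Knuth's simple multiplication method, which is adequate only for small rates.

```python
import math
import random
from statistics import NormalDist


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_probit(eta_star, sigma, n, rng):
    """Fraction of draws with z = eta_star + eps >= 0, eps ~ Normal(0, sigma)."""
    hits = sum(1 for _ in range(n) if eta_star + rng.gauss(0.0, sigma) >= 0.0)
    return hits / n


def simulate_cloglog(eta, n, rng):
    """Fraction of draws with a nonzero count z ~ Poisson(exp(eta))."""
    lam = math.exp(eta)
    hits = sum(1 for _ in range(n) if sample_poisson(rng, lam) > 0)
    return hits / n


rng = random.Random(12345)  # fixed seed so the run is repeatable
n = 100_000

# Probit: thresholded normal latent variable; eta_star plays the role of x'beta*.
eta_star, sigma = 0.8, 2.0
probit_sim = simulate_probit(eta_star, sigma, n, rng)
probit_exact = NormalDist().cdf(eta_star / sigma)  # Phi(x'beta), beta = beta*/sigma

# Cloglog: indicator of a nonzero Poisson count; eta plays the role of x'beta.
eta = -0.5
cloglog_sim = simulate_cloglog(eta, n, rng)
cloglog_exact = 1.0 - math.exp(-math.exp(eta))  # inverse cloglog of x'beta

print(f"probit:  simulated {probit_sim:.4f}, exact {probit_exact:.4f}")
print(f"cloglog: simulated {cloglog_sim:.4f}, exact {cloglog_exact:.4f}")
```

With this many draws the simulated frequencies should agree with the closed-form probabilities to about two decimal places, which is the sense in which each link matches its data-generating story.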