Introduction
A generalized linear model for binary response data has the form
\Pr\left(y=1\mid x\right)=g^{-1}\left(x^{\prime}\beta\right)

where y is the 0/1 response variable, x is the n-vector of predictor variables, \beta is the vector of regression coefficients, and g is the link function. In the BUGS modeling language this would be written as

y ~ dbern(p)
g(p) <- inprod(x[], beta[])

with g replaced by the name of the link function. Stan expresses the same model through the inverse link applied to the linear predictor, for example

y ~ bernoulli(inv_logit(dot_product(x, beta)));
The most common choices for the link function are
- logit: g(p)=\log\left(\frac{p}{1-p}\right);
- probit: g(p)=\Phi^{-1}(p), equivalently g^{-1}(\eta)=\Phi(\eta), where \Phi is the cumulative distribution function for the standard normal distribution; and
- complementary log-log (cloglog): g(p)=\log\left(-\log\left(1-p\right)\right).
All three of these are strictly increasing, continuous functions with g(0)=-\infty and g(1)=+\infty.
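As a quick numerical illustration, here is a minimal sketch using NumPy and SciPy (the probability value 0.8 is arbitrary) that evaluates each link and checks that its inverse recovers the original probability:

```python
import numpy as np
from scipy.special import logit, expit
from scipy.stats import norm

p = 0.8  # an arbitrary probability in (0, 1)

# logit: g(p) = log(p / (1 - p)); its inverse is the logistic function
eta = logit(p)
assert np.isclose(expit(eta), p)

# probit: g(p) = Phi^{-1}(p); its inverse is the standard normal CDF
eta = norm.ppf(p)
assert np.isclose(norm.cdf(eta), p)

# cloglog: g(p) = log(-log(1 - p)); its inverse is 1 - exp(-exp(eta))
eta = np.log(-np.log(1 - p))
assert np.isclose(1 - np.exp(-np.exp(eta)), p)
```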
In this note we’ll discuss when to use each of these link functions.
Probit
The probit link function is appropriate when it makes sense to think of y as obtained by thresholding a normally distributed latent variable z:
\begin{array}{rcl} z & = & x^{\prime}\beta^{*}+\varepsilon\\ \varepsilon & \sim & \text{Normal}\left(0,\sigma\right)\\ y & = & \begin{cases} 1 & \text{if }z\geq0\\ 0 & \text{otherwise}. \end{cases} \end{array}

Defining \beta=\beta^{*}/\sigma, this yields

\begin{array}{rcl} \Pr\left(y=1\mid x\right) & = & \Pr\left(x^{\prime}\beta^{*}+\varepsilon\geq0\right)\\ & = & \Pr\left(-\varepsilon\leq x^{\prime}\beta^{*}\right)\\ & = & \Pr\left(\varepsilon\leq x^{\prime}\beta^{*}\right)\quad\text{(by symmetry of }\varepsilon\text{)}\\ & = & \Phi\left(x^{\prime}\beta^{*}/\sigma\right)\\ & = & \Phi\left(x^{\prime}\beta\right). \end{array}
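Here is a small Monte Carlo sketch of that equivalence (the values of x, \beta^{*}, \sigma, and the simulation size below are arbitrary): simulate the latent-variable model, threshold at zero, and compare against \Phi\left(x^{\prime}\beta\right).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

x = np.array([1.0, 0.5, -2.0])          # arbitrary predictor vector
beta_star = np.array([0.3, -0.7, 0.2])  # arbitrary latent-scale coefficients
sigma = 1.5                             # arbitrary noise scale

# Simulate the latent variable z = x'beta* + eps and threshold it at zero
eps = rng.normal(0.0, sigma, size=1_000_000)
z = x @ beta_star + eps
y = (z >= 0).astype(int)

beta = beta_star / sigma
print(y.mean())            # simulated Pr(y = 1 | x)
print(norm.cdf(x @ beta))  # probit prediction Phi(x'beta); should agree closely
```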
Logit
Logit is the default link function to use when you have no specific reason to choose one of the others. There is a specific technical sense in which use of logit corresponds to minimal assumptions about the relationship between y and x. Suppose that we describe the joint distribution for x and y by giving
- the marginal distribution for x, and
- the expected value of x_{i}y for each predictor variable x_{i}.
Then the maximum-entropy (most spread-out, diffuse, least concentrated) joint distribution for x and y satisfying the above description has a pdf of the form
p\left(x,y\right)=\frac{1}{Z}f(x)\exp\left(\sum_{i=1}^{n}\beta_{i}x_{i}y\right)

for some function f, coefficient vector \beta, and normalizing constant Z. The conditional distribution for y is then

\begin{array}{rcl} p\left(y\mid x\right) & = & \frac{p(x,y)}{p(x,0)+p(x,1)}\\ & = & \frac{\frac{1}{Z}f(x)\exp\left(\left(x^{\prime}\beta\right)y\right)}{\frac{1}{Z}f(x)\left(1+\exp\left(x^{\prime}\beta\right)\right)}\\ & = & \frac{\exp\left(\left(x^{\prime}\beta\right)y\right)}{1+\exp\left(x^{\prime}\beta\right)} \end{array}

since the factor \frac{1}{Z}f(x) cancels, and so

\begin{array}{rcl} \Pr\left(y=1\mid x\right) & = & \frac{\exp\left(x^{\prime}\beta\right)}{1+\exp\left(x^{\prime}\beta\right)}\\ & = & \text{logit}^{-1}\left(x^{\prime}\beta\right). \end{array}
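As a quick numerical check (a minimal sketch; the choices of f(x), \beta, and x below are arbitrary), the conditional probability computed from the max-entropy joint form matches \text{logit}^{-1}\left(x^{\prime}\beta\right), and the factor involving f(x) and Z indeed drops out:

```python
import numpy as np
from scipy.special import expit

x = np.array([0.4, -1.2, 2.0])     # arbitrary predictor vector
beta = np.array([0.5, 0.3, -0.8])  # arbitrary coefficients
f_x = 3.7                          # arbitrary positive value of f(x); it should cancel

# Unnormalized joint weights, proportional to f(x) * exp((x'beta) * y), for y in {0, 1}
joint = np.array([f_x * np.exp((x @ beta) * y) for y in (0, 1)])

# Conditioning on x normalizes over y; the factor f(x)/Z cancels in the ratio
p_y1 = joint[1] / joint.sum()

print(p_y1)             # conditional Pr(y = 1 | x)
print(expit(x @ beta))  # inverse logit of the linear predictor; should match
```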
Cloglog
The complementary log-log link function arises when
y=\begin{cases} 1 & \text{if }z > 0\\ 0 & \text{if }z=0 \end{cases}

where z is a count having a Poisson distribution:

\begin{array}{rcl} z & \sim & \text{Poisson}\left(\lambda\right)\\ \lambda & = & \exp\left(x^{\prime}\beta\right). \end{array}

To see this, let

p=\Pr\left(z > 0\mid x\right).

Then

\begin{array}{rcl} p & = & 1-\text{Poisson}\left(0\mid\lambda\right)\\ & = & 1-\exp\left(-\lambda\right)\\ & = & 1-\exp\left(-\exp\left(x^{\prime}\beta\right)\right) \end{array}

and so

\begin{array}{rcl} \text{cloglog}\left(p\right) & = & \log\left(-\log\left(1-p\right)\right)\\ & = & \log\left(-\log\left(\exp\left(-\exp\left(x^{\prime}\beta\right)\right)\right)\right)\\ & = & x^{\prime}\beta. \end{array}
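A small simulation sketch of this correspondence (the values of x, \beta, and the simulation size are arbitrary): draw Poisson counts with rate \exp\left(x^{\prime}\beta\right) and compare the fraction that are nonzero against 1-\exp\left(-\exp\left(x^{\prime}\beta\right)\right).

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([1.0, -0.5])    # arbitrary predictor vector
beta = np.array([0.2, 0.9])  # arbitrary coefficients

lam = np.exp(x @ beta)       # Poisson rate lambda = exp(x'beta)
z = rng.poisson(lam, size=1_000_000)
y = (z > 0).astype(int)      # y indicates whether the count is nonzero

print(y.mean())          # simulated Pr(y = 1 | x) = Pr(z > 0 | x)
print(1 - np.exp(-lam))  # 1 - exp(-exp(x'beta)); should agree closely

# Applying cloglog to that probability recovers the linear predictor
p = 1 - np.exp(-lam)
print(np.log(-np.log(1 - p)), x @ beta)  # both print x'beta
```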
Conclusion
In summary, here is when to use each of the link functions:
- Use probit when you can think of y as obtained by thresholding a normally distributed latent variable.
- Use cloglog when y indicates whether a count is nonzero, and the count can be modeled with a Poisson distribution.
- Use logit if you have no specific reason to choose some other link function.
Allen Downey says
This is very good, so thanks! You might also mention one nice property of logit as a link function: it makes the parameters interpretable in terms of log odds ratios.
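Concretely, the property the comment describes follows directly from the definition of the logit link:

\log\left(\frac{\Pr\left(y=1\mid x\right)}{1-\Pr\left(y=1\mid x\right)}\right)=x^{\prime}\beta

so increasing a predictor x_{i} by one unit while holding the others fixed adds \beta_{i} to the log odds, i.e. multiplies the odds by e^{\beta_{i}}.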