## Introduction

A generalized linear model for binary response data has the form

$$\Pr(y = 1 \mid x) = g^{-1}(x^{\top}\beta),$$

where $y$ is the 0/1 response variable, $x$ is the $n$-vector of predictor variables, $\beta$ is the vector of regression coefficients, and $g$ is the link function. In the Stan modeling language this would be written as

    y ~ bernoulli(p);
    g(p) <- dot_product(x, beta);

with `g` replaced by the name of a link function, and similarly for the BUGS modeling language.

The most common choices for the link function are

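- logit: $\mathrm{logit}(p) = \log\dfrac{p}{1-p}$;
- probit: $\mathrm{probit}(p) = \Phi^{-1}(p)$, where $\Phi$ is the cumulative distribution function for the standard normal distribution; and
- complementary log-log (cloglog): $\mathrm{cloglog}(p) = \log(-\log(1-p))$.

All three of these are strictly increasing, continuous functions with $g(0) = -\infty$ and $g(1) = +\infty$.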
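As a quick numerical companion to these definitions, here is a minimal Python sketch of the three link functions and their inverses. It uses only the standard library; the function names are illustrative choices, not taken from Stan or BUGS.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def logit(p):
    """Log odds: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def probit(p):
    """Inverse of the standard normal CDF."""
    return _STD_NORMAL.inv_cdf(p)


def cloglog(p):
    """Complementary log-log: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    return _STD_NORMAL.cdf(eta)


def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))


if __name__ == "__main__":
    # Each column should increase as p increases.
    for p in (0.01, 0.25, 0.5, 0.75, 0.99):
        print(f"p={p:4.2f}  logit={logit(p):+.3f}  "
              f"probit={probit(p):+.3f}  cloglog={cloglog(p):+.3f}")
```

Logit and probit are both zero at $p = 0.5$, while cloglog is not: unlike the other two, it is not symmetric about one half.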

In this note we’ll discuss when to use each of these link functions.

## Probit

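The probit link function is appropriate when it makes sense to think of $y$ as obtained by thresholding a normally distributed latent variable $z$:

$$y = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{otherwise,} \end{cases} \qquad z \sim \mathrm{Normal}(x^{\top}\gamma, \sigma).$$

Defining $\beta = \gamma / \sigma$, where $\sigma$ is the standard deviation of $z$, this yields

$$\Pr(y = 1 \mid x) = \Phi(x^{\top}\beta).$$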
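The thresholding argument can be checked by simulation. The sketch below assumes a latent variable $z$ that is normal with mean equal to the dot product of `x` and `gamma` and standard deviation `sigma`, and sets $y = 1$ when $z > 0$. The names `gamma` and `sigma`, the threshold of zero, and all numeric values are illustrative assumptions. It compares the simulated frequency of $y = 1$ with $\Phi$ evaluated at the dot product of `x` and `gamma / sigma`.

```python
import random
from statistics import NormalDist


def simulate_threshold_rate(x, gamma, sigma, n_draws=200_000, seed=0):
    """Fraction of draws with z > 0, where z ~ Normal(x . gamma, sigma)."""
    rng = random.Random(seed)
    mean = sum(xi * gi for xi, gi in zip(x, gamma))
    hits = sum(1 for _ in range(n_draws) if rng.gauss(mean, sigma) > 0.0)
    return hits / n_draws


def probit_probability(x, gamma, sigma):
    """Phi(x . beta) with beta = gamma / sigma."""
    beta = [gi / sigma for gi in gamma]
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    return NormalDist().cdf(eta)


x = [1.0, 0.5]        # first entry plays the role of an intercept term
gamma = [-0.4, 1.5]   # hypothetical latent-scale coefficients
sigma = 2.0           # hypothetical latent standard deviation

simulated = simulate_threshold_rate(x, gamma, sigma)
exact = probit_probability(x, gamma, sigma)
print(f"simulated={simulated:.4f}  exact={exact:.4f}")
```

The two numbers should agree up to Monte Carlo error. Only the ratio `gamma / sigma` affects the result, which is why the probit model does not identify the latent scale separately.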

## Logit

Logit is the default link function to use when you have no specific reason to choose one of the others. There is a specific technical sense in which use of logit corresponds to minimal assumptions about the relationship between $y$ and $x$. Suppose that we describe the joint distribution for $x$ and $y$ by giving

- the marginal distribution for $x$, and
- the expected value of $x_i y$ for each predictor variable $x_i$.

Then the maximum-entropy (most spread-out, diffuse, least concentrated) joint distribution for $x$ and $y$ satisfying the above description has a pdf of the form

$$p(x, y) = \frac{1}{Z} \exp\left(f(x) + y\, x^{\top}\beta\right)$$

for some function $f$, coefficient vector $\beta$, and normalizing constant $Z$. The conditional distribution for $y$ is then

$$p(y \mid x) = \frac{\exp(y\, x^{\top}\beta)}{1 + \exp(x^{\top}\beta)},$$

and so

$$\Pr(y = 1 \mid x) = \frac{\exp(x^{\top}\beta)}{1 + \exp(x^{\top}\beta)} = \mathrm{logit}^{-1}(x^{\top}\beta).$$
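The step from the joint distribution to the conditional one can be verified numerically. This minimal sketch assumes a joint density proportional to `exp(f(x) + y * dot(x, beta))`. It computes the probability that $y = 1$ given $x$ directly from that joint and compares it with the inverse logit of the linear predictor. The particular `f` and `beta` are arbitrary illustrative choices; the term `f(x)` and the normalizing constant cancel in the ratio.

```python
import math


def joint_unnormalized(x, y, f, beta):
    """exp(f(x) + y * x . beta): the joint density up to the constant 1/Z."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    return math.exp(f(x) + y * eta)


def conditional_y1(x, f, beta):
    """Pr(y = 1 | x) from the joint; f(x) and Z cancel in the ratio."""
    p1 = joint_unnormalized(x, 1, f, beta)
    p0 = joint_unnormalized(x, 0, f, beta)
    return p1 / (p0 + p1)


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def f_example(x):
    """Arbitrary choice of f; any other function gives the same conditional."""
    return -0.5 * sum(v * v for v in x)


beta = [0.3, -1.2]  # hypothetical coefficients

for x in ([1.0, 0.0], [1.0, 0.7], [1.0, -2.0]):
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    print(x, conditional_y1(x, f_example, beta), inv_logit(eta))
```

The last two columns of each printed row should match to floating-point precision.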

## Cloglog

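The complementary log-log link function arises when

$$y = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{if } z = 0, \end{cases}$$

where $z$ is a count having a Poisson distribution:

$$z \sim \mathrm{Poisson}(\lambda), \qquad \lambda = \exp(x^{\top}\beta).$$

To see this, let

$$p = \Pr(y = 1 \mid x).$$

Then

$$1 - p = \Pr(z = 0 \mid x) = \exp(-\lambda),$$

and so

$$\mathrm{cloglog}(p) = \log(-\log(1 - p)) = \log \lambda = x^{\top}\beta.$$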
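The same relationship can be checked in code. The sketch below assumes the Poisson mean is `exp(dot(x, beta))`, with hypothetical values for `x` and `beta`. It confirms that the cloglog of the probability of a nonzero count recovers the linear predictor, and compares that probability with a simulated frequency. The standard library has no Poisson sampler, so the block uses Knuth's multiplication method, which is adequate for small means.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def cloglog(p):
    """Complementary log-log: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


x = [1.0, 2.0]       # first entry plays the role of an intercept term
beta = [-1.0, 0.4]   # hypothetical coefficients

eta = sum(xi * bi for xi, bi in zip(x, beta))  # linear predictor
lam = math.exp(eta)                            # Poisson mean
p_exact = 1.0 - math.exp(-lam)                 # Pr(z > 0) = 1 - Pr(z = 0)

rng = random.Random(0)
n_draws = 100_000
p_sim = sum(1 for _ in range(n_draws) if poisson_draw(lam, rng) > 0) / n_draws

print(f"eta={eta:.4f}  cloglog(p_exact)={cloglog(p_exact):.4f}")
print(f"p_exact={p_exact:.4f}  p_sim={p_sim:.4f}")
```

The first line should show the two values agreeing to floating-point precision; the second should agree up to Monte Carlo error.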

## Conclusion

In summary, here is when to use each of the link functions:

- Use probit when you can think of $y$ as obtained by thresholding a normally distributed latent variable.
- Use cloglog when $y$ indicates whether a count is nonzero, and the count can be modeled with a Poisson distribution.
- Use logit if you have no specific reason to choose some other link function.

Allen Downey says

This is very good, so thanks! You might also mention one nice property of logit as a link function: it makes the parameters interpretable in terms of log odds ratios.