I recently discovered a wonderful package for easily checking linear
regression assumptions via diagnostic plots: the
`check_model()`

function of the `performance`

package.

Let’s make a quick demonstration of this package first.

```
# Load necessary libraries
library(performance)
library(see)
# Note: if you haven't installed the packages above,
# you'll need to install them first by using:
# install_if_not_installed(c("performance", "see"))
# Create a regression model (using data available in R by default)
model <- lm(mpg ~ wt * cyl + gear, data = mtcars)
```

Wonderful! And efficient. However, sometimes, for different reasons, someone might want to check assumptions with an objective test. Testing each assumption one by one is kind of time-consuming, so I made a convenience function to accelerate this process.

Just a **warning** before jumping into objective
assumption tests. Note these tests are generally NOT recommended! With a
large sample size, objective assumption tests will be OVER-sensitive to
deviations from expected values, while with a small sample size, the
objective assumption tests will be UNDER-powered to detect real,
existing deviations. It can also mask other visual patterns not
reflected in a single number. It is thus recommended to use visual
assessment of diagnostic plots instead (e.g., see Shatz, 2023, Kozak
& Piepho, 2018, https://doi.org/10.1111/jac.12220; Schucany & Ng,
2006). For a quick and accessible argument against tests of
normality specifically, see __this
source__.

Load the `rempsyc`

package:

If you haven’t installed this package yet, you will need to install it via the following command:Note:`install.packages("rempsyc")`

. Furthermore, you may be asked to install the following packages if you haven’t installed them already (you may decide to install them all now to avoid interrupting your workflow if you wish to follow this tutorial from beginning to end):

```
pkgs <- c(
"flextable", "performance", "see", "lmtest", "ggplot2",
"qqplotr", "ggrepel", "patchwork", "boot"
)
install_if_not_installed(pkgs)
```

The raw output doesn’t look so nice in the console because the column names are so long, but it does look good in the viewer or when exported to a word processing software because the column names get wrapped. You can try it in RStudio yourself with the following command:

Model | Normality (Shapiro-Wilk) | Homoscedasticity (Breusch-Pagan) | Autocorrelation of residuals (Durbin-Watson) | Diagnostic |
---|---|---|---|---|

mpg ~ wt * cyl + gear | .615 | .054 | .525 | 0.00 |

Interpretation: (p) values < .05 imply assumptions are not respected. Diagnostic is how many assumptions are not respected for a given model or variable.

This function is particularly useful when testing several models simultaneously as it makes for a nice table of results. Let’s make a short demonstration of this.

```
# Define our dependent variables
DV <- names(mtcars[-1])
# Make list of all formulas
formulas <- paste(DV, "~ mpg")
# Make list of all models
models.list <- lapply(X = formulas, FUN = lm, data = mtcars)
# Make diagnostic table
assumptions.table <- nice_assumptions(models.list)
```

Use the Viewer for better results

Or alternatively with the nice_table() function

Model | Normality (Shapiro-Wilk) | Homoscedasticity (Breusch-Pagan) | Autocorrelation of residuals (Durbin-Watson) | Diagnostic |
---|---|---|---|---|

cyl ~ mpg | .361 | .282 | .460 | 0.00 |

disp ~ mpg | .506 | .831 | .077 | 0.00 |

hp ~ mpg | .004** | .351 | .198 | 1.00 |

drat ~ mpg | .939 | .887 | .505 | 0.00 |

wt ~ mpg | .020* | .270 | .002** | 2.00 |

qsec ~ mpg | .427 | .944 | .011* | 1.00 |

vs ~ mpg | .142 | .568 | .238 | 0.00 |

am ~ mpg | .074 | .650 | < .001*** | 1.00 |

gear ~ mpg | .001** | .528 | < .001*** | 2.00 |

carb ~ mpg | .008** | .362 | .003** | 2.00 |

If you have categorical predictors (e.g., groups), then it is suggested that you check assumptions directly on the raw data rather than on the residuals (source).

In this case, one can make qqplots on the different groups to first check the normality assumption.

Generally speaking, as long as the dots lie within the confidence band, all is good and the data distribute normally.

Change (or reorder) colours, x-axis title or y-axis title, names of
groups, no grid lines, or add the *p*-value from the Shapiro-Wilk
(if you wish so).

```
nice_qq(
data = iris,
variable = "Sepal.Length",
group = "Species",
colours = c("#00BA38", "#619CFF", "#F8766D"),
groups.labels = c("(a) Setosa", "(b) Versicolor", "(c) Virginica"),
grid = FALSE,
shapiro = TRUE,
title = NULL
)
```

A *p* > .05 suggests that your data is normally distributed
(according to the Shapiro-Wilk test). Inversely, a *p* < .05
suggests your data is not normally distributed. Keep in mind however
that there are several known problems with objective tests, so visual
assessment is recommended. Nonetheless, for beginners, it can sometimes
be useful (or rather, reassuring) to be able to pair the visual
assessment to the tests values.

Some people think that density distributions are less useful than qqplots, but sometimes, people still like to look at distributions, so I made another function for this.

**Pro tip**: Save density plots with a narrower width
for better-looking results.

Change (or reorder) colours, x-axis title or y-axis title, names of
groups, no grid lines, or add the *p*-value from the Shapiro-Wilk
(if you wish so).

```
nice_density(
data = iris,
variable = "Sepal.Length",
group = "Species",
colours = c("#00BA38", "#619CFF", "#F8766D"),
xtitle = "Sepal Length",
ytitle = "Density (vs. Normal Distribution)",
groups.labels = c("(a) Setosa", "(b) Versicolor", "(c) Virginica"),
grid = FALSE,
shapiro = TRUE,
histogram = TRUE,
title = "Density (Sepal Length)"
)
```