---
title: "partialling_out"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{partialling_out}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
```


## Getting started

Let's make a simple linear model first.


```{r}
library(partialling.out)
library(tinytable)
library(tinyplot)
library(palmerpenguins)
library(fwlplot)
model <- lm(bill_length_mm ~ bill_depth_mm + species, data = penguins)
summary(model)
```

Using the `partialling_out` function, you can get the residualised variable of interest (`bill_length_mm`) and of the first explanatory variable (`bill_depth_mm`), i.e. it would return the residuals of the following two regressions.

```{r eval = FALSE}

modely <- lm(bill_length_mm ~ species, data = penguins)

modelx <- lm(bill_depth_mm ~ species, data = penguins)

```


```{r}
res <- partialling_out(model, data = penguins)

tt(head(res)) |>
  format_tt(digits = 2) |>
  style_tt(align = "c")
```

Accordingly, the coefficient of `res_bill_depth_mm` in the model `lm(res_bill_length_mm ~ res_bill_depth_mm)` will be the same of the coefficient of `bill_depth_mm` in the original model.

```{r}
resmodel <- lm(res_bill_length_mm ~ res_bill_depth_mm, data = res)

print(c(model$coefficients[2], resmodel$coefficients[2]))
```


If `both` is set to `FALSE`, the function will return the actual Y values and the residualised X values.

```{r}
tt(head(partialling_out(model, penguins, both = FALSE))) |>
  format_tt(digits = 2) |>
  style_tt(align = "c")
```


### Weighted models


If `weights` are specified in `partialling_out` they will be used to weight the underlying partial models. The weights will also be returned as a column in the result data.frame. If the original model is weighted but the partial models aren't, the function will throw a warning (also if the original model is not weighted but partial models are).


```{r}
model <- lm(
  bill_length_mm ~ bill_depth_mm + species,
  data = penguins,
  weights = penguins$body_mass_g
)
res <- partialling_out(model, data = penguins, weights = penguins$body_mass_g)


tt(head(res)) |>
  format_tt(digits = 2) |>
  style_tt(align = "c")
```


Note that, for the FWL theorem to hold, if the partial models are weighted, the linear model between the two residualised variables must be weighted as well.


```{r}
unweighted_model <- lm(res_bill_length_mm ~ res_bill_depth_mm, data = res)

weighted_model <- lm(
  res_bill_length_mm ~ res_bill_depth_mm,
  weights = res$weights,
  data = res
)


data.frame(
  "original model" = model$coefficients[2],
  "unweighted model" = unweighted_model$coefficients[2],
  "weighted_model" = weighted_model$coefficients[2]
) |>
  tt() |>
  format_tt(digits = 4) |>
  style_tt(align = "c")

```


## Fixed effects models
As stated, the model will also work with `feols` or `felm` models
```{r}
library(fixest)

model_fixest <- feols(bill_length_mm ~ bill_depth_mm | species, data = penguins)

res_fixest <- partialling_out(model_fixest, data = penguins)
tt(head(res_fixest)) |>
  format_tt(digits = 2) |>
  style_tt(align = "c")
```


```{r}
library(lfe)

model_lfe <- felm(bill_length_mm ~ bill_depth_mm | species, data = penguins)

res_lfe <- partialling_out(model_lfe, data = penguins)


tt(head(res_lfe)) |>
  format_tt(digits = 2) |>
  style_tt(align = "c")
```

## Interaction terms and varying slopes

So far, interaction terms with `:`, `*`, or `^` (in `feols`) as well as varying slopes in `feols` with `[` or `[[` are supported only for variables other than the main explanatory variable, i.e.

```{r}
#| eval: false

feols(bill_depth_mm ~ bill_length_mm | species^sex, data = penguins) |>
  partialling_out(data = penguins)
```
is supported, but

```{r}
#| eval: false

feols(bill_depth_mm ~ i(bill_length_mm, species) | sex, data = penguins) |>
  partialling_out(data = penguins)
```

will yield an error. Therefore, if your main explanatory variable is an interaction, it should be built outside the model if you intend to partial it out afterwards.

### AsIs and polynomial terms

So far, AsIs expressions, and polynomial terms in the formula are not yet supported, so these two expressions will yield an error.

```{r}
#| eval: false
partialling_out(lm(bill_depth_mm ~ I(bill_length_mm^2) + species,
                   data = penguins))

partialling_out(lm(bill_depth_mm ~  poly(bill_length_mm, 2) + species,
                   data = penguins[!is.na(penguins$bill_length_mm), ]))

```

## Plotting the results

The library includes the function `plot_partial_residuals()` to plot a scatterplot of the residualised variables with a tendence line (which can be disabled if the parametre `add_lm` is set to `FALSE`) using [`tinyplot`](https://github.com/grantmcdermott/tinyplot/) as backend. If the parametre `quantile` is set to `TRUE` it will divide the main explanatory variable in the specified number of quantiles (by default, 50) and plot the mean values for X and Y for each quantile.


```{r fig.height = 8, fig.width = 6}

tinytheme("tufte")
par(mfrow = c(2, 1))


plot_partial_residuals(res_fixest)
plot_partial_residuals(res_fixest, quantile = TRUE)
```

Plotting all residuals is equivalent to the following approach in `fwlplot`

```{r fig.height = 4, fig.width = 6}

fwlplot(bill_length_mm ~ bill_depth_mm | species, data = penguins)

```

## Adding other parameters to the model

Any parameters that could be passed to `lm()`, `feols()`, or `felm()`, can be passed to  `partialling_out()`.

```{r}
model_fixest <- feols(
  bill_length_mm ~ bill_depth_mm | species + island,
  data = penguins,
  cluster = ~species
)


tt(head(partialling_out(model_fixest, data = penguins, cluster = ~species))) |>
  format_tt(digits = 2) |>
  style_tt(align = "c")
```


## Acknowledgements
To the authors of the [fwlplot](https://github.com/kylebutts/fwlplot) package, Kyle Butts and Grant McDermott, which has provided inspiration and ideas for this project.
To my colleague Andreu Arenas-Jal for his insight and guiding.