Sometimes, when using linear regression to analyse our data, we may think that our model is the best fit because the p-value is significant (p < 0.05) and R² is close to 1. However, this is not always the case. We should always assess the appropriateness of the model by calculating the residuals and examining the residual plot.

**But what is a residual?**

A residual is the difference between the observed value of the dependent variable (*y*) and the predicted value (*ŷ*). Each data point has one residual.

Residual = Observed value – Predicted value

*e* = *y* – *ŷ*
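To make the definition concrete, here is a minimal Python sketch that fits a straight line by ordinary least squares and returns one residual per data point. The function names `fit_line` and `residuals` and the example data are hypothetical. Tableau computes its trend line internally, so this only mirrors the arithmetic under the assumption of a simple linear model with an intercept.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def residuals(xs, ys):
    """Residual e = y - y_hat for each observation (one residual per point)."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical example data
    for x, y, e in zip(xs, ys, residuals(xs, ys)):
        print(f"x={x}  y={y}  e={e:+.3f}")
```

Each printed `e` is simply the observed *y* minus the value the fitted line predicts at that *x*.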

*Note: For a least-squares line that includes an intercept, the sum and the mean of the residuals are equal to zero.*
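This property follows directly from the least-squares condition for the intercept. A short derivation, assuming the simple model *ŷᵢ* = *b₀* + *b₁xᵢ* fitted by ordinary least squares, so that *eᵢ* = *yᵢ* – *b₀* – *b₁xᵢ*:

```latex
S(b_0, b_1) = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2

\frac{\partial S}{\partial b_0} = -2 \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right) = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{n} e_i = 0,
\qquad
\bar{e} = \frac{1}{n} \sum_{i=1}^{n} e_i = 0
```

If the line is forced through the origin (no *b₀*), this first condition disappears and the residuals need not sum to zero.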

**And a residual plot, what is it?**

A **residual plot** is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal axis, this suggests that a linear regression model is appropriate for the data; otherwise, a non-linear model may be more appropriate.

Residual plots typically show one of two patterns: a **random pattern**, indicating a good fit for a linear model, or a **non-random pattern** (for example U-shaped or inverted U), suggesting that a non-linear model would fit better.
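The check described here is visual. As a complementary numerical heuristic (a swapped-in technique, not part of the Tableau workflow), a Wald-Wolfowitz runs test counts how often the residual sign changes when the points are ordered by *x*. A U-shaped pattern produces long runs of the same sign, so there are far fewer runs than randomness would predict, giving a large negative z-score. This is a minimal sketch assuming the normal approximation, which is rough for small samples; the function names are hypothetical.

```python
import math


def linear_residuals(xs, ys):
    """Residuals e = y - y_hat from an ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sign_runs_z(xs, resid):
    """Wald-Wolfowitz runs test on residual signs ordered by x.

    Returns (runs, z). A large negative z means same-sign residuals
    cluster together (e.g. a U shape); z is None if all signs agree.
    """
    ordered = [e for _, e in sorted(zip(xs, resid))]
    signs = [e > 0 for e in ordered if e != 0]  # drop exact zeros
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    if n_pos == 0 or n_neg == 0:
        return runs, None  # all residuals on one side: test undefined
    n = n_pos + n_neg
    expected = 2 * n_pos * n_neg / n + 1
    variance = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n ** 2 * (n - 1))
    return runs, (runs - expected) / math.sqrt(variance)


if __name__ == "__main__":
    xs = list(range(-5, 6))
    ys = [x ** 2 for x in xs]  # curved data, so a straight line is a poor fit
    runs, z = sign_runs_z(xs, linear_residuals(xs, ys))
    print(runs, round(z, 2))  # 3 runs, z about -2.14
```

With curved data the residuals run positive, then negative, then positive again (a U shape), so only three runs appear and |z| exceeds 1.96.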

**How do we create a residual plot in Tableau?**

**1 –** On the sheet where you have visualised your scatterplot (with its linear trend line), go to the Worksheet menu and select Export > Data.

**2 –** In the dialog box, select ‘Connect after Export’ and click OK. The data is saved as a Microsoft Access file and opens automatically in Tableau.

**3 –** On a new sheet, drag the newly created field ‘residuals’ to Rows and your independent variable (in this example, ‘wind speed’) to Columns (the x-axis).

- We can now see that the residuals are not randomly distributed; they follow a pattern, an S-shaped (‘sigmoid’) curve. This tells us that, contrary to what we believed from the p-value and R², the linear regression model is not the best fit for our data. We should now look at other types of models, namely non-linear models.
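To see how a high R² and a patterned residual plot can coexist, here is a minimal Python sketch on synthetic S-shaped data (a logistic curve, not the wind-speed data from this example). The straight-line fit has an R² of roughly 0.98, yet the residual signs, ordered by *x*, form a clear S-shaped sequence rather than a random scatter. The helper names and data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def r_squared(ys, fitted):
    """R^2 = 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def residual_signs(resid, tol=1e-12):
    """Compact sign pattern of the residuals, treating |e| < tol as zero."""
    return " ".join("0" if abs(e) < tol else ("+" if e > 0 else "-") for e in resid)


if __name__ == "__main__":
    xs = [-3, -2, -1, 0, 1, 2, 3]
    ys = [1 / (1 + math.exp(-x)) for x in xs]  # S-shaped (logistic) data
    b0, b1 = fit_line(xs, ys)
    fitted = [b0 + b1 * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    print(f"R^2 = {r_squared(ys, fitted):.3f}")  # about 0.979
    print(residual_signs(resid))  # + - - 0 + + -
```

R² alone would suggest an excellent fit; the sign sequence shows the systematic curvature that only the residuals reveal.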

Example of a residual plot showing that a linear regression model is a good fit for the data

**Random distribution around zero**

**Note:** when assessing the p-value and R², don’t forget to look at the number of observations and the degrees of freedom, as these may reveal an artificially high R² value.

Low residual degrees of freedom (many estimated parameters relative to the number of observations) indicate that a high R² value may be inflated by overfitting rather than reflecting a genuinely good fit.
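One standard way to make this concrete is adjusted R², which penalises R² as the residual degrees of freedom shrink. This is a swapped-in illustration (the note above does not name adjusted R²), assuming a model with an intercept and *p* predictors, so that the residual degrees of freedom are *n* – *p* – 1. The R² value and sample sizes below are hypothetical.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n - p - 1 is the residual degrees of freedom for a model with an
    intercept and p predictors; the penalty grows as it shrinks.
    """
    df_resid = n_obs - n_predictors - 1
    if df_resid <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n_obs - 1) / df_resid


if __name__ == "__main__":
    # Same hypothetical R^2 = 0.90, with plenty vs. very few residual df
    print(round(adjusted_r_squared(0.90, n_obs=100, n_predictors=1), 3))  # 0.899
    print(round(adjusted_r_squared(0.90, n_obs=12, n_predictors=8), 3))   # 0.633
```

With 98 residual degrees of freedom the penalty is negligible; with only 3 the same R² of 0.90 drops to about 0.63.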

Weird. I don’t get the “Export Data to Access” window when I select Export. I’m using Tableau 10.0.

Hi Jorge,

Thank you for raising this point. I have tried it myself, and it doesn’t seem to prompt the same window as version 9.3.

You have to export the data, save it on your local machine and then open it in Tableau. From there, the procedure is the same as in 9.3.

I hope you find this helpful,

Nisa.