## Residual Analysis

### Plotting and Analyzing Residuals

The residuals from a fitted model are defined as the differences between the response data and the fit to the response data at each predictor value.

*residual* = *data* – *fit*

To display the residuals in the Curve Fitting app, select
the toolbar button or the menu item **View** > **Residuals Plot**.

Mathematically, the residual for a specific predictor value
is the difference between the response value *y* and
the predicted response value *ŷ*.

*r* = *y* – *ŷ*
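The definition above can be sketched at the command line. This is a minimal illustration, assuming the Curve Fitting Toolbox `fit` function is available. The data set, the fixed random seed, and the variable names are made up for the example and are not part of the original text.

```matlab
% Hypothetical data: a straight line plus uniform noise (assumption).
rng(0);                               % fixed seed so the sketch is repeatable
x = (1:0.5:10)';
y = 2 + 3*x + (rand(size(x)) - 0.5);

% Fit a first-degree polynomial; the third output holds fit diagnostics.
[f, ~, output] = fit(x, y, 'poly1');

% Residual = data - fit, evaluated at each predictor value.
yhat = f(x);
r = y - yhat;

% The toolbox reports the same quantity in output.residuals.
maxDiff = max(abs(r - output.residuals));
```

To draw the residuals instead of computing them, `plot(f, x, y, 'residuals')` should produce a comparable residuals plot outside the app.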

Assuming the model you fit to the data is correct, the residuals approximate the random errors. Therefore, if the residuals appear to behave randomly, it suggests that the model fits the data well. However, if the residuals display a systematic pattern, it is a clear sign that the model fits the data poorly. Always bear in mind that many results of model fitting, such as confidence bounds, will be invalid should the model be grossly inappropriate for the data.

A graphical display of the residuals for a first-degree polynomial fit is shown below. The top plot shows that the residuals are calculated as the vertical distance from the data point to the fitted curve. The bottom plot displays the residuals relative to the fit, which is the zero line.

The residuals appear randomly scattered around zero, indicating that the model describes the data well.

A graphical display of the residuals for a second-degree polynomial fit is shown below. The model includes only the quadratic term, and does not include a linear or constant term.

The residuals are systematically positive for much of the data range, indicating that this model is a poor fit for the data.
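The contrast between a well-specified and a misspecified model can be reproduced with a short sketch. The data below are hypothetical (a noisy straight line) and are not the data behind the plots described above. The quadratic-only model is built with `fittype` as a linear model with a single term, which is one way to omit the linear and constant terms.

```matlab
% Hypothetical data: a straight line plus uniform noise (assumption).
rng(0);
x = (1:0.5:10)';
y = 2 + 3*x + (rand(size(x)) - 0.5);

% First-degree polynomial: matches how the data were generated.
[fLine, gofLine, outLine] = fit(x, y, 'poly1');

% Quadratic term only, y = a*x^2, with no linear or constant term.
quadOnly = fittype({'x^2'});
[fQuad, gofQuad, outQuad] = fit(x, y, quadOnly);

% Simple numeric checks for a systematic pattern in the residuals.
meanLine    = mean(outLine.residuals);      % close to zero for a fit with an intercept
meanQuad    = mean(outQuad.residuals);      % generally not zero without an intercept
fracPosQuad = mean(outQuad.residuals > 0);  % share of positive residuals

fprintf('SSE poly1: %.3f, SSE a*x^2: %.3f\n', gofLine.sse, gofQuad.sse);
```

A mean residual far from zero, or long runs of residuals with the same sign, is the numerical counterpart of the systematic pattern seen in a residuals plot.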

### Example: Residual Analysis

This example fits several polynomial models to generated data
and evaluates how well those models fit the data and how precisely
they can predict. The data is generated from a cubic curve, and there
is a large gap in the range of the *x* variable where
no data exist.

```matlab
x = [1:0.1:3 9:0.1:10]';
c = [2.5 -0.5 1.3 -0.1];
y = c(1) + c(2)*x + c(3)*x.^2 + c(4)*x.^3 + (rand(size(x))-0.5);
```

Fit the data in the Curve Fitting app using a cubic polynomial and a
fifth-degree polynomial. The data, fits, and residuals are shown below. Display
the residuals in the Curve Fitting app by selecting **View** > **Residuals Plot**.

Both models appear to fit the data well, and the residuals appear to be randomly distributed around zero. Therefore, a graphical evaluation of the fits does not reveal any obvious differences between the two equations.
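The same two fits can be sketched programmatically, assuming the Curve Fitting Toolbox `fit` function and its built-in `'poly3'` and `'poly5'` library models. The fixed seed is an assumption added for repeatability. Because the noise is random, the numbers will differ from any figures produced in the app.

```matlab
% Regenerate the example data (seed added for repeatability; an assumption).
rng(0);
x = [1:0.1:3 9:0.1:10]';
c = [2.5 -0.5 1.3 -0.1];
y = c(1) + c(2)*x + c(3)*x.^2 + c(4)*x.^3 + (rand(size(x)) - 0.5);

% Fit a cubic and a fifth-degree polynomial.
% fit may warn that the fifth-degree problem is badly conditioned.
[f3, gof3, out3] = fit(x, y, 'poly3');
[f5, gof5, out5] = fit(x, y, 'poly5');

% Summarize the residuals of each fit.
maxAbs3 = max(abs(out3.residuals));
maxAbs5 = max(abs(out5.residuals));
fprintf('poly3: SSE = %.3f, max |residual| = %.3f\n', gof3.sse, maxAbs3);
fprintf('poly5: SSE = %.3f, max |residual| = %.3f\n', gof5.sse, maxAbs5);
```

Residual summaries of similar size for both models are consistent with the observation that the plots alone do not separate the two fits.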

Look at the numerical fit results in the **Results** pane
and compare the confidence bounds for the coefficients.

The results show that the cubic fit coefficients are known precisely
(the bounds are narrow), while the quintic fit coefficients are not.
As expected, the fit results for `poly3` are reasonable because the
generated data follows a cubic curve. The 95% confidence bounds on the
fitted coefficients indicate that they are acceptably precise. However,
the 95% confidence bounds for `poly5` indicate that the fitted
coefficients are not known precisely.
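Outside the app, coefficient confidence bounds can be compared with `confint`, which returns 95% bounds by default. The sketch below regenerates the example data with a fixed seed (an assumption), so its values will not match the **Results** pane exactly.

```matlab
% Regenerate the example data (seed added for repeatability; an assumption).
rng(0);
x = [1:0.1:3 9:0.1:10]';
c = [2.5 -0.5 1.3 -0.1];
y = c(1) + c(2)*x + c(3)*x.^2 + c(4)*x.^3 + (rand(size(x)) - 0.5);

f3 = fit(x, y, 'poly3');
f5 = fit(x, y, 'poly5');

% confint returns a 2-by-n matrix: row 1 is the lower bound, row 2 the upper.
% Coefficients are ordered from the highest power down to the constant term.
ci3 = confint(f3);
ci5 = confint(f5);

% Width of each interval; wide intervals mean imprecise coefficients.
width3 = diff(ci3);
width5 = diff(ci5);

disp('poly3 coefficients and interval widths:');
disp([coeffvalues(f3); width3]);
disp('poly5 coefficients and interval widths:');
disp([coeffvalues(f5); width5]);
```

Comparing each interval width to the magnitude of its coefficient gives a quick sense of which coefficients are poorly determined.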

The goodness-of-fit statistics are shown in the **Table of Fits**. By default, the adjusted R-square
and RMSE statistics are displayed in the table. The statistics do
not reveal a substantial difference between the two equations. To
choose statistics to display or hide, right-click the column headers.
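The same statistics are available at the command line in the second output of `fit`. The sketch below collects the adjusted R-square and RMSE for both models into a table. The seed and the table layout are assumptions made for the example.

```matlab
% Regenerate the example data (seed added for repeatability; an assumption).
rng(0);
x = [1:0.1:3 9:0.1:10]';
c = [2.5 -0.5 1.3 -0.1];
y = c(1) + c(2)*x + c(3)*x.^2 + c(4)*x.^3 + (rand(size(x)) - 0.5);

% The second output of fit is a structure of goodness-of-fit statistics.
[~, gof3] = fit(x, y, 'poly3');
[~, gof5] = fit(x, y, 'poly5');

% Collect adjusted R-square and RMSE side by side.
stats = table([gof3.adjrsquare; gof5.adjrsquare], ...
              [gof3.rmse; gof5.rmse], ...
              'VariableNames', {'AdjRsquare', 'RMSE'}, ...
              'RowNames', {'poly3', 'poly5'});
disp(stats);
```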

The 95% nonsimultaneous prediction bounds for new observations
are shown below. To display prediction bounds in the Curve Fitting
app, select **Tools** > **Prediction Bounds** > **95%**.

The prediction bounds for `poly3` indicate that new observations can
be predicted with a small uncertainty throughout the entire data range.
This is not the case for `poly5`. It has wider prediction bounds in the
area where no data exist, apparently because the data does not contain
enough information to estimate the higher degree polynomial terms
accurately. In other words, a fifth-degree polynomial overfits the data.
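A rough command-line counterpart uses `predint` with the `'observation'` and `'off'` options to request nonsimultaneous bounds for new observations. The query grid `xq`, the seed, and the gap mask are assumptions introduced for this sketch.

```matlab
% Regenerate the example data (seed added for repeatability; an assumption).
rng(0);
x = [1:0.1:3 9:0.1:10]';
c = [2.5 -0.5 1.3 -0.1];
y = c(1) + c(2)*x + c(3)*x.^2 + c(4)*x.^3 + (rand(size(x)) - 0.5);

f3 = fit(x, y, 'poly3');
f5 = fit(x, y, 'poly5');

% Evaluate bounds on a grid that also covers the gap between x = 3 and x = 9.
xq = (1:0.1:10)';

% 95% nonsimultaneous bounds for new observations: column 1 lower, column 2 upper.
pb3 = predint(f3, xq, 0.95, 'observation', 'off');
pb5 = predint(f5, xq, 0.95, 'observation', 'off');

% Width of the bounds at each query point.
w3 = pb3(:,2) - pb3(:,1);
w5 = pb5(:,2) - pb5(:,1);

% Compare the widest bounds inside the region with no data.
inGap = xq > 3 & xq < 9;
fprintf('Max width in gap: poly3 = %.2f, poly5 = %.2f\n', ...
        max(w3(inGap)), max(w5(inGap)));
```

Plotting `w3` and `w5` against `xq` shows where each model's predictions become uncertain.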

The 95% prediction bounds for the fitted function using `poly5` are
shown below. As you can see, the uncertainty in predicting the function
is large in the center of the data. Therefore, you would conclude
that more data must be collected before you can make precise predictions
using a fifth-degree polynomial.
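Bounds for the fitted function, as opposed to bounds for new observations, can be sketched with the `'functional'` option of `predint`. The example below computes both kinds of bounds for `poly5` so that they can be compared. The seed and the query grid are assumptions.

```matlab
% Regenerate the example data (seed added for repeatability; an assumption).
rng(0);
x = [1:0.1:3 9:0.1:10]';
c = [2.5 -0.5 1.3 -0.1];
y = c(1) + c(2)*x + c(3)*x.^2 + c(4)*x.^3 + (rand(size(x)) - 0.5);

f5 = fit(x, y, 'poly5');
xq = (1:0.1:10)';

% Bounds on the fitted function itself (uncertainty of the curve).
pbFun = predint(f5, xq, 0.95, 'functional', 'off');

% Bounds on a new observation (curve uncertainty plus random error).
pbObs = predint(f5, xq, 0.95, 'observation', 'off');

wFun = pbFun(:,2) - pbFun(:,1);
wObs = pbObs(:,2) - pbObs(:,1);

% Locate where the function is least well determined.
[maxWidth, idx] = max(wFun);
fprintf('Widest functional bound: %.2f at x = %.1f\n', maxWidth, xq(idx));
```

Functional bounds are never wider than observation bounds at the same point, because the observation bounds also account for the random error in a new measurement.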

In conclusion, you should examine all available goodness-of-fit measures before deciding on the fit that is best for your purposes. A graphical examination of the fit and residuals should always be your initial approach. However, some fit characteristics are revealed only through numerical fit results, statistics, and prediction bounds.