# Improve Model Fit with Weights

This example shows how to fit a polynomial model to data using both the linear least-squares method and the weighted least-squares method for comparison.

Generate sample data from different normal distributions by using the `randn`

function.

rng("default") % For reproducibility rnorm = []; idx = []; for k=1:20 r = k*randn([20,1]) + (1/20)*(k^3); rnorm = [rnorm;r]; idx = [idx;ones(20,1).*k]; end

The dependent variable `rnorm`

contains sample data from 20 normal distributions. The independent variable `idx`

contains integers indicating whether two elements in `rnorm`

are sampled from the same normal distribution.

Fit a third-degree polynomial model to `idx`

and `rnorm`

. Return information about the coefficient estimates and the algorithm used to fit the model.

`[p3fit,~,fitinfo] = fit(idx,rnorm,"poly3")`

p3fit = Linear model Poly3: p3fit(x) = p1*x^3 + p2*x^2 + p3*x + p4 Coefficients (with 95% confidence bounds): p1 = 0.05156 (0.0438, 0.05932) p2 = -0.03993 (-0.2875, 0.2076) p3 = 0.1418 (-2.124, 2.408) p4 = 0.0462 (-5.585, 5.678)

`fitinfo = `*struct with fields:*
numobs: 400
numparam: 4
residuals: [400×1 double]
Jacobian: [400×4 double]
exitflag: 1
algorithm: 'QR factorization and solve'
iterations: 1

`p3fit`

contains the estimates for the model coefficients with 95% confidence intervals. The default fitting method for fitting a polynomial model is linear least squares. `fitinfo`

contains information about the fitting algorithm used to fit the coefficients to the data. The error in the data can be estimated by the residuals stored in `fitinfo`

.

Plot the residuals using a stem plot.

stem(idx,fitinfo.residuals) xlabel("idx") ylabel("residuals")

The plot of the residuals shows that their variance increases as the values in `idx`

increase. The nonconstant variances across the different values of `idx`

indicate that the weighted least-squares fitting method is more appropriate for calculating the model coefficients.

Create a vector of zeros for later storage of the weights.

W = zeros(length(rnorm),1);

The weights you supply transform the residual variances so that they are constant for different values of `idx`

. Define the weight for each element in `rnorm`

as the reciprocal of the residual variance for the corresponding value in `idx`

. Then fit the model with the weights.

for k=1:20 rnorm_idx = rnorm(idx==k); recipvar = 1/var(rnorm_idx); w = (idx==k).*recipvar; W = W+w; end [wp3fit,~,wfitinfo] = fit(idx,rnorm,"poly3","Weights",W)

wp3fit = Linear model Poly3: wp3fit(x) = p1*x^3 + p2*x^2 + p3*x + p4 Coefficients (with 95% confidence bounds): p1 = 0.04894 (0.04419, 0.0537) p2 = 0.03601 (-0.08744, 0.1595) p3 = -0.4262 (-1.253, 0.4009) p4 = 0.9836 (-0.1959, 2.163)

`wfitinfo = `*struct with fields:*
numobs: 400
numparam: 4
residuals: [400×1 double]
Jacobian: [400×4 double]
exitflag: 1
algorithm: 'QR factorization and solve'
iterations: 1

`wp3fit`

and `wfitinfo`

contain the results of the weighted least-squares fitting.

Display `p3fit`

, `wp3fit`

, and `rnorm`

in the same plot.

plot(p3fit,idx,rnorm) hold on plot(wp3fit,"g") xlabel("idx") ylabel("rnorm") legend(["rnorm","linear least-squares fit","weighted least-squares fit"]) hold off

The plot shows `wp3fit`

closely tracking `p3fit`

.

You can determine whether `wp3fit`

is a better fit than `p3fit`

by plotting the residuals.

stem(idx,wfitinfo.residuals) xlabel("idx") ylabel("residuals")

The output shows that the `wp3fit`

residuals are smaller than the `p3fit`

residuals. The variances of the `wp3fit`

residuals are also more similar for different values of `idx`

than the variances of the `p3fit`

residuals.