
Imputing missing values for a RM-ANOVA

lil brain on 15 Mar 2023
Commented: Scott MacKenzie on 3 Apr 2023
I have 4 experimental groups/conditions and 5 measurement times for each group/condition. Each participant took part in only one of the 4 conditions. In total there are 27 participants, and each condition has around 6 participants.
In my data set, several participants are missing a value at one of the 5 measurement times. So there are several cases (rows) where the participant has, for example, only 4 of the 5 measurements. The missing values are completely random and occur in all conditions.
My problem is the following: because RM-ANOVA does case-wise deletion, I end up with around 6 fewer cases, which severely impacts my results. Is it feasible to impute the missing data, for example using a regression? And how many values can I impute before it is no longer feasible?
Many thanks!
Jeff on 15 Mar 2023
As an alternative to imputation, you could analyze the data using dummy variable regression. Set up dummy variables to code the conditions, time points, interaction, and subjects, and then assess the sums of squares (SS) for each set of terms via regression model comparisons. You'll know there are missing data because there are no rows for some of the possible combinations of dummy variables (e.g., time point 3 for subject 4 in group 1), but the regression algorithm doesn't need a complete dataset to produce least-squares estimates. A nice feature of this approach is that the degrees of freedom (df) will be computed "automatically" from the real data you do have, with no need to make df adjustments reflecting imputation.
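A minimal sketch of the model-comparison idea, in Python with NumPy rather than MATLAB. Everything here is an assumption made for illustration: the tiny long-format data set (4 subjects, 2 groups, 3 time points, one missing observation), the helper names `one_hot` and `fit`, and the choice to test only the group x time interaction. The missing measurement is simply an absent row, and the df come from the ranks of the design matrices.

```python
import numpy as np

def one_hot(labels):
    """Return one 0/1 dummy column per distinct label."""
    levels = sorted(set(labels))
    return np.array([[1.0 if lab == lev else 0.0 for lev in levels]
                     for lab in labels])

def fit(columns, y):
    """Least-squares fit; return residual SS and rank of the design matrix."""
    X = np.hstack(columns)
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), int(rank)

# Hypothetical long-format data: (subject, group, time point, score).
# Subject s2 has no row for time point 3, i.e. one missing measurement.
rows = [
    ("s1", "A", 1, 4.0), ("s1", "A", 2, 5.0), ("s1", "A", 3, 7.0),
    ("s2", "A", 1, 3.0), ("s2", "A", 2, 5.0),
    ("s3", "B", 1, 5.0), ("s3", "B", 2, 5.0), ("s3", "B", 3, 6.0),
    ("s4", "B", 1, 6.0), ("s4", "B", 2, 7.0), ("s4", "B", 3, 6.0),
]
subj = [r[0] for r in rows]
grp = [r[1] for r in rows]
tpt = [str(r[2]) for r in rows]
y = np.array([r[3] for r in rows])

intercept = np.ones((len(y), 1))
S = one_hot(subj)   # subject dummies (these also absorb the group effect)
T = one_hot(tpt)    # time-point dummies
GT = one_hot([g + ":" + t for g, t in zip(grp, tpt)])  # interaction dummies

# Full model versus the model without the interaction term.
sse_full, rank_full = fit([intercept, S, T, GT], y)
sse_reduced, rank_reduced = fit([intercept, S, T], y)

# df follow from the ranks, so the missing row is accounted for automatically.
df_interaction = rank_full - rank_reduced
df_error = len(y) - rank_full
ss_interaction = sse_reduced - sse_full
F = (ss_interaction / df_interaction) / (sse_full / df_error)
print(df_interaction, df_error, F)
```

The dummy columns are deliberately redundant (no reference level is dropped); `np.linalg.lstsq` still returns a least-squares solution and reports the rank, which is all the comparison needs. Testing the between-subjects group effect would need a different error term (subjects within groups), which this sketch does not cover.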
Scott MacKenzie on 3 Apr 2023
Provided the missing values represent a small percentage of all values and they are more-or-less randomly dispersed across the data set, a technique you can consider is to replace the missing values with the grand mean, that is, the mean of all the other values. Then, proceed with the ANOVA. The ANOVA will be more conservative than if the missing values were present. So, if the ANOVA yields any significant effects, they are likely valid (since they are on the conservative side).
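The grand-mean replacement described above could be sketched as follows, in plain Python. The data (3 participants by 5 measurement times, with missing values stored as NaN) and the function name `impute_grand_mean` are hypothetical.

```python
import math
from statistics import mean

def impute_grand_mean(rows):
    """Replace NaN entries with the mean of all observed values."""
    observed = [v for row in rows for v in row if not math.isnan(v)]
    grand = mean(observed)
    filled = [[grand if math.isnan(v) else v for v in row] for row in rows]
    n_imputed = sum(math.isnan(v) for row in rows for v in row)
    return filled, grand, n_imputed

# Hypothetical scores: one row per participant, one column per measurement time.
nan = float("nan")
scores = [
    [4.0, 5.0, 6.0, 5.0, 7.0],
    [3.0, nan, 5.0, 6.0, 6.0],
    [5.0, 6.0, 7.0, nan, 8.0],
]

filled, grand, n_imputed = impute_grand_mean(scores)
print(f"grand mean = {grand:.3f}, values imputed = {n_imputed}")
```

The function also returns the number of imputed values, so the proportion of filled-in data can be checked against the "small percentage" condition, and so that count is on hand if a df adjustment for imputation is wanted.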
