# How can I do data cleaning/ data smoothing?

lil brain on 15 Mar 2023
Answered: Gayatri Rathod on 3 Apr 2023
I have a cell array called "pre_data" with 1 column and 27 rows. Each element in the column contains a cell with 21 columns and a varying number of rows.
I want to scan the columns in the cells of "pre_data". For each column, if any value in that column is above 3 standard deviations of that column, then I want the row containing that value to be removed.
Additionally, I want to create a cell array called "removed_pre_data" that has the same structure as "pre_data" but includes all the values that were removed.
How would I go about doing that?

Gayatri Rathod on 3 Apr 2023
Hi Lil,
To accomplish this task, you can use a loop to iterate through each column of each cell in the pre_data cell array. For each column, you can compute the mean and standard deviation of the values and identify which rows have values that are above 3 standard deviations. You can then remove those rows and store them in a separate cell array called removed_pre_data.
You can follow the below steps to achieve the desired result:
• Initialize an empty cell array called "removed_pre_data" with the same structure as "pre_data".
removed_pre_data = cell(size(pre_data));
• Loop through each cell in the column of "pre_data" and compute the mean and standard deviation of each column using the mean and std functions.
Col_mean = mean(col_data); % returns the mean of the elements of col_data
Col_std = std(col_data); % returns the standard deviation of the elements of col_data
• Loop through each cell in the column of "pre_data" again and remove the rows that contain values greater than 3 standard deviations away from the mean using the find function.
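outlier_rows = find(abs(col_data - Col_mean) > 3*Col_std); % row indices of values more than 3 standard deviations from the mean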
• Collect the removed rows of the current cell in a temporary variable, for example "removed_data".
• Assign the remaining rows back to the corresponding cell of "pre_data".
• Assign the removed rows to the corresponding cell of "removed_pre_data".
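The steps above can be put together roughly as follows. This is only a minimal sketch, assuming each cell of pre_data holds a numeric matrix (if the inner elements are themselves cell arrays, they would first need converting, e.g. with cell2mat) and that a value counts as an outlier when it lies more than 3 standard deviations from its column mean in either direction. The names data, col_mean, col_std, is_outlier and is_outlier_row are illustrative, and the small pre_data built at the top is made-up example data so the snippet runs on its own.

```matlab
% Made-up example data: 2 cells, each a numeric matrix with 3 columns
% (the real pre_data would have 27 cells with 21 columns each).
rng(0);                              % reproducible random numbers
pre_data = {randn(50, 3); randn(40, 3)};
pre_data{1}(10, 2) = 100;            % plant an obvious outlier in row 10

% Same outer structure as pre_data, filled in cell by cell below.
removed_pre_data = cell(size(pre_data));

for k = 1:numel(pre_data)
    data = pre_data{k};

    % Per-column mean and standard deviation (1-by-nCols row vectors).
    col_mean = mean(data, 1);
    col_std  = std(data, 0, 1);

    % Logical matrix: true where a value lies more than 3 standard
    % deviations from its column mean (implicit expansion, R2016b+).
    is_outlier = abs(data - col_mean) > 3 * col_std;

    % A row is removed if any of its columns holds an outlier.
    is_outlier_row = any(is_outlier, 2);

    removed_pre_data{k} = data(is_outlier_row, :);   % removed rows
    pre_data{k}         = data(~is_outlier_row, :);  % remaining rows
end
```

Logical indexing is used here instead of find, which gives the same rows without an extra step. If the data contain NaN values, mean and std both accept an 'omitnan' flag.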
You can read more about the cell, mean, std and find functions in the MATLAB documentation.
Hope it helps!
Regards,
Gayatri Rathod