knnimpute in training/testing sets
Dear support
I am planning to convert my machine learning code from R to MATLAB. The code imputes missing values using KNN. In the R code, I impute the missing data after splitting it into training and testing sets, to prevent double dipping. The R code is simply as follows:
- Impute missing values in the training dataset (mltrain) only:
- mltrain2 <- DMwR::knnImputation(mltrain)
- Impute missing values in the testing dataset (mltest), passing the training dataset as the data frame in which the neighbours are searched:
- mltest <- DMwR::knnImputation(mltest,distData = mltrain)
In MATLAB, I tried to use knnimpute on the training and testing datasets separately, in the same way as the R code above. However, there is no option to pass the training data when imputing the missing values of the testing dataset.
Any suggestions on how to solve this issue?
Sincerely
Salim AL-Wasity
Answers (1)
Aditya Patil
on 24 Dec 2020
Currently, this functionality is not available in knnimpute. I have brought this request to the attention of the relevant developers, and it might be considered for a future release.
As a workaround, you can train regression models on the training data and use them to predict the missing values in the test dataset. Multiple models might be required if data is missing in multiple columns.
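A minimal sketch of this workaround might look like the following. It is only an illustration under several assumptions: the data are numeric matrices with NaN marking missing entries, the variable names (Xtrain, Xtest, XtestImputed) and the toy data are hypothetical, and fitrtree from Statistics and Machine Learning Toolbox is used as the regression model simply because surrogate splits let it tolerate NaNs in the remaining predictors. Any other regression model could be swapped in. Note that this is regression-based imputation, not KNN imputation.

```matlab
% Sketch: fit one regression model per incomplete column using the
% training set only, then use it to fill NaNs in the test set.
% Assumes Statistics and Machine Learning Toolbox (fitrtree, predict).
% Xtrain and Xtest are hypothetical toy matrices, not the asker's data.
rng(0);                                    % reproducible toy data
Xtrain = randn(100, 4);
Xtrain(:, 4) = Xtrain(:, 1) + 0.1 * randn(100, 1);
Xtest = randn(20, 4);
Xtest(:, 4) = Xtest(:, 1) + 0.1 * randn(20, 1);
Xtest(3, 4) = NaN;                         % inject missing values
Xtest(7, 2) = NaN;

XtestImputed = Xtest;
for j = 1:size(Xtrain, 2)
    missTest = isnan(Xtest(:, j));         % test rows missing column j
    if ~any(missTest)
        continue;                          % nothing to impute in this column
    end
    others = setdiff(1:size(Xtrain, 2), j);
    hasResp = ~isnan(Xtrain(:, j));        % training rows where column j is observed
    % Surrogate splits help the tree cope with NaNs in the other predictors.
    mdl = fitrtree(Xtrain(hasResp, others), Xtrain(hasResp, j), ...
        'Surrogate', 'on');
    % The model has only seen training data, so no test information leaks in.
    XtestImputed(missTest, j) = predict(mdl, Xtest(missTest, others));
end
```

Because each model is fitted on the training rows only, the test set is imputed without being used to estimate anything, which matches the intent of passing distData in the R code.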