Categorical data preprocessing for data mining

Hello friends,
I have been working on the Tanzania wells state problem, using Taarifa data obtained from DrivenData, for my ML practice, and I am now trying to remove misspellings in the installer and funder columns. If anyone has tried this, please help me with how to go about it. If there is a faster way, that would be very helpful.
Oh, thanks
I am trying to clean out misspellings from the installer and funder columns. For the moment I am using regular expressions, but the dataset is large and this is taking a long time.
For instance, to correct the misspellings of "world bank" I tried this expression, which is still failing:
pat11='wo(rd|rdl|uld|rld)?\s((b\w*|nk|divisio)$)?[^vd]';
newDataClean.installer=regexprep(newDataClean.installer,pat11,'world bank');
Here I was testing the expression in Atom, but it fails to correctly replace the selected words.
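One likely cause of the failure: inside the optional group, `$` must be followed by `[^vd]`, and no character can follow the end of the text, so that group is always skipped. The pattern then consumes only the first letter of the second word (for example "world b"), and the replacement produces "world bankank". The following is a minimal sketch of an anchored alternative, not a definitive fix. The sample values are hypothetical and not taken from the dataset, and the sketch assumes the values can be lower-cased and trimmed first.

```matlab
% Hypothetical sample values; the real installer column will differ.
installer = {'word bank'; 'wordl bank'; 'would bank'; 'world nk'; ...
             'world bank'; 'world vision'; 'wo bank'};

% Anchor the whole value: "wo" plus an optional (mis)spelled ending,
% whitespace, then a second word that starts with "b", is "nk",
% or starts with "divisio". Anything else (e.g. "vision") is left alone.
pat = '^wo(rdl|rld|rd|uld)?\s+(b\w*|nk|divisio\w*)$';

% Normalise case and surrounding spaces, then replace whole matches.
cleaned = regexprep(lower(strtrim(installer)), pat, 'world bank');
disp(cleaned)
```

Because the pattern is anchored with `^` and `$`, a value is either replaced as a whole or left untouched, which avoids partial replacements.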
However, I am still wondering whether there is another, faster way of approaching the issue.
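One possibly faster approach, sketched here under assumptions, is to clean only the distinct spellings and then map the result back to every row, since a column like this usually contains far fewer unique values than rows. The data below is a hypothetical stand-in for the real column, the value 'DWE' is invented for illustration, and the pattern is the same illustrative one rather than a complete rule set.

```matlab
% Hypothetical stand-in for a large column with many repeated values.
installer = repmat({'word bank'; 'World Bank'; 'would bank'; 'DWE'}, 1000, 1);

% Find the distinct spellings once; idx maps every row to its spelling.
[uniqueVals, ~, idx] = unique(lower(strtrim(installer)));

% Clean only the distinct spellings instead of every row.
pat = '^wo(rdl|rld|rd|uld)?\s+(b\w*|nk|divisio\w*)$';
uniqueClean = regexprep(uniqueVals, pat, 'world bank');

% Expand the cleaned spellings back to the full column.
installerClean = uniqueClean(idx);

% Optional: store as categorical so identical labels share one category.
installerCat = categorical(installerClean);
disp(categories(installerCat))
```

Inspecting `uniqueVals` before writing any pattern also shows which misspellings actually occur, which can keep the expressions short.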

1 Comment

Question is not clear. Can you elaborate with an example?

Answers (0)

Asked: 6 Oct 2021

Edited: 6 Oct 2021
