Handling Undesirable Characters in Numeric Columns When Reading a CSV File
    6 views (last 30 days)
  
       Show older comments
    
    Geovane Gomes
 on 31 Jan 2024
  
    
    
    
    
    Commented: Geovane Gomes
 on 31 Jan 2024
            Dear all,
I need to read a .csv file which has some numeric columns that contain undesirable characters, so when I import it using readtable, the lines with these characters ( [u.]) are showed as NAN. Is it possible to remove it before? Or maybe read it as char, then remove and convert to double?
Here some lines of the file:

3 Comments
Accepted Answer
  Stephen23
      
      
 on 31 Jan 2024
        
      Edited: Stephen23
      
      
 on 31 Jan 2024
  
      You should NOT make changes to the original data file.
fnm = 'p1.csv';
opt = detectImportOptions(fnm, 'FileType','delimitedtext',...
    'Delimiter',';', 'DecimalSeparator',',', 'VariableNamingRule','preserve');
opt = setvartype(opt,'double');
opt = setvartype(opt,'MoveStop','datetime');
opt = setvartype(opt,'Operador','string');
idx = strcmp('double',opt.VariableTypes);
opt = setvaropts(opt,idx,'TrimNonNumeric',true);
tbl = readtable(fnm,opt)
tbl = convertvars(tbl,@isstring,@(s)regexprep(s,'\s+\[.*','')) % optional
tbl(180:190,:) % checking the rows where [u.] first appears:
More Answers (1)
  Austin M. Weber
      
 on 31 Jan 2024
        
      Edited: Austin M. Weber
      
 on 31 Jan 2024
  
      If you have Microsoft Excel (or Google Sheets) you can use the Replace Tool to get rid of the [u.] in all of the columns. Simply open your CSV file in Excel, choose the "Replace" tool (I think the shortcut in Windows is Ctrl+H). A dialog box should appear. In the "Find what:" box type [u.] and leave the "Replace with:" box blank. Then click the "Replace All" button. This should get rid of all instances of [u.] in your spreadsheet. Just save the CSV and then you can load it into MATLAB as normal. 
Alternatively, you could probably also do this in MATLAB by converting the numeric columns into strings and then using the strrep function:
% Read table data
T = readtable('table_name.csv')
% Extract the numeric columns in the table
numeric_data = T{:,2:10};
% Convert to strings
numeric_strings = string(numeric_data);
% Delete [u.]
numeric_strings = strrep(numeric_strings, '[u.]','');
% Convert back to numeric data
numeric_data = str2num(numeric_strings);
% Insert back into the table
T{:,2:10} = numeric_data;
Try either of these and let me know if they work.
5 Comments
  Austin M. Weber
      
 on 31 Jan 2024
				@Geovane Gomes, I did not see that you had shared your CSV file before I submitted my original answer. But I just I did the Excel procedure with the file and it seems to work perfectly. When I finished, I saved the file as a tab-delimited text file rather than a CSV so that the commas in your numbers aren't confused as delimiters. You can load the file in MATLAB by typing:
T = readtable('p1_tab_delimited.txt','Delimiter','tab');
  Stephen23
      
      
 on 31 Jan 2024
				"I think I'll have to remove it using another tool before import."
Not required, you can use the import options.
See Also
Categories
				Find more on Data Import from MATLAB in Help Center and File Exchange
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!



