How to extract specific rows with requirements from multiple CSV files?
5 views (last 30 days)
Show older comments
Hi,
I am having a question regarding to data extracting and need your help.
I have roughly 700 CSV files under three folders with the same format. Each CSV contains 40 columns, and the columns I am interested in are the 32th and 33th column, which are "Latitude" and "Longitude".
Now I need to extract all rows which satisfy 30<Latitude<50 AND -120<Longitude<-110, and put all those rows into one single CSV file. What I have known so far is to open the file, textscan those data into 40 different cells, and close the file.
Could someone help me on this issue? Thanks!
Typical CSV files are like:
Date Time ...... Latitude Longitude ......
08/27 7:00 ...... 33.44 110.43 ......
08/28 7:00 ...... 35.44 160.43 ......
08/29 7:00 ...... 33.44 120.48 ......
08/29 7:00 ...... 32.44 150.42 ......
08/29 7:00 ...... 38.44 130.46 ......
08/29 7:00 ...... 53.44 110.63 ......
......
Two out of 600 sample CSV files have been attached.
2 Comments
Akira Agata
on 28 Aug 2017
To import CSV file, readtable function would be more easier, because it reads csv data and saves it to table format.
If you could upload your csv file here, I (...or someone) can provide more detailed solution. So, I would recommend uploading your sample data.
Accepted Answer
Akira Agata
on 28 Aug 2017
Hi Ziran-san, thank you for uploading some sample csv files. Thanks to these files, I could understand the point!
Based on these files, I believe the solution for your problem will be like the following. In this code, I assumed that your csv files are stored in ./data folder.
But looking at your sample csv files, there are no rows that matches your condition. So, using your two csv files, the output of the following code is empty (I hope this is simply due to these two sample files...). Anyway, I would be happy if my answer could be your help! :-)
% List of CSV files
dataFolder = './data/'
list = dir([dataFolder,'*.csv']);
% CSV data format
format = ['%{MM/dd/yy HH:mm:ss}D',repmat('%f',1,39)];
% Extraction condition
latLim = [33.975, 33.982];
lonLim = [-117.373, -117.331];
% Read each CSV file, extract rows and store them to 'Output'
Output = table();
for kk = 1:numel(list)
data = readtable(...
fullfile(list(kk).folder, list(kk).name),...
'HeaderLines',2,...
'Format', format);
idxLat = (data{:,32} >= latLim(1)) & (data{:,32} <= latLim(2));
idxLon = (data{:,33} >= lonLim(1)) & (data{:,33} <= lonLim(2));
idx = idxLat & idxLon;
Output = [Output; data(idx,:)];
end
3 Comments
More Answers (0)
See Also
Categories
Find more on Application Deployment in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!