How to extract specific rows with requirements from multiple CSV files?

5 views (last 30 days)
Hi,
I am having a question regarding to data extracting and need your help.
I have roughly 700 CSV files under three folders with the same format. Each CSV contains 40 columns, and the columns I am interested in are the 32th and 33th column, which are "Latitude" and "Longitude".
Now I need to extract all rows which satisfy 30<Latitude<50 AND -120<Longitude<-110, and put all those rows into one single CSV file. What I have known so far is to open the file, textscan those data into 40 different cells, and close the file.
Could someone help me on this issue? Thanks!
Typical CSV files are like:
Date Time ...... Latitude Longitude ......
08/27 7:00 ...... 33.44 110.43 ......
08/28 7:00 ...... 35.44 160.43 ......
08/29 7:00 ...... 33.44 120.48 ......
08/29 7:00 ...... 32.44 150.42 ......
08/29 7:00 ...... 38.44 130.46 ......
08/29 7:00 ...... 53.44 110.63 ......
......
Two out of 600 sample CSV files have been attached.
  2 Comments
Akira Agata
Akira Agata on 28 Aug 2017
To import CSV file, readtable function would be more easier, because it reads csv data and saves it to table format.
Regarding an extraction of specific rows in table, Access Data in a Table page will be helpful.
If you could upload your csv file here, I (...or someone) can provide more detailed solution. So, I would recommend uploading your sample data.
Ryan Wang
Ryan Wang on 28 Aug 2017
Thank you very much Akira. I just uploaded two csv files, where I have nearly 600 of them. Specifically, the latitude and longitude columns are the 32th and the 33th, and their requirements are 33.975~33.982, and -117.373~-117.331, respectively.

Sign in to comment.

Accepted Answer

Akira Agata
Akira Agata on 28 Aug 2017
Hi Ziran-san, thank you for uploading some sample csv files. Thanks to these files, I could understand the point!
Based on these files, I believe the solution for your problem will be like the following. In this code, I assumed that your csv files are stored in ./data folder.
But looking at your sample csv files, there are no rows that matches your condition. So, using your two csv files, the output of the following code is empty (I hope this is simply due to these two sample files...). Anyway, I would be happy if my answer could be your help! :-)
% List of CSV files
dataFolder = './data/'
list = dir([dataFolder,'*.csv']);
% CSV data format
format = ['%{MM/dd/yy HH:mm:ss}D',repmat('%f',1,39)];
% Extraction condition
latLim = [33.975, 33.982];
lonLim = [-117.373, -117.331];
% Read each CSV file, extract rows and store them to 'Output'
Output = table();
for kk = 1:numel(list)
data = readtable(...
fullfile(list(kk).folder, list(kk).name),...
'HeaderLines',2,...
'Format', format);
idxLat = (data{:,32} >= latLim(1)) & (data{:,32} <= latLim(2));
idxLon = (data{:,33} >= lonLim(1)) & (data{:,33} <= lonLim(2));
idx = idxLat & idxLon;
Output = [Output; data(idx,:)];
end
  3 Comments
Ryan Wang
Ryan Wang on 4 Sep 2017
Hi Akira,
Just one further question. I am getting an error "Reference to non-existent field 'folder' today. Before this happened, the code works pretty well. Do you know why does this happen?
Thanks,
Ziran

Sign in to comment.

More Answers (0)

Categories

Find more on Application Deployment in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!