Splitting time series sediment data for each station (site_no.)

I have one very large time series file in .tsv format covering about 2000 site numbers. It has roughly 20,000,000 rows and 9 columns. I want to split the time series by site number and save a separate file for each site. I tried using textscan and fgetl in MATLAB, but could not get them to work effectively. Any help would be appreciated. A sample of the time series:

Answers (2)

KSSV on 2 Jan 2018
You may follow these steps:
1. Load the file using textscan. You can pass the 'HeaderLines' option to skip the header lines and load only the data.
2. Extract the column of station IDs.
3. Use ismember to pick out the rows that belong to each station ID.
So overall, you need to read about textscan and ismember.
  5 Comments
KSSV on 2 Jan 2018
fid = fopen('daily_data.tsv') ;
% Read six fields per row; the trailing %[^\n] captures the rest of the line
str = ['%s','%f','%s','%f','%s','%f%[^\n]'] ;
S = textscan(fid,str,'HeaderLines',17);
S(end) = [] ; % discard the rest-of-line column
fclose(fid) ;
%% Site number
SN = S{2} ;
%% Time series
time = S{3} ;
%% Separate the row indices for each station
SN1 = unique(SN) ;
NSN = length(SN1) ; % total number of stations
iwant = cell(NSN,1) ;
for i = 1:NSN
    idx = ismember(SN,SN1(i)) ; % logical mask of rows for station i
    iwant{i} = time(idx) ;
end

Abaye Getahun Abebe on 2 Jan 2018
Thanks for the help. But it reads only the first four stations. After the fourth station there is a one-line header again. How can I skip the header that appears after the fourth station?
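
One possible way to handle the repeated header is to read the file line by line with fgetl and keep only the rows whose site-number field is numeric. The sketch below is not tested against the real file. It assumes the file is tab-delimited, that the site number is the second field (as in SN = S{2} above), and that header and comment lines have a non-numeric second field. The sample file contents, the column names in it, and the site_<number>.tsv output names are hypothetical, made up so the example runs on its own.

```matlab
% Build a tiny hypothetical input file so the sketch is self-contained.
inFile = [tempname '.tsv'];
fid = fopen(inFile, 'w');
fprintf(fid, '# comment line\n');
fprintf(fid, 'agency\tsite_no\tdate\tvalue\n');
fprintf(fid, 'A\t101\t2017-01-01\t1.5\n');
fprintf(fid, 'A\t101\t2017-01-02\t2.0\n');
fprintf(fid, 'agency\tsite_no\tdate\tvalue\n');   % header repeated mid-file
fprintf(fid, 'A\t202\t2017-01-01\t0.7\n');
fclose(fid);

% Output folder for the per-site files (hypothetical location).
outDir = tempname;
mkdir(outDir);

fin = fopen(inFile, 'r');
nKept = 0;                       % number of data rows written
line = fgetl(fin);
while ischar(line)               % fgetl returns -1 at end of file
    fields = strsplit(line, '\t');
    % Keep the row only if the second field parses as a number;
    % header and comment lines fail this check and are skipped.
    if numel(fields) >= 2 && ~isnan(str2double(fields{2}))
        outFile = fullfile(outDir, ['site_' fields{2} '.tsv']);
        fout = fopen(outFile, 'a');      % append to this site's file
        fprintf(fout, '%s\n', line);
        fclose(fout);
        nKept = nKept + 1;
    end
    line = fgetl(fin);
end
fclose(fin);
```

Reopening the output file for every row keeps the sketch short, but it will likely be slow for about 20,000,000 rows. Since rows for one station appear to be stored together, keeping the current output file open until the site number changes should cut most of that cost.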
