Replacing missing data in timetable with values from another dataset

3 views (last 30 days)
Ben Wright
Ben Wright on 26 Nov 2021
Answered: Peter Perkins on 28 Nov 2021
Hi all - I have a reasonably large time-series dataset (57777-by-20) of daily mean temperature for 1980 to present day, recorded from a weather station in Denmark. Within the time-series there are several large gaps in the coverage (> 1-2 months) due to maintenance and servicing of the weather station.
The gaps are too large to consider interpolating between missing data, so I would like to replace the missing values with data recorded from a temporary weather station, that was sited near the main station during periods when it was unable to operate. Having visually and statistically investigate the similarity between the two datasets, this seems a sound approach.
I've imported both datasets from .txt files, converted them to timetables, and located and replaced missing data with nan's.
However, I've been unable to successfully replace the nan's with like-for-like data from the secondary dataset (i.e., replacing with data from the exact same datetime). The datasets cover different temporal ranges, so indexing and replacing cells between the data has not worked (see code below).
I think I'm looking for a similar method but indexing both the rowtimes and the missing data from the temperature variable.
It would be great to hear from anyone with a solution to this problem, or with a similar method for replacing missing data within a time-series with values from a different dataset?
% index to find cells containing no-data
[idx] = find(data1.Temp == 999 | data1.Temp == -999);
% replace no-data vaules with nan
data1.Temp(idx) = nan;
% index to find nan's
idx = isnan(data1.Temp);
% DOES NOT WORK SIMPLY REPLACING NAN'S BY INDEXING THE TWO DATASETS,
% AS THEY COVER DIFFERENT TEMPORAL RNAGES
data1.Temp(idx) = data2.Temp(idx);

Accepted Answer

Peter Perkins
Peter Perkins on 28 Nov 2021
IIUC, I think the simplest thing to do would be to remove the rows with missng data, vertically concatenate with the other data, then sort by time:
>> tt1 = timetable(rand(10,1),rand(10,1),'RowTimes',datetime(2021,11,1:10));
tt1.Var1(4:6) = NaN; tt1.Var2(4:6) = NaN
tt1 =
10×2 timetable
Time Var1 Var2
___________ _______ ________
01-Nov-2021 0.2638 0.51325
02-Nov-2021 0.14554 0.40181
03-Nov-2021 0.13607 0.075967
04-Nov-2021 NaN NaN
05-Nov-2021 NaN NaN
06-Nov-2021 NaN NaN
07-Nov-2021 0.14495 0.23995
08-Nov-2021 0.85303 0.41727
09-Nov-2021 0.62206 0.049654
10-Nov-2021 0.35095 0.90272
>> tt2 = timetable(rand(3,1),rand(3,1),'RowTimes',datetime(2021,11,4:6))
tt2 =
3×2 timetable
Time Var1 Var2
___________ _______ _______
04-Nov-2021 0.94479 0.33772
05-Nov-2021 0.49086 0.90005
06-Nov-2021 0.48925 0.36925
>> tt = sortrows([tt1(~isnan(tt1.Var1),:); tt2])
tt =
10×2 timetable
Time Var1 Var2
___________ _______ ________
01-Nov-2021 0.2638 0.51325
02-Nov-2021 0.14554 0.40181
03-Nov-2021 0.13607 0.075967
04-Nov-2021 0.94479 0.33772
05-Nov-2021 0.49086 0.90005
06-Nov-2021 0.48925 0.36925
07-Nov-2021 0.14495 0.23995
08-Nov-2021 0.85303 0.41727
09-Nov-2021 0.62206 0.049654
10-Nov-2021 0.35095 0.90272
But another way is to assign into one timetable from the other by time:
>> tt = tt1; tt(tt2.Time,:) = tt2
tt =
10×2 timetable
Time Var1 Var2
___________ _______ ________
01-Nov-2021 0.2638 0.51325
02-Nov-2021 0.14554 0.40181
03-Nov-2021 0.13607 0.075967
04-Nov-2021 0.94479 0.33772
05-Nov-2021 0.49086 0.90005
06-Nov-2021 0.48925 0.36925
07-Nov-2021 0.14495 0.23995
08-Nov-2021 0.85303 0.41727
09-Nov-2021 0.62206 0.049654
10-Nov-2021 0.35095 0.90272

More Answers (0)

Products


Release

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!