Replacing missing data in timetable with values from another dataset

13 views (last 30 days)
Hi all - I have a reasonably large time-series dataset (57777-by-20) of daily mean temperature for 1980 to present day, recorded from a weather station in Denmark. Within the time-series there are several large gaps in the coverage (> 1-2 months) due to maintenance and servicing of the weather station.
The gaps are too large to consider interpolating between missing data, so I would like to replace the missing values with data recorded from a temporary weather station, that was sited near the main station during periods when it was unable to operate. Having visually and statistically investigate the similarity between the two datasets, this seems a sound approach.
I've imported both datasets from .txt files, converted them to timetables, and located and replaced missing data with nan's.
However, I've been unable to successfully replace the nan's with like-for-like data from the secondary dataset (i.e., replacing with data from the exact same datetime). The datasets cover different temporal ranges, so indexing and replacing cells between the data has not worked (see code below).
I think I'm looking for a similar method but indexing both the rowtimes and the missing data from the temperature variable.
It would be great to hear from anyone with a solution to this problem, or with a similar method for replacing missing data within a time-series with values from a different dataset?
% index to find cells containing no-data
[idx] = find(data1.Temp == 999 | data1.Temp == -999);
% replace no-data vaules with nan
data1.Temp(idx) = nan;
% index to find nan's
idx = isnan(data1.Temp);
% DOES NOT WORK SIMPLY REPLACING NAN'S BY INDEXING THE TWO DATASETS,
% AS THEY COVER DIFFERENT TEMPORAL RNAGES
data1.Temp(idx) = data2.Temp(idx);

Accepted Answer

Peter Perkins
Peter Perkins on 28 Nov 2021
IIUC, I think the simplest thing to do would be to remove the rows with missng data, vertically concatenate with the other data, then sort by time:
>> tt1 = timetable(rand(10,1),rand(10,1),'RowTimes',datetime(2021,11,1:10));
tt1.Var1(4:6) = NaN; tt1.Var2(4:6) = NaN
tt1 =
10×2 timetable
Time Var1 Var2
___________ _______ ________
01-Nov-2021 0.2638 0.51325
02-Nov-2021 0.14554 0.40181
03-Nov-2021 0.13607 0.075967
04-Nov-2021 NaN NaN
05-Nov-2021 NaN NaN
06-Nov-2021 NaN NaN
07-Nov-2021 0.14495 0.23995
08-Nov-2021 0.85303 0.41727
09-Nov-2021 0.62206 0.049654
10-Nov-2021 0.35095 0.90272
>> tt2 = timetable(rand(3,1),rand(3,1),'RowTimes',datetime(2021,11,4:6))
tt2 =
3×2 timetable
Time Var1 Var2
___________ _______ _______
04-Nov-2021 0.94479 0.33772
05-Nov-2021 0.49086 0.90005
06-Nov-2021 0.48925 0.36925
>> tt = sortrows([tt1(~isnan(tt1.Var1),:); tt2])
tt =
10×2 timetable
Time Var1 Var2
___________ _______ ________
01-Nov-2021 0.2638 0.51325
02-Nov-2021 0.14554 0.40181
03-Nov-2021 0.13607 0.075967
04-Nov-2021 0.94479 0.33772
05-Nov-2021 0.49086 0.90005
06-Nov-2021 0.48925 0.36925
07-Nov-2021 0.14495 0.23995
08-Nov-2021 0.85303 0.41727
09-Nov-2021 0.62206 0.049654
10-Nov-2021 0.35095 0.90272
But another way is to assign into one timetable from the other by time:
>> tt = tt1; tt(tt2.Time,:) = tt2
tt =
10×2 timetable
Time Var1 Var2
___________ _______ ________
01-Nov-2021 0.2638 0.51325
02-Nov-2021 0.14554 0.40181
03-Nov-2021 0.13607 0.075967
04-Nov-2021 0.94479 0.33772
05-Nov-2021 0.49086 0.90005
06-Nov-2021 0.48925 0.36925
07-Nov-2021 0.14495 0.23995
08-Nov-2021 0.85303 0.41727
09-Nov-2021 0.62206 0.049654
10-Nov-2021 0.35095 0.90272
  2 Comments
Albert Johan Mamani Larico
This help me a lot!
How to keep values from original timetable(TT1) when I have also values for same datetime in the TT2?
For example, if TT2 has values for nov 7, I would like to keep the original values of TT1 for the same date. What would you recommend?
tt2 = timetable(rand(4,1),rand(4,1),'RowTimes',datetime(2021,11,4:7))
tt2 = 4×2 timetable
Time Var1 Var2 ___________ _______ _______ 04-Nov-2021 0.17953 0.72559 05-Nov-2021 0.96159 0.22381 06-Nov-2021 0.5495 0.59069 07-Nov-2021 0.74658 0.76516
Peter Perkins
Peter Perkins on 20 Oct 2023
I only just now saw this.
Not sure what the question is. The row times for a timetable do not need to be unique, there can be two or more 7-Nov's. You should look at retime if the question involves, "how do I fill in holes"?

Sign in to comment.

More Answers (0)

Categories

Find more on Time Series in Help Center and File Exchange

Products


Release

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!