Clean Timetable with Missing, Duplicate, or Nonuniform Times
This example shows how to create a regular timetable from one that has missing, duplicate, or nonuniform times. A timetable is a type of table that associates a time-stamp, or row time, with each row of data. In a regular timetable, the row times are sorted and unique, and differ by the same regular time step.
Also, some toolboxes have functions that work on regularly spaced time series data in the form of numeric arrays. So the example also shows how to export the data from a timetable for use with other functions.
There are a number of issues with row times that can make timetables irregular. The row times can be missing. They can be out of order. They can be duplicates, creating multiple rows with the same time that might have the same or different data. And even when they are present, sorted, and unique, they can differ by time steps of different sizes.
Timetables provide a number of different ways to resolve missing, duplicate, or nonuniform times, and to resample or aggregate data to create regular row times.
To find missing row times, use
ismissing
.To remove missing times and data, use
rmmissing
.To sort a timetable by its row times, use
sortrows
.To make a timetable with unique and sorted row times, use
unique
andretime
.To remove duplicate times, specify a vector of unique times and use
retime
.To make a regular timetable, specify a regular time vector and use
retime
.
Load Timetable
Load a sample timetable from the MAT-file badTimes
that contains weather measurements taken over several hours on June 9, 2016. The timetable TT
includes temperature, rainfall, and wind speed measurements taken at irregular times during that day.
load badTimes
TT
TT=12×3 timetable
Time Temp Rain WindSpeed
____________________ ____ ____ _________
09-Jun-2016 06:01:04 73 0.01 2.3
09-Jun-2016 07:59:23 59 0.08 0.9
09-Jun-2016 09:53:57 59 0.03 3.4
09-Jun-2016 09:53:57 67 0.03 3.4
NaT 56 0 0
09-Jun-2016 09:53:57 67 0.03 3.4
09-Jun-2016 08:49:10 62 0.01 2.7
09-Jun-2016 08:49:10 75.8 0.01 2.7
09-Jun-2016 08:49:10 82 0.01 2.7
09-Jun-2016 05:03:11 66.2 0.05 3
09-Jun-2016 08:49:10 67.2 0.01 2.7
09-Jun-2016 04:12:00 58.8 NaN NaN
Find and Remove Rows with Missing Row Times
One way to begin is by finding and removing rows that have a NaT
, or missing value, as the row time. To find missing values in the vector of row times, use ismissing
. The ismissing
function returns a logical vector that contains 1
wherever TT.Time
has a missing value.
natRowTimes = ismissing(TT.Time)
natRowTimes = 12x1 logical array
0
0
0
0
1
0
0
0
0
0
⋮
To keep only those rows that do not have missing values as row times, index into TT
using ~natRowTimes
as row indices. Assign those rows to a new timetable, goodRowTimesTT
.
goodRowTimesTT = TT(~natRowTimes,:)
goodRowTimesTT=11×3 timetable
Time Temp Rain WindSpeed
____________________ ____ ____ _________
09-Jun-2016 06:01:04 73 0.01 2.3
09-Jun-2016 07:59:23 59 0.08 0.9
09-Jun-2016 09:53:57 59 0.03 3.4
09-Jun-2016 09:53:57 67 0.03 3.4
09-Jun-2016 09:53:57 67 0.03 3.4
09-Jun-2016 08:49:10 62 0.01 2.7
09-Jun-2016 08:49:10 75.8 0.01 2.7
09-Jun-2016 08:49:10 82 0.01 2.7
09-Jun-2016 05:03:11 66.2 0.05 3
09-Jun-2016 08:49:10 67.2 0.01 2.7
09-Jun-2016 04:12:00 58.8 NaN NaN
This method removes only the rows that have missing row times. The timetable variables still might have missing data values. For example, the last row of goodRowTimesTT
has NaN
values for the Rain
and Windspeed
variables.
Remove Rows with Missing Times and Missing Data
As an alternative, you can remove both missing row times and missing data values at the same time by using the rmmissing
function. rmmissing
removes any timetable rows that have missing row times, missing data values, or both.
Display the missing row time and missing data values of TT
.
TT
TT=12×3 timetable
Time Temp Rain WindSpeed
____________________ ____ ____ _________
09-Jun-2016 06:01:04 73 0.01 2.3
09-Jun-2016 07:59:23 59 0.08 0.9
09-Jun-2016 09:53:57 59 0.03 3.4
09-Jun-2016 09:53:57 67 0.03 3.4
NaT 56 0 0
09-Jun-2016 09:53:57 67 0.03 3.4
09-Jun-2016 08:49:10 62 0.01 2.7
09-Jun-2016 08:49:10 75.8 0.01 2.7
09-Jun-2016 08:49:10 82 0.01 2.7
09-Jun-2016 05:03:11 66.2 0.05 3
09-Jun-2016 08:49:10 67.2 0.01 2.7
09-Jun-2016 04:12:00 58.8 NaN NaN
Remove all rows that have missing row times or data values. Assign the remaining rows to the timetable goodValuesTT
.
goodValuesTT = rmmissing(TT)
goodValuesTT=10×3 timetable
Time Temp Rain WindSpeed
____________________ ____ ____ _________
09-Jun-2016 06:01:04 73 0.01 2.3
09-Jun-2016 07:59:23 59 0.08 0.9
09-Jun-2016 09:53:57 59 0.03 3.4
09-Jun-2016 09:53:57 67 0.03 3.4
09-Jun-2016 09:53:57 67 0.03 3.4
09-Jun-2016 08:49:10 62 0.01 2.7
09-Jun-2016 08:49:10 75.8 0.01 2.7
09-Jun-2016 08:49:10 82 0.01 2.7
09-Jun-2016 05:03:11 66.2 0.05 3
09-Jun-2016 08:49:10 67.2 0.01 2.7
Sort Timetable and Determine If It Is Regular
After dealing with missing values, you can go on to sort your timetable and then determine if the sorted timetable is regular.
To determine if goodValuesTT
is already sorted, use the issorted
function.
tf = issorted(goodValuesTT)
tf = logical
0
Since it is not, sort the timetable on its row times by using the sortrows
function.
sortedTT = sortrows(goodValuesTT)
sortedTT=10×3 timetable
Time Temp Rain WindSpeed
____________________ ____ ____ _________
09-Jun-2016 05:03:11 66.2 0.05 3
09-Jun-2016 06:01:04 73 0.01 2.3
09-Jun-2016 07:59:23 59 0.08 0.9
09-Jun-2016 08:49:10 62 0.01 2.7
09-Jun-2016 08:49:10 75.8 0.01 2.7
09-Jun-2016 08:49:10 82 0.01 2.7
09-Jun-2016 08:49:10 67.2 0.01 2.7
09-Jun-2016 09:53:57 59 0.03 3.4
09-Jun-2016 09:53:57 67 0.03 3.4
09-Jun-2016 09:53:57 67 0.03 3.4
Determine whether sortedTT
is regular. A regular timetable has the same time interval between consecutive row times. Even a sorted timetable can have time steps that are not uniform.
tf = isregular(sortedTT)
tf = logical
0
Since it is not, display the differences between row times.
diff(sortedTT.Time)
ans = 9x1 duration
00:57:53
01:58:19
00:49:47
00:00:00
00:00:00
00:00:00
01:04:47
00:00:00
00:00:00
Since the row times are sorted, this result shows that some row times are unique and some are duplicates.
Remove Duplicate Rows
Timetables can have duplicate rows. Timetable rows are duplicates if they have the same row times and the same data values. In this example, the last two rows of sortedTT
are duplicate rows. (There are other rows in sortedTT
that have duplicate row times but differing data values.)
To remove the duplicate rows from sortedTT
, use unique
. The unique
function returns the unique rows and sorts them by their row times.
uniqueRowsTT = unique(sortedTT)
uniqueRowsTT=9×3 timetable
Time Temp Rain WindSpeed
____________________ ____ ____ _________
09-Jun-2016 05:03:11 66.2 0.05 3
09-Jun-2016 06:01:04 73 0.01 2.3
09-Jun-2016 07:59:23 59 0.08 0.9
09-Jun-2016 08:49:10 62 0.01 2.7
09-Jun-2016 08:49:10 67.2 0.01 2.7
09-Jun-2016 08:49:10 75.8 0.01 2.7
09-Jun-2016 08:49:10 82 0.01 2.7
09-Jun-2016 09:53:57 59 0.03 3.4
09-Jun-2016 09:53:57 67 0.03 3.4
Find Rows with Duplicate Times and Different Data
Timetables can have rows with duplicate row times but different data values. In this example, uniqueRowsTT
has several rows with the same row times but different values.
Find the rows that have duplicate row times. First, sort the row times and find consecutive times that have no difference between them. Times with no difference between them are the duplicates. Index back into the vector of row times and return a unique set of times that identify the duplicate row times in uniqueRowsTT
.
dupTimes = sort(uniqueRowsTT.Time); tf = (diff(dupTimes) == 0); dupTimes = dupTimes(tf); dupTimes = unique(dupTimes)
dupTimes = 2x1 datetime
09-Jun-2016 08:49:10
09-Jun-2016 09:53:57
To display the rows with duplicate row times, index into uniqueRowsTT
using dupTimes
. When you index on times, the output timetable contains all rows with matching row times.
uniqueRowsTT(dupTimes,:)
ans=6×3 timetable
Time Temp Rain WindSpeed
____________________ ____ ____ _________
09-Jun-2016 08:49:10 62 0.01 2.7
09-Jun-2016 08:49:10 67.2 0.01 2.7
09-Jun-2016 08:49:10 75.8 0.01 2.7
09-Jun-2016 08:49:10 82 0.01 2.7
09-Jun-2016 09:53:57 59 0.03 3.4
09-Jun-2016 09:53:57 67 0.03 3.4
Select First and Last Rows with Duplicate Times
When a timetable has rows with duplicate times, you might want to select particular rows and discard the other rows having duplicate times. For example, you can select either the first or the last of the rows with duplicate row times by using the unique
and retime
functions.
First, create a vector of unique row times from TT
by using unique
.
uniqueTimes = unique(uniqueRowsTT.Time)
uniqueTimes = 5x1 datetime
09-Jun-2016 05:03:11
09-Jun-2016 06:01:04
09-Jun-2016 07:59:23
09-Jun-2016 08:49:10
09-Jun-2016 09:53:57
Select the first row from each set of rows that have duplicate times. To copy data from the first rows, specify the 'firstvalue'
method.
firstUniqueRowsTT = retime(uniqueRowsTT,uniqueTimes,'firstvalue')
firstUniqueRowsTT=5×3 timetable
Time Temp Rain WindSpeed
____________________ ____ ____ _________
09-Jun-2016 05:03:11 66.2 0.05 3
09-Jun-2016 06:01:04 73 0.01 2.3
09-Jun-2016 07:59:23 59 0.08 0.9
09-Jun-2016 08:49:10 62 0.01 2.7
09-Jun-2016 09:53:57 59 0.03 3.4
Select the last rows from each set of rows that have duplicate times. To copy data from the last rows, specify the 'lastvalue'
method.
lastUniqueRowsTT = retime(uniqueRowsTT,uniqueTimes,'lastvalue')
lastUniqueRowsTT=5×3 timetable
Time Temp Rain WindSpeed
____________________ ____ ____ _________
09-Jun-2016 05:03:11 66.2 0.05 3
09-Jun-2016 06:01:04 73 0.01 2.3
09-Jun-2016 07:59:23 59 0.08 0.9
09-Jun-2016 08:49:10 82 0.01 2.7
09-Jun-2016 09:53:57 67 0.03 3.4
As a result, the last two rows of firstUniqueRowsTT
and lastUniqueRowsTT
have different values in the Temp
variable.
Aggregate Data from All Rows with Duplicate Times
Another way to deal with data in the rows having duplicate times is to aggregate or combine the data values in some way. For example, you can calculate the means of several measurements of the same quantity taken at the same time.
Calculate the mean temperature, rainfall, and wind speed for rows with duplicate row times using the retime
function.
meanTT = retime(uniqueRowsTT,uniqueTimes,'mean')
meanTT=5×3 timetable
Time Temp Rain WindSpeed
____________________ _____ ____ _________
09-Jun-2016 05:03:11 66.2 0.05 3
09-Jun-2016 06:01:04 73 0.01 2.3
09-Jun-2016 07:59:23 59 0.08 0.9
09-Jun-2016 08:49:10 71.75 0.01 2.7
09-Jun-2016 09:53:57 63 0.03 3.4
As a result, the last two rows of meanTT
have mean temperatures in the Temp
variable for the rows with duplicate row times.
Make Timetable Regular
Finally, you can resample data from an irregular timetable to make it regular by using the retime
function. For example, you can interpolate the data from meanTT
onto a regular hourly time vector. To use linear interpolation, specify 'linear'
. Each row time in hourlyTT
begins on the hour, and there is a one-hour interval between consecutive row times.
hourlyTT = retime(meanTT,'hourly','linear')
hourlyTT=6×3 timetable
Time Temp Rain WindSpeed
____________________ ______ ________ _________
09-Jun-2016 05:00:00 65.826 0.0522 3.0385
09-Jun-2016 06:00:00 72.875 0.010737 2.3129
09-Jun-2016 07:00:00 66.027 0.044867 1.6027
09-Jun-2016 08:00:00 59.158 0.079133 0.9223
09-Jun-2016 09:00:00 70.287 0.013344 2.8171
09-Jun-2016 10:00:00 62.183 0.031868 3.4654
Instead of using a predefined time step such as 'hourly'
, you can specify a time step of your own. To specify a time step of 30 minutes, use the 'regular'
input argument and the 'TimeStep'
name-value argument. You can specify a time step of any size as a duration
or calendarDuration
value.
regularTT = retime(meanTT,'regular','linear','TimeStep',minutes(30))
regularTT=11×3 timetable
Time Temp Rain WindSpeed
____________________ ______ ________ _________
09-Jun-2016 05:00:00 65.826 0.0522 3.0385
09-Jun-2016 05:30:00 69.35 0.031468 2.6757
09-Jun-2016 06:00:00 72.875 0.010737 2.3129
09-Jun-2016 06:30:00 69.576 0.027118 1.9576
09-Jun-2016 07:00:00 66.027 0.044867 1.6027
09-Jun-2016 07:30:00 62.477 0.062616 1.2477
09-Jun-2016 08:00:00 59.158 0.079133 0.9223
09-Jun-2016 08:30:00 66.841 0.03695 2.007
09-Jun-2016 09:00:00 70.287 0.013344 2.8171
09-Jun-2016 09:30:00 66.235 0.022606 3.1412
09-Jun-2016 10:00:00 62.183 0.031868 3.4654
Extract Regular Timetable Data into Array
You can export the timetable data for use with functions to analyze data that is regularly spaced in time. For example, the Econometrics Toolbox™ and the Signal Processing Toolbox™ have functions you can use for further analysis on regularly spaced data.
Extract the timetable data as an array. You can use the Variables
property to return the data as an array, as long as the table variables have data types that allow them to be concatenated.
A = regularTT.Variables
A = 11×3
65.8260 0.0522 3.0385
69.3504 0.0315 2.6757
72.8747 0.0107 2.3129
69.5764 0.0271 1.9576
66.0266 0.0449 1.6027
62.4768 0.0626 1.2477
59.1579 0.0791 0.9223
66.8412 0.0370 2.0070
70.2868 0.0133 2.8171
66.2348 0.0226 3.1412
⋮
regularTT.Variables
is equivalent to using curly brace syntax, regularTT{:,:}
, to access the data in the timetable variables.
A2 = regularTT{:,:}
A2 = 11×3
65.8260 0.0522 3.0385
69.3504 0.0315 2.6757
72.8747 0.0107 2.3129
69.5764 0.0271 1.9576
66.0266 0.0449 1.6027
62.4768 0.0626 1.2477
59.1579 0.0791 0.9223
66.8412 0.0370 2.0070
70.2868 0.0133 2.8171
66.2348 0.0226 3.1412
⋮
See Also
timetable
| table2timetable
| retime
| issorted
| sortrows
| unique
| diff
| isregular
| rmmissing
| fillmissing
Related Topics
- Resample and Aggregate Data in Timetable
- Combine Timetables and Synchronize Their Data
- Retime and Synchronize Timetable Variables Using Different Methods
- Select Times in Timetable
- Grouped Calculations in Tables and Timetables
- Add Event Table from External Data to Timetable
- Find Events in Timetable Using Event Table