# How to Preprocess Time Series Data with MATLAB

#### Introduction

In data science it is always said that 80% of the time is spent in preparing data and 20% of time is spent complaining about the need to prepare data. So let’s try solving this issue by preparing and preprocessing time series data in less time and with a few lines of code in MATLAB.

#### DEMO

For this example, we will be using data from Phasor Measurement Units (PMUs)

• We have 15 PMUs which are measuring the voltage magnitude, voltage angle, and frequency synchronized to a GPS clock system
• They are in 3 time zones in Australia – Eastern Daylight Savings time zone, Central Daylight Savings time zone, and Eastern Standard time
• The PMUs were sampled at 3 different sampling rates: 30, 60, 120Hz
• This occurred at different time offsets as the PMUs did not start recording at the same time

So yes, we have data that needs to be aligned and cleaned for any further analysis.

Our data is in three Excel spreadsheets, one for each of the three time zones, and we will be importing the data using the Import Tool. If we open the spreadsheet in MATLAB it, by default, opens in the Import Tool. Click on import data. The data is then stored in the workspace as a table.

For processing the time series data and working on time-specific functions, MATLAB has a datatype `timetable` that makes the processing easier. To convert the table into timetable our time should be in `datetime` format.

So now we will extract the time column from each table and convert it into datetime using the `datetime` function. In the same line of code, we are addressing the first challenge of our dataset; that the data is in three different time zones with different offsets. So, in the same step we align the time zones by giving the time zones as the options: Australia/Sydney for Eastern Daylight time, Australia/Adelaide for Central Daylight time and Australia/Brisbane for Eastern Standard time.

Let’s plot our frequency measurement with the time from each PMU to visually confirm the time alignment. As we see frequencies overlap. Hence, we have our time alignment correct.

Once our time columns are in datetime format and aligned we convert all our three tables into timetables by using the function `table2timetable`. We now have our three tables in the format of timetables.

Next, we proceed to cleaning the data. As we can see in this linked MATLAB documentation that we can handle and clean timetables using various workflows like finding and removing the missing data, sorting and removing duplicates, etc.

Here we demonstrate one way of working with duplicate times. First, with the unique function we create a vector `UniqueTime` which returns the unique and sorted rows of the time variable. This vector does not contain the duplicate row times. Just for verification we check here that we have seven duplicate row rimes. To handle that we use the `retime` function to calculate the mean for all other variables with duplicate row times. Hence now we have a new table UniquedataACDT with no duplicate row times. We do a similar process for the other two tables.  We will now work with the three new tables.

Missing data can be processed in different ways. For a single timetable we can use the `rmissing` function and remove all the rows with missing data. As we do here for the UniquedataACDT table.

The other method is using interpolation while joining the tables.

We join all the three tables into one timetable using the `synchronize` function. Synchronize collects the variables from all input timetables and outputs a single timetable with all the variables concatenated horizontally.

Here we specify `union` to synchronize on a time vector that includes times from all three timetables. And `linear` to fill in missing timetable elements with linear interpolation.

Hence now we have a Total table consisting of all the three clean timetables.

Once we have our data all cleaned up, we can do some visual analysis by plotting the data.

Here we show one example using the `stackedplot` function. `Stackedplot` plots the variables of a timetable against row times. We plot the voltage magnitudes VM6, VM1, and VM11 against the row times from the total combined table.