How to merge two datasets with different sample times
Hello,
I have two datasets that I measured, attached in a spreadsheet.
In the first dataset, I collected magnetic field strength as a function of time. The measurement from my sensor was taken every 0.05 seconds.
In the second dataset, I collected output voltage from a source meter (that was applying a current into magnetic coils, creating the magnetic field the sensor was measuring). This was also measured as a function of time, with a measurement being taken every 0.0005 seconds.
As you can see, the second dataset has many more time points than the first. My goal is to sync their time and plot them both on the same time axis. This way I can see a phase difference between the two.
So far, I've felt the best way to go about this was to create timetables of both measurements, then synchronize the two timetables with a union.
I did this by importing the two time columns as numeric column vectors and converting them to durations with the S = seconds(Time_Vector) command. I then imported the measurement columns as numeric column vectors as well and used the TT = timetable(S, Measurement_Vector) command. Once I had a timetable for each dataset, I synchronized the two using union as the new time basis.
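For reference, the workflow described above can be sketched roughly as follows. The spreadsheet itself is not reproduced here, so the time and measurement vectors are synthetic placeholders, and the names tField, field, tVoltage, voltage, TT1, TT2, and TT are hypothetical.

```matlab
% Synthetic stand-ins for the spreadsheet columns (placeholder values only).
tField   = (0:0.05:1)';            % sensor sample times in seconds
field    = sin(2*pi*tField);       % placeholder magnetic field readings
tVoltage = (0:0.0005:1)';          % source meter sample times in seconds
voltage  = cos(2*pi*tVoltage);     % placeholder voltage readings

% Convert the numeric time vectors to durations and build one timetable each.
TT1 = timetable(seconds(tField), field, 'VariableNames', {'Field'});
TT2 = timetable(seconds(tVoltage), voltage, 'VariableNames', {'Voltage'});

% Synchronize onto the union of both time vectors.
% Rows where a signal has no sample are filled with NaN by default.
TT = synchronize(TT1, TT2, 'union');
```

With these placeholder vectors, the Field column of TT is NaN at most of the voltage sample times, which is the behaviour described in the next paragraph.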
This mostly worked, but it fills the places where one table did not have a measurement with NaN, which then messes up my ability to plot them. Is there a way to synchronize these to the same time axis, while having empty values anywhere that a dataset does not have a measurement? Or a way to get rid of the NaN so there is just no y value at that x time point? That way I can plot the values along the same timebase.
Thank you
Answers (1)
Kartikay Sapra
on 7 Oct 2022
This problem can be solved with two different approaches:
- Remove the 'NaN' values completely and plot only the values at the lower sampling rate (i.e. one value every 0.05 seconds).
- Estimate a value wherever there is a 'NaN' and plot the values at the higher sampling rate (i.e. one value every 0.0005 seconds).
The benefit of the first approach is that the plot will contain only 'true' (measured) values. This approach is better when you want to compare values from both datasets at the same time points. However, many of the variations and trends in the dataset with the higher sampling rate will not be displayed in the plot. For this workflow, you can use 'rmmissing' to remove the rows that contain 'NaN' values. Refer to the 'rmmissing' documentation for details.
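A minimal sketch of the first approach, assuming a small synchronized timetable with made-up values (Time, Field, Voltage, TT, and TTclean are hypothetical names, not taken from the attached spreadsheet):

```matlab
% Hypothetical synchronized timetable; NaN marks times with no field sample.
Time    = seconds((0:0.25:1)');
Field   = [1; NaN; 2; NaN; 3];
Voltage = [10; 11; 12; 13; 14];
TT = timetable(Time, Field, Voltage);

% Remove every row that contains at least one missing value.
TTclean = rmmissing(TT);

% Plot both signals on the shared (reduced) time base.
plot(TTclean.Time, TTclean.Field, TTclean.Time, TTclean.Voltage)
legend('Field', 'Voltage')
```

Note that 'rmmissing' drops a row if any variable in it is missing. If the two original time vectors do not coincide exactly (for example because of floating-point rounding in the time stamps), most rows may contain a 'NaN' in one column or the other, so it is worth checking the height of the result before plotting.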
The benefit of the second approach is that there is no 'data loss': you preserve the variations in the data with the higher sampling rate. In this workflow, the missing values are estimated from the existing data. However, when there are many 'NaN' values (think of a sparse matrix), the estimated values may not represent the true values well. Use 'fillmissing' and choose the estimation method through its 'method' input argument. Refer to the 'fillmissing' documentation for details.
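A minimal sketch of the second approach, again with made-up values and hypothetical names. Linear interpolation is used here only as an illustration; whether it is appropriate depends on how smoothly the real signal varies between samples.

```matlab
% Hypothetical synchronized timetable; NaN marks times with no field sample.
Time    = seconds((0:0.25:1)');
Field   = [1; NaN; 2; NaN; 3];
Voltage = [10; 11; 12; 13; 14];
TT = timetable(Time, Field, Voltage);

% Estimate the missing entries by linear interpolation between the
% neighbouring valid samples, using the row times as sample points.
TTfilled = fillmissing(TT, 'linear');

% Plot both signals on the full time base.
plot(TTfilled.Time, TTfilled.Field, TTfilled.Time, TTfilled.Voltage)
legend('Field (interpolated)', 'Voltage')
```

Other values for the method, such as 'previous', 'nearest', or 'spline', can be swapped in the same way.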
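Based on the sample intervals (0.05 seconds and 0.0005 seconds), the second dataset has about 100 samples for every sample in the first, so after a union synchronization the first dataset's column will contain far more 'NaN' values than numeric values (roughly 99 out of every 100 rows).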