How a large timetable (10 million rows by 15 columns) could be modified to a smaller table?

1 view (last 30 days)
Robert
Robert on 2 Jun 2018
Answered: Peter Perkins on 4 Jun 2018
Hello,
Here is an example of my 10 million x 15 columns timetable...
Time Par_Name1 Par_Name2 ... Par_Name12 UniqueID Value
01-Jan-1980 AA X1 ... ... 18 109
02-Jan-1980 AA X1 ... ... 18 23
03-Jan-1980 AA X1 ... ... 18 12
... AA X1 ... ... 18 ...
31-Dec-2017 AA X1 ... ... 18 234
01-Jan-1980 AA X1 ... ... 75 10
02-Jan-1980 AA X1 ... ... 75 21
03-Jan-1980 AA X1 ... ... 75 5
... AA X1 ... ... 75 ...
31-Dec-2017 AA X1 ... ... 75 34
01-Jan-1980 BB X1 ... ... 03 190
02-Jan-1980 BB X1 ... ... 03 12
03-Jan-1980 BB X1 ... ... 03 18
... BB X1 ... ... 03 ...
31-Dec-2017 BB X1 ... ... 03 237
01-Jan-1980 DD X7 ... ... 99 100
02-Jan-1980 DD X7 ... ... 99 21
03-Jan-1980 DD X7 ... ... 99 12
... DD X7 ... ... 99 ...
31-Dec-2017 DD X7 ... ... 99 299
... ... ... ... ... ... ...
... ... ... ... ... ... ...
... ... ... ... ... ... ...
...to a table like this...
Par_Name1 Par_Name2 Par_Name... Par_Name12 UniqueID 01-Jan-1980 02-Jan-1980 03-Jan-1980 ... 31-Dec-2017
AA X1 ... ... 18 109 23 12 ... 234
AA X1 ... ... 75 10 21 5 ... 34
BB X1 ... ... 03 190 12 18 ... 237
DD X7 ... ... 99 100 21 12 ... 299
... ... ... ... ... ... ... ... ... ...
My goals here are: 1) Reduce the size of the table by removing redundancy, and 2) still be able to generate statistics based on a 'Par_Name??' conditions or due to time periods like per Year or Season or a mix of the previous options

Answers (1)

Peter Perkins
Peter Perkins on 4 Jun 2018
This looks like an application of unstack.
Not sure what you'll be doing with that table, so I can't say if that "wide" version is a convenient organization, but you are correct that it should reduce the memory footprint by removing redundant data. It might also be possible to create a timetable with one row for each parameter combination (as you've show), but instead of
>> days(datetime(2017,12,31) - datetime(1980,1,1))
ans =
13879
variables, you'd have one variable that's a cell array, each cell containing one 13879x1 timetable. You'd have some unnecessary redundancy (the time vectors), but that organization might be more convenient. Or maybe not.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!