How to separate a data set into multiple tables sorted by a specific variable

I have datathat is 63351 x 7 cell. The data is different prices of commodities/assets at specific times. At first, all data for AUD/USD is listed at the specific times and dates. Then, the asset switches to EUR/USD and restarts with the same date and times as before. The name of the asset is listed in one of the columns. I need to separate this data into multiple tables based on the specific asset. Any idea how to do this?

1 Comment

if your original matrix is A, if you want to separate each column,you can get each column by B1=A(:,1) B2=A(:,2) ...

Sign in to comment.

 Accepted Answer

If you are using R2016b or later, you probably want to be using timetables instead of cell arrays. Especially if your data are at all large. It's not entirely clear to me what your data look like, or what you'll do with them after you split by asset, but I will also argue that with timetables, you may not need to split the data at all. In any case, here's an example of one simple way to do it.
>> price = rand(20,1);
>> volume = rand(20,1);
>> asset = categorical([1;2;1;2;1;2;1;2;1;2;1;2;1;2;1;2;1;2;1;2],1:2,{'a' 'b'});
>> currency = categorical([1;1;1;1;1;1;1;1;1;1;2;2;2;2;2;2;2;2;2;2],1:2,{'AUD/USD' 'EUR/USD'});
>> date = datetime(2018,1,[1;1;2;2;3;3;4;4;5;5;1;1;2;2;3;3;4;4;5;5]);
>> tt = timetable(date,currency,asset,price,volume)
tt =
20×4 timetable
date currency asset price volume
___________ ________ _____ ________ _______
01-Jan-2018 AUD/USD a 0.69103 0.56856
01-Jan-2018 AUD/USD b 0.091485 0.84191
02-Jan-2018 AUD/USD a 0.30828 0.0533
02-Jan-2018 AUD/USD b 0.62075 0.55293
03-Jan-2018 AUD/USD a 0.59696 0.70035
03-Jan-2018 AUD/USD b 0.19802 0.39625
04-Jan-2018 AUD/USD a 0.39491 0.9954
04-Jan-2018 AUD/USD b 0.75725 0.87266
05-Jan-2018 AUD/USD a 0.11658 0.99732
05-Jan-2018 AUD/USD b 0.46778 0.13998
01-Jan-2018 EUR/USD a 0.45711 0.92185
01-Jan-2018 EUR/USD b 0.30848 0.35911
02-Jan-2018 EUR/USD a 0.21144 0.91209
02-Jan-2018 EUR/USD b 0.8504 0.29512
03-Jan-2018 EUR/USD a 0.71659 0.2831
03-Jan-2018 EUR/USD b 0.57582 0.24675
04-Jan-2018 EUR/USD a 0.54203 0.172
04-Jan-2018 EUR/USD b 0.74173 0.38838
05-Jan-2018 EUR/USD a 0.5002 0.51731
05-Jan-2018 EUR/USD b 0.087709 0.50291
>> tt = sortrows(tt,'asset');
>> [~,start] = unique(tt.asset,'first');
>> stop = [start(2:end)-1; height(tt)];
>> nassets = length(categories(tt.asset));
>> c = cell(1,nassets);
>> for i = 1:nassets, c{i} = sortrows(tt(start(i):stop(i),:)); end
>> c{:}
ans =
10×4 timetable
date currency asset price volume
___________ ________ _____ _______ _______
01-Jan-2018 AUD/USD a 0.69103 0.56856
01-Jan-2018 EUR/USD a 0.45711 0.92185
02-Jan-2018 AUD/USD a 0.30828 0.0533
02-Jan-2018 EUR/USD a 0.21144 0.91209
03-Jan-2018 AUD/USD a 0.59696 0.70035
03-Jan-2018 EUR/USD a 0.71659 0.2831
04-Jan-2018 AUD/USD a 0.39491 0.9954
04-Jan-2018 EUR/USD a 0.54203 0.172
05-Jan-2018 AUD/USD a 0.11658 0.99732
05-Jan-2018 EUR/USD a 0.5002 0.51731
ans =
10×4 timetable
date currency asset price volume
___________ ________ _____ ________ _______
01-Jan-2018 AUD/USD b 0.091485 0.84191
01-Jan-2018 EUR/USD b 0.30848 0.35911
02-Jan-2018 AUD/USD b 0.62075 0.55293
02-Jan-2018 EUR/USD b 0.8504 0.29512
03-Jan-2018 AUD/USD b 0.19802 0.39625
03-Jan-2018 EUR/USD b 0.57582 0.24675
04-Jan-2018 AUD/USD b 0.75725 0.87266
04-Jan-2018 EUR/USD b 0.74173 0.38838
05-Jan-2018 AUD/USD b 0.46778 0.13998
05-Jan-2018 EUR/USD b 0.087709 0.50291
Prior to R2016b, you can do the same thing with tables, but timetables have useful time-based functionality built into them for things like synchronizing multiple data sets and interpolating data.

4 Comments

Peter, I'm not sure if that would work, but I think you did put me on the right track. The data looks like this:
{[ NaN]} {'#RIC'} {'Date-Time' } {'Type' } {'Close Ask'} {'date' } {'time' } {[174360]} {'AUD='} {'2014-05-02T07:00…'} {'Intraday 1Min'} {[ 0.9275]} {[41761]} {'07:00'} {[174361]} {'AUD='} {'2014-05-02T07:01…'} {'Intraday 1Min'} {[ 0.9275]} {[41761]} {'07:01'} {[174362]} {'AUD='} {'2014-05-02T07:02…'} {'Intraday 1Min'} {[ 0.9277]} {[41761]} {'07:02'} {[174363]} {'AUD='} {'2014-05-02T07:03…'} {'Intraday 1Min'} {[ 0.9275]} {[41761]} {'07:03'} {[174364]} {'AUD='} {'2014-05-02T07:04…'} {'Intraday 1Min'} {[ 0.9277]} {[41761]} {'07:04'} {[174365]} {'AUD='} {'2014-05-02T07:05…'} {'Intraday 1Min'} {[ 0.9274]} {[41761]} {'07:05'} {[174366]} {'AUD='} {'2014-05-02T07:06…'} {'Intraday 1Min'} {[ 0.9273]} {[41761]} {'07:06'}
After around 3000 or so observations of AUD, the data swtiches to a different currency. Each of the column headings are the same, but the data starts over at the original starting date and time with a different currency. The goal is to separate the table into multiple tables so that I can work with each currency separately. The end goal is to do an event study based on these currency returns and release of US Non Farm Payroll data (which I have separately).
Is there some way to write an "if" statement that only looks at data if "AUD" is in a certain column?
Thanks, Jack
Peter,
I figured it out. I ran a loop to index each of the different currencies and separated like that.
Thanks for your help
I have a similar issue to this. My data is for a separate application and slightly larger, however, I would like to sort my data by date and save each date and the values in other columns into one Bin if possible. Could you share how this could be possible? I currently have my data in a 156848x4 table.
thanks in advance,
Adam
There are a few ways to do this, but again, I'll offer that you may not need to split into separate tables. In any case, a statement such as
t(t.currency == 'AUD/USD', :)
may be the "if" statement you are looking for.

Sign in to comment.

More Answers (0)

Categories

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!