re-arrange data

Question

0 votes

Hello,

I apprecite yuor help for re-arranging the following data.

input (output from table out=readtable('file.xls','PreserveVariableNames',true) )

{'A'}        {'B1'}         {'X6(1.4M), X15(3M), X25(5M), X50(10M), X75(15M), X100(20M)'}      {'  X6~0, X15~1, X25~2, X50~3, X75~4, X100~5 '}         

appriciated to get the following output matrix:

{'A'} {'B1'} {'X6(1.4M)'} 0

{'A'} {'B1'} {'X15(3M) '} 1

{'A'} {'B1'} {'X25(5M) '} 2

{'A'} {'B1'} {'X50(10M) '} 3

{'A'} {'B1'} {'X75(15M) '} 4

{'A'} {'B1'} {'X100(20M) '} 5

{'X6(1.4M)'} from element 3 before ccomma,

0, 1,....5 field 4 after ~

Thank you,

Br..

Joseph

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Adam Danz on 18 Mar 2021

Open in MATLAB Online

0 votes

It looks like you want the output to be a cell array.

out =  {'A', 'B1', 'X6(1.4M), X15(3M), X25(5M), X50(10M), X75(15M), X100(20M)', '  X6~0, X15~1, X25~2, X50~3, X75~4, X100~5 '};  
x = strtrim(strsplit(out{3},','))';
n = str2double(regexp(strtrim(strsplit(out{end},',')), '\d+$','match','once'))';
M = [repmat(out(1:2),numel(x),1), x, num2cell(n)]
M = 6×4 cell array
    {'A'}    {'B1'}    {'X6(1.4M)' }    {[0]}
    {'A'}    {'B1'}    {'X15(3M)'  }    {[1]}
    {'A'}    {'B1'}    {'X25(5M)'  }    {[2]}
    {'A'}    {'B1'}    {'X50(10M)' }    {[3]}
    {'A'}    {'B1'}    {'X75(15M)' }    {[4]}
    {'A'}    {'B1'}    {'X100(20M)'}    {[5]}

Alternatively, a table,

T = table(string(repmat(out(1),numel(x),1)), ...
    string(repmat(out(2),numel(x),1)),...
    string(x), ...
    n, 'VariableNames', {'A','B','X','n'})
T = 6×4 table
     A      B           X         n
    ___    ____    ___________    _

    "A"    "B1"    "X6(1.4M)"     0
    "A"    "B1"    "X15(3M)"      1
    "A"    "B1"    "X25(5M)"      2
    "A"    "B1"    "X50(10M)"     3
    "A"    "B1"    "X75(15M)"     4
    "A"    "B1"    "X100(20M)"    5

15 Comments
Show 13 older comments Hide 13 older comments

joseph chahin on 19 Mar 2021

Edited: dpb on 19 Mar 2021

Open in MATLAB Online

Thank you Adam for the reaction. I tried to implement your proposal for each row of the table but it did not work. i think it has to do with the manner of reading the excel file and cell content from the table,

Example of Original Excel data is:

filename: Testmat.xls

X      Y          Type1      Z1                                        Z2 
A1      B1      Type1      M1, M2, M3, M4                   M1~0, M2~5, M3~2, M4~3 
A1      B2      Type1      N1, N2                                  N1~7, N2~10 
A1      B3      Type1      S1, S2, S3, S4, S5, S6         S1~0, S2~1, S3~2, S4~3, S5~4, S6~5 
A2      B1      Type1      P1, P2, P3                             P1~0, P2~1, P3~2 

to read the Excel file i used the following:

data=readtable('Testmat.xls','PreserveVariableNames',true)

the desired output wihch i could not get is:

  X  Y  Type1  Z1  Z2 
   A1  B1  Type1  M1  0 
   A1  B1  Type1  M2  5 
   A1  B1  Type1  M3  2 
   A1  B1  Type1  M4  3 
   A1  B2  Type1  N1  7 
   A1  B2  Type1  N2  10 
   A1  B3  Type1  S1  0 
   A1  B3  Type1  S2  1 
   A1  B3  Type1  S3  2 
   A1  B3  Type1  S4  3 
   A1  B3  Type1  S5  4 
   A1  B3  Type1  S6  5 
   A2  B1  Type1  P1  0 
   A2  B1  Type1  P2  1 
   A2  B1  Type1  P3  2 

Thanks agian for your help.

Br. Joseph

dpb on 19 Mar 2021

Edited: dpb on 19 Mar 2021

Open in MATLAB Online

Having created a Testmat.xls file from the above data

data=readcell('Testmat.xls');                           % read as cell array
M=[];                                                   % empty container; don't know ending size
for i=2:size(data,2)                                    % all rows except header
  t=strtrim(split(data{i,4},','));                      % the strings trimmed
  n=str2double(extractAfter(split(data{i,5},','),'~')); % the numeric values
  M=[M;repmat(data(i,1:3),numel(n),1), t, num2cell(n)]; % add to exist w/ duplicated data
end
tM=cell2table(M,'VariableNames',data(1,:));             % convert to a table

results in

>> tM
  tM =
  15×5 table
      X         Y         Type1        Z1       Z2  
    ______    ______    _________    ______    _____
    {'A1'}    {'B1'}    {'Type1'}    {'M1'}     0.00
    {'A1'}    {'B1'}    {'Type1'}    {'M2'}     5.00
    {'A1'}    {'B1'}    {'Type1'}    {'M3'}     2.00
    {'A1'}    {'B1'}    {'Type1'}    {'M4'}     3.00
    {'A1'}    {'B2'}    {'Type1'}    {'N1'}     7.00
    {'A1'}    {'B2'}    {'Type1'}    {'N2'}    10.00
    {'A1'}    {'B3'}    {'Type1'}    {'S1'}     0.00
    {'A1'}    {'B3'}    {'Type1'}    {'S2'}     1.00
    {'A1'}    {'B3'}    {'Type1'}    {'S3'}     2.00
    {'A1'}    {'B3'}    {'Type1'}    {'S4'}     3.00
    {'A1'}    {'B3'}    {'Type1'}    {'S5'}     4.00
    {'A1'}    {'B3'}    {'Type1'}    {'S6'}     5.00
    {'A2'}    {'B1'}    {'Type1'}    {'P1'}     0.00
    {'A2'}    {'B1'}    {'Type1'}    {'P2'}     1.00
    {'A2'}    {'B1'}    {'Type1'}    {'P3'}     2.00
>> 

I'd undoubtedly turn X, Y, Z1 and Type into categorical variables.

PS: OBTW, I had/have FORMAT BANK at moment, hence the two decimal digits displayed in the table...

Adam Danz on 23 Mar 2021

Edited: Adam Danz on 23 Mar 2021

Open in MATLAB Online

Any time a cell in the spreadsheet breaks the expected pattern you either need to fix that cell in the spreadsheet or include a detection & correction in the code.

The 2nd row of H6 clearly breaks the pattern ("None").

If you're dealing with just a few spreadsheets it may be quickest to correct those spreadsheets. Otherwise, you'll need to detect & correct in the code.

For example, this fixes the 2nd iteration where the problem is caused by H6="None". H6 in the code is an empty cell {''}. The first line fills in empties with a default value of '0'. The second part tests whether the number of elements in H6 is the same as the number of elements in n. If there is a discrepancy then it just adds '0's to the end which is probably not a good solution because I don't know which values are missing.

    % Fill in empties with default values
    H6(cellfun(@isempty, H6)) = {'0'};
    % Fill in missing values 
    if numel(H6) < numel(n)
        H6 = [H6; repmat({'0'},numel(n)-numel(H6),1)];
    end

This is how you'll need to deal with the cells that break the expected pattern.

Short anecdote:

A while back I was hired to clean data that involved dozens of TSV files that were to be reformatted. The TSV files were prepared by different people who exported the data differently and I tried explaining to them that learning how to re-export the data uniformly might take a couple of days but cleaning it on the code-end will take months. They didn't take my advice and it was an expensive task that involved hundreds of lines of exception detection & correction and each time I was given a new batch of files several new inconsistencies were discovered. It's almost always better to fix the input files rather than working around inconsistencies with the code.

joseph chahin on 23 Mar 2021

Script_RearrangeData.m

Those cases occur many times. I meant a few cases but are repeated in teh data. it is a large set of data. this was just an example. Anyway, i have solved with an easy manner. Please enclosed my script which it gives the desired output:

H_out =

{'A'} {'a0'} {'a_0'} {[0]} {'On' } {'1' }

{'A'} {'a0'} {'a_1'} {[1]} {'Off'} {'0' }

{'A'} {'a0'} {'a_2'} {[2]} {'Off'} {'0' }

{'B'} {'b0'} {'b_0'} {[0]} {'Off'} {'None'}

{'B'} {'b0'} {'b_1'} {[1]} {'Off'} {'None'}

{'B'} {'b0'} {'b_2'} {[2]} {'Off'} {'None'}

{'B'} {'b0'} {'b_3'} {[3]} {'Off'} {'None'}

{'C'} {'c0'} {'c_0'} {[0]} {'On' } {'On' }

{'C'} {'c0'} {'c_1'} {[1]} {'Off'} {'Off' }

{'C'} {'c0'} {'c_3'} {[2]} {'Off'} {'Off' }

{'C'} {'c0'} {'c_4'} {[4]} {'Off'} {'-' }

{'D'} {'d0'} {'d_0'} {[0]} {'Off'} {'Off' }

{'D'} {'d0'} {'d_1'} {[1]} {'Off'} {'Off' }

Adam Danz on 24 Mar 2021

I'm glad it all worked out without too much of a mess.

Sign in to comment.

Answer 2

David Hill on 18 Mar 2021

Open in MATLAB Online

0 votes

Output will have to be a cell array.

Input= {'A','B1','X6(1.4M), X15(3M), X25(5M), X50(10M), X75(15M), X100(20M)','  X6~0, X15~1, X25~2, X50~3, X75~4, X100~5 '};     
a=regexp(Input{3},'[X()0-9.M]+(?=,)','match');
b=regexp(Input{4},'(?<=~)\d+','match');
for k=1:length(a)
    Output{k,1}=Input{1};
    Output{k,2}=Input{2};
    Output{k,3}=a{k};
    Output{k,4}=b{k};
end

1 Comment
Show -1 older comments Hide -1 older comments

joseph chahin on 19 Mar 2021

Best David, thank you for the reaction. I tried you script as well. still not get the output. Please see down my reaction to Adam. Thx. Br. Joseph.

Sign in to comment.

Answer 3

dpb on 18 Mar 2021

Edited: dpb on 18 Mar 2021

Open in MATLAB Online

0 votes

>> C=[{'A'},        {'B1'},         {'X6(1.4M), X15(3M), X25(5M), X50(10M), X75(15M), X100(20M)'},      {'  X6~0, X15~1, X25~2, X50~3, X75~4, X100~5 '} ]
C =
  1×4 cell array
    {'A'}    {'B1'}    {'X6(1.4M), X15(3M), X25(5M), X50(10M), X75…'}    {'  X6~0, X15~1, X25~2, X50~3, X75~4, X100~5 '}
>> split(C{3},',')
ans =
  6×1 cell array
    {'X6(1.4M)'  }
    {' X15(3M)'  }
    {' X25(5M)'  }
    {' X50(10M)' }
    {' X75(15M)' }
    {' X100(20M)'}
>> str2double(extractAfter(split(C{4},','),'~'))
ans =
          0
          1
          2
          3
          4
          5
>> 

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

re-arrange data

0 Comments
Show -2 older comments Hide -2 older comments

Accepted Answer

15 Comments
Show 13 older comments Hide 13 older comments

More Answers (2)

1 Comment
Show -1 older comments Hide -1 older comments

0 Comments
Show -2 older comments Hide -2 older comments

Categories

Tags

Community Treasure Hunt

re-arrange data

0 Comments Show -2 older comments Hide -2 older comments

Accepted Answer

15 Comments Show 13 older comments Hide 13 older comments

More Answers (2)

1 Comment Show -1 older comments Hide -1 older comments

0 Comments Show -2 older comments Hide -2 older comments

Categories

Tags

See Also

Community Treasure Hunt

0 Comments
Show -2 older comments Hide -2 older comments

15 Comments
Show 13 older comments Hide 13 older comments

1 Comment
Show -1 older comments Hide -1 older comments

0 Comments
Show -2 older comments Hide -2 older comments