Hello,
I apprecite yuor help for re-arranging the following data.
input (output from table out=readtable('file.xls','PreserveVariableNames',true) )
{'A'} {'B1'} {'X6(1.4M), X15(3M), X25(5M), X50(10M), X75(15M), X100(20M)'} {' X6~0, X15~1, X25~2, X50~3, X75~4, X100~5 '}
appriciated to get the following output matrix:
{'A'} {'B1'} {'X6(1.4M)'} 0
{'A'} {'B1'} {'X15(3M) '} 1
{'A'} {'B1'} {'X25(5M) '} 2
{'A'} {'B1'} {'X50(10M) '} 3
{'A'} {'B1'} {'X75(15M) '} 4
{'A'} {'B1'} {'X100(20M) '} 5
{'X6(1.4M)'} from element 3 before ccomma,
0, 1,....5 field 4 after ~
Thank you,
Br..
Joseph

 Accepted Answer

It looks like you want the output to be a cell array.
out = {'A', 'B1', 'X6(1.4M), X15(3M), X25(5M), X50(10M), X75(15M), X100(20M)', ' X6~0, X15~1, X25~2, X50~3, X75~4, X100~5 '};
x = strtrim(strsplit(out{3},','))';
n = str2double(regexp(strtrim(strsplit(out{end},',')), '\d+$','match','once'))';
M = [repmat(out(1:2),numel(x),1), x, num2cell(n)]
M = 6×4 cell array
{'A'} {'B1'} {'X6(1.4M)' } {[0]} {'A'} {'B1'} {'X15(3M)' } {[1]} {'A'} {'B1'} {'X25(5M)' } {[2]} {'A'} {'B1'} {'X50(10M)' } {[3]} {'A'} {'B1'} {'X75(15M)' } {[4]} {'A'} {'B1'} {'X100(20M)'} {[5]}
Alternatively, a table,
T = table(string(repmat(out(1),numel(x),1)), ...
string(repmat(out(2),numel(x),1)),...
string(x), ...
n, 'VariableNames', {'A','B','X','n'})
T = 6×4 table
A B X n ___ ____ ___________ _ "A" "B1" "X6(1.4M)" 0 "A" "B1" "X15(3M)" 1 "A" "B1" "X25(5M)" 2 "A" "B1" "X50(10M)" 3 "A" "B1" "X75(15M)" 4 "A" "B1" "X100(20M)" 5

15 Comments

joseph chahin
joseph chahin on 19 Mar 2021
Edited: dpb on 19 Mar 2021
Thank you Adam for the reaction. I tried to implement your proposal for each row of the table but it did not work. i think it has to do with the manner of reading the excel file and cell content from the table,
Example of Original Excel data is:
filename: Testmat.xls
X Y Type1 Z1 Z2
A1 B1 Type1 M1, M2, M3, M4 M1~0, M2~5, M3~2, M4~3
A1 B2 Type1 N1, N2 N1~7, N2~10
A1 B3 Type1 S1, S2, S3, S4, S5, S6 S1~0, S2~1, S3~2, S4~3, S5~4, S6~5
A2 B1 Type1 P1, P2, P3 P1~0, P2~1, P3~2
to read the Excel file i used the following:
data=readtable('Testmat.xls','PreserveVariableNames',true)
the desired output wihch i could not get is:
X Y Type1 Z1 Z2
A1 B1 Type1 M1 0
A1 B1 Type1 M2 5
A1 B1 Type1 M3 2
A1 B1 Type1 M4 3
A1 B2 Type1 N1 7
A1 B2 Type1 N2 10
A1 B3 Type1 S1 0
A1 B3 Type1 S2 1
A1 B3 Type1 S3 2
A1 B3 Type1 S4 3
A1 B3 Type1 S5 4
A1 B3 Type1 S6 5
A2 B1 Type1 P1 0
A2 B1 Type1 P2 1
A2 B1 Type1 P3 2
Thanks agian for your help.
Br. Joseph
It seems like you changed the goal. From your question,
appriciated to get the following output matrix:
{'A'} {'B1'} {'X6(1.4M)'} 0
{'A'} {'B1'} {'X15(3M) '} 1
{'A'} {'B1'} {'X25(5M) '} 2
{'A'} {'B1'} {'X50(10M) '} 3
{'A'} {'B1'} {'X75(15M) '} 4
{'A'} {'B1'} {'X100(20M) '} 5
and that's what my answer does. Now it seems you wanto do something different.
If you attach a mat file containing the 'data' variable from the line below, I might have time to look at this again later on.
data=readtable('Testmat.xls','PreserveVariableNames',true)
dpb
dpb on 19 Mar 2021
Edited: dpb on 19 Mar 2021
Better yet would be to attach 'Testmat.xls' -- reading it as a table for this purpose is counterproductive in all likelihood. Use 'readcell' instead and will be able to apply Adam's or my code to each cell much more simply and build either cell or table from the result.
dpb
dpb on 19 Mar 2021
Edited: dpb on 19 Mar 2021
Having created a Testmat.xls file from the above data
data=readcell('Testmat.xls'); % read as cell array
M=[]; % empty container; don't know ending size
for i=2:size(data,2) % all rows except header
t=strtrim(split(data{i,4},',')); % the strings trimmed
n=str2double(extractAfter(split(data{i,5},','),'~')); % the numeric values
M=[M;repmat(data(i,1:3),numel(n),1), t, num2cell(n)]; % add to exist w/ duplicated data
end
tM=cell2table(M,'VariableNames',data(1,:)); % convert to a table
results in
>> tM
tM =
15×5 table
X Y Type1 Z1 Z2
______ ______ _________ ______ _____
{'A1'} {'B1'} {'Type1'} {'M1'} 0.00
{'A1'} {'B1'} {'Type1'} {'M2'} 5.00
{'A1'} {'B1'} {'Type1'} {'M3'} 2.00
{'A1'} {'B1'} {'Type1'} {'M4'} 3.00
{'A1'} {'B2'} {'Type1'} {'N1'} 7.00
{'A1'} {'B2'} {'Type1'} {'N2'} 10.00
{'A1'} {'B3'} {'Type1'} {'S1'} 0.00
{'A1'} {'B3'} {'Type1'} {'S2'} 1.00
{'A1'} {'B3'} {'Type1'} {'S3'} 2.00
{'A1'} {'B3'} {'Type1'} {'S4'} 3.00
{'A1'} {'B3'} {'Type1'} {'S5'} 4.00
{'A1'} {'B3'} {'Type1'} {'S6'} 5.00
{'A2'} {'B1'} {'Type1'} {'P1'} 0.00
{'A2'} {'B1'} {'Type1'} {'P2'} 1.00
{'A2'} {'B1'} {'Type1'} {'P3'} 2.00
>>
I'd undoubtedly turn X, Y, Z1 and Type into categorical variables.
PS: OBTW, I had/have FORMAT BANK at moment, hence the two decimal digits displayed in the table...
thank you, it works :)
Hi Adam,
I am facing some issue with the data which is not always has the same format. I have now attached the data in tap1. In tab2 i explain the issues in reading and re-arrange teh data. You hepl is much appreciated.
Thank you,
Br.
Joseph
Adam Danz
Adam Danz on 23 Mar 2021
Could you provide the short snippet of code you're using to read in the data and create the output?
It may be quicker to fix your code rather than write one from scratch that works.
dpb
dpb on 23 Mar 2021
The problem is in the pocess that created the file -- fix it to be consistent.
Otherwise, you're at the mercy of having to process every cell and regularize the data before you can parse it.
If there aren't defined rules and those rules aren't followed, then chaos ensues.
joseph chahin
joseph chahin on 23 Mar 2021
Edited: dpb on 23 Mar 2021
Of course, here is it. It works fine for the first line, but not the others :(
for i=2:size(Data,1)
i
n = str2double(extractAfter(split(Data{i,3},','),'~'))
H1 = repmat(Data(i,1),numel(n),1)
H2 = repmat(Data(i,2),numel(n),1)
H3 = strtrim(split(Data{i,4},','))
H4 = num2cell(n)
H5 = extractAfter(split(Data{i,5},','),':')
H6 = extractAfter(split(Data{i,6},','),':')
H_out = [H_out; H1 H2 H3 H4 H5 H6]
end
H_out_Table = cell2table(H_out,'VariableNames',[Data(1,:) ])
These cases are the most occuring cases. the most issue is coming in H6. H5 with None is easy to solve. I am looking for away to check parmaters within H6 and select the Off or On behind it. I was thing something with string or text but ....i do not know whether is possible !!!
dpb
dpb on 23 Mar 2021
You can't parse until you regularize the input...or define what rules are followed for each column.
"One size does NOT fit all!"
Adam Danz
Adam Danz on 23 Mar 2021
Edited: Adam Danz on 23 Mar 2021
Any time a cell in the spreadsheet breaks the expected pattern you either need to fix that cell in the spreadsheet or include a detection & correction in the code.
The 2nd row of H6 clearly breaks the pattern ("None").
If you're dealing with just a few spreadsheets it may be quickest to correct those spreadsheets. Otherwise, you'll need to detect & correct in the code.
For example, this fixes the 2nd iteration where the problem is caused by H6="None". H6 in the code is an empty cell {''}. The first line fills in empties with a default value of '0'. The second part tests whether the number of elements in H6 is the same as the number of elements in n. If there is a discrepancy then it just adds '0's to the end which is probably not a good solution because I don't know which values are missing.
% Fill in empties with default values
H6(cellfun(@isempty, H6)) = {'0'};
% Fill in missing values
if numel(H6) < numel(n)
H6 = [H6; repmat({'0'},numel(n)-numel(H6),1)];
end
This is how you'll need to deal with the cells that break the expected pattern.
Short anecdote:
A while back I was hired to clean data that involved dozens of TSV files that were to be reformatted. The TSV files were prepared by different people who exported the data differently and I tried explaining to them that learning how to re-export the data uniformly might take a couple of days but cleaning it on the code-end will take months. They didn't take my advice and it was an expensive task that involved hundreds of lines of exception detection & correction and each time I was given a new batch of files several new inconsistencies were discovered. It's almost always better to fix the input files rather than working around inconsistencies with the code.
dpb
dpb on 23 Mar 2021
"It's almost always better to fix the input files rather than working around inconsistencies with the code."
+123.31415926...
I'm going through similar situtation -- it's essentially impossible to code for all possible ways for people to screw it up -- if it is manual, there must be sufficient repercussions such that employees will take the requirements seriously; if it's machine-generated, then fix the code there.
The only cases that are generally worth the effort otherwise are those where data are/were very expensive to have generated and cannot be obtained any other way--and are of sufficient value to the organization as to be worth paying the cost.
Those cases occur many times. I meant a few cases but are repeated in teh data. it is a large set of data. this was just an example. Anyway, i have solved with an easy manner. Please enclosed my script which it gives the desired output:
H_out =
{'A'} {'a0'} {'a_0'} {[0]} {'On' } {'1' }
{'A'} {'a0'} {'a_1'} {[1]} {'Off'} {'0' }
{'A'} {'a0'} {'a_2'} {[2]} {'Off'} {'0' }
{'B'} {'b0'} {'b_0'} {[0]} {'Off'} {'None'}
{'B'} {'b0'} {'b_1'} {[1]} {'Off'} {'None'}
{'B'} {'b0'} {'b_2'} {[2]} {'Off'} {'None'}
{'B'} {'b0'} {'b_3'} {[3]} {'Off'} {'None'}
{'C'} {'c0'} {'c_0'} {[0]} {'On' } {'On' }
{'C'} {'c0'} {'c_1'} {[1]} {'Off'} {'Off' }
{'C'} {'c0'} {'c_3'} {[2]} {'Off'} {'Off' }
{'C'} {'c0'} {'c_4'} {[4]} {'Off'} {'-' }
{'D'} {'d0'} {'d_0'} {[0]} {'Off'} {'Off' }
{'D'} {'d0'} {'d_1'} {[1]} {'Off'} {'Off' }
Adam Danz
Adam Danz on 24 Mar 2021
I'm glad it all worked out without too much of a mess.

Sign in to comment.

More Answers (2)

David Hill
David Hill on 18 Mar 2021
Output will have to be a cell array.
Input= {'A','B1','X6(1.4M), X15(3M), X25(5M), X50(10M), X75(15M), X100(20M)',' X6~0, X15~1, X25~2, X50~3, X75~4, X100~5 '};
a=regexp(Input{3},'[X()0-9.M]+(?=,)','match');
b=regexp(Input{4},'(?<=~)\d+','match');
for k=1:length(a)
Output{k,1}=Input{1};
Output{k,2}=Input{2};
Output{k,3}=a{k};
Output{k,4}=b{k};
end

1 Comment

Best David, thank you for the reaction. I tried you script as well. still not get the output. Please see down my reaction to Adam. Thx. Br. Joseph.

Sign in to comment.

dpb
dpb on 18 Mar 2021
Edited: dpb on 18 Mar 2021
>> C=[{'A'}, {'B1'}, {'X6(1.4M), X15(3M), X25(5M), X50(10M), X75(15M), X100(20M)'}, {' X6~0, X15~1, X25~2, X50~3, X75~4, X100~5 '} ]
C =
1×4 cell array
{'A'} {'B1'} {'X6(1.4M), X15(3M), X25(5M), X50(10M), X75…'} {' X6~0, X15~1, X25~2, X50~3, X75~4, X100~5 '}
>> split(C{3},',')
ans =
6×1 cell array
{'X6(1.4M)' }
{' X15(3M)' }
{' X25(5M)' }
{' X50(10M)' }
{' X75(15M)' }
{' X100(20M)'}
>> str2double(extractAfter(split(C{4},','),'~'))
ans =
0
1
2
3
4
5
>>

Categories

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!