How to extract only numerical values from cell into new column array?

4 views (last 30 days)
For this data:
"chb12_06.edf"
"1665 seconds"
"1726 seconds"
"3415 seconds"
"3447 seconds"
"chb12_08.edf"
"1426 seconds"
"1439 seconds"
"1591 seconds"
"1614 seconds"
"1957 seconds"
"1977 seconds"
"2798 seconds"
"2824 seconds"
"chb12_09.edf"
"3082 seconds"
"3114 seconds"
"3503 seconds"
"3535 seconds"
"chb12_10.edf"
"593 seconds"
"625 seconds"
"811 seconds"
"856 seconds"
Whis is a column in a larger cell array. I want to extract just the seconds value after each edf file, and possibly put them in a new array or matrix, eg:
{"chb12_06.edf" "1665 seconds" "1726 seconds" "3415 seconds" "3447 seconds" ; "chb12_08.edf" "1426 seconds" "1439 seconds" "1591 seconds" "1614 seconds" "1957 seconds" "1977 seconds" "2798 seconds" "2824 seconds"...}
I hope this makes sense what I'm trying to achieve, if not please feel free to clarify, thanks!

Accepted Answer

Bruno Luong
Bruno Luong on 15 Jul 2019
s={"chb12_06.edf" "1665 seconds" "1726 seconds" "3415 seconds" "3447 seconds" "chb12_08.edf" "1426 seconds" "1439 seconds" "1591 seconds" "1614 seconds" "1957 seconds" "1977 seconds" "2798 seconds" "2824 seconds"};
sc = cellfun(@char,s,'Unif',0); %cast for regexp, which does not work on string input
sm = regexp(sc,'^\d* seconds','match');
[sm{:}]
  3 Comments
Bruno Luong
Bruno Luong on 15 Jul 2019
s={"chb12_06.edf" "1665 seconds" "1726 seconds" "3415 seconds" "3447 seconds" "chb12_08.edf" "1426 seconds" "1439 seconds" "1591 seconds" "1614 seconds" "1957 seconds" "1977 seconds" "2798 seconds" "2824 seconds"};
sc = cellfun(@char,s,'Unif',0); %cast for regexp, which does not work on string input
sm = regexp(sc,'^\d* seconds','match','once');
isfile = cellfun('isempty',sm);
g = cumsum(isfile);
num = str2double(regexp(sm(~isfile),'^\d*','match','once'));
numc = splitapply(@(x) {x(:)}, num, g(~isfile));
% Pad so the cell subarrays have same row-length
nrows = max(cellfun('size',numc,1));
numc = cellfun(@(x) [x; nan(nrows-size(x,1),1)], numc, 'unif', 0);
% construct the table
colnames = strrep(cellfun(@string,sc(isfile)),".","_"); % colnames cannot have "."
T = table(numc{:});
T.Properties.VariableNames = colnames
Z Liang
Z Liang on 16 Jul 2019
thank you, I was able to test each line and everything worked for me!
it's very clever the use of isfile followed by cumsum, and I haven't seen splitapply used before, so in all I learned quite a few tools to help me with tasks in the future
cheers!

Sign in to comment.

More Answers (1)

Stephane Dauvillier
Stephane Dauvillier on 15 Jul 2019
Try this:
in=["chb12_06.edf"
"1665 seconds"
"1726 seconds"
"3415 seconds"
"3447 seconds"
"chb12_08.edf"
"1426 seconds"
"1439 seconds"
"1591 seconds"
"1614 seconds"
"1957 seconds"
"1977 seconds"
"2798 seconds"
"2824 seconds"
"chb12_09.edf"
"3082 seconds"
"3114 seconds"
"3503 seconds"
"3535 seconds"
"chb12_10.edf"
"593 seconds"
"625 seconds"
"811 seconds"
"856 seconds"]
notQuiteThat = regexp(in,"(\d+) second","once","tokens") % extarct only digits befor second
notQuiteThat = [notQuiteThat{:}] % delete empty candidate
theSeconds = str2double(notQuiteThat) % transform it into actual double value
  3 Comments
Z Liang
Z Liang on 15 Jul 2019
Thank you, this works! When I tried I noticed a difference between
"(\d+) second"
versus
"\d+ second"
in that the first works and gave me numbers, but the second just gave me empty cells; what does the parenthesis do in this instance?
Also, can you show me how to assign the seconds values to different coloumns if they are after a different .edf file?
For example:
Column 1: chb12_06.edf 1665 1726 3415 3447
Column 2: chb12_08.edf 1426 1439 1591 ... ...

Sign in to comment.

Categories

Find more on Data Import and Analysis in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!