# How would I create a matrix from the following strings

JLV on 26 Dec 2019
Edited: Stephen Cobeldick on 1 Jan 2020
I am trying to write a code with minimal preprocessing. I have many entries in a text file like this:
NODE{1 0 11 0 1.000000e+00 1.000000e-02 -1.000000e-02 1.500000e-03}
There are many rows of these in an Excel file. I want to read columns 1 and 6 to 8 inside the brackets. What would be the best way to do this? I have tried fileread and textscan, but I haven't got anywhere because of the text NODE in front of the brackets.

Andrei Bobrov on 26 Dec 2019
John D'Errico on 26 Dec 2019
Please learn to use comments instead of answers if you are just responding to a follow-up question or wish to make a comment.
I've moved the file, attaching it to this comment instead.
JLV on 26 Dec 2019
Apologies

Stephen Cobeldick on 26 Dec 2019
Edited: Stephen Cobeldick on 26 Dec 2019
Note that specifying a suitable format string is much more efficient than importing as character/string and then converting afterwards (like the other answers):
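opt = {'CollectOutput',true}; % merge the eight numeric columns into one matrix
fmt = ['NODE{',repmat('%f',1,8),'}']; % literal text plus eight numeric fields
[fid,msg] = fopen('ThinPlateNodes.txt','rt');
assert(fid>=3,msg) % stop with the system message if the file cannot be opened
C = textscan(fid,fmt,opt{:});
fclose(fid);
M = C{1}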
Giving:
M =
1.00000 0.00000 11.00000 0.00000 1.00000 0.01000 -0.01000 0.00150
2.00000 0.00000 11.00000 0.00000 1.00000 0.01000 0.01000 0.00150
3.00000 0.00000 11.00000 0.00000 1.00000 0.01000 -0.01000 -0.00150
4.00000 0.00000 11.00000 0.00000 1.00000 0.01000 0.01000 -0.00150
5.00000 0.00000 11.00000 0.00000 1.00000 -0.01000 0.01000 0.00150
6.00000 0.00000 11.00000 0.00000 1.00000 -0.01000 0.01000 -0.00150
... lots of lines here
179495.00000 0.00000 15.00000 0.00000 1.00000 0.00952 -0.00960 -0.00824
179496.00000 0.00000 15.00000 0.00000 1.00000 0.00964 -0.00978 -0.00902
179497.00000 0.00000 15.00000 0.00000 1.00000 -0.00985 0.00144 -0.00838
179498.00000 0.00000 15.00000 0.00000 1.00000 0.00912 -0.00254 -0.00858
179499.00000 0.00000 15.00000 0.00000 1.00000 0.00979 0.00995 -0.00745
179500.00000 0.00000 15.00000 0.00000 1.00000 -0.00981 0.00984 -0.00805
And checking the size:
>> size(M)
ans =
78410 8
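A minimal sketch of the 'CollectOutput' option, which is what lets C{1} hold all eight columns as one matrix. It passes two sample lines to textscan as text rather than through a file identifier; the sample text and the names C_sep, C_col, label and xyz are only for illustration.

```matlab
% Two sample lines in the same layout as the data in the question.
txt = sprintf('%s\n', ...
    'NODE{1 0 11 0 1.000000e+00 1.000000e-02 -1.000000e-02 1.500000e-03}', ...
    'NODE{2 0 11 0 1.000000e+00 1.000000e-02 1.000000e-02 1.500000e-03}');
fmt = ['NODE{',repmat('%f',1,8),'}'];

C_sep = textscan(txt,fmt);                      % default: one cell per %f field
C_col = textscan(txt,fmt,'CollectOutput',true); % adjacent numeric fields merged

M     = C_col{1};  % N-by-8 numeric matrix
label = M(:,1);    % column 1, as asked in the question
xyz   = M(:,6:8);  % columns 6 to 8
```

Without the option, each column would have to be taken from its own cell (C_sep{1}, C_sep{6}, and so on) and concatenated by hand.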

JLV on 27 Dec 2019
I will try and diagnose the error when I am back in the office
JLV on 1 Jan 2020
The method above worked fine this time!
I assume you put the safeguard in to warn me if the file can't be opened.
Stephen Cobeldick on 1 Jan 2020
"The method above worked fine this time!"
I'm glad. It will be more efficient than the other methods shown on this thread.
"I assume you put the safeguard in to warn me if the file can't be opened."
Yes. I recommend putting that assert statement (or something equivalent) after every fopen: it prints much more useful information than you would get otherwise when a file cannot be opened.
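A small sketch of what that safeguard reports, using a deliberately nonexistent file (the file name is hypothetical). fopen returns -1 on failure together with a system-dependent message, while identifiers 0, 1 and 2 are reserved for standard input, output and error, which is why the test is fid>=3.

```matlab
% Try to open a file that is assumed not to exist.
[fid,msg] = fopen('no_such_file_hypothetical.txt','rt');

if fid < 3
    % msg holds the operating system's explanation; its wording varies by platform.
    fprintf('fopen failed: %s\n', msg);
else
    fclose(fid);
end
```

With assert(fid>=3,msg) the same message becomes the error text, so the script stops at the fopen line instead of failing later inside textscan with an invalid file identifier.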

Bhaskar R on 26 Dec 2019
Edited: Bhaskar R on 26 Dec 2019
data = fileread('ThinPlateNodes.txt'); % read the whole file as text (file name as used elsewhere in this thread)
ext_data = regexp(data, '[^{\]]+(?=})', 'match'); % get the text between { and }
ext_data(1) = []; % first cell is not required, so remove it
num_data = zeros(length(ext_data), 8); % preallocate the complete numeric data
for ii = 1:length(ext_data)
num_data(ii,:) = cellfun(@str2num, strsplit(cell2mat(ext_data(ii))));
end
% you can get any column from "num_data"
col_1 = num_data(:,1);
col_6_to_8 = num_data(:, 6:8);
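If the regexp route is kept, a hedged variant of the loop above can convert each row with sscanf instead of str2num. This is a sketch on two sample lines, not the answer's original code; the token-based pattern and the variable name inner are assumptions made for illustration.

```matlab
% Two sample lines standing in for the file contents.
data = sprintf('%s\n', ...
    'NODE{1 0 11 0 1.000000e+00 1.000000e-02 -1.000000e-02 1.500000e-03}', ...
    'NODE{2 0 11 0 1.000000e+00 1.000000e-02 1.000000e-02 1.500000e-03}');

% Capture the text between each pair of braces.
inner = regexp(data, '\{([^}]*)\}', 'tokens'); % 1-by-N cell of 1-by-1 cells
inner = [inner{:}];                            % flatten to 1-by-N cell of char

num_data = zeros(numel(inner), 8);             % preallocate
for ii = 1:numel(inner)
    num_data(ii,:) = sscanf(inner{ii}, '%f').'; % 8 numbers per row
end

col_1      = num_data(:,1);
col_6_to_8 = num_data(:,6:8);
```

This still imports everything as text first, so reading numbers directly with textscan remains the faster option for large files.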

JLV on 26 Dec 2019
I was trying to repeat the steps above for the following file, after understanding what each step does.
See my edited code below
data = fileread('NodeNosatElementsUnedited.txt'); % Node Locations in space. Read file(it is in text).
ext_data = regexp(data, '[^{\]]+(?=})', 'match'); % Get data between {}
ext_data(1) = []; % First cell is not required so remove
num_data = zeros(length(ext_data), 12); % Preallocating memory
for ii = 1:length(ext_data)
Split(ii,:) = strsplit(cell2mat(ext_data(ii)));
NodesatElements(ii,:) = cellfun(@str2num, Split(ii,:),'UniformOutput',false);
end
NodesatElements = NodesatElements(:, [1 9 10 11 12]);
This seems to be computationally inefficient with a large number of rows, primarily because of the text in the array inside the brackets, and because I am unable to preallocate for some reason.
What would be the best way to speed up the code?
Bhaskar R on 27 Dec 2019
Stephen Cobeldick provided a sophisticated answer !!
opt = {'CollectOutput',true}; % group adjacent numeric columns into one matrix per cell
fmt = '"TET4{%f %f %f %f %f %s %f %f %f %f %f %f}"';
[fid,msg] = fopen('NodeNosatElementsUnedited.txt','rt');
assert(fid>=3,msg)
C = textscan(fid,fmt,opt{:}); % open C in the variable editor to inspect the extracted data
fclose(fid);
NodesatElements = [C{1}(:,1),C{3}(:, 3:6)]; % fields 1 and 9 to 12: this is the final data
Stephen Cobeldick on 27 Dec 2019
"What would be the best way to speed up the code."
• By not importing numeric data as character/strings, and then awkwardly converting it to numeric afterwards.
• By not using str2num (which hides slow eval inside).
• By not expanding the output arrays nearly half-a-million times inside a loop.
• By not using a cell array to store one numeric scalar per cell.
• By not importing any data that you do not need.
For example, much like the efficient code I showed you earlier:
opt = {'CollectOutput',true}; % return the five imported columns as one matrix
fmt = '"TET4{%f%*f%*f%*f%*f%*s%*f%*f%f%f%f%f}"'; % note the ignored fields!
[fid,msg] = fopen('NodeNosatElementsUnedited.txt','rt');
assert(fid>=3,msg)
C = textscan(fid,fmt,opt{:});
fclose(fid);
M = C{1};
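The same field-skipping idea can be sketched for the NODE lines from the original question, where only columns 1 and 6 to 8 are wanted. The sample text below is illustrative and is passed to textscan as text instead of through a file; for the real file, the fopen/assert/fclose pattern above would be used.

```matlab
% Two sample lines in the layout given in the question.
txt = sprintf('%s\n', ...
    'NODE{1 0 11 0 1.000000e+00 1.000000e-02 -1.000000e-02 1.500000e-03}', ...
    'NODE{2 0 11 0 1.000000e+00 1.000000e-02 1.000000e-02 1.500000e-03}');

% %f imports a field, %*f reads and discards it.
fmt = 'NODE{%f%*f%*f%*f%*f%f%f%f}';
C = textscan(txt, fmt, 'CollectOutput', true);
M = C{1}; % N-by-4: label, then fields 6 to 8
```

Only four of the eight fields are ever stored, so no indexing step is needed afterwards.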

Andrei Bobrov on 26 Dec 2019
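% T is assumed to be a table already imported from the file, one variable per space-separated field
T.Varend = str2double(regexp(T{:,end},'(\-)?\d+(\.\d+e\-\d+)?(?=\}$)','match','once')); % strip the trailing }
T.Var0 = str2double(regexp(T{:,1},'\d+(?=$)','match','once')); % strip the leading NODE{
T = T(:,[end,6:7,end-1]);
T.Properties.VariableNames = {'LABEL','x','y','z'};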