MATLAB Answers

Converting unformatted text to formatted text

This question was flagged by Walter Roberson
I asked this question before and neglected some info, so I want to start fresh to avoid confusion.
clear all;
close all
clc
projectdir = 'C:\Users\me\data.psr';
newdir = 'C:\Users\me\Desktop\Test1';
fid=fopen(projectdir,'r');
T=textscan(fid, '%s');
fclose(fid);
for i=8:107
a=T{1,1}{i,1};
b= a(30:48);
matrix(i).r = b(2);
matrix(i).c = b(5);
matrix(i).info = b(8:13);
end
A = zeros(9,9)
for j=8:107
A(matrix(j).r, matrix(j).c) = matrix(j).info;
end;
The error:
Assignment has more non-singleton rhs dimensions than non-singleton subscripts
Error in Untitled2 (line 23)
A(matrix(j).r, matrix(j).c) = matrix(j).info;
This answer by user Stephen Cobeldick might help, although it was written only to deal with the histogram. However, it gives an error when run.
str = fileread('temp.txt');
% identify digits:
rgx = '[A-Z]+\[(\d+)\]\[(\d+)\]:*(\d+)';
C = regexp(str,rgx,'tokens');
% convert digits to numeric:
M = cellfun(@str2double,vertcat(C{:}));
M(:,1:2) = 1+M(:,1:2);
% convert to linear indices:
out = nan(max(M(:,1)),max(M(:,2)));
idx = sub2ind(size(out),M(:,1),M(:,2));
% allocate values:
out(idx) = M(:,3)
Error using cellfun
Input #2 expected to be a cell array, was double instead.
Error in Untitled3 (line 12)
M = cellfun(@str2double,vertcat(C{:}));

per isakson on 25 Nov 2015
Stephen, Thanks for making me aware that I'm wasting my time!
Stephen Cobeldick on 25 Nov 2015
I hope that you get the help and information that you need, and have fun learning MATLAB! We do put a lot of effort in when people need it, so please come and ask more questions :)

Accepted Answer

per isakson on 24 Nov 2015
Edited: per isakson on 28 Nov 2015
I have assumed that the sizes of the resulting arrays are known.
fid = fopen( 'c:\m\cssm\test4.txt' );
rows = textscan( fid, '%s', 'Delimiter', '\n' );
fclose( fid );
rows = rows{:};
str = 'RainflowCycleCounterHistogram'; % avoid magic number
len = length( str );
is_counter = strncmp( str, rows, len );
counter_rows = rows( is_counter );
%
str = 'RainflowCycleMeanBreakpoints';
len = length( str );
is_mean = strncmp( str, rows, len );
mean_rows = rows( is_mean );
%
str = 'RainflowCycleRangeBreakpoints';
len = length( str );
is_range = strncmp( str, rows, len );
range_rows = rows( is_range );
%
counter_matrix = nan( 10, 10 );
for jj = 1 : length( counter_rows )
%
cac = textscan( counter_rows{jj}, '%*s%d%d%f' ...
, 'Delimiter' , ' []:' ...
, 'MultipleDelimsAsOne', true );
%
counter_matrix( cac{1}+1, cac{2}+1 ) = cac{3}; % one based
end
mean_vector = nan( 1, 10 );
for jj = 1 : length( mean_rows )
%
cac = textscan( mean_rows{jj}, '%*s%d%f' ...
, 'Delimiter' , ' []:' ...
, 'MultipleDelimsAsOne', true );
%
mean_vector( 1, cac{1}+1 ) = cac{2}; % one based
end
range_vector = nan( 1, 10 );
for jj = 1 : length( range_rows )
%
cac = textscan( range_rows{jj}, '%*s%d%f' ...
, 'Delimiter' , ' []:' ...
, 'MultipleDelimsAsOne', true );
%
range_vector( 1, cac{1}+1 ) = cac{2}; % one based
end
or maybe better - no assumptions regarding sizes
fid = fopen( 'c:\m\cssm\test4.txt' );
rows = textscan( fid, '%s', 'Delimiter', '\n' );
fclose( fid );
rows = rows{:};
str = 'RainflowCycleCounterHistogram'; % avoid magic number
len = length( str );
is_counter = strncmp( str, rows, len );
counter_rows = rows( is_counter );
%
str = 'RainflowCycleMeanBreakpoints';
len = length( str );
is_mean = strncmp( str, rows, len );
mean_rows = rows( is_mean );
%
str = 'RainflowCycleRangeBreakpoints';
len = length( str );
is_range = strncmp( str, rows, len );
range_rows = rows( is_range );
%
CRS = permute( char( counter_rows ), [2,1] );
cac = textscan( CRS, '%*s%f%f%f' ...
, 'Delimiter' , '[]: '...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1};
%
sz1 = min( num(:,1:2), [], 1 );
sz2 = max( num(:,1:2), [], 1 );
sz = sz2-sz1+[1,1];
ix_linear = sub2ind( sz, num(:,1)+1, num(:,2)+1 ); % one based
counter_matrix( ix_linear ) = num(:,3);
counter_matrix = reshape( counter_matrix, sz );
MRS = permute( char( mean_rows ), [2,1] );
cac = textscan( MRS, '%*s%f%f' ...
, 'Delimiter' , '[]: '...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1};
%
mean_vector( num(:,1)+1 ) = num(:,2); % one based
RRS = permute( char( range_rows ), [2,1] );
cac = textscan( RRS, '%*s%f%f' ...
, 'Delimiter' , ' []:'...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1}; % take the parsed range rows, not the leftover mean rows
%
range_vector( num(:,1)+1 ) = num(:,2); % one based
hope they return identical results :-)
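A note on the linear-index step in the second version: if counter_matrix does not exist beforehand, the assignment through ix_linear creates a row vector only as long as the largest index present, so the later reshape depends on the last cell of the histogram being in the file. The sketch below preallocates instead. It is only an illustration: the num values are made up, and it assumes the indices in the file are zero based.

```matlab
% Hypothetical parsed rows: [row, col, value], with zero-based indices.
num = [0 0 1
       0 2 4
       1 1 7];

% Size from the largest index seen (assumes the indices start at 0).
sz = max( num(:,1:2), [], 1 ) + 1;

% Preallocate so that cells missing from the file stay NaN and no
% reshape is needed afterwards.
counter_matrix = nan( sz );
ix_linear = sub2ind( sz, num(:,1)+1, num(:,2)+1 );   % one based
counter_matrix( ix_linear ) = num(:,3);
```

With preallocation, missing cells show up as NaN rather than silently becoming zeros.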
and another iteration
Comments:
  • A function is superior to a script. It doesn't mess with the base workspace. It's easier to debug and it's easier to call from a script or function.
  • This function is readable. It's fairly straightforward to add new keywords and row formats.
  • The switch case can be replaced by a feval construct. But why do that?
  • The subfunctions, f1, f2 and f3, have large parts of their code in common. That asks for further refactoring.
  • Allocating a separate sub-function to each type of row makes testing easier.
  • If speed becomes a problem, analyze the code with the profiler.
>> S = cssm( 'c:\m\cssm\text4.txt' )
S =
RainflowCycleCounterHistogram: [10x10 double]
RainflowCycleMeanBreakpoints: [-111 100 300 330 360 380 390 400 410 420]
RainflowCycleRangeBreakpoints: [0 35 70 100 135 170 200 230 260 300]
RainflowCycleReversalTolerance: 20
PowerCylinderTemperature: 0
PowerCylinderTemperatureHistogram: [1x12 double]
PowerCylinderTemperatureHistogramBreakpoints: [0 150 175 200 220 250 300 320 350 370 400]
>>
where
function S = cssm( filespec )
fid = fopen( filespec );
rows = textscan( fid, '%s', 'Delimiter', '\n' );
fclose( fid );
rows = strtrim( rows{:} );
type_list = {
... format keyword
'f1', 'RainflowCycleCounterHistogram'
'f2', 'RainflowCycleMeanBreakpoints'
'f2', 'RainflowCycleRangeBreakpoints'
'f3', 'RainflowCycleReversalTolerance'
'f3', 'PowerCylinderTemperature'
'f2', 'PowerCylinderTemperatureHistogram'
'f2', 'PowerCylinderTemperatureHistogramBreakpoints'
};
for jj = 1 : size( type_list, 1 )
switch type_list{jj,1}
case 'f1'
S.(type_list{jj,2}) = f1( type_list{jj,2}, rows );
case 'f2'
S.(type_list{jj,2}) = f2( type_list{jj,2}, rows );
case 'f3'
S.(type_list{jj,2}) = f3( type_list{jj,2}, rows );
otherwise
error( 'The format, "%s", is not yet implemented', type_list{jj,1} )
end
end
end
function matrix = f1( keyword, rows )
ism = is_member( keyword, rows );
cur_rows = rows( ism );
%
str = permute( char( cur_rows ), [2,1] );
cac = textscan( str, '%*s%f%f%f' ...
, 'Delimiter' , '[]: '...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1};
%
sz1 = min( num(:,1:2), [], 1 );
sz2 = max( num(:,1:2), [], 1 );
sz = sz2-sz1+[1,1];
ix_linear = sub2ind( sz, num(:,1)+1, num(:,2)+1 ); % one based
matrix( ix_linear ) = num(:,3);
matrix = reshape( matrix, sz );
end
function matrix = f2( keyword, rows )
ism = is_member( keyword, rows );
cur_rows = rows( ism );
%
str = permute( char( cur_rows ), [2,1] );
cac = textscan( str, '%*s%f%f' ...
, 'Delimiter' , '[]: '...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1};
%
matrix( num(:,1)+1 ) = num(:,2); % one based
end
function matrix = f3( keyword, rows )
ism = is_member( keyword, rows );
cur_rows = rows( ism );
%
str = permute( char( cur_rows ), [2,1] );
cac = textscan( str, '%*s%f', 'Delimiter',':' );
matrix = cac{:};
end
function ism = is_member( keyword, rows )
% the keyword is followed by either ":" or "["
cac = regexp( rows, ['^',keyword,'(?=(:|\[))'], 'once' );
ism = not( cellfun( @isempty, cac ) );
end
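For the comment above about replacing the switch case with a feval construct, a minimal sketch could look like the following. It is not part of cssm: the built-ins upper and fliplr stand in for f1, f2 and f3 so that the snippet runs on its own, and the keywords alpha and beta are made up.

```matlab
% First column: function handle of the parser, second column: keyword.
% upper and fliplr are stand-ins for the sub-functions f1, f2 and f3.
type_list = {
    @upper,  'alpha'
    @fliplr, 'beta'
    };

S = struct();
for jj = 1 : size( type_list, 1 )
    % feval calls whatever handle is stored for this row
    S.(type_list{jj,2}) = feval( type_list{jj,1}, type_list{jj,2} );
end
```

The loop gets shorter, but the explicit otherwise branch with its own error message is lost, which may be one reason to keep the switch.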

Ibro Tutic on 25 Nov 2015
Sounds good, I'll see what I can figure out, thanks!
dpb on 25 Nov 2015
What is the desired output again? I'd approach it a little more generically, although I'm not sure exactly what you intend to do with the end result. I'll note that from your file one can do the following:
>> S=textread('test4.txt','%s','delimiter','\n','whitespace','','headerlines',3); % read into cell array of strings
>> tok=cellfun(@(x) tokens(x,'[]:'),S,'uniformoutput',0); % find tokens each line
>> whos tok
Name Size Bytes Class Attributes
tok 52x1 13660 cell
>> tok{1} % sample what looks like
ans =
RainflowCycleCounterHistogram
0
0
1.0000000000
>> ntok=cellfun(@(x) size(x,1),tok); % number in each row
>> [min(ntok) max(ntok)] % range overall in file
ans =
2 4
>> for n=min(ntok):max(ntok) % build specific format string
fmt=['%s' repmat('[%d]',1,n-2) ':%f']
end
fmt =
%s:%f
fmt =
%s[%d]:%f
fmt =
%s[%d][%d]:%f
>> [u,iu]=unique(cellfun(@(x) x(1,:),tok,'uniform',0),'stable') % what's in file and where???
u =
'RainflowCycleCounterHistogram'
'RainflowCycleMeanBreakpoints'
'RainflowCycleRangeBreakpoints'
'RainflowCycleReversalTolerance'
'PowerCylinderTemperature'
'PowerCylinderTemperatureHistogram'
'PowerCylinderTemperatureHistogramBreakpoints'
iu =
1
8
18
28
29
30
42
>>
From the above pieces one can write a general parser for each possible data line format as long as they follow the form of
String[Index1][Index2]: Value
where the number of indices can be 0, 1, or 2. The above will actually handle N-dimensional arrays; 2 is just the largest seen to date.
With the above it's simple enough to write a routine that loops over the elements in the u array, builds the proper format string, and selects and parses the given lines without any specific testing for matching strings at all, unless and until a user asks for only a given one or set, at which time those can be returned from the general result.
But you don't need to parse the individual lines at all; simply convert the fields within the token array for the ones of choice from the corresponding tok array; ntok gives the number of elements in each row, and therefore the number of index fields.
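As a rough sketch of that last step, assuming the tokens have already been converted to a name, a list of zero-based indices and a value (the records cell array and its values below are made up for illustration), the results can be dropped into a struct without testing for specific keywords:

```matlab
% Hypothetical records: keyword, zero-based indices (0, 1 or 2 of them), value.
records = {
    'RainflowCycleCounterHistogram',  [0 1], 2
    'RainflowCycleCounterHistogram',  [1 2], 5
    'RainflowCycleMeanBreakpoints',   3,     330
    'RainflowCycleReversalTolerance', [],    20
    };

S = struct();
for k = 1:size(records, 1)
    [name, idx, value] = records{k, :};
    if isempty(idx)
        S.(name) = value;            % no index: plain scalar
    else
        sub = num2cell(idx + 1);     % one-based subscripts as a list
        S.(name)(sub{:}) = value;    % array grows as needed, gaps are 0
    end
end
```

Because the arrays grow on assignment, entries that never appear in the file come out as 0; preallocate with nan if that distinction matters.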
function tok = tokens(s,d)
% Simple string parser returns tokens in input string s
%
% T=TOKENS(S) returns the tokens in the string S delimited
% by "white space". Any leading white space characters are ignored.
%
% TOKENS(S,D) returns tokens delimited by one of the
% characters in D. Any leading delimiter characters are ignored.
% DPBozarth (Rev 1 1998)
% Get initial token and set up for rest
if nargin==1
[tok,r] = strtok(s);
while ~isempty(r)
[t,r] = strtok(r);
tok = strvcat(tok,t);
end
else
[tok,r] = strtok(s,d);
while ~isempty(r)
[t,r] = strtok(r,d);
tok = strvcat(tok,t);
end
end
Also, of course, regexp can return tokens if one's got the patience to figure out the proper expression needed...
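One possible set of expressions is sketched below. It is only a sketch: the sample rows are made up to follow the String[Index1][Index2]: Value form, and it uses three small patterns instead of one large one so that rows without any index need no special handling.

```matlab
% Made-up sample rows in the form  String[Index1][Index2]: Value
rows = {
    'RainflowCycleCounterHistogram[0][1]: 2.0'
    'RainflowCycleMeanBreakpoints[3]: 330'
    'RainflowCycleReversalTolerance: 20'
    };

parsed = cell(numel(rows), 3);       % columns: name, indices, value
for k = 1:numel(rows)
    name   = regexp(rows{k}, '^\w+', 'match', 'once');            % leading keyword
    idxTok = regexp(rows{k}, '\[(\d+)\]', 'tokens');              % every [n]
    valTok = regexp(rows{k}, ':\s*(\S+)\s*$', 'tokens', 'once');  % text after ':'
    idx    = cellfun(@(c) str2double(c{1}), idxTok);              % zero based, may be empty
    parsed(k, :) = {name, idx, str2double(valTok{1})};
end
```

Each row of parsed then holds the keyword, its zero-based indices (empty when there are none) and the numeric value.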
per isakson on 25 Nov 2015
Now I added a new piece of code to the answer.

More Answers (1)

dpb on 24 Nov 2015
>> fmt='%*s%f%f%f';
>> fid=fopen('test4.txt');
>> c=cell2mat(textscan(fid,fmt,'headerlines',3,'delimiter','[]:','collectoutput',1,'multipledelimsAsOne',1));
>> v(sub2ind(sz,c(:,1)+1,c(:,2)+1))=c(:,3)
v =
Columns 1 through 10
1 0 1 1000 0 0 0 1 0 0
Columns 11 through 20
0 0 0 1 0 0 0 0 0 0
>> fid=fclose(fid);