Converting unformatted text to formatted text

Question

Ibro Tutic on 24 Nov 2015


Direct link to this question

https://ch.mathworks.com/matlabcentral/answers/257283-converting-unformatted-text-to-formatted-text

Commented: Stephen23 on 26 Dec 2020

I asked this question before and neglected some info, so I want to start fresh to avoid confusion.

 clear all;
 close all
 clc
 projectdir = 'C:\Users\me\data.psr';
 newdir = 'C:\Users\me\Desktop\Test1';
 fid=fopen(projectdir,'r');
 T=textscan(fid, '%s');
 fclose(fid);
 for i=8:107
    a=T{1,1}{i,1};
    b= a(30:48);
    matrix(i).r = b(2);
    matrix(i).c = b(5);
    matrix(i).info = b(8:13);
 end
 A = zeros(9,9)
 for j=8:107
    A(matrix(j).r, matrix(j).c) = matrix(j).info;
 end;

The error:

 Assignment has more non-singleton rhs dimensions than non-singleton subscripts
 Error in Untitled2 (line 23)
    A(matrix(j).r, matrix(j).c) = matrix(j).info;

This answer by user Stephen Cobeldick might help, although it was created only to deal with the histogram. It gives an error when run, however.

 str = fileread('temp.txt');
 % identify digits:
 rgx = '[A-Z]+\[(\d+)\]\[(\d+)\]:*(\d+)';
 C = regexp(str,rgx,'tokens');
 % convert digits to numeric:
 M = cellfun(@str2double,vertcat(C{:}));
 M(:,1:2) = 1+M(:,1:2);
 % convert to linear indices:
 out = nan(max(M(:,1)),max(M(:,2)));
 idx = sub2ind(size(out),M(:,1),M(:,2));
 % allocate values:
 out(idx) = M(:,3)

The error:

 Error using cellfun
 Input #2 expected to be a cell array, was double instead.
 Error in Untitled3 (line 12)
 M = cellfun(@str2double,vertcat(C{:}));
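
One possible cause of this cellfun error, assuming the file uses the Name[i][j]: value layout described in the answers: the pattern [A-Z]+\[ requires an uppercase letter immediately before the first bracket, so a name ending in a lowercase letter gives no matches, C comes back empty, and vertcat(C{:}) is then an empty double rather than a cell array. A minimal sketch with an adjusted pattern and a guard is below; the sample text and the adjusted pattern are assumptions, not taken from the actual file.

```matlab
% Hypothetical sample text in the assumed "Name[i][j]: value" layout.
str = sprintf('RainflowCycleCounterHistogram[0][0]: 1.0\nRainflowCycleCounterHistogram[1][2]: 7.5\n');
% Accept mixed-case names, optional blanks after the colon, and decimal values.
rgx = '[A-Za-z]+\[(\d+)\]\[(\d+)\]:\s*([-+]?\d+\.?\d*)';
C = regexp(str, rgx, 'tokens');
if isempty(C)
    % With no matches, vertcat(C{:}) would be [] and cellfun would fail.
    error('No lines matched the pattern; check it against the file contents.');
end
% Convert the captured text to numbers: one row per match, three columns.
M = cellfun(@str2double, vertcat(C{:}));
M(:,1:2) = M(:,1:2) + 1;                 % zero-based to one-based indices
out = nan(max(M(:,1)), max(M(:,2)));
out(sub2ind(size(out), M(:,1), M(:,2))) = M(:,3);
```

Positions that never appear in the text stay NaN, which makes missing entries easy to spot.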

7 Comments

Ibro Tutic on 25 Nov 2015

Edited: John Kelly on 10 Nov 2017

Yes, I probably should have left the question, but I was under the impression that if I ask the same question again you would feel that your answers weren't good enough. Adding in more and more info that I missed will confuse the person answering the question and me, as I probably don't remember what exactly I had posted before. This was the simplest solution to a small problem, now what I will do is rewrite the original question and put it in there to solve what seems to be the biggest issue in the history of this forum.

I completely understand where you are coming from, but I am not sure that you understand what I am trying to accomplish by posting this new question. Sorry, I guess? I am trying to rectify my mistake and it seems that people are more worried about the fact that I deleted a question rather than trying to "help" with my other questions, judging from what isakson just commented. I am legitimately trying to learn how to do this and people are making MASSIVE deals out of problems that shouldn't be that important (yes, if I just deleted the entire question I would understand, but I clearly stated my intentions). Like I said, yea, I probably messed up deleting the question, but I'm not sure if arguing about that rather than actually helping with the question is the mature thing to do.

It's not like I am consistently deleting every question that I get answered to cover up a trail or something. It was my first time doing it and now I realize that I screwed up in doing so. I remain respectful in every aspect of my questions, giving credit to people who wrote certain code, etc.

With that said, thanks for any help you/isakson/dpd provided.

Stephen23 on 25 Nov 2015

I hope that you get the help and information that you need, and have fun learning MATLAB! We do put a lot of effort in when people need it, so please come and ask more questions :)

Stephen23 on 26 Dec 2020

OP deleted comments which are still visible in Google Cache:

https://web.archive.org/web/20201226070945/https://webcache.googleusercontent.com/search?q=cache%3ACh71iOXjozwJ%3Ahttps%3A%2F%2Fwww.mathworks.com%2Fmatlabcentral%2Fanswers%2F257283-converting-unformatted-text-to-formatted-text

https://web.archive.org/web/20201226071335/https://www.mathworks.com/matlabcentral/answers/257283-converting-unformatted-text-to-formatted-text


Answer 1

per isakson on 24 Nov 2015


Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/257283-converting-unformatted-text-to-formatted-text#answer_201032

Edited: per isakson on 28 Nov 2015

I have assumed that the sizes of the resulting arrays are known

fid = fopen( 'c:\m\cssm\test4.txt' );
rows = textscan( fid, '%s', 'Delimiter', '\n' );
fclose( fid );
rows = rows{:};
str = 'RainflowCycleCounterHistogram';  % avoid magic number
len = length( str );
is_counter   = strncmp( str, rows, len ); 
counter_rows = rows( is_counter );
%
str = 'RainflowCycleMeanBreakpoints';  
len = length( str );
is_mean   = strncmp( str, rows, len ); 
mean_rows = rows( is_mean );
%
str = 'RainflowCycleRangeBreakpoints';  
len = length( str );
is_range   = strncmp( str, rows, len ); 
range_rows = rows( is_range );
%
counter_matrix = nan( 10, 10 );
for jj = 1 : length( counter_rows )
%    
    cac = textscan( counter_rows{jj}, '%*s%d%d%f'   ...
                ,   'Delimiter'          , ' []:'   ...
                ,   'MultipleDelimsAsOne', true     ); 
%            
    counter_matrix( cac{1}+1, cac{2}+1 ) = cac{3};  % one based      
end
mean_vector = nan( 1, 10 );
for jj = 1 : length( mean_rows )
%    
    cac = textscan( mean_rows{jj}, '%*s%d%f'        ...
                ,   'Delimiter'          , ' []:'   ...
                ,   'MultipleDelimsAsOne', true     ); 
%            
    mean_vector( 1, cac{1}+1 ) = cac{2};  % one based      
end
range_vector = nan( 1, 10 );
for jj = 1 : length( range_rows )
%    
    cac = textscan( range_rows{jj}, '%*s%d%f'        ...
                ,   'Delimiter'          , ' []:'   ...
                ,   'MultipleDelimsAsOne', true     ); 
%            
    range_vector( 1, cac{1}+1 ) = cac{2};  % one based      
end

or maybe better - no assumptions regarding sizes

fid = fopen( 'c:\m\cssm\test4.txt' );
rows = textscan( fid, '%s', 'Delimiter', '\n' );
fclose( fid );
rows = rows{:};
str = 'RainflowCycleCounterHistogram';  % avoid magic number
len = length( str );
is_counter   = strncmp( str, rows, len ); 
counter_rows = rows( is_counter );
%
str = 'RainflowCycleMeanBreakpoints';  
len = length( str );
is_mean   = strncmp( str, rows, len ); 
mean_rows = rows( is_mean );
%
str = 'RainflowCycleRangeBreakpoints';  
len = length( str );
is_range   = strncmp( str, rows, len ); 
range_rows = rows( is_range );
%
CRS = permute( char( counter_rows ), [2,1] );
cac = textscan( CRS, '%*s%f%f%f'                ...
            ,   'Delimiter'             , '[]: '...
            ,   'MultipleDelimsAsOne'   , true  ...
            ,   'CollectOutput'         , true  ); 
num = cac{1};          
% 
sz1 = min( num(:,1:2), [], 1 );
sz2 = max( num(:,1:2), [], 1 );
sz  = sz2-sz1+[1,1];
ix_linear = sub2ind( sz, num(:,1)+1, num(:,2)+1  ); % one based 
counter_matrix( ix_linear ) = num(:,3); 
counter_matrix = reshape( counter_matrix, sz );
MRS = permute( char( mean_rows ), [2,1] );
cac = textscan( MRS, '%*s%f%f'                  ...
            ,   'Delimiter'             , '[]: '...
            ,   'MultipleDelimsAsOne'   , true  ...
            ,   'CollectOutput'         , true  ); 
num = cac{1};                  
%            
mean_vector( num(:,1)+1 ) = num(:,2);  % one based
RRS = permute( char( range_rows ), [2,1] );
cac = textscan( RRS, '%*s%f%f'                  ...
            ,   'Delimiter'             , ' []:'...
            ,   'MultipleDelimsAsOne'   , true  ...
            ,   'CollectOutput'         , true  ); 
num = cac{1};
%            
range_vector( num(:,1)+1 ) = num(:,2);  % one based

hope they return identical results :-)
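
A quick way to check that the loop version and the sub2ind version agree is to run both on a few in-memory triples. This is only a sketch: the num values are made up, and the size is inferred as max index plus one with an explicit NaN preallocation, which is a slight variation on the sz2-sz1+[1,1] computation above.

```matlab
% Hypothetical [row col value] triples with zero-based indices.
num = [0 0 1; 0 1 0; 1 0 1000; 1 1 7];

% Loop version, size known in advance.
A = nan(2, 2);
for k = 1:size(num, 1)
    A(num(k,1)+1, num(k,2)+1) = num(k,3);
end

% Vectorized version, size inferred from the largest indices.
sz = max(num(:,1:2), [], 1) + 1;
B = nan(sz);
B(sub2ind(sz, num(:,1)+1, num(:,2)+1)) = num(:,3);

% isequaln treats NaN as equal to NaN, so unfilled cells also compare equal.
same = isequaln(A, B);
```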

and another iteration

Comments:

A function is superior to a script. It doesn't mess with the base workspace. It's easier to debug and it's easier to call from a script or function.
This function is readable. It's fairly straightforward to add new keywords and row formats.
The switch case can be replaced by a feval construct. But why do that?
The subfunctions, f1, f2 and f3, have large parts of their code in common. That asks for further refactoring.
Allocating a separate sub-function to each type of row makes testing easier.
If speed becomes a problem analyze the code with the profiler.

>> S = cssm( 'c:\m\cssm\text4.txt' )
S = 
                   RainflowCycleCounterHistogram: [10x10 double]
                    RainflowCycleMeanBreakpoints: [-111 100 300 330 360 380 390 400 410 420]
                   RainflowCycleRangeBreakpoints: [0 35 70 100 135 170 200 230 260 300]
                  RainflowCycleReversalTolerance: 20
                        PowerCylinderTemperature: 0
               PowerCylinderTemperatureHistogram: [1x12 double]
    PowerCylinderTemperatureHistogramBreakpoints: [0 150 175 200 220 250 300 320 350 370 400]
>>

where

function    S = cssm( filespec )
    fid = fopen( filespec );
    rows = textscan( fid, '%s', 'Delimiter', '\n' );
    fclose( fid );
    rows = strtrim( rows{:} );
    type_list   = {
    ... format  keyword   
        'f1', 'RainflowCycleCounterHistogram'
        'f2', 'RainflowCycleMeanBreakpoints'
        'f2', 'RainflowCycleRangeBreakpoints'
        'f3', 'RainflowCycleReversalTolerance'
        'f3', 'PowerCylinderTemperature'
        'f2', 'PowerCylinderTemperatureHistogram'
        'f2', 'PowerCylinderTemperatureHistogramBreakpoints'    
        };
    for jj = 1 : size( type_list, 1 )
        switch type_list{jj,1}
            case 'f1'
                S.(type_list{jj,2}) = f1( type_list{jj,2}, rows );
            case 'f2'
                S.(type_list{jj,2}) = f2( type_list{jj,2}, rows );
            case 'f3'
                S.(type_list{jj,2}) = f3( type_list{jj,2}, rows );
            otherwise
                error( 'The format, "%s", is not yet implemented', type_list{jj,1} )
        end
    end
end
function    matrix = f1( keyword, rows )
    ism = is_member( keyword, rows );
    cur_rows = rows( ism );
    %
    str = permute( char( cur_rows ), [2,1] );
    cac = textscan( str, '%*s%f%f%f'                ...
                ,   'Delimiter'             , '[]: '...
                ,   'MultipleDelimsAsOne'   , true  ...
                ,   'CollectOutput'         , true  ); 
    num = cac{1};          
    % 
    sz1 = min( num(:,1:2), [], 1 );
    sz2 = max( num(:,1:2), [], 1 );
    sz  = sz2-sz1+[1,1];
    ix_linear = sub2ind( sz, num(:,1)+1, num(:,2)+1  ); % one based 
    matrix( ix_linear ) = num(:,3); 
    matrix = reshape( matrix, sz );
end
function    matrix = f2( keyword, rows )
    ism = is_member( keyword, rows );
    cur_rows = rows( ism );
    %
    str = permute( char( cur_rows ), [2,1] );
    cac = textscan( str, '%*s%f%f'                  ...
                ,   'Delimiter'             , '[]: '...
                ,   'MultipleDelimsAsOne'   , true  ...
                ,   'CollectOutput'         , true  ); 
    num = cac{1};                  
    %            
    matrix( num(:,1)+1 ) = num(:,2);  % one based      
end
function    matrix = f3( keyword, rows )
    ism = is_member( keyword, rows );
    cur_rows = rows( ism );
    %
    str = permute( char( cur_rows ), [2,1] );
    cac = textscan( str, '%*s%f', 'Delimiter',':' );
    matrix = cac{:};   
end
function    ism = is_member( keyword, rows )
    %   the keyword is followed by either ":" or "["
    cac = regexp( rows, ['^',keyword,'(?=(:|\[))'], 'once' );
    ism = not( cellfun( @isempty, cac ) );
end
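
The reason for the lookahead in is_member can be seen on a few rows whose keywords share a prefix. The rows below are made-up examples in the same layout as the file, not actual data.

```matlab
% Hypothetical rows; all three start with 'PowerCylinderTemperature'.
rows = { 'PowerCylinderTemperature: 0'
         'PowerCylinderTemperatureHistogram[0]: 5'
         'PowerCylinderTemperatureHistogramBreakpoints[0]: 150' };
keyword = 'PowerCylinderTemperature';

% Prefix comparison: also matches the two longer keywords.
is_prefix = strncmp( keyword, rows, length( keyword ) );

% Anchored match: the keyword must be followed directly by ":" or "[".
hits     = regexp( rows, ['^', keyword, '(?=(:|\[))'], 'once' );
is_exact = not( cellfun( @isempty, hits ) );
```

Here is_prefix selects all three rows, while is_exact selects only the first one.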

12 Comments

dpb on 24 Nov 2015

Basically, there are two cases. The first is if the data are all numeric and regular with at most header lines; then you can forget the format spec and use an empty string, and textscan (and its red-haired stepchild cousin textread) will then return the same shape as the input file automagically.

Other than that, you basically have to know what the format of the input file is and, as mine and Per's answers show, use the "features" of the file structure to parse specific formats. In this case, there were no blanks in the lines to be parsed, hence the delimiters could become the characters that are not of interest: the square brackets and the colon.

However, the adjoining "][" characters, if each is treated as a delimiter, indicate an empty field, and that isn't the way the record should be interpreted; hence the need to set the 'MultipleDelimsAsOne' parameter so that the sequence is treated as a single delimiter. Other than that, it was '%*s' to skip the first string field up to the first bracket, followed by three numeric fields. I didn't differentiate between the integer and floating-point fields; one can do so, but then textscan returns a separate cell for each type, which is generally more hassle to use.

All in all, it takes some "time in grade" to be able to figure out all the gyrations inherent in the C format parsing and there's much that is, simply put, essentially magic, particularly when it comes to fixed-width fields.
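
As a small illustration of the delimiter handling described in this comment, the sketch below parses a single made-up line from a character vector instead of a file. The sample text is an assumption; the textscan options are the ones used in the answers.

```matlab
% Hypothetical sample line in the "Name[i][j]: value" layout.
txt = 'RainflowCycleCounterHistogram[2][3]: 42.5';

% '[', ']' and ':' are delimiters; runs such as "][" or "]:" count as one.
% '%*s' skips the name up to the first bracket, then three numbers are read.
c = textscan( txt, '%*s%f%f%f'          ...
            , 'Delimiter'          , '[]:' ...
            , 'MultipleDelimsAsOne', true  ...
            , 'CollectOutput'      , true  );
vals = c{1};    % one row holding the two indices and the value
```

Using '%f' for all three fields lets 'CollectOutput' merge them into one numeric array; mixing '%d' and '%f' would give separate cells.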

Ibro Tutic on 25 Nov 2015

Edited: Ibro Tutic on 25 Nov 2015

text4.txt

Yes, all possible names are known before hand. The files are small because they are pulled manually from the ECU whenever we need them and they are specific to whatever we are looking for. dpb's answer to use strcmp instead of strncmp looks like it will solve my problem.

I had a file called test4 and text4, I attached the wrong one. I went ahead and attached the new file.

I tried to use strcmp to only look for the exact text, but it returns a value of 0 for every row (I assume this is because it is looking ONLY for that exact string, and the fact that there are more characters after the string causes it to return a 0). I went ahead and attached a text file with ALL of the data I am looking at (not actual data, modified numbers) and the code I added to per isakson's original code. I am using the actual file to test the code against, so that will need to be changed to account for the text4.txt file.

 clear all;
 close all
 clc
 projectdir = 'C:\Users\it58528\Documents\Power Cylinder Temp and Rainflow Cycle Counter - After 16500 Cycles - 2015-10-12.prm';
fid = fopen(projectdir);
rows = textscan( fid, '%s', 'Delimiter', '\n' );
fclose( fid );
rows = rows{:};
str = 'RainflowCycleCounterHistogram';  % avoid magic number
len = length( str );
is_counter   = strncmp( str, rows, len ); 
counter_rows = rows( is_counter );
%
str = 'RainflowCycleMeanBreakpoints';  
len = length( str );
is_mean   = strncmp( str, rows, len ); 
mean_rows = rows( is_mean );
%
str = 'RainflowCycleRangeBreakpoints';  
len = length( str );
is_range   = strncmp( str, rows, len ); 
range_rows = rows( is_range );
%
str = 'PowerCylinderTemperatureHistogram';
len = length (str);
is_temp   = strcmp ( str, rows );
temp_rows = rows ( is_temp );
%
str = 'PowerCylinderTemperatureHistogramBreakpoints';
len = length (str);
is_break   = strncmp ( str, rows, len );
break_rows = rows ( is_break);
counter_matrix = nan( 10, 10 );
for jj = 1 : length( counter_rows )
%    
    cac = textscan( counter_rows{jj}, '%*s%d%d%f'   ...
                ,   'Delimiter'          , ' []:'   ...
                ,   'MultipleDelimsAsOne', true     ); 
%            
    counter_matrix( cac{1}+1, cac{2}+1 ) = cac{3};  % one based      
end
mean_vector = nan( 1, 10 );
for jj = 1 : length( mean_rows )
%    
    cac = textscan( mean_rows{jj}, '%*s%d%f'        ...
                ,   'Delimiter'          , ' []:'   ...
                ,   'MultipleDelimsAsOne', true     ); 
%            
    mean_vector( 1, cac{1}+1 ) = cac{2};  % one based      
end
range_vector = nan( 1, 10 );
for jj = 1 : length( range_rows )
%    
    cac = textscan( range_rows{jj}, '%*s%d%f'        ...
                ,   'Delimiter'          , ' []:'   ...
                ,   'MultipleDelimsAsOne', true     ); 
%            
    range_vector( 1, cac{1}+1 ) = cac{2};  % one based      
end
temp_matrix = nan ( 1, 12 );
for jj = 1 : length ( 12 )
 %   
    cac = textscan( temp_rows{jj}, '%*s%d%f'        ...
                ,   'Delimiter'          , ' []:'   ...
                ,   'MultipleDelimsAsOne', true     ); 
 %           
    temp_matrix( 1, cac{1}+1 ) = cac{2};    %one based
end
temp_vector = nan ( 1, 11 );
for jj = 1 : length ( break_rows )
%
    cac = textscan( break_rows{jj}, '%*s%d%f'        ...
                ,   'Delimiter'          , ' []:'   ...
                ,   'MultipleDelimsAsOne', true     ); 
%
    temp_vector( 1, cac{1}+1 ) = cac{2};
end

dpb on 25 Nov 2015

Edited: dpb on 25 Nov 2015

What is the desired output again? I'd approach it a little more generically, but I'm not sure precisely what is to be done with the end result. I'll note that from your file one can do the following:

>> S=textread('test4.txt','%s','delimiter','\n','whitespace','','headerlines',3);  % read into cell array of strings
>> tok=cellfun(@(x) tokens(x,'[]:'),S,'uniformoutput',0); % find tokens each line
>> whos tok
Name       Size            Bytes  Class    Attributes
tok       52x1             13660  cell               
>> tok{1}  % sample what looks like 
ans =
RainflowCycleCounterHistogram
0                            
0                            
1.0000000000                 
>> ntok=cellfun(@(x) size(x,1),tok);  % number in each row
>> [min(ntok) max(ntok)]              % range overall in file
ans =
   2     4
>> for n=min(ntok):max(ntok)  % build specific format string
  fmt=['%s' repmat('[%d]',1,n-2) ':%f']
end
fmt =
%s:%f
fmt =
%s[%d]:%f
fmt =
%s[%d][%d]:%f
>> [u,iu]=unique(cellfun(@(x) x(1,:),tok,'uniform',0),'stable')  % what's in file and where???
u = 
  'RainflowCycleCounterHistogram'
  'RainflowCycleMeanBreakpoints'
  'RainflowCycleRangeBreakpoints'
  'RainflowCycleReversalTolerance'
  'PowerCylinderTemperature'
  'PowerCylinderTemperatureHistogram'
  'PowerCylinderTemperatureHistogramBreakpoints'
iu =
   1
   8
  18
  28
  29
  30
  42
>>

From the above pieces one can write a general parser for each possible data line format as long as they follow the form of

String[Index1][Index2]: Value

where the number of indices can be 0, 1, or 2. The above will actually handle N-dimensional arrays; it's just that 2 is the largest seen to date.

With the above it's simple enough to write a routine that loops over the elements in the u array, builds the proper format string, and selects and parses the given lines without any specific testing for matching strings at all, unless and until a user asks for only a given one or set, at which time those can be returned from the general result.

But, you don't need to parse the individual lines at all; simply convert the fields within the token array for the ones of choice from the corollary tok array; ntok gives the info on how many elements there are corresponding to the fields.

function tok = tokens(s,d)
%   Simple string parser returns tokens in input string s
%
%   T=TOKENS(S) returns the tokens in the string S delimited
%   by "white space".  Any leading white space characters are ignored.
%
%   TOKENS(S,D) returns tokens delimited by one of the
%   characters in D.  Any leading delimiter characters are ignored.
% DPBozarth (Rev 1 1998)
% Get initial token and set up for rest
if nargin==1
  [tok,r] = strtok(s);
  while ~isempty(r)
    [t,r] = strtok(r);
    tok = strvcat(tok,t);
  end
else
  [tok,r] = strtok(s,d);
  while ~isempty(r)
    [t,r] = strtok(r,d);
    tok = strvcat(tok,t);
  end
end

Also, of course, regexp can return tokens if one's got the patience to figure out the proper expression needed...
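
For what it's worth, a regexp-based version of the generic parser might look like the sketch below. It is not the tokens/strtok approach shown above but a swapped-in alternative, and the rows and field names (Tol, Vec, Mat) are made up for illustration.

```matlab
% Hypothetical rows with zero, one, and two indices.
rows = { 'Tol: 20'
         'Vec[0]: 1.5'
         'Vec[1]: 2.5'
         'Mat[0][0]: 1'
         'Mat[1][2]: 9' };
S = struct();
for k = 1:numel(rows)
    % Tokens: name, the run of bracketed indices (possibly empty), value.
    parts  = regexp(rows{k}, '^(\w+)((?:\[\d+\])*):\s*(\S+)$', 'tokens', 'once');
    name   = parts{1};
    digits = regexp(parts{2}, '\d+', 'match');
    idx    = cellfun(@str2double, digits) + 1;    % one-based indices
    value  = str2double(parts{3});
    if isempty(idx)
        S.(name) = value;                         % plain scalar entry
    else
        if ~isfield(S, name), S.(name) = []; end
        if isscalar(idx), idx = [1, idx]; end     % keep 1-D data as a row
        sub = num2cell(idx);
        S.(name)(sub{:}) = value;                 % array grows as needed
    end
end
```

Elements that never appear in the input are filled with zeros by the automatic growth; preallocating with NaN would require knowing the sizes first.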

per isakson on 25 Nov 2015

Now I added a new piece of code to the answer.

Answer 2

dpb on 24 Nov 2015


Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/257283-converting-unformatted-text-to-formatted-text#answer_201029

>> fmt='%*s%f%f%f';
>> fid=fopen('test4.txt');
>> c=cell2mat(textscan(fid,fmt,'headerlines',3,'delimiter','[]:','collectoutput',1,'multipledelimsAsOne',1));
>> % sz (the size of the target matrix) must already be defined; the post does not show its value
>> v(sub2ind(sz,c(:,1)+1,c(:,2)+1))=c(:,3)
v =
  Columns 1 through 10
           1           0           1        1000           0           0           0           1           0           0
  Columns 11 through 20
           0           0           0           1           0           0           0           0           0           0
>> fid=fclose(fid);
