Reading a *.txt document and extracting specific words/phrases

I have a *.txt document and I would like to extract words/phrases whose start and end character positions in that document I already know.
For example, a word's start and end character numbers are 711 and 724. I tried to extract it using the following MATLAB code:
filetoread ='document file path';
fid = fopen(filetoread)
x=zeros(1,1);
while 1
tline = fgetl(fid);
if ~ischar(tline), break, end
x = [x , tline];
end
x(1, 711:724)
In the code I try to store the whole document in a single array x and print the columns from 711 to 724. But the result does not match the expected words. I think the problem is with whitespace, empty lines, and so on.
(I attached a sample document.)
I would appreciate any help.
Many thanks

Answers (1)

Azzi Abdelmalek on 18 Mar 2016
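filetoread = 'yourfile.txt';
fid = fopen(filetoread);
k = 1;
v = cell(1,1);
while 1
    tline = fgetl(fid);
    if ~ischar(tline), break, end   % fgetl returns -1 at end of file
    v{k,1} = tline;                 % store each line in its own cell
    k = k+1;
end
fclose(fid);
% Trim leading/trailing whitespace from every line
a = cellfun(@(x) strtrim(x), v, 'UniformOutput', false);
% Drop lines that are empty after trimming
a(cellfun(@isempty, a)) = [];
% Characters 10 to 20 of each remaining line (errors if a line is shorter than 20 characters)
out = cellfun(@(x) x(10:20), a, 'UniformOutput', false)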
  1 Comment
Shima Asaadi on 18 Mar 2016
Thank you very much for your answer.
In this approach each paragraph is handled separately and the empty lines are dropped. For example, the word with start/end character numbers "570,590" in the original document cannot be extracted this way, because within each cell the indices run from 1 to the length of that paragraph rather than over the whole document. How can I modify the code to read the whole document at once?
Thank you for your help
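
One possible way to read the whole document at once, as a minimal sketch rather than a definitive answer: fgetl strips the newline from every line it returns, so any array built from its output is shorter than the file and the positions drift. fileread returns the file contents as a single character vector with spaces, empty lines and newline characters kept. The sample file name, its contents and the offsets 19 and 34 below are made up for illustration, and the sketch assumes the known positions are 1-based, inclusive, and were counted with line breaks included.

```matlab
% Hypothetical sample file, written here only so the sketch runs on its own.
filetoread = fullfile(tempdir, 'sample_document.txt');
fid = fopen(filetoread, 'w');
fprintf(fid, 'First paragraph.\n\nSecond paragraph with a target phrase.\n');
fclose(fid);

% Read the entire file as one 1-by-N character vector.
% Spaces, empty lines and newline characters are all kept.
txt = fileread(filetoread);

% Assumed: offsets are 1-based and inclusive, counted over the raw file.
startIdx = 19;
endIdx   = 34;

% Guard against offsets that fall outside the document.
if startIdx < 1 || endIdx > numel(txt) || startIdx > endIdx
    error('Offsets %d:%d are outside the document (length %d).', ...
          startIdx, endIdx, numel(txt));
end

phrase = txt(startIdx:endIdx);
disp(phrase)
```

If the positions came from a tool that counts from 0, add 1 to both before indexing. If the file uses Windows line endings, each line break occupies two characters (carriage return plus line feed), which also shifts the positions unless the tool counted them the same way.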
