Why is fileID -1 from fopen even though the file seems to have been opened and read successfully?

Hi
I have a folder with multiple text files. I need to open each one, read some specific information from its content, and eventually write a summary Excel file in which each row contains one file name and the selected information from that file.
I wrote a function called searchData that takes a folder path as input and returns filesNumber (how many data files were found in the folder), fileDataSummary (the summary of the data I need from each file) and missingFiles (the names of the files that fopen did not manage to open). It also writes fileDataSummary to an Excel file named fileDataSummary.xlsx.
I attached a sample data folder as a demo, together with the result I get in the Excel file, in case someone wants to try running the function :)
Before calling the function, I change to the working directory and specify the data folder path:
clear all
clc
cd('C:\Users\Desktop\Data Preprocessing\Data')
folderPath ='C:\Users\Desktop\Data Preprocessing\Data\20160502';
Then I call the function from the Command Window:
[filesNumber,dataSummary,missingFiles]=searchData(folderPath);
The function seems to go through each file, read the information I want, and write it into the summary Excel file, so everything seems to work.
But I keep getting fileID = -1 and errmsg 'No such file or directory' for some of the files that have a strange character like a cross (┼) in the file name.
Here are two example file names:
Rosenb3_LB03_SV41_S.txt (fileID = 3)
Rosenb3_LB03_V┼_VG.txt (fileID = -1)
In my case I have millions of data files, and some of them seem to have this cross (┼) in their name. Although fopen seems to manage to open and read them, I still get fileID = -1, which is supposed to mean that fopen cannot open the file.
So I am not sure what that means. I don't want to start analysing the full data set and then have something go wrong.
Am I missing something here?
The function is as follows:
function [filesNumber,fileDataSummary, missingFiles] = searchData(folderPath)
addpath(folderPath);
filesNames = dir(folderPath);
ixi=0;
missingFiles= {};
for i = 3:length(filesNames)
    j = i-2;
    % MATLAB® reserves file identifiers 0, 1, and 2 for standard input,...
    % standard output (the screen), and standard error, respectively. ...
    % If fopen cannot open the file, then fileID is -1.
    [fileID,errmsg]=fopen((filesNames(i).name),'r','n','UTF-8');
    % check which files fopen could not open and display name, fileID and error message
    if(~strcmp(errmsg,''))
        disp(filesNames(i).name)
        disp(fileID)
        disp(errmsg)
        ixi = ixi +1;
        missingFiles(ixi,1)= {filesNames(i).name};
    else
        ScanText= textscan(fileID, '%s %s %s %s %s'); % '%s %s %s %s %s' tells textscan to read five string fields per row
        fclose(fileID);
        % fileDataCell includes the data of the file in a cell array
        fileDataCell=[ScanText{1},ScanText{2},ScanText{3},ScanText{4},ScanText{5}];
    end
    % Check the sensor file name
    fileDataSummary(j,1)={filesNames(i).name};
    % Check the file Enhet
    if strcmp(fileDataCell{4,1},'Enhet')==1
        fileDataSummary(j,2)= fileDataCell(4,2);
    else
        fileDataSummary(j,2)= {strcat('Enhet is not in location (4,1)',...
            '_',fileDataCell{4,2})};
    end
    % Check the file type
    if strcmp(fileDataCell{5,1},'IOTyp')==1
        fileDataSummary(j,3)= {strcat(fileDataCell{5,2},...
            '_',fileDataCell{5,3},'_',fileDataCell{5,4},'_',fileDataCell{5,5})};
    else
        fileDataSummary(j,3)= {strcat('IOTyp is not in location (5,1)',...
            '_', fileDataCell{5,2},'_',fileDataCell{5,3},'_',fileDataCell{5,4}...
            ,'_',fileDataCell{5,5})};
    end
    % Check the additional information
    if strcmp(fileDataCell{6,1},'Text')==1
        fileDataSummary(j,4)= {strcat(fileDataCell{6,2},...
            '_',fileDataCell{6,3},'_',fileDataCell{6,4},'_',fileDataCell{6,5})};
    else
        fileDataSummary(j,4)= {strcat('Text is not in location (6,1)',...
            '_', fileDataCell{6,2},'_',fileDataCell{6,3},'_',fileDataCell{6,4}...
            ,'_',fileDataCell{6,5})};
    end
end
filesNumber=length(filesNames)-2;
xlswrite(strcat('C:\Users\Desktop\Data Preprocessing','\','fileDataSummary.xlsx'),fileDataSummary)
end
Thanks for your help in advance!:)
Guillaume on 11 Jul 2019
A few things that have no bearing on your question itself.
1) You should never add data directories to the path. That's just begging for trouble the day one of these folders contains a rogue m-file that suddenly shadows a function used by your code. All MATLAB functions happily work with absolute paths, so use those:
%no addpath
filesNames = dir(folderPath);
for i = 3:length(filesNames) % same loop as in your code
    %use absolute path constructed with fullfile:
    [fileID,errmsg]=fopen(fullfile(folderPath, filesNames(i).name),'r','n','UTF-8');
end
2) I assume you're starting at filesNames(3) to skip over the '.' and '..' folders that dir stupidly returns. That's risky: filenames starting with any of "#$%&'()+,- will be listed before '.'. A much safer way:
filesNames = dir(folderPath);
filesNames = filesNames(~ismember({filesNames.name}, {'.', '..'}));
for i = 1:numel(filesNames)
[fileID,errmsg]=fopen(fullfile(folderPath, filesNames(i).name),'r','n','UTF-8');
%...
end
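To see why starting at index 3 is fragile, here is a small sketch that sorts a made-up list of names in character-code order (which is what MATLAB's sort does for a cell array of character vectors). The names are hypothetical, and the real order returned by dir is not guaranteed; the point is only that a name starting with '#' (code 35) can land before '.' (code 46).

```matlab
% Hypothetical directory listing; '#' (code 35) sorts before '.' (code 46)
names = {'Rosenb3_LB03_SV41_S.txt', '..', '#notes.txt', '.'};
sortedNames = sort(names);                 % character-code order

% Starting the loop at index 3 would skip these two entries:
skippedByIndex3 = sortedNames(1:2);        % {'#notes.txt', '.'}

% Filtering by name removes exactly '.' and '..', wherever they appear
kept = sortedNames(~ismember(sortedNames, {'.', '..'}));
disp(kept)
```

So with index 3 the real file '#notes.txt' would be silently skipped while '..' would be processed as if it were a data file; the ismember filter avoids both problems.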
3)
fileDataCell=[ScanText{1},ScanText{2},ScanText{3},ScanText{4},ScanText{5}];
is simply:
fileDataCell = [ScanText{:}];
4) Use fullfile as I've shown above instead of strcat to build paths. fullfile will add the correct directory separator (not always '\') if needed.
5)
strcat('IOTyp is not in location (5,1)',...
'_', fileDataCell{5,2},'_',fileDataCell{5,3},'_',fileDataCell{5,4}...
,'_',fileDataCell{5,5})
is easier to read as:
sprintf('IOTyp is not in location (5,1)_%s_%s_%s_%s', fileDataCell{5, 2:5});
in my opinion. It's shorter anyway. And so on for the other strcat calls.
6)
cellarray{i,j} = something; %assign something to the CONTENT of cell i,j
makes more sense than
cellarray(i, j) = {something}; %REPLACE CELL i,j by cell array containing something
in my opinion. It's two fewer characters anyway. It's also probably marginally faster.
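A quick way to convince yourself that points 3, 5 and 6 are equivalent rewrites is to run them side by side on toy data. The ScanText contents below are made up. One caveat: strcat strips trailing whitespace from character inputs while sprintf does not, so the two only agree when the fields have no trailing blanks, which is the case for %s fields read by textscan with its default whitespace delimiters.

```matlab
% Toy textscan-style output: 5 columns, 2 rows (values are made up)
ScanText = {{'Enhet'; 'IOTyp'}, {'mA'; 'AI'}, {'x'; '4'}, {'y'; '20'}, {'z'; 'mA'}};

% Point 3: explicit concatenation vs. comma-separated list expansion
explicit = [ScanText{1}, ScanText{2}, ScanText{3}, ScanText{4}, ScanText{5}];
compact  = [ScanText{:}];                  % same 2x5 cell array

% Point 5: strcat vs. sprintf for the row-2 message
viaStrcat  = strcat('IOTyp is not in location (2,1)', '_', compact{2,2}, ...
    '_', compact{2,3}, '_', compact{2,4}, '_', compact{2,5});
viaSprintf = sprintf('IOTyp is not in location (2,1)_%s_%s_%s_%s', compact{2, 2:5});

% Point 6: assigning cell contents vs. replacing a cell with a 1x1 cell
summaryA = cell(1, 2);  summaryA{1, 2} = viaSprintf;
summaryB = cell(1, 2);  summaryB(1, 2) = {viaSprintf};
```

Both forms of each pair produce identical results here; the compact forms just say the same thing with less repetition.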
---
As for your question, can you show the output of
double(filesNames(i).name)
for one of the files that does not open properly.
Also, what OS and what locale are you using?
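One way to produce that kind of output for every suspicious file at once is to loop over the listing and print only the code points above 127. This is a sketch: pwd stands in for the data folder, and nonAscii is a hypothetical helper name used only for illustration.

```matlab
folderPath = pwd;                              % stand-in for the data folder
listing = dir(folderPath);
listing = listing(~[listing.isdir]);           % files only

% Hypothetical helper: code points in a name that are outside 7-bit ASCII
nonAscii = @(name) double(name(double(name) > 127));

for k = 1:numel(listing)
    codes = nonAscii(listing(k).name);
    if ~isempty(codes)
        fprintf('%s: %s\n', listing(k).name, mat2str(codes));
    end
end
```

For example, a name that really contains the box-drawing cross ┼ (U+253C) would report 9532; a different number would mean the name on disk contains some other character that is merely displayed as ┼.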
Walter Roberson on 11 Jul 2019
Also:
MS Windows makes no promises about the order in which file names are returned: it is considered a matter for the file system, so the order could change from FAT32 to NTFS, for example. In turn, NTFS makes no promises about the order in which file names are returned; in practice they appear to be sorted by byte code, but there are unanswered questions about canonicalization of Unicode sequences.
The ismember() that Guillaume suggests works well to eliminate . and ..
You should also consider removing any other folder names that might happen to be in the directory.
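A minimal sketch of both suggestions: drop every directory entry (which also removes '.' and '..') and then sort the names explicitly so the processing order no longer depends on the file system. pwd stands in for the data folder. Note that sort on a cell array of character vectors uses character-code order, so uppercase letters sort before lowercase ones.

```matlab
folderPath = pwd;                              % stand-in for the data folder
listing = dir(folderPath);
listing = listing(~[listing.isdir]);           % removes '.', '..' and any subfolders
[~, order] = sort({listing.name});             % explicit, repeatable order
listing = listing(order);
fprintf('%d files to process\n', numel(listing));
```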


Answers (1)

dpb on 12 Jul 2019
Edited: dpb on 12 Jul 2019
Also, too:
Your logic doesn't immediately continue to the next file in the for...end loop after the error message. It just doesn't try to read that particular file, but it then proceeds to run all the following code, including writing another record into your fileDataSummary array. So the array will not have any missing entries; it will just contain duplicated values for each j whose file wasn't opened, because fileDataCell still holds the data from the previous iteration.(*)
You need something like:
for i = 1:length(filesNames)
    if filesNames(i).isdir, continue, end % skip directory entries
    % If fopen cannot open the file, then fileID is -1.
    [fileID,errmsg]=fopen((filesNames(i).name),'r','n','UTF-8');
    % check which files fopen could not open and display name, fileID and error message
    if(fileID<0)
        disp(filesNames(i).name)
        disp(fileID)
        disp(errmsg)
        ixi = ixi +1;
        missingFiles(ixi,1)= {filesNames(i).name};
        continue % SKIP TO THE NEXT FILE; DON'T COLLECT $200
    else
        ScanText= textscan(fileID, '%s %s %s %s %s'); % '%s %s %s %s %s' tells textscan to read five string fields per row
        fclose(fileID);
        % fileDataCell includes the data of the file in a cell array
        fileDataCell=[ScanText{1},ScanText{2},ScanText{3},ScanText{4},ScanText{5}];
    end
    % ...
Also NB: I accounted for Guillaume's and Walter's comments about file order and directory entries by starting the loop at 1 and using the isdir flag to skip directories. You can also preprocess the listing to remove all elements for which isdir==1.
My preferred way to deal with this issue is to use a file-matching wildcard in the dir() call. For your example, '*.txt' would return all of the given files and, presuming some discipline in not naming directories with .txt extensions, nothing but files of the desired type will be returned.
Also NB2: Since the loop index begins at 1, you'll need to fix up the j index logic to use i instead. Also note that there will now be empty rows in the summary array for files that could not be opened; your existing code doesn't show them because of the logic error described above.
(*) This presumes at least the first file is successfully opened; otherwise you would get an error, because the variable fileDataCell wouldn't exist yet. But once one file has been read, it superficially appears as if everything works when it doesn't. That led to your mistaken presumption about what fopen is doing, and to the question describing a situation that doesn't actually occur: no file that failed to open was actually read.
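Putting the pieces of this answer and the comments together, a corrected loop could look roughly like the sketch below. It uses continue so a file that fails to open produces no summary row, and it keeps a separate row counter (one alternative to indexing by i) so the summary has no empty rows while missingFiles lists the skipped names. Assumptions: the data files end in .txt, pwd stands in for the data folder, only the Enhet column is built, and the column-trimming step is an addition that is not in the original code.

```matlab
% Sketch of the corrected loop. Assumptions: data files end in .txt,
% pwd stands in for the data folder, and only the 'Enhet' check is shown.
folderPath = pwd;
listing = dir(fullfile(folderPath, '*.txt'));     % wildcard: .txt entries only
listing = listing(~[listing.isdir]);              % in case a folder matches
fileDataSummary = cell(numel(listing), 2);
missingFiles = {};
row = 0;                                          % rows actually filled

for i = 1:numel(listing)
    fileName = listing(i).name;
    [fileID, errmsg] = fopen(fullfile(folderPath, fileName), 'r', 'n', 'UTF-8');
    if fileID < 0
        fprintf('Could not open %s: %s\n', fileName, errmsg);
        missingFiles{end+1, 1} = fileName;        %#ok<AGROW>
        continue                                  % nothing below runs for this file
    end
    ScanText = textscan(fileID, '%s %s %s %s %s');
    fclose(fileID);

    % textscan can return a shorter last column if the file ends mid-row,
    % so trim all columns to a common length before concatenating
    nRows = min(cellfun(@numel, ScanText));
    ScanText = cellfun(@(c) c(1:nRows), ScanText, 'UniformOutput', false);
    fileDataCell = [ScanText{:}];

    row = row + 1;
    fileDataSummary{row, 1} = fileName;
    if nRows >= 4 && strcmp(fileDataCell{4, 1}, 'Enhet')
        fileDataSummary{row, 2} = fileDataCell{4, 2};
    else
        fileDataSummary{row, 2} = 'Enhet is not in location (4,1)';
    end
end
fileDataSummary = fileDataSummary(1:row, :);      % no empty rows for missing files
```

The IOTyp and Text columns would follow the same pattern as the Enhet check, and every file ends up either in fileDataSummary or in missingFiles, never both.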

Release

R2016a
