Test existence of files with EXIST

At first glance, the command exist(FileName, 'file') seems to be sufficient to check the existence of a file. So I used this code to check whether the input of a function is an existing file (thanks to David, who found the bug):
function Hash = DataHash(Data)
...
if exist(Data, 'file') ~= 2
error('File not found: %s.', Data);
end
The help text of exist explains when the value 2 is returned:
2 if A is an M-file on MATLAB's search path. It also returns 2 when A is
the full pathname to a file or when A is the name of an ordinary file on
MATLAB's search path
But "when A is the full pathname to a file" does not hold when A is a MEX-, MDL- or P-file, because in these cases 3, 4 or 6 is returned, respectively. So let's try to improve the check:
if ~any(exist(Data, 'file') == [2, 3, 4, 6])
error('File not found: %s.', Data);
end
But even then, exist() is smarter than expected:
File1 = fullfile(matlabroot, '\toolbox\matlab\graph2d\plot')
File2 = fullfile(matlabroot, '\toolbox\matlab\graph2d\plot.m')
File3 = fullfile(matlabroot, '\toolbox\signal\signal\@dspdata\plot')
File4 = fullfile(matlabroot, '\toolbox\signal\signal\@dspdata\plot.m')
exist(File1, 'file') % 0 !
exist(File2, 'file') % 2
exist(File3, 'file') % 2 !
exist(File4, 'file') % 2
I guess that File1 is not recognized because plot is a built-in function, while @dspdata\plot (File3) is not. But File3 is not an existing file:
fopen(File1, 'r') % -1
fopen(File2, 'r') % 3
fopen(File3, 'r') % -1 !! in spite of: exist(File3, 'file') ~= 0
fopen(File4, 'r') % 4
fclose('all')
So how can we check the existence of a file in a simple and reliable way?
function Ex = FileExist(FileName)
FID = fopen(FileName, 'r');
if FID == -1
Ex = false;
else
Ex = true;
fclose(FID);
end
But there are still exceptions, because fopen() is smart as well:
cd(tempdir);
fopen('plot.m', 'r') % 3, file is *found*!
Here fopen() searches all folders of the Matlab PATH, although only the current folder should be searched. As a side effect, fopen(name, 'r') is relatively slow. Another idea:
cd(tempdir);
fopen('plot.m', 'r+') % -1, file is not found
This is faster than the 'r' mode, especially if folders of the PATH are stored on network drives, and requesting write access does restrict the search to the current folder. But it fails if the current user does not have write privileges for the file.
The next approach:
function Ex = FileExist(FileName)
dirFile = dir(FileName);
if length(dirFile) == 1
Ex = ~(dirFile.isdir);
else
Ex = false;
end
I could not find a file for which this test fails. It is very slow if FileName is a folder on a network drive that contains a large number of files. But this is a rare case, so I prefer this test.
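One input where the DIR test can give a surprising answer is a name containing a wildcard: dir('*.m') returns a 1x1 struct if exactly one file matches, so the pattern itself would be reported as an existing file. The following is a minimal sketch of a guarded variant. The function name FileExistNoWildcard is hypothetical, and rejecting every name that contains '*' is an assumption that fits Windows, where file names cannot contain this character; on Linux or MacOS such a name is legal and would be rejected wrongly.

```matlab
function Ex = FileExistNoWildcard(FileName)
% Hypothetical variant of the DIR based test that rejects wildcard patterns.
% Assumption: a name containing '*' is treated as a pattern, not as a file.
if any(FileName == '*')
   Ex = false;
   return;
end
dirFile = dir(FileName);
% Exactly one entry that is not a folder means: FileName is a file.
Ex = (length(dirFile) == 1) && ~dirFile.isdir;
end
```

With a folder that contains a single file only.txt, FileExistNoWildcard(fullfile(Folder, '*.txt')) is false, while the full name of only.txt is still found.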
Finally, a C-Mex using either GetFileAttributes under Windows or _open or _wopen under Linux/MacOS is faster: 10% for existing files, 90% for missing files. But handling the Unicode strings is not trivial: a wchar has 2 bytes under Windows and 4 bytes under Linux and MacOS, but under Linux wchar strings are not commonly used; UTF-8 encoded strings with 1-byte chars are used instead. See Answers: Matlab string to wchar under Linux. I'm going to publish the Mex functions in the FEX, together with a DirExist(), because exist(name, 'dir') has similar problems.
  • Did you consider such effects caused by the smartness of exist() in your programs?
  • Did a user of your programs run into troubles due to weak tests of file existence, e.g. when the resulting error messages are misleading?
  • Do you or your programs profit or suffer from the smartness of exist() and fopen()?
  • Do you think the behavior of these functions is explained clearly enough in the help and doc text?
  • Do you want standard jobs solved reliably by simple commands in Matlab?
NOTE: Usage of the recursive font: I mean that smart is not smart.

Accepted Answer

Malcolm Lidierth on 4 Nov 2012
Edited: Malcolm Lidierth on 4 Nov 2012
Easy with Java:
File1 = fullfile(matlabroot,'toolbox','matlab','graph2d','plot');
File2 = fullfile(matlabroot, 'toolbox','matlab','graph2d','plot.m');
File3 = fullfile(matlabroot, 'toolbox','signal','signal','@dspdata','plot');
File4 = fullfile(matlabroot, 'toolbox','signal','signal','@dspdata','plot.m');
file=java.io.File(File1);
file.exists()
file=java.io.File(File2);
file.exists()
file=java.io.File(File3);
file.exists()
file=java.io.File(File4);
file.exists()
ans =
0
ans =
1
ans =
0
ans =
1
  5 Comments
Malcolm Lidierth on 4 Nov 2012
Edited: Malcolm Lidierth on 4 Nov 2012
@Jan
File.isFile() alone will do: it returns false if the entry does not exist or is a folder.
There will always be extra overhead with Java, as the strings are passed as copies (to the java.lang.String constructor, then by reference to File), not as pointers (Java 9 may fix that).
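The isFile() suggestion can be wrapped in a small function. This is only a sketch under assumptions: the name JavaFileExist is hypothetical, and the handling of relative names is my addition. As far as I know, java.io.File resolves relative names against the folder in which the JVM was started and does not follow MATLAB's cd, so the sketch prepends pwd for names that Java does not consider absolute.

```matlab
function Ex = JavaFileExist(FileName)
% Hypothetical wrapper: true if FileName is an existing file, not a folder.
F = java.io.File(FileName);
if ~F.isAbsolute()
   % Assumption: relative names are meant relative to MATLAB's current folder
   F = java.io.File(fullfile(pwd, FileName));
end
% isFile() is false for missing entries and for folders
Ex = F.isFile();
end
```

Because no search along the MATLAB path is involved, a name like 'plot.m' is found only if such a file is really stored in the current folder.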
Jan on 5 Nov 2012
Edited: Jan on 5 Nov 2012
@Malcolm: Fine, now I understand your hint "File.isFile()". Timings for testing the existence of 981 files, 10 repetitions, existing / not existing files:
  • File=java.io.File(Name); Ex=File.isFile(); 0.90 / 0.80 sec
  • Ex = (length(dirFile) == 1) && ~(dirFile.isdir); 0.70 / 0.60 sec
  • C-Mex, 0.29 / 0.21 sec
My conclusion concerning speed: These three methods are equivalent, because usual applications do not test millions of files. So we have good workarounds for the weak EXIST. Anyhow, I'm still disappointed by the built-in EXIST, because it is too over-featured for the simple task of testing the existence of a file.
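A comparison like the one above could be set up roughly as follows. This is a sketch and not the code used for the timings in this thread: the function name TimeFileTest and its arguments are hypothetical, and the result depends heavily on disk caching and on whether the files are stored on a network drive.

```matlab
function T = TimeFileTest(TestFcn, NameList, nRep)
% Hypothetical timing harness.
% TestFcn:  handle of a function that takes one file name
% NameList: cell string of file names to test
% nRep:     number of repetitions over the complete list
tic;
for iRep = 1:nRep
   for iName = 1:numel(NameList)
      TestFcn(NameList{iName});   % the result is not needed here
   end
end
T = toc;   % elapsed time in seconds
end
```

For example, TimeFileTest(@FileExist, NameList, 10) measures one of the FileExist versions from the question, where NameList holds the names of the files to test.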


More Answers (1)

Daniel Shub on 5 Nov 2012
I am not sure you are using EXIST the way it was intended to be used. The H1 line is: %EXIST Check if variables or functions are defined. The documentation says little about checking if files exist. I agree that the argument names and output values are confusing. I think, however, that EXIST should not be used for checking if a file exists. Determining if a function exists seems harder than determining if a file exists, therefore I wouldn't expect it to compete in terms of speed.
  3 Comments
Daniel Shub on 6 Nov 2012
I agree that it isn't good practice, and it is these types of bugs that make me a FOSS supporter. That said, it may not be as bad as you think. From your example it seems that the problem with EXIST is that it can sometimes erroneously say that a file without an extension exists when it actually doesn't. Therefore any function that adds an extension automatically will be okay. In other functions EXIST may be used to throw a nice error message, and the function will error later when it tries to read from or write to the file. This again is not a huge problem. The problem is for functions that do not append an extension and create a new file (or follow an alternative processing path) when the current file does not exist. I think that use case might be rare.
Jan on 6 Nov 2012
Edited: Jan on 6 Nov 2012
I do not believe that the level of hugeness can be measured. Any unexpected behavior can have severe effects.
A user of DataHash ran into problems because the check for existence rejected P-files. Without the chance to modify the code, e.g. if DataHash were P-coded, the user would need tedious workarounds like renaming the file before calculating the hash. In the real world there can even be files like "D:\MFiles\file.m.p.mex", which should not confuse the detection of file existence either. The reliability of a function must be proved using non-standard input, because "reliable for standard input" is a very weak label.
I assume the smartness of fopen() is more dangerous: It opens a file anywhere in the path when the file name is relative. Luckily this does not concern opening the file with write access. And again the workaround is standard good programming practice anyway: never work with relative paths, but always use fully qualified path names, which is why I spend so much time on GetFullPath.
So perhaps all I want to say is:
Do not use exist(Name, 'file') with relative paths!!!
