identifying mistakes in date vectors

1 view (last 30 days)
Dear all,
I have many excel files and in each excel I have a column under the name "dates". The date format is dd/mm/yyy and the data are monthly
For instance,
01/12/2005
01/01/2006
01/02/2006
01/03/2006 and so on
Since I have a large number of excel files (around 100) I want to see in each file if the sequence of dates is disrupted by mistakes
For instance, the below date sequence contains mistakes
A={01/12/2005
01/01/2006
01/02/2006
01/03/2006
01/04/2006
01/07/2006
01/06/2001
12/07/2006}
Is there any way to identify these mistakes?
Thanks
PS:Note that the start date and end date are different across files. So if you provide any code, please take into account that I need to be able to find the initial date "automatically"

Accepted Answer

Walter Roberson
Walter Roberson on 19 Feb 2013
You did not define "mistake". It appears that "out of sequence" is a mistake; how about duplicates? Are missing values mistakes? Are dates not on the first day of a month mistakes?
  5 Comments
José-Luis
José-Luis on 19 Feb 2013
Edited: José-Luis on 19 Feb 2013
Each "exception" has to be handled in the code. It can be a never ending story if you are dealing with human generated data. You would not believe the many possible ways to screw things up. One hundred files is not really that much data, if it looks like the snippet you provided. I would recommend writing a general import routine ( xlsread(), datenum(), etc...). Check for the most common errors you expect (ordered data, all the same year) and then manually check the rest. Things that are evident to the human eye may require a s**tload of coding.
Walter Roberson
Walter Roberson on 19 Feb 2013
Okay then what I would suggest is using datevec() on the input dates.
If the day of the month of the datevec are not all 1 then you have day of month problems.
diff() the datevec first column. If the year differences are not all 0's or 1's (in particular if there are negatives) then you have year problems.
diff() the datevec second column (month). If the month differences are not all 1's or -11's, or if the -11 do not correspond to the month number becoming 1, then you have month problems.
Ummm, I think that might be good enough, once you add a little more logic to select / display the entry in error. As you asked only to "find out" whether there were errors, rather than to (e.g.) sort the dates as well, it sounds to me as if your plan is to manually look at and repair the problem entries in excel before running again to recheck.

Sign in to comment.

More Answers (0)

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!