Parsing a financial data file with a very irregular format

Thanks in advance for any responses. I am a retired engineer with some programming experience, and I am working to use MATLAB for portfolio analysis. I am currently at the point of reading data from my investment company's data files. I can obtain data on my current holdings as four CSV files with similar but different formats. I think they are intended to be printed as PDFs.
Numeric values sometimes have $ symbols, sometimes have commas, and sometimes have other formats. I have written code, based on counting commas and performing IF tests, that successfully extracts Date, Symbol, Description, Quantity and Price from each of the files. The files contain more information that may interest me, but I have not written code for it yet. I have attached the code file, which works on the full files, and samples of two files that differ in format, with personal data and most of the other data removed. I have not converted the attached DOCX files back to CSV to try my code on them.
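A minimal sketch of one way to turn such text into numbers without IF tests, assuming the only decorations are a $ symbol, comma thousands separators and stray spaces (other formats, such as negatives in parentheses, are not handled). toNumber is a hypothetical name; erase, strtrim and str2double are standard MATLAB functions, and the sample values are made up.

```matlab
% Hypothetical helper: strip surrounding spaces, every "$" and every comma,
% then convert to double. Text that is still not a number becomes NaN.
toNumber = @(s) str2double(erase(strtrim(string(s)), ["$", ","]));

raw  = ["$1,234.56"; "2,000"; " 17.5 "; "n/a"];   % made-up sample values
vals = toNumber(raw);
disp(vals)
```

Because it works on whole string arrays, the same one-liner can be applied to an entire column at once rather than value by value.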
I feel that the approach in "importMLdata.m" is brute force, and I am looking for more elegant ways to handle these files.
What I would like from responses is direction toward advanced file and string parsing methods. If nested IF statements, as I have used them, are the best approach, I would appreciate confirmation of that too, so that I do not suspect MATLAB is hiding some strength from me in its documentation.
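If the files are still processed line by line, regexp can pull out the fields directly instead of counting commas. A minimal sketch, assuming standard CSV quoting (a field that contains a comma, such as "$1,234.50", is wrapped in double quotes) and no escaped quotes inside fields; the sample record is made up.

```matlab
record = 'AAPL,"Apple, Inc.",10,"$1,234.50"';   % made-up sample record

% Prefix a comma so every field looks like "comma + field". Each field is
% either a double-quoted run (which may contain commas) or a run of
% non-comma characters; the parentheses capture the field as a token.
tok = regexp([',' record], ',("[^"]*"|[^,]*)', 'tokens');
fields = cellfun(@(c) c{1}, tok, 'UniformOutput', false);

% Drop the enclosing double quotes from quoted fields.
fields = regexprep(fields, '^"(.*)"$', '$1');
disp(fields)
```

readtable already handles quoted fields like these, so this pattern is mainly useful when a file has to be examined one line at a time.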
I appreciate your indulgence in evaluating this general question. The attached MATLAB code is a work in progress, as the many commented-out lines show, but it currently works for multiple files in a directory. I would also appreciate your comments on programming style and any other issues you see in the code.
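For the multiple-files part, a common MATLAB pattern is dir with a wildcard plus fullfile. The sketch below creates a throwaway folder with two made-up CSV files (holdings1.csv and holdings2.csv are hypothetical names) so it runs on its own; in practice folder would point at the real export folder. Each file goes into its own table in a cell array, on the assumption that the differing export formats cannot simply be stacked.

```matlab
% Throwaway folder with two made-up CSV files so the sketch runs on its own.
folder = tempname;
mkdir(folder);
writetable(table(["AAPL"; "MSFT"], [10; 5], 'VariableNames', {'Symbol', 'Quantity'}), ...
    fullfile(folder, 'holdings1.csv'));
writetable(table(["VTI"; "BND"], [20; 15], 'VariableNames', {'Symbol', 'Quantity'}), ...
    fullfile(folder, 'holdings2.csv'));

% dir with a wildcard returns one struct element per matching file.
files = dir(fullfile(folder, '*.csv'));
data  = cell(numel(files), 1);           % preallocate one slot per file
for k = 1:numel(files)
    fname   = fullfile(files(k).folder, files(k).name);
    data{k} = readtable(fname);          % one table per file
end

rmdir(folder, 's');                      % remove the sample folder
```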
Thank you
dpb on 2 Aug 2020
BTW.
function [output1,output2] = importMLdata %(input1,input2,input3)
takes care of the spurious inputs...
Mark Smith on 8 Aug 2020
DB, there is no Accept Answer button because you posted comments rather than an Answer to my question about parsing irregular financial data. I would like to give you credit for providing an answer that worked. The 'detectImportOptions' suggestion was key: using its output, the needed data are parsed into a table in a single line of code. Unwanted rows can then be removed from the table.
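A minimal sketch of the detectImportOptions/readtable approach described above, run on a small made-up file. The column names, the "Total" summary row and the rule for dropping rows are assumptions, not the contents of the real export. Price is read as text with setvartype so the $ symbols and commas can be stripped explicitly.

```matlab
% Write a small, made-up export so the sketch is self-contained.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Symbol,Description,Quantity,Price\n');
fprintf(fid, 'AAPL,Apple Inc,10,"$1,234.50"\n');
fprintf(fid, 'MSFT,Microsoft Corp,5,$210.00\n');
fprintf(fid, 'Total,,,\n');
fclose(fid);

% Let MATLAB work out delimiters, header line and column types ...
opts = detectImportOptions(fname);
% ... but read Price as text so "$" and "," can be removed explicitly.
opts = setvartype(opts, 'Price', 'string');
T = readtable(fname, opts);

T.Price = str2double(erase(T.Price, ["$", ","]));
% Assumed rule: rows without a quantity (e.g. the "Total" line) are unwanted.
T(isnan(T.Quantity), :) = [];
delete(fname);
disp(T)
```

Because the table keeps the column names, later code can refer to T.Price or T.Quantity instead of tracking comma positions.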
I would show you the code, but I was testing things in the command window when the machine crashed, and I have only now gotten back to the issue. The crash was due to a bad memory stick, which had to be replaced.
I would appreciate the Admins crediting you with answering my question if that is possible.


Answers (0)

Release

R2020a
