I forgot to mention that I have the Text Analytics Toolbox, but I have not found anything in its documentation that would help with low-level file or string parsing. If there is a toolbox that would help, please mention it; I may have it. My MATLAB version is R2020a. Thank you.
Parsing of financial data file that is very irregular in format
Thanks in advance for any responses. I am a retired engineer with some programming experience and am working to use MATLAB for portfolio analysis. I am currently at the point of reading data from my investment company's data files. I can obtain data on my current holdings as four CSV files whose formats are similar but not identical. I think they are intended to be printed as PDFs.
Numeric values sometimes have $ symbols, sometimes have commas, and sometimes use other formats. I have written code, based on counting commas and performing IF tests, that successfully extracts Date, Symbol, Description, Quantity and Price from each of the files. The files contain more information that may interest me, but I have not yet written code for it. I have attached the code file, which works on the full files, along with samples of two files that differ in format; personal data and most of the rows have been removed from the samples. I have not converted the attached DOCX files back to CSV to try my code on them.
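To make the numeric-format problem concrete, here is a minimal sketch of normalizing such values with regexprep and str2double. The sample strings are hypothetical, and treating parentheses as a negative sign is an assumption about the files, not something I have confirmed in them.

```matlab
% Hypothetical raw fields, as they might appear in a holdings export.
raw = ["$1,234.56", "1234.56", "(1,000.00)", "--", " 12 "];

% Assumption: accounting-style parentheses mean a negative value.
cleaned = regexprep(raw, '^\s*\((.*)\)\s*$', '-$1');

% Drop currency symbols and thousands separators.
cleaned = regexprep(cleaned, '[$,]', '');

% str2double ignores surrounding whitespace and returns NaN for
% anything that still is not a number, so bad fields are easy to find.
values = str2double(cleaned);
unparsed = raw(isnan(values));   % fields that need a closer look
```

Because the functions accept string arrays, the same three calls work on a whole column at once rather than one value at a time inside a loop.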
I feel that what is shown in "importMLdata.m" is a brute force approach and am looking for more elegant methods for attacking these files.
What I would like in responses is direction to advanced file and string parsing methods. If nested IF statements like the ones I have used are the best approach, I would appreciate confirmation of that as well, so that I do not think MATLAB is hiding some strength from me in its documentation.
I appreciate your indulgence in evaluating this general question. The attached MATLAB code is a work in progress, as shown by the many commented-out lines, but it currently works for multiple files in a directory. I would also appreciate your comments on programming style and on any other issues you see in the code.
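As an example of the kind of method I am asking about, a regular expression can split a line on only those commas that lie outside quoted fields, which may remove the need for some comma counting. This is a sketch under assumptions: the sample line and the variable names txt and fields are hypothetical, and it assumes quoted fields never contain escaped quotes.

```matlab
% Hypothetical line; the real files may be laid out differently.
txt = '01/15/2020,ABC,"Some Fund, Inc.",100,"$1,234.56"';

% Split on commas followed by an even number of double quotes,
% that is, commas that sit outside any quoted field.
fields = regexp(txt, ',(?=(?:[^"]*"[^"]*")*[^"]*$)', 'split');

% Convert to a string array and remove the surrounding quotes.
fields = strip(string(fields), '"');

% The field count is then the same whether or not a description
% or a price happens to contain a comma.
nFields = numel(fields);
```

With the fields separated this way, a test on nFields or on the content of the first field could identify the line type, instead of a chain of IF tests on raw comma positions.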
Thank you
7 Comments
dpb
on 2 Aug 2020
BTW.
function [output1,output2] = importMLdata %(input1,input2,input3)
takes care of the spurious inputs...
Answers (0)