Dataset condensation and distance function optimization in a KNN classifier
Hi,
I am working on a decision system for stocks. I have a lot of data: time series for 6000 stocks, with 10 years of daily data on various metrics related to valuation, price momentum, street estimates, intrinsic business quality, etc.
From my reading, a KNN classifier sounds like the easiest and best framework for me to focus on (after considering NNs, decision trees, etc.). However, the toolboxes provided with MATLAB seem to lack some important components that I would need, namely data condensation and some way of optimizing the distance function.
I Googled a few things and found that the "Hart" algorithm (condensed nearest neighbor, "CNN") is often used for condensation, and found this link, which seems to be the kind of thing I need: <http://mirlab.org/jang/matlab/toolbox/machineLearning/help/dsCondense_help.html#2>. Unfortunately, it doesn't seem that this code is freely available.
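For reference, the idea behind Hart's condensation can be sketched in a few lines. This is only a minimal illustration, written in Python rather than MATLAB, and it is not the code behind the link above. The function names `nearest_label` and `hart_condense`, the `(features, label)` sample layout, and the use of plain Euclidean distance are all assumptions made for the example.

```python
import math


def nearest_label(store, point):
    """Label of the stored sample closest to `point` (1-NN, Euclidean distance)."""
    closest = min(store, key=lambda sample: math.dist(sample[0], point))
    return closest[1]


def hart_condense(samples):
    """Sketch of Hart's condensed nearest neighbor rule.

    `samples` is a list of (feature_tuple, label) pairs (an assumed layout).
    Returns the indices of the samples kept in the condensed store.
    """
    if not samples:
        return []
    keep = [0]          # seed the store with the first sample
    changed = True
    while changed:      # repeat full passes until a pass adds nothing
        changed = False
        for i, (features, label) in enumerate(samples):
            if i in keep:
                continue
            store = [samples[j] for j in keep]
            # A sample the current store misclassifies is added to the store.
            if nearest_label(store, features) != label:
                keep.append(i)
                changed = True
    return keep
```

The result depends on the order of the samples, and the quadratic cost of this naive version would likely be a problem at the scale described above, so it is meant only to show the mechanics.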
For optimizing distance functions, there seems to be more freely available code online, such as http://www.cs.cmu.edu/~liuy/distlearn.htm and http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html.
Does anybody know where I can find good code to accomplish condensing and distance function optimization? Any other comments on the general approach would be greatly appreciated.
THANK YOU!
Regards, Mike