# LDA transformation matrix for discriminative feature extraction

J on 12 Dec 2014
Commented: Ilya on 15 Dec 2014
Hi all,
I'm tackling a recognition problem where my data is a set of histograms. Each histogram is a 1x768 vector of textural frequencies.
There are two classes, A and B. I want to know whether a test histogram is a member of A or not (B represents all other groups, i.e. 'impostor data').
I want to find the LDA transformation matrix W that projects the histograms (h) into LDA space:
x = W*h;
W should then let me extract discriminative features from test histograms.
Here's the process that I've followed so far:
• Acquire training data (histograms) for classes A and B. Store them in hA (10x768) and hB (128x768).
• Perform LDA on hA and hB to find the transformation matrix W (768x768). (I wrote the Matlab code based on: http://sebastianraschka.com/Articles/2014_python_lda.html ).
• Find the 'comparison' histograms by applying W to each row of hA: c = W * transpose(hA)
• For a 'test' histogram t (1x768), I want to know whether it belongs to A or not. So extract the discriminative features, d, using: d = W * transpose(t)
• To 'classify' the test histogram, compute the normalised correlation between d and each column of c: correlation(i) = (transpose(c(:,i)) * d) / ( |c(:,i)| * |d| )
• Summing all correlation(i) gives an overall 'similarity' score. If this score is above a threshold, then the test histogram is an element of A; otherwise it's in B.
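The scoring steps above can be sketched as follows. This is only an illustration under assumptions: W is a random placeholder (not a real LDA matrix), the histograms are random stand-ins, and the threshold value is hypothetical. It just shows the normalised-correlation arithmetic with c stored column-wise.

```matlab
% Sketch of the scoring step; W, the data and the threshold are placeholders.
rng(0);                          % reproducible random numbers
p  = 768;                        % histogram length
hA = rand(10, p);                % stand-in for the class-A training histograms
t  = rand(1, p);                 % stand-in for a test histogram
W  = randn(p, p);                % placeholder only, NOT a real LDA matrix

c = W * hA.';                    % 768x10, one comparison vector per column
d = W * t.';                     % 768x1 features of the test histogram

nA = size(c, 2);
correlation = zeros(nA, 1);
for i = 1:nA
    ci = c(:, i);
    % normalised correlation (cosine similarity) between c(:,i) and d
    correlation(i) = (ci.' * d) / (norm(ci) * norm(d));
end

score = sum(correlation);        % overall similarity score
threshold = 5;                   % hypothetical value; would need tuning on held-out data
isMemberOfA = score > threshold;
```

Each correlation(i) lies in [-1, 1], so with ten comparison vectors the score lies between -10 and 10, which bounds the range of sensible thresholds.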
Here's my problem:
In the computation of W, we invert the 'within-class scatter matrix', which turns out to be a matrix full of NaNs.
Theoretically, the overall method that I'm following should work since I'm basing it off a paper that achieved very good performance (less than 3% error).
My questions: Does anyone know why I'm unable to generate the transformation matrix? Is there a way to generate W using the Matlab toolboxes? (I've been unable to find anything)
Any other input or comments are definitely welcome.
EDIT 1:
I've read around a bit and I think my problem might be due to having significantly more features (768) than training samples (10+128).
• A common solution seems to be to use PCA to reduce the feature set first.
• However, the paper that I'm using does not mention PCA at all.
• Is there a way to use LDA without first reducing the feature set, OR is PCA implied when performing LDA?
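The suspicion in EDIT 1 can be checked numerically. The sketch below uses random stand-in data with the same sizes as hA and hB (an assumption, not the real histograms) and builds the within-class scatter matrix directly. With 138 samples in 2 classes its rank cannot exceed 138 - 2 = 136, far below 768, so the matrix is singular and has no ordinary inverse.

```matlab
% Random stand-ins with the same sizes as hA and hB (assumption).
rng(0);
p  = 768;
hA = rand(10, p);
hB = rand(128, p);

% Centre each class on its own mean (bsxfun keeps this working on older releases).
cA = bsxfun(@minus, hA, mean(hA, 1));
cB = bsxfun(@minus, hB, mean(hB, 1));

% Within-class scatter matrix, 768x768.
Sw = cA.' * cA + cB.' * cB;

r = rank(Sw);                    % at most (10 - 1) + (128 - 1) = 136
fprintf('rank(Sw) = %d of %d\n', r, p);
```

Note that Sw itself contains no NaNs here; it is only rank-deficient, which is a separate issue from NaNs appearing in the matrix.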
Regards

Ilya on 12 Dec 2014
I don't know how you define W (when I click on the link, I get a 404 Not Found error). But I can make an observation: a correctly computed within-class scatter matrix does not have NaNs unless the inputs have NaNs. So either your hA and hB matrices have NaNs, or you have an error in your code that puts NaNs there. If your input data have NaNs, you need to do something about them.
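One way to tell the two cases apart is to check the inputs and the scatter matrix for NaNs separately. The sketch below uses small made-up matrices in place of hA and hB (the NaN is planted on purpose) and only base MATLAB functions.

```matlab
% Made-up stand-ins for hA and hB; one NaN is planted deliberately.
hA = [1 2 3; 4 5 6; 7 8 10];
hB = [2 1 0; 0 NaN 1; 3 3 3; 5 1 2];

nanInA = any(isnan(hA(:)));      % false for this hA
nanInB = any(isnan(hB(:)));      % true for this hB, because of the planted NaN

% Locate the offending rows so they can be inspected or dropped.
badRowsB = find(any(isnan(hB), 2));

% A single NaN in the input propagates into the scatter matrix.
cB  = bsxfun(@minus, hB, mean(hB, 1));
SwB = cB.' * cB;
nanInScatter = any(isnan(SwB(:)));
```

If the input checks come back clean but the scatter matrix still contains NaNs, the problem is more likely in the code that builds the matrix.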
##### 2 Comments
Ilya on 15 Dec 2014
What is described in that paper amounts to this piece of MATLAB code:
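load fisheriris % example data: meas (150x4 features), species (150x1 labels)
L = fitcdiscr(meas,species); % fit a linear discriminant model
% generalized eigenproblem: BetweenSigma*v = lambda*Sigma*v
[LTrans,Lambda] = eig(L.BetweenSigma,L.Sigma,'chol');
[Lambda,sorted] = sort(diag(Lambda),'descend'); % sort by eigenvalues
LTrans = LTrans(:,sorted);
LTrans(:,[3 4]) = []; % get rid of zero eigenvalues
Xtransformed = L.XCentered*LTrans;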
You can run it if you have a sufficiently recent version of the Statistics Toolbox.
LDA does not imply PCA. When you do feature transformation/reduction by LDA for K classes, you can find at most K-1 new features. This is set by the rank of the between-class covariance matrix.
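To spell out the rank argument (in standard notation, which is an assumption here and not taken from the paper): with class sizes n_k, class means μ_k and overall mean μ, the between-class scatter matrix is

```latex
S_B = \sum_{k=1}^{K} n_k\,(\mu_k - \mu)(\mu_k - \mu)^{\mathsf T},
\qquad
\sum_{k=1}^{K} n_k\,(\mu_k - \mu) = 0
\;\Longrightarrow\;
\operatorname{rank}(S_B) \le K - 1 .
```

The K vectors μ_k − μ satisfy one linear constraint, so they span at most K−1 dimensions. For a two-class problem such as A versus B, this leaves a single discriminant direction.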
And if you do PCA on, say, a matrix of size 10-by-768, you can find at most 9 principal components. This is set by the rank of the matrix (you lose 1 due to centering).
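The rank limit is easy to see with a small check. This sketch uses a random 10-by-768 matrix as a stand-in for ten histograms (an assumption) and only base MATLAB functions.

```matlab
rng(0);
X  = rand(10, 768);                      % random stand-in for 10 histograms
Xc = bsxfun(@minus, X, mean(X, 1));      % centre each column (feature)
r  = rank(Xc);                           % centring removes one degree of freedom
s  = svd(Xc);                            % 10 singular values; the last is numerically zero
```

The rows of Xc sum to zero, so only 9 of the 10 singular values carry any variance.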