
Weird loadings curves in Fisher's Linear Discriminant...

Julio...
Posted: Sun Oct 25, 2009 1:56 pm
Guest
Hi,

I have been following the posts from this group for a while, always
appreciating the quality of the discussions.

This time I would like to ask for some help.

I apply Fisher's Linear Discriminant to a spectroscopy dataset having
235 variables (so-called 'wavenumbers') and about 13000 instances
(so-called 'spectra'). Spectra can be plotted as curves of smooth shape,
thus neighbouring variables are highly correlated.

Well, the "loadings curves" (the columns of the LDA transformation
matrix itself, i.e. the coefficients of the linear combination that
maps the original variables onto the "LDA scores" space) alternate
between high-amplitude positive and negative values.

I have searched for an explanation but I couldn't find any books or
publications mentioning similar cases. Some help will be very much
appreciated. I don't have a lot of statistics background, so I can't
get a real insight into the results I described.

Here is an example:
http://www.compumag2009.com/f/loadings.png
(loadings plot)

http://www.compumag2009.com/f/loadings_abs.png
(loadings plot, absolute value)

In short: why does the Fisher's LDA loadings curve look so weird?

Thanks,

Julio
 
Ray Koopman...
Posted: Sun Oct 25, 2009 6:44 pm
Guest
You don't say how many groups you have. The explanation that follows
is for two groups. Something like it may apply when there are more
than two groups, but I haven't checked.

The discriminant coefficients are proportional to C^-1 * m, where C is
the pooled within-groups covariance matrix and m is the difference
between the two mean vectors. If C has Toeplitz structure then C^-1
will be tridiagonal, with positive diagonals and negative immediate
offdiagonals, and I suspect that could create what you're seeing.

More generally, though, the coefficients in LDA, like those in
ordinary regression, are designed to predict, not to enlighten. They
may sometimes help us make sense out of the data, but they should not
be expected to do so. If you want to understand your data, look at the
vector of covariances or correlations of the LDA scores with the
variables.
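
A minimal numerical sketch of the C^-1 * m argument above, in Python with NumPy. It assumes one specific Toeplitz case, the first-order autoregressive form C[i, j] = rho^|i-j|, for which the inverse is exactly tridiagonal; a general Toeplitz matrix need not behave this way. The values of p and rho, the shape of the mean difference, and the noise level are illustrative assumptions, not taken from the data discussed in this thread.

```python
import numpy as np

def ar1_covariance(p, rho):
    """Toeplitz covariance with C[i, j] = rho**|i - j| (assumed AR(1) form)."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

p, rho = 20, 0.9            # illustrative sizes, not the 235 wavenumbers
C = ar1_covariance(p, rho)
C_inv = np.linalg.inv(C)

# Split the inverse into its tridiagonal band and everything outside it.
idx = np.arange(p)
band = np.abs(idx[:, None] - idx[None, :]) <= 1
outside_band = np.abs(C_inv[~band]).max()

main_diag = np.diag(C_inv)
off_diag = np.diag(C_inv, k=1)

# Discriminant direction for a smooth mean difference plus a little noise.
rng = np.random.default_rng(0)
m = np.exp(-0.5 * ((idx - p / 2) / 3.0) ** 2) + 0.05 * rng.standard_normal(p)
w = np.linalg.solve(C, m)   # proportional to the two-group LDA coefficients

print("largest |entry| outside the tridiagonal band:", outside_band)
print("diagonal all positive:", bool((main_diag > 0).all()))
print("first off-diagonals all negative:", bool((off_diag < 0).all()))
print("sign changes in m:", int((np.diff(np.sign(m)) != 0).sum()))
print("sign changes in w:", int((np.diff(np.sign(w)) != 0).sum()))
```

In this assumed case each coefficient is a positive multiple of m[i] minus positive multiples of its two neighbours, so small wiggles in m can be amplified in w, which is one way an alternating pattern can arise.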
 
Rich Ulrich...
Posted: Mon Oct 26, 2009 1:08 pm
Guest
On Sun, 25 Oct 2009 16:56:52 -0700 (PDT), Julio
<juliotrevisan at (no spam) gmail.com> wrote:

[quote]Hi,

I have been following the posts from this group for a while, always
appreciating the quality of the discussions.

This time I would like to ask for some help.

I apply Fisher's Linear Discriminant to spectroscopy dataset having
235 variables (so-called 'wavenumbers') and about 13000 instances (so-
called 'spectra'). Spectra can be plotted as curves of smooth shape,
thus neighbouring variables are highly correlated.

Well, the "loadings curves" (the columns of the LDA transformation
matrix itself, i.e. the coefficients of the linear combination that
maps the original variables onto the "LDA scores" space) have
alternated high-amplitude positive and negative values.

I have searched for an explanation but I couldn't find any books or
publications mentioning similar cases. Some help will be very much
appreciated. I don't have a lot of statistics background, so I can't
get a real insight into the results I described.

Here is an example:
http://www.compumag2009.com/f/loadings.png
(loadings plot)

http://www.compumag2009.com/f/loadings_abs.png
(loadings plot, absolute value)

In short: why does the Fisher's LDA loadings curve look so weird?

(More concrete than Ray's post.)[/quote]

In short, they look so weird, most likely, because the neighboring
bins are highly correlated - and include "correlated error". How
large are those correlations?

You could look for references that cite "confounding" or
perhaps "suppressor variables" (in "regression", for a broader
set of references), for some background.

When I tried searching the web for <spectroscopy regression confounding>,
I found too many discussions. You will have a better chance if you
specify your field. I saw "NMR" articles with 200+ bins.

I'm pretty sure that your results are not totally unusual for
spectroscopy, and anyone experienced in similar experiments
might be a good source for references.

- If I were to try to explore such data on my own, without
further advice, my first explorations would entail reducing the
number of predictors to 50 or even to 10 -- (a) by using wider bins,
and (b) by using a systematic selection from the bins available.
Is the overall R^2 almost the same? Are the coefficients now
intelligible?
- But your own area is almost bound to have conventions for
how to proceed, and I think you should contact someone who
has published with similar data.
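
A minimal sketch of option (a), "wider bins", read here as averaging runs of adjacent variables. The function name `average_bins`, the bin width of 5, and the random stand-in data are assumptions made for illustration only.

```python
import numpy as np

def average_bins(X, width):
    """Average each run of `width` adjacent columns; a short last bin is averaged too."""
    n, p = X.shape
    edges = np.arange(0, p, width)
    sums = np.add.reduceat(X, edges, axis=1)
    counts = np.diff(np.r_[edges, p])
    return sums / counts

# Hypothetical stand-in for the spectra matrix (rows = spectra, columns = wavenumbers).
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 235)).cumsum(axis=1)   # smooth-ish random curves

X_wide = average_bins(X, width=5)
print(X.shape, "->", X_wide.shape)                   # 235 columns become 47
```

Running the discriminant analysis on `X_wide` instead of `X` would then be one informal way to compare discrimination with fewer, wider bins.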


--
Rich Ulrich
 
Julio...
Posted: Tue Oct 27, 2009 5:30 am
Guest
Hi Ray

Thanks for your answer.

[quote]You don't say how many groups you have. The explanation that follows
is for two groups. Something like it may apply when there are more
than two groups, but I haven't checked.
[/quote]
I forgot to mention that. Indeed I have 2 groups in most of the cases,
but sometimes more.


[quote]
The discriminant coefficients are proportional to C^-1 * m, where C is
the pooled within-groups covariance matrix and m is the difference
between the two mean vectors.
[/quote]
Indeed, I confirmed this by implementing C^-1*m and comparing it to the
loadings vector (plot not shown).



[quote]If C has Toeplitz structure then C^-1 will be tridiagonal, with
positive diagonals and negative immediate offdiagonals, and I suspect
that could create what you're seeing.
[/quote]
C (http://www.compumag2009.com/f/cw.png) does not have Toeplitz
structure, as you can see in the figure, but its inverse is much the
way you described C^-1 (http://www.compumag2009.com/f/invcwm.png).

[quote]
More generally, though, the coefficients in LDA, like those in
ordinary regression, are designed to predict, not to enlighten. They
may sometimes help us make sense out of the data, but they should not
be expected to do so. If you want to understand your data, look at the
vector of covariances or correlations of the LDA scores with the
variables.
[/quote]
I was keen to test this suggestion, and I found this correlation vector
(http://www.compumag2009.com/f/finalcorr.png) to be proportional to my
class mean vectors (http://www.compumag2009.com/f/means.png). By the
way, my class mean vectors are proportional to each other, as I have
only 2 classes (class=group), many more spectra of one class than the
other, and my dataset has been standardized (mean-centered and
normalized, followed by division by the standard deviation of each
variable).

Why is the vector you suggested proportional to one of my class mean
vectors?
 
Julio...
Posted: Tue Oct 27, 2009 5:54 am
Guest
Hi, Rich

Thanks for your answer.

[quote]When I try searching the web < spectroscopy regression confounding>,
I found too many discussions.  You will have a better chance if you
specify your field.  I saw "NMR" articles with 200+ bins.
[/quote]
Thanks for your search suggestions.

I am working with FTIR (Fourier-Transform InfraRed) spectroscopy


[quote] - If I were to try to explore such data on my own, without
further advice, my first explorations would entail reducing the
number of predictors to 50 or even to 10 -- (a) by using wider bins,
[/quote]
By wider bins, do you mean averaging neighbouring variables?
Perhaps a spline basis expansion could be considered a way of using
"wider bins"? By spline expansion I mean choosing a set of spline
functions, finding (by least squares) the linear combination of these
functions that best represents each spectrum I have, and using the
coefficients of these linear combinations as my new variables. Is my
explanation of how I used splines clear?
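
A minimal sketch of the spline expansion described above. The thread does not say which spline family or how many knots were used, so this assumes the simplest case: a piecewise-linear (degree-1) basis of "hat" functions on 20 equally spaced knots. The names `hat_basis` and `spline_coefficients` and the made-up Gaussian "spectra" are illustrative only.

```python
import numpy as np

def hat_basis(x, knots):
    """Piecewise-linear (degree-1) spline basis evaluated at x, one column per knot."""
    eye = np.eye(len(knots))
    return np.column_stack([np.interp(x, knots, eye[k]) for k in range(len(knots))])

def spline_coefficients(X, x, knots):
    """Least-squares coefficients of each spectrum (row of X) on the hat basis."""
    B = hat_basis(x, knots)                       # shape (n_variables, n_knots)
    coef, *_ = np.linalg.lstsq(B, X.T, rcond=None)
    return coef.T                                 # shape (n_spectra, n_knots)

x = np.arange(235, dtype=float)                   # stand-in for the wavenumber axis
knots = np.linspace(x[0], x[-1], 20)              # 20 knots is an arbitrary choice

# Made-up smooth spectra: Gaussian bumps with different centres.
centres = np.array([60.0, 120.0, 180.0])
X = np.exp(-0.5 * ((x[None, :] - centres[:, None]) / 15.0) ** 2)

coefs = spline_coefficients(X, x, knots)
print(X.shape, "->", coefs.shape)                 # 235 variables become 20 coefficients
```

The rows of `coefs` would then replace the original 235 variables as input to LDA; loadings obtained this way refer to basis functions, not to individual wavenumbers.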

[quote]and (b) by using a systematic selection from the bins available.
[/quote]
I will definitely try that, but the reason I still try to use all
variables is because my, say, client, is interested in
"enlightening" (using Ray's words), i.e., "which variables are the
most important ones?"


[quote]Is the overall R^2 almost the same?
[/quote]
Yes, I struggle to see differences between the overall covariance matrix
(http://www.compumag2009.com/f/r.png) and the within-class covariance
matrix (http://www.compumag2009.com/f/cw.png) (my data is mean-centered).


[quote]Are the coefficients now intelligible?
[/quote]
I have tried LDA on the coefficients of the spline expansion, and at
least I get a loadings plot which is easier to interpret
(http://www.compumag2009.com/f/splinesloading.png). However, the
interpretation is almost completely different, i.e., the variables
that emerge as important are in different regions of the spectra
(compare http://www.compumag2009.com/f/loadings_abs.png to
http://www.compumag2009.com/f/splinesloadingabs.png). This is so
confusing!
 
Rich Ulrich...
Posted: Tue Oct 27, 2009 5:11 pm
Guest
On Tue, 27 Oct 2009 08:54:46 -0700 (PDT), Julio
<juliotrevisan at (no spam) gmail.com> wrote:

[quote]Hi, Rich

Thanks for your answer.
[/quote]
I have a couple of other comments, but my main
interest at this point is in the correlation matrix,
and its oddities.

[quote]
When I try searching the web < spectroscopy regression confounding>,
I found too many discussions.  You will have a better chance if you
specify your field.  I saw "NMR" articles with 200+ bins.  
Thanks for your searching suggestions.

I am working with FTIR (Fourier-Transform InfraRed) spectroscopy
[/quote]
I figured it was FFT. I've used FFT results from brain EEGs, which
presented 6 bins in all... and my FFT experience is "zero" beyond
that.

[quote]
 - If I were to try to explore such data on my own, without
further advice, my first explorations would entail reducing the
number of predictors to 50 or even to 10 -- (a) by using wider bins,
By wider bins, do you mean averaging neighbouring variables?
Perhaps a spline basis expansion could be considered a way of using
"wider bins"? By spline expansion I mean find a set of spline
functions and find (by least squares) the linear combinations of these
functions that best represent each spectrum I have, and use the
coefficients of these linear combinations as my new variables. Is the
explanation of the way that I used splines clear?
[/quote]
I don't follow, but I can remain in the dark. What I had in mind was
(say) simply taking the total (or average) of several adjoining
numbers. Using a smaller set of variables this way might test,
informally, whether there is any value or need to use frequency bins
as narrow as the ones you have. Did you achieve that?

[quote]
and (b) by using a systematic selection from the bins available.
I will definitely try that, but the reason I still try to use all
variables is because my, say, client, is interested in
"enlightening" (using Ray's words), i.e., "which variables are the
most important ones?"


Is hte overall R^2   almost the same?
[/quote]
What I intended to ask, above, was : "Is the discrimination
between groups essentially as good with 50 variables as it
is with 250?" That is -- Do the results provide a *need* to
use 250 bins? I think you are answering something else....

Having seen the correlations, described below, I would
also suggest that you might look at what prediction that
you can achieve by using only specific *ranges* of bins,
in particular, using separate sets where the intercorrelations
are high. ("Is the whole range useful/ necessary?")


[quote]Yes, I suffer to see differences between the overall covariance matrix
(http://www.compumag2009.com/f/r.png) and the within-class covariance
matrix (http://www.compumag2009.com/f/cw.png) (my data is mean-
centered).
[/quote]
Using color is an effective way to show those. "Blinking"
between pictures suggests a slight difference, but very slight.

The narrow diagonals are, indeed, 0.90 or so -- so the variables
can easily act as "suppressors", as I figured.

What is most impressive, and puzzling to me, are the "boxes" of
high correlations, especially the big one in the middle which is
bordered by negative correlations. What I can recognize
is that this is an artifact of some sort.

The important question is whether *you* recognize what
is going on. Do these represent something like "phase-
transitions" for materials?
Should that guide how you look at the variables?

On the negative side -- I hope that these pictures are
not this odd because of artifacts of the FFT.


[quote]

Are the coefficients now intelligible?
I have tried LDA on the coefficients of the splines expansion and I at
least I get a loadings plot which is easier to interpret (http://
www.compumag2009.com/f/splinesloading.png). However, the
interpretation is almost completely different, i.e., the variables
that rise as important are in different regions of the spectra
(compare http://www.compumag2009.com/f/loadings_abs.png to
http://www.compumag2009.com/f/splinesloadingabs.png). This is so
confusing!
[/quote]
From what I have seen, I suspect -- or wonder whether -- you can
achieve your discrimination with some handful of variables.
If that is the case, then you probably also have a wide range of
choices of exactly which variables.

You say you want to narrow down the "answer" in order
to point to a small number of "important" bins. I suggest that
you think about reducing the number of bins in some systematic
ways *before* reaching a conclusion. Think about, "What
does a *good* answer look like?" I've made some suggestions
about reducing the number of bins, but those are rooted in
my total ignorance of your problem. What sort of reduction
makes sense to you?

--
Rich Ulrich
 
Julio...
Posted: Wed Oct 28, 2009 6:31 am
Guest
Hi, Rich

[quote]I don't follow,but I can remain in the dark.  What I had in mind was
(say) simply taking the total (or average) of several adjoining
numbers.  Using a smaller set of variables this way might test,
informally,whether there is any value or need to use Frequencies-bins
as narrow as the ones you have.  Did you achieve that?
[/quote]
I am about to try incremental/decremental/oscillatory feature
selection, but I am unsure about the classification validation method
to use. I have a library with SVM, NN, kNN, LDA, Least Squares, might
choose one of the last ones for being quicker, maybe.

Meanwhile I have looked into this correlation matrix structure.


[quote]What is most impressive, and puzzling to me, are the "boxes" of
high correlations, especially the big one in the middle which is
bordered by negative correlations.  What I can recognize
is that this is artifact of some sort.  
[/quote]
The puzzling structure is most likely due to the normalization method
that I apply to the raw data. Due to measurement conditions, each raw
spectrum comes in a different scale. I don't believe that it is
possible to know the exact scale to put each spectrum into, so what I
do is normalize to the highest peak in each spectrum, so that all
spectra match unity at their highest peak. This is known as the "amide
I peak" and occurs in the same variable region, sometimes with a
slight shift.

I believe it is easiest to take a look at the spectra to figure out
what this normalization is about: http://www.compumag2009.com/f/data.png

The thing is that the normalization seems to destroy the correlation
structure. I obtained a heat map of the correlation matrix *before*
normalization, and it looks very different: http://www.compumag2009.com/f/corr_unnorm.png

Pre-treatment of spectroscopy data seems to be the toughest problem. I
wonder whether there could be a smarter normalization procedure.


[quote]
The important question is whether *you* recognize what
is going on.  Do these represent something like "phase-
transitions"  for materials?  
Should that guide how you look at the variables?
[/quote]
The explanation is not physical, but as I said, is introduced by
normalization.

[quote]Think about, "What
does a *good*  answer look like?"  I've made some suggestions
about reducing the number of bins, but those are rooted in
'my total ignorance of your problem.  What sort of reduction
makes sense to you?
[/quote]
I like the way you stimulate thinking through the questions you pose.
However, in what sense do you mean "answer"?
 
Rich Ulrich...
Posted: Wed Oct 28, 2009 3:38 pm
Guest
On Wed, 28 Oct 2009 09:31:24 -0700 (PDT), Julio
<juliotrevisan at (no spam) gmail.com> wrote:

[quote]Hi, Rich

I don't follow,but I can remain in the dark.  What I had in mind was
(say) simply taking the total (or average) of several adjoining
numbers.  Using a smaller set of variables this way might test,
informally,whether there is any value or need to use Frequencies-bins
as narrow as the ones you have.  Did you achieve that?

I am about to try incremental/decremental/oscillatory feature
selection, but I am unsure about the classification validation method
to use. I have a library with SVM, NN, kNN, LDA, Least Squares, might
choose one of the last ones for being quicker, maybe.
[/quote]
Usually we call that "stepwise selection", and in general it is most
often a bad idea. In your particular case, though, it might help to
shed some light on your data. I'd be curious as to whether you can
achieve your full effective selection with just 2 or 3 variables.

[quote]
Meanwhile I have investigated about this correlation matrix structure.


What is most impressive, and puzzling to me, are the "boxes" of
high correlations, especially the big one in the middle which is
bordered by negative correlations.  What I can recognize
is that this is artifact of some sort.  

The puzzling structure is mostly sure due to the normalization method
that I apply to the raw data. Due do measurement conditions, the raw
spectra come each one in a different scale. I don't believe that it is
possible to know the exact scale to put each spectrum into, so what I
do is normalization to the highest peak in each spectrum, so all
spectra match unity in their highest peak. This is known as the "amide
I peak" and occurs in the same variable region, sometimes with a
slight shift.
[/quote]
For the EEG data that I analyzed, the literature suggested that
"total power" varied enormously between individuals, if not
between sessions. I followed my expert's advice of initially
normalizing to "percent of total power." That would work out
somewhat differently from using "peak." - I also found, for my
data, that scatter-plots showed more regularity if I further
converted the proportions into logits, log (p/(1-p) ).
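
A minimal sketch contrasting the two normalizations mentioned in this exchange: scaling each spectrum to its highest peak, and scaling to "percent of total power" followed by the logit. It assumes strictly positive values in each spectrum; the function names and the tiny made-up spectra are illustrative only.

```python
import numpy as np

def normalize_to_peak(X):
    """Scale each spectrum (row) so that its highest value equals 1."""
    return X / X.max(axis=1, keepdims=True)

def normalize_to_total(X):
    """Scale each spectrum (row) so that its values sum to 1."""
    return X / X.sum(axis=1, keepdims=True)

def logit(p):
    """log(p / (1 - p)), defined for 0 < p < 1."""
    return np.log(p / (1.0 - p))

# Two made-up positive "spectra" that differ only by an overall scale factor.
base = np.array([1.0, 4.0, 2.0, 3.0])
X = np.vstack([base, 10.0 * base])

peak = normalize_to_peak(X)
share = normalize_to_total(X)
print(peak)           # both rows identical, maximum 1
print(share)          # both rows identical, each summing to 1
print(logit(share))
```

Both choices remove a pure scale difference between spectra; they differ in which quantity is forced to be constant (the peak height or the total), and that constraint is what can reshape the correlations among the remaining variables.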

[quote]
I believe it is easiest to take a look at the spectra to figure out
what this normalization is about: http://www.compumag2009.com/f/data.png

The thing is that the normalization seems to destroy the correlation
structure. I obtained a heat map of the correlation matrix *before*
normalization, and it looks very different: http://www.compumag2009.com/f/corr_unnorm.png
[/quote]
Those are interesting pictures. The overall effect of "power"
seems to be the big difference between correlations, so that
what was 0.35 to 0.90+ for "raw" was from negative to 0.90+
for standardized.

However, I dispute part of the notion that it "looks very different".
The boxes that I was concerned with are still there. Your other
picture suggests an explanation. Possibly, the boxes and lines
correspond to troughs, as opposed to peaks.

The nature of the statistical artifact suggests to me that, perhaps,
it could be that some of the 13 000 data-lines are "off" in some
major way, not showing the same peaks and troughs. By providing
the most extreme scores "after normalizing", these cases might
be overly influential in some of the correlations.

Throwing out outliers is something to be done cautiously, if at all.
Is it possible, or useful, to select-out some of the raw data for
reasons *beyond* their appearance? Are there a number of
spectra that look particularly odd?

[quote]

Pre-treatment of spectroscopy data seems to be the toughest problem. I
wonder there could be a smarter normalization procedure.



The important question is whether *you* recognize what
is going on.  Do these represent something like "phase-
transitions"  for materials?  
Should that guide how you look at the variables?

The explanation is not physical, but as I said, is introduced by
normalization.

Think about, "What
does a *good*  answer look like?"  I've made some suggestions
about reducing the number of bins, but those are rooted in
'my total ignorance of your problem.  What sort of reduction
makes sense to you?

I like the way you stimulate thinking through the questions you pose.
However, in what sense do you mean "answer"?
[/quote]
Well, astronomers looking at stars can be very concerned with
precise frequencies, since these represent particular elemental
compositions. On the other hand, I look at your data and wonder
if each of 8 or so peaks, plus 8 troughs, should be each
represented by one number like "total power in the range" or
"proportion of total power in the range". The shape of the
curve - scores and relative scores - might be relevant with
15 or 20 measurements, rather than 230 measurements.

Does this represent an acceptable kind of analysis, or a
suitable start toward describing the data and the differences
between groups? What *do* the peaks/ troughs represent?
I can't look at the data without imagining that it ought to
be relevant.

--
Rich Ulrich
 
Ray Koopman...
Posted: Thu Oct 29, 2009 11:47 am
Guest
On Oct 27, 8:30 am, Julio <juliotrevi... at (no spam) gmail.com> wrote:
[quote]Hi Ray

Thanks for your answer.

You don't say how many groups you have. The explanation that follows
is for two groups. Something like it may apply when there are more
than two groups, but I haven't checked.

I forgot to mention that. Indeed I have 2 groups in most of the cases,
but sometimes more.

The discrimant coefficients are proportional to C^-1 * m, where C is
the pooled within-groups covariance matrix and m is the difference
between the two mean vectors.

Indeed, I confirmed this implementing C^-1*m and comparing to the
loadings vector (plot not shown).

If C has Toeplitz structure then C^-1 will be tridiagonal,
with positive diagonals and negative immediate offdiagonals,
and I suspect that could create what you're seeing.

C (http://www.compumag2009.com/f/cw.png) does not have Toeplitz
structure, as you can see in the figure, but its inverse are quite the
way you described C^-1 (http://www.compumag2009.com/f/invcwm.png)

More generally, though, the coefficients in LDA, like those in
ordinary regression, are designed to predict, not to enlighten. They
may sometimes help us make sense out of the data, but they should not
be expected to do so. If you want to understand your data, look at the
vector of covariances or correlations of the LDA scores with the
variables.

I was keen to test this suggestion and I found this correlation vector
(http://www.compumag2009.com/f/finalcorr.png) to be proportional to my
class mean vectors (http://www.compumag2009.com/f/means.png). By the
way, my class mean vectors are proportional to each other, as I have
only 2 classes (class=group), much more spectra of one class than the
other class, and my dataset has been standardized (mean-centered and
normalized succeeded by division by standardard deviation for each
variable).

Why is the vector you suggester proportional to one of my class means
vector?
[/quote]
The vector I suggested -- call it z -- is proportional to m = m1-m2 =
the difference between the two mean vectors. You centered your data,
so n1*m1 + n2*m2 = 0, where n1 & n2 are the two sample sizes. This
implies that z is also proportional to both m1 and m2, with one
constant of proportionality being positive and the other negative.

The fact that z is proportional to m is something I should have
realized. It follows from the fact that, when there are only two
groups, the LDA coefficients are proportional to the ordinary least-
squares linear regression coefficients when the centered variables are
used as predictors and group (dummy-coded) is the dependent variable.
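
The two proportionality claims above can be checked numerically. The following is a sketch on synthetic two-group data (the group sizes, dimension, and random covariance are made-up assumptions): it compares C^-1 * m with the least-squares coefficients for a dummy-coded group variable, and compares the covariances of the LDA scores with m.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n1, n2 = 5, 150, 50                       # illustrative, unbalanced groups

# Synthetic correlated data with a shift between the two groups.
A = rng.standard_normal((p, p))
X1 = rng.standard_normal((n1, p)) @ A + rng.standard_normal(p)
X2 = rng.standard_normal((n2, p)) @ A
X = np.vstack([X1, X2])
X = X - X.mean(axis=0)                       # centre on the grand mean
y = np.r_[np.ones(n1), np.zeros(n2)]         # dummy-coded group

m1, m2 = X[:n1].mean(axis=0), X[n1:].mean(axis=0)
m = m1 - m2

# Pooled within-groups covariance C and discriminant direction C^-1 m.
W = (X[:n1] - m1).T @ (X[:n1] - m1) + (X[n1:] - m2).T @ (X[n1:] - m2)
C = W / (n1 + n2 - 2)
w = np.linalg.solve(C, m)

# Ordinary least squares of the centred dummy variable on the centred X.
b, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)

# Covariance of each variable with the LDA scores.
scores = X @ w
z = X.T @ (scores - scores.mean()) / (len(y) - 1)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print("cos(w, b) =", cosine(w, b))           # 1 means exactly proportional
print("cos(z, m) =", cosine(z, m))
```

A cosine of 1 (up to rounding) in both lines is what the argument predicts. For correlations instead of covariances, each entry of z is additionally divided by that variable's standard deviation, which changes nothing when the variables have already been standardized.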
 
Greg Heath...
Posted: Tue Nov 03, 2009 4:51 am
Guest
Energies at neighboring wavelengths are highly correlated.
Reduce the dimensionality of the input via some sort of
lowpass filtering.
To help make a choice between the original and filtered alternatives,
try looking at the plots of the class means and their differences.
Very often important regions for classification are highlighted by
plotting the differences in neighboring wavelengths.
Also of interest is the plot of output-input correlation coefficients
vs wavelength, where the output is the input-conditional posterior
probability (binary target {0,1}).
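
A minimal sketch of the output-input correlation plot suggested above, taking the output to be the 0/1 class label. The data are made up (six variables, only one of which differs between classes), and the function name `target_correlations` is illustrative.

```python
import numpy as np

def target_correlations(X, y):
    """Pearson correlation of every column of X with the 0/1 target y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

# Made-up data: 6 variables, only variable 3 is shifted between the classes.
rng = np.random.default_rng(4)
y = np.r_[np.zeros(300), np.ones(100)]
X = rng.standard_normal((400, 6))
X[:, 3] += 2.0 * y

r = target_correlations(X, y)
mean_diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
print("correlation with target:", np.round(r, 3))
print("difference of class means:", np.round(mean_diff, 3))
```

Plotting `r` and `mean_diff` against wavelength would give the two curves described in the post.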

Hope this helps,

Greg
 
Greg Heath...
Posted: Tue Nov 03, 2009 5:34 pm
Guest
On Oct 27, 11:30 am, Julio <juliotrevi... at (no spam) gmail.com> wrote:
[quote]Hi Ray

Thanks for your answer.

You don't say how many groups you have. The explanation that follows
is for two groups. Something like it may apply when there are more
than two groups, but I haven't checked.

I forgot to mention that. Indeed I have 2 groups in most of the cases,
but sometimes more.

The discrimant coefficients are proportional to C^-1 * m, where C is
the pooled within-groups covariance matrix and m is the difference
between the two mean vectors.

Indeed, I confirmed this implementing C^-1*m and comparing to the
loadings vector (plot not shown).

If C has Toeplitz structure then C^-1 will be tridiagonal, with positive
diagonals and negative immediate
offdiagonals, and I suspect that could create what you're seeing.

C (http://www.compumag2009.com/f/cw.png) does not have Toeplitz
structure, as you can see in the figure, but its inverse are quite the
way you described C^-1 (http://www.compumag2009.com/f/invcwm.png)
[/quote]
This plot of W = inv(C)*m indicates that there are several frequency
bands that are irrelevant for classification. This can be confirmed
by looking at the plots of W.*m = (W1*m1, W2*m2,...) vs wavelength.
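
A minimal sketch of the W.*m idea, on a made-up covariance matrix and mean difference (both are assumptions for illustration). The elementwise terms add up to m' inv(C) m, the squared Mahalanobis distance between the two group means, so each term can be read as that variable's share of the separation.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 8
# Made-up covariance (positive definite) and mean difference.
A = rng.standard_normal((p, p))
C = A @ A.T + p * np.eye(p)
m = np.zeros(p)
m[2:5] = [1.0, 2.0, 1.0]          # only three variables differ between groups

w = np.linalg.solve(C, m)          # W = inv(C) * m in the post's notation
contrib = w * m                    # elementwise product W .* m

print("per-variable terms W.*m:", np.round(contrib, 4))
print("sum of terms:", contrib.sum())
print("m' inv(C) m :", m @ w)
```

Individual terms can be negative even though their sum is positive, and a variable with zero mean difference has a zero term here even when its coefficient in W is not zero.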

[quote]More generally, though, the coefficients in LDA, like those in
ordinary regression, are designed to predict, not to enlighten. They
may sometimes help us make sense out of the data, but they should not
be expected to do so. If you want to understand your data, look at the
vector of covariances or correlations of the LDA scores with the
variables.

I was keen to test this suggestion and I found this correlation vector
(http://www.compumag2009.com/f/finalcorr.png) to be proportional to my
class mean vectors (http://www.compumag2009.com/f/means.png).
[/quote]
It is only proportional to one of them.

More importantly, it is proportional to the differences in class means!

[quote]By the
way, my class mean vectors are proportional to each other,
[/quote]
The plot does not support this observation.

[quote]as I have
only 2 classes (class=group), much more spectra of one class than the
other class, and my dataset has been standardized (mean-centered and
normalized succeeded by division by standardard deviation for each
variable).
[/quote]
The scales of the measurements for the two classes appear to differ by
several orders of magnitude.

Therefore, the correlation structure is irrelevant. You should get
the same classification performance by replacing C with the unit
matrix and discriminating solely on the basis of total energy.

I am now suspicious about the calibration of the spectra for the
two classes. Are the measurements from the same equipment?

[quote]Why is the vector you suggester proportional to one of my class means
vector?
[/quote]
Again it is proportional to the difference.

Hope this helps.

Greg

P.S. Are you sure that the measurements for both classes are
correctly calibrated?
 
Greg Heath...
Posted: Tue Nov 03, 2009 5:52 pm
Guest
On Oct 27, 10:54 am, Julio <juliotrevi... at (no spam) gmail.com> wrote:
[quote]Hi, Rich

Thanks for your answer.

When I try searching the web < spectroscopy regression confounding>,
I found too many discussions.  You will have a better chance if you
specify your field.  I saw "NMR" articles with 200+ bins.  

Thanks for your searching suggestions.

I am working with FTIR (Fourier-Transform InfraRed) spectroscopy

 - If I were to try to explore such data on my own, without
further advice, my first explorations would entail reducing the
number of predictors to 50 or even to 10 -- (a) by using wider bins,

By wider bins, do you mean averaging neighbouring variables?
[/quote]
Recommended.

[quote]Perhaps a spline basis expansion could be considered a way of using
"wider bins"? By spline expansion I mean find a set of spline
functions and find (by least squares) the linear combinations of these
functions that best represent each spectrum I have, and use the
coefficients of these linear combinations as my new variables. Is the
explanation of the way that I used splines clear?
[/quote]
Not recommended.

[quote]and (b) by using a systematic selection from the bins available.

I will definitely try that, but the reason I still try to use all
variables is because my, say, client, is interested in
"enlightening" (using Ray's words), i.e., "which variables are the
most important ones?"
[/quote]
Try deleting the bands I identified in a previous post.

[quote]Is hte overall R^2   almost the same?

Yes, I suffer to see differences between the overall covariance matrix
(http://www.compumag2009.com/f/r.png) and the within-class covariance
matrix (http://www.compumag2009.com/f/cw.png) (my data is mean-
centered).
[/quote]
That is because the scale of one class is orders of magnitude higher
than the other.

[quote]Are the coefficients now intelligible?

I have tried LDA on the coefficients of the spline expansion and at
least I get a loadings plot which is easier to interpret
(http://www.compumag2009.com/f/splinesloading.png). However, the
interpretation is almost completely different, i.e., the variables
that rise as important are in different regions of the spectra
(compare http://www.compumag2009.com/f/loadings_abs.png to
http://www.compumag2009.com/f/splinesloadingabs.png). This is so
confusing!
[/quote]
I recommend forgetting about the spline fit and just combining
neighboring bins.
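
In case it is useful, combining neighboring bins can be sketched in
Python/NumPy as below. This is only a minimal sketch: the function name
average_bins, the bin width of 5 and the random stand-in data are
illustrative assumptions, not anything taken from the actual spectra.

```python
import numpy as np

def average_bins(spectra, width):
    """Average each run of `width` adjacent columns into one wider bin."""
    spectra = np.asarray(spectra, dtype=float)
    n_spectra, n_vars = spectra.shape
    n_bins = n_vars // width               # leftover columns at the end are dropped
    trimmed = spectra[:, :n_bins * width]
    # Group the columns into (n_bins, width) blocks and average within each block
    return trimmed.reshape(n_spectra, n_bins, width).mean(axis=2)

# Illustrative stand-in data: 4 random "spectra" with 235 variables each
rng = np.random.default_rng(0)
X = rng.random((4, 235))
X_binned = average_bins(X, width=5)        # 235 / 5 = 47 wider bins
```

Repeating this with a succession of widths gives an informal check on
how narrow the bins really need to be.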

Hope this helps.

Greg
 
Greg Heath...
Posted: Tue Nov 03, 2009 6:13 pm
Guest
On Oct 28, 11:31 am, Julio <juliotrevi... at (no spam) gmail.com> wrote:
[quote]Hi, Rich

I don't follow, but I can remain in the dark.  What I had in mind was
(say) simply taking the total (or average) of several adjoining
numbers.  Using a smaller set of variables this way might test,
informally, whether there is any value or need to use frequency bins
as narrow as the ones you have.  Did you achieve that?
[/quote]
Highly recommended approach.

[quote]I am about to try incremental/decremental/oscillatory feature
selection,
[/quote]
Unnecessary. The plots you have posted clearly indicate the location
of ineffective frequency bands.

[quote]but I am unsure about the classification validation method
to use. I have a library with SVM, NN, kNN, LDA, Least Squares, might
choose one of the last ones for being quicker, maybe.

Meanwhile I have investigated this correlation matrix structure.

What is most impressive, and puzzling to me, are the "boxes" of
high correlations, especially the big one in the middle which is
bordered by negative correlations.  What I can recognize
is that this is an artifact of some sort.  

The puzzling structure is most surely due to the normalization method
that I apply to the raw data. Due to measurement conditions, the raw
spectra each come in a different scale.
[/quote]
AHA!

[quote]I don't believe that it is
possible to know the exact scale to put each spectrum into, so what I
do is normalization to the highest peak in each spectrum, so all
spectra match unity in their highest peak. This is known as the "amide
I peak" and occurs in the same variable region, sometimes with a
slight shift.
[/quote]
I have found (spectral classification of stars) that this is an
unreliable method of calibration. I have had much better results by
normalizing with respect to the total energy. All information can be
retained by adding the average energy as another variable.
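
A rough Python/NumPy sketch of that normalization follows. Note the
assumptions: "total energy" is taken here to be the plain sum of each
spectrum (a sum of squares would be another reading), and the function
name normalize_total_energy and the toy numbers are made up for
illustration.

```python
import numpy as np

def normalize_total_energy(spectra):
    """Scale each spectrum to unit total and append its average energy."""
    spectra = np.asarray(spectra, dtype=float)
    # Assumption: "total energy" is the plain sum over all variables
    total = spectra.sum(axis=1, keepdims=True)
    shape = spectra / total                     # every row now sums to 1
    mean_energy = total / spectra.shape[1]      # average energy per variable
    # The extra last column keeps the scale information that was divided out
    return np.hstack([shape, mean_energy])

# Toy example: the second spectrum is the first one on a 10x larger scale
raw = np.array([[1.0, 2.0, 1.0],
                [10.0, 20.0, 10.0]])
normalized = normalize_total_energy(raw)
```

After this step the two toy spectra have identical shape columns and
differ only in the appended average-energy column.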

[quote]I believe it is easiest to take a look at the spectra to figure out
what this normalization is about: http://www.compumag2009.com/f/data.png

The thing is that the normalization seems to destroy the correlation
structure. I obtained a heat map of the correlation matrix *before*
normalization, and it looks very different:
http://www.compumag2009.com/f/corr_unnorm.png.
Pre-treatment of spectroscopy data seems to be the toughest problem. I
wonder whether there could be a smarter normalization procedure.
[/quote]
See above.

[quote]The important question is whether *you* recognize what
is going on.  Do these represent something like "phase-
transitions"  for materials?  
Should that guide how you look at the variables?

The explanation is not physical, but as I said, is introduced by
normalization.
[/quote]
Try the normalization I suggested.

[quote]Think about, "What
does a *good*  answer look like?"  I've made some suggestions
about reducing the number of bins,
[/quote]
Agree.


_____SNIP

Hope this helps.

Greg
 
Greg Heath...
Posted: Tue Nov 03, 2009 6:38 pm
Guest
On Oct 28, 4:38 pm, Rich Ulrich <rich.ulr... at (no spam) comcast.net> wrote:
[quote]On Wed, 28 Oct 2009 09:31:24 -0700 (PDT), Julio
<juliotrevi... at (no spam) gmail.com> wrote:
Hi, Rich

I don't follow, but I can remain in the dark.  What I had in mind was
(say) simply taking the total (or average) of several adjoining
numbers.  Using a smaller set of variables this way might test,
informally, whether there is any value or need to use frequency bins
as narrow as the ones you have.  
[/quote]

Agree.

[quote]Did you achieve that?

I am about to try incremental/decremental/oscillatory feature
selection, but I am unsure about the classification validation method
to use. I have a library with SVM, NN, kNN, LDA, Least Squares, might
choose one of the last ones for being quicker, maybe.

Usually we call that "stepwise selection"  and it is most often
a bad idea, in general.  In particular, it might help to shed some
light on your data.  I'd be curious as to whether you can achieve
your full effective selection with just 2 or 3 variables.
[/quote]
Probably not.

The plot of inv(C)*m indicates the presence of 6 or more
influential frequency bands.

Using a succession of larger bins can help.
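
For reference, a minimal Python/NumPy sketch of computing inv(C)*m for
two classes is given below. It assumes that C is the pooled
within-class covariance matrix and m the difference between the two
class means; the function name fisher_direction and the random demo
data are illustrative only.

```python
import numpy as np

def fisher_direction(X, y):
    """Return inv(C) @ m for a two-class problem without forming inv(C)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    if labels.size != 2:
        raise ValueError("this sketch handles exactly two classes")
    X0, X1 = X[y == labels[0]], X[y == labels[1]]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    m = m1 - m0                                  # difference of class means
    R0, R1 = X0 - m0, X1 - m1                    # within-class residuals
    C = (R0.T @ R0 + R1.T @ R1) / (len(X) - 2)   # pooled within-class covariance
    # Solving C w = m is numerically safer than inverting C explicitly
    return np.linalg.solve(C, m)

# Illustrative random data: two classes, three variables
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, (50, 3)),
                    rng.normal(1.0, 1.0, (60, 3))])
y_demo = np.array([0] * 50 + [1] * 60)
w_demo = fisher_direction(X_demo, y_demo)
```

With many highly correlated variables C can be close to singular, so
the solve step may be badly conditioned; running the same code on
binned spectra is one way to see whether the coefficients settle down.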

[quote]Meanwhile I have investigated this correlation matrix structure.

What is most impressive, and puzzling to me, are the "boxes" of
high correlations, especially the big one in the middle which is
bordered by negative correlations.  What I can recognize
is that this is an artifact of some sort.  

The puzzling structure is most surely due to the normalization method
that I apply to the raw data. Due to measurement conditions, the raw
spectra each come in a different scale. I don't believe that it is
possible to know the exact scale to put each spectrum into, so what I
do is normalization to the highest peak in each spectrum, so all
spectra match unity in their highest peak. This is known as the "amide
I peak" and occurs in the same variable region, sometimes with a
slight shift.

For the EEG data that I analyzed, the literature suggested that
"total power"  varied enormously between individuals, if not
between sessions.  I followed my expert's advice of initially
normalizing to "percent of total power."  That would work out
somewhat differently from using "peak."    - I also found, for my
data, that scatter-plots showed more regularity if I further
converted the proportions into logits, log(p/(1-p)).
[/quote]
That's interesting.
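
For anyone who wants to try it, the "percent of total power" followed
by the logit step might look like this in Python/NumPy. This is only a
sketch: the function name logit_of_power_share and the clipping
constant eps (used to avoid log(0)) are my own assumptions, and the
inputs are assumed to be non-negative.

```python
import numpy as np

def logit_of_power_share(power, eps=1e-12):
    """Convert each row to proportions of its total, then to logits."""
    power = np.asarray(power, dtype=float)
    p = power / power.sum(axis=1, keepdims=True)   # proportion of total power
    p = np.clip(p, eps, 1.0 - eps)                 # keep the logarithm finite
    return np.log(p / (1.0 - p))                   # logit: log(p/(1-p))

# Toy row with shares 0.25, 0.25 and 0.5 of the total
logits = logit_of_power_share([[1.0, 1.0, 2.0]])
```

A share of exactly one half maps to a logit of zero, and smaller shares
map to negative values.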

[quote]I believe it is easiest to take a look at the spectra to figure out
what this normalization is about: http://www.compumag2009.com/f/data.png
[/quote]
Whoa!

How about separate class-conditional plots?

The mean plots indicated a different scale of energy levels.

This plot certainly doesn't show that.


[quote]The thing is that the normalization seems to destroy the correlation
structure. I obtained a heat map of the correlation matrix *before*
normalization, and it looks very different:
http://www.compumag2009.com/f/corr_unnorm.png

Those are interesting pictures.  The overall effect of "power"
seems to be the big difference between correlations, so that
what was 0.35 to 0.90+  for "raw"  was from negative to 0.90+
for standardized.

However, I dispute part of the notion that it "looks very different".
The boxes that I was concerned with are still there.  Your other
picture suggests an explanation.  Possibly, the boxes and lines
correspond to troughs, as opposed to peaks.  

The nature of the statistical artifact suggests to me that, perhaps,
it could be that some of the 13 000   data-lines are "off" in some
major way, not showing the same peaks and troughs.  By providing
the most extreme scores "after normalizing",  these cases might
be overly influential in some of the correlations.

Throwing out outliers is something to be done cautiously, if at all.
Is it possible, or useful, to select-out some of the raw data for
reasons *beyond*  their appearance?   Are there a number of
spectra that look particularly odd?

Pre-treatment of spectroscopy data seems to be the toughest problem. I
wonder whether there could be a smarter normalization procedure.

The important question is whether *you* recognize what
is going on.  Do these represent something like "phase-
transitions"  for materials?  
Should that guide how you look at the variables?

The explanation is not physical, but as I said, is introduced by
normalization.

Think about, "What
does a *good*  answer look like?"  I've made some suggestions
about reducing the number of bins, but those are rooted in
my total ignorance of your problem.  What sort of reduction
makes sense to you?

I like the way you stimulate thinking through the questions you pose.
However, in what sense do you mean "answer"?

Well, astronomers looking at stars can be very concerned with
precise frequencies, since these represent particular elemental
compositions.  On the other hand, I look at your data and wonder
if each of 8 or so peaks, plus 8 troughs, should be each
represented by one number like "total power in the range" or
"proportion of total power in the range".   The shape of the
curve - scores and relative scores -  might be relevant with
15 or 20 measurements, rather than 230 measurements.  

Does this represent an acceptable kind of analysis, or a
suitable start toward describing the data and the differences
between groups?    What *do*  the peaks/ troughs represent?
I can't look at the data without imagining that it ought to
be relevant.
[/quote]
Good advice.

Greg
 
Ray Koopman...
Posted: Tue Nov 03, 2009 8:09 pm
Guest
On Nov 3, 7:34 pm, Greg Heath <he... at (no spam) alumni.brown.edu> wrote:
[quote][...]
More generally, though, the coefficients in LDA, like those in
ordinary regression, are designed to predict, not to enlighten.
They may sometimes help us make sense out of the data, but they
should not be expected to do so. If you want to understand your
data, look at the vector of covariances or correlations of the
LDA scores with the variables.

I was keen to test this suggestion and I found this correlation vector
(http://www.compumag2009.com/f/finalcorr.png) to be proportional to
my class mean vectors (http://www.compumag2009.com/f/means.png).

It is only proportional to one of them.
[/quote]
It's proportional to both of them: it's positively proportional to
the blue (T) curve, and negatively proportional to the red (N) curve.
The red and blue curves are negatively proportional to one another.
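
A small numerical check of this, as a Python/NumPy sketch on made-up
data (the class sizes, the random mixing matrix and the variable names
are all illustrative assumptions). It uses covariances rather than
correlations, because dividing by each variable's standard deviation
would break exact proportionality. The data are mean-centered overall,
as in the original posts.

```python
import numpy as np

rng = np.random.default_rng(1)
n_n, n_t, p = 300, 200, 6                   # class sizes and number of variables
A = rng.normal(size=(p, p))                 # mixing matrix, makes variables correlated
shift = rng.normal(size=p)                  # offset between the two classes
X = np.vstack([rng.normal(size=(n_n, p)) @ A,
               rng.normal(size=(n_t, p)) @ A + shift])
X = X - X.mean(axis=0)                      # mean-center over all spectra
X_n, X_t = X[:n_n], X[n_n:]                 # class N rows, then class T rows

m_n, m_t = X_n.mean(axis=0), X_t.mean(axis=0)
d = m_t - m_n                               # difference of class means
# Within-class scatter matrix (sum over both classes)
W = (X_n - m_n).T @ (X_n - m_n) + (X_t - m_t).T @ (X_t - m_t)
w = np.linalg.solve(W, d)                   # LDA coefficients, up to scale
scores = X @ w                              # LDA scores
cov_vec = X.T @ scores / (len(X) - 1)       # covariance of each variable with the scores

def cosine(a, b):
    """Cosine of the angle between two vectors (+1 or -1 means proportional)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(cov_vec, m_t))   # close to +1
print(cosine(cov_vec, m_n))   # close to -1
print(cosine(m_t, m_n))       # close to -1
```

The reason is that, with overall centering, the weighted class means
sum to zero, so the two mean vectors and their difference all lie on
one line; the covariance vector is a positive multiple of that
difference.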
 
 
All times are GMT - 5 Hours