 |
|
| Science Forum Index » Space - Consult Forum » Weird loadings curves in Fisher's Linear Discriminant... |
|
|
| Author |
Message |
| Ray Koopman... |
Posted: Tue Nov 03, 2009 8:18 pm |
|
|
|
Guest
|
On Oct 28, 8:31 am, Julio <juliotrevi... at (no spam) gmail.com> wrote:
[quote][...]
The puzzling structure is most likely due to the normalization method
that I apply to the raw data. Due to measurement conditions, the raw
spectra each come on a different scale. I don't believe that it is
possible to know the exact scale to put each spectrum into, so what
I do is normalization to the highest peak in each spectrum, so all
spectra match unity in their highest peak. This is known as the "amide
I peak" and occurs in the same variable region, sometimes with a
slight shift.
[/quote]
I concur with Rich and Greg: normalize to equate total power. |
|
|
|
|
|
| Greg Heath... |
Posted: Wed Nov 04, 2009 2:35 pm |
|
|
|
Guest
|
On Nov 4, 1:09 am, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
[quote]On Nov 3, 7:34 pm, Greg Heath <he... at (no spam) alumni.brown.edu> wrote:
[...]
More generally, though, the coefficients in LDA, like those in
ordinary regression, are designed to predict, not to enlighten.
They may sometimes help us make sense out of the data, but they
should not be expected to do so. If you want to understand your
data, look at the vector of covariances or correlations of the
LDA scores with the variables.
I was keen to test this suggestion and I found this correlation vector
(http://www.compumag2009.com/f/finalcorr.png) to be proportional to
my class mean vectors (http://www.compumag2009.com/f/means.png).
It is only proportional to one of them.
It's proportional to both of them: it's positively proportional to
the blue (T) curve, and negatively proportional to the red (N) curve.
The red and blue curves are negatively proportional to one another.
[/quote]
You're right. I need to reschedule my eye surgery ASAP.
I am confused about centering and normalization:
Why is the red mean negative?
Why isn't the blue max unity?
Greg |
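The proportionality discussed in this post can be checked numerically. The following is a minimal sketch on synthetic data, not the original spectra: the sample sizes, the number of variables, and all variable names are assumptions made for illustration. It uses covariances rather than correlations; for standardized variables the two vectors differ only by a constant factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, unbalanced two-class data (hypothetical sizes, 5 variables).
n_N, n_T, p = 300, 30, 5
A = rng.normal(size=(p, p))                      # mixing matrix, gives correlated variables
X_N = rng.normal(size=(n_N, p)) @ A
X_T = rng.normal(size=(n_T, p)) @ A + rng.normal(size=p)
X = np.vstack([X_N, X_T])
X = X - X.mean(axis=0)                           # overall mean-centering
X_N, X_T = X[:n_N], X[n_N:]

m_N, m_T = X_N.mean(axis=0), X_T.mean(axis=0)
d = m_T - m_N                                    # difference of the class means

# Pooled within-class scatter matrix.
S_w = (X_N - m_N).T @ (X_N - m_N) + (X_T - m_T).T @ (X_T - m_T)

w = np.linalg.solve(S_w, d)                      # Fisher discriminant direction
scores = X @ w                                   # one LDA score per sample

# Covariance of every variable with the LDA scores.
cov_with_scores = X.T @ (scores - scores.mean()) / (len(X) - 1)

# A cosine of 1 means the two vectors are positively proportional.
cosine = cov_with_scores @ d / (np.linalg.norm(cov_with_scores) * np.linalg.norm(d))
print(f"cosine(cov vector, m_T - m_N) = {cosine:.6f}")
```

Because the data are centered overall, the two class means point in exactly opposite directions, so a vector proportional to their difference is proportional to each of them, with opposite signs.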
|
|
|
|
|
| Julio... |
Posted: Thu Nov 05, 2009 1:52 pm |
|
|
|
Guest
|
[quote]I am confused about centering and normalization:
Why is the red mean negative?
Why isn't the blue max unity?
[/quote]
Hi, Greg
Standardization has been applied *after* normalization [to amide I
peak or other].
In the case of normalization to amide I peak, these peaks are forced
to match vertically: http://www.compumag2009.com/f/data.png
After that (or other normalization), the spectra are standardized
(mean-centering followed by division by standard deviation for each
variable). Then the dataset will look like: http://www.compumag2009.com/f/data_standardized.png.
Each variable has mean zero, but means within class won't be zero.
The magnitude of a class mean is inversely proportional to the number
of samples in that class. Here is something I didn't mention, but may
be relevant: *the number of samples in class "T" is about 10x lower
than in class "N"*. From the 13000 spectra, only about 1200 are from
class "T".
Proportionality of class means follows from overall mean-centering:
sum(points from class 1) + sum(points from class 2) = 0
sum(points from class 1)*n_samples_1/n_samples_1 + sum(points from class 2)*n_samples_2/n_samples_2 = 0
mean_1*n_samples_1 + mean_2*n_samples_2 = 0 |
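The identity can be verified on a toy example. This is only a sketch: the class sizes are the approximate ones mentioned in the thread, while the single simulated variable and its class shift of 0.8 are assumptions.

```python
import random

random.seed(1)

# One hypothetical variable, with class sizes close to those in the thread.
n_N, n_T = 11800, 1200
x_N = [random.gauss(0.0, 1.0) for _ in range(n_N)]
x_T = [random.gauss(0.8, 1.0) for _ in range(n_T)]

# Overall mean-centering, as done in the standardization step.
grand_mean = (sum(x_N) + sum(x_T)) / (n_N + n_T)
x_N = [v - grand_mean for v in x_N]
x_T = [v - grand_mean for v in x_T]

mean_N = sum(x_N) / n_N
mean_T = sum(x_T) / n_T

# n_N*mean_N + n_T*mean_T vanishes up to rounding error,
# so mean_T / mean_N equals -n_N / n_T.
residual = n_N * mean_N + n_T * mean_T
ratio = mean_T / mean_N
print(residual, ratio, -n_N / n_T)
```

The smaller class ends up with the larger mean in magnitude, by a factor of roughly ten here, and with the opposite sign.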
|
|
|
|
|
| Julio... |
Posted: Thu Nov 05, 2009 2:00 pm |
|
|
|
Guest
|
[quote]For the EEG data that I analyzed, the literature suggested that
"total power" varied enormously between individuals, if not
between sessions. I followed my expert's advice of initially
normalizing to "percent of total power." That would work out
somewhat differently from using "peak." - I also found, for my
data, that scatter-plots showed more regularity if I further
converted the proportions into logits, log (p/(1-p) ).
[/quote]
I tried several pre-processing sequences.
I was not sure about the meaning of "total power": is it the sum of
all the points of the spectrum? I understand that some types of
spectroscopy give measurements in power units, so I assumed it to be
the case in your suggestion. If so, I referred to this type of
normalization as "normalization to area" in the table below.
Table: http://www.compumag2009.com/f/classification.png
One curious thing is that the classifier performance varies very little.
The only explanation I have found is, as has been mentioned several times,
an excess of variables. |
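For reference, the two normalizations being compared can be sketched as below. The function names and the synthetic band shape are hypothetical, and "total power" is taken to mean the plain sum of the points in each spectrum, which is the reading assumed in this post.

```python
import numpy as np

def normalize_to_peak(spectra):
    """Scale each spectrum (row) so that its highest point equals 1."""
    return spectra / spectra.max(axis=1, keepdims=True)

def normalize_to_area(spectra):
    """Scale each spectrum (row) so that its points sum to 1."""
    return spectra / spectra.sum(axis=1, keepdims=True)

# Hypothetical data: one non-negative band shape recorded at three scales.
axis = np.linspace(0.0, 1.0, 50)
shape = np.exp(-((axis - 0.6) ** 2) / 0.01) + 0.3
raw = np.outer([1.0, 2.5, 0.4], shape)

by_peak = normalize_to_peak(raw)
by_area = normalize_to_area(raw)

# Both methods remove a pure scale factor, so the three rows coincide.
print(np.allclose(by_peak, by_peak[0]), np.allclose(by_area, by_area[0]))
```

The two methods only give different results when the spectra differ in shape, for instance when the highest peak is noisy or shifted, since the peak value depends on a single point and the area on all of them.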
|
|
|
|
|
| Julio... |
Posted: Thu Nov 05, 2009 2:12 pm |
|
|
|
Guest
|
[quote]Usually we call that "stepwise selection" and it is most often
a bad idea, in general. In particular, it might help to shed some
light on your data. I'd be curious as to whether you can achieve
your full effective selection with just 2 or 3 variables.
[/quote]
I am implementing the suggestions from this forum slowly, but I want
to try them all.
Today I coded the forward stepwise feature selection (starting with 1
variable and adding one at a time), with the following result:
http://www.compumag2009.com/f/stepwise_feasel.png
(10 variables achieve around 80% classification).
The classification rate using all variables is around 80% as well. |
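A forward stepwise selection of this kind can be sketched as follows. This is not the code used for the figure above: the selection criterion here is the Fisher ratio (squared distance between class means, scaled by the pooled within-class scatter), which is an assumption, as are the function names and the synthetic data.

```python
import numpy as np

def fisher_criterion(X, y, cols):
    """Class-mean separation on the chosen columns, scaled by within-class scatter."""
    A, B = X[y == 0][:, cols], X[y == 1][:, cols]
    d = A.mean(axis=0) - B.mean(axis=0)
    Ac, Bc = A - A.mean(axis=0), B - B.mean(axis=0)
    S_w = Ac.T @ Ac + Bc.T @ Bc                  # pooled within-class scatter
    return float(d @ np.linalg.solve(S_w, d))

def forward_select(X, y, n_keep):
    """Start with no variables and greedily add the one that helps most."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_keep:
        best = max(remaining, key=lambda j: fisher_criterion(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical data: 6 variables, of which only columns 1 and 4 separate the classes.
rng = np.random.default_rng(0)
y = np.array([0] * 150 + [1] * 50)
X = rng.normal(size=(200, 6))
X[y == 1, 1] += 3.0
X[y == 1, 4] += 1.5

chosen = forward_select(X, y, n_keep=2)
print(chosen)
```

Selecting on a cross-validated classification rate instead of the Fisher ratio would only require swapping out `fisher_criterion`; the greedy loop stays the same.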
|
|
|
|
|
| Julio... |
Posted: Fri Nov 06, 2009 4:51 am |
|
|
|
Guest
|
[quote]P.S. Are you sure that the measurements for both classes are
correctly
calibrated?
[/quote]
Hi, Greg
Thanks for your help.
As you can see in http://www.compumag2009.com/f/data_standardized.png,
both classes have spectra of the same order of magnitude.
There is a concern about the raw spectra (the ones before preprocessing
of any kind). They come from colonies which have grown on
top of slides. The transformed ones (class "T") tend to be more
sparse, and therefore generate less intense spectra, but after
normalization, this effect vanishes. |
|
|
|
|
|
| Julio... |
Posted: Fri Nov 06, 2009 5:02 am |
|
|
|
Guest
|
[quote]The nature of the statistical artifact suggests to me that, perhaps,
it could be that some of the 13 000 data-lines are "off" in some
major way, not showing the same peaks and troughs. By providing
the most extreme scores "after normalizing", these cases might
be overly influential in some of the correlations.
Throwing out outliers is something to be done cautiously, if at all.
Is it possible, or useful, to select-out some of the raw data for
reasons *beyond* their appearance? Are there a number of
spectra that look particularly odd?
[/quote]
Actually I have thrown several outliers away. The current dataset
seems to have only spectra that follow more or less the same pattern:
http://www.compumag2009.com/f/data.png
It is hard to answer your question "Is it possible, or useful, to
select-out some of the raw data for reasons *beyond* their
appearance?". I think that some spectra are probably very noisy, even
though they have the standard pattern. Actually I don't know how to
assess spectra for anomalies beyond their appearance. How can I
possibly do that? |
|
|
|
|
|
| Greg Heath... |
Posted: Fri Nov 06, 2009 5:44 pm |
|
|
|
Guest
|
On Nov 5, 7:00 pm, Julio <juliotrevi... at (no spam) gmail.com> wrote:
[quote]For the EEG data that I analyzed, the literature suggested that
"total power" varied enormously between individuals, if not
between sessions. I followed my expert's advice of initially
normalizing to "percent of total power." That would work out
somewhat differently from using "peak." - I also found, for my
data, that scatter-plots showed more regularity if I further
converted the proportions into logits, log (p/(1-p) ).
I tried several pre-processing sequences.
I was not sure about the meaning of "total power", is it the sum of
all the points of the spectrum? I understand that some types of
spectroscopy give measurements in power units, so I assumed it to be
the case in your suggestion. If so, I referred to this type of
normalization as "normalization to area" in the table below.
Table:http://www.compumag2009.com/f/classification.png
One curious thing is the classifier performance varies very little.
The only explanation I found is, as so much mentioned, excess of
variables.
[/quote]
Well, I don't need any variables.
I say all of the data belongs to class T.
My correct classification rate is
CCR% = 100% * (13000-1200)/13000 = 90.77%
Your linear model will have to do better than that.
You need to mitigate for the unbalances in sample size
by using weighted least squares.
Furthermore, you should report the class-conditional error
rates.
Hope this helps.
Greg |
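The baseline argument in this post can be written out as a short calculation. The sketch below assumes that every spectrum is assigned to the larger class (N, with roughly 11800 of the 13000 spectra according to the counts given earlier in the thread); the variable names are hypothetical.

```python
# Approximate class sizes quoted earlier in the thread.
n_N, n_T = 11800, 1200

# Constant classifier: label every spectrum as the majority class N.
correct = n_N
ccr = 100.0 * correct / (n_N + n_T)

# Class-conditional error rates of that constant classifier.
error_N = 100.0 * 0 / n_N        # no N spectrum is misclassified
error_T = 100.0 * n_T / n_T      # every T spectrum is misclassified

print(f"CCR = {ccr:.2f}%, error on N = {error_N:.0f}%, error on T = {error_T:.0f}%")
```

The overall rate looks good only because class N dominates. The class-conditional error rates expose that the rule never detects a single T spectrum, which is why they should be reported alongside the overall rate.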
|
|
|
|
|
| Greg Heath... |
Posted: Fri Nov 06, 2009 5:45 pm |
|
|
|
Guest
|
On Nov 5, 7:12 pm, Julio <juliotrevi... at (no spam) gmail.com> wrote:
[quote]Usually we call that "stepwise selection" and it is most often
a bad idea, in general. In particular, it might help to shed some
light on your data. I'd be curious as to whether you can achieve
your full effective selection with just 2 or 3 variables.
I am implementing the suggestions from this forum slowly, but I want
to try them all.
Today I coded the forward stepwise feature selection (starting with 1
variable and adding one at a time), with the following result:
http://www.compumag2009.com/f/stepwise_feasel.png
(10 variables achieve around 80% classification).
The classification rate using all variables is around 80% as well.
[/quote]
What are the class-conditional error rates?
Greg |
|
|
|
|
|
| Greg Heath... |
Posted: Fri Nov 06, 2009 6:18 pm |
|
|
|
Guest
|
On Nov 6, 10:44 pm, Greg Heath <he... at (no spam) alumni.brown.edu> wrote:
[quote]On Nov 5, 7:00 pm, Julio <juliotrevi... at (no spam) gmail.com> wrote:
For the EEG data that I analyzed, the literature suggested that
"total power" varied enormously between individuals, if not
between sessions. I followed my expert's advice of initially
normalizing to "percent of total power." That would work out
somewhat differently from using "peak." - I also found, for my
data, that scatter-plots showed more regularity if I further
converted the proportions into logits, log (p/(1-p) ).
I tried several pre-processing sequences.
I was not sure about the meaning of "total power", is it the sum of
all the points of the spectrum? I understand that some types of
spectroscopy give measurements in power units, so I assumed it to be
the case in your suggestion. If so, I referred to this type of
normalization as "normalization to area" in the table below.
Table:http://www.compumag2009.com/f/classification.png
One curious thing is the classifier performance varies very little.
The only explanation I found is, as so much mentioned, excess of
variables.
Well, I don't need any variables.
I say all of the data belongs to class T.
[/quote]
WHOOPS! I mean class N.
[quote]My correct classification rate is
CCR% = 100% * (13000-1200)/13000 = 90.77%
Your linear model will have to do better than that.
You need to mitigate for the unbalances in sample size
by using weighted least squares.
[/quote]
W = 1/sqrt( 11800) for class N equations
W = 1/sqrt( 1200) for class T equations
Search Google Groups for
greg-heath unbalances priors
[quote]Furthermore, you should report the class-conditional error
rates.
[/quote]
Hope this helps.
Greg |
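One possible reading of the weighting suggested in this post is sketched below: every equation (row) of a least-squares discriminant is multiplied by 1/sqrt(n_class), so that each class contributes the same total weight to the sum of squared residuals. The +1/-1 target coding, the function name, and the synthetic data are assumptions, not something stated in the thread.

```python
import numpy as np

def class_balanced_wls(X, y):
    """Least-squares linear discriminant with rows scaled by 1/sqrt(n_class)."""
    n_pos, n_neg = np.sum(y == 1), np.sum(y == 0)
    w_row = np.where(y == 1, 1.0 / np.sqrt(n_pos), 1.0 / np.sqrt(n_neg))
    design = np.column_stack([np.ones(len(X)), X])   # intercept plus variables
    target = np.where(y == 1, 1.0, -1.0)             # assumed coding: +1 for T, -1 for N
    # Scaling both sides of each equation by w_row gives the weighted fit.
    coef, *_ = np.linalg.lstsq(design * w_row[:, None], target * w_row, rcond=None)
    return coef, w_row

# Hypothetical unbalanced data: 1180 samples of class N (0), 120 of class T (1).
rng = np.random.default_rng(0)
n_neg, n_pos = 1180, 120
X = np.vstack([rng.normal(0.0, 1.0, size=(n_neg, 3)),
               rng.normal(1.0, 1.0, size=(n_pos, 3))])
y = np.concatenate([np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)])

coef, w_row = class_balanced_wls(X, y)

# The squared weights sum to 1 within each class.
print(np.sum(w_row[y == 0] ** 2), np.sum(w_row[y == 1] ** 2))
```

With these weights the fit behaves as if the two classes had equal prior probabilities, so the small class is no longer sacrificed to the large one.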
|
|
|
|
|
| Ray Koopman... |
Posted: Sat Nov 07, 2009 3:55 pm |
|
|
|
Guest
|
On Oct 29, 1:47 pm, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
[quote]The vector I suggested -- call it z -- is proportional to m = m1-m2 =
the difference between the two mean vectors. You centered your data,
so n2*m1 + n1*m2 = 0, where n1 & n2 are the two sample sizes. [...]
[/quote]
n1*m1 + n2*m2 = 0 |
|
|
|
|
|
| Greg Heath... |
Posted: Sat Nov 07, 2009 10:06 pm |
|
|
|
Guest
|
On Nov 6, 11:18 pm, Greg Heath <he... at (no spam) alumni.brown.edu> wrote:
[quote]On Nov 6, 10:44 pm, Greg Heath <he... at (no spam) alumni.brown.edu> wrote:
On Nov 5, 7:00 pm, Julio <juliotrevi... at (no spam) gmail.com> wrote:
For the EEG data that I analyzed, the literature suggested that
"total power" varied enormously between individuals, if not
between sessions. I followed my expert's advice of initially
normalizing to "percent of total power." That would work out
somewhat differently from using "peak." - I also found, for my
data, that scatter-plots showed more regularity if I further
converted the proportions into logits, log (p/(1-p) ).
I tried several pre-processing sequences.
I was not sure about the meaning of "total power", is it the sum of
all the points of the spectrum? I understand that some types of
spectroscopy give measurements in power units, so I assumed it to be
the case in your suggestion. If so, I referred to this type of
normalization as "normalization to area" in the table below.
Table:http://www.compumag2009.com/f/classification.png
One curious thing is the classifier performance varies very little.
The only explanation I found is, as so much mentioned, excess of
variables.
Well, I don't need any variables.
I say all of the data belongs to class T.
WHOOPS! I mean class N.
My correct classification rate is
CCR% = 100% * (13000-1200)/13000 = 90.77%
Your linear model will have to do better than that.
You need to mitigate for the unbalances in sample size
by using weighted least squares.
W = 1/sqrt( 11800) for class N equations
W = 1/sqrt( 1200) for class T equations
Search Google Groups for
greg-heath unbalances priors
[/quote]
Of course I meant unbalanced. In fact, just try
greg-heath unbalanced
Stop giggling!
[quote]Furthermore, you should report the class-conditional
error rates.
[/quote]
Hope this helps.
Greg |
|
|
|
|
|
| Julio... |
Posted: Sun Nov 08, 2009 11:52 pm |
|
|
|
Guest
|
On Nov 7, 3:45 am, Greg Heath <he... at (no spam) alumni.brown.edu> wrote:
[quote]On Nov 5, 7:12 pm, Julio <juliotrevi... at (no spam) gmail.com> wrote:
Usually we call that "stepwise selection" and it is most often
a bad idea, in general. In particular, it might help to shed some
light on your data. I'd be curious as to whether you can achieve
your full effective selection with just 2 or 3 variables.
I am implementing the suggestions from this forum slowly, but I want
to try them all.
Today I coded the forward stepwise feature selection (starting with 1
variable and adding one at a time), with the following result:
http://www.compumag2009.com/f/stepwise_feasel.png
(10 variables achieve around 80% classification).
The classification rate using all variables is around 80% as well.
What are the class-conditional error rates?
Greg
[/quote]
Hi Greg
Thanks for your help.
These are typical results I get (10-fold cross-validation; dataset
split: 90% training + 10% testing):
Confusion matrix:

                      classified as
                        N       T
state-of-nature   N   81.47   18.53
                  T   25.30   74.70

Classification rate: 80.79% = (81.47%*12054 + 74.70%*1352)/13406
Actually I have 13406 spectra = 12054 N + 1352 T, so a classifier with
no sensitivity to "T" instances would get 12054/13406*100%=89.91%
correct classification. Indeed it seems that the classifier is making
a big sacrifice to achieve some sensitivity.
Julio |
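The figures in this post can be reproduced from the confusion matrix and the class counts. This is a small sketch with hypothetical variable names; the unweighted average of the two per-class rates ("balanced" below) is an extra quantity not mentioned in the thread, included only because it does not depend on the class sizes.

```python
# Class counts and row-normalized confusion matrix (percent) from the post.
n_N, n_T = 12054, 1352
confusion = {
    "N": {"N": 81.47, "T": 18.53},   # true class N
    "T": {"N": 25.30, "T": 74.70},   # true class T
}

total = n_N + n_T

# Overall rate: per-class correct rates weighted by class size.
overall = (confusion["N"]["N"] * n_N + confusion["T"]["T"] * n_T) / total

# Rate of a classifier that always answers "N".
baseline = 100.0 * n_N / total

# Unweighted mean of the per-class correct rates.
balanced = (confusion["N"]["N"] + confusion["T"]["T"]) / 2

print(f"overall = {overall:.2f}%, baseline = {baseline:.2f}%, balanced = {balanced:.2f}%")
```

The always-N rule beats the linear classifier on the overall rate, but its unweighted per-class average would be only 50%, which is one way to see what the classifier gains in exchange for the lower overall figure.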
|
|
|
|
|
| Greg Heath... |
Posted: Thu Nov 12, 2009 10:58 pm |
|
|
|
Guest
|
On Nov 9, 4:52 am, Julio <juliotrevi... at (no spam) gmail.com> wrote:
[quote]On Nov 7, 3:45 am, Greg Heath <he... at (no spam) alumni.brown.edu> wrote:
On Nov 5, 7:12 pm, Julio <juliotrevi... at (no spam) gmail.com> wrote:
Usually we call that "stepwise selection" and it is most often
a bad idea, in general. In particular, it might help to shed some
light on your data. I'd be curious as to whether you can achieve
your full effective selection with just 2 or 3 variables.
I am implementing the suggestions from this forum slowly, but I want
to try them all.
Today I coded the forward stepwise feature selection (starting with 1
variable and adding one at a time), with the following result:
http://www.compumag2009.com/f/stepwise_feasel.png
(10 variables achieve around 80% classification).
The classification rate using all variables is around 80% as well.
What are the class-conditional error rates?
Greg
Hi Greg
Thanks for your help.
These are typical results I get (10-fold cross-validation; dataset
split: 90% training + 10% testing):
Confusion matrix:

                      classified as
                        N       T
state-of-nature   N   81.47   18.53
                  T   25.30   74.70

Classification rate: 80.79% = (81.47%*12054 + 74.70%*1352)/13406
Actually I have 13406 spectra = 12054 N + 1352 T, so a classifier with
no sensitivity to "T" instances would get 12054/13406*100%=89.91%
correct classification. Indeed it seems that the classifier is making
a big sacrifice to achieve some sensitivity.
[/quote]
Are you sure a linear classifier is appropriate?
Maybe you need a quadratic classifier or
neural network.
Hope this helps.
Greg |
|
|
|
|
|
| Greg Heath... |
Posted: Fri Nov 13, 2009 5:35 pm |
|
|
|
Guest
|
On Nov 13, 6:50 pm, Rich Ulrich <rich.ulr... at (no spam) comcast.net> wrote:
[quote]On Fri, 13 Nov 2009 00:58:19 -0800 (PST), Greg Heath
<he... at (no spam) alumni.brown.edu> wrote:
On Nov 9, 4:52 am, Julio <juliotrevi... at (no spam) gmail.com> wrote:
On Nov 7, 3:45 am, Greg Heath <he... at (no spam) alumni.brown.edu> wrote:
On Nov 5, 7:12 pm, Julio <juliotrevi... at (no spam) gmail.com> wrote:
Usually we call that "stepwise selection" and it is most often
a bad idea, in general. In particular, it might help to shed some
light on your data. I'd be curious as to whether you can achieve
your full effective selection with just 2 or 3 variables.
I am implementing the suggestions from this forum slowly, but I want
to try them all.
Today I coded the forward stepwise feature selection (starting with 1
variable and adding one at a time), with the following result:
http://www.compumag2009.com/f/stepwise_feasel.png
(10 variables achieve around 80% classification).
The classification rate using all variables is around 80% as well.
What are the class-conditional error rates?
Greg
Hi Greg
Thanks for your help.
These are typical results I get (10-fold cross-validation; dataset
split: 90% training + 10% testing):
Confusion matrix:

                      classified as
                        N       T
state-of-nature   N   81.47   18.53
                  T   25.30   74.70

Classification rate: 80.79% = (81.47%*12054 + 74.70%*1352)/13406
Actually I have 13406 spectra = 12054 N + 1352 T, so a classifier with
no sensitivity to "T" instances would get 12054/13406*100%=89.91%
correct classification. Indeed it seems that the classifier is making
a big sacrifice to achieve some sensitivity.
By the way -- Yes, perhaps, but it is a necessary sacrifice.
Are you sure a linear classifier is appropriate?
Maybe you need a quadratic classifier or
neural network.
Or something that starts with multiple types.
I have been reminded of the recent post where the Original
Poster suggested a nearest-neighbor classification when
faced with multiple modes for two populations.
Both that one and this one make me think of using cluster
analysis as a first step.
[/quote]
Yes. This could result in a nearest cluster classifier or
a Radial Basis Function neural network.
Hope this helps.
Greg |
|
|
|
|
|
|
|
All times are GMT - 5 Hours
|
|