Nonparametric covariance equality test...
Edward Jensen... (Guest)
Posted: Mon Jun 30, 2008 10:56 am
Hi.
I'm applying Fisher's discriminant analysis to a data set which is not
normally distributed. In Fisher's original formulation, the only requirement
is that the covariance matrices of the groups are equal. Is there a method
to verify this with non-normal data?
Thanks in advance.
Paul Rubin... (Guest)
Posted: Mon Jun 30, 2008 3:32 pm
Edward Jensen wrote:
> Hi.
> I'm applying Fisher's discriminant analysis to a data set which is not
> normally distributed. In Fisher's original formulation, the only requirement
> is that the covariance matrices of the groups are equal. Is there a method
> to verify this with non-normal data?
I don't know how to test equality of covariance matrices off-hand, but ...
Fisher's discriminant function is optimal (in an expected accuracy
sense) if the samples are multivariate normal with equal covariance
matrices. If they're not normal, and not particularly close to normal,
I'm not sure that Fisher's model is a good choice.
If the populations are multivariate normal but with unequal covariance
matrices, the optimal (again with respect to expected accuracy) function
is a quadratic, due I believe to someone named Smith. Unless you are
wedded to having a linear function (or wedded to citing Fisher, not a
bad name to drop with respect to reviewers), you might consider using
Smith's function. Again, normality is assumed (with the usual
hand-waving for "almost normal is good enough").
For thoroughly non-Gaussian data (e.g., Bernoulli attributes), I'd look
elsewhere.
/Paul
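One nonparametric route to the original question, not proposed by either poster, is a permutation test: centre each group on its own mean, measure how far apart the two sample covariance matrices are, and compare that distance with what random relabelling of the pooled observations produces. The sketch below uses only the standard library. The function names, the squared Frobenius distance and the default of 999 permutations are illustrative choices. Centring on estimated means makes the test approximate rather than exact, and strictly speaking it checks whether the centred groups are exchangeable, which is a stronger condition than equal covariances.

```python
import random

def covariance(rows):
    """Sample covariance matrix (list of lists) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def center(rows):
    """Subtract the group's own mean so that only spread, not location, differs."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def cov_distance(a, b):
    """Squared Frobenius distance between the two sample covariance matrices."""
    ca, cb = covariance(a), covariance(b)
    return sum((x - y) ** 2 for ra, rb in zip(ca, cb) for x, y in zip(ra, rb))

def permutation_cov_test(group_a, group_b, n_perm=999, seed=0):
    """Approximate p-value for H0: both groups share one covariance matrix."""
    rng = random.Random(seed)
    a, b = center(group_a), center(group_b)
    observed = cov_distance(a, b)
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the centred observations
        if cov_distance(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)
```

Calling `permutation_cov_test(group_a, group_b)` with two lists of feature vectors returns a p-value; a small value is evidence against equal covariance matrices.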
Edward Jensen... (Guest)
Posted: Mon Jun 30, 2008 4:32 pm
"Paul Rubin" <rubin at (no spam) msu.edu> wrote in message
news:xpbak.11746$N87.7414 at (no spam) nlpi068.nbdc.sbc.com...
> Edward Jensen wrote:
>> Hi.
>> I'm applying Fisher's discriminant analysis to a data set which is not
>> normally distributed. In Fisher's original formulation, the only
>> requirement is that the covariance matrices of the groups are equal. Is
>> there a method to verify this with non-normal data?
> I don't know how to test equality of covariance matrices off-hand, but ...
> Fisher's discriminant function is optimal (in an expected accuracy sense)
> if the samples are multivariate normal with equal covariance matrices.
> If they're not normal, and not particularly close to normal, I'm not sure
> that Fisher's model is a good choice.
It's interesting you mention it, because that's a point I never fully
understood. I know that LDA with normal populations is optimal with respect
to expected cost of misclassification and equivalently the total probability
of misclassification, but in what sense can Fisher's original formulation be
considered optimal? According to my source (Johnson & Wichern) Fisher's
linear combination maximally separates the sample means.
Is the point that only by assuming normality we are guaranteed a minimal
probability of misclassification, and with non-normal data we can't say
anything about population separability but only sample means separability?
If that is the case, then I guess Fisher's original formulation is only good
for exploratory data analysis and thusly can be used as a classification
rule.
> For thoroughly non-Gaussian data (e.g., Bernoulli attributes), I'd look
> elsewhere.
Actually, my data is a feature vector extracted from time series. With LDA
I have obtained a successful classification ratio of 94.5% (expected actual
error rate with cross-validation procedure). I have also tried a logistic
regression model and got almost the same result. Can you recommend a better
classifier for this type of problem? The concept of support vector machines
pops up quite a lot in the literature, but I have never looked into it. I
would prefer to stay in a statistical framework, where proper inteference
methods can be applied, but that might be difficult without normality.
Thanks for the feedback.
Paul Rubin... (Guest)
Posted: Mon Jun 30, 2008 8:03 pm
Edward Jensen wrote:
>> Fisher's discriminant function is optimal (in an expected accuracy sense)
>> if the samples are multivariate normal with equal covariance matrices.
>> If they're not normal, and not particularly close to normal, I'm not sure
>> that Fisher's model is a good choice.
> It's interesting you mention it, because that's a point I never fully
> understood. I know that LDA with normal populations is optimal with respect
> to expected cost of misclassification and equivalently the total probability
> of misclassification, but in what sense can Fisher's original formulation be
> considered optimal? According to my source (Johnson & Wichern) Fisher's
> linear combination maximally separates the sample means.
Write down the expected cost of misclassification in general terms,
summing (across classes) the product of prior probability, the
misclassification cost, and the integral of the density for this class
over the region where you classify into some _other_ class (which is 1
minus the integral over the region where you classify into _this_
class). Don't try to define the classification regions in functional
terms, just write them as sets (whose disjoint union is the entire
feature space). Now suppose you are pointed to a very small ball in
feature space (whose probability under any of the class distributions is
approximately the density at the center point times the volume of the
ball) and told you can stick it in any classification region you like.
You get the biggest reduction in expected cost by assigning it to the
class that maximizes the product of misclassification cost, prior
probability and density at the ball's center. Now if you write that
condition (the product for this class is at least as large as the product
for each other class), substitute the normal density (with common
covariance but different mean for each class) and do a little algebra, you
come up with Fisher's formula -- which is a bit of a revelation,
considering that we started imposing no restrictions on how you draw the
boundaries of the classification regions (stick the ball anywhere you
like) and ended up with a linear classifier.
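Written out, the "little algebra" might look like the following. The notation is assumed here rather than taken from the thread, and it uses a single misclassification cost per class, as in the description above.

```latex
% Assumed notation (not from the thread): c_k = cost of misclassifying a
% class-k observation, q_k = prior probability, f_k = class density,
% \mu_k = class mean, \Sigma = common covariance, d = dimension.
\begin{align*}
\text{assign } x \text{ to class } k
  &\iff c_k q_k f_k(x) \ge c_j q_j f_j(x) \quad \text{for all } j \ne k, \\
f_k(x)
  &= (2\pi)^{-d/2} \lvert\Sigma\rvert^{-1/2}
     \exp\!\Bigl(-\tfrac12 (x-\mu_k)^\top \Sigma^{-1} (x-\mu_k)\Bigr), \\
\log\bigl(c_k q_k f_k(x)\bigr)
  &= \underbrace{-\tfrac12 x^\top \Sigma^{-1} x
       - \tfrac12 \log\bigl((2\pi)^d \lvert\Sigma\rvert\bigr)}_{\text{identical for every class}}
     + \underbrace{\mu_k^\top \Sigma^{-1} x
       - \tfrac12 \mu_k^\top \Sigma^{-1} \mu_k + \log(c_k q_k)}_{\delta_k(x)}.
\end{align*}
% Two classes: choose class 1 exactly when \delta_1(x) \ge \delta_2(x), i.e.
\[
(\mu_1-\mu_2)^\top \Sigma^{-1} x \;\ge\;
\tfrac12 (\mu_1-\mu_2)^\top \Sigma^{-1} (\mu_1+\mu_2)
+ \log\frac{c_2 q_2}{c_1 q_1}.
\]
```

The quadratic term in x is the same for every class only because the covariance is shared, which is why the comparison reduces to scores that are linear in x.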
Repeat this analysis without the assumption of equal covariances (but
keeping the assumption of normality) and you get Smith's quadratic function.
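To make the contrast concrete, here is a small sketch of both scoring rules. It is restricted to two features so that the matrix inverse can be written by hand, and the function names, equal default priors and unit default costs are all assumptions made for illustration. With a shared covariance the two scores differ only by a term common to every class, so they pick the same class; with unequal covariances the log-determinant and the full quadratic form stay in the score.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad_form(u, m, v):
    """u^T m v for 2-vectors u, v and a 2x2 matrix m."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

def quadratic_score(x, mean, cov, prior=0.5, cost=1.0):
    """log(cost * prior * normal density), dropping the (2*pi) constant."""
    diff = [x[0] - mean[0], x[1] - mean[1]]
    return (-0.5 * math.log(det2(cov)) - 0.5 * quad_form(diff, inv2(cov), diff)
            + math.log(cost * prior))

def linear_score(x, mean, cov, prior=0.5, cost=1.0):
    """Score when every class shares `cov`: terms common to all classes dropped."""
    s_inv = inv2(cov)
    return (quad_form(mean, s_inv, x) - 0.5 * quad_form(mean, s_inv, mean)
            + math.log(cost * prior))

def classify(x, classes, score):
    """Index of the class with the largest score; `classes` holds (mean, cov) pairs."""
    scores = [score(x, mean, cov) for mean, cov in classes]
    return scores.index(max(scores))
```

For example, with class 0 centred at the origin with identity covariance and class 1 centred at (2, 0) with covariance 9 times the identity, `classify((-5.0, 0.0), classes, quadratic_score)` picks class 1 even though the point lies nearer the class 0 mean, because the wider class has far more density out there.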
> Is the point that only by assuming normality we are guaranteed a minimal
> probability of misclassification,
Technically "minimal expected cost", but if the misclassification costs
are equal, then yes, minimal expected misclassification rate.
> and with non-normal data we can't say
> anything about population separability but only sample means separability?
I'm not sure how to interpret this. I come from a background in convex
analysis, so to me "separability" means I can stick a surface in the
feature space and the Montagues will stay on their side and the Capulets
will stay on theirs. That's not true even if you assume multivariate
normal with equal covariances; you still have a nonzero probability of
observations from either population straying to the wrong side of the
classification hyperplane.
The point (to me) is that we can say that Fisher's classifier minimizes
expected error rate/cost only if both assumptions hold: multivariate
normality and equal covariance.
> If that is the case, then I guess Fisher's original formulation is only good
> for exploratory data analysis and thusly can be used as a classification
> rule.
You lost me there.
>> For thoroughly non-Gaussian data (e.g., Bernoulli attributes), I'd look
>> elsewhere.
> Actually, my data is a feature vector extracted from time series. With LDA
> I have obtained a successful classification ratio of 94.5% (expected actual
> error rate with cross-validation procedure). I have also tried a logistic
> regression model and got almost the same result. Can you recommend a better
> classifier for this type of problem? The concept of support vector machines
> pops up quite a lot in the literature, but I have never looked into it. I
> would prefer to stay in a statistical framework, where proper inteference
> methods can be applied, but that might be difficult without normality.
I assume you mean "inference" methods, although with statistics you
never know. :-)
SVMs are supported by statistical theory (Vapnik and Chervonenkis, say).
I don't know about inference (I'm not an expert on this) (or anything
else), but you always have something like jackknifing if you need some
kind of interval estimates. One virtue of SVMs is that you get pretty
nonlinear in your classifiers without paying a big computational cost.
(Think of this as embedding in higher dimensional spaces on the cheap,
in much the same way that a polynomial model is just a linear model
after you embed the original feature space in a larger one containing
products and powers of the features.) The catch, to me, is that you
need some sort of insight/intuition into picking the kernel function for
the SVM. (Plus referees may not be familiar with it.)
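The "embedding on the cheap" remark can be checked numerically. As a minimal sketch, assuming two input features and the degree-2 polynomial kernel (both chosen here for illustration, not taken from the thread), the kernel value computed in the original 2-D space equals an ordinary dot product after an explicit 6-D embedding of products and powers.

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel (x . y + 1)^2 on 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1] + 1.0) ** 2

def feature_map(x):
    """Explicit 6-D embedding whose ordinary dot product equals poly_kernel."""
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x[0], r2 * x[1], x[0] ** 2, x[1] ** 2, r2 * x[0] * x[1]]

def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))
```

For any pair of points, `poly_kernel(x, y)` and `dot(feature_map(x), feature_map(y))` agree up to floating-point rounding, so a method that only needs dot products can work in the larger space without ever building it.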
I've dinked around with other approaches. If you directly minimize the
number of misclassifications in your training sample, V-C theory says
that asymptotically you'll get the optimal classifier. The good news is
that this is utterly free of parametric assumptions. The bad news is
that the underlying problem is an NP-hard optimization problem, which
means "asymptotically" is a long way off.
You might want to look at something like nearest neighbor methods;
they're pretty easy to implement, they're well accepted, and I *think*
(not sure) the question of inference has been dealt with.
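For a sense of how little code nearest neighbor needs, here is a minimal sketch with a leave-one-out error estimate, in the spirit of the cross-validated rate quoted above. Euclidean distance, k = 3 and the function names are assumptions made for illustration; features on very different scales would usually be standardized first.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training points closest to `query` (Euclidean)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(xs, ys, k=3):
    """Fraction of points classified correctly when each is held out in turn."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        correct += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return correct / len(xs)
```

`leave_one_out_accuracy(xs, ys)` takes a list of feature vectors and a matching list of labels, and its result can be set beside the cross-validated LDA and logistic regression rates.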
> Thanks for the feedback.
You're welcome.
/Paul
All times are GMT - 5 Hours