The paradox of PCA for improving learning (Image Processing Forum)

Author: Guest
Posted: Mon Dec 04, 2006 9:01 am

Hi,

What is really the motivation behind all the work that applies PCA to reduce dimensionality and then runs a classifier in the reduced space? Is there any actual improvement in learning?

I read somewhere that this is due to Ockham's razor: simpler models that can explain the data are preferred over complex ones. Similarly, computational learning theory provides bounds on the generalization error based on the sum of the empirical error and the model's complexity; hence models with fewer degrees of freedom are preferred.

Therefore, say we want to use a linear classifier to distinguish faces from non-faces. A classifier applied directly to raw images requires N weights if each image has N pixels. However, projecting the images onto the most informative eigenfaces yields compact features of length n, where n << N, so the linear classifier needs far fewer weights. As mentioned above, we would then expect better generalization from the latter classifier because it has less expressive capacity (while the dimensionality reduction preserves a good portion of the original information).
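A minimal NumPy sketch of the projection step described above, assuming each image is flattened into one row of a matrix and that the eigenfaces are obtained from an SVD of the mean-centered data (one standard way to compute PCA; the post does not say how they were computed). The function names `fit_eigenfaces` and `project` and the random 20x20 "images" are hypothetical placeholders, not real face data.

```python
import numpy as np

def fit_eigenfaces(X, n):
    """Return the mean image and the top-n eigenfaces of X.

    X has shape (num_images, N), one flattened image per row.
    The rows of Vt from the SVD of the centered data are the principal
    directions, i.e. the leading eigenvectors of the pixel covariance.
    """
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n].T  # shapes (N,) and (N, n)

def project(X, mean, U):
    """Map images of length N to compact features of length n."""
    return (X - mean) @ U

# Synthetic stand-in data: 50 random "images" of 20x20 = 400 pixels.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 400))

mean, U = fit_eigenfaces(X, n=10)
Z = project(X, mean, U)
print(X.shape[1] + 1, "weights (incl. bias) for a linear classifier on raw pixels")
print(Z.shape[1] + 1, "weights (incl. bias) for a linear classifier on eigenface features")
```

The classifier itself only ever sees `Z`, whose rows have length n = 10 instead of N = 400.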

However, something that confuses me is that we did not count the eigenvectors as parameters. In fact, given only the classifier with its learned weights in the reduced space, we cannot say whether a test image is a face or a non-face without the computed eigenfaces, simply because we cannot extract the features on which the classifier operates.
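A hedged sketch of what test-time classification involves, assuming the reduced-space classifier is a plain linear rule sign(w·z + b) on features z = U^T(x - mean), where the columns of U are the n eigenfaces and the mean image comes from the usual PCA centering (an assumption; the post does not mention centering). The second function rewrites the composed rule as an equivalent linear rule on raw pixels, which is another way of seeing that the eigenfaces are part of the decision rule. All names and the random parameters are hypothetical.

```python
import numpy as np

def predict_reduced(x, mean, U, w, b):
    """Classify one flattened image x with a linear rule in eigenface space.

    The learned weights (w, b) act on z = U^T (x - mean), so the mean image
    and the eigenfaces U must be available at test time to compute z.
    """
    z = U.T @ (x - mean)
    return 1 if w @ z + b > 0 else -1

def fold_into_pixel_space(mean, U, w, b):
    """Rewrite the same decision rule as a linear rule on raw pixels.

    w^T U^T (x - mean) + b == (U w)^T x + (b - (U w)^T mean)
    """
    w_pixels = U @ w
    return w_pixels, b - w_pixels @ mean

# Hypothetical parameters with N = 400 pixels and n = 10 eigenfaces.
rng = np.random.default_rng(1)
mean = rng.normal(size=400)
U, _ = np.linalg.qr(rng.normal(size=(400, 10)))  # orthonormal columns
w, b = rng.normal(size=10), 0.5
x = rng.normal(size=400)

w_pix, b_pix = fold_into_pixel_space(mean, U, w, b)
same = predict_reduced(x, mean, U, w, b) == (1 if w_pix @ x + b_pix > 0 else -1)
print(same)
```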

Now the most interesting part: if we count the eigenimages used for the projection, we have in fact increased the number of free parameters needed for classification. For example, consider 20x20 images. A linear classifier in the original space needs 400 weights (401 with a bias), which seems like a huge number of parameters. So let's use PCA, which gives us 400 new basis vectors, each with 400 elements. Say we keep only 10 eigenfaces; then 400*10 = 4000 numbers are crucial for our dimensionality reduction. Although the linear classifier in the reduced space needs only 10 weights (11 with a bias), adding these to the 4000 parameters required for feature extraction gives 4011, which is much larger than the original 401 weights. So isn't this observation really a negative point for arguments about generalization based on the number of free parameters?
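The parameter count in the example above can be reproduced in a few lines of Python. The function name `count_parameters` is hypothetical, and the optional `count_mean` flag reflects an assumption not made in the post: a typical mean-centered PCA also stores the mean image, which would add another 400 numbers to the feature-extraction side.

```python
def count_parameters(num_pixels, num_eigenfaces, count_mean=False):
    """Tally the numbers needed at test time, following the post's bookkeeping.

    Raw-pixel linear classifier: one weight per pixel plus a bias.
    PCA route: num_eigenfaces basis vectors of num_pixels elements each,
    plus a linear classifier with one weight per eigenface and a bias.
    If count_mean is True, also count the stored mean image.
    """
    raw = num_pixels + 1
    projection = num_pixels * num_eigenfaces
    if count_mean:
        projection += num_pixels
    reduced = num_eigenfaces + 1
    return raw, projection + reduced

print(count_parameters(20 * 20, 10))                   # (401, 4011)
print(count_parameters(20 * 20, 10, count_mean=True))  # (401, 4411)
```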

Thanks,
H.M.

Author: Speedy (Guest)
Posted: Tue Dec 05, 2006 4:56 am

hmobahi@gmail.com wrote:
> say if we want to use a linear classifier to learn faces vs
> non-faces, the classifier that is directly applied to raw images
> requires N weights if the image has N pixels. However, projecting
> images to the most informative eigen faces results in compact features
> of length n where n << N. Now the linear classifier needs much fewer
> weights.
> [...]
> However, something that is confusing to me is that we did not count the
> eigen vectors as parameters.

This is because PCA is agnostic to the task: it has no knowledge of what the classifier is asked to do. Instead, PCA looks only at the data themselves and tries to extract their intrinsic properties (namely, the principal components). For this reason, PCA is usually counted as part of feature extraction rather than classification.
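To illustrate the point, here is a minimal NumPy sketch in which the PCA step is fitted once from the images alone and then reused, unchanged, for two different labellings of the same data. The least-squares linear classifier is an assumed stand-in (the thread does not name a classifier), the function names are hypothetical, and the data and labels are random placeholders.

```python
import numpy as np

def fit_pca(X, n):
    """Unsupervised step: uses only the images X, never the labels."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n].T

def fit_linear_classifier(Z, y):
    """Supervised step: least-squares fit of y ~ Z w + b (a stand-in classifier)."""
    A = np.hstack([Z, np.ones((Z.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 400))          # synthetic "images"
y_task_a = rng.choice([-1.0, 1.0], 60)  # one labelling of the data
y_task_b = rng.choice([-1.0, 1.0], 60)  # a different, unrelated labelling

mean, U = fit_pca(X, n=10)              # computed once, without any labels
Z = (X - mean) @ U
w_a, b_a = fit_linear_classifier(Z, y_task_a)
w_b, b_b = fit_linear_classifier(Z, y_task_b)
# The eigenfaces are shared; only the 11 classifier parameters differ per task.
print(U.shape, w_a.shape, w_b.shape)
```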

Hope this helps,
Marcus
All times are GMT - 5 Hours