Simple probability...
Ray Koopman... (Guest)
Posted: Tue Oct 20, 2009 8:06 pm
On Oct 20, 3:00 pm, "w.ccarleton" <w.ccarleton at (no spam) gmail.com> wrote:
[quote]On Oct 20, 5:37 pm, Rich Ulrich <rich.ulrich at (no spam) comcast.net> wrote:
On Tue, 20 Oct 2009 07:34:29 -0700 (PDT), "w.ccarleton"
<w.ccarleton at (no spam) gmail.com> wrote:
On Oct 19, 6:43 pm, Rich Ulrich <rich.ulrich at (no spam) comcast.net> wrote:
On Sun, 18 Oct 2009 23:50:12 -0700 (PDT), Ray Koopman
<koopman at (no spam) sfu.ca> wrote:
On Oct 15, 10:21 am, "w.ccarleton" <w.ccarleton at (no spam) gmail.com>
wrote:
On Oct 15, 3:53 am, Ray Koopman <koopmsn at (no spam) sfu.ca> wrote:
[snip, preceding]
But if you are willing to change the question a little then there is
another answer that is just as simple and requires no assumptions.
Do you really want to know what proportion _have_ both A and B?
Wouldn't it be equally informative to know instead what proportion
_lack_ both A and B; or, equivalently, what proportion have at least
one of A or B? Unless one of those options is clearly better than
the other, I would suggest averaging the two approaches, which leads
to simply pAi+pBi as the combination measure. This would be equivalent
to giving each house a "feature count" score, and then using the
average feature count at each level in the PCA. It can also be easily
extended to situations where there are more than two features, and
(imho) is more in the "linear combination" spirit of PCA than the
proportion having or lacking all the features in the set would be.
That's closer to the spirit of my own post on the 14th -- coming
up with a score.
I suggested a couple of other versions of score, which may be
interesting since there are unequal proportions.
[snip, rest]
--
Rich Ulrich
Thanks again Rich and Ray,
When you say pAi + pBi is equivalent to a 'feature count score' I'm
not exactly following. I had some similar trouble understanding the
scoring system that Rich suggested. Could one or both of you try to
explain that to me again? If I understand it, then I would be able
to add the percentage of houses that contain A to the percentage of
houses that contain B at each level and use that as the new variable.
My confusion comes from the possibility of ending up with more than 100%
of houses with both features - I guess I'm asking: why is it valid to
add percentages?
Well, in the sense that you merely want an indicator variable
with a continuous scale, adding percentages gives you one.
If you want a meaningful number, you may (say) assume
independence, and estimate the "percentage with both"
or "percentage with neither", or whatever cell you deem
interesting -- so that it will be easiest to talk about.
I originally thought that you had the flexibility of scoring
each house, and that led me astray.
--
Rich Ulrich
Oh I see... okay so it says nothing meaningful in and of itself,
[/quote]
But it is meaningful! Don't think of pA and pB as proportions or
percents, but as means of the "real" variables, A and B, which are
indicators. Each house has a score on each variable: 1 = present,
0 = absent. With this coding, the mean of each variable happens to
also be a proportion, but it's primarily a mean.
Then make a new variable, T=A+B. Each house has a score of 0, 1, or 2,
which equals the number of features it has. The mean of T is the
sum of the means of A and B, or pA+pB, and is between 0 and 2.
There is nothing new in what I'm saying. "Symptom counting" is a
standard technique in my field (psychology). What, if anything,
makes this case different is that there are only two features.
Usually there are several, often many.
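A minimal sketch of the feature-count idea in Python, using made-up data for five houses at a single level. The data and the names `houses` and `feature_summary` are illustrative assumptions, not anything from the actual study. It checks that the mean of T equals pA+pB, and that the average of "proportion with both" and "proportion with at least one" equals (pA+pB)/2, which is the same measure up to a constant factor.

```python
from fractions import Fraction

# Hypothetical indicator data for one level: (A, B) per house,
# where 1 = feature present and 0 = feature absent.
houses = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1)]


def feature_summary(rows):
    """Return exact means and proportions for a list of (A, B) indicators."""
    n = len(rows)
    counts = [a + b for a, b in rows]  # T = A + B, the feature count per house
    return {
        "pA": Fraction(sum(a for a, _ in rows), n),  # mean of A
        "pB": Fraction(sum(b for _, b in rows), n),  # mean of B
        "mean_T": Fraction(sum(counts), n),          # mean feature count
        "p_both": Fraction(sum(t == 2 for t in counts), n),
        "p_any": Fraction(sum(t >= 1 for t in counts), n),
    }


s = feature_summary(houses)
print(s["pA"], s["pB"], s["mean_T"])  # prints: 3/5 3/5 6/5

# The mean of the count is the sum of the two indicator means.
assert s["mean_T"] == s["pA"] + s["pB"]
# Averaging "both" and "at least one" gives half of pA + pB.
assert (s["p_both"] + s["p_any"]) / 2 == (s["pA"] + s["pB"]) / 2
```

Here the mean of T is 6/5 = 1.2 features per house. That is an average count on a 0-to-2 scale, not "120% of houses", which is why adding the two proportions is legitimate.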
[quote]but it gives me a single variable so that I can satisfy the PCA
assumptions and carry on with my life. Do either of you happen to have
a reference that I could read in which this technique may have been
used? I think that I can rationally defend it, but my committee will
likely want to see that it has been used successfully elsewhere.
Thanks very much for your time (both Rich and Ray).
Chris[/quote]
w.ccarleton... (Guest)
Posted: Wed Oct 21, 2009 3:11 am
On Oct 21, 2:06 am, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
[quote][snip, preceding post quoted in full][/quote]
Ah the light just went on... of course it ranges from 0 to 2. I think
that this will work. I should also be able to find an article, or even
a text book, that discusses 'symptom counting' if that's the common
term applied to the technique. Thanks very much!
Chris
All times are GMT - 5 Hours