Science Forum Index  »  Statistics - Math Forum  »  Changes in Two Proportions...
...
Posted: Tue Jul 22, 2008 12:17 am
Guest
Let's say the Group A sample starts at 10%, and the Group B sample
starts at 15%. If each group improves by an absolute 5%, then they
went to 15% and 20%, respectively.

Or I could look at it on a relative basis, where Group A increased by
a relative 50% and Group B by a relative 33%.

Or I could extrapolate to the overall populations and look at the
simple numerical increases, except that Group A is much smaller than
Group B.

I can't quite figure out what is a truly fair pre-post comparison.

I can't use simple population numbers, because the population sizes
are too different.

I can't use percentages, because the 100% upper bound constrains the
population starting at the higher percentage. For example, a student
who always scores 95% on a test is less able to improve than a student
who scores 70% on a test.

How to frame this for meaningful comparison?

Roy - Carpe Noctem
RichUlrich...
Posted: Tue Jul 22, 2008 1:46 pm
Guest
On Mon, 21 Jul 2008 22:17:18 -0700, rayo at (no spam) home.NOT wrote:

Quote:
Let's say the Group A sample starts at 10%, and the Group B sample
starts at 15%. If each group improves by an absolute 5%, then they
went to 15% and 20%, respectively.

This is the additive model, achieved by ANOVA.

Quote:

Or I could look at it on a relative basis, where Group A increased by
a relative 50% and Group B by a relative 33%.

Relative Risk is usually a bad basis for modeling, compared
to looking at Odds Ratios, although the two will not differ much
for low fractions. But if you reverse your labels, your two changes
are "90% to 85%" and "85% to 80%". Those are not at all the same
in description as your first use of Relative Risk, but the Odds
Ratio approach is invariant to the direction.

You can get this explicitly with log-linear or logistic models.
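
To make the label-reversal point concrete, here is a small Python sketch using the percentages from the question. The function names are made up for illustration and do not come from any particular package.

```python
def relative_risk(p_before, p_after):
    """Ratio of the 'after' proportion to the 'before' proportion."""
    return p_after / p_before


def odds_ratio(p_before, p_after):
    """Ratio of the 'after' odds to the 'before' odds, with odds = p / (1 - p)."""
    return (p_after / (1 - p_after)) / (p_before / (1 - p_before))


# Percentages from the question: Group A 10% -> 15%, Group B 15% -> 20%.
groups = {"A": (0.10, 0.15), "B": (0.15, 0.20)}

for name, (before, after) in groups.items():
    # The same change described with reversed labels (e.g. 90% -> 85%).
    rev_before, rev_after = 1 - before, 1 - after
    print(
        f"Group {name}: "
        f"RR={relative_risk(before, after):.3f}, "
        f"RR(reversed)={relative_risk(rev_before, rev_after):.3f}, "
        f"OR={odds_ratio(before, after):.3f}, "
        f"OR(reversed)={odds_ratio(rev_before, rev_after):.3f}"
    )
```

The reversed odds ratio is the reciprocal of the original one, so the description is the same in either direction. The two relative risks are not reciprocals of each other.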

Quote:

Or I could extrapolate to the overall populations and look at the
simple numerical increases, except that Group A is much smaller than
Group B.

I can't quite figure out what is a truly fair pre-post comparison.

I can't use simple population numbers, because the population sizes
are too different.

I can't use percentages, because the 100% upper bound constrains the
population starting at the higher percentage. For example, a student
who always scores 95% on a test is less able to improve than a student
who scores 70% on a test.

How to frame this for meaningful comparison?

The logistic approach is also competitive with the Probit
approach. Probits assume that the underlying distributions --
the generating system that leads to these results --
are Normal, instead of logistic. And there are other
possibilities. I remember a brief discussion of this sort
of issue in DJ Finney's book on Bioassay. The right test
is the one that reflects or preserves the generating
mechanism for the fractions.

In short, there is no analysis that is automatically right.

Usually, the choice does not matter for the tests that
are on hand, because they will all give the same results,
especially when the fractions are in the middle (say,
between 0.20 and 0.80). Especially for extremes, you
may need to argue that one Model is more appropriate
than the others, or else give the reader the choice of
analyses and interpretations.
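
As a rough illustration of how the scales compare, the sketch below expresses the same 5-point changes as an absolute difference, a logit difference, and a probit difference. It only compares descriptive scales and runs no significance test. The helper names are hypothetical, and the probit uses the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def logit(p):
    """Log-odds of a proportion."""
    return math.log(p / (1 - p))


def probit(p):
    """Inverse standard Normal CDF of a proportion."""
    return NormalDist().inv_cdf(p)


def change_on_scales(p_before, p_after):
    """Express one change on the additive, logistic and probit scales."""
    return {
        "absolute": p_after - p_before,
        "logit": logit(p_after) - logit(p_before),
        "probit": probit(p_after) - probit(p_before),
    }


# Two mid-range steps and the two steps from the question.
pairs = [(0.45, 0.50), (0.50, 0.55), (0.10, 0.15), (0.15, 0.20)]
for before, after in pairs:
    changes = change_on_scales(before, after)
    summary = ", ".join(f"{scale}={value:.3f}" for scale, value in changes.items())
    print(f"{before:.2f} -> {after:.2f}: {summary}")
```

The two mid-range steps, which are symmetric around 0.5, come out equal on every scale. The steps from 10% to 15% and from 15% to 20% are equal on the absolute scale but differ on the logit and probit scales.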

One area where a modeling debate was active, when I took
health courses in the 1970s, was in the interpretation of the
effects of low levels of radiation. The arguments were both
medical and statistical, with practical applications for power
plants or for nuclear emergencies.
Are there the same Expected Deaths when you expose 100
people to 10 REM versus 10 people to 100 REM? More? Less?
(Further, is "deaths" the only outcome that matters?)

--
Rich Ulrich
...
Posted: Tue Jul 22, 2008 11:16 pm
Guest
Thanks,

These are not matched pairs. This is data from two separate years of a
large national survey.

I have seen a substantial change for a particular measure. Now I'm
teasing out which sub-populations changed the most.

And looking for the clearest way to portray it in a table.
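
One possible layout for such a table is sketched below in Python. It puts the absolute difference, the relative change, and the odds ratio side by side for each sub-population. The two rows reuse the Group A and Group B percentages from the first post; the function names and column choices are assumptions, and the sketch ignores sampling error and survey weights.

```python
def describe_change(p_before, p_after):
    """Summarize one pre-post change on three descriptive scales."""
    odds_before = p_before / (1 - p_before)
    odds_after = p_after / (1 - p_after)
    return {
        "abs_diff": p_after - p_before,
        "rel_change": p_after / p_before - 1,
        "odds_ratio": odds_after / odds_before,
    }


def format_table(rows):
    """Build a plain-text table; rows is a list of (label, p_before, p_after)."""
    header = (
        f"{'Group':<8}{'Before':>8}{'After':>8}"
        f"{'Abs diff (pts)':>16}{'Rel change':>12}{'OR':>8}"
    )
    lines = [header]
    for label, before, after in rows:
        d = describe_change(before, after)
        lines.append(
            f"{label:<8}{before:>8.1%}{after:>8.1%}"
            f"{d['abs_diff'] * 100:>+16.1f}{d['rel_change']:>+12.0%}"
            f"{d['odds_ratio']:>8.2f}"
        )
    return "\n".join(lines)


# Illustrative rows taken from the first post, not real survey results.
print(format_table([("A", 0.10, 0.15), ("B", 0.15, 0.20)]))
```

Showing all three measures lets the reader see where the ranking of sub-populations depends on the scale chosen.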

Roy - Carpe Noctem
Bruce Weaver...
Posted: Wed Jul 23, 2008 1:36 am
Guest
On Jul 23, 12:16 am, r... at (no spam) home.NOT wrote:
Quote:
Thanks,

These are not matched pairs. This is data from two separate years of a
large national survey.

I have seen a substantial change for a particular measure. Now I'm
teasing out which sub-populations changed the most.

And looking for the clearest way to portray it in a table.

Note: The author of this message requested that it not be
archived. This message will be removed from Groups in
6 days (Jul 30, 12:16 am).


If I may be so bold, why do you ask that your posts not be archived by
Google Groups? Doing so can make it unnecessarily difficult for
someone to follow the thread in future.

--
Bruce Weaver
bweaver at (no spam) lakeheadu.ca
www.angelfire.com/wv/bwhomedir
"When all else fails, RTFM."
RichUlrich...
Posted: Wed Jul 23, 2008 3:36 pm
Guest
[rayo, top-posting in response to my Reply ...]

On Tue, 22 Jul 2008 21:16:08 -0700, rayo at (no spam) home.NOT wrote:

Quote:
Thanks,

These are not matched pairs. This is data from two separate years of a
large national survey.

I have seen a substantial change for a particular measure. Now I'm
teasing out which sub-populations changed the most.

And looking for the clearest way to portray it in a table.


I hope that is just a volunteering of extra information; I hope
that you do not mistake my advice as applying only to matched pairs.

So, if your fractions are all in the mid-range, .2 to .8, you can
settle for the simple, additive model. When some of the fractions are
small, the multiplicative description is probably better. If some
fractions are high, then you do need to either use Odds Ratios
or consider the relative sizes of (1-P).
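
Read literally, that rule of thumb can be written as a small Python helper. This is only a sketch: the function name is hypothetical, the .2 and .8 cut-offs are the ones quoted above, and preferring the odds ratio when both small and high fractions occur is an added assumption.

```python
def suggest_scale(proportions, low=0.2, high=0.8):
    """Suggest a descriptive scale from the rule of thumb; not a formal test."""
    if any(p > high for p in proportions):
        # High fractions: the ceiling at 1.0 matters.
        return "odds ratio (or compare the 1-P values)"
    if any(p < low for p in proportions):
        # Small fractions: relative change reads more naturally.
        return "multiplicative (relative change)"
    # Everything in the mid-range.
    return "additive (absolute difference)"


print(suggest_scale([0.10, 0.15, 0.15, 0.20]))
print(suggest_scale([0.35, 0.40, 0.55, 0.60]))
print(suggest_scale([0.70, 0.95]))
```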


By the way, I see in my "header-view" that Bruce is listed as posting
from Google as his "institution". That may account for why he sees
the no-archive message that does not appear in the post as it came
to me.

--
Rich Ulrich
...
Posted: Thu Jul 24, 2008 1:15 am
Guest
Thanks,

I'll wait till I get the data crunched. Then maybe get back with you.
All the percents ought to come out in the 10-25 range.

I don't know what Bruce is talking about, but I don't think anything I
wrote here needs to be passed on to future generations.

Roy - Carpe Noctem
Bruce Weaver...
Posted: Thu Jul 24, 2008 2:22 am
Guest
On Jul 24, 2:15 am, r... at (no spam) home.NOT wrote:

Quote:
I don't know what Bruce is talking about, but I don't think anything I
wrote here needs to be passed on to future generations.

Roy - Carpe Noctem


It appears that Roy is posting via www.giganews.com, not Google. It
must have no archiving as the default setting. The irony is that if
someone quotes when responding, it'll most likely get archived
anyway. ;-)

--
Bruce Weaver
bweaver at (no spam) lakeheadu.ca
www.angelfire.com/wv/bwhomedir
"When all else fails, RTFM."
...
Posted: Fri Jul 25, 2008 1:17 am
Guest
As far as I know, I'm using an ancient version of Free Agent and
tagging into usenet. I'm not going through a website, except for the
link up thru my isp.

This is all very old school -- as is usenet. I'm so old I bought my
second computer from some nerd in Austin who assembled them in his
apartment.

Roy - Carpe Noctem
...
Posted: Fri Jul 25, 2008 1:33 am
Guest
On Wed, 23 Jul 2008 16:36:14 -0400, RichUlrich
<rich.ulrich at (no spam) comcast.net> wrote:

Quote:
So, if your fractions are all in the mid-range, .2 to .8, you can
settle for the simple, additive model. When some of the fractions are
small, the multiplicative description is probably better. If some
fractions are high, then you do need to either use Odds Ratios
or consider the relative sizes of (1-P).

Rich,

I woke up this morning and got it. So .. I instruct my SAS geek to do
proc surveyfreq on two independent national surveys, with the variable
of interest being, say, "did X in past year" and split by gender.

I end up with 4 percentages, each with a variance. So I'm thinking the
odds ratio entails a simple pooled variance calculation using all four
variances and the combined sum of all the sample n's (unweighted).

Does that seem right to you?



BTW, I am looking back at my locked messages and I see that you have
been helping me out several times over more than 3 years and I think
for a couple of years longer.

I have always tried to take my problems as far as I could on my own
before asking for help (I know you can be cranky with intellectual
laziness ;-) ). My own situation is a doctorate in public health but, in
the real world, I mostly do administrative work. So, when these things
come up, I have to dig down to my lizard brain to find ancient
statistical and epidemiological training.

In any case, you have been among the angels of the internet, and I am
very grateful for what you do.

Roy - Carpe Noctem
Stan Devia...
Posted: Fri Jul 25, 2008 3:56 pm
Guest
Are you able to standardize the rates (either using the direct or
indirect method)? This may allow for a fairer comparison between the
population groups.

http://www.avon.nhs.uk/phnet/PHinfo/understanding.htm#Standardised
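
For readers unfamiliar with the direct method, here is a minimal Python sketch. Each group's stratum-specific rates are weighted by a common standard population, so the groups are compared as if they had the same composition. The age bands, rates, and weights are hypothetical numbers chosen only to show the arithmetic.

```python
def directly_standardized_rate(stratum_rates, standard_weights):
    """Weighted average of stratum rates using a shared standard population."""
    total_weight = sum(standard_weights.values())
    return sum(
        stratum_rates[stratum] * weight / total_weight
        for stratum, weight in standard_weights.items()
    )


# Hypothetical standard population shares by age band.
standard = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

# Hypothetical stratum-specific rates for two groups.
group_a = {"18-34": 0.08, "35-64": 0.12, "65+": 0.20}
group_b = {"18-34": 0.10, "35-64": 0.15, "65+": 0.22}

print(f"Group A standardized rate: {directly_standardized_rate(group_a, standard):.3f}")
print(f"Group B standardized rate: {directly_standardized_rate(group_b, standard):.3f}")
```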
RichUlrich...
Posted: Fri Jul 25, 2008 5:04 pm
Guest
On Thu, 24 Jul 2008 23:33:48 -0700, rayo at (no spam) home.NOT wrote:

Quote:
On Wed, 23 Jul 2008 16:36:14 -0400, RichUlrich
<rich.ulrich at (no spam) comcast.net> wrote:

[rayo, top-posting in response to my Reply ...]

On Tue, 22 Jul 2008 21:16:08 -0700, rayo at (no spam) home.NOT wrote:

[snip]
Rich,

I woke up this morning and got it. So .. I instruct my SAS geek to do
proc surveyfreq on two independent national surveys, with the variable
of interest being, say, "did X in past year" and split by gender.

I end up with 4 percentages, each with a variance. So I'm thinking the
odds ratio entails a simple pooled variance calculation using all four
variances and the combined sum of all the sample n's (unweighted).

Does that seem right to you?


No, not at all. Odds ratios use ODDS, not variances.

If the odds against Horse 1 are 10 to 1, and the odds against
Horse 2 are 5 to 1, those two Odds are 10.0 and 5.0,
and the Odds Ratio is 2.0. Or, in the opposite direction, the ratio
is 5/10 or .5. An Odds Ratio of 2.0 is equivalent to 0.5,
merely expressed in the opposite direction. With that tiny warning,
the OR is robust against the choice of the direction (i.e., using P or
using (1-P)), as compared to the Relative Risk, which it resembles for
small proportions.

Given a single Proportion for something being true, that P can be
stated as the Odds, Odds = P/(1-P). Then you compare two
Odds descriptively as an Odds Ratio, Odds1/Odds2. The Log
of the OR is what is frequently used in analyses.
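
The horse example can be checked with a few lines of Python. This is a sketch with made-up function names. Odds against of 10 to 1 are taken to mean a losing probability of 10/11, and 5 to 1 a losing probability of 5/6.

```python
import math


def odds(p):
    """Odds for an event with probability p."""
    return p / (1 - p)


def odds_ratio(p1, p2):
    """Odds1 / Odds2 for two proportions."""
    return odds(p1) / odds(p2)


# Probabilities of losing implied by odds against of 10 to 1 and 5 to 1.
p_lose_horse1 = 10 / 11
p_lose_horse2 = 5 / 6

or_forward = odds_ratio(p_lose_horse1, p_lose_horse2)  # Horse 1 relative to Horse 2
or_reverse = odds_ratio(p_lose_horse2, p_lose_horse1)  # Horse 2 relative to Horse 1

# Switching from P(lose) to P(win) = 1 - P(lose) also gives the reciprocal.
or_winning = odds_ratio(1 - p_lose_horse1, 1 - p_lose_horse2)

print(f"OR forward: {or_forward:.3f}, log OR: {math.log(or_forward):+.3f}")
print(f"OR reverse: {or_reverse:.3f}, log OR: {math.log(or_reverse):+.3f}")
print(f"OR using P(win): {or_winning:.3f}")
```

On the log scale the two directions differ only in sign, which is one reason the log of the OR is convenient in analyses.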

If all your proportions are small, you might be happier dealing
with the Relative Risk for your comparisons.

[snip, some]
Quote:

In any case, you have been among the angels of the internet, and I am
very grateful for what you do.

Thanks.

--
Rich Ulrich
...
Posted: Fri Jul 25, 2008 8:32 pm
Guest
I understand the cross-multiplication, but I still need a confidence
interval. From my ancient text, Taylor Series seems computationally
manageable for OR.

You may be correct about RR, particularly as it is more intuitive to
people not in the field - my main audience.
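
A first-order Taylor series (delta method) interval of the kind mentioned above can be sketched in Python as follows. The sketch assumes the two proportions come from independent samples, that each variance is the variance of the estimated proportion (for example, a design-based variance from survey software), and that a Normal approximation on the log scale is acceptable. The function names and the example inputs are hypothetical.

```python
import math


def or_with_ci(p1, var1, p2, var2, z=1.96):
    """Odds ratio (p1 vs p2) with an approximate confidence interval."""
    log_or = math.log(p1 / (1 - p1)) - math.log(p2 / (1 - p2))
    # Delta method: Var(logit p) ~= Var(p) / (p * (1 - p))**2
    var_log_or = var1 / (p1 * (1 - p1)) ** 2 + var2 / (p2 * (1 - p2)) ** 2
    se = math.sqrt(var_log_or)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def rr_with_ci(p1, var1, p2, var2, z=1.96):
    """Relative risk (p1 / p2) with an approximate confidence interval."""
    log_rr = math.log(p1 / p2)
    # Delta method: Var(log p) ~= Var(p) / p**2
    var_log_rr = var1 / p1**2 + var2 / p2**2
    se = math.sqrt(var_log_rr)
    return math.exp(log_rr), math.exp(log_rr - z * se), math.exp(log_rr + z * se)


# Hypothetical estimates: 15% (SE 0.012) in the later year, 10% (SE 0.010) earlier.
est, lower, upper = or_with_ci(0.15, 0.012**2, 0.10, 0.010**2)
print(f"OR = {est:.2f}, approx. 95% CI ({lower:.2f}, {upper:.2f})")

est, lower, upper = rr_with_ci(0.15, 0.012**2, 0.10, 0.010**2)
print(f"RR = {est:.2f}, approx. 95% CI ({lower:.2f}, {upper:.2f})")
```

With simple random sampling, where Var(p) = p(1-p)/n, the variance of the log OR reduces to the familiar 1/a + 1/b + 1/c + 1/d formula for a 2x2 table.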

Roy - Carpe Noctem
...
Posted: Sat Jul 26, 2008 12:27 am
Guest
On Fri, 25 Jul 2008 18:56:05 -0700 (PDT), Stan Devia
<stan.devia at (no spam) gmail.com> wrote:

Quote:
Are you able to standardize the rates (either using the direct or
indirect method)? This may allow for a fairer comparison between the
population groups.

http://www.avon.nhs.uk/phnet/PHinfo/understanding.htm#Standardised

I remember doing that many years ago for a study of cervical cancer
deaths where the local population was much younger than the national
population. It doesn't really apply in this case.

My SAS geek and I were fooling around with a national survey (many,
many subjects), calculating out various contrived variables, and we
noticed a dramatic shift for a particular measure from one year to the
next (most recent) year. We think we can tie it to a particular change
in public policy.

So we want to know which sub-sets of the population shifted the most -
gender, race/ethnicity, age, income, employment, a few other cuts.
Because the shift is believed to be time dependent, based on length of
exposure to the new policy, we plan to split the sample into calendar
quarters based on date of interview.

The particular measure we used has a one-year recall period, which I
might cut down to three months.

If the approach pans out, then we can document the effectiveness of
the policy change within a single year of passage. And we can possibly
estimate the new "angle of repose".

We have to compute this pronto before anybody else notices. Public
health data rarely turns on a dime.

Roy - Carpe Noctem
 
All times are GMT - 5 Hours