Testing for difference in proportions

Steve Jones (Guest)
Posted: Wed Dec 27, 2006 5:17 am
Hi all,
I have a dataset of about 1 million customers/records. The records are
grouped into 15 groups based on past behavior of the customers. Each
group has a bad rate (proportion of bads to the total number of
customers in the group). I need to test if the proportions are different
from each other. It makes business sense to test neighboring groups. The
groups are 'never late', late 1, 2, 3...14 times. I'll appreciate any
pointers on how to accomplish this. Thanks, SJ.

Adam (Guest)
Posted: Wed Dec 27, 2006 11:42 am
Hi Steve
Are you familiar with logistic regression? That would certainly accomplish
what you need, if I've understood correctly that each of your records has
the outcome either "bad" or "not bad".
You have a choice of ways to fit the model. You can either treat the number
of times late as a continuous variable ranging from 0 to 14, or treat the
group as a categorical variable with 15 categories. The advantage of
treating it as a categorical variable is that it makes fewer assumptions
about the relationship between "times late" and "bad", and the disadvantage
is that you lose some statistical power. However, if you have a million
records in your dataset, I'm guessing you're not going to be short of
statistical power, assuming that you have a reasonable proportion of "bad".
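As a rough sketch of the categorical version: when the group is the only predictor, the logistic model is saturated, so each group's fitted log odds is just log(bads / goods), and the contrast between two neighboring groups can be computed straight from the counts without a fitting routine. The counts below are made up for illustration, the function name is hypothetical, and every group is assumed to contain at least one bad and one good record.

```python
import math

def adjacent_log_odds_tests(bads, totals):
    """Wald test of the log odds ratio between each pair of neighboring groups.

    Assumes every group has at least one bad and one good record.
    Returns tuples (group, next_group, log_odds_ratio, se, z, p_value).
    """
    results = []
    for g in range(len(bads) - 1):
        b0, n0 = bads[g], totals[g]
        b1, n1 = bads[g + 1], totals[g + 1]
        g0, g1 = n0 - b0, n1 - b1  # "not bad" counts
        # Difference in log odds = the neighbor contrast in a categorical logistic model
        log_or = math.log(b1 / g1) - math.log(b0 / g0)
        # Large-sample standard error of a log odds ratio
        se = math.sqrt(1 / b0 + 1 / g0 + 1 / b1 + 1 / g1)
        z = log_or / se
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results.append((g, g + 1, log_or, se, z, p_value))
    return results

# Hypothetical counts for the first four groups (0, 1, 2, 3 times late).
bads = [1000, 1500, 1800, 1850]
totals = [100000, 80000, 60000, 60000]

results = adjacent_log_odds_tests(bads, totals)
for lo, hi, log_or, se, z, p in results:
    print(f"late {lo} vs late {hi}: log OR = {log_or:.3f}, "
          f"SE = {se:.3f}, z = {z:.2f}, p = {p:.3g}")
```

With 15 groups there are 14 neighboring comparisons, so some adjustment for multiple testing may be worth considering when reading the p-values.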
Does that help?
Adam

Anon. (Guest)
Posted: Wed Dec 27, 2006 12:10 pm
Perhaps it's worth simplifying the problem by pointing out that with
that large a sample size, almost any difference will be significant: the
standard error for the number of bads in a group is about 130
(assuming p=0.5, and equal numbers in each group, i.e. about 67,000
records per group), or about 0.2% when expressed as a proportion.
Would it be enough to simply plot the proportions, with standard errors
(= sqrt(p(1-p)/N))? That might tell you all you need to know.
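A minimal sketch of that calculation, using only the standard library. The first part reproduces the rough figures quoted above (1,000,000 records split equally over 15 groups, p = 0.5); the group counts in the second part are invented for illustration, and the helper name is hypothetical. The intervals are printed as text, but the same numbers could be fed to any plotting tool as error bars.

```python
import math

def proportion_se(bads, total):
    """Return (p, standard error of p) for one group, using sqrt(p(1-p)/N)."""
    p = bads / total
    return p, math.sqrt(p * (1 - p) / total)

# Rough figures: 1,000,000 records in 15 equal groups, p = 0.5.
n_per_group = 1_000_000 / 15
se_count = math.sqrt(n_per_group * 0.5 * 0.5)  # SE of the number of bads
se_prop = se_count / n_per_group               # the same SE as a proportion
print(f"SE of count: {se_count:.0f}, SE of proportion: {se_prop:.2%}")

# Hypothetical (bads, total) counts for three neighboring groups.
groups = {
    "never late": (1000, 100000),
    "late 1": (1500, 80000),
    "late 2": (1800, 60000),
}
for name, (bads, total) in groups.items():
    p, se = proportion_se(bads, total)
    low, high = p - 1.96 * se, p + 1.96 * se   # approximate 95% interval
    print(f"{name:>10}: p = {p:.4f}, interval = [{low:.4f}, {high:.4f}]")
```

Neighboring groups whose intervals are far apart clearly differ; where they overlap heavily, the difference is probably too small to matter in practice, whatever a formal test says.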
Bob
--
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: www.jnr-eeb.org