Science Forum Index » Statistics - Education Forum » Binomial data: how to handle don't-cares?
|
| Author |
Message |
| Stan Brown |
Posted: Mon Feb 05, 2007 5:52 pm |
|
|
|
Guest
|
Greetings. I'm embarrassed to ask this, but I'm more embarrassed at
not knowing the answer:
Survey taken: 1366 mailed out
Responses received: 380
119 "yes"
29 neutral
232 "no"
In case you're wondering, it was about a proposed sewer system in my
town, where people currently have septic and density is low. The
estimated cost is about $150 a month per household in taxes, plus
monthly sewer fees, plus $3-5K to connect. No wonder the people are
opposed, and no wonder the response rate was so high.
On a null hypothesis of "opinion is evenly divided" I get a tiny p-
value no matter whether I count "yes" as 119 out of 380 or 119 out of
380-29 = 351. But I wonder what is the right thing to do. In yes/no
surveys, when you have don't-care responses, how are they best
treated?
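
The two p-values mentioned above can be reproduced with an exact binomial test. This is only a minimal sketch: it assumes the null is p = 0.5 for "yes", treats the responses as a simple random sample, and uses a hypothetical helper name (`binom_two_sided_p`) that is not from the original post.

```python
from math import comb

def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for H0: p = 0.5.

    The Binomial(n, 0.5) distribution is symmetric, so the two-sided
    p-value is twice the smaller tail, capped at 1.
    """
    lower = sum(comb(n, i) for i in range(0, k + 1))
    upper = sum(comb(n, i) for i in range(k, n + 1))
    tail = min(lower, upper) / 2 ** n
    return min(1.0, 2 * tail)

yes, neutral, no = 119, 29, 232
n_all = yes + neutral + no   # 380 responses, neutrals kept in the denominator
n_decided = yes + no         # 351 responses, neutrals dropped

p_all = binom_two_sided_p(yes, n_all)
p_decided = binom_two_sided_p(yes, n_decided)
print(f"yes = {yes}/{n_all}: p = {p_all:.3g}")
print(f"yes = {yes}/{n_decided}: p = {p_decided:.3g}")
```

Either denominator gives a very small p-value, which matches the observation in the post that the choice does not change the verdict here.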
--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/ |
|
|
| Back to top |
|
| Bruce Weaver |
Posted: Tue Feb 06, 2007 8:55 am |
|
|
|
Guest
|
Stan Brown wrote:
Quote: Greetings. I'm embarrassed to ask this, but I'm more embarrassed at
not knowing the answer:
Survey taken: 1366 mailed out
Responses received: 380
119 "yes"
29 neutral
232 "no"
In case you're wondering, it was about a proposed sewer system in my
town, where people currently have septic and density is low. The
estimated cost is about $150 a month per household in taxes, plus
monthly sewer fees, plus $3-5K to connect. No wonder the people are
opposed, and no wonder the response rate was so high.
On a null hypothesis of "opinion is evenly divided" I get a tiny p-
value no matter whether I count "yes" as 119 out of 380 or 119 out of
380-29 = 351. But I wonder what is the right thing to do. In yes/no
surveys, when you have don't-care responses, how are they best
treated?
If you used the usual z-test for comparing two independent proportions,
squaring your z gives the test statistic for a chi-square goodness of
fit test with 2 categories. And of course, chi-square GOF tests can
have more than two categories (df = # of categories minus 1). But I
seriously doubt you are interested in testing the null hypothesis that
Yes, No, and Neutral are all equally likely in the population. Nor is
it clear to me how you'd come up with a null hypothesis that specifies
different proportions for the 3 categories. So a simple comparison of
Yes and No may well be the appropriate thing to do here (plus reporting
the number & percentage of Neutrals).
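
The z-squared relationship can be checked numerically. The sketch below assumes the z-test in question is a one-sample test of the Yes share among the 351 decided respondents against 0.5 (that reading is an assumption, not something stated in the post), and compares its square with the two-category goodness-of-fit statistic.

```python
from math import sqrt, isclose

yes, no = 119, 232
n = yes + no  # neutrals set aside and reported separately

# One-sample z statistic for H0: P(yes) = 0.5 among decided respondents.
p_hat = yes / n
z = (p_hat - 0.5) / sqrt(0.5 * 0.5 / n)

# Chi-square goodness of fit with two categories, expected count n/2 each.
expected = n / 2
chi_sq = (yes - expected) ** 2 / expected + (no - expected) ** 2 / expected

print(f"z = {z:.3f}, z^2 = {z * z:.3f}, chi-square (df = 1) = {chi_sq:.3f}")
```

The two statistics agree up to floating-point rounding, so the two tests give the same p-value.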
If you were looking for associations between Yes/No/Neutral and some
other categorical variable (with chi-square test of association), then
it would be a different story. Assuming the expected counts were high
enough, you could include the Neutral category; and you could decompose
the overall chi-square into orthogonal components to address more
specific questions.
--
Bruce Weaver
bweaver@lakeheadu.ca
www.angelfire.com/wv/bwhomedir |
|
|
| Back to top |
|
| Old Mac User |
Posted: Tue Feb 06, 2007 8:20 pm |
|
|
|
Guest
|
This is a trinomial, not a binomial. There is a formal and accurate
way to treat those data. I'll do that tomorrow morning and will post
the solution here. This turns into a graphic... a confidence
envelope. I'll post the appropriate numbers and will explain how to
draw the graphic since that's not feasible using this form of
communication. OMU
|
|
| Back to top |
|
| Richard Ulrich |
Posted: Wed Feb 07, 2007 12:41 am |
|
|
|
Guest
|
On Mon, 5 Feb 2007 16:52:33 -0500, Stan Brown
<the_stan_brown@fastmail.fm> wrote:
Quote: Greetings. I'm embarrassed to ask this, but I'm more embarrassed at
not knowing the answer:
Survey taken: 1366 mailed out
Responses received: 380
119 "yes"
29 neutral
232 "no"
In case you're wondering, it was about a proposed sewer system in my
town, where people currently have septic and density is low. The
estimated cost is about $150 a month per household in taxes, plus
monthly sewer fees, plus $3-5K to connect. No wonder the people are
opposed, and no wonder the response rate was so high.
On a null hypothesis of "opinion is evenly divided" I get a tiny p-
value no matter whether I count "yes" as 119 out of 380 or 119 out of
380-29 = 351. But I wonder what is the right thing to do. In yes/no
surveys, when you have don't-care responses, how are they best
treated?
Okay, there is a preference for No, regardless of what
you do with "neutral." The presentation is more a matter
of politics and of sense, than of statistics.
Who has been campaigning how strongly, for what?
- Is this a 'random' sample, or was there any chance that
one side is using the survey as a tool?
Was this question the whole content of the survey?
Definitely, state the full results - all three categories.
The ethics of survey-reporting says that you must be explicit
about the context, the content, the questions, etc.
--
Rich Ulrich, wpilib@pitt.edu
http://www.pitt.edu/~wpilib/index.html |
|
|
| Back to top |
|
| mcap |
Posted: Wed Feb 07, 2007 1:14 am |
|
|
|
Guest
|
Quote: On a null hypothesis of "opinion is evenly divided" I get a tiny p-
value no matter whether I count "yes" as 119 out of 380 or 119 out of
380-29 = 351. But I wonder what is the right thing to do. In yes/no
surveys, when you have don't-care responses, how are they best
treated?
Regardless of what your null hypothesis is - focus on your research
question. What are you trying to find out, in plain language. That
will dictate how you handle your data. You could compare proportions
or do other tests but as was previously stated, a simple listing of
percentages may be all you need.
Although your response rate was decent as far as surveys go, it was
still substantially below 50%. You must factor in the possibility
that those who were opposed were more likely to respond than those who
were not.
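
One way to see how much the non-respondents matter is to compute the extreme cases. The sketch below assumes nothing about the 986 households that did not reply, so it gives worst-case bounds, not estimates; the variable names are illustrative only.

```python
mailed, yes, neutral, no = 1366, 119, 29, 232
responded = yes + neutral + no
silent = mailed - responded  # households that did not reply

# Share of ALL mailed households opposed, if no silent household is opposed
# (minimum) or if every silent household is opposed (maximum).
no_share_min = no / mailed
no_share_max = (no + silent) / mailed

# Number of silent households that would have to be opposed for "no"
# to be an outright majority of all mailed households.
majority = mailed // 2 + 1
needed = max(0, majority - no)
needed_share = needed / silent

print(f"silent households: {silent}")
print(f"'no' share of all households: {no_share_min:.3f} to {no_share_max:.3f}")
print(f"silent 'no' needed for a majority: {needed} ({needed_share:.1%})")
```

The bounds are wide, which is the practical meaning of the caution about a response rate well under 50%.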
Marc |
|
|
| Back to top |
|
| Stan Brown |
Posted: Thu Feb 08, 2007 8:08 am |
|
|
|
Guest
|
Tue, 06 Feb 2007 07:55:31 -0500 from Bruce Weaver
<bweaver@lakeheadu.ca>:
Quote: On Mon, 5 Feb 2007 16:52:33 -0500, Stan Brown
Survey taken: 1366 mailed out (proposed sewer system)
Responses received: 380
119 "yes"
29 neutral
232 "no"
If you used the usual z-test for comparing two independent proportions,
I'm sorry, I don't understand. Which two independent proportions do I
have here? Unless I'm missing something, the yeses, noes, and
neutrals are mutually dependent.
Quote: squaring your z gives the test statistic for a chi-square goodness of
fit test with 2 categories. And of course, chi-square GOF tests can
have more than two categories (df = # of categories minus 1). But I
seriously doubt you are interested in testing the null hypothesis that
Yes, No, and Neutral are all equally likely in the population.
And you are right to doubt that! :-)
I should make it clear that I had no part in the planning of this
survey. I'm a homeowner in the proposed district, and I filled out and
mailed in my survey, but have no other association. I'm trying to
interpret the results that were published.
(I can't imagine why anyone, let alone 29 persons, would take the
trouble to address and stamp an envelope to mail in a survey saying
"I don't care," but they did.)
The practical question is whether the sewer should be built, and that
in turn should depend on whether a majority are in favor. Ideally,
I'd like to show that there's no need to waste more town resources on
further planning, since it's virtually certain a majority of the town
is opposed. Can I do that, from the available data?
I suppose one null could be "p >= .5, 50% or more are in favor."
Using that null, I could compute p' three different ways:
** 119/(380-29), counting neutrals as non-respondents
** 119/380, counting neutrals as "no"
** (119+29)/380, counting neutrals as "yes"
The second and third seem clearly wrong to me, and I really wonder
about even the first.
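
For comparison, the three versions of p' can be put side by side with an exact one-sided p-value. This is a sketch under the assumption that the test is evaluated at the boundary p = 0.5 of the null "p >= .5" and that respondents behave like a random sample; `lower_tail_p` is a hypothetical helper.

```python
from math import comb

def lower_tail_p(k: int, n: int) -> float:
    """P(X <= k) for X ~ Binomial(n, 0.5), the boundary case of H0: p >= 0.5."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

yes, neutral, no = 119, 29, 232
total = yes + neutral + no

# (count treated as "yes", denominator) for each treatment of the neutrals
scenarios = {
    "neutrals dropped": (yes, yes + no),
    "neutrals counted as no": (yes, total),
    "neutrals counted as yes": (yes + neutral, total),
}

results = {}
for label, (k, n) in scenarios.items():
    results[label] = (k / n, lower_tail_p(k, n))
    print(f"{label}: p' = {k / n:.3f}, one-sided p = {results[label][1]:.2g}")
```

Even the treatment most favorable to "yes" leaves p' well under 0.5, so the choice among the three affects the size of the p-value but not its direction.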
The percentages are pretty compelling: 31% in favor, 8% neutral, 61%
opposed, n = 380. But surely there's something useful I can say about
the feelings of the population?
--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/ |
|
|
| Back to top |
|
| Stan Brown |
Posted: Thu Feb 08, 2007 8:08 am |
|
|
|
Guest
|
Tue, 06 Feb 2007 23:41:48 -0500 from Richard Ulrich
<Rich.Ulrich@comcast.net>:
Quote: On Mon, 5 Feb 2007 16:52:33 -0500, Stan Brown
<the_stan_brown@fastmail.fm> wrote:
Survey taken: 1366 mailed out (proposed sewer system)
Responses received: 380
119 "yes"
29 neutral
232 "no"
On a null hypothesis of "opinion is evenly divided" I get a tiny p-
value no matter whether I count "yes" as 119 out of 380 or 119 out of
380-29 = 351. But I wonder what is the right thing to do. In yes/no
surveys, when you have don't-care responses, how are they best
treated?
Okay, there is a preference for No, regardless of what
you do with "neutral." The presentation is more a matter
of politics and of sense, than of statistics.
Who has been campaigning how strongly, for what?
There hasn't been much of a campaign. The town board has been
debating this off and on since before I bought my house last summer.
I think developers probably want it so they can build with higher
density; much of the town is rural and the rest is low-density
suburban. We're 10-20 minutes outside of Ithaca.
Quote: - Is this a 'random' sample, or was there any chance that
one side is using the survey as a tool?
There's a chance, but AFAIK a copy was sent to every homeowner in the
proposed district. Reading my copy, I couldn't tell whether the town
supervisor, who prepared it, was hoping for yeses or noes.
It was a one-page summary, including state grant figures and cost
figures, with the survey at the bottom to be mailed in. (No stamp was
provided, making the response rate even more amazing.)
Quote: Was this question the whole content of the survey?
Yes,
"My opinion:
"___ I support the proposal
"___ I am neutral toward the proposal
"___ I oppose the proposal."
Quote: Definitely, state the full results - all three categories.
Right. I guess I should have made it clear, I have no connection with
this other than as a homeowner who stands to see a $100-a-month rise
in my tax bill *and* the privilege of making expensive connections
and then paying additional user fees.
Quote: The ethics of survey-reporting says that you must be explicit
about the context, the content, the questions, etc.
I agree with you that of course the full numbers should be presented;
my question was about proper drawing of conclusions.
It seems obvious that opinion is quite strongly "no", but I'm looking
at how to frame a proper hypothesis test and p-value.
Alternatively, maybe I should make it a 95% confidence interval. Do I
calculate a binomial CI from a sample size of 380-29, excluding the
neutrals as though they had not responded? Or does the three-way
nature of the question mean that I can't analyze it in a binomial
manner and have to do some sort of Chi-squared?
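
As an illustration of the binomial option, here is a sketch of a 95% Wilson score interval for the Yes share, computed both with and without the neutrals in the denominator. The choice of the Wilson interval is an assumption made for this example (the post does not name a method), and it treats respondents as a simple random sample.

```python
from math import sqrt

def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n."""
    p_hat = k / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

lo_decided, hi_decided = wilson_ci(119, 351)  # neutrals excluded
lo_all, hi_all = wilson_ci(119, 380)          # neutrals kept in the denominator

print(f"yes share, n = 351: {lo_decided:.3f} to {hi_decided:.3f}")
print(f"yes share, n = 380: {lo_all:.3f} to {hi_all:.3f}")
```

Both intervals sit entirely below 0.5, so the two denominators lead to the same qualitative conclusion about the respondents.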
--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/ |
|
|
| Back to top |
|
| Old Mac User |
Posted: Thu Feb 08, 2007 1:09 pm |
|
|
|
Guest
|
Several people have given good advice including "the numbers are what
they are and should be reported as such". This is politics more than
science. In polls of this sort there are always issues about "was
this a random sample". Meaning... were those who responded a
representative sample of the entire population of those who could have
responded? (People who have an issue are more likely to respond,
etc.)
Recognizing all of this and more, two days ago I promised I would post
a "95% confidence envelope". That was supposed to happen yesterday
morning. A balky furnace and a "funny-acting" hot water heater
distracted me... especially since we are experiencing what surely is a
case of global cooling combined with snow this week. Both systems are
fixed, so here we go.
Political statement: About 30 years ago Global Cooling was "the big
concern". Now it's "Global Warming". Here's a big thank you to all
those who bought SUVs and built larger homes and in doing so saved us
from crop failures in the midwest.
So out of a population of 1366 who had an opportunity to vote, 119
said Yes, 232 said No, and 29 were neutral. With all of the
aforementioned concerns, the task here is to (with caution) use data
from this responding sample of 380 to infer something about the entire
population of 1366.
What begins as a simple "binomial" Yes or No usually turns into a
trinomial Yes, No, Indifferent outcome. This is always a source of
concern and frustration. I'm going to post a "trinomial analysis" here
for better or worse.
Before I do this, there is still one more caution. This "answer" is
based on an assumption that must be documented right here. That
assumption is...
"the total number of responders (380) is 'small' relative to the total
population (1366). Or, if you prefer... the population is
'infinite'". This same flawed assumption is embedded in a "binomial
analysis" of this sort unless we are willing to deal with a
hypergeometric distribution.
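
To get a feel for how much the "infinite population" assumption matters here, the standard finite population correction can be applied to the standard error of a proportion. This is a sketch only: it uses the usual correction factor sqrt((N - n) / (N - 1)) for sampling without replacement and still assumes the 380 respondents are a simple random sample of the 1366, which is doubtful for a mail-in survey.

```python
from math import sqrt

N, n = 1366, 380   # surveys mailed, responses received
p_no = 232 / n     # observed share of "no" among respondents

se_infinite = sqrt(p_no * (1 - p_no) / n)  # binomial (infinite population) SE
fpc = sqrt((N - n) / (N - 1))              # finite population correction
se_finite = se_infinite * fpc

print(f"SE ignoring population size: {se_infinite:.4f}")
print(f"finite population correction: {fpc:.3f}")
print(f"SE with correction: {se_finite:.4f}")
```

With 380 of 1366 sampled, the correction shrinks the standard error by roughly 15%, so the binomial or trinomial treatment is somewhat conservative on this one point.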
Here are some calculations done with software designed to address the
matter of Yes/No/Indifferent (trinomial). The outcome of this
analysis is a 95% confidence envelope. Well, almost 95%. I'll explain
that later.
The next eight rows document the data and the setup for analysis.
N No Yes
380 232 119
FractNo 0.611
FractYes 0.313
DP and DQ 0.001 0.001
Target prob contour = 0.950
Actual prob contour = 0.935
Number of rows in the table = 104
The following table shows the approx. 95% confidence envelope.
To see this, set up coordinates on ordinary rectangular graph paper.
Both scales range from 0 to 1 and represent the fraction favoring and
not favoring the proposal.
The vertical axis is for "Yes" and the horizontal axis is for "No".
Now go to the vertical or "Yes" axis = 0.313 and the horizontal axis
or No = 0.567. Put a small circle there.
Then go to "Yes" = 0.313 and No. = 0.643. Put a small circle there.
Do the same for each value of "Yes" (on the vertical scale) in the
table.
(Skip some of them to taste... there are more here than you need.)
Notice that for each value of Yes there are two values of No. The
column "Diff" is the width of the ellipsoid for each "Yes" value
on the vertical axis.
On completion of this graphic you have the (approx.) 95% confidence
envelope. "The prob is 0.935 that this envelope encompasses the
'true' value of the fraction of Yes and the fraction of No in the
total population." (Again, we are using a classic trinomial here when
a hypergeometric would be more accurate... it's the "sample is small
relative to the population" thing.)
The top "half" of the ellipsoid is completed at Row 39. The bottom
"half" begins at Row 40.
Notice that the cited confidence envelope is actually for 93.46%, not
95.00%.
I'll try to explain this later. But it's close enough for this
discussion.
Does the (approx.) 95% confidence envelope encompass Yes/No =
0.50/0.50? No. By inspection, examine Row 39...
39 0.351 0.543 0.603 0.060 0.00459 0.44806 <--
At the top of the ellipsoid, Yes = 0.351 and No is 0.543 and 0.603.
0.351 is a long way from 0.500.
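
A rough cross-check of that inspection is possible with a large-sample normal approximation. To be clear, this is not the exact trinomial calculation used for the table; it is a sketch that assumes the estimated (Yes, No) shares are approximately bivariate normal with the usual multinomial covariance, and it asks whether the point (0.50, 0.50) falls inside the resulting 95% ellipse.

```python
from math import log

n = 380
p_yes, p_no = 119 / n, 232 / n

# Estimated covariance matrix of (yes share, no share) under a multinomial model.
var_yes = p_yes * (1 - p_yes) / n
var_no = p_no * (1 - p_no) / n
cov = -p_yes * p_no / n

def mahalanobis_sq(yes_share: float, no_share: float) -> float:
    """Squared Mahalanobis distance from the observed shares (2x2 inverse by hand)."""
    dy, dn = yes_share - p_yes, no_share - p_no
    det = var_yes * var_no - cov * cov
    return (var_no * dy * dy - 2 * cov * dy * dn + var_yes * dn * dn) / det

# 95% quantile of chi-square with 2 degrees of freedom: -2 * ln(0.05).
critical = -2 * log(1 - 0.95)
d2 = mahalanobis_sq(0.5, 0.5)

print(f"distance^2 to (0.50, 0.50) = {d2:.1f}, 95% critical value = {critical:.2f}")
```

The squared distance is far beyond the critical value, which agrees with the reading of Row 39.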
This ellipsoid is very elongated (consistently narrow) because there
were so few (small fraction of) Neutral responders. If there had
been, for instance, 150 Neutral responders then the ellipsoid would be
much more circular.
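
The elongation can be quantified through the correlation between the estimated Yes and No shares, which for a multinomial sample is -sqrt(p_yes * p_no / ((1 - p_yes) * (1 - p_no))). The sketch below compares the observed counts with a hypothetical case of 150 neutrals; the split of the remaining 230 responses (78 yes, 152 no, roughly the observed ratio) is an assumption made purely for illustration.

```python
from math import sqrt

def yes_no_correlation(yes: int, neutral: int, no: int) -> float:
    """Correlation between the estimated yes and no shares in a multinomial sample."""
    n = yes + neutral + no
    p_yes, p_no = yes / n, no / n
    return -sqrt(p_yes * p_no / ((1 - p_yes) * (1 - p_no)))

corr_observed = yes_no_correlation(119, 29, 232)
corr_hypothetical = yes_no_correlation(78, 150, 152)  # illustrative counts only

print(f"observed (29 neutral): correlation = {corr_observed:.3f}")
print(f"hypothetical (150 neutral): correlation = {corr_hypothetical:.3f}")
```

A correlation near -1 means Yes and No trade off almost one for one, giving a thin ellipse; with many neutrals the correlation weakens and the envelope becomes rounder.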
This is surely a case of using high precision software to calculate a
low precision confidence envelope. I say "low precision" because of
the reasons cited by others combined with using an "exact trinomial
calculation" where a hypergeometric would be more proper.
This software was written to deal with a similar but slightly
different type of data. That is, a "random sample of" N people are
presented with two similar products (call them A and B) and asked
which of those they favor. The answers must be "favor A, favor B, or
No Preference". From this we want to calculate a confidence envelope
on the entire population of "people who would have an interest in this
type of product". The actual calculations are complex and slow even
on the fastest desktop computers. There is an element of "trial and
error" in converging on the 95% confidence envelope so I stopped it a
bit short and settled on the 93.5% confidence envelope.
This same software is also used for another trinomial situation in
which we compare products A, B, and C and the testers must pick one of
these... no "No Preferences" allowed.
The last column in the table is the accumulated probability...
totalling 0.93455. The last column is there just in case it's needed
for another purpose.
This is a long and somewhat complicated post. If you find obvious
typos or other things that need help, please let me know.
Be of good cheer... OMU
Row Yes No1 No2 Diff Prob CProb
1 0.313 0.567 0.643 0.076 0.01676 0.01676
2 0.314 0.567 0.642 0.075 0.01671 0.03347
3 0.315 0.566 0.641 0.075 0.01664 0.05012
4 0.316 0.565 0.640 0.075 0.01655 0.06666
5 0.317 0.564 0.640 0.076 0.01643 0.08310
6 0.318 0.563 0.639 0.076 0.01628 0.09938
7 0.319 0.563 0.638 0.075 0.01609 0.11547
8 0.320 0.562 0.637 0.075 0.01589 0.13136
9 0.321 0.561 0.636 0.075 0.01566 0.14702
10 0.322 0.560 0.635 0.075 0.01541 0.16243
11 0.323 0.560 0.634 0.074 0.01513 0.17755
12 0.324 0.559 0.633 0.074 0.01483 0.19239
13 0.325 0.558 0.632 0.074 0.01452 0.20691
14 0.326 0.558 0.631 0.073 0.01418 0.22109
15 0.327 0.557 0.630 0.073 0.01383 0.23492
16 0.328 0.556 0.629 0.073 0.01347 0.24839
17 0.329 0.556 0.628 0.072 0.01309 0.26149
18 0.330 0.555 0.627 0.072 0.01271 0.27419
19 0.331 0.554 0.626 0.072 0.01231 0.28651
20 0.332 0.554 0.625 0.071 0.01190 0.29841
21 0.333 0.553 0.623 0.070 0.01149 0.30990
22 0.334 0.552 0.622 0.070 0.01108 0.32098
23 0.335 0.552 0.621 0.069 0.01066 0.33164
24 0.336 0.551 0.620 0.069 0.01024 0.34188
25 0.337 0.551 0.619 0.068 0.00982 0.35170
26 0.338 0.550 0.618 0.068 0.00940 0.36110
27 0.339 0.549 0.617 0.068 0.00899 0.37009
28 0.340 0.549 0.616 0.067 0.00858 0.37867
29 0.341 0.548 0.615 0.067 0.00818 0.38685
30 0.342 0.548 0.614 0.066 0.00777 0.39462
31 0.343 0.547 0.613 0.066 0.00739 0.40201
32 0.344 0.547 0.612 0.065 0.00700 0.40901
33 0.345 0.546 0.610 0.064 0.00662 0.41563
34 0.346 0.546 0.609 0.063 0.00625 0.42188
35 0.347 0.545 0.608 0.063 0.00590 0.42778
36 0.348 0.545 0.607 0.062 0.00556 0.43334
37 0.349 0.544 0.606 0.062 0.00523 0.43856
38 0.350 0.544 0.605 0.061 0.00490 0.44347
39 0.351 0.543 0.603 0.060 0.00459 0.44806 <--
40 0.312 0.568 0.644 0.076 0.01677 0.46483
41 0.311 0.569 0.645 0.076 0.01675 0.48158
42 0.310 0.570 0.646 0.076 0.01670 0.49827
43 0.309 0.571 0.647 0.076 0.01662 0.51489
44 0.308 0.572 0.648 0.076 0.01651 0.53139
45 0.307 0.573 0.649 0.076 0.01637 0.54776
46 0.306 0.573 0.650 0.077 0.01621 0.56397
47 0.305 0.574 0.651 0.077 0.01601 0.57998
48 0.304 0.575 0.652 0.077 0.01579 0.59577
49 0.303 0.576 0.652 0.076 0.01554 0.61131
50 0.302 0.577 0.653 0.076 0.01527 0.62659
51 0.301 0.578 0.654 0.076 0.01498 0.64156
52 0.300 0.579 0.655 0.076 0.01466 0.65623
53 0.299 0.580 0.656 0.076 0.01433 0.67055
54 0.298 0.581 0.657 0.076 0.01398 0.68453
55 0.297 0.582 0.658 0.076 0.01361 0.69814
56 0.296 0.583 0.658 0.075 0.01321 0.71135
57 0.295 0.584 0.659 0.075 0.01282 0.72417
58 0.294 0.585 0.660 0.075 0.01241 0.73657
59 0.293 0.586 0.661 0.075 0.01199 0.74856
60 0.292 0.587 0.662 0.075 0.01156 0.76012
61 0.291 0.588 0.662 0.074 0.01112 0.77124
62 0.290 0.589 0.663 0.074 0.01068 0.78192
63 0.289 0.591 0.664 0.073 0.01023 0.79215
64 0.288 0.592 0.665 0.073 0.00979 0.80194
65 0.287 0.593 0.665 0.072 0.00935 0.81129
66 0.286 0.594 0.666 0.072 0.00891 0.82020
67 0.285 0.595 0.667 0.072 0.00847 0.82867
68 0.284 0.596 0.668 0.072 0.00805 0.83672
69 0.283 0.598 0.668 0.070 0.00761 0.84432
70 0.282 0.599 0.669 0.070 0.00719 0.85152
71 0.281 0.600 0.670 0.070 0.00679 0.85831
72 0.280 0.601 0.670 0.069 0.00639 0.86470
73 0.279 0.603 0.671 0.068 0.00600 0.87069
74 0.278 0.604 0.672 0.068 0.00562 0.87632
75 0.277 0.605 0.672 0.067 0.00526 0.88158
76 0.276 0.606 0.673 0.067 0.00491 0.88649
77 0.275 0.608 0.674 0.066 0.00457 0.89106
78 0.274 0.609 0.674 0.065 0.00425 0.89531
79 0.273 0.611 0.675 0.064 0.00393 0.89924
80 0.272 0.612 0.675 0.063 0.00364 0.90288
81 0.271 0.613 0.676 0.063 0.00336 0.90624
82 0.270 0.615 0.676 0.061 0.00309 0.90932
83 0.269 0.616 0.677 0.061 0.00284 0.91217
84 0.268 0.618 0.677 0.059 0.00259 0.91475
85 0.267 0.620 0.678 0.058 0.00236 0.91711
86 0.266 0.621 0.678 0.057 0.00215 0.91927
87 0.265 0.623 0.679 0.056 0.00196 0.92122
88 0.264 0.624 0.679 0.055 0.00177 0.92299
89 0.263 0.626 0.679 0.053 0.00160 0.92459
90 0.262 0.628 0.680 0.052 0.00144 0.92603
91 0.261 0.630 0.680 0.050 0.00128 0.92731
92 0.260 0.632 0.680 0.048 0.00114 0.92845
93 0.259 0.633 0.680 0.047 0.00102 0.92947
94 0.258 0.635 0.681 0.046 0.00091 0.93038
95 0.257 0.637 0.681 0.044 0.00080 0.93118
96 0.256 0.640 0.681 0.041 0.00069 0.93188
97 0.255 0.642 0.681 0.039 0.00060 0.93248
98 0.254 0.644 0.680 0.036 0.00051 0.93299
99 0.253 0.647 0.680 0.033 0.00043 0.93342
100 0.252 0.649 0.680 0.031 0.00037 0.93379
101 0.251 0.652 0.679 0.027 0.00029 0.93408
102 0.250 0.655 0.678 0.023 0.00023 0.93431
103 0.249 0.659 0.676 0.017 0.00016 0.93447
104 0.248 0.664 0.674 0.010 0.00008 0.93455
|
|
| Back to top |
|
| Richard Ulrich |
Posted: Sat Feb 10, 2007 1:40 am |
|
|
|
Guest
|
On Thu, 8 Feb 2007 07:08:48 -0500, Stan Brown
<the_stan_brown@fastmail.fm> wrote:
[snip, various]
Quote:
I agree with you that of course the full numbers should be presented;
my question was about proper drawing of conclusions.
It seems obvious that opinion is quite strongly "no", but I'm looking
at how to frame a proper hypothesis test and p-value.
"Even when you lump the Yes with (the relatively small number that
are) Neutral, there is a clear majority for No." That would seem to
cover it.
The binomial CI is then appropriate.
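
A sketch of that binomial CI for the No share, with Yes and Neutral lumped together as "not No", follows. It computes the exact (Clopper-Pearson) lower limit by bisection; the choice of the exact method and the helper names are assumptions for this example, and the result describes respondents only.

```python
from math import comb

def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def exact_lower_bound(k: int, n: int, alpha: float = 0.05) -> float:
    """Clopper-Pearson lower limit of a two-sided (1 - alpha) interval, by bisection."""
    lo, hi = 0.0, k / n
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(k, n, mid) < alpha / 2:
            lo = mid   # tail too small: the limit lies above mid
        else:
            hi = mid
    return (lo + hi) / 2

no, responses = 232, 380  # Yes and Neutral together are the other 148
lower_no = exact_lower_bound(no, responses)
print(f"observed 'no' share = {no / responses:.3f}, exact 95% lower limit = {lower_no:.3f}")
```

The lower limit stays above 0.5 even with every neutral counted against "no", which is the sense in which the majority among respondents is clear.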
But the informal conditions of the survey suggest that the
statistical test is not thoroughly robust. It could be interesting,
say, if the "early returns" could be compared to "late returns"
or if several sources could be compared. Ordinary political
surveys gain credibility by citing previous surveys, and citing
other surveys.
Quote:
Alternatively, maybe I should make it a 95% confidence interval. Do I
calculate a binomial CI from a sample size of 380-29, excluding the
neutrals as though they had not responded? Or does the three-way
nature of the question mean that I can't analyze it in a binomial
manner and have to do some sort of Chi-squared?
--
Rich Ulrich, wpilib@pitt.edu
http://www.pitt.edu/~wpilib/index.html |
|
|
| Back to top |
|
| John M. |
Posted: Sat Feb 10, 2007 6:50 am |
|
|
|
Guest
|
On 7 Feb, 07:14, "mcap" <mca...@hotmail.com> wrote:
Quote: On a null hypothesis of "opinion is evenly divided" I get a tiny p-
value no matter whether I count "yes" as 119 out of 380 or 119 out of
380-29 = 351. But I wonder what is the right thing to do. In yes/no
surveys, when you have don't-care responses, how are they best
treated?
Regardless of what your null hypothesis is - focus on your research
question. What are you trying to find out, in plain language. That
will dictate how you handle your data. You could compare proportions
or do other tests but as was previously stated, a simple listing of
percentages may be all you need.
Although your response rate was decent as far as surveys go, it was
still substantially below 50%. You must factor in the possibility
that those who were opposed were more likely to respond than those who
were not.
Why? This would only be true if it could be established that the
survey originated from the "Yes" camp. If the reverse were true, then
those desiring the scheme would feel greater necessity to reply.
The method is flawed once the number replying falls below a threshold.
The threshold might be quantifiable by a skilled statistician. The
best approach is to consult a statistician before beginning the task. |
|
|
| Back to top |
|
| illywhacker |
Posted: Wed Feb 14, 2007 10:21 am |
|
|
|
Guest
|
On Feb 10, 11:50 am, "John M." <john_howard_mor...@hotmail.co.uk>
wrote:
Quote: On 7 Feb, 07:14, "mcap" <mca...@hotmail.com> wrote:
Although your response rate was decent as far as surveys go, it was
still substantially below 50%. You must factor in the possibility
that those who were opposed were more likely to respond than those who
were not.
Why? This would only be true if it could be established that the
survey originated from the "Yes" camp. If the reverse were true, then
those desiring the scheme would feel greater necessity to reply.
I think you may have missed the point. If the goal is to infer
frequencies of opinions in the population from a self-selecting sample
such as this survey generates, then one has to model the influence
that opinion has on probability to respond, even if the model chosen
is that it has no influence.
On the other hand, maybe you have not missed the point, but instead
are proposing a model for this influence. But I think you would be
pretty hard pressed to argue that this is the only reasonable model.
It is easy to imagine that 'no' opinions are so incensed at the idea
of being made to pay all that money that all of them reply, whereas
'yes' respondents cannot be bothered.
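
One very simple version of such a model, offered only as a sketch: suppose opponents are `ratio` times as likely to return the survey as supporters, and reweight the counts accordingly. The single-ratio model, the function name, and the ratios tried are all assumptions for illustration; nothing in the survey tells us the true ratio.

```python
yes, no = 119, 232  # decided respondents

def adjusted_no_share(ratio: float) -> float:
    """Implied 'no' share among decided homeowners if opponents are
    `ratio` times as likely to respond as supporters."""
    weighted_no = no / ratio  # down-weight the over-represented group
    return weighted_no / (weighted_no + yes)

# Ratio at which the implied population split is exactly 50/50.
break_even = no / yes

for ratio in (1.0, 1.5, 2.0, 3.0):
    print(f"ratio {ratio:.1f}: implied 'no' share = {adjusted_no_share(ratio):.3f}")
print(f"break-even ratio = {break_even:.2f}")
```

Under this toy model the apparent majority disappears once opponents are about twice as likely to respond as supporters, which shows how strongly the conclusion depends on the response model chosen.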
illywhacker; |
|
|
| Back to top |
|
| illywhacker |
Posted: Thu Feb 15, 2007 5:23 am |
|
|
|
Guest
|
On Feb 14, 7:30 pm, "John M." <john_howard_mor...@hotmail.co.uk>
wrote:
Quote: On 14 Feb, 16:21, "illywhacker" <illywac...@gmail.com> wrote:
On Feb 10, 11:50 am, "John M." <john_howard_mor...@hotmail.co.uk>
wrote:
If the goal is to infer
frequencies of opinions in the population from a self-selecting sample
such as this survey generates, then one has to model the influence
that opinion has on probability to respond, even if the model chosen
is that it has no influence.
I think I see your point here. The desired model would have to be
created by surveying responsiveness conditional on the level of
opinion, wouldn't it?
Right! One would have to have another survey before the real
survey.
Quote: Being an amateur in stats, I wouldn't propose anything. I'd leave it to
a pro.
Perhaps you are an amateur in stats, but since this is social
anthropology or something, go for it.
illywhacker; |
|
|
| Back to top |
|
| Stan Brown |
Posted: Fri Feb 16, 2007 8:41 am |
|
|
|
Guest
|
8 Feb 2007 09:09:28 -0800 from Old Mac User
<chendrixstats@yahoo.com>:
Quote: Several people have given good advice including "the numbers are what
they are and should be reported as such". This is politics more than
A belated thanks to you and others who answered. I've been trying to
digest the replies, and also to educate myself on this sort of
trinomial.
Can someone suggest good search terms, or a good print reference, on
analyzing yes/no/don't-care responses? I tried "trinomial", but as
you might expect I was overwhelmed with the polynomial type.
--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/ |
|
|
| Back to top |
|
| John M. |
Posted: Fri Feb 16, 2007 9:06 am |
|
|
|
Guest
|
On 16 Feb, 14:41, Stan Brown <the_stan_br...@fastmail.fm> wrote:
Quote: 8 Feb 2007 09:09:28 -0800 from Old Mac User
<chendrixst...@yahoo.com>:
Can someone suggest good search terms, or a good print reference, on
analyzing yes/no/don't-care responses? I tried "trinomial", but as
you might expect I was overwhelmed with the polynomial type.
Looking at this thread I can't help but feel you would be better to
let the raw data speak, perhaps with some commentary on the pitfalls
of this kind of survey. Every method of analysis will draw down
criticism because the structure of the survey is fatally flawed. |
|
|
| Back to top |
|
| illywhacker |
Posted: Fri Feb 16, 2007 10:08 am |
|
|
|
Guest
|
On Feb 16, 1:41 pm, Stan Brown <the_stan_br...@fastmail.fm> wrote:
Quote:
Can someone suggest good search terms, or a good print reference, on
analyzing yes/no/don't-care responses? I tried "trinomial", but as
you might expect I was overwhelmed with the polynomial type.
I agree with John M.'s post of Feb 16. There is no way a serious
analysis of this data can be done with the information available. If
you are honest about the assumptions involved in any analysis you may
do, those assumptions will always be questioned, rightly, because they
will be very questionable. You might be able to blind your audience
with talk of trinomial distributions, but that is not usually very
convincing in the long run.
By the way, one of the problems with the types of ad hoc statistical
analysis that are being suggested to you is that the assumptions being
made are never made evident. A properly founded Bayesian analysis
would avoid this problem. So I should correct my second sentence of
this post. There is a way a serious analysis could be done, but its
conclusion would be that you can infer almost nothing of any certainty
about the beliefs of the population.
illywhacker; |
|
|
| Back to top |
|
| |
All times are GMT - 5 Hours
|
|