Statistics - Education Forum » Sample size - How big should it be?
Rob
Posted: Sat Jan 26, 2008 7:17 am
Guest
How do you decide how big your sample should be?

Ok I have k = N/n

What next?
David Winsemius
Posted: Sat Jan 26, 2008 6:50 pm
Guest
Rob <user@example.com> wrote in news:TEEmj.7232$421.4228@news-
server.bigpond.net.au:

Quote:
How do you decide how big your sample should be?

Ok I have k = N/n

What next?


You need to let us know what you plan to do with those three numbers.

--
DW
Rob
Posted: Sun Jan 27, 2008 4:41 am
Guest
David Winsemius wrote:
Quote:
Rob <user@example.com> wrote in news:TEEmj.7232$421.4228@news-
server.bigpond.net.au:

How do you decide how big your sample should be?

Ok I have k = N/n

What next?


You need to let us know what you plan to do with those three numbers.


Thank you for replying.

OK, let's say I was doing an employee survey with a total population of 1000 (N).

Using a random number generator, I pick a sample of n employees (but how
large should n be?) to survey.
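For reference, drawing a simple random sample without replacement takes only a few lines in Python. This is a minimal sketch that assumes employees are numbered 1 to 1000 (hypothetical IDs) and uses n = 100 purely as a placeholder, since choosing n is exactly the open question here:

```python
import random

N = 1000   # total employees, as in the post
n = 100    # placeholder sample size for illustration only

employee_ids = list(range(1, N + 1))  # hypothetical employee IDs 1..1000

rng = random.Random(2008)             # fixed seed so the draw can be reproduced
sample = rng.sample(employee_ids, n)  # n distinct IDs, every subset equally likely

print(sorted(sample)[:10])            # first few selected IDs
```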

How do I work out what sample size I would need to get a
representative sample? What is considered to be significant?

While I have you: if I don't use a random sample, i.e. I invite everybody
in the company to take part in an anonymous online survey and only take
the people who reply, how do I determine if my results are meaningful?

I mean if 1% reply I would say the results do not mean much. If 100%
reply then I could be confident of the response. How do I determine
what size sample is big enough?
Stan Brown
Posted: Sun Jan 27, 2008 8:42 am
Guest
Sun, 27 Jan 2008 08:41:02 GMT from Rob <user@example.com>:
Quote:
Rob <user@example.com> wrote in news:TEEmj.7232$421.4228@news-
server.bigpond.net.au:

How do you decide how big your sample should be?

Ok I have k = N/n

OK, let's say I was doing an employee survey with a total population of 1000 (N).

Using a random number generator, I pick a sample of n employees (but how
large should n be?) to survey.

Still not sure what k is meant to be.

Quote:
How do I work out what sample size I would need to get a
representative sample? What is considered to be significant?

Is this course work? If so, it looks like right now you're in the
sampling-and-descriptive-statistics portion of the course, and your
question will be answered in the inferential-statistics portion.

Briefly, "representative" and "significant" are matters of degree.
You have to decide up front what an acceptable significance level is.
(5% and 1% and 0.1% are common choices, but not the only ones.) Or if
you're trying to estimate a quantity, you decide up front what margin
of error you can accept and what level of confidence you need. Either
way, once those decisions are made then there's a formula that tells
you how big a sample you need.

(The formulas also require you to know something about your
population -- often this is obtained through a small pilot study.)
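As one common example of such a formula, the textbook normal-approximation sample size for estimating a yes/no proportion to within ±E is n = z²p(1 - p)/E², where z comes from the chosen confidence level and p is a prior guess of the proportion (the "something about your population"). This is a sketch of that standard formula, not necessarily the one Stan had in mind; the function name is hypothetical, and p = 0.5 is used as the conservative default because it maximizes p(1 - p):

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Sample size to estimate a proportion to within +/- margin.

    Normal-approximation formula n = z^2 * p * (1 - p) / margin^2,
    rounded up. p is a prior guess (e.g. from a pilot study).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(sample_size_for_proportion(0.05))        # 95% confidence, +/-5 points -> 385
print(sample_size_for_proportion(0.03, 0.99))  # 99% confidence, +/-3 points
```

Tightening the margin or raising the confidence level both increase n, which is the trade-off you have to decide on up front.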

Quote:
While I have you: if I don't use a random sample, i.e. I invite everybody
in the company to take part in an anonymous online survey and only take
the people who reply, how do I determine if my results are meaningful?

That one's easy: they're not. All the statistical techniques you will
learn depend on having a good random sample. If you have a self-
selected sample, there's an excellent chance they are not
representative of the population.

Quote:
I mean if 1% reply I would say the results do not mean much.

Not necessarily. Presidential polls typically sample about 1000
voters, which is well under 1% of all voters, yet they are accurate
to within ±3% (at the usual 95% confidence level). When testing yes/no
data, what matters is how many are in the sample and not what percent
of the population is in the sample.
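The ±3% figure can be checked with the standard margin-of-error formula for a proportion, z·√(p(1 - p)/n), taking the worst case p = 0.5 at 95% confidence. A minimal sketch (function name hypothetical); note that the population size appears nowhere in the calculation:

```python
import math
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 1000, 4000):
    print(n, round(100 * margin_of_error(n), 1), "percentage points")
```

For n = 1000 this gives about 3.1 percentage points; quadrupling the sample only halves the margin.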

Quote:
If 100% reply then I could be confident of the response. How do I
determine what size sample is big enough?

Tell us more about what you're trying to measure, and what
significance level or confidence level you're shooting for.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
"If there's one thing I know, it's men. I ought to: it's
been my life work." -- Marie Dressler, in /Dinner at Eight/
Rob
Posted: Mon Jan 28, 2008 1:26 am
Guest
Stan Brown wrote:
Quote:
Sun, 27 Jan 2008 08:41:02 GMT from Rob <user@example.com>:
Rob <user@example.com> wrote in news:TEEmj.7232$421.4228@news-
server.bigpond.net.au:

How do you decide how big your sample should be?

Ok I have k = N/n
OK, let's say I was doing an employee survey with a total population of 1000 (N).

Using a random number generator, I pick a sample of n employees (but how
large should n be?) to survey.

Still not sure what k is meant to be.

I got my sampling mixed up: I was describing the systematic sample, with
N = 64, n = 8 and k = 8. You randomly select the first person sampled and
then pick every 8th person after that.

A sample based on a random number generator I suspect would produce a
better sample.
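The systematic sample described above (k = N/n, random start, then every k-th person) can be sketched as follows, assuming a hypothetical list of 64 people. One known caveat: if the list has a periodic pattern that lines up with k (for example, every 8th entry being a team leader), the systematic sample can be badly biased, which is one reason a fully random draw is often preferred:

```python
import random

def systematic_sample(population, n, rng=random):
    """Take every k-th member after a random start, with k = N // n."""
    N = len(population)
    k = N // n                       # sampling interval, k = N/n in the post
    start = rng.randrange(k)         # random start among the first k members
    return population[start::k][:n]  # trim in case N is not a multiple of n

people = list(range(1, 65))          # N = 64 hypothetical people
chosen = systematic_sample(people, 8, random.Random(1))
print(chosen)
```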

Quote:

How do I work out what sample size I would need to get a
representative sample? What is considered to be significant?

Is this course work?


No, unfortunately, I am working from a good book (Statistics for
Managers using Microsoft Excel) trying to build on some very basic
statistics from a Biology degree many moons ago.

Quote:
If so, it looks like right now you're in the
sampling-and-descriptive-statistics portion of the course, and your
question will be answered in the inferential-statistics portion.

Briefly, "representative" and "significant" are matters of degree.
You have to decide up front what an acceptable significance level is.
(5% and 1% and 0.1% are common choices, but not the only ones.) Or if
you're trying to estimate a quantity, you decide up front what margin
of error you can accept and what level of confidence you need. Either
way, once those decisions are made then there's a formula that tells
you how big a sample you need.

For a simple employee survey, what would you consider reasonable?

Quote:

(The formulas also require you to know something about your
population -- often this is obtained through a small pilot study.)


Thanks for that. It is so obvious now! If the standard deviation is
high, you would need a larger sample.

You can try to predict what the standard deviation is going to be, but
you really have to wait until the end of the survey to work it out for sure.
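The link between standard deviation and sample size can be made concrete with the textbook formula for estimating a mean to within ±E: n = (zσ/E)², where σ is a guessed standard deviation. Because σ is squared, doubling it roughly quadruples n. A minimal sketch, assuming a hypothetical 1 to 5 satisfaction score to be estimated within ±0.1 points:

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """n = (z * sigma / margin)^2, rounded up; sigma is a guessed standard deviation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical guesses for the SD of a 1-5 score, target margin +/-0.1 points.
for sigma in (0.5, 1.0, 2.0):
    print(sigma, sample_size_for_mean(sigma, 0.1))
```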

Quote:
While I have you: if I don't use a random sample, i.e. I invite everybody
in the company to take part in an anonymous online survey and only take
the people who reply, how do I determine if my results are meaningful?

That one's easy: they're not. All the statistical techniques you will
learn depend on having a good random sample. If you have a self-
selected sample, there's an excellent chance they are not
representative of the population.

Side question here: when dealing with surveys, aren't the samples always,
to some extent, self-selected anyway?

Not everyone is going to respond to a survey. You can't force them, and
if you could, they might give false information just to get rid of you.

Quote:

I mean if 1% reply I would say the results do not mean much.

Not necessarily. Presidential polls typically sample about 1000
voters, which is well under 1% of all voters, yet they are accurate
to within ±3%. When testing yes/no data, what matters is how many are
in the sample and not what percent of the population is in the
sample.

If 100% reply then I could be confident of the response. How do I
determine what size sample is big enough?

Tell us more about what you're trying to measure, and what
significance level or confidence level you're shooting for.

I don't know. In scientific papers they aim for a 95% confidence interval;
that is the only number I have been exposed to. I suspect that they do
not use such a high confidence level in other fields, but I really have
no idea.
mcap
Posted: Mon Jan 28, 2008 3:52 pm
Guest
Sample size calculations tell you how many subjects you need to find a
hypothesized significant difference (or some other quantity) with an
estimated variability. This tells you nothing about whether the sample
is representative. In fact, it is conceivable that the less
representative your sample is, the more likely you are to get a
significant result.

I don't know if there is a "test" that tells you whether your sample
is representative. You can have a perfectly random selection process
but it is the response that can introduce bias.

You have to look at the characteristics of the part of your sample that
responded and compare them to the characteristics of your target
population. In a company, I could think of several important factors to
match: age, experience, education, proportion from each department
or division, general level or rank, salary, etc. It all depends on
what you are looking at. You could use confidence intervals of your
sample and see if they contain the company average or proportion (if
you have access), but keep in mind that you are penalized for having a
very large sample size: the intervals become so narrow that even trivial
differences will fall outside them.
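The confidence-interval check suggested above can be sketched as follows. The figures are invented for illustration: 48 of 120 respondents come from one division, which (hypothetically, per HR records) employs 35% of the company. The interval is the simple normal-approximation (Wald) interval, one of several possible choices:

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical: 48 of 120 respondents are from a division with 35% of staff.
low, high = proportion_ci(48, 120)
company_share = 0.35
print(f"95% CI: {low:.3f} to {high:.3f}; "
      f"contains company share: {low <= company_share <= high}")
```

With a much larger n the same 40% vs 35% gap would fall outside a narrower interval, which is the large-sample penalty mcap mentions.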

Hope I am at least in the ballpark of what you are looking for.
Opinion polls and commercial surveys do a nice job of producing
representative results. But they have ways of weighting certain
responses based on their target populations. So there is a level of
complexity there that you probably won't get into.
Richard Ulrich
Posted: Mon Jan 28, 2008 11:55 pm
Guest
- taking one digression here -

On Mon, 28 Jan 2008 06:00:49 GMT, Rob <user@example.com> wrote:

[snip, much]
Quote:

Side question here: when dealing with surveys, aren't the samples always,
to some extent, self-selected anyway?

Not everyone is going to respond to a survey. You can't force them, and
if you could, they might give false information just to get rid of you.
[snip, rest]


"Self-selected", sure, if you are talking only about the type
of survey that uses one phone call. By the way, I think that
good political pollers do make more than *one* attempt to
reach a number. And they should keep track of which attempt
succeeded, in case "hard to reach" makes a difference in outcome.

Similarly, there are surveys that use mail for contact. Or
follow up phone calls with mail, etc. - If the "quick responders"
look just like the "slow responders", the pollster will be happier.
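One simple way to check whether quick and slow responders "look alike" on a yes/no question is a pooled two-proportion z-test. A minimal sketch with invented counts (60 of 150 early responders and 25 of 50 late responders answering "yes"); a non-significant difference does not prove there is no nonresponse bias, and this is only one of several checks a pollster might run:

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic and two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical "yes" counts: early responders vs late responders.
z, p = two_proportion_z(60, 150, 25, 50)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```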

The U.S. government publishes surveys of employers, and surveys
of markets, though this is the sort of knowledge-seeking that the
Bush Administration has worked to de-fund. (Republican tax and
economic policies sell better with ignorance.)

--
Rich Ulrich
http://www.pitt.edu/~wpilib/index.html
Rob
Posted: Tue Jan 29, 2008 5:26 am
Guest
Richard Ulrich wrote:
Quote:
- taking one digression here -

On Mon, 28 Jan 2008 06:00:49 GMT, Rob <user@example.com> wrote:

[snip, much]
Side question here: when dealing with surveys, aren't the samples always,
to some extent, self-selected anyway?

Not everyone is going to respond to a survey. You can't force them, and
if you could, they might give false information just to get rid of you.
[snip, rest]

"Self-selected", sure, if you are talking only about the type
of survey that uses one phone call. By the way, I think that
good political pollers do make more than *one* attempt to
reach a number. And they should keep track of which attempt
succeeded, in case "hard to reach" makes a difference in outcome.


I would have thought that hard to reach is not the main problem.
A lot of people do not want to talk to pollsters.

Quote:
Similarly, there are surveys that use mail for contact. Or
follow up phone calls with mail, etc. - If the "quick responders"
look just like the "slow responders", the pollster will be happier.


That is fine if the responder wants to take part and has simply forgotten
to fill in the survey. Some people don't care.

At work when I am required to participate in a survey that is not
anonymous there are certain questions that I will not answer truthfully.

How can they follow up if they do not know who I am?

Quote:
The U.S. government publishes surveys of employers, and surveys
of markets, though this is the sort of knowledge-seeking that the
Bush Administration has worked to de-fund. (Republican tax and
economic policies sell better with ignorance.)

Sounds like you know from personal experience. Sadly, I doubt that when
changes of government occur the damage is ever fully repaired.
Rob
Posted: Tue Jan 29, 2008 5:37 am
Guest
mcap wrote:
Quote:
Sample size calculations tell you how many subjects you need to find a
hypothesized significant difference (or some other quantity) with an
estimated variability. This tells you nothing about a representative
sample. In fact, it is conceivable that the less representative
sample you have, the more likely you are to get a significant
result.

I don't know if there is a "test" that tells you whether your sample
is representative. You can have a perfectly random selection process
but it is the response that can introduce bias.

You have to look at the characteristics of your sample that responded
and compare them to the characteristics of your target population. In
a company, I could think of several important factors to
match: age, experience, education, proportion from each department
or division, general level or rank, salary, etc. It all depends on
what you are looking at. You could use confidence intervals of your
sample and see if they contain the company average or proportion (if
you have access) but then keep in mind, you are penalized for having a
very large sample size.

Hope I am at least in the ballpark of what you are looking for.
Opinion polls and commercial surveys do a nice job of producing representative
results. But they have ways of weighting certain responses based
on their target populations. So there is a level of complexity there
that you probably won't get into.

Interesting ideas. I will be making another post on generational
differences (you know, Baby Boomers, Gen X, Gen Y, etc.). I would be
interested in your feedback on that.

In scientific papers it is always the same: a 95% confidence interval seems
to be pretty well universally accepted. I was hoping for an answer like
"In my organization, when we do employee surveys, we aim for X%
confidence."

What are the names of the formulas used for determining the size of
sample so I can search for them?
z
Posted: Wed Feb 06, 2008 6:53 am
Guest
On Jan 29, 4:37 am, Rob <u...@example.com> wrote:
Quote:
mcap wrote:
[snip]

Interesting ideas. I will be making another post on generational
differences (you know, Baby Boomers, Gen X, Gen Y, etc.). I would be
interested in your feedback on that.

In scientific papers it is always the same: a 95% confidence interval seems
to be pretty well universally accepted. I was hoping for an answer like
"In my organization, when we do employee surveys, we aim for X%
confidence."

What are the names of the formulas used for determining the size of
sample so I can search for them?

Well, the key point is that the size of the original population does
not enter into the calculation of sample size; and, as an extension of
that, the percent sampled doesn't matter, either. The only number that
matters is the absolute size of the sample. It doesn't matter whether you
are estimating the average height of the population of the US or of the
crowd in a diner; the precision of the answer will be the same in either
case, depending only on the number of people you sample.

It's like sampling a liquid; you take a drop out of a test tube to test
the pH. Now, if you want to test the pH of a swimming pool, you don't
need to pull out a couple of gallons in order to get the same
percentage sample; one drop will still do it, provided the pool is well
mixed; see below.

Now, if you want to know how big your sample needs to be with respect
to 95% confidence, or anything like that, you need an a priori guess
of the two other numbers that go into the formula; what your
approximate "answer" will be, mean or percentage or whatever, and what
the approximate standard deviation of that answer will be. Of course,
if you had a good answer, you wouldn't need to take the sample in the
first place, so you have to do the best guess you can and then after
you have your sample, the actual data will let you know what your
confidence interval, precision, whatever is; hopefully if your guess
was right, or more conservative, you will have the precision you
want.
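Once the data are in, the achieved precision can be computed from the sample itself rather than from the a priori guess. A minimal sketch using 30 invented 1 to 5 scores and a normal critical value; with only 30 observations, a t critical value (about 2.05 for 29 degrees of freedom) would give a slightly wider and more appropriate interval:

```python
import math
from statistics import NormalDist, mean, stdev

def mean_ci(data, confidence=0.95):
    """Sample mean and normal-approximation half-width of its confidence interval."""
    m = mean(data)
    s = stdev(data)  # sample standard deviation (n - 1 denominator)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / math.sqrt(len(data))
    return m, half_width

# Hypothetical 1-5 satisfaction scores returned by a survey.
scores = [4, 3, 5, 4, 2, 4, 3, 5, 4, 4, 3, 4, 5, 2, 4, 3, 4, 4, 5, 3,
          4, 3, 4, 5, 4, 3, 2, 4, 4, 5]
m, hw = mean_ci(scores)
print(f"mean = {m:.2f} +/- {hw:.2f}")
```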

The problem, which you seem to be sort of aware of, is whether your
sample is "representative", or "random", etc., vs. biased. As in, is
the liquid you sampled well mixed. That's a tougher thing, and size of
the sample isn't as important as you might think, compared to good
design. Even if you test 100% of the population, theory considers that
to be just one sample of an infinite population; if you test 100% of
the population a few years later, you will undoubtedly see a different
mix to some degree, so your 100% sample is not a perfectly precise
sample.

So about the only rule of thumb available is that nobody pays much
attention to samples of fewer than 20. Thirty would be even better.
After that, it's all up for grabs.

For instance, picking random folks out of the phone book to call seems
random, right? But in fact, reliable pollsters and surveyors nowadays
don't do that; instead they use dialers that just generate random
numbers and dial them, which gives a more random sample than folks who
are listed in the phone book, since the book is subject to bias from
unlisted numbers, people who changed residence, etc. Of course, folks
who have phones are not a random sample of everybody either.....
z
Posted: Wed Feb 06, 2008 6:56 am
Guest
On Jan 29, 4:26 am, Rob <u...@example.com> wrote:

Quote:
At work when I am required to participate in a survey that is not
anonymous there are certain questions that I will not answer truthfully.

How can they follow up if they do not know who I am?

Which brings up the issue of being able to break the anonymity.
Imagine you work in a small company and they survey the employees
anonymously, and find an answer they don't like; how hard could it be
to figure out who in the company might be male, 40-45 years old,
white, makes $50-75,000 and hired in 2005? I've seen that done in one
place I worked. For "good causes" of course, just to pursue the issue
to a more satisfactory solution, not for punitive purposes. But a
promise of anonymity should be, but isn't always, a guarantee.
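The re-identification risk described here can be checked before results are released: count how many respondents share each combination of demographic answers, and any combination held by exactly one person points to that person. This is the idea behind what the privacy literature calls k-anonymity. A minimal sketch with invented records mirroring the example above; in practice, groups that are too small would be merged into coarser bands or suppressed:

```python
from collections import Counter

# Hypothetical survey records: (sex, age band, salary band, hire year).
responses = [
    ("M", "40-45", "50-75k", 2005),
    ("F", "30-35", "50-75k", 2003),
    ("F", "30-35", "50-75k", 2003),
    ("M", "25-30", "25-50k", 2007),
    ("M", "25-30", "25-50k", 2007),
    ("M", "25-30", "25-50k", 2007),
]

group_sizes = Counter(responses)  # how many respondents share each combination
unique = [combo for combo, count in group_sizes.items() if count == 1]
print(f"{len(unique)} respondent(s) uniquely identifiable: {unique}")
```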
 
All times are GMT - 5 Hours