Collinearity, confidence intervals and sampling...
reflex (Guest)
Posted: Wed May 14, 2008 3:43 pm
Ok, I've been doing some more research on this whole collinearity thing and
read that if you have collinear variables, the best fitting plane of the
data points in a regression will be narrower and less anchored (because the
predictors are highly correlated so the predictor values fall in a straight
line). Consequently, if response varied from sample to sample, the
coefficients could change substantially. Therefore the standard errors of
the coefficients are necessarily larger.
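A minimal sketch of that effect, assuming simulated predictors and a known residual standard deviation. The function name, sample size and correlation values are illustrative assumptions. It uses the standard result that, in a regression on two predictors plus an intercept, the variance of a slope is inflated by 1 / (1 - r^2), where r is the sample correlation between the predictors:

```python
import math
import random


def slope_standard_error(rho, n=200, sigma=1.0, seed=0):
    """Standard error of the coefficient on x1 in y ~ 1 + x1 + x2.

    x1 and x2 are simulated with population correlation rho; sigma is the
    residual standard deviation, assumed known for this illustration.
    """
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x1]

    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))

    r_squared = s12 ** 2 / (s11 * s22)   # squared sample correlation
    vif = 1 / (1 - r_squared)            # variance inflation factor
    return sigma * math.sqrt(vif / s11)


print("uncorrelated predictors:", slope_standard_error(0.0))
print("highly collinear predictors:", slope_standard_error(0.95))
```

With the same sample size and the same noise level, the standard error is several times larger when the predictors are nearly collinear, which is the "less anchored plane" in numerical form.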
Does this mean that this is not a problem if you have population-level data
(i.e. sample size doesn't matter because you have 'sampled' the entire
population you are interested in)?
Are there other effects of collinearity that do not matter if you have
population-level data? What about other assumptions of regression, e.g.
normal distribution of variables, homoskedasticity?
The website I've been looking at is
http://www.stat.psu.edu/~jglenn/stat501/12multicollinearity/04multico_corr.html
which is an excellent source on collinearity.
As always, any replies well appreciated.

Richard Ulrich (Guest)
Posted: Thu May 15, 2008 12:31 am
On Wed, 14 May 2008 21:43:06 +0100, "reflex" <sdfs at (no spam) sdfsd.com> wrote:
Quote: Ok, I've been doing some more research on this whole collinearity thing and
read that if you have collinear variables, the best fitting plane of the
data points in a regression will be narrower and less anchored (because the
predictors are highly correlated so the predictor values fall in a straight
line). Consequently, if response varied from sample to sample, the
coefficients could change substantially. Therefore the standard errors of
the coefficients are necessarily larger.
Does this mean that this is not a problem if you have population-level data
(i.e. sample size doesn't matter because you have 'sampled' the entire
population you are interested in)?
That's a slightly-true observation, with no real application.
With true Population level data, you might have "measurement
error" but you have no "statistical error." This is like the
results of taking a vote, as compared to taking an opinion poll.
("Recounts" are used to reduce "measurement error" in votes.)
With true population data, or data treated as such, you have no
role for inference or generalization or the direct application
of science; you have an administrative tool.
Basically - If you are hoping to say anything interesting to
almost anybody else, you are treating some "population" as
a sample. So, unless there is special reason, you never will
treat a population as a "population."
If you want more discussion, you might Google-groups and
look at threads found by
< groups:sci.stat.* "finite population" author:ulrich >
--
Rich Ulrich
http://www.pitt.edu/~wpilib/index.html

reflex (Guest)
Posted: Fri May 16, 2008 3:54 am
Say if you had a population sample of all hospitals in England, and you
wanted to say something interesting about all hospitals in England, then you
wouldn't need to generalise to a wider population because you know the whole
population. Surely that's a real application?
Cheers

Richard Ulrich (Guest)
Posted: Fri May 16, 2008 5:04 pm
On Fri, 16 May 2008 09:54:15 +0100, "reflex" <sdfs at (no spam) sdfsd.com> wrote:
Quote:
Say if you had a population sample of all hospitals in England, and you
wanted to say something interesting about all hospitals in England, then you
wouldn't need to generalise to a wider population because you know the whole
population. Surely that's a real application?
What can you say about "all the hospitals in England"
in the year 2007 that *remains interesting to people*, if
you are unable to extrapolate or infer something about
the hospitals in the year 2008?
- or any other hospitals in any time or place....
Come up with something *interesting*, and I think I
can show you that you are drawing inferences.
--
Rich Ulrich
http://www.pitt.edu/~wpilib/index.html

reflex (Guest)
Posted: Sat May 17, 2008 7:04 am
The population of interest is what you are trying to generalise to in the
first place though, isn't it? So in taking a random sample from a set
population (hospitals in England) you can arguably generalise to all
hospitals (assuming your sample is good/representative enough). And if you
have data on the whole population, no generalising is needed, and hence this
is a good thing. You *are* inferring about all hospitals in the future of
course, but aren't you doing this with all research? You can't sample the
future, yet! So it is preferable to have population-level data and draw
inferences about the future than have a random sample of that population and
draw inferences about the future.

Richard Ulrich (Guest)
Posted: Sat May 17, 2008 6:58 pm
On Sat, 17 May 2008 13:04:35 +0100, "reflex" <sdfs at (no spam) sdfsd.com> wrote:
Quote:
The population of interest is what you are trying to generalise to in the
first place though isn't it? So in taking a random sample from a set
population (hospitals in England) you can arguably generalise to all
hospitals (assuming your sample is good/representative enough). And if you
have data on the whole population, no generalising is needed, and hence this
is a good thing. You *are* inferring about all hospitals in the future of
course, but aren't you doing this with all research? You can't sample the
future, yet! So it is preferable to have population-level data and draw
inferences about the future than have a random sample of that population and
draw inferences about the future.
You miss the point about drawing inferences.
Re-read some posts. Do some Googling-groups as I suggested
if it remains opaque.
Drawing "inferences" says that whatever is on hand is being
treated as a "sample". So, even if it is a full set of the hospitals
of England, you use the confidence limits as if it were any
other tiny sample.
If you do just want to describe a fixed set of hospitals, it is
like reporting any vote, that is, the only error is in counting.
If it is supposed to generalize or have another meaning, then
it is a sample. From a population on hand, you know how
many meals to order, etc. Or how many were eaten for the
term. You don't know about next year.
--
Rich Ulrich
http://www.pitt.edu/~wpilib/index.html

reflex (Guest)
Posted: Sun May 18, 2008 4:50 am
I'm really busy with revision atm (not related to populations) so I don't
have time to search the groups, but I will do in the future as I'm quite
confused over this issue. For example, I have an assignment soon, for which
we have been given a population-level dataset, which says that all tests of
significance are irrelevant so don't talk about them in the analysis.
What I would say is that the ordering meals example is very specific and
therefore it's harder to generalise to next year. But you can still get an
idea. If however your analysis demonstrated something much more general,
say that patient satisfaction is determined by the socio-economic
characteristics of the local area and not management organisation (I can't
think of a better example but you get the idea) then commonsense tells you
this is likely to apply to next year.

Richard Ulrich (Guest)
Posted: Mon May 19, 2008 6:51 pm
On Sun, 18 May 2008 10:50:18 +0100, "reflex" <sdfs at (no spam) sdfsd.com> wrote:
[snip, previous comments. On Finite Populations.]
Quote:
I'm really busy with revision atm (not related to populations) so I don't
have time to search the groups but I will do in the future as I'm quite
confused over this issue. For example I have an assignment soon that we
have been given a population-level dataset for which says that all tests of
significance are irrelevant so don't talk about them in the analysis.
"All tests are irrelevant" is an acceptable mandate
when the sample is very large, which is what "populations"
frequently are. The New England Journal of Medicine, many
years ago, told authors to omit tests when reporting huge
health surveys. Especially when everything trivial comes out
as nominally "significant", the proper approach is to describe
effect sizes.
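A minimal sketch of why huge samples make trivial effects nominally "significant", assuming two equal-sized groups and the large-sample normal approximation. The function name and the effect size of 0.02 standard deviations are hypothetical choices for illustration:

```python
import math


def two_sample_z_p_value(d, n_per_group):
    """Two-sided p-value for a standardized mean difference d between two
    equal-sized groups, using the large-sample normal approximation."""
    z = d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))


tiny_effect = 0.02  # hypothetical: 2% of a standard deviation

for n in (100, 10_000, 1_000_000):
    print(n, two_sample_z_p_value(tiny_effect, n))
```

The effect size stays fixed at 0.02 while the p-value collapses as n grows, which is why describing the effect size is more informative than reporting the test.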
- Either your population is *large* (and not just a "population")
and you have elided the instructions, or (a) you are dealing with
an odd question, or (b) you have a questionable textbook or
instructor. Is this assignment from a text that can be named?
Quote:
What I would say is that the ordering meals example is very specific and
therefore it's harder to generalise to next year. But you can still get an
idea.
Finite: You use the numbers to pay for last year's meals.
For next year: You are "generalizing" from one year's data as
a "sample", to the next year's data. No longer a Finite population.
Quote: If however your analysis demonstrated something much more general,
say that patient satisfaction is determined by the socio-economic
characteristics of the local area and not management organisation (I can't
think of a better example but you get the idea) then commonsense tells you
this is likely to apply to next year.
Again - if you are applying inference to the next year, you
are treating the Population as a sample. Year-to-year correlation
might be estimated and applied in some fashion in order to
reduce the standard deviations, but SDs on the estimates are
still appropriate.
--
Rich Ulrich
http://www.pitt.edu/~wpilib/index.html

Hylton Boothroyd (Guest)
Posted: Tue May 20, 2008 4:46 am
reflex <sdfs at (no spam) sdfsd.com> wrote:
Quote: Say if you had a population sample of all hospitals in England, and you
wanted to say something interesting about all hospitals in England, then you
wouldn't need to generalise to a wider population because you know the whole
population. Surely that's a real application?
There are various ways of looking at how the hospitals in England got
there!
You are in the "They are there, so what more is to be said" camp.
Which is the least imaginative point of view of all, but the way in
which the majority of people think. So, life being what it is, a
proportion of professional statisticians think exactly like that, and
get very cross if asked to think in any other way. As similarly do a
proportion of academic/teaching statisticians.
[Remark: One view of the origins of crossness in the human mind is that
it results from the person who is cross not having any relevant concept
in his/her head to deal with the situation. The crossness emerges in
bluster, posturing, contempt ... but only rarely calms into humility!]
Another, far-richer but far-more-complex, point of view is that the
hospitals that are actually there are there as the outcome of some
hidden (partly probabilistic) process, which in re-runs of life starting
from the same point, say 50 years ago, could have turned out in a whole
variety of ways.
It would be useful to establish things about that process
- to foresee how it might develop,
- to intervene at well-judged points to change the likely outcomes.
- and just to understand how things happen.
So the intellectual challenge is to model how the hospitals got into
their present state, and to do so in ways which leave you with a useful
model for exploring policy issues for the future.
In that sense, the current set of hospitals is simply a sample of the
"might-have-been"s to give you a clue about the future set of
"could-be"s.
I'm in the camp that thinks that if you aren't at least trying to think
in those terms you haven't begun to be potentially useful. Facts are
useless without an interpretive context.
--
Hylton

reflex (Guest)
Posted: Tue May 20, 2008 8:42 am
Whoa there! I'm not cross! I'm simply a *social science* student trying to
get my head round some statistical issues. I emphasise social science
because we are not taught statistics to the degree that statisticians are
(obviously) so the general knowledge level of this group is far beyond
anything I've been taught.
I realise that having population-level data does not mean that there is
nothing else to be said. Inferences, hypotheses and theories are based upon
analysis of the data. My point was simply regarding the statistical side of
things, in that if you have population-level data, as I've been taught, you
don't need to worry about the quality of your sample and the associated
tests of significance. Hence why I was asking whether or not collinearity
and confidence intervals are relevant if you have population-level data.
Quote: So the intellectual challenge is to model how the hospitals got into
their present state, and to do so in ways which leaves you with a useful
model for exploring policy issues for the future.
In that sense, the current set of hospitals is simply a sample of the
"might-have-been"s to give you a clue about the future set of
"could-be"s.
Do you have examples of any such research? I find the way you define
population-level data as a sample of 'might-have-beens' interesting but hard
to fully understand without examples. What if research is not interested in
how things came into being though? Could policy decisions not arise simply
from how things are now? And does this mean tests of significance are still
relevant?
As you can tell, I'm new to all this and I've definitely not been taught the way
you are looking at things (as a statistician I guess).

reflex (Guest)
Posted: Tue May 20, 2008 8:49 am
"Richard Ulrich" <Rich.Ulrich at (no spam) comcast.net> wrote in message
news:mo3434hkic2s93942og5mf8r19s7p4s25p at (no spam) 4ax.com...
Quote: On Sun, 18 May 2008 10:50:18 +0100, "reflex" <sdfs at (no spam) sdfsd.com> wrote:
[snip, previous comments. On Finite Populations.]
I'm really busy with revision atm (not related to populations) so I don't
have time to search the groups but I will do in the future as I'm quite
confused over this issue. For example I have an assignment soon that we
have been given a population-level dataset for which says that all tests
of
significance are irrelevant so don't talk about them in the analysis.
"All tests are irrelevant" is an acceptable mandate
when the sample is very large, which is what "populations"
frequently are. The New England Journal of Medicine, many
years ago, told authors to omit tests when reporting huge
health surveys. Especially when everything trivial comes out
as nominally "significant", the proper approach is to describe
effect sizes.
- Either your population is *large* (and not just a "population")
and you have elided the instructions, or (a) you are dealing with
an odd question, or (b) you have a questionable textbook or
instructor. Is this assignment from a text that can be named?
The population is all schools in a certain area, several thousand.
Quote:
What I would say is that the ordering meals example is very specific and
therefore it's harder to generalise to next year. But you can still get
an
idea.
Finite: You use the numbers to pay for last year's meals.
For next year: You are "generalizing" from one year's data as
a "sample", to the next year's data. No longer a Finite population.
I see what you're saying here - your population of interest changes the
moment you draw inferences beyond it, right? But it is ok to do this,
right? And what statistical tests would justify doing this or be
appropriate in doing this, if any?
Quote: If however your analysis demonstrated something much more general,
say that patient satisfaction is determined by the socio-economic
characteristics of the local area and not management organisation (I
can't
think of a better example but you get the idea) then commonsense tells
you
this is likely to apply to next year.
Again - if you are applying inference to the next year, you
are treating the Population as a sample. Year-to-year correlation
might be estimated and applied in some fashion in order to
reduce the standard deviations, but SDs on the estimates are
still appropriate.
So the 'specificity' of your inference makes no difference to whether it's
justified or not? (As in, if I was to infer that next year all hospitals
would require X meals, this estimate would likely be out due to many
factors, but if I was to infer that next year patient satisfaction will be
determined by the socio-economic characteristics of the local area as they
are this year, then commonsense says such things hold year on year).
Thanks again for your comments and patience with a novice!

Hylton Boothroyd (Guest)
Posted: Wed May 21, 2008 4:13 pm
reflex <sdfs at (no spam) sdfsd.com> wrote:
Quote: "Hylton Boothroyd" <hylton.boothroyd at (no spam) null.c0m
There are various ways of looking at how the hospitals in England got
there!
You are in the "They are there, so what more is to be said" camp.
Which is the least imaginative point of view of all, but the way in
which the majority of people think. So, life being what it is, a
proportion of professional statisticians think exactly like that, and
get very cross if asked to think in any other way. As similarly do a
proportion of academic/teaching statisticians.
Whoa there! I'm not cross!
I didn't think you were. But I can see what I wrote could be read that
way.
I was remembering some pretty sharp disagreements I'd had
with some members of both kinds of expert, and wanted to
indicate that the split of view is not so much a split between
beginners and experts as a split between people with
characteristically different ways of looking at things. And in
particular, a great variation in the conceptual frameworks with which
they consciously or unconsciously approach the analysis of data.
[now selecting a little, re-ordering a little, and commenting a little]
Quote: Another, far-richer but far-more-complex, point of view is that the
hospitals that are actually there are there as the outcome of some
hidden (partly probabilistic) process, which in re-runs of life
starting from the same point, say 50 years ago, could have turned
out in a whole variety of ways.
In that sense, the current set of hospitals is simply a sample of the
"might-have-been"s to give you a clue about the future set of
"could-be"s.
My point was simply regarding the statistical side of
things, in that if you have population-level data,
as I've been taught, you don't need to worry about the
quality of your sample and the associated tests of significance.
Hence why I was asking whether or not collinearity
and confidence intervals are relevant if you have
population-level data.
I'll leave others to comment on collinearity.
All sorts of things can be said about even the simplest ideas of
sampling.
When I decided to include a short module on sampling for second year
undergraduate Management Science students some 15 years ago, I found
that I wanted to alert them to all kinds of issues that didn't seem to
be brought out in the texts that they already had for other aspects of
statistics.
I'd gradually come to feel that confidence intervals ought to come with
health warnings that start with something like:
"If I've correctly imagined the characteristic structure
of the population data, and if I've sampled from
the population in a random way, and if each observation
was correctly taken, then ..."
Sure, I know that you'd really want to quote an all-in view of the
confidence you have in the figure, but there aren't any tools for
doing that.
At its simplest, if each of the N elements in the population has a
precisely observable number attached to it, and if you collect the full
set of numbers, then you know the full set of numbers exactly. So,
among other things you know the value of the total of the numbers
exactly and the value of the mean of the numbers exactly. Which means
that the widths of the confidence intervals for those two measures
are zero.
If you only collect the numbers for n elements you don't know any of the
population characteristics exactly. But in the special case that
- the n elements have been chosen randomly ...
- the set of attached numbers for the whole population
is nicely behaved (that is the histogram of the set of
N numbers is [in truth but unknown to you] a nice little hump
with modest tails at each side) ...
then you can use a simple formula to calculate confidence intervals for
the unknown mean and total of the numbers. The width of the interval is
approximately proportional to the square root of
(1/n - 1/N)
which means that as n reaches N, the width of the confidence interval
drops to zero (which agrees with the previous paragraph).
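A minimal sketch of that formula, assuming simple random sampling without replacement. The function name and the example numbers are illustrative assumptions. The standard error of the sample mean is s * sqrt(1/n - 1/N), where s is the standard deviation of the observed values:

```python
import math


def mean_standard_error(s, n, N):
    """Standard error of a sample mean when n elements are drawn at random,
    without replacement, from a finite population of N elements.

    s is the standard deviation of the sampled values.
    """
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return s * math.sqrt(1 / n - 1 / N)


N = 100  # hypothetical population size
for n in (10, 25, 50, 100):
    print(n, mean_standard_error(10.0, n, N))
```

The last line of output is zero: once every element has been observed (n equals N), nothing is left to estimate, so the interval for the mean of that fixed set has no width.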
If the population, unbeknown to you, includes a few wildly different
numbers from the rest, then most bets are off!
Quote: So the intellectual challenge is to model how the hospitals got into
their present state, and to do so in ways which leaves you with a
useful model for exploring policy issues for the future.
In that sense, the current set of hospitals is simply a sample
of the "might-have-been"s to give you a clue about the future
set of "could-be"s.
Do you have examples of any such research?
Nothing simple. An example I'm most familiar with, and which was
treated in a similar way by the US Air Force, was in the provision of
engineering spares where a group of us
devised and fitted a model that at the heart of it
- treated the demand for a particular spare as
a stream of randomly timed events with a particular
ongoing unknown intensity, for which we had data to date,
but which for most spares left quite a range of uncertainty
about the unknown intensity,
- treated the combined distribution of the unknown intensities
of all the spares as having a mathematical form that could be
estimated for a whole family of engineering spares,
- estimated it from the whole collection of data,
- devised and put in place a computer programme that
accepted start date, current date, use-to-date
of each spare and estimated the probability distributions
of future requirements during future periods,
- injected that data model into a decision model for
controlling the acquisition of spares.
In using a structure of analysis like that, we were allowing for the
inherent variability in a stream of randomly timed events, and also for
the degree of ignorance about the actual intensity for each spare, given
that all we had was actual use-to-date (which could by chance have been
above or below the underlying intensity).
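A minimal sketch of one way such a structure can be written down. This is not the model described above, whose mathematical form is not given. It assumes, purely for illustration, that demand for a spare is a Poisson stream and that the unknown intensities across the family of spares follow a gamma distribution; the function names and the prior values are hypothetical:

```python
def update_intensity(prior_shape, prior_rate, uses_to_date, years_observed):
    """Gamma posterior for a spare's demand intensity (events per year),
    given a gamma prior fitted to the whole family of spares and this
    spare's own use-to-date."""
    return prior_shape + uses_to_date, prior_rate + years_observed


def prob_no_demand(shape, rate, horizon_years):
    """Predictive probability of zero demands over the coming horizon,
    averaging the Poisson probability over the gamma uncertainty."""
    return (rate / (rate + horizon_years)) ** shape


# Hypothetical family-level prior: shape 1.0, rate 1.0 (mean 1 use per year).
# This spare has not been used at all in its first two years.
shape, rate = update_intensity(1.0, 1.0, uses_to_date=0, years_observed=2.0)

p_none = prob_no_demand(shape, rate, horizon_years=1.0)
print("P(at least one demand next year):", 1 - p_none)
```

Even with no use to date, the probability of a demand in the coming year is not zero, because the observed record cannot rule out a modest underlying intensity. That is the kind of reasoning that can justify keeping a cheap spare on the shelf.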
One of the more amusing outcomes of the decision model was that
it led to rules like
"Even if you haven't used one of these in the first two years,
keep one on the shelf if it costs less than £25"
Quote: Could policy decisions not arise simply
from how things are now?
I think one could imagine contexts in which it would feel like that. And
socially anything is possible!
But change requires you to have a competing picture of how things might
be. Which is an altogether different mental activity from drawing
inferences about the current state of things and/or the immediate past.
The competing pictures of how things might be are _not_ logical
consequences of how things are/were. There is a piece of fresh
imagination in there.
Which is why the academic instinct to say "More research is needed" can
often seem just to be putting off the day of creative thinking.
Quote: And does this mean tests of significance are still relevant?
Like others in the thread, I don't see what tests of significance you
can do on just a straight description of how things are!
--
Hylton
| Richard Ulrich... |
Posted: Wed May 21, 2008 11:11 pm |
|
|
|
Guest
|
On Tue, 20 May 2008 14:49:48 +0100, "reflex" <sdfs at (no spam) sdfsd.com> wrote:
Quote:
"Richard Ulrich" <Rich.Ulrich at (no spam) comcast.net> wrote in message
news:mo3434hkic2s93942og5mf8r19s7p4s25p at (no spam) 4ax.com...
On Sun, 18 May 2008 10:50:18 +0100, "reflex" <sdfs at (no spam) sdfsd.com> wrote:
[snip, previous comments. On Finite Populations.]
[snip. Again]
me
Quote: Finite: You use the numbers to pay for last year's meals.
For next year: You are "generalizing" from one year's data as
a "sample", to the next year's data. No longer a Finite population.
reflex
I see what you're saying here - your population of interest changes the
moment you draw inferences beyond it, right?
To be clear -- Your "population" changes, effectively, to
a "sample". So if you make an inference, you need to
have computed tests, confidence intervals, or whatever
it is that suits the discussion.
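A minimal sketch of the distinction, in Python. The meal counts and the function name are hypothetical; the only formula used is the standard finite population correction for a simple random sample drawn without replacement.

```python
import math


def mean_with_se(values, population_size=None):
    """Mean and its standard error.

    population_size=None treats the data as a sample from an open-ended
    population (for example, next year's hospitals).
    population_size=N applies the finite population correction, so the
    standard error shrinks to zero when the data are the whole population.
    """
    n = len(values)
    m = sum(values) / n
    s2 = sum((x - m) ** 2 for x in values) / (n - 1)  # sample variance
    se = math.sqrt(s2 / n)
    if population_size is not None:
        se *= math.sqrt(1.0 - n / population_size)
    return m, se


# Hypothetical meals served last year by every hospital in the region.
meals = [1200, 950, 1430, 1100, 1320]

# Paying for last year's meals: the five hospitals ARE the population.
mean_finite, se_finite = mean_with_se(meals, population_size=len(meals))

# Budgeting for next year: the same numbers are now only a sample.
mean_sample, se_sample = mean_with_se(meals)
```

The mean is 1200 either way. The standard error is exactly zero in the first call and about 84 meals in the second, which is the whole difference between describing and inferring.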
Quote: But it is ok to do this,
right? And what statistical tests would justify doing this or be
appropriate in doing this, if any?
? (You seem lost. Perhaps you should set the discussion
aside until you have practical examples to sink your teeth
into.)
Quote:
If however your analysis demonstrated something much more general,
say that patient satisfaction is determined by the socio-economic
characteristics of the local area and not management organisation (I
can't think of a better example but you get the idea) then commonsense
tells you this is likely to apply to next year.
me
Again - if you are applying inference to the next year, you
are treating the Population as a sample. Year-to-year correlation
might be estimated and applied in some fashion in order to
reduce the standard deviations, but SDs on the estimates are
still appropriate.
reflex
So the 'specificity' of your inference makes no difference to whether it's
justified or not?
Huh? If you draw an inference, you should have used
sample statistics on whatever-it-is you are inferring from.
Quote: (As in, if I was to infer that next year all hospitals
would require X meals, this estimate would likely be out due to many
factors,
- You could use the estimate, with a confidence interval,
if you assume that "factors are generally the same." Or if
factors are different, you could "model" it all; or you could
simply assert that there are too many changes for any reliable
estimate.
Quote: but if I was to infer that next year patient satisfaction will be
determined by the socio-economic characteristics of the local area as they
are this year, then commonsense says such things hold year on year).
There are so many things wrong with that statement....
Let's see. A decent report might conclude, "There is an
*association* between socio-economic characteristics
and the self-reports of patient satisfaction." Causation is
much tougher to approach. Even if there is a huge effect
size.
Would anyone dare to make that statement for a large
population with a tiny difference between "satisfaction"
as reported by patients of different social classes?
- I think, with data in hand, it would be clearer to make
the distinctions between "large samples" which need
estimates of effect sizes, and "Populations" which might
only need description -- assuming that there is some
self-absorbed use to be made of the description. (I don't
imagine any good use for this particular "finding" as
a description.)
Let's see. A tiny Population might have a fairly large
difference of some sort, which would barely (or not) be
statistically significant. It *might* be interesting in some
regard. But Finite Population statistics are mainly interesting
for some sort of bureaucrat-administrators; for their manipulation
or use of that set of people or creatures, or whatever.
To conclude that some particular finding might apply to
another "Population" -- thus making both into "samples" --
the statistical test should be significant. That includes
holding up across the sampling from year to year. (If they
are exactly the same *people* in the sample, that would
introduce correlation, making the prediction easier or
more accurate.)
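A minimal sketch of why that correlation helps, in Python. The satisfaction scores and the function name are hypothetical; the formula is the usual variance of a difference of two means, with the covariance term kept when the same units are measured in both years.

```python
import math


def se_of_change(year1, year2, same_units):
    """Standard error of the change in mean between two years.

    same_units=True keeps the year-to-year covariance (same people
    measured twice); same_units=False treats the years as two
    independent samples of equal size.
    """
    n = len(year1)
    m1 = sum(year1) / n
    m2 = sum(year2) / n
    v1 = sum((x - m1) ** 2 for x in year1) / (n - 1)
    v2 = sum((y - m2) ** 2 for y in year2) / (n - 1)
    cov = 0.0
    if same_units:
        cov = sum((x - m1) * (y - m2) for x, y in zip(year1, year2)) / (n - 1)
    return math.sqrt((v1 + v2 - 2.0 * cov) / n)


# Hypothetical satisfaction scores for the same five people in two years.
scores_y1 = [62, 70, 55, 81, 74]
scores_y2 = [64, 73, 54, 85, 75]

se_paired = se_of_change(scores_y1, scores_y2, same_units=True)
se_unpaired = se_of_change(scores_y1, scores_y2, same_units=False)
```

With these made-up scores the standard error of the change is about 0.86 when the pairing is used and about 6.9 when it is ignored. The estimate gets tighter, but it is still an estimate with a standard error attached.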
--
Rich Ulrich
http://www.pitt.edu/~wpilib/index.html
All times are GMT - 5 Hours