multiple regression question...

prof...
Posted: Mon Oct 19, 2009 11:57 am
In MLR it's not uncommon to have an independent variable X1 which is
significantly correlated with the dependent variable, yet in the
full MLR model the coefficient b1 is not significant. The logical
reason is that X1 may not add significantly to the model in the
presence of the other independent variables.
However, I recently saw an MLR analysis in which X1 was not
significantly correlated with the dependent variable, but in the full
MLR model b1 was significant: just the reverse of the first case. I
don't know how to explain logically to my students what this means,
other than resorting to the possibility of multicollinearity, although
judging by the tolerance values, multicollinearity was not present.
Any help?
Thanks,
Burt

Rich Ulrich...
Posted: Mon Oct 19, 2009 4:37 pm
In the fairly trivial case, a non-significant predictor which
is totally uncorrelated with the other predictor can become
"significant" when taken "after" the other, merely because
the size of the error term is reduced. For instance, if the
new variable accounts for R^2 = .50, then the error variance
of the regression becomes about half what it was before;
so the F-test on the remaining, independent variable will
be about twice as large, and its t statistic (the square root
of F) about 1.4 times as large.
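
A minimal numerical sketch of this error-term argument, assuming NumPy. The sample size, the slope 0.055 and the helper names `residualize` and `slope_t` are illustrative choices, not anything from the analysis under discussion. The sample is constructed so that x1, x2 and the noise are exactly uncorrelated, which keeps the slope of x1 identical in both fits and lets only the error term change.

```python
import numpy as np

def residualize(v, *others):
    """Remove from v its least-squares projection on an intercept and `others`."""
    Z = np.column_stack([np.ones(len(v)), *others])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

def slope_t(y, *predictors):
    """OLS with intercept; return the t statistic of the first predictor's slope."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    mse = resid @ resid / (len(y) - X.shape[1])       # residual variance estimate
    se = np.sqrt(mse * np.linalg.inv(X.T @ X)[1, 1])  # standard error of that slope
    return beta[1] / se

rng = np.random.default_rng(0)
n = 2000
x1 = rng.standard_normal(n)
x2 = residualize(rng.standard_normal(n), x1)      # exactly uncorrelated with x1
e = residualize(rng.standard_normal(n), x1, x2)   # exactly uncorrelated with both
y = 0.055 * x1 + x2 + e                           # x2 carries about half the variance of y

t_alone = slope_t(y, x1)      # x1 as the only predictor
t_full = slope_t(y, x1, x2)   # x1 "after" x2
ratio = t_full / t_alone
print(f"t alone = {t_alone:.2f}, t with x2 = {t_full:.2f}, ratio = {ratio:.2f}")
```

Because the slope is the same in both fits, the ratio of the two t statistics is just the square root of the ratio of the two residual variances, so it should come out near the square root of 2 rather than 2.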
In the more subtle case, two variables may each be weaker
predictors alone than *each* is when both are in the model.
Consider Y = a1*X1 + a2*X2 + e (with e small).
Now add a BIG random error, the same (adjusted) error each
time, to X1 and X2, but in opposite directions. Clearly this will
reduce the size of the univariate correlations of Y with the
new variables, X1' and X2' -- without changing the multiple R.
(The added error needs to be scaled by the relative sizes
of a1 and a2 so that a1*X1' + a2*X2' still equals a1*X1 + a2*X2.)
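
The construction above can be sketched in a few lines, assuming NumPy. The values of a1, a2, the noise scale k and the names `x1_noisy`, `x2_noisy` and `r_squared` are hypothetical, chosen only to make the pattern visible. The shared error u is divided by a1 and by a2 so that it cancels in the weighted sum.

```python
import numpy as np

def r_squared(y, *predictors):
    """R^2 of an OLS fit of y on the predictors (intercept included)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(0)
n, a1, a2, k = 500, 1.0, 2.0, 5.0        # illustrative values
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = a1 * x1 + a2 * x2 + 0.3 * rng.standard_normal(n)   # small e

u = k * rng.standard_normal(n)           # the BIG shared error
x1_noisy = x1 + u / a1                   # added in one direction ...
x2_noisy = x2 - u / a2                   # ... and subtracted in the other

r2_clean = r_squared(y, x1, x2)
r2_noisy = r_squared(y, x1_noisy, x2_noisy)
print("r(Y, X1)  =", round(corr(y, x1), 3), " r(Y, X1') =", round(corr(y, x1_noisy), 3))
print("r(Y, X2)  =", round(corr(y, x2), 3), " r(Y, X2') =", round(corr(y, x2_noisy), 3))
print("R^2 clean =", round(r2_clean, 3), " R^2 noisy =", round(r2_noisy, 3))
```

Note that the noisy pair ends up strongly negatively correlated with each other in this sketch, so it illustrates the mechanism rather than a case that would pass a tolerance check.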
Another way of describing what creates the effect that you
describe is that the once-non-predicting variable is
correlated with the residuals of the first step, despite
having little, no, or even a negative correlation with the
outcome itself.
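
For two predictors this can be written with the three pairwise correlations. The formulas below are the standard results for standardized variables; the notation r_{y1}, r_{y2}, r_{12} is introduced here for illustration and is not taken from the thread. Suppose X2 enters first and \hat{Y}_{(2)} is the fit from Y on X2 alone.

```latex
\[
\operatorname{corr}\bigl(X_1,\; Y - \hat{Y}_{(2)}\bigr)
  = \frac{r_{y1} - r_{y2}\, r_{12}}{\sqrt{1 - r_{y2}^{2}}},
\qquad
\beta_1 = \frac{r_{y1} - r_{y2}\, r_{12}}{1 - r_{12}^{2}} .
\]
```

With r_{y1} = 0 both numerators reduce to -r_{y2} r_{12}, which is nonzero whenever X1 is correlated with X2 and X2 is correlated with Y. So a zero simple correlation does not force a zero coefficient.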
--
Rich Ulrich

Jeff Miller...
Posted: Tue Oct 20, 2009 1:28 pm
My students (in Psych) seem to get some help from
an example where you are predicting a daughter's height from
the mother's height and from "the mother's socio-economic status
when she was a child"--the latter variable being a suppressor.
You can read a detailed discussion of the example here (see page 184):
http://bitweb.tekotago.ac.nz/staticdata/MillerHadenGLM16Feb2006.pdf
You might also find something useful in one of these articles:
Smith, R. L., Ager, J. W., & Williams, D. L. (1992). Suppressor
variables in multiple regression/correlation. Educational and
Psychological Measurement, 52, 17-29.
Tzelgov, J., & Henik, V. (1991). Suppression situations in
psychological research. Psychological Bulletin, 109, 524-536.
Finally, I'd be interested to hear what you find works well
if you are willing to report back, as I also find this devilishly
difficult to teach.
Jeff Miller

Ray Koopman...
Posted: Tue Oct 20, 2009 2:56 pm
My recollection of one of Lloyd Humphreys' examples of suppression,
based on his experience in the US aircrew selection program in WWII:
Mechanical ability mattered. (The best simple predictor (single yes/no
question) of success in pilot training was "Have you ever ridden and
maintained a motorcycle?") So they used a standard test of mechanical
aptitude as an initial screen. However, it had to be a paper-and-pencil
test, because they couldn't manage a hands-on test with actual
hardware. That raised the problem that part of what the test measured
was paper-and-pencil test-taking ability. The most convenient measure
of that was a vocabulary test, which then got a negative weight in
the regression equation, even though it correlated positively with
success in flight school. Subtracting the vocabulary score from the
mechanical score suppressed the test-taking-ability component of the
mechanical score, making it a purer measure of mechanical ability.
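
A simulated version of this story, as a rough sketch assuming NumPy. Every number here is invented for illustration and none of it is Humphreys' data: the latent variables `mech` and `test_taking`, the weight 0.3 that lets test-taking skill help slightly in flight school, and the noise levels are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
mech = rng.standard_normal(n)          # true mechanical ability (not observed)
test_taking = rng.standard_normal(n)   # paper-and-pencil skill (not observed)

mech_test = mech + test_taking                        # the paper test mixes both
vocab = test_taking + 0.5 * rng.standard_normal(n)    # vocabulary tracks test-taking only
success = mech + 0.3 * test_taking + rng.standard_normal(n)

# Simple correlation of vocabulary with success
r_vocab = np.corrcoef(vocab, success)[0, 1]

# Regression of success on both observed scores (with intercept)
X = np.column_stack([np.ones(n), mech_test, vocab])
coef = np.linalg.lstsq(X, success, rcond=None)[0]
b_mech, b_vocab = coef[1], coef[2]

print(f"r(vocab, success) = {r_vocab:.3f}")
print(f"weights: mechanical = {b_mech:.3f}, vocabulary = {b_vocab:.3f}")
```

Under these assumptions vocabulary correlates positively with success on its own, yet takes a negative weight once the mechanical test is in the equation, because its job there is to subtract the test-taking component.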

Rich Ulrich...
Posted: Tue Oct 20, 2009 4:57 pm
- I received a brief e-mail which mentioned again the difficulty of
teaching this to business students, and I will add a comment
which may help more directly than what I already posted.
- AND, the presence of this pattern ought to work as a
sort of "diagnostic", warning that it may be better
to describe the problem in terms of a different set of
variables.
"Multicollinearity" would be present if there were an
additional variable representing (say) a sort of
difference of X1 and X2, included alongside X1 and X2; the
suppressor pattern arises when that difference is what is predictive.
I'm not surprised that this arises in business and economics,
because there are so *many* variables to choose from,
depending on how you parse the economy. The GDP
(gross domestic product) differs from the GNP ("national")
by some term ... which could be included in place of one of
the others, without loss of prediction. But that difference
has various categories as well, and there have been
discussions about what should be included. (Are both GDP
and GNP still viable categories?)
So, all you need for a "suppressor" relationship to arise
is that you selected several variables to use, avoiding
joint multicollinearity, but happened to omit
the particular measurement that wants to be the best
predictor of the outcome.
Or, if that variable was never defined, perhaps it *should*
be defined.
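
One way to see this, as a sketch assuming NumPy: let the outcome depend only on a weighted difference of two moderately correlated predictors. The correlation rho = 0.6, the derived variable `d` and the helper `fit` are hypothetical choices made for illustration.

```python
import numpy as np

def fit(y, *predictors):
    """OLS with intercept; return (slopes, R^2)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    return beta[1:], 1.0 - (resid @ resid) / (centered @ centered)

rng = np.random.default_rng(0)
n, rho = 2000, 0.6
x1 = rng.standard_normal(n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)   # corr(x1, x2) near 0.6
d = x2 - rho * x1                       # the never-defined variable that really predicts
y = d + 0.8 * rng.standard_normal(n)

r_y_x1 = np.corrcoef(y, x1)[0, 1]                     # simple correlation of X1 with Y
tolerance = 1.0 - np.corrcoef(x1, x2)[0, 1] ** 2      # tolerance with two predictors
slopes, r2_full = fit(y, x1, x2)
_, r2_diff = fit(y, d)

print(f"r(Y, X1) = {r_y_x1:.3f}, tolerance = {tolerance:.2f}")
print(f"b1 = {slopes[0]:.3f}, b2 = {slopes[1]:.3f}")
print(f"R^2 with X1 and X2 = {r2_full:.3f}, R^2 with d alone = {r2_diff:.3f}")
```

Here X1 is nearly uncorrelated with Y and the tolerance is unremarkable, yet b1 is clearly nonzero, and the single derived variable d fits essentially as well as X1 and X2 together.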
--
Rich Ulrich

prof...
Posted: Wed Oct 21, 2009 6:21 am
Thanks for all the comments. They have been very helpful.
Burt

David Duffy...
Posted: Wed Oct 21, 2009 1:55 pm
It's easier to comprehend, IMHO, with categorical data, where it is called
Simpson's paradox.
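
A small categorical illustration in plain Python. The counts are made up for this sketch and the labels "A", "B", "stratum 1" and "stratum 2" are hypothetical: group A does better than group B within each stratum, yet worse when the strata are pooled, because A is concentrated in the stratum where success is rare.

```python
# (successes, trials) for each group within each stratum; hypothetical counts
counts = {
    "stratum 1": {"A": (90, 100), "B": (800, 1000)},
    "stratum 2": {"A": (300, 1000), "B": (20, 100)},
}

# Success rate of each group within each stratum
within = {
    stratum: {group: s / n for group, (s, n) in groups.items()}
    for stratum, groups in counts.items()
}

# Success rate of each group after pooling the strata
pooled = {}
for group in ("A", "B"):
    successes = sum(counts[stratum][group][0] for stratum in counts)
    trials = sum(counts[stratum][group][1] for stratum in counts)
    pooled[group] = successes / trials

for stratum, rates in within.items():
    print(stratum, {g: round(r, 3) for g, r in rates.items()})
print("pooled   ", {g: round(r, 3) for g, r in pooled.items()})
```

The stratum variable plays the role of the second predictor in the regression setting: ignoring it changes the apparent direction of the group effect.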
Cheers, David Duffy.
All times are GMT - 5 Hours