More (me vs.) multiple regression...
reflex...
Posted: Fri May 09, 2008 3:58 pm
Guest
First off thanks Richard Ulrich for answering my other recently posted
question.

This multiple regression business is overwhelming me at the minute and I
wonder if any of the writers of the textbooks I've come across so far
understand it fully enough to offer a concise, simple, no-nonsense
explanation. Either they don't understand it themselves or they are not
good at explaining things. If I understood it I'm sure I could offer an
easier-to-understand explanation than the textbooks I've seen so far. They
don't illustrate things with examples throughout or offer step by step
guides. Does anyone know any really easy-to-understand books or articles that
don't get off on using technical terms? I'm not a statistician but a social
scientist so I'm not interested in the formulas per se.

As an example, why aren't there any books (that I've found anyway) that say
how to carry out a regression analysis step by step? Surely this would make so
much sense:
'step one - decide whether you are trying to predict something or use your
model to demonstrate causation. to do this.... etc, what happens if you
don't do this... common errors in doing this...'
'step two - explore the variables to make sure each is normally distributed.
to do this, what happens if you don't, common errors, the extent that it
matters depending on sample size...' etc
[note I don't know whether these should be the first steps or not as I'm
thoroughly confused despite having read books and websites for the last few
days]

Another thing the textbooks don't do is offer a simple explanation of the
terms involved, a simple glossary that develops as the text does. Granted
books may have glossaries at the end but they tend to be separate from the
text and overly technical. The situation is made worse because many terms
refer to very similar things, or things you don't need to worry about.

There are so many variables in regression analysis which also makes things
very confusing, to a non numbers oriented person. And there are many
different ways to do things. For example, to check for collinearity you can
either check the tolerance or VIF values. But which is better? And why
don't authors just say 'just check the VIF values and ignore the tolerance'.
I mean I've read that tolerance is an outcome of VIF (or something similar)
so why does SPSS even produce both measures in its output if only one is
needed? Another point about VIF while I'm here. I've read it's too high
if over 2 in some places and in others I've read it's too high if over 4.
In fact I've read contradictory advice in the same book. From Miles and
Shevlin: 'when the VIF is equal to four the standard error is doubled
(sqrt4=2) and so four is often used as an arbitrary cut-off to determine
when collinearity has become too serious.' Then very soon after a table is
presented 'collinearity diagnostics for three independent variables' whereby
one variable has a VIF of 2.108 and the text says 'the VIF is greater than
two, alerting us to the possibility of collinearity'.
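
On tolerance versus VIF: the two are reciprocals. Tolerance is 1 - R^2 from
regressing one predictor on the others, and VIF = 1 / tolerance, so either
number alone carries the information. sqrt(VIF) is the factor by which that
predictor's standard error is inflated, which is where the 'sqrt4=2' in the
Miles and Shevlin quote comes from. A minimal Python sketch of the
calculation, assuming synthetic data and a hypothetical helper named vif
(neither comes from the thread or from SPSS):

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (n_samples x n_predictors).

    Hypothetical helper: regress each predictor on the others (with an
    intercept) and apply VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic predictors (assumption): wealth is built to correlate with income.
rng = np.random.default_rng(0)
n = 500
income = rng.normal(size=n)
wealth = 0.8 * income + 0.6 * rng.normal(size=n)
age = rng.normal(size=n)                      # unrelated to the other two
X = np.column_stack([income, wealth, age])

vifs = vif(X)
tolerance = 1.0 / vifs                        # tolerance is just 1 / VIF
se_inflation = np.sqrt(vifs)                  # standard-error multiplier
for name, v, t, s in zip(["income", "wealth", "age"], vifs, tolerance, se_inflation):
    print(f"{name:>6}: VIF={v:.2f}  tolerance={t:.2f}  SE inflation={s:.2f}")
```

Cut-offs such as 2 or 4 are conventions about how much standard-error
inflation one is willing to accept, not properties of the statistic itself.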

I have another issue with this whole collinearity thing. One of the
assumptions of regression analysis is that the independent variables are not
supposed to be related too much. I've read if they're highly related
they're likely to be measuring the same underlying thing and if we use both
in the model we are using their common variance twice. But doesn't it just
mean they're both strong predictors of the dependent variable and therefore
should be included in the model? Say I'm trying to explain number of
possessions (DV) and I have variables income and wealth (IVs). Now income
and wealth are likely to be correlated, but to me they are both likely to be
important predictors of possessions. So why can't I include them both? Why
can't I just put them both in the model and let regression analysis do the
rest? Isn't this the whole point of regression, to discern the relationship
between variables? None of the textbooks I've come across so far explain
these things and answer the question 'why?'. Why is there an arbitrary cut
off point for how related the independent variables are? Relatedly, if you
have to drop one of two collinear variables, which one if you think
both are (theoretically) important? And if you want to combine them, how
should you if they can't conveniently be labelled under a common variable
name? If you have collinearity between percentage of ethnic minority people
in a population and percentage of immigrants (IVs) and your dependent
variable is English proficiency of the population, how do you combine ethnic %
with immigrant % into something sensible?

I could go on. I've not even mentioned things like the entering order of
variables (all that stepwise, forward, backward thing, which incidentally
made no difference to my model when I experimented). Apparently this can
have a major effect on the R^2 and surprise surprise, again there is no
consensus. In fact I've read many authors don't mention such things in
their published analysis so we can't trust their results. If there is no
consensus and poor analysis is being done by paid researchers, what
hope do we students have? Why isn't there a standard, a guidebook to
follow that is standardised and simple?

I feel like I've been beaten (up) by regression. I can't get my head round
a multitude of technical terms. The penny hasn't dropped despite studying
extremely hard and concentrating whilst doing it. Could it be that I'm
reading the wrong texts, that there is no consensus on this thing or that
I'm just not cut out for regression (which I refuse to believe)? I mean,
it even seems there is no consensus on what the thing is called. Is it
multiple regression, linear regression, OLS regression? Are these things
different?

Does anyone have any books or better yet journal articles that offer a
great, simple, no nonsense explanation of regression? If not why hasn't
anyone published one?

Many thanks
Scott Seidman...
Posted: Fri May 09, 2008 8:52 pm
Guest
"reflex" <sdfs at (no spam) sdfsd.com> wrote in news:HV2Vj.307$Nk5.183 at (no spam) newsfe15.ams2:

Quote:
'step one - decide whether you are trying to predict something or use
your model to demonstrate causation. to do this.... etc, what happens
if you don't do this... common errors in doing this...'
'step two - explore the variables to make sure each is normally
distributed. to do this, what happens if you don't, common errors, the
extent that it matters depending on sample size...' etc
[note I don't know whether these should be the first steps or not as
I'm thoroughly confused despite having read books and websites for the
last few days]


Your variables don't need to be normally distributed. You might mean that
you need the residuals to be normally distributed, but 1) you can't know
this until after you've calculated your regression coefficients, so it
certainly can't be step 2, and 2) you don't absolutely need your residuals
to be normally distributed, depending on what you're trying to do. Your
regression will be optimized with respect to sum-squared-error regardless.
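
A small Python sketch of this point, assuming simulated data (the exponential
predictor, the coefficients 3.0 and 1.5, and the helper skewness are all made
up for illustration): the predictor is heavily skewed, yet least squares
recovers the line and the residuals come out roughly symmetric because the
errors were generated as normal.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.exponential(scale=2.0, size=n)        # strongly skewed predictor
errors = rng.normal(scale=1.0, size=n)        # normal errors (assumption)
y = 3.0 + 1.5 * x + errors

A = np.column_stack([np.ones(n), x])          # design matrix with intercept
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes sum of squared errors
resid = y - A @ coef                          # only available after the fit

def skewness(v):
    """Sample skewness: mean cubed z-score."""
    z = (v - v.mean()) / v.std()
    return np.mean(z ** 3)

print("intercept, slope:", coef)
print("skew of x:", skewness(x))
print("skew of residuals:", skewness(resid))
```

The residuals can only be inspected after the fit, which is why a normality
check cannot come before estimating the coefficients.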



--
Scott
Reverse name to reply
Richard Ulrich...
Posted: Fri May 09, 2008 10:17 pm
Guest
On Fri, 9 May 2008 21:58:00 +0100, "reflex" <sdfs at (no spam) sdfsd.com> wrote:

Quote:
First off thanks Richard Ulrich for answering my other recently posted
question.

This multiple regression business is overwhelming me at the minute and I
wonder if any of the writers of the textbooks I've come across so far
understand it fully enough to offer a concise, simple, no-nonsense
explanation. Either they don't understand it themselves or they are not
good at explaining things. If I understood it I'm sure I could offer an
easier-to-understand explanation than the textbooks I've seen so far. They
don't illustrate things with examples throughout or offer step by step
guides. Does anyone know any really easy-to-understand books or articles that
don't get off on using technical terms? I'm not a statistician but a social
scientist so I'm not interested in the formulas per se.

You will not develop much great understanding if
you do not appreciate some of the mathematical bases,
at least intuitively. Unfortunately, it is true that some
teachers of statistics courses do not have the depth
needed to be an inspirational mentor. You could be
hampered by having suffered under someone like that.
Or else, from someone who happens to have a teaching
style that you have not adapted to.

Self-study?
Perhaps you should focus on, or at least start with, some
books on case-studies in use of statistics. "Statistics in
Medicine" is one title.

Quote:

As an example, why aren't there any books (that I've found anyway) that say
how to carry out a regression analysis step by step? Surely this would make so
much sense:
'step one - decide whether you are trying to predict something or use your
model to demonstrate causation. to do this.... etc, what happens if you
don't do this... common errors in doing this...'

Okay, one problem you may be facing is that most books
that "teach regression" are going to be *statistics* books,
so they teach statistics.

That is -- they do not teach experimental design, they do not
teach scientific method, and they do not teach *your* particular
subject matter (except incidentally). There is enough complication
and *vocabulary* to teach, that I don't mind that they try to get
that across.


Quote:
'step two - explore the variables to make sure each is normally distributed.
to do this, what happens if you don't, common errors, the extent that it
matters depending on sample size...' etc
[note I don't know whether these should be the first steps or not as I'm
thoroughly confused despite having read books and websites for the last few
days]

Another thing the textbooks don't do is offer a simple explanation of the
terms involved, a simple glossary that develops as the text does. Granted
books may have glossaries at the end but they tend to be separate from the
text and overly technical. The situation is made worse because many terms
refer to very similar things, or things you don't need to worry about.

As it happens, regression is used by many diverse
professionals, for many different things; it is a flexible
tool. But that means that items X and Y may be important
to some users, whereas for some others they hardly matter
but Z and W are paramount.

Google scholar might help you find some leads for
textbooks that are popular and widely used. The book
on regression by Cohen & Cohen is less technical than
some, and covers some of the questions you think should
be raised.

I've found it useful, at times, to visit my university library
and pull a few books off the shelf that seem to be in the
area I'm concerned with, and browse them to find which
one helps me the most; then I study that one some more.
Quote:

There are so many variables in regression analysis which also makes things
very confusing, to a non numbers oriented person. And there are many
different ways to do things. For example, to check for collinearity you can
either check the tolerance or VIF values. But which is better? And why
don't authors just say 'just check the VIF values and ignore the tolerance'.
I mean I've read that tolerance is an outcome of VIF (or something similar)
so why does SPSS even produce both measures in its output if only one is
needed? Another point about VIF while I'm here. I've read it's too high
if over 2 in some places and in others I've read it's too high if over 4.
In fact I've read contradictory advice in the same book. From Miles and
Shevlin: 'when the VIF is equal to four the standard error is doubled
(sqrt4=2) and so four is often used as an arbitrary cut-off to determine
when collinearity has become too serious.' Then very soon after a table is
presented 'collinearity diagnostics for three independent variables' whereby
one variable has a VIF of 2.108 and the text says 'the VIF is greater than
two, alerting us to the possibility of collinearity'.

I have another issue with this whole collinearity thing. One of the
assumptions of regression analysis is that the independent variables are not
supposed to be related too much. I've read if they're highly related
they're likely to be measuring the same underlying thing and if we use both
in the model we are using their common variance twice. But doesn't it just
mean they're both strong predictors of the dependent variable and therefore
should be included in the model? Say I'm trying to explain number of
possessions (DV) and I have variables income and wealth (IVs). Now income
and wealth are likely to be correlated, but to me they are both likely to be
important predictors of possessions. So why can't I include them both? Why
can't I just put them both in the model and let regression analysis do the
rest? Isn't this the whole point of regression, to discern the relationship
between variables? None of the textbooks I've come across so far explain
these things and answer the question 'why?'. Why is there an arbitrary cut
off point for how related the independent variables are? Relatedly, if you
have to drop one of two collinear variables, which one if you think
both are (theoretically) important? And if you want to combine them, how
should you if they can't conveniently be labelled under a common variable
name? If you have collinearity between percentage of ethnic minority people
in a population and percentage of immigrants (IVs) and your dependent
variable is English proficiency of the population, how do you combine ethnic %
with immigrant % into something sensible?

You can Google groups, < groups:sci.stat.* author:ulrich "suppressor">
to find some interesting things about collinearity.
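
On the income-and-wealth question, a rough Python sketch may help. It assumes
simulated data in which possessions depends equally on income and wealth; the
function names, sample size and correlation values are all hypothetical.
Nothing stops you from entering both predictors. The cost of correlated
predictors shows up in the standard errors of the individual coefficients,
not in the fit as a whole.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and their standard errors (intercept added here)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (n - A.shape[1])     # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)         # covariance of the estimates
    return coef, np.sqrt(np.diag(cov))

def simulate(rho, n=300, seed=2):
    """Hypothetical data: income and wealth correlate at roughly rho."""
    rng = np.random.default_rng(seed)
    income = rng.normal(size=n)
    wealth = rho * income + np.sqrt(1 - rho ** 2) * rng.normal(size=n)
    possessions = 1.0 * income + 1.0 * wealth + rng.normal(size=n)
    return ols_with_se(np.column_stack([income, wealth]), possessions)

for rho in (0.0, 0.95):
    coef, se = simulate(rho)
    print(f"rho={rho:.2f}  coefs={coef[1:].round(2)}  SEs={se[1:].round(3)}")
```

With highly correlated predictors the data contain little information about
which of the two is doing the work, so each coefficient is estimated less
precisely even though both belong in the model.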


Quote:

I could go on. I've not even mentioned things like the entering order of
variables (all that stepwise, forward, backward thing, which incidentally
made no difference to my model when I experimented). Apparently this can
have a major effect on the R^2 and surprise surprise, again there is no
consensus.

Actually, there *is* a pretty solid consensus in social
science stat-books published in the last 25 years, to the
effect that "stepwise" is not something that the social sciences
find very useful. Or even, acceptable. (Stepwise is creeping
back in, under the heading of "data-mining" -- commercial
data samples are more apt to have thousands of cases.
You might Google-groups for me on that subject, too.)
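
One commonly cited reason for that consensus is that stepwise selection
capitalizes on chance. The Python sketch below is only an illustration under
made-up assumptions (60 cases, 40 predictors of pure noise, a hypothetical
greedy forward_select routine rather than any package's stepwise procedure):
the outcome is unrelated to every predictor, yet picking the "best" ones still
produces a non-trivial R^2.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, k):
    """Greedy forward selection: add the column that raises R^2 most, k times."""
    chosen, path = [], []
    for _ in range(k):
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        best = max(remaining, key=lambda j: r_squared(X[:, chosen + [j]], y))
        chosen.append(best)
        path.append(r_squared(X[:, chosen], y))
    return chosen, path

rng = np.random.default_rng(3)
n, p = 60, 40
X = rng.normal(size=(n, p))      # 40 predictors of pure noise
y = rng.normal(size=n)           # outcome unrelated to every predictor

chosen, path = forward_select(X, y, k=5)
print("columns picked:", chosen)
print("R^2 after each step:", np.round(path, 3))
print("R^2 of the first column alone:", round(r_squared(X[:, [0]], y), 3))
```

With thousands of cases, as in the data-mining setting mentioned above, chance
correlations of this size are much rarer, which is part of why the practice is
tolerated there.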

Quote:
In fact I've read many authors don't mention such things in
their published analysis so we can't trust their results. If there is no
consensus and poor analysis is being done by paid researchers, what
hope do we students have? Why isn't there a standard, a guidebook to
follow that is standardised and simple?

(a) There's some good advice out there, if you can find it.
(b) It's too new for the good advice to have overwhelmed
the bad advice. And it's too new for all the answers to have
been refined.

Multiple regression was barely more than a theory or a
small-sample toy, up to 40 years ago. Computers have
made it practical.

Quote:

I feel like I've been beaten (up) by regression. I can't get my head round
a multitude of technical terms. The penny hasn't dropped despite studying
extremely hard and concentrating whilst doing it. Could it be that I'm
reading the wrong texts, that there is no consensus on this thing or that
I'm just not cut out for regression (which I refuse to believe)? I mean,
it even seems there is no consensus on what the thing is called. Is it
multiple regression, linear regression, OLS regression? Are these things
different?

I say "multiple regression" when I am thinking of the
thing in general; "multiple linear regression" when I am
emphasizing that linearity is implicit -- like, when someone
has data that is going to fail that way; "OLS regression" when
there might be some competing reason for doing ML logistic
regression, or some other variation.

Quote:

Does anyone have any books or better yet journal articles that offer a
great, simple, no nonsense explanation of regression? If not why hasn't
anyone published one?


Look for titles in Google-Scholar, searching for "regression".
You might be looking for some of the older things if
you want it to focus on the basics.

--
Rich Ulrich

http://www.pitt.edu/~wpilib/index.html
Bruce Weaver...
Posted: Sat May 10, 2008 6:58 am
Guest
Richard Ulrich wrote:
Quote:
On Fri, 9 May 2008 21:58:00 +0100, "reflex" <sdfs at (no spam) sdfsd.com> wrote:

--- snip ---

Does anyone have any books or better yet journal articles that offer a
great, simple, no nonsense explanation of regression? If not why hasn't
anyone published one?


Look for titles in Google-Scholar, searching for "regression".
You might be looking for some of the older things if
you want it to focus on the basics.


You could also try online notes such as the following:

http://davidmlane.com/hyperstat/prediction.html
http://www.psychstat.missouristate.edu/introbook/sbk16.htm
http://www2.chass.ncsu.edu/garson/pa765/regress.htm

The first two are introductory, the 3rd more advanced.

--
Bruce Weaver
bweaver at (no spam) lakeheadu.ca
www.angelfire.com/wv/bwhomedir
"When all else fails, RTFM."
reflex...
Posted: Mon May 12, 2008 3:32 am
Guest
Thanks again to the repliers for the comments. I will look at the sources
suggested next time I'm at the library and have a look at the websites.
Actually I had a cursory look at the davidmlane one which seems much more
logical than anything else I've looked at. I'm glad to have it clear now
that the variables themselves do not need to be normally distributed, only
the residuals (which means no skew, kurtosis, and heteroscedasticity,
right?).

Rich, I think I may have had a bit of an uninspiring teacher to be honest.
Still I take responsibility for my lack of understanding. I think
understanding some more of the *why* questions would make everything simpler,
e.g. why the regression equation is what it is.
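
One way to see why the regression equation is what it is: the coefficients
are simply the values that make the sum of squared errors as small as
possible, and setting the slope of that sum to zero gives the "normal
equations" (X'X) b = X'y. The Python sketch below assumes simulated data with
made-up true coefficients (2.0, 0.5, -1.0); it solves those equations
directly, compares the answer with a library least-squares routine, and shows
that nudging a coefficient away from the solution makes the error larger.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 IVs
y = X @ np.array([2.0, 0.5, -1.0]) + rng.normal(size=n)

# Setting the gradient of the squared error to zero gives (X'X) b = X'y.
b_normal = np.linalg.solve(X.T @ X, X.T @ y)
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

def sse(b):
    """Sum of squared errors for coefficient vector b."""
    r = y - X @ b
    return r @ r

nudge = np.array([0.0, 0.1, 0.0])             # move one slope slightly
print("normal equations:", b_normal.round(3))
print("lstsq           :", b_lstsq.round(3))
print("SSE at solution :", round(sse(b_normal), 3))
print("SSE after nudge :", round(sse(b_normal + nudge), 3))
```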
reflex...
Posted: Wed May 14, 2008 9:50 am
Guest
Although this website is short it is the clearest, most logical explanation
I've seen of multiple regression yet
http://www1.uni-hamburg.de/RRZ/Software/Statistica/Handbuch/stmulreg.html#index
Richard Ulrich...
Posted: Wed May 14, 2008 4:19 pm
Guest
On Wed, 14 May 2008 15:50:45 +0100, "reflex" <sdfs at (no spam) sdfsd.com> wrote:
[snip. Here is a note that saves the recommendations.]
Quote:

You could also try online notes such as the following:

http://davidmlane.com/hyperstat/prediction.html
http://www.psychstat.missouristate.edu/introbook/sbk16.htm
http://www2.chass.ncsu.edu/garson/pa765/regress.htm

The first two are introductory, the 3rd more advanced.

--
Bruce Weaver
[snip]

Although this website is short it is the clearest, most logical explanation
I've seen of multiple regression yet
http://www1.uni-hamburg.de/RRZ/Software/Statistica/Handbuch/stmulreg.html#index

That's rather good. I like the animated outlier.

--
Rich Ulrich

http://www.pitt.edu/~wpilib/index.html
Bruce Weaver...
Posted: Wed May 14, 2008 6:25 pm
Guest
reflex wrote:
--- snip ---
Quote:

Although this website is short it is the clearest, most logical
explanation I've seen of multiple regression yet
http://www1.uni-hamburg.de/RRZ/Software/Statistica/Handbuch/stmulreg.html#index

The material on that site is from the StatSoft Electronic
Textbook. It's hard to say whether they have permission to mirror
the material. Here's the original site:

http://www.statsoft.com/textbook/stathome.html

--
Bruce Weaver
bweaver at (no spam) lakeheadu.ca
www.angelfire.com/wv/bwhomedir
"When all else fails, RTFM."
Jean...
Posted: Thu May 15, 2008 3:08 am
Guest
I'm still learning how to use regression and build models. But I
found this book really helpful: Regression Analysis by Example, by
Chatterjee and Hadi. They have the step-by-step list that you're
looking for, as well as a website that shows how they got to their
examples, complete with syntax in SPSS, SAS, or Minitab.

Hope that helps!
Jean
reflex...
Posted: Fri May 16, 2008 3:55 am
Guest
Thanks Jean, I'll have a look when I'm in the library.

I've just found a truly excellent website that is far superior to anything
else. Check out http://www.stat.psu.edu/~jglenn/stat501/. It's from some
kind of online course.
 
All times are GMT - 5 Hours