Science Forum Index  »  Statistics - Education Forum  »  Looking for feedback for an online multivariate regression t
Owen
Posted: Wed Nov 29, 2006 6:17 am
Guest
Hi, I'm looking for feedback on how to make the following online tool
more user-friendly, intuitive, understandable, and easy to use. Not a
simple task, given that it allows you to play around with multivariate
regression equations... you can get to the tool by going to
http://www.braintechllc.com/mymodels.aspx and then selecting whichever
pre-built model you like. Or you can build your own... I'm hoping for
some honest, genuine, constructive feedback about the usability and
navigability of the site, and especially about the online tools.

Owen, Member BrainTech, LLC
http://www.braintechllc.com/owen.aspx
Old Mac User
Posted: Wed Nov 29, 2006 11:18 am
Guest
Owen...

I looked at one example from your link. Here are three comments.

First, you seem to have included in the model all possible "variables"
including interactions and quadratics regardless of whether those
factors are "significant" or not.

Second, there seems to be no guidance mechanism in this software...
no aid to variable selection other than a graphic. For instance, no
t-ratios with each model coefficient, etc.

Third, the "variables" are surely correlated among themselves.
This means that properly attributing effects of the variables is not
likely to be possible.

Some kind of variable selection method would be more appropriate here.

If you wish, send me one example of such data... on one page of
a spreadsheet... with the "variables" clearly identified... and I'll
send back to you an example of what I mean. Send as an attachment,
by e-mail. OMU





Reef Fish
Posted: Wed Nov 29, 2006 1:33 pm
Guest
Old Mac User wrote:
Quote:
Owen...

I looked at one example from your link. Here are three comments.

I couldn't find any example from the link in his post. On the other
link it had "multiple regression" which is NOT "multivariate regression",
so that's like Richard Ulrich calling Y = bX a nonlinear regression model.
:-)

Quote:

First, you seem to have included in the model all possible "variables"
including interactions and quadratics regardless of whether those
factors are "significant" or not.

This is actually CORRECT, before any analysis of a quadratic
SURFACE associated with a set of candidate variables. The
unimportant main effect or interactions will be dropped ONLY in
the light of the regression results!

For example if you want to fit a quadratic curve in X, you MUST
include X, X^2 and a constant. Once the data is there, it may turn
out that the X term is not necessary if the curve is symmetric
around the origin.
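
A minimal sketch of the quadratic example above, assuming NumPy and made-up noiseless data. The helper name fit_quadratic and the numbers are illustrative only, and "symmetric around the origin" is read here as symmetric about X = 0. The full model (constant, X, X^2) is fitted first, and only the fitted coefficients show that the X term is not needed.

```python
import numpy as np

def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x^2; returns (b0, b1, b2)."""
    # Design matrix with a constant, the linear term and the quadratic term.
    X = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

x = np.linspace(-3.0, 3.0, 31)   # X values placed symmetrically about 0
y = 2.0 + 0.5 * x**2             # hypothetical curve with no linear term
b0, b1, b2 = fit_quadratic(x, y)
print(b0, b1, b2)                # b1 comes out numerically zero
```

With real, noisy data the X coefficient would not be exactly zero, which is why the decision to drop it is made in the light of the regression results.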

Can't comment on the rest -- they may or may not be appropriate if
the user is supposed to EXPLORE the surface to be fitted, rather
than confirm a particular form. In that case, graphics are much
more telling than certain "test statistics".

-- Reef Fish Bob,

Old Mac User
Posted: Wed Nov 29, 2006 4:05 pm
Guest
Reef Fish...

You wrote...

"For example if you want to fit a quadratic curve in X, you MUST
include X, X^2 and a constant. Once the data is there, it may turn
out that the X term is not necessary if the curve is symmetric
around the origin."

I agree. And both of us know the reason why this is so. However, it's
not at all apparent to me how the online software he's using is going
to aid in determining whether X is or is not needed in the model.

In retrospect, here's what seems to be happening with that software.
All of the factors (linear, interaction, quadratic) are set up and
ready for inclusion in the model. But the user must click on a box
beside each factor (user selects as he/she sees fit... cut and try...
very inefficient of time and effort) while attempting to make the
"regression" go through the data. The determination of how well it fits
is done with a graphic... an eyeball sort of thing. Unless I've
missed something, there are no t-ratios, standard errors, etc. to give
guidance concerning which added factors are or are not "significant".
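
For readers unfamiliar with the t-ratios and standard errors mentioned above, here is a rough sketch of how they are usually computed for an ordinary least-squares fit. It assumes NumPy and simulated data; the function name ols_t_ratios and the variables x1, x2 are hypothetical and have nothing to do with the tool under discussion.

```python
import numpy as np

def ols_t_ratios(X, y):
    """Return (coefficients, standard errors, t-ratios) for y = X b + e.

    X must already contain a column of ones if an intercept is wanted.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)           # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(XtX_inv))    # coefficient standard errors
    return beta, se, beta / se

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                        # has no effect on y below
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x1, x2])
beta, se, t = ols_t_ratios(X, y)
for name, b, s, tr in zip(["const", "x1", "x2"], beta, se, t):
    print(f"{name:>5}  coef={b: .3f}  se={s:.3f}  t={tr: .2f}")
```

A large absolute t-ratio (as for x1 here) suggests the term is pulling its weight; a small one (as is likely for x2) flags a candidate for removal.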

I had no problem with the link or in getting to his examples. OMU



Old Mac User
Posted: Wed Nov 29, 2006 4:14 pm
Guest
RF...

Two links were mentioned in the OP. The first one mentioned is the one
you want. OMU


Richard Ulrich
Posted: Thu Nov 30, 2006 1:19 am
Guest
On 29 Nov 2006 09:33:57 -0800, "Reef Fish"
<large_nassua_grouper@yahoo.com> wrote:
[snip]

Quote:
that's like Richard Ulrich calling Y = bX a nonlinear regression model.
:-)

Misrepresent. In irrelevant thread. Slur. More fair to
mention, "constraints". Mention, "proposed".

I'm thinking -- Does Reef Fish feel tremendously
inferior, that he has to attack me so promiscuously?
(and attack other folks, ditto?)

As to the above. Long ago,
Bob refused to consider alternate terminology for
teaching linear models. *He* thinks it's a big deal.


--
Rich Ulrich, wpilib@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Reef Fish
Posted: Fri Dec 01, 2006 1:02 am
Guest
Richard Ulrich wrote:
Quote:
On 29 Nov 2006 09:33:57 -0800, "Reef Fish"
<large_nassua_grouper@yahoo.com> wrote:
[snip]

that's like Richard Ulrich calling Y = bX a nonlinear regression model.
:-)

Misrepresent. In irrelevant thread. Slur. More fair to
mention, "constraints". Mention, "proposed".

Misquote! misrepresent. Ignored smiley in the ANALOGY humor.

I was commenting on the OP's misuse of "multivariate regression":

RF> it had "multiple regression" which is NOT "multivariate regression", so
RF> that's like Richard Ulrich calling Y = bX a nonlinear regression model.
RF> :-)

Richard Ulrich was mentioned in the analogy because Richard Ulrich
ALSO didn't know that a "multivariate regression" is NOT a "multiple
regression", which is a "univariate multiple regression". Richard
argued and argued. Here is the thread:

http://groups.google.com/group/sci.stat.math/msg/f3f6d4e3cfda69c1

Actually Google found 61 threads with Ulrich and "multivariate
regression" in which he made the same error of misidentification of
what a multivariate regression is: one with MORE THAN ONE dependent
variable for the same set of independent variables.
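
The terminology distinction can be illustrated with a small sketch, assuming NumPy and simulated data (all names and coefficient values below are made up). A multiple regression has one dependent variable and a coefficient vector; a multivariate regression has several dependent variables fitted on the same predictors, so the coefficients form a matrix with one column per dependent variable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
# Constant plus two predictors, shared by both fits.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

# Multiple regression: ONE dependent variable, several predictors.
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=n)
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Multivariate regression: TWO dependent variables, same predictors.
B_true = np.array([[1.0, 0.0], [2.0, 0.5], [-1.0, 3.0]])
Y = X @ B_true + rng.normal(scale=0.1, size=(n, 2))
B, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(b.shape)   # (3,)   one coefficient per predictor
print(B.shape)   # (3, 2) one column of coefficients per dependent variable
```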


So, in this case, Richard Ulrich made TWO errors and I used the
analogy to point out the OP's error in the use of "multivariate
regression" as being ALMOST as bad as Richard's Y = bX
calling it a nonlinear regression!

See, it's not at all like the way Richard misquotes and TWISTS his
defense into an attack.

Quote:

I'm thinking -- Does Reef Fish feel tremendously
inferior, that he has to attack me so promiscuously?

I attack your statistical ERRORS -- it's unfortunate that Richard
Ulrich is the person who made those errors isn't it?

Quote:
(and attack other folks, ditto?)

Yes, I attack the ERRORS of the other folks just the same.

Quote:

As to the above. Long ago,
Bob refused to consider alternate terminology for
teaching linear models. *He* thinks it's a big deal.

So, right here in one post, Richard Ulrich through his own
NOISE of attacking me, revealed his own errors in:

1. the term "multivariate regression"
2. his errors in the standard meaning of "linear model" in statistics
3. the WORST of his errors, calling Y = bX a "nonlinear model"

and he whines that I feel "tremendously inferior" to need to
attack him?

That's just the same Richard Ulrich, the Chief Quack in rec.stat.math
and all THREE groups. No one else comes close to him in terms
of making STATISTICAL ERRORS.

And every time he makes posts like this, he merely sticks his own
foot further up his mouth to show everyone what his errors were.

-- Reef Fish Bob.
Reef Fish
Posted: Fri Dec 01, 2006 5:23 pm
Guest
David A. Heiser wrote:
Quote:
"Reef Fish" <large_nassua_grouper@yahoo.com> wrote in message
news:1164821637.043630.93170@n67g2000cwd.googlegroups.com...

+++++++++++++++++++++++++++++++++++++++++
There is a conceptual problem here.

Definitely. First the other, now yours.

Quote:
Reef Fish Bob's messages over the year
(and prior years) have stressed the fact that linear regression depends on
the predictor variables being independent.

No. The predictor variables must be "linearly independent". Else you
cannot get a solution because the (X'X) matrix, whose inverse is needed
to get the beta estimates, will be singular.

That is the meaning of "linear independence".

It has NOTHING to do with correlations.
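
A short sketch of the distinction, assuming NumPy and made-up data: x and x^2 are strongly correlated (above 0.9 on this grid) yet the design matrix still has full column rank, while a column that is an exact linear combination of the others makes it rank deficient. The variable names are illustrative.

```python
import numpy as np

x = np.linspace(1.0, 10.0, 50)
# Polynomial terms: correlated with each other, but linearly independent.
X_poly = np.column_stack([np.ones_like(x), x, x**2])
# Third column is 3*x + 2*1, an exact linear combination of the first two.
X_dep = np.column_stack([np.ones_like(x), x, 3.0 * x + 2.0])

r = np.corrcoef(x, x**2)[0, 1]
rank_poly = np.linalg.matrix_rank(X_poly)
rank_dep = np.linalg.matrix_rank(X_dep)

print(f"corr(x, x^2) = {r:.3f}")
print("rank with x^2 column:          ", rank_poly)   # full rank: X'X invertible
print("rank with linear-combo column: ", rank_dep)    # deficient: X'X singular
```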


Quote:
That is, two variables that are
correlated will give misleading coefficient values when the correlated
values are taken as being independent.

That is YOUR conceptual AND practical error! No question about it.


Quote:
Now here an independent variable (x)
is used to create n, (x^m) additional variables, which clearly are
correlated, with non-zero correlations.

That is correct -- and they are LINEARLY independent. You need
to review the threads in sci.stat.math where these concepts had been
discussed thoroughly. You are late coming to this class, almost two
years late. :-)

Quote:
The conceptual problem here is the dichotomy of the two very "loud"
messages.

What dichotomy? Everything is black and white and perfectly clear
EXCEPT to those (like yourself) who are ill-educated in the linear
models and regression BASIC concepts.

Quote:

In fact we do know that a linear regression on introduced x^m variables does
give a set of polynomial coefficients that fits the resulting polynomial to
a set of data. However these are all highly correlated additional variables.

Yes. And a polynomial regression IS a linear model in which all the
independent variables are LINEARLY INDEPENDENT.
Quote:

David Heiser

Start getting busy reading all those threads I had pointed Greg Heath
to read. Use Google and search for keywords of "linear independence"
or "linear regression" and author "reef fish".

Right now, you are VERY confused and deficient about your understanding
of linear regression problems. My diagnosis is with 100% confidence.

-- Reef Fish Bob.
David A. Heiser
Posted: Sat Dec 02, 2006 9:50 pm
Guest
"Reef Fish" <large_nassua_grouper@yahoo.com> wrote in message
news:1165008239.234584.280010@80g2000cwy.googlegroups.com...

Thanks, Bob, for steering me to the messages and your comments from last December
and earlier this year. I missed all this at the time.

First of all I find that you are right on all counts. Even your statement to
Jerry Dallal

"The practical problem is that regression coefficients can behave poorly
even if the
(exact) predictors are not exactly linearly dependent. Hence, the problem
is more than just "linear dependence plus rounding error", which is what you
seem to be suggesting here. Wouldn't this then imply that the problem
should be framed in terms of the properties of the correlation matrix, and
that linear dependence is the red herring?"

is true. Fits my experience.

The references

http://www.cs.ut.ee/~toomas_l/linalg/lin1/node7.html

and

http://www.rit.edu/~pnveme/pigf/Vectors/vector_lindep_1.html

explained the situation that all the words failed to do. Too bad it came so
late in the dialog. You should have started from this back in December. I
saw the above definitions, and it all clicked back in.

I had forgotten all about my readings in Stewart when I wrote my comment. I
had to go back to Stewart, G.W., "Matrix Algorithms, Vol. 1 Basic
Decompositions" (1995), pages 29 and 30. He does the definition of linear
dependence differently from the first reference. The sum
a1x1 + a2x2 + a3x3 + ... = 0 implies that any one "a" value can be solved
for. If there is a solution (other than all a's = 0) then the system is
linearly dependent.

According to Stewart, a "basis" represents a subspace of only independent
vectors from the initial vector set. A basis has linearly independent
columns. Consequently there may be many bases, each of which gives a
different linear solution.

If the a's and x's are integers, then linear dependence can readily be
identified and each basis identified. If the a's and x's are floating point
numbers, then the decision as to independence and identification of each
basis becomes "fuzzy". One can't be sure then that the basis of the
model encompasses the subspace of the data. External correlations of the
variables clearly provide no means of making this decision.
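
The "fuzzy" floating-point decision can be sketched as follows, assuming NumPy; the helper rank_report and the perturbation size 1e-9 are made up for illustration. The numerical rank is decided by comparing singular values with a tolerance, so a nearly dependent column counts as independent or dependent depending on the tolerance chosen.

```python
import numpy as np

def rank_report(X, tol=None):
    """Return (singular values, numerical rank) of X for a given tolerance."""
    s = np.linalg.svd(X, compute_uv=False)
    return s, np.linalg.matrix_rank(X, tol=tol)

x = np.linspace(0.0, 1.0, 20)
# Third column is exactly 2*x: linearly dependent.
exact = np.column_stack([np.ones_like(x), x, 2.0 * x])
# Third column is 2*x plus a tiny quadratic wiggle: independent, but barely.
nearly = np.column_stack([np.ones_like(x), x, 2.0 * x + 1e-9 * x**2])

for name, X in [("exact", exact), ("nearly", nearly)]:
    s, rank = rank_report(X)
    print(name, "rank:", rank, "smallest singular value:", s[-1])

# The same "nearly" matrix is judged rank deficient under a looser tolerance.
print("nearly, tol=1e-6:", rank_report(nearly, tol=1e-6)[1])
```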

My interpretation of this is that a product variable of two variables
(considering only simple correlations), powers of any variable and log
transformations of any variable can then be introduced in the model as
additional variables (increasing the subspace), since these cannot be
constructed as linear forms expanding the existing basis.

You said: When. "The determinant of X'X is zero. You can throw in those
symptoms relating to r and multiple R, but I would rather not even bother,
because there are TOO MANY partial correlations and multiple
correlations to weed through before you can say something definite about
linear dependence or independence among a set of X's."

It appears then, that in all cases, the determinant of the expanded data set
(with the added variables) should be calculated to ensure that the basis of
the model still holds.

However the fundamental problem of coefficient accuracy is still present.

You say: "1. The OLDEST known ill-effect (as shown by the JASA paper by
Longley (1967)) is the loss of numerical accuracy in single
precision arithmetic in all of the programs in that era on the
small set of real data in economics. The loss could be as
much as the accuracy in the first significant figure, or even the
sign of the coefficient."

Double precision does not correct this problem of getting correct values.
The problem is the inherent increase in the "standard error" of the
coefficient value, meaning greater uncertainty in the calculated value.
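
A rough sketch of that point, assuming NumPy and simulated predictors (the helpers coef_std_errors and design are hypothetical). Everything below runs in double precision, yet the standard error of a coefficient still grows sharply as the two predictors become more correlated, because the inflation is statistical rather than a rounding effect.

```python
import numpy as np

def coef_std_errors(X, sigma=1.0):
    """Standard errors of OLS coefficients for design X and error s.d. sigma."""
    return sigma * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))

rng = np.random.default_rng(2)
n = 500
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)

def design(rho):
    """Constant plus two predictors whose correlation is approximately rho."""
    x2 = rho * z1 + np.sqrt(1.0 - rho**2) * z2
    return np.column_stack([np.ones(n), z1, x2])

# Standard error of the first predictor's coefficient at each correlation.
se_by_rho = {rho: coef_std_errors(design(rho))[1] for rho in (0.0, 0.9, 0.999)}
for rho, se in se_by_rho.items():
    print(f"rho={rho}: se={se:.4f}")
```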

The problem, Bob, is that you end up screaming like a drill sergeant. Not
everyone has been in the Army or Marines, and has learned to straighten up,
and listen and learn when he is shouting in your face. I still have the
greatest respect for those black drill sergeants that I had who survived the
Korean War.

Your image is not helped when we see these messages from you concerned about
someone impersonating "Reef Fish", about having more messages on the news
group in October than anyone else, about the failure of alternate news
groups, long diatribe interchanges with Alfonso, etc. Your messages should be
professional and stand alone, without all the added graffiti, accusations,
ridicule, etc. Herman Rubin and Old Mac User seem to do very well without
all those additions.

David Heiser








Owen
Posted: Sat Dec 02, 2006 10:06 pm
Guest
I appreciate the feedback and suggestions given -- thank you. I agree,
for the stats people, there will have to be an option to show all tests
of significance, a correlation matrix, partial and semi-partial
correlation matrices, how much additional proportion of variance in
sales can be explained by introducing each variable. It is difficult
to cater to both of the two very different populations that tend to
visit the web site -- added complexity and numbers scare many people
off who are actually looking to explore environmental factors related
to sales, but lack of numbers tends to send the statisticians and
theorists running!

The variables do not need to be independent, however... I am
purposefully avoiding any kind of factor-analysis or structural
equation modeling approaches where latent variables are "extracted"
from the numbers based on correlations. This is more of a
"shake-the-box 10-million times and perform meta-analysis on what seems
to be important." Most of the models will contain garbage, but over
time -- with THIS type of data in particular, I must say (sales,
customer behavior, weather and environmental factors) -- there is
convergence... from the garbage emerges likely candidates.

In other words, after "shaking the box of garbage" a few million times
-- the garbage being blatantly ignoring assumptions of independence,
ignoring any kind of strategy of variable insertion by correlation
coefficient, etc --- the frequency of variables selected (when selected
randomly, and tuned by order of random subsets of the selected
variables), begins to look very much like a normal curve with some
data. Sometimes order emerges, and when it emerges millions of times
in a row, then you have something... when it does not, then you get
garbage.
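
One possible reading of the "shake the box" idea, as a toy sketch only: it assumes NumPy, simulated data, and a made-up scoring rule (holdout mean squared error), and it is not a description of how the BrainTech tool actually works. Random subsets of variables are fitted many times, and the frequency with which each variable appears among the best-scoring subsets is tallied.

```python
import numpy as np

def selection_frequencies(X, y, n_trials=2000, subset_size=3, keep_frac=0.1, seed=0):
    """Fit OLS on random column subsets; return how often each column appears
    among the best-scoring subsets (lowest holdout mean squared error)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    idx = rng.permutation(n)
    train, test = idx[: 7 * n // 10], idx[7 * n // 10 :]   # 70/30 split
    results = []
    for _ in range(n_trials):
        cols = rng.choice(p, size=subset_size, replace=False)
        A = np.column_stack([np.ones(len(train)), X[np.ix_(train, cols)]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        B = np.column_stack([np.ones(len(test)), X[np.ix_(test, cols)]])
        results.append((np.mean((y[test] - B @ coef) ** 2), cols))
    results.sort(key=lambda r: r[0])                       # best subsets first
    counts = np.zeros(p)
    for _, cols in results[: max(1, int(keep_frac * n_trials))]:
        counts[cols] += 1
    return counts / counts.sum()

# Hypothetical data: only columns 0 and 3 actually drive y.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=300)
freq = selection_frequencies(X, y)
print(np.round(freq, 3))
```

In this toy setting the two variables that truly matter dominate the selection frequencies; with correlated real-world predictors the picture would be murkier, which is the concern raised earlier in the thread.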

We all know there cannot be a wholly automatic tool that will generate
perfect models... that's why statisticians have jobs :)

Anyway, enough rambling. I appreciate your feedback and will talk to
the other members about ways to display more "under-the-hood"
happenings, tests of significance, correlation matrices, etc. It may
have to go on braintechscientific.com, though, since so far I have been
vetoed from adding any more numbers to the tool. I also hear - loud
and clear - the comment that some visitors see the model and think
"what the heck do I do now?" It will be a challenge to address both
ends of the spectrum, but your feedback sincerely helps. With sincere
thanks,

Owen Emlen
Member, BrainTech, LLC
http://www.braintechllc.com

PS: for those of you interested in how the "garbage is shaken" millions
of times to obtain useful information, I would suggest the following
page: http://www.braintechllc.com/exploratory.aspx. It is still aimed
at a pseudo-business audience, but the information down a page where it
displays frequency of variable selection should provide some
clarification about what is going on.

PPS: For those who could not view anything, the issue with Firefox has
been fixed...
Reef Fish
Posted: Sun Dec 03, 2006 12:32 am
Guest
David A. Heiser wrote:
Quote:
"Reef Fish" <large_nassua_grouper@yahoo.com> wrote in message
news:1165008239.234584.280010@80g2000cwy.googlegroups.com...
David A. Heiser wrote:
"Reef Fish" <large_nassua_grouper@yahoo.com> wrote in message
news:1164821637.043630.93170@n67g2000cwd.googlegroups.com...

Old Mac User wrote:
Owen...

I looked at one example from your link. Here are three comments.

I couldn't find any example from the link in his post. On the other link
it had "multiple regression" which is NOT "multivariate regression", so
that's like Richard Ulrich calling Y = bX a nonlinear regression model.
:-)


First, you seem to have included in the model all possible "variables"
including interactions and quadratics regardless of whether those
factors are "significant" or not.

This is actually CORRECT, before any analysis of a quadratic
SURFACE associated with a set of candidate variables. The
unimportant main effect or interactions will be dropped ONLY in
the light of the regression results!

For example if you want to fit a quadratic curve in X, you MUST
include X, X^2 and a constant. Once the data is there, it may turn
out that the X term is not necessary if the curve is symmetric
around the origin.
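
A minimal sketch of this point in Python with NumPy, using made-up, noise-free data (the data and variable names are assumptions for illustration, not taken from the thread). The full quadratic is fitted first, and only the fit shows that the X term contributes nothing for a curve symmetric about the origin:

```python
import numpy as np

# Hypothetical data: a parabola symmetric about x = 0, with no noise.
x = np.linspace(-2.0, 2.0, 9)
y = 2.0 + 3.0 * x**2

# Design matrix with a constant, X and X^2 columns.
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares fit of the full quadratic.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# b holds (constant, X coefficient, X^2 coefficient);
# the X coefficient comes out at zero up to rounding.
print(b)
```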

Can't comment on the rest -- they may or may not be appropriate if
the user is supposed to EXPLORE the surface to be fitted, rather
than confirm a particular form. In that case, graphics are much
more telling than certain "test statistics".

-- Reef Fish Bob
+++++++++++++++++++++++++++++++++++++++++
There is a conceptual problem here.

Definitely. First the other, now yours.

Reef Fish Bob's messages over the year
(and prior years) have stressed the fact that linear regression depends on
the predictor variables being independent.

No. The predictor variables must be "linearly independent". Else you
cannot get a solution because the inverse (X'X) matrix needed to get
the beta estimates will be singular.

That is the meaning of "linear independence".

It has NOTHING to do with correlations.


That is two variables that are
correlated will give misleading coefficient values when the correlated
values are taken as being independent.

That is YOUR conceptual AND practical error! No question about it.


Now here an independent variable (x)
is used to create n, (x^m) additional variables, which clearly are
correlated, with non-zero correlations.

That is correct -- and they are LINEARLY independent. You need
to review the threads in sci.stat.math where these concepts had been
discussed thoroughly. You are late coming to this class, almost two
years late. :-)

The conceptual problem here is the dichotomy of the two very "loud"
messages.

What dichotomy? Everything is black and white and perfectly clear
EXCEPT to those (like yourself) who are ill-educated in the linear
models and regression BASIC concepts.


In fact we do know that a linear regression on introduced x^m variables does
give a set of polynomial coefficients that fits the resulting polynomial to
a set of data. However these are all highly correlated additional variables.

Yes. And a polynomial regression IS a linear model in which all the
independent variables are LINEARLY INDEPENDENT.
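
A short sketch of the distinction in Python with NumPy (the predictor values are made up for illustration). The powers of x are strongly correlated with each other, yet the design matrix still has full column rank:

```python
import numpy as np

x = np.linspace(1.0, 10.0, 50)  # hypothetical predictor values
X = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Pairwise correlations among x, x^2 and x^3 are all high...
R = np.corrcoef(X[:, 1:], rowvar=False)
print(np.round(R, 3))

# ...yet the columns are linearly independent: X has full column rank,
# so X'X is invertible and the least-squares solution is unique.
rank = np.linalg.matrix_rank(X)
print(rank)
```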

David Heiser

Start getting busy reading all those threads I had pointed Greg Heath
to read. Use Google and search for keywords of "linear independence"
or "linear regression" and author "reef fish".

Right now, you are VERY confused and deficient about your understanding
of linear regression problems. My diagnosis is with 100% confidence.

-- Reef Fish Bob.
------------------------------------
Thanks, Bob, for steering me to the messages and your comments from last December
and earlier this year. I missed all this at that time.

First of all I find that you are right on all counts. Even your statement to
Jerry Dallal

"The practical problem is that regression coefficients can behave poorly
even if the
(exact) predictors are not exactly linearly dependent. Hence, the problem
is more than just "linear dependence plus rounding error", which is what you
seem to be suggesting here. Wouldn't this then imply that the problem
should be framed in terms of the properties of the correlation matrix, and
that linear dependence is the red herring?"

The short answer is NO. The short explanation is that it is the definition
in LINEAR ALGEBRA, in which the term "correlation" doesn't even exist.

Quote:

is true. Fits my experience

The references

http://www.cs.ut.ee/~toomas_l/linalg/lin1/node7.html

and

http://www.rit.edu/~pnveme/pigf/Vectors/vector_lindep_1.html

explained the situation that all the words failed to do.

Those are the definitions in linear algebra.


Quote:
Too bad it came so
late in the dialog. You should have started from this back in December. I
saw the above definitions, and it all clicked back in.

I've been in these groups only since early 2005. The early errors (mostly
by Richard Ulrich, who had been making them since 1995 without check
by anyone in this group) were mostly regression related, and it just
hadn't come to the notion of LINEAR models and linear independence vs
statistical independence until December/January. But this is 12
months later. I don't know what you mean by "so late in the dialog" --
you were at least 10 or 20 years late understanding the notion, and
you still DON'T understand it, even with the algebra definitions you
found!
Quote:

I had forgotten all about my readings in Stewart when I wrote my comment. I
had to go back to Stewart, G.W., "Matrix Algorithms, Vol. 1 Basic
Decompositions" (1995), pages 29 and 30. He does the definition of linear
dependence differently from the first reference. For the sum
a1x1+a2x2+a3x3..= 0 implies that any one "a" value can be solved for. If
there is a solution (other than all a's = 0) then the system is linearly
dependent.

According to Stewart, a "basis" represents a subspace of only independent
vectors from the initial vector set. A basis has linearly independent
columns. Consequently there may be many bases, each of which gives a
different linear solution.

If the a's and x's are integers, then linear dependence can readily be
identified and each basis identified. If the a's and x's are floating point
numbers, then the decision as to independence and identification of each
basis becomes "fuzzy". One can't then be sure then that the basis of the
model encompasses the subspace of the data. External correlations of the
variables clearly provide no means of making this decision.

My interpretation of this is that a product variable of two variables
(considering only simple correlations), powers of any variable and log
transformations of any variable can then be introduced in the model as
additional variables (increasing the subspace), since these cannot be
constructed as linear forms expanding the existing basis.

You said the linear algebra definition said it much better than words,
and then you spent the next four paragraphs which are unnecessary,
misleading and even wrong in your last paragraph's "interpretation".

Quote:

You said: When."The determinant of X'X is zero. You can throw in those
symptoms relating to r and multiple R, but I would rather not even bother,
because there are TOO MANY partial correlations and multiple
correlations to weed through before you can say something definite about
linear dependence or independence among a set of
X's'."

It appears then, that in all cases, the determinant of the expanded data set
(with the added variables) should be calculated to ensure that the basis of
the model still holds..

NO. Wrong again. The paragraphs you cited simply say that you
SHOULDN'T rely on the use of ANY correlation to check for linear
independence. The singularity of X'X is one sufficient condition which
is easy for the readers of this group to understand. Later, I talked
about the eigenvalues of X'X, which is the true LINEAR ALGEBRA
way of detecting linear dependence. It has no correlations nor any
of your expanded variables involved.
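
As a rough illustration of the eigenvalue check in Python with NumPy (random data with an arbitrary seed, and a dependence built in on purpose; none of this comes from the thread), no correlation is computed anywhere:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, illustrative data only
x1 = rng.normal(size=40)
x2 = rng.normal(size=40)
x3 = x1 + x2                    # exact linear dependence by construction
X = np.column_stack([x1, x2, x3])

# Eigenvalues of the symmetric matrix X'X, in ascending order.
eig = np.linalg.eigvalsh(X.T @ X)
print(eig)

# A smallest eigenvalue that is zero relative to the largest
# (up to rounding) signals linear dependence among the columns.
ratio = abs(eig[0]) / eig[-1]
print(ratio)
```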
Quote:

However the fundamental problem of coefficient accuracy is still present.

You say: "1. The OLDEST known ill-effect (as shown by the JASA paper by
Longley (1967)) is the loss of numerical accuracy in single
precision arithmetic in all of the programs in that era on the
small set of real data in economics. The loss could be as
much as the accuracy in the first significant figure, or even the
sign of the coefficient."

Double precision does not correct this problem of getting correct values.

Now go back to the archives and read the topics on "numerical accuracy"
vs "statistical accuracy".

The exact solution of the LS solution to the Longley data is known.
Double precision gets quite close to the exact numerical solution.
But the inherent instability of the system remains -- in the algebraic
sense.

Quote:
It is the inherent increase in the "standard error" of the coefficient
value, meaning greater uncertainty in the calculated value..

Wrong AGAIN. The standard error pertains to the uncertainty of the
statistically ESTIMATED values for inference purposes. A perfectly
exact numerical solution will show the same standard error for
statistical inference.

Quote:

The problem Bob is that you end up screaming like a drill sergeant.

But you haven't read the number of times I REPEATED the same definitions
and explanations to the NOISE makers BEFORE I started screaming.

I haven't even screamed at YOU yet, even though you STILL missed
most of the homework you should have done in reading what had been
discussed in THIS group, by ME, to the others -- and I have to repeat
much of the same to you.

Quote:
Not
every one has been in the Army or Marines, and has learned to straighten up,
and listen and learn when he is shouting in your face. I still have the
greatest respect for those black drill sergeants that I had who survived the
Korean War.

I take your analogy of me to a Marine drill sergeant as a supreme INSULT,
even if you did not intend it to be. In the statistical topics I discussed
in this group, I might have started screaming at my STUDENTS before
they missed the concepts for the 10th time, and they don't even insult me
by what they write. The Quacks in this group -- and you seem to be
joining them, even belatedly -- are simply too OBTUSE to learn what they
had missed, too eager to blame ME for my screaming, and oblivious
to the fact that those at whom I scream should have gotten the idea
in less than 1/10 of the time I spent painstakingly explaining to them
while they turned off their ears and spent time flaming me. Richard
Ulrich is STILL doing it, even in the post where I posted about the
FORGER and that and why two of my posting accounts were blocked
since May 2006 because of PROVEN conspiracy by the people in
rec.travel.cruises.

And now, you are joining the NOISY group, after having been pointed
to the reading you should have done, didn't read them sufficiently to
understand, and come back to blame my screaming.

Whatever you think of my image, I am here to correct ERROR and
MALPRACTICE by posters in this group. I am not running for public
office or a popularity contest. Your final paragraphs are completely
gratuitous.

Do a little soul searching about YOURSELF -- you haven't learned
the lessons about LINEAR INDEPENDENCE in regression problems,
YET.

But you are already busy making NOISE after one remedial lesson.

-- Reef Fish Bob.
Quote:

Your image is not helped when we see these messages from you concerned about
someone impersonating "Reef Fish", about having more messages on the news
group in October than any one else, about the failure of alternate news
groups, long diatribe interchanges with Alfonso, etc. Your messages should be
professional and stand alone, without all the added graffiti, accusations,
ridicule, etc. Herman Rubin and Old Mac User seem to do very well without
all those additions.

David Heiser

Greg Heath
Posted: Sun Dec 03, 2006 8:11 am
Guest
Reef Fish wrote:
Quote:
David A. Heiser wrote:
"Reef Fish" <large_nassua_grouper@yahoo.com> wrote in message
news:1164821637.043630.93170@n67g2000cwd.googlegroups.com...

Old Mac User wrote:
Owen...

-----SNIP
There is a conceptual problem here.

Definitely. First the other, now yours.

Reef Fish Bob's messages over the year
(and prior years) have stressed the fact that linear regression depends on
the predictor variables being independent.

No. The predictor variables must be "linearly independent". Else you
cannot get a solution because the inverse (X'X) matrix needed to get
the beta estimates will be singular.

No.

If the predictor variables are not linearly independent, you
can get a solution to X*b = y. In fact, you can get an infinite
number of solutions. The problem is that the solutions are
not unique because solutions of X*b = 0 exist. Therefore,
the confidence intervals for the regression coefficients are
infinite.
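
A small sketch of the non-uniqueness in Python with NumPy (made-up data, with the third column deliberately set to the sum of the first two; all names are illustrative). Adding any multiple of a solution of X*b = 0 to one least-squares solution leaves the fitted values unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, illustrative data only
x1 = rng.normal(size=30)
x2 = rng.normal(size=30)
X = np.column_stack([x1, x2, x1 + x2])  # third column is dependent
y = 1.0 * x1 + 2.0 * x2                 # hypothetical response, no noise

# One least-squares solution (rcond sets the cutoff for zero singular values).
b0, *_ = np.linalg.lstsq(X, y, rcond=1e-10)

# null = (1, 1, -1) solves X*b = 0, so b0 + t*null fits equally well for any t.
null = np.array([1.0, 1.0, -1.0])
b1 = b0 + 5.0 * null

null_ok = np.allclose(X @ null, 0.0)
same_fit = np.allclose(X @ b0, X @ b1)
print(null_ok, same_fit)
```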

Quote:
That is the meaning of "linear independence".

No.

The meaning of linear independence is that solutions to
X*b = 0 do not exist (i.e., X has no zero singular values).

Quote:
It has NOTHING to do with correlations.

It has EVERYTHING to do with correlations:

If all variables are standardized,

Cxx = X'*X/(n-1) and Cxy = X'*y/(n-1).

where Cxx and Cxy are correlation coefficient matrices.
Therefore,

Cxx*b = Cxy

and the predictor variables are linearly independent
if and only if solutions to Cxx*b=0 do not exist.

If two standardized variables are linearly dependent,

x2 = a1 * x1

Using the definitions of correlation coefficient and
variance,

C12 = C21 = E{ x2*x1} = a1* E{ x1^2} = a1
1 = E{x2^2} = a1*E{x1*x2} = a1*C12

Therefore,

a1 = C12 = C21 = (+/-)1,
x2 = (+/-)x1
and
det(Cxx) = det( [ 1 (+/-)1 ; (+/-)1 1 ] ) = 0

If three standardized variables are linearly
dependent, but no two are (i.e., abs(Cij) < 1,
i,j = 1,2,3)

x3 = a1*x1+a2*x2

1) C13 = a1 + a2*C12
2) C23 = a1*C12+a2
3) 1 = a1*C13 +a2*C23

Therefore (from Eqs. 1-2)

a1 = (C31-C32*C21)/(1-C12^2),
a2 = (C32-C31*C12)/(1-C12^2),

provided ( Eq. 3)

1 + 2*C12*C13*C23 - [ C12^2+C13^2+C23^2 ] = 0

which is just det( Cxx ) = 0.
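
The three-variable case can be checked numerically. The following Python/NumPy sketch uses random data and arbitrary weights (0.7 and -1.2 are assumptions chosen only for illustration) and compares the closed-form expression with the determinant of the sample correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed, illustrative data only
x1 = rng.normal(size=200)
x2 = 0.3 * x1 + rng.normal(size=200)  # correlated with x1, not dependent
x3 = 0.7 * x1 - 1.2 * x2              # dependent on x1 and x2

# np.corrcoef centers and scales internally, so Cxx is the correlation matrix.
Cxx = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
C12, C13, C23 = Cxx[0, 1], Cxx[0, 2], Cxx[1, 2]

# Closed form from the derivation versus the numerical determinant.
closed_form = 1 + 2 * C12 * C13 * C23 - (C12**2 + C13**2 + C23**2)
det = np.linalg.det(Cxx)
print(closed_form, det)  # both are zero up to rounding
```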

When the correlation coefficients are all equal,

0 = det( Cxx ) = 1 - 3*C^2 + 2*C^3 = (1-C)^2 (1+2*C)

Therefore

C12 = C13 = C23 = -1/2

and

x1 + x2 + x3 = 0.

It looks like if m (m >= 3) standardized variables are linearly
dependent, but no proper subset is (i.e., abs(Cij) < 1/(m-2),
i,j = 1,2,...m)

xm = a1*x1 + a2*x2 + ... + a(m-1)*x(m-1).

If all of the correlation coefficients are equal,

C = -1/(m-1)

etc

Notice that for m >=4, abs(C) <= 1/3 so that linear dependence
is not necessarily associated with large absolute values
of correlation coefficient.
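
A quick numerical check of the equal-correlation case in Python with NumPy (the range of m is an arbitrary choice for illustration). For each m, the matrix with ones on the diagonal and C = -1/(m-1) elsewhere is singular, while abs(C) shrinks as m grows:

```python
import numpy as np

results = []
for m in range(3, 8):
    c = -1.0 / (m - 1)
    # Equicorrelation matrix: ones on the diagonal, c everywhere else.
    Cxx = (1 - c) * np.eye(m) + c * np.ones((m, m))
    # Smallest eigenvalue; zero (up to rounding) means Cxx is singular.
    smallest = np.linalg.eigvalsh(Cxx)[0]
    results.append((m, abs(c), smallest))
    print(m, round(abs(c), 4), smallest)
```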

-----SNIP

Quote:
... Everything is black and white and perfectly clear
EXCEPT to those (like yourself) who are ill-educated in the linear
models and regression BASIC concepts.

I would add to the class of ill-educated anyone who claims
linear regression requires the predictor variables to be
linearly independent.

When the variables are linearly dependent, the linear regression
solution is

b = pinv(X)*y

which minimizes the sum-squared-error ||y - X*b||^2.
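
A sketch of the pseudoinverse solution in Python with NumPy, where np.linalg.pinv plays the role of pinv above (the data are random and the dependence is built in on purpose; the rcond cutoff of 1e-10 is an assumption chosen for this example). Among all coefficient vectors with the same minimal error, the pseudoinverse picks the one with the smallest norm:

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed, illustrative data only
x1 = rng.normal(size=30)
x2 = rng.normal(size=30)
X = np.column_stack([x1, x2, x1 + x2])  # rank 2, three columns
y = rng.normal(size=30)                 # hypothetical response

# Pseudoinverse solution; rcond sets the cutoff for "zero" singular values.
b = np.linalg.pinv(X, rcond=1e-10) @ y

# Any other minimizer differs by a multiple of the null vector (1, 1, -1):
# it has the same sum-squared-error but a larger norm.
b_other = b + 2.0 * np.array([1.0, 1.0, -1.0])
sse = np.sum((y - X @ b) ** 2)
sse_other = np.sum((y - X @ b_other) ** 2)
print(np.isclose(sse, sse_other), np.linalg.norm(b) < np.linalg.norm(b_other))
```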

However, I do agree that in these cases variable reduction
to obtain linear independence is a better approach, especially
if a priori knowledge and/or common sense is used to guide
the reduction.

Quote:
Start getting busy reading all those threads I had pointed Greg Heath
to read.

Reef, you should really be more specific. There is a lot of
redundancy and irrelevance peppered in those posts.

Quote:
Use Google and search for keywords of "linear independence"
or "linear regression" and author "reef fish".

Right now, you are VERY confused and deficient about your understanding
of linear regression problems.

Apparently, you aren't the only one.

Hope this helps.

Greg
Reef Fish
Posted: Sun Dec 03, 2006 12:36 pm
Guest
Greg Heath wrote:
Quote:
Reef Fish wrote:
David A. Heiser wrote:
"Reef Fish" <large_nassua_grouper@yahoo.com> wrote in message
news:1164821637.043630.93170@n67g2000cwd.googlegroups.com...

Old Mac User wrote:
Owen...

-----SNIP
There is a conceptual problem here.

Definitely. First the other, now yours.

Reef Fish Bob's messages over the year
(and prior years) have stressed the fact that linear regression depends on
the predictor variables being independent.

No. The predictor variables must be "linearly independent". Else you
cannot get a solution because the inverse (X'X) matrix needed to get
the beta estimates will be singular.

No.

If you want to dabble with the mathematical solution of a system of
linear equations, you should go to some other ng.

For someone who knows NOTHING about the statistical problem of
OLS solution to the regression problem (linear models), your sophomoric
dabble in linear systems is just ... OT.
Quote:


Notice that for m >=4, abs(C) <= 1/3 so that linear dependence
is not necessarily associated with large absolute values
of correlation coefficient.

That was a lesson I taught you long ago.
Quote:

-----SNIP

... Everything is black and white and perfectly clear
EXCEPT to those (like yourself) who are ill-educated in the linear
models and regression BASIC concepts.

I would add to the class of ill-educated anyone who claims
linear regression requires the predictor variables to be
linearly independent.

Glad you repeated the proof of your ignorance about regression
problems.
Quote:

Start getting busy reading all those threads I had pointed Greg Heath
to read.

That was what I said to David Heiser. I said it to you long ago and
you failed every time.
Quote:

Greg

Your FREE lesson days are over.

-- Reef Fish Bob.
Greg Heath
Posted: Tue Dec 05, 2006 5:22 am
Guest
Reef Fish wrote:
Quote:
Greg Heath wrote:
Reef Fish wrote:
David A. Heiser wrote:
"Reef Fish" <large_nassua_grouper@yahoo.com> wrote in message
news:1164821637.043630.93170@n67g2000cwd.googlegroups.com...

Old Mac User wrote:
Owen...

-----SNIP
There is a conceptual problem here.

Definitely. First the other, now yours.

Reef Fish Bob's messages over the year
(and prior years) have stressed the fact that linear regression depends on
the predictor variables being independent.

No. The predictor variables must be "linearly independent". Else you
cannot get a solution because the inverse (X'X) matrix needed to get
the beta estimates will be singular.

No.

If you want to dabble with the mathematical solution of a system of
linear equations, you should go to some other ng.

This is sci.stat.math, not sci.stat.reef.

Quote:
For someone who knows NOTHING about the statistical problem of
OLS solution to the regression problem (linear models), your sophomoric
dabble in linear systems is just ... OT.

On the contrary, I made it clear that your insistence on linearly
independent variables is sophomoric.

Quote:
Notice that for m >=4, abs(C) <= 1/3 so that linear dependence
is not necessarily associated with large absolute values
of correlation coefficient.

That was a lesson I taught you long ago.

I first became aware of it from a post by Jerry Dallal. I was
somewhat surprised because many of the multicollinearity
examples I had worked with were associated with
correlations which 1) exceeded 0.5 2) were larger than
other correlation coefficients in the matrix.

When I replied to a poster that sometimes multicollinearity
can be understood by looking at the correlation coefficient
matrix, you misquoted me as saying multicollinearity could
always be detected, a priori, by the presence of large
correlation coefficients. Upon correction, you refused to
confess that the detection misstatement was a figment
of your imagination.

You mentioned Jerry's example which was then analyzed
by me. It was then that I realized that the size of correlation
coefficients associated with linear dependence tend to
decrease as the number of dependent variables increases.
It may have been obvious to Jerry and/or you, however, that
was never explicitly pointed out. I had to deduce that myself.

You seem to be disturbed that I took the time to explicitly
point that out to someone who was not aware of it instead
of referring him to tens of posts by you containing a plethora
of redundant and irrelevant statements.

Please take the time to pick out a few of the better posts
for reference instead of confusing people that are already
somewhat confused.

Hope this helps.

Greg
Reef Fish
Posted: Tue Dec 05, 2006 8:57 am
Guest
Greg Heath wrote:
Quote:
Reef Fish wrote:
Greg Heath wrote:
Reef Fish wrote:
David A. Heiser wrote:
"Reef Fish" <large_nassua_grouper@yahoo.com> wrote in message
news:1164821637.043630.93170@n67g2000cwd.googlegroups.com...

Old Mac User wrote:
Owen...

-----SNIP
There is a conceptual problem here.

Definitely. First the other, now yours.

Reef Fish Bob's messages over the year
(and prior years) have stressed the fact that linear regression depends on
the predictor variables being independent.

No. The predictor variables must be "linearly independent". Else you
cannot get a solution because the inverse (X'X) matrix needed to get
the beta estimates will be singular.

No.

If you want to dabble with the mathematical solution of a system of
linear equations, you should go to some other ng.

This is sci.stat.math, not sci.stat.reef.

The first correct statement you have made in quite a while, Greg.
Quote:

For someone who knows NOTHING about the statistical problem of
OLS solution to the regression problem (linear models), your sophomoric
dabble in linear systems is just ... OT.

On the contrary, I made it clear that your insistence on linearly
independent variables is sophomoric.

Correct. It is sophomoric, junior, senior, graduate, and post-graduate.
It is UNIVERSAL for all statistical students, professors, and professionals.

That's why Greg Heath is ignorant and NOISY.
Quote:

Notice that for m >=4, abs(C) <= 1/3 so that linear dependence
is not necessarily associated with large absolute values
of correlation coefficient.

That was a lesson I taught you long ago.

I first became aware of it from a post by Jerry Dallal. I was
somewhat surprised because many of the multicollinearity
examples I had worked with were associated with
correlations which 1) exceeded 0.5 2) were larger than
other correlation coefficients in the matrix.

Okay. Give the credit to Jerry. Fine. But that's a well-known
result I have cited in the rec.stat groups numerous times that
correlations can be ALL LOW and still be linearly dependent.

You obviously are not aware of the statistical fact that the
MAXIMUM value of all the correlations in a correlation matrix
of size n, where all of the correlations are the SAME value, can
be as small as 0.000001 if n is sufficiently large. And when
you have that many independent variables, you can easily
make a linearly dependent set without raising the max by
much. So what's so surprising about 0.5?

Sorry, I had temporarily forgotten that your FREE lesson
days are over.

Quote:
Hope this helps.

Greg

Nothing you post helps, not even yourself. You are just back
to your old form of trying to catch up with Richard Ulrich to
try to dethrone him as the NOISIEST Quack of this group.

-- Reef Fish Bob.
 
All times are GMT - 5 Hours