Statistics - Math Forum: Help, I have problems with coefficient of correlation...
Beorne...
Posted: Wed Jul 16, 2008 3:11 am
Guest
I have a nonlinear model to fit my data.

I have to evaluate the goodness of fit, so I calculated the coefficient
of determination R and the standard error.
I thought that the coefficient of determination R and the coefficient of linear
correlation r were the same thing, but this is not the case. In
particular, I have some cases where R2 is negative.

I use the following formulae:

y = true data
f = fitted data

[MATLAB notation]

n = numel( y );   % number of data points

rmse = sqrt( sum( ( f - y ) .^ 2 ) / n );

r = ( n * sum( f .* y ) - sum( f ) * sum( y ) ) / sqrt( ( n * ...
    sum( y .^ 2 ) - sum( y ) ^ 2 ) * ( n * sum( f .^ 2 ) - sum( f ) ^ ...
    2 ) );

ss_tot = sum( ( y - mean( y ) ) .^ 2 );
ss_res = sum( ( f - y ) .^ 2 );
R2 = ( ss_tot - ss_res ) / ss_tot;
R = sqrt( R2 );   % complex (imaginary) whenever R2 < 0
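For readers without MATLAB at hand, here is a minimal Python sketch (standard library only) of the same three statistics. The function name `fit_stats` is hypothetical; it assumes `y` and `f` are equal-length sequences of numbers, neither of them constant. It returns R2 rather than R, because sqrt(R2) has no real value when R2 < 0.

```python
import math


def fit_stats(y, f):
    """Return (rmse, r, R2) for observed values y and fitted values f."""
    n = len(y)
    if n != len(f) or n < 2:
        raise ValueError("y and f must have the same length (at least 2)")
    sy, sf = sum(y), sum(f)
    syy = sum(v * v for v in y)
    sff = sum(v * v for v in f)
    sfy = sum(fi * yi for fi, yi in zip(f, y))
    ss_res = sum((fi - yi) ** 2 for fi, yi in zip(f, y))

    # Root-mean-square error of the fit.
    rmse = math.sqrt(ss_res / n)
    # Pearson correlation between f and y (same computational formula as above).
    r = (n * sfy - sf * sy) / math.sqrt((n * syy - sy ** 2) * (n * sff - sf ** 2))
    # Coefficient of determination; negative when the fit is worse than mean(y).
    ybar = sy / n
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    R2 = (ss_tot - ss_res) / ss_tot
    return rmse, r, R2


# A "fit" that runs backwards relative to the data.
rmse, r, R2 = fit_stats([1, 2, 3, 4], [4, 3, 2, 1])
print(rmse, r, R2)
```

For this example R2 = -3 and r = -1: the fitted values are anticorrelated with the data and, on average, farther from y than mean(y) is.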

Could you explain to me in simple words why r and R differ, and why
R2 can be negative?

And, when R2 is negative, what should I use, apart from the standard
error rmse, to evaluate the goodness of fit?

Thanks
Ray Koopman...
Posted: Wed Jul 16, 2008 4:56 am
Guest
On Jul 16, 6:11 am, Beorne <matteo... at (no spam) gmail.com> wrote:
Quote:
In particular, I have some cases where R2 is negative.

That can happen if your model lacks additive and multiplicative
constants that are free to be estimated, or if it has such constants
but they were estimated by a procedure that does not minimize the sum
of squares of the residuals.
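A small Python sketch of this point, on made-up data: the "model" f = x keeps a = 1 and c = 0 fixed instead of estimating them, and its R2 comes out negative; fitting a and c by least squares to the same points gives R2 >= 0. The helper names `r_squared` and `ls_line` are hypothetical.

```python
def r_squared(y, f):
    """1 - SS_res/SS_tot for observed y and fitted f."""
    ybar = sum(y) / len(y)
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    return 1.0 - ss_res / ss_tot


def ls_line(w, y):
    """Least-squares slope a and intercept c for y ~ a*w + c."""
    n = len(y)
    sw, sy = sum(w), sum(y)
    sww = sum(wi * wi for wi in w)
    swy = sum(wi * yi for wi, yi in zip(w, y))
    a = (n * swy - sw * sy) / (n * sww - sw ** 2)
    c = (sy - a * sw) / n
    return a, c


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 11.0, 9.0, 12.0, 10.0]

# Constants not estimated: f = 1*x + 0.
print("fixed a=1, c=0:", r_squared(y, x))

# Constants estimated by least squares.
a, c = ls_line(x, y)
print("least squares:", a, c, r_squared(y, [a * xi + c for xi in x]))
```

With these numbers the fixed model gives R2 = 1 - 287/5.2 (about -54), while the least-squares line (a = 0.1, c = 10.1) gives R2 = 1/52 (about 0.019).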
Beorne...
Posted: Wed Jul 16, 2008 5:41 am
Guest
On 16 Jul, 16:56, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
Quote:
That can happen if your model lacks additive and multiplicative
constants that are free to be estimated, or if it has such constants
but they were estimated by a procedure that does not minimize the sum
of squares of the residuals.

My model is a power curve of the type
y = a * x ^ b + c (a, b, c parameters),
estimated by minimizing the sum of squares.
Ray Koopman...
Posted: Wed Jul 16, 2008 6:37 am
Guest
On Jul 16, 8:41 am, Beorne <matteo... at (no spam) gmail.com> wrote:
Quote:
My model is a power curve of the type
y = a * x ^ b + c (a, b, c parameters),
estimated by minimizing the sum of squares.

Then the parameters must have been poorly estimated. This is really
a one-parameter problem, because there are closed-form expressions
for the least-squares a and c for any given b. Let w = x^b. Then

a = (n*sum(w*y)-(sum w)(sum y)) / (n*sum(w^2)-(sum w)^2),

c = ((sum y) - a*(sum w)) / n.

Minimize n*sum((y-f)^2) = (n*sum(y^2)-(sum y)^2) -
(n*sum(w*y)-(sum w)(sum y))^2 / (n*sum(w^2)-(sum w)^2)

as a function of b (here f = a*w + c uses the least-squares a and c
for that b), then get the corresponding a and c.
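The recipe above can be sketched in Python as follows. This is an illustration under stated assumptions, not Ray's code or the Curve Fitting Toolbox's algorithm: it assumes every x > 0 (so x**b is real), searches b only over a hypothetical range [-3, 3], and locates the minimum with a coarse grid followed by golden-section refinement, which assumes the sum of squares is unimodal near the best grid point. For each b it sums the squared residuals directly; that equals the closed-form expression above divided by n, but avoids cancellation. The names `profile` and `fit_power` are hypothetical.

```python
import math


def profile(x, y, b):
    """For a fixed exponent b, return (SSR, a, c), where a and c are the
    closed-form least-squares values for y ~ a*x**b + c."""
    n = len(y)
    w = [xi ** b for xi in x]                  # assumes every xi > 0
    sw, sy = sum(w), sum(y)
    sww = sum(wi * wi for wi in w)
    swy = sum(wi * yi for wi, yi in zip(w, y))
    dww = n * sww - sw ** 2                    # n * sum((w - mean(w))^2)
    if dww <= 1e-12 * n * sww:                 # w (nearly) constant, e.g. b = 0
        return math.inf, math.nan, math.nan
    a = (n * swy - sw * sy) / dww
    c = (sy - a * sw) / n
    # Same quantity as the closed-form expression divided by n,
    # but summed from the residuals to avoid cancellation.
    ssr = sum((yi - a * wi - c) ** 2 for wi, yi in zip(w, y))
    return ssr, a, c


def fit_power(x, y, b_lo=-3.0, b_hi=3.0, grid=121, tol=1e-10):
    """Minimize SSR over b (assumed to lie in [b_lo, b_hi]); return (a, b, c)."""
    # Coarse grid to find a bracket around the best b.
    bs = [b_lo + (b_hi - b_lo) * i / (grid - 1) for i in range(grid)]
    k = min(range(grid), key=lambda i: profile(x, y, bs[i])[0])
    lo, hi = bs[max(k - 1, 0)], bs[min(k + 1, grid - 1)]

    # Golden-section search inside the bracket.
    g = (math.sqrt(5.0) - 1.0) / 2.0
    p, q = hi - g * (hi - lo), lo + g * (hi - lo)
    fp, fq = profile(x, y, p)[0], profile(x, y, q)[0]
    while hi - lo > tol:
        if fp < fq:                            # minimum lies in [lo, q]
            hi, q, fq = q, p, fp
            p = hi - g * (hi - lo)
            fp = profile(x, y, p)[0]
        else:                                  # minimum lies in [p, hi]
            lo, p, fp = p, q, fq
            q = lo + g * (hi - lo)
            fq = profile(x, y, q)[0]
    b = (lo + hi) / 2.0
    _, a, c = profile(x, y, b)
    return a, b, c


# Noise-free example: data generated from y = 2*x**1.5 + 1.
x = [float(i) for i in range(1, 11)]
y = [2.0 * xi ** 1.5 + 1.0 for xi in x]
print(fit_power(x, y))
```

Reducing the search to the single parameter b is what makes a simple one-dimensional search like this workable; with noisy data, or if the assumed range for b is wrong, the grid range and density would need adjusting.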
Beorne...
Posted: Wed Jul 16, 2008 9:47 pm
Guest
On 16 Jul, 17:37, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
Quote:
Then the parameters must have been poorly estimated. This is really
a one-parameter problem, because there are closed-form expressions
for the least-squares a and c for any given b.


Good to know. I found the model using the MATLAB Curve Fitting
Toolbox.
I'm studying how well the model generalizes to different data sets;
the model was not fitted on the same points I'm trying to match.
Clearly, the points that give me a negative R2 fit very badly.

But, in general, in which cases are r2 and R2 different? And when
can R2 be negative?
Which is more appropriate for evaluating the goodness of match to a model, r2 or R2?

Thanks very much.
Ray Koopman...
Posted: Thu Jul 17, 2008 6:24 am
Guest
On Jul 17, 12:47 am, Beorne <matteo... at (no spam) gmail.com> wrote:
Quote:
But, in general, in which cases are r2 and R2 different?
And when can R2 be negative?
Which is more appropriate for evaluating the goodness of match to a model, r2 or R2?

If r and R2 are evaluated on the same points that were used to
estimate the values of the parameters in the model (with free additive
and multiplicative constants estimated by least squares), then R2 >= 0
and r = sqrt(R2). But on a new set of points, both r and R2 can
be negative, and r^2 >= R2.
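A quick numerical check of both claims, on made-up data, using a least-squares line as a stand-in for any model with free additive and multiplicative constants: on the fitting points r = sqrt(R2); on a new set of points where the trend reverses, r and R2 are both negative and r^2 >= R2. The helper names are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def r_squared(y, f):
    """1 - SS_res/SS_tot, computed about the mean of y."""
    ybar = sum(y) / len(y)
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    return 1.0 - ss_res / ss_tot


def ls_line(w, y):
    """Least-squares a, c for y ~ a*w + c."""
    n = len(y)
    sw, sy = sum(w), sum(y)
    sww = sum(wi * wi for wi in w)
    swy = sum(wi * yi for wi, yi in zip(w, y))
    a = (n * swy - sw * sy) / (n * sww - sw ** 2)
    return a, (sy - a * sw) / n


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_fit = [10.0, 11.0, 9.0, 12.0, 10.0]     # points used to estimate a, c
y_new = [12.0, 11.0, 10.0, 9.0, 8.0]      # new points at the same x

a, c = ls_line(x, y_fit)
f = [a * xi + c for xi in x]

r_in, R2_in = pearson(f, y_fit), r_squared(y_fit, f)
r_out, R2_out = pearson(f, y_new), r_squared(y_new, f)
print(r_in, math.sqrt(R2_in))     # equal on the fitting points
print(r_out, R2_out)              # both negative on the new points
```

Here r_out = -1 and R2_out = 1 - 12.9/10 = -0.29, so r_out^2 = 1 >= R2_out.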

There is no single best summary statistic for evaluating the
goodness of the fit of the previously-estimated parameters to
new data, because they attend to different aspects of the misfit,
to which different people attach different importance. However,
from the usual bivariate summary statistics (n, means, variances
or standard deviations, and covariance or correlation), you can
construct whatever you need. If I had to say what I think people
ought to be concerned with, it would probably be the rms error,
sqrt(sum((y-f)^2)/n).
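As one illustration of building such statistics from the bivariate summaries, here is a standard identity (not stated in the thread): with population (divide-by-n) standard deviations s_y and s_f and correlation r between y and f,

mean((y-f)^2) = (mean y - mean f)^2 + (s_y - s_f)^2 + 2*s_y*s_f*(1 - r),

and R2 = 1 - mean((y-f)^2)/s_y^2. The three terms separate a shift in level, a mismatch in spread, and imperfect correlation, which is one way to see how different summaries weight different aspects of the misfit. The sketch below assumes neither y nor f is constant; the function names are hypothetical.

```python
import math


def summaries(y, f):
    """Means, population SDs, and correlation of observed y and fitted f."""
    n = len(y)
    my, mf = sum(y) / n, sum(f) / n
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    sf = math.sqrt(sum((v - mf) ** 2 for v in f) / n)
    r = sum((a - my) * (b - mf) for a, b in zip(y, f)) / (n * sy * sf)
    return my, mf, sy, sf, r


def from_summaries(my, mf, sy, sf, r):
    """Rebuild the rms error and R2 from the bivariate summaries."""
    level = (my - mf) ** 2              # difference in means
    spread = (sy - sf) ** 2             # difference in SDs
    corr = 2.0 * sy * sf * (1.0 - r)    # imperfect correlation
    mse = level + spread + corr
    return math.sqrt(mse), 1.0 - mse / sy ** 2, (level, spread, corr)


y = [12.0, 11.0, 10.0, 9.0, 8.0]
f = [10.2, 10.3, 10.4, 10.5, 10.6]
rmse, R2, parts = from_summaries(*summaries(y, f))
direct = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, f)) / len(y))
print(rmse, direct, R2, parts)
```

In this example the rms error rebuilt from the summaries matches the one computed directly from the residuals (mean squared error 2.58), and R2 = 1 - 2.58/2 = -0.29.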
Beorne...
Posted: Fri Jul 18, 2008 2:13 am
Guest
Thank you very much!

One last thing: can you recommend a good statistics book? And a book
focusing on modelling and pattern recognition?
 
All times are GMT - 5 Hours