Minimizing least squares - trivial
Guest
Posted: Mon Apr 14, 2008 6:36 am
Hi all,

I have a question that is probably trivial for many of you, but for
some reason I couldn't get my head around it, so I thought I'd post it
here.

I'm trying to fit simulated data to experimental data, and in
theory I know that the function should be: y = f(x^3).

So when I'm trying to get the best fit by minimizing a least-squares
function, should I minimize the following:

chi_square = sum of ((y_exp - y_sim)^2)

or should I be considering minimizing (the one that made more sense to
me):

chi_square = sum of ((y_exp^(1/3) - y_sim^(1/3))^2)

Thanks for your help in advance.

_F_
Ray Koopman
Posted: Mon Apr 14, 2008 8:48 am
Guest
You should probably be doing something like the first rather than the
second. In either case, what you are calling "chi_square" probably
does not have a chi-square distribution. In general, how you should
fit depends on what you think the error distributions are. Are they
the same for all values of x? Are you working with discrete values
(e.g., counts) or continuous measures? And what about the simulated
values? If these are based on some "number of simulations" then they
too will have errors. Or are you using the term in a broader sense, to
mean simply "theoretical"?
Guest
Posted: Mon Apr 14, 2008 9:08 am
Well, in my case I'm using a theoretical model and trying to find the
best-fit parameter.
My function, as I stated earlier, has the form y = A*x^3 (where A is
what I'm trying to find; A is actually an equation by itself, but it
has one unknown parameter that I want to locate).

The only reason I thought the second would be better, is because at
large x values, the error becomes more significant than at low x
values if I simply use the first expression. However, I would like to
keep the weight to be equal for all data points.

I am basically looking at the variation of Energy with Velocity, so
for every given velocity there will be a given experimentally measured
energy and another simulated one.
So at high velocities a 10% error would then weigh much more than a
10% error at small velocities and could then influence the fit.

Or should I use the sum of relative errors to be my criterion?

function to minimize = sum of [(y_exp-y_sim)/y_exp]^2
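
[Editor's sketch] The three criteria proposed so far can be compared on a toy version of the problem. This assumes the simplest case, y_sim = A*x^3 with A itself as the single free quantity (in the thread A depends on one unknown parameter in a way that is not given), positive y values, and made-up data. The function names are illustrative only. For this simple case each criterion has a closed-form minimizer:

```python
def fit_absolute(x, y):
    """Minimize sum((y - A*x^3)^2): A = sum(x^3 * y) / sum(x^6)."""
    num = sum(xi**3 * yi for xi, yi in zip(x, y))
    den = sum(xi**6 for xi in x)
    return num / den


def fit_cube_root(x, y):
    """Minimize sum((y^(1/3) - c*x)^2) with c = A^(1/3); requires y > 0."""
    num = sum(xi * yi ** (1.0 / 3.0) for xi, yi in zip(x, y))
    den = sum(xi**2 for xi in x)
    return (num / den) ** 3


def fit_relative(x, y):
    """Minimize sum(((y - A*x^3)/y)^2): A = sum(x^3/y) / sum(x^6/y^2)."""
    num = sum(xi**3 / yi for xi, yi in zip(x, y))
    den = sum(xi**6 / yi**2 for xi, yi in zip(x, y))
    return num / den


# Made-up data: exact y = 2*x^3, then a 10% error on one point only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_exact = [2.0 * xi**3 for xi in x]
y_err_low = [1.1 * y_exact[0]] + y_exact[1:]    # error at the smallest x
y_err_high = y_exact[:-1] + [1.1 * y_exact[-1]]  # error at the largest x

for name, fit in [("absolute", fit_absolute),
                  ("cube root", fit_cube_root),
                  ("relative", fit_relative)]:
    print(name, fit(x, y_err_low), fit(x, y_err_high))
```

On this data the absolute criterion barely reacts to the 10% error at the smallest x and reacts strongly to the same percentage error at the largest x. The relative criterion gives the same estimate in both cases, and the cube-root criterion sits in between.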
Old Mac User
Posted: Mon Apr 14, 2008 11:06 am
Guest
Consider using Y = a + b*Ln(X) as an opening move. Study the residuals
for lack of fit.
You may need an extension to this model. But it's a place to begin.
OMU
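
[Editor's sketch] One way to try the suggestion above is an ordinary least-squares fit of Y against Ln(X), followed by a look at the residuals. The code assumes the model exactly as written in the post, Y = a + b*Ln(X), with positive X. The helper name `fit_semilog` and the data are made up for illustration.

```python
import math


def fit_semilog(x, y):
    """Ordinary least squares for Y = a + b*ln(X).

    Returns (a, b, residuals), where residuals[i] = y[i] - (a + b*ln(x[i])).
    """
    u = [math.log(xi) for xi in x]
    n = len(u)
    u_mean = sum(u) / n
    y_mean = sum(y) / n
    sxx = sum((ui - u_mean) ** 2 for ui in u)
    sxy = sum((ui - u_mean) * (yi - y_mean) for ui, yi in zip(u, y))
    b = sxy / sxx
    a = y_mean - b * u_mean
    residuals = [yi - (a + b * ui) for ui, yi in zip(u, y)]
    return a, b, residuals


# Made-up cubic data, to see what lack of fit looks like in the residuals.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * xi**3 for xi in x]
a, b, residuals = fit_semilog(x, y)
print(a, b)
print(residuals)
```

For this cubic data the residuals are positive at both ends and negative in the middle. A systematic pattern like that, rather than random scatter, is the kind of lack of fit that would call for an extension of the model.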
Ray Koopman
Posted: Mon Apr 14, 2008 12:07 pm
Guest
Unless you say something specific about how the error distributions
of the measured and simulated y relate to the corresponding true
values, you won't be able to give more than a hand-waving defense
of whatever fitting procedure you use. Working at that level, I'd
probably minimize sum{ [ log(y_exp) - log(y_sim) ]^2 }.
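
[Editor's sketch] For the simplest reading of the model, y_sim = A*x^3 with A as the only free quantity (an assumption; in the thread A itself depends on one unknown parameter), the log criterion above has a closed-form minimizer. The index i and the count n of data points are notation introduced here.

```latex
S(A) = \sum_{i=1}^{n} \bigl[ \log y_{\mathrm{exp},i} - \log\bigl(A x_i^{3}\bigr) \bigr]^{2}
     = \sum_{i=1}^{n} \Bigl[ \log\frac{y_{\mathrm{exp},i}}{x_i^{3}} - \log A \Bigr]^{2},
\qquad
\frac{dS}{d\log A} = 0
\;\Longrightarrow\;
\log \hat{A} = \frac{1}{n} \sum_{i=1}^{n} \log\frac{y_{\mathrm{exp},i}}{x_i^{3}} .
```

So under this assumption the estimate of A is the geometric mean of the ratios y_exp/x^3. Each point enters only through the ratio y_exp/y_sim, which is why a 10% discrepancy counts the same at any velocity.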
Guest
Posted: Mon Apr 14, 2008 1:13 pm
Minimizing sum{ [ log(y_exp) - log(y_sim) ]^2 } actually makes the
most sense to me now, and this is what I will be working on (since that
transformation will linearize the whole expression).

However, how defensible is it to minimize the sum of residual errors
(or the sum of squared residual errors)? Has something like
that been used in the literature before?
Guest
Posted: Mon Apr 14, 2008 2:55 pm
Sorry I made a mistake, I wanted to ask about the sum of relative
errors or the sum of squared relative errors and not the sum of
residual errors!
Ray Koopman
Posted: Mon Apr 14, 2008 10:21 pm
Guest
On Apr 14, 5:55 pm, mixit...@gmail.com wrote:
Quote:
Sorry I made a mistake, I wanted to ask about the sum of relative
errors or the sum of squared relative errors and not the sum of
residual errors!

Relative error as you expressed it, (a-b)/a, is asymmetric. This
can cause problems if there are large errors, because it weights
underestimates and overestimates differently, which is usually not
appropriate.

Symmetric versions of relative error such as (a-b)/(a+b) or a/b-b/a
are better, but compared to logarithmic error they are usually not
worth the extra complexity if all the errors are small, and pay too
much attention to large errors if there are any.
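
[Editor's sketch] The error definitions named above can be compared on a factor-of-two overestimate and a factor-of-two underestimate. Here a stands for the measured value and b for the simulated one, following the poster's (y_exp - y_sim)/y_exp. That mapping and the function names are assumptions made for illustration.

```python
import math


def relative(a, b):
    """Relative error as posted: (a - b) / a."""
    return (a - b) / a


def symmetric_ratio(a, b):
    """Symmetric variant: (a - b) / (a + b)."""
    return (a - b) / (a + b)


def symmetric_diff(a, b):
    """Symmetric variant: a/b - b/a."""
    return a / b - b / a


def log_error(a, b):
    """Logarithmic error: log(a) - log(b)."""
    return math.log(a) - math.log(b)


a = 1.0
for name, err in [("relative", relative),
                  ("(a-b)/(a+b)", symmetric_ratio),
                  ("a/b - b/a", symmetric_diff),
                  ("log", log_error)]:
    over = err(a, 2.0 * a)    # simulated value is twice the measured one
    under = err(a, 0.5 * a)   # simulated value is half the measured one
    print(f"{name:12s} over={over:+.3f} under={under:+.3f}")
```

With (a-b)/a the two cases differ in magnitude (-1 versus +0.5), while the two symmetric forms and the log error give equal magnitudes with opposite signs.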

Minimizing absolute error instead of squared error, however "error"
may be defined, is a recognized way of reducing the influence of
extreme errors, but it is not always the best way to handle the
problem -- that depends on the model and the apprehended error
distribution.
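
[Editor's sketch] To illustrate the squared versus absolute choice, take the log error and the simplest model y_sim = A*x^3 with A as the only free quantity (both are assumptions; the thread does not fix either). Minimizing the sum of squared log errors then gives the mean of log(y_exp/x^3), while minimizing the sum of absolute log errors gives its median. The data, with one gross outlier, are made up.

```python
import math
import statistics


def log_ratios(x, y):
    """log(y / x^3) for each point; requires x > 0 and y > 0."""
    return [math.log(yi / xi**3) for xi, yi in zip(x, y)]


def fit_log_squared(x, y):
    """Minimize sum((log y - log(A*x^3))^2): log A is the mean log ratio."""
    return math.exp(statistics.mean(log_ratios(x, y)))


def fit_log_absolute(x, y):
    """Minimize sum(|log y - log(A*x^3)|): log A is the median log ratio."""
    return math.exp(statistics.median(log_ratios(x, y)))


# Made-up data: y = 2*x^3, with the point at x = 4 off by a factor of 10.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * xi**3 for xi in x]
y[3] *= 10.0

print(fit_log_squared(x, y))
print(fit_log_absolute(x, y))
```

Here the absolute-error fit stays at the value shared by the four consistent points, while the squared-error fit is pulled toward the outlier. Whether that behaviour is desirable depends on the error distribution, as noted above.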
Ray Koopman
Posted: Mon Apr 14, 2008 10:34 pm
Guest
On Apr 15, 1:21 am, Ray Koopman <koop...@sfu.ca> wrote:
Quote:
... because it weights underestimates and overestimates differently,
which is usually not appropriate.

My turn to mis-state: change "differently" to "equally".
 
Page 1 of 1       All times are GMT - 5 Hours
The time now is Fri Jul 25, 2008 5:25 pm