Linear regression on a subset of the complete data set...

Torsten Hennig... (Guest)
Posted: Sun Oct 25, 2009 10:24 pm
Hello,
I have a set of two dimensional data (t_i, y_i) which -
according to the physical background - follow a linear
relation of the form
y_i = a*log(t_i) + b,
but only on a restricted subset (t_i0,y_i0),
(t_(i0+1),y_(i0+1)),...,(t_(i0+k),y_(i0+k)) of the
original data.
Is there any theoretically-based statistical method
to identify i0 and i0+k ?
Of course one could make the restriction only by
looking at the empirical data, but this appears
quite arbitrary to me.
Maybe one could include one data point after the other
and inspect where the r_squared obviously starts to decrease ?
My final aim is to determine the slope 'a'
of the above regression curve on the restricted data
set.
Many thanks in advance for any reference.
Best wishes
Torsten.
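
The idea of adding one data point after the other and watching r_squared can be sketched as follows. This is only an illustration under assumptions: the function names, the minimum window size, and the synthetic data are made up here, and NumPy is assumed to be available. Note that r_squared tends to stay close to 1 for a while after the relation breaks down, so any cut-off chosen from it remains a judgment call.

```python
import numpy as np

def fit_loglinear(t, y):
    """Least-squares fit of y = a*log(t) + b. Returns (a, b, r_squared)."""
    x = np.log(np.asarray(t, dtype=float))
    y = np.asarray(y, dtype=float)
    a, b = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (a * x + b)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return a, b, 1.0 - ss_res / ss_tot

def expanding_r2(t, y, start, min_points=5):
    """Fit points start..end for every end >= start + min_points - 1.

    Returns a list of (end_index, slope, r_squared), one entry per window.
    """
    out = []
    for end in range(start + min_points - 1, len(t)):
        a, _, r2 = fit_loglinear(t[start:end + 1], y[start:end + 1])
        out.append((end, a, r2))
    return out

# Hypothetical data: log-linear up to t = 40, then an extra upward drift.
t_demo = np.arange(1.0, 61.0)
y_demo = 2.0 * np.log(t_demo) + 1.0 + np.where(t_demo > 40, 0.01 * (t_demo - 40) ** 2, 0.0)

# Print every fifth window to see where slope and R^2 start to move.
for end, a, r2 in expanding_r2(t_demo, y_demo, start=0)[::5]:
    print(f"end index {end:2d}: slope = {a:.4f}, R^2 = {r2:.6f}")
```

The same scan can be run backwards from a fixed end point to look for i0.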

root... (Guest)
Posted: Mon Oct 26, 2009 1:24 am
Torsten Hennig <Torsten.Hennig at (no spam) umsicht.fhg.de> wrote:
[quote]Hello,
I have a set of two dimensional data (t_i, y_i) which -
according to the physical background - follow a linear
relation of the form
y_i = a*log(t_i) + b,
but only on a restricted subset (t_i0,y_i0),
(t_(i0+1),y_(i0+1)),...,(t_(i0+k),y_(i0+k)) of the
original data.
Is there any theoretically-based statistical method
to identify i0 and i0+k ?
Of course one could make the restriction only by
looking at the empirical data, but this appears
quite arbitrary to me.
Maybe one could include one data point after the other
and inspect where the r_squared obviously starts to decrease ?
My final aim is to determine the slope 'a'
of the above regression curve on the restricted data
set.
Many thanks in advance for any reference.
Best wishes
Torsten.
[/quote]
What are the data telling you about the region <i0
and beyond i0+k? Can you extend your model to include
those regions? Do you have a valid reason for rejecting
those data or is it just "convenient" to use the log model?

Torsten Hennig... (Guest)
Posted: Mon Oct 26, 2009 2:42 am
[quote]Torsten Hennig <Torsten.Hennig at (no spam) umsicht.fhg.de> wrote:
[quote]Hello,
I have a set of two dimensional data (t_i, y_i) which -
according to the physical background - follow a linear
relation of the form
y_i = a*log(t_i) + b,
but only on a restricted subset (t_i0,y_i0),
(t_(i0+1),y_(i0+1)),...,(t_(i0+k),y_(i0+k)) of the
original data.
Is there any theoretically-based statistical method
to identify i0 and i0+k ?
Of course one could make the restriction only by
looking at the empirical data, but this appears
quite arbitrary to me.
Maybe one could include one data point after the other
and inspect where the r_squared obviously starts to decrease ?
My final aim is to determine the slope 'a'
of the above regression curve on the restricted data
set.
Many thanks in advance for any reference.
Best wishes
Torsten.
[/quote]
What are the data telling you about the region <i0
and beyond i0+k? Can you extend your model to include
those regions? Do you have a valid reason for rejecting
those data or is it just "convenient" to use the log model?
[/quote]
OK, I will explain the underlying physical problem in
more detail.
The aim of the experiments is to determine the thermal
conductivity of a fluid.
For this purpose, a thin wire is immersed into the fluid
and electrically heated at a constant heating rate q.
After a certain time t_start, the temperature T
of the wire increases with the logarithm of time t:
T(t) = q/(4*pi*lambda)*(log(4*a*t/r0^2)-const) (*)
In deriving (*), only heat transfer from the
wire to the fluid by heat conduction has been taken
into account.
But again after a certain time (at t_end),
other physical mechanisms of heat transfer besides
conduction become significant for the temperature of
the wire (especially natural convection), and (*)
is no longer valid.
Summarizing:
There is a starting time t_start and an ending time
t_end where the temperature of the wire behaves like (*).
t_start and t_end are not known in advance, and
the models to incorporate the mechanisms of heat transfer
before t_start and after t_end are complex and difficult.
But it's also not necessary to model them for the
purpose of the experiments because within the timespan
[t_start;t_end], lambda can be estimated by
dT/d(log(t)), and that's what I want to achieve.
Hope this clarifies your questions.
Best wishes
Torsten.
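
Differentiating (*) with respect to log(t) gives dT/d(log(t)) = q/(4*pi*lambda), so lambda = q/(4*pi*slope) once the slope of T versus log(t) is known. A minimal sketch of that step, assuming log in (*) is the natural logarithm and that the times passed in already lie inside [t_start;t_end]; the function name and all numerical values are made up for illustration and are not from the experiments:

```python
import numpy as np

def conductivity_from_slope(t, T, q):
    """Estimate lambda from (*): dT/d(ln t) = q / (4*pi*lambda)."""
    x = np.log(np.asarray(t, dtype=float))
    slope, _ = np.polyfit(x, np.asarray(T, dtype=float), 1)
    return q / (4.0 * np.pi * slope)

# Synthetic check with hypothetical parameter values.
lam_true, q, a_diff, r0, const = 0.6, 1.0, 1.4e-7, 1.0e-5, 0.5772
t_win = np.linspace(0.1, 1.0, 50)  # times assumed to lie inside the valid window
T_win = q / (4.0 * np.pi * lam_true) * (np.log(4.0 * a_diff * t_win / r0**2) - const)

lam_est = conductivity_from_slope(t_win, T_win, q)
print(lam_est)
```

The additive terms inside the bracket of (*) only shift the intercept, which is why the diffusivity a, the radius r0, and const do not need to be known to recover lambda from the slope.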

Paul... (Guest)
Posted: Mon Oct 26, 2009 5:43 am
On Oct 26, 8:42 am, Torsten Hennig <Torsten.Hen... at (no spam) umsicht.fhg.de>
wrote:
[quote]
Summarizing:
There is a starting time t_start and an ending time
t_end where the temperature of the wire behaves like (*).
t_start and t_end are not known in advance, and
the models to incorporate the mechanisms of heat transfer
before t_start and after t_end are complex and difficult.
But it's also not necessary to model them for the
purpose of the experiments because within the timespan
[t_start;t_end], lambda can be estimated by
dT/d(log(t)), and that's what I want to achieve.
[/quote]
It's been so long since I worked with simulation that I've forgotten
the details, but there are statistical methods for identifying the
settling time of a simulation model (onset of steady-state, assuming
the output reaches/approaches steady-state). If you have enough data,
and if you can identify a sufficiently large central region of the
time domain where you're sure the log-linear relation holds, you might
be able to apply one of those methods to identify t_start and then, by
reversing the time axis, t_end. Any good text on analysis of
simulation output should discuss the settling time problem.
/Paul
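
Paul does not name a specific method. One truncation heuristic from the simulation-output literature is MSER (marginal standard error rule), and using it here is an assumption, not something stated in the thread. The sketch below applies it to a series that should be roughly constant inside the valid window, for example local estimates of dT/d(log(t)), first forwards to find the start and then on the reversed series to find the end. The function names, the trusted central region, and the demo series are all hypothetical.

```python
import numpy as np

def mser_truncation(x):
    """Number of leading points d to discard, by the MSER heuristic.

    Minimizes var(x[d:]) / (n - d) over d = 0 .. n // 2 (a common convention).
    """
    x = np.asarray(x, dtype=float)
    best_d, best_val = 0, np.inf
    for d in range(0, len(x) // 2 + 1):
        tail = x[d:]
        val = tail.var() / len(tail)
        if val < best_val:
            best_d, best_val = d, val
    return best_d

def linear_window(series, center_lo, center_hi):
    """Estimate (first, last) index of the steady region.

    series[center_lo:center_hi] is a central stretch trusted to be steady.
    """
    first = mser_truncation(series[:center_hi])           # forward: find the start
    drop_end = mser_truncation(series[center_lo:][::-1])  # reversed: find the end
    return first, len(series) - 1 - drop_end

# Hypothetical series: transient, 40 roughly constant values, then drift.
steady = 1.0 + 0.1 * (-1.0) ** np.arange(40)
slopes_demo = np.concatenate([[6, 5, 4, 3, 2], steady, [2, 3, 4, 5, 6]])

print(linear_window(slopes_demo, 15, 35))
```

With real data, local slope estimates from differencing are noisy, so smoothing or block averaging before truncation may be needed.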

root... (Guest)
Posted: Mon Oct 26, 2009 8:12 am
Torsten Hennig <Torsten.Hennig at (no spam) umsicht.fhg.de> wrote:
[quote]
Summarizing:
There is a starting time t_start and an ending time
t_end where the temperature of the wire behaves like (*).
t_start and t_end are not known in advance, and
the models to incorporate the mechanisms of heat transfer
before t_start and after t_end are complex and difficult.
But it's also not necessary to model them for the
purpose of the experiments because within the timespan
[t_start;t_end], lambda can be estimated by
dT/d(log(t)), and that's what I want to achieve.
Hope this clarifies your questions.
Best wishes
Torsten.
[/quote]
OK, I assume you have to do this so many times that
picking the start and end by eye is impractical. How
about fitting a first try, computing the residuals,
removing the points that do not fit, and refitting
the data?
Is there a way you can say with any confidence that
the start is around Is and the end is around Ie?
Then you can do a preliminary fit over that interval
and have a better handle on rejecting the outliers.
Any iterative procedure in which you repeatedly fit, reject
outliers, and refit will converge before your sample size
drops to zero.
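
A minimal sketch of the fit, reject, refit loop described above. The rejection rule (drop points whose residual exceeds k times the residual standard deviation of the current fit), the value of k, and the names are assumptions for illustration; the optional lo and hi play the role of Is and Ie. Because points are only ever removed, the loop must stop, but with many deviating points the first fit can be pulled so far that bad points are not flagged, so a small k or a good preliminary interval may be needed.

```python
import numpy as np

def fit_reject_refit(t, y, k=2.0, lo=None, hi=None, min_points=3, max_iter=100):
    """Iteratively fit y = a*log(t) + b and drop points with |residual| > k*sigma.

    lo, hi: optional preliminary index interval (inclusive) to start from.
    Returns (a, b, keep), where keep is a boolean mask of retained points.
    """
    x = np.log(np.asarray(t, dtype=float))
    y = np.asarray(y, dtype=float)
    keep = np.zeros(len(x), dtype=bool)
    keep[(lo or 0):(len(x) if hi is None else hi + 1)] = True
    for _ in range(max_iter):
        a, b = np.polyfit(x[keep], y[keep], 1)
        resid = y - (a * x + b)
        sigma = resid[keep].std(ddof=2)  # residual scale of the current fit
        new_keep = keep & (np.abs(resid) <= k * sigma)
        if new_keep.sum() < min_points or np.array_equal(new_keep, keep):
            break  # nothing more to reject, or too few points left
        keep = new_keep
    a, b = np.polyfit(x[keep], y[keep], 1)  # final fit on the retained points
    return a, b, keep

# Hypothetical data: slope 2, small alternating scatter, two shifted points at each end.
x_demo = np.linspace(0.0, 3.0, 31)
t_demo = np.exp(x_demo)
y_demo = 2.0 * x_demo + 1.0 + 0.01 * (-1.0) ** np.arange(31)
y_demo[:2] += 1.0
y_demo[-2:] += 1.0

a_fit, b_fit, kept = fit_reject_refit(t_demo, y_demo, k=2.0)
print(a_fit, int(kept.sum()))
```

Since the retained points need not be contiguous, one would still take the first and last retained index as estimates of the window limits.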