BackPropagation Favors Input Cases with Large Output...

TomH488...
Posted: Wed Sep 09, 2009 5:18 pm
Guest
This is a simple 3-layer BP network with a single output.

All columns are standardized (except that I used 3 standard deviations instead of 1).

Somewhere I read about training until the variance of the error was
something like 5% of the variance of the output.

My focus was to see how well the training was fitting the input cases.

What I learned was the following: cases with large output magnitudes
were fit long before cases with small output magnitudes. In fact, even
with seemingly large numbers of hidden nodes, a good fit of the small
outputs was still not achieved.

Here are the results of a training run with 3000 input cases, 25 inputs,
110 hidden nodes, and a single output which had a max magnitude of 80:

Results by output magnitude interval
(Variance Ratio = Error Variance / Output Variance):

Interval   # Cases   Variance Ratio
00-10      1642      .246
10-20       828      .029
20-30       327      .0085
30-40       106      .0072
40-50        31      .0021
50-60        19      .0022
60-70         4      .983
70-80         1      undefined
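
A per-interval ratio like the one tabulated above could be computed roughly as follows. This is only a sketch: it assumes the ratio is the error variance divided by the output variance within each interval (the reading under which a small value means a close fit), and the function name and the default bin width of 10 are illustrative, not taken from the original run.

```python
from statistics import pvariance

def variance_ratio_by_interval(targets, predictions, width=10.0):
    """Group cases by |target| into bins of `width` and return, per bin,
    (variance of the errors) / (variance of the targets in that bin).
    The ratio is None when the target variance in a bin is zero,
    e.g. when the bin holds a single case."""
    bins = {}
    for t, p in zip(targets, predictions):
        k = int(abs(t) // width)                 # bin index for this case
        bins.setdefault(k, []).append((float(t), float(p) - float(t)))

    ratios = {}
    for k, pairs in sorted(bins.items()):
        outs = [t for t, _ in pairs]
        errs = [e for _, e in pairs]
        out_var = pvariance(outs)                # 0.0 for a single case
        key = (k * width, (k + 1) * width)
        ratios[key] = pvariance(errs) / out_var if out_var > 0 else None
    return ratios

# Tiny made-up example: three small-magnitude cases and one large one.
demo = variance_ratio_by_interval([1, 3, 5, 75], [2, 3, 4, 75])
```

With only one case in an interval the output variance is zero, which is why the ratio comes out undefined there.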

It is disconcerting to see such a poor fit on the first and most
populated interval.

QUESTIONS:

Is it reasonable to conclude that output predictions over the
magnitude interval [0,10] should be poor?

Does the extremely close fit of the large-magnitude outputs imply
memorization?
If yes, perhaps that is a non-issue, given the small number of
large-magnitude outputs.
__________________

I am going to investigate the following:

1) Train and predict only cases that are contained in the [0,10]
interval and see if generalization is achieved.

2) Try adding duplicate rows to the input file for the [0,10] interval
and see if the error variance can be improved.
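
The row-duplication idea in 2) might be sketched like this. The function name, the `copies` parameter, and the in-memory list layout are hypothetical; the original post only says that duplicate rows would be added to the input file.

```python
def duplicate_interval_rows(rows, targets, low=0.0, high=10.0, copies=2):
    """Return new (rows, targets) lists in which every case with
    low <= |target| < high appears `copies` times and all other
    cases appear once."""
    out_rows, out_targets = [], []
    for row, t in zip(rows, targets):
        n = copies if low <= abs(t) < high else 1
        out_rows.extend([row] * n)
        out_targets.extend([t] * n)
    return out_rows, out_targets

# Made-up example: two cases fall inside [0, 10), one does not.
new_rows, new_targets = duplicate_interval_rows(
    [[0.1], [0.2], [0.3]], [5.0, 15.0, -3.0], copies=3
)
```

For a summed squared-error objective, repeating a row k times has the same effect as weighting that row's squared error by k, so the same experiment could also be run with per-case weights if the training code supports them.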

Any comment would be greatly appreciated, and of course
Much Thanks in Advance,
Tom H.
 
Greg...
Posted: Fri Sep 11, 2009 2:20 pm
Guest
On Sep 9, 1:18 pm, TomH488 <tom... at (no spam) gmail.com> wrote:
Quote:
This is a simple 3 layer BP network with a single Output.

The unmodified term "layer" is not standardized and is
ambiguous (layers of nodes or layers of weights?), so it is
better to just state the number of hidden node layers.

Quote:
All columns are Standardized (except used 3 StDev. instead
of 1)

Then they are normalized; NOT standardized.

Quote:
Somewhere I read about training until the variance of the
error was something like 5% of the variance of the output.

MSE (based on squared variations about the target) is a preferable
measure to VAR (based on squared variations about the mean
of the output).

I typically train with MSEgoal = min(MSE00/100,MSE0/10) where
MSE00 is the MSE for the constant model y = mean(targets) and
MSE0 is the MSE for the linear model y = W*x + constant.
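
As a rough illustration of that stopping goal, the sketch below computes MSE00 and MSE0 for a made-up data set. It is simplified to a single input so the linear model can be fitted in closed form; the helper names (`mse`, `fit_line`, `mse_goal`) are hypothetical, and a real problem with 25 inputs would need a multivariate least-squares fit instead.

```python
def mse(targets, predictions):
    """Mean squared error about the targets."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def fit_line(x, targets):
    """Least-squares slope and intercept for one input (illustration only)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_t = sum(targets) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (ti - mean_t) for xi, ti in zip(x, targets))
    slope = sxy / sxx
    return slope, mean_t - slope * mean_x

def mse_goal(x, targets):
    """MSEgoal = min(MSE00/100, MSE0/10) as described in the post."""
    mean_t = sum(targets) / len(targets)
    mse00 = mse(targets, [mean_t] * len(targets))        # constant model
    slope, intercept = fit_line(x, targets)
    mse0 = mse(targets, [slope * xi + intercept for xi in x])  # linear model
    return min(mse00 / 100, mse0 / 10)

# Made-up data: targets = x squared.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
targets = [0.0, 1.0, 4.0, 9.0, 16.0]
goal = mse_goal(x, targets)   # MSE00 = 34.8, MSE0 = 2.8, so goal is about 0.28
```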

Quote:
My focus was to see how well the training was fitting the
input cases.

data = design + test
design = training + validation

The design focus should be on how well training fits validation
data.
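
One way to set up that partition is sketched below. The 70/15/15 fractions, the fixed seed, and the function name are assumptions for illustration; the post does not say how large each subset should be.

```python
import random

def split_indices(n_cases, frac_train=0.70, frac_val=0.15, seed=0):
    """Shuffle case indices and split them into training, validation,
    and test sets (design = training + validation)."""
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)      # reproducible shuffle
    n_train = round(frac_train * n_cases)
    n_val = round(frac_val * n_cases)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]          # whatever remains
    return train, val, test

train, val, test = split_indices(100)
```

The weights are fitted on the training indices only, the validation indices are used to decide when to stop or which design to keep, and the test indices are left untouched until the end.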

Quote:
What I learned was the following: Cases with large output magnitudes
were fit long before small output magnitudes.

Of course; your objective function is mean-SQUARED-error.

This is easily mitigated by one or more of the following:
1. Normalize outputs
2. Use regularization
3. Use mean-ABSOLUTE-error
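
The effect of the objective function can be seen in a small sketch. It assumes the network's outputs start near zero, so each case's initial error is roughly its target magnitude; the function name and the three made-up targets are illustrative only.

```python
def gradient_shares(targets, predictions, loss="mse"):
    """Fraction of the total gradient magnitude (taken with respect to
    the predictions) that each case contributes."""
    grads = []
    for t, p in zip(targets, predictions):
        err = p - t
        if loss == "mse":
            grads.append(abs(2.0 * err))             # d/dp (p - t)^2 = 2 (p - t)
        else:
            grads.append(1.0 if err != 0 else 0.0)   # |d/dp |p - t|| = 1 when p != t
    total = sum(grads)
    if total == 0:
        return [0.0] * len(grads)
    return [g / total for g in grads]

targets = [1.0, 2.0, 80.0]
initial = [0.0, 0.0, 0.0]                # assumed near-zero initial outputs
mse_shares = gradient_shares(targets, initial, loss="mse")
mae_shares = gradient_shares(targets, initial, loss="mae")
```

Under squared error the single large-magnitude case supplies 160/166 of the gradient magnitude, while under absolute error every case with a nonzero error contributes equally.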

Quote:
seemingly large numbers of hidden nodes, a good fit of the small
outputs was still not achieved.

Of course. That is not what you are optimizing.

Normalization is usually sufficient.

Hope this helps.

Greg
 
Thorsten Kiefer...
Posted: Fri Sep 11, 2009 3:48 pm
Guest
TomH488 wrote:

Quote:
This is a simple 3 layer BP network with a single Output.

For that case I discovered/developed/whatever something like a "sine
effect": I initialize the NN to generate the sine function.
If you want, you can try that out to see if it helps.

Java source is here :
http://www.tokis-edv-service.de/examples/tokisprojects.tar.gz

Example applet is here:
http://www.tokis-edv-service.de/index.php/beispiele/feed-forward-neuronale-netze


-Thorsten



 
Leonid H2009...
Posted: Sat Sep 12, 2009 7:51 pm
Guest
Hi Greg, I see that you have a lot of knowledge about neural networks;
maybe you can also answer my questions?

I'm also new to this. I gave my network some very simple patterns of
horizontal and vertical lines, and I expected it to learn them
very easily (classification), but to my surprise my network is unable
to learn these simple patterns. What can be the problem?

Here are my questions -

http://groups.google.com/group/comp.ai.neural-nets/browse_thread/thre...

Thanks.
 
Leonid H2009...
Posted: Sat Sep 12, 2009 7:59 pm
Guest
Hi Phil, I see that you have a lot of knowledge about neural networks;
maybe you can also answer my questions?

I'm also new to this. I gave my network some very simple patterns of
horizontal and vertical lines, and I expected it to learn them
very easily (classification), but to my surprise my network is unable
to learn these simple patterns. What can be the problem?

Here are my questions -

http://groups.google.com/group/comp.ai.neural-nets/browse_thread/thread/6f837557b46659c1?hl=en#

Thanks.
 
 
All times are GMT