
 

  Computers Forum Index » Computer Artificial Intelligence - Philosophy » Reinforcement Learning...

Don Stockbauer...
Posted: Thu Aug 19, 2010 3:27 am
 
Baby touches hot stove. Learns not to do that again.

Reinforcement learning. What more to it is there?
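[A toy illustration of the hot-stove story in standard RL terms -- the environment, reward values, and learning rate below are invented for the sketch, not anyone's model of a baby. Single-state Q-learning learns the same lesson:]

```python
import random

ACTIONS = ["touch_stove", "avoid_stove"]
q = {a: 0.0 for a in ACTIONS}   # value estimate for each action
ALPHA = 0.5                     # learning rate
EPSILON = 0.1                   # exploration probability

def reward(action):
    # Touching the hot stove hurts; avoiding it is neutral.
    return -10.0 if action == "touch_stove" else 0.0

random.seed(0)
for _ in range(100):
    if random.random() < EPSILON:
        a = random.choice(ACTIONS)          # explore
    else:
        a = max(q, key=q.get)               # exploit current estimates
    q[a] += ALPHA * (reward(a) - q[a])      # incremental value update

best = max(q, key=q.get)                    # the learned policy
```

[After one painful trial the value estimate for touching the stove goes negative and the greedy policy avoids it from then on -- which is, as Don says, most of the story.]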
 
Curt Welch...
Posted: Thu Aug 19, 2010 5:15 am
 
Don Stockbauer <donstockbauer at (no spam) hotmail.com> wrote:
Quote:
Baby touches hot stove. Learns not to do that again.

Reinforcement learning. What more to it is there?

Not much more! :)

--
Curt Welch http://CurtWelch.Com/
curt at (no spam) kcwc.com http://NewsReader.Com/
 
Don Stockbauer...
Posted: Thu Aug 19, 2010 7:37 am
 
On Aug 18, 11:14 pm, c... at (no spam) kcwc.com (Curt Welch) wrote:
Quote:
Don Stockbauer <donstockba... at (no spam) hotmail.com> wrote:
Baby touches hot stove.  Learns not to do that again.

Reinforcement learning.  What more to it is there?

Not much more! :)

Now you're messing with me!

I was sure you'd reply with a long explanation of how much more there
is to RL than what I said.

I'm getting dizzy.
 
Vesa Monisto...
Posted: Thu Aug 19, 2010 2:01 pm
 
"Curt Welch" <curt at (no spam) kcwc.com> wrote:
Quote:
Don Stockbauer <donstockbauer at (no spam) hotmail.com> wrote:

Baby touches hot stove. Learns not to do that again.

Reinforcement learning. What more to it is there?

Not much more! :)

Sticks and carrots. -- And much more! -- An example:

http://areena.yle.fi/ohjelma/3687f3726346464e784c0be7df28acf3

After seeing that, I was convinced that no other paradigm
could work in those environments/conditions.
According to Newsweek's report the results are not good:
the USA placed 26th in education.

http://www.newsweek.com/2010/08/15/interactive-infographic-of-the-worlds-best-countries.html

V.M.
 
Joachim Pimiskern...
Posted: Thu Aug 19, 2010 2:59 pm
 
Am 19.08.2010 05:27, schrieb Don Stockbauer:
Quote:
Baby touches hot stove. Learns not to do that again.

Reinforcement learning. What more to it is there?

A baby is not a snail, and pain from an oven is not a 1
on an input wire.

Upon getting hurt, the baby will think: Oh cool,
I should keep this new ability in mind. If I ever
want to trick my foe, I'll make him touch these
plates on the box with the knobs in the kitchen.

Regards,
Joachim
 
Yevgen Barsukov...
Posted: Fri Aug 20, 2010 2:11 pm
 
c... at (no spam) kcwc.com (Curt Welch) wrote:

It looks like we are mostly in agreement, so I will just focus on one
point that is somewhat fuzzy:

Quote:
To make RL work in a high-dimension problem space as well as it works for
humans, you can't divide it into two steps: 1) use hard-wired dimension
reduction heuristics, and 2) apply RL to the reduced space. It has to be
solved in one step which applies RL in a way that allows the system to learn
not just state->actions, but the state reductions at the same time.

I want to make clear that what I called "dimensionality reductions"
does not necessarily come from DNA; it also comes from the cultural
evolution of humanity.
But what unites both is that they required huge computational and
physical resources to compute / discover. So even if we had created an
RL engine that is just as good as a human, it would still take it
100,000 years to recreate the same result as cultural evolution and
500 million years to recreate the results of DNA evolution. From this
point of view, we not only can but absolutely HAVE TO
1) use hard-wired dimension-reduction heuristics, and
2) apply RL to the reduced space
....only because 1) has an absolutely immense cost and so can ONLY be
obtained from a database, while 2) is low-cost and readily executable
using well-known algorithms.
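[The two-step scheme can be sketched concretely -- the reduction function, toy task, and constants below are all invented for illustration. Step 1 is a fixed, hand-coded mapping from a high-dimensional observation to a few discrete features; step 2 is ordinary tabular RL running on that reduced space:]

```python
import random
from collections import defaultdict

def reduce_state(obs):
    # Step 1: hard-wired heuristic -- keep only the sign pattern of the
    # first two coordinates, discarding the rest of the 50-dim vector.
    return (obs[0] > 0, obs[1] > 0)

q = defaultdict(float)          # Q-table over (reduced_state, action)
ALPHA, EPSILON = 0.3, 0.2
ACTIONS = [0, 1]

def env_step(obs, action):
    # Toy environment: action 0 pays off exactly when coordinate 0 is positive.
    correct = 0 if obs[0] > 0 else 1
    return 1.0 if action == correct else -1.0

random.seed(1)
for _ in range(2000):
    obs = [random.uniform(-1, 1) for _ in range(50)]   # high-dim observation
    s = reduce_state(obs)                              # step 1: reduce
    if random.random() < EPSILON:
        a = random.choice(ACTIONS)                     # explore
    else:
        a = max(ACTIONS, key=lambda act: q[(s, act)])  # step 2: RL policy
    r = env_step(obs, a)
    q[(s, a)] += ALPHA * (r - q[(s, a)])               # bandit-style update
```

[Because the heuristic happens to preserve the one feature that matters, the RL step only has to search a 4-state table instead of a 50-dimensional continuum.]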

In a way, this problem is related to solutions of NP-complete
problems. Once you have the solution for a certain range of variables,
you can just pick it up by simple database search in linear (!) time.
But it is prohibitively expensive to compute the solutions directly.
So what I am suggesting is: take the solution directly from the
database that has already been created (step 1, in this case
humanity's collection of optimizations), and then use it for whatever
related problems are there (step 2), which requires only
polynomial-time calculations.
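[The database argument can be sketched as a cache over an expensive solver. Subset-sum stands in here for an NP-complete problem; the instance and helper names are invented. The first call pays the exponential cost; every later call is a cheap lookup:]

```python
from itertools import combinations

def solve_subset_sum(nums, target):
    # Direct solve: brute force over all subsets -- exponential time.
    for r in range(len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return combo
    return None

database = {}   # "database" of already-computed solutions, keyed by instance

def solve_with_database(nums, target):
    key = (tuple(sorted(nums)), target)
    if key in database:
        return database[key]                       # cheap lookup, no recompute
    solution = solve_subset_sum(nums, target)      # pay the full cost once
    database[key] = solution
    return solution

first = solve_with_database([3, 9, 8, 4, 5, 7], 15)    # computed directly
second = solve_with_database([3, 9, 8, 4, 5, 7], 15)   # retrieved from database
```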

Now, there is a certain freedom about HOW you are going to use the
database of solutions. You can do it strictly heuristically (so the
access algorithm itself CAN INCLUDE some algorithms from the database!
Btw, many database sorting problems are in fact NP-complete), or you
can use an RL algorithm that will learn when and how to use it. It
remains to be seen whether the second method is workable, because it
is quite possible that access to the data is itself a hard problem
which requires database-type solutions to make it computable.

But aside from noting this possible difficulty, I would give it the
benefit of the doubt, as long as we consciously recognize the absolute
non-replaceability of this precious resource and are very clear about
the need TO USE THE DATABASE of solutions to get any intelligence
results comparable to humans and useful to humans.
The AI researchers need to snap out of the idea
"let's create AI and it will do everything by itself". Any useful AI
will initially be 99.999 percent "humanity database" and 0.001% novel
silicon (or whatever new technology)-specific access algorithms.

Regards,
Yevgen

--
Tune in to "Strange Drawing of the Day" buzz:
http://www.google.com/profiles/100679771837661030957#buzz
 
casey...
Posted: Sat Aug 21, 2010 4:46 am
 
On Aug 20, 9:31 pm, c... at (no spam) kcwc.com (Curt Welch) wrote:
Quote:
If you think a "walking" module which is trained by
reinforcement is an example of one of the modules,
then we could start by creating a simple robot that
needed some set of complex motions to walk (like a
robot with a few legs for example), and try to make
it use that module to help it find food faster.

I'm very sure that if you actually worked on that
project, you would start to see that the learning
side of the problem is actually very separate from
the "walking" side of the problem and that having
a "walking" module, doesn't make the learning system
work better, or worse. The "walking" modules, just
becomes part of the already highly complex environment
which the learning system has to learn about.

What modules do is provide small fast solutions to
the sub problems of the current problem being solved
by the problem solver.

Yes, in theory a hypothetical RL module could learn
the walking, except that with biological brains there was
no generic RL network to start the process.

You seem to imagine the neocortex is an RL machine
plonked on top of the old brain replacing the old
brain functions.

The neocortex has been around a long time and enables
animals to process all kinds of sensory data in ways
it couldn't do before, but it is not doing reinforcement
learning. Reinforcement learning has always been part
of the older brain, the neocortex simply provides finer
details and generalizations for it to work with.

A lesion in a particular part of the cortex impairs
the ability to do arithmetic and the ability to describe
the relative spatial positions of objects. In other
words the brain that enabled us to do all those "trivial"
things in the jungle is now also used to do math and
without it there is no way to condition someone to
become a mathematician. The same is true for speech
encoding and decoding and so on ...

Doing math is hard, but easy to program. Doing vision
is easy, but hard to program. Does that tell you anything
about our amazing human skills?


JC
 
Don Stockbauer...
Posted: Mon Aug 23, 2010 2:52 am
 
On Aug 22, 8:23 pm, c... at (no spam) kcwc.com (Curt Welch) wrote:
Quote:
Yevgen Barsukov <evgen... at (no spam) gmail.com> wrote:
c... at (no spam) kcwc.com (Curt Welch) wrote:

The RL module
in your brain, I would argue, is trivially simple and won't take us another
500 million years to reverse engineer and clone in hardware.  It's highly
likely to be done in less than 100 years.

If this is accomplished, Curt, guess at what fantastic advances
civilization would make due to it?
 
Don Stockbauer...
Posted: Mon Aug 23, 2010 4:45 am
 
On Aug 22, 11:23 pm, Don Stockbauer <donstockba... at (no spam) hotmail.com> wrote:
Quote:
On Aug 22, 10:34 pm, c... at (no spam) kcwc.com (Curt Welch) wrote:

First and foremost, we will have 1000's of AIs taking part in these endless
circular debates we call c.a.p. instead of just 10 of us!  That will be a
wonderful step forward!  The global brain will swell with delight!

I do wish that they will be endless.

I suppose they will, although as each of us dies new blood will have
to come along.

Hopefully the problem of the sun's evolution to its red
giant phase will be solved.

Interstellar travel, or populating the outer planets?????
 
Curt Welch...
Posted: Mon Aug 23, 2010 5:15 am
 
Don Stockbauer <donstockbauer at (no spam) hotmail.com> wrote:
Quote:
On Aug 22, 8:23 pm, c... at (no spam) kcwc.com (Curt Welch) wrote:
Yevgen Barsukov <evgen... at (no spam) gmail.com> wrote:
c... at (no spam) kcwc.com (Curt Welch) wrote:

The RL module
in your brain, I would argue, is trivially simple and won't take us
another 500 million years to reverse engineer and clone in hardware.
It's highly
likely to be done in less than 100 years.

If this is accomplished, Curt, guess at what fantastic advances
civilization would make due to it?

Yes, we have debated those guesses many times here.

First and foremost, we will have 1000's of AIs taking part in these endless
circular debates we call c.a.p. instead of just 10 of us! That will be a
wonderful step forward! The global brain will swell with delight!

--
Curt Welch http://CurtWelch.Com/
curt at (no spam) kcwc.com http://NewsReader.Com/
 
Curt Welch...
Posted: Mon Aug 23, 2010 5:15 am
 
casey <jgkjcasey at (no spam) yahoo.com.au> wrote:
Quote:
On Aug 20, 9:31 pm, c... at (no spam) kcwc.com (Curt Welch) wrote:
If you think a "walking" module which is trained by
reinforcement is an example of one of the modules,
then we could start by creating a simple robot that
needed some set of complex motions to walk (like a
robot with a few legs for example), and try to make
it use that module to help it find food faster.

I'm very sure that if you actually worked on that
project, you would start to see that the learning
side of the problem is actually very separate from
the "walking" side of the problem and that having
a "walking" module, doesn't make the learning system
work better, or worse. The "walking" modules, just
becomes part of the already highly complex environment
which the learning system has to learn about.

What modules do is provide small fast solutions to
the sub problems of the current problem being solved
by the problem solver.

Yes, in theory a hypothetical RL module could learn
the walking, except that with biological brains there was
no generic RL network to start the process.

You seem to imagine the neocortex is an RL machine
plonked on top of the old brain replacing the old
brain functions.

The neocortex has been around a long time and enables
animals to process all kinds of sensory data in ways
it couldn't do before, but it is not doing reinforcement
learning. Reinforcement learning has always been part
of the older brain, the neocortex simply provides finer
details and generalizations for it to work with.

A lesion in a particular part of the cortex impairs
the ability to do arithmetic and the ability to describe
the relative spatial positions of objects. In other
words the brain that enabled us to do all those "trivial"
things in the jungle is now also used to do math and
without it there is no way to condition someone to
become a mathematician. The same is true for speech
encoding and decoding and so on ...

Doing math is hard, but easy to program. Doing vision
is easy, but hard to program. Does that tell you anything
about our amazing human skills?

JC

Actually, I think doing vision is trivial to program - but that no one has
yet uncovered that trivial code.

In my book, you are rating the powers of our brain as "amazing" based on
the fact that human brains have so far failed to uncover the trivial code
that explains how they work! Such a failure is not in my book amazing! It
just shows how unremarkable the human brain actually is! :)

How long did it take humans to figure out that f=ma explains the way apples
fall from trees? 40,000 years after language was first used? 1000
generations? How amazing is that? :)

How long did it take them to figure out evolution? How many "brains" are
there in the world that still don't understand even the basics of evolution
200 years after it was clearly explained despite having been taught about
it in school? Again, how amazing is that really?

Brains really aren't all that amazing, and in time it will become
obvious to everyone that human brains compared to our man-made machines are
no more amazing than human muscle is compared to our engines. We are just
weak biological meat robots that will soon be shown to be some of the least
amazing things in the universe compared to the things we will end up
creating (and compared to much of what we have already created).

--
Curt Welch http://CurtWelch.Com/
curt at (no spam) kcwc.com http://NewsReader.Com/
 
casey...
Posted: Mon Aug 23, 2010 6:20 am
 
On Aug 23, 11:23 am, c... at (no spam) kcwc.com (Curt Welch) wrote:
Quote:

John here in c.a.p. for example, is one of many that
doesn't get it.

Funny I am under the impression you don't get it??

You take what are good ideas, and useful when used
properly, like connectionism, and claim they are the
solution to everything, as in the old example of
those who imagine that because a hammer is good for
nails it must be good for screws.

About complexity:

The complexity of interest is one "whose constituent
parts are arranged in a way that is unlikely to have
arisen by chance alone". The Blind Watchmaker -
Richard Dawkins

It is in that sense I am talking about the complexity
of the brain. Sure the end result may be "simple" but
getting there (learning) is not, particularly if there
is no external designer to guide the process.

JC
 
Don Stockbauer...
Posted: Mon Aug 23, 2010 2:17 pm
 
On Aug 23, 1:20 am, casey <jgkjca... at (no spam) yahoo.com.au> wrote:
Quote:
On Aug 23, 11:23 am, c... at (no spam) kcwc.com (Curt Welch) wrote:



John here in c.a.p. for example, is one of many that
doesn't get it.

Funny I am under the impression you don't get it??

You take what are good ideas, and useful when used
properly, like connectionism, and claim they are the
solution to everything, as in the old example of
those who imagine that because a hammer is good for
nails it must be good for screws.

About complexity:

The complexity of interest is one "whose constituent
parts are arranged in a way that is unlikely to have
arisen by chance alone". The Blind Watchmaker -
Richard Dawkins

It is in that sense I am talking about the complexity
of the brain. Sure the end result may be "simple" but
getting there (learning) is not, particularly if there
is no external designer to guide the process.

One may consider a car "simple" if all you know about it is how to
operate it. However, the underlying substrate is anything but
simple. Especially now, with cars full of computers.
 
Curt Welch...
Posted: Mon Aug 23, 2010 6:36 pm
 
casey <jgkjcasey at (no spam) yahoo.com.au> wrote:
Quote:
On Aug 23, 11:23 am, c... at (no spam) kcwc.com (Curt Welch) wrote:

John here in c.a.p. for example, is one of many that
doesn't get it.

Funny I am under the impression you don't get it??

You take what are good ideas, and useful when used
properly, like connectionism, and claim they are the
solution to everything,

To everything? :)

I'm fairly sure we only talk about one problem here - designing a machine
that can duplicate human behavior. And yes, for that ONE problem, I think
a connectionism-like approach is required. That is, I think it's an
engineering problem that can only be solved when it's conceptualized as a
parallel real time data flow problem.

Quote:
as in the old example of
those who imagine that because a hammer is good for
nails it must be good for screws.

Yes, well if I suggested that we use this approach on screws, your point
would be valid.

The issue that makes you say what you just did, is that I think it is one
problem that we are solving, and you think it is many different problems.
You think we have to design and build lots of modules, each with a
different function and purpose, to solve the problem of AI. I think we have
to build a learning system that will build its own modules in response to
its environment and goals.

In your view, I'm making the hammer-screw error here just because I see it
as one problem instead of seeing it as 1000 different problems.

Quote:
About complexity:

The complexity of interest is one "whose constituent
parts are arranged in a way that is unlikely to have
arisen by chance alone". The Blind Watchmaker -
Richard Dawkins

It is in that sense I am talking about the complexity
of the brain. Sure the end result may be "simple" but
getting there (learning) is not, particulary if there
is no external designer to guide the process.

Not sure which designer you are talking about there.

The interesting thing about all RL systems is that there is always an
external "designer" to guide the process. That's how they work. They have
something which is forcing the evolution of their design in a specific
direction, so it's not random. That is why the type of complexity you
mentioned above always emerges from such a process. It's not random
evolution, it's directed evolution. It's a complexity that would be highly
unlikely to emerge randomly.
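[The directed-versus-random point can be shown with a toy hill climber -- the target string, letters, and mutation scheme are invented for illustration. Random variation plus a reward signal that selects among variants produces structure that undirected chance would essentially never find:]

```python
import random

TARGET = "reinforcement"
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def reward(candidate):
    # The external "designer": counts characters matching the target.
    return sum(c == t for c, t in zip(candidate, TARGET))

random.seed(2)
current = [random.choice(LETTERS) for _ in TARGET]   # random starting point
for _ in range(20000):
    mutant = list(current)
    i = random.randrange(len(mutant))        # random variation...
    mutant[i] = random.choice(LETTERS)
    if reward(mutant) >= reward(current):    # ...kept only if reward allows
        current = mutant

result = "".join(current)
```

[Drawing a 13-letter string at random has odds of about 1 in 26^13, yet the reward-guided search finds it in a few thousand steps: the complexity is directed, not random.]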

--
Curt Welch http://CurtWelch.Com/
curt at (no spam) kcwc.com http://NewsReader.Com/
 
J.A. Legris...
Posted: Tue Aug 24, 2010 4:45 am
 
On Aug 19, 2:01 pm, c... at (no spam) kcwc.com (Curt Welch) wrote:
Quote:

I believe that's really why behaviorism was rejected.  People didn't want
to believe it could be right.  If the AI community had produced RL machines
that acted like humans back in the 50's, behaviorism never would have been
rejected like it was.  But no one did manage to build AI hardware that
duplicated the complexity of human behavior using carrot and stick
learning.  They still haven't.  The lack of the working hardware is what
allows people to continue to reject the thing they never wanted to believe
in the first place.

--
Curt Welch                                            http://CurtWelch.Com/
c... at (no spam) kcwc.com                                        http://NewsReader.Com/

Don't forget Noam Chomsky's significant contribution to the demise of
behaviourism. The pivotal event is often seen as Chomsky's scathing
review of Skinner's "Verbal Behavior", a book that few non-specialists
have read or can even access (see http://www.chomsky.info/articles/1967----.htm).

But in 1971 Chomsky also reviewed one of Skinner's popular books,
"Beyond Freedom and Dignity", which is still in print. As in the
"Verbal Behavior" review, he attacked Skinner's sloppy and self-
deluding use of "reinforcement" and other "scientific" terms (see
http://www.chomsky.info/articles/19711230.htm).

Chomsky: "Skinner does not comprehend the basic criticism: when his
formulations are interpreted literally, they are clearly false, and
when these assertions are interpreted in his characteristic vague and
metaphorical way, they are merely a poor substitute for ordinary
usage."

I suspect, Curt, that you have been duped by Skinnerian-style double-
talk. RL machines fail because RL theory provides a poor explanation
of animal behaviour.

Have a look at the 1971 article - it's an easy and entertaining read.

--
Joe
 
 