Paraphrasing...
| Mok-Kong Shen... |
Posted: Mon Sep 21, 2009 1:55 pm |
Joe Devin wrote:
[snip]
Thank you for your valuable comments, which I snipped for space reasons.
I would just like to add a point to my original post: If one restricts
oneself to the types of replacement exemplified by the first
three I indicated, namely:
UN vs. United Nations
cheap vs. inexpensive
my friend vs. a friend of mine
would that be fairly doable at present? If yes, how should one best
proceed?
Thanks,
M. K. Shen |
| Joe Devin... |
Posted: Tue Sep 22, 2009 4:45 am |
Mok-Kong Shen wrote:
Quote: If one restricts
oneself to the types of replacement exemplified by the first
three I indicated, namely:
UN vs. United Nations
cheap vs. inexpensive
my friend vs. a friend of mine
would that be fairly doable at present? If yes, how should one best
proceed?
Thanks,
M. K. Shen
Surely. Simply use the global replacement feature on most any text editor.
The first examples you gave are simply interchangeable synonyms. The last
one ("my friend" vs "a friend of mine") can also be accomplished by global
replacement, but the meaning will end up slightly different.
As an example, you might first check for the existence of some character
string in your text, say "kkk." If "kkk" exists nowhere in your text file,
then you might convert (setting conversion to word-boundary mode) all
occurrences of "UN" to "kkk." Then you might convert all "United Nations"
to "UN." And at last, you might convert all "kkk" to "United Nations." We
call this "swapping." When you are done, you will end up with every
occurrence of "UN" replaced by "United Nations," and vice versa.
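The three-step swap just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name swap_terms is hypothetical, matching is case-sensitive, and "word-boundary mode" is approximated with the regular-expression anchor \b.

```python
import re


def swap_terms(text, a, b, placeholder="kkk"):
    """Swap whole-word occurrences of `a` and `b` via a temporary placeholder."""
    # The placeholder must not already occur, or step 3 would corrupt the text.
    if placeholder in text:
        raise ValueError(f"placeholder {placeholder!r} already occurs in the text")

    def whole_word(term):
        # \b limits matches to word boundaries, so "UN" does not hit "UNESCO".
        return re.compile(r"\b" + re.escape(term) + r"\b")

    # A function is used as the replacement so the new text is inserted literally.
    text = whole_word(a).sub(lambda m: placeholder, text)  # step 1: a -> placeholder
    text = whole_word(b).sub(lambda m: a, text)            # step 2: b -> a
    text = whole_word(placeholder).sub(lambda m: b, text)  # step 3: placeholder -> b
    return text


sample = "The UN met today. The United Nations and UNESCO disagreed."
print(swap_terms(sample, "UN", "United Nations"))
```

Without the placeholder step, replacing "UN" first and "United Nations" second would simply undo the first replacement, which is why the intermediate string is needed.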
You see? I am your knowledgeable friend. Stop seeing me as a stupid enemy
and I will always love you.
--Chaumont Devin.
http://witchit.com
http://panlingua.net
http://chaumontdevin.com
http://oldmaluku.com |
| Mok-Kong Shen... |
Posted: Wed Sep 23, 2009 6:09 pm |
pipeDream wrote:
Quote: Human flight is hardly based on emulating nature - the way birds and
bees fly is different from the way a modern plane flies.
I am afraid the books on bionics would disagree with you. At the end of
the wing of a modern airplane there is a tiny stick. Even that was
inspired by a certain bird's wing, if I remember correctly.
M. K. Shen |
| Mok-Kong Shen... |
Posted: Wed Sep 23, 2009 6:34 pm |
Joe Devin wrote:
Quote: Surely. Simply use the global replacement feature on most any text editor.
The first examples you gave are simply interchangeable synonyms. The last
one ("my friend" vs "a friend of mine") can also be accomplished by global
replacement, but the meaning will end up slightly different.
Synonyms like 'cheap'/'inexpensive' seem to have senses that are
particularly close to each other and therefore a higher chance
of being interchangeable. Are there lists of such word pairs?
Thanks,
M. K. Shen |
| Joe Devin... |
Posted: Thu Sep 24, 2009 3:57 am |
Mok-Kong Shen wrote:
Quote: Synonyms like 'cheap'/'inexpensive' seem to have senses that are
particularly close to each other and therefore a higher chance
of being interchangeable. Are there lists of such word pairs?
Thanks,
M. K. Shen
In my work, I have done most things from scratch and not by copying
synonyms from other works. The reason is that many entries in standard
dictionaries and thesauruses may not work as expected under my system. It
is safer always to let the system grow up naturally so that anything that
would cause problems gets dealt with as the system develops. So, to put it
bluntly, I simply don't know.
But if you are interested in how my systems work, and these work the same
way for many languages, then it is as follows:
The place my system checks for synonymy is in a kind of "black box" called
the ontology. The nodes, or connecting points, in the ontology can be
thought of as representing meanings, but they are not symbols. These nodes
in the ontology are called "semantic nodes," or just "semnods." Each such
semnod is usually linked to one or more nodes connected to actual English
words in another "black box" called the lexicon. These nodes within the
lexicon are called "lexical nodes," or just "lexnods." So each semnod is
generally linked to one or more lexnods, whereas each lexnod can only link
to a single English word. Thus there is a different lexnod linked to
"run," "runs," "ran," and "running," but the semnod for the particular
meaning of run in question (and there are several such meanings or semnods
for different kinds of "run") links to each of these lexnods. And not only
this, but ALL the semnods for the various meanings of "run" link to this
selfsame set of lexnods.
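As a rough illustration of the semnod/lexnod arrangement described above, here is a minimal Python sketch. It is not the actual system: the class names, the gloss strings, and the helper words_for are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Lexnod:
    """A node in the lexicon; links to exactly one English word."""
    word: str


@dataclass(eq=False)
class Semnod:
    """A node in the ontology; stands for one meaning (the gloss is illustrative)."""
    gloss: str
    lexnods: list = field(default_factory=list)


# One lexnod per surface form of "run".
run_forms = [Lexnod(w) for w in ("run", "runs", "ran", "running")]

# Several meanings of "run" (hypothetical glosses) link to the selfsame lexnods.
move_fast = Semnod("move quickly on foot", lexnods=run_forms)
operate = Semnod("operate a machine", lexnods=run_forms)


def words_for(semnod):
    """Return the English words reachable from a semnod through its lexnods."""
    return [lexnod.word for lexnod in semnod.lexnods]


print(words_for(move_fast))
print(move_fast.lexnods[0] is operate.lexnods[0])
```

The point of the sketch is that words live only in the lexicon, while the different meanings of "run" are separate ontology nodes that share one set of lexical nodes.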
And now, if I haven't confused you enough, to return to the ontology.
There can only be one lexlink (link from semnod to lexnod) of a particular
type emanating from any semnod. Thus for special cases such as English
"am" and "be," which would have an identical lexlink type (that type being
"present tense verb"), separate semnods are required even though the
meanings remain the same. So I arbitrarily use the semnod linking through
to "be" as the main meaning, and then set up a separate semnod linking
through to "am" as a synonym. So the semnod linking to "am" as a present
tense verb also has a link of type "synonym" to the semnod linked to "be."
It was hard to see how all of this would work at first, but I found that
this had to be the rule, else during text generation the system might grab
"am" one time and "be" another time at random.
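The "one lexlink of each type per semnod" rule and the synonym link between the "am" and "be" semnods might be modelled like this in Python. Everything here is an assumption for illustration: the class layout, the method add_lexlink, and in particular the choice that generation follows synonym links back to the main semnod.

```python
class Semnod:
    """An ontology node holding at most one lexlink of each type."""

    def __init__(self, label):
        self.label = label
        self.lexlinks = {}      # lexlink type -> English word
        self.synonym_of = None  # link of type "synonym" to the main semnod

    def add_lexlink(self, link_type, word):
        # Enforce the rule: only one lexlink of a given type per semnod.
        if link_type in self.lexlinks:
            raise ValueError(f"{self.label!r} already has a {link_type!r} lexlink")
        self.lexlinks[link_type] = word


def generate(semnod, link_type):
    """Pick a word deterministically by following synonym links to the main semnod."""
    while semnod.synonym_of is not None:
        semnod = semnod.synonym_of
    return semnod.lexlinks[link_type]


be = Semnod("be")
be.add_lexlink("present tense verb", "be")

# "am" has the same lexlink type, so it needs a semnod of its own.
am = Semnod("am")
am.add_lexlink("present tense verb", "am")
am.synonym_of = be

print(generate(am, "present tense verb"))
```

Because each semnod can hold only one word per lexlink type, text generation never has to choose between "am" and "be" at random; the choice is fixed by which semnod is treated as the main one.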
I am explaining all of this because, as I have written before, the old
axiom, "There's always more than one way of skinning a cat," doesn't seem to
hold for linguistics. This process of much trial and error using computer
systems is therefore probably a very good guide to what really happens
inside human heads. Right now we understand part of these processes, but
we keep learning more, one little piece at a time, and as we learn,
our systems keep getting better.
regards,
Chaumont Devin.
witchit.com
panlingua.net
chaumontdevin.com
oldmaluku.net |
| Mok-Kong Shen... |
Posted: Thu Sep 24, 2009 11:16 am |
Joe Devin wrote:
Quote: In my work, I have done most things from scratch and not by copying
synonyms from other works. [snip]
If I am not misreading the above, you do everything, so to
say, by hand. I wonder whether it wouldn't eventually be possible
to get some machine support. Suppose, for example, there were
software that could extract from a large corpus sets of words
that are somehow synonymous. It would then be possible to incorporate
that data into your system, after careful human screening,
of course.
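One very simple way such software could propose candidates is to look for words that occur between the same neighbours. The Python sketch below is a toy under stated assumptions: the function name synonym_candidates, the threshold min_shared, and the four-sentence corpus are invented for illustration, and real systems would use far larger corpora and better similarity measures.

```python
from collections import defaultdict
from itertools import combinations


def synonym_candidates(sentences, min_shared=2):
    """Propose word pairs that share at least `min_shared` (left, right) contexts."""
    contexts = defaultdict(set)
    for sentence in sentences:
        # Pad with boundary markers so the first and last words also get a context.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            contexts[tokens[i]].add((tokens[i - 1], tokens[i + 1]))

    pairs = []
    for w1, w2 in combinations(sorted(contexts), 2):
        shared = contexts[w1] & contexts[w2]
        if len(shared) >= min_shared:
            pairs.append((w1, w2, len(shared)))
    return pairs


corpus = [
    "the cheap hotel was full",
    "the inexpensive hotel was full",
    "we found the cheap flight",
    "we found the inexpensive flight",
    "the old hotel was full",
]
print(synonym_candidates(corpus))
```

Shared contexts do not prove synonymy (antonyms such as "cheap" and "expensive" would also share them), which is exactly why the human screening mentioned above would still be needed.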
Thanks,
M. K. Shen |
| Mok-Kong Shen... |
Posted: Thu Sep 24, 2009 11:56 am |
pipeDream wrote:
Quote: Is it ever possible to read what happens in another person's mind?
Devices that use signals from the brain to help disabled persons
control robotic hands, and certain studies in linguistics
done with the help of MRI, etc., indicate that some progress, though still
modest, has been made in that direction.
M. K. Shen |
| Mok-Kong Shen... |
Posted: Thu Sep 24, 2009 11:56 am |
Joe Devin wrote:
Quote: ......... If God be truly Almighty and Merciful, then
why is he unable to do these things Himself, and why is he forced ever to
rely upon his frail human servants?
Yes, this is a "logical" problem for His existence. Given the premise
above, He should not have tolerated the sins of mankind, including,
in particular, wars, genocide, political and economic exploitation
of the poor, and breaches of human rights (though perhaps humans
do not "have" any rights to begin with!), and would have
eliminated all evils with a single wave of His hand. In my humble view,
this problem is understandably ignored, avoided, or even vehemently
suppressed by some means in all religions (except possibly
in each individual's own religion).
M. K. Shen |
| Brian Martin... |
Posted: Fri Sep 25, 2009 6:06 pm |
Wordnet (Princeton) have already done the hard work of extracting
synonym sets. I suggest we just use their work rather than reinvent the
wheel.
Mok-Kong Shen wrote:
Quote: Joe Devin wrote:
[snip]
| Joe Devin... |
Posted: Sat Sep 26, 2009 3:52 pm |
Brian wrote:
I downloaded wordnet many years ago and benefitted from it as a learning
tool. I have no idea what it is like now, but at that time, here were its
fatal flaws:
1. Its construction was not guided by any coherent underlying theory, and
it showed it.
2. Instead of finding a way to include all meanings in the same ontology
(or box of meanings), because wordnet was not based upon a theory that
could do this, they had to put different parts of speech in different
files.
3. The people who did wordnet were not computer savvy, so they made choices
that gobbled up computer resources and made things too slow for real
applications.
4. A lot of the work on wordnet was evidently done by volunteers in their
spare time, and so it has (or had) a lot of errors which made it too
unreliable to use on real systems.
Nevertheless I strongly recommend wordnet as a valuable learning tool.
--Chaumont Devin. |
| Joe Devin... |
Posted: Sat Sep 26, 2009 3:57 pm |
Brian Martin wrote:
Quote: Wordnet (Princeton) have already done the hard work of extracting
synonym sets. I suggest we just use their work rather than reinvent the
wheel.
Synonymy is really quite trivial and not worth spending much time on in the
larger scheme of things. I am always setting it up and revising it with
just a few keystrokes using Semlex, which is tightly coupled to my larger
system. That way I can be immediately certain that not only am I getting a
synonym link to the right English word, but also to the right word sense or
meaning.
--Chaumont Devin. |
| Brian Martin... |
Posted: Sat Oct 17, 2009 3:37 pm |
I've found the Wordnet data files useful in their raw form, though I don't
use the provided APIs, which are query/response based.
I prefer to just load all the data files and parse them directly into an
internal format, sidestepping the Wordnet APIs for efficiency, i.e.
parse and remap the full dataset rather than make millions of API calls.
The raw data file format is well documented, though I agree it's a bit
convoluted.
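For readers who want to try the same approach, here is a hedged Python sketch of parsing one synset line from a WordNet data file (data.noun, data.adj, and so on). It follows the field layout as I understand it from WordNet's wndb documentation: offset, lexicographer file number, synset type, a two-digit hexadecimal word count, word/lex_id pairs, a three-digit decimal pointer count, four-field pointers, then the gloss after "|". The function name parse_data_line is hypothetical, and the sample line is invented to show the shape; its offsets are not real WordNet data.

```python
def parse_data_line(line):
    """Parse one synset line of a WordNet data.* file into a plain dict."""
    fields_part, _sep, gloss = line.partition("|")
    fields = fields_part.split()

    offset, lex_filenum, ss_type = fields[0], fields[1], fields[2]
    w_cnt = int(fields[3], 16)  # the word count is written in hexadecimal
    pos = 4

    words = []
    for _ in range(w_cnt):
        # Each word is followed by a lex_id, which this sketch skips.
        words.append(fields[pos].replace("_", " "))
        pos += 2

    p_cnt = int(fields[pos])  # the pointer count is written in decimal
    pos += 1

    pointers = []
    for _ in range(p_cnt):
        symbol, target_offset, target_pos, _source_target = fields[pos:pos + 4]
        pointers.append((symbol, target_offset, target_pos))
        pos += 4

    # Verb frames, if present, follow the pointers and are ignored here.
    return {
        "offset": offset,
        "lex_filenum": lex_filenum,
        "ss_type": ss_type,
        "words": words,
        "pointers": pointers,
        "gloss": gloss.strip(),
    }


# Invented sample line in the data-file layout (not copied from WordNet).
sample = "00000001 00 a 02 cheap 0 inexpensive 0 001 & 00000002 a 0000 | relatively low in price"
synset = parse_data_line(sample)
print(synset["words"], synset["pointers"])
```

Loading every line this way into dictionaries keyed by offset gives the kind of in-memory remapping described above, so that following a pointer becomes a dictionary lookup instead of an API call. Lines of the licence header at the top of each file start with spaces and would need to be skipped first.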
Joe Devin wrote:
Quote: Brian wrote:
Wordnet - http://wordnet.princeton.edu/
[snip]
| Brian Martin... |
Posted: Sat Oct 17, 2009 3:39 pm |
Wordnet, while oriented to synonym sets, also includes hypernym /
hyponym links, and other ontology frameworks.
Joe Devin wrote:
Quote: Brian Martin wrote:
[snip]
| Joachim Pimiskern... |
Posted: Sat Oct 17, 2009 8:57 pm |
All times are GMT