Main Page | Report this Page
Computers Forum Index  »  Computer Languages (Misc)  »  Pratt-parser...
Page 1 of 1    

Pratt-parser...

Author Message
Rod Pemberton...
Posted: Thu Nov 05, 2009 12:50 pm
Guest
Is anyone here familiar with Vaughn Pratt's concept of "Top Down Operator
Precedence" (1973) or Pratt-parsers?

I'd like to know more about the technique. Unfortunately, the articles
below are programming language heavy.

Apparently, Pratt-parsers are recursive descent parsers modified by top down
operator precedence. It seems that this dramatically reduces the code size.

AFAICT, Pratt-parsers are used with CGOL, LISP, and Scheme. Otherwise, the
concept was obscure until recently. However, it has been rediscovered
apparently by Crockford:

for Javascript, by Douglas Crockford
http://javascript.crockford.com/tdop/tdop.html

for Python, by Fredrik Lundh
http://effbot.org/zone/simple-top-down-parsing.htm


Rod Pemberton
 
Dmitry A. Kazakov...
Posted: Thu Nov 05, 2009 2:04 pm
Guest
On Thu, 5 Nov 2009 02:50:02 -0500, Rod Pemberton wrote:

Quote:
Is anyone here familiar with Vaughn Pratt's concept of "Top Down Operator
Precedence" (1973) or Pratt-parsers?

I don't have the paper, but I remember a Pratt's book (1975) where the
parsing technique was described.

Quote:
I'd like to know more about the technique. Unfortunately, the articles
below are programming language heavy.

Apparently, Pratt-parsers are recursive descent parsers modified by top down
operator precedence. It seems that this dramatically reduces the code size.

AFAICT, Pratt-parsers are used with CGOL, LISP, and Scheme. Otherwise, the
concept was obscure until recently. However, it has been rediscovered
apparently by Crockford:

for Javascript, by Douglas Crockford
http://javascript.crockford.com/tdop/tdop.html

for Python, by Fredrik Lundh
http://effbot.org/zone/simple-top-down-parsing.htm

I am using Pratt's method for decades. I am not sure if he invented it, he
didn't say it in the book, maybe he was. I have built several compilers on
it.

In my view it is simple and efficient. One of the compilers I built was in
Assembler for a 32K machine, which illustrates how simple the method is.

I also extended the methods towards two asymmetric priorities. You can find
a description and implementation here

http://www.dmitry-kazakov.de/ada/components.htm#Parsers_etc

The section 11.2 contains an explanation how the techinique works for most
complex expressions one can find in programming languages and mathematics
notation:

http://www.dmitry-kazakov.de/ada/components.htm#11.2

One of the crucial advantages of the technique is that it does not require
grammar, and as a consequence nothing need to be precompiled/generated. The
parser can be made 100% table-driven. It perfectly suitable for OO design.

I am glad to see that the method finally gains attention it certainly
deserves.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
 
Rod Pemberton...
Posted: Thu Nov 05, 2009 3:17 pm
Guest
"Dmitry A. Kazakov" <mailbox at (no spam) dmitry-kazakov.de> wrote in message
news:1jgp7sm64eejy$.1tz85slyasyb7$.dlg at (no spam) 40tude.net...
Quote:
On Thu, 5 Nov 2009 02:50:02 -0500, Rod Pemberton wrote:

Is anyone here familiar with Vaughn Pratt's concept of "Top Down
Operator
Precedence" (1973) or Pratt-parsers?

I don't have the paper, but I remember a Pratt's book (1975) where the
parsing technique was described.

I'd like to know more about the technique. Unfortunately, the articles
below are programming language heavy.

Apparently, Pratt-parsers are recursive descent parsers modified by top
down
operator precedence. It seems that this dramatically reduces the code
size.

AFAICT, Pratt-parsers are used with CGOL, LISP, and Scheme. Otherwise,
the
concept was obscure until recently. However, it has been rediscovered
apparently by Crockford:

for Javascript, by Douglas Crockford
http://javascript.crockford.com/tdop/tdop.html

for Python, by Fredrik Lundh
http://effbot.org/zone/simple-top-down-parsing.htm

I am using Pratt's method for decades. I am not sure if he invented it, he
didn't say it in the book, maybe he was. I have built several compilers on
it.

In my view it is simple and efficient. One of the compilers I built was in
Assembler for a 32K machine, which illustrates how simple the method is.

I also extended the methods towards two asymmetric priorities. You can
find
a description and implementation here

http://www.dmitry-kazakov.de/ada/components.htm#Parsers_etc

The section 11.2 contains an explanation how the techinique works for most
complex expressions one can find in programming languages and mathematics
notation:

http://www.dmitry-kazakov.de/ada/components.htm#11.2

One of the crucial advantages of the technique is that it does not require
grammar, and as a consequence nothing need to be precompiled/generated.
The
parser can be made 100% table-driven. It perfectly suitable for OO design.

I am glad to see that the method finally gains attention it certainly
deserves.


Thank you!

Hmm, your page lists "T. Pratt" and it seems Terrence Pratt wrote the book
you mentioned. But, from the descriptions, it sounds like they do same
thing... or very similar.


RP
 
Dmitry A. Kazakov...
Posted: Thu Nov 05, 2009 3:44 pm
Guest
On Thu, 5 Nov 2009 05:17:46 -0500, Rod Pemberton wrote:

Quote:
Hmm, your page lists "T. Pratt" and it seems Terrence Pratt wrote the book
you mentioned. But, from the descriptions, it sounds like they do same
thing... or very similar.

Once I asked in comp.compilers if anybody knew the author of the method,
nobody was certain about it.

Grammar-based approaches have been dominating compiler construction for
many years being in my eyes inferior. History of software is full of such
examples. For example, regular expressions is a quite weak class of
languages incapable to recognize simple bracket constructs, very difficult
and uncomfortable to use. Nevertheless it is basically the only pattern
language known, except for wild-cards patterns. Far better and simpler
SNOBOL patterns are forgotten.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
 
Ben Pfaff...
Posted: Thu Nov 05, 2009 10:07 pm
Guest
"Dmitry A. Kazakov" <mailbox at (no spam) dmitry-kazakov.de> writes:

Quote:
For example, regular expressions is a quite weak class of
languages incapable to recognize simple bracket constructs,
very difficult and uncomfortable to use. Nevertheless it is
basically the only pattern language known, except for
wild-cards patterns. Far better and simpler SNOBOL patterns are
forgotten.

That's very interesting. I have never heard of SNOBOL patterns
before. Wikipedia says:

A SNOBOL pattern can be very simple or extremely complex. A
simple pattern is just a text string (e.g. "ABCD"), but a
complex pattern may be a large structure describing, for
example, the complete grammar of a computer language. It is
possible to implement a language interpreter in SNOBOL almost
directly from a Backus-Naur form expression of it, with few
changes.

I see that there is a SNOBOL pattern extension for Python:
http://snopy.sourceforge.net/user-guide.html
--
"Writing is easy.
All you do is sit in front of a typewriter and open a vein."
--Walter Smith
 
Ben Bacarisse...
Posted: Thu Nov 05, 2009 11:20 pm
Guest
Ben Pfaff <blp at (no spam) cs.stanford.edu> writes:

Quote:
"Dmitry A. Kazakov" <mailbox at (no spam) dmitry-kazakov.de> writes:

For example, regular expressions is a quite weak class of
languages incapable to recognize simple bracket constructs,
very difficult and uncomfortable to use. Nevertheless it is
basically the only pattern language known, except for
wild-cards patterns. Far better and simpler SNOBOL patterns are
forgotten.

That's very interesting. I have never heard of SNOBOL patterns
before.

They are very powerful but a little "operational" in that some of the
time you feel you are telling the matcher what to _do_ rather than
what to match. None the less, SNOBOL left a deep impression on me. I
recall an implementation of Russel's paradox in SNOBOL: a pattern, R,
that matches only those patterns that don't match themselves. You
then ask if R matches R!

Have you come across Icon[1]? Designed, at least in part, by Griswold
of SNOBOL fame. It have some very elegant ideas. It is a shame it is
not more widely know.

[1] http://www.cs.arizona.edu/icon/

<snip>
--
Ben.
 
Ben Pfaff...
Posted: Thu Nov 05, 2009 11:36 pm
Guest
Ben Bacarisse <ben.usenet at (no spam) bsb.me.uk> writes:

Quote:
Ben Pfaff <blp at (no spam) cs.stanford.edu> writes:

"Dmitry A. Kazakov" <mailbox at (no spam) dmitry-kazakov.de> writes:

For example, regular expressions is a quite weak class of
languages incapable to recognize simple bracket constructs,
very difficult and uncomfortable to use. Nevertheless it is
basically the only pattern language known, except for
wild-cards patterns. Far better and simpler SNOBOL patterns are
forgotten.

That's very interesting. I have never heard of SNOBOL patterns
before.

They are very powerful but a little "operational" in that some of the
time you feel you are telling the matcher what to _do_ rather than
what to match. None the less, SNOBOL left a deep impression on me. I
recall an implementation of Russel's paradox in SNOBOL: a pattern, R,
that matches only those patterns that don't match themselves. You
then ask if R matches R!

I looked around the web for a while and found a few superficial
descriptions of SNOBOL patterns. I also found a few SNOBOL
reference manuals, but again their descriptions of patterns were
very brief and I found it difficult to figure out how they were
used and why exactly they were so powerful.

Can you give a few examples?

Quote:
Have you come across Icon[1]? Designed, at least in part, by Griswold
of SNOBOL fame. It have some very elegant ideas. It is a shame it is
not more widely know.

[1] http://www.cs.arizona.edu/icon/

I'm scanning through the book now:
http://www.cs.arizona.edu/icon/ftp/doc/lb1up.pdf
--
"...dans ce pays-ci il est bon de tuer de temps en temps un amiral
pour encourager les autres."
--Voltaire, _Candide_
 
Rod Pemberton...
Posted: Fri Nov 06, 2009 3:58 am
Guest
"Ben Bacarisse" <ben.usenet at (no spam) bsb.me.uk> wrote in message
news:0.0d2ddf5bbc876993d58c.20091105182024GMT.877hu43kbb.fsf at (no spam) bsb.me.uk...
Quote:
Have you come across Icon[1]? Designed, at least in part, by Griswold
of SNOBOL fame. It have some very elegant ideas. It is a shame it is
not more widely know.

[1] http://www.cs.arizona.edu/icon/


Icon was used to write GBURG parser by Chris Fraser and Todd Proebsting in
"Finite-State Code Generation". One or both of them and/or David Hanson
worked on other similarly named parsers BURG, IBURG. Chris Fraser and David
Hanson are the ones who created the LCC C compiler.


Rod Pemberton
 
Ben Bacarisse...
Posted: Fri Nov 06, 2009 6:17 am
Guest
Ben Pfaff <blp at (no spam) cs.stanford.edu> writes:

Quote:
Ben Bacarisse <ben.usenet at (no spam) bsb.me.uk> writes:

Ben Pfaff <blp at (no spam) cs.stanford.edu> writes:

"Dmitry A. Kazakov" <mailbox at (no spam) dmitry-kazakov.de> writes:

For example, regular expressions is a quite weak class of
languages incapable to recognize simple bracket constructs,
very difficult and uncomfortable to use. Nevertheless it is
basically the only pattern language known, except for
wild-cards patterns. Far better and simpler SNOBOL patterns are
forgotten.

That's very interesting. I have never heard of SNOBOL patterns
before.

They are very powerful but a little "operational" in that some of the
time you feel you are telling the matcher what to _do_ rather than
what to match. None the less, SNOBOL left a deep impression on me. I
recall an implementation of Russel's paradox in SNOBOL: a pattern, R,
that matches only those patterns that don't match themselves. You
then ask if R matches R!

I looked around the web for a while and found a few superficial
descriptions of SNOBOL patterns. I also found a few SNOBOL
reference manuals, but again their descriptions of patterns were
very brief and I found it difficult to figure out how they were
used and why exactly they were so powerful.

Can you give a few examples?

The main feature is that patterns are built from pattern primitives
using operators. Patterns are named so you can write:

DIGITS = SPAN('0123456789')
NUM = DIGITS | ( '+' | '-' ) DIGITS

The name is substituted with its value immediately unless you use what
is called an unevaluated expression:

P = *P 'a' *P | 'z'

matches the strings 'z', 'zaz', 'zazazaz' and so on. The classic
example being that you can match strings with balanced parentheses:

PAIRED = NOTANY('()') | '(' ARBNO(*PAIRED) ')'
BALANCED = PAIRED ARBNO(PAIRED)

NOTANY matches a string without any on the characters in the string
argument ([^()]* as a regular expression) and ARBNO matches an
arbitrary number of repetitions of a pattern (* in the more common RE
notation).

My comment about it being rather operational comes from the fact that
there are primitives that interact directly with the scanning
algorithm. For example, FAIL always simply fails to match causing the
scanner to backtrack if it can and a lot of the power comes from
assignments done mid-match, using the $ operator.

For example,

LEN(1) $ IT *IT

matches and repeated character. LEN(1) matches any single character
and $ IT assigns the matched character to the variable IT at that
point in the scan. *IT refers to the character stored, thereby
matching only double letters.

Output is done but assigning to the variable OUTPUT, so to illustrate
FAIL one could write:

'MISSISSIPPI' (LEN(1) $ X *X) $ OUTPUT FAIL

to print:

SS
SS
PP

Without the FAIL, only the first pair would be printed since the
pattern would have succeeded.

--
Ben.
 
James Dow Allen...
Posted: Fri Nov 06, 2009 8:07 am
Guest
On Nov 6, 1:36 am, Ben Pfaff <b... at (no spam) cs.stanford.edu> wrote:
Quote:
I looked around the web for a while and found a few superficial
descriptions of SNOBOL patterns....

Can you give a few examples?

Another nice feature of the SNOBOL programming language
is that the string matching a subpattern can be saved in
a variable. IIRC,
A B.X C
is the same as
A B C
but the string matching the subpattern B is saved in X.

I've often wanted to use this feature when sed'ing, e.g.
sed "sqhas [0-9]* wivesqwill have N happy widowsq"
where "N" is the matching [0-9]* remembered.

James Dow Allen
 
Ben Bacarisse...
Posted: Fri Nov 06, 2009 5:13 pm
Guest
James Dow Allen <jdallen2000 at (no spam) yahoo.com> writes:

Quote:
On Nov 6, 1:36 am, Ben Pfaff <b... at (no spam) cs.stanford.edu> wrote:
I looked around the web for a while and found a few superficial
descriptions of SNOBOL patterns....

Can you give a few examples?

Another nice feature of the SNOBOL programming language
is that the string matching a subpattern can be saved in
a variable. IIRC,
A B.X C

You recall the operator correctly but, unless it is my turn to
misremember, SNOBOL is very fussy about spaces and you must write:

A B . X C

Quote:
is the same as
A B C
but the string matching the subpattern B is saved in X.

In case anyone is puzzled, SNOBOL has 2 of these assignment
operators. The . assigns on a successful match and $ (that I
illustrated elsewhere) does so, repeatedly, during the match.

Quote:
I've often wanted to use this feature when sed'ing, e.g.
sed "sqhas [0-9]* wivesqwill have N happy widowsq"
where "N" is the matching [0-9]* remembered.

Which, of course, you can do. Using a name is nicer, but \1 does the
job. In SNOBOL, you use pretty much anything as the replacement:

LINE SPAN('0123456789') . N = N + 1

increments the first number found in the variable LINE.

--
Ben.
 
Christian Gollwitzer...
Posted: Fri Nov 06, 2009 8:24 pm
Guest
James Dow Allen schrieb:
Quote:
I've often wanted to use this feature when sed'ing, e.g.
sed "sqhas [0-9]* wivesqwill have N happy widowsq"
where "N" is the matching [0-9]* remembered.

Use backreferences:

$ sed 'sqhas \([0-9]*\) wivesqwill have \1 happy widowsq'
has 9 wives
will have 9 happy widows
has 4 wives
will have 4 happy widows

Christian
 
Ben Pfaff...
Posted: Fri Nov 06, 2009 10:20 pm
Guest
Ben Bacarisse <ben.usenet at (no spam) bsb.me.uk> writes:

Quote:
Ben Pfaff <blp at (no spam) cs.stanford.edu> writes:
I looked around the web for a while and found a few superficial
descriptions of SNOBOL patterns. I also found a few SNOBOL
reference manuals, but again their descriptions of patterns were
very brief and I found it difficult to figure out how they were
used and why exactly they were so powerful.

Can you give a few examples?

The main feature is that patterns are built from pattern primitives
using operators. Patterns are named so you can write:

Thank you for the examples.
--
"Long noun chains don't automatically imply security."
--Bruce Schneier
 
James Dow Allen...
Posted: Sat Nov 07, 2009 7:53 am
Guest
On Nov 6, 10:24 pm, Christian Gollwitzer <Christian.Gollwit... at (no spam) uni-
bayreuth.de> wrote:
Quote:
James Dow Allen schrieb:

I've often wanted to use this feature when sed'ing, e.g.
   sed "sqhas [0-9]* wivesqwill have N happy widowsq"
where "N" is the matching [0-9]* remembered.

Use backreferences:

$ sed 'sqhas \([0-9]*\) wivesqwill have \1 happy widowsq'
has 9 wives
will have 9 happy widows
has 4 wives
will have 4 happy widows

        Christian

Thanks! I had a hunch sed had such a facility and thought it easier
to ask here than read the man. :-)

Odd that I remembered a feature (forgetting the spaces) of a language
(SNOBOL) which I've not used for *40 years* ... yet can't remember
what I had for breakfast yesterday.

James
 
 
Page 1 of 1    
All times are GMT
The time now is Tue Nov 24, 2009 4:06 am