Main Page | Report this Page
Computers Forum Index  »  Computer Compilers  »  parser generator terminology...
Page 1 of 1    

parser generator terminology...

Author Message
Ralph Boland...
Posted: Sun Sep 06, 2009 4:56 pm
Guest
I am designing my own parser generator tool (please don't advise me on
the wisdom of creating yet another parser generator tool) and am
trying to sort out the terminology so that I can name my classes,
methods and variables (identifiers).

Some things are clear. The specification of the syntax of a language
is a grammar. The grammar consists of a set of productions or rules.
Each production has a name and a right hand side consisting of
terminals and nonterminals. Based on the grammar a parser is built
for constructing a parse tree from a given input program (sentence).
In addition a sentence in a language consists of a set of words. The
parser uses a scanner to decompose an input program into its words.
Words are classified into types called lexemes instances of which are
called tokens. The tokens correspond to the nonterminals of the
grammar.

From here things are less clear to me.

1) Is there a name for the definition of the set of tokens; preferably
a short name useful for naming identifiers? (I do not like regular
definition since it implies a set of rules I do not follow.)

2) Most parser generator tools actually use attribute grammars, which
have attributes or semantic actions, and build abstract syntax trees.
In my system the grammar is not attributed and instead a separate
table is used to define the attributes and semantic actions associated
with the grammar rules. I need a name for this table but don't know
what to call it. Possibilities are attribute table, transformation
table, and abstract syntax table. I don't like these though because
they lead to long identifier names. I am considering using tree morph
table, or just morph table. Individual entries in the morph table
would be called morphs. This works well for naming identifiers but
will be cryptic for anybody else but me. Can anybody make a better
suggestion than those I mentioned?

3) Most parser generator tools build parsers that call the scanner to
get tokens. In my system the parser calls a token fetcher that used
one or more scanners to get tokens and may then process its input
before returning its versions of tokens back to the parser. I do not
have a good name for the token fetcher. One possibility is to call
the scanner the lexer and the token fetcher the scanner. Can anybody
suggest a name for my token fetcher?

4) Some systems have specifications for defining methods for walking
over abstract syntax trees and making further transformations or
constructions. I am not planning to do this but am curious to know a
good name for such a specification preferably suitable for identifier
naming.

All suggestions much appreciated.

Ralph Boland
 
Hans-Peter Diettrich...
Posted: Sun Sep 06, 2009 11:34 pm
Guest
Ralph Boland schrieb:

Quote:
1) Is there a name for the definition of the set of tokens; preferably
a short name useful for naming identifiers? (I do not like regular
definition since it implies a set of rules I do not follow.)

IMO that's also kind of a grammar.

Quote:
2) Most parser generator tools actually use attribute grammars, which
have attributes or semantic actions, and build abstract syntax trees.
In my system the grammar is not attributed and instead a separate
table is used to define the attributes and semantic actions associated
with the grammar rules. I need a name for this table but don't know
what to call it.

I prefer to think of events and event handlers. Events can be the input
of a new token, the queries about associating a token with elements of a
rule, the reduction of a completed rule etc.


Quote:
3) Most parser generator tools build parsers that call the scanner to
get tokens. In my system the parser calls a token fetcher that used
one or more scanners to get tokens and may then process its input
before returning its versions of tokens back to the parser. I do not
have a good name for the token fetcher. One possibility is to call
the scanner the lexer and the token fetcher the scanner. Can anybody
suggest a name for my token fetcher?

Preprocessor?


Quote:
4) Some systems have specifications for defining methods for walking
over abstract syntax trees and making further transformations or
constructions. I am not planning to do this but am curious to know a
good name for such a specification preferably suitable for identifier
naming.

I used event handlers also for the creation of an AST from a parse tree.
While the construction of the parse tree is up to the parser, further
actions and transformations IMO are application specific.

DoDi
 
Chris F Clark...
Posted: Mon Sep 07, 2009 4:16 am
Guest
Ralph Boland <rpboland at (no spam) gmail.com> writes:

Quote:
I am designing my own parser generator tool (please don't advise me on
the wisdom of creating yet another parser generator tool) and am
trying to sort out the terminology so that I can name my classes,
methods and variables (identifiers).

First, although I'm one of those who regularly questions the wisdom of
yet-another-parser-generator, I will refrain in your case because you
have clearly earned your "stripes" by doing this several times before.
If you feel there is something youy want to solve, the solve it and
don't be defensive about it. I won't even make the token offer of
trying to collaborate on a new Yacc++ version, because I know your
approach is sufficiently different that it would be a hinderance than
a help.

Quote:
1) Is there a name for the definition of the set of tokens; preferably
a short name useful for naming identifiers? (I do not like regular
definition since it implies a set of rules I do not follow.)

The set of all tokens should be called a synonym of the "vocabulary"
(or perhaps vocab for short). The set of all characters is the
alphabet (or sigma). However, I think that's not what you are asking.

Do you mean a set of tokens that have different spellings but the same
token type? As opposed to tokens that have only one spelling. You
mention identifiers and they fit in this category. I'm not certain
their is a name that describes tokens that mean sets.

Quote:
In my system the grammar is not attributed and instead a separate
table is used to define the attributes and semantic actions associated
with the grammar rules. I need a name for this table but don't know
what to call it. Possibilities are attribute table, transformation
table, and abstract syntax table. I don't like these though because
they lead to long identifier names.

The abbreviation AST for abstract syntax tree would be short and is
well-known.

Quote:
I am considering using tree morph
table, or just morph table. Individual entries in the morph table
would be called morphs. This works well for naming identifiers but
will be cryptic for anybody else but me. Can anybody make a better
suggestion than those I mentioned?

Morph seems fine, since it is a nice word that means to change the
shape of something, and that's what your table does.

Quote:
3) Most parser generator tools build parsers that call the scanner to
get tokens. In my system the parser calls a token fetcher that used
one or more scanners to get tokens and may then process its input
before returning its versions of tokens back to the parser. I do not
have a good name for the token fetcher. One possibility is to call
the scanner the lexer and the token fetcher the scanner. Can anybody
suggest a name for my token fetcher?

If you put thinks like converting indentifiers into keywords in your
2nd phase, you could use Frank DeRemenrs names: the scanner (regx
processing, forming tokens from chars) and the screener (taking the
tokenize text and separating it into meaningful classes).

Another name you might consider for the after scanner part is the
"filter", that would be appropriate if its job is to remove certain
tokens that are unnecessary (e.g. whitespace and comments).

Quote:
4) Some systems have specifications for defining methods for walking
over abstract syntax trees and making further transformations or
constructions. I am not planning to do this but am curious to know a
good name for such a specification preferably suitable for identifier
naming.

Typically, I've heard such things called "rewriters".

Quote:
All suggestions much appreciated.

Ralph Boland

Hope this helps,
-Chris

******************************************************************************
Chris Clark Internet: christopher.f.clark at (no spam) compiler-resources.com
Compiler Resources, Inc. or: compres at (no spam) world.std.com
23 Bailey Rd Web Site: http://world.std.com/~compres
Berlin, MA 01503 voice: (508) 435-5016
USA fax: (978) 838-0263 (24 hours)
 
Hans Aberg...
Posted: Mon Sep 07, 2009 11:16 am
Guest
Michiel wrote:
Quote:
1) Is there a name for the definition of the set of tokens; preferably
a short name useful for naming identifiers? (I do not like regular
definition since it implies a set of rules I do not follow.)

Not that I know of. What about 'vocabulary'?

In Waite & Goos, the vocabulary V is the (disjoint) union of the sets of
non-terminals and terminals. They first defines ageneral rewriting
system, with sentences members of V*, the set of finite strings (the
free monoid) of V.

One might call non-terminals "(grammar or syntactic) variables", and
terminals "(grammar or syntactic) constants"; a joint name for both
might be "symbols" (or we decided in Bison I think).

Hans
 
Chris F Clark...
Posted: Mon Sep 07, 2009 10:27 pm
Guest
Hans Aberg <haberg_20080406 at (no spam) math.su.se> writes:

Quote:
Michiel wrote:
1) Is there a name for the definition of the set of tokens; preferably
a short name useful for naming identifiers? (I do not like regular
definition since it implies a set of rules I do not follow.)

Not that I know of. What about 'vocabulary'?

In Waite & Goos, the vocabulary V is the (disjoint) union of the sets of
non-terminals and terminals. They first defines ageneral rewriting
system, with sentences members of V*, the set of finite strings (the
free monoid) of V.

One might call non-terminals "(grammar or syntactic) variables", and
terminals "(grammar or syntactic) constants"; a joint name for both
might be "symbols" (or we decided in Bison I think).

While I'm not so foolish as to argue with Waite, Goos, or the Bison
maintainers, epseically when I've Seen V, VT, and VN used rather
universally as the 3 sets of all symbols, terminals, and
non-terminals, there is precedence for vocbulary as VT, given that
Terence Parr uses it in ANTLR if I recall correctly.

However, clearly naming is a hard thing, and most good short words
will already carry some previous usage that will be confusing or
contradictory with whatever convention one uses. And, while I find
single letter names too short (sorry mathematicians, but it seems to
obscure your work not clarify it), the convention of V, VN, VT,
\Sigma, are pretty well established. I'd have to go find a paper to
recall the rest of the 1 letter names that are commonly
used--e.g. IIRC P is the set of productions, G is the goal symbol.

Now, I would think one could expand the 1 letter names into longer
strings, V=vocab, VT = vocab_term, VN = vocab_nterm, sigma and still
be clear of how one is leveraging the theory. However, my expereince
is that if you have a term that one uses enough, one can get a way
with quite a short name, e.g. VT might be clear enough in context that
you don't need a longer form. If I talk about a SR-conflict, I can
assume most of my readers will know what that means without calling is
a shf_rdc_cfl which would be the standard abbreviation in Yacc++,
since in most places I can't use S for shift as it wouldn't be clear
if I meant state.

Hope this helps,
-Chris

******************************************************************************
Chris Clark email: christopher.f.clark at (no spam) compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: at (no spam) intel_chris
------------------------------------------------------------------------------
 
Hans Aberg...
Posted: Wed Sep 09, 2009 8:40 pm
Guest
Chris F Clark wrote:
Quote:
In Waite & Goos, the vocabulary V is the (disjoint) union of the sets of
non-terminals and terminals. They first defines a general rewriting
system, with sentences members of V*, the set of finite strings (the
free monoid) of V.
....
While I'm not so foolish as to argue with Waite, Goos, or the Bison
maintainers, epseically when I've Seen V, VT, and VN used rather
universally as the 3 sets of all symbols, terminals, and
non-terminals, there is precedence for vocbulary as VT, given that
Terence Parr uses it in ANTLR if I recall correctly.

This is not formally wrong: V can be any finite set of symbols. A
language is then a subset of V*. So it is OK to define a language L as a
subset of T*, in which case the set of terminals T is the vocabulary.
The set of non-terminals N is only needed when one wants to define a
general rewriting system from a grammar to define the language L.
Different grammar specs of L may lead to different N.

Hans
 
Chris F Clark...
Posted: Sun Sep 13, 2009 10:52 am
Guest
Hans Aberg <haberg_20080406 at (no spam) math.su.se> writes:

Quote:
Chris F Clark wrote:
In Waite & Goos, the vocabulary V is the (disjoint) union of the sets of
non-terminals and terminals. They first defines a general rewriting
system, with sentences members of V*, the set of finite strings (the
free monoid) of V.
...
While I'm not so foolish as to argue with Waite, Goos, or the Bison
maintainers, epseically when I've Seen V, VT, and VN used rather
universally as the 3 sets of all symbols, terminals, and
non-terminals, there is precedence for vocbulary as VT, given that
Terence Parr uses it in ANTLR if I recall correctly.

This is not formally wrong: V can be any finite set of symbols. A
language is then a subset of V*. So it is OK to define a language L as a
subset of T*, in which case the set of terminals T is the vocabulary.
The set of non-terminals N is only needed when one wants to define a
general rewriting system from a grammar to define the language L.
Different grammar specs of L may lead to different N.

Yes, good point. However, if one is trying to be clear, one needs to
be clear as to what means by Vocabulary, V = VN U VT or just VT.
Since, if I recall correctly, the original poster, Ralph Boland, was
interested in a case where he has both terminals and non-terminals and
he wanted a name for a set of just the terminals. Well, at least
that's how I interpreted his question.

The question goes back to whether there is a good one-word name for
"the set of terminals" in a language. While I like vocabulary as a
term for describing it, I think it may be non-standard, especially in
a context where there are non-terminals of interest.

Of course, it goes the other way too, if you use vocbulary for VT,
what word do you use for VN U VT? I don't think there is a common
word thet means all the words and phrases in a language (VT are the
named "words" and VN are the named "phrases" to my mind).

-Chris
 
Hans Aberg...
Posted: Mon Sep 14, 2009 5:17 pm
Guest
Chris F Clark wrote:
Quote:
While I'm not so foolish as to argue with Waite, Goos, or the Bison
maintainers, epseically when I've Seen V, VT, and VN used rather
universally as the 3 sets of all symbols, terminals, and
non-terminals, there is precedence for vocbulary as VT, given that
Terence Parr uses it in ANTLR if I recall correctly.

This is not formally wrong: V can be any finite set of symbols. A
language is then a subset of V*. So it is OK to define a language L as a
subset of T*, in which case the set of terminals T is the vocabulary.
The set of non-terminals N is only needed when one wants to define a
general rewriting system from a grammar to define the language L.
Different grammar specs of L may lead to different N.

Yes, good point. However, if one is trying to be clear, one needs to
be clear as to what means by Vocabulary, V = VN U VT or just VT.
Since, if I recall correctly, the original poster, Ralph Boland, was
interested in a case where he has both terminals and non-terminals and
he wanted a name for a set of just the terminals. Well, at least
that's how I interpreted his question.

The question goes back to whether there is a good one-word name for
"the set of terminals" in a language.

How about the "terminals"? :-)

Quote:
While I like vocabulary as a
term for describing it, I think it may be non-standard, especially in
a context where there are non-terminals of interest.

Of course, it goes the other way too, if you use vocbulary for VT,
what word do you use for VN U VT? I don't think there is a common
word thet means all the words and phrases in a language (VT are the
named "words" and VN are the named "phrases" to my mind).

They will be vocabularies for different languages. The language L that
is a subset of T* is the set of sentences one would write. The language
that is a subset of (N V T)* but not T* is the grammar description
language. The subset when all variables (non-terminals) have been
expanded into constants (terminals) is L.

Hans
 
George Neuner...
Posted: Mon Sep 14, 2009 10:31 pm
Guest
On Sun, 13 Sep 2009 02:52:56 -0400, Chris F Clark
<cfc at (no spam) shell01.TheWorld.com> wrote:

Quote:
I don't think there is a common
word thet means all the words and phrases in a language (VT are the
named "words" and VN are the named "phrases" to my mind).

"Lexicon"?
 
 
Page 1 of 1    
All times are GMT
The time now is Wed Dec 02, 2009 4:30 am