Main Page | Report this Page
Computers Forum Index  »  Computer Compilers  »  Switch statement code generation...
Page 1 of 2    Goto page 1, 2  Next

Switch statement code generation...

Author Message
Joshua Cranmer...
Posted: Tue Nov 03, 2009 6:01 pm
Guest
I'm trying to put together a list of various methods to do code generate
for switch statements.

This is what I have so far:
* Successive if/else branches
* Jump tables
* branch tables with linear and binary scans
* Clustering tables (e.g., if you have cases 0-100 and 300-400).

Are there any other methods worth mentioning?

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
[I think I've seen hashing into branch tables. -John]
 
glen herrmannsfeldt...
Posted: Wed Nov 04, 2009 5:27 am
Guest
Joshua Cranmer <Pidgeot18 at (no spam) verizon.invalid> wrote:
Quote:
I'm trying to put together a list of various methods to do code generate
for switch statements.

This is what I have so far:
* Successive if/else branches
* Jump tables
* branch tables with linear and binary scans
* Clustering tables (e.g., if you have cases 0-100 and 300-400).

The two I have seen done on S/370 are an indexed branch to a table of
branch instructions, and indexed load of an address to a register, and
then BR to that address. I presume those are two of the above. (IBM
programs commonly have return codes that are multiples of four,
presumably to make the indexing easier.)

Quote:
Are there any other methods worth mentioning?

VAX has the CASE instruction, most likely a microcode implementation
of one of the above. When VAX was new, there were stories that many
of the special instructions (such as POLY and INDEX) were slower than
doing the same operation using separate instructions.

I believe that JVM also has a special instruction for this
operation, as the above likely don't work there.

Quote:
[I think I've seen hashing into branch tables. -John]

Sounds better than branching into hash tables.

-- glen
[On machines with condition codes, I've seen binary searches expanded
into code with compare and branch instructions; after the comparison
you can do a three way branch on less, equal, or greater. -John]
 
Kaz Kylheku...
Posted: Wed Nov 04, 2009 6:08 am
Guest
On 2009-11-03, Joshua Cranmer <Pidgeot18 at (no spam) verizon.invalid> wrote:
Quote:
I'm trying to put together a list of various methods to do code generate
for switch statements.

This is what I have so far:
* Successive if/else branches
* Jump tables
* branch tables with linear and binary scans
* Clustering tables (e.g., if you have cases 0-100 and 300-400).

Are there any other methods worth mentioning?

if/else arranged in a binary search tree, rather than linear succession.
Analogous to your branch table with binary scan, but using code
rather than a table to reduce the comparions for N labels to log N.

E.g. cases are:

10 50 90 1000 1200 3000 4000 7500 12345

if (x < 3000) {
if (x < 90) {
if (x == 10) {
...
} else if (x == 50) {
...
}
} else {
if (x == 90) {
...
} else if (x == 1000) {
...
} else if (x == 1200) {
...
}
}
} else {
/* similarly for 3000-12345 range */
}

This can combine with clustering. Binary search to classify an input
value into cluster ranges, then check the range and use the table for that
range, or default out.
 
glen herrmannsfeldt...
Posted: Wed Nov 04, 2009 8:51 am
Guest
glen herrmannsfeldt <gah at (no spam) ugcs.caltech.edu> wrote:
(snip, someone wrote)

Quote:
[On machines with condition codes, I've seen binary searches expanded
into code with compare and branch instructions; after the comparison
you can do a three way branch on less, equal, or greater. -John]

I once saw in a Fortran book the suggestion that one use arithmetic
IF in place of computed GOTO with three labels.

IF(N-2) 1,2,3

instead of

GOTO (1,2,3),N

For modern processors depending on branch prediction logic,
either a branch table or address table will likely foil any
prediction system. Complicated conditional branching is likely
faster in that case.

-- glen
[Hey, it worked great on the 709. -John]
 
BGB / cr88192...
Posted: Wed Nov 04, 2009 12:13 pm
Guest
"glen herrmannsfeldt" <gah at (no spam) ugcs.caltech.edu> wrote in message
Quote:
Joshua Cranmer <Pidgeot18 at (no spam) verizon.invalid> wrote:
I'm trying to put together a list of various methods to do code generate
for switch statements.


<snip>

Quote:
I believe that JVM also has a special instruction for this
operation, as the above likely don't work there.


I think both the JVM and MSIL/CIL have special instructions for jump
tables...


Quote:
[I think I've seen hashing into branch tables. -John]

Sounds better than branching into hash tables.

-- glen
[On machines with condition codes, I've seen binary searches expanded
into code with compare and branch instructions; after the comparison
you can do a three way branch on less, equal, or greater. -John]

yeah...

in my compiler (on x86 and x64) I ended up using condition codes to
implement a trinary jump of this sort.

I had intended this for implementing slightly faster switches (errm...
because my compiler does not have jump tables... lame, I know, just never
got around to it...).

from what I remember, I do a binary lookup to handle switches.


then again, I am still not using my compiler as a "general purpose" C
compiler, and so there are still a few deficiencies I have never really
gotten around to fixing.

so many other things to work on, such as the x86 interpreter (now works ok,
and is ~170x slower than native, but still lacks a good deal in terms of API
bindings, me left thinking I may need some sort of IDL-like technology for
this, ...), ...

maybe just a whole lot of busywork, I don't know...
 
Ian Lance Taylor...
Posted: Wed Nov 04, 2009 12:50 pm
Guest
Joshua Cranmer <Pidgeot18 at (no spam) verizon.invalid> writes:

Quote:
I'm trying to put together a list of various methods to do code generate
for switch statements....

Are there any other methods worth mentioning?

There are a number of different methods listed in
http://ols.fedoraproject.org/GCC/Reprints-2008/sayle-reprint.pdf .

Ian
 
Anton Ertl...
Posted: Wed Nov 04, 2009 3:51 pm
Guest
Joshua Cranmer <Pidgeot18 at (no spam) verizon.invalid> writes:
Quote:
I'm trying to put together a list of various methods to do code generate
for switch statements.
....
Are there any other methods worth mentioning?

Hash-table-based jump tables.

IIRC the following paper discusses these methods:

at (no spam) Article{journals/spe/KannanP94,
title = "Short Communication: Correction to 'Producing Good
Code for the case Statement'",
author = "Sampath Kannan and Todd A. Proebsting",
journal = "Softw, Pract. Exper",
year = "1994",
number = "2",
volume = "24",
bibdate = "2003-11-25",
bibsource = "DBLP,
http://dblp.uni-trier.de/db/journals/spe/spe24.html#KannanP94",
pages = "233",
}

and this refers to:

at (no spam) Article{bernstein:85a,
author = "R. L. Bernstein",
title = "Producing good code for the case statement",
journal = "Software -- Practice \& Experience",
volume = "15",
number = "10",
pages = "1021--1024",
month = oct,
year = "1985",
}

These days, with the varying amounts of branch prediction hardware,
the best way probably depends on the hardware (which branch predictors
are implemented in the hardware?) and on the run-time characteristics
of the program (how effective are the branch predictors on this
particular usage pattern).

- anton
--
M. Anton Ertl
anton at (no spam) mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
 
Joshua Cranmer...
Posted: Wed Nov 04, 2009 5:38 pm
Guest
On 11/04/2009 12:27 AM, glen herrmannsfeldt wrote:
Quote:
I believe that JVM also has a special instruction for this
operation, as the above likely don't work there.

Actually two special bytecodes, lookupswitch and tableswitch. These are
the linear scan and the jump table methods, respectively. The upcoming
string switch appears to use a hash routine.

Quote:
[On machines with condition codes, I've seen binary searches expanded
into code with compare and branch instructions; after the comparison
you can do a three way branch on less, equal, or greater. -John]

It seems both gcc and llvm emit code like this; I've added it to my list
already.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
 
Walter Banks...
Posted: Fri Nov 06, 2009 1:12 am
Guest
Joshua Cranmer wrote:

Quote:
I'm trying to put together a list of various methods to do code generate
for switch statements.

This is what I have so far:
* Successive if/else branches

I have seen / have implemented if/else as either buried as part of the
switch body or as a separate if/else code block that jumps into the
switch body. The determining factor is the distance a conditional
branch can jump.

Regards,

Walter..
--
Walter Banks
Byte Craft Limited
Tel. (519) 888-6911
http://www.bytecraft.com
 
bartc...
Posted: Fri Nov 06, 2009 3:11 am
Guest
"Joshua Cranmer" <Pidgeot18 at (no spam) verizon.invalid> wrote in message
news:09-11-009 at (no spam) comp.compilers...
Quote:
On 11/04/2009 12:27 AM, glen herrmannsfeldt wrote:
I believe that JVM also has a special instruction for this
operation, as the above likely don't work there.

Actually two special bytecodes, lookupswitch and tableswitch. These are
the linear scan and the jump table methods, respectively. The upcoming
string switch appears to use a hash routine.

Hey, that's what I use.. invented independently of course.

'switch' for an indexed jump table, for case values in the range of 500 or
1000 (between minimum and maximum values).
'cswitch' for other case values, which just does a linear search.

Since these implement 'Switch' for an interpreted language, with the cswitch
bytecode implemented in tight assembler, opting for a linear search doesn't
have as much impact as it would do in a compiled language, especially if the
most common cases appear first.

However, with a big enough table size then I guess a proper lookup is called
for.

--
Bartc
 
nathan.mccauley at (no spam) gmail.com...
Posted: Sun Nov 08, 2009 4:52 am
Guest
On Nov 5, 12:12 pm, Walter Banks <wal... at (no spam) bytecraft.com> wrote:
Quote:
Joshua Cranmer wrote:
I'm trying to put together a list of various methods to do code generate
for switch statements. ...

Has anyone seen tries used for the string-based switch statements?
 
Hans-Peter Diettrich...
Posted: Mon Nov 09, 2009 12:09 am
Guest
nathan.mccauley at (no spam) gmail.com schrieb:

Quote:
I'm trying to put together a list of various methods to do code generate
for switch statements. ...

Has anyone seen tries used for the string-based switch statements?

What dou you expect?

Visual Basic had such a feature, implemented as a series of string
compares and jumps.

The typical solution is a lookup in a (sorted, hashed...) string list,
followed by an "ordinary" switch with the index of the found string. The
target address could be attached to every string in that list, so that a
second switch is not required.

A string tree might give the best lookup performance.

DoDi

[If there's a lot of strings, a patricia trie might be faster than
hashing or binary search since it doesn't require repeated scans of
the string during the match process. -John]
 
Ray...
Posted: Mon Nov 09, 2009 9:04 pm
Guest
nathan.mccauley at (no spam) gmail.com wrote:

Quote:
On Nov 5, 12:12 pm, Walter Banks <wal... at (no spam) bytecraft.com> wrote:
Joshua Cranmer wrote:
I'm trying to put together a list of various methods to do code
generate for switch statements. ...

Has anyone seen tries used for the string-based switch statements?

No... Not tries exactly. Since compilation happens on static data, it's
easy to build a simple balanced binary tree. The extra complication of
tries when you're not going to be doing insertions and deletions isn't
worth it.

I think the winner for string switches is a string representation with
a precomputed hash code, used in combination with a hashtable of
jump addresses.

Because the elements are known at compilation time, you can know in
advance the number of collisions in the longest bucket when you're
allocating the table, which means you can just allocate a table of
NxM entities where N is the number of buckets (a power of 2) and M
is the longest length needed (rounded up to a power of 2).

So when you get the string's hash code, you do a bitmask to get the
index into the hash table, and then check to see if you have a matching
hash code(success) or a zero hashcode (failure) there - if not,
increment the index by one until you do (a degenerate case of
"rehashing" that you can use here because you don't have to deal
with the possibility of making space for arbitrary insertions) and
then you have the jump address.

Bear
 
Hans-Peter Diettrich...
Posted: Wed Nov 11, 2009 12:52 pm
Guest
Hans-Peter Diettrich schrieb:

Quote:
A string tree might give the best lookup performance.

[If there's a lot of strings, a patricia trie might be faster than
hashing or binary search since it doesn't require repeated scans of
the string during the match process. -John]

Thanks for the "patricia trie" term Smile
Wikipedia redirects it to "Radix tree".

This is what I meant with "string tree".

DoDi
 
Derek M. Jones...
Posted: Wed Nov 11, 2009 5:39 pm
Guest
All,

Working out the best set of algorithms to use for mapping
switch statements to machine code obviously requires
information on switch statement usage in real code.

The latest issue of CVu, the magazine of he ACCU
www.accu.org, has an article by yours truely comparing
the usage of switch statements and nested if-else-if sequences
in C code.

You need to be an ACCU member to access the article (yearly
membership #25 or #35 if you want both magazines). I will
make a pdf freely available in a month or so.

While I'm talking about stuff I have written, switch statement code
generation gets a mention in this:
shape-of-code.coding-guidelines.com/2009/09/register-vs-stack-based-vms/
 
 
Page 1 of 2    Goto page 1, 2  Next
All times are GMT
The time now is Mon Nov 23, 2009 10:59 am