the 'switch' limit......
Mike Austin
Posted: Thu Nov 05, 2009 12:11 pm
BGB / cr88192 wrote:
Quote: this is an observation I have made in the past, and I think I will state
here as if it were some kind of rule:
when a profiler says that the largest amount of time used in an interpreter
is the main opcode dispatch switch (rather than some function called by it,
or some logic contained within said switch, ... especially if this switch
does nothing more than call functions...), then they are rapidly approaching
the performance limits they can hope to gain from said interpreter.
A little late to the party, but...
I say let an interpreter be an interpreter, and provide access to fast
libraries and frameworks. These days, you can write a 3D game in a scripting
language using math and physics libraries and OpenGL vertex objects. Throw in
a simple native scene graph, and your bottleneck is no longer the runtime. I
admire projects like LuaJIT, but if I need absolute speed, it seems easier to
just use C or asm.
Mike
Quote: which is kind of lame in this case, as this interpreter is, technically, not
all that fast...
but, then again, there may still be a little tweaking left, as said switch
has not gained 1st place (it is 2nd), and holds approx 12% of the total
running time.
1st place, at 15%, is the hash-table for fetching instructions (and, sadly,
is not really optimizable unless I discover some way to 'optimize' basic
arithmetic expressions, which FWIW is about as productive as trying to
optimize a switch...).
this means: 27% of time:
hash EIP (currently: "((EIP*65521)>>16)&65535");
fetch opcode-struct from hash-table (the hash-fail rate is apparently very
low);
switch on opcode info (branches to functions containing further dispatch and
logic code).
(well, all this, as well as a few arithmetic ops, such as adding the opcode
size to EIP, ...).
previously I had an opcode decoder which was not super fast (approx 60% of
runtime if sequential decoding is used), but the hash seems to have largely
eliminated it (since most of the time, pre-decoded opcodes are used from the
hash).
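A rough sketch of the fetch path described above (hash EIP, pull a pre-decoded opcode from the table, switch on it) might look like the following. Only the hash expression comes from the post; the Opcode struct, the three-opcode toy instruction set, and the names fetch, slow_decode and run are all made up for illustration, and a real x86 decoder would of course be far larger.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_SIZE 65536u   /* 16-bit index, matching the "&65535" mask */

enum { OP_HALT, OP_INC, OP_ADDI };   /* toy opcodes (assumption), not real x86 */

/* hypothetical pre-decoded instruction record */
typedef struct {
    uint32_t eip;     /* address this slot was decoded from */
    uint8_t  valid;   /* 0 until the slot is first filled */
    uint8_t  op;      /* opcode number for the dispatch switch */
    uint8_t  size;    /* instruction length in bytes */
} Opcode;

static Opcode cache[CACHE_SIZE];
static unsigned long decode_count;   /* times the slow decoder ran */

/* the hash quoted in the post: ((EIP*65521)>>16)&65535 */
static uint32_t hash_eip(uint32_t eip)
{
    return ((uint32_t)(eip * 65521u) >> 16) & 65535u;
}

/* stand-in for the real (slow) sequential decoder */
static void slow_decode(const uint8_t *mem, uint32_t eip, Opcode *out)
{
    decode_count++;
    out->eip   = eip;
    out->valid = 1;
    out->op    = mem[eip];
    out->size  = (out->op == OP_ADDI) ? 2 : 1;
}

/* hash EIP, then decode only if the slot is empty or holds another address */
static const Opcode *fetch(const uint8_t *mem, uint32_t eip)
{
    Opcode *slot = &cache[hash_eip(eip)];
    if (!slot->valid || slot->eip != eip)
        slow_decode(mem, eip, slot);
    assert(slot->eip == eip);
    return slot;
}

/* main loop: fetch, advance EIP by the opcode size, switch on the opcode */
static int run(const uint8_t *mem, uint32_t eip)
{
    int acc = 0;
    for (;;) {
        const Opcode *op = fetch(mem, eip);
        uint32_t at = eip;
        eip += op->size;
        switch (op->op) {
        case OP_INC:  acc += 1;           break;
        case OP_ADDI: acc += mem[at + 1]; break;
        default:      return acc;          /* OP_HALT or unknown */
        }
    }
}
```

Running a small program such as { OP_ADDI, 5, OP_INC, OP_HALT } through run() twice should only bump decode_count on the first pass; the second pass is served entirely from the table. The sketch does not invalidate slots when guest memory is written, so self-modifying code would need extra handling.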
so, alas, it is an x86 interpreter performing at approx 386 (or maybe
486)-like speeds on my computer (~6 MIPS...).
could be faster, except that MSVC on x64 has no real idea what it is doing
WRT optimizations.
then again, it is around 12x faster than when I started trying to optimize
it (~0.5 MIPS...).
note: I have plenty of past experience with both ASM and JIT, but at the
moment am leaning against this, and in favor of using pure C-based
interpretation for now.
dunno if anyone knows some major way around this.
IOW: if there is anything that can be done when the main switch creeps into
a high position, and code starts becoming very fussy about the fine details
WRT performance issues...
past experience is generally not... (that this is a general limit to C-based
interpreters...).
any comments?...
BGB / cr88192
Posted: Thu Nov 05, 2009 8:40 pm
"Mike Austin" <mike at (no spam) mike-nospam-austin.com> wrote in message
news:DYSdnSRFsvEA52_XnZ2dnUVZ_hNi4p2d at (no spam) giganews.com...
Quote: BGB / cr88192 wrote:
this is an observation I have made in the past, and I think I will state
here as if it were some kind of rule:
when a profiler says that the largest amount of time used in an
interpreter is the main opcode dispatch switch (rather than some function
called by it, or some logic contained within said switch, ... especially
if this switch does nothing more than call functions...), then they are
rapidly approaching the performance limits they can hope to gain from
said interpreter.
A little late to the party, but...
I say let an interpreter be an interpreter, and provide access to fast
libraries and frameworks. These days, you can write a 3D game in a scripting
language using math and physics libraries and OpenGL vertex objects. Throw in
a simple native scene graph, and your bottleneck is no longer the runtime. I
admire projects like LuaJIT, but if I need absolute speed, it seems easier to
just use C or asm.
FWIW, the interpreter is currently running C, and also ASM...
granted, I have yet to resolve the issue as to how to generally/easily
marshal APIs to the interpreter (with the differences in address space and
word size being just a few of the issues).
using a modified C frontend as the basis of an IDL tool is a possibility,
where I am currently thinking of C headers with embedded IDL commands.
a prior idea had involved the use of special preprocessor magic, but another
possibility is to embed all of the IDL markup in comments, which would be
treated specially by the IDL tool's preprocessor (unwrapped and placed
directly into the syntax stream...).
/*IDL|...*/ --> ...
/*IDL[...]*/ --> [...]
/*IDL[guid(b0f0-...)]*/
/*IDL| interface { */
....
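The comment-unwrapping step could be sketched in C roughly as follows. This is only an illustration under a few assumptions: unwrap_idl is a made-up name, it rewrites a whole buffer rather than feeding a token stream, and it assumes IDL comments never nest and always end at the first "*/".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst (capacity cap, including the terminating NUL),
 * unwrapping IDL comments on the way:
 *     IDL| comments: the body is copied, the '|' marker is dropped
 *     IDL[ comments: the body is copied, the '[' is kept
 * Everything else, ordinary comments included, is copied unchanged.
 * Returns the number of bytes written, or (size_t)-1 if dst is too
 * small or an IDL comment is never closed.
 */
static size_t unwrap_idl(const char *src, char *dst, size_t cap)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return (size_t)-1;

    while (*src) {
        if (strncmp(src, "/*IDL", 5) == 0 &&
            (src[5] == '|' || src[5] == '[')) {
            const char *body = src + 5;
            const char *end  = strstr(body, "*/");
            size_t len;

            if (end == NULL)
                return (size_t)-1;        /* unterminated IDL comment */
            if (*body == '|')
                body++;                   /* '|' is only a marker */
            len = (size_t)(end - body);
            if (n + len >= cap)
                return (size_t)-1;        /* no room left in dst */
            memcpy(dst + n, body, len);
            n += len;
            src = end + 2;                /* skip the closing delimiter */
        } else {
            if (n + 1 >= cap)
                return (size_t)-1;
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

A real tool would more likely do this inside its preprocessor so that line numbers and string literals are handled properly; the buffer-to-buffer version just shows the two rewrite rules.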
the tool would then write new code and headers to attempt to marshal APIs
across different situations (current thinking is via 'native call', via the
object system, ...).
I would probably use a vaguely "CLOS-like" style for presenting OO-style
interfaces in C, so for example, C code is written in this style, with
some IDL magic in the headers, and the tool figures out how to write code to
import/export said interfaces to/from the object system.
....
/*IDL|
namespace MyApp {
    class Foo {
        Foo();
        public int doSomething(int x);
        ...
    }
}
*/
and, in C:
MyApp_Foo *foo;
foo=new_MyApp_Foo();
foo->doSomething(foo, 100);
or, maybe:
MyApp_Foo_doSomething(foo, 100);
granted, this is not the most elegant possibility, but it could be usable at
least...
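For illustration, the C side that such a tool emits for MyApp.Foo might look something like the sketch below. Only the names MyApp_Foo, new_MyApp_Foo, doSomething and MyApp_Foo_doSomething come from the post; the struct layout, the total field, delete_MyApp_Foo and the method body are assumptions, and a version routed through the object system would replace the direct function pointer with a lookup.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct MyApp_Foo MyApp_Foo;

struct MyApp_Foo {
    /* method slot, so that foo->doSomething(foo, 100) works */
    int (*doSomething)(MyApp_Foo *self, int x);
    int total;   /* hypothetical instance state */
};

/* hypothetical method body: add x to a running total and return it */
static int MyApp_Foo_doSomething_impl(MyApp_Foo *self, int x)
{
    self->total += x;
    return self->total;
}

/* constructor matching the IDL's Foo(); returns NULL if allocation fails */
MyApp_Foo *new_MyApp_Foo(void)
{
    MyApp_Foo *foo = malloc(sizeof *foo);
    if (foo != NULL) {
        foo->doSomething = MyApp_Foo_doSomething_impl;
        foo->total = 0;
    }
    return foo;
}

/* flat, generic-function-style entry point: dispatches through the slot */
int MyApp_Foo_doSomething(MyApp_Foo *self, int x)
{
    assert(self != NULL);
    return self->doSomething(self, x);
}

/* destructor; not in the original IDL, added so the sketch does not leak */
void delete_MyApp_Foo(MyApp_Foo *foo)
{
    free(foo);
}
```

Both call styles from the post then reach the same implementation, and the flat wrapper is the one that would be easiest to export across the interpreter boundary, since it needs no knowledge of the struct layout on the caller's side.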
All times are GMT