Computers Forum Index  »  Computer - Graphics (Algorithms)  »  Converting a floating point texture to a rgba texture...

Converting a floating point texture to a rgba texture...

Skybuck Flying...
Posted: Fri Oct 02, 2009 9:59 am
Guest
Hello,

One thing is annoying me a little bit with my current Delphi
program/example/opengl acceleration experiment ;)

I cannot enjoy the speed of OpenGL because for now I am using
TCanvas.Pixels[x,y] to draw the texture map to the screen. And since the
color components of the texture map are in the range 0.0 to 1.0, they first
need to be converted to 8-bit RGB values, which means many multiplications
and roundings.

But the biggest problem is the slowness of Tcanvas.Pixels.

Anyway I probably already have a "CopyMemoryToBitmap" routine somewhere
which would help with flipping the memory into bitmap format ;)

So the remaining problem is:

Converting floating point textures to rgba textures so they can be flipped
to screen.

I guess I could use an additional render to texture target... in rgba
mode... and use an extra shader... just for recalculating the floating
points to rgba's...

However doing this seems a bit weird... but it would probably be possible as
follows:

1. Draw a quad with 4 vertices, which would run the pixel shader for every pixel.
2. Shade the pixels and output them to the texture... preferably y-flipped
if necessary.
3. Read texture to cpu/system memory.
4. Flip memory to Tbitmap/canvas etc.

However I wonder if OpenGL has a better method of converting a floating
point texture/framebuffer into a bitmap ?!

So that it doesn't need to go through the vertex and pixel shaders ?!?

Hmmmm...

Maybe there is even a faster way ?

Maybe by re-enabling the "default framebuffer" ? But it would be empty...
hmm...

Bye,
Skybuck.
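
Steps 3 and 4 above involve a vertical flip: OpenGL hands back rows starting at the bottom of the image, while a TBitmap's ScanLine[0] is the top row. Below is a minimal sketch of that flip, written in Python rather than Delphi purely for illustration. `flip_rows` is a hypothetical helper name, and the buffer is assumed to be tightly packed (no row padding). As far as I know, `glReadPixels` can also be asked for `GL_BGRA` / `GL_UNSIGNED_BYTE`, in which case the driver converts a floating point framebuffer to clamped 8-bit values during the read and no extra shader pass is needed. Treat that as something to verify on the target driver.

```python
def flip_rows(pixels: bytes, width: int, height: int, bytes_per_pixel: int = 4) -> bytes:
    """Return a copy of a tightly packed image with the row order reversed."""
    stride = width * bytes_per_pixel  # bytes in one row, assuming no padding
    if len(pixels) != stride * height:
        raise ValueError("buffer size does not match the given dimensions")
    # Slice the buffer into rows, then join them back together bottom-up.
    rows = [pixels[y * stride:(y + 1) * stride] for y in range(height)]
    return b"".join(reversed(rows))
```

Flipping twice gives back the original buffer, which is a cheap sanity check when wiring this into the copy-to-bitmap routine.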
 
Skybuck Flying...
Posted: Fri Oct 02, 2009 10:03 am
Guest
Ok,

For now I am gonna get rid of the TCanvas.Pixels... and simply use an extra
memory buffer to convert the floating point texture (3x16 bits or 3x32 bits
per pixel) to RGBA 4x8 bits on the cpu.

That way cpu can do something too... hopefully cpu not gonna be too slow at
it ! LOL.

Would be funny if the cpu is still fricking slow even for something like...

I have a bad feeling about that ! ;) :)

But gonna try anyway... have to do this anyway ! ;)

Bye,
Skybuck.
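
The CPU-side conversion described above is one multiply and one round per component. A minimal sketch in Python (not Delphi, illustration only); `float_to_byte` and `floats_to_bytes` are hypothetical names, and clamping out-of-range values to 0..255 is my assumption about the wanted behaviour, not something stated in the post.

```python
def float_to_byte(value: float) -> int:
    """Map a colour component in [0.0, 1.0] to an integer in [0, 255]."""
    clamped = min(max(value, 0.0), 1.0)  # assumption: out-of-range input is clamped
    return int(clamped * 255.0 + 0.5)    # adding 0.5 before truncating rounds to nearest


def floats_to_bytes(components) -> bytes:
    """Convert a flat sequence of float components to 8-bit components."""
    return bytes(float_to_byte(c) for c in components)
```

For example, `floats_to_bytes([1.0, 0.5, 0.0, 1.0])` yields the four bytes 255, 128, 0, 255.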
 
Skybuck Flying...
Posted: Fri Oct 02, 2009 10:40 am
Guest
Or maybe I just flipped the texture color components...

Me confused...

Should it be:

record
  r, g, b, a : Single;
end;

or should it be:

record
  b, g, r, a : Single;
end;

for floating point texture maps for GL_RGBA ?!? (and/or GL_RGB)

Hmm...

Bye,
Skybuck.
 
Skybuck Flying...
Posted: Fri Oct 02, 2009 10:43 am
Guest
Well I think I got the floating point format correct if I recall
correctly...

Since the OpenGL window seemed to draw ok...

So record for floating point texture format is probably:

r,g,b,a : Single;

Then why does Delphi need it the other way around ?!

Weird...

Especially form vs bitmap... double weird ?!

Bye,
Skybuck.
 
Skybuck Flying...
Posted: Fri Oct 02, 2009 10:46 am
Guest
Anyway I am using Tbitmap.Scanline for fast access...

According to other postings it indeed seems to be reversed: B,G,R,A...

The reason for this I don't understand...

For now I will have to use a special type for it...

TbgraByte = packed record b, g, r, a : Byte; end; // bitmap byte order ;)

And use the one which is appropriate ;)

Bye,
Skybuck.
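
On the reversed order: a 32-bit Windows bitmap stores each pixel in memory as blue, green, red, alpha, and that is the layout the scanline pointer exposes. A float texture uploaded as GL_RGBA is laid out r, g, b, a. So the conversion has to swap red and blue while it quantizes. A minimal sketch in Python (not Delphi, illustration only); the function names are hypothetical and clamping is an assumption.

```python
def to_byte(value: float) -> int:
    """Clamp to [0, 1] and round to the nearest value in 0..255."""
    return int(min(max(value, 0.0), 1.0) * 255.0 + 0.5)


def rgba_floats_to_bgra_row(row) -> bytes:
    """row is a flat sequence r, g, b, a, r, g, b, a, ... of floats."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    out = bytearray(len(row))
    for i in range(0, len(row), 4):
        r, g, b, a = (to_byte(c) for c in row[i:i + 4])
        out[i:i + 4] = bytes((b, g, r, a))  # write in bitmap order: B, G, R, A
    return bytes(out)
```

A pure red pixel `[1.0, 0.0, 0.0, 1.0]` comes out as the bytes 0, 0, 255, 255.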
 
Skybuck Flying...
Posted: Fri Oct 02, 2009 10:50 am
Guest
Ok this solves problem nicely ! ;)

Delphi even helped me prevent a stupid error thanks to strong type checking
like so:

Faulty:

var
vBitmapColor : TbgraByte;

begin

TrgbaByte( scanline pointer etc ) := vBitmapColor; // compiler type error ;) :)

Good:

TbgraByte( scanline pointer etc ) := vBitmapColor;

end;

:)

Bye,
Skybuck.
 
Skybuck Flying...
Posted: Fri Oct 02, 2009 10:54 am
Guest
"Skybuck Flying" <BloodyShame at (no spam) hotmail.com> wrote in message
news:d6518$4ac59fd4$d53372a9$20198 at (no spam) cache6.tilbu1.nb.home.nl...
Quote:
Hmmm something fishy going on here...

The color component order of the Delphi form seems to be:

R,G,B,A

^ Not sure about that...

Form does not seem to have a scanline property...

Maybe its internal format is also b,g,r,a...

Canvas.Pixels might be doing a conversion as well...

Tcolor is in rgba mode at least... so it might be doing conversions like so:

RGBA to BGRA.

^ This might be another reason why .Pixels[x,y] is slow...

Quote:

The color component order of the Delphi bitmap seems to be:

B,G,R,A

When accessing the scanline pointer at least...

Maybe .Pixels[ ] does a conversion ;) <- probably.

Bye,
Skybuck.
 
Skybuck Flying...
Posted: Fri Oct 02, 2009 10:56 am
Guest
Anyway using the Tbitmap.Scanline property + rounding seems to be fast
enough for now... for 500x400 pixels ;)

Canvas.Draw( 0, 0, mScreenBuffer ); // mScreenBuffer : TBitmap;

Draws it real fast... like under one second at least ! ;)

Bye,
Skybuck.
 
Skybuck Flying...
Posted: Fri Oct 02, 2009 1:39 pm
Guest
Having to use visual studio and converting my code to c/c++ is depressing.

However I could use other editors like C++ Builder to ease the pain
somewhat...

Then finally I would have to use visual studio.

To get over this depression I am now going to play some "CoH ToS" ;) :)

For inspiration and happiness :) LOL.

^ downtime coming ;) :) ^ =D

Bye,
Skybuck ;)
 
Skybuck Flying...
Posted: Fri Oct 02, 2009 6:25 pm
Guest
Some other ideas to consider:

1. Speculative execution of all core cells. This would probably lead to many
conflicts, however output to different cells could be stored separately per
input/output core so at least all results would be ok. <- many unnecessary
executions at first and maybe later too

2. Speculative execution of all processes in the list <- different way of
parallelism, could produce more useful executions but still very limited

These two ideas above are more "fun" ideas, they are not very serious...
but could be easy to implement.

Time for a totally different idea:

3. CPU does preprocessing of all to-be-executed instructions per
core/simulator.

CPU could have access to 2 GB of RAM (virtual memory limit); 4 GB of RAM
would need to be enabled for kernel memory.

Total amount of simulators for 1 v 1 warrior fights would be:

2 GB / 84,000 bytes = 2,147,483,648 / 84,000 = 25,565 simulators.

Possibilities for memory locations per instruction are roughly:
1. A=A+B,
2. A=A+1,
3. A=A-1,
4. B=B+1,
5. B=B-1,
6. A=A/B,
7. A=B/A,
8. B=B/A,
9. B=A/B,
10. A=A*B,
11. B=B*A,
12. A=A mod A
13. A=B mod B
14. A=A mod B
15. A=B mod A
16. B=A mod A
17. B=B mod B
18. B=A mod B
19. B=B mod A

Maybe even all of these+1...

I am not sure how many possibilities there are...

Maybe 100 ? Maybe more ?

For now let's assume 100 or so.

This could mean 100 memory locations have to be read to be sure that all
locations are present for complete instruction execution and memory input
data and memory output data...

Actually the possibilities aren't that great... the pre-processor should be
able to know exactly which instruction type will be executed so the number
of possibilities will be very small... and can be pre-computed. However this
would almost be the same as actually executing it...

So another idea could be to do the pre-processor on the gpu as well... so I
guess this comes down to simply:

1. Processing the instructions on the gpu for as far as possible
2. Falling back to cpu to get any necessary code or locations and supplying
them again to the gpu... or maybe another gpu pass can actually do all that.
3. Go back to gpu and execute the remaining part of the instructions.

(Had this idea while letting this post "idle" for a while on my pc LOL :))

Yeah so to keep this story short:

1. Process instructions on the gpu for as far as possible, then try to do
anything else in secondary/tertiary passes/multiple passes and so forth.

Yeah this is pretty much how I designed the original core gpu algorithm...
which also included loading/using multiple textures in the gpu up to 512 MB
! ;)

So I was hoping to do just one texture map or so... but now it turns out
that would not give enough performance.

So to make a long story short: I must go back to the original core gpu design
and implement it massively ! ;) :)

However easier said than done... because more passes probably means more api
delay... and then the target might not be reached as well.

Target of course being insane speed ! ;)

Let's do some calculations...

Number of steps estimated for core gpu executor design: 21

21 passes * 0.152 milliseconds = 3.192 milliseconds required for all
steps...

1000 / 3.192 = 313 cycles per second... let's divide this by 2 just in
case... roughly 155 cycles per second.

25,565 simulators * 155 cycles = 3,962,575

Again 4 million cycles ?!?!? wtf ?!

Kinda funny how I keep hitting this 4 million limit ! ;)

Bye,
Skybuck ;) =D
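
The estimate above can be replayed in a few lines. This is a sketch in Python (illustration only) using the figures from the post: 2 GB of memory, 84,000 bytes per simulator, 21 passes at 0.152 ms each. The function names are hypothetical, and 155 is taken as given from the post (half of 313 is a little over 156).

```python
def simulators_that_fit(memory_bytes: int, bytes_per_simulator: int) -> int:
    """How many whole simulators fit in the given amount of memory."""
    return memory_bytes // bytes_per_simulator


def cycles_per_second(passes_per_cycle: int, ms_per_pass: float) -> float:
    """Cycles per second if every cycle needs the given number of passes."""
    return 1000.0 / (passes_per_cycle * ms_per_pass)


simulators = simulators_that_fit(2 * 1024**3, 84_000)  # 25565
raw_rate = cycles_per_second(21, 0.152)                 # about 313 per second
safe_rate = 155                                         # the post's halved figure
total = simulators * safe_rate                          # 3962575
print(simulators, int(raw_rate), total)
```

The total stays close to 4 million because it is just the product of the simulator count and the per-simulator rate, so it moves in direct proportion to either one.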
 
Skybuck Flying...
Posted: Sat Oct 03, 2009 12:48 pm
Guest
Ok,

I just did some testing of the draw routine...

The speed in a tight loop without any data changes is about 20,000 frames
per second...

I am not sure if OpenGL actually renders each one or that it detects that
nothing changed...

For now I will assume it renders each frame.

This means the actual speed in the scenario described could be 3 times
higher...

About 12,000,000 cycles per second.

However the scenario described is probably totally unrealistic since the cpu
could never supply 2 GB per frame...

That would be like 40,000 GB per second (40 TB/s) haha ! ;)

However I have some new ideas which might work by feedback to gpu.

But I am getting a bit tired of all these different models/scenarios...

Maybe I describe one later on or maybe not and keep it secret :)

Bye,
Skybuck.
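
A quick check of the two numbers above, sketched in Python (illustration only). The helper names are hypothetical. I am assuming the "3 times higher" comes from comparing the measured 20,000 frames per second with the 0.152 ms per pass used in the earlier estimate, and that 1 TB is counted as 1000 GB.

```python
def speedup(measured_fps: float, assumed_ms_per_pass: float) -> float:
    """Measured frame rate relative to the rate implied by the assumed pass time."""
    assumed_passes_per_second = 1000.0 / assumed_ms_per_pass
    return measured_fps / assumed_passes_per_second


def upload_tb_per_second(gb_per_frame: float, fps: float) -> float:
    """Upload bandwidth in TB/s, counting 1 TB as 1000 GB."""
    return gb_per_frame * fps / 1000.0


factor = speedup(20_000, 0.152)              # a little over 3
bandwidth = upload_tb_per_second(2, 20_000)  # 40.0
print(round(factor, 2), bandwidth)
```

Three times the earlier figure of roughly 4 million gives the 12 million mentioned, and 2 GB per frame at 20,000 frames per second works out to 40 TB per second.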
 
Skybuck Flying...
Posted: Sat Oct 03, 2009 12:53 pm
Guest
Euhm actually not 40 TB... because the cpu could upload only those things
which would be necessary and that's definitely not everything... only a small
portion...

So many different ways of implementing it... makes me dizzy and nervous ! :)

Bye,
Skybuck.
 
Skybuck Flying...
Posted: Sat Oct 03, 2009 10:58 pm
Guest
Ok,

I just tested the "streaming" idea for the cpu.

Streaming idea: "do many reads, do many writes, repeat".

Non streaming idea: "do single read, do single write repeat".

The non streaming idea works faster.

(Streaming idea requires multiplications and some extra looping, not sure if
that slows it down... most likely reason is that streaming idea requires
extra memory to hold the reads... cannot directly read into cache ?!?)
Possible solution: try doing prefetches instead <- nice idea.

Going to try version 0.02 with prefetches only ;) :) and then some normal read
write cycles or so

Bye,
Skybuck.
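
For reference, the two access patterns as I read them from the description, sketched in Python (illustration only). The function names and the batch size of 8 are hypothetical. Both produce the same result. Timing them in Python says nothing about cache behaviour in native code, so this only shows the loop structure and the extra holding buffer that the streaming version needs.

```python
def copy_single(src, dst):
    """Non-streaming: do a single read, do a single write, repeat."""
    for i in range(len(src)):
        dst[i] = src[i]


def copy_batched(src, dst, batch_size=8):
    """Streaming: do many reads, then many writes, repeat."""
    for start in range(0, len(src), batch_size):
        held = list(src[start:start + batch_size])  # extra memory holding the reads
        for offset, value in enumerate(held):
            dst[start + offset] = value
```

The `held` list is the intermediate storage the post suspects of costing time, since every value is moved twice instead of once.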
 
Skybuck Flying...
Posted: Sat Oct 03, 2009 11:10 pm
Guest
Ok tried it...

The prefetching "streaming" version is also slower than the non streaming
version...

Maybe the pattern of writing/reading wasn't identical for both versions...
but it's the best I could do for now...

So for now I give up on this idea ! ;)

Bye,
Skybuck.
 
Skybuck Flying...
Posted: Sun Oct 04, 2009 1:48 pm
Guest
I thiiiiiiiink I am going to attempt a Delphi to C/C++ converter tool.

The idea of having such a tool which would work very well seems very
attractive to me ! ;) :)

Bye,
Skybuck =D
 
 
All times are GMT