Main Page | Report this Page
Computers Forum Index  »  Computer - Graphics (Algorithms)  »  A possible solution for using BWT (especially wheeler...
Page 1 of 1    

A possible solution for using BWT (especially wheeler...

Author Message
Skybuck Flying...
Posted: Mon Aug 31, 2009 9:28 am
Guest
Hello,

I just had an idea how BWT, especially the wheeler transform algorithm could
possibly be successfully used in video decoders.

I haven't thought the idea through but I want to publish it just in case
it's a good idea:

The idea is as follows:

1. Usually video's have multiple "key" frames followed by some
"intermediate" frames.

2. BWT could be started in parallel on multiple key frames.

However this would require a little change in for example Microsoft's Video
for Windows... which simply always start at just a single frame.

Video for Windows would need to be turned into a "parallel" version where
decoders would be fed multiple key frames... or let's call them "key
streams".

So that a form of "video key-stream pipelining" occurs.

This way multiple processors/cores/threads could be working on decoding
different sections of the video.

The drawback is probably:

3. Requiring more bandwidth to store the results until actually needed.

Why is this a solution you might be wondering and what was the problem in
the first place ?

I will try to explain...

The problem with wheeler-transform is that it's a pointer-chasing problem
for the decoder.
The processor needs to fetch a pointer/index, wait for it... then fetch the
next pointer/index, and the next and then next etc... this slows down the
processor a lot... it looks like so:
p1->p2->p3->p4->p5->p6.....->p1000000.

The cpu is not fast enough to decode this in time... the gpu can't even do
this kind of decoding (or it's very hard/slow) so that's also not an option.

However what if the cpu could start working way in advance before the frame
is actually needed ?! this would give the cpu more time to work on it...

only problem is with intermediate frames... they also need a lot of time...
maybe limit
bwt to key frames only or so ?!? Wink :)

Well anyway... by letting multiple cpu's work on more key frames the decode
speed should improve somewhat...

Let's think of it as interleaving video key-stream decoding.

Example:

CPU1: Frame 1 to 8
CPU2: Frame 9 to 16
CPU3: Frame 17 to 24
CPU4: Frame 25 to 32

This way 4 cpu's are being used instead of just one.

Now the problem is ofcourse... can cpu1 decode 8 frames in the needed time
?!?

The answer is ofcourse: no funny enough... because if it could then only one
cpu would
be necessary.

However look what happens at cpu2... by the time cpu1 is done... cpu2 will
be done, cpu3 will
be done... cpu4 will be done...

So those frames will all be done... So my conclusion so far is:

Only the startup takes a while... but after that it should run smooth.

Bye,
Skybuck.
 
Skybuck Flying...
Posted: Mon Aug 31, 2009 9:43 am
Guest
I also just had a possible "hack" idea for Video for Windows decoders to
turn them into parallel decoders.

The idea is too simply take the input from windows and immediatly return
with a black screen... thereby fooling windows into believing that the input
was actually decoded into output.

But instead the decoder simply uses this trick to quickly fetch multiple
frames from windows...

Until the decoder has for example 30 frames... then it can start working on
the frames in parallel...

Then when windows ask for frame 31 to 60... in reality frames 1 to 30 are
returned...

This would screw the bars in the "video controls" a little bit... but that
could be acceptable LOL Wink :)

Also another drawback is that the last 30 frames might not be played ?!?
Hmm... :)

Could also be acceptable for now LOL :)

Bye,
Skybuck.
 
Skybuck Flying...
Posted: Mon Aug 31, 2009 9:45 am
Guest
However one last problem is with the audio... it would be off by 30 frames
that could be a problem... This would require a audio decoder which would do
the same to compensate for it... ouch ! ;)

Bye,
Skybuck.
 
Skybuck Flying...
Posted: Mon Aug 31, 2009 9:47 am
Guest
However I am not sure how windows would react to an immediate black screen
being returned...

Maybe windows would simply wait 1/30 of a second before requesting the next
decode...

In that case this trick would ofcourse not work...

(Maybe windows buffers some frames but I doubt it Wink)

Video for Windows is old... maybe other frameworks offer more possibilities
for parallel/multi-threaded video decoders ?

Bye,
Skybuck.
 
stefanbanev at (no spam) yahoo.com...
Posted: Wed Sep 02, 2009 7:52 pm
Guest
On Aug 30, 10:28 pm, "Skybuck Flying" <BloodySh... at (no spam) hotmail.com> wrote:
Quote:
Hello,

I just had an idea how BWT, especially the wheeler transform algorithm could
possibly be successfully used in video decoders.

I haven't thought the idea through but I want to publish it just in case
it's a good idea:

The idea is as follows:

1. Usually video's have multiple "key" frames followed by some
"intermediate" frames.

2. BWT could be started in parallel on multiple key frames.

However this would require a little change in for example Microsoft's Video
for Windows... which simply always start at just a single frame.

Video for Windows would need to be turned into a "parallel" version where
decoders would be fed multiple key frames... or let's call them "key
streams".

So that a form of "video key-stream pipelining" occurs.

This way multiple processors/cores/threads could be working on decoding
different sections of the video.

The drawback is probably:

3. Requiring more bandwidth to store the results until actually needed.

Why is this a solution you might be wondering and what was the problem in
the first place ?

I will try to explain...

The problem with wheeler-transform is that it's a pointer-chasing problem
for the decoder.
The processor needs to fetch a pointer/index, wait for it... then fetch the
next pointer/index, and the next and then next etc... this slows down the
processor a lot... it looks like so:
p1->p2->p3->p4->p5->p6.....->p1000000.

The cpu is not fast enough to decode this in time... the gpu can't even do
this kind of decoding (or it's very hard/slow) so that's also not an option.

However what if the cpu could start working way in advance before the frame
is actually needed ?! this would give the cpu more time to work on it...

only problem is with intermediate frames... they also need a lot of time....
maybe limit
bwt to key frames only or so ?!? Wink :)

Well anyway... by letting multiple cpu's work on more key frames the decode
speed should improve somewhat...

Let's think of it as interleaving video key-stream decoding.

Example:

CPU1: Frame 1 to 8
CPU2: Frame 9 to 16
CPU3: Frame 17 to 24
CPU4: Frame 25 to 32

This way 4 cpu's are being used instead of just one.

Now the problem is ofcourse... can cpu1 decode 8 frames in the needed time
?!?

The answer is ofcourse: no funny enough... because if it could then only one
cpu would
be necessary.

However look what happens at cpu2... by the time cpu1 is done... cpu2 will
be done, cpu3 will
be done... cpu4 will be done...

So those frames will all be done... So my conclusion so far is:

Only the startup takes a while... but after that it should run smooth.

Bye,
  Skybuck.

Even BWT is one of the fastest context modeler it is the most trivial
one and probably one of the most inefficient in term of its entropy
output. Approaches similar to CALIC generalized for 3D case would be
more efficient in this regard and frankly its context locality makes
mach more sense for 2D/3D scalar data than unbounded BWT context which
really works great for textual data where an exact contexts match is
common but it is not the case for 2D/3D image data. However if you may
come up with BWT idea for "fuzzy" contexts it may work very well for
image compression and if you may generalize it for context-symmetric
2D/3D cases it would be a monumental achievement ;o) I'm not aware
about such development in public domain though.

--sb
 
Skybuck Flying...
Posted: Thu Sep 03, 2009 5:16 am
Guest
Another idea is to move as much processing as possible to the GPU... and
leave the Wheeler running on the CPU... combining this with other
compression algorithms might just give enough playback speed to be
acceptable.

Bye,
Skybuck.
 
 
Page 1 of 1    
All times are GMT
The time now is Wed Dec 02, 2009 8:42 am