| |
 |
|
|
Science Forum Index » Compression Forum » rududu image codec
Page 1 of 1
|
| Author |
Message |
| Nicolas |
Posted: Mon Apr 07, 2008 7:23 am |
|
|
|
Guest
|
Hello,
For those interested in image compression, I have released a new image
compression algorithm.
The main goals of this algorithm are speed and compression efficiency to be able
to use it for video compression.
An interesting feature for this group is the interleaved entropy coder. I don't
think the idea is new, but I have not been able to find an implementation. It
allows to use both range coding and bit output (Huffman, enumerative coding ...)
without speed penalty : the outputs of the range coder and bit stream are
interleaved when encoding and deinterleaved when decoding.
It's a wavelet image codec using basic (fast) context modeling. At the moment
the coefficient sign bit is not compressed as I have not found a simple way to
use the sign redundancy (maybe someone have an idea on this point ?).
Source code and 32 bits linux and windows binaries :
http://rududu.berlios.de/
Nicolas |
|
|
| Back to top |
|
| Thomas Richter |
Posted: Mon Apr 07, 2008 8:10 am |
|
|
|
Guest
|
Nicolas schrieb:
Quote: Hello,
For those interested in image compression, I have released a new image
compression algorithm.
The main goals of this algorithm are speed and compression efficiency to
be able to use it for video compression.
An interesting feature for this group is the interleaved entropy coder.
I don't think the idea is new, but I have not been able to find an
implementation. It allows to use both range coding and bit output
(Huffman, enumerative coding ...) without speed penalty : the outputs of
the range coder and bit stream are interleaved when encoding and
deinterleaved when decoding.
What is your definition of "bit stream"? Otherwise, it is easy to setup a context for
the entropy coding that does not compress at all, or otherwise interrupt the entropy
coding and continue it from that point on later. This technique is used in J2K.
Quote: It's a wavelet image codec using basic (fast) context modeling. At the
moment the coefficient sign bit is not compressed as I have not found a
simple way to use the sign redundancy (maybe someone have an idea on
this point ?).
Look into J2K, it uses context modeling from nearest neighbours to compress the sign bit.
Have you made perceptual measurements of the quality? If not, I can make the ms-ssim measurement
tool available that is currently used by the JPEG to evaluate HDPhoto aka JPEG-XR. PSNR doesn't have
to say much. How about scalibility/flexibility?
How about complexity? Can you give estimates how fast is it compared to JPEG/JPEG-XR/JPEG2000?
So long,
Thomas |
|
|
| Back to top |
|
| Nicolas |
Posted: Mon Apr 07, 2008 10:19 am |
|
|
|
Guest
|
Thomas Richter a écrit :
Quote: What is your definition of "bit stream"? Otherwise, it is easy to setup a context for
the entropy coding that does not compress at all, or otherwise interrupt the entropy
coding and continue it from that point on later. This technique is used in J2K.
Sure you can output uncompressed bits through arithmetic coder, but your bits
are slow to output, slower than just a shift (as without an arithmetic coder).
And interrupting an arithmetic coder has an overhead which is not acceptable if
you do it every few ( <16 ) coded symbol.
Quote: Look into J2K, it uses context modeling from nearest neighbours to compress the sign bit.
I know that, but from my point of view it's not "simple enough" (too much
operation / coefficient). Maybe I'll implement it, but I doubt the compression
gain will be enough to justify the CPU time (always according to my goal).
Quote: Have you made perceptual measurements of the quality? If not, I can make the ms-ssim measurement
tool available that is currently used by the JPEG to evaluate HDPhoto aka JPEG-XR. PSNR doesn't have
to say much.
You already sent me the mssim source code, and I thank you for that. But I have
not made perceptual measurement because current optimizations are made for psnr,
and developing for psnr is easier. I'm planning to implement perceptually driven
quantization and I'll switch to ssim at this point.
Quote: How about scalibility/flexibility?
As I said here and in the web site, there is no quality scalability (for speed
reason and lack of use for video compression), only resolution scalability (as
it's a wavelet codec). I'll also implement adaptive quantization, so perceptual
quantization / ROI will be possible.
Quote: How about complexity? Can you give estimates how fast is it compared to JPEG/JPEG-XR/JPEG2000?
Current implementation (plain C++) is about as fast as imagemagick's convert
utility when converting to/from JPEG. And the wavelet transform uses about half
the time.
Regards,
Nicolas |
|
|
| Back to top |
|
| Thomas Richter |
Posted: Mon Apr 07, 2008 2:30 pm |
|
|
|
Guest
|
Nicolas wrote:
Quote: Thomas Richter a écrit :
What is your definition of "bit stream"? Otherwise, it is easy to
setup a context for
the entropy coding that does not compress at all, or otherwise
interrupt the entropy
coding and continue it from that point on later. This technique is
used in J2K.
Sure you can output uncompressed bits through arithmetic coder, but your
bits are slow to output, slower than just a shift (as without an
arithmetic coder). And interrupting an arithmetic coder has an overhead
which is not acceptable if you do it every few ( <16 ) coded symbol.
It has a complexity overhead, not a rate overhead. Besides, a reasonable
MQ coder implementation is *very* fast (one addition, one comparison in
the common coding path). That said, it doesn't mean your approach is bad.
Quote: Look into J2K, it uses context modeling from nearest neighbours to
compress the sign bit.
I know that, but from my point of view it's not "simple enough" (too
much operation / coefficient). Maybe I'll implement it, but I doubt the
compression gain will be enough to justify the CPU time (always
according to my goal).
In some case or another, you need to check "related" samples for a
context model of the sign bit. If you don't want to invest for this
complexity, don't expect a gain. (-:
Quote: Have you made perceptual measurements of the quality? If not, I can
make the ms-ssim measurement
tool available that is currently used by the JPEG to evaluate HDPhoto
aka JPEG-XR. PSNR doesn't have
to say much.
You already sent me the mssim source code, and I thank you for that. But
I have not made perceptual measurement because current optimizations are
made for psnr, and developing for psnr is easier. I'm planning to
implement perceptually driven quantization and I'll switch to ssim at
this point.
That said, we will continue to go check for good metrics at the JPEG.
SSIM is not the last word on this. All I'm saying at this time: Check
whether you can adjust your code to other metrics.
Quote: How about scalibility/flexibility?
As I said here and in the web site, there is no quality scalability (for
speed reason and lack of use for video compression), only resolution
scalability (as it's a wavelet codec). I'll also implement adaptive
quantization, so perceptual quantization / ROI will be possible.
How about complexity? Can you give estimates how fast is it compared
to JPEG/JPEG-XR/JPEG2000?
Current implementation (plain C++) is about as fast as imagemagick's
convert utility when converting to/from JPEG. And the wavelet transform
uses about half the time.
Sounds fine to me. Can this be partially fit into some kind of
"low-complexity" J2K? For that, we would want to keep the codeblock
structure of J2K, just replace the EBCOT coder by something simpler.
Note that this requires that there are no inter-band decorrelation
processes (unlike what happens in the EZW or SPIHT).
So long,
Thomas |
|
|
| Back to top |
|
| Nicolas |
Posted: Tue Apr 08, 2008 10:23 am |
|
|
|
Guest
|
Thomas Richter a écrit :
Quote: It has a complexity overhead, not a rate overhead.
I was speaking of rate overhead about stopping arithmetic coder and
restarting it later. But maybe I have not understood what you meant.
My understanding is that you need to flush some bits when stopping the
coder to be able to decode the last symbols, and this is the rate overhead
I am speaking about.
Quote: Besides, a reasonable
MQ coder implementation is *very* fast (one addition, one comparison in
the common coding path). That said, it doesn't mean your approach is bad.
There is no most probable symbol when you code raw bits, and you have to
renormalize at every bit (MQ coder renormalize at every output bit right ?).
Using this to do Huffman coding you will output 1 bit at a time, and I
don't see a way to do Huffman LUT decoding this way.
According to this paper :
http://www.hpl.hp.com/techreports/2004/HPL-2004-75.pdf
MQ is slower than 8 bits driven renormalization coders (as the range coder
I'm using), and an order of magnitude slower than Huffman coding.
Quote: In some case or another, you need to check "related" samples for a
context model of the sign bit. If you don't want to invest for this
complexity, don't expect a gain. (-:
Yes, but you can have a common context for more than 1 sample, this way the
cost of the context is reduced per sample. For the sample amplitude I'm
using some kind of variance of the block (4x4). For the sign I have been
trying a "directional" context without luck.
Quote: That said, we will continue to go check for good metrics at the JPEG.
SSIM is not the last word on this. All I'm saying at this time: Check
whether you can adjust your code to other metrics.
Right now my code is not ready for other metrics (you can't even set the
quantizer according to the wavelet level), but this will be changed.
Quote: Sounds fine to me. Can this be partially fit into some kind of
"low-complexity" J2K? For that, we would want to keep the codeblock
structure of J2K, just replace the EBCOT coder by something simpler.
Note that this requires that there are no inter-band decorrelation
processes (unlike what happens in the EZW or SPIHT).
I'm using the same kind of tree EZW and SPIHT are using, and context
information from the lower frequency band (but this context information can
probably be extracted from neighboring blocks in the same band).
Regards,
Nicolas |
|
|
| Back to top |
|
| Thomas Richter |
Posted: Tue Apr 08, 2008 12:10 pm |
|
|
|
Guest
|
Nicolas schrieb:
Quote: Thomas Richter a écrit :
It has a complexity overhead, not a rate overhead.
I was speaking of rate overhead about stopping arithmetic coder and
restarting it later. But maybe I have not understood what you meant.
My understanding is that you need to flush some bits when stopping the
coder to be able to decode the last symbols, and this is the rate
overhead I am speaking about.
The point is that there is actually no need in doing this. (-: That is also done
in J2K, it's called "MQ coder truncation" (as opposed to "termination").
It is possible to "interrupt" the coding and continue from that point later
without actually having to restart it, and thus without adding
redundancy.
Quote: Besides, a reasonable MQ coder implementation is *very* fast (one
addition, one comparison in the common coding path). That said, it
doesn't mean your approach is bad.
There is no most probable symbol when you code raw bits, and you have to
renormalize at every bit (MQ coder renormalize at every output bit right
?).
Using this to do Huffman coding you will output 1 bit at a time, and I
don't see a way to do Huffman LUT decoding this way.
Unfortunately, this implies that you also *must* output one bit by a time,
which means that you need to do something in addition if you want to create "symbols" of
less than one bit.
I wonder how they implemented the MQ coder, then, the paper doesn't state
this at all. The fun part is that one can reduce the number of operations
in the MQ coder significantly, and specifically the number of branches
that slows down modern processors.
Quote: In some case or another, you need to check "related" samples for a
context model of the sign bit. If you don't want to invest for this
complexity, don't expect a gain. (-:
Yes, but you can have a common context for more than 1 sample, this way
the cost of the context is reduced per sample. For the sample amplitude
I'm using some kind of variance of the block (4x4). For the sign I have
been trying a "directional" context without luck.
"Luck" in which sense? Speed or compression gain?
Quote: That said, we will continue to go check for good metrics at the JPEG.
SSIM is not the last word on this. All I'm saying at this time: Check
whether you can adjust your code to other metrics.
Right now my code is not ready for other metrics (you can't even set the
quantizer according to the wavelet level), but this will be changed.
Sounds fine to me. Can this be partially fit into some kind of
"low-complexity" J2K? For that, we would want to keep the codeblock
structure of J2K, just replace the EBCOT coder by something simpler.
Note that this requires that there are no inter-band decorrelation
processes (unlike what happens in the EZW or SPIHT).
I'm using the same kind of tree EZW and SPIHT are using, and context
information from the lower frequency band (but this context information
can probably be extracted from neighboring blocks in the same band).
This implies, of course, that you limit somewhat the flexibility of the
design because you cannot decode bands independent of each other.
In SPIHT resp. EZW, the only opion you have is to abort decoding, thus
getting a lower-quality image, but not a lower-resolution image as
the symbols for lower and higher resolution are interleaved in the bitstream
due to the design of the coding process. How is that working for your
code?
Greetings,
Thomas |
|
|
| Back to top |
|
| Nicolas |
Posted: Tue Apr 08, 2008 1:30 pm |
|
|
|
Guest
|
Thomas Richter a écrit :
Quote: The point is that there is actually no need in doing this. (-: That is also done
in J2K, it's called "MQ coder truncation" (as opposed to "termination").
It is possible to "interrupt" the coding and continue from that point later
without actually having to restart it, and thus without adding
redundancy.
OK, didn't know about that. I'm going to have a look at how it works.
Quote: Unfortunately, this implies that you also *must* output one bit by a time,
which means that you need to do something in addition if you want to create "symbols" of
less than one bit.
My point is that I'm interleaving the output of the range coder with, for
example, the output of a Huffman coder. So I'm able to code fractional bits
in the range coder and also code Huffman symbols directly (without going
through the range coder).
Quote: "Luck" in which sense? Speed or compression gain?
Compression gain
Quote: This implies, of course, that you limit somewhat the flexibility of the
design because you cannot decode bands independent of each other.
Yes, but is it really useful to decode high frequency bands before low freq
ones ?
Quote: In SPIHT resp. EZW, the only opion you have is to abort decoding, thus
getting a lower-quality image, but not a lower-resolution image as
the symbols for lower and higher resolution are interleaved in the bitstream
due to the design of the coding process. How is that working for your
code?
I'm coding in one pass, so you really get the full quality lower resolution
image first. Right now it's not the case for color images because I'm not
interleaving components bands, but it can be done.
Regards,
Nicolas |
|
|
| Back to top |
|
| Thomas Richter |
Posted: Wed Apr 09, 2008 2:13 am |
|
|
|
Guest
|
Nicolas schrieb:
Quote: Thomas Richter a écrit :
The point is that there is actually no need in doing this. (-: That is
also done
in J2K, it's called "MQ coder truncation" (as opposed to "termination").
It is possible to "interrupt" the coding and continue from that point
later
without actually having to restart it, and thus without adding
redundancy.
OK, didn't know about that. I'm going to have a look at how it works.
Basically, the encoder has to emit enough bits from the MQ coder to
ensure that a decoder reading the bitstream up to this point can
reconstruct all symbols in this path. Several strategies exist how one
can compute a suitable truncation point in the stream.
Quote: Unfortunately, this implies that you also *must* output one bit by a
time,
which means that you need to do something in addition if you want to
create "symbols" of
less than one bit.
My point is that I'm interleaving the output of the range coder with,
for example, the output of a Huffman coder. So I'm able to code
fractional bits in the range coder and also code Huffman symbols
directly (without going through the range coder).
"Luck" in which sense? Speed or compression gain?
Compression gain
Ok, I see. Wouldn't it make sense to look for edges? I would expect some
gain along those in the wavelet highpasses.
Quote:
This implies, of course, that you limit somewhat the flexibility of the
design because you cannot decode bands independent of each other.
Yes, but is it really useful to decode high frequency bands before low
freq ones ?
Typically not (-; but that's not what the flexibility is used for. The
important aspect is that you can re-arrange coding passes from the bands
such that you get an embedded bit-stream, i.e. you have optimal quality
(given a suitable quality metric) for a given truncation position. For
that, you need to find those parts of the image that improve image
quality the most, and for that, your encoding order has to be flexible.
The EZW and SPIHT can only encode in one single order, and have only one
single image model for that. This means that even though this encoding
order is typically fine for most images, the code has no means to encode
images effectively where it's not (e.g. where the statistical model is
non-stationary.)
Quote: In SPIHT resp. EZW, the only opion you have is to abort decoding, thus
getting a lower-quality image, but not a lower-resolution image as
the symbols for lower and higher resolution are interleaved in the
bitstream
due to the design of the coding process. How is that working for your
code?
I'm coding in one pass, so you really get the full quality lower
resolution image first. Right now it's not the case for color images
because I'm not interleaving components bands, but it can be done.
I wouldn't worry about color right now, indeed.
So long,
Thomas |
|
|
| Back to top |
|
| |
|
Page 1 of 1
All times are GMT - 5 Hours
The time now is Wed Oct 15, 2008 7:35 pm
|
|