Image Processing Forum » Remove speckles in bilevel (bw) image...
| Laurent S.... |
Posted: Wed Jun 10, 2009 8:12 pm |
Scanned bw images of print often have tiny speckles that are
"undesirable noise". Despeckling is the operation of
removing them while leaving the rest of the image intact.
Is there a freeware or shareware utility that does this
efficiently?
Conceptually, the problem is simple to solve. For
example, it is nearly a special case of the thin curve
removal problem of the recent thread
"Remove thin curves in image".
But my question is purely practical!
Cheers,
Laurent S.
| Raj... |
Posted: Wed Jun 10, 2009 11:32 pm |
Can you post a sample image, please? It will help to understand the
problem at hand.
| Hesham... |
Posted: Fri Jun 12, 2009 4:02 am |
| Hesham... |
Posted: Fri Jun 12, 2009 7:37 am |
| Laurent S.... |
Posted: Mon Jun 15, 2009 1:31 am |
Aruzinsky writes:
[quote:1780fa5b2f]No, there was no Gaussian convolution in my examples. I used a
WEIGHTED median filter with Gaussian weights.
[/quote:1780fa5b2f]
No explicit convolution, I agree. However, I suspect
(and assumed) that for BITONAL images
any "weighted median filter" is equivalent to a convolution with
a kernel function that is essentially your weighting --
followed by 50% thresholding.
So my question about the kernel amounts to:
What are the coefficients of
your 0.7 sd Gaussian weighting?
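The suspected equivalence can be checked directly on a small window. The sketch below is only an illustration: it assumes a 3 x 3 window with sampled, normalised Gaussian weights of sd 0.7 (the coefficients actually used are not stated in the thread), and the function names are made up for the example.

```python
import math

def gaussian_weights_3x3(sd=0.7):
    """Sampled Gaussian on a 3 x 3 grid, row-major, normalised to sum 1.
    Assumption: this is one plausible reading of 'sd 0.7 Gaussian weights'."""
    raw = [math.exp(-(dx * dx + dy * dy) / (2 * sd * sd))
           for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    total = sum(raw)
    return [v / total for v in raw]

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    half = sum(weights) / 2
    acc = 0.0
    for value, weight in sorted(zip(values, weights)):
        acc += weight
        if acc >= half:
            return value

def convolve_and_threshold(values, weights):
    """Weighted sum of the 0/1 values followed by a 50% threshold."""
    response = sum(v * w for v, w in zip(values, weights))
    return 1 if response > sum(weights) / 2 else 0

if __name__ == "__main__":
    w = gaussian_weights_3x3(0.7)
    print("centre, edge, corner weights:", w[4], w[1], w[0])
    lone_pixel = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    horizontal_run = [0, 0, 0, 1, 1, 1, 0, 0, 0]
    for window in (lone_pixel, horizontal_run):
        print(window, weighted_median(window, w), convolve_and_threshold(window, w))
```

For 0/1 data the sorted values are all the zeros followed by all the ones, so the weighted median is 1 exactly when the ones carry more than half of the total weight, which is the thresholded weighted sum. Under the assumed weights the centre carries roughly a third of the total, so a lone black pixel is removed while a horizontal three-pixel run is kept.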
[quote:1780fa5b2f]It is more likely to preserve thin lines.
[/quote:1780fa5b2f]
Certainly worth testing. Could you try both
--- 3 x 3 median, and
--- 0.7 sd Gaussian weighted median
on the new test file called "bw_03" in
http://topo.math.u-psud.fr/~slc/speckle_samples/
please?
That test file illustrates narrow isthmuses as well as thin curves.
Isthmuses are common in fonts with serifs, and will be thin in
small but vital characters such as second order subscripts in
scanned math.
Cheers
Laurent S.
| aruzinsky... |
Posted: Mon Jun 15, 2009 5:12 am |
I typically don't deal with binary images, so I had to convert your
images to 8 bit/channel, 3 channel using IrfanView.
http://www.general-cathexis.com/images/bw_03Median.png
http://www.general-cathexis.com/images/bw_03Median0.7.png
http://www.general-cathexis.com/images/bw_03Median0.675.png
http://www.general-cathexis.com/images/bw_03Median0.625.png
If I were you, I would try the following because it is theoretically
much better:
1. Blur image A to get B (floating point).
2. Get the gradients of B.
3. For each pixel, get asymmetric Gaussian weight kernels elongated in
the gradient directions of B.
4. Apply the weighted median, with the weights obtained in step 3, to A.
| aruzinsky... |
Posted: Mon Jun 15, 2009 5:13 am |
CORRECTION:
3. For each pixel, get asymmetric Gaussian weight kernels shortened in
the gradient directions of B.
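A rough sketch of the four steps with this correction applied (kernels shortened along the gradient) might look as follows. Everything specific here is an assumption made for illustration and is not taken from the thread: the blur width, the 5 x 5 window, the values of `sd_short`, `sd_long` and `sd_flat`, the isotropic fallback where the gradient vanishes, and the function names. The image is a list of rows of 0/1 values; for such data the weighted median is 1 exactly when the ones carry more than half of the total weight.

```python
import math

def _get(img, y, x):
    """Read a pixel with coordinates clamped to the image border."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def blur(img, sd=1.0, r=2):
    """Step 1: separable Gaussian blur of A, giving a floating-point B."""
    k = [math.exp(-(d * d) / (2 * sd * sd)) for d in range(-r, r + 1)]
    s = sum(k)
    k = [v / s for v in k]
    h, w = len(img), len(img[0])
    rows = [[sum(k[d + r] * _get(img, y, x + d) for d in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(k[d + r] * _get(rows, y + d, x) for d in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def oriented_weighted_median(a, sd_short=0.5, sd_long=1.5, sd_flat=0.7, r=2):
    """Steps 2 to 4 for a 0/1 image `a`; returns a new 0/1 image."""
    b = blur(a)
    h, w = len(a), len(a[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Step 2: central-difference gradient of B.
            gx = (_get(b, y, x + 1) - _get(b, y, x - 1)) / 2
            gy = (_get(b, y + 1, x) - _get(b, y - 1, x)) / 2
            norm = math.hypot(gx, gy)
            ones = total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    if norm > 1e-9:
                        # Step 3: narrow along the gradient, wide across it.
                        along = (dx * gx + dy * gy) / norm
                        across = (dy * gx - dx * gy) / norm
                        wgt = math.exp(-along ** 2 / (2 * sd_short ** 2)
                                       - across ** 2 / (2 * sd_long ** 2))
                    else:
                        # Assumed fallback: isotropic weights in flat areas.
                        wgt = math.exp(-(dx * dx + dy * dy) / (2 * sd_flat ** 2))
                    total += wgt
                    ones += wgt * _get(a, y + dy, x + dx)
            # Step 4: weighted median of 0/1 data.
            out[y][x] = 1 if ones > total / 2 else 0
    return out
```

This is written for clarity, not speed; a C version would precompute a small table of kernels indexed by quantised gradient direction.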
| Jens Dierks... |
Posted: Mon Jun 15, 2009 4:48 pm |
Laurent S. wrote:
[quote:2ad20341d6]Aruzinsky writes:
No, there was no Gaussian convolution in my examples. I used a
WEIGHTED median filter with Gaussian weights.
No explicit convolution I agree. However I suspect
(and assumed) that for BITONAL images
any "weighted median filter" is equivalent to a convolution with
a kernel function that is essentially your weighting --
followed by 50% thresholding.
[/quote:2ad20341d6]
I don't think that this is the same. Weighted median means that the
distances to the center of the sorted values are modified, not the
values themselves.
[quote:2ad20341d6]Certainly worth testing. Could you try both
--- 3 x 3 median, and
--- 0.7 sd Gaussian weighted median
on the new test file called "bw_03" in
http://topo.math.u-psud.fr/~slc/speckle_samples/
please?
That test file illustrates narrow isthmuses as well as thin curves.
Isthmuses are common in fonts with serifs, and will be thin in
small but vital characters such as second order subscripts in
scanned math.
[/quote:2ad20341d6]
Maybe the medhybrid filter, described in:
http://www.cs.tau.ac.il/~turkel/notes/meanmed.pdf
would give pleasing results?
But aruzinsky's asymmetric Gaussian weight kernels should also
be quite good.
| ImageAnalyst... |
Posted: Wed Jun 17, 2009 5:20 am |
If I gave you MATLAB code, would you be able to run it? I.e., do you
have MATLAB?
| Laurent S.... |
Posted: Wed Jun 17, 2009 8:03 pm |
Hello Jens Dierks,
It seems you are on the right track (best
performance yet) with both "bw_04_least3" and
"bw_04_med9Cen5". I have added all your tests based on
my "bw_04" to
http://topo.math.u-psud.fr/~slc/speckle_samples/
You are too modest in stating:
[quote:f6104a235f]that [ i.e., the test "bw_04_med9Cen5" ]
should come close to the gaussian07 weighted
median
[/quote:f6104a235f]
since it is incomparably better than Aruzinsky's
test "bw_03Median0.7", which uses his sd 0.7
Gaussian weighted median filter.
What tools are you using? Hopefully freely
available?
My original plan was to use nothing but C to
eliminate rather small speckles --- with no other
smoothing/rounding effects whatever. Maybe you are
close to that goal?
Also, I recently wrote:
[quote:f6104a235f]At this point we have three or four good candidates for
despeckling bw scanned print --- the more subtle of which
may also do some desirable smoothing.
[/quote:f6104a235f]
But I see no desirable smoothing (beyond
despeckling) in the tests presented so far. That
suggests we focus on despeckling alone --- seeking
speed, simplicity, and ready availability.
Cheers
Laurent S.
| Jens Dierks... |
Posted: Fri Jun 19, 2009 10:40 am |
Laurent S. wrote:
[quote:dd0d982464]What tools are you using? Hopefully freely
available?
[/quote:dd0d982464]
I use Delphi (Pascal), but if you have some experience
with C and image processing, this shouldn't be a big
problem.
[quote:dd0d982464]My original plan was to use nothing but C to
eliminate rather small speckles --- with no other
smoothing/rounding effects whatever. Maybe you are
close to that goal?
[/quote:dd0d982464]
You can extend the least3 function to greater sizes,
depending on how big the artefacts you want to remove are:
Starting from the middle pixel, count all connected
pixels with the same color until you reach the minimum
count for leaving the value as it is.
Then go on to the next pixel with a different color.
If the minimum count isn't reached, change the color
of the middle pixel (and the next pixels, if they have
the same color).
Because leaving pixels unchanged should be the normal
case, the main procedure consists of reading and skipping pixels.
There are no speed problems, besides the reading and writing of the
files themselves.
And you can do the changes on the fly; there is no need to
make a copy. And when changing pixels,
change all connected pixels, so you don't have to look
in the direction of previously processed pixels, which makes
the code even easier.
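The counting procedure described above can be sketched roughly as follows. This is an illustrative reading of it, not Jens Dierks's Delphi code: the function name, the `min_count` parameter, the 0/1 list-of-rows image format and the default 4-connectivity are assumptions.

```python
def despeckle_min_count(img, min_count, connectivity=4):
    """Flip, in place, every same-colour connected region that has fewer
    than min_count pixels. img is a list of rows of 0/1 values."""
    h, w = len(img), len(img[0])
    steps = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    if connectivity == 8:
        steps += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    for y in range(h):
        for x in range(w):
            colour = img[y][x]
            region = [(y, x)]
            seen = {(y, x)}
            i = 0
            # Grow the region, but stop counting once min_count is reached:
            # this is the normal case, so most pixels cost very little.
            while i < len(region) and len(region) < min_count:
                cy, cx = region[i]
                i += 1
                for dy, dx in steps:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and (ny, nx) not in seen
                            and img[ny][nx] == colour):
                        seen.add((ny, nx))
                        region.append((ny, nx))
            if len(region) < min_count:
                # The whole region was explored and is too small:
                # change all of its pixels on the fly.
                for ry, rx in region:
                    img[ry][rx] = 1 - colour
    return img

if __name__ == "__main__":
    page = [
        [1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
    ]
    for row in despeckle_min_count(page, min_count=4):
        print(row)
```

Because both colours are handled the same way, small white holes inside letters are filled as well as small black specks being removed; restricting the flip to one colour would be a one-line change.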
[quote:dd0d982464]Also, I recently wrote;
At this point we have three or four good candidates for
despeckling bw scanned print --- the more subtle of which
may also do some desirable smoothing.
But, so far, I see no desirable smoothing (beyond
despeckling) in the tests presented so far. That
suggests we focus on despeckling alone --- seeking
speed, simplicity, and ready availability.
[/quote:dd0d982464]
I don't know what the OCR prefers; you can do a Gaussian
convolution + thresholding to smooth and maybe to
make the letters bolder.
Have you tried these things in MATLAB so far?
If it is fast enough, there is no need to program it yourself.
Jens
| Laurent S.... |
Posted: Sat Jun 20, 2009 11:45 pm |
Yesterday I wrote:
[quote:0e5b330e50]I may discuss elimination of bigger blobs at a
later time; most of the methods to be used here
do apply, but enhancements of technique and
reformulation of results seem necessary.
[/quote:0e5b330e50]
Such a discussion seems to fall into place
without a serious hitch provided we agree that
the suppression of a black blob B_0 in general
entails painting in white somewhat more than the
pixels of B_0. Namely we consider the outermost
circle component C of the external rampart
E(B_0) of B_0. The unbounded connected component
of R^2 - C is disjoint from B_0 and hence
homeomorphic to the open annulus C \times R, as
is the connected component of R^2 - B_0
containing C. The other component of R^2 - C
contains B_0 and has closure in R^2 that is a
2-disk D; it contains B_0 and perhaps some of the
other connected components, say B_1, B_2, ..., of
B. Assuming that B_0 turns out to be small, we
will paint white not just B_0 but D \cap B, which
is all of the connected components of B that
happen to lie in D.
One gets a clearer picture of B_1, B_2,... by
considering the frontier loops C', C'',... of
N(B_0) that are distinct from the outermost loop
C. They are the boundaries of disjoint 2-disks
D', D'',... that we call the 'alveoles' of B_0
each disjoint from B_0. The connected components
of B that lie in some one of the alveoles D',
D'', ... are the components of B that get painted
white along with B_0.
This picture is not really useful to our
computer because the painting operation occurs
after it has calculated C but before it has
calculated C', C''. Fortunately, given a
calculation of C there are calculations
(e.g. row-by-row) of the pixels within C
--- well known in computer graphics programs.
Incidentally we have essentially described a slow
one yesterday. Namely use the projection
p_B(C) to paint white the pixels that this singular
path touches. This can be viewed as eroding D and
the collection of pixels in it, strictly reducing
the number of those pixels. One can iterate this
sort of erosion until all pixels in D are painted
white.
This more vigorous blob suppression is
reasonable because each of the other connected
components of B is always no larger than B_0
according to several common measures of size (but
not by all, e.g. perimeter!). The most convenient
measure of size in the present context seems to
be 'bounding box size'. The bounding box of a
compact set in R^2 is the least rectangle
containing it and having sides parallel to the
coordinate axes.
Lemma 1. Let D be a 2-disk embedded in R^2
with boundary loop C. Then the bounding
box \bbox C of C in R^2 is exactly the bounding
box \bbox D of D.\qed
Lemma 2. If X and Y are compacta in R^2
and Y is a neighborhood of X, then
\bbox Y is a neighborhood of \bbox X. \qed
______________
REMARK ON EFFICIENCY: In the algorithm of
yesterday for elimination of blobs of =< 3 pixels
we gained speed by aborting the calculation of
p_B(C) whenever an initial segment touched >= 4 pixels.
A similar trick applies in the present context.
If the bounding box for an initial segment of C
becomes larger than that allowed for a black blob
B_0 whose elimination is being contemplated,
then, by the above lemmas and the fact that
epsilon is arbitrarily small, we immediately know
that \bbox B_0 will also be too big, and we can
forthwith abort efforts to paint D \cap B white.
______________
Putting the above modifications onto the
algorithm described yesterday seems to establish:
ASSERTION.
Let I be a black-white image in the plane
R^2, whose pixels are the unit squares about the
integral points. Suppose that, outside some
bounded rectangle, all pixels are white, or else
all pixels are black.
For this data we have just described an
algorithm that, given a pair of positive integers
(a,b), paints white all the black blobs that
lie within some pixel rectangle of width =< a and
height =< b. It alters no other black pixels and
it alters no white pixel.
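The effect stated in the ASSERTION can be sketched with an ordinary flood fill in place of the rampart-loop tracing described in this post, so this is a swapped-in technique and not the algorithm of the post itself. It keeps the early abort from the REMARK ON EFFICIENCY: the search stops as soon as the bounding box grows too large. The function name and the 0 = white, 1 = black list-of-rows format are assumptions; 8-connectivity is assumed for black blobs because closed unit squares that touch at a corner are connected.

```python
NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                (0, 1), (1, -1), (1, 0), (1, 1)]

def remove_small_black_blobs(img, a, b):
    """Paint white (0), in place, every black (1) blob whose bounding box
    has width <= a and height <= b. No other pixel is altered."""
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1:
                continue
            seen = {(y, x)}      # pixels of the blob found so far
            stack = [(y, x)]
            x0 = x1 = x          # running bounding box
            y0 = y1 = y
            fits = True
            while stack and fits:
                cy, cx = stack.pop()
                for dy, dx in NEIGHBOURS_8:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1
                            and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
                        x0, x1 = min(x0, nx), max(x1, nx)
                        y0, y1 = min(y0, ny), max(y1, ny)
                        if x1 - x0 + 1 > a or y1 - y0 + 1 > b:
                            fits = False   # abort: the blob is too big
                            break
            if fits:
                # The search finished, so `seen` is the whole blob.
                for py, px in seen:
                    img[py][px] = 0
    return img
```

Any other blob lying in an alveole of a small blob fits inside the same bounding box, so it is found and painted white on its own; no separate treatment of the disk D is needed in this formulation. Each aborted search touches only a bounded number of pixels, at the price of restarting from every pixel of a large blob.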
______________
VARIANT 1.
The same of course holds with black and
white exchanged.
______________
Applying this and the first algorithm (or
better, running them interlaced?) yields:
VARIANT 2.
For the same data we have described
algorithms that, given two pairs of positive
integers (a,b) and (a',b'), eliminate all
the black blobs that lie within some pixel
rectangle of width a and height b, and eliminate
all the white blobs that lie within some pixel
rectangle of width a' and height b'.
In this process:
(i) A black blob B_0 not lying in an a \times b
pixel rectangle is modified by filling with black
pixels all the white alveoles in it that lie in
an a' \times b' pixel rectangle. But B_0 is
altered in no other respect; in particular, its
outermost rampart loop and its bounding box are
unchanged.
(ii) A white blob W_0 not lying in an a' \times b'
pixel rectangle is modified by filling with white
the alveoles in it that lie in an
a \times b pixel rectangle. But W_0 is
altered in no other respect; in particular, its
outermost rampart loop and its bounding box are
unchanged.
______________
VARIANT 3. It seems that the taxicab length of the
diagonal of the bounding box can replace the \bbox
size used above. The taxicab length of the diagonal
of an a \times b pixel rectangle (bounding box) is a+b.
Many very different boxes have the
same taxicab diagonal-length.
I hope that all this is not already too
complex to program :)
The blob-enumeration algorithms I have seen
discussed are quite different.
Cheers
Laurent S.
| Hesham... |
Posted: Fri Jul 03, 2009 5:48 am |
Laurent, are you still looking for an algorithm/implementation?
I've implemented very efficient code for another application but I
suppose it can be used to remove 'bad blobs'.
Using integral images and then box filters, you can detect the blobs.
And by using different sizes of box filter you can detect the scale
of the blob as well. Instead of box filters, you can use a small octagon
inside a larger one, which is more isotropic but a bit slower (two
more integral images must be computed, and computing the response
at each pixel needs 6 more additions). For a 640x480 grayscale
image it takes a few milliseconds.
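As a rough illustration of the integral-image idea, here is a minimal sketch for a 0/1 image. The detection rule is an assumption, not Hesham's implementation: a `size` x `size` box is treated as holding an isolated blob when it contains black pixels and the one-pixel ring around it contains none, so that the box sum equals the sum of the enlarged box. The function names are made up, and the octagon variant is not shown.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum over rows y0..y1-1 and columns x0..x1-1, clipped to the image.
    Four lookups, whatever the size of the box."""
    h, w = len(ii) - 1, len(ii[0]) - 1
    y0, y1 = max(y0, 0), min(y1, h)
    x0, x1 = max(x0, 0), min(x1, w)
    if y0 >= y1 or x0 >= x1:
        return 0
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

def remove_isolated_blobs(img, size):
    """Return a copy of img (0 = white, 1 = black) in which every box of
    size x size pixels whose surrounding ring is all white is cleared."""
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    out = [row[:] for row in img]
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            inner = box_sum(ii, y, x, y + size, x + size)
            if inner == 0:
                continue
            outer = box_sum(ii, y - 1, x - 1, y + size + 1, x + size + 1)
            if outer == inner:          # nothing black in the ring
                for yy in range(y, y + size):
                    for xx in range(x, x + size):
                        out[yy][xx] = 0
    return out
```

Running this with increasing values of `size` gives a crude version of the scale detection mentioned above: the smallest `size` at which a blob is cleared bounds its extent.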
In case it may come in handy, next weekend I can work on this for 'bad
blobs' and send you the C code.
| Jens Dierks... |
Posted: Mon Jul 06, 2009 10:22 am |
Laurent S. wrote:
....
[quote:cc90fe2110]In this thread also, Jens Dierks has impressively worked some examples
using median filtering in raw Pascal; that could lead to the requested
open code and binaries, but he has not committed himself (yet :).
Nor estimated processing times. But your methods may be close to his.
[/quote:cc90fe2110]
I thought my explanations would be enough, because the algorithms
are very easy.
You can download a Delphi6 version with sources and a compiled
version for windows:
http://freenet-homepage.de/JDierks/tmp/ScanFilter.zip
I had some errors in the open dialog, but couldn't find the reason;
it's more a beta version for testing.
File format is bitmap, 2 colors.
Best regards,
Jens D.
| Jens Dierks... |
Posted: Mon Jul 06, 2009 3:38 pm |
All times are GMT - 5 Hours