Main Page | Report this Page
Science Forum Index  »  Image Processing Forum  »  Remove speckles in bilevel (bw) image...
Page 2 of 2    Goto page Previous  1, 2

Remove speckles in bilevel (bw) image...

Author Message
Ethan...
Posted: Wed Jul 08, 2009 6:05 am
Guest
I just wrote a function in matlab that removes blobs of 3 pixels or
less with 8 way connectivity that uses brute force:

It cycles through the binary image one pixel at a time, row by row.
If the pixel is a 1 and the sum of the 3x3 area around the pixel is <
4 continue, otherwise go to the next pixel:
If all of the 8 surrounding pixels are 0, then set the pixel to 0
and set flag as done (skip the rest, e.g. while not done, do).
There are 4 patterns of two connected pixel. Check for each pattern
and set flag as done as soon as any pattern matches. For example, if
the pixel to the right to is 1 and the 10 pixels around the pair are
0, then set the two pixels to zero and mark set flag as done (skip the
rest).
Then if the pixel is still not done, do the same thing for all 20
patterns of 3 connected pixels. (Check that the 3 pixels in the
pattern (actually, only the 2 additional pixels) are 1 and the
surrounding pixels are 0.) As soon as any pattern matches, set the 3
pixels to zero, set flag as done and continue to the next pixel.
Otherwise unset flag and go to next pixel.

In order to work at the edges, I add three rows and columns of 0s to
the image and only cycle through the pixels within this frame. At the
end, I remove the frame.
 
Laurent S....
Posted: Wed Jul 08, 2009 8:31 pm
Guest
Hi Jens,

You wrote Mon, 6 Jul 2009 23:38:20 +0200

[quote:0138792b13]You can download a Delphi6 version with sources and a compiled
version for windows:
http://freenet-homepage.de/JDierks/tmp/ScanFilter.zip

Some speed improvements:
http://freenet-homepage.de/JDierks/tmp/ScanFilterNew.zip
[/quote:0138792b13]

Congratulations, you seem to have reached two key goals:

--- adequate speed for production work
--- an available and usable binary executable

I would like to make something approaching a
real-world despeckling test on one or more legacy math
articles that are bridled by copyrights.

But, for that, we need you to provide a batch processing
mode in your "ScanFilter" program.

Perhaps the easiest way is add to your "ScanFilter"
the ability to process a 'batch file' of which
a typical line might be:

page_17.bmp > page_17_filt.bmp

Of course a multi-file "drag-and-drop" mode
would be more user-freindly. But that could be
introduced later.

Cheers,

Laurent S

PS. Have you system access to translation

bw .bmp format <==> bw .tiff group 4 fax format

? If so, prefer that .tiff format.
Most bw scans use that .tiff format
to reduce file bulk.
 
Jens Dierks...
Posted: Thu Jul 09, 2009 10:51 am
Guest
Laurent S.wrote:
[quote:d8306a0e8a]
Hi Jens,
[/quote:d8306a0e8a]
Hi, but why is this posting behind Ethan's message?

Also i dont know why the real world examples are not public, but
the OCR results will be?

....
[quote:d8306a0e8a]Congratulations, you seem to have reached two key goals:

--- adequate speed for production work
--- an available and usable binary executable

I would like to make something approaching a
real-world despeckling test on one or more legacy math
articles that are bridled by copyrights.

But, for that, we need you to provide a batch processing
mode in your "ScanFilter" program.

Perhaps the easiest way is add to your "ScanFilter"
the ability to process a 'batch file' of which
a typical line might be:

page_17.bmp > page_17_filt.bmp
[/quote:d8306a0e8a]
Ok, nearly the same:
call the program with:

ScanFilter inBmpFile outBmpFile Filtermode

where Filtermode is a number:
1 for Median9cw5
2 for Least3
3 for Least4

You can do a executeable batchfile like:
ScanFilter page_17.bmp page_17_filt.bmp 3
ScanFilter page_18.bmp page_18_filt.bmp 3
....

I know, its not really smart to start the program every
time again, but for know i doubt that these functions
are state of the art for real world examples.
And i dont want to waste much more time for free, if
this will not be useable in the end.

This is the update:
http://freenet-homepage.de/JDierks/tmp/ScanFilterBatch.zip

regards,
Jens


[quote:d8306a0e8a]PS. Have you system access to translation

bw .bmp format <==> bw .tiff group 4 fax format
[/quote:d8306a0e8a]
nope
 
Laurent S....
Posted: Sat Jul 11, 2009 12:32 am
Guest
Hi Ethan,

[quote:301206777b]I am going to assume that the image should be the way
it appears in Matlab & photoshop.
[/quote:301206777b]
Correct.

[quote:301206777b]9.12 seconds

6.65 seconds
[/quote:301206777b]
Not good enough for production. OK for research.

[quote:301206777b]the execution of this "algorithm" after it is compiled in C would
probably be much faster. (Probably by more than x10, but I really
don't have enough experience to say.)
[/quote:301206777b]
I would guess that a pefected brute force algorithm in C would be
fastest of any for =< 2 pixel speckles.
But beyond 4 pixels such programming seems to get horrid.

That is why I moved from brute force to (implicit) use of Jordan
curves; that seems to give a very nice though sophisticated algorithm
that would hopefully be quite competitive for eliminating blobs of many
pixels, say >=6.

[quote:301206777b]I read in a paper,
"Three or less connected pixels
[/quote:301206777b]
It is easy to immagine that this problem is open to
attack by massive parallelism.

On the other hand I am confident that excellent solutions
exist without.

Other considerable sources of extra speed are bit twiddling
and assembly language. The use for one byte per pixel is
convenient but wasteful both of RAM and of processing power.

[quote:301206777b]Hope this helps.
[/quote:301206777b]
Indeed it does.

Could you post the despeckled "gutter.tiff".
It is a fair debugging test.

Cheers

Laurent S.
 
Laurent S....
Posted: Sat Jul 11, 2009 7:22 pm
Guest
Hi Jens,

I proposed:

[quote:0922c75983]PS. A switch in your program requesting
white speckle elimination would be very useful
in practice.
[/quote:0922c75983]
Oops, unless I am mistaken median filtering
(even weighted) treats black and white entirely similarly.

So presumably my proposal should read:

| PS. Switches in your program requesting
| only white speckle elimination
| or only black speckle elimination might be
| of some interest.

My original proposal would apply to Ethan's MatLab
program.

Cheers

Laurent S.
 
Ethan...
Posted: Mon Jul 13, 2009 2:36 am
Guest
[quote:60b48a9c1f]Could you post the despeckled "gutter.tiff".
It is a fair debugging test.
[/quote:60b48a9c1f]
I posted the processed image here:
http://gallery.me.com/emontag#100041
 
Jens Dierks...
Posted: Tue Jul 14, 2009 11:49 am
Guest
Hi,
Laurent S. wrote:
....
[quote:0cebcf2f94]So presumably my proposal should read:

| PS. Switches in your program requesting
| only white speckle elimination
| or only black speckle elimination might be
| of some interest.
[/quote:0cebcf2f94]
Yes, i thought of that.
My update is:
http://freenet-homepage.de/JDierks/tmp/ScanFilter09.zip

I changed the LeastX functions to a LeastN function,
what should be equal to bwareaopen in Matlab.

Plus Median with chooseable center weight and some
words about the using of the program.

regards,
Jens D.
 
Laurent S....
Posted: Wed Jul 15, 2009 7:19 pm
Guest
Jens,

[quote:9433e85d90]Have you made the folder visible? I can't see it.
[/quote:9433e85d90]
Sorry, the full URL was:

http://topo.math.u-psud.fr/~slc/jd/ScanFilter09_doc_var_ls

Laurent S.

PS. I anticipate that to achieve good production speed
it will be necessary to avoid killing the
ScanFilter.exe process whenever another file is filtered.
(It is not urgent for small trials.)
 
Laurent S....
Posted: Wed Jul 15, 2009 11:49 pm
Guest
*** graph component algorithm ***

Jens,

For a bw bitmap, you gave a quick description of the median
filter that you are using in the program ScanFilter. But you
gave no description of your "Least N" algorithm for deleting
small components of the union B of black pixels.

To inspire you to say more, I will describe below a 'graph
component algorithm' applicable to any finite graph G, one
that will enumerate the connected components of G containing
=< N vertices by specifying the =< N vertices in each such
component; the corresponding component of G is then the full
subgraph with exactly these vertices.

This 'graph component algorithm' applies to give a possible
"Least N" procedure by choosing the vertices V of the graph
G to be the black pixels and the edges E to be the
adjacencies of (closed, square) black pixels of the given bw
bitmap.

Is this nearly your algorithm? Or nearly Matlab's?

Here is the 'graph component algorithm'. At each stage we
will have a partition \P of the vertices V of G into
disjoint classes.

Initially the partition \P is into classes consisting of a
single vertex.

In general the partition is into disjoint classes as follows:

--- one class called "big league" consisting entirely of
vertices v, for each of which the connected component of G
containing v is known to contain more than N vertices.

--- any number of classes -- to be called clubs --
each of which is connected and consists of =< N vertices.

Given such a partition \P and a club C in it, we examine
the members of the club (a linked list would be suitable)
searching for an edge e leading to a vertex v' outside the
club. Then there are a few cases to handle:

* If v' is in the big league then we amalgamate C into
the big league.

* if v' is in another club C' and together the two clubs
have > N members then both clubs amalgamate into the
"big league".

In both above cases one restarts this process with the
next club, if any. (One can use a linked list of clubs.)

* if v' is in another club C' and together the two clubs
have =< N members then the clubs amalgamate to form a new
club. Linked lists enumerating the two clubs can be chained
together to get one for the amalgamated club.

* if there is no v' outside C then the club is a connected
component of G; it is output and one moves on to the next
club, if any.

When no club is left, all components have been output, and
the listing of small components is complete. The big league
then consists of all vertices not in a component of =< N
vertices.

Cheers

Laurent S.

Observation. The black despeckling algorithm I have
sketched in this thread can be viewed as a involving a very
trivial case of the above 'graph component algorithm'.
Namely its application to the frontier F of the union B of
black pixels. This F is is already a graph -- its edges
being the edges lying in F of square pixels. Each
'negatively oriented' component component F_0 of F
corresponds naturally to a connected component B_0 of B
namely the one containing F_0. And conversely, given B_0,
its F_0 is the 'outer' frontier component of B_0. The
'graph component algorithm' above becomes quite trivial for
the graph F since its components are essentially canonical
cyclic linked lists, coded for example as 'turtle tracks'.
 
Jens Dierks...
Posted: Thu Jul 16, 2009 6:55 am
Guest
addition:

You described the options of the function:
" Least N : 1..100 = N, minimum pixelcount of connected blobs to stay. "

Since "blob" is defined here as all (8-way) connected pixels having the
same color, the desciption above is a bit misleading?
Because blobs arent connected, else it would be one blob...
N, minimum count of connected pixels to stay(?)
 
Laurent S....
Posted: Thu Jul 16, 2009 11:15 am
Guest
Jens,

Language is always slightly ambiguous like programs are
always slightly buggy. We need to debug the verbiage.
Am trying to contact you by email/telephone.
Will be back late -- after dinner.

Cheers

Laurent S.
 
Laurent S....
Posted: Thu Jul 23, 2009 8:55 pm
Guest
Thanks go to Steve Eddins for showing how a 'oneliner'
in MatLab can eliminate small (=< 4pixel) blobs in bw scans:

[quote:b742bfe796]This code removes both black and white blobs:

bw2 = bwareaopen( ~bwareaopen( ~bw, 4 ), 4);

It runs in about 1.67 seconds on my computer (computer
specs posted previously).

CPU: x86 Family 6 Model 14 Stepping 8, GenuineIntel
The measured CPU speed is: 1815 MHz
Number of processors: 2
RAM: 2046 MB
Swap space: 4959 MB
Microsoft Windows XP

The result is posted here:

http://blogs.mathworks.com/images/steve/2009/gutter_processed_2.tif
[/quote:b742bfe796]
Perfect results from standard commerclal functions!


Jens Dierks' program ScanFilter at

[quote:b742bfe796]http://freenet-homepage.de/JDierks/tmp/ScanFilter09.zip
[/quote:b742bfe796]
as announced Tue, 14 Jul 2009 19:49:51 +0200, is like home brew;
rather special -- but heady stuff. It runs here at (very
approximately) 10 times the speed, on the same file, giving
the same result.

Jens is now improving his nice user interface and batch mode.
That will permit a more accurate speed comparison. And also
massive use; I anticipate that, when the program is stable,
numerous public scientific literature sites will use it to
clean up their freely available legacy literature scans.

Cheers

Laurent S.
 
Laurent S....
Posted: Sat Aug 01, 2009 12:27 am
Guest
** 4-connectity blobs **

I have been inspecting results of despeckling bw scans
of scientific works using the remarkable utility
"ScanFilter" of Jens Dierke, that this thread has been fortunate
enough to instigate:

Jens> My update is:
[quote:854e8f79fe]http://freenet-homepage.de/JDierks/tmp/ScanFilter09.zip
[/quote:854e8f79fe]
Examples have recently led me to recommend elimination of a
'finer' sort of blob that should perhaps be called
"4-connectivity blob".

Here I will just make this notion clear and exhibit some of
the examples I have been looking at.

WHAT ARE 4-CONNECTIVITY BLOBS?

So far I have been discussing elimination small black or
white blobs (or speckles) in a black-white (=bw) bitmap on
the standard (infinite) grid of square pixels.

Let's focus first on white. By definition, a white blob has
so far been understood to be a (path-)connected component of
the union of all (square) white pixels. The path connecting
two pixels of the blob is allowed to pass through the corners
of the square pixels. This connectivity notion is also known
as 8-connectivity since a pixel of the (infinite square) grid
has 8 immediate neighbors. The white pixels are thus
partitioned into disjoint 8-connectivity blobs. Up to now I
have called them simply 'white blobs'. Similarly for black.

But now, there is also a (known!) notion of 4-connectity
blob; it arises if one restricts the connecting paths to
avoid the corners of all pixels. Each pixel has then 4
immediate neighbors rather than 8. There results a partition
of the white pixels into 4-connectity blobs. Every white
8-connectivity blob is thus partitioned into white
4-connectity blobs (one or more).

To be quite clear the original notion of blob I have been
using should now be called an 8-connectivity blob (black or
white).

Example: In the bitmap defined by the 8 x 8 checkerboard
(extended by white to infinity), there is exactly one black
8-connectivity blob consisting of all 32 black pixels. That
one blob breaks up into 32 black 4-connectivity blobs
consisting of one pixel each. There is just one white
8-connectivity blob and it extends to infinity. There are
(look carefully!) 17 white 4-connectivity blobs, one extending
to infinity and 16 consisting of one pixel each.


EXAMPLES FOR ELIMINATION OF SMALL WHITE 4-CONNECTIVITY BLOBS.

I have discussed elimination of 8-connectivity blobs ((black
or white) in bw bitmaps and Jens' ScanFilter is currently
a supremely fast tool for this.

Here are examples suggesting that elimination of white
4-connectivity blobs is desirable in bw scans of scientific
articles; see the directories "4-conn_w_char_bmp"
and "4-conn_w_fig_bmp" at:

http://topo.math.u-psud.fr/~slc/speckle_samples/

These two directories exhibit small white 4-connectivity
blobs that are not eliminated by the 8-connectivity
despeckling so far implemented by Jens' ScanFilter utility. The
first shows blobs in characters, the second shows blobs in
parts of line figures.

Where scans of scientific works without greytones (usually in
photos) are concerned, I have yet to encounter a small white
4-connectivity blob that should NOT be eliminated. In
other words, this new sort of despeckling seems safe.
(So far :)

Such white blobs can often be eliminated using suitable
median filtering, but perhaps not so simply cleanly and safely.

I hope Jens will easily add elimination of (small)
4-connectivity blobs to his ScanFilter utility!

Cheers

Laurent S.
 
Laurent S....
Posted: Sat Aug 01, 2009 11:27 pm
Guest
** Correction

Concerning the bw bitmap defined by the 8 x
8 standard bw checkerboard (extended by white
to infinity) I carelessly wrote on Sat, 01 Aug
2009 08:27:03 +0200 :

[quote:0766925441]There are (look carefully!) 17 white
4-connectivity blobs, one extending to infinity
and 16 consisting of one pixel each.
[/quote:0766925441]
when in fact, there are 19 white 4-connectivity
blobs:- one extending to infinity and 18
consisting of one pixel each. Those 18 are the
18 white pixels of the central 6 x 6 square
region of the 8 x 8 board.

Laurent S.
 
Laurent S....
Posted: Wed Nov 04, 2009 6:42 pm
Guest
*** ScanFilter version 2009v1.2 ***

Greetings all,

Jens Dierks (= JD) has released a new public
version of his free ScanFilter utility;
it is ScanFilter version 2009v1.2.

I have written an accompanying user guide, and some
background material on the 3x3 weighted median filters and
on the the "Least N" color component filters that
ScanFilter implements.

I will be managing the internet distribution site
for ScanFilter at:

http://lcs98.free.fr/soft/scanfilter/

Here are the opening paragraphs of
"ScanFilter_Guide.txt":

| The program ScanFilter is designed to
| efficiently (albeit incompletely) remove 'speckles'
| from bw (= black and white = bilevel) scanned images
| of print. The term 'speckle' here refers to a small
| collection of pixels with a common color (b or w),
| which the program user considers inapproptiate and
| hence wants to switch the color. Such "despeckling"
| is typically a preliminary step in preparing scanned
| images of print for onscreen viewing. It reduces file
| bulk in all compressed formats; it also increases
| image quality when done with care. Despeckling is
| often urgent when the original scan files were bw
| rather then grayscale, the degree of urgency
| depending on the scanner and its settings.
|
| ScanFilter currently operates under the
| Microsoft Windows operating system only. It combines
| a neat interactive user interface with the capacity
| to process many thousands of pages per hour. Its
| speed comes from occasional use of assembly language,
| and also some 64 bit vector processing features that
| are collectively called "MMX". The latter vector
| features were introduced on Intel's Pentium
| microprocessors, and now exist on many others,
| including AMD's K6 and later. Furthermore, ScanFilter
| has good input-output (= i/o) capabilities thanks to
| the free "LibTiffDelphi" code library that was
| accessed by JD from his Delphi Pascal programming
| environment; thus it can even serve as as a format
| converter.
|
| A last-minute addition to this version is a
| bitmap inspection tool intended to collaborate with
| other bw bitmap filtering utilities. It is a setup
| for visual scrutiny of differences between
| any two bw bitmaps on the same pixel grid. It is
| accessed by "drag-and-drop" onto the icon of the
| ScanFilter program.

After his considerable efforts to perfect
ScanFilter in correspondence with me throughout
August, September, and October, Jens has indicated
that he is now 100% focused on other programming
projects. Thus, this release marks an indefinite pause
in his devellopment of ScanFilter. However Jens has
generously provided the complete Delphi6 Pascal
sourcecode for 2009v1.2 allowing very liberal
conditions on other programmers' use thereof.

My ScanFilter site above will track and/or post both
eventual updates and related devellopments. I hope
members of this newsgroup will be offering comments,
suggestions, and announcements related to ScanFilter and
its documentation.

The present version of ScanFilter merely scatches
the surface of what can be called 'morphological bw image
processing'. But I believe it offers a very good basis
for interesting and useful devellopments in that
direction.

Thanks to all who have contributed to this thread,
and a special salute Jens' for his excellent programming.

Laurent S.
 
 
Page 2 of 2    Goto page Previous  1, 2
All times are GMT - 5 Hours
The time now is Sun Dec 06, 2009 6:37 pm