
Why Does This Script Run So Slow?...

Eric Robinson...
Posted: Wed Nov 18, 2009 10:43 am
Guest
I have a directory with 2 million files. This script runs pretty fast...

#!/bin/bash
j=0
for i in *
do
let j+=1
echo "$j: $i"
done

The next script runs a little slower....

#!/bin/bash
j=0
for i in *
do
let j+=1
the_file=$i
echo "$j: $the_file"
done

But THE NEXT script runs ridiculously slow...

#!/bin/bash
j=0
for i in *
do
let j+=1
the_file=`echo $i`
echo "$j: $the_file"
done

And the FINAL script (which represents the functionality I need) runs so
slow that it is completely unusable. Why the big difference?

#!/bin/bash
j=0
for i in *
do
let j+=1
the_file_lower_case=`echo $i | tr [:upper:] [:lower:]`
echo "$j: $the_file_lower_case"
done
 
Robert Newson...
Posted: Wed Nov 18, 2009 6:02 pm
Guest
Eric Robinson wrote:
....
Quote:
But THE NEXT script runs ridiculously slow...
....
the_file=`echo $i`
....
And the FINAL script (which represents the functionality I need) runs so
slow that it is completely unusable. Why the big difference?
....
the_file_lower_case=`echo $i | tr [:upper:] [:lower:]`
....

At a guess:


The `command` (backtick) bits. Each of those runs in a separate child
process: the shell has to fork() for each one, and any external command
such as tr then also needs an exec().
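
A rough way to see that cost in isolation is to time the same assignment with and without the backticks. This is only a sketch: the function names assign_plain and assign_subst are made up for illustration, and the numbers will differ from machine to machine.

```shell
#!/bin/bash
# Hypothetical helpers for timing; the names are not from the thread.

# Plain assignment: handled entirely inside the shell, no child process.
assign_plain() {
    local count=$1 value=$2 copy n=0
    while [ "$n" -lt "$count" ]; do
        copy=$value
        n=$((n + 1))
    done
    echo "$copy"
}

# Command substitution: the shell forks a child on every iteration.
assign_subst() {
    local count=$1 value=$2 copy n=0
    while [ "$n" -lt "$count" ]; do
        copy=`echo "$value"`
        n=$((n + 1))
    done
    echo "$copy"
}

# Example comparison (timings vary by machine):
#   time assign_plain 100000 SomeFile.TXT
#   time assign_subst 100000 SomeFile.TXT
```

Both functions end with the same value; only the way it is copied differs, so any gap in the timings should come from the per-iteration child process.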

In looping over the directory of 2+ million entries, the ridiculously
slow script is creating 2+ million child processes (one doing
"echo <filename>" for each filename); the unusable script is creating at
least 4+ million child processes (for each filename, one doing
echo <filename> with its output piped into another running tr). Creating
child processes isn't free and creating 2+ million of them is certainly
going to have some impact, as are the 2+ million runs of tr along
with their startup.
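
If the shell is bash 4.0 or later (an assumption; older versions lack this expansion), the lower-casing can be done by the shell itself with ${i,,}, so no child process is needed per file. A minimal sketch of the original loop rewritten that way, with list_lower as a made-up function name:

```shell
#!/bin/bash
# Assumes bash 4.0 or later for the ${var,,} case-modification expansion.
# list_lower is a hypothetical name, not something from the thread.

# Print "N: lowercased-name" for every entry in the current directory
# without starting a child process per file.
list_lower() {
    local j=0 i
    for i in *; do
        j=$((j + 1))
        # printf is a builtin, and it copes with names starting with "-".
        printf '%s: %s\n' "$j" "${i,,}"
    done
}
```

Run it as "cd /the/directory && list_lower". On a shell without ${i,,}, feeding the whole listing through a single tr process would be the nearest equivalent.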

You may be much better off using awk, e.g.:

$ /bin/ls | awk '{print "mv -n \"" $0 "\" \"" tolower($0) "\""}' | /bin/sh

[I'm no awk expert, just worked that out from the man page.]

which uses 3 processes: the first lists the files, the second creates a
command to rename each file to lower case (using the mv command) and the
third then executes the created commands. The remaining cost in processes
is that /bin/mv will still be started as a child process 2+ million times.
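
One way to cut down the number of mv processes, sketched here on the assumption of bash 4.0 or later (for ${f,,}), is to compute the lower-case name inside the shell and start mv only when the name actually changes. The function name rename_lower is made up for illustration. Because each filename is passed to mv as an argument rather than pasted into a command line for /bin/sh, names containing quotes or $ are not re-interpreted.

```shell
#!/bin/bash
# Assumes bash 4.0 or later for ${var,,}; rename_lower is a hypothetical name.

# Rename every entry in the current directory to its lower-case form,
# starting mv only for names that are not already lower case.
rename_lower() {
    local f lower
    for f in *; do
        lower=${f,,}
        if [[ $f != "$lower" ]]; then
            # -n: do not overwrite an existing file; --: end of options.
            mv -n -- "$f" "$lower"
        fi
    done
}
```

This still costs one mv per file that needs renaming, so it only helps when many names are already lower case.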

The only way I can think of to rename the files FAST is to use a C
program, to either

a) replace the shell part of the above pipeline (along with a modified
awk print command) by reading 2 filenames from stdin and using rename(2)
to do it; or

b) have the C program do the whole lot: scan the directory, convert to
lower case, rename as appropriate - then it'd only be one process
regardless of the number of files
 
 
All times are GMT - 5 Hours