$str or md5($str) use as key?

palexvs@gmail.com
Posted: Fri Jan 18, 2008 2:23 pm
Guest
A few months ago I tested Berkeley DB with various configurations (B-tree
or Hash, $str or md5($str) as the key) and chose B-tree with md5($str)
as the key.
But now I have tested again and got the following results
when inserting 3041977 records into an empty DB:
1. key - string of up to 72 chars ([A-Z0-9_|-]{1,72}).
200s - Btree
2000s - Hash
2. key - md5(string) 16 bytes
900s - Btree
1000s - Hash
3. key - md5_hex(string) 32 chars ([A-F0-9]{32}).
1000s - Btree
1200s - Hash
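
For reference, the three key forms compared above can be produced as in the short sketch below. It only shows how the keys differ in length and does not touch Berkeley DB. The sample string is made up, since the real input file is not shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex);

# Hypothetical sample key; the real input file is not part of the post.
my $str = uc('sample_key-0001|abc');

my %keys = (
    raw     => $str,             # variant 1: the string itself
    md5     => md5($str),        # variant 2: 16 raw bytes
    md5_hex => md5_hex($str),    # variant 3: 32 hex characters
);

# Print the length of each key form.
printf "%-8s %2d bytes\n", $_, length $keys{$_} for sort keys %keys;
```

Note that md5_hex from Digest::MD5 returns lowercase hex digits, so the stored keys are lowercase unless they are uppercased separately.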

Why is that?

I used this very simple script:
#!/usr/bin/perl
use strict;
use warnings;
use 5.8.8;
use BerkeleyDB;
use Benchmark::Timer;
use Digest::MD5 qw/md5_hex md5/;

# Arguments: <input file> <db file> <Btree|Hash>
my ($input, $dbfile, $type) = @ARGV;
my $module = "BerkeleyDB::$type";

# -Cachesize is given in bytes (about 100 MB here).
my $bdbp = $module->new(-Filename => $dbfile, -Cachesize => 100000000,
    -Flags => DB_CREATE)
    or die "Can't open '$dbfile' as $module: $BerkeleyDB::Error\n";
open(my $fh, '<', $input) or die "Can't open input file '$input': $!\n";

my $t = Benchmark::Timer->new();
$t->start('ALL');

while (my $line = <$fh>) {
    chomp $line;
    my $UUID = uc($line);
    # Alternative key forms for tests 3 and 2:
    # my $status = $bdbp->db_put(md5_hex($UUID), $UUID, DB_NOOVERWRITE);
    # my $status = $bdbp->db_put(md5($UUID), $UUID, DB_NOOVERWRITE);
    # $status is 0 on success, DB_KEYEXIST for a duplicate key (not checked here).
    my $status = $bdbp->db_put($UUID, $UUID, DB_NOOVERWRITE);
}

close($fh);
undef $bdbp;    # closes the database inside the timed region

$t->stop('ALL');
print $t->report;
 
Guest
Posted: Mon Jan 21, 2008 6:54 am
It seems the Hash method performs worse than Btree when the data set is
small, but I don't know how large the data set has to be for Hash to
become the better choice.
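
One factor that may also play a role, and this is an assumption the thread does not confirm: if the input file is sorted or nearly sorted, the raw string keys reach the Btree in key order, while MD5 digests of the same strings arrive in effectively random order. The sketch below only illustrates that digesting scrambles the key order. It does not measure Berkeley DB, and the generated keys are hypothetical stand-ins for the real input.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical, already-sorted input keys (stand-ins for the real file).
my @input = map { sprintf 'KEY_%06d', $_ } 1 .. 1000;

# Count how many consecutive pairs of keys are in ascending byte order.
sub ascending_pairs {
    my @k = @_;
    return scalar grep { $k[$_ - 1] lt $k[$_] } 1 .. $#k;
}

my $raw_in_order = ascending_pairs(@input);
my $md5_in_order = ascending_pairs(map { md5($_) } @input);

printf "raw keys: %d of %d pairs ascending\n", $raw_in_order, $#input;
printf "md5 keys: %d of %d pairs ascending\n", $md5_in_order, $#input;
```

With sorted input every raw pair is ascending, while only about half of the digest pairs are. Whether this explains the timings above depends on how the real input file is ordered, which the post does not say.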
 
 
All times are GMT