$str or md5($str) use as key?

palexvs@gmail.com (Guest)
Posted: Fri Jan 18, 2008 2:23 pm

A few months ago I tested Berkeley DB with various configurations (B-tree
or Hash, $str or md5($str) as the key) and chose B-tree with md5($str)
as the key.
But now I have tested again and got the following results for inserting
3041977 records into an empty DB:
1. key = string of ~72 chars ([A-Z0-9_|-]{1,72})
   200s - Btree
   2000s - Hash
2. key = md5(string), 16 bytes
   900s - Btree
   1000s - Hash
3. key = md5_hex(string), 32 chars ([a-f0-9]{32})
   1000s - Btree
   1200s - Hash
Why is that?
I used a very simple script:
#!/usr/bin/perl
# Arguments: <input file> <db file> <Btree|Hash>
use strict;
use warnings;
use 5.008008;
use BerkeleyDB;
use Benchmark::Timer;
use Digest::MD5 qw/md5_hex md5/;

# Access method is chosen on the command line: BerkeleyDB::Btree or BerkeleyDB::Hash
my $module = "BerkeleyDB::$ARGV[2]";
my $bdbp = $module->new(-Filename => $ARGV[1], -Cachesize => 100000000,
                        -Flags => DB_CREATE)
    or die "Can't open '$ARGV[1]': $BerkeleyDB::Error\n";
open(FH, '<', $ARGV[0]) or die "Can't open input file: $ARGV[0]\n";
my $t = Benchmark::Timer->new();
$t->start('ALL');
while (<FH>) {
    chomp();
    my $UUID = uc($_);
    # Exactly one of the three key variants is enabled per run:
    # my $status = $bdbp->db_put(md5_hex($UUID), $UUID, DB_NOOVERWRITE);
    # my $status = $bdbp->db_put(md5($UUID), $UUID, DB_NOOVERWRITE);
    my $status = $bdbp->db_put($UUID, $UUID, DB_NOOVERWRITE);
}
close(FH);
undef $bdbp;    # close the DB before stopping the timer
$t->stop('ALL');
print $t->report;
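
For reference, the three key variants in the test differ only in how the key is derived from the input string: the string itself, the 16-byte binary digest from md5(), or the 32-character hex digest from md5_hex(). A minimal sketch of that difference, using only Digest::MD5; the sample string and the helper name key_variants are made up for illustration and are not part of the benchmark script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw/md5 md5_hex/;

# Hypothetical sample key, not taken from the real input file.
my $str = uc('sample_key-0001|abc');

# Hypothetical helper: returns the three key variants compared in the test.
sub key_variants {
    my ($s) = @_;
    return (
        raw     => $s,             # the string itself, variable length
        md5     => md5($s),        # binary digest, always 16 bytes
        md5_hex => md5_hex($s),    # hex digest, always 32 characters
    );
}

my %keys = key_variants($str);
printf "%-8s %d bytes\n", $_, length $keys{$_} for qw/raw md5 md5_hex/;
```

Both digest forms have a fixed length regardless of the input, while the raw key length follows the input line.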

Guest
Posted: Mon Jan 21, 2008 6:54 am

It seems the hash method performs worse than btree when the data set is
small, but I don't know how large the data set has to be for hash to
become the better choice.
All times are GMT