On Jan 18, 10:23 pm, "pale...@[EMAIL PROTECTED]" <pale...@[EMAIL PROTECTED]> wrote:
> A few months ago I tested Berkeley DB with various configurations (B-tree
> or Hash, $str or md5($str) for the key) and chose B-tree with md5($str)
> for the key.
> But now I have tested again and got these results:
> inserting 3041977 records into an empty DB
> 1. key - string with ~72 chars ([A-Z0-9_-|]{1,72}).
> 200s - Btree
> 2000s - Hash
> 2. key - md5(string) 16 bytes
> 900s - Btree
> 1000s - Hash
> 3. key - md5_hex(string) 32 chars ([A-F0-9]{32}).
> 1000s - Btree
> 1200s - Hash
>
> Why is that?
>
> I use a very simple script:
> #!/usr/bin/perl
> use strict;
> use warnings;
> use 5.8.8;
> use BerkeleyDB;
> use Benchmark::Timer;
> use Digest::MD5 qw/md5_hex md5/;
>
> my $module = "BerkeleyDB::$ARGV[2]";
>
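> # Open (or create) the database; report BerkeleyDB's own error on failure.
> my $bdbp = $module->new(-Filename => $ARGV[1], -Cachesize => 100000000,
>     -Flags => DB_CREATE)
>     or die "Can't open '$ARGV[1]' as $module: $BerkeleyDB::Error\n";
> open(FH,'<',$ARGV[0]) or die "Can't open input file '$ARGV[0]': $!\n";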
> my $ST;
>
> my $t = Benchmark::Timer->new();
> $t->start('ALL');
>
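> while (<FH>) {
>     chomp;
>     my $UUID = uc($_);
>     # Alternative keys: md5_hex gives 32 hex chars, md5 gives 16 raw bytes.
>     # my $status = $bdbp->db_put(md5_hex($UUID),$UUID,DB_NOOVERWRITE);
>     # my $status = $bdbp->db_put(md5($UUID),$UUID,DB_NOOVERWRITE);
>     my $status = $bdbp->db_put($UUID,$UUID,DB_NOOVERWRITE);
>     # db_put returns 0 on success; duplicates are expected with DB_NOOVERWRITE.
>     warn "db_put failed: $status\n" if $status != 0 && $status != DB_KEYEXIST;
> }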
>
> close(FH);
> undef $bdbp;
>
> $t->stop('ALL');
> print $t->report;
It seems the Hash method performs worse than Btree when the data set is
small, but I don't know how large the data set has to be to make
Hash the better choice.

