time_hash_map_fill.cpp is a micro-benchmark designed to highlight aspects of the concurrent resizing algorithm of the hash tables. It was derived from the Count Strings example, which counts the number of unique words. To take synchronization on the counters out of the picture, it was simplified to build just a set of unique numbers from an input array. The array is filled evenly, using the pseudo-random number generator from the standard C library, for various proportions of unique numbers. For example, with 5% unique numbers, each number is repeated 20 times on average, so 5% of the operations are actual insertions and 95% are just lookups. However, new keys occur more often at the beginning than at the end. In addition, the size of the source array is scaled with the input rate so that every run ends with the same number of unique keys, which excludes cache effects from the equation.
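The input scheme described above can be sketched roughly as follows. This is an illustration only, not the benchmark's actual code: the names make_input, fill_set, and FillStats are hypothetical, and a single-threaded std::unordered_set stands in for the concurrent hash table, so the sketch shows the insert/lookup mix but not concurrent resizing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from the benchmark source): builds an input array
// whose values are drawn roughly evenly from `unique_keys` distinct numbers.
// The array length is scaled by the requested percentage so that every rate
// ends with about the same number of unique keys.
// Assumes unique_keys does not exceed RAND_MAX, which may be as low as 32767.
std::vector<int> make_input(std::size_t unique_keys, int unique_percent, unsigned seed) {
    assert(unique_keys > 0 && unique_percent > 0 && unique_percent <= 100);
    const std::size_t length = unique_keys * 100 / unique_percent;
    std::srand(seed);  // standard C library generator, as in the description
    std::vector<int> data(length);
    for (std::size_t i = 0; i < length; ++i)
        data[i] = std::rand() % static_cast<int>(unique_keys);
    return data;
}

// Hypothetical result type: actual insertions seen in each half of the input.
struct FillStats {
    std::size_t first_half_inserts;
    std::size_t second_half_inserts;
};

// Fills a set from the input and records where the real insertions happen.
// Every element that is already present counts as a plain lookup.
FillStats fill_set(const std::vector<int>& data) {
    std::unordered_set<int> table;
    FillStats stats{0, 0};
    const std::size_t half = data.size() / 2;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const bool inserted = table.insert(data[i]).second;
        if (!inserted)
            continue;  // key already known: this operation was only a lookup
        if (i < half)
            ++stats.first_half_inserts;
        else
            ++stats.second_half_inserts;
    }
    return stats;
}
```

For example, make_input(10000, 5, 42) yields an array of 200000 values, and passing it to fill_set should show almost all of the actual insertions landing in the first half, which is the "more new keys at the beginning" effect noted above.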
This benchmark outputs results in Excel* and HTML file formats by default. To generate a text (CSV) file instead, set the STAT_FORMAT=pivot-csv environment variable. To change the default table size, set TABLE_SIZE.
src$ make time_hash_map_fill args=-v STAT_FORMAT=pivot-csv TABLE_SIZE=250000
Or, to get statistics from different runs:
src$ make time_hash_map_fill TABLE_SIZE=50000 run_cmd="bash ../../src/perf/run_statistics.sh"
Then sort the resulting CSV file and write it to the perf directory:
src$ sort -t , -k 1dr,2 -k 3n,4 -k 7n,7 ../build/{scrambled_path}/time_hash_map_fill.csv -o perf/time_hash_map_fill.csv
Here, field 7 is the "Column" field, which contains the input rates, because run_statistics.sh adds the hostname and the run number as fields 5 and 6. Now, to build the GIF diagram, run:
perf$ pl -maxrows 200000 -maxfields 1500000 -maxvector 1200000 -gif -scale 1.8 time_hash_map_fill.html
#setifnotgiven NAMES = $makelist("1.CHMap 2.CUMap 3.OLD")
#setifnotgiven LABLESIZE = 0.06
#proc settings
encodenames: yes
units: cm
#proc getdata
file: time_hash_map_fill.csv
fieldnameheader: yes
delim: comma
showdata: no
select: @@Mode = insert
pf_fieldnames: Name Mode Threads Value
filter:
##print @@Name,"@@Items on @@Column",@@3,@@Value
#endproc
#proc page
pagesize: 70 50
tightcrop: yes
#endproc
#proc processdata
action: summary
fields: Name Mode Threads
valfield: Value
fieldnames: Name Mode Threads Average sd sem n_obs Min Max
showdata: no
#proc categories
axis: x
datafield: Mode
#proc areadef
title: Throughput on Insert operation
titledetails: size=14 align=C
areaname: slide
xscaletype: categories
xautorange: datafield=Mode
xaxis.stubs: usecategories
xaxis.label: Threads across table sizes and % of input rates
// yrange: 0 70
yautorange: datafield=Max,Min
yaxis.stubs: inc
yaxis.label: ops/ns
// yaxis.stubformat: %3.1f
autowidth: 1.1
autoheight: 0.07
frame: yes
#for LABEL in @NAMES
#set NLABEL = $arithl(@NLABEL+1)
#set COLOR = $icolor( @NLABEL )
#proc legendentry
label: @LABEL
sampletype: color
details: @COLOR
#procdef catlines
select: @Name = @LABEL
catfield: Mode
subcatfield: Threads
subcats: auto
plotwidth: 0.8
#saveas C
#proc catlines
#clone C
dpsymbol: shape=square radius=@LABLESIZE style=solid color=@COLOR
valfield: Average
errfield: sd
#proc catlines
#clone C
valfield: Max
dpsymbol: shape=triangle radius=@LABLESIZE style=solid color=@COLOR
#proc catlines
#clone C
valfield: Min
dpsymbol: shape=downtriangle radius=@LABLESIZE style=solid color=@COLOR
#endloop
#proc legend
location: 3.2 max
seglen: 0.2
#endproc