This patch regains performance in helgrind that was lost with revision 15207,
which reduced memory use by doing SecMap GC but slowed down some workloads
(typically, workloads doing a lot of malloc/free).

A significant part of the slowdown came from clearing the filter,
which was not optimised for big ranges: the filter was cleared byte
by byte up to an 8-byte alignment, then 8 bytes at a time.

With the patch, the filter clear is done the following way:
   * all the bytes till 8 alignment are done together
   * then 8 bytes at a time till filter_line alignment (32 bytes)
   * then 32 bytes at a time.
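
The three steps above can be sketched as a split of an address range into
phases. This is only an illustration, not the helgrind code: ClearPlan and
plan_clear are hypothetical names, and the handling of the bytes left over
after the last whole 32-byte line is an assumption, as the text above does
not describe it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32u   /* filter line size, as stated above */

/* Hypothetical summary of how a clear of [a, a+len) is split up. */
typedef struct {
    size_t head_bytes;   /* bytes before the first 8-aligned address   */
    size_t head_words;   /* 8-byte steps before the first 32-aligned   */
    size_t full_lines;   /* whole 32-byte lines                        */
    size_t tail_bytes;   /* leftover after the last line (assumption)  */
} ClearPlan;

static ClearPlan plan_clear(uintptr_t a, size_t len)
{
    ClearPlan p = {0, 0, 0, 0};
    size_t total = len;

    /* Phase 1: all bytes up to 8-byte alignment, handled in one go. */
    size_t n = (size_t)((0 - a) & 7u);
    if (n > len) n = len;
    p.head_bytes = n;
    a += n;
    len -= n;

    /* Phase 2: 8 bytes at a time until the address is line aligned. */
    while (len >= 8 && (a & (LINE_BYTES - 1)) != 0) {
        p.head_words++;
        a += 8;
        len -= 8;
    }

    /* Phase 3: whole 32-byte lines. If phase 2 stopped early because
       fewer than 8 bytes remained, len / LINE_BYTES is 0 here. */
    p.full_lines = len / LINE_BYTES;
    p.tail_bytes = len - p.full_lines * LINE_BYTES;

    /* Every byte of the range is accounted for exactly once. */
    assert(p.head_bytes + 8 * p.head_words
           + LINE_BYTES * p.full_lines + p.tail_bytes == total);
    return p;
}
```

For example, a 100-byte range starting at 0x1003 splits into 5 head bytes,
three 8-byte steps, two whole lines and 7 leftover bytes.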

Moreover, as the filter cache is small (1024 lines of 32 bytes),
clearing the filter for ranges bigger than 32KB was uselessly checking
the same entry several times. This is now avoided by using a range
check rather than a tag equality check.
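
The difference between the two checks can be sketched on a toy direct-mapped
cache. This is a simplified model under stated assumptions, not the helgrind
data structure: ToyFilter, the tag layout, INVALID_TAG and the requirement
that the range is line aligned are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES  32u
#define NUM_LINES   1024u
#define INVALID_TAG ((uintptr_t)1)   /* never a line-aligned address */

/* Toy model: one tag (line-aligned address) per cache entry. */
typedef struct { uintptr_t tags[NUM_LINES]; } ToyFilter;

static size_t line_index(uintptr_t line_addr)
{
    return (size_t)((line_addr / LINE_BYTES) % NUM_LINES);
}

static void toy_init(ToyFilter *f)
{
    for (size_t i = 0; i < NUM_LINES; i++) f->tags[i] = INVALID_TAG;
}

static void toy_insert(ToyFilter *f, uintptr_t addr)
{
    uintptr_t line = addr & ~(uintptr_t)(LINE_BYTES - 1);
    f->tags[line_index(line)] = line;
}

/* Tag equality: one probe per 32-byte line of the range, so ranges
   above 32KB probe the same entry more than once. Returns probes. */
static size_t clear_by_tag_equality(ToyFilter *f, uintptr_t a, size_t len)
{
    size_t probes = 0;
    assert(a % LINE_BYTES == 0 && len % LINE_BYTES == 0);
    for (uintptr_t line = a; line < a + len; line += LINE_BYTES) {
        size_t i = line_index(line);
        if (f->tags[i] == line) f->tags[i] = INVALID_TAG;
        probes++;
    }
    return probes;
}

/* Range check: visit each entry once and drop it if its tag lies
   inside [a, a+len). Returns probes, always NUM_LINES. */
static size_t clear_by_range_check(ToyFilter *f, uintptr_t a, size_t len)
{
    size_t probes = 0;
    assert(a % LINE_BYTES == 0 && len % LINE_BYTES == 0);
    for (size_t i = 0; i < NUM_LINES; i++) {
        uintptr_t t = f->tags[i];
        if (t >= a && t - a < len) f->tags[i] = INVALID_TAG;
        probes++;
    }
    return probes;
}
```

In this model, clearing a 128KB range costs 4096 probes with tag equality and
1024 with the range check, and both leave the cache in the same state. The
range check only pays off once the range covers more lines than the cache
holds.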

As the new filter clear is significantly more complex than the previous simple
algorithm, the old algorithm is kept and used to check the new algorithm
when CHECK_ZSM is defined as 1.
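
The general pattern of keeping a simple algorithm around as a checker can be
sketched as below. This is a toy, not the helgrind code: ToyState and the
clear_* functions are hypothetical, the fast path is a simplified stand-in,
and the way CHECK_ZSM is wired in here (enabled by default so that the sketch
exercises the check) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef CHECK_ZSM
#define CHECK_ZSM 1   /* enabled here so the sketch runs the check */
#endif

#define TOY_BYTES 256u

/* Toy state: one flag per byte, 1 = cached, 0 = cleared. */
typedef struct { unsigned char known[TOY_BYTES]; } ToyState;

/* Reference: the simple byte-at-a-time algorithm. */
static void clear_simple(ToyState *s, size_t a, size_t len)
{
    for (size_t i = 0; i < len; i++) s->known[a + i] = 0;
}

/* Stand-in for the optimised algorithm: head bytes up to 8-byte
   alignment, then 8-byte chunks, then the remaining bytes. */
static void clear_fast_unchecked(ToyState *s, size_t a, size_t len)
{
    while (len > 0 && (a & 7u) != 0) { s->known[a++] = 0; len--; }
    while (len >= 8) { memset(&s->known[a], 0, 8); a += 8; len -= 8; }
    while (len > 0) { s->known[a++] = 0; len--; }
}

/* Entry point: when CHECK_ZSM is 1, run the simple algorithm on a
   copy and require that both end in the same state. */
static void clear_fast(ToyState *s, size_t a, size_t len)
{
    assert(a <= TOY_BYTES && len <= TOY_BYTES - a);
#if CHECK_ZSM
    ToyState expected = *s;
    clear_simple(&expected, a, len);
#endif
    clear_fast_unchecked(s, a, len);
#if CHECK_ZSM
    assert(memcmp(s->known, expected.known, TOY_BYTES) == 0);
#endif
}
```

With the check compiled out, clear_fast costs no more than the optimised
path alone, which is why such a check can stay in the source permanently.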

The patch also contains a few micro optimisations and comments out
   // VG_(track_die_mem_stack)       ( evh__die_mem );
as this had no effect and was somewhat costly.

With this patch, almost all perf tests are back to the performance
we had before revision 15207. Some tests are still slightly slower
than before the SecMap GC (at most about 3% in the figures below).
Some tests are now significantly faster (e.g. sarp).
For almost all tests, we are now faster than valgrind 3.10.1.
Details below.

Regtested on x86/amd64/ppc64 (and regtested with all compile time
checks set).
I have also regtested with libreoffice and firefox
(with firefox, also with CHECK_ZSM set to 1).

Details about performance:
hgtrace = this patch
trunk_untouched = trunk
base_secmap = trunk before secmap GC
valgrind 3.10.1 included for comparison
Measured on core i5 2.53GHz

-- Running  tests in perf ----------------------------------------------
-- bigcode1 --
bigcode1 hgtrace   :0.14s  he: 2.6s (18.4x, -----)
bigcode1 trunk_untouched:0.14s  he: 2.6s (18.4x, -0.4%)
bigcode1 base_secmap:0.14s  he: 2.6s (18.6x, -1.2%)
bigcode1 valgrind-3.10.1:0.14s  he: 2.8s (19.8x, -7.8%)
-- bigcode2 --
bigcode2 hgtrace   :0.14s  he: 6.3s (44.7x, -----)
bigcode2 trunk_untouched:0.14s  he: 6.2s (44.6x,  0.2%)
bigcode2 base_secmap:0.14s  he: 6.3s (45.0x, -0.6%)
bigcode2 valgrind-3.10.1:0.14s  he: 6.6s (47.1x, -5.4%)
-- bz2 --
bz2      hgtrace   :0.64s  he:11.3s (17.7x, -----)
bz2      trunk_untouched:0.64s  he:11.7s (18.2x, -3.2%)
bz2      base_secmap:0.64s  he:11.1s (17.3x,  1.9%)
bz2      valgrind-3.10.1:0.64s  he:12.6s (19.7x,-11.3%)
-- fbench --
fbench   hgtrace   :0.29s  he: 3.4s (11.8x, -----)
fbench   trunk_untouched:0.29s  he: 3.4s (11.7x,  0.6%)
fbench   base_secmap:0.29s  he: 3.6s (12.4x, -5.0%)
fbench   valgrind-3.10.1:0.29s  he: 3.5s (12.2x, -3.5%)
-- ffbench --
ffbench  hgtrace   :0.26s  he: 9.8s (37.7x, -----)
ffbench  trunk_untouched:0.26s  he:10.0s (38.4x, -1.9%)
ffbench  base_secmap:0.26s  he: 9.8s (37.8x, -0.2%)
ffbench  valgrind-3.10.1:0.26s  he:10.0s (38.4x, -1.9%)
-- heap --
heap     hgtrace   :0.11s  he: 9.2s (84.0x, -----)
heap     trunk_untouched:0.11s  he: 9.6s (87.1x, -3.7%)
heap     base_secmap:0.11s  he: 9.0s (81.9x,  2.5%)
heap     valgrind-3.10.1:0.11s  he: 9.1s (82.9x,  1.3%)
-- heap_pdb4 --
heap_pdb4 hgtrace   :0.13s  he:10.7s (82.3x, -----)
heap_pdb4 trunk_untouched:0.13s  he:11.0s (84.8x, -3.0%)
heap_pdb4 base_secmap:0.13s  he:10.5s (80.8x,  1.8%)
heap_pdb4 valgrind-3.10.1:0.13s  he:10.6s (81.8x,  0.7%)
-- many-loss-records --
many-loss-records hgtrace   :0.01s  he: 1.5s (152.0x, -----)
many-loss-records trunk_untouched:0.01s  he: 1.6s (157.0x, -3.3%)
many-loss-records base_secmap:0.01s  he: 1.6s (158.0x, -3.9%)
many-loss-records valgrind-3.10.1:0.01s  he: 1.7s (167.0x, -9.9%)
-- many-xpts --
many-xpts hgtrace   :0.03s  he: 2.8s (91.7x, -----)
many-xpts trunk_untouched:0.03s  he: 2.8s (94.7x, -3.3%)
many-xpts base_secmap:0.03s  he: 2.8s (94.0x, -2.5%)
many-xpts valgrind-3.10.1:0.03s  he: 2.9s (97.7x, -6.5%)
-- memrw --
memrw    hgtrace   :0.06s  he: 7.3s (121.2x, -----)
memrw    trunk_untouched:0.06s  he: 7.2s (120.3x,  0.7%)
memrw    base_secmap:0.06s  he: 7.1s (117.7x,  2.9%)
memrw    valgrind-3.10.1:0.06s  he: 8.1s (135.2x,-11.6%)
-- sarp --
sarp     hgtrace   :0.02s  he: 7.6s (378.5x, -----)
sarp     trunk_untouched:0.02s  he: 8.4s (422.0x,-11.5%)
sarp     base_secmap:0.02s  he: 8.6s (431.0x,-13.9%)
sarp     valgrind-3.10.1:0.02s  he: 8.8s (442.0x,-16.8%)
-- tinycc --
tinycc   hgtrace   :0.20s  he:12.4s (62.0x, -----)
tinycc   trunk_untouched:0.20s  he:12.6s (63.2x, -1.9%)
tinycc   base_secmap:0.20s  he:12.6s (63.0x, -1.6%)
tinycc   valgrind-3.10.1:0.20s  he:12.7s (63.5x, -2.3%)
-- Finished tests in perf ----------------------------------------------

== 12 programs, 48 timings =================

git-svn-id: svn:// a5019735-40e9-0310-863c-91ae7b9d1cf9
2 files changed