cabdbb5cab3740c7082e44b770a582c8186888e9 - platform/external/valgrind

commit	cabdbb5cab3740c7082e44b770a582c8186888e9
author	philippe <philippe@a5019735-40e9-0310-863c-91ae7b9d1cf9>	Mon Apr 20 21:33:16 2015 +0000
committer	philippe <philippe@a5019735-40e9-0310-863c-91ae7b9d1cf9>	Mon Apr 20 21:33:16 2015 +0000
tree	13129b251d33b24dc72d5b106fa446a145797352
parent	72dabf4f1693d23da3bfa972cb3c9a149372e4f4

This patch changes the policy used to garbage collect (GC) the OldRef
and RCEC entries of the conflict cache.

The current policy is:
A 'more or less' LRU policy is implemented by giving
to each OldRef a generation nr in which it was last touched.
A new generation is created every 50000 new accesses.
GC is done when the nr of OldRef reaches --conflict-cache-size.
The GC consists of removing enough generations to free
half of the entries.
After GC of OldRef, the RCEC (Ref Counted Exe Contexts)
that are no longer referenced are GC-ed.

The new policy is:
An exact LRU policy is implemented using a doubly linked list
of OldRef.
When reaching --conflict-cache-size, the LRU entry is re-used.
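
A minimal sketch of such an exact LRU list is given below. It is an
illustration only, not the code of this patch: the Ref type, the
sentinel nodes, the helper names and the tiny CACHE_SIZE are all
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an OldRef: only the LRU links
   and the accessed address are modelled. */
typedef struct Ref {
    struct Ref *prev;   /* neighbour towards the LRU end */
    struct Ref *next;   /* neighbour towards the MRU end */
    unsigned long ga;   /* accessed address */
} Ref;

enum { CACHE_SIZE = 3 };       /* stands in for --conflict-cache-size */

static Ref pool[CACHE_SIZE];   /* entries are never freed, only re-used */
static size_t n_used = 0;
static Ref lru_end, mru_end;   /* sentinels: lru_end <-> ... <-> mru_end */

static void lru_init(void) {
    n_used = 0;
    lru_end.prev = NULL;
    lru_end.next = &mru_end;
    mru_end.prev = &lru_end;
    mru_end.next = NULL;
}

/* Remove r from the list; its neighbours become adjacent. */
static void unchain(Ref *r) {
    r->prev->next = r->next;
    r->next->prev = r->prev;
}

/* Insert r just before the MRU sentinel. */
static void chain_as_newest(Ref *r) {
    r->prev = mru_end.prev;
    r->next = &mru_end;
    mru_end.prev->next = r;
    mru_end.prev = r;
}

/* Re-access of an existing entry: unchain it and rechain it as newest. */
static void touch(Ref *r) {
    unchain(r);
    chain_as_newest(r);
}

/* Entry for a new address: a fresh slot while the pool is not full,
   otherwise the least recently used entry is re-used. */
static Ref *alloc_or_reuse(unsigned long ga) {
    Ref *r;
    if (n_used < CACHE_SIZE) {
        r = &pool[n_used++];
    } else {
        r = lru_end.next;
        assert(r != &mru_end);   /* a full pool implies a non-empty list */
        unchain(r);
    }
    r->ga = ga;
    chain_as_newest(r);
    return r;
}
```

In this sketch every access costs a few pointer writes, which is the
unchain/rechain overhead discussed in point 1 below.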

The unreferenced RCEC are GC-ed when less than 75% of the RCEC
are referenced, and the nr of RCEC is 'big' (at least half the
size of the contextTab, and at least the max nr of RCEC reached
previously).
(note: I tried to directly recover an unref'ed RCEC when recovering
the LRU OldRef, but that causes a lot of re-creation of RCEC).
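
The trigger condition above can be sketched as a single predicate.
This is only a reading of the description in this message: the
function name and all parameter names are hypothetical, and the real
code may compute the thresholds differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: should the unreferenced RCEC be GC-ed now?
   n_rcec       : current nr of RCEC
   n_referenced : nr of RCEC still referenced by some OldRef
   tab_size     : size of the contextTab
   max_seen     : max nr of RCEC reached previously */
static bool should_gc_rcecs(size_t n_rcec, size_t n_referenced,
                            size_t tab_size, size_t max_seen) {
    assert(n_referenced <= n_rcec);
    /* "less than 75% referenced", written without floating point */
    bool few_referenced = 4 * n_referenced < 3 * n_rcec;
    /* "big": at least half the table and at least the previous max */
    bool big = n_rcec >= tab_size / 2 && n_rcec >= max_seen;
    return few_referenced && big;
}
```

Requiring the nr of RCEC to be 'big' avoids running the GC repeatedly
while the table is still small.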

The new policy has the following advantages/disadvantages:
1. It is faster (at least for big applications)
On a firefox startup/exit, we gain about 1m30s out of 11m.
A similar 5..10% speed-up was observed on other big applications
and on the new perf/memrw test.
The speed increase depends on the amount of memory
touched by the application. For applications with a
working set fitting in conflict-cache-size, the new policy
might be marginally slower than the previous policy on platforms
having a small cache: the current policy only sets a generation
nr when an address is re-accessed, while the new policy
has to unchain and rechain the accessed OldRef in the LRU
doubly linked list.
2. It uses less memory (at least for big applications)
Firefox startup/exit "core" arena max use decreases from
1175MB mmap-ed/1060MB alloc-ed
to
994MB mmap-ed/913MB alloc-ed

The decrease in memory is the result of having a lot fewer RCEC:
The current policy lets the nr of RCEC grow until the conflict
cache is GC-ed.

The new policy limits the nr of RCEC to 133% of the RCEC
really referenced (unreferenced RCEC are GC-ed once less than
75% of the RCEC are referenced, and 1/0.75 is about 133%).
So, we end up with a max nr of RCEC
a lot smaller with the new policy: max RCEC 191000
versus 1317000, for a total nr of discard RCEC operations
almost the same: 33M versus 32M.
Also, the current policy allocates a big temporary array
to do the GC of OldRef.

With the new policy, size of an OldRef increases because
we need 2 pointers for the LRU doubly linked list, and
we need the accessed address.
In total, the OldRef increase is limited to one Word,
as we no longer need the gen, and the 'magic'
for sanity check was removed (the check becomes
somewhat less necessary, because an OldRef is never freed
anymore. Also, we do a new cross-check between
the ga in the OldRef and the sparseWA key).

For applications using small memory and having
a small nr of different stack traces accessing memory,
the new policy causes an increase in memory (one Word
per OldRef).

3. Functionally, the new policy gives better past information:
once the steady state is reached (i.e. the conflict cache
is full), the new policy always has --conflict-cache-size
entries of past information.
The current policy keeps a nr of entries of past information varying
between --conflict-cache-size/2 and --conflict-cache-size
(so on average, 75% of conflict-cache-size).

4. The new code is a little bit smaller/simpler:
The generation based GC is replaced by a simpler LRU policy.

So, in summary, this patch should allow big applications
to use less cpu/memory, while having very little
or no impact on memory/cpu of small applications.

Note that the OldRef data structure LRU policy
is not really explicitly tested by a regtest.
At first sight, it is not easy to make such a test portable
between platforms/OS/compilers/....

git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15119 a5019735-40e9-0310-863c-91ae7b9d1cf9

4 files changed
