ART: Profile all branches for on-stack replacement

Change the switch, goto and mterp interpreters to profile
not-taken as well as taken branches.  This allows on-stack
replacement when the CFG has been rearranged such that the loop
header was originally the fallthrough of a Dalvik bytecode branch.
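
For illustration only, a minimal C++ sketch of the idea; this is not
the interpreter code touched by this change.  BranchProfile,
ExecuteIfEqz and the profile_not_taken flag are hypothetical names,
and the per-target arrival counter is an assumed simplification of
how hotness is tracked.

```cpp
#include <cstdint>
#include <map>

// Hypothetical, simplified model; none of these names come from ART.
struct BranchProfile {
  // How many times control reached a given pc through a conditional
  // branch, either by taking it or by falling through it.
  std::map<uint32_t, uint32_t> arrivals;

  void Record(uint32_t target_pc) { ++arrivals[target_pc]; }

  bool IsHot(uint32_t pc, uint32_t threshold) const {
    auto it = arrivals.find(pc);
    return it != arrivals.end() && it->second >= threshold;
  }
};

// Toy conditional branch: jump by `offset` code units when `value` is
// zero, otherwise fall through to the instruction `width` units ahead.
// Returns the next pc.
uint32_t ExecuteIfEqz(uint32_t pc, int32_t value, int32_t offset,
                      uint32_t width, BranchProfile& profile,
                      bool profile_not_taken) {
  if (value == 0) {
    // Taken path: always profiled.
    uint32_t target =
        static_cast<uint32_t>(static_cast<int64_t>(pc) + offset);
    profile.Record(target);
    return target;
  }
  // Not-taken path: profiled only when the flag is set.
  uint32_t next = pc + width;
  if (profile_not_taken) {
    profile.Record(next);
  }
  return next;
}
```

With profile_not_taken set to false, a loop whose header is reached
only by fallthrough never accumulates counts at that header, so in
this model it would never look hot enough to be considered for
on-stack replacement.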

Note that this increases the already-heavy cost of branch profiling.
Measuring on a Nexus 6 with a very branchy benchmark (the logic
subtest from CaffeineMark), we see the following scores (higher is
better):

            No profiling     Taken only     Taken & not-taken
mterp          9728            3434              2384
C++ goto       3914            2422              2037
C++ switch     2986            2411              2112

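As measured, the cost of branch profiling dominates execution
time: with both taken and not-taken branches profiled, the mterp
result falls from 9728 to 2384, roughly a quarter of its
no-profiling value.  This will be addressed in follow-up CLs.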

Change-Id: Ibc858f317398dd991ed8e4f3c3d72bd4c9a60594
8 files changed