Implemented BitCount as an intrinsic, with a unit test.

Recognizing this important operation as an intrinsic has
two advantages:
(1) Knowing that the call has no side effects and cannot
    throw enables much more GVN/LICM/BCE.
(2) Some architectures, like x86_64, provide direct
    support for this operation.
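
For illustration only, a minimal Java sketch of the kind of code that
benefits. The class and method names are hypothetical and not part of this
change; only Long.bitCount is the real library call. It shows a portable
loop the intrinsic can stand in for, a bitboard-style evaluation, and a
loop-invariant call that a compiler may hoist once it knows the call has
no side effects and cannot throw.

```java
// Hypothetical sketch; names are illustrative and not taken from this change.
final class BitCountSketch {
    // Portable fallback: clears the lowest set bit on each iteration.
    static int bitCountLoop(long x) {
        int n = 0;
        while (x != 0) {
            x &= x - 1; // drop the lowest set bit
            n++;
        }
        return n;
    }

    // Bitboard-style evaluation: piece-count difference on a 64-bit board.
    static int materialDiff(long own, long opponent) {
        return Long.bitCount(own) - Long.bitCount(opponent);
    }

    // Long.bitCount(mask) does not depend on the loop variable, so a
    // side-effect-free, non-throwing call is a candidate for hoisting.
    static int weightedSum(long mask, int[] values) {
        int total = 0;
        for (int v : values) {
            total += v * Long.bitCount(mask);
        }
        return total;
    }

    public static void main(String[] args) {
        long own = 0xFFL;      // eight pieces
        long opponent = 0x0FL; // four pieces
        System.out.println(materialDiff(own, opponent));
        System.out.println(bitCountLoop(own) == Long.bitCount(own));
        System.out.println(weightedSum(0b101L, new int[] {1, 2, 3}));
    }
}
```

Whether a given compiler actually hoists or merges such calls depends on
its optimization passes; the sketch only shows the source-level pattern.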

Performance improvements on x86_64:
CheckersEvalBench (32-bit bitboard): 27,210KNS -> 36,798KNS  =  + 35%
ReversiEvalBench  (64-bit bitboard): 52,562KNS -> 89,086KNS  =  + 69%

Change-Id: I65d549b0469b7909b12c6611cdc34a8640a5751f
23 files changed