8185976: PPC64: Implement MulAdd and SquareToLen intrinsics

This implementation is based on the algorithm implemented in Java. Measured speedup: 23% on JDK8, 5% on JDK9, and 5% on JDK10.
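
For reference, a minimal Java sketch of the multiply-accumulate loop that a MulAdd intrinsic replaces. It assumes the Java algorithm referred to above is the one behind java.math.BigInteger's mulAdd. The class name MulAddSketch and the method below are illustrative, not code from this change. Magnitudes are stored big-endian in int[] and `offset` is counted from the least significant end of `out`.

```java
// Illustrative sketch only, not the code touched by this change.
final class MulAddSketch {
    private static final long LONG_MASK = 0xffffffffL;

    // Computes out += in * k over the low `len` words and returns the carry word.
    static int mulAdd(int[] out, int[] in, int offset, int len, int k) {
        long kLong = k & LONG_MASK;          // treat k as unsigned 32-bit
        long carry = 0;
        int o = out.length - offset - 1;     // index of the least significant target word
        for (int j = len - 1; j >= 0; j--) {
            // Worst case is (2^32-1)^2 + 2*(2^32-1) = 2^64-1, which fits in 64 bits unsigned.
            long product = (in[j] & LONG_MASK) * kLong
                         + (out[o] & LONG_MASK) + carry;
            out[o--] = (int) product;        // low 32 bits go back into out
            carry = product >>> 32;          // high 32 bits carry into the next word
        }
        return (int) carry;
    }
}
```

The loop performs one 32x32-bit multiply per word and propagates a 32-bit carry, which is the work an intrinsic can do with wider multiplies.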

Reviewed-by: mdoerr, goetz
6 files changed