string: Add optimized strcpy-mte and stpcpy-mte

Add optimized, MTE-compatible strcpy-mte and stpcpy-mte. On various
microarchitectures, the speedup over the non-MTE version is 53% on large
strings and 20-60% on small strings.
5 files changed