Avoid assumption of 32 bit aligned pixel data in RP's lowp gather()

The attached bug has most of the context, but the problem boils down
to the compiler assuming that a uint32_t* was aligned to 32 bits
and generated instructions like:
    vld1.32	{d16[0]}, [r4:32]
where the :32 means "aligned to 32 bits" [1]. Pixel data is usually
aligned to the so-called natural alignment (pointer size) because
we allocate it with `new` or `calloc`. However, if we were to call
drawRect and use a source starting at x = 1, this eventually leads
to SkBitmap::extractSubset creating a SkPixelRef that starts at
the original pixel* + 1 (which is now no longer evenly divisible
by 4).

Certain ARM devices were crashing on this generated assembly. Rather
than trying to write the assembly code by hand or use intrinsics,
we can tell GCC and Clang that the pointer might be unaligned and
then the generated instructions lack the :32 modifier (and make these
devices not crash anymore):
    vld1.32	{d16[0]}, [r4]

This fix is in our low precision pipeline code but may also be needed
in our high precision code too (which uses similar code). I wanted
to be careful with this change because it's pretty critical to
performance, so I kept the aligned version for cases where we know
the data is aligned (e.g. reading factors and biases for our gradient
stages).

This solution was inspired by Open CV
https://github.com/opencv/opencv/issues/25265

[1] https://developer.arm.com/documentation/ddi0597/2025-03/SIMD-FP-Instructions/VLD1--multiple-single-elements---Load-multiple-single-1-element-structures-to-one--two--three--or-four-registers-

Change-Id: I2892740acbb9db7434aab897e11fa41c3548a196
Bug: b/409859319
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/981638
Commit-Queue: Kaylee Lubick <kjlubick@google.com>
Commit-Queue: Daniel Dilan <danieldilan@google.com>
Auto-Submit: Kaylee Lubick <kjlubick@google.com>
Reviewed-by: Daniel Dilan <danieldilan@google.com>
1 file changed