Add SK_PREFETCH and use it in SkBlurImageFilter.

Speed relative to before is 1.2-1.6x on desktop and 1.0-1.2x on Nexus 4.

(Division remains the bottleneck, now more so.)
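
For readers without the header at hand, here is a rough sketch of the idea. The definition of SK_PREFETCH is not part of this diff, so the macro below is a hypothetical stand-in (SKETCH_PREFETCH) that assumes a GCC/Clang toolchain with __builtin_prefetch and compiles to nothing elsewhere. sumColumn, its parameters, and the bounds check are illustrative only and are not taken from SkBlurImageFilter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for SK_PREFETCH; the real macro may be defined differently.
#if defined(__GNUC__) || defined(__clang__)
    #define SKETCH_PREFETCH(ptr) __builtin_prefetch(ptr)
#else
    #define SKETCH_PREFETCH(ptr) ((void)0)
#endif

// Walks down one column of a row-major buffer (stride given in elements) and
// sums it. On each step it hints the fetch of the element `lookahead` rows
// past the next row, mirroring the strided access pattern in the diff.
uint64_t sumColumn(const uint32_t* src, int height, int stride, int lookahead) {
    assert(height >= 0 && stride > 0 && lookahead >= 0);
    uint64_t sum = 0;
    for (int y = 0; y < height; ++y) {
        sum += src[static_cast<std::ptrdiff_t>(y) * stride];
        // Conservative choice for this sketch: only form addresses that lie
        // inside the buffer. A prefetch is a hint and has no visible effect
        // on the result either way.
        const int ahead = y + 1 + lookahead;
        if (ahead < height) {
            SKETCH_PREFETCH(src + static_cast<std::ptrdiff_t>(ahead) * stride);
        }
    }
    return sum;
}
```

Because a prefetch is only a hint, the function returns the same value with or without it; any benefit shows up in timing alone, and only when the hardware prefetcher fails to predict the stride.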

BUG=
R=senorblanco@google.com, reed@google.com, senorblanco@chromium.org

Author: mtklein@google.com

Review URL: https://codereview.chromium.org/57823003

git-svn-id: http://skia.googlecode.com/svn/trunk/src@12129 2bbb7eff-a529-9590-31e7-b0007b416f81
diff --git a/effects/SkBlurImageFilter.cpp b/effects/SkBlurImageFilter.cpp
index 2b823bc..0fa54b5 100644
--- a/effects/SkBlurImageFilter.cpp
+++ b/effects/SkBlurImageFilter.cpp
@@ -144,6 +144,8 @@
                 sumB += SkGetPackedB32(r);
             }
             sptr += srcStride;
+            // The next leading pixel seems to be too hard to predict.  Hint the fetch.
+            SK_PREFETCH(sptr + (bottomOffset + 1) * srcStride);
             dptr += dstStride;
         }
     }