expand float peepholes

I started adding more _imm Ops like mul_f32_imm and noticed some
low-hanging peephole fruit that we can harvest first.

Each of these peepholes I've added is sound in the way that I've
documented, and the div_f32 and mul_f32 peepholes are actively being
hit.  I think it's inevitable that the add_f32 and sub_f32 ones will be
hit eventually too as we expand how skvm is used.
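
The message doesn't list the individual rules or their documented caveats, so the following is only a rough illustration: a few C++ helpers (all names hypothetical) that check whether replacing an op that has one constant argument with plain x leaves x bit-for-bit unchanged. The identities tested are my assumption of what such peepholes look like, not a copy of the CL.

```cpp
#include <cstdint>
#include <cstring>

// Compare two floats bit-for-bit, so +0.0f and -0.0f count as different.
inline bool same_bits(float x, float y) {
    std::uint32_t bx, by;
    std::memcpy(&bx, &x, sizeof bx);
    std::memcpy(&by, &y, sizeof by);
    return bx == by;
}

// Each helper returns true if rewriting the op to plain x would preserve x exactly.
inline bool mul_one_is_identity(float x)     { return same_bits(x * 1.0f, x); }
inline bool div_one_is_identity(float x)     { return same_bits(x / 1.0f, x); }
inline bool sub_zero_is_identity(float x)    { return same_bits(x - 0.0f, x); }
inline bool add_zero_is_identity(float x)    { return same_bits(x + 0.0f, x); }
inline bool add_negzero_is_identity(float x) { return same_bits(x + -0.0f, x); }
```

Compiled without -ffast-math on an IEEE-754 target, x * 1.0f, x / 1.0f and x - 0.0f return x exactly, but add_zero_is_identity(-0.0f) is false because -0.0f + 0.0f is +0.0f under round-to-nearest; adding -0.0f instead is exact. Signed zero is the kind of edge case a sound add_f32 or sub_f32 peephole has to respect.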

The easiest place to see this in action is when
SkColorFilter_Matrix::program() is passed an opaque color:

    skvm::F32 invA = p->div(p->splat(1.0f), *a);

now leaves invA as splat(1.0f) directly, skipping the unpremul divide.
We still do the next step, checking whether invA is infinity; more on
that at the end of the CL description.
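
For context, here is a scalar sketch of what that unpremul step computes, in plain C++ rather than skvm. The RGBA struct is hypothetical, and falling back to 0 when 1/a is infinite is my assumption about what the infinity check is for.

```cpp
#include <limits>

// Hypothetical stand-in for one premultiplied pixel.
struct RGBA { float r, g, b, a; };

// Scalar model of unpremul: divide color by alpha, but if 1/a is not
// less than +inf (a == 0 gives +inf), use 0 instead so the result stays finite.
inline RGBA unpremul(RGBA c) {
    const float inf = std::numeric_limits<float>::infinity();
    float invA = 1.0f / c.a;            // the divide the peephole can skip when a == 1
    invA = (invA < inf) ? invA : 0.0f;  // the infinity check that still runs
    return {c.r * invA, c.g * invA, c.b * invA, c.a};
}
```

With an opaque input, a == 1.0f makes invA exactly 1.0f, so the divide is redundant; what is left is the invA < inf comparison. When a == 0.0f, 1/a is +inf and the check turns it into 0, so a fully transparent pixel unpremuls to zero instead of NaN (0 * inf).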

This way, I think we should never need to thread a shader_is_opaque
parameter through to color filters; instead the program builder can
simply look and see whether the input is opaque, and the same goes for
any number of other similar optimizations.

Interestingly, although we sometimes know that the input to
SkColorFilter_Matrix::program() is opaque, as written today we never
know whether the output is.  Since we don't specialize the code on the
0/1 values of the matrix, we lose the knowledge that alpha is 1.0f as it
goes through the matrix.  There are two good ways to fix this:

   1) do specialize on the 0/1 values of the matrix, so that the program
      sees that alpha is unchanged when the matrix leaves it unchanged;

   2) wrap today's virtual program() with a base-class non-virtual
      that queries getFlags() & kAlphaUnchanged_Flag to save and restore
      the input alpha, essentially marking any code that changes alpha
      as dead code.
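
Option 2) might look roughly like this. It is a scalar sketch with a hypothetical Color struct and a hypothetical program()/onProgram() split; only getFlags() and kAlphaUnchanged_Flag come from the text above. In skvm the channels would be instruction handles, so restoring the input alpha would leave the subclass's alpha math unused and removable as dead code.

```cpp
#include <cstdint>

// Hypothetical per-channel values; in skvm these would be skvm::F32 handles.
struct Color { float r, g, b, a; };

class ColorFilter {
public:
    enum Flags : std::uint32_t { kAlphaUnchanged_Flag = 1 << 0 };
    virtual ~ColorFilter() = default;

    // Non-virtual entry point: run the subclass, then, if the filter
    // promises not to touch alpha, put the original alpha back.
    Color program(Color in) const {
        Color out = this->onProgram(in);
        if (this->getFlags() & kAlphaUnchanged_Flag) {
            out.a = in.a;  // whatever alpha the subclass computed is discarded
        }
        return out;
    }

    virtual std::uint32_t getFlags() const { return 0; }

protected:
    virtual Color onProgram(Color in) const = 0;
};

// Hypothetical matrix-like filter that scales r, g, b and claims alpha is unchanged.
class ScaleRGB final : public ColorFilter {
public:
    explicit ScaleRGB(float s) : fScale(s) {}
    std::uint32_t getFlags() const override { return kAlphaUnchanged_Flag; }

protected:
    Color onProgram(Color in) const override {
        // Alpha goes through a generic matrix-style row; program() throws it away.
        return {in.r * fScale, in.g * fScale, in.b * fScale, 0.0f * in.r + 1.0f * in.a};
    }

private:
    float fScale;
};
```

ScaleRGB(0.5f).program({1.0f, 0.5f, 0.25f, 1.0f}) halves r, g and b and hands back the input alpha untouched. The wrapper trusts the flag: a subclass that reports kAlphaUnchanged_Flag but does change alpha would be silently overridden.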

2) is the brute-force, trusting version of 1), but it has the
advantage that the generated code need not change.  Still, for a
matrix I think we'll want to look at 0/1 values anyway; they'll come up
more often than just for the alpha channel.
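
Option 1) can be sketched as a code generator that inspects each coefficient before emitting anything. This toy builds an expression string instead of skvm instructions; emit_row and the add/mul spelling are made up for illustration, and dropping 0 * x assumes finite inputs (0 * inf would be NaN).

```cpp
#include <array>
#include <string>

// Hypothetical helper: emit a symbolic expression for one output channel of
// a 4x5 color matrix row (coefficients for r, g, b, a, then a bias),
// specializing on 0 and 1 so that trivial rows collapse.
inline std::string emit_row(const std::array<float, 5>& row) {
    static const char* kIn[4] = {"r", "g", "b", "a"};
    std::string expr;
    auto append = [&](const std::string& term) {
        expr = expr.empty() ? term : "add(" + expr + ", " + term + ")";
    };
    for (int i = 0; i < 4; ++i) {
        if (row[i] == 0.0f) continue;                        // 0 * x: skip (assumes finite x)
        if (row[i] == 1.0f) { append(kIn[i]); continue; }    // 1 * x: just x
        append("mul(" + std::string(kIn[i]) + ", " + std::to_string(row[i]) + ")");
    }
    if (row[4] != 0.0f) append(std::to_string(row[4]));     // bias, only if nonzero
    return expr.empty() ? "0" : expr;
}
```

For the identity alpha row {0, 0, 0, 1, 0} this emits just a, so an alpha known to be splat(1.0f) on the way in is still known to be splat(1.0f) on the way out. A row like {1, 1, 0, 0, 0} becomes add(r, g), with no multiplies at all.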

You can see a mul_f32 peephole fire when the blitter stores any
known-opaque color to memory.  We go through some logic like

     alpha_as_byte = round(mul(alpha, splat(255.0f)))

and if alpha is known to be splat(1.0f), we'll now skip that mul():

     alpha_as_byte = round(splat(255.0f))
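
A toy model of how one-constant peepholes fire during program construction, using a made-up Builder (not skvm's real skvm::Builder; every member here is hypothetical). Both examples from this description fall out of it: dividing by an alpha known to be splat(1.0f), and scaling that alpha by splat(255.0f).

```cpp
#include <optional>
#include <vector>

// Hypothetical program builder: each value is an index into `insts`,
// and splats remember their constant so peepholes can inspect it.
struct Builder {
    enum class Op { Splat, Load, Mul, Div };
    struct Inst { Op op; float imm; int x, y; };
    std::vector<Inst> insts;

    int push(Inst inst) {
        insts.push_back(inst);
        return static_cast<int>(insts.size()) - 1;
    }
    // The constant value of `id`, if it is a splat.
    std::optional<float> constant(int id) const {
        if (insts[id].op == Op::Splat) return insts[id].imm;
        return std::nullopt;
    }

    int splat(float v) { return push({Op::Splat, v, -1, -1}); }
    int load()         { return push({Op::Load, 0.0f, -1, -1}); }

    // Peepholes: x * 1, 1 * x and x / 1 all return x unchanged.
    int mul(int x, int y) {
        if (constant(y) == 1.0f) return x;
        if (constant(x) == 1.0f) return y;
        return push({Op::Mul, 0.0f, x, y});
    }
    int div(int x, int y) {
        if (constant(y) == 1.0f) return x;
        return push({Op::Div, 0.0f, x, y});
    }
};
```

Building invA = div(splat(1.0f), a) with a = splat(1.0f) returns the numerator splat itself, and mul(a, splat(255.0f)) returns the splat(255.0f), so the program ends up with three splats and no Mul or Div instructions.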

I think these are all the strictly viable float peepholes where one
argument is a constant.  Obviously there are lots of peepholes we can
write for int32, int16, and bitwise instructions, and then there's a
whole untapped world of peepholes to explore when _all_ arguments are
constant.  These all-constant peepholes would let our program notice
that, e.g., 1.0f < infinity, so we can skip that part of the unpremul
too, or that round(splat(255.0f)) == 0xff, so we can skip that work.
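
The all-constant case is ordinary constant folding. A minimal sketch with hypothetical fold_lt and fold_round helpers, where an empty std::optional means "not known until runtime"; it assumes round() rounds to nearest, the default floating-point rounding mode.

```cpp
#include <cmath>
#include <optional>

// Hypothetical folder for lt(x, y): if both sides are known constants,
// compute the answer now instead of emitting an instruction.
inline std::optional<bool> fold_lt(std::optional<float> x, std::optional<float> y) {
    if (x && y) return *x < *y;
    return std::nullopt;  // at least one runtime value: lt() must still be emitted
}

// Hypothetical folder for round(x), using the current (default: nearest) rounding mode.
inline std::optional<int> fold_round(std::optional<float> x) {
    if (x) return static_cast<int>(std::lrint(*x));
    return std::nullopt;
}
```

fold_lt(1.0f, INFINITY) yields true and fold_round(255.0f) yields 255 (0xff), while any unknown argument yields an empty result, meaning the instruction still has to be emitted.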

These all-constant peepholes may not be very important: anywhere they
can trigger, the instruction must already be hoistable, since with all
arguments constant, none of them depend on loop variables.  Still, it's
nicer to run once ever at compile time than once per invocation at
runtime.  And it's less code to analyze, less code to JIT, fewer
instructions to interpret, etc.
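
The hoisting argument can be sketched as one forward pass over a hypothetical instruction list (this Inst layout is invented for illustration): an instruction is hoistable when it is not itself per-invocation and all of its arguments are hoistable, so anything built purely from constants always qualifies.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical instruction record: `varying` marks loads and other values
// that change every loop iteration; x and y index earlier instructions (-1 = unused).
struct Inst { bool varying; int x, y; };

// Instructions only refer to earlier ones, so a single forward pass suffices.
inline std::vector<bool> hoistable(const std::vector<Inst>& prog) {
    std::vector<bool> h(prog.size());
    for (std::size_t i = 0; i < prog.size(); ++i) {
        const Inst& in = prog[i];
        bool ok = !in.varying;
        if (in.x >= 0) ok = ok && h[in.x];
        if (in.y >= 0) ok = ok && h[in.y];
        h[i] = ok;
    }
    return h;
}
```

Hoisting moves such an instruction out of the loop, but it still runs once per invocation; folding it removes it from the program entirely.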

Change-Id: Ia2dc5af2cfff71a12693a2903f579a57c9302d12
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/255616
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2 files changed