make g2p ~30% faster on mobile by suppressing a log (#85907)

Summary: Using the tool from D39559248, I was able to make g2p faster on mobile by examining profiles on stella frames. It turned out that the PyTorch interpreter code does some logging that ends up being a significant bottleneck.

Differential Revision: D39901455

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85907
Approved by: https://github.com/dzdang
diff --git a/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp b/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp
index ff3b0c0..e7f3940 100644
--- a/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp
+++ b/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp
@@ -236,7 +236,7 @@
     at::Tensor input,
     bool reduce_range) {
   if (reduce_range) {
-    TORCH_WARN("Currently, qnnpack incorrectly ignores reduce_range when it is set to true; this may change in a future release.");
+    TORCH_WARN_ONCE("Currently, qnnpack incorrectly ignores reduce_range when it is set to true; this may change in a future release.");
   }
 
   using at::Tensor;
diff --git a/torch/csrc/jit/mobile/interpreter.cpp b/torch/csrc/jit/mobile/interpreter.cpp
index 1e52754..f989f49 100644
--- a/torch/csrc/jit/mobile/interpreter.cpp
+++ b/torch/csrc/jit/mobile/interpreter.cpp
@@ -110,6 +110,10 @@
       // Check with iliacher if has been done.
       // Plus this is not safe as if you throw exception record function will be
       // left enabled. That is a TODO
+      // NOTE: this recordFunction logic takes up ~2-3% of cpu cycles in some
+      // workflows. do we need it and/or can we opt-out of
+      // isRecordFunctionEnabled with a macro? if we delete it, things appear to
+      // work just fine.
       bool prev_value = isRecordFunctionEnabled();
       if (!prev_value) {
         // enable only for the RecordFunction