[nnc] Separate printing of optimized llvm bitcode from assembly (#56117)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56117

I was debugging an issue during instruction selection and wanted to see the
bitcode that the asm generation pass receives as input.  With this change we
always print the optimized LLVM module before entering that pass, so the dump
is available even when asm generation itself fails.
ghstack-source-id: 126592596

Test Plan: Run with `PYTORCH_JIT_LOG_LEVEL=">>llvm_codegen"` and check that the optimized LLVM module and the generated assembly now appear as two separate dumps.
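
For context, a minimal repro sketch of how one might exercise the NNC LLVM codegen path and see both dumps.  The script, the fusion toggles, and the build assumption are illustrative and not part of this diff; it assumes a PyTorch build with USE_LLVM so that llvm_codegen.cpp is compiled in.  Run it as `PYTORCH_JIT_LOG_LEVEL=">>llvm_codegen" python repro.py`.

  import torch

  # Assumption: build with USE_LLVM, otherwise the TensorExpr LLVM backend
  # (torch/csrc/jit/tensorexpr/llvm_codegen.cpp) is not available.
  torch._C._jit_override_can_fuse_on_cpu(True)   # allow fusion groups on CPU
  torch._C._jit_set_texpr_fuser_enabled(True)    # route fusion groups to NNC

  @torch.jit.script
  def f(a, b):
      return a * b + b

  x = torch.rand(1024)
  y = torch.rand(1024)
  # Run a few times so the profiling executor specializes the graph and hands
  # the fused subgraph to the TensorExpr LLVM codegen.
  for _ in range(5):
      f(x, y)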

Reviewed By: huiguoo

Differential Revision: D27781683

fbshipit-source-id: 84635d0ca2a1318ae7a9a73cc1d2df450d8b6a08
diff --git a/torch/csrc/jit/tensorexpr/llvm_codegen.cpp b/torch/csrc/jit/tensorexpr/llvm_codegen.cpp
index b25e4d6..e1903d8 100644
--- a/torch/csrc/jit/tensorexpr/llvm_codegen.cpp
+++ b/torch/csrc/jit/tensorexpr/llvm_codegen.cpp
@@ -550,10 +550,13 @@
 
   optimize(*module_);
 
-  // print graph debug info after optimization
   asmBuffer.set_size(0);
   module_->print(asmStream, nullptr);
   llvmCode = asmStream.str().str();
+  GRAPH_DEBUG(
+      "\nLLVM module after optimizations\n\n", asmStream.str().str(), "\n");
+
+  // print graph debug info after optimization
   asmBuffer.set_size(0);
   llvm::legacy::PassManager PM;
   jit_->getTargetMachine().addPassesToEmitFile(
@@ -568,8 +571,7 @@
   PM.run(*module_);
   asmCode = asmStream.str().str();
 
-  GRAPH_DEBUG(
-      "\nLLVM module after optimizations\n\n", llvmCode, "\n", asmCode, "\n");
+  GRAPH_DEBUG("\nLLVM generated assembly code\n\n", asmCode, "\n");
 }
 
 // TODO: The binary ops are copypasta.