aco: improve get_wait_states()

pipeline-db (Tahiti):
Totals from affected shaders:
SGPRS: 21208 -> 21208 (0.00 %)
VGPRS: 22388 -> 22388 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 3278596 -> 3277004 (-0.05 %) bytes
LDS: 19 -> 19 (0.00 %) blocks
Max Waves: 238 -> 238 (0.00 %)

pipeline-db (Polaris):
Totals from affected shaders:
SGPRS: 64 -> 64 (0.00 %)
VGPRS: 96 -> 96 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 5200 -> 5192 (-0.15 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 10 -> 10 (0.00 %)

pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 0 -> 0 (0.00 %)
VGPRS: 0 -> 0 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 0 -> 0 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 0 -> 0 (0.00 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4004>
diff --git a/src/amd/compiler/aco_insert_NOPs.cpp b/src/amd/compiler/aco_insert_NOPs.cpp
index 7c6e100..9c5b1c8 100644
--- a/src/amd/compiler/aco_insert_NOPs.cpp
+++ b/src/amd/compiler/aco_insert_NOPs.cpp
@@ -179,7 +179,12 @@
 
 int get_wait_states(aco_ptr<Instruction>& instr)
 {
-   return 1;
+   if (instr->opcode == aco_opcode::s_nop)
+      return static_cast<SOPP_instruction*>(instr.get())->imm + 1;
+   else if (instr->opcode == aco_opcode::p_constaddr)
+      return 3; /* lowered to 3 instructions in the assembler */
+   else
+      return 1;
 }
 
 bool regs_intersect(PhysReg a_reg, unsigned a_size, PhysReg b_reg, unsigned b_size)