[inductor] enable software pipelining on AMD devices (#125858)

Summary:
Per AMD, software pipelining is enabled by setting `num_stages=0` and should provide a nice perf boost for GEMMs. The caveat is that `num_stages=1` is preferred for back-to-back GEMMs, but `num_stages=0` is taken as the better default.
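
For illustration, a minimal sketch of what `num_stages=0` vs. `num_stages=1` means at the Triton config level (the block sizes and `num_warps` below are hypothetical, not values from this PR):

```python
import torch
import triton

# Sketch only: on ROCm, num_stages=0 asks the Triton AMD backend to apply its
# software-pipelining pass to the GEMM K-loop, while num_stages=1 disables it.
num_stages = 0 if torch.version.hip else 2  # 2 is an assumed non-ROCm default here

gemm_config = triton.Config(
    {"BLOCK_M": 64, "BLOCK_N": 64, "BLOCK_K": 32},  # hypothetical tile sizes
    num_stages=num_stages,
    num_warps=8,
)
```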

Wait to land until the Triton upstream change lands in OSS; pipelining does not work well on the fork.

Test Plan: n/a

Reviewed By: xw285cornell, yoyoyocmu

Differential Revision: D56221447

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125858
Approved by: https://github.com/pragupta, https://github.com/yoyoyocmu
diff --git a/torch/_inductor/kernel/mm_common.py b/torch/_inductor/kernel/mm_common.py
index 5a7f60e5..26d0818 100644
--- a/torch/_inductor/kernel/mm_common.py
+++ b/torch/_inductor/kernel/mm_common.py
@@ -178,14 +178,14 @@
     if config["cond"]
 )
 
-# On ROCm convert num_stages to 1 as pipelining provides no benefit
+# On ROCm convert num_stages to 0 to enable software pipelining
 if torch.version.hip:
     mm_platform_configs = tuple(
-        (config[0], config[1], config[2], 1, config[4])
+        (config[0], config[1], config[2], 0, config[4])
         for config in mm_platform_configs
     )
     int8_platform_configs = tuple(
-        (config[0], config[1], config[2], 1, config[4])
+        (config[0], config[1], config[2], 0, config[4])
         for config in int8_platform_configs
     )
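
For context on the diff above, each platform config appears to be a `(BLOCK_M, BLOCK_N, BLOCK_K, num_stages, num_warps)` tuple, so index 3 is the `num_stages` field being rewritten. A standalone sketch of the transformation, with made-up input tuples:

```python
# Hypothetical inputs in the assumed (BLOCK_M, BLOCK_N, BLOCK_K, num_stages, num_warps) layout.
mm_platform_configs = ((64, 64, 32, 2, 4), (128, 64, 32, 3, 8))

# Same rewrite as the diff: keep the tile sizes and warp count, force num_stages=0
# so the AMD backend software-pipelines the GEMM loop.
rocm_configs = tuple(
    (config[0], config[1], config[2], 0, config[4])
    for config in mm_platform_configs
)
assert rocm_configs == ((64, 64, 32, 0, 4), (128, 64, 32, 0, 8))
```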