Small doc update for torch_compile_debug (#95809)
Updates the troubleshooting documentation with the folder structure of the debug directory
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95809
Approved by: https://github.com/msaroufim
diff --git a/docs/source/dynamo/troubleshooting.rst b/docs/source/dynamo/troubleshooting.rst
index 36cfb39..832e961 100644
--- a/docs/source/dynamo/troubleshooting.rst
+++ b/docs/source/dynamo/troubleshooting.rst
@@ -447,33 +447,89 @@
Torch._Dynamo. By default, this returns a string representation of the
compile times spent in each TorchDynamo function by name.
-TorchInductor Debug Tracing
----------------------------
+TorchInductor Debugging using TORCH_COMPILE_DEBUG
+-------------------------------------------------
TorchInductor has a builtin stats and trace function for displaying time
spent in each compilation phase, output code, output graph visualization
and IR dump. This is a debugging tool designed to make it easier to
understand and troubleshoot the internals of TorchInductor.
-Setting the environment variable ``TORCH_COMPILE_DEBUG=1`` will cause a
-debug trace directory to be created and printed:
+Let's run an example with the following test program (repro.py):
::
- $ env TORCH_COMPILE_DEBUG=1 python repro.py
- torch._inductor.debug: [WARNING] model_forward_0 debug trace: /tmp/torchinductor_jansel/rh/crhwqgmbqtchqt3v3wdeeszjb352m4vbjbvdovaaeqpzi7tdjxqr.debug
+ import torch
-Here is an `example debug directory
-output <https://gist.github.com/jansel/f4af078791ad681a0d4094adeb844396>`__
+ @torch.compile()
+ def test_model(x):
+ model = torch.nn.Sequential(
+ torch.nn.Linear(10, 10),
+ torch.nn.LayerNorm(10),
+ torch.nn.ReLU(),
+ )
+ return model(x)
+
+
+ y = test_model(torch.ones(10, 10))
+
+Setting the environment variable ``TORCH_COMPILE_DEBUG=1`` will cause a
+debug trace directory to be created, by default this directory will be in the current directory and named torch_compile_debug
+(this can be overridden in the torchdynamo configuration field ``debug_dir_root`` and also the env var TORCH_COMPILE_DEBUG_DIR).
+Inside this directory, each run will have a separate folder named with the timestamp and process id of the run:
+::
+
+ $ env TORCH_COMPILE_DEBUG=1 python repro.py
+ $ cd torch_compile_debug
+ $ ls
+ run_2023_03_01_08_20_52_143510-pid_180167
+
+In the run folder there will be a torchdynamo directory which contains debug logs, and an aot_torchinductor
+folder which contains a subfolder for each compiled kernel with inductor debug artifacts.
+
+::
+
+ $ cd
+ run_2023_03_01_08_20_52_143510-pid_180167
+ $ ls
+ aot_torchinductor torchdynamo
+
+Moving further into the aot_torchinductor directory, the \*.log files are logs from the aot autograd phase of compilation, model__0_forward_1.0 contains the inductor debug artifacts.
+
+::
+
+ $ cd aot_torchinductor
+ $ ls
+ aot_model___0_debug.log model__0_forward_1.0
+ $ cd model__0_forward_1.0
+ $ ls
+ debug.log fx_graph_readable.py fx_graph_runnable.py fx_graph_transformed.py ir_post_fusion.txt ir_pre_fusion.txt output_code.py
+
+Here is a summary of the contents:
+ - fx_graph_readable.py and fx_graph_runnable.py are the readable and runnable versions of the fx_graph received by inductor.
+ - fx_graph_transformed.py is the fx graph after inductor has run all fx passes.
+ - ir\*.txt is the inductor ir pre and post fusion.
+ - output_code.py is the compiled triton kernel for the subgraph.
+
+Here are `example debug directory contents
+<https://gist.github.com/jansel/f4af078791ad681a0d4094adeb844396>`__
for the test program:
::
- torch.nn.Sequential(
- torch.nn.Linear(10, 10),
- torch.nn.LayerNorm(10),
- torch.nn.ReLU(),
- )
+ import torch
+
+ @torch.compile()
+ def test_model(x):
+ model = torch.nn.Sequential(
+ torch.nn.Linear(10, 10),
+ torch.nn.LayerNorm(10),
+ torch.nn.ReLU(),
+ )
+ return model(x)
+
+
+ y = test_model(torch.ones(10, 10))
Each file in that debug trace can be enabled and disabled through
``torch._inductor.config.trace.*``. The profile and the diagram are both