Created docs (and example) for cudart function in torch.cuda (#128741)

Fixes #127908

## Description

Created docs for the torch.cuda.cudart function to resolve issue #127908.

I tried to stick to the [guidelines to document a function](https://github.com/pytorch/pytorch/wiki/Docstring-Guidelines#documenting-a-function), but I was not sure whether there is a consensus on how to document a function that calls an internal function. So I checked what the function raises from the user's side and documented that (i.e. I am listing what _lazy_init() actually raises).

Updated PR from #128298 since I made quite a big mistake in my branch. I apologize for the newbie mistake.

### Summary of Changes

- Added docs for torch.cuda.cudart
- Added the cudart function to the autosummary in docs/source/cuda.rst

## Checklist

- [X] The issue that is being fixed is referenced in the description
- [X] Only one issue is addressed in this pull request
- [X] Labels from the issue that this PR is fixing are added to this pull request
- [X] No unnecessary issues are included in this pull request

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128741
Approved by: https://github.com/msaroufim

commit: c6b180a3166220ca7e505b891c79a67f53c23dce [log] [tgz]
author: ibartol <ignaciobartol@hotmail.com> Mon Jun 17 16:50:37 2024 +0000
committer: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com> Mon Jun 17 16:50:37 2024 +0000
tree: 8c559dd1496aadf5c0141ab46d8493b5f57622b8
parent: fc2913fb808dd67667c4c57d01983a4dccec0f66 [diff]
diff --git a/docs/source/cuda.rst b/docs/source/cuda.rst
index 7b9bf53..7f6f2d2 100644
--- a/docs/source/cuda.rst
+++ b/docs/source/cuda.rst

@@ -12,6 +12,7 @@
     current_blas_handle
     current_device
     current_stream
+    cudart
     default_stream
     device
     device_count

diff --git a/torch/cuda/__init__.py b/torch/cuda/__init__.py
index 6722114..e08572f 100644
--- a/torch/cuda/__init__.py
+++ b/torch/cuda/__init__.py

@@ -334,6 +334,59 @@
 
 
 def cudart():
+    r"""Retrieves the CUDA runtime API module.
+
+
+    This function initializes the CUDA runtime environment if it is not already
+    initialized and returns the CUDA runtime API module (_cudart). The CUDA
+    runtime API module provides access to various CUDA runtime functions.
+
+    Args:
+        ``None``
+
+    Returns:
+        module: The CUDA runtime API module (_cudart).
+
+    Raises:
+        RuntimeError: If CUDA cannot be re-initialized in a forked subprocess.
+        AssertionError: If PyTorch is not compiled with CUDA support or if libcudart functions are unavailable.
+
+    Example of CUDA operations with profiling:
+        >>> import torch
+        >>> from torch.cuda import cudart, check_error
+        >>> import os
+        >>>
+        >>> os.environ['CUDA_PROFILE'] = '1'
+        >>>
+        >>> def perform_cuda_operations_with_streams():
+        >>>     stream = torch.cuda.Stream()
+        >>>     with torch.cuda.stream(stream):
+        >>>         x = torch.randn(100, 100, device='cuda')
+        >>>         y = torch.randn(100, 100, device='cuda')
+        >>>         z = torch.mul(x, y)
+        >>>     return z
+        >>>
+        >>> torch.cuda.synchronize()
+        >>> print("====== Start nsys profiling ======")
+        >>> check_error(cudart().cudaProfilerStart())
+        >>> with torch.autograd.profiler.emit_nvtx():
+        >>>     result = perform_cuda_operations_with_streams()
+        >>>     print("CUDA operations completed.")
+        >>> check_error(torch.cuda.cudart().cudaProfilerStop())
+        >>> print("====== End nsys profiling ======")
+
+    To run this example and save the profiling information, execute:
+        >>> $ nvprof --profile-from-start off --csv --print-summary -o trace_name.prof -f -- python cudart_test.py
+
+    This command profiles the CUDA operations in the provided script and saves
+    the profiling information to a file named `trace_name.prof`.
+    The `--profile-from-start off` option ensures that profiling starts only
+    after the `cudaProfilerStart` call in the script.
+    The `--csv` and `--print-summary` options format the profiling output as a
+    CSV file and print a summary, respectively.
+    The `-o` option specifies the output file name, and the `-f` option forces the
+    overwrite of the output file if it already exists.
+    """
     _lazy_init()
     return _cudart
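
The docstring in this diff lists two exception types and calls `cudaProfilerStart`/`cudaProfilerStop` by hand. As a rough illustration of how a caller might wrap that pattern, here is a minimal sketch. The helper names `cuda_runtime_or_none` and `profiler_range` are hypothetical and are not part of this commit or of PyTorch; the sketch assumes only `torch.cuda.is_available`, `torch.cuda.cudart`, and `torch.cuda.check_error`, and it falls back to a no-op when PyTorch or CUDA is missing.

```python
from contextlib import contextmanager

try:
    import torch
except ImportError:  # keep the sketch importable on machines without PyTorch
    torch = None


def cuda_runtime_or_none():
    """Hypothetical helper: return torch.cuda.cudart(), or None if CUDA is unusable."""
    if torch is None or not torch.cuda.is_available():
        return None
    try:
        return torch.cuda.cudart()
    except (RuntimeError, AssertionError):
        # The two exception types named in the docstring's "Raises" section.
        return None


@contextmanager
def profiler_range():
    """Hypothetical helper: bracket a block with cudaProfilerStart/cudaProfilerStop.

    Yields True when the profiler calls were issued, False when the body
    runs unprofiled because no CUDA runtime is available.
    """
    runtime = cuda_runtime_or_none()
    if runtime is None:
        yield False
        return
    torch.cuda.check_error(runtime.cudaProfilerStart())
    try:
        yield True
    finally:
        # Stop the profiler even if the body raised.
        torch.cuda.check_error(runtime.cudaProfilerStop())


if __name__ == "__main__":
    with profiler_range() as profiling:
        total = sum(range(10))  # stand-in for real CUDA work
    print("profiler calls issued:", profiling, "result:", total)
```

Returning `None` instead of propagating the exception is a design choice for this sketch only; code that needs to distinguish a forked-subprocess failure from a missing CUDA build would likely let the exceptions surface.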