Created docs (and example) for cudart function in torch.cuda (#128741)
Fixes #127908
## Description
Adds documentation for the torch.cuda.cudart function to resolve issue #127908.
I tried to stick to the [guidelines to document a function](https://github.com/pytorch/pytorch/wiki/Docstring-Guidelines#documenting-a-function), but I was not sure whether there is a consensus on how to document a function that delegates to an internal function. So I exercised the function from the user's side to see what it actually raises, and documented that behavior (i.e., I am documenting what _lazy_init() will raise).
This is an updated version of PR #128298, which I had to redo after a mistake in my branch. I apologize for the newbie mistake.
### Summary of Changes
- Added docs for torch.cuda.cudart
- Added the cudart function in the autosummary of docs/source/cuda.rst
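
For context, the error-raising behavior being documented can be sketched roughly like this. This is a minimal illustration only: the `CudaError` class and the exact mapping from status codes to exceptions are assumptions for the sketch, not PyTorch's actual implementation of `torch.cuda.check_error`.

```python
# Hypothetical sketch of a check_error-style helper that turns a nonzero
# CUDA runtime status code into a Python exception. Illustration only;
# torch.cuda.check_error's real implementation may differ.
class CudaError(RuntimeError):
    def __init__(self, code: int) -> None:
        super().__init__(f"CUDA runtime error: code {code}")
        self.code = code


def check_error(res: int) -> None:
    # cudaSuccess is 0; any other status code signals a runtime failure.
    if res != 0:
        raise CudaError(res)


check_error(0)  # success code: no exception raised
try:
    check_error(35)  # a hypothetical nonzero status code
except CudaError as e:
    print(e.code)
```

Documenting the raised exception types this way (from the caller's perspective) is what the docstring's `Raises:` section reflects.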
## Checklist
- [X] The issue that is being fixed is referenced in the description
- [X] Only one issue is addressed in this pull request
- [X] Labels from the issue that this PR is fixing are added to this pull request
- [X] No unnecessary issues are included in this pull request
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128741
Approved by: https://github.com/msaroufim
diff --git a/docs/source/cuda.rst b/docs/source/cuda.rst
index 7b9bf53..7f6f2d2 100644
--- a/docs/source/cuda.rst
+++ b/docs/source/cuda.rst
@@ -12,6 +12,7 @@
current_blas_handle
current_device
current_stream
+ cudart
default_stream
device
device_count
diff --git a/torch/cuda/__init__.py b/torch/cuda/__init__.py
index 6722114..e08572f 100644
--- a/torch/cuda/__init__.py
+++ b/torch/cuda/__init__.py
@@ -334,6 +334,59 @@
def cudart():
+ r"""Retrieve the CUDA runtime API module.
+
+ This function initializes the CUDA runtime environment if it is not already
+ initialized and returns the CUDA runtime API module (_cudart). The CUDA
+ runtime API module provides access to various CUDA runtime functions.
+
+ Returns:
+ module: The CUDA runtime API module (_cudart).
+
+ Raises:
+ RuntimeError: If CUDA cannot be re-initialized in a forked subprocess.
+ AssertionError: If PyTorch is not compiled with CUDA support or if libcudart functions are unavailable.
+
+ Example of CUDA operations with profiling:
+ >>> import torch
+ >>> from torch.cuda import cudart, check_error
+ >>> import os
+ >>>
+ >>> os.environ['CUDA_PROFILE'] = '1'
+ >>>
+ >>> def perform_cuda_operations_with_streams():
+ ...     stream = torch.cuda.Stream()
+ ...     with torch.cuda.stream(stream):
+ ...         x = torch.randn(100, 100, device='cuda')
+ ...         y = torch.randn(100, 100, device='cuda')
+ ...         z = torch.mul(x, y)
+ ...     return z
+ >>>
+ >>> torch.cuda.synchronize()
+ >>> print("====== Start nvprof profiling ======")
+ >>> check_error(cudart().cudaProfilerStart())
+ >>> with torch.autograd.profiler.emit_nvtx():
+ ...     result = perform_cuda_operations_with_streams()
+ ...     print("CUDA operations completed.")
+ >>> check_error(torch.cuda.cudart().cudaProfilerStop())
+ >>> print("====== End nvprof profiling ======")
+
+ To run this example and save the profiling information, execute::
+
+     nvprof --profile-from-start off --csv --print-summary -o trace_name.prof -f -- python cudart_test.py
+
+ This command profiles the CUDA operations in the provided script and saves
+ the profiling information to a file named ``trace_name.prof``.
+ The ``--profile-from-start off`` option ensures that profiling starts only
+ after the ``cudaProfilerStart`` call in the script.
+ The ``--csv`` and ``--print-summary`` options format the profiling output as a
+ CSV file and print a summary, respectively.
+ The ``-o`` option specifies the output file name, and the ``-f`` option forces the
+ overwrite of the output file if it already exists.
+ """
_lazy_init()
return _cudart