PR #36267: [ROCm] Reverting ROCm to use MIOpen Find Mode APIs (by default) for convolution

Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/36267

This PR reverts ROCm to use MIOpen Find Mode APIs (by default) for convolution. The MIOpen Immediate Mode API can be used instead of the Find Mode APIs by setting the env var `TF_ROCM_USE_IMMEDIATE_MODE=1`.
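
As a minimal sketch of how the switch above might be used from a launcher script: the helper name `rocm_conv_env` and the script name `train.py` are hypothetical and not part of this PR. The sketch assumes the variable is read from the process environment, so it sets the variable before the TensorFlow process starts rather than after import.

```python
import os


def rocm_conv_env(use_immediate_mode, base_env=None):
    """Return a copy of the environment with the MIOpen mode switch applied.

    Hypothetical helper: only the variable name TF_ROCM_USE_IMMEDIATE_MODE
    comes from the PR description; everything else is illustrative.
    """
    env = dict(os.environ if base_env is None else base_env)
    if use_immediate_mode:
        # Opt in to the MIOpen Immediate Mode API.
        env["TF_ROCM_USE_IMMEDIATE_MODE"] = "1"
    else:
        # Leave the variable unset so the default (Find Mode) applies.
        env.pop("TF_ROCM_USE_IMMEDIATE_MODE", None)
    return env


# Example usage (train.py is a placeholder script name):
#   import subprocess, sys
#   subprocess.run([sys.executable, "train.py"], env=rocm_conv_env(True))
```

Passing the environment to a child process avoids relying on when, within the process lifetime, the variable is read.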

Almost all of the changes in this PR are within code that is specific to the ROCm platform, so this PR should not have any impact on non-ROCm builds.

----------------

/cc @chsigg @whchung
Copybara import of the project:

--
5675e37e5f9b595dab45f44239cbfab222e9dcc2 by Deven Desai <deven.desai.amd@gmail.com>:

Renaming MIMIC_FIND_MODE to RETURN_BEST_ALGO_ONLY. This is being done as preparation for the implementation to re-insert calls to Find Mode API. MIMIC_FIND_MODE was a poor name for what it was doing, and would have resulted in confusion once Find Mode APIs are re-inserted. This commit also simplifies the implementation associated with RETURN_BEST_ALGO_ONLY

--
5fe0ad377dc7e333acf8aac91e3333781242fe5c by Deven Desai <deven.desai.amd@gmail.com>:

Changes to fix compile-time warnings in rocm_dnn.cc

--
e3dcc169353646c4b5e684b7398cf1db743079cb by Deven Desai <deven.desai.amd@gmail.com>:

Making the implementation of the Conv3D Gradient kernels consistent with the implementations of all the other Conv2D/3D kernels

--
4d4a5cede3b6e959fcc06fdb6211e4c9ef5343f5 by Deven Desai <deven.desai.amd@gmail.com>:

Updating the convolution kernel implementation(s) to ensure that the AlgorithmConfig::scratch_size field is always populated correctly before it is passed as an argument to the ThenConvolve* routine(s)

--
64ffda476af322ad804d1f5b8d7a05719e2f183c by Deven Desai <deven.desai.amd@gmail.com>:

Using the workspace memory size from the AlgorithmConfig argument, instead of calling an MIOpen API to determine it (during the call to DoPrepareForConvolution)

--
d42a76e177a26124e966c83cbbb809dbdbdcabbe by Deven Desai <deven.desai.amd@gmail.com>:

Updating the ROCm XLA Convolution Algorithm Picker implementation to use the scratch_size that was returned in the prior call to GetMIOpenAlgorithms. Note that the code to save the scratch_size information in the new custom-call instruction (once the best conv algorithm has been determined) already exists; this commit does not change that part at all. This commit modifies how the scratch_size is determined for the RunGpuConv calls that happen while determining the best algorithm for a given convolution
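
The idea described in this commit can be illustrated with a small sketch, which is not the actual XLA code: each candidate carries the scratch size reported when the algorithms were enumerated, and that same value is reused for the trial run instead of being queried again. The names `AlgoCandidate`, `pick_best_algorithm`, and `run_conv` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class AlgoCandidate:
    """Hypothetical record: an algorithm id plus the scratch size (in bytes)
    that was reported for it when the candidates were enumerated."""
    algo_id: int
    scratch_size: int


def pick_best_algorithm(
    candidates: Iterable[AlgoCandidate],
    run_conv: Callable[[int, int], Optional[float]],
) -> Optional[AlgoCandidate]:
    """Time each candidate and return the fastest one.

    run_conv(algo_id, scratch_size) is an assumed callback that returns the
    elapsed time, or None if the trial run failed.
    """
    best = None
    best_time = None
    for cand in candidates:
        # Reuse the scratch size that came with the candidate; no re-query.
        elapsed = run_conv(cand.algo_id, cand.scratch_size)
        if elapsed is None:
            continue  # skip candidates whose trial run failed
        if best_time is None or elapsed < best_time:
            best, best_time = cand, elapsed
    return best
```

Because the returned candidate still holds its scratch size, the caller can record both values together once the best algorithm is known.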

--
416aeccbfc430c71b27cbe04a57dcd1577b34fae by Deven Desai <deven.desai.amd@gmail.com>:

Changes for the TF Convolution Kernel implementation and the Stream Executor DNN layer/API to accommodate support for Find Mode. Putting in empty placeholders in places where the Find Mode implementation will live

--
253664ce7ee59bb2ffbc2b4b3fe94963e54837c1 by Deven Desai <deven.desai.amd@gmail.com>:

Re-inserting the Find Mode Implementation. It is still disabled by default

--
30debc7b11afdbc1651c860b65cdd2fba1b9ba50 by Deven Desai <deven.desai.amd@gmail.com>:

Switching the default to Find Mode

--
b0b670e6ee2eaa6823618d4aa8858846a4cbbd89 by Deven Desai <deven.desai.amd@gmail.com>:

Disabling a subtest that fails because of a bug in MIOpen Find Mode. MLOpen Issue #2379 has been filed to track the bug.

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/tensorflow/pull/36267 from ROCmSoftwarePlatform:google_upstream_rocm_miopen_find_mode b0b670e6ee2eaa6823618d4aa8858846a4cbbd89
PiperOrigin-RevId: 305424670
Change-Id: Ibd02cd2c43f88e619bd77e996614ded0d96d42d5
16 files changed
tree: e07a5614e258b09d7ebfb3ee9293eae1756c52a3
  1. .github/
  2. tensorflow/
  3. third_party/
  4. tools/
  5. .bazelrc
  6. .bazelversion
  7. .gitignore
  8. ACKNOWLEDGMENTS
  9. ADOPTERS.md
  10. arm_compiler.BUILD
  11. AUTHORS
  12. BUILD
  13. CODE_OF_CONDUCT.md
  14. CODEOWNERS
  15. configure
  16. configure.cmd
  17. configure.py
  18. CONTRIBUTING.md
  19. ISSUE_TEMPLATE.md
  20. ISSUES.md
  21. LICENSE
  22. models.BUILD
  23. README.md
  24. RELEASE.md
  25. SECURITY.md
  26. WORKSPACE
README.md
Documentation

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.

TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence Research organization to conduct machine learning and deep neural networks research. The system is general enough to be applicable in a wide variety of other domains, as well.

TensorFlow provides stable Python and C++ APIs, as well as APIs for other languages that carry no backward-compatibility guarantee.

Keep up-to-date with release announcements and security updates by subscribing to announce@tensorflow.org. See all the mailing lists.

Install

See the TensorFlow install guide for the pip package, to enable GPU support, use a Docker container, and build from source.

To install the current release, which includes support for CUDA-enabled GPU cards (Ubuntu and Windows):

$ pip install tensorflow

A smaller CPU-only package is also available:

$ pip install tensorflow-cpu

To update TensorFlow to the latest version, add the --upgrade flag to the above commands.

Nightly binaries are available for testing using the tf-nightly and tf-nightly-cpu packages on PyPI.

Try your first TensorFlow program

$ python
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'

For more examples, see the TensorFlow tutorials.

Contribution guidelines

If you want to contribute to TensorFlow, be sure to review the contribution guidelines. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.

We use GitHub issues for tracking requests and bugs. Please see TensorFlow Discuss for general questions and discussion, and direct specific questions to Stack Overflow.

The TensorFlow project strives to abide by generally accepted best practices in open-source software development:

CII Best Practices Contributor Covenant

Continuous build status

Official Builds

| Build Type | Status | Artifacts |
| --- | --- | --- |
| Linux CPU | Status | PyPI |
| Linux GPU | Status | PyPI |
| Linux XLA | Status | TBA |
| macOS | Status | PyPI |
| Windows CPU | Status | PyPI |
| Windows GPU | Status | PyPI |
| Android | Status | Download |
| Raspberry Pi 0 and 1 | Status Status | Py2 Py3 |
| Raspberry Pi 2 and 3 | Status Status | Py2 Py3 |

Community Supported Builds

| Build Type | Status | Artifacts |
| --- | --- | --- |
| Linux AMD ROCm GPU Nightly | Build Status | Nightly |
| Linux AMD ROCm GPU Stable Release | Build Status | Release 1.15 / 2.x |
| Linux s390x Nightly | Build Status | Nightly |
| Linux s390x CPU Stable Release | Build Status | Release |
| Linux ppc64le CPU Nightly | Build Status | Nightly |
| Linux ppc64le CPU Stable Release | Build Status | Release 1.15 / 2.x |
| Linux ppc64le GPU Nightly | Build Status | Nightly |
| Linux ppc64le GPU Stable Release | Build Status | Release 1.15 / 2.x |
| Linux CPU with Intel® MKL-DNN Nightly | Build Status | Nightly |
| Linux CPU with Intel® MKL-DNN Stable Release | Build Status | Release 1.15 / 2.x |
| Red Hat® Enterprise Linux® 7.6 CPU & GPU, Python 2.7, 3.6 | Build Status | 1.13.1 PyPI |

Resources

Learn more about the TensorFlow community and how to contribute.

License

Apache License 2.0