[ROCm] Work around an LLVM crash during codegen for the MLIR-generated Cast kernel

When you enable (uncomment) the `gen_kernel_library` rule for `cast` in `tensorflow/core/kernels/mlir_generated/BUILD`, the build fails on the ROCm platform with the following error:

```
LLVM ERROR: Cannot select: 0x56134e3c5b10: i1 = fp_to_sint 0x56134d1f53d8
  0x56134d1f53d8: f16 = bitcast 0x56134e3c58a0
    0x56134e3c58a0: i16,ch = load<(load 2 from %ir.lsr.iv)> 0x56134eaa3788, 0x56134e3c5698, undef:i64
      0x56134e3c5698: i64,ch = CopyFromReg 0x56134eaa3788, Register:i64 %16
        0x56134d41d620: i64 = Register %16
      0x56134d1f5850: i64 = undef
In function: Cast_f16_i1_kernel
TensorFlow crashed, please file a bug on https://github.com/tensorflow/tensorflow/issues with the trace below.
Stack dump:
0.	Program arguments: bazel-out/host/bin/tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel --unroll_factors=4 --tile_sizes=256 --arch=gfx803,gfx900,gfx906,gfx908 --input=bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/cast_f16_i1.mlir --output=bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/cast_f16_i1_kernel_generator_kernel.o --enable_ftz=False
1.	2.	Running pass 'CallGraph Pass Manager' on module 'acme'.
3.	Running pass 'AMDGPU DAG->DAG Pattern Instruction Selection' on function '@Cast_f16_i1_kernel'
...
...
```

Jack (@whchung) identified the root cause of this crash: the AMDGPU LLVM backend lacks support for the FPTOSI instruction from fp16 to i1. The correct fix is to add that support to the AMDGPU LLVM backend, and Jack is in the process of upstreaming that fix to the LLVM repo.

In the meantime (i.e. until the TF LLVM commit pointer is updated to point to a commit that includes Jack's fix), we need to work around this on the TF side. This commit does so by adding a pass that converts the `fptosi f16 to i1` op to `fptosi f16 to i16` + `trunci i16 to i1`.
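
To make the effect of the rewritten op pair concrete, here is a minimal Python sketch that models `fptosi f16 to i16` followed by `trunci i16 to i1` on scalar values. It is an illustration only, not the pass itself: the function names are hypothetical, it assumes finite inputs that fit in i16, and it assumes `trunci` keeps the low bit of the integer.

```python
import struct


def round_to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fptosi_f16_to_i16(x: float) -> int:
    """Model `fptosi f16 to i16` (hypothetical helper): truncate toward zero."""
    value = int(round_to_f16(x))  # int() truncates toward zero
    if not -(1 << 15) <= value < (1 << 15):
        # Out-of-range conversions are not modeled in this sketch.
        raise ValueError("value does not fit in i16")
    return value


def trunci_i16_to_i1(v: int) -> int:
    """Model `trunci i16 to i1` (hypothetical helper): keep only the low bit."""
    return v & 1


def lowered_cast_f16_to_i1(x: float) -> int:
    """Apply the two ops in the order the workaround emits them."""
    return trunci_i16_to_i1(fptosi_f16_to_i16(x))


if __name__ == "__main__":
    for sample in (0.0, 0.5, 1.0, 2.0, 3.7, -1.0):
        print(sample, "->", lowered_cast_f16_to_i1(sample))
```

Under these assumptions the result is the low bit of the truncated integer, so an even input such as 2.0 maps to 0. The sketch models only the op pair named above and makes no claim about the semantics of the TensorFlow Cast op as a whole.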
1 file changed
tree: e20307f8df175b013b52c05dbf3e7fc16069a4e5
README.md

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.

TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence Research organization to conduct machine learning and deep neural networks research. The system is general enough to be applicable in a wide variety of other domains, as well.

TensorFlow provides stable Python and C++ APIs, as well as APIs for other languages that come without a backward-compatibility guarantee.

Keep up-to-date with release announcements and security updates by subscribing to announce@tensorflow.org. See all the mailing lists.

Install

See the TensorFlow install guide for how to install the pip package, enable GPU support, use a Docker container, or build from source.

To install the current release, which includes support for CUDA-enabled GPU cards (Ubuntu and Windows):

$ pip install tensorflow

A smaller CPU-only package is also available:

$ pip install tensorflow-cpu

To update TensorFlow to the latest version, add the `--upgrade` flag to the above commands.

Nightly binaries are available for testing using the tf-nightly and tf-nightly-cpu packages on PyPI.

Try your first TensorFlow program

$ python
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'

For more examples, see the TensorFlow tutorials.

Contribution guidelines

If you want to contribute to TensorFlow, be sure to review the contribution guidelines. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.

We use GitHub issues for tracking requests and bugs. Please see TensorFlow Discuss for general questions and discussion, and direct specific questions to Stack Overflow.

The TensorFlow project strives to abide by generally accepted best practices in open-source software development:

Fuzzing Status CII Best Practices Contributor Covenant

Continuous build status

Official Builds

| Build Type | Status | Artifacts |
| --- | --- | --- |
| Linux CPU | Status | PyPI |
| Linux GPU | Status | PyPI |
| Linux XLA | Status | TBA |
| macOS | Status | PyPI |
| Windows CPU | Status | PyPI |
| Windows GPU | Status | PyPI |
| Android | Status | Download |
| Raspberry Pi 0 and 1 | Status | Py3 |
| Raspberry Pi 2 and 3 | Status | Py3 |
| Libtensorflow MacOS CPU | Status | Nightly GCS, Official GCS |
| Libtensorflow Linux CPU | Status | Nightly GCS, Official GCS |
| Libtensorflow Linux GPU | Status | Nightly GCS, Official GCS |
| Libtensorflow Windows CPU | Status | Nightly GCS, Official GCS |
| Libtensorflow Windows GPU | Status | Nightly GCS, Official GCS |

Community Supported Builds

| Build Type | Status | Artifacts |
| --- | --- | --- |
| Linux AMD ROCm GPU Nightly | Build Status | Nightly |
| Linux AMD ROCm GPU Stable Release | Build Status | Release 1.15 / 2.x |
| Linux s390x Nightly | Build Status | Nightly |
| Linux s390x CPU Stable Release | Build Status | Release |
| Linux ppc64le CPU Nightly | Build Status | Nightly |
| Linux ppc64le CPU Stable Release | Build Status | Release 1.15 / 2.x |
| Linux ppc64le GPU Nightly | Build Status | Nightly |
| Linux ppc64le GPU Stable Release | Build Status | Release 1.15 / 2.x |
| Linux aarch64 CPU Nightly (Linaro) | Build Status | Nightly |
| Linux aarch64 CPU Stable Release (Linaro) | Build Status | Release 1.x & 2.x |
| Linux aarch64 CPU Nightly (OpenLab), Python 3.6 | Build Status | Nightly |
| Linux aarch64 CPU Stable Release (OpenLab) | Build Status | Release 1.15 / 2.x |
| Linux CPU with Intel oneAPI Deep Neural Network Library (oneDNN) Nightly | Build Status | Nightly |
| Linux CPU with Intel oneAPI Deep Neural Network Library (oneDNN) Stable Release | Build Status | Release 1.15 / 2.x |
| Red Hat® Enterprise Linux® 7.6 CPU & GPU, Python 2.7, 3.6 | Build Status | 1.13.1 PyPI |

Community Supported Containers

| Container Type | Status | Artifacts |
| --- | --- | --- |
| TensorFlow aarch64 Neoverse-N1 CPU Stable (Linaro), Debian | Static | Release 2.3 |

Resources

Learn more about the TensorFlow community and how to contribute.

License

Apache License 2.0