[XLA:GPU] Decompose bitcast inside fusion into bitcast+transpose.

Within XLA:GPU fusions, bitcast ops can be either fast (ReshapeIsBitcast is
true) or slow (ReshapeIsBitcast is false).

Previously we tried to avoid creating slow ones by telling algsimp not to
translate transpose ops into bitcasts (such bitcasts generally do not satisfy
ReshapeIsBitcast).

The problem is that if a transpose+bitcast *doesn't* end up as part of a larger
fusion, this approach creates a fusion just for the transpose+bitcast
(tantamount to a memcpy).  If we *had* converted the transpose to a bitcast,
we'd have had bitcast+bitcast => bitcast, and a bitcast outside of a fusion
node is free.

This patch takes us in a different direction.  Now we always transform
transpose to bitcast where possible.  But then right before codegen, we find
all the bitcast ops inside fusion nodes and split them up into
transpose+reshape-is-bitcast.

We have checked empirically that a bitcast can *always* be split into
transpose+reshape-is-bitcast, so this shouldn't make any fusions slower than
they were.
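
As a rough illustration of the split described above, here is a NumPy analogy
(not XLA code, and not the pass this patch adds).  It assumes a column-major
2-D array stands in for an XLA shape with a non-default layout, and that a
NumPy view stands in for a bitcast.  The helper names `physical_order` and
`decompose_bitcast` are hypothetical.

```python
import numpy as np


def physical_order(arr):
    """Elements of a contiguous array in the order they sit in memory."""
    # order="A" picks column-major iteration for Fortran-contiguous arrays and
    # row-major iteration otherwise, which matches memory order for
    # contiguous inputs.
    return np.frombuffer(arr.tobytes(order="A"), dtype=arr.dtype)


def decompose_bitcast(x):
    """Reinterpret a column-major 2-D buffer as flat, in two view-only steps."""
    assert x.ndim == 2 and x.flags.f_contiguous
    # Step 1: transpose.  Only shape and strides change; the result is a
    # row-major view of the same buffer.
    transposed = x.T
    # Step 2: reshape a row-major array.  Memory order is preserved, so this is
    # the analogue of a reshape for which ReshapeIsBitcast holds.
    flat = transposed.reshape(-1)
    return transposed, flat


# Column-major 2x3 array: the analogue of f32[2,3] with layout {0,1}.
x = np.asfortranarray(np.arange(6, dtype=np.float32).reshape(2, 3))
transposed, flat = decompose_bitcast(x)

# Neither step moved data, and the result is the raw buffer read in order.
assert np.shares_memory(x, transposed)
assert np.shares_memory(x, flat)
assert np.array_equal(flat, physical_order(x))

# A direct logical reshape of x is not a reinterpretation: NumPy has to copy.
assert not np.shares_memory(x, x.reshape(-1))
```

In this analogy the transpose absorbs the layout change, and what remains is a
reshape that leaves the element order in memory untouched.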

We leave removing the don't-rewrite-transposes-into-bitcasts feature in algsimp
for a later patch, after this one sticks.

PiperOrigin-RevId: 452191666
10 files changed
tree: 79220b389b018bae9c438686aca530357f2d6392
  1. .github/
  2. tensorflow/
  3. third_party/
  4. tools/
  5. .bazelrc
  6. .bazelversion
  7. .clang-format
  8. .gitignore
  9. .zenodo.json
  10. arm_compiler.BUILD
  11. AUTHORS
  12. BUILD
  13. CITATION.cff
  14. CODE_OF_CONDUCT.md
  15. CODEOWNERS
  16. configure
  17. configure.cmd
  18. configure.py
  19. CONTRIBUTING.md
  20. ISSUE_TEMPLATE.md
  21. ISSUES.md
  22. LICENSE
  23. models.BUILD
  24. README.md
  25. RELEASE.md
  26. SECURITY.md
  27. WORKSPACE
README.md

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.

TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence Research organization to conduct machine learning and deep neural networks research. The system is general enough to be applicable in a wide variety of other domains, as well.

TensorFlow provides stable Python and C++ APIs, as well as a non-guaranteed backward-compatible API for other languages.

Keep up-to-date with release announcements and security updates by subscribing to announce@tensorflow.org. See all the mailing lists.

Install

See the TensorFlow install guide for how to install the pip package, enable GPU support, use a Docker container, and build from source.

To install the current release, which includes support for CUDA-enabled GPU cards (Ubuntu and Windows):

$ pip install tensorflow

A smaller CPU-only package is also available:

$ pip install tensorflow-cpu

To update TensorFlow to the latest version, add the --upgrade flag to the above commands.

Nightly binaries are available for testing using the tf-nightly and tf-nightly-cpu packages on PyPI.

Try your first TensorFlow program

$ python
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'

For more examples, see the TensorFlow tutorials.

Contribution guidelines

If you want to contribute to TensorFlow, be sure to review the contribution guidelines. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.

We use GitHub issues for tracking requests and bugs. Please see TensorFlow Discuss for general questions and discussion, and direct specific questions to Stack Overflow.

The TensorFlow project strives to abide by generally accepted best practices in open-source software development:

Fuzzing Status, CII Best Practices, Contributor Covenant

Continuous build status

You can find more community-supported platforms and configurations in the TensorFlow SIG Build community builds table.

Official Builds

| Build Type                | Status                         | Artifacts                    |
| ------------------------- | ------------------------------ | ---------------------------- |
| Linux CPU                 | Status                         | PyPI                         |
| Linux GPU                 | Status                         | PyPI                         |
| Linux XLA                 | Status                         | TBA                          |
| macOS                     | Status                         | PyPI                         |
| Windows CPU               | Status                         | PyPI                         |
| Windows GPU               | Status                         | PyPI                         |
| Android                   | Status                         | Download                     |
| Raspberry Pi 0 and 1      | Status                         | Py3                          |
| Raspberry Pi 2 and 3      | Status                         | Py3                          |
| Libtensorflow MacOS CPU   | Status Temporarily Unavailable | Nightly Binary, Official GCS |
| Libtensorflow Linux CPU   | Status Temporarily Unavailable | Nightly Binary, Official GCS |
| Libtensorflow Linux GPU   | Status Temporarily Unavailable | Nightly Binary, Official GCS |
| Libtensorflow Windows CPU | Status Temporarily Unavailable | Nightly Binary, Official GCS |
| Libtensorflow Windows GPU | Status Temporarily Unavailable | Nightly Binary, Official GCS |

Resources

Learn more about the TensorFlow community and how to contribute.

License

Apache License 2.0