f9c651943d4 - platform/external/tensorflow

commit	f9c651943d442f98d4c7c36839bde70f27a9a135
author	Ben Barsdell <benbarsdell@gmail.com>	Wed Jun 09 23:17:13 2021 -0700
committer	TensorFlower Gardener <gardener@tensorflow.org>	Wed Jun 09 23:25:49 2021 -0700
tree	2acd0cc131a34249e41fee8d02546e1ccbc9057e
parent	8f055c74296dc9bfc6b3d3085541a037b4453ec3

PR #49104: Add GPU implem of sparse segment reduction ops [resubmission]

Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49104

This is a resubmission of https://github.com/tensorflow/tensorflow/pull/47974 that includes an additional commit to (attempt to) work around errors in the Windows CI build.

cc @nluehr @sanjoy
Copybara import of the project:

--
2fb8e1d5f0161a23fb733c4223e503709b5da59b by Ben Barsdell <bbarsdell@nvidia.com>:

Add GpuSegmentedReduce helper function and test

--
9da103adcb02b1e8c71227e78adc440676c2eca9 by Ben Barsdell <bbarsdell@nvidia.com>:

Add constructors and operators to AlignedVector

--
2fca9d09177655b5a9accde6564f7555a9417545 by Ben Barsdell <bbarsdell@nvidia.com>:

Add GPU implem of sparse segment reduction ops

- This implementation is deterministic, vectorized, and generally faster
  than an atomic-based implementation like the existing segment
  reduction kernels (it can in fact replace them in a future commit).
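
To make the operation concrete, here is a minimal pure-Python sketch of what a sparse segment sum computes, assuming the usual semantics (`output[segment_ids[i]] += data[indices[i]]`, with `segment_ids` sorted). The function name and the run-by-run loop are illustrative assumptions, not the GPU kernel from this commit. The sketch only shows why reducing each contiguous run of equal segment ids in a fixed order gives a reproducible result, whereas unordered atomic additions may not for floating-point data.

```python
def sparse_segment_sum(data, indices, segment_ids, num_segments=None):
    """Reference semantics: output[segment_ids[i]] += data[indices[i]]."""
    if len(indices) != len(segment_ids):
        raise ValueError("indices and segment_ids must have the same length")
    if any(a > b for a, b in zip(segment_ids, segment_ids[1:])):
        raise ValueError("segment_ids must be sorted")

    ninner = len(data[0]) if data else 0
    if num_segments is None:
        # Assumed default: one output row per id up to the last segment id.
        num_segments = segment_ids[-1] + 1 if segment_ids else 0
    output = [[0.0] * ninner for _ in range(num_segments)]

    # Sorted ids mean each segment is one contiguous run [begin, end).
    # Each run is summed by a single loop in a fixed order, so the result
    # does not depend on scheduling (unlike unordered atomic adds).
    begin = 0
    while begin < len(segment_ids):
        end = begin
        while end < len(segment_ids) and segment_ids[end] == segment_ids[begin]:
            end += 1
        row = output[segment_ids[begin]]
        for i in range(begin, end):
            src = data[indices[i]]
            for j in range(ninner):
                row[j] += src[j]
        begin = end
    return output


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = sparse_segment_sum(data, indices=[0, 2, 1], segment_ids=[0, 0, 1])
# Segment 0 sums rows 0 and 2; segment 1 takes row 1.
```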

--
c148fbc349e88075a7d2bd0ad23477603f1b99cf by Ben Barsdell <bbarsdell@nvidia.com>:

Fix minor CI issues in PR 47974

- Reformat long line in Python test file.
- Add GPU compilation guard around GPU class specialization.

--
b56a1318b6d89d337cb253cbaecf6321089664af by Ben Barsdell <bbarsdell@nvidia.com>:

Attempt to fix Windows build error in PR 47974

- Pass vector types to function by reference instead of by value.
- Error was: "error C2719: formal parameter with requested alignment of
  128 won't be aligned".

--
00ef95411f0ec9e92b9302588a58bd6de1de745f by Ben Barsdell <bbarsdell@nvidia.com>:

Further attempt to fix CI build errors in PR 47974

- Avoids implicit conversion from Tinit to Tvec in kernel argument
  (the ROCm build complains about this).
- Avoids passing an aligned vector type by value to a kernel (the
  Windows build doesn't like this).

--
355e18ac3c03fa591464739307e7ffb56011e5a1 by Ben Barsdell <bbarsdell@nvidia.com>:

Change SetToValue to support explicit conversion

- This avoids the need for SetToValueVectorized.

--
4db8bcf7ab03b40a9b8222f592c658fd6469eb91 by Ben Barsdell <bbarsdell@nvidia.com>:

Fix SparseSegmentReduction num_segments

- Handles int32 or int64 num_segments.
- Changes variable names from output_rows to last_segment_id.

--
d00c25525c73de3a22cefd57aeb0cff2ef235614 by Ben Barsdell <bbarsdell@nvidia.com>:

Apply GPU kernel suggestions from PR 47974

- Renames `smem` to `shared_memory`.
- Renames `beg` to `begin`.
- Renames `yy` to `y_offset`.
- Pulls GPU block reduction out into a helper device function.

--
de97fe3fb11456f7bade0c1b3e9a510373e41ff6 by Ben Barsdell <bbarsdell@nvidia.com>:

Add __host__ __device__ to AlignedVector operator[]

--
336f2267322fe779e53d2d61ef13e8779c13e370 by Ben Barsdell <bbarsdell@nvidia.com>:

Address minor review comments from PR 47974

- Add comment noting that function is deterministic.
- Rename `shared_memory` to `shared_partial_reduction`.

--
0d6eb654b874a4a1b94e78ecfc120739f72540a2 by Ben Barsdell <bbarsdell@nvidia.com>:

Address more minor review comments from PR 47974

- Add parens for clarity.
- Add comment re Tvec vs. Treducevec.
- Add comment re the size of the input_vec array.
- Remove duplicate early-exit case.
- Remove unneeded "Crash OK" comments.

--
6143217de83b532de2625448ab6bf37fbba734b5 by Ben Barsdell <bbarsdell@nvidia.com>:

Refactor and tune SegmentReduceGPU special case

- Pulls the special ninner==1 implementation out into a separate helper
  function.
- Adds a heuristic threshold for deciding when to use the special case.
  This cub-based implementation can be much faster for large reductions,
  but can also be slower for small reductions. This is only a first pass
  at tuning the implementation; a more detailed performance study may be
  needed in future (e.g., over more of the parameter space, different
  GPUs etc.), particularly if motivated by important applications.
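
The dispatch described above might look roughly like the following sketch. The threshold value, the use of mean segment size as the measure of a "large" reduction, and all names are assumptions for illustration; the commit message does not state the actual heuristic.

```python
# Hypothetical cutoff; the real value is not given in the commit message.
ASSUMED_MIN_MEAN_SEGMENT_SIZE = 64


def choose_reduction_path(ninner, nouter, nsegments,
                          threshold=ASSUMED_MIN_MEAN_SEGMENT_SIZE):
    """Return "special" for the ninner == 1 path, otherwise "general"."""
    # The special case only applies when each row holds a single element.
    if ninner != 1 or nsegments == 0:
        return "general"
    # Assumed measure of reduction size: average elements per segment.
    mean_segment_size = nouter / nsegments
    return "special" if mean_segment_size >= threshold else "general"
```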

--
b4d2db3c40b1071ae9d1954671abdedd0967a595 by Ben Barsdell <bbarsdell@nvidia.com>:

Disable SparseSegmentReduction kernels on Windows

- Workaround for a Windows CI build error.

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/tensorflow/pull/49104 from benbarsdell:gpu-SparseSegmentReductions-new b4d2db3c40b1071ae9d1954671abdedd0967a595
PiperOrigin-RevId: 378582882
Change-Id: I24febef16f7633362f8964e9bc6e1064ede56cc6

11 files changed

README.md

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.

TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence Research organization to conduct machine learning and deep neural networks research. The system is general enough to be applicable in a wide variety of other domains, as well.

TensorFlow provides stable Python and C++ APIs, as well as APIs for other languages that carry no backward-compatibility guarantee.

Keep up-to-date with release announcements and security updates by subscribing to announce@tensorflow.org. See all the mailing lists.

Install

See the TensorFlow install guide for instructions on installing the pip package, enabling GPU support, using a Docker container, and building from source.

To install the current release, which includes support for CUDA-enabled GPU cards (Ubuntu and Windows):

$ pip install tensorflow

A smaller CPU-only package is also available:

$ pip install tensorflow-cpu

To update TensorFlow to the latest version, add the `--upgrade` flag to the commands above.

Nightly binaries are available for testing using the tf-nightly and tf-nightly-cpu packages on PyPI.

Try your first TensorFlow program

$ python

>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'

For more examples, see the TensorFlow tutorials.

Contribution guidelines

If you want to contribute to TensorFlow, be sure to review the contribution guidelines. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.

We use GitHub issues for tracking requests and bugs. Please see TensorFlow Discuss for general questions and discussion, and direct specific questions to Stack Overflow.

The TensorFlow project strives to abide by generally accepted best practices in open-source software development.

Continuous build status

You can find more community-supported platforms and configurations in the TensorFlow SIG Build community builds table.

Official Builds

Build Type	Status	Artifacts
Linux CPU		PyPI
Linux GPU		PyPI
Linux XLA		TBA
macOS		PyPI
Windows CPU		PyPI
Windows GPU		PyPI
Android
Raspberry Pi 0 and 1		Py3
Raspberry Pi 2 and 3		Py3
Libtensorflow MacOS CPU	Status Temporarily Unavailable	Nightly Binary Official GCS
Libtensorflow Linux CPU	Status Temporarily Unavailable	Nightly Binary Official GCS
Libtensorflow Linux GPU	Status Temporarily Unavailable	Nightly Binary Official GCS
Libtensorflow Windows CPU	Status Temporarily Unavailable	Nightly Binary Official GCS
Libtensorflow Windows GPU	Status Temporarily Unavailable	Nightly Binary Official GCS

Resources

Learn more about the TensorFlow community and how to contribute.

License

Apache License 2.0