commit | f9c651943d442f98d4c7c36839bde70f27a9a135 | |
---|---|---|
author | Ben Barsdell <benbarsdell@gmail.com> | Wed Jun 09 23:17:13 2021 -0700 |
committer | TensorFlower Gardener <gardener@tensorflow.org> | Wed Jun 09 23:25:49 2021 -0700 |
tree | 2acd0cc131a34249e41fee8d02546e1ccbc9057e | |
parent | 8f055c74296dc9bfc6b3d3085541a037b4453ec3 | |
PR #49104: Add GPU implem of sparse segment reduction ops [resubmission]

Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49104

This is a resubmission of https://github.com/tensorflow/tensorflow/pull/47974 that includes an additional commit to (attempt to) workaround errors in the Windows CI build.

cc @nluehr @sanjoy

Copybara import of the project:

--
2fb8e1d5f0161a23fb733c4223e503709b5da59b by Ben Barsdell <bbarsdell@nvidia.com>:

Add GpuSegmentedReduce helper function and test

--
9da103adcb02b1e8c71227e78adc440676c2eca9 by Ben Barsdell <bbarsdell@nvidia.com>:

Add constructors and operators to AlignedVector

--
2fca9d09177655b5a9accde6564f7555a9417545 by Ben Barsdell <bbarsdell@nvidia.com>:

Add GPU implem of sparse segment reduction ops

- This implementation is deterministic, vectorized, and generally faster than an atomic-based implementation like the existing segment reduction kernels (it can in fact replace them in a future commit).

--
c148fbc349e88075a7d2bd0ad23477603f1b99cf by Ben Barsdell <bbarsdell@nvidia.com>:

Fix minor CI issues in PR 47974

- Reformat long line in Python test file.
- Add GPU compilation guard around GPU class specialization.

--
b56a1318b6d89d337cb253cbaecf6321089664af by Ben Barsdell <bbarsdell@nvidia.com>:

Attempt to fix Windows build error in PR 47974

- Pass vector types to function by reference instead of by value.
- Error was: "error C2719: formal parameter with requested alignment of 128 won't be aligned".

--
00ef95411f0ec9e92b9302588a58bd6de1de745f by Ben Barsdell <bbarsdell@nvidia.com>:

Further attempt to fix CI build errors in PR 47974

- Avoids implicit conversion from Tinit to Tvec in kernel argument (the ROCm build complains about this).
- Avoids passing an aligned vector type by value to a kernel (the Windows build doesn't like this).

--
355e18ac3c03fa591464739307e7ffb56011e5a1 by Ben Barsdell <bbarsdell@nvidia.com>:

Change SetToValue to support explicit conversion

- This avoids the need for SetToValueVectorized.

--
4db8bcf7ab03b40a9b8222f592c658fd6469eb91 by Ben Barsdell <bbarsdell@nvidia.com>:

Fix SparseSegmentReduction num_segments

- Handles int32 or int64 num_segments.
- Changes variable names from output_rows to last_segment_id.

--
d00c25525c73de3a22cefd57aeb0cff2ef235614 by Ben Barsdell <bbarsdell@nvidia.com>:

Apply GPU kernel suggestions from PR 47974

- Renames `smem` to `shared_memory`.
- Renames `beg` to `begin`.
- Renames `yy` to `y_offset`.
- Pulls GPU block reduction out into a helper device function.

--
de97fe3fb11456f7bade0c1b3e9a510373e41ff6 by Ben Barsdell <bbarsdell@nvidia.com>:

Add __host__ __device__ to AlignedVector operator[]

--
336f2267322fe779e53d2d61ef13e8779c13e370 by Ben Barsdell <bbarsdell@nvidia.com>:

Address minor review comments from PR 47974

- Add comment noting that function is deterministic.
- Rename `shared_memory` to `shared_partial_reduction`.

--
0d6eb654b874a4a1b94e78ecfc120739f72540a2 by Ben Barsdell <bbarsdell@nvidia.com>:

Address more minor review comments from PR 47974

- Add parens for clarity.
- Add comment re Tvec vs. Treducevec.
- Add comment re the size of the input_vec array.
- Remove duplicate early-exit case.
- Remove unneeded "Crash OK" comments.

--
6143217de83b532de2625448ab6bf37fbba734b5 by Ben Barsdell <bbarsdell@nvidia.com>:

Refactor and tune SegmentReduceGPU special case

- Pulls the special ninner==1 implementation out into a separate helper function.
- Adds a heuristic threshold for deciding when to use the special case. This cub-based implementation can be much faster for large reductions, but can also be slower for small reductions. This is only a first pass at tuning the implementation; a more detailed performance study may be needed in future (e.g., over more of the parameter space, different GPUs etc.), particularly if motivated by important applications.

--
b4d2db3c40b1071ae9d1954671abdedd0967a595 by Ben Barsdell <bbarsdell@nvidia.com>:

Disable SparseSegmentReduction kernels on Windows

- Workaround for a Windows CI build error.

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/tensorflow/pull/49104 from benbarsdell:gpu-SparseSegmentReductions-new b4d2db3c40b1071ae9d1954671abdedd0967a595
PiperOrigin-RevId: 378582882
Change-Id: I24febef16f7633362f8964e9bc6e1064ede56cc6
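For readers unfamiliar with the ops this commit targets, the following is a minimal pure-Python sketch of what a sparse segment sum computes. It assumes the semantics documented for `tf.sparse.segment_sum`: each row `data[indices[i]]` is added into output row `segment_ids[i]`, `segment_ids` is sorted, and segments that receive no rows stay zero. The function name and structure are illustrative only and are not taken from the commit, whose actual change is a vectorized GPU kernel.

```python
from typing import List, Optional, Sequence


def sparse_segment_sum(
    data: Sequence[Sequence[float]],
    indices: Sequence[int],
    segment_ids: Sequence[int],
    num_segments: Optional[int] = None,
) -> List[List[float]]:
    """Reference semantics only: output[segment_ids[i]] += data[indices[i]]."""
    if len(indices) != len(segment_ids):
        raise ValueError("indices and segment_ids must have the same length")
    if any(a > b for a, b in zip(segment_ids, segment_ids[1:])):
        raise ValueError("segment_ids must be sorted in non-decreasing order")

    # Without an explicit num_segments, the output has last_segment_id + 1 rows.
    last_segment_id = segment_ids[-1] if len(segment_ids) else -1
    rows = last_segment_id + 1 if num_segments is None else num_segments
    if last_segment_id >= rows:
        raise ValueError("segment id out of range for num_segments")
    ncols = len(data[0]) if len(data) else 0
    output = [[0.0] * ncols for _ in range(rows)]

    # Rows are accumulated in a fixed (input) order, so repeated runs give
    # bit-identical results. Floating-point addition is not associative,
    # which is why an order-dependent atomic add can vary between runs.
    for index, segment in zip(indices, segment_ids):
        for col in range(ncols):
            output[segment][col] += data[index][col]
    return output


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Rows 0 and 2 go to segment 0, nothing goes to segment 1, row 1 goes to segment 2.
print(sparse_segment_sum(data, indices=[0, 2, 1], segment_ids=[0, 0, 2]))
# prints [[6.0, 8.0], [0.0, 0.0], [3.0, 4.0]]
```

Passing `num_segments` pads the output with zero rows beyond the last segment id, which mirrors the `num_segments` handling mentioned in the commit list above.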
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications.
TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence Research organization to conduct machine learning and deep neural networks research. The system is general enough to be applicable in a wide variety of other domains, as well.
TensorFlow provides stable Python and C++ APIs, as well as APIs for other languages that come without a backward-compatibility guarantee.
Keep up-to-date with release announcements and security updates by subscribing to announce@tensorflow.org. See all the mailing lists.
See the TensorFlow install guide for how to install the pip package, enable GPU support, use a Docker container, and build from source.
To install the current release, which includes support for CUDA-enabled GPU cards (Ubuntu and Windows):
$ pip install tensorflow
A smaller CPU-only package is also available:
$ pip install tensorflow-cpu
To update TensorFlow to the latest version, add the --upgrade flag to the above commands.
Nightly binaries are available for testing using the tf-nightly and tf-nightly-cpu packages on PyPI.
$ python
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'
For more examples, see the TensorFlow tutorials.
If you want to contribute to TensorFlow, be sure to review the contribution guidelines. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.
We use GitHub issues for tracking requests and bugs. Please see TensorFlow Discuss for general questions and discussion, and direct specific questions to Stack Overflow.
The TensorFlow project strives to abide by generally accepted best practices in open-source software development.
You can find more community-supported platforms and configurations in the TensorFlow SIG Build community builds table.
Build Type | Status | Artifacts |
---|---|---|
Linux CPU | | PyPI |
Linux GPU | | PyPI |
Linux XLA | | TBA |
macOS | | PyPI |
Windows CPU | | PyPI |
Windows GPU | | PyPI |
Android | | |
Raspberry Pi 0 and 1 | | Py3 |
Raspberry Pi 2 and 3 | | Py3 |
Libtensorflow MacOS CPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
Libtensorflow Linux CPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
Libtensorflow Linux GPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
Libtensorflow Windows CPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
Libtensorflow Windows GPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
Learn more about the TensorFlow community and how to contribute.