| commit | 3c4a89c0ebac05219a4532f4e6c647915fb4d4cd | [log] [tgz] |
|---|---|---|
| author | Deven Desai <deven.desai.amd@gmail.com> | Mon Jun 08 12:35:50 2020 +0000 |
| committer | Deven Desai <deven.desai.amd@gmail.com> | Wed Jun 17 11:53:53 2020 +0000 |
| tree | 46671debf622dbaac18f5d49fbcf4d95cb41f24e | |
| parent | 950cffcd8deb881dcbfdf92f22c37eaa36f61e04 [diff] |
[ROCm] Fix for XLA "scatter" op related unit test failures.
After the upstream commit 4de4c60972da38d09662842614ad4dcfd019a6be, the following unit tests started failing on the ROCm platform:
```
//tensorflow/python/keras/optimizer_v2:adam_test_gpu
//tensorflow/compiler/xla/tests:scatter_test_gpu
//tensorflow/compiler/tests:scatter_nd_op_test_gpu
```
The cause seems to be the change within that commit that updates the LLVM version in use.
The LLVM version change (more specifically, some AMDGPU backend change contained within it) either introduces an issue or exposes an existing one with respect to alloca instructions placed outside the entry basic block of a function. The AMDGPU backend seems to expect all alloca instructions to be in the entry basic block; breaking that assumption leads to the regression failures listed above.
This PR/commit changes IR generation for the "scatter" op to ensure that the alloca instruction is emitted in the entry basic block of the function, which makes the above unit tests pass again. It also updates other places in XLA code where alloca instructions were being added outside the entry basic block of a function.
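For illustration, here is a minimal C++ sketch of forcing an alloca into the entry basic block using LLVM's IRBuilder. This is not the exact helper touched by this commit; the function name `EmitAllocaAtFunctionEntry` and its signature are illustrative, but the insertion-point pattern is the relevant technique.
```c++
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"

// Illustrative sketch (name and signature are assumptions, not the actual
// XLA helper): create an alloca in the entry basic block of the function the
// builder is currently emitting into, regardless of where the builder's
// insertion point happens to be (e.g. inside a scatter loop body).
llvm::AllocaInst* EmitAllocaAtFunctionEntry(llvm::Type* type,
                                            llvm::StringRef name,
                                            llvm::IRBuilder<>* b) {
  llvm::Function* function = b->GetInsertBlock()->getParent();
  llvm::BasicBlock& entry_block = function->getEntryBlock();
  // Position a temporary builder at the start of the entry block so the
  // alloca dominates every later use and never lands inside a loop body.
  llvm::IRBuilder<> entry_builder(&entry_block,
                                  entry_block.getFirstInsertionPt());
  return entry_builder.CreateAlloca(type, /*ArraySize=*/nullptr, name);
}
```
Emitting the alloca once in the entry block also avoids allocating fresh stack space on every loop iteration, which is why LLVM backends generally prefer (and the AMDGPU backend appears to require) entry-block allocas.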
-----------------------------
Details on how to isolate the change that causes the `//tensorflow/python/keras/optimizer_v2:adam_test_gpu` test case to fail:
Build TF using commit a1ae008076e14f7e445abf2605759779d2a1fb8b (the parent commit of 4de4c60972da38d09662842614ad4dcfd019a6be) and run the unit test; it should pass.
Commit 4de4c60972da38d09662842614ad4dcfd019a6be contains several changes in addition to the LLVM version change, so apply the following patch to pick up just the LLVM version change:
```diff
root@prj47-rack-37:/root/tensorflow# git diff
diff --git a/tensorflow/workspace.bzl b/tensorflow/workspace.bzl
index bf64405..15fd1f7 100755
--- a/tensorflow/workspace.bzl
+++ b/tensorflow/workspace.bzl
@@ -655,8 +655,8 @@ def tf_repositories(path_prefix = "", tf_repo_name = ""):
)
# Check out LLVM and MLIR from llvm-project.
- LLVM_COMMIT = "cf86a234ba86acf0bb875e21d27833be36e08be4"
- LLVM_SHA256 = "5375bdcdabd4886ab86eddfddef6e21dbc3cac9df67af7d3c44fadb527f74e25"
+ LLVM_COMMIT = "b726d071b4aa46004228fc38ee5bfd167f999bfe"
+ LLVM_SHA256 = "d7e67036dc89906cb2f80df7b0b7de6344d86eddf6e98bb4d01a578242889a73"
LLVM_URLS = [
"https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/{commit}.tar.gz".format(commit = LLVM_COMMIT),
"https://github.com/llvm/llvm-project/archive/{commit}.tar.gz".format(commit = LLVM_COMMIT),
diff --git a/third_party/mlir/BUILD b/third_party/mlir/BUILD
index df875eb..624f17e 100644
--- a/third_party/mlir/BUILD
+++ b/third_party/mlir/BUILD
@@ -1176,28 +1176,6 @@ cc_library(
],
)
-cc_library(
- name = "GPURuntimeTransforms",
- srcs = [
- "lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp",
- "lib/Conversion/PassDetail.h",
- ],
- hdrs = [
- "include/mlir/Conversion/GPUCommon/GPUCommonPass.h",
- ],
- includes = ["include"],
- deps = [
- ":ConversionPassIncGen",
- ":GPUDialect",
- ":IR",
- ":LLVMDialect",
- ":Pass",
- ":Support",
- "@llvm-project//llvm:core",
- "@llvm-project//llvm:support",
- ],
-)
-
gentbl(
name = "GPUToNVVMGen",
strip_include_prefix = "lib/Conversion/GPUToNVVM",
@@ -1307,12 +1285,13 @@ cc_library(
)
cc_library(
- name = "GPUToCUDATransforms",
+ name = "GPUToGPURuntimeTransforms",
srcs = [
- "lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp",
+ "lib/Conversion/GPUCommon/ConvertKernelFuncToBlob.cpp",
+ "lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp",
"lib/Conversion/PassDetail.h",
],
- hdrs = ["include/mlir/Conversion/GPUToCUDA/GPUToCUDAPass.h"],
+ hdrs = ["include/mlir/Conversion/GPUCommon/GPUCommonPass.h"],
includes = ["include"],
deps = [
":ConversionPassIncGen",
@@ -2490,7 +2469,7 @@ cc_library(
includes = ["include"],
deps = [
":Analysis",
- ":GPURuntimeTransforms",
+ ":GPUToGPURuntimeTransforms",
":GPUToNVVMTransforms",
":GPUToROCDLTransforms",
":GPUToSPIRVTransforms",
@@ -2570,8 +2549,7 @@ cc_library(
":ConversionPassIncGen",
":GPUDialect",
":GPUPassIncGen",
- ":GPURuntimeTransforms",
- ":GPUToCUDATransforms",
+ ":GPUToGPURuntimeTransforms",
":GPUToNVVMTransforms",
":GPUToROCDLTransforms",
":GPUToSPIRVTransforms",
@@ -2776,7 +2754,7 @@ cc_binary(
":AllPassesAndDialectsNoRegistration",
":ExecutionEngineUtils",
":GPUDialect",
- ":GPURuntimeTransforms",
+ ":GPUToGPURuntimeTransforms",
":GPUToNVVMTransforms",
":GPUToROCDLTransforms",
":GPUTransforms",
@@ -2786,6 +2764,7 @@ cc_binary(
":MlirJitRunner",
":NVVMDialect",
":Pass",
+ ":TargetNVVMIR",
":Transforms",
"//devtools/build/runtime:get_runfiles_dir",
"//third_party/gpus/cuda:cuda_headers",
diff --git a/third_party/mlir/test.BUILD b/third_party/mlir/test.BUILD
index 24b310f..9b6cb28 100644
--- a/third_party/mlir/test.BUILD
+++ b/third_party/mlir/test.BUILD
@@ -158,7 +158,7 @@ cc_library(
"@llvm-project//mlir:Analysis",
"@llvm-project//mlir:EDSC",
"@llvm-project//mlir:GPUDialect",
- "@llvm-project//mlir:GPUToCUDATransforms",
+ "@llvm-project//mlir:GPUToGPURuntimeTransforms",
"@llvm-project//mlir:GPUTransforms",
"@llvm-project//mlir:IR",
"@llvm-project//mlir:LinalgOps",
@@ -167,6 +167,8 @@ cc_library(
"@llvm-project//mlir:SCFDialect",
"@llvm-project//mlir:StandardOps",
"@llvm-project//mlir:Support",
+ "@llvm-project//mlir:TargetNVVMIR",
+ "@llvm-project//mlir:TargetROCDLIR",
"@llvm-project//mlir:TransformUtils",
"@llvm-project//mlir:Transforms",
"@llvm-project//mlir:VectorOps",
```
Rebuild TF and re-run the unit test; it will now fail.
| Documentation |
|---|
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.
TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence Research organization to conduct machine learning and deep neural networks research. The system is general enough to be applicable in a wide variety of other domains, as well.
TensorFlow provides stable Python and C++ APIs, as well as a non-guaranteed backward-compatible API for other languages.
Keep up-to-date with release announcements and security updates by subscribing to announce@tensorflow.org. See all the mailing lists.
See the TensorFlow install guide for the pip package, enabling GPU support, using a Docker container, and building from source.
To install the current release, which includes support for CUDA-enabled GPU cards (Ubuntu and Windows):
$ pip install tensorflow
A smaller CPU-only package is also available:
$ pip install tensorflow-cpu
To update TensorFlow to the latest version, add the --upgrade flag to the above commands.
Nightly binaries are available for testing using the tf-nightly and tf-nightly-cpu packages on PyPI.
$ python
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'
For more examples, see the TensorFlow tutorials.
If you want to contribute to TensorFlow, be sure to review the contribution guidelines. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.
We use GitHub issues for tracking requests and bugs. Please see TensorFlow Discuss for general questions and discussion, and direct specific questions to Stack Overflow.
The TensorFlow project strives to abide by generally accepted best practices in open-source software development.
| Build Type | Status | Artifacts |
|---|---|---|
| Linux CPU | | PyPI |
| Linux GPU | | PyPI |
| Linux XLA | | TBA |
| macOS | | PyPI |
| Windows CPU | | PyPI |
| Windows GPU | | PyPI |
| Android | | |
| Raspberry Pi 0 and 1 | | Py3 |
| Raspberry Pi 2 and 3 | | Py3 |
| Libtensorflow MacOS CPU | | GCS |
| Libtensorflow Linux CPU | | GCS |
| Libtensorflow Linux GPU | | GCS |
| Libtensorflow Windows CPU | | GCS |
| Libtensorflow Windows GPU | | GCS |
| Build Type | Status | Artifacts |
|---|---|---|
| Linux AMD ROCm GPU Nightly | | Nightly |
| Linux AMD ROCm GPU Stable Release | | Release 1.15 / 2.x |
| Linux s390x Nightly | | Nightly |
| Linux s390x CPU Stable Release | | Release |
| Linux ppc64le CPU Nightly | | Nightly |
| Linux ppc64le CPU Stable Release | | Release 1.15 / 2.x |
| Linux ppc64le GPU Nightly | | Nightly |
| Linux ppc64le GPU Stable Release | | Release 1.15 / 2.x |
| Linux CPU with Intel® MKL-DNN Nightly | | Nightly |
| Linux CPU with Intel® MKL-DNN Stable Release | | Release 1.15 / 2.x |
| Red Hat® Enterprise Linux® 7.6 CPU & GPU Python 2.7, 3.6 | | 1.13.1 PyPI |
Learn more about the TensorFlow community and how to contribute.