commit | 55633bf5c1b9c9004a4bbc6b884dd6b6d5f9fab7 | |
---|---|---|
author | Advait Jain <advaitjain@users.noreply.github.com> | Thu Feb 11 15:52:01 2021 -0800 |
committer | Advait Jain <advaitjain@users.noreply.github.com> | Fri Feb 12 14:23:00 2021 -0800 |
tree | f2aad67f1bd591f8fb6db2c97c2f6a8e3e456f81 | |
parent | 9421ecff2a0ee4bd6a74116e339862aefa411feb | |
Use xa_nnlib for svdf for Fusion F1.

The code in this change is the subset of functionality needed for int8 svdf for Hifi4, copied from https://github.com/pnikam-cad/tensorflow/blob/a737c1e3945bc70022259479ad24133a343ec906/tensorflow/lite/micro/kernels/xtensa_hifi/svdf.cc

Note that the current change has not pulled in either the floating point implementation or the Hifi5 implementation.

Profiled the keyword_benchmark with the following command:
```
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa OPTIMIZED_KERNEL_DIR=xtensa TARGET_ARCH=fusion_f1 XTENSA_CORE=F1_190305_swupgrade run_keyword_benchmark -j8
```
This gives a latency of 38516 ticks with this change vs 152642 ticks without this change.

Per OP latency with this change:
```
KeywordRunNIerations(1) took 38516 ticks (38 ms)
QUANTIZE took 3758 ticks (3 ms).
SVDF took 4753 ticks (4 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 4211 ticks (4 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 3145 ticks (3 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 4211 ticks (4 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 2890 ticks (2 ms).
SVDF took 3583 ticks (3 ms).
SVDF took 3054 ticks (3 ms).
FULLY_CONNECTED took 1091 ticks (1 ms).
SOFTMAX took 2042 ticks (2 ms).
QUANTIZE took 366 ticks (0 ms).
```
Without this change:
```
KeywordRunNIerations(1) took 152642 ticks (152 ms)
QUANTIZE took 3758 ticks (3 ms).
SVDF took 38003 ticks (38 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 18803 ticks (18 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 18803 ticks (18 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 18803 ticks (18 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 13907 ticks (13 ms).
SVDF took 15827 ticks (15 ms).
SVDF took 15827 ticks (15 ms).
FULLY_CONNECTED took 1091 ticks (1 ms).
SOFTMAX took 2042 ticks (2 ms).
QUANTIZE took 366 ticks (0 ms).
```
Also confirmed that the kernel_svdf_test passes with:
```
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa OPTIMIZED_KERNEL_DIR=xtensa TARGET_ARCH=fusion_f1 XTENSA_CORE=F1_190305_swupgrade test_kernel_svdf_test -j8
```
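To make the comparison easier to check, the per-op numbers quoted in the commit message can be aggregated by operator. The following is a minimal Python sketch, not part of the change itself; the helper name `ticks_per_op` and the two string constants are hypothetical and exist only for this illustration. It assumes each profile line has the form `<OP> took <N> ticks (...)`, as in the logs above.

```python
import re
from collections import defaultdict

# Per-op lines copied from the commit message (summary line omitted).
WITH_CHANGE = """\
QUANTIZE took 3758 ticks (3 ms).
SVDF took 4753 ticks (4 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 4211 ticks (4 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 3145 ticks (3 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 4211 ticks (4 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 2890 ticks (2 ms).
SVDF took 3583 ticks (3 ms).
SVDF took 3054 ticks (3 ms).
FULLY_CONNECTED took 1091 ticks (1 ms).
SOFTMAX took 2042 ticks (2 ms).
QUANTIZE took 366 ticks (0 ms).
"""

WITHOUT_CHANGE = """\
QUANTIZE took 3758 ticks (3 ms).
SVDF took 38003 ticks (38 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 18803 ticks (18 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 18803 ticks (18 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 18803 ticks (18 ms).
FULLY_CONNECTED took 1353 ticks (1 ms).
SVDF took 13907 ticks (13 ms).
SVDF took 15827 ticks (15 ms).
SVDF took 15827 ticks (15 ms).
FULLY_CONNECTED took 1091 ticks (1 ms).
SOFTMAX took 2042 ticks (2 ms).
QUANTIZE took 366 ticks (0 ms).
"""


def ticks_per_op(log: str) -> dict:
    """Sum the tick counts of every '<OP> took <N> ticks' line, keyed by op name."""
    totals = defaultdict(int)
    for name, ticks in re.findall(r"^(\w+) took (\d+) ticks", log, re.MULTILINE):
        totals[name] += int(ticks)
    return dict(totals)


after = ticks_per_op(WITH_CHANGE)
before = ticks_per_op(WITHOUT_CHANGE)

for op in before:
    print(f"{op}: {before[op]} -> {after[op]} ticks "
          f"({before[op] / after[op]:.2f}x)")

total_before = sum(before.values())
total_after = sum(after.values())
print(f"TOTAL: {total_before} -> {total_after} ticks "
      f"({total_before / total_after:.2f}x)")
```

The per-op sums reproduce the reported totals (38516 and 152642 ticks). SVDF is the only operator whose cost changes: it drops from 139973 to 25847 ticks, roughly a 5.4x reduction, which accounts for the overall speedup of about 4x.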
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.
TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence Research organization to conduct machine learning and deep neural networks research. The system is general enough to be applicable in a wide variety of other domains, as well.
TensorFlow provides stable Python and C++ APIs, as well as APIs for other languages that carry no backward-compatibility guarantee.
Keep up-to-date with release announcements and security updates by subscribing to announce@tensorflow.org. See all the mailing lists.
See the TensorFlow install guide for instructions on installing the pip package, enabling GPU support, using a Docker container, and building from source.
To install the current release, which includes support for CUDA-enabled GPU cards (Ubuntu and Windows):
$ pip install tensorflow
A smaller CPU-only package is also available:
$ pip install tensorflow-cpu
To update TensorFlow to the latest version, add the --upgrade flag to the above commands.
Nightly binaries are available for testing using the tf-nightly and tf-nightly-cpu packages on PyPI.
$ python
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'
For more examples, see the TensorFlow tutorials.
If you want to contribute to TensorFlow, be sure to review the contribution guidelines. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.
We use GitHub issues for tracking requests and bugs. Please see TensorFlow Discuss for general questions and discussion, and direct specific questions to Stack Overflow.
The TensorFlow project strives to abide by generally accepted best practices in open-source software development.
Build Type | Status | Artifacts |
---|---|---|
Linux CPU | | PyPI |
Linux GPU | | PyPI |
Linux XLA | | TBA |
macOS | | PyPI |
Windows CPU | | PyPI |
Windows GPU | | PyPI |
Android | | |
Raspberry Pi 0 and 1 | | Py3 |
Raspberry Pi 2 and 3 | | Py3 |
Libtensorflow MacOS CPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
Libtensorflow Linux CPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
Libtensorflow Linux GPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
Libtensorflow Windows CPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
Libtensorflow Windows GPU | Status Temporarily Unavailable | Nightly Binary Official GCS |
Build Type | Status | Artifacts |
---|---|---|
Linux AMD ROCm GPU Nightly | | Nightly |
Linux AMD ROCm GPU Stable Release | | Release 1.15 / 2.x |
Linux s390x Nightly | | Nightly |
Linux s390x CPU Stable Release | | Release |
Linux ppc64le CPU Nightly | | Nightly |
Linux ppc64le CPU Stable Release | | Release 1.15 / 2.x |
Linux ppc64le GPU Nightly | | Nightly |
Linux ppc64le GPU Stable Release | | Release 1.15 / 2.x |
Linux aarch64 CPU Nightly (Linaro) | | Nightly |
Linux aarch64 CPU Stable Release (Linaro) | | Release 1.x & 2.x |
Linux aarch64 CPU Nightly (OpenLab) Python 3.6 | | Nightly |
Linux aarch64 CPU Stable Release (OpenLab) | | Release 1.15 / 2.x |
Linux CPU with Intel oneAPI Deep Neural Network Library (oneDNN) Nightly | | Nightly |
Linux CPU with Intel oneAPI Deep Neural Network Library (oneDNN) Stable Release | | Release 1.15 / 2.x |
Red Hat® Enterprise Linux® 7.6 CPU & GPU Python 2.7, 3.6 | | 1.13.1 PyPI |
Container Type | Status | Artifacts |
---|---|---|
TensorFlow aarch64 Neoverse-N1 CPU Stable (Linaro) Debian | Static | Release 2.3 |
Learn more about the TensorFlow community and how to contribute.