| commit | bb5d0144b9f2a54c4202e7583e85c154447d09dc | |
|---|---|---|
| author | A. Unique TensorFlower <gardener@tensorflow.org> | Tue Feb 25 08:48:40 2020 -0800 |
| committer | TensorFlower Gardener <gardener@tensorflow.org> | Tue Feb 25 08:57:54 2020 -0800 |
| tree | a4dff79167b5e180733735b0095c5e6ffb47fc63 | |
| parent | cf608db45cd483c3a1cc452d0526c252dcf5a91a | |
PR #36468: Add block cache for low level table library

Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/36468

This is part of a patch series aiming to improve the performance of on-disk dataset.cache() (CacheDatasetV2).

Currently CacheDataset uses core/util/tensor_bundle to cache dataset elements on disk, using a sorted string table (SST) to index them. Unlike checkpoints, which do not contain a great number of tensors, caching a large dataset may produce a much greater number of tensors as well as index blocks. If an index block is present in an in-memory LRU block cache, fetching a dataset element takes only 1 round trip instead of 2. This is particularly useful when a CacheDataset is read from a higher-latency remote file system such as HDFS or GCS.

Almost all of the code is imported from the LevelDB project, in particular the hash function used to shard the LRU cache (using Hash32 from core/lib/hash currently fails the EvictionPolicy test). Only 2 modifications were made to the original cache:

1. Alias leveldb::Slice to tensorflow::StringPiece.
2. Switch to tensorflow::mutex for all mutexes.

Ping @jsimsa to review.

Copybara import of the project:

--
4c28247f5f3f6fcd12e82757befd7d90bf413e2c by Bairen Yi <yibairen.byron@bytedance.com>:

Add block cache for low level table library

Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>

--
b69b43382ea7692ffd60ad50b118ac0646ceecc8 by Bairen Yi <yibairen.byron@bytedance.com>:

tensor_bundle: Enable cache for metadata table

The index cache is disabled by default unless the TF_TABLE_INDEX_CACHE_SIZE_IN_MB environment variable is set.

Signed-off-by: Bairen Yi <yibairen.byron@bytedance.com>

PiperOrigin-RevId: 297125962
Change-Id: Ibfec97b19f337d40f5726f656ee9c6487ce552d0
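To make the caching scheme concrete, here is a minimal Python sketch of a hash-sharded LRU block cache in the spirit of the LevelDB design the commit imports. The class names, shard count, key format, and the crc32 stand-in for LevelDB's hash are assumptions for illustration only; this is not the C++ implementation added by the change.

```python
import collections
import threading
import zlib

class LRUShard:
    """One LRU shard: an ordered dict plus capacity accounting."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = collections.OrderedDict()  # key -> (value, charge)
        self.lock = threading.Lock()

    def insert(self, key, value, charge):
        with self.lock:
            if key in self.entries:
                _, old_charge = self.entries.pop(key)
                self.used -= old_charge
            self.entries[key] = (value, charge)
            self.used += charge
            # Evict least-recently-used entries until back under capacity.
            while self.used > self.capacity and self.entries:
                _, (_, evicted_charge) = self.entries.popitem(last=False)
                self.used -= evicted_charge

    def lookup(self, key):
        with self.lock:
            if key not in self.entries:
                return None
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key][0]

class ShardedLRUCache:
    """Shards an LRU cache by key hash to reduce lock contention."""

    NUM_SHARDS = 16  # illustrative shard count

    def __init__(self, capacity_bytes):
        per_shard = (capacity_bytes + self.NUM_SHARDS - 1) // self.NUM_SHARDS
        self.shards = [LRUShard(per_shard) for _ in range(self.NUM_SHARDS)]

    def _shard(self, key):
        # crc32 stands in for LevelDB's custom hash here (the commit notes
        # that Hash32 from core/lib/hash fails the EvictionPolicy test).
        return self.shards[zlib.crc32(key) % self.NUM_SHARDS]

    def insert(self, key, value):
        self._shard(key).insert(key, value, charge=len(value))

    def lookup(self, key):
        return self._shard(key).lookup(key)

# Index blocks would be cached under a key identifying the table file and
# the block's offset within it; this key format is purely illustrative.
cache = ShardedLRUCache(capacity_bytes=64 * 1024 * 1024)
cache.insert(b"table-0/offset-4096", b"<index block bytes>")
assert cache.lookup(b"table-0/offset-4096") == b"<index block bytes>"
```

Per the second sub-commit, the cache stays disabled unless the TF_TABLE_INDEX_CACHE_SIZE_IN_MB environment variable is set, so a job reading an on-disk `dataset.cache(filename)` would opt in with something like `export TF_TABLE_INDEX_CACHE_SIZE_IN_MB=64` (the value 64 is illustrative).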
| Documentation |
|---|
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.
TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence Research organization to conduct machine learning and deep neural networks research. The system is general enough to be applicable in a wide variety of other domains, as well.
TensorFlow provides stable Python and C++ APIs, as well as a non-guaranteed backward-compatible API for other languages.
Keep up-to-date with release announcements and security updates by subscribing to announce@tensorflow.org. See all the mailing lists.
See the TensorFlow install guide for the pip package, enabling GPU support, using a Docker container, and building from source.
To install the current release, which includes support for CUDA-enabled GPU cards (Ubuntu and Windows):
```shell
$ pip install tensorflow
```
A smaller CPU-only package is also available:
```shell
$ pip install tensorflow-cpu
```
To update TensorFlow to the latest version, add the `--upgrade` flag to the above commands (for example, `pip install --upgrade tensorflow`).
Nightly binaries are available for testing using the tf-nightly and tf-nightly-cpu packages on PyPI.
```python
$ python
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'
```
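For a slightly larger first program, the sketch below trains a small classifier in the style of the TensorFlow beginner tutorial; the layer sizes and hyperparameters are illustrative, and running it requires network access to download MNIST.

```python
import tensorflow as tf

# Load MNIST and scale pixel values from [0, 255] to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small fully connected classifier over flattened 28x28 images.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)       # train
model.evaluate(x_test, y_test, verbose=2)   # report test accuracy
```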
For more examples, see the TensorFlow tutorials.
If you want to contribute to TensorFlow, be sure to review the contribution guidelines. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.
We use GitHub issues for tracking requests and bugs. Please see TensorFlow Discuss for general questions and discussion, and direct specific questions to Stack Overflow.
The TensorFlow project strives to abide by generally accepted best practices in open-source software development.
**Official Builds**

| Build Type | Artifacts |
|---|---|
| Linux CPU | PyPI |
| Linux GPU | PyPI |
| Linux XLA | TBA |
| macOS | PyPI |
| Windows CPU | PyPI |
| Windows GPU | PyPI |
| Android | |
| Raspberry Pi 0 and 1 | Py2, Py3 |
| Raspberry Pi 2 and 3 | Py2, Py3 |
**Community Supported Builds**

| Build Type | Artifacts |
|---|---|
| Linux AMD ROCm GPU Nightly | Nightly |
| Linux AMD ROCm GPU Stable Release | Release 1.15 / 2.x |
| Linux s390x Nightly | Nightly |
| Linux s390x CPU Stable Release | Release |
| Linux ppc64le CPU Nightly | Nightly |
| Linux ppc64le CPU Stable Release | Release 1.15 / 2.x |
| Linux ppc64le GPU Nightly | Nightly |
| Linux ppc64le GPU Stable Release | Release 1.15 / 2.x |
| Linux CPU with Intel® MKL-DNN Nightly | Nightly |
| Linux CPU with Intel® MKL-DNN Stable Release | Release 1.15 / 2.x |
| Red Hat® Enterprise Linux® 7.6 CPU & GPU (Python 2.7, 3.6) | 1.13.1 PyPI |
Learn more about the TensorFlow community and how to contribute.