Optionally specify stream for pointers in CUDA algorithms

Summary:
Work may be queued on CUDA streams for asynchronous execution. The
memory behind the pointers passed to an algorithm can therefore still
be mutated after the algorithm instance is constructed. If the caller
also passes in the streams these mutations are queued on, the algorithm
can synchronize with them and ensure that no invalid data is used.

When these streams are passed in, any work done by the algorithm is
*also* queued, which effectively removes one synchronization step from
every algorithm run.
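
The idea can be sketched with plain CUDA runtime calls. This is a
minimal illustration, not the code from this change: `ScaleInPlace`,
`scaleKernel`, and `runExample` are hypothetical names, and the sketch
assumes the algorithm queues its work on the same stream the caller
passed in. The caller fills the buffer asynchronously after the
algorithm is constructed, and stream ordering alone guarantees that the
algorithm sees the new data.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define CUDA_CHECK(expr)                                      \
  do {                                                        \
    cudaError_t err_ = (expr);                                \
    if (err_ != cudaSuccess) {                                \
      std::fprintf(stderr, "%s failed: %s\n", #expr,          \
                   cudaGetErrorString(err_));                 \
      std::exit(EXIT_FAILURE);                                \
    }                                                         \
  } while (0)

__global__ void scaleKernel(float* data, size_t count, float factor) {
  size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < count) {
    data[i] *= factor;
  }
}

// Hypothetical stand-in for an algorithm that takes a device pointer
// together with the stream on which the caller mutates that memory.
class ScaleInPlace {
 public:
  ScaleInPlace(float* ptr, size_t count, cudaStream_t stream)
      : ptr_(ptr), count_(count), stream_(stream) {}

  // Queues the work on the caller's stream and returns immediately.
  // Stream ordering places it after any mutation already queued there,
  // so no host-side synchronization is needed before the launch.
  void run(float factor) const {
    const unsigned int threads = 256;
    const unsigned int blocks =
        static_cast<unsigned int>((count_ + threads - 1) / threads);
    scaleKernel<<<blocks, threads, 0, stream_>>>(ptr_, count_, factor);
    CUDA_CHECK(cudaGetLastError());
  }

 private:
  float* ptr_;
  size_t count_;
  cudaStream_t stream_;
};

bool runExample() {
  const size_t count = 1024;
  const size_t bytes = count * sizeof(float);

  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreate(&stream));

  float* host = nullptr;    // pinned, so the copies are truly asynchronous
  float* device = nullptr;
  CUDA_CHECK(cudaMallocHost(&host, bytes));
  CUDA_CHECK(cudaMalloc(&device, bytes));
  for (size_t i = 0; i < count; i++) {
    host[i] = static_cast<float>(i);
  }

  // The algorithm is constructed before the device buffer holds valid data.
  ScaleInPlace algorithm(device, count, stream);

  // Mutation queued after construction, on the stream the algorithm knows.
  CUDA_CHECK(cudaMemcpyAsync(device, host, bytes,
                             cudaMemcpyHostToDevice, stream));
  algorithm.run(2.0f);
  CUDA_CHECK(cudaMemcpyAsync(host, device, bytes,
                             cudaMemcpyDeviceToHost, stream));

  // The only host-side synchronization point in the whole sequence.
  CUDA_CHECK(cudaStreamSynchronize(stream));

  bool ok = true;
  for (size_t i = 0; i < count; i++) {
    if (host[i] != 2.0f * static_cast<float>(i)) {
      ok = false;
    }
  }

  CUDA_CHECK(cudaFree(device));
  CUDA_CHECK(cudaFreeHost(host));
  CUDA_CHECK(cudaStreamDestroy(stream));
  return ok;
}

int main() {
  const bool ok = runExample();
  std::printf("%s\n", ok ? "ok" : "mismatch");
  return ok ? 0 : 1;
}
```

If the algorithm ran on a separate internal stream instead, it could
record an event on the caller's stream with `cudaEventRecord` and wait
on it with `cudaStreamWaitEvent`, which orders the two streams without
blocking the host.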

Differential Revision: D4589394

fbshipit-source-id: 0c8cd6ba9c9018f33d6f4c55a037083fc4164acb
16 files changed
tree: 1b6b50e3f5a1e09c8385ec4371a395a8c35dc459
  1. cmake/
  2. gloo/
  3. CMakeLists.txt
  4. LICENSE
  5. PATENTS
  6. README.md
README.md

gloo

TODO