Add THCCachingAllocator_recordStream()

This is similar to THCCachingHostAllocator_recordEvent(), but for CUDA
(device) allocations. It is useful for overlapping copies with
computation. The workflow is approximately:

  0. allocate dst tensor on copy stream
  1. copy from CPU to GPU on copy stream
  2. synchronize the main stream with the copy stream via
     cudaStreamWaitEvent
  3. THCCachingAllocator_recordStream(dst, main_stream)

The recordStream() call is necessary to prevent the memory behind the
dst tensor from being reused on the copy stream before the main stream
finishes its work on it.

Previously, you had to insert a second cudaStreamWaitEvent before dst
was freed to force the copy stream to wait on the main stream.
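
To illustrate why recording the extra stream matters, here is a toy
model of a stream-aware caching allocator. It is only a sketch under
stated assumptions, not the THC implementation: streams are plain
integer ids, stream_finished() stands in for a CUDA event completing,
and every name in it (ToyBlock, ToyCachingAllocator, record_stream,
free_block, stream_finished, demo_record_stream) is hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model only. Streams are integer ids; no CUDA calls are made.
struct ToyBlock {
    int alloc_stream;               // stream the block was allocated on
    std::set<int> pending_streams;  // other streams recorded as users
    bool in_use;                    // currently handed out to a caller
    bool cached;                    // freed and safe to hand out again
};

class ToyCachingAllocator {
public:
    // Reuse a cached block from the same stream, else create a new one.
    int allocate(int stream) {
        for (int i = 0; i < static_cast<int>(blocks_.size()); ++i) {
            ToyBlock& b = blocks_[i];
            if (b.cached && b.alloc_stream == stream) {
                b.cached = false;
                b.in_use = true;
                return i;
            }
        }
        blocks_.push_back(ToyBlock{stream, {}, true, false});
        return static_cast<int>(blocks_.size()) - 1;
    }

    // Note that `stream` also uses this block (the recordStream idea).
    void record_stream(int block, int stream) {
        if (stream != blocks_[block].alloc_stream) {
            blocks_[block].pending_streams.insert(stream);
        }
    }

    // A freed block is cached at once only if no other stream used it.
    void free_block(int block) {
        ToyBlock& b = blocks_[block];
        b.in_use = false;
        b.cached = b.pending_streams.empty();
    }

    // Stand-in for "the work queued on `stream` has completed".
    // Only blocks that were already freed are affected.
    void stream_finished(int stream) {
        for (ToyBlock& b : blocks_) {
            if (b.in_use) continue;
            b.pending_streams.erase(stream);
            if (b.pending_streams.empty()) b.cached = true;
        }
    }

private:
    std::vector<ToyBlock> blocks_;
};

// Walks through the scenario from the commit message.
bool demo_record_stream() {
    const int kMain = 0, kCopy = 1;
    ToyCachingAllocator alloc;

    int dst = alloc.allocate(kCopy);    // step 0: allocate on copy stream
    alloc.record_stream(dst, kMain);    // step 3: main stream also uses dst
    alloc.free_block(dst);

    int other = alloc.allocate(kCopy);  // must not get dst's block yet
    bool deferred = (other != dst);

    alloc.stream_finished(kMain);       // main stream is done with dst
    int again = alloc.allocate(kCopy);  // now dst's block can be reused
    bool reused = (again == dst);

    return deferred && reused;
}
```

In this model, a block freed without any recorded stream is handed out
again immediately on its allocation stream, which is the hazard the
recordStream() call is meant to avoid.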
2 files changed