tree: 4d778b475970cdf660d42261c27de98f709cb254
  1. Activation.cpp
  2. AtomicAddFloat.h
  3. avx_mathfun.h
  4. batch_norm_kernel.cpp
  5. BinaryOpsKernel.cpp
  6. BlasKernel.cpp
  7. CatKernel.cpp
  8. CatKernel.h
  9. ComplexKernel.cpp
  10. CopyKernel.cpp
  11. CrossKernel.cpp
  12. DepthwiseConvKernel.cpp
  13. DepthwiseConvKernel.h
  14. DistanceOpsKernel.cpp
  15. DistributionTemplates.h
  16. FillKernel.cpp
  17. FunctionOfAMatrixUtilsKernel.cpp
  18. GridSamplerKernel.cpp
  19. GridSamplerKernel.h
  20. group_norm_kernel.cpp
  21. IndexKernel.cpp
  22. Intrinsics.h
  23. IsContiguous.h
  24. layer_norm_kernel.cpp
  25. LerpKernel.cpp
  26. Loops.h
  27. MaxPooling.cpp
  28. MultinomialKernel.cpp
  29. PointwiseOpsKernel.cpp
  30. PowKernel.cpp
  31. RangeFactoriesKernel.cpp
  32. README.md
  33. Reduce.h
  34. ReduceAllOpsKernel.cpp
  35. ReduceOpsKernel.cpp
  36. ScatterGatherKernel.cpp
  37. SoftMaxKernel.cpp
  38. SoftmaxKernel.h
  39. SortingKernel.cpp
  40. SumKernel.cpp
  41. TensorCompareKernel.cpp
  42. UnaryOpsKernel.cpp
  43. Unfold2d.cpp
  44. UnfoldBackwardKernel.cpp
  45. UpSampleKernel.cpp
  46. UpSampleMoreKernel.cpp
  47. zmath.h
aten/src/ATen/native/cpu/README.md

The most important things to know:

Don't add a kernel to this folder unless you want it to be compiled multiple times for different instruction sets. Yes, this folder is named cpu, but that doesn't mean you should put any old CPU kernel in it. Only put CPU kernels here that need to be compiled multiple times so they can take advantage of AVX/SSE instructions on processors that support them.

Ensure that all implementations in this folder are put in an anonymous namespace. The files in this folder are compiled multiple times with different headers. It's important that these functions have internal linkage so that kernels for different architectures don't get combined during linking. It's sufficient to label functions "static", but class methods must be in an unnamed namespace to have internal linkage (since static means something different in the context of classes).
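As a minimal sketch of this rule, the hypothetical kernel below keeps both its helper functor and its kernel function inside an unnamed namespace, so each capability-specific copy of the file gets its own private definitions. The names ScaleOp and scale_kernel are illustrative only and do not exist in ATen.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace {  // unnamed namespace: everything here has internal linkage

// Member functions defined inside a class are implicitly inline. Without the
// unnamed namespace, identical symbols from, say, the AVX2 copy and the
// DEFAULT copy of this file would be deduplicated by the linker, which might
// keep the AVX2 version even on a CPU without AVX2. `static` cannot fix this
// for a class, because on members it means "not per-instance".
struct ScaleOp {
  float factor;
  float operator()(float x) const { return x * factor; }
};

// A free function could be marked `static` instead; the namespace covers it too.
void scale_kernel(std::vector<float>& out, const std::vector<float>& in, float factor) {
  assert(out.size() == in.size());
  ScaleOp op{factor};
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = op(in[i]);
  }
}

}  // namespace
```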

The basic recipe is to define your kernel and then register it using the DECLARE_DISPATCH, DEFINE_DISPATCH, and REGISTER_DISPATCH macros. Writing a kernel requires four steps:

  1. Declare your dispatch in a header file using DECLARE_DISPATCH(fn_type, fnNameImpl), where fn_type is the function pointer type of the kernel (e.g., defined as using fn_type = void(*)(Tensor&, const Tensor&);) and fnNameImpl is the name of your dispatch registry. (It doesn't really matter where you put this declaration.)

  2. Define your dispatch in a C++ file that is NOT in the cpu directory (the dispatch must be defined exactly once) using DEFINE_DISPATCH(fnNameImpl), matching the name of your declaration. Include the header file that declares the dispatch in this C++ file. Conventionally, we define the dispatch in the same file in which we define our native function.

  3. Define a native function that calls into the dispatch using fnNameImpl(kCPU, arguments...), where the arguments match the fn_type you defined in the declaration.

  4. Write your actual kernel (e.g., your_kernel) in the cpu directory, and register it to the dispatch using REGISTER_DISPATCH(fnNameImpl, &your_kernel).
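To make the four steps concrete, here is a deliberately simplified, self-contained model of the pattern in plain C++, with all steps collapsed into one file. AbsStub, abs_stub, abs_kernel, RegisterAbs, my_abs, and the DeviceType/kCPU stand-ins are hypothetical: they mimic what DECLARE_DISPATCH, DEFINE_DISPATCH, and REGISTER_DISPATCH arrange for you, not how ATen actually implements those macros.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; these are not ATen types.
enum class DeviceType { CPU };
constexpr DeviceType kCPU = DeviceType::CPU;

// Step 1 (header): the kernel's function pointer type and a registry for it.
using abs_fn = void (*)(std::vector<float>& out, const std::vector<float>& in);

struct AbsStub {
  abs_fn cpu_kernel = nullptr;
  void operator()(DeviceType device, std::vector<float>& out,
                  const std::vector<float>& in) const {
    assert(device == kCPU && cpu_kernel != nullptr && "no kernel registered");
    cpu_kernel(out, in);
  }
};

// Step 2 (a .cpp outside cpu/): the single definition of the registry.
AbsStub abs_stub;

// Step 4 (a .cpp inside cpu/): the kernel, plus a static object whose
// constructor registers it during static initialization.
namespace {
void abs_kernel(std::vector<float>& out, const std::vector<float>& in) {
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = std::fabs(in[i]);
}
struct RegisterAbs {
  RegisterAbs() { abs_stub.cpu_kernel = &abs_kernel; }
} register_abs;
}  // namespace

// Step 3: the native function only knows the registry, never the kernel.
std::vector<float> my_abs(const std::vector<float>& in) {
  std::vector<float> out(in.size());
  abs_stub(kCPU, out, in);
  return out;
}
```

Because the native function calls through the registry, the kernel's file can be compiled once per instruction set without touching the caller; the real mechanism additionally has to pick among those per-instruction-set copies, which this model's single slot leaves out.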

There are plenty of existing examples; look at them for more details.


TODO: Clarify and add more documentation all around.

All of the *.cpp files in this folder will be compiled under all compiler flags specified by CPU_CAPABILITY_FLAGS in aten/src/ATen/CMakeLists.txt.

The purpose of this is to allow compiling with various compiler flags that enable features such as AVX instructions, while runtime dispatch makes sure that only valid instructions are used on any given platform.
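A rough sketch of the runtime half of this scheme follows. Everything in it is hypothetical: the Capability enum, sum_default, sum_avx2, and select_sum are illustrative names, the "AVX2" variant only imitates vectorized code with four partial sums instead of real intrinsics, and the detected capability is passed in rather than queried from the CPU. What it does show is the safety rule: the selector never returns a variant that needs more than the detected capability, and falls back to the plain build otherwise.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical capability levels, ordered from least to most capable.
enum class Capability { DEFAULT = 0, AVX = 1, AVX2 = 2 };

using sum_fn = float (*)(const float* data, std::size_t n);

// In a real build, both variants would come from the same source file
// compiled with different flags; here they are written out by hand.
float sum_default(const float* data, std::size_t n) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i) acc += data[i];
  return acc;
}

float sum_avx2(const float* data, std::size_t n) {
  // Stand-in for a vectorized body: four independent partial sums.
  float lanes[4] = {0.0f, 0.0f, 0.0f, 0.0f};
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4)
    for (std::size_t j = 0; j < 4; ++j) lanes[j] += data[i + j];
  float acc = lanes[0] + lanes[1] + lanes[2] + lanes[3];
  for (; i < n; ++i) acc += data[i];  // leftover tail elements
  return acc;
}

struct Variant {
  Capability required;
  sum_fn fn;
};

// The variants that were compiled, most capable first (no AVX-only copy here).
constexpr Variant kVariants[] = {
    {Capability::AVX2, &sum_avx2},
    {Capability::DEFAULT, &sum_default},
};

// Pick the most capable variant the running CPU supports. `detected` would
// normally come from a CPU feature query; here it is simply a parameter.
sum_fn select_sum(Capability detected) {
  for (const Variant& v : kVariants)
    if (v.required <= detected) return v.fn;
  return &sum_default;  // not reached: DEFAULT is always <= detected
}
```

For general inputs the two variants may round differently, since they add the elements in different orders.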

Vec256.h provides a generic implementation of a vec256 type that allows the programmer to write code that packs various primitives (such as floats) into 256-bit registers. vec256 defines operators such as + and * and provides functions for operations such as max, min, etc.
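To illustrate the programming model only, the sketch below defines a hypothetical Vec8f type (8 floats, i.e. 256 bits) with a portable scalar fallback for +, *, and an elementwise maximum, then uses it in an axpy loop with a scalar tail. Vec8f, its member functions, maximum, and axpy are assumptions made for this example; they are not the vec256 API, whose real implementation maps these operations onto AVX intrinsics when available.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical 8 x float "register" with a plain C++ fallback implementation.
struct Vec8f {
  static constexpr std::size_t size = 8;  // 8 floats * 32 bits = 256 bits
  float v[size];

  static Vec8f load(const float* p) {
    Vec8f r;
    for (std::size_t i = 0; i < size; ++i) r.v[i] = p[i];
    return r;
  }
  static Vec8f broadcast(float x) {
    Vec8f r;
    for (std::size_t i = 0; i < size; ++i) r.v[i] = x;
    return r;
  }
  void store(float* p) const {
    for (std::size_t i = 0; i < size; ++i) p[i] = v[i];
  }
};

inline Vec8f operator+(const Vec8f& a, const Vec8f& b) {
  Vec8f r;
  for (std::size_t i = 0; i < Vec8f::size; ++i) r.v[i] = a.v[i] + b.v[i];
  return r;
}

inline Vec8f operator*(const Vec8f& a, const Vec8f& b) {
  Vec8f r;
  for (std::size_t i = 0; i < Vec8f::size; ++i) r.v[i] = a.v[i] * b.v[i];
  return r;
}

inline Vec8f maximum(const Vec8f& a, const Vec8f& b) {
  Vec8f r;
  for (std::size_t i = 0; i < Vec8f::size; ++i) r.v[i] = std::max(a.v[i], b.v[i]);
  return r;
}

// y[i] = a * x[i] + y[i], eight lanes at a time, then a scalar loop for the tail.
void axpy(float a, const float* x, float* y, std::size_t n) {
  const Vec8f va = Vec8f::broadcast(a);
  std::size_t i = 0;
  for (; i + Vec8f::size <= n; i += Vec8f::size) {
    (va * Vec8f::load(x + i) + Vec8f::load(y + i)).store(y + i);
  }
  for (; i < n; ++i) y[i] = a * x[i] + y[i];
}
```

The kernel is written once against the vector type's operators; which instructions those operators compile to is decided by the flags of each build.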

As an example, ReduceOpsKernel.cpp implements a generic kernel_ that reduces an entire array using a given associative binary operation such as +.

More explicitly, calling kernel_ with the template argument std::plus will cause it to sum the entire array into a single value.
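The sketch below shows why the operation's algebraic properties matter. reduce_all is a hypothetical function in the spirit of kernel_, not its actual code: it accumulates elements into 8 independent lanes (as a 256-bit register of floats would), combines the lanes, and finishes with a scalar tail. Because elements end up combined in lane order rather than strictly left to right, this particular scheme needs the operation to be commutative as well as associative, and needs an identity value to seed the lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical generic reduction. Op must be associative and commutative,
// and `identity` must be its neutral element (0 for +, 1 for *).
template <typename Op>
float reduce_all(const float* data, std::size_t n, float identity, Op op) {
  constexpr std::size_t kLanes = 8;  // mimics one 256-bit register of floats
  float lanes[kLanes];
  for (std::size_t j = 0; j < kLanes; ++j) lanes[j] = identity;

  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes)  // "vector" body
    for (std::size_t j = 0; j < kLanes; ++j) lanes[j] = op(lanes[j], data[i + j]);

  float acc = identity;
  for (std::size_t j = 0; j < kLanes; ++j) acc = op(acc, lanes[j]);  // combine lanes
  for (; i < n; ++i) acc = op(acc, data[i]);                         // scalar tail
  return acc;
}
```

Passing std::plus<float>() with identity 0.0f sums the array; passing std::multiplies<float>() with identity 1.0f multiplies it.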

ReduceOpsKernel.cpp uses the CPU_CAPABILITY_* macros to "know" under which compiler flags it is currently being compiled. This allows the programmer to write generic code that will be compiled under multiple compilation settings.
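A minimal sketch of how one source file can adapt to the flags it is compiled with. The exact macro spellings (CPU_CAPABILITY set to DEFAULT/AVX/AVX2, plus CPU_CAPABILITY_AVX or CPU_CAPABILITY_AVX2) are assumed here for illustration, and capability_name, float_lanes, and CAPABILITY_STR are hypothetical helpers. Compiled on its own, the file falls back to the DEFAULT configuration.

```cpp
#include <cassert>
#include <cstring>

// A multi-capability build would compile this file several times, e.g. once
// plain and once with flags such as -mavx2 -DCPU_CAPABILITY=AVX2
// -DCPU_CAPABILITY_AVX2 (assumed spelling). Without such flags we default.
#ifndef CPU_CAPABILITY
#define CPU_CAPABILITY DEFAULT
#endif

#define CAPABILITY_STR_IMPL(x) #x
#define CAPABILITY_STR(x) CAPABILITY_STR_IMPL(x)

// Naming the namespace after the capability gives each copy distinct symbols.
namespace CPU_CAPABILITY {

const char* capability_name() { return CAPABILITY_STR(CPU_CAPABILITY); }

// Generic code can branch on the macros to choose a strategy at compile time.
constexpr int float_lanes() {
#if defined(CPU_CAPABILITY_AVX2) || defined(CPU_CAPABILITY_AVX)
  return 8;  // a 256-bit register holds 8 floats
#else
  return 1;  // scalar fallback
#endif
}

}  // namespace CPU_CAPABILITY
```

Compiled a second time with -DCPU_CAPABILITY=AVX2 -DCPU_CAPABILITY_AVX2, the same text would define AVX2::capability_name() and AVX2::float_lanes() (returning 8), so both copies could be linked into one binary without clashing.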

../ReduceOps.cpp now includes the header ReduceOpsKernel.h, which contains a generic definition of sumImplAll. This function allows the user to reduce over a dimension or all dimensions. The appropriate capability is chosen at runtime using cpuinfo. If the current platform has AVX, sumImpl will be set to sumImplAll<CPUCapability::AVX>.
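A compact sketch of that scheme, assuming CPUCapability, sumImplAll, and sumImpl behave as described above; the enum values shown, init_sum_impl, its has_avx flag (a stand-in for the cpuinfo query, which this sketch does not perform), and the sum wrapper are hypothetical. In a single-file example both instantiations are compiled with the same flags, so they differ only by their template argument.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the capability enum named in the text.
enum class CPUCapability { DEFAULT, AVX };

// One generic definition. In the real build, each instantiation would live in
// a translation unit compiled with matching flags (e.g. AVX enabled for AVX).
template <CPUCapability C>
float sumImplAll(const float* data, std::size_t n) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i) acc += data[i];
  return acc;
}

using SumFn = float (*)(const float*, std::size_t);

// Function pointer chosen once at startup; callers always go through it.
SumFn sumImpl = nullptr;

void init_sum_impl(bool has_avx) {
  if (has_avx) {
    sumImpl = &sumImplAll<CPUCapability::AVX>;
  } else {
    sumImpl = &sumImplAll<CPUCapability::DEFAULT>;
  }
}

float sum(const float* data, std::size_t n) {
  assert(sumImpl != nullptr && "call init_sum_impl first");
  return sumImpl(data, n);
}
```

Selecting through a function pointer keeps the runtime cost to one indirect call per reduction, and the capability check happens only once.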