 Windows FAQ
 ==========================

 Building from source
 --------------------

 Include optional components
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^

 There are two supported components for Windows PyTorch:
 MKL and MAGMA. Here are the steps to build with them.

 .. code-block:: bat

     REM Make sure you have 7z and curl installed.

     REM Download MKL files
     curl https://s3.amazonaws.com/ossci-windows/mkl_2020.0.166.7z -k -O
     7z x -aoa mkl_2020.0.166.7z -omkl

     REM Download MAGMA files
     REM version available:
     REM 2.5.2 (CUDA 9.2 10.0 10.1 10.2) x (Debug Release)
     REM 2.5.1 (CUDA 9.2 10.0 10.1 10.2) x (Debug Release)
     REM 2.5.0 (CUDA 9.0 9.2 10.0 10.1) x (Debug Release)
     REM 2.4.0 (CUDA 8.0 9.2) x (Release)
     set CUDA_PREFIX=cuda92
     set CONFIG=release
     curl -k https://s3.amazonaws.com/ossci-windows/magma_2.5.1_%CUDA_PREFIX%_%CONFIG%.7z -o magma.7z
     7z x -aoa magma.7z -omagma

     REM Setting essential environment variables
     set "CMAKE_INCLUDE_PATH=%cd%\mkl\include"
     set "LIB=%cd%\mkl\lib;%LIB%"
     set "MAGMA_HOME=%cd%\magma"
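
 Before starting the build, it can help to confirm that these variables point
 at real directories. The following is a minimal sketch, not part of the
 PyTorch build scripts: the helper name ``check_build_env`` is hypothetical,
 and it only assumes the three variables set above.

 .. code-block:: python

     import os

     # Variables from the batch snippet that must name a single directory.
     REQUIRED_DIR_VARS = ("CMAKE_INCLUDE_PATH", "MAGMA_HOME")


     def check_build_env(environ=os.environ):
         """Return a list of human-readable problems; empty means no problem found."""
         problems = []
         for name in REQUIRED_DIR_VARS:
             value = environ.get(name)
             if not value:
                 problems.append(f"{name} is not set")
             elif not os.path.isdir(value):
                 problems.append(f"{name} points to a missing directory: {value}")
         # LIB is a search path; at least one of its entries should exist on disk.
         lib_entries = [p for p in environ.get("LIB", "").split(os.pathsep) if p]
         if not any(os.path.isdir(p) for p in lib_entries):
             problems.append("LIB contains no existing directory")
         return problems


     if __name__ == "__main__":
         for problem in check_build_env():
             print("WARNING:", problem)

 Run it from the same command prompt in which the variables were set, since
 ``set`` only affects the current session.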

 Speeding CUDA build for Windows
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Visual Studio does not currently support parallel custom tasks.
 As an alternative, ``Ninja`` can parallelize the CUDA
 build tasks. Enabling it takes only a few commands.

 .. code-block:: bat

     REM Let's install ninja first.
     pip install ninja

     REM Set it as the cmake generator
     set CMAKE_GENERATOR=Ninja


 One-key install script
 ^^^^^^^^^^^^^^^^^^^^^^

 You can take a look at `this set of scripts
 <https://github.com/peterjc123/pytorch-scripts>`_.
 They will walk you through the process.

 Extension
 ---------

 CFFI Extension
 ^^^^^^^^^^^^^^

 Support for the CFFI Extension is very experimental. Enabling it
 on Windows generally takes two steps.

 First, specify additional ``libraries`` in the ``Extension``
 object so that it builds on Windows.

 .. code-block:: python

    ffi = create_extension(
        '_ext.my_lib',
        headers=headers,
        sources=sources,
        define_macros=defines,
        relative_to=__file__,
        with_cuda=with_cuda,
        extra_compile_args=["-std=c99"],
        libraries=['ATen', '_C'] # Append cuda libraries when necessary, like cudart
    )

 Second, here is a workaround for the "unresolved external symbol
 state" error caused by ``extern THCState *state;``.

 Change the source code from C to C++. An example is listed below.

 .. code-block:: cpp

     #include <THC/THC.h>
     #include <ATen/ATen.h>

     THCState *state = at::globalContext().thc_state;

     extern "C" int my_lib_add_forward_cuda(THCudaTensor *input1, THCudaTensor *input2,
                                             THCudaTensor *output)
     {
         if (!THCudaTensor_isSameSizeAs(state, input1, input2))
             return 0;
         THCudaTensor_resizeAs(state, output, input1);
         THCudaTensor_cadd(state, output, input1, 1.0, input2);
         return 1;
     }

     extern "C" int my_lib_add_backward_cuda(THCudaTensor *grad_output, THCudaTensor *grad_input)
     {
         THCudaTensor_resizeAs(state, grad_input, grad_output);
         THCudaTensor_fill(state, grad_input, 1);
         return 1;
     }

 Cpp Extension
 ^^^^^^^^^^^^^

 This type of extension is better supported than
 the previous one. However, it still needs some manual
 configuration. First, open the
 **x86_x64 Cross Tools Command Prompt for VS 2017**.
 Then start the compilation from that prompt.

 Installation
 ------------

 Package not found in win-32 channel.
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 .. code-block:: bat

     Solving environment: failed

     PackagesNotFoundError: The following packages are not available from current channels:

     - pytorch

     Current channels:
     - https://conda.anaconda.org/pytorch/win-32
     - https://conda.anaconda.org/pytorch/noarch
     - https://repo.continuum.io/pkgs/main/win-32
     - https://repo.continuum.io/pkgs/main/noarch
     - https://repo.continuum.io/pkgs/free/win-32
     - https://repo.continuum.io/pkgs/free/noarch
     - https://repo.continuum.io/pkgs/r/win-32
     - https://repo.continuum.io/pkgs/r/noarch
     - https://repo.continuum.io/pkgs/pro/win-32
     - https://repo.continuum.io/pkgs/pro/noarch
     - https://repo.continuum.io/pkgs/msys2/win-32
     - https://repo.continuum.io/pkgs/msys2/noarch

 PyTorch doesn't work on 32-bit systems. Please use the 64-bit versions of
 Windows and Python.
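
 To confirm which architecture an interpreter was built for, a quick check
 such as the following sketch can be run before installing. It uses only the
 standard library; the function name ``python_bitness`` is illustrative.

 .. code-block:: python

     import platform
     import struct


     def python_bitness():
         """Return the pointer width of the running interpreter in bits."""
         # struct.calcsize("P") is the size of a C pointer in bytes.
         return struct.calcsize("P") * 8


     if __name__ == "__main__":
         bits = python_bitness()
         print(f"Python is {bits}-bit on a {platform.machine()} machine")
         if bits != 64:
             print("This interpreter is not 64-bit; install 64-bit Python first.")

 Note that a 32-bit Python can be installed on 64-bit Windows, so the
 interpreter itself has to be checked, not only the operating system.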


 Import error
 ^^^^^^^^^^^^

 .. code-block:: python

     from torch._C import *

     ImportError: DLL load failed: The specified module could not be found.


 The problem is caused by missing essential files. The conda package
 includes almost all the essential files that PyTorch needs, except the
 VC2017 redistributable and some MKL libraries.
 You can resolve this by running the following commands.

 .. code-block:: bat

     conda install -c peterjc123 vc vs2017_runtime
     conda install mkl_fft intel-openmp numpy mkl

 The wheel packages do not bundle some libraries or the VS2017
 redistributable files, so please make sure you install them manually.
 You can download the `VS 2017 redistributable installer
 <https://aka.ms/vs/15/release/VC_redist.x64.exe>`_.
 Also pay attention to your NumPy installation and make sure it
 uses MKL instead of OpenBLAS. You can install it with the following command.

 .. code-block:: bat

     pip install numpy mkl intel-openmp mkl_fft
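
 To see whether the Visual C++ runtime is visible to the current process, a
 rough check like the one below may help. This is only a sketch: it assumes
 the runtime DLL is named ``vcruntime140.dll`` and that it can be found on
 the normal library search path, and ``missing_libraries`` is a hypothetical
 helper. A miss is a hint, not proof, that the redistributable is absent.

 .. code-block:: python

     import ctypes.util


     def missing_libraries(names):
         """Return the subset of ``names`` that ctypes cannot locate."""
         return [name for name in names if ctypes.util.find_library(name) is None]


     if __name__ == "__main__":
         # Assumed DLL name for the Visual C++ runtime; adjust if yours differs.
         for name in missing_libraries(["vcruntime140.dll"]):
             print(f"Could not locate {name}; try installing the VC redistributable.")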

 Another possible cause is that you are using the GPU version without an NVIDIA
 graphics card. In that case, replace your GPU package with the CPU one.

 .. code-block:: python

     from torch._C import *

     ImportError: DLL load failed: The operating system cannot run %1.


 This is an upstream issue in Anaconda. It appears when you initialize your
 environment with the conda-forge channel. You can fix
 the intel-openmp libraries with this command.

 .. code-block:: bat

     conda install -c defaults intel-openmp -f


 Usage (multiprocessing)
 -------------------------------------------------------

 Multiprocessing error without if-clause protection
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 .. code-block:: python

     RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

 The implementation of ``multiprocessing`` is different on Windows, which
 uses ``spawn`` instead of ``fork``. Each child process re-imports the main
 module, so the entry-point code must be wrapped in an if-clause to keep it
 from executing multiple times. Refactor your code into the following structure.

 .. code-block:: python

     import torch

     def main():
         # ``dataloader`` is assumed to be a DataLoader you have constructed.
         for i, data in enumerate(dataloader):
             # do something here
             pass

     if __name__ == '__main__':
         main()


 Multiprocessing error "Broken pipe"
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 .. code-block:: python

     ForkingPickler(file, protocol).dump(obj)

     BrokenPipeError: [Errno 32] Broken pipe

 This issue happens when the child process ends before the parent process
 finishes sending data. There may be something wrong with your code. You
 can debug it by reducing the ``num_workers`` of
 :class:`~torch.utils.data.DataLoader` to zero and seeing if the issue persists.
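
 One way to make this switch easy is to keep the worker count in a single
 place. The sketch below is illustrative only: ``loader_kwargs`` is a
 hypothetical helper, and the defaults of 4 workers and a batch size of 32
 are arbitrary. The resulting dictionary would be passed along as
 ``DataLoader(dataset, **loader_kwargs(debug=True))``.

 .. code-block:: python

     def loader_kwargs(debug=False, workers=4, batch_size=32):
         """Build keyword arguments for torch.utils.data.DataLoader.

         With debug=True, data is loaded in the main process (num_workers=0),
         so an exception raised by the dataset surfaces directly instead of
         showing up as a broken pipe.
         """
         return {
             "batch_size": batch_size,
             "num_workers": 0 if debug else workers,
         }


     if __name__ == "__main__":
         print(loader_kwargs(debug=True))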

 Multiprocessing error "driver shut down"
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 ::

     Couldn’t open shared file mapping: <torch_14808_1591070686>, error code: <1455> at torch\lib\TH\THAllocator.c:154

     [windows] driver shut down

 Please update your graphics driver. If the error persists, your
 graphics card may be too old or the computation too heavy for it. In that case,
 update the TDR settings according to this `post
 <https://www.pugetsystems.com/labs/hpc/Working-around-TDR-in-Windows-for-a-better-GPU-computing-experience-777/>`_.

 CUDA IPC operations
 ^^^^^^^^^^^^^^^^^^^

 .. code-block:: python

    THCudaCheck FAIL file=torch\csrc\generic\StorageSharing.cpp line=252 error=63 : OS call failed or operation not supported on this OS

 CUDA IPC operations are not supported on Windows, so operations such as
 multiprocessing on CUDA tensors cannot succeed. There are two alternatives.

 1. Don't use ``multiprocessing``. Set the ``num_workers`` of
    :class:`~torch.utils.data.DataLoader` to zero.

 2. Share CPU tensors instead. Make sure your custom
    :class:`~torch.utils.data.Dataset` returns CPU tensors.