Multiprocessing package - torch.multiprocessing
===============================================

.. automodule:: torch.multiprocessing
.. currentmodule:: torch.multiprocessing

.. warning::

    If the main process exits abruptly (e.g. because of an incoming signal),
    Python's ``multiprocessing`` sometimes fails to clean up its children.
    It's a known caveat, so if you're seeing any resource leaks after
    interrupting the interpreter, it probably means that this has just happened
    to you.
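
One way to reduce the chance of leaked children is to make sure the parent
always joins or terminates them, even when it is interrupted. The sketch below
is only a defensive pattern built on the standard ``multiprocessing`` API, not
something :mod:`torch.multiprocessing` does for you; the ``worker`` function is
a placeholder::

    import torch.multiprocessing as mp

    def worker(rank):
        pass  # placeholder for real work

    if __name__ == "__main__":
        processes = [mp.Process(target=worker, args=(i,)) for i in range(4)]
        for p in processes:
            p.start()
        try:
            for p in processes:
                p.join()
        finally:
            # If we got here via an exception or KeyboardInterrupt,
            # make sure no child process is left running.
            for p in processes:
                if p.is_alive():
                    p.terminate()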

Strategy management
-------------------

.. autofunction:: get_all_sharing_strategies
.. autofunction:: get_sharing_strategy
.. autofunction:: set_sharing_strategy
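
For example, a minimal sketch of switching to the ``file_system`` strategy
described below, using only the functions documented above::

    import torch
    import torch.multiprocessing as mp

    # See which strategies the platform supports and which one is active.
    print(mp.get_all_sharing_strategies())
    print(mp.get_sharing_strategy())

    # Storages moved to shared memory after this call use the new strategy.
    mp.set_sharing_strategy('file_system')

    t = torch.zeros(10)
    t.share_memory_()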

Sharing CUDA tensors
--------------------

Sharing CUDA tensors between processes is supported only in Python 3, using
the ``spawn`` or ``forkserver`` start methods. :mod:`python:multiprocessing` in
Python 2 can only create subprocesses using ``fork``, which is not supported
by the CUDA runtime.

Unlike CPU tensors, a CUDA tensor requires the sending process to keep the
original tensor alive for as long as the receiving process retains a copy of
it. This shouldn't be a problem for sharing model parameters (which stay live
for the entire execution of the model), but passing other kinds of data should
be done with care.

Here is an example program which handles these requirements correctly::

    import torch
    import torch.multiprocessing as mp

    torch.set_default_tensor_type(torch.cuda.FloatTensor)

    def sender(q, e):
        for i in range(10):
            s_sample = [torch.zeros(1), torch.ones(1)]
            q.put(s_sample)
            e.wait()
            del s_sample
            e.clear()

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")
        q = ctx.Queue()
        e = ctx.Event()
        p = ctx.Process(target=sender, args=(q, e))
        p.start()

        for i in range(10):
            print("=== ITER {} ===".format(i))
            r_sample = q.get()
            del r_sample
            e.set()

        p.join()

In the example above, calling ``e.wait()`` on the sender side ensures that the
tensor ``s_sample`` doesn't get deleted while the receiver is still working on
it. The receiver signals that it is done with the tensor using ``e.set()``,
being careful to ``del`` its reference to the received tensor first. It is
**insufficient** to promise never to call ``r_sample`` again; while
``r_sample`` is live, it may be confused with any subsequent tensors allocated
by the source process at the same address.

If a receiver wants to save the data of ``r_sample`` for future use while
letting the source process deallocate the original, it must ``clone()`` it.
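
For instance, the receiver loop in the example above could be adapted roughly
as follows if it needs to keep the values around after releasing the sender.
This is only a sketch of that loop (``q``, ``e`` and the iteration count come
from the program above; ``saved`` is just an illustrative name)::

    for i in range(10):
        r_sample = q.get()
        # clone() copies the data into memory owned by this process,
        # so it stays valid after the sender frees the original.
        saved = [t.clone() for t in r_sample]
        del r_sample
        e.set()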

This behavior is very confusing, and we are tracking a fix for it
at https://github.com/pytorch/pytorch/issues/16141.

Sharing strategies
------------------

This section provides a brief overview of how the different sharing strategies
work. Note that it applies only to CPU tensors - CUDA tensors will always use
the CUDA API, as that's the only way they can be shared.

File descriptor - ``file_descriptor``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::

    This is the default strategy (except for macOS, where it's not supported).

This strategy uses file descriptors as shared memory handles. Whenever a
storage is moved to shared memory, a file descriptor obtained from ``shm_open``
is cached with the object, and when it is to be sent to another process, the
file descriptor will be transferred (e.g. via UNIX sockets) to it. The receiver
will also cache the file descriptor and ``mmap`` it, to obtain a shared view
onto the storage data.

Note that if many tensors are shared, this strategy will keep a large number of
file descriptors open most of the time. If your system has low limits on the
number of open file descriptors, and you can't raise them, you should use the
``file_system`` strategy.
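
If you are unsure what the limit is on your system, one way to check it (and,
where the hard limit allows, raise the soft limit) from Python is the standard
``resource`` module. This is only a sketch of that approach, not something
:mod:`torch.multiprocessing` does automatically::

    import resource

    # RLIMIT_NOFILE is the per-process limit on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print('soft limit: {}, hard limit: {}'.format(soft, hard))

    # The soft limit can be raised up to the hard limit without extra privileges.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))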

File system - ``file_system``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This strategy will use file names given to ``shm_open`` to identify the shared
memory regions. This has the benefit of not requiring the implementation to
cache the file descriptors obtained from it, but at the same time it is prone
to shared memory leaks. The file can't be deleted right after its creation,
because other processes need to access it to open their views. If the processes
fatally crash, or are killed, and don't call the storage destructors, the files
will remain in the system. This is very serious, because they keep using up
memory until the system is restarted, or they're freed manually.
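
On Linux, POSIX shared memory objects live under ``/dev/shm``, so a leak from
this strategy usually shows up as files there that outlive every process. The
snippet below is only a sketch for inspecting that directory; the exact names
of the files are an implementation detail, so double-check that nothing is
still using a file before removing it manually::

    import os

    # List the shared memory objects currently backing /dev/shm.
    for name in sorted(os.listdir('/dev/shm')):
        path = os.path.join('/dev/shm', name)
        print(name, os.path.getsize(path))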

To counter the problem of shared memory file leaks, :mod:`torch.multiprocessing`
will spawn a daemon named ``torch_shm_manager`` that will isolate itself from
the current process group, and will keep track of all shared memory allocations.
Once all processes connected to it exit, it will wait a moment to ensure there
will be no new connections, and will iterate over all shared memory files
allocated by the group. If it finds that any of them still exist, they will be
deallocated. We've tested this method and it proved to be robust to various
failures. Still, if your system has high enough limits, and ``file_descriptor``
is a supported strategy, we do not recommend switching to this one.

Spawning subprocesses
---------------------

.. note::

    Available for Python >= 3.4.

    This depends on the ``spawn`` start method in Python's
    ``multiprocessing`` package.

Spawning a number of subprocesses to perform some function can be done
by creating ``Process`` instances and calling ``join`` to wait for
their completion. This approach works fine when dealing with a single
subprocess but presents potential issues when dealing with multiple
processes.

Namely, joining processes sequentially implies that they terminate in that
order: if the first process never terminates, the termination (or crash) of any
later process will go unnoticed. There are also no native facilities for
propagating errors back to the parent.

The ``spawn`` function below addresses these concerns. It takes care of error
propagation and out-of-order termination, and will actively terminate the
remaining processes upon detecting an error in one of them.

.. autofunction:: spawn
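
A minimal usage sketch (the first argument passed to the function is the
process index; the ``train`` function and its argument are placeholders)::

    import torch.multiprocessing as mp

    def train(rank, num_steps):
        # Each spawned process runs this function with its own rank.
        print('process {} will run {} steps'.format(rank, num_steps))

    if __name__ == "__main__":
        # Starts 4 processes and blocks until all of them have exited.
        # If one process fails, the others are terminated and the error
        # is re-raised in the parent.
        mp.spawn(train, args=(100,), nprocs=4, join=True)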

.. class:: SpawnContext

    Returned by :func:`~spawn` when called with ``join=False``.

    .. automethod:: join
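
When called with ``join=False``, :func:`~spawn` returns immediately, and the
parent can poll the returned context while doing other work. A sketch of that
pattern, reusing the placeholder ``train`` function from the example above::

    if __name__ == "__main__":
        ctx = mp.spawn(train, args=(100,), nprocs=4, join=False)

        # join() returns True once every process has exited; with a timeout
        # it returns False if some of them are still running.
        while not ctx.join(timeout=5):
            pass  # the parent is free to do other work here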