|  | PyTorch Design Philosophy | 
|  | ========================= | 
|  |  | 
|  | This document is designed to help contributors and module maintainers | 
|  | understand the high-level design principles that have developed over | 
|  | time in PyTorch. These are not meant to be hard-and-fast rules, but to | 
|  | serve as a guide to help trade off different concerns and to resolve | 
|  | disagreements that may come up while developing PyTorch. For more | 
|  | information on contributing, module maintainership, and how to escalate a | 
|  | disagreement to the Core Maintainers, please see `PyTorch | 
|  | Governance <https://pytorch.org/docs/master/community/governance.html>`__. | 
|  |  | 
|  | Design Principles | 
|  | ----------------- | 
|  |  | 
|  | Principle 1: Usability over Performance | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | This principle may be surprising! As one Hacker News poster wrote: | 
|  | *PyTorch is amazing! [...] Although I’m confused. How can a ML framework be | 
|  | not obsessed with speed/performance?* See `Hacker News discussion on | 
|  | PyTorch <https://news.ycombinator.com/item?id=28066093>`__. | 
|  |  | 
|  | Soumith’s blog post on `Growing the PyTorch | 
|  | Community <https://soumith.ch/posts/2021/02/growing-opensource/?fbclid=IwAR1bvN_xZ8avGvu14ODJzS8Zp7jX1BOyfuGUf-zoRawpyL-s95Vjxf88W7s>`__ | 
|  | goes into this in some depth, but at a high-level: | 
|  |  | 
|  | -  PyTorch’s primary goal is usability | 
|  | -  A secondary goal is to have *reasonable* performance | 
|  |  | 
|  | We believe the ability to maintain our flexibility to support | 
|  | researchers who are building on top of our abstractions remains | 
|  | critical. We can’t see what the future of what workloads will be, but we | 
|  | know we want them to be built first on PyTorch and that requires | 
|  | flexibility. | 
|  |  | 
|  | In more concrete terms, we operate in a *usability-first* manner and try | 
|  | to avoid jumping to *restriction-first* regimes (for example, static shapes, | 
|  | graph-mode only) without a clear-eyed view of the tradeoffs. Often there | 
|  | is a temptation to impose strict user restrictions upfront because it | 
|  | can simplify implementation, but this comes with risks: | 
|  |  | 
|  | -  The performance may not be worth the user friction, either because | 
|  | the performance benefit is not compelling enough or it only applies to | 
|  | a relatively narrow set of subproblems. | 
|  | -  Even if the performance benefit is compelling, the restrictions can | 
|  | fragment the ecosystem into different sets of limitations that can | 
|  | quickly become incomprehensible to users. | 
|  |  | 
|  | We want users to be able to seamlessly move their PyTorch code to | 
|  | different hardware and software platforms, to interoperate with | 
|  | different libraries and frameworks, and to experience the full richness | 
|  | of the PyTorch user experience, not a least common denominator subset. | 
|  |  | 
|  | Principle 2: Simple Over Easy | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | Here, we borrow from `The Zen of | 
|  | Python <https://peps.python.org/pep-0020/>`__: | 
|  |  | 
|  | -  *Explicit is better than implicit* | 
|  | -  *Simple is better than complex* | 
|  |  | 
|  | A more concise way of describing these two goals is `Simple Over | 
|  | Easy <https://www.infoq.com/presentations/Simple-Made-Easy/>`_. Let’s start with an example because *simple* and *easy* are | 
|  | often used interchangeably in everyday English. Consider how one may | 
|  | model `devices <https://pytorch.org/docs/master/tensor_attributes.html#torch.device>`__ | 
|  | in PyTorch: | 
|  |  | 
|  | -  **Simple / Explicit (to understand, debug):** every tensor is associated | 
|  | with a device. The user explicitly specifies tensor device movement. | 
|  | Operations that require cross-device movement result in an error. | 
|  | -  **Easy / Implicit (to use):** the user does not have to worry about | 
|  | devices; the system figures out the globally optimal device | 
|  | placement. | 
|  |  | 
|  | In this specific case, and as a general design philosophy, PyTorch | 
|  | favors exposing simple and explicit building blocks rather than APIs | 
|  | that are easy-to-use by practitioners. The simple version is immediately | 
|  | understandable and debuggable by a new PyTorch user: you get a clear | 
|  | error if you call an operator requiring cross-device movement at the | 
|  | point in the program where the operator is actually invoked. The easy | 
|  | solution may let a new user move faster initially, but debugging such a | 
|  | system can be complex: How did the system make its determination? What | 
|  | is the API for plugging into such a system and how are objects | 
|  | represented in its IR? | 
|  |  | 
|  | Some classic arguments in favor of this sort of design come from `A | 
|  | Note on Distributed | 
|  | Computation <https://dl.acm.org/doi/book/10.5555/974938>`__ (TLDR: Do not | 
|  | model resources with very different performance characteristics | 
|  | uniformly, the details will leak) and the `End-to-End | 
|  | Principle <http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf>`__ | 
|  | (TLDR: building smarts into the lower-layers of the stack can prevent | 
|  | building performant features at higher layers in the stack, and often | 
|  | doesn’t work anyway). For example, we could build operator-level or | 
|  | global device movement rules, but the precise choices aren’t obvious and | 
|  | building an extensible mechanism has unavoidable complexity and latency | 
|  | costs. | 
|  |  | 
|  | A caveat here is that this does not mean that higher-level “easy” APIs | 
|  | are not valuable; certainly there is a value in, for example, | 
|  | higher-levels in the stack to support efficient tensor computations | 
|  | across heterogeneous compute in a large cluster. Instead, what we mean | 
|  | is that focusing on simple lower-level building blocks helps inform the | 
|  | easy API while still maintaining a good experience when users need to | 
|  | leave the beaten path. It also allows space for innovation and the | 
|  | growth of more opinionated tools at a rate we cannot support in the | 
|  | PyTorch core library, but ultimately benefit from, as evidenced by | 
|  | our `rich ecosystem <https://pytorch.org/ecosystem/>`__. In other | 
|  | words, not automating at the start allows us to potentially reach levels | 
|  | of good automation faster. | 
|  |  | 
|  | Principle 3: Python First with Best In Class Language Interoperability | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | This principle began as **Python First**: | 
|  |  | 
|  | PyTorch is not a Python binding into a monolithic C++ framework. | 
|  | It is built to be deeply integrated into Python. You can use it | 
|  | naturally like you would use `NumPy <https://www.numpy.org/>`__, | 
|  | `SciPy <https://www.scipy.org/>`__, `scikit-learn <(https://scikit-learn.org/>`__, | 
|  | or other Python libraries. You can write your new neural network | 
|  | layers in Python itself, using your favorite libraries and use | 
|  | packages such as `Cython <https://cython.org/>`__ and | 
|  | `Numba <http://numba.pydata.org/>`__. Our goal is to not reinvent | 
|  | the wheel where appropriate. | 
|  |  | 
|  | One thing PyTorch has needed to deal with over the years is Python | 
|  | overhead: we first rewrote the `autograd` engine in C++, then the majority | 
|  | of operator definitions, then developed TorchScript and the C++ | 
|  | frontend. | 
|  |  | 
|  | Still, working in Python provides easily the best experience for our | 
|  | users: it is flexible, familiar, and perhaps most importantly, has a | 
|  | huge ecosystem of scientific computing libraries and extensions | 
|  | available for use. This fact motivates a few of our most recent | 
|  | contributions, which attempt to hit a Pareto optimal point close to the | 
|  | Python usability end of the curve: | 
|  |  | 
|  | -  `TorchDynamo <https://dev-discuss.pytorch.org/t/torchdynamo-an-experiment-in-dynamic-python-bytecode-transformation/361>`__, | 
|  | a Python frame evaluation tool capable of speeding up existing | 
|  | eager-mode PyTorch programs with minimal user intervention. | 
|  | -  `torch_function <https://pytorch.org/docs/master/notes/extending.html#extending-torch>`__ | 
|  | and `torch_dispatch <https://dev-discuss.pytorch.org/t/what-and-why-is-torch-dispatch/557>`__ | 
|  | extension points, which have enabled Python-first functionality to be | 
|  | built on-top of C++ internals, such as the `torch.fx | 
|  | tracer <https://pytorch.org/docs/stable/fx.html>`__ | 
|  | and `functorch <https://github.com/pytorch/functorch>`__ | 
|  | respectively. | 
|  |  | 
|  | These design principles are not hard-and-fast rules, but hard won | 
|  | choices and anchor how we built PyTorch to be the debuggable, hackable | 
|  | and flexible framework it is today. As we have more contributors and | 
|  | maintainers, we look forward to applying these core principles with you | 
|  | across our libraries and ecosystem. We are also open to evolving them as | 
|  | we learn new things and the AI space evolves, as we know it will. |