[pytorch] add '__BASE__' section to op deps to factor out frequently used util ops (#37404)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37404

Many aten operators are really like util functions, e.g. aten::is_nonzero,
aten::is_floating_point, etc. These ops can be called via overloaded C++
operators, so seemingly trivial and innocent code changes can affect how these
ops are used by other ops (and thus change the output of the static analyzer).

Most of these util ops are rather small in terms of build size cost, so for
the purpose of optimizing binary size with a custom build, whether or not to
include them does not make a significant difference. In fact, for non-trivial
models a set of these ops is almost always used.

This PR introduces an (optional) '__BASE__' ops section to the dependency graph.
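
For illustration, a hedged sketch of the in-memory deps graph once the
'__BASE__' entry has been prepended by the process_base_ops() helper added
below (the base op names are placeholders, not a proposed list; the record
layout mirrors the list of {'name', 'depends'} dicts that op_deps_processor.py
loads from the analyzer's YAML output):

  # Hypothetical deps graph after process_base_ops() has run.
  deps_graph = [
      # Special entry prepended by process_base_ops(): ops that should always
      # be kept in a custom build, regardless of the model's root ops.
      {'name': '__BASE__',
       'depends': [{'name': 'aten::is_nonzero'},
                   {'name': 'aten::is_floating_point'}]},
      # Regular op entries; edges pointing at base ops have been pruned from
      # their 'depends' lists to keep the graph sparse.
      {'name': 'aten::add',
       'depends': [{'name': 'aten::add_out'}]},
  ]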

We can maintain the list of frequently used small util ops for the internal
BUCK build. This way, the output dependency graph will only contain meaningful
edges with significant binary size impact, and it will be more stable against
trivial code changes (the generated graph is checked into the FB codebase).
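
As a hedged usage sketch of how the new --base_ops input is processed (the
function names come from the diff below; the input path and op names are
illustrative only):

  # Minimal sketch, assuming it runs from the repo root so the module resolves.
  from tools.code_analyzer.op_deps_processor import load_op_deps, process_base_ops

  deps = load_op_deps('work/pt_op_deps.yaml')   # illustrative path
  base_ops = ['aten::is_nonzero', 'aten::is_floating_point']
  process_base_ops(deps, base_ops)              # mutates `deps` in place
  assert deps[0]['name'] == '__BASE__'          # base section is prepended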

Having a stable and sparse deps graph, by factoring out frequently used base
ops, is also a nice property that allows us to explore alternative custom
build solutions in case we find it hard to maintain the static code analyzer.

Test Plan: Imported from OSS

Differential Revision: D21280835

Pulled By: ljk53

fbshipit-source-id: c4d0d1f07ca868c60f23118d877fc1eeead4c875
diff --git a/tools/code_analyzer/build.sh b/tools/code_analyzer/build.sh
index a7a1f06..d7a13c9 100755
--- a/tools/code_analyzer/build.sh
+++ b/tools/code_analyzer/build.sh
@@ -107,9 +107,18 @@
 
   DEST="${BUILD_ROOT}/pt_deps.bzl"
 
-  python -m tools.code_analyzer.op_deps_processor \
-    --op_dependency "${OUTPUT}" \
+  args=(
+    --op_dependency "${OUTPUT}"
     --output "${DEST}"
+  )
+
+  if [ -n "${BASE_OPS_FILE}" ] && [ -f "${BASE_OPS_FILE}" ]; then
+    args+=(
+      --base_ops $(< ${BASE_OPS_FILE})
+    )
+  fi
+
+  python -m tools.code_analyzer.op_deps_processor "${args[@]}"
 
   echo "Deployed file at: ${DEST}"
 }
diff --git a/tools/code_analyzer/gen_op_registration_whitelist.py b/tools/code_analyzer/gen_op_registration_whitelist.py
index fe4f2b1..5971864 100644
--- a/tools/code_analyzer/gen_op_registration_whitelist.py
+++ b/tools/code_analyzer/gen_op_registration_whitelist.py
@@ -42,6 +42,10 @@
     result = set(root_ops)
     queue = root_ops[:]
 
+    # The dependency graph might contain a special entry with key = `__BASE__`
+    # and value = (set of `base` ops to always include in custom build).
+    queue.append('__BASE__')
+
     # The dependency graph might contain a special entry with key = `__ROOT__`
     # and value = (set of ops reachable from C++ functions). Insert the special
     # `__ROOT__` key to include ops which can be called from C++ code directly,
diff --git a/tools/code_analyzer/op_deps_processor.py b/tools/code_analyzer/op_deps_processor.py
index f71aab4..3aa78dc 100644
--- a/tools/code_analyzer/op_deps_processor.py
+++ b/tools/code_analyzer/op_deps_processor.py
@@ -54,6 +54,19 @@
         return yaml.safe_load(stream)
 
 
+def process_base_ops(graph, base_ops):
+    # remove base ops from all `depends` lists to compress the output graph
+    for op in graph:
+        op['depends'] = [
+            dep for dep in op.get('depends', []) if dep['name'] not in base_ops
+        ]
+
+    # add base ops section at the beginning
+    graph.insert(0, {
+        'name': '__BASE__',
+        'depends': [{'name': name} for name in base_ops]})
+
+
 def convert(fname, graph, output_template, op_template, op_dep_template):
     ops = []
     for op in graph:
@@ -95,12 +108,23 @@
         default='bazel',
         help='output file format [bazel, dot]')
     parser.add_argument(
+        '--base_ops',
+        nargs='*',
+        help='optional list of `base` ops that should always be kept in '
+             'custom build, to make the output stable from trivial changes; '
+             'each item is `namespace`::`operator name` without overload; '
+             'e.g.: aten::empty aten::size ...')
+    parser.add_argument(
         '--output',
         required=True,
         help='output file')
     args = parser.parse_args()
 
     deps = load_op_deps(args.op_dependency)
+
+    if args.base_ops:
+        process_base_ops(deps, args.base_ops)
+
     if args.format == 'bazel':
         convert(args.output, deps, BAZEL_OUTPUT, BAZEL_OP, BAZEL_OP_DEP)
     elif args.format == 'dot':