offline_all_gpu_experiment

Summary:
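Similar to the sparse_nn all-GPU work, this is our first step towards an offline full-GPU experiment. It adds a CPU_ONLY tag ('cpu_only') next to PREFER_GPU in caffe2/python/layers/tags.py.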

**Compare Run**
cat(128, 32)512-512:
GPU 21138598 https://fburl.com/jpeod1pi
CPU 21138787 https://fburl.com/vma7225l

Reviewed By: dzhulgakov

Differential Revision: D5308789

fbshipit-source-id: 413819bf9c5fff125d6967ed48faa5c7b3d6fa85
diff --git a/caffe2/python/layers/tags.py b/caffe2/python/layers/tags.py
index 559a2ee..a3e4c7f 100644
--- a/caffe2/python/layers/tags.py
+++ b/caffe2/python/layers/tags.py
@@ -37,6 +37,7 @@
     HANDLE_AS_SPARSE_LAYER = 'handle_as_sparse_layer'
     GRADIENT_FROM_PS = 'gradient_from_ps'
     PREFER_GPU = 'prefer_gpu'
+    CPU_ONLY = 'cpu_only'
 
     # In certain cases we want to have different schema for training and
     # prediction, as an example in prediction we might need to have only