Lock unpickling of source ranges
Summary:
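The source is shared across all threads running the TorchScript
interpreter, so if several threads encounter errors at once, they all race
to unpickle the source ranges, leading to memory corruption. This diff
takes a std::mutex at the top of ConcreteSourceRangeUnpickler::unpickle()
so that only one thread performs the unpickling; later callers see
unpickled_records already set and return early.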
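
For illustration only, here is a minimal standalone sketch of the same
lock-then-check pattern. LazyRecords, load_count, and race_unpickle are
hypothetical names invented for this sketch and are not part of the
PyTorch sources; the vector of ints stands in for the real records.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for ConcreteSourceRangeUnpickler (not the real class).
class LazyRecords {
 public:
  // Populates records_ at most once, even when called concurrently.
  void unpickle() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (records_) {
      return;  // Another thread already did the work.
    }
    ++load_count_;
    // Placeholder payload; the real code deserializes source range records.
    records_ = std::make_shared<std::vector<int>>(3, 0);
  }

  // Number of times the expensive load actually ran.
  int load_count() {
    std::lock_guard<std::mutex> guard(mutex_);
    return load_count_;
  }

 private:
  std::mutex mutex_;
  std::shared_ptr<std::vector<int>> records_;
  int load_count_ = 0;
};

// Calls unpickle() from num_threads threads and reports how often the load ran.
int race_unpickle(int num_threads) {
  assert(num_threads > 0);
  LazyRecords records;
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([&records] { records.unpickle(); });
  }
  for (auto& t : threads) {
    t.join();
  }
  return records.load_count();
}
```

Because the null check happens while the lock is held, the load runs once
no matter how many threads call unpickle(). std::call_once would be an
alternative for one-time initialization; the sketch uses a plain mutex to
mirror this diff.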
Test Plan:
Model 217993215_0 is the problematic model. I wasn't able to repro
the crash with requests stored in Hive, but I could easily repro it by
adding my devserver (SMC tier predictor.bertrand) as a shadow tier to the
model's tier (inference_platform.predictor_model.prod.bi.217993215_latest),
i.e., by setting the shadow_tier property to predictor.bertrand=1 to proxy
1% of traffic.
With this diff, the ASAN/TSAN errors go away.
Reviewed By: suo
Differential Revision: D31044009
fbshipit-source-id: 56f9ef3880e7cf09f334db71b4256e362b4de965
diff --git a/torch/csrc/jit/serialization/source_range_serialization.cpp b/torch/csrc/jit/serialization/source_range_serialization.cpp
index 6a30107..1e69cfa 100644
--- a/torch/csrc/jit/serialization/source_range_serialization.cpp
+++ b/torch/csrc/jit/serialization/source_range_serialization.cpp
@@ -107,6 +107,7 @@
unpickled_records(nullptr) {}
void ConcreteSourceRangeUnpickler::unpickle() {
+ std::lock_guard<std::mutex> guard(mutex);
if (unpickled_records) {
return;
}
diff --git a/torch/csrc/jit/serialization/source_range_serialization_impl.h b/torch/csrc/jit/serialization/source_range_serialization_impl.h
index 7f6e418..2b7cd5a 100644
--- a/torch/csrc/jit/serialization/source_range_serialization_impl.h
+++ b/torch/csrc/jit/serialization/source_range_serialization_impl.h
@@ -21,6 +21,7 @@
void unpickle();
+ std::mutex mutex;
std::shared_ptr<SourceRangeDeserializer> deserializer;
std::shared_ptr<SourceRangeRecords> unpickled_records;
};