Make RefBase more robust and debuggable

This prevents two different kinds of client errors from causing
undetected memory corruption, and helps with the detection of others:

1. We no longer deallocate objects when the weak count goes to zero
and there have been no strong references.  This otherwise causes
us to return a garbage object from a constructor if the constructor
allocates and deallocates a weak pointer to this. And we do know
that clients allocate such weak pointers in constructors and their
lifetime is hard to trace.

2. We abort if a RefBase object is explicitly destroyed while
the weak count is nonzero.  Otherwise a subsequent decrement
would cause a write to potentially reallocated memory.

3. We check counter values returned by atomic decrements for
plausibility, and fail immediately if they are not plausible.

We unconditionally log any cases in which 1 changes behavior
from before. We abort in cases in which 2 changes behavior, since
those reflect clear bugs.
In case 1, a log message now indicates a possible leak. We have
not seen such a message in practice.

The third point introduces a small amount of overhead into the
reference count decrement path. But this should be negligible
compared to the actual decrement cost.

Add a test for promote/attemptIncStrong that tries to check for
both (1) above and concurrent operation of attemptIncStrong.

Add some additional warnings and explanations to the RefBase
documentation.

Bug: 30503444
Bug: 30292291
Bug: 30292538

Change-Id: Ida92b9a2e247f543a948a75d221fbc0038dea66c
diff --git a/include/utils/RefBase.h b/include/utils/RefBase.h
index a232a65..3c318c4 100644
--- a/include/utils/RefBase.h
+++ b/include/utils/RefBase.h
@@ -105,16 +105,14 @@
 
 // Other more specific restrictions for wp<> and sp<>:
 
-// Constructing a strong or weak pointer to "this" in its constructors is almost
-// always wrong.  In the case of strong pointers. it is always wrong with RefBase
-// because the onFirstRef() callback will be mode on an incompletely constructed
-// object. In either case, it is wrong if such a pointer does not outlive the
-// constructor, since destruction of the smart pointer will attempt to destroy the
-// object before construction is finished, normally resulting in a pointer to a
-// destroyed object being returned from a new expression.
-
-// In the case of weak pointers, this occurs because an object that has never been
-// referenced by a strong pointer is destroyed when the last weak pointer disappears.
+// Do not construct a strong pointer to "this" in an object's constructor.
+// The onFirstRef() callback would be made on an incompletely constructed
+// object.
+// Construction of a weak pointer to "this" in an object's constructor is also
+// discouraged. But the implementation was recently changed so that, in the
+// absence of extendObjectLifetime() calls, weak pointers no longer impact
+// object lifetime, and hence this no longer risks premature deallocation,
+// and hence usually works correctly.
 
 // Such strong or weak pointers can be safely created in the RefBase onFirstRef()
 // callback.
@@ -126,8 +124,23 @@
 // is a longer-lived sp<>, why not use an sp<> directly?) A wp<> should only be
 // dereferenced by using promote().
 
+// Any object inheriting from RefBase should always be destroyed as the result
+// of a reference count decrement, not via any other means.  Such objects
+// should never be stack allocated, or appear directly as data members in other
+// objects. Objects inheriting from RefBase should have their strong reference
+// count incremented as soon as possible after construction. Usually this
+// will be done via construction of an sp<> to the object, but may instead
+// involve other means of calling RefBase::incStrong().
 // Explicitly deleting or otherwise destroying a RefBase object with outstanding
-// wp<> or sp<> pointers to it will result in heap corruption.
+// wp<> or sp<> pointers to it will result in an abort or heap corruption.
+
+// It is particularly important not to mix sp<> and direct storage management
+// since the sp from raw pointer constructor is implicit. Thus if a RefBase-
+// -derived object of type T is managed without ever incrementing its strong
+// count, and accidentally passed to f(sp<T>), a strong pointer to the object
+// will be temporarily constructed and destroyed, prematurely deallocating the
+// object, and resulting in heap corruption. None of this would be easily
+// visible in the source.
 
 // Extra Features:
 
@@ -144,7 +157,7 @@
 // events, as well as some debugging facilities.
 
 // Debugging support can be enabled by turning on DEBUG_REFS in RefBase.cpp.
-// Otherwise essentially no checking is provided.
+// Otherwise little checking is provided.
 
 // Thread safety:
 
diff --git a/libutils/RefBase.cpp b/libutils/RefBase.cpp
index df49a2f..fee9984 100644
--- a/libutils/RefBase.cpp
+++ b/libutils/RefBase.cpp
@@ -84,15 +84,16 @@
 //
 // A weakref_impl is allocated as the value of mRefs in a RefBase object on
 // construction.
-// In the OBJECT_LIFETIME_STRONG case, it is deallocated in the RefBase
-// destructor iff the strong reference count was never incremented. The
-// destructor can be invoked either from decStrong, or from decWeak if there
-// was never a strong reference. If the reference count had been incremented,
-// it is deallocated directly in decWeak, and hence still lives as long as
-// the last weak reference.
-// In the OBJECT_LIFETIME_WEAK case, it is always deallocated from the RefBase
-// destructor, which is always invoked by decWeak. DecStrong explicitly avoids
-// the deletion in this case.
+// In the OBJECT_LIFETIME_STRONG case, it is normally deallocated in decWeak,
+// and hence lives as long as the last weak reference. (It can also be
+// deallocated in the RefBase destructor iff the strong reference count was
+// never incremented and the weak count is zero, e.g.  if the RefBase object is
+// explicitly destroyed without decrementing the strong count.  This should be
+// avoided.) In this case, the RefBase destructor should be invoked from
+// decStrong.
+// In the OBJECT_LIFETIME_WEAK case, the weakref_impl is always deallocated in
+// the RefBase destructor, which is always invoked by decWeak. DecStrong
+// explicitly avoids the deletion in this case.
 //
 // Memory ordering:
 // The client must ensure that every inc() call, together with all other
@@ -126,6 +127,19 @@
 
 #define INITIAL_STRONG_VALUE (1<<28)
 
+#define MAX_COUNT 0xfffff
+
+// Test whether the argument is a clearly invalid strong reference count.
+// Used only for error checking on the value before an atomic decrement.
+// Intended to be very cheap.
+// Note that we cannot just check for excess decrements by comparing to zero
+// since the object would be deallocated before that.
+#define BAD_STRONG(c) \
+        ((c) == 0 || ((c) & (~(MAX_COUNT | INITIAL_STRONG_VALUE))) != 0)
+
+// Same for weak counts.
+#define BAD_WEAK(c) ((c) == 0 || ((c) & (~MAX_COUNT)) != 0)
+
 // ---------------------------------------------------------------------------
 
 class RefBase::weakref_impl : public RefBase::weakref_type
@@ -421,15 +435,15 @@
 #if PRINT_REFS
     ALOGD("decStrong of %p from %p: cnt=%d\n", this, id, c);
 #endif
-    ALOG_ASSERT(c >= 1, "decStrong() called on %p too many times", refs);
+    LOG_ALWAYS_FATAL_IF(BAD_STRONG(c), "decStrong() called on %p too many times",
+            refs);
     if (c == 1) {
         std::atomic_thread_fence(std::memory_order_acquire);
         refs->mBase->onLastStrongRef(id);
         int32_t flags = refs->mFlags.load(std::memory_order_relaxed);
         if ((flags&OBJECT_LIFETIME_MASK) == OBJECT_LIFETIME_STRONG) {
             delete this;
-            // Since mStrong had been incremented, the destructor did not
-            // delete refs.
+            // The destructor does not delete refs in this case.
         }
     }
     // Note that even with only strong reference operations, the thread
@@ -492,7 +506,8 @@
     weakref_impl* const impl = static_cast<weakref_impl*>(this);
     impl->removeWeakRef(id);
     const int32_t c = impl->mWeak.fetch_sub(1, std::memory_order_release);
-    ALOG_ASSERT(c >= 1, "decWeak called on %p too many times", this);
+    LOG_ALWAYS_FATAL_IF(BAD_WEAK(c), "decWeak called on %p too many times",
+            this);
     if (c != 1) return;
     atomic_thread_fence(std::memory_order_acquire);
 
@@ -500,13 +515,19 @@
     if ((flags&OBJECT_LIFETIME_MASK) == OBJECT_LIFETIME_STRONG) {
         // This is the regular lifetime case. The object is destroyed
         // when the last strong reference goes away. Since weakref_impl
-        // outlive the object, it is not destroyed in the dtor, and
+        // outlives the object, it is not destroyed in the dtor, and
         // we'll have to do it here.
         if (impl->mStrong.load(std::memory_order_relaxed)
                 == INITIAL_STRONG_VALUE) {
-            // Special case: we never had a strong reference, so we need to
-            // destroy the object now.
-            delete impl->mBase;
+            // Decrementing a weak count to zero when object never had a strong
+            // reference.  We assume it acquired a weak reference early, e.g.
+            // in the constructor, and will eventually be properly destroyed,
+            // usually via incrementing and decrementing the strong count.
+            // Thus we no longer do anything here.  We log this case, since it
+            // seems to be extremely rare, and should not normally occur. We
+            // used to deallocate mBase here, so this may now indicate a leak.
+            ALOGW("RefBase: Object at %p lost last weak reference "
+                    "before it had a strong reference", impl->mBase);
         } else {
             // ALOGV("Freeing refs %p of old RefBase %p\n", this, impl->mBase);
             delete impl;
@@ -675,25 +696,28 @@
 
 RefBase::~RefBase()
 {
-    if (mRefs->mStrong.load(std::memory_order_relaxed)
+    int32_t flags = mRefs->mFlags.load(std::memory_order_relaxed);
+    // Life-time of this object is extended to WEAK, in
+    // which case weakref_impl doesn't out-live the object and we
+    // can free it now.
+    if ((flags & OBJECT_LIFETIME_MASK) == OBJECT_LIFETIME_WEAK) {
+        // It's possible that the weak count is not 0 if the object
+        // re-acquired a weak reference in its destructor
+        if (mRefs->mWeak.load(std::memory_order_relaxed) == 0) {
+            delete mRefs;
+        }
+    } else if (mRefs->mStrong.load(std::memory_order_relaxed)
             == INITIAL_STRONG_VALUE) {
         // We never acquired a strong reference on this object.
-        // We assume there are no outstanding weak references.
+        LOG_ALWAYS_FATAL_IF(mRefs->mWeak.load() != 0,
+                "RefBase: Explicit destruction with non-zero weak "
+                "reference count");
+        // TODO: Always report if we get here. Currently MediaMetadataRetriever
+        // C++ objects are inconsistently managed and sometimes get here.
+        // There may be other cases, but we believe they should all be fixed.
         delete mRefs;
-    } else {
-        // life-time of this object is extended to WEAK, in
-        // which case weakref_impl doesn't out-live the object and we
-        // can free it now.
-        int32_t flags = mRefs->mFlags.load(std::memory_order_relaxed);
-        if ((flags & OBJECT_LIFETIME_MASK) != OBJECT_LIFETIME_STRONG) {
-            // It's possible that the weak count is not 0 if the object
-            // re-acquired a weak reference in its destructor
-            if (mRefs->mWeak.load(std::memory_order_relaxed) == 0) {
-                delete mRefs;
-            }
-        }
     }
-    // for debugging purposes, clear this.
+    // For debugging purposes, clear mRefs.  Ineffective against outstanding wp's.
     const_cast<weakref_impl*&>(mRefs) = NULL;
 }
 
diff --git a/libutils/tests/RefBase_test.cpp b/libutils/tests/RefBase_test.cpp
index 224c2ca..2e0cf6e 100644
--- a/libutils/tests/RefBase_test.cpp
+++ b/libutils/tests/RefBase_test.cpp
@@ -87,7 +87,7 @@
     EXPECT_EQ(1, foo->getWeakRefs()->getWeakCount());
     ASSERT_FALSE(isDeleted) << "deleted too early! still has a reference!";
     wp1 = nullptr;
-    ASSERT_TRUE(isDeleted) << "foo2 was leaked!";
+    ASSERT_FALSE(isDeleted) << "Deletion on wp destruction should no longer occur";
 }
 
 
@@ -121,8 +121,33 @@
 
 cpu_set_t otherCpus;
 
+// Divide the cpus we're allowed to run on into myCpus and otherCpus.
+// Set origCpus to the processors we were originally allowed to run on.
+// Return false if origCpus doesn't include at least processors 0 and 1.
+static bool setExclusiveCpus(cpu_set_t* origCpus /* out */,
+        cpu_set_t* myCpus /* out */, cpu_set_t* otherCpus) {
+    if (sched_getaffinity(0, sizeof(cpu_set_t), origCpus) != 0) {
+        return false;
+    }
+    if (!CPU_ISSET(0,  origCpus) || !CPU_ISSET(1, origCpus)) {
+        return false;
+    }
+    CPU_ZERO(myCpus);
+    CPU_ZERO(otherCpus);
+    CPU_OR(myCpus, myCpus, origCpus);
+    CPU_OR(otherCpus, otherCpus, origCpus);
+    for (unsigned i = 0; i < CPU_SETSIZE; ++i) {
+        // I get the even cores, the other thread gets the odd ones.
+        if (i & 1) {
+            CPU_CLR(i, myCpus);
+        } else {
+            CPU_CLR(i, otherCpus);
+        }
+    }
+    return true;
+}
+
 static void visit2AndRemove() {
-    EXPECT_TRUE(CPU_ISSET(1,  &otherCpus));
     if (sched_setaffinity(0, sizeof(cpu_set_t), &otherCpus) != 0) {
         FAIL() << "setaffinity returned:" << errno;
     }
@@ -139,27 +164,10 @@
     cpu_set_t myCpus;
     // Restrict us and the helper thread to disjoint cpu sets.
     // This prevents us from getting scheduled against each other,
-    // which would be atrociously slow.  We fail if that's impossible.
-    if (sched_getaffinity(0, sizeof(cpu_set_t), &origCpus) != 0) {
-        FAIL();
-    }
-    EXPECT_TRUE(CPU_ISSET(0,  &origCpus));
-    if (CPU_ISSET(1, &origCpus)) {
-        CPU_ZERO(&myCpus);
-        CPU_ZERO(&otherCpus);
-        CPU_OR(&myCpus, &myCpus, &origCpus);
-        CPU_OR(&otherCpus, &otherCpus, &origCpus);
-        for (unsigned i = 0; i < CPU_SETSIZE; ++i) {
-            // I get the even cores, the other thread gets the odd ones.
-            if (i & 1) {
-                CPU_CLR(i, &myCpus);
-            } else {
-                CPU_CLR(i, &otherCpus);
-            }
-        }
+    // which would be atrociously slow.
+    if (setExclusiveCpus(&origCpus, &myCpus, &otherCpus)) {
         std::thread t(visit2AndRemove);
         std::atomic<int> deleteCount(0);
-        EXPECT_TRUE(CPU_ISSET(0,  &myCpus));
         if (sched_setaffinity(0, sizeof(cpu_set_t), &myCpus) != 0) {
             FAIL() << "setaffinity returned:" << errno;
         }
@@ -182,3 +190,69 @@
         ASSERT_EQ(NITERS, deleteCount) << "Deletions missed!";
     }  // Otherwise this is slow and probably pointless on a uniprocessor.
 }
+
+static wp<Bar> wpBuffer;
+static std::atomic<bool> wpBufferFull(false);
+
+// Wait until wpBufferFull has value val.
+static inline void wpWaitFor(bool val) {
+    while (wpBufferFull != val) {}
+}
+
+static void visit3AndRemove() {
+    if (sched_setaffinity(0, sizeof(cpu_set_t), &otherCpus) != 0) {
+        FAIL() << "setaffinity returned:" << errno;
+    }
+    for (int i = 0; i < NITERS; ++i) {
+        wpWaitFor(true);
+        {
+            sp<Bar> sp1 = wpBuffer.promote();
+            // We implicitly check that sp1 != NULL
+            sp1->mVisited2 = true;
+        }
+        wpBuffer = nullptr;
+        wpBufferFull = false;
+    }
+}
+
+TEST(RefBase, RacingPromotions) {
+    cpu_set_t origCpus;
+    cpu_set_t myCpus;
+    // Restrict us and the helper thread to disjoint cpu sets.
+    // This prevents us from getting scheduled against each other,
+    // which would be atrociously slow.
+    if (setExclusiveCpus(&origCpus, &myCpus, &otherCpus)) {
+        std::thread t(visit3AndRemove);
+        std::atomic<int> deleteCount(0);
+        if (sched_setaffinity(0, sizeof(cpu_set_t), &myCpus) != 0) {
+            FAIL() << "setaffinity returned:" << errno;
+        }
+        for (int i = 0; i < NITERS; ++i) {
+            Bar* bar = new Bar(&deleteCount);
+            wp<Bar> wp1(bar);
+            bar->mVisited1 = true;
+            if (i % (NITERS / 10) == 0) {
+                // Do this rarely, since it generates a log message.
+                wp1 = nullptr;  // No longer destroys the object.
+                wp1 = bar;
+            }
+            wpBuffer = wp1;
+            ASSERT_EQ(bar->getWeakRefs()->getWeakCount(), 2);
+            wpBufferFull = true;
+            // Promotion races with that in visit3AndRemove.
+            // This may or may not succeed, but it shouldn't interfere with
+            // the concurrent one.
+            sp<Bar> sp1 = wp1.promote();
+            wpWaitFor(false);  // Waits for other thread to drop strong pointer.
+            sp1 = nullptr;
+            // No strong pointers here.
+            sp1 = wp1.promote();
+            ASSERT_EQ(sp1.get(), nullptr) << "Dead wp promotion succeeded!";
+        }
+        t.join();
+        if (sched_setaffinity(0, sizeof(cpu_set_t), &origCpus) != 0) {
+            FAIL();
+        }
+        ASSERT_EQ(NITERS, deleteCount) << "Deletions missed!";
+    }  // Otherwise this is slow and probably pointless on a uniprocessor.
+}