Vulkan: Optimize sync followed by swap

Previously, inserting a sync object immediately caused a submission.
That was done in
https://chromium-review.googlesource.com/c/angle/angle/+/3200274 to be
able to wait until the sync object is signaled without having to wait
for whatever is recorded after it until a flush naturally happens.

Some applications issue a glFenceSync right before eglSwapBuffers.  The
submission incurred by glFenceSync disallowed the optimizations that
eglSwapBuffers would have done, leading to performance degradations.
This could have been avoided if glFenceSync were issued right after
eglSwapBuffers, but a number of applications don't do that.

In this change, when a fence is inserted:

- For EGL sync objects, a submission is issued regardless
- For GL sync objects, a submission is issued if there is no render pass
  open
- For GL sync objects, the submission is deferred if there is an open
  render pass.  This is done by marking the render pass closed, and
  flagging the context as having a deferred flush.
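
As a rough illustration, the three cases above reduce to a single
condition.  The sketch below is a standalone model, not ANGLE code;
SyncAction, DecideSyncAction and CheckSyncActionExamples are
hypothetical names used only here.

```cpp
#include <cassert>

// Hypothetical model of the decision made when a fence is inserted.
enum class SyncAction
{
    SubmitNow,
    DeferSubmission,
};

// EGL sync objects always submit.  GL sync objects submit only when no
// render pass is open; otherwise the submission is deferred.
constexpr SyncAction DecideSyncAction(bool isEGLSyncObject, bool isRenderPassOpen)
{
    return (isEGLSyncObject || !isRenderPassOpen) ? SyncAction::SubmitNow
                                                  : SyncAction::DeferSubmission;
}

// One check per bullet in the list above.
void CheckSyncActionExamples()
{
    assert(DecideSyncAction(true, true) == SyncAction::SubmitNow);
    assert(DecideSyncAction(false, false) == SyncAction::SubmitNow);
    assert(DecideSyncAction(false, true) == SyncAction::DeferSubmission);
}
```

Only the GL-sync-with-open-render-pass case takes the deferred path,
which is the case that precedes eglSwapBuffers in the affected
applications.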

If the context that issued the fence sync issues another draw call, the
render pass is naturally closed and the submission is performed.

If the context that issued the fence sync causes a submission, it would
have a chance to modify the render pass before doing so.  For example,
it could apply swapchain optimizations before swapping, or add a resolve
attachment for blit.

If the context that issued the fence sync doesn't cause a submission
before another context tries to access it (get status, wait, etc), the
other context will flush its render pass and cause a submission on its
behalf.  This is possible because the deferral of submission is done
only for GL sync objects, and those are only accessible by other
contexts in the same share group.
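
The cross-context case can be pictured with the following standalone
model.  It is only a sketch under simplifying assumptions: ModelSync,
ModelContext and the free function SubmitSyncIfDeferred are
hypothetical stand-ins, and a "submission" is reduced to giving the
sync object a valid serial.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a sync object: it only tracks whether a
// submission has given it a valid serial.
struct ModelSync
{
    bool hasValidSerial = false;
};

// Hypothetical stand-in for a context that may have a deferred flush.
struct ModelContext
{
    bool hasDeferredFlush  = false;
    ModelSync *pendingSync = nullptr;  // Sync retained by the open render pass, if any

    // Submit only if a flush was deferred; otherwise this is a no-op.
    void flushIfDeferred()
    {
        if (!hasDeferredFlush)
        {
            return;
        }
        if (pendingSync != nullptr)
        {
            pendingSync->hasValidSerial = true;
            pendingSync                 = nullptr;
        }
        hasDeferredFlush = false;
    }
};

// Walk the share group until the sync object has been submitted.
// Returns how many contexts were visited.
int SubmitSyncIfDeferred(ModelSync &sync, const std::vector<ModelContext *> &shareGroup)
{
    if (sync.hasValidSerial)
    {
        return 0;  // Already submitted, nothing to do
    }
    int visited = 0;
    for (ModelContext *context : shareGroup)
    {
        context->flushIfDeferred();
        ++visited;
        if (sync.hasValidSerial)
        {
            break;  // This was the context that issued the fence
        }
    }
    assert(sync.hasValidSerial);
    return visited;
}
```

In this model, a waiting context that is not the issuer still ends up
triggering the issuer's submission, and a second wait on the same sync
object does no further work.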

Bug: angleproject:7379
Change-Id: I3dd1c1bfd575206d730dd9ee2e33ba2254318521
Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/3695520
Reviewed-by: Charlie Lao <cclao@google.com>
Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Reviewed-by: Jamie Madill <jmadill@chromium.org>
diff --git a/src/libANGLE/renderer/vulkan/ContextVk.cpp b/src/libANGLE/renderer/vulkan/ContextVk.cpp
index fa0f1c1..6d102ec 100644
--- a/src/libANGLE/renderer/vulkan/ContextVk.cpp
+++ b/src/libANGLE/renderer/vulkan/ContextVk.cpp
@@ -585,6 +585,8 @@
      "Render pass closed due to sync object client wait"},
     {RenderPassClosureReason::SyncObjectServerWait,
      "Render pass closed due to sync object server wait"},
+    {RenderPassClosureReason::SyncObjectGetStatus,
+     "Render pass closed due to sync object get status"},
     {RenderPassClosureReason::XfbPause, "Render pass closed due to transform feedback pause"},
     {RenderPassClosureReason::FramebufferFetchEmulation,
      "Render pass closed due to framebuffer fetch emulation"},
@@ -3946,6 +3948,10 @@
                                                       vk::PresentMode presentMode,
                                                       bool *imageResolved)
 {
+    // Note: mRenderPassCommandBuffer may be nullptr because the render pass is marked for closure.
+    // That doesn't matter and the render pass can continue to be modified.  This function shouldn't
+    // rely on mRenderPassCommandBuffer.
+
     if (!mRenderPassCommands->started())
     {
         return angle::Result::Continue;
@@ -7100,6 +7106,61 @@
     return angle::Result::Continue;
 }
 
+angle::Result ContextVk::onSyncObjectInit(vk::SyncHelper *syncHelper, bool isEGLSyncObject)
+{
+    const bool isRenderPassStarted = mRenderPassCommands->started();
+
+    if (isRenderPassStarted)
+    {
+        mRenderPassCommands->retainResource(syncHelper);
+    }
+    else
+    {
+        mOutsideRenderPassCommands->retainResource(syncHelper);
+    }
+
+    // Submit the commands:
+    //
+    // - This breaks the current render pass to ensure the proper ordering of the sync object in the
+    //   commands,
+    // - The sync object has a valid serial when it's waited on later,
+    // - After waiting on the sync object, every resource that's used so far (and is being synced)
+    //   will also be aware that it's finished (based on the serial) and won't incur a further wait
+    //   (for example when a buffer is mapped).
+    //
+    // The submission is done immediately for EGL sync objects, and when no render pass is open.  If
+    // a render pass is open, the submission is deferred.  This is done to be able to optimize
+    // scenarios such as sync object init followed by eglSwapBuffers() (that would otherwise incur
+    // another submission, as well as not being able to optimize the render-to-swapchain render
+    // pass).
+    if (isEGLSyncObject || !isRenderPassStarted)
+    {
+        return flushImpl(nullptr, RenderPassClosureReason::SyncObjectInit);
+    }
+
+    onRenderPassFinished(RenderPassClosureReason::SyncObjectInit);
+
+    // Mark the context as having a deferred flush.  This is later used to close the render pass and
+    // cause a submission in this context if another context wants to wait on the fence while the
+    // original context never issued a submission naturally.  Note that this also takes care of
+    // contexts that think they issued a submission (through glFlush) but that the submission got
+    // deferred (due to the deferFlushUntilEndRenderPass feature).
+    mHasDeferredFlush = true;
+
+    return angle::Result::Continue;
+}
+
+angle::Result ContextVk::flushCommandsAndEndRenderPassIfDeferredSyncInit(
+    RenderPassClosureReason reason)
+{
+    if (!mHasDeferredFlush)
+    {
+        return angle::Result::Continue;
+    }
+
+    return flushCommandsAndEndRenderPassImpl(QueueSubmitType::PerformQueueSubmit, reason);
+}
+
 void ContextVk::addCommandBufferDiagnostics(const std::string &commandBufferDiagnostics)
 {
     mCommandBufferDiagnostics.push_back(commandBufferDiagnostics);
diff --git a/src/libANGLE/renderer/vulkan/ContextVk.h b/src/libANGLE/renderer/vulkan/ContextVk.h
index 422dabf..805c63c 100644
--- a/src/libANGLE/renderer/vulkan/ContextVk.h
+++ b/src/libANGLE/renderer/vulkan/ContextVk.h
@@ -29,6 +29,11 @@
 
 namespace rx
 {
+namespace vk
+{
+class SyncHelper;
+}  // namespace vk
+
 class ProgramExecutableVk;
 class RendererVk;
 class WindowSurfaceVk;
@@ -659,6 +664,14 @@
 
     angle::Result syncExternalMemory();
 
+    // Either issue a submission or defer it when a sync object is initialized.  If deferred, a
+    // submission will have to be incurred during client wait.
+    angle::Result onSyncObjectInit(vk::SyncHelper *syncHelper, bool isEGLSyncObject);
+    // Called when a sync object is waited on while its submission was deferred in onSyncObjectInit.
+    // It's a no-op if this context doesn't have a pending submission.  Note that due to
+    // mHasDeferredFlush being set, flushing the render pass leads to a submission automatically.
+    angle::Result flushCommandsAndEndRenderPassIfDeferredSyncInit(RenderPassClosureReason reason);
+
     void addCommandBufferDiagnostics(const std::string &commandBufferDiagnostics);
 
     VkIndexType getVkIndexType(gl::DrawElementsType glIndexType) const;
@@ -1210,6 +1223,9 @@
                                                DirtyBits dirtyBitMask,
                                                RenderPassClosureReason reason);
 
+    // Mark the render pass to be closed on the next draw call.  The render pass is not actually
+    // closed and can be restored with restoreFinishedRenderPass if necessary, for example to append
+    // a resolve attachment.
     void onRenderPassFinished(RenderPassClosureReason reason);
 
     void initIndexTypeMap();
diff --git a/src/libANGLE/renderer/vulkan/FenceNVVk.cpp b/src/libANGLE/renderer/vulkan/FenceNVVk.cpp
index 282cc78..2ffa8e2 100644
--- a/src/libANGLE/renderer/vulkan/FenceNVVk.cpp
+++ b/src/libANGLE/renderer/vulkan/FenceNVVk.cpp
@@ -36,7 +36,7 @@
 {
     ContextVk *contextVk = vk::GetImpl(context);
     bool signaled        = false;
-    ANGLE_TRY(mFenceSync.getStatus(contextVk, &signaled));
+    ANGLE_TRY(mFenceSync.getStatus(contextVk, contextVk, &signaled));
 
     ASSERT(outFinished);
     *outFinished = signaled ? GL_TRUE : GL_FALSE;
diff --git a/src/libANGLE/renderer/vulkan/SyncVk.cpp b/src/libANGLE/renderer/vulkan/SyncVk.cpp
index 2bb7796..c648848 100644
--- a/src/libANGLE/renderer/vulkan/SyncVk.cpp
+++ b/src/libANGLE/renderer/vulkan/SyncVk.cpp
@@ -80,24 +80,10 @@
 
 void SyncHelper::releaseToRenderer(RendererVk *renderer) {}
 
-angle::Result SyncHelper::initialize(ContextVk *contextVk, bool isEglSyncObject)
+angle::Result SyncHelper::initialize(ContextVk *contextVk, bool isEGLSyncObject)
 {
     ASSERT(!mUse.getSerial().valid());
-
-    // Submit the commands:
-    //
-    // - This breaks the current render pass to ensure the proper ordering of the sync object in the
-    //   commands,
-    // - The sync object has a valid serial when it's waited on later,
-    // - After waiting on the sync object, every resource that's used so far (and is being synced)
-    //   will also be aware that it's finished (based on the serial) and won't incur a further wait
-    //   (for example when a buffer is mapped).
-    //
-    ResourceUseList resourceUseList;
-    retain(&resourceUseList);
-    contextVk->getShareGroupVk()->acquireResourceUseList(std::move(resourceUseList));
-
-    return contextVk->flushImpl(nullptr, RenderPassClosureReason::SyncObjectInit);
+    return contextVk->onSyncObjectInit(this, isEGLSyncObject);
 }
 
 angle::Result SyncHelper::clientWait(Context *context,
@@ -110,7 +96,7 @@
 
     // If the event is already set, don't wait
     bool alreadySignaled = false;
-    ANGLE_TRY(getStatus(context, &alreadySignaled));
+    ANGLE_TRY(getStatus(context, contextVk, &alreadySignaled));
     if (alreadySignaled)
     {
         *outResult = VK_EVENT_SET;
@@ -124,9 +110,14 @@
         return angle::Result::Continue;
     }
 
-    // We always flush when a sync object is created, so they should always have a valid Serial
-    // when being waited on.
-    ASSERT(mUse.getSerial().valid() && !usedInRecordedCommands());
+    // Submit commands if requested
+    if (flushCommands && contextVk)
+    {
+        ANGLE_TRY(contextVk->flushCommandsAndEndRenderPassIfDeferredSyncInit(
+            RenderPassClosureReason::SyncObjectClientWait));
+    }
+    // Submit commands if it was deferred on the context that issued the sync object
+    ANGLE_TRY(submitSyncIfDeferred(contextVk, RenderPassClosureReason::SyncObjectClientWait));
 
     VkResult status = VK_SUCCESS;
     ANGLE_TRY(renderer->waitForSerialWithUserTimeout(context, mUse.getSerial(), timeout, &status));
@@ -143,6 +134,9 @@
 
 angle::Result SyncHelper::serverWait(ContextVk *contextVk)
 {
+    // Submit commands if it was deferred on the context that issued the sync object
+    ANGLE_TRY(submitSyncIfDeferred(contextVk, RenderPassClosureReason::SyncObjectServerWait));
+
     // Every resource already tracks its usage and issues the appropriate barriers, so there's
     // really nothing to do here.  An execution barrier is issued to strictly satisfy what the
     // application asked for.
@@ -154,15 +148,59 @@
     return angle::Result::Continue;
 }
 
-angle::Result SyncHelper::getStatus(Context *context, bool *signaled) const
+angle::Result SyncHelper::getStatus(Context *context, ContextVk *contextVk, bool *signaled)
 {
-    ASSERT(mUse.getSerial().valid() && !usedInRecordedCommands());
+    // Submit commands if it was deferred on the context that issued the sync object
+    ANGLE_TRY(submitSyncIfDeferred(contextVk, RenderPassClosureReason::SyncObjectGetStatus));
 
     ANGLE_TRY(context->getRenderer()->checkCompletedCommands(context));
     *signaled = !isCurrentlyInUse(context->getRenderer()->getLastCompletedQueueSerial());
     return angle::Result::Continue;
 }
 
+angle::Result SyncHelper::submitSyncIfDeferred(ContextVk *contextVk, RenderPassClosureReason reason)
+{
+    if (mUse.getSerial().valid())
+    {
+        ASSERT(!usedInRecordedCommands());
+        return angle::Result::Continue;
+    }
+
+    // The submission of a sync object may be deferred to allow further optimizations to an open
+    // render pass before a submission happens for another reason.  If the sync object is being
+    // waited on by the current context, the application must have used GL_SYNC_FLUSH_COMMANDS_BIT.
+    // However, when waited on by other contexts, the application must have ensured the original
+    // context is flushed.  Due to the deferFlushUntilEndRenderPass feature, a glFlush is not
+    // sufficient to guarantee this.
+    //
+    // Deferring the submission is restricted to non-EGL sync objects, so it's sufficient to ensure
+    // that the contexts in the share group issue their deferred flushes.  Technically only the
+    // context that issued the sync object needs a flush, but practically it would be rare for more
+    // than one context to have flushes deferred at this time.  If necessary, the context could be
+    // queried to know whether it's the one retaining the sync object, so only that would be
+    // flushed.
+
+    // Cannot reach here from EGL syncs, because serial should already be valid.
+    ASSERT(contextVk != nullptr);
+
+    const ContextVkSet &shareContextSet = contextVk->getShareGroupVk()->getContexts();
+    for (ContextVk *ctx : shareContextSet)
+    {
+        ANGLE_TRY(ctx->flushCommandsAndEndRenderPassIfDeferredSyncInit(reason));
+
+        // If this was the context that issued the fence sync, no need to go over the other
+        // contexts.
+        if (mUse.getSerial().valid())
+        {
+            break;
+        }
+    }
+
+    ASSERT(mUse.getSerial().valid() && !usedInRecordedCommands());
+
+    return angle::Result::Continue;
+}
+
 SyncHelperNativeFence::SyncHelperNativeFence() : mNativeFenceFd(kInvalidFenceFd) {}
 
 SyncHelperNativeFence::~SyncHelperNativeFence()
@@ -264,7 +302,7 @@
 
     // If already signaled, don't wait
     bool alreadySignaled = false;
-    ANGLE_TRY(getStatus(context, &alreadySignaled));
+    ANGLE_TRY(getStatus(context, contextVk, &alreadySignaled));
     if (alreadySignaled)
     {
         *outResult = VK_SUCCESS;
@@ -330,7 +368,9 @@
     return angle::Result::Continue;
 }
 
-angle::Result SyncHelperNativeFence::getStatus(Context *context, bool *signaled) const
+angle::Result SyncHelperNativeFence::getStatus(Context *context,
+                                               ContextVk *contextVk,
+                                               bool *signaled)
 {
     // We've got a serial, check if the serial is still in use
     if (mUse.getSerial().valid())
@@ -429,7 +469,7 @@
 {
     ContextVk *contextVk = vk::GetImpl(context);
     bool signaled        = false;
-    ANGLE_TRY(mSyncHelper.getStatus(contextVk, &signaled));
+    ANGLE_TRY(mSyncHelper.getStatus(contextVk, contextVk, &signaled));
 
     *outResult = signaled ? GL_SIGNALED : GL_UNSIGNALED;
     return angle::Result::Continue;
@@ -538,7 +578,7 @@
 egl::Error EGLSyncVk::getStatus(const egl::Display *display, EGLint *outStatus)
 {
     bool signaled = false;
-    if (mSyncHelper->getStatus(vk::GetImpl(display), &signaled) == angle::Result::Stop)
+    if (mSyncHelper->getStatus(vk::GetImpl(display), nullptr, &signaled) == angle::Result::Stop)
     {
         return egl::Error(EGL_BAD_ALLOC);
     }
diff --git a/src/libANGLE/renderer/vulkan/SyncVk.h b/src/libANGLE/renderer/vulkan/SyncVk.h
index c3274c3..d3fe0ab 100644
--- a/src/libANGLE/renderer/vulkan/SyncVk.h
+++ b/src/libANGLE/renderer/vulkan/SyncVk.h
@@ -38,18 +38,21 @@
 
     virtual void releaseToRenderer(RendererVk *renderer);
 
-    virtual angle::Result initialize(ContextVk *contextVk, bool isEglSyncObject);
+    virtual angle::Result initialize(ContextVk *contextVk, bool isEGLSyncObject);
     virtual angle::Result clientWait(Context *context,
                                      ContextVk *contextVk,
                                      bool flushCommands,
                                      uint64_t timeout,
                                      VkResult *outResult);
     virtual angle::Result serverWait(ContextVk *contextVk);
-    virtual angle::Result getStatus(Context *context, bool *signaled) const;
+    virtual angle::Result getStatus(Context *context, ContextVk *contextVk, bool *signaled);
     virtual angle::Result dupNativeFenceFD(Context *context, int *fdOut) const
     {
         return angle::Result::Stop;
     }
+
+  private:
+    angle::Result submitSyncIfDeferred(ContextVk *contextVk, RenderPassClosureReason reason);
 };
 
 // Implementation of sync types: EGLSync(EGL_SYNC_ANDROID_NATIVE_FENCE_ANDROID).
@@ -68,7 +71,7 @@
                              uint64_t timeout,
                              VkResult *outResult) override;
     angle::Result serverWait(ContextVk *contextVk) override;
-    angle::Result getStatus(Context *context, bool *signaled) const override;
+    angle::Result getStatus(Context *context, ContextVk *contextVk, bool *signaled) override;
     angle::Result dupNativeFenceFD(Context *context, int *fdOut) const override;
 
   private:
diff --git a/src/libANGLE/renderer/vulkan/vk_utils.h b/src/libANGLE/renderer/vulkan/vk_utils.h
index 264ce4c..c8f2755 100644
--- a/src/libANGLE/renderer/vulkan/vk_utils.h
+++ b/src/libANGLE/renderer/vulkan/vk_utils.h
@@ -1505,6 +1505,7 @@
     SyncObjectWithFdInit,
     SyncObjectClientWait,
     SyncObjectServerWait,
+    SyncObjectGetStatus,
 
     // Closures that ANGLE could have avoided, but doesn't for simplicity or optimization of more
     // common cases.
diff --git a/src/tests/BUILD.gn b/src/tests/BUILD.gn
index 6d3d4d5..d1e02a7 100644
--- a/src/tests/BUILD.gn
+++ b/src/tests/BUILD.gn
@@ -109,6 +109,7 @@
     ]
     sources = [
       "$angle_root/third_party/renderdoc/src/renderdoc_app.h",
+      "test_utils/MultiThreadSteps.cpp",
       "test_utils/MultiThreadSteps.h",
       "test_utils/RenderDoc.cpp",
       "test_utils/RenderDoc.h",
diff --git a/src/tests/angle_end2end_tests_expectations.txt b/src/tests/angle_end2end_tests_expectations.txt
index 5a01d04..6006412 100644
--- a/src/tests/angle_end2end_tests_expectations.txt
+++ b/src/tests/angle_end2end_tests_expectations.txt
@@ -410,6 +410,10 @@
 7213 PIXEL4ORXL GLES : BufferDataTestES3.BufferDataWithNullFollowedByMap/* = SKIP
 7265 PIXEL4ORXL GLES : PbufferTest.BindTexImageAfterTexImage/* = SKIP
 5981 PIXEL4ORXL GLES : ComputeShaderTest.DrawDispatchImageReadDraw/* = SKIP
+7414 PIXEL4ORXL GLES : MultithreadingTest.CreateFenceThreadAClientWaitSyncThreadBDelayedFlush/* = SKIP
+7414 PIXEL4ORXL GLES : MultithreadingTestES3.ThreadB*BeforeThreadASync* = SKIP
+7414 PIXEL4ORXL GLES : MultithreadingTestES3.ThreadCWaitBeforeThreadBSyncFinish/* = SKIP
+7414 PIXEL4ORXL GLES : EGLMultiContextTest.ThreadB*BeforeThreadASync* = SKIP
 
 5946 PIXEL4ORXL VULKAN : TransformFeedbackTestES32.PrimitivesWrittenAndGenerated/* = SKIP
 5947 PIXEL4ORXL VULKAN : FramebufferFetchES31.DrawFetchBlitDrawFetch_NonCoherent/* = SKIP
diff --git a/src/tests/egl_tests/EGLMultiContextTest.cpp b/src/tests/egl_tests/EGLMultiContextTest.cpp
index 10cb584..b144120 100644
--- a/src/tests/egl_tests/EGLMultiContextTest.cpp
+++ b/src/tests/egl_tests/EGLMultiContextTest.cpp
@@ -13,6 +13,7 @@
 #include "test_utils/angle_test_configs.h"
 #include "test_utils/gl_raii.h"
 #include "util/EGLWindow.h"
+#include "util/test_utils.h"
 
 using namespace angle;
 
@@ -59,18 +60,18 @@
         EGLint count         = 0;
         EGLint clientVersion = EGL_OPENGL_ES3_BIT;
         EGLint attribs[]     = {EGL_RED_SIZE,
-                            8,
-                            EGL_GREEN_SIZE,
-                            8,
-                            EGL_BLUE_SIZE,
-                            8,
-                            EGL_ALPHA_SIZE,
-                            8,
-                            EGL_RENDERABLE_TYPE,
-                            clientVersion,
-                            EGL_SURFACE_TYPE,
-                            EGL_WINDOW_BIT | EGL_PBUFFER_BIT,
-                            EGL_NONE};
+                                8,
+                                EGL_GREEN_SIZE,
+                                8,
+                                EGL_BLUE_SIZE,
+                                8,
+                                EGL_ALPHA_SIZE,
+                                8,
+                                EGL_RENDERABLE_TYPE,
+                                clientVersion,
+                                EGL_SURFACE_TYPE,
+                                EGL_WINDOW_BIT | EGL_PBUFFER_BIT,
+                                EGL_NONE};
 
         result = eglChooseConfig(dpy, attribs, config, 1, &count);
         EXPECT_EGL_TRUE(result && (count > 0));
@@ -103,6 +104,19 @@
         return result;
     }
 
+    enum class FenceTest
+    {
+        ClientWait,
+        ServerWait,
+        GetStatus,
+    };
+    enum class FlushMethod
+    {
+        Flush,
+        Finish,
+    };
+    void testFenceWithOpenRenderPass(FenceTest test, FlushMethod flushMethod);
+
     EGLContext mContexts[2];
     GLuint mTexture;
 };
@@ -382,6 +396,174 @@
         thread.join();
     }
 }
+
+// Test that thread B can wait on thread A's sync before thread A flushes it, and wakes up after
+// that.  Note that only the validity of the fence operations is tested here.  The test could
+// potentially be enhanced with EGL images similarly to how
+// MultithreadingTestES3::testFenceWithOpenRenderPass tests correctness of synchronization through
+// a shared texture.
+void EGLMultiContextTest::testFenceWithOpenRenderPass(FenceTest test, FlushMethod flushMethod)
+{
+    ANGLE_SKIP_TEST_IF(!platformSupportsMultithreading());
+
+    constexpr uint32_t kWidth  = 100;
+    constexpr uint32_t kHeight = 200;
+
+    EGLSyncKHR sync = EGL_NO_SYNC_KHR;
+
+    std::mutex mutex;
+    std::condition_variable condVar;
+
+    enum class Step
+    {
+        Start,
+        Thread0CreateFence,
+        Thread1WaitFence,
+        Finish,
+        Abort,
+    };
+    Step currentStep = Step::Start;
+
+    auto thread0 = [&](EGLDisplay dpy, EGLSurface surface, EGLContext context) {
+        ThreadSynchronization<Step> threadSynchronization(&currentStep, &mutex, &condVar);
+
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Start));
+
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, surface, surface, context));
+
+        // Issue a draw
+        ANGLE_GL_PROGRAM(program, essl1_shaders::vs::Simple(), essl1_shaders::fs::Red());
+        drawQuad(program, essl1_shaders::PositionAttrib(), 0.0f);
+        ASSERT_GL_NO_ERROR();
+
+        // Issue a fence.  A render pass is currently open, but it should be closed in the Vulkan
+        // backend.
+        sync = eglCreateSyncKHR(dpy, EGL_SYNC_FENCE_KHR, nullptr);
+        EXPECT_NE(sync, EGL_NO_SYNC_KHR);
+
+        // Wait for thread 1 to wait on it.
+        threadSynchronization.nextStep(Step::Thread0CreateFence);
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Thread1WaitFence));
+
+        // Wait a little to give thread 1 time to wait on the sync object before flushing it.
+        angle::Sleep(500);
+        switch (flushMethod)
+        {
+            case FlushMethod::Flush:
+                glFlush();
+                break;
+            case FlushMethod::Finish:
+                glFinish();
+                break;
+        }
+
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Finish));
+
+        EXPECT_PIXEL_RECT_EQ(0, 0, kWidth, kHeight, GLColor::red);
+
+        // Clean up
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT));
+    };
+
+    auto thread1 = [&](EGLDisplay dpy, EGLSurface surface, EGLContext context) {
+        ThreadSynchronization<Step> threadSynchronization(&currentStep, &mutex, &condVar);
+
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, surface, surface, context));
+
+        // Wait for thread 0 to create the fence object.
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Thread0CreateFence));
+
+        // Test access to the fence object
+        threadSynchronization.nextStep(Step::Thread1WaitFence);
+
+        constexpr GLuint64 kTimeout = 2'000'000'000;  // 2 seconds
+        EGLint result               = EGL_CONDITION_SATISFIED_KHR;
+        switch (test)
+        {
+            case FenceTest::ClientWait:
+                result = eglClientWaitSyncKHR(dpy, sync, 0, kTimeout);
+                break;
+            case FenceTest::ServerWait:
+                ASSERT_TRUE(eglWaitSyncKHR(dpy, sync, 0));
+                break;
+            case FenceTest::GetStatus:
+            {
+                EGLint value;
+                EXPECT_EGL_TRUE(eglGetSyncAttribKHR(dpy, sync, EGL_SYNC_STATUS_KHR, &value));
+                if (value != EGL_SIGNALED_KHR)
+                {
+                    result = eglClientWaitSyncKHR(dpy, sync, 0, kTimeout);
+                }
+                break;
+            }
+        }
+        ASSERT_TRUE(result == EGL_CONDITION_SATISFIED_KHR);
+
+        // Issue a draw
+        ANGLE_GL_PROGRAM(program, essl1_shaders::vs::Simple(), essl1_shaders::fs::Green());
+        drawQuad(program, essl1_shaders::PositionAttrib(), 0.0f);
+        ASSERT_GL_NO_ERROR();
+
+        EXPECT_PIXEL_RECT_EQ(0, 0, kWidth, kHeight, GLColor::green);
+
+        // Clean up
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT));
+
+        threadSynchronization.nextStep(Step::Finish);
+    };
+
+    std::array<LockStepThreadFunc, 2> threadFuncs = {
+        std::move(thread0),
+        std::move(thread1),
+    };
+
+    RunLockStepThreads(getEGLWindow(), threadFuncs.size(), threadFuncs.data());
+
+    ASSERT_NE(currentStep, Step::Abort);
+}
+
+// Test that thread B can wait on thread A's sync before thread A flushes it, and wakes up after
+// that.
+TEST_P(EGLMultiContextTest, ThreadBClientWaitBeforeThreadASyncFlush)
+{
+    testFenceWithOpenRenderPass(FenceTest::ClientWait, FlushMethod::Flush);
+}
+
+// Test that thread B can wait on thread A's sync before thread A flushes it, and wakes up after
+// that.
+TEST_P(EGLMultiContextTest, ThreadBServerWaitBeforeThreadASyncFlush)
+{
+    testFenceWithOpenRenderPass(FenceTest::ServerWait, FlushMethod::Flush);
+}
+
+// Test that thread B can wait on thread A's sync before thread A flushes it, and wakes up after
+// that.
+TEST_P(EGLMultiContextTest, ThreadBGetStatusBeforeThreadASyncFlush)
+{
+    testFenceWithOpenRenderPass(FenceTest::GetStatus, FlushMethod::Flush);
+}
+
+// Test that thread B can wait on thread A's sync before thread A flushes it, and wakes up after
+// that.
+TEST_P(EGLMultiContextTest, ThreadBClientWaitBeforeThreadASyncFinish)
+{
+    testFenceWithOpenRenderPass(FenceTest::ClientWait, FlushMethod::Finish);
+}
+
+// Test that thread B can wait on thread A's sync before thread A flushes it, and wakes up after
+// that.
+TEST_P(EGLMultiContextTest, ThreadBServerWaitBeforeThreadASyncFinish)
+{
+    testFenceWithOpenRenderPass(FenceTest::ServerWait, FlushMethod::Finish);
+}
+
+// Test that thread B can wait on thread A's sync before thread A flushes it, and wakes up after
+// that.
+TEST_P(EGLMultiContextTest, ThreadBGetStatusBeforeThreadASyncFinish)
+{
+    testFenceWithOpenRenderPass(FenceTest::GetStatus, FlushMethod::Finish);
+}
+
 }  // anonymous namespace
 
 GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST(EGLMultiContextTest);
diff --git a/src/tests/gl_tests/MultithreadingTest.cpp b/src/tests/gl_tests/MultithreadingTest.cpp
index ca07055..8fa62c5 100644
--- a/src/tests/gl_tests/MultithreadingTest.cpp
+++ b/src/tests/gl_tests/MultithreadingTest.cpp
@@ -155,7 +155,20 @@
         }
     }
 
-    std::mutex mutex;
+    enum class FenceTest
+    {
+        ClientWait,
+        ServerWait,
+        GetStatus,
+    };
+    enum class FlushMethod
+    {
+        Flush,
+        Finish,
+    };
+    void testFenceWithOpenRenderPass(FenceTest test, FlushMethod flushMethod);
+
+    std::mutex mMutex;
     GLuint mTexture2D;
     std::atomic<bool> mExitThread;
     std::atomic<bool> mDrawGreen;  // Toggle drawing green or red
@@ -491,8 +504,7 @@
             }
 
             while (barrier < kThreadCount)
-            {
-            }
+            {}
 
             {
                 EXPECT_TRUE(eglDestroyContext(dpy, contexts[threadIdx]));
@@ -556,7 +568,7 @@
     // Draw something
     while (!mExitThread)
     {
-        std::lock_guard<decltype(mutex)> lock(mutex);
+        std::lock_guard<decltype(mMutex)> lock(mMutex);
 
         if (mMainThreadSyncObj != nullptr)
         {
@@ -579,7 +591,7 @@
             if (useDraw)
             {
                 glBindFramebuffer(GL_FRAMEBUFFER, fbo);
-                drawQuad(greenProgram.get(), std::string(essl1_shaders::PositionAttrib()), 0.0f);
+                drawQuad(greenProgram, essl1_shaders::PositionAttrib(), 0.0f);
             }
             else
             {
@@ -593,7 +605,7 @@
             if (useDraw)
             {
                 glBindFramebuffer(GL_FRAMEBUFFER, fbo);
-                drawQuad(redProgram.get(), std::string(essl1_shaders::PositionAttrib()), 0.0f);
+                drawQuad(redProgram, essl1_shaders::PositionAttrib(), 0.0f);
             }
             else
             {
@@ -643,7 +655,7 @@
     {
         for (int draws = 0; draws < kNumDraws;)
         {
-            std::lock_guard<decltype(mutex)> lock(mutex);
+            std::lock_guard<decltype(mMutex)> lock(mMutex);
 
             if (mSecondThreadSyncObj != nullptr)
             {
@@ -663,7 +675,7 @@
             glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
             glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
             glUseProgram(texProgram);
-            drawQuad(texProgram.get(), std::string(essl1_shaders::PositionAttrib()), 0.0f);
+            drawQuad(texProgram, essl1_shaders::PositionAttrib(), 0.0f);
 
             ASSERT_EQ(mMainThreadSyncObj.load(), nullptr);
             mMainThreadSyncObj = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
@@ -790,30 +802,9 @@
 {
     ANGLE_SKIP_TEST_IF(!platformSupportsMultithreading());
     ANGLE_SKIP_TEST_IF(!hasFenceSyncExtension() || !hasGLSyncExtension());
-    // TODO: Fails on Pixel 4 with OpenGLES backend.
-    ANGLE_SKIP_TEST_IF(IsAndroid() && IsOpenGLES());
 
-    EGLWindow *window = getEGLWindow();
-    EGLDisplay dpy    = window->getDisplay();
-    EGLConfig config  = window->getConfig();
-    EGLSurface surface;
-    EGLContext context;
-    constexpr EGLint kPBufferSize = 256;
-    // Initialize the pbuffer and context
-    EGLint pbufferAttributes[] = {
-        EGL_WIDTH, kPBufferSize, EGL_HEIGHT, kPBufferSize, EGL_NONE, EGL_NONE,
-    };
-
-    // Create 2 surfaces, one for each thread
-    surface = eglCreatePbufferSurface(dpy, config, pbufferAttributes);
-    EXPECT_EGL_SUCCESS();
-    // Create 2 shared contexts, one for each thread
-    context = window->createContext(EGL_NO_CONTEXT, nullptr);
-    EXPECT_NE(EGL_NO_CONTEXT, context);
-    // Sync object
     EGLSyncKHR sync = EGL_NO_SYNC_KHR;
 
-    // Synchronization tools to ensure the two threads are interleaved as designed by this test.
     std::mutex mutex;
     std::condition_variable condVar;
 
@@ -829,12 +820,12 @@
     };
     Step currentStep = Step::Start;
 
-    std::thread thread0 = std::thread([&]() {
+    auto thread0 = [&](EGLDisplay dpy, EGLSurface surface, EGLContext context) {
         ThreadSynchronization<Step> threadSynchronization(&currentStep, &mutex, &condVar);
 
         ASSERT_TRUE(threadSynchronization.waitForStep(Step::Start));
 
-        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT));
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, surface, surface, context));
 
         // Do work.
         glClearColor(1.0, 0.0, 0.0, 1.0);
@@ -845,14 +836,14 @@
         ASSERT_TRUE(threadSynchronization.waitForStep(Step::Thread1CreateFence));
 
         // Wait on the sync object, but do *not* flush it, since the other thread will flush.
-        constexpr GLuint64 kTimeout = 2'000'000'000;  // 1 second
+        constexpr GLuint64 kTimeout = 2'000'000'000;  // 2 seconds
         threadSynchronization.nextStep(Step::Thread0ClientWaitSync);
         ASSERT_EQ(EGL_CONDITION_SATISFIED_KHR, eglClientWaitSyncKHR(dpy, sync, 0, kTimeout));
 
         ASSERT_TRUE(threadSynchronization.waitForStep(Step::Finish));
-    });
+    };
 
-    std::thread thread1 = std::thread([&]() {
+    auto thread1 = [&](EGLDisplay dpy, EGLSurface surface, EGLContext context) {
         ThreadSynchronization<Step> threadSynchronization(&currentStep, &mutex, &condVar);
 
         // Wait for thread 0 to clear.
@@ -879,20 +870,333 @@
         EXPECT_EGL_TRUE(eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT));
 
         threadSynchronization.nextStep(Step::Finish);
-    });
+    };
 
-    thread0.join();
-    thread1.join();
+    std::array<LockStepThreadFunc, 2> threadFuncs = {
+        std::move(thread0),
+        std::move(thread1),
+    };
 
-    // Clean up
-    if (surface != EGL_NO_SURFACE)
+    RunLockStepThreads(getEGLWindow(), threadFuncs.size(), threadFuncs.data());
+
+    ASSERT_NE(currentStep, Step::Abort);
+}
+
+// Test that thread B can wait on thread A's sync before thread A flushes it, and wakes up after
+// that.
+void MultithreadingTestES3::testFenceWithOpenRenderPass(FenceTest test, FlushMethod flushMethod)
+{
+    ANGLE_SKIP_TEST_IF(!platformSupportsMultithreading());
+    ANGLE_SKIP_TEST_IF(!hasFenceSyncExtension() || !hasGLSyncExtension());
+
+    constexpr uint32_t kWidth  = 100;
+    constexpr uint32_t kHeight = 200;
+
+    GLsync sync    = 0;
+    GLuint texture = 0;
+
+    std::mutex mutex;
+    std::condition_variable condVar;
+
+    enum class Step
     {
-        eglDestroySurface(dpy, surface);
-    }
-    if (context != EGL_NO_CONTEXT)
+        Start,
+        Thread0CreateFence,
+        Thread1WaitFence,
+        Finish,
+        Abort,
+    };
+    Step currentStep = Step::Start;
+
+    auto thread0 = [&](EGLDisplay dpy, EGLSurface surface, EGLContext context) {
+        ThreadSynchronization<Step> threadSynchronization(&currentStep, &mutex, &condVar);
+
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Start));
+
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, surface, surface, context));
+
+        // Create a shared texture to test synchronization
+        GLTexture color;
+        texture = color;
+
+        glBindTexture(GL_TEXTURE_2D, texture);
+        glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, kWidth, kHeight);
+
+        GLFramebuffer fbo;
+        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
+        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, texture, 0);
+
+        // Draw to shared texture.
+        ANGLE_GL_PROGRAM(program, essl1_shaders::vs::Simple(), essl1_shaders::fs::Red());
+        drawQuad(program, essl1_shaders::PositionAttrib(), 0.0f);
+        ASSERT_GL_NO_ERROR();
+
+        // Issue a fence.  A render pass is currently open, so the fence is not actually submitted
+        // in the Vulkan backend.
+        sync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
+        ASSERT_NE(sync, nullptr);
+
+        // Wait for thread 1 to wait on it.
+        threadSynchronization.nextStep(Step::Thread0CreateFence);
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Thread1WaitFence));
+
+        // Wait a little to give thread 1 time to wait on the sync object before flushing it.
+        angle::Sleep(500);
+        switch (flushMethod)
+        {
+            case FlushMethod::Flush:
+                glFlush();
+                break;
+            case FlushMethod::Finish:
+                glFinish();
+                break;
+        }
+
+        // Clean up
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT));
+
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Finish));
+    };
+
+    auto thread1 = [&](EGLDisplay dpy, EGLSurface surface, EGLContext context) {
+        ThreadSynchronization<Step> threadSynchronization(&currentStep, &mutex, &condVar);
+
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, surface, surface, context));
+
+        // Wait for thread 0 to create the fence object.
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Thread0CreateFence));
+
+        // Test access to the fence object
+        threadSynchronization.nextStep(Step::Thread1WaitFence);
+
+        constexpr GLuint64 kTimeout = 2'000'000'000;  // 2 seconds
+        GLenum result               = GL_CONDITION_SATISFIED;
+        switch (test)
+        {
+            case FenceTest::ClientWait:
+                result = glClientWaitSync(sync, 0, kTimeout);
+                break;
+            case FenceTest::ServerWait:
+                glWaitSync(sync, 0, GL_TIMEOUT_IGNORED);
+                break;
+            case FenceTest::GetStatus:
+            {
+                GLint value;
+                glGetSynciv(sync, GL_SYNC_STATUS, 1, nullptr, &value);
+                if (value != GL_SIGNALED)
+                {
+                    result = glClientWaitSync(sync, 0, kTimeout);
+                }
+                break;
+            }
+        }
+        ASSERT_TRUE(result == GL_CONDITION_SATISFIED || result == GL_ALREADY_SIGNALED);
+
+        // Verify the shared texture is drawn to.
+        glBindTexture(GL_TEXTURE_2D, texture);
+
+        GLFramebuffer fbo;
+        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
+        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, texture, 0);
+
+        EXPECT_PIXEL_RECT_EQ(0, 0, kWidth, kHeight, GLColor::red);
+
+        // Clean up
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT));
+
+        threadSynchronization.nextStep(Step::Finish);
+    };
+
+    std::array<LockStepThreadFunc, 2> threadFuncs = {
+        std::move(thread0),
+        std::move(thread1),
+    };
+
+    RunLockStepThreads(getEGLWindow(), threadFuncs.size(), threadFuncs.data());
+
+    ASSERT_NE(currentStep, Step::Abort);
+}
+
+// Test that thread B can call glClientWaitSync on thread A's sync before thread A calls glFlush,
+// and wakes up after that.
+TEST_P(MultithreadingTestES3, ThreadBClientWaitBeforeThreadASyncFlush)
+{
+    testFenceWithOpenRenderPass(FenceTest::ClientWait, FlushMethod::Flush);
+}
+
+// Test that thread B can call glWaitSync on thread A's sync before thread A calls glFlush, and
+// observes thread A's rendering after that.
+TEST_P(MultithreadingTestES3, ThreadBServerWaitBeforeThreadASyncFlush)
+{
+    testFenceWithOpenRenderPass(FenceTest::ServerWait, FlushMethod::Flush);
+}
+
+// Test that thread B can query thread A's sync with glGetSynciv before thread A calls glFlush,
+// and that the sync is signaled after that.
+TEST_P(MultithreadingTestES3, ThreadBGetStatusBeforeThreadASyncFlush)
+{
+    testFenceWithOpenRenderPass(FenceTest::GetStatus, FlushMethod::Flush);
+}
+
+// Test that thread B can call glClientWaitSync on thread A's sync before thread A calls glFinish,
+// and wakes up after that.
+TEST_P(MultithreadingTestES3, ThreadBClientWaitBeforeThreadASyncFinish)
+{
+    testFenceWithOpenRenderPass(FenceTest::ClientWait, FlushMethod::Finish);
+}
+
+// Test that thread B can call glWaitSync on thread A's sync before thread A calls glFinish, and
+// observes thread A's rendering after that.
+TEST_P(MultithreadingTestES3, ThreadBServerWaitBeforeThreadASyncFinish)
+{
+    testFenceWithOpenRenderPass(FenceTest::ServerWait, FlushMethod::Finish);
+}
+
+// Test that thread B can query thread A's sync with glGetSynciv before thread A calls glFinish,
+// and that the sync is signaled after that.
+TEST_P(MultithreadingTestES3, ThreadBGetStatusBeforeThreadASyncFinish)
+{
+    testFenceWithOpenRenderPass(FenceTest::GetStatus, FlushMethod::Finish);
+}
+
+// Test the following scenario:
+//
+// - Thread A opens a render pass, and flushes it.  In the Vulkan backend, this may make the flush
+//   deferred.
+// - Thread B opens a render pass and creates a fence.  In the Vulkan backend, this also defers the
+//   flush.
+// - Thread C waits on fence
+//
+// In the Vulkan backend, submission of the fence is implied by thread C's wait, and thread A may
+// also be flushed as collateral.  If the fence's serial is updated based on thread A's submission,
+// synchronization between B and C would be broken.
+TEST_P(MultithreadingTestES3, ThreadCWaitBeforeThreadBSyncFinish)
+{
+    ANGLE_SKIP_TEST_IF(!platformSupportsMultithreading());
+    ANGLE_SKIP_TEST_IF(!hasFenceSyncExtension() || !hasGLSyncExtension());
+
+    constexpr uint32_t kWidth  = 100;
+    constexpr uint32_t kHeight = 200;
+
+    GLsync sync    = 0;
+    GLuint texture = 0;
+
+    std::mutex mutex;
+    std::condition_variable condVar;
+
+    enum class Step
     {
-        eglDestroyContext(dpy, context);
-    }
+        Start,
+        Thread0DrawAndFlush,
+        Thread1CreateFence,
+        Thread2WaitFence,
+        Finish,
+        Abort,
+    };
+    Step currentStep = Step::Start;
+
+    auto thread0 = [&](EGLDisplay dpy, EGLSurface surface, EGLContext context) {
+        ThreadSynchronization<Step> threadSynchronization(&currentStep, &mutex, &condVar);
+
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Start));
+
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, surface, surface, context));
+
+        // Open a render pass and flush it.
+        ANGLE_GL_PROGRAM(program, essl1_shaders::vs::Simple(), essl1_shaders::fs::Green());
+        drawQuad(program, essl1_shaders::PositionAttrib(), 0.0f);
+        glFlush();
+        ASSERT_GL_NO_ERROR();
+
+        threadSynchronization.nextStep(Step::Thread0DrawAndFlush);
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Finish));
+
+        // Clean up
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT));
+    };
+
+    auto thread1 = [&](EGLDisplay dpy, EGLSurface surface, EGLContext context) {
+        ThreadSynchronization<Step> threadSynchronization(&currentStep, &mutex, &condVar);
+
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Start));
+
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, surface, surface, context));
+
+        // Wait for thread 0 to set up
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Thread0DrawAndFlush));
+
+        // Create a shared texture to test synchronization
+        GLTexture color;
+        texture = color;
+
+        glBindTexture(GL_TEXTURE_2D, texture);
+        glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, kWidth, kHeight);
+
+        GLFramebuffer fbo;
+        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
+        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, texture, 0);
+
+        // Draw to shared texture.
+        ANGLE_GL_PROGRAM(program, essl1_shaders::vs::Simple(), essl1_shaders::fs::Red());
+        drawQuad(program, essl1_shaders::PositionAttrib(), 0.0f);
+        ASSERT_GL_NO_ERROR();
+
+        // Issue a fence.  A render pass is currently open, so the fence is not actually submitted
+        // in the Vulkan backend.
+        sync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
+        ASSERT_NE(sync, nullptr);
+
+        // Wait for thread 2 to wait on it.
+        threadSynchronization.nextStep(Step::Thread1CreateFence);
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Thread2WaitFence));
+
+        // Wait a little to give thread 2 time to wait on the sync object before flushing it.
+        angle::Sleep(500);
+        glFlush();
+
+        // Clean up
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT));
+
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Finish));
+    };
+
+    auto thread2 = [&](EGLDisplay dpy, EGLSurface surface, EGLContext context) {
+        ThreadSynchronization<Step> threadSynchronization(&currentStep, &mutex, &condVar);
+
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, surface, surface, context));
+
+        // Wait for thread 1 to create the fence object.
+        ASSERT_TRUE(threadSynchronization.waitForStep(Step::Thread1CreateFence));
+
+        // Test access to the fence object
+        threadSynchronization.nextStep(Step::Thread2WaitFence);
+
+        constexpr GLuint64 kTimeout = 2'000'000'000;  // 2 seconds
+        GLenum result               = glClientWaitSync(sync, 0, kTimeout);
+        ASSERT_TRUE(result == GL_CONDITION_SATISFIED || result == GL_ALREADY_SIGNALED);
+
+        // Verify the shared texture is drawn to.
+        glBindTexture(GL_TEXTURE_2D, texture);
+
+        GLFramebuffer fbo;
+        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
+        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, texture, 0);
+
+        EXPECT_PIXEL_RECT_EQ(0, 0, kWidth, kHeight, GLColor::red);
+
+        // Clean up
+        EXPECT_EGL_TRUE(eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT));
+
+        threadSynchronization.nextStep(Step::Finish);
+    };
+
+    std::array<LockStepThreadFunc, 3> threadFuncs = {
+        std::move(thread0),
+        std::move(thread1),
+        std::move(thread2),
+    };
+
+    RunLockStepThreads(getEGLWindow(), threadFuncs.size(), threadFuncs.data());
 
     ASSERT_NE(currentStep, Step::Abort);
 }
diff --git a/src/tests/gl_tests/VulkanPerformanceCounterTest.cpp b/src/tests/gl_tests/VulkanPerformanceCounterTest.cpp
index c8582b2..567c731 100644
--- a/src/tests/gl_tests/VulkanPerformanceCounterTest.cpp
+++ b/src/tests/gl_tests/VulkanPerformanceCounterTest.cpp
@@ -5773,6 +5773,41 @@
     eglDestroyImageKHR(window->getDisplay(), image);
 }
 
+// Test that post-render-pass-to-swapchain glFenceSync followed by eglSwapBuffers incurs only a
+// single submission.
+TEST_P(VulkanPerformanceCounterTest, FenceThenSwapBuffers)
+{
+    ANGLE_SKIP_TEST_IF(!IsGLExtensionEnabled(kPerfMonitorExtensionName));
+    initANGLEFeatures();
+
+    angle::VulkanPerfCounters expected;
+
+    // Expect rpCount+1, depth(Clears+0, Loads+0, LoadNones+0, Stores+0, StoreNones+0)
+    setExpectedCountersForDepthOps(getPerfCounters(), 1, 0, 0, 0, 0, 0, &expected);
+    expected.vkQueueSubmitCallsTotal = getPerfCounters().vkQueueSubmitCallsTotal + 1;
+
+    // Start a render pass and render to the surface.  Enable depth write so the depth/stencil image
+    // is written to.
+    glEnable(GL_DEPTH_TEST);
+    glDepthFunc(GL_ALWAYS);
+
+    ANGLE_GL_PROGRAM(program, essl1_shaders::vs::Simple(), essl1_shaders::fs::Red());
+    drawQuad(program, essl1_shaders::PositionAttrib(), 0.0f);
+    ASSERT_GL_NO_ERROR();
+
+    // Issue a fence
+    glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
+
+    EXPECT_EQ(getPerfCounters().renderPasses, expected.renderPasses);
+
+    // Swap buffers.  The depth/stencil attachment's storeOp should be optimized to DONT_CARE.  This
+    // would not have been possible if the previous glFenceSync caused a submission.
+    swapBuffers();
+
+    EXPECT_EQ(getPerfCounters().vkQueueSubmitCallsTotal, expected.vkQueueSubmitCallsTotal);
+    compareDepthOpCounters(getPerfCounters(), expected);
+}
+
 ANGLE_INSTANTIATE_TEST(VulkanPerformanceCounterTest, ES3_VULKAN(), ES3_VULKAN_SWIFTSHADER());
 ANGLE_INSTANTIATE_TEST(VulkanPerformanceCounterTest_ES31, ES31_VULKAN(), ES31_VULKAN_SWIFTSHADER());
 ANGLE_INSTANTIATE_TEST(VulkanPerformanceCounterTest_MSAA, ES3_VULKAN(), ES3_VULKAN_SWIFTSHADER());
diff --git a/src/tests/test_utils/MultiThreadSteps.cpp b/src/tests/test_utils/MultiThreadSteps.cpp
new file mode 100644
index 0000000..eb5bb84
--- /dev/null
+++ b/src/tests/test_utils/MultiThreadSteps.cpp
@@ -0,0 +1,60 @@
+//
+// Copyright 2022 The ANGLE Project Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file.
+//
+// MultiThreadSteps.cpp:
+//   Synchronization help for tests that use multiple threads.
+
+#include "MultiThreadSteps.h"
+
+#include "gtest/gtest.h"
+#include "util/EGLWindow.h"
+
+namespace angle
+{
+
+void RunLockStepThreads(EGLWindow *window, size_t threadCount, LockStepThreadFunc threadFuncs[])
+{
+    EGLDisplay dpy   = window->getDisplay();
+    EGLConfig config = window->getConfig();
+
+    constexpr EGLint kPBufferSize = 256;
+    // Initialize the pbuffer and context
+    EGLint pbufferAttributes[] = {
+        EGL_WIDTH, kPBufferSize, EGL_HEIGHT, kPBufferSize, EGL_NONE, EGL_NONE,
+    };
+
+    std::vector<EGLSurface> surfaces(threadCount);
+    std::vector<EGLContext> contexts(threadCount);
+
+    // Create N surfaces and shared contexts, one for each thread
+    for (size_t threadIndex = 0; threadIndex < threadCount; ++threadIndex)
+    {
+        surfaces[threadIndex] = eglCreatePbufferSurface(dpy, config, pbufferAttributes);
+        EXPECT_EQ(eglGetError(), EGL_SUCCESS);
+        contexts[threadIndex] =
+            window->createContext(threadIndex == 0 ? EGL_NO_CONTEXT : contexts[0], nullptr);
+        EXPECT_NE(EGL_NO_CONTEXT, contexts[threadIndex]) << threadIndex;
+    }
+
+    std::vector<std::thread> threads(threadCount);
+
+    // Run the threads
+    for (size_t threadIndex = 0; threadIndex < threadCount; ++threadIndex)
+    {
+        threads[threadIndex] = std::thread(std::move(threadFuncs[threadIndex]), dpy,
+                                           surfaces[threadIndex], contexts[threadIndex]);
+    }
+
+    // Wait for them to finish
+    for (size_t threadIndex = 0; threadIndex < threadCount; ++threadIndex)
+    {
+        threads[threadIndex].join();
+
+        // Clean up
+        eglDestroySurface(dpy, surfaces[threadIndex]);
+        eglDestroyContext(dpy, contexts[threadIndex]);
+    }
+}
+}  // namespace angle
diff --git a/src/tests/test_utils/MultiThreadSteps.h b/src/tests/test_utils/MultiThreadSteps.h
index 6dfc241..ac8bfac 100644
--- a/src/tests/test_utils/MultiThreadSteps.h
+++ b/src/tests/test_utils/MultiThreadSteps.h
@@ -3,17 +3,23 @@
 // Use of this source code is governed by a BSD-style license that can be
 // found in the LICENSE file.
 //
-// EGLMultiContextTest.cpp:
+// MultiThreadSteps.h:
 //   Synchronization help for tests that use multiple threads.
 
+#include "gl_raii.h"
+
 #include <atomic>
 #include <condition_variable>
+#include <functional>
 #include <mutex>
 #include <thread>
 
+class EGLWindow;
+
+namespace angle
+{
 namespace
 {
-
 // The following class is used by tests that need multiple threads that coordinate their actions
 // via an enum of "steps".  This enum is the template type E.  The enum must have at least the
 // following values:
@@ -73,7 +79,7 @@
             std::unique_lock<std::mutex> lock(*mMutex);
             *mCurrentStep = newStep;
         }
-        mCondVar->notify_one();
+        mCondVar->notify_all();
     }
 
   private:
@@ -82,3 +88,7 @@
     std::condition_variable *mCondVar;
 };
 }  // anonymous namespace
+
+using LockStepThreadFunc = std::function<void(EGLDisplay, EGLSurface, EGLContext)>;
+void RunLockStepThreads(EGLWindow *window, size_t threadCount, LockStepThreadFunc threadFuncs[]);
+}  // namespace angle