| page.title=Implementing graphics |
| @jd:body |
| |
| <!-- |
| Copyright 2014 The Android Open Source Project |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <div id="qv-wrapper"> |
| <div id="qv"> |
| <h2>In this document</h2> |
| <ol id="auto-toc"> |
| </ol> |
| </div> |
| </div> |
| |
| |
| <p>Follow the instructions here to implement the Android graphics HAL.</p> |
| |
| <h2 id=requirements>Requirements</h2> |
| |
| <p>The following list and sections describe what you need to provide to support |
| graphics in your product:</p> |
| |
| <ul> <li> OpenGL ES 1.x Driver <li> OpenGL ES 2.0 Driver <li> OpenGL ES 3.0 |
| Driver (optional) <li> EGL Driver <li> Gralloc HAL implementation <li> Hardware |
| Composer HAL implementation <li> Framebuffer HAL implementation </ul> |
| |
| <h2 id=implementation>Implementation</h2> |
| |
| <h3 id=opengl_and_egl_drivers>OpenGL and EGL drivers</h3> |
| |
| <p>You must provide drivers for OpenGL ES 1.x, OpenGL ES 2.0, and EGL. Here are |
| some key considerations:</p> |
| |
| <ul> <li> The GL driver needs to be robust and conformant to OpenGL ES |
| standards. <li> Do not limit the number of GL contexts. Because Android allows |
| apps in the background and tries to keep GL contexts alive, you should not |
| limit the number of contexts in your driver. <li> It is not uncommon to have |
| 20-30 active GL contexts at once, so you should also be careful with the amount |
| of memory allocated for each context. <li> Support the YV12 image format and |
| any other YUV image formats that come from other components in the system such |
| as media codecs or the camera. <li> Support the mandatory extensions: |
| <code>GL_OES_texture_external</code>, |
| <code>EGL_ANDROID_image_native_buffer</code>, and |
| <code>EGL_ANDROID_recordable</code>. The |
| <code>EGL_ANDROID_framebuffer_target</code> extension is required for Hardware |
| Composer 1.1 and higher, as well. <li> We highly recommend also supporting |
| <code>EGL_ANDROID_blob_cache</code>, <code>EGL_KHR_fence_sync</code>, |
| <code>EGL_KHR_wait_sync</code>, and <code>EGL_ANDROID_native_fence_sync</code>. |
| </ul> |
| |
| <p>Note the OpenGL API exposed to app developers is different from the OpenGL |
| interface that you are implementing. Apps do not have access to the GL driver |
| layer and must go through the interface provided by the APIs.</p> |
| |
| <h3 id=pre-rotation>Pre-rotation</h3> |
| |
| <p>Many hardware overlays do not support rotation, and even if they do it costs |
| processing power. So the solution is to pre-transform the buffer before it |
| reaches SurfaceFlinger. A query hint in <code>ANativeWindow</code> was added |
| (<code>NATIVE_WINDOW_TRANSFORM_HINT</code>) that represents the most likely |
| transform to be applied to the buffer by SurfaceFlinger. Your GL driver can use |
| this hint to pre-transform the buffer before it reaches SurfaceFlinger so when |
| the buffer arrives, it is correctly transformed.</p> |
| |
| <p>For example, you may receive a hint to rotate 90 degrees. You must generate |
| a matrix and apply it to the buffer to prevent it from running off the end of |
| the page. To save power, this should be done in pre-rotation. See the |
| <code>ANativeWindow</code> interface defined in |
| <code>system/core/include/system/window.h</code> for more details.</p> |
| |
| <h3 id=gralloc_hal>Gralloc HAL</h3> |
| |
| <p>The graphics memory allocator is needed to allocate memory that is requested |
| by image producers. You can find the interface definition of the HAL at: |
| <code>hardware/libhardware/modules/gralloc.h</code></p> |
| |
| <h3 id=protected_buffers>Protected buffers</h3> |
| |
| <p>The gralloc usage flag <code>GRALLOC_USAGE_PROTECTED</code> allows the |
| graphics buffer to be displayed only through a hardware-protected path. These |
| overlay planes are the only way to display DRM content. DRM-protected buffers |
| cannot be accessed by SurfaceFlinger or the OpenGL ES driver.</p> |
| |
| <p>DRM-protected video can be presented only on an overlay plane. Video players |
| that support protected content must be implemented with SurfaceView. Software |
| running on unprotected hardware cannot read or write the buffer. |
| Hardware-protected paths must appear on the Hardware Composer overlay. For |
| instance, protected videos will disappear from the display if Hardware Composer |
| switches to OpenGL ES composition.</p> |
| |
| <p>See the <a href="{@docRoot}devices/drm.html">DRM</a> page for a description |
| of protected content.</p> |
| |
| <h3 id=hardware_composer_hal>Hardware Composer HAL</h3> |
| |
| <p>The Hardware Composer HAL is used by SurfaceFlinger to composite surfaces to |
| the screen. The Hardware Composer abstracts objects like overlays and 2D |
| blitters and helps offload some work that would normally be done with |
| OpenGL.</p> |
| |
| <p>We recommend you start using version 1.3 of the Hardware Composer HAL as it |
| will provide support for the newest features (explicit synchronization, |
| external displays, and more). Because the physical display hardware behind the |
| Hardware Composer abstraction layer can vary from device to device, it is |
| difficult to define recommended features. But here is some guidance:</p> |
| |
| <ul> <li> The Hardware Composer should support at least four overlays (status |
| bar, system bar, application, and wallpaper/background). <li> Layers can be |
| bigger than the screen, so the Hardware Composer should be able to handle |
| layers that are larger than the display (for example, a wallpaper). <li> |
| Pre-multiplied per-pixel alpha blending and per-plane alpha blending should be |
| supported at the same time. <li> The Hardware Composer should be able to |
| consume the same buffers that the GPU, camera, video decoder, and Skia buffers |
| are producing, so supporting some of the following properties is helpful: <ul> |
| <li> RGBA packing order <li> YUV formats <li> Tiling, swizzling, and stride |
| properties </ul> <li> A hardware path for protected video playback must be |
| present if you want to support protected content. </ul> |
| |
| <p>The general recommendation when implementing your Hardware Composer is to |
| implement a non-operational Hardware Composer first. Once you have the |
| structure done, implement a simple algorithm to delegate composition to the |
| Hardware Composer. For example, just delegate the first three or four surfaces |
| to the overlay hardware of the Hardware Composer.</p> |
| |
| <p>Focus on optimization, such as intelligently selecting the surfaces to send |
| to the overlay hardware that maximizes the load taken off of the GPU. Another |
| optimization is to detect whether the screen is updating. If not, delegate |
| composition to OpenGL instead of the Hardware Composer to save power. When the |
| screen updates again, continue to offload composition to the Hardware |
| Composer.</p> |
| |
| <p>Devices must report the display mode (or resolution). Android uses the first |
| mode reported by the device. To support televisions, have the TV device report |
| the mode selected for it by the manufacturer to Hardware Composer. See |
| hwcomposer.h for more details.</p> |
| |
| <p>Prepare for common use cases, such as:</p> |
| |
| <ul> <li> Full-screen games in portrait and landscape mode <li> Full-screen |
| video with closed captioning and playback control <li> The home screen |
| (compositing the status bar, system bar, application window, and live |
| wallpapers) <li> Protected video playback <li> Multiple display support </ul> |
| |
| <p>These use cases should address regular, predictable uses rather than edge |
| cases that are rarely encountered. Otherwise, any optimization will have little |
| benefit. Implementations must balance two competing goals: animation smoothness |
| and interaction latency.</p> |
| |
| <p>Further, to make best use of Android graphics, you must develop a robust |
| clocking strategy. Performance matters little if clocks have been turned down |
| to make every operation slow. You need a clocking strategy that puts the clocks |
| at high speed when needed, such as to make animations seamless, and then slows |
| the clocks whenever the increased speed is no longer needed.</p> |
| |
| <p>Use the <code>adb shell dumpsys SurfaceFlinger</code> command to see |
| precisely what SurfaceFlinger is doing. See the <a |
| href="{@docRoot}devices/graphics/architecture.html#hwcomposer">Hardware |
| Composer</a> section of the Architecture page for example output and a |
| description of relevant fields.</p> |
| |
| <p>You can find the HAL for the Hardware Composer and additional documentation |
| in: <code>hardware/libhardware/include/hardware/hwcomposer.h |
| hardware/libhardware/include/hardware/hwcomposer_defs.h</code></p> |
| |
| <p>A stub implementation is available in the |
| <code>hardware/libhardware/modules/hwcomposer</code> directory.</p> |
| |
| <h3 id=vsync>VSYNC</h3> |
| |
| <p>VSYNC synchronizes certain events to the refresh cycle of the display. |
| Applications always start drawing on a VSYNC boundary, and SurfaceFlinger |
| always composites on a VSYNC boundary. This eliminates stutters and improves |
| visual performance of graphics. The Hardware Composer has a function |
| pointer:</p> |
| |
| <pre class=prettyprint> int (waitForVsync*) (int64_t *timestamp) </pre> |
| |
| |
| <p>This points to a function you must implement for VSYNC. This function blocks |
| until a VSYNC occurs and returns the timestamp of the actual VSYNC. A message |
| must be sent every time VSYNC occurs. A client can receive a VSYNC timestamp |
| once, at specified intervals, or continuously (interval of 1). You must |
| implement VSYNC to have no more than a 1ms lag at the maximum (0.5ms or less is |
| recommended), and the timestamps returned must be extremely accurate.</p> |
| |
| <h4 id=explicit_synchronization>Explicit synchronization</h4> |
| |
| <p>Explicit synchronization is required and provides a mechanism for Gralloc |
| buffers to be acquired and released in a synchronized way. Explicit |
| synchronization allows producers and consumers of graphics buffers to signal |
| when they are done with a buffer. This allows the Android system to |
| asynchronously queue buffers to be read or written with the certainty that |
| another consumer or producer does not currently need them. See the <a |
| href="#synchronization_framework">Synchronization framework</a> section for an overview of |
| this mechanism.</p> |
| |
| <p>The benefits of explicit synchronization include less behavior variation |
| between devices, better debugging support, and improved testing metrics. For |
| instance, the sync framework output readily identifies problem areas and root |
| causes. And centralized SurfaceFlinger presentation timestamps show when events |
| occur in the normal flow of the system.</p> |
| |
| <p>This communication is facilitated by the use of synchronization fences, |
| which are now required when requesting a buffer for consuming or producing. The |
| synchronization framework consists of three main building blocks: |
| sync_timeline, sync_pt, and sync_fence.</p> |
| |
| <h5 id=sync_timeline>sync_timeline</h5> |
| |
| <p>A sync_timeline is a monotonically increasing timeline that should be |
| implemented for each driver instance, such as a GL context, display controller, |
| or 2D blitter. This is essentially a counter of jobs submitted to the kernel |
| for a particular piece of hardware. It provides guarantees about the order of |
| operations and allows hardware-specific implementations.</p> |
| |
| <p>Please note, the sync_timeline is offered as a CPU-only reference |
| implementation called sw_sync (which stands for software sync). If possible, |
| use sw_sync instead of a sync_timeline to save resources and avoid complexity. |
| If you’re not employing a hardware resource, sw_sync should be sufficient.</p> |
| |
| <p>If you must implement a sync_timeline, use the sw_sync driver as a starting |
| point. Follow these guidelines:</p> |
| |
| <ul> <li> Provide useful names for all drivers, timelines, and fences. This |
| simplifies debugging. <li> Implement timeline_value str and pt_value_str |
| operators in your timelines as they make debugging output much more readable. |
| <li> If you want your userspace libraries (such as the GL library) to have |
| access to the private data of your timelines, implement the fill driver_data |
| operator. This lets you get information about the immutable sync_fence and |
| sync_pts so you might build command lines based upon them. </ul> |
| |
| <p>When implementing a sync_timeline, <strong>don’t</strong>:</p> |
| |
| <ul> <li> Base it on any real view of time, such as when a wall clock or other |
| piece of work might finish. It is better to create an abstract timeline that |
| you can control. <li> Allow userspace to explicitly create or signal a fence. |
| This can result in one piece of the user pipeline creating a denial-of-service |
| attack that halts all functionality. This is because the userspace cannot make |
| promises on behalf of the kernel. <li> Access sync_timeline, sync_pt, or |
| sync_fence elements explicitly, as the API should provide all required |
| functions. </ul> |
| |
| <h5 id=sync_pt>sync_pt</h5> |
| |
| <p>A sync_pt is a single value or point on a sync_timeline. A point has three |
| states: active, signaled, and error. Points start in the active state and |
| transition to the signaled or error states. For instance, when a buffer is no |
| longer needed by an image consumer, this sync_point is signaled so that image |
| producers know it is okay to write into the buffer again.</p> |
| |
| <h5 id=sync_fence>sync_fence</h5> |
| |
| <p>A sync_fence is a collection of sync_pts that often have different |
| sync_timeline parents (such as for the display controller and GPU). These are |
| the main primitives over which drivers and userspace communicate their |
| dependencies. A fence is a promise from the kernel that it gives upon accepting |
| work that has been queued and assures completion in a finite amount of |
| time.</p> |
| |
| <p>This allows multiple consumers or producers to signal they are using a |
| buffer and to allow this information to be communicated with one function |
| parameter. Fences are backed by a file descriptor and can be passed from |
| kernel-space to user-space. For instance, a fence can contain two sync_points |
| that signify when two separate image consumers are done reading a buffer. When |
| the fence is signaled, the image producers know both consumers are done |
| consuming. |
| |
| Fences, like sync_pts, start active and then change state based upon the state |
| of their points. If all sync_pts become signaled, the sync_fence becomes |
| signaled. If one sync_pt falls into an error state, the entire sync_fence has |
| an error state. |
| |
| Membership in the sync_fence is immutable once the fence is created. And since |
| a sync_pt can be in only one fence, it is included as a copy. Even if two |
| points have the same value, there will be two copies of the sync_pt in the |
| fence. |
| |
| To get more than one point in a fence, a merge operation is conducted. In the |
| merge, the points from two distinct fences are added to a third fence. If one |
| of those points was signaled in the originating fence, and the other was not, |
| the third fence will also not be in a signaled state.</p> |
| |
| <p>To implement explicit synchronization, you need to provide the |
| following:</p> |
| |
| <ul> <li> A kernel-space driver that implements a synchronization timeline for |
| a particular piece of hardware. Drivers that need to be fence-aware are |
| generally anything that accesses or communicates with the Hardware Composer. |
| Here are the key files (found in the android-3.4 kernel branch): <ul> <li> Core |
| implementation: <ul> <li> <code>kernel/common/include/linux/sync.h</code> <li> |
| <code>kernel/common/drivers/base/sync.c</code> </ul> <li> sw_sync: <ul> <li> |
| <code>kernel/common/include/linux/sw_sync.h</code> <li> |
| <code>kernel/common/drivers/base/sw_sync.c</code> </ul> <li> Documentation: |
| <li> <code>kernel/common//Documentation/sync.txt</code> Finally, the |
| <code>platform/system/core/libsync</code> directory includes a library to |
| communicate with the kernel-space. </ul> <li> A Hardware Composer HAL module |
| (version 1.3 or later) that supports the new synchronization functionality. You |
| will need to provide the appropriate synchronization fences as parameters to |
| the set() and prepare() functions in the HAL. <li> Two GL-specific extensions |
| related to fences, <code>EGL_ANDROID_native_fence_sync</code> and |
| <code>EGL_ANDROID_wait_sync</code>, along with incorporating fence support into |
| your graphics drivers. </ul> |
| |
| <p>For example, to use the API supporting the synchronization function, you |
| might develop a display driver that has a display buffer function. Before the |
| synchronization framework existed, this function would receive dma-bufs, put |
| those buffers on the display, and block while the buffer is visible, like |
| so:</p> |
| |
| <pre class=prettyprint> |
| /* |
| * assumes buf is ready to be displayed. returns when buffer is no longer on |
| * screen. |
| */ |
| void display_buffer(struct dma_buf *buf); </pre> |
| |
| |
| <p>With the synchronization framework, the API call is slightly more complex. |
| While putting a buffer on display, you associate it with a fence that says when |
| the buffer will be ready. So you queue up the work, which you will initiate |
| once the fence clears.</p> |
| |
| <p>In this manner, you are not blocking anything. You immediately return your |
| own fence, which is a guarantee of when the buffer will be off of the display. |
| As you queue up buffers, the kernel will list dependencies. With the |
| synchronization framework:</p> |
| |
| <pre class=prettyprint> |
| /* |
| * will display buf when fence is signaled. returns immediately with a fence |
| * that will signal when buf is no longer displayed. |
| */ |
| struct sync_fence* display_buffer(struct dma_buf *buf, struct sync_fence |
| *fence); </pre> |
| |
| |
| <h4 id=sync_integration>Sync integration</h4> |
| |
| <h5 id=integration_conventions>Integration conventions</h5> |
| |
| <p>This section explains how to integrate the low-level sync framework with |
| different parts of the Android framework and the drivers that need to |
| communicate with one another.</p> |
| |
| <p>The Android HAL interfaces for graphics follow consistent conventions so |
| when file descriptors are passed across a HAL interface, ownership of the file |
| descriptor is always transferred. This means:</p> |
| |
| <ul> <li> if you receive a fence file descriptor from the sync framework, you |
| must close it. <li> if you return a fence file descriptor to the sync |
| framework, the framework will close it. <li> if you want to continue using the |
| fence file descriptor, you must duplicate the descriptor. </ul> |
| |
| <p>Every time a fence is passed through BufferQueue - such as for a window that |
| passes a fence to BufferQueue saying when its new contents will be ready - the |
| fence object is renamed. Since kernel fence support allows fences to have |
| strings for names, the sync framework uses the window name and buffer index |
| that is being queued to name the fence, for example: |
| <code>SurfaceView:0</code></p> |
| |
| <p>This is helpful in debugging to identify the source of a deadlock. Those |
| names appear in the output of <code>/d/sync</code> and bug reports when |
| taken.</p> |
| |
| <h5 id=anativewindow_integration>ANativeWindow integration</h5> |
| |
| <p>ANativeWindow is fence aware. <code>dequeueBuffer</code>, |
| <code>queueBuffer</code>, and <code>cancelBuffer</code> have fence |
| parameters.</p> |
| |
| <h5 id=opengl_es_integration>OpenGL ES integration</h5> |
| |
| <p>OpenGL ES sync integration relies upon these two EGL extensions:</p> |
| |
| <ul> <li> <code>EGL_ANDROID_native_fence_sync</code> - provides a way to either |
| wrap or create native Android fence file descriptors in EGLSyncKHR objects. |
| <li> <code>EGL_ANDROID_wait_sync</code> - allows GPU-side stalls rather than in |
| CPU, making the GPU wait for an EGLSyncKHR. This is essentially the same as the |
| <code>EGL_KHR_wait_sync</code> extension. See the |
| <code>EGL_KHR_wait_sync</code> specification for details. </ul> |
| |
| <p>These extensions can be used independently and are controlled by a compile |
| flag in libgui. To use them, first implement the |
| <code>EGL_ANDROID_native_fence_sync</code> extension along with the associated |
| kernel support. Next add a ANativeWindow support for fences to your driver and |
| then turn on support in libgui to make use of the |
| <code>EGL_ANDROID_native_fence_sync</code> extension.</p> |
| |
| <p>Then, as a second pass, enable the <code>EGL_ANDROID_wait_sync</code> |
| extension in your driver and turn it on separately. The |
| <code>EGL_ANDROID_native_fence_sync</code> extension consists of a distinct |
| native fence EGLSync object type so extensions that apply to existing EGLSync |
| object types don’t necessarily apply to <code>EGL_ANDROID_native_fence</code> |
| objects to avoid unwanted interactions.</p> |
| |
| <p>The EGL_ANDROID_native_fence_sync extension employs a corresponding native |
| fence file descriptor attribute that can be set only at creation time and |
| cannot be directly queried onward from an existing sync object. This attribute |
| can be set to one of two modes:</p> |
| |
| <ul> <li> A valid fence file descriptor - wraps an existing native Android |
| fence file descriptor in an EGLSyncKHR object. <li> -1 - creates a native |
| Android fence file descriptor from an EGLSyncKHR object. </ul> |
| |
| <p>The DupNativeFenceFD function call is used to extract the EGLSyncKHR object |
| from the native Android fence file descriptor. This has the same result as |
| querying the attribute that was set but adheres to the convention that the |
| recipient closes the fence (hence the duplicate operation). Finally, destroying |
| the EGLSync object should close the internal fence attribute.</p> |
| |
| <h5 id=hardware_composer_integration>Hardware Composer integration</h5> |
| |
| <p>Hardware Composer handles three types of sync fences:</p> |
| |
| <ul> <li> <em>Acquire fence</em> - one per layer, this is set before calling |
| HWC::set. It signals when Hardware Composer may read the buffer. <li> |
| <em>Release fence</em> - one per layer, this is filled in by the driver in |
| HWC::set. It signals when Hardware Composer is done reading the buffer so the |
| framework can start using that buffer again for that particular layer. <li> |
| <em>Retire fence</em> - one per the entire frame, this is filled in by the |
| driver each time HWC::set is called. This covers all of the layers for the set |
| operation. It signals to the framework when all of the effects of this set |
| operation has completed. The retire fence signals when the next set operation |
| takes place on the screen. </ul> |
| |
| <p>The retire fence can be used to determine how long each frame appears on the |
| screen. This is useful in identifying the location and source of delays, such |
| as a stuttering animation. </p> |
| |
| <h4 id=vsync_offset>VSYNC Offset</h4> |
| |
| <p>Application and SurfaceFlinger render loops should be synchronized to the |
| hardware VSYNC. On a VSYNC event, the display begins showing frame N while |
| SurfaceFlinger begins compositing windows for frame N+1. The app handles |
| pending input and generates frame N+2.</p> |
| |
| <p>Synchronizing with VSYNC delivers consistent latency. It reduces errors in |
| apps and SurfaceFlinger and the drifting of displays in and out of phase with |
| each other. This, however, does assume application and SurfaceFlinger per-frame |
| times don’t vary widely. Nevertheless, the latency is at least two frames.</p> |
| |
| <p>To remedy this, you may employ VSYNC offsets to reduce the input-to-display |
| latency by making application and composition signal relative to hardware |
| VSYNC. This is possible because application plus composition usually takes less |
| than 33 ms.</p> |
| |
| <p>The result of VSYNC offset is three signals with same period, offset |
| phase:</p> |
| |
| <ul> <li> <em>HW_VSYNC_0</em> - Display begins showing next frame <li> |
| <em>VSYNC</em> - App reads input and generates next frame <li> <em>SF |
| VSYNC</em> - SurfaceFlinger begins compositing for next frame </ul> |
| |
| <p>With VSYNC offset, SurfaceFlinger receives the buffer and composites the |
| frame, while the application processes the input and renders the frame, all |
| within a single frame of time.</p> |
| |
| <p>Please note, VSYNC offsets reduce the time available for app and composition |
| and therefore provide a greater chance for error.</p> |
| |
| <h5 id=dispsync>DispSync</h5> |
| |
| <p>DispSync maintains a model of the periodic hardware-based VSYNC events of a |
| display and uses that model to execute periodic callbacks at specific phase |
| offsets from the hardware VSYNC events.</p> |
| |
| <p>DispSync is essentially a software phase lock loop (PLL) that generates the |
| VSYNC and SF VSYNC signals used by Choreographer and SurfaceFlinger, even if |
| not offset from hardware VSYNC.</p> |
| |
| <img src="images/dispsync.png" alt="DispSync flow"> |
| |
| <p class="img-caption"><strong>Figure 4.</strong> DispSync flow</p> |
| |
| <p>DispSync has these qualities:</p> |
| |
| <ul> <li> <em>Reference</em> - HW_VSYNC_0 <li> <em>Output</em> - VSYNC and SF |
| VSYNC <li> <em>Feedback</em> - Retire fence signal timestamps from Hardware |
| Composer </ul> |
| |
| <h5 id=vsync_retire_offset>VSYNC/Retire Offset</h5> |
| |
| <p>The signal timestamp of retire fences must match HW VSYNC even on devices |
| that don’t use the offset phase. Otherwise, errors appear to have greater |
| severity than reality.</p> |
| |
| <p>“Smart” panels often have a delta. Retire fence is the end of direct memory |
| access (DMA) to display memory. The actual display switch and HW VSYNC is some |
| time later.</p> |
| |
| <p><code>PRESENT_TIME_OFFSET_FROM_VSYNC_NS</code> is set in the device’s |
| BoardConfig.mk make file. It is based upon the display controller and panel |
| characteristics. Time from retire fence timestamp to HW Vsync signal is |
| measured in nanoseconds.</p> |
| |
| <h5 id=vsync_and_sf_vsync_offsets>VSYNC and SF_VSYNC Offsets</h5> |
| |
| <p>The <code>VSYNC_EVENT_PHASE_OFFSET_NS</code> and |
| <code>SF_VSYNC_EVENT_PHASE_OFFSET_NS</code> are set conservatively based on |
| high-load use cases, such as partial GPU composition during window transition |
| or Chrome scrolling through a webpage containing animations. These offsets |
| allow for long application render time and long GPU composition time.</p> |
| |
| <p>More than a millisecond or two of latency is noticeable. We recommend |
| integrating thorough automated error testing to minimize latency without |
| significantly increasing error counts.</p> |
| |
| <p>Note these offsets are also set in the device’s BoardConfig.mk make file. |
| The default if not set is zero offset. Both settings are offset in nanoseconds |
| after HW_VSYNC_0. Either can be negative.</p> |
| |
| <h3 id=virtual_displays>Virtual displays</h3> |
| |
| <p>Android added support for virtual displays to Hardware Composer in version |
| 1.3. This support was implemented in the Android platform and can be used by |
| Miracast.</p> |
| |
| <p>The virtual display composition is similar to the physical display: Input |
| layers are described in prepare(), SurfaceFlinger conducts GPU composition, and |
| layers and GPU framebuffer are provided to Hardware Composer in set().</p> |
| |
| <p>Instead of the output going to the screen, it is sent to a gralloc buffer. |
| Hardware Composer writes output to a buffer and provides the completion fence. |
| The buffer is sent to an arbitrary consumer: video encoder, GPU, CPU, etc. |
| Virtual displays can use 2D/blitter or overlays if the display pipeline can |
| write to memory.</p> |
| |
| <h4 id=modes>Modes</h4> |
| |
| <p>Each frame is in one of three modes after prepare():</p> |
| |
| <ul> <li> <em>GLES</em> - All layers composited by GPU. GPU writes directly to |
| the output buffer while Hardware Composer does nothing. This is equivalent to |
| virtual display composition with Hardware Composer <1.3. <li> <em>MIXED</em> - |
| GPU composites some layers to framebuffer, and Hardware Composer composites |
| framebuffer and remaining layers. GPU writes to scratch buffer (framebuffer). |
| Hardware Composer reads scratch buffer and writes to the output buffer. Buffers |
| may have different formats, e.g. RGBA and YCbCr. <li> <em>HWC</em> - All |
| layers composited by Hardware Composer. Hardware Composer writes directly to |
| the output buffer. </ul> |
| |
| <h4 id=output_format>Output format</h4> |
| |
| <p><em>MIXED and HWC modes</em>: If the consumer needs CPU access, the consumer |
| chooses the format. Otherwise, the format is IMPLEMENTATION_DEFINED. Gralloc |
| can choose best format based on usage flags. For example, choose a YCbCr format |
| if the consumer is video encoder, and Hardware Composer can write the format |
| efficiently.</p> |
| |
| <p><em>GLES mode</em>: EGL driver chooses output buffer format in |
| dequeueBuffer(), typically RGBA8888. The consumer must be able to accept this |
| format.</p> |
| |
| <h4 id=egl_requirement>EGL requirement</h4> |
| |
| <p>Hardware Composer 1.3 virtual displays require that eglSwapBuffers() does |
| not dequeue the next buffer immediately. Instead, it should defer dequeueing |
| the buffer until rendering begins. Otherwise, EGL always owns the “next” output |
| buffer. SurfaceFlinger can’t get the output buffer for Hardware Composer in |
| MIXED/HWC mode. </p> |
| |
| <p>If Hardware Composer always sends all virtual display layers to GPU, all |
| frames will be in GLES mode. Although it is not recommended, you may use this |
| method if you need to support Hardware Composer 1.3 for some other reason but |
| can’t conduct virtual display composition.</p> |
| |
| <h2 id=testing>Testing</h2> |
| |
| <p>For benchmarking, we suggest following this flow by phase:</p> |
| |
| <ul> <li> <em>Specification</em> - When initially specifying the device, such |
| as when using immature drivers, you should use predefined (fixed) clocks and |
| workloads to measure the frames per second rendered. This gives a clear view of |
| what the hardware is capable of doing. <li> <em>Development</em> - In the |
| development phase as drivers mature, you should use a fixed set of user actions |
| to measure the number of visible stutters (janks) in animations. <li> |
| <em>Production</em> - Once the device is ready for production and you want to |
| compare against competitors, you should increase the workload until stutters |
| increase. Determine if the current clock settings can keep up with the load. |
| This can help you identify where you might be able to slow the clocks and |
| reduce power use. </ul> |
| |
| <p>For the specification phase, Android offers the Flatland tool to help derive |
| device capabilities. It can be found at: |
| <code>platform/frameworks/native/cmds/flatland/</code></p> |
| |
| <p>Flatland relies upon fixed clocks and shows the throughput that can be |
| achieved with composition-based workloads. It uses gralloc buffers to simulate |
| multiple window scenarios, filling in the window with GL and then measuring the |
| compositing. Please note, Flatland uses the synchronization framework to |
| measure time. So you must support the synchronization framework to readily use |
| Flatland.</p> |