| # Time-Based GC Trigger |
| |
| Time-based GC triggering is a strategy for triggering garbage collection (GC) |
| designed to make efficient use of overall GC CPU and memory across the device. |
| The more time has passed since the previous GC in an app, the lower the |
| threshold is for triggering the next GC. This decay in the GC trigger |
| threshold is chosen to optimize the heap size of to-be-collected objects |
| against the sustained per-second CPU cost of garbage collection across all |
| running applications. |
| |
| Time-based GC triggering is different from the traditional |
| target-heap-utilization-based strategy used by ART (described in |
| `runtime/gc_configuration.md`), which was designed to keep GC cost per byte |
| allocated roughly constant. |
| |
| The main motivations for switching from target-heap-utilization-based GC |
| triggering to time-based GC triggering are to save more memory across the |
| device with less overall CPU effort and simplify the tuning knob configuration |
| of GC. |
| |
| ## Motivations |
| |
| ### More efficient use of GC effort |
| |
| Time-based GC triggering is designed to make efficient use of overall GC CPU |
| and memory use across the device. When it comes to memory use, we are |
| specifically interested in the heap size of to-be-collected objects, which we |
| refer to as `memory overhead` from here on. |
| |
| In traditional heap-utilization-based triggering, memory overhead is |
| proportional to the live heap size of an application. The per-second CPU cost |
| of GC is proportional to the allocation rate of the application. |
| |
| Consider two processes `A` and `B`. Process `A` has a small 10MB live heap |
| size and high allocation rate of 100MB per second. Process `B` has a large |
| 100MB live heap size and small allocation rate of 10MB per second. |
| |
| Assume target heap utilization is set to 0.5, it takes 50ms CPU time to do GC |
| in process `A`, and that it takes 500ms CPU time to do GC in process `B` |
| (because the live heap size of `B` is 10 times the live heap size of `A`). Say |
| we have 8GB total RAM on device. |
| |
| With traditional heap-utilization-based triggering, GC will happen |
| 100MB/10MB = 10 times per second in process `A`, and every |
| 10MB/100MB = 0.1 seconds in process `B`. The total memory overhead across both |
| processes is 110MB with GC CPU cost of 10 * 50 + 0.1 * 500 = 550ms per second. |
| |
| | Utilization Based Trigger (UBT) | `A` | `B` | UBT Overall | |
| | :------------------------------- | :------- | :------ | :---------- | |
| | Live Heap | 10MB | 100MB | 110MB | |
| | Alloc Rate | 100MB/s | 10MB/s | 110MB/s | |
| | Per-GC CPU | 50ms | 500ms | | |
| | Memory Overhead | 10MB | 100MB | 110MB | |
| | Memory Overhead % of total RAM | 0.12% | 1.2% | 1.34% | |
| | Heap Utilization | 0.5 | 0.5 | 0.5 | |
| | GCs per second | 10 | 0.1 | 10.1 | |
| | GC CPU per second | 500ms/s | 50ms/s | 550ms/s | |
| | GC CPU per byte allocated | 5ms/MB | 5ms/MB | 5ms/MB | |
| | GC CPU % of 1 CPU core | 50% | 5% | 55% | |
| | Memory GC cost factor | 417 | 4.17 | 41 | |
| |
| The time-based GC trigger implicitly takes into account the allocation rate of |
| an app. The higher the allocation rate, the more memory the app can allocate |
| before triggering GC. For example, say GC triggers every 55MB allocated for |
| both process `A` and process `B`, because process `A` allocates at 10x the |
| rate despite having 10x smaller per-GC cost. In this case, GC will happen |
| 100MB/55MB = 1.8 times per second in process `A` and 10MB/55MB = 0.18 times |
| per second in process `B`. The total memory overhead across device is still |
| 110MB, but with GC CPU cost of only 1.8 * 50 + 0.18 * 500 = 180ms per second. |
| That's 1/3rd the GC CPU cost for the same overall memory overhead on device, |
| which is a much more efficient use of GC CPU effort. |
| |
| | Time Based Trigger (TBT) | `A` | `B` | TBT Overall | UBT Overall | |
| | :----------------------------- | :------- | :------ | :---------- | :---------- | |
| | Live Heap | 10MB | 100MB | 110MB | 110MB | |
| | Alloc Rate | 100MB/s | 10MB/s | 110MB/s | 110MB/s | |
| | Per-GC CPU | 50ms | 500ms | | | |
| | Memory Overhead | 55MB | 55MB | 110MB | 110MB | |
| | Memory Overhead % of total RAM | 0.67% | 0.67% | 1.34% | 1.34% | |
| | Heap Utilization | 0.15 | 0.65 | 0.5 | 0.5 | |
| | GCs per second | 1.8 | 0.18 | 1.98 | 10.1 | |
| | GC CPU per second | 90ms/s | 90ms/s | 180ms/s | 550ms/s | |
| | GC CPU per byte allocated | 0.9ms/MB | 9ms/MB | 1.6ms/MB | 5ms/MB | |
| | GC CPU % of 1 CPU core | 9% | 9% | 18% | 55% | |
| | Memory GC cost factor | 13.5 | 13.5 | 13.5 | 41 | |
| |
| The approach to time-based GC triggering is based on the same idea as |
| [Marisa Kirisame et al, "Optimal heap limits for reducing browser memory |
| use"](https://dl.acm.org/doi/10.1145/3563323). |
| |
| ### Improve balance between GC memory overhead and CPU |
| |
| The example in the previous section shows significant improvements in |
| efficiency because it involved processes whose live heap sizes were relatively |
| large or relatively small compared to their allocation rates. |
| |
| In the traditional target heap utilization GC strategy, a process with small |
| live heap and high allocation rate uses a relatively high amount of GC CPU for |
| relatively little memory overhead. A process with large live heap and low |
| allocation has relatively high memory overhead for relatively little GC CPU. |
| |
| This imbalance between memory overhead and GC CPU can be mitigated to some |
| extent with use of `heapminfree` and `heapmaxfree` to limit how extreme the |
| imbalance can get, but that's not what those knobs were designed for. Use of |
| `heapminfree` and `heapmaxfree` can introduce additional imbalances, |
| particularly as live heap size and allocation rates change over time in an |
| application. |
| |
| Time-based GC triggering implicitly takes allocation rate into the decision |
| for when to trigger GC. An app will be given a larger heap allowance during |
| burst allocation activities and less of an allowance when it is idling to |
| maintain the balance of memory overhead to GC CPU overhead. |
| |
| ### Reduce lingering garbage in idle processes |
| |
| In the heap target utilization approach to GC triggering, you can imagine a |
| case where an application allocates to just below the GC threshold and then |
| goes idle. For target utilization of 0.5, for example, that could mean just |
| shy of 50% of the Java heap consists of unreachable objects and will continue |
| to do so for the foreseeable future. The costs of leaving these unreachable |
| objects on the heap are further compounded if some of the unreachable objects |
| are pointers to "owned" native objects or binder objects causing memory to be |
| retained in other processes. |
| |
| Time-based GC triggering addresses this case by limiting how long large |
| amounts of unreachable objects can stay on the heap before GC is triggered. |
| |
| ### More meaningful primary tuning knob |
| |
| `heaptargetutilization` is the primary tuning knob to trade off space and time |
| in garbage collection in the traditional GC triggering approach. The |
| relationship between target heap utilization and memory use is indirect, and |
| you can't know how much space and time are being traded off without also |
| knowing the live heap size and allocation rates of the applications on device. |
| |
| For example, for an app with live heap size of 50MB, a target heap utilization |
| of 0.5 means 50MB of memory overhead. Increasing target heap utilization to, |
| say, 0.75, reduces memory overhead in this case to 16.6MB. |
| |
| If target heap utilization is tuned for a device empirically, the tuning value |
| can be thrown off if app allocation rates trend up or down or as GC |
| performance is improved release over release. |
| |
| The time-based GC trigger specifies the main trade-off between space and time |
| using a `heap-memory-gc-cost-factor` parameter that directly represents the |
| trade-off between memory and space (see the section on tuning below). Changes |
| to application allocation rates or optimizations to GC impact how much GC and |
| memory overhead there is on device, without changing the balance between space |
| and time used by GC. |
| |
| ### Simplified GC tuning |
| |
| The time-based GC trigger approach eliminates `heapminfree`, `heapmaxfree`, |
| and `foreground-heap-growth-multiplier` tuning knobs. The |
| `heaptargetheaputilization` knob is replaced with a |
| `heap-memory-gc-cost-factor` which has a natural default value of 1. |
| |
| The ideal is that all devices can now rely on the default tuning knob setting |
| without requiring additional GC tuning, and that tuning knob values no longer |
| need to be updated every few years to adjust to increased memory capacity and |
| CPU capability of devices. |
| |
| ## Enabling time-based GC triggering |
| |
| To enable time-based GC triggering, set the following system property: |
| |
| ``` |
| dalvik.vm.enable_time_based_gc_trigger=true |
| ``` |
| |
| For example, add the following to your device.mk file: |
| |
| ``` |
| PRODUCT_VENDOR_PROPERTIES += dalvik.vm.enable_time_based_gc_trigger=true |
| ``` |
| |
| Time-based GC triggering will not be enabled by default initially. As the |
| feature gains greater adoption and we have had a chance to see the broader |
| implications of the time-based approach to GC triggering, we will consider |
| switching it to be enabled by default. |
| |
| ## Tuning time-based GC triggering |
| |
| When time-based GC triggering is enabled, the following GC tuning knobs will |
| no longer have any effect: |
| |
| * `dalvik.vm.heaptargetutilization` (`-XX:HeapTargetUtilization`) |
| * `dalvik.vm.heapminfree` (`-XX:HeapMinFree`) |
| * `dalvik.vm.heapmaxfree` (`-XX:HeapMaxFree`) |
| * `dalvik.vm.foreground-heap-growth-multiplier` (`-XX:ForegroundHeapGrowthMultiplier`) |
| |
| The following knobs continue to behave as before: |
| |
| * `dalvik.vm.heapsize` (`-Xmx`) |
| * `dalvik.vm.heapgrowthlimit` (`-XX:HeapGrowthLimit`) |
| * `dalvik.vm.heapstartsize` (`-Xms`) |
| |
| There is a single new tuning knob: |
| |
| * `dalvik.vm.heap-memory-gc-cost-factor` (`-XX:HeapMemoryGcCostFactor`) |
| |
| ### Heap Memory GC Cost Factor |
| |
| The more applications are running and allocating on device, the more GC is |
| required. How much GC is required varies over time depending on how the phone |
| is being used. The new tuning knob for time-based GC triggering controls how |
| GC costs are distributed between memory and GC CPU as the need for GC |
| increases on device. |
| |
| `dalvik.vm.heap-memory-gc-cost-factor` (`-XX:HeapMemoryGcCostFactor`) |
| specifies the ratio of CPU overhead to memory overhead associated with garbage |
| collection on device. It can be thought of as specifying what percent of a |
| single CPU core's time is worth spending on garbage collection to save 1% of |
| total device memory. It defaults to 1.0. |
| |
| For example, consider a device with 8GB total memory using the default tuning |
| knob value of 1.0. Time-based GC triggering balances memory and CPU so that |
| the ratio of memory to CPU overhead is 81.92MB per 10ms of CPU spent doing GC |
| per second wall clock time. Every 1% increase in required GC costs an |
| additional 81.92MB of memory and 10ms CPU per second. |
| |
| Increasing the tuning knob value makes garbage collection more aggressive, |
| reducing memory overhead at the cost of additional GC. In theory, increasing |
| the value by 4x will result in around twice the GC CPU effort for half the |
| memory overhead. |
| |
| Consider an extremely aggressive tuning knob value of 100.0. For every 1% |
| increase in required GC, the added cost is 10% of a CPU core but only 0.1% of |
| total device memory. At the margin you could save 5% of a CPU core for the low |
| cost of 0.1% of device memory by reducing the tuning knob value by 4x down to |
| 25.0. |
| |
| Consider an extremely relaxed tuning knob value of 0.01. For every 1% increase |
| in required GC, the added cost is 10% of total device memory but only 0.1% of |
| a CPU core. At the margin you could save 5% of device memory overhead for the |
| low cost of 0.1% of GC CPU by increasing the tuning knob value by 4x to 0.04. |
| |
| To avoid such large imbalances in overhead of memory compared to CPU, we |
| recommend using the default tuning knob value of 1.0. |
| |
| ## Implications of time-based GC triggering |
| |
| ### Increased allocation rate leads to increased total heap size |
| |
| Previously the total heap size of an application was a function of the |
| application's live heap size. Now the total heap size of an application |
| depends on both live heap size and allocation rate. The higher the allocation |
| rate, the larger the total heap size. |
| |
| ### Applications with higher allocation rates will do more GC |
| |
| As with before, an application will do more GC the more it allocates. However, |
| with time-based GC triggering the increase in GC effort grows sublinearly |
| instead of linearly with the increase of allocation rate. |
| |
| ### Garbage collection can be triggered by the passage of time |
| |
| Previously the GC trigger threshold was evaluated only when a new object was |
| allocated. With time-based GC triggering, there is additional logic to |
| trigger GC if the passage of time causes the threshold to drop below what has |
| already been allocated. This makes it possible for GC to be triggered in |
| applications that have stopped allocating entirely. |
| |
| ### Changes to java.lang.Runtime.totalMemory |
| |
| The value returned by `java.lang.Runtime.totalMemory` now refers to the growth |
| limit for the heap rather than the target footprint. It cannot be used as a |
| good predictor for when the next garbage collection will be triggered, now |
| that the GC trigger depends in a more complicated way on both bytes allocated |
| and time passed since the previous GC. |
| |
| ### Future GC optimizations will result in memory and GC CPU savings |
| |
| With time-based GC triggering, any optimizations that speed up the GC |
| algorithm will result in savings to both GC CPU and memory overhead on device. |
| The GC CPU savings comes from reduced per-GC cost, and the memory savings come |
| from increased rate of GC. |
| |
| Previously, optimizations to the GC algorithm would result in GC CPU savings, |
| but not impact memory overhead. |
| |
| ## Time-based GC triggering in Perfetto traces |
| |
| An easy way to check if time-based GC triggering is enabled via Perfetto trace |
| is to enable the `dalvik` atrace tag when taking the trace and search for |
| slices with `TimeBased` in the name. If any slices with `TimeBased` in the |
| name are found, time-based GC triggering is enabled. If there aren't any |
| slices with `TimeBased` in the name, it is unlikely that time-based GC |
| triggering is enabled. |