App Code is Memory

The code you write is itself a form of memory use. Every class, method, and string constant in your application must be loaded into RAM when it is executed. The larger your application's code base, the more memory it will consume just to exist.

File-Backed Memory and Demand Paging

Android loads your app's executable code (such as the .so files in your .apk and the .oat files compiled from it) using mmap. This means the code is file-backed.

Crucially, Android uses demand paging. When your app starts, the kernel does not load the entire APK into RAM immediately. Instead, it only maps the file into the process's virtual address space. When your app executes and the CPU jumps to a function whose page is not yet resident, it triggers a “page fault.” The kernel pauses the thread, reads that specific page of code (typically 4KB) from storage into physical RAM, and resumes execution.

A diagram illustrating demand paging, showing virtual pages being mapped to physical RAM pages only when accessed

This means that code you package but never execute does not use physical memory for the code pages themselves. However, unused libraries still increase the overall APK size and can significantly increase the memory used by the system's internal metadata (like DEX indices and class descriptors), which must be read to even know that the code exists. Furthermore, many libraries contain static initializers or are touched by dependency injection frameworks during app startup, causing them to be paged into RAM anyway.
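The mapping behavior described above can be observed from ordinary Java. The sketch below is not how ART loads code; it only illustrates a file-backed mapping using FileChannel.map, which is implemented with mmap on Linux and Android. The class name DemandPagingSketch, its helper methods, and the 4KB PAGE_SIZE constant are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DemandPagingSketch {
    // Assumed page size; the real value depends on the device and kernel.
    static final int PAGE_SIZE = 4096;

    /** Writes a temp file of `pages` pages whose first byte per page is the page index. */
    static Path createSampleFile(int pages) {
        try {
            Path file = Files.createTempFile("paging-sketch", ".bin");
            byte[] data = new byte[pages * PAGE_SIZE];
            for (int p = 0; p < pages; p++) {
                data[p * PAGE_SIZE] = (byte) p;
            }
            Files.write(file, data);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps the whole file, then reads a single byte from one page. */
    static byte touchPage(Path file, int pageIndex) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Mapping only reserves virtual address space for the file.
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // This access is what makes the kernel bring the page into RAM if needed.
            return map.get(pageIndex * PAGE_SIZE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = createSampleFile(16);
        file.toFile().deleteOnExit();
        System.out.println("first byte of page 5: " + touchPage(file, 5));
    }
}
```

Only the page that is actually read needs to become resident; the other fifteen pages of the sample file stay on storage (or in the page cache) until something touches them.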

Page Eviction and Slowdowns

Because file-backed memory can always be re-read from storage, the kernel considers these pages “clean”. When the system experiences memory pressure, the kernel will evict (drop) these clean code pages from RAM to make room for other things.

If your app later needs to execute that code again, the CPU will fault, and the kernel must re-read the page from storage. The more code your app has, the more vulnerable it is to having its code evicted. When a user returns to your bloated app after using other apps, they will experience random jank and slowdowns as the CPU constantly stalls waiting for code to be paged back in from storage.

The Cost of a Page Fault: While it varies heavily based on the device's storage speed (UFS vs. eMMC) and kernel state, a major page fault (reading 4KB from storage) can cost anywhere from 0.5ms to 5ms. If your startup path touches 500 different pages of unoptimized code, you could easily introduce several hundred milliseconds of pure I/O latency to your app startup time.
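The arithmetic behind that estimate can be written out as a small sketch. It assumes, pessimistically, that every touched page costs one separate major fault (no readahead, nothing already in the page cache); the class and method names are illustrative only.

```java
final class PageFaultEstimate {
    /** Total stall in milliseconds if each touched page takes one major fault. */
    static double stallMs(int pagesTouched, double msPerFault) {
        return pagesTouched * msPerFault;
    }

    public static void main(String[] args) {
        int pages = 500; // pages touched on the startup path, from the example above
        System.out.println("fast storage: " + stallMs(pages, 0.5) + " ms");
        System.out.println("slow storage: " + stallMs(pages, 5.0) + " ms");
    }
}
```

With the 0.5ms to 5ms range quoted above, 500 pages works out to between 250ms and 2,500ms of I/O wait under these assumptions.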

Exploring Code Size with Compiler Explorer

To build an intuition for how your Java or Kotlin code translates into native machine code (and thus memory bytes), you can use Compiler Explorer.

Android support is built directly into Godbolt. It allows you to see how different parts of the Android toolchain (D8, R8, and dex2oat) transform your source code.

How to use Compiler Explorer with Android:

  1. Navigate to godbolt.org.
  2. Select Android Java or Android Kotlin from the language dropdown (top left).
  3. In the compiler dropdown (top right of the code pane), you can choose between different tools:
    • d8: Shows the Dalvik bytecode (.dex). This is the closest representation to your original code and is easier to read.
    • r8: Shows how the R8 optimizer shrinks and optimizes your bytecode.
    • dex2oat: Shows the final ARM64 machine code that actually executes on the device. This is where you can see the real memory impact (4 bytes per instruction). dex2oat can target different ISAs, but ARM64 is the most common for mobile phones.
  4. Source<>Output Highlighting: Hovering over a line of code will highlight the corresponding bytecode or machine code instructions, making it easy to trace the impact of specific statements.
  5. Optimization Pipeline: In the disassembly view, you can click Add new... -> Opt Pipeline. This allows you to see the internal steps the compiler takes. You can inspect how the Intermediate Representation (IR) is transformed at each stage (e.g., between the “Inliner (before)” and “Inliner (after)” steps) before it is lowered into the final ARM64 machine code.

A screenshot of the Compiler Explorer UI showing a sample program and its dex2oat output disassembly and optimization pipeline with the inlining step shown

Why this matters for Memory

Every instruction you see in the dex2oat output targeting the ARM64 ISA takes up 4 bytes in your app's executable (.odex or .oat) file.

Try entering code that utilizes different language features and study the compiler’s output:

  • Array access vs. List Iterators:
    • A simple array loop over int[] might compile to ~10 instructions (~40 bytes).
    • A foreach loop over a List implicitly uses an Iterator. This can result in 30-40 instructions (~120-160 bytes) because of the extra method calls (hasNext(), next()) and the allocation of the iterator object itself.
    • R8 Optimization: Under the right conditions (e.g., when the List is proven to be an ArrayList), the R8 optimizer can transform a foreach loop back into a simple indexed loop, eliminating the iterator overhead and reducing both code size and runtime memory churn.
  • Virtual Method Calls: Involve loading the object's class, finding the method in the vtable, and then branching. This usually takes 4-5 instructions (~16-20 bytes).
  • Direct/Static Calls: Often translate to a single bl (Branch with Link) instruction (4 bytes).
  • Kotlin Lambdas: Can generate entire anonymous classes and additional bridge methods, adding hundreds of bytes of code and metadata overhead for a simple functional block.

By using Compiler Explorer, you can see how sophisticated language features (like Kotlin lambdas, stream APIs, or heavy use of generics) impact the final compiled size of your application, and how optimizers such as R8 can counteract the cost of language abstractions in some cases. This tool can help you make informed tradeoffs in designing and implementing an app.
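As a starting point for the array versus List comparison, the following sketch can be pasted into Compiler Explorer. The class and method names are made up for illustration, and the instruction counts you see will depend on the compiler version and flags.

```java
import java.util.List;

final class LoopShapes {
    /** Indexed loop over a primitive array: no iterator, no boxing. */
    static int sumArray(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    /** Foreach over a List: implicitly calls iterator(), hasNext(), next(), and unboxes. */
    static int sumList(List<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumArray(new int[] {1, 2, 3}));
        System.out.println(sumList(List.of(1, 2, 3)));
    }
}
```

Comparing the dex2oat output for sumArray and sumList side by side makes the extra calls in the List version easy to spot.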

Broadly speaking, more complexity in your app's code leads to higher memory use. Conversely, simpler code - or code that is simplified by R8 - results in a smaller representation as CPU instructions and bytes in storage and RAM.
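The difference between virtual and static calls listed earlier can be explored with a sketch like this one. All names here are hypothetical. Two subclasses are used so that the compiler is less likely to resolve the virtual call to a single target, although an optimizer may still do so in some cases.

```java
final class CallShapes {
    /** Base class with an overridable method, so calls through Shape are virtual. */
    abstract static class Shape {
        abstract int area();
    }

    static final class Square extends Shape {
        private final int side;
        Square(int side) { this.side = side; }
        @Override int area() { return side * side; }
    }

    static final class Rect extends Shape {
        private final int width;
        private final int height;
        Rect(int width, int height) { this.width = width; this.height = height; }
        @Override int area() { return width * height; }
    }

    /** Static method: the call target is known at compile time. */
    static int staticArea(int side) {
        return side * side;
    }

    /** Each shape.area() call is dispatched on the runtime class of the object. */
    static int totalArea(Shape[] shapes) {
        int total = 0;
        for (Shape shape : shapes) {
            total += shape.area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(2), new Rect(2, 3) };
        System.out.println(totalArea(shapes));
        System.out.println(staticArea(3));
    }
}
```

In the disassembly, look at the call site inside totalArea and compare it with a call to staticArea.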

Measuring Code Impact with meminfo and showmap

You can use the standard Android memory tools to see how much memory your app's code is consuming.

dumpsys meminfo

When you run adb shell dumpsys meminfo <package>, the Code category in the App Summary section provides a high-level view of code-related memory:

 App Summary
                       Pss(KB)
                        ------
           Java Heap:     3244
         Native Heap:     5412
                Code:    24512  # <--- Sum of .so, .dex, .oat, .art, etc.

showmap

For a more granular view, use showmap. It shows which regions of specific files are mapped into memory:

adb shell 'showmap $(pidof <package>)' | grep -E "\.oat|\.odex|\.dex|\.apk"

You will see entries for your application's compiled code:

   size      RSS      PSS    clean    dirty    clean    dirty     swap  swapPSS object
------- -------- -------- -------- -------- -------- -------- -------- -------- ----------------
  12288     8192     8192     8192        0        0        0        0        0 /data/app/.../base.odex
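The per-file numbers in this output come from the kernel's per-mapping statistics in /proc/<pid>/smaps. As a rough illustration of that aggregation, the sketch below sums the Rss lines for every mapping whose path ends with a given suffix. It parses smaps-style text passed in as a string; the class name, method name, and sample paths are hypothetical, and this is not showmap's actual implementation.

```java
import java.util.regex.Pattern;

final class SmapsRss {
    // A mapping header starts with an address range such as "7f00-7f10 r-xp ...".
    private static final Pattern HEADER = Pattern.compile("^[0-9a-f]+-[0-9a-f]+\\s.*");

    /** Sums the Rss (in kB) of all mappings whose path ends with `suffix`. */
    static long rssKbForSuffix(String smapsText, String suffix) {
        long totalKb = 0;
        String currentPath = "";
        for (String line : smapsText.split("\n")) {
            if (HEADER.matcher(line).matches()) {
                // Fields: range, perms, offset, device, inode, optional path.
                String[] fields = line.trim().split("\\s+", 6);
                currentPath = fields.length == 6 ? fields[5] : "";
            } else if (line.startsWith("Rss:") && currentPath.endsWith(suffix)) {
                // Line looks like "Rss:     8 kB"; the second token is the number.
                totalKb += Long.parseLong(line.trim().split("\\s+")[1]);
            }
        }
        return totalKb;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, shortened to the lines this sketch reads.
        String sample = String.join("\n",
                "7f0000000000-7f0000002000 r-xp 00000000 fd:00 101   /data/app/demo/oat/arm64/base.odex",
                "Rss:                   8 kB",
                "7f0000100000-7f0000104000 r-xp 00000000 fd:00 202   /data/app/demo/lib/arm64/libdemo.so",
                "Rss:                  16 kB");
        System.out.println(rssKbForSuffix(sample, ".odex") + " kB resident from .odex");
    }
}
```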

Dead Code and R8

Because every executed method takes up memory, having a “bloated” app with unnecessary initializations or unused libraries can severely impact startup performance and baseline memory usage.

This is why tools like R8 (the successor to ProGuard) are critical. R8 analyzes your application's bytecode and removes any classes or methods that are never called (“dead code stripping”).

Hands-on Exercise: The Cost of Bloat

To demonstrate the impact of code size, we have created two versions of an application in the samples/CodeBloat/ directory.

The build script generate_code.sh artificially creates 300 Java classes, each with 500 methods containing unique strings.

  • CodeBloat: The standard, unoptimized build containing all generated classes.
  • CodeBloatOptimized: The same source code, but compiled with R8 shrinking enabled.

The MainActivity in both apps attempts to touch all 300 classes on a background thread during startup.

1. Build and Install

m CodeBloat CodeBloatOptimized
adb install -r $OUT/system/app/CodeBloat/CodeBloat.apk
adb install -r $OUT/system/app/CodeBloatOptimized/CodeBloatOptimized.apk

2. Speed Compile

To maximize the file-backed memory impact, we will use the cmd package compile tool to ahead-of-time (AOT) compile the apps into .oat files.

adb shell cmd package compile -m speed -f com.android.codebloat
adb shell cmd package compile -m speed -f com.android.codebloat.optimized

Please note that this is a synthetic example. Typically, apps will use the speed-profile compilation mode (see further below).

3. Launch and Compare

Launch the unoptimized app and check its memory footprint:

adb shell am start -W -n com.android.codebloat/.MainActivity
sleep 5 # Wait for the background thread to load classes
adb shell dumpsys meminfo -s com.android.codebloat

Now do the same for the optimized app:

adb shell am start -W -n com.android.codebloat.optimized/com.android.codebloat.MainActivity
sleep 5
adb shell dumpsys meminfo -s com.android.codebloat.optimized

The Results:

If you look at the Code row in the App Summary section, you will see a massive difference:

  • Unoptimized Code: ~30,000 KB (30 MB)
  • Optimized Code: ~2,000 KB (2 MB)

Because R8 determined that the 500 methods inside those classes were never actually doing anything useful (the doSomething() method only calls method0(), and the results are ignored), it stripped almost all of the artificially generated code out of the final APK.
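To make this concrete, here is a hypothetical sketch of what one generated class might look like, cut down to three methods instead of 500. The exact shape of the real generated sources (static versus instance methods, string contents) is an assumption; only the names doSomething() and method0() come from the description above.

```java
final class GeneratedClassSketch {
    // Each method returns a unique string constant, which adds code and string data.
    static String method0() { return "GeneratedClassSketch.method0 unique payload"; }
    static String method1() { return "GeneratedClassSketch.method1 unique payload"; }
    static String method2() { return "GeneratedClassSketch.method2 unique payload"; }

    /** The only entry point the app touches; the result of method0() is discarded. */
    static void doSomething() {
        method0();
    }

    public static void main(String[] args) {
        doSomething();
        System.out.println("touched GeneratedClassSketch");
    }
}
```

Nothing ever calls method1() or method2(), and the value returned by method0() is never used, so a shrinker is free to remove nearly all of this class.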

4. View Startup in Perfetto

The impact of this code bloat is extremely visible during application startup.

Unoptimized App (mem.rss.file climbs massively):

A screenshot of the Perfetto UI showing the com.android.codebloat process startup with the mem.rss.file track climbing significantly, resulting in a 1.2s startup delay

In the unoptimized app, the mem.rss.file track (representing file-backed memory) increases dramatically during the application startup phase. As the app touches the bloated, artificially generated classes, the operating system is forced to page in large amounts of code from the compiled .oat file on storage. You can visually see this impact in the thread state track for the main thread: the high frequency of yellow slices indicates the thread is frequently blocked and stalling on file I/O while waiting for these code pages to be read. The bottom panel shows a massive delta value, adding up to over 141MB of file-backed memory paged into RAM. This heavy I/O causes the startup to take over 1.2 seconds, resulting in a noticeably sluggish user experience.

Note: the trace screenshots demonstrate a memory trend, but actual magnitudes will vary by device characteristics.

Optimized App (mem.rss.file peaks at a lower value):

A screenshot of the Perfetto UI showing the com.android.codebloat.optimized process startup with a relatively flat mem.rss.file track, taking only 743ms


In the optimized app, R8 has stripped the dead code out of the APK during the build process, leaving far fewer executable pages to read from storage. The mem.rss.file track climbs less (a delta of ~114MB, compared with ~141MB), and the total startup time drops to roughly 743ms. This reduces I/O stalls and leaves more free memory for the rest of the system.

Startup Comparison:

Metric               Unoptimized (CodeBloat)   Optimized (CodeBloatOptimized)
Startup Time         ~1.25 seconds             ~743 ms
mem.rss.file Delta   ~141 MB                   ~114 MB

PerfettoSQL for File-Backed Memory

You can run a query to track the maximum amount of file-backed memory that any codebloat application touched during its execution:

SELECT
  p.name AS process_name,
  max(c.value)/1024.0/1024.0 AS max_rss_file_mb
FROM counter c
JOIN process_counter_track t ON c.track_id = t.id
JOIN process p USING (upid)
WHERE p.name LIKE 'com.android.codebloat%' AND t.name = 'mem.rss.file'
GROUP BY p.name;

ART Compilation Modes and Memory

The Android Runtime (ART) can compile your application code in one of several different modes, also known as compiler filters. The selected compiler filter has a direct impact on your app's memory footprint.

  • verify: ART only performs bytecode verification. No AOT compilation is performed. Code is executed via the Interpreter or compiled at runtime by the JIT compiler.
    • Memory Impact: Smallest on-disk size. Native code memory usage is pushed into the JIT Cache (anonymous dirty memory).
  • speed: ART performs full AOT compilation of all methods.
    • Memory Impact: Largest .odex size. Maximizes file-backed (clean) memory usage.
  • speed-profile: ART only compiles methods that have been marked as “hot” in a startup profile.
    • Memory Impact: Balanced approach. Only the most critical code is AOT compiled.

The most common filter is speed-profile, which is used when installing user apps. This is configured in the system properties pm.dexopt.install and pm.dexopt.bg-dexopt, and is typically set in build/make/target/product/runtime_libart.mk.

Some system apps will use speed compilation, and will also compile at system image build time. verify is typically only used in development use cases.

Use case       Typical compiler filter
Development    verify
System image   speed
User apps      speed-profile

Hands-on Exercise: Compilation Modes and Memory

We can use the CodeBloat app to see how these filters affect memory. To reproduce these measurements:

  1. Force re-compile the app into the target mode.
  2. Force-stop and cold-start the app.
  3. Wait for the background thread to finish touching classes (watch logcat or wait 5s).
  4. Run adb shell dumpsys meminfo com.android.codebloat.

Note: These measurements were taken on a Pixel 10 Pro Fold. Actual numbers will vary by device; these are for scale.

Mode: verify (No AOT)

adb shell cmd package compile -m verify -f com.android.codebloat
adb shell am force-stop com.android.codebloat
adb shell am start -W -n com.android.codebloat/.MainActivity
sleep 5
adb shell dumpsys meminfo com.android.codebloat

In verify mode, the App Summary shows:

  • Code PSS: ~8,000 KB
  • Dalvik Other (JIT): ~25,000 KB

Because no code is compiled AOT, the runtime must JIT-compile hot methods into the JIT Cache, which shows up as dirty anonymous memory (Dalvik Other).

Mode: speed (Full AOT)

adb shell cmd package compile -m speed -f com.android.codebloat
adb shell am force-stop com.android.codebloat
adb shell am start -W -n com.android.codebloat/.MainActivity
sleep 5
adb shell dumpsys meminfo com.android.codebloat

In speed mode, the results shift dramatically:

  • Code PSS: ~24,000 KB
  • Dalvik Other (JIT): ~5,000 KB

The application's code is now mapped from the .odex file as clean file-backed memory. This reduces pressure on the JIT cache and makes the memory eligible for eviction under pressure, rather than being “stuck” as dirty RAM.

Mode: speed-profile (Selective AOT)

Modern apps may bundle a baseline.prof startup profile. ART uses this to selectively compile only the code needed for a fast, memory-efficient startup.

In this exercise we will create a baseline profile to list the app's startup classes. In practice, however, the compiler may also receive profiles from external sources such as the application store (“cloud profiles”), which can provide crowdsourced startup profiles for apps regardless of whether the developer also bundled a baseline profile of their own.

Generating and Using On-device Profiles

To see the impact of speed-profile, you can generate your own profile on-device:

  1. Reset:

    adb shell am force-stop com.android.codebloat
    
  2. Interact: Start the app and let it run its startup sequence.

  3. Dump Profile:

    adb shell 'kill -s SIGUSR1 $(pidof com.android.codebloat)'
    

    (This forces the app to write its current profile to disk).

  4. Install Profile:

    adb shell cp /data/misc/profiles/cur/0/com.android.codebloat/primary.prof \
    /data/misc/profiles/ref/com.android.codebloat/primary.prof
    
  5. Compile:

    adb shell cmd package compile -m speed-profile -f com.android.codebloat
    

When you launch again, you'll see a balance: Code PSS will be lower than speed (e.g., ~16,000 KB) because only the “hot” startup methods were compiled, leaving the rest to be handled by the interpreter or JIT only if they are ever actually used.


Deep Dive into Compiled Code

If you want to see exactly what instructions ART is generating, refer to the Disassembly Guide.

It provides detailed instructions on using:

  • oatdump: To see ARM64 instructions inside an existing .odex file.
  • dex2oat: To simulate compilation with verbose debug flags.

Exercise: Code Inlining

One reason compiled code can grow unexpectedly is method inlining. The compiler may decide to copy the body of a small, frequently called method directly into its callers.

In our CodeBloat app, the doSomething() method in every generated class simply calls method0(). When compiled in speed mode, ART's Optimizing compiler will likely inline method0() into doSomething().

Exercise: Verify this using oatdump on your device:

# 1. Find the path to the application's APK and compiled .odex file
adb shell pm path com.android.codebloat
# Output: package:/data/app/~~.../base.apk

adb shell "dumpsys package com.android.codebloat | grep 'location is' | head -n 1"
# Example output: [location is /data/app/~~.../oat/arm64/base.odex]

# 2. Run oatdump (substituting the correct path to base.odex)
adb shell oatdump --oat-file=/data/app/~~.../oat/arm64/base.odex \
                  --class-filter=com.android.codebloat.GeneratedClass0

Look for the doSomething method in the output. If it was inlined, you will see the instructions to load the long string constant directly within doSomething, rather than a bl instruction targeting method0.

Visualizing the Optimization (CFG)

To see exactly when the compiler decided to inline the method, you can produce a Control Flow Graph (CFG). This shows the state of the code at every stage of the optimization pipeline, with every transformation over the compiler's Intermediate Representation (IR) until the code is lowered to the target ISA (e.g. ARM64).

  1. Run dex2oat with dump flags: Use the --verbose-methods flag to limit the output to specific methods; otherwise, the .cfg file for a large app can grow to several gigabytes.

    # Substitution of actual paths required:
    adb shell dex2oat64 --dex-file=/data/app/~~.../base.apk \
                        --oat-file=/data/local/tmp/dump.odex \
                        --compiler-filter=speed \
                        --dump-cfg=/data/local/tmp/codebloat.cfg \
                        --verbose-methods=doSomething
    
  2. Pull and View: Pull the .cfg file to your workstation and open it with IR Hydra.

  3. Find the Inliner: In IR Hydra, load the compilation artifacts and search for doSomething. Compare the representation before and after the Inliner pass. You will see the graph expand as the instructions from method0 are merged into the caller.

Alternatively, use the Opt Pipeline tool in Compiler Explorer (as described in the section above) and enter similar code to see a similar transformation performed at the Inliner pass.
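A small candidate for that experiment might look like the sketch below. The names are hypothetical and unrelated to the CodeBloat sources. Whether the helper is actually inlined depends on the compiler's heuristics.

```java
final class InlineCandidate {
    /** Small helper: a likely inlining candidate because its body is tiny. */
    private static int square(int x) {
        return x * x;
    }

    /** Caller: after inlining, the multiply may appear directly in this loop body. */
    static int sumOfSquares(int[] values) {
        int total = 0;
        for (int value : values) {
            total += square(value);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(new int[] {1, 2, 3}));
    }
}
```

In the pipeline view, compare sumOfSquares before and after the Inliner pass and check whether the invoke of square has been replaced by its body.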

Exercise: Volatile Fields and Memory Barriers

In the MemoryLab app, the mGarbageSink field is marked as volatile. This ensures that the compiler doesn't optimize away our garbage allocations.

public volatile byte[] mGarbageSink;

In the ARM64 disassembly, you will see that every access to this field is accompanied by a Memory Barrier (dmb ish) or uses Load-Acquire/Store-Release instructions (ldar for loads, stlr for stores). This ensures thread visibility but adds a few extra instructions to every access, increasing the code size slightly compared to a regular field.

Exercise: Find the field accesses and the associated memory barriers in the disassembly.
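If you do not have the MemoryLab disassembly at hand, a minimal comparison can be set up with a sketch like this one. The class and its members are hypothetical; only the idea of a volatile byte[] sink comes from the text above.

```java
final class VolatileSink {
    // Volatile field: stores and loads need ordering guarantees.
    volatile byte[] volatileSink;
    // Plain field: an ordinary store, for comparison.
    byte[] plainSink;

    /** Store to the volatile field; look for a barrier or stlr at this store. */
    void storeVolatile(int size) {
        volatileSink = new byte[size];
    }

    /** Store to the plain field; expect a regular str instruction. */
    void storePlain(int size) {
        plainSink = new byte[size];
    }

    public static void main(String[] args) {
        VolatileSink sink = new VolatileSink();
        sink.storeVolatile(16);
        sink.storePlain(16);
        System.out.println(sink.volatileSink.length + sink.plainSink.length);
    }
}
```

The two store methods are identical apart from the field they write, so any difference in their disassembly is attributable to volatile.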

Exercise: Implicit Suspend Checks

If you disassemble a loop, like the one in generateAllocationChurn, you will notice a curious instruction at the end of the loop body:

ldr x21, [x21]

This is an Implicit Suspend Check. ART uses this to allow the Garbage Collector to pause threads safely. The memory location that x21 points to normally holds its own address, so the load leaves x21 unchanged. When the GC needs to suspend the thread, it “poisons” that memory location. The next time the thread executes that ldr, it will trigger a fault, which the runtime catches and uses to transition the thread into a suspended state.

This pattern is repeated in every loop and at the start of every method, contributing to the total code size of your application.

Exercise: Find all the implicit suspend checks in the method disassembly, and try to correlate them to the original source code.
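For practice, a loop of the following shape is a reasonable stand-in. It is a hypothetical sketch, not the actual generateAllocationChurn method, and the comments only mark where the text above says suspend checks are placed. The optimizer may also simplify or remove the allocation.

```java
final class SuspendCheckSketch {
    /** Allocates short-lived arrays in a loop and returns the total bytes requested. */
    static long churn(int iterations) {
        // Method entry: one suspend check is expected near the start of the method.
        long totalBytes = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] garbage = new byte[64];
            totalBytes += garbage.length;
            // Loop back-edge: another suspend check is expected before jumping back.
        }
        return totalBytes;
    }

    public static void main(String[] args) {
        System.out.println(churn(10));
    }
}
```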


Next: Threads and Memory