
CLI Tool for Quantize / Compile / Deploy PyTorch Model with QNN Backend

An easy-to-use tool for quantizing, compiling, and executing .pte programs with Qualcomm AI Engine Direct. The tool has been verified in a host environment.

Description

This tool is intended for users who want to deploy models with the ExecuTorch runtime. It lets them produce a .pte program in a few steps.

Quantizing Model

  • Save the torch.nn.Module in .pt2 format and prepare input data

    # create a workspace for the following operations (shell)
    cd path/to/executorch
    mkdir cli_example

    # take SimpleModel as an example (Python, run from path/to/executorch)
    import torch
    from executorch.backends.qualcomm.tests.models import SimpleModel
    from pathlib import Path
    # make example inputs
    example_inputs = (torch.randn(1, 32, 28, 28), torch.randn(1, 32, 28, 28))
    # generate ExportedProgram
    ep = torch.export.export(SimpleModel(), example_inputs)
    # save to workspace
    ws = Path.cwd() / "cli_example"
    torch.export.save(ep, f"{ws}/simple_model.pt2")
    # prepare calibration dataset: 2 sets of data with 2 inputs each
    input_list = ""
    for i in range(2):
        current_input = ""
        for j in range(2):
            file_name = f"{ws}/input_{i}_{j}.pt"
            torch.save(torch.randn(1, 32, 28, 28), file_name)
            current_input += f"{file_name} "
        input_list += f"{current_input.strip()}\n"
    
    with open(f"{ws}/input_list", 'w') as f:
        f.write(input_list)
    
  • Quantize

    # for more options, run: PYTHONPATH=.. python -m examples.qualcomm.util_scripts.cli quantize -h
    PYTHONPATH=.. python -m examples.qualcomm.util_scripts.cli quantize -a cli_example/simple_model.pt2 -o cli_example/quantize_output -c use_8a8w -i cli_example/input_list --per_channel
    
  • Artifacts for quantized .pt2 file

    • cli_example/quantize_output/simple_model_quantized.pt2
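
The calibration list written above has one line per data set, with the input files of that set separated by spaces. The following is a minimal sketch for checking such a list before passing it to `-i`; the helper names `parse_input_list` and `check_input_list` are hypothetical and not part of the CLI.

```python
from pathlib import Path


def parse_input_list(text: str) -> list[list[str]]:
    """Split an input list into data sets; each non-empty line is one set of files."""
    return [line.split() for line in text.splitlines() if line.strip()]


def check_input_list(text: str, num_model_inputs: int) -> list[str]:
    """Return human-readable problems; an empty list means the input list looks consistent."""
    problems = []
    for index, files in enumerate(parse_input_list(text)):
        if len(files) != num_model_inputs:
            problems.append(
                f"data set {index}: expected {num_model_inputs} files, found {len(files)}"
            )
        for name in files:
            if not Path(name).is_file():
                problems.append(f"data set {index}: missing file {name}")
    return problems


# Same layout as the list generated above: 2 data sets with 2 inputs each.
example = "ws/input_0_0.pt ws/input_0_1.pt\nws/input_1_0.pt ws/input_1_1.pt\n"
data_sets = parse_input_list(example)
print(len(data_sets), "data sets,", len(data_sets[0]), "inputs each")
```

For the workspace above, `check_input_list(Path("cli_example/input_list").read_text(), 2)` should return an empty list when every file is in place.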

Compiling Program

  • Compile .pt2 to .pte program
    # run `pip install pydot` if the package is missing
    # for more options, run: PYTHONPATH=.. python -m examples.qualcomm.util_scripts.cli compile -h
    PYTHONPATH=.. python -m examples.qualcomm.util_scripts.cli compile -a cli_example/quantize_output/simple_model_quantized.pt2 -o cli_example/compile_output -m SM8750
    
  • (Optional) Compile pre-generated context binary to .pte program
    # run `pip install pydot` if the package is missing
    # for more options, run: PYTHONPATH=.. python -m examples.qualcomm.util_scripts.cli compile -h
    PYTHONPATH=.. python -m examples.qualcomm.util_scripts.cli compile -a model.bin -o path/to/model/output -m SM8750
    
  • Artifacts: the .pte file and a figure of the graph
    • cli_example/compile_output/simple_model_quantized.pte
    • cli_example/compile_output/simple_model_quantized.svg
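
When the compile step is driven from a script, the command can be assembled as an argument list. This is a minimal sketch that uses only the `-a`, `-o`, and `-m` flags shown above; the helper names are hypothetical, and the artifact naming (input file stem plus `.pte` / `.svg`) is an assumption inferred from the example artifacts listed above, not a documented guarantee.

```python
import shlex
from pathlib import PurePosixPath


def build_compile_command(artifact: str, output_dir: str, soc_model: str) -> list[str]:
    """Assemble the `compile` invocation with the flags used in this README."""
    return [
        "python", "-m", "examples.qualcomm.util_scripts.cli", "compile",
        "-a", artifact,
        "-o", output_dir,
        "-m", soc_model,
    ]


def expected_artifacts(artifact: str, output_dir: str) -> list[str]:
    """Assumption: outputs reuse the input file stem, as in the example above."""
    stem = PurePosixPath(artifact).stem
    return [str(PurePosixPath(output_dir) / f"{stem}{suffix}") for suffix in (".pte", ".svg")]


command = build_compile_command(
    "cli_example/quantize_output/simple_model_quantized.pt2",
    "cli_example/compile_output",
    "SM8750",
)
# To actually run it, pass PYTHONPATH=.. in the environment, e.g.
# subprocess.run(command, env={**os.environ, "PYTHONPATH": ".."}, check=True)
print(shlex.join(command))
print(expected_artifacts(command[5], command[7]))
```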

Executing Program

  • Execute .pte program
    # for more options, run: PYTHONPATH=.. python -m examples.qualcomm.util_scripts.cli execute -h
    PYTHONPATH=.. python -m examples.qualcomm.util_scripts.cli execute -a cli_example/compile_output/simple_model_quantized.pte -o cli_example/execute_output -i cli_example/input_list -s $DEVICE_SERIAL -b build-android -m SM8750
    
  • Artifacts for execution outputs
    • cli_example/execute_output/output_{data_index}_{output_index}.pt
      data_index is the position of the data set in the input list; output_index is the position of the output among the graph outputs.
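
The output naming scheme can be used to regroup results per data set. The following is a minimal sketch under the `output_{data_index}_{output_index}.pt` pattern described above; `group_outputs` is a hypothetical helper, not part of the CLI.

```python
import re
from collections import defaultdict

# Matches file names such as output_0_1.pt (data_index=0, output_index=1).
OUTPUT_NAME = re.compile(r"output_(\d+)_(\d+)\.pt$")


def group_outputs(file_names: list[str]) -> dict[int, list[str]]:
    """Map each data_index to its output files, ordered by output_index."""
    grouped: dict[int, dict[int, str]] = defaultdict(dict)
    for name in file_names:
        match = OUTPUT_NAME.search(name)
        if match is None:
            continue  # ignore files that do not follow the naming scheme
        data_index, output_index = (int(group) for group in match.groups())
        grouped[data_index][output_index] = name
    return {
        data_index: [outputs[key] for key in sorted(outputs)]
        for data_index, outputs in sorted(grouped.items())
    }


listing = ["output_1_0.pt", "output_0_1.pt", "output_0_0.pt", "notes.txt"]
for data_index, files in group_outputs(listing).items():
    print(data_index, files)
```

Each returned file can then be loaded with `torch.load` and compared against a reference output for the same data set.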

Generate ET Record

This section describes how to generate an ET record for a .pte program using the provided script.

  • Run the script to generate the ET record:

    # Example usage to generate ET record and inspect execution statistics
    PYTHONPATH=.. python -m examples.qualcomm.util_scripts.gen_etrecord \
      -b build-android \
      --device $DEVICE_SERIAL \
      --model SM8750
    
  • This script will:

    • Quantize and compile a sample model to generate .pte file.
    • Push the model and input data to the device and execute the program.
    • Retrieve the execution dump from the device and generate an ET record (etrecord.bin).
    • Use the Inspector API to display execution statistics.
  • Artifacts generated:

    • qnn_simple_model.pte: Compiled program.
    • etdump.etdp: Execution dump from device.
    • etrecord.bin: ET record for analysis.
    • Printed statistics table in the console.
  • Refer to the runtime-profiling documentation for more details.
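
Once `etdump.etdp` and `etrecord.bin` exist, the statistics can also be inspected outside the script. This is a minimal sketch, assuming the `Inspector` class from `executorch.devtools` with its `etdump_path` and `etrecord` keyword arguments; the exact calls made by `gen_etrecord.py` may differ, and `show_execution_stats` is a hypothetical helper.

```python
from pathlib import Path


def show_execution_stats(etdump_path: str, etrecord_path: str) -> None:
    """Print the per-op statistics table for an execution dump and its ET record."""
    for path in (etdump_path, etrecord_path):
        if not Path(path).is_file():
            raise FileNotFoundError(path)

    # Imported lazily so the file check above works without ExecuTorch installed.
    # Assumption: Inspector accepts etdump_path= and etrecord= as keyword arguments.
    from executorch.devtools import Inspector

    inspector = Inspector(etdump_path=etdump_path, etrecord=etrecord_path)
    inspector.print_data_tabular()


if __name__ == "__main__":
    try:
        show_execution_stats("etdump.etdp", "etrecord.bin")
    except FileNotFoundError as error:
        print(f"artifact not found: {error}")
```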

Example console output:

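| event_block_name | event_name | raw | p10 (cycles) | p50 (cycles) | p90 (cycles) | avg (cycles) | min (cycles) | max (cycles) | op_types | delegate_debug_identifier | stack_traces | module_hierarchy | is_delegated_op | delegate_backend_name | debug_data | start_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Execute | aten_relu_default_3:OpId_60 (cycles) | [2045.0] | 2045.0 | 2045.0 | 2045.0 | 2045.0 | 2045.0 | 2045.0 | [] | aten_relu_default_3:OpId_60 (cycles) | {} | {} | True | QnnBackend | [] | [0] |
| Execute | aten_add_tensor:OpId_61 (cycles) | [10271.0] | 10271.0 | 10271.0 | 10271.0 | 10271.0 | 10271.0 | 10271.0 | [] | aten_add_tensor:OpId_61 (cycles) | {} | {} | True | QnnBackend | [] | [0] |
| Execute | aten_permute_copy_default_4:OpId_63 (cycles) | [31959.0] | 31959.0 | 31959.0 | 31959.0 | 31959.0 | 31959.0 | 31959.0 | [] | aten_permute_copy_default_4:OpId_63 (cycles) | {} | {} | True | QnnBackend | [] | [0] |
| Execute | aten_mean_dim:OpId_65 (cycles) | [11008.0] | 11008.0 | 11008.0 | 11008.0 | 11008.0 | 11008.0 | 11008.0 | [] | aten_mean_dim:OpId_65 (cycles) | {} | {} | True | QnnBackend | [] | [0] |
| Execute | aten_view_copy_default:OpId_67 (cycles) | [5893.0] | 5893.0 | 5893.0 | 5893.0 | 5893.0 | 5893.0 | 5893.0 | [] | aten_view_copy_default:OpId_67 (cycles) | {} | {} | True | QnnBackend | [] | [0] |
| Execute | aten_linear_default:OpId_70 (cycles) | [0.0] | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | [] | aten_linear_default:OpId_70 (cycles) | {} | {} | True | QnnBackend | [] | [0] |
| Execute | aten_hardtanh_default:OpId_72 (cycles) | [9799.0] | 9799.0 | 9799.0 | 9799.0 | 9799.0 | 9799.0 | 9799.0 | [] | aten_hardtanh_default:OpId_72 (cycles) | {} | {} | True | QnnBackend | [] | [0] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |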
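
To find the most expensive operators, the per-op cycle counts can be ranked by their share of the total. This is a minimal sketch that uses only the rows visible in the excerpt above (the full table is elided, so the shares are relative to these rows only); `rank_by_cycles` is a hypothetical helper.

```python
# (event_name, avg cycles) pairs copied from the visible rows of the example output.
visible_rows = [
    ("aten_relu_default_3:OpId_60", 2045.0),
    ("aten_add_tensor:OpId_61", 10271.0),
    ("aten_permute_copy_default_4:OpId_63", 31959.0),
    ("aten_mean_dim:OpId_65", 11008.0),
    ("aten_view_copy_default:OpId_67", 5893.0),
    ("aten_linear_default:OpId_70", 0.0),
    ("aten_hardtanh_default:OpId_72", 9799.0),
]


def rank_by_cycles(rows: list[tuple[str, float]]) -> list[tuple[str, float, float]]:
    """Sort ops by cycle count (descending) and attach each op's share of the total."""
    total = sum(cycles for _, cycles in rows)
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(name, cycles, cycles / total if total else 0.0) for name, cycles in ranked]


for name, cycles, share in rank_by_cycles(visible_rows):
    print(f"{name:<40} {cycles:>10.1f} {share:6.1%}")
```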