# CLI Tool for Compile / Deploy Pre-Built QNN Artifacts

An easy-to-use tool for generating / executing .pte programs from pre-built model libraries / context binaries produced by Qualcomm AI Engine Direct. The tool is verified with this [host environment](../../../../docs/source/build-run-qualcomm-ai-engine-direct-backend.md#host-os).

## Description

This tool is aimed at users who want to leverage the ExecuTorch runtime framework with their existing artifacts generated by QNN. It lets them produce a .pte program in just a few steps.<br/>
If you are interested in well-known applications, [Qualcomm AI HUB](https://aihub.qualcomm.com/) is a great resource that provides plenty of optimized, state-of-the-art models ready for deployment. All of them can be downloaded in model library or context binary format.

* Model libraries (.so) generated by `qnn-model-lib-generator` / AI HUB, and context binaries (.bin) generated by `qnn-context-binary-generator` / AI HUB, can be fed to the tool directly:
  - To produce a .pte program:
    ```bash
    python export.py compile
    ```
  - To perform inference with the generated .pte program:
    ```bash
    python export.py execute
    ```

### Dependencies

* Register for [Qualcomm AI HUB](https://aihub.qualcomm.com/).
* Download the QNN SDK version that your model was compiled with via this [link](https://www.qualcomm.com/developer/software/qualcomm-ai-engine-direct-sdk). At the moment, the link automatically downloads the latest version (users should be able to specify a version soon; please refer to [this guide](../../../../docs/source/build-run-qualcomm-ai-engine-direct-backend.md#software) for earlier releases).

### Target Model

* Consider using a [virtual environment](https://app.aihub.qualcomm.com/docs/hub/getting_started.html) for AI HUB scripts to prevent package conflicts with ExecuTorch. Please finish the [installation section](https://app.aihub.qualcomm.com/docs/hub/getting_started.html#installation) before proceeding with the following steps.
* Taking [QuickSRNetLarge-Quantized](https://aihub.qualcomm.com/models/quicksrnetlarge_quantized?searchTerm=quantized) as an example, please [install](https://huggingface.co/qualcomm/QuickSRNetLarge-Quantized#installation) the package as instructed.
* Create a workspace and export the pre-built model library:
  ```bash
  mkdir $MY_WS && cd $MY_WS
  # target chipset is `SM8650` (Snapdragon 8 Gen 3)
  python -m qai_hub_models.models.quicksrnetlarge_quantized.export --target-runtime qnn --chipset qualcomm-snapdragon-8gen3
  ```
* The compiled model library will be located at `$MY_WS/build/quicksrnetlarge_quantized/quicksrnetlarge_quantized.so`. This model library maps to the artifacts generated by the SDK tools mentioned in the `Integration workflow` section of the [Qualcomm AI Engine Direct documentation](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/overview.html).
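
The `--chipset` flag above selects the target SoC. If you are unsure which chipset identifiers AI HUB accepts, the following is a minimal sketch for listing them, assuming the `qai-hub` client is installed and configured per its getting-started guide (the attribute string format may vary across client versions):

```python
import qai_hub

# Collect chipset attributes from the devices registered on AI HUB.
# Attributes are assumed to be strings like "chipset:qualcomm-snapdragon-8gen3".
chipsets = set()
for device in qai_hub.get_devices():
    for attr in device.attributes:
        if attr.startswith("chipset:"):
            chipsets.add(attr.split(":", 1)[1])

for name in sorted(chipsets):
    print(name)
```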

### Compiling Program

* Compile the .pte program:
  ```bash
  # `pip install pydot` if the package is missing
  # Note that device serial & hostname might not be required if the given artifact is in context binary format
  PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py compile -a $MY_WS/build/quicksrnetlarge_quantized/quicksrnetlarge_quantized.so -m SM8650 -s $DEVICE_SERIAL -b $EXECUTORCH_ROOT/build-android
  ```
* Artifacts for checking IO information (see the sketch below for programmatic inspection):
  - `output_pte/quicksrnetlarge_quantized/quicksrnetlarge_quantized.json`
  - `output_pte/quicksrnetlarge_quantized/quicksrnetlarge_quantized.svg`
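
A minimal sketch for inspecting the IO information programmatically; it only assumes the JSON file is valid, since the exact schema depends on the tool version:

```python
import json

# Pretty-print the IO information emitted alongside the .pte program.
# Adjust the path to match your output directory.
with open("output_pte/quicksrnetlarge_quantized/quicksrnetlarge_quantized.json") as f:
    io_info = json.load(f)

print(json.dumps(io_info, indent=2))
```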

### Executing Program

* Prepare the test image:
  ```bash
  cd $MY_WS
  wget https://user-images.githubusercontent.com/12981474/40157448-eff91f06-5953-11e8-9a37-f6b5693fa03f.png -O baboon.png
  ```
  Execute the following Python script to generate the input data:
  ```python
  import torch
  import torchvision.transforms as transforms
  from PIL import Image
  img = Image.open('baboon.png').resize((128, 128))
  transform = transforms.Compose([transforms.PILToTensor()])
  # convert (C, H, W) to (N, H, W, C)
  # IO tensor info can be checked with quicksrnetlarge_quantized.json | .svg
  img = transform(img).permute(1, 2, 0).unsqueeze(0)
  torch.save(img, 'baboon.pt')
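  # sanity check: PILToTensor yields uint8, so for an RGB input the saved
  # tensor should be torch.Size([1, 128, 128, 3]) with dtype torch.uint8
  print(img.shape, img.dtype)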
  ```
* Execute the .pte program:
  ```bash
  PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py execute -p output_pte/quicksrnetlarge_quantized -i baboon.pt -s $DEVICE_SERIAL -b $EXECUTORCH_ROOT/build-android
  ```
* Post-process the generated data:
  ```bash
  cd output_data
  ```
  Execute the following Python script to generate the output image:
  ```python
  import io
  import torch
  import torchvision.transforms as transforms
  # IO tensor info can be checked with quicksrnetlarge_quantized.json | .svg
  # generally the input / output tensors share the same layout: e.g. either NHWC or NCHW
  # this might not be true under different converter configurations
  # learn more with the converter tool from the Qualcomm AI Engine Direct documentation:
  # https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/tools.html#model-conversion
  with open('output__142.pt', 'rb') as f:
      buffer = io.BytesIO(f.read())
  img = torch.load(buffer, weights_only=False)
  transform = transforms.Compose([transforms.ToPILImage()])
  # assuming an NHWC output like the input; permute to (C, H, W) as ToPILImage expects
  img_pil = transform(img.squeeze(0).permute(2, 0, 1))
  img_pil.save('baboon_upscaled.png')
  ```
  You can check the upscaled result now!
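
If the upscaled image looks scrambled, the output layout probably differs from the input. Here is a quick sketch, assuming only that the output is a 4D tensor, to tell NHWC from NCHW before converting:

```python
import torch

# Inspect the raw output tensor: a trailing dimension of 1/3/4 usually
# indicates NHWC, while a second dimension of 1/3/4 indicates NCHW.
out = torch.load('output__142.pt', weights_only=False)
print(out.shape, out.dtype)
layout = 'NHWC' if out.shape[-1] in (1, 3, 4) else 'NCHW'
print(f'likely layout: {layout}')
```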

## Help

Please check the help messages for more information:
```bash
PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py -h
PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py compile -h
PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py execute -h
```