# CLI Tool for Compile / Deploy Pre-Built QNN Artifacts

An easy-to-use tool for generating / executing .pte programs from pre-built model libraries / context binaries produced by Qualcomm AI Engine Direct. The tool is verified with this [host environment](../../../../docs/source/build-run-qualcomm-ai-engine-direct-backend.md#host-os).

## Description

This tool is aimed at users who want to leverage the ExecuTorch runtime framework with their existing artifacts generated by QNN. It lets them produce a .pte program in just a few steps.<br/>
If you are interested in well-known applications, [Qualcomm AI HUB](https://aihub.qualcomm.com/) is a great resource that provides plenty of optimized, state-of-the-art models ready for deployment. All of them can be downloaded in model library or context binary format.

* Model libraries (.so) generated by `qnn-model-lib-generator` / AI HUB, and context binaries (.bin) generated by `qnn-context-binary-generator` / AI HUB, can be fed to the tool directly:
  - To produce a .pte program:
    ```bash
    python export.py compile
    ```
  - To perform inference with the generated .pte program:
    ```bash
    python export.py execute
    ```

### Dependencies

* Register for [Qualcomm AI HUB](https://aihub.qualcomm.com/).
* Download the QNN SDK version that your model was compiled with via this [link](https://www.qualcomm.com/developer/software/qualcomm-ai-engine-direct-sdk). At the moment, the link automatically downloads the latest version (users should be able to specify a version soon; please refer to [this guide](../../../../docs/source/build-run-qualcomm-ai-engine-direct-backend.md#software) for earlier releases).

### Target Model

* Consider using a [virtual environment](https://app.aihub.qualcomm.com/docs/hub/getting_started.html) for AI HUB scripts to prevent package conflicts with ExecuTorch. Please finish the [installation section](https://app.aihub.qualcomm.com/docs/hub/getting_started.html#installation) before proceeding with the following steps.
* Taking [QuickSRNetLarge-Quantized](https://aihub.qualcomm.com/models/quicksrnetlarge_quantized?searchTerm=quantized) as an example, please [install](https://huggingface.co/qualcomm/QuickSRNetLarge-Quantized#installation) the package as instructed.
* Create a workspace and export the pre-built model library:
  ```bash
  mkdir $MY_WS && cd $MY_WS
  # target chipset is `SM8650` (Snapdragon 8 Gen 3)
  python -m qai_hub_models.models.quicksrnetlarge_quantized.export --target-runtime qnn --chipset qualcomm-snapdragon-8gen3
  ```
* The compiled model library will be located at `$MY_WS/build/quicksrnetlarge_quantized/quicksrnetlarge_quantized.so`. This model library maps to the artifacts generated by the SDK tools mentioned in the `Integration workflow` section of the [Qualcomm AI Engine Direct documentation](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/overview.html).
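
The `--chipset` flag above selects the target SoC. If you are unsure which chipset identifiers AI HUB accepts, the following is a minimal sketch for listing them, assuming the `qai-hub` client is installed and configured per its getting-started guide (the attribute string format may vary across client versions):

```python
import qai_hub

# Collect chipset attributes from the devices registered on AI HUB.
# Attributes are assumed to be strings like "chipset:qualcomm-snapdragon-8gen3".
chipsets = set()
for device in qai_hub.get_devices():
    for attr in device.attributes:
        if attr.startswith("chipset:"):
            chipsets.add(attr.split(":", 1)[1])

for name in sorted(chipsets):
    print(name)
```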

### Compiling Program

* Compile the .pte program:
  ```bash
  # `pip install pydot` if the package is missing
  # Note that device serial & hostname might not be required if the given artifact is in context binary format
  PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py compile -a $MY_WS/build/quicksrnetlarge_quantized/quicksrnetlarge_quantized.so -m SM8650 -s $DEVICE_SERIAL -b $EXECUTORCH_ROOT/build-android
  ```
* Artifacts for checking IO information (see the sketch below for programmatic inspection):
  - `output_pte/quicksrnetlarge_quantized/quicksrnetlarge_quantized.json`
  - `output_pte/quicksrnetlarge_quantized/quicksrnetlarge_quantized.svg`
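
A minimal sketch for inspecting the IO information programmatically; it only assumes the JSON file is valid, since the exact schema depends on the tool version:

```python
import json

# Pretty-print the IO information emitted alongside the .pte program.
# Adjust the path to match your output directory.
with open("output_pte/quicksrnetlarge_quantized/quicksrnetlarge_quantized.json") as f:
    io_info = json.load(f)

print(json.dumps(io_info, indent=2))
```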

### Executing Program

* Prepare the test image:
  ```bash
  cd $MY_WS
  wget https://user-images.githubusercontent.com/12981474/40157448-eff91f06-5953-11e8-9a37-f6b5693fa03f.png -O baboon.png
  ```
  Execute the following Python script to generate the input data:
  ```python
  import torch
  import torchvision.transforms as transforms
  from PIL import Image
  img = Image.open('baboon.png').resize((128, 128))
  transform = transforms.Compose([transforms.PILToTensor()])
  # convert (C, H, W) to (N, H, W, C)
  # IO tensor info can be checked with quicksrnetlarge_quantized.json | .svg
  img = transform(img).permute(1, 2, 0).unsqueeze(0)
  torch.save(img, 'baboon.pt')
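  # sanity check: PILToTensor yields uint8, so for an RGB input the saved
  # tensor should be torch.Size([1, 128, 128, 3]) with dtype torch.uint8
  print(img.shape, img.dtype)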
  ```
* Execute the .pte program:
  ```bash
  PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py execute -p output_pte/quicksrnetlarge_quantized -i baboon.pt -s $DEVICE_SERIAL -b $EXECUTORCH_ROOT/build-android
  ```
* Post-process the generated data:
  ```bash
  cd output_data
  ```
  Execute the following Python script to generate the output image:
  ```python
  import io
  import torch
  import torchvision.transforms as transforms
  # IO tensor info can be checked with quicksrnetlarge_quantized.json | .svg
  # generally the input / output tensors share the same layout: e.g. either NHWC or NCHW
  # this might not be true under different converter configurations
  # learn more with the converter tool from the Qualcomm AI Engine Direct documentation:
  # https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/tools.html#model-conversion
  with open('output__142.pt', 'rb') as f:
      buffer = io.BytesIO(f.read())
  img = torch.load(buffer, weights_only=False)
  transform = transforms.Compose([transforms.ToPILImage()])
  # assuming an NHWC output like the input; permute to (C, H, W) as ToPILImage expects
  img_pil = transform(img.squeeze(0).permute(2, 0, 1))
  img_pil.save('baboon_upscaled.png')
  ```
  You can check the upscaled result now!
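
If the upscaled image looks scrambled, the output layout probably differs from the input. Here is a quick sketch, assuming only that the output is a 4D tensor, to tell NHWC from NCHW before converting:

```python
import torch

# Inspect the raw output tensor: a trailing dimension of 1/3/4 usually
# indicates NHWC, while a second dimension of 1/3/4 indicates NCHW.
out = torch.load('output__142.pt', weights_only=False)
print(out.shape, out.dtype)
layout = 'NHWC' if out.shape[-1] in (1, 3, 4) else 'NCHW'
print(f'likely layout: {layout}')
```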

## Help

Please check the help messages for more information:
```bash
PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py -h
PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py compile -h
PYTHONPATH=$EXECUTORCH_ROOT/.. python $EXECUTORCH_ROOT/examples/qualcomm/qaihub_scripts/utils/export.py execute -h
```