Summary

This example demonstrates how to run the Phi-3-mini 3.8B model with ExecuTorch. We use XNNPACK to accelerate performance, together with XNNPACK symmetric per-channel quantization.
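
Under the hood, the export script follows the standard pt2e quantization recipe. The sketch below illustrates that recipe on a toy model; it is a minimal illustration, not the exact contents of export_phi-3-mini.py, and the Linear stand-in model and calibration input are placeholders.

import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

# Toy stand-in for Phi-3-mini; the real script loads the HF checkpoint.
model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval()
example_inputs = (torch.randn(1, 64),)

# Capture the graph for quantization (older PyTorch releases used
# capture_pre_autograd_graph instead).
captured = torch.export.export_for_training(model, example_inputs).module()

# XNNPACK symmetric per-channel quantization, as named in the summary.
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config(is_per_channel=True))

prepared = prepare_pt2e(captured, quantizer)
prepared(*example_inputs)           # calibration pass
quantized = convert_pt2e(prepared)  # ready for to_edge + XNNPACK partitioning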

Instructions

Step 1: Setup

  1. Follow the tutorial to set up ExecuTorch. For installation, run ./install_requirements.sh --pybind xnnpack
  2. To export Phi-3-mini, we need this PR. Install transformers from source with the following command:
pip uninstall -y transformers ; pip install git+https://github.com/huggingface/transformers

Step 2: Prepare and run the model

  1. Download tokenizer.model from Hugging Face and create tokenizer.bin (an optional tokenizer sanity check is sketched after this list).
cd executorch
wget -O tokenizer.model "https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/tokenizer.model?download=true"
python -m extension.llm.tokenizer.tokenizer -t tokenizer.model -o tokenizer.bin
  2. Export the model. This step will take a few minutes to finish (an optional smoke test of the resulting .pte file is sketched after this list).
python -m examples.models.phi-3-mini.export_phi-3-mini -c "4k" -s 128 -o phi-3-mini.pte
  3. Build and run the model.
  • Build ExecuTorch with optimized CPU performance as follows. Build options are available here.
cmake -DPYTHON_EXECUTABLE=python \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DEXECUTORCH_ENABLE_LOGGING=1 \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -Bcmake-out .

cmake --build cmake-out -j16 --target install --config Release
  • Build Phi-3-mini runner.
cmake -DPYTHON_EXECUTABLE=python \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -Bcmake-out/examples/models/phi-3-mini \
    examples/models/phi-3-mini

cmake --build cmake-out/examples/models/phi-3-mini -j16 --config Release
  • Run the model. Options are available here.
cmake-out/examples/models/phi-3-mini/phi_3_mini_runner --model_path=<model pte file> --tokenizer_path=<tokenizer.bin> --seq_len=128 --prompt=<prompt>
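
As a quick sanity check on the tokenizer artifacts from step 1, you can round-trip a prompt through the original tokenizer.model with the sentencepiece package. This is an optional check, not part of the build:

import sentencepiece as spm

# Round-trip a prompt through the SentencePiece model from step 1.
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
ids = sp.encode("Tell me a short story.")
print(ids)             # token ids of the prompt
print(sp.decode(ids))  # should reproduce the prompt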
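
To smoke-test the exported phi-3-mini.pte from step 2 without building the C++ runner, you can load it with the ExecuTorch pybindings installed in step 1. The input shape below is an assumption: we assume the exported method takes a single (1, N) int64 tensor of token ids; check export_phi-3-mini.py for the actual signature.

import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

module = _load_for_executorch("phi-3-mini.pte")

# Assumption: forward takes one (1, N) int64 tensor of token ids.
tokens = torch.tensor([[1]], dtype=torch.long)  # dummy single-token input
logits = module.forward([tokens])[0]
print(logits.shape)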