This section provides guidance on enabling Large Language Models (LLMs), starting with a simple example and gradually introducing new concepts to improve performance and productivity.
To run this tutorial, you’ll first need to set up your ExecuTorch environment.
We highly recommend checking out the Llama2 README in our examples for an end-to-end Llama2 mobile demo.
Let's create a simple LLM app from scratch. TODO
Most LLMs are too large to fit into a mobile phone, making quantization necessary. In this example, we will demonstrate how to use the XNNPACKQuantizer to quantize the model and run it on a CPU. TODO
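To make the size argument concrete, here is a minimal plain-Python sketch of symmetric per-tensor int8 quantization, together with a back-of-the-envelope size estimate. It is not the XNNPACKQuantizer API and does not use ExecuTorch at all. The helper names (`quantize_int8`, `dequantize`, `model_size_gb`) and the sample weights are made up for illustration, and the 7-billion-parameter figure is just an example model size.

```python
# Illustrative only: plain-Python symmetric int8 quantization.
# This is NOT the XNNPACKQuantizer API; all names here are hypothetical.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [v * scale for v in q]

def model_size_gb(num_params, bits_per_param):
    """Rough weight storage in gigabytes (10^9 bytes), ignoring metadata."""
    return num_params * bits_per_param / 8 / 1e9

weights = [0.5, -1.27, 0.02, 1.0]        # made-up sample weights
q, scale = quantize_int8(weights)        # q == [50, -127, 2, 100]
restored = dequantize(q, scale)          # close to the original weights

fp32_gb = model_size_gb(7e9, 32)         # 28.0 GB for 7B parameters in fp32
int4_gb = model_size_gb(7e9, 4)          # 3.5 GB at 4 bits per parameter
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy traded for the smaller footprint. A real quantizer also handles activations, per-channel scales, and calibration, none of which is shown here.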
One of the benefits of ExecuTorch is the ability to delegate to mobile accelerators. Now, we will show a few examples of how to easily take advantage of mobile accelerators. TODO
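Conceptually, delegation means splitting the model into parts that a backend supports and parts that fall back to the CPU. The toy sketch below illustrates that idea on a flat list of operator names. It is not an ExecuTorch API: the `partition` helper, the supported-operator set, and the operator sequence are all hypothetical, and real partitioners work on graphs rather than lists.

```python
# Toy illustration of delegation; not an ExecuTorch API.
# The supported-op set and op sequence below are hypothetical.
from itertools import groupby

SUPPORTED_BY_ACCELERATOR = {"linear", "matmul", "add", "softmax"}

def partition(ops, supported):
    """Group consecutive ops by whether the backend supports them."""
    segments = []
    for is_supported, group in groupby(ops, key=lambda op: op in supported):
        target = "accelerator" if is_supported else "cpu"
        segments.append((target, list(group)))
    return segments

ops = ["embedding", "linear", "matmul", "softmax", "topk", "linear"]
plan = partition(ops, SUPPORTED_BY_ACCELERATOR)

for target, segment in plan:
    print(f"{target}: {segment}")
```

Fewer, larger accelerator segments generally mean fewer hand-offs between the CPU and the accelerator, which is why operator coverage of a backend matters.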
It is sometimes necessary to profile and inspect the execution process. In this example, we will demonstrate how the ExecuTorch SDK can be used to identify which operations are being executed on which hardware. TODO
In some cases, it is necessary to write custom kernels or import them from another source in order to achieve the desired performance. In this example, we will demonstrate how to use the kvcache_with_sdpa kernel.
Finally, here's how to build a mobile app for Android and iOS. TODO
To build the Android demo app, please see this tutorial.