
 # Getting Started with LLMs via ExecuTorch

 This section provides guidance on enabling Large Language Models (LLMs), starting with a simple example and gradually introducing new concepts to improve performance and productivity.

 ## Prerequisites

 - To run this tutorial, you’ll first need to [set up your ExecuTorch environment](../getting-started-setup.md).

 - We strongly recommend reading the [Llama2 README](../../../examples/models/llama2/README.md) in our examples for an end-to-end Llama2 mobile demo.


 ## Simple “Hello World” LLM example

 Let's create a simple LLM app from scratch. TODO

 ## Quantization

 Most LLMs are too large to fit in a mobile phone's memory at full precision, making quantization necessary. In this example, we will demonstrate how to use the XNNPACKQuantizer to quantize the model and run it on a CPU. TODO
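Until the XNNPACKQuantizer walkthrough is filled in, the idea behind quantization can be sketched in plain Python. The snippet below is only an illustration of symmetric per-tensor int8 quantization and of the resulting size reduction. It does not use the XNNPACKQuantizer API, and every function name and the 7-billion-parameter figure are hypothetical choices made for this example.

```python
# Illustrative sketch only: this is NOT the XNNPACKQuantizer API.
# All function names below are hypothetical helpers for this example.

def quantize_symmetric_int8(weights):
    """Map floats to integers in [-127, 127] using one scale for the whole list."""
    max_abs = max(abs(w) for w in weights)
    # Fall back to a scale of 1.0 for an all-zero tensor to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the quantized integers."""
    return [q * scale for q in quantized]


def model_size_gib(num_params, bits_per_param):
    """Storage needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


weights = [0.5, -1.27, 0.0, 1.0]
quantized, scale = quantize_symmetric_int8(weights)
restored = dequantize(quantized, scale)

# Hypothetical model with 7 billion parameters (an assumption for illustration).
fp32_size = model_size_gib(7_000_000_000, 32)
int4_size = model_size_gib(7_000_000_000, 4)
```

Each restored value differs from the original by at most half the scale, which is the accuracy cost paid for the smaller footprint. Storing the same number of parameters at 4 bits instead of 32 bits cuts the weight storage by a factor of eight.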

 ## Use Mobile Acceleration

 One of the benefits of ExecuTorch is the ability to delegate to mobile accelerators. Now, we will show a few examples of how to easily take advantage of mobile accelerators. TODO

 ## Debugging and Profiling

 It is sometimes necessary to profile and inspect the execution process. In this example, we will demonstrate how the ExecuTorch SDK can be used to identify which operations are being executed on which hardware. TODO

 ## How to use custom kernels

 In some cases, it is necessary to write custom kernels or import them from another source in order to achieve the desired performance. In this example, we will demonstrate how to use the `kvcache_with_sdpa` kernel.

 ## How to build Mobile Apps

 Here's how to finally build a mobile app on Android and iOS. TODO

 To build the Android demo app, see [this tutorial](./llama-demo-android.md).