AIAny - ExecuTorch

Introduction

Most paths from a trained PyTorch model to a phone or microcontroller force a detour through a foreign format — TFLite, ONNX, or a vendor SDK — where operators get reinterpreted and debuggability breaks down. The bet here is different: the exported PyTorch graph itself becomes the on-device artifact, compiled ahead of time and executed by a runtime small enough (~50KB) to live on a microcontroller. One model representation now stretches from server training to an MCU.
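The core idea is a graph serialized once ahead of time and then replayed by a small interpreter that needs no Python model code. A toy version in plain Python makes it concrete. Everything in the sketch is hypothetical and chosen for brevity: the JSON encoding, the export_graph and run_artifact helpers, and the three-entry kernel table. ExecuTorch's real artifact is a .pte file produced by its export tooling and executed by its C++ runtime.

```python
import json

# Hypothetical kernel table standing in for a runtime's operator registry.
KERNELS = {
    "add": lambda a, b: [x + y for x, y in zip(a, b)],
    "mul": lambda a, b: [x * y for x, y in zip(a, b)],
    "relu": lambda a: [max(0.0, x) for x in a],
}


def export_graph(nodes, output):
    """Ahead of time: serialize the graph (a list of op nodes) to bytes."""
    return json.dumps({"nodes": nodes, "output": output}).encode("utf-8")


def run_artifact(artifact, inputs):
    """On device: load the bytes and execute nodes in order via the kernel table."""
    program = json.loads(artifact.decode("utf-8"))
    values = dict(inputs)  # value name -> tensor (plain lists in this toy)
    for node in program["nodes"]:
        kernel = KERNELS[node["op"]]
        values[node["name"]] = kernel(*(values[arg] for arg in node["args"]))
    return values[program["output"]]


# Graph for relu(x * w + b), recorded once at export time.
artifact = export_graph(
    [
        {"name": "t0", "op": "mul", "args": ["x", "w"]},
        {"name": "t1", "op": "add", "args": ["t0", "b"]},
        {"name": "y", "op": "relu", "args": ["t1"]},
    ],
    output="y",
)

print(run_artifact(artifact, {"x": [1.0, -2.0], "w": [3.0, 3.0], "b": [0.5, 0.5]}))
```

The on-device side needs only the kernel table and a loop over nodes. Tracing and graph construction stay on the build machine.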

What Sets It Apart

Single representation end to end: models stay as PyTorch graphs standardized on the Core ATen operator set, so when numerical results drift there is no lossy conversion to a competing format to debug.
Backend delegation, not a monolith: partitioners hand subgraphs to 12+ hardware backends (XNNPACK, Core ML, Vulkan, Qualcomm, MediaTek, OpenVINO, ARM Ethos-U), so the same model targets very different silicon without a rewrite.
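Footprint you can actually fit: a ~50KB base runtime plus selective operator builds keeps the footprint small enough for microcontrollers, where memory is measured in kilobytes, while torchao 8-bit and 4-bit quantization makes it realistic to run larger models such as Llama, Qwen, Whisper, or YOLO on phones and edge devices.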
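Backend delegation can be pictured as a pass that walks the graph. It groups consecutive nodes a backend can execute into subgraphs handed to that backend, and leaves everything else to the runtime's own CPU kernels. The sketch below is a simplified model, not ExecuTorch's partitioner API. The partition function and the SUPPORTED set are hypothetical, and the op names only mimic Core ATen spelling.

```python
from itertools import groupby

# Hypothetical set of ops one backend can execute. Real partitioners decide
# support per node (op, dtype, shapes), not by name alone.
SUPPORTED = {"aten.convolution", "aten.relu", "aten.addmm"}


def partition(ops, supported):
    """Split a linear op sequence into delegated and CPU-fallback segments."""
    segments = []
    # groupby yields maximal runs of consecutive ops with the same support status.
    for is_supported, run in groupby(ops, key=lambda op: op in supported):
        target = "delegate" if is_supported else "cpu_fallback"
        segments.append((target, list(run)))
    return segments


graph = [
    "aten.convolution",
    "aten.relu",
    "aten.topk",
    "aten.addmm",
    "aten.relu",
]

for target, ops in partition(graph, SUPPORTED):
    print(target, ops)
```

Real graphs are DAGs rather than straight lines, and real partitioners check more than op names. The outcome is similar in spirit: a few delegated subgraphs, with fallback to CPU kernels at the boundaries.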
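The memory arithmetic behind quantized deployment is easy to check: raw weight storage is roughly parameter count × bits per weight / 8. The sketch below assumes a hypothetical 1-billion-parameter model and ignores activations, KV cache, and per-group scale overhead. It then shows a plain symmetric per-tensor quantizer. This illustrates the general technique only; it is not torchao's implementation or API.

```python
def weight_bytes(num_params, bits):
    """Raw storage for the weights alone, in bytes."""
    return num_params * bits // 8


def quantize_symmetric(weights, bits):
    """Symmetric per-tensor quantization to signed integers in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    # Fall back to a scale of 1.0 for an all-zero tensor to avoid dividing by zero.
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integers back to approximate float weights."""
    return [v * scale for v in q]


params = 1_000_000_000  # hypothetical 1B-parameter model
for bits in (32, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_bytes(params, bits) / 1e9:.1f} GB")

q, scale = quantize_symmetric([0.7, -0.3, 0.1, 0.0], bits=4)
print(q, [round(v, 3) for v in dequantize(q, scale)])
```

Even at 4 bits, a 1B-parameter model needs about 0.5 GB for its weights alone. This is why LLM-class models target phones, while kilobyte-scale MCUs run much smaller networks.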

Great Fit / Look Elsewhere

Great fit if you already train in PyTorch and need to ship the same model to iOS, Android, and embedded MCUs without maintaining parallel conversion pipelines, or if a tiny C++ runtime and per-backend delegation matter.

Look elsewhere if you only deploy to cloud GPUs (eager PyTorch or a server runtime is simpler), if your target hardware lacks a mature backend here, or if you need a stable, fully settled API: the project is still moving fast, and some backends are more battle-tested than others.

More Items

Triton Inference Server

codex-lb

Y2A-Auto