February 28, 2024
New Paper: Creating Edge AI from Cloud-based LLMs
Cyber-human and cyber-physical systems have tight end-to-end latency bounds, typically on the order of a few tens of milliseconds. In contrast, cloud-based large language models (LLMs) have end-to-end latencies that are two to three orders of magnitude larger. This paper shows how to bridge this large gap by using LLMs as offline compilers that create task-specific code, eliminating LLM accesses at runtime. We provide three case studies as proofs of concept, and discuss the challenges in generalizing this technique to broader uses.
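The "LLM as offline compiler" pattern can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the LLM call is mocked with a hypothetical `llm_generate_code` function whose canned output stands in for code that would, in practice, be generated once by a cloud LLM and then deployed to the edge device.

```python
def llm_generate_code(task_description: str) -> str:
    """Placeholder for an offline cloud-LLM call that emits task code.

    In a real deployment this would send task_description to a cloud LLM
    and return the generated source. Here it returns a canned example.
    """
    return (
        "def classify(reading):\n"
        "    # Task-specific logic produced offline; no LLM access needed.\n"
        "    return 'alert' if reading > 0.8 else 'ok'\n"
    )

def compile_task(task_description: str):
    """Offline step: turn the LLM's generated source into a callable."""
    namespace = {}
    exec(llm_generate_code(task_description), namespace)
    return namespace["classify"]

# Online step: the edge device calls the compiled function directly,
# so each inference avoids the cloud round trip entirely.
classify = compile_task("flag sensor readings above a safety threshold")
print(classify(0.9))  # -> alert
print(classify(0.1))  # -> ok
```

The key point is the split in time: the expensive, high-latency LLM interaction happens once, offline, while the latency-critical path at the edge executes only the ordinary generated code.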
Dong, Qifei, Xiangliang Chen, and Mahadev Satyanarayanan. "Creating Edge AI from Cloud-based LLMs." Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications. 2024.