C++ inference engine for running LLaMA and other large language models locally on CPU/GPU.
Category: inference · clear
High-performance C/C++ inference engine for OpenAI's Whisper speech-to-text model with CPU and GPU acceleration.
Category: inference · clear