Command-line tool for running, creating, and sharing large language models locally using optimized inference.
Category: ml
C++ inference engine for running LLaMA and other large language models locally on CPU/GPU.
High-performance C/C++ inference engine for OpenAI's Whisper speech-to-text model with CPU and GPU acceleration.
C++ library implementing the Open Neural Network Exchange (ONNX) format for ML model serialization and inference.
Array computing framework optimized for Apple silicon, with a NumPy-like Python API.
C++ toolkit for optimizing AI models and deploying them for inference on Intel and other hardware.
C++ library for real-time computer vision, image processing, and machine learning model inference.
CLI tool that identifies which large language models can run on your hardware, with real-world performance benchmarks from the community.
Python machine learning library with dynamic neural networks and GPU acceleration support.
General-purpose speech recognition model for multilingual transcription, translation, and language identification, usable from Python.
C library for large-scale linear classification and regression via support vector machines.