llama.cpp
brew install llama.cpp
v9740
MIT
C++ inference engine for running LLaMA and other large language models locally on CPU/GPU.
Why you might care
Enables fast, efficient LLM inference without cloud APIs—run quantized models directly on your machine with minimal dependencies. Single-binary distribution, supports various hardware accelerators (CUDA, Metal, OpenCL), and includes tools for model conversion and optimization. Popular foundation for local LLM pipelines and embedding services.
39.7k
30-day installs · #160
99.6k
90-day · #181
260.4k
365-day · #242
Runtime dependencies
Build dependencies
Links
- https://llama.app
- Brew formula source: Formula/l/llama.cpp.rb
Blurb generated by claude-haiku-4-5 on today.
Raw metadata
{
"aliases": [],
"alternatives": [
"vLLM",
"ollama",
"text-generation-webui",
"GPT4All"
],
"build_dependencies": [
"cmake"
],
"categories": [
"ml",
"inference",
"compiler",
"library"
],
"caveats": null,
"conflicts_with": [],
"dependencies": [
"ggml",
"openssl@3"
],
"deprecated": 0,
"deprecation_reason": null,
"desc": "LLM inference in C/C++",
"disable_reason": null,
"disabled": 0,
"enrichment_fetched_at": "2026-06-20T23:40:42+00:00",
"first_seen": "2026-06-20T23:34:18+00:00",
"full_name": "llama.cpp",
"github_default_branch": null,
"github_last_commit_at": null,
"github_readme_excerpt": null,
"github_repo": null,
"github_stars": null,
"github_topics": [],
"homepage": "https://llama.app",
"homepage_og_description": "Official website for the llama.cpp project",
"homepage_og_image": "/og-image-llama-cpp.png",
"homepage_title": "llama.app - Official home for llama.cpp",
"installs_30d": 39683,
"installs_365d": 260382,
"installs_90d": 99611,
"keg_only": 0,
"keg_only_reason": null,
"last_seen": "2026-06-20T23:34:18+00:00",
"license": "MIT",
"llm_generated_at": "2026-06-20T23:43:07+00:00",
"llm_model": "claude-haiku-4-5",
"name": "llama.cpp",
"oldnames": [],
"one_liner": "C++ inference engine for running LLaMA and other large language models locally on CPU/GPU.",
"optional_dependencies": [],
"rank_30d": 160,
"rank_365d": 242,
"rank_90d": 181,
"raw_hash": "fd0f0811db93bcf0",
"recommended_dependencies": [],
"revision": 0,
"ruby_source_path": "Formula/l/llama.cpp.rb",
"tap": "homebrew/core",
"test_dependencies": [
"cmake"
],
"uses_from_macos": [],
"version_head": "HEAD",
"version_stable": "9740",
"versioned_formulae": [],
"why_use_this": "Enables fast, efficient LLM inference without cloud APIs\u2014run quantized models directly on your machine with minimal dependencies. Single-binary distribution, supports various hardware accelerators (CUDA, Metal, OpenCL), and includes tools for model conversion and optimization. Popular foundation for local LLM pipelines and embedding services."
}