🍺 BREW Explorer

← all formulae

llama.cpp

brew install llama.cpp v9740 MIT

C++ inference engine for running LLaMA and other large language models locally on CPU/GPU.

Why you might care

Enables fast, efficient LLM inference without cloud APIs—run quantized models directly on your machine with minimal dependencies. Single-binary distribution, supports various hardware accelerators (CUDA, Metal, OpenCL), and includes tools for model conversion and optimization. Popular foundation for local LLM pipelines and embedding services.

Categories

Alternatives

vLLM ollama text-generation-webui GPT4All
39.7k
30-day installs · #160
99.6k
90-day · #181
260.4k
365-day · #242

Runtime dependencies

Build dependencies

Links

Blurb generated by claude-haiku-4-5 on today.

Raw metadata
{
  "aliases": [],
  "alternatives": [
    "vLLM",
    "ollama",
    "text-generation-webui",
    "GPT4All"
  ],
  "build_dependencies": [
    "cmake"
  ],
  "categories": [
    "ml",
    "inference",
    "compiler",
    "library"
  ],
  "caveats": null,
  "conflicts_with": [],
  "dependencies": [
    "ggml",
    "openssl@3"
  ],
  "deprecated": 0,
  "deprecation_reason": null,
  "desc": "LLM inference in C/C++",
  "disable_reason": null,
  "disabled": 0,
  "enrichment_fetched_at": "2026-06-20T23:40:42+00:00",
  "first_seen": "2026-06-20T23:34:18+00:00",
  "full_name": "llama.cpp",
  "github_default_branch": null,
  "github_last_commit_at": null,
  "github_readme_excerpt": null,
  "github_repo": null,
  "github_stars": null,
  "github_topics": [],
  "homepage": "https://llama.app",
  "homepage_og_description": "Official website for the llama.cpp project",
  "homepage_og_image": "/og-image-llama-cpp.png",
  "homepage_title": "llama.app - Official home for llama.cpp",
  "installs_30d": 39683,
  "installs_365d": 260382,
  "installs_90d": 99611,
  "keg_only": 0,
  "keg_only_reason": null,
  "last_seen": "2026-06-20T23:34:18+00:00",
  "license": "MIT",
  "llm_generated_at": "2026-06-20T23:43:07+00:00",
  "llm_model": "claude-haiku-4-5",
  "name": "llama.cpp",
  "oldnames": [],
  "one_liner": "C++ inference engine for running LLaMA and other large language models locally on CPU/GPU.",
  "optional_dependencies": [],
  "rank_30d": 160,
  "rank_365d": 242,
  "rank_90d": 181,
  "raw_hash": "fd0f0811db93bcf0",
  "recommended_dependencies": [],
  "revision": 0,
  "ruby_source_path": "Formula/l/llama.cpp.rb",
  "tap": "homebrew/core",
  "test_dependencies": [
    "cmake"
  ],
  "uses_from_macos": [],
  "version_head": "HEAD",
  "version_stable": "9740",
  "versioned_formulae": [],
  "why_use_this": "Enables fast, efficient LLM inference without cloud APIs\u2014run quantized models directly on your machine with minimal dependencies. Single-binary distribution, supports various hardware accelerators (CUDA, Metal, OpenCL), and includes tools for model conversion and optimization. Popular foundation for local LLM pipelines and embedding services."
}