
llama.cpp

brew install llama.cpp

Version 9740 · MIT license

C++ inference engine for running LLaMA and other large language models locally on CPU/GPU.

Why you might care

Enables fast, efficient LLM inference without cloud APIs: run quantized models directly on your machine with few runtime dependencies. Ships a set of command-line tools built on the ggml tensor library (a separate runtime dependency of this formula), supports several hardware backends (e.g. CUDA, Metal, Vulkan, OpenCL), and includes tools for model conversion and quantization. A popular foundation for local LLM pipelines and embedding services.

Categories

ml, inference, compiler, library

Alternatives

vLLM, ollama, text-generation-webui, GPT4All

Installs

30-day: 39.7k (rank #160)
90-day: 99.6k (rank #181)
365-day: 260.4k (rank #242)

Runtime dependencies

ggml, openssl@3

Build dependencies

cmake

Links

Homepage: https://llama.app
Brew formula source: Formula/l/llama.cpp.rb

Blurb generated by claude-haiku-4-5 on 2026-06-20.

Raw metadata

{
  "aliases": [],
  "alternatives": [
    "vLLM",
    "ollama",
    "text-generation-webui",
    "GPT4All"
  ],
  "build_dependencies": [
    "cmake"
  ],
  "categories": [
    "ml",
    "inference",
    "compiler",
    "library"
  ],
  "caveats": null,
  "conflicts_with": [],
  "dependencies": [
    "ggml",
    "openssl@3"
  ],
  "deprecated": 0,
  "deprecation_reason": null,
  "desc": "LLM inference in C/C++",
  "disable_reason": null,
  "disabled": 0,
  "enrichment_fetched_at": "2026-06-20T23:40:42+00:00",
  "first_seen": "2026-06-20T23:34:18+00:00",
  "full_name": "llama.cpp",
  "github_default_branch": null,
  "github_last_commit_at": null,
  "github_readme_excerpt": null,
  "github_repo": null,
  "github_stars": null,
  "github_topics": [],
  "homepage": "https://llama.app",
  "homepage_og_description": "Official website for the llama.cpp project",
  "homepage_og_image": "/og-image-llama-cpp.png",
  "homepage_title": "llama.app - Official home for llama.cpp",
  "installs_30d": 39683,
  "installs_365d": 260382,
  "installs_90d": 99611,
  "keg_only": 0,
  "keg_only_reason": null,
  "last_seen": "2026-06-20T23:34:18+00:00",
  "license": "MIT",
  "llm_generated_at": "2026-06-20T23:43:07+00:00",
  "llm_model": "claude-haiku-4-5",
  "name": "llama.cpp",
  "oldnames": [],
  "one_liner": "C++ inference engine for running LLaMA and other large language models locally on CPU/GPU.",
  "optional_dependencies": [],
  "rank_30d": 160,
  "rank_365d": 242,
  "rank_90d": 181,
  "raw_hash": "fd0f0811db93bcf0",
  "recommended_dependencies": [],
  "revision": 0,
  "ruby_source_path": "Formula/l/llama.cpp.rb",
  "tap": "homebrew/core",
  "test_dependencies": [
    "cmake"
  ],
  "uses_from_macos": [],
  "version_head": "HEAD",
  "version_stable": "9740",
  "versioned_formulae": [],
  "why_use_this": "Enables fast, efficient LLM inference without cloud APIs\u2014run quantized models directly on your machine with minimal dependencies. Single-binary distribution, supports various hardware accelerators (CUDA, Metal, OpenCL), and includes tools for model conversion and optimization. Popular foundation for local LLM pipelines and embedding services."
}