🍺 BREW Explorer


tesseract

brew install tesseract · v5.5.2 · Apache-2.0

OCR engine that converts scanned images into plain text, hOCR, or searchable PDFs via a command-line tool or library API.

Why you might care

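Tesseract is one of the most widely used open-source OCR engines, with trained models for more than 100 languages. It handles both modern and historical documents, though accuracy depends heavily on scan quality. Use it to batch-process document archives, build text-extraction pipelines, or embed OCR in your own application through its C and C++ API. Languages beyond the ones bundled with this formula are installed as separate trained-data files, and the engine can be trained on custom fonts.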
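A minimal sketch of the batch-processing use case, assuming the `tesseract` binary from this formula is on your PATH. It runs the command-line tool once per image and collects the text it prints. The helper names (`build_tesseract_cmd`, `find_images`, `ocr_folder`) and the set of image extensions are hypothetical choices for illustration, not part of Tesseract. Only image files are picked up, since Tesseract reads images rather than PDFs.

```python
"""Minimal batch-OCR sketch that shells out to the `tesseract` CLI.

Assumes the `tesseract` binary is on PATH (e.g. after `brew install tesseract`).
The helper names below are hypothetical, not part of Tesseract itself.
"""
import subprocess
import sys
from pathlib import Path

# Image extensions to pick up; an assumption, adjust for your archive.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def build_tesseract_cmd(image: Path, lang: str = "eng") -> list[str]:
    """Return the argv that OCRs `image` and prints plain text to stdout."""
    # Using "stdout" as the output base makes tesseract print instead of writing a .txt file.
    return ["tesseract", str(image), "stdout", "-l", lang]


def find_images(folder: Path) -> list[Path]:
    """List image files directly inside `folder`, sorted for stable output."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_folder(folder: Path, lang: str = "eng") -> dict[str, str]:
    """OCR every image in `folder`, returning {filename: extracted text}."""
    results: dict[str, str] = {}
    for image in find_images(folder):
        proc = subprocess.run(
            build_tesseract_cmd(image, lang),
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if tesseract fails
        )
        results[image.name] = proc.stdout
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, text in ocr_folder(Path(sys.argv[1])).items():
        print(f"== {name} ==\n{text}")
```

Run it as `python ocr_folder.py scans/`. Spawning one process per page keeps the sketch simple; for very large archives, linking against the library API avoids reloading the model for every image.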

Categories

ocr · image-tool · library

Alternatives

EasyOCR · PaddleOCR · Google Cloud Vision

Installs: 41.1k (30-day, #154) · 115.8k (90-day, #167) · 679.5k (365-day, #99)

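Runtime dependencies

cairo · fontconfig · glib · harfbuzz · icu4c@78 · leptonica · libarchive · pango · freetype · gettext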

Build dependencies

autoconf · automake · libtool · pkgconf

Links

Homepage: https://tesseract-ocr.github.io/

Caveats

This formula contains only the "eng", "osd", and "snum" language data files.
If you need any other supported languages, run `brew install tesseract-lang`.
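To check which languages an install can actually use before running OCR, you can parse the output of `tesseract --list-langs`. This is a sketch that assumes the output starts with a header line beginning "List of available languages" followed by one language code per line; verify that format against your installed version. `parse_list_langs`, `installed_langs`, and `needs_tesseract_lang` are hypothetical helpers.

```python
"""Sketch: check whether a Tesseract language is installed.

Assumes `tesseract` is on PATH and that `--list-langs` prints a header line
starting with "List of available languages" followed by one code per line.
"""
import shutil
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def installed_langs() -> list[str]:
    """Run `tesseract --list-langs` and return the installed language codes."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract is not on PATH")
    proc = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True, check=True
    )
    # Assumption: some older releases printed the list to stderr instead of stdout.
    return parse_list_langs(proc.stdout or proc.stderr)


def needs_tesseract_lang(lang: str, available: list[str]) -> bool:
    """True if `lang` is missing, i.e. `brew install tesseract-lang` is likely needed."""
    return lang not in available
```

With only this formula installed, the parsed list should contain `eng`, `osd`, and `snum`, so checking for another language such as `deu` returns True and points you to `brew install tesseract-lang`.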

Blurb generated by claude-haiku-4-5 on 2026-06-20.

Raw metadata
{
  "aliases": [],
  "alternatives": [
    "Paddleocr",
    "EasyOCR",
    "PaddleOCR",
    "Google Cloud Vision"
  ],
  "build_dependencies": [
    "autoconf",
    "automake",
    "libtool",
    "pkgconf"
  ],
  "categories": [
    "ocr",
    "image-tool",
    "library"
  ],
  "caveats": "This formula contains only the \"eng\", \"osd\", and \"snum\" language data files.\nIf you need any other supported languages, run `brew install tesseract-lang`.\n",
  "conflicts_with": [],
  "dependencies": [
    "cairo",
    "fontconfig",
    "glib",
    "harfbuzz",
    "icu4c@78",
    "leptonica",
    "libarchive",
    "pango",
    "freetype",
    "gettext"
  ],
  "deprecated": 0,
  "deprecation_reason": null,
  "desc": "OCR (Optical Character Recognition) engine",
  "disable_reason": null,
  "disabled": 0,
  "enrichment_fetched_at": "2026-06-20T23:40:41+00:00",
  "first_seen": "2026-06-20T23:34:18+00:00",
  "full_name": "tesseract",
  "github_default_branch": null,
  "github_last_commit_at": null,
  "github_readme_excerpt": null,
  "github_repo": null,
  "github_stars": null,
  "github_topics": [],
  "homepage": "https://tesseract-ocr.github.io/",
  "homepage_og_description": "Documentation",
  "homepage_og_image": null,
  "homepage_title": "Tesseract documentation | Tesseract OCR",
  "installs_30d": 41101,
  "installs_365d": 679465,
  "installs_90d": 115751,
  "keg_only": 0,
  "keg_only_reason": null,
  "last_seen": "2026-06-20T23:34:18+00:00",
  "license": "Apache-2.0",
  "llm_generated_at": "2026-06-20T23:43:04+00:00",
  "llm_model": "claude-haiku-4-5",
  "name": "tesseract",
  "oldnames": [],
  "one_liner": "OCR engine that converts scanned images and PDFs into searchable text via command-line or library API.",
  "optional_dependencies": [],
  "rank_30d": 154,
  "rank_365d": 99,
  "rank_90d": 167,
  "raw_hash": "30c83d3677dc83a2",
  "recommended_dependencies": [],
  "revision": 0,
  "ruby_source_path": "Formula/t/tesseract.rb",
  "tap": "homebrew/core",
  "test_dependencies": [],
  "uses_from_macos": [],
  "version_head": "HEAD",
  "version_stable": "5.5.2",
  "versioned_formulae": [],
  "why_use_this": "Tesseract is the gold-standard open-source OCR tool, supporting 100+ languages and delivering high accuracy on both modern and historical documents. Use it to batch-process document archives, build text-extraction pipelines, or embed OCR into your application; it\u0027s language-agnostic and trainable on custom fonts."
}