tesseract

brew install tesseract · v5.5.2 · Apache-2.0

OCR engine that converts scanned images into plain text or searchable PDFs via the command line or a library API.

Why you might care

Tesseract is a widely used open-source OCR engine with trained data for 100+ languages, and it can perform well on both modern and historical documents. Use it to batch-process document archives, build text-extraction pipelines, or embed OCR into your application. It is usable from any programming language through its command line or its C/C++ API, and it can be trained on custom fonts.
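The batch-processing use case can be sketched in Python by shelling out to the `tesseract` command, whose basic form is `tesseract IMAGE OUTPUTBASE -l LANG`. The helper names (`build_ocr_command`, `ocr_directory`) and the set of image extensions below are illustrative assumptions, not part of Tesseract or this formula.

```python
import shutil
import subprocess
from pathlib import Path

# Extensions treated as images here; an illustrative choice, not a Tesseract rule.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def build_ocr_command(image: Path, output_base: Path, lang: str = "eng") -> list[str]:
    """Return the argv for `tesseract IMAGE OUTPUTBASE -l LANG`.

    Tesseract appends the file extension itself, so OUTPUTBASE has no suffix.
    """
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_directory(src: Path, dst: Path, lang: str = "eng") -> list[Path]:
    """OCR every image in `src` and return the text files expected in `dst`."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH; try `brew install tesseract`")
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(src.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files such as notes or PDFs
        output_base = dst / image.stem
        # check=True raises CalledProcessError if tesseract exits non-zero.
        subprocess.run(build_ocr_command(image, output_base, lang), check=True)
        written.append(dst / f"{image.stem}.txt")
    return written
```

A call such as `ocr_directory(Path("scans"), Path("text"))` would write one `.txt` file per image. Appending the `pdf` config name to the argument list should produce a searchable PDF instead of plain text.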

Alternatives

EasyOCR, PaddleOCR, Google Cloud Vision

30-day installs · 41.1k · #154

90-day installs · 115.8k · #167

365-day installs · 679.5k · #99

Runtime dependencies

cairo fontconfig glib harfbuzz icu4c@78 leptonica libarchive pango freetype gettext

Build dependencies

autoconf automake libtool pkgconf

Caveats

This formula contains only the "eng", "osd", and "snum" language data files.
If you need any other supported languages, run `brew install tesseract-lang`.
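To check whether a language needs `tesseract-lang` before running a job, one option is to parse the output of `tesseract --list-langs`. This is a minimal sketch: the function names are hypothetical, and the parser assumes the output is a header line ending in `:` followed by one language code per line.

```python
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Extract language codes from `tesseract --list-langs` output.

    Assumes the first line is a header ending in ':' and every later
    non-empty line is a single language code.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if lines and lines[0].endswith(":"):
        lines = lines[1:]
    return lines


def missing_languages(lang_spec: str, installed: list[str]) -> list[str]:
    """Return the codes in a `-l` spec such as 'eng+deu' that are not installed."""
    return [code for code in lang_spec.split("+") if code not in installed]


def installed_languages() -> list[str]:
    """Ask the local tesseract binary which language data files it can see."""
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True, check=True
    )
    # Some builds print the list to stderr, so fall back to it if stdout is empty.
    return parse_list_langs(result.stdout or result.stderr)
```

With only this formula installed, `missing_languages("eng+deu", installed_languages())` would be expected to report `deu`, which is the cue to run `brew install tesseract-lang`.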

Blurb generated by claude-haiku-4-5 on 2026-06-20.

Raw metadata

{
  "aliases": [],
  "alternatives": [
    "Paddleocr",
    "EasyOCR",
    "PaddleOCR",
    "Google Cloud Vision"
  ],
  "build_dependencies": [
    "autoconf",
    "automake",
    "libtool",
    "pkgconf"
  ],
  "categories": [
    "ocr",
    "image-tool",
    "library"
  ],
  "caveats": "This formula contains only the \"eng\", \"osd\", and \"snum\" language data files.\nIf you need any other supported languages, run `brew install tesseract-lang`.\n",
  "conflicts_with": [],
  "dependencies": [
    "cairo",
    "fontconfig",
    "glib",
    "harfbuzz",
    "icu4c@78",
    "leptonica",
    "libarchive",
    "pango",
    "freetype",
    "gettext"
  ],
  "deprecated": 0,
  "deprecation_reason": null,
  "desc": "OCR (Optical Character Recognition) engine",
  "disable_reason": null,
  "disabled": 0,
  "enrichment_fetched_at": "2026-06-20T23:40:41+00:00",
  "first_seen": "2026-06-20T23:34:18+00:00",
  "full_name": "tesseract",
  "github_default_branch": null,
  "github_last_commit_at": null,
  "github_readme_excerpt": null,
  "github_repo": null,
  "github_stars": null,
  "github_topics": [],
  "homepage": "https://tesseract-ocr.github.io/",
  "homepage_og_description": "Documentation",
  "homepage_og_image": null,
  "homepage_title": "Tesseract documentation | Tesseract OCR",
  "installs_30d": 41101,
  "installs_365d": 679465,
  "installs_90d": 115751,
  "keg_only": 0,
  "keg_only_reason": null,
  "last_seen": "2026-06-20T23:34:18+00:00",
  "license": "Apache-2.0",
  "llm_generated_at": "2026-06-20T23:43:04+00:00",
  "llm_model": "claude-haiku-4-5",
  "name": "tesseract",
  "oldnames": [],
  "one_liner": "OCR engine that converts scanned images and PDFs into searchable text via command-line or library API.",
  "optional_dependencies": [],
  "rank_30d": 154,
  "rank_365d": 99,
  "rank_90d": 167,
  "raw_hash": "30c83d3677dc83a2",
  "recommended_dependencies": [],
  "revision": 0,
  "ruby_source_path": "Formula/t/tesseract.rb",
  "tap": "homebrew/core",
  "test_dependencies": [],
  "uses_from_macos": [],
  "version_head": "HEAD",
  "version_stable": "5.5.2",
  "versioned_formulae": [],
  "why_use_this": "Tesseract is the gold-standard open-source OCR tool, supporting 100+ languages and delivering high accuracy on both modern and historical documents. Use it to batch-process document archives, build text-extraction pipelines, or embed OCR into your application; it\u0027s language-agnostic and trainable on custom fonts."
}