tesseract
brew install tesseract
v5.5.2
Apache-2.0
OCR engine that converts scanned images into text or searchable PDFs via the command line or a library API.
Why you might care
Tesseract is a widely used open-source OCR engine with trained data for 100+ languages. Use it to batch-process document archives, build text-extraction pipelines, or embed OCR into your application through its C/C++ API or a third-party wrapper; it can also be trained on custom fonts. Accuracy depends heavily on scan quality, so degraded or historical documents may need image preprocessing or a custom-trained model.
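As a rough illustration of the batch-processing use case, the sketch below shells out to the `tesseract` command line for every PNG in a folder. The helper names `build_ocr_command` and `ocr_directory` and the folder layout are hypothetical; the only Tesseract-specific parts assumed here are the `tesseract IMAGE OUTPUTBASE -l LANG` invocation and the optional `pdf` config name that requests a searchable PDF.

```python
import subprocess
from pathlib import Path


def build_ocr_command(image: Path, out_dir: Path, lang: str = "eng",
                      searchable_pdf: bool = False) -> list[str]:
    """Build the argv for one tesseract run (hypothetical helper)."""
    # tesseract appends the extension (.txt or .pdf) to the output base itself.
    out_base = out_dir / image.stem
    cmd = ["tesseract", str(image), str(out_base), "-l", lang]
    if searchable_pdf:
        cmd.append("pdf")  # config name that selects searchable-PDF output
    return cmd


def ocr_directory(src_dir: Path, out_dir: Path, lang: str = "eng") -> list[Path]:
    """OCR every *.png in src_dir and return the expected .txt paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for image in sorted(src_dir.glob("*.png")):
        # check=True raises CalledProcessError if tesseract exits non-zero.
        subprocess.run(build_ocr_command(image, out_dir, lang), check=True)
        outputs.append(out_dir / f"{image.stem}.txt")
    return outputs
```

Calling `ocr_directory(Path("scans"), Path("text"))` would write one `.txt` file per PNG, assuming `tesseract` is on the PATH. Building the argument list in a separate function keeps the command easy to inspect or log before anything runs.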
41.1k
30-day installs · #154
115.8k
90-day installs · #167
679.5k
365-day installs · #99
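Runtime dependencies
- cairo
- fontconfig
- glib
- harfbuzz
- icu4c@78
- leptonica
- libarchive
- pango
- freetype
- gettext
Build dependencies
- autoconf
- automake
- libtool
- pkgconf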
Links
- https://tesseract-ocr.github.io/
- Brew formula source: Formula/t/tesseract.rb
Caveats
This formula contains only the "eng", "osd", and "snum" language data files. If you need any other supported languages, run `brew install tesseract-lang`.
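To check whether a language is already covered before installing `tesseract-lang`, one option is to parse the output of `tesseract --list-langs`. This is a minimal sketch: `parse_list_langs`, `missing_languages`, and `installed_languages` are hypothetical helper names, and the parser assumes the output is a header line ending in a colon followed by one language code per line.

```python
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Extract language codes from `tesseract --list-langs` output.

    Assumes a header line ending in ':' followed by one code per line.
    """
    langs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.endswith(":"):
            continue  # skip blank lines and the header
        langs.append(line)
    return langs


def missing_languages(installed: list[str], wanted: list[str]) -> list[str]:
    """Return the wanted codes that are not installed, keeping their order."""
    have = set(installed)
    return [code for code in wanted if code not in have]


def installed_languages() -> list[str]:
    """Run tesseract and return its installed language codes."""
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True, check=True)
    return parse_list_langs(result.stdout)
```

With only the bundled data files present, `missing_languages(installed_languages(), ["eng", "deu"])` would be expected to report `deu`, which is the cue to run `brew install tesseract-lang`.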
Blurb generated by claude-haiku-4-5 on 2026-06-20.
Raw metadata
{
"aliases": [],
"alternatives": [
"EasyOCR",
"PaddleOCR",
"Google Cloud Vision"
],
"build_dependencies": [
"autoconf",
"automake",
"libtool",
"pkgconf"
],
"categories": [
"ocr",
"image-tool",
"library"
],
"caveats": "This formula contains only the \"eng\", \"osd\", and \"snum\" language data files.\nIf you need any other supported languages, run `brew install tesseract-lang`.\n",
"conflicts_with": [],
"dependencies": [
"cairo",
"fontconfig",
"glib",
"harfbuzz",
"icu4c@78",
"leptonica",
"libarchive",
"pango",
"freetype",
"gettext"
],
"deprecated": 0,
"deprecation_reason": null,
"desc": "OCR (Optical Character Recognition) engine",
"disable_reason": null,
"disabled": 0,
"enrichment_fetched_at": "2026-06-20T23:40:41+00:00",
"first_seen": "2026-06-20T23:34:18+00:00",
"full_name": "tesseract",
"github_default_branch": null,
"github_last_commit_at": null,
"github_readme_excerpt": null,
"github_repo": null,
"github_stars": null,
"github_topics": [],
"homepage": "https://tesseract-ocr.github.io/",
"homepage_og_description": "Documentation",
"homepage_og_image": null,
"homepage_title": "Tesseract documentation | Tesseract OCR",
"installs_30d": 41101,
"installs_365d": 679465,
"installs_90d": 115751,
"keg_only": 0,
"keg_only_reason": null,
"last_seen": "2026-06-20T23:34:18+00:00",
"license": "Apache-2.0",
"llm_generated_at": "2026-06-20T23:43:04+00:00",
"llm_model": "claude-haiku-4-5",
"name": "tesseract",
"oldnames": [],
"one_liner": "OCR engine that converts scanned images and PDFs into searchable text via command-line or library API.",
"optional_dependencies": [],
"rank_30d": 154,
"rank_365d": 99,
"rank_90d": 167,
"raw_hash": "30c83d3677dc83a2",
"recommended_dependencies": [],
"revision": 0,
"ruby_source_path": "Formula/t/tesseract.rb",
"tap": "homebrew/core",
"test_dependencies": [],
"uses_from_macos": [],
"version_head": "HEAD",
"version_stable": "5.5.2",
"versioned_formulae": [],
"why_use_this": "Tesseract is the gold-standard open-source OCR tool, supporting 100+ languages and delivering high accuracy on both modern and historical documents. Use it to batch-process document archives, build text-extraction pipelines, or embed OCR into your application; it\u0027s language-agnostic and trainable on custom fonts."
}