ocrmypdf
brew install ocrmypdf
v17.7.0
MPL-2.0
Python CLI tool that adds searchable OCR text layers to PDF files using Tesseract.
Why you might care
Convert scanned PDFs into text-searchable documents without sacrificing image quality. Useful for archiving paper documents, legal records, or anything you've already scanned: ocrmypdf handles multi-page PDFs, places the recognized text behind the original page image so the layout is preserved, and can automatically rotate pages and despeckle them. Because it is a command-line tool, it is also easy to script for batch jobs, which is awkward to do with GUI OCR tools.
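As a rough sketch of the batch workflow described above, the function below OCRs every PDF in a directory with automatic page rotation, deskewing, and unpaper-based cleaning. It assumes the `ocrmypdf` installed by this formula is on `PATH` and that Tesseract has data for the requested language. `ocr_dir` and the `_ocr.pdf` output suffix are hypothetical names chosen for illustration, not part of ocrmypdf. The flags (`-l`, `--rotate-pages`, `--deskew`, `--clean`) are documented ocrmypdf options, but check `ocrmypdf --help` for your installed version.

```shell
#!/usr/bin/env bash
# Sketch: OCR every PDF in a directory, writing <name>_ocr.pdf next to each input.
# Assumes ocrmypdf (from `brew install ocrmypdf`) is on PATH.
# ocr_dir and the _ocr suffix are hypothetical names for illustration.
ocr_dir() {
  local dir="$1" lang="${2:-eng}" src dst
  for src in "$dir"/*.pdf; do
    [ -e "$src" ] || continue                     # no PDFs: the glob stays literal, so skip it
    case "$src" in *_ocr.pdf) continue ;; esac    # don't re-process earlier outputs
    dst="${src%.pdf}_ocr.pdf"
    # --rotate-pages: fix page orientation; --deskew: straighten crooked scans;
    # --clean: run unpaper on the image Tesseract reads (the output keeps the original image)
    ocrmypdf -l "$lang" --rotate-pages --deskew --clean "$src" "$dst" || return
  done
}
```

Usage: `ocr_dir ~/Scans`, or `ocr_dir ~/Scans deu` for German scans if the German Tesseract language data is installed. The function stops at the first file ocrmypdf fails on and returns that exit status, so a script can detect the failure instead of silently skipping files.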
30-day installs: 7.1k (rank #460)
90-day installs: 18.1k (rank #562)
365-day installs: 53.0k (rank #656)
Runtime dependencies
- cryptography
- freetype
- ghostscript
- img2pdf
- jbig2enc
- libheif
- libpng
- pillow
- pngquant
- pybind11
- pydantic
- python@3.14
- qpdf
- tesseract
- unpaper
Build dependencies
- cmake
- pkgconf
Links
- https://ocrmypdf.readthedocs.io/en/latest/
- Brew formula source: Formula/o/ocrmypdf.rb
Blurb generated by claude-haiku-4-5 on 2026-06-20.
Raw metadata
{
"aliases": [],
"alternatives": [
"ImageMagick",
"Ghostscript",
"pdftotext"
],
"build_dependencies": [
"cmake",
"pkgconf"
],
"categories": [
"pdf-tool",
"ocr",
"image-tool",
"text-processor"
],
"caveats": null,
"conflicts_with": [],
"dependencies": [
"cryptography",
"freetype",
"ghostscript",
"img2pdf",
"jbig2enc",
"libheif",
"libpng",
"pillow",
"pngquant",
"pybind11",
"pydantic",
"python@3.14",
"qpdf",
"tesseract",
"unpaper"
],
"deprecated": 0,
"deprecation_reason": null,
"desc": "Adds an OCR text layer to scanned PDF files",
"disable_reason": null,
"disabled": 0,
"enrichment_fetched_at": "2026-06-20T23:40:53+00:00",
"first_seen": "2026-06-20T23:34:18+00:00",
"full_name": "ocrmypdf",
"github_default_branch": null,
"github_last_commit_at": null,
"github_readme_excerpt": null,
"github_repo": null,
"github_stars": null,
"github_topics": [],
"homepage": "https://ocrmypdf.readthedocs.io/en/latest/",
"homepage_og_description": null,
"homepage_og_image": null,
"homepage_title": "OCRmyPDF documentation \u2014 ocrmypdf 17.7.1 documentation",
"installs_30d": 7101,
"installs_365d": 52972,
"installs_90d": 18086,
"keg_only": 0,
"keg_only_reason": null,
"last_seen": "2026-06-20T23:34:18+00:00",
"license": "MPL-2.0",
"llm_generated_at": "2026-06-20T23:45:01+00:00",
"llm_model": "claude-haiku-4-5",
"name": "ocrmypdf",
"oldnames": [],
"one_liner": "Python CLI tool that adds searchable OCR text layers to PDF files using Tesseract.",
"optional_dependencies": [],
"rank_30d": 460,
"rank_365d": 656,
"rank_90d": 562,
"raw_hash": "2903c64ef46cf55c",
"recommended_dependencies": [],
"revision": 0,
"ruby_source_path": "Formula/o/ocrmypdf.rb",
"tap": "homebrew/core",
"test_dependencies": [],
"uses_from_macos": [
"libffi",
"libxml2",
"libxslt"
],
"version_head": null,
"version_stable": "17.7.0",
"versioned_formulae": [],
"why_use_this": "Convert scanned PDFs into text-searchable documents without re-encoding image quality. Useful for archiving paper documents, legal records, or anything you\u0027ve already scanned\u2014ocrmypdf handles multi-page PDFs, preserves layout, and can automatically rotate pages and despeckle. Much faster and more flexible than doing OCR in a GUI tool."
}