
ocrmypdf

brew install ocrmypdf v17.7.0 MPL-2.0

Python CLI tool that adds searchable OCR text layers to PDF files using Tesseract.

Why you might care

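Convert scanned PDFs into text-searchable documents, generally without degrading the original page images. Useful for archiving paper documents, legal records, or anything you've already scanned. ocrmypdf handles multi-page PDFs and keeps the page layout by adding an invisible text layer aligned with each page image. It can also automatically rotate pages and clean up speckles and other scanning artifacts (via unpaper) before OCR. Because it runs from the command line, it is much easier to script and batch-process than OCR in a GUI tool.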
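A minimal sketch of scripted batch OCR, assuming the `ocrmypdf` executable is on `PATH` and Tesseract's English language data is installed. The flags `-l`, `--rotate-pages`, `--clean` and `--deskew` are ocrmypdf CLI options (`--clean` relies on the `unpaper` dependency); the helper names `build_ocr_command` and `ocr_directory` are hypothetical, not part of ocrmypdf. Check `ocrmypdf --help` for the options your version supports.

```python
import shlex
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path, language: str = "eng",
                      rotate_pages: bool = True, clean: bool = True,
                      deskew: bool = False) -> list[str]:
    """Assemble an ocrmypdf argument list (hypothetical helper, not part of ocrmypdf)."""
    cmd = ["ocrmypdf", "-l", language]
    if rotate_pages:
        cmd.append("--rotate-pages")  # fix pages scanned sideways or upside down
    if clean:
        cmd.append("--clean")  # unpaper cleans the copy sent to OCR, not the output page
    if deskew:
        cmd.append("--deskew")  # straightens skewed pages; this changes the output image
    cmd += [str(src), str(dst)]
    return cmd


def ocr_directory(in_dir: Path, out_dir: Path) -> None:
    """Run ocrmypdf once per PDF in in_dir, writing results to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(in_dir.glob("*.pdf")):
        cmd = build_ocr_command(pdf, out_dir / pdf.name)
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Only prints the command; call ocr_directory() to actually run OCR.
    print(shlex.join(build_ocr_command(Path("scan.pdf"), Path("scan_ocr.pdf"))))
```

Deskewing is left off by default in this sketch because, unlike `--clean`, it alters the images written to the output file.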

Categories

pdf-tool ocr image-tool text-processor

Alternatives

ImageMagick Ghostscript pdftotext

7.1k installs (30-day) · rank #460

18.1k installs (90-day) · rank #562

53.0k installs (365-day) · rank #656

Runtime dependencies

cryptography freetype ghostscript img2pdf jbig2enc libheif libpng pillow pngquant pybind11 pydantic python@3.14 qpdf tesseract unpaper
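Several runtime dependencies are external programs rather than Python libraries. A quick way to see which of them are reachable on `PATH` is sketched below. The executable names are assumptions: Ghostscript installs `gs` and jbig2enc installs `jbig2`. Pure libraries such as pillow, pydantic or libpng have no command to look up and are left out. The names `EXTERNAL_TOOLS` and `find_tools` are hypothetical.

```python
import shutil
from typing import Dict, Optional

# Formula name -> executable it is assumed to provide on PATH.
EXTERNAL_TOOLS: Dict[str, str] = {
    "tesseract": "tesseract",
    "ghostscript": "gs",
    "qpdf": "qpdf",
    "unpaper": "unpaper",
    "pngquant": "pngquant",
    "jbig2enc": "jbig2",
}


def find_tools(tools: Dict[str, str] = EXTERNAL_TOOLS) -> Dict[str, Optional[str]]:
    """Map each formula name to its resolved executable path, or None if not found."""
    return {formula: shutil.which(exe) for formula, exe in tools.items()}


if __name__ == "__main__":
    for formula, path in find_tools().items():
        print(f"{formula:12} {path or 'NOT FOUND'}")
```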

Build dependencies

cmake pkgconf

Links

Homepage: https://ocrmypdf.readthedocs.io/en/latest/
Brew formula source: Formula/o/ocrmypdf.rb

Blurb generated by claude-haiku-4-5 on 2026-06-20.

Raw metadata

{
  "aliases": [],
  "alternatives": [
    "ImageMagick",
    "Ghostscript",
    "pdftotext"
  ],
  "build_dependencies": [
    "cmake",
    "pkgconf"
  ],
  "categories": [
    "pdf-tool",
    "ocr",
    "image-tool",
    "text-processor"
  ],
  "caveats": null,
  "conflicts_with": [],
  "dependencies": [
    "cryptography",
    "freetype",
    "ghostscript",
    "img2pdf",
    "jbig2enc",
    "libheif",
    "libpng",
    "pillow",
    "pngquant",
    "pybind11",
    "pydantic",
    "python@3.14",
    "qpdf",
    "tesseract",
    "unpaper"
  ],
  "deprecated": 0,
  "deprecation_reason": null,
  "desc": "Adds an OCR text layer to scanned PDF files",
  "disable_reason": null,
  "disabled": 0,
  "enrichment_fetched_at": "2026-06-20T23:40:53+00:00",
  "first_seen": "2026-06-20T23:34:18+00:00",
  "full_name": "ocrmypdf",
  "github_default_branch": null,
  "github_last_commit_at": null,
  "github_readme_excerpt": null,
  "github_repo": null,
  "github_stars": null,
  "github_topics": [],
  "homepage": "https://ocrmypdf.readthedocs.io/en/latest/",
  "homepage_og_description": null,
  "homepage_og_image": null,
  "homepage_title": "OCRmyPDF documentation \u2014 ocrmypdf 17.7.1 documentation",
  "installs_30d": 7101,
  "installs_365d": 52972,
  "installs_90d": 18086,
  "keg_only": 0,
  "keg_only_reason": null,
  "last_seen": "2026-06-20T23:34:18+00:00",
  "license": "MPL-2.0",
  "llm_generated_at": "2026-06-20T23:45:01+00:00",
  "llm_model": "claude-haiku-4-5",
  "name": "ocrmypdf",
  "oldnames": [],
  "one_liner": "Python CLI tool that adds searchable OCR text layers to PDF files using Tesseract.",
  "optional_dependencies": [],
  "rank_30d": 460,
  "rank_365d": 656,
  "rank_90d": 562,
  "raw_hash": "2903c64ef46cf55c",
  "recommended_dependencies": [],
  "revision": 0,
  "ruby_source_path": "Formula/o/ocrmypdf.rb",
  "tap": "homebrew/core",
  "test_dependencies": [],
  "uses_from_macos": [
    "libffi",
    "libxml2",
    "libxslt"
  ],
  "version_head": null,
  "version_stable": "17.7.0",
  "versioned_formulae": [],
  "why_use_this": "Convert scanned PDFs into text-searchable documents without re-encoding image quality. Useful for archiving paper documents, legal records, or anything you\u0027ve already scanned\u2014ocrmypdf handles multi-page PDFs, preserves layout, and can automatically rotate pages and despeckle. Much faster and more flexible than doing OCR in a GUI tool."
}
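The human-readable header and install figures at the top of the page correspond to fields in this metadata. The sketch below rebuilds them from a verbatim subset of those fields. The display rule (divide by 1000, round to one decimal, append "k") is inferred from the three values shown rather than documented. The names `RAW`, `compact` and `summarize` are hypothetical.

```python
import json

# A subset of the raw metadata fields above, copied verbatim.
RAW = """{
  "name": "ocrmypdf",
  "version_stable": "17.7.0",
  "license": "MPL-2.0",
  "installs_30d": 7101, "rank_30d": 460,
  "installs_90d": 18086, "rank_90d": 562,
  "installs_365d": 52972, "rank_365d": 656
}"""

# Window suffix used in the JSON keys -> label used for display.
WINDOWS = {"30d": "30-day", "90d": "90-day", "365d": "365-day"}


def compact(n: int) -> str:
    """Thousands with one decimal, e.g. 7101 -> '7.1k' (assumed display rule)."""
    return f"{n / 1000:.1f}k"


def summarize(meta: dict) -> list[str]:
    """Rebuild the header line and install-stat lines from raw metadata."""
    lines = [f"brew install {meta['name']} v{meta['version_stable']} {meta['license']}"]
    for key, label in WINDOWS.items():
        count, rank = meta[f"installs_{key}"], meta[f"rank_{key}"]
        lines.append(f"{compact(count)} installs ({label}) · rank #{rank}")
    return lines


if __name__ == "__main__":
    for line in summarize(json.loads(RAW)):
        print(line)
```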