
# PDF Processing Guide

## Overview

This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see REFERENCE.md. If you need to fill out a PDF form, read FORMS.md and follow its instructions.

## Quick Start

```python
from pypdf import PdfReader, PdfWriter

# Read a PDF
reader = PdfReader("document.pdf")
print(f"Pages: {len(reader.pages)}")

# Extract text
text = ""
for page in reader.pages:
    text += page.extract_text()
```

## Python Libraries

### pypdf - Basic Operations

#### Merge PDFs

```python
from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as output:
    writer.write(output)
```

#### Split PDF

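```python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")

# Write each page to its own file: page_1.pdf, page_2.pdf, ...
for i, page in enumerate(reader.pages, start=1):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i}.pdf", "wb") as output:
        writer.write(output)
```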
