# Chunker

Extract text, chunk it, and save images from a PDF.

`chunks` is a `List[str]` of ~800-token strings (100-token overlap). Outputs (text files and images) are written under `extracted_content//`.

## Usage

```python
from chunker import Chunker

chunker = Chunker("path/to/file.pdf")
chunks = chunker.run()
```

Setup:

    pip install -r requirements.txt
    python -m spacy download xx_ent_wiki_sm
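
To picture what "~800-token strings (100-token overlap)" means, here is a minimal sliding-window sketch. It is not the actual `Chunker` implementation: `chunk_tokens` is a hypothetical helper, and it assumes the tokens are already available as a list of strings (the real tokenizer, presumably spaCy-based, may count tokens differently).

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 800, overlap: int = 100) -> List[str]:
    """Hypothetical helper: join tokens into windows of `size` tokens,
    each sharing `overlap` tokens with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Illustrative input: 2000 placeholder tokens, not real PDF text.
tokens = [f"t{i}" for i in range(2000)]
chunks = chunk_tokens(tokens)
print(len(chunks))  # 3 windows: tokens 0-799, 700-1499, 1400-1999
```

With these defaults the window advances 700 tokens at a time, so the last 100 tokens of one chunk reappear at the start of the next. This keeps sentences that straddle a boundary visible in both chunks.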