# Atlas Librarian

A comprehensive content processing and management system for extracting, chunking, and vectorizing information from various sources.

## Overview

Atlas Librarian is a modular system designed to process, organize, and make searchable large amounts of content through web scraping, content extraction, chunking, and vector embeddings.

## Project Structure

```
atlas/
├── librarian/
│   ├── atlas-librarian/          # Main application
│   ├── librarian-core/           # Core functionality and storage
│   └── plugins/
│       ├── librarian-chunker/    # Content chunking
│       ├── librarian-extractor/  # Content extraction with AI
│       ├── librarian-scraper/    # Web scraping and crawling
│       └── librarian-vspace/     # Vector space operations
```

## Components

- **Atlas Librarian**: Main application with API, web app, and recipe management
- **Librarian Core**: Shared utilities, storage, and Supabase integration
- **Chunker Plugin**: Splits content into processable chunks
- **Extractor Plugin**: Extracts and sanitizes content using AI
- **Scraper Plugin**: Crawls and downloads web content
- **VSpace Plugin**: Vector embeddings and similarity search

## Getting Started

1. Clone the repository
2. Install dependencies for each component
3. Configure environment variables
4. Run the main application

## Features

- Web content scraping and crawling
- AI-powered content extraction and sanitization
- Intelligent content chunking
- Vector embeddings for semantic search
- Supabase integration for data storage
- Modular plugin architecture

---

*For detailed documentation, see the individual component directories.*
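
## Example: Chunking and Similarity Search (Illustrative)

The chunking and similarity-search steps named in the component list can be illustrated with a minimal, standard-library-only sketch. This is not the Atlas Librarian API: `chunk_text`, `embed`, `cosine_similarity`, and `search` are hypothetical names, the word-window chunking strategy is an assumption, and the bag-of-words `embed` is a toy stand-in for whatever embedding model the VSpace plugin actually uses.

```python
import math
from collections import Counter


def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (hypothetical helper)."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(word.lower().strip(".,;:!?") for word in text.split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Rank chunks by similarity to the query, highest score first."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

A call such as `search("semantic search", chunk_text(document))` returns the highest-scoring chunks with their scores. In a real deployment the toy `embed` would presumably be replaced by a learned embedding model, and the vectors would be persisted in storage rather than recomputed per query.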