Atlas Librarian is a modular system for processing, organizing, and searching large volumes of content. It combines web scraping, content extraction, chunking, and vector embeddings.

Project Structure

atlas/
├── librarian/
│   ├── atlas-librarian/     # Main application
│   ├── librarian-core/      # Core functionality and storage
│   └── plugins/
│       ├── librarian-chunker/    # Content chunking
│       ├── librarian-extractor/  # Content extraction with AI
│       ├── librarian-scraper/    # Web scraping and crawling
│       └── librarian-vspace/     # Vector space operations

Components

Atlas Librarian: Main application with API, web app, and recipe management
Librarian Core: Shared utilities, storage, and Supabase integration
Chunker Plugin: Splits content into processable chunks
Extractor Plugin: Extracts and sanitizes content using AI
Scraper Plugin: Crawls and downloads web content
VSpace Plugin: Generates vector embeddings and runs similarity search
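
To make the chunking step concrete, here is a minimal sketch of one common strategy: fixed-size character windows with overlap. It is not taken from librarian-chunker, whose actual strategy may differ; the function name chunk_text and the parameters chunk_size and overlap are assumptions made for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; not the librarian-chunker API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing some text twice.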

Getting Started

Clone the repository
Install dependencies for each component
Configure environment variables
Run the main application
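
Before running the main application, it can help to confirm that the environment is configured. The sketch below checks for required variables; the names SUPABASE_URL and SUPABASE_KEY are placeholders chosen because the project integrates with Supabase, not names confirmed by this repository. Check each component's documentation for the variables it actually reads.

```python
import os

# Hypothetical variable names, used only for illustration.
REQUIRED_VARS = ("SUPABASE_URL", "SUPABASE_KEY")


def missing_env_vars(required=REQUIRED_VARS, environ=None) -> list[str]:
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]
```

Calling missing_env_vars() with no arguments inspects the current process environment; an empty list means every listed variable is set.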

Features

Web content scraping and crawling
AI-powered content extraction and sanitization
Intelligent content chunking
Vector embeddings for semantic search
Supabase integration for data storage
Modular plugin architecture
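
As an illustration of how semantic search over embeddings works in general, the sketch below ranks stored vectors by cosine similarity to a query vector. It uses plain Python lists and an in-memory dictionary as a stand-in index; it is not the librarian-vspace implementation, and the names cosine_similarity and top_k are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A linear scan like this is only practical for small collections; larger collections usually rely on a vector index in the storage layer.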

For detailed documentation, see the individual component directories.