diff --git a/README.md b/README.md
new file mode 100644
index 0000000..1412e84
--- /dev/null
+++ b/README.md
@@ -0,0 +1,50 @@
+# Atlas Librarian
+
+A comprehensive content processing and management system for extracting, chunking, and vectorizing information from various sources.
+
+## Overview
+
+Atlas Librarian is a modular system designed to process, organize, and make searchable large amounts of content through web scraping, content extraction, chunking, and vector embeddings.
+
+## Project Structure
+
+```
+atlas/
+├── librarian/
+│   ├── atlas-librarian/            # Main application
+│   ├── librarian-core/             # Core functionality and storage
+│   └── plugins/
+│       ├── librarian-chunker/      # Content chunking
+│       ├── librarian-extractor/    # Content extraction with AI
+│       ├── librarian-scraper/      # Web scraping and crawling
+│       └── librarian-vspace/       # Vector space operations
+```
+
+## Components
+
+- **Atlas Librarian**: Main application with API, web app, and recipe management
+- **Librarian Core**: Shared utilities, storage, and Supabase integration
+- **Chunker Plugin**: Splits content into processable chunks
+- **Extractor Plugin**: Extracts and sanitizes content using AI
+- **Scraper Plugin**: Crawls and downloads web content
+- **VSpace Plugin**: Vector embeddings and similarity search
+
+## Getting Started
+
+1. Clone the repository
+2. Install dependencies for each component
+3. Configure environment variables
+4. Run the main application
+
+## Features
+
+- Web content scraping and crawling
+- AI-powered content extraction and sanitization
+- Intelligent content chunking
+- Vector embeddings for semantic search
+- Supabase integration for data storage
+- Modular plugin architecture
+
+---
+
+*For detailed documentation, see the individual component directories.*