Author    SHA1        Message           Date
DotNaos   da58811ef1  Add basic readme  2025-05-24 12:21:21 +02:00

README.md (new file, 50 lines)

# Atlas Librarian
A comprehensive content processing and management system for extracting, chunking, and vectorizing information from various sources.
## Overview
Atlas Librarian is a modular system for processing and organizing large amounts of content and making it searchable, using web scraping, content extraction, chunking, and vector embeddings.
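
The overview describes a chain of stages: scraping, extraction, chunking, and embedding. The minimal Python sketch below shows one way such a chain can be composed, assuming each plugin exposes a single function that takes a document record and returns an updated one. The `Document` type, the stage functions, and their toy bodies are hypothetical stand-ins, not the plugins' actual APIs.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Iterable, List

# Hypothetical record passed between stages; not the project's real data model.
@dataclass(frozen=True)
class Document:
    source: str
    text: str = ""
    chunks: List[str] = field(default_factory=list)
    vectors: List[List[float]] = field(default_factory=list)

# A stage is any function that takes a Document and returns a new one.
Stage = Callable[[Document], Document]

def run_pipeline(doc: Document, stages: Iterable[Stage]) -> Document:
    """Apply each stage in order, feeding each stage's output into the next."""
    for stage in stages:
        doc = stage(doc)
    return doc

# Toy stand-ins for the scraper, extractor, chunker, and vspace plugins.
def scrape(doc: Document) -> Document:
    return replace(doc, text=f"  <p>Hello from {doc.source}</p>  ")

def extract(doc: Document) -> Document:
    # Strip whitespace and the single wrapping tag the toy scraper added.
    cleaned = doc.text.strip().removeprefix("<p>").removesuffix("</p>")
    return replace(doc, text=cleaned)

def chunk(doc: Document) -> Document:
    return replace(doc, chunks=doc.text.split())

def embed(doc: Document) -> Document:
    # Placeholder "embedding": one-dimensional vectors holding each chunk's length.
    return replace(doc, vectors=[[float(len(c))] for c in doc.chunks])

result = run_pipeline(Document("example.org"), [scrape, extract, chunk, embed])
print(result.chunks)  # ['Hello', 'from', 'example.org']
```

Giving every stage the same signature is what makes stages easy to reorder or swap; the README does not state whether the real plugins are wired together this way.
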
## Project Structure
```
atlas/
├── librarian/
│   ├── atlas-librarian/          # Main application
│   ├── librarian-core/           # Core functionality and storage
│   └── plugins/
│       ├── librarian-chunker/    # Content chunking
│       ├── librarian-extractor/  # Content extraction with AI
│       ├── librarian-scraper/    # Web scraping and crawling
│       └── librarian-vspace/     # Vector space operations
```
## Components
- **Atlas Librarian**: Main application with API, web app, and recipe management
- **Librarian Core**: Shared utilities, storage, and Supabase integration
- **Chunker Plugin**: Splits extracted content into smaller chunks for embedding
- **Extractor Plugin**: Extracts and sanitizes content using AI
- **Scraper Plugin**: Crawls and downloads web content
- **VSpace Plugin**: Vector embeddings and similarity search
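
The README does not say how the Chunker Plugin decides where to split. A common baseline, shown here as an assumption rather than the plugin's actual method, is fixed-size character windows with overlap. `chunk_text` and its `size`/`overlap` parameters are hypothetical; production chunkers often split on tokens, sentences, or headings instead.

```python
from typing import List

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of at most `size` characters.

    Consecutive windows share `overlap` characters, so text near a
    boundary appears in both neighbouring chunks.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks

print(chunk_text("abcdefghij", size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```
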
## Getting Started
1. Clone the repository
2. Install dependencies for each component
3. Configure environment variables
4. Run the main application
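
The README does not list the required environment variables. Because Librarian Core integrates with Supabase, a Supabase URL and key are plausible requirements, but the names `SUPABASE_URL` and `SUPABASE_KEY` below are assumptions, as are `Settings` and `load_settings`. The sketch shows a fail-fast pattern: read the required variables once at startup and report every missing one in a single error.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names: check each component's directory for the real ones.
REQUIRED_VARS = ("SUPABASE_URL", "SUPABASE_KEY")

@dataclass(frozen=True)
class Settings:
    supabase_url: str
    supabase_key: str

def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read required settings, raising one error that names every missing variable."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(supabase_url=env["SUPABASE_URL"], supabase_key=env["SUPABASE_KEY"])

# Passing an explicit mapping instead of os.environ keeps the function easy to test.
settings = load_settings({"SUPABASE_URL": "https://example.supabase.co", "SUPABASE_KEY": "dummy-key"})
print(settings.supabase_url)
```
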
## Features
- Web content scraping and crawling
- AI-powered content extraction and sanitization
- Intelligent content chunking
- Vector embeddings for semantic search
- Supabase integration for data storage
- Modular plugin architecture
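
Semantic search over embeddings commonly ranks stored vectors by cosine similarity to a query vector. The README does not say how the VSpace Plugin stores or searches vectors, so the brute-force Python sketch below only illustrates the ranking step. `top_k` and the in-memory `corpus` dictionary are hypothetical; a real deployment would typically use a vector index rather than scanning every vector.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query: Sequence[float], corpus: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs whose vectors are most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

corpus = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print([(doc_id, round(score, 4)) for doc_id, score in top_k([1.0, 0.0], corpus, k=2)])
# [('a', 1.0), ('c', 0.7071)]
```
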
---
*For detailed documentation, see the individual component directories.*