# Atlas Librarian

A modular content processing and management system for scraping, extracting, chunking, and vectorizing information from web sources.

## Overview

Atlas Librarian processes and organizes large amounts of content and makes it searchable. Its plugins handle each stage in turn: web scraping, content extraction, chunking, and vector embedding.
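
The stages above can be pictured as a chain of functions that each take a document and return an enriched one. The sketch below is illustrative only: it assumes Python, and `Document`, `run_pipeline`, and the four stand-in stage functions are hypothetical names, not the project's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical container passed from one stage to the next."""
    url: str
    text: str = ""
    chunks: List[str] = field(default_factory=list)
    vectors: List[List[float]] = field(default_factory=list)


Stage = Callable[[Document], Document]


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Apply each stage in order, feeding the result into the next one."""
    for stage in stages:
        doc = stage(doc)
    return doc


# Stand-in stages; real plugins would do network, AI, and embedding work.
def scrape(doc: Document) -> Document:
    doc.text = "<p>Hello   world</p>"  # pretend this was downloaded
    return doc


def extract(doc: Document) -> Document:
    # Strip the toy markup and collapse whitespace.
    stripped = doc.text.replace("<p>", " ").replace("</p>", " ")
    doc.text = " ".join(stripped.split())
    return doc


def chunk(doc: Document) -> Document:
    doc.chunks = doc.text.split()  # one word per chunk, for brevity
    return doc


def embed(doc: Document) -> Document:
    # Toy "embedding": a one-dimensional vector holding the chunk length.
    doc.vectors = [[float(len(c))] for c in doc.chunks]
    return doc


result = run_pipeline(
    Document(url="https://example.com"),
    [scrape, extract, chunk, embed],
)
```

Keeping every stage behind the same signature is one way to let plugins be swapped or reordered without touching the code that drives them.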

## Project Structure

```
atlas/
├── librarian/
│   ├── atlas-librarian/     # Main application
│   ├── librarian-core/      # Core functionality and storage
│   └── plugins/
│       ├── librarian-chunker/    # Content chunking
│       ├── librarian-extractor/  # Content extraction with AI
│       ├── librarian-scraper/    # Web scraping and crawling
│       └── librarian-vspace/     # Vector space operations
```

## Components

- **Atlas Librarian**: Main application with API, web app, and recipe management
- **Librarian Core**: Shared utilities, storage, and Supabase integration
- **Chunker Plugin**: Splits content into processable chunks
- **Extractor Plugin**: Extracts and sanitizes content using AI
- **Scraper Plugin**: Crawls and downloads web content
- **VSpace Plugin**: Vector embeddings and similarity search
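
To make the chunking step concrete, here is a minimal sketch of one common strategy: fixed-size character windows with a small overlap. It assumes Python, and `chunk_text` and its parameters are hypothetical; the Chunker Plugin may use a different strategy entirely.

```python
from typing import List


def chunk_text(text: str, size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        # Stop once the current window reaches the end of the text,
        # so no trailing chunk merely repeats already-covered characters.
        if start + size >= len(text):
            break
        start += step
    return chunks


example = chunk_text("abcdefghij", size=4, overlap=1)
```

The overlap means text that straddles a boundary appears in both neighbouring chunks, at the cost of some duplicated storage.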

## Getting Started

1. Clone the repository
2. Install dependencies for each component
3. Configure environment variables
4. Run the main application
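
For step 3, a small startup check can report missing configuration early. This is a sketch under stated assumptions: the variable names `SUPABASE_URL` and `SUPABASE_KEY` are guesses based on the Supabase integration, and `load_config` is a hypothetical helper. Check each component's documentation for the variables it actually reads.

```python
import os
from typing import Dict, Mapping, Sequence

# Assumed variable names; the real project may use different ones.
REQUIRED_VARS = ("SUPABASE_URL", "SUPABASE_KEY")


def load_config(
    env: Mapping[str, str] = os.environ,
    required: Sequence[str] = REQUIRED_VARS,
) -> Dict[str, str]:
    """Return the required settings, or raise with a list of what is missing."""
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in required}
```

Calling `load_config()` with no arguments reads from the process environment; passing a plain dictionary makes the check easy to exercise in tests.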

## Features

- Web content scraping and crawling
- AI-powered content extraction and sanitization
- Intelligent content chunking
- Vector embeddings for semantic search
- Supabase integration for data storage
- Modular plugin architecture
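
Semantic search over embeddings usually comes down to ranking stored vectors by their similarity to a query vector. The sketch below uses cosine similarity in plain Python; `cosine_similarity` and `top_k` are hypothetical names, and the VSpace Plugin may rely on a database or library for this instead.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matches = top_k([1.0, 0.0], stored, k=2)
```

A linear scan like this is fine for small collections; larger ones typically need an approximate nearest-neighbour index.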

---

*For detailed documentation, see the individual component directories.*