2025-06-01 17:32:50 +02:00
2025-05-24 12:15:48 +02:00

Atlas Librarian

A comprehensive content processing and management system for extracting, chunking, and vectorizing information from various sources.

Overview

Atlas Librarian is a modular system that processes and organizes large amounts of content and makes it searchable. Content moves through four stages: web scraping, content extraction, chunking, and vector embedding.
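
As a rough illustration of how such stages could be chained, here is a minimal sketch. The names `run_pipeline`, `extract`, `chunk`, and `embed` are hypothetical stand-ins and are not taken from the Atlas Librarian code base. The toy "embedding" is only a placeholder for a real model.

```python
from functools import reduce
from typing import Any, Callable, Sequence

# Hypothetical stage type: each stage takes the previous stage's output.
Stage = Callable[[Any], Any]


def run_pipeline(data: Any, stages: Sequence[Stage]) -> Any:
    """Feed data through each stage in order and return the final result."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


# Stand-in stages (assumptions for illustration only).
def extract(pages: list[str]) -> list[str]:
    # Collapse runs of whitespace as a placeholder for real extraction.
    return [" ".join(page.split()) for page in pages]


def chunk(texts: list[str]) -> list[str]:
    # Split on periods as a placeholder for real chunking.
    return [part.strip() for text in texts for part in text.split(".") if part.strip()]


def embed(chunks: list[str]) -> list[tuple[str, list[float]]]:
    # Toy "embedding": character count and word count. Not a real model.
    return [(c, [float(len(c)), float(len(c.split()))]) for c in chunks]


result = run_pipeline(["  First   sentence. Second one.  "], [extract, chunk, embed])
```

Keeping each stage a plain function with one input and one output mirrors the plugin split described in this README and lets stages be tested in isolation.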

Project Structure

atlas/
├── librarian/
│   ├── atlas-librarian/     # Main application
│   ├── librarian-core/      # Core functionality and storage
│   └── plugins/
│       ├── librarian-chunker/    # Content chunking
│       ├── librarian-extractor/  # Content extraction with AI
│       ├── librarian-scraper/    # Web scraping and crawling
│       └── librarian-vspace/     # Vector space operations

Components

  • Atlas Librarian: Main application with API, web app, and recipe management
  • Librarian Core: Shared utilities, storage, and Supabase integration
  • Chunker Plugin: Splits content into processable chunks
  • Extractor Plugin: Extracts and sanitizes content using AI
  • Scraper Plugin: Crawls and downloads web content
  • VSpace Plugin: Vector embeddings and similarity search
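
To make the Chunker Plugin's role concrete, the following sketch splits text into fixed-size chunks that overlap. It is one common approach and an assumption here: the README does not say which strategy `librarian-chunker` uses, and `chunk_text` with its parameters is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a boundary still appears intact in one of the two neighbouring chunks.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")

    step = max_chars - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        # Stop once the current chunk reaches the end of the text, so no
        # trailing chunk is emitted that only repeats the overlap.
        if start + max_chars >= len(text):
            break
        start += step
    return chunks
```

For example, `chunk_text("abcdefghij", max_chars=4, overlap=2)` yields four chunks, each sharing two characters with its neighbour.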

Getting Started

  1. Clone the repository
  2. Install dependencies for each component
  3. Configure environment variables
  4. Run the main application
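
For step 3, a component might read its settings from the environment roughly as sketched below. The variable names `SUPABASE_URL` and `SUPABASE_KEY` are assumptions chosen for illustration. Check each component directory for the names it actually expects.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    supabase_url: str
    supabase_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read required settings and fail early if any are missing or empty."""
    required = ("SUPABASE_URL", "SUPABASE_KEY")  # hypothetical variable names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        supabase_url=env["SUPABASE_URL"],
        supabase_key=env["SUPABASE_KEY"],
    )
```

Failing at startup with a list of all missing variables tends to be easier to debug than a connection error later in the run.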

Features

  • Web content scraping and crawling
  • AI-powered content extraction and sanitization
  • Intelligent content chunking
  • Vector embeddings for semantic search
  • Supabase integration for data storage
  • Modular plugin architecture
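
Semantic search over vector embeddings usually comes down to ranking stored vectors by their similarity to a query vector. The sketch below uses cosine similarity in pure Python. It illustrates the idea only: `cosine_similarity` and `top_k` are hypothetical names, and the README does not say how `librarian-vspace` implements search.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A linear scan like this is adequate for small collections. Larger ones typically rely on an index inside the database or a dedicated vector store.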

For detailed documentation, see the individual component directories.

Description
Programming and Prompt Engineering semester project by: Oliver Schütz, Michael Graber, Sven Gahlinger, Andre Ruegger
Languages
TypeScript 51.5%
Python 47.5%
JavaScript 1%