Add Questions and Answers
parent b6789cb5f7
commit 2e3d0fe172

README.md (36 lines changed)
@@ -2,6 +2,41 @@
A comprehensive content processing and management system for extracting, chunking, and vectorizing information from various sources.

---
## 💬 Questions & Answers

### 📦 How large is the data?

- **~500 MB of raw data** (about **250 MB per semester**, depending on the number of uploaded images; currently 2 semesters)
---
### 🧑‍💻 How many lines of code does your tool have so far?

- **~20,000 lines of code** in total:
  - Python: **~6,000**
  - SQL (database): **~3,500**
  - Website: **~10,000**
---
### ⏱️ Does it always take 40 minutes?

- **About 40 minutes** on a MacBook Pro with 16 GB RAM
- **~20 minutes** on a Windows workstation with 64 GB RAM
- **<1 minute** when processing only one or two courses
- **With a GPU**: even faster runs are possible, but the batch size may need to be reduced because the available VRAM may not be sufficient.
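
The GPU note above mentions reducing the batch size when memory runs short. A minimal sketch of that idea, assuming a hypothetical `embed_batch` callable that raises `MemoryError` when a batch does not fit (real GPU frameworks raise their own out-of-memory exceptions, and the project's actual handling is not shown in this README). Calling `embed_with_backoff(texts, embed_batch, batch_size=64)` starts at 64 items per batch and halves the size on each failure:

```python
from typing import Callable, List, Sequence


def embed_with_backoff(
    items: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 64,
) -> List[List[float]]:
    """Embed all items, halving the batch size whenever a batch runs out of memory.

    `embed_batch` is a hypothetical stand-in for the real embedding call.
    """
    results: List[List[float]] = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(embed_batch(batch))
        except MemoryError:
            if batch_size == 1:
                raise  # even a single item does not fit, so give up
            # Retry from the same position with half the batch size.
            batch_size = max(1, batch_size // 2)
            continue
        start += len(batch)
    return results
```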
---
### 📊 Why are the charts green?

- **Green** means that a task completed successfully.
- The colors come from the **progress indicator** of [Prefect](https://www.prefect.io/) (the workflow orchestrator; the color scheme is predefined):
  - **Blue** = running
  - **Green** = successful (success)
  - **Red** = failed
- **Each bar** represents one task run (e.g. processing a single file)
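
The color scheme described above can be written down as a small lookup. This is only an illustrative sketch: `TaskState`, `STATE_COLORS`, and `bar_colors` are hypothetical names and not part of Prefect's API; they simply restate the list above, with one color per task run.

```python
from enum import Enum
from typing import Iterable, List


class TaskState(Enum):
    """Task-run states as named in the list above (illustrative, not Prefect's types)."""
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


# One color per state, as described in the README answer.
STATE_COLORS = {
    TaskState.RUNNING: "blue",
    TaskState.SUCCESS: "green",
    TaskState.FAILED: "red",
}


def bar_colors(states: Iterable[TaskState]) -> List[str]:
    """Return the color of each bar, one bar per task run."""
    return [STATE_COLORS[state] for state in states]


def all_green(states: Iterable[TaskState]) -> bool:
    """True when every task run finished successfully, i.e. the chart is entirely green."""
    return all(state is TaskState.SUCCESS for state in states)
```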
---
## Overview

Atlas Librarian is a modular system designed to process, organize, and make searchable large amounts of content through web scraping, content extraction, chunking, and vector embeddings.
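
To make the chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an assumption for illustration only: the README does not say which strategy `librarian-chunker` uses, and `chunk_text`, `chunk_size`, and `overlap` are hypothetical names. Each chunk would then be passed on to the embedding step.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into character windows of `chunk_size`, each sharing `overlap` characters
    with the previous window so that context is not cut off at chunk borders."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks
```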
@@ -17,6 +52,7 @@ atlas/
│ ├── librarian-chunker/ # Content chunking
│ ├── librarian-extractor/ # Content extraction with AI
│ ├── librarian-scraper/ # Web scraping and crawling
│ ├── librarian-summarizer/ # Daily AI summarization
│ └── librarian-vspace/ # Vector space operations
```