LLM_Inferenz_Server_1

Author	SHA1	Message	Date
herzogflorian	eff76401ee	Add Qwen3.5-122B-A10B-FP8 model support - Add download script (10), start script (11), and background launcher (12) for the 122B FP8 model using all 4 GPUs with TP=4 - Both models share port 7080; only one runs at a time - Update README with dual-model hardware table, switching workflow, and updated file overview - Update STUDENT_GUIDE with both model names and discovery instructions Made-with: Cursor	2026-03-02 19:00:32 +01:00
herzogflorian	f4fdaab732	Add Open WebUI integration and enhance Streamlit app - Add Open WebUI scripts (06-09) for server-hosted ChatGPT-like interface connected to the vLLM backend on port 7081 - Add context window management to chat (auto-trim, token counter, progress bar) - Add terminal output panel to file editor for running Python/LaTeX files - Update README with Open WebUI setup, architecture diagram, and troubleshooting - Update STUDENT_GUIDE with step-by-step Open WebUI login instructions Made-with: Cursor	2026-03-02 18:48:51 +01:00
herzogflorian	d59285fe69	Update student guide with full app.py documentation Add clone/venv setup instructions, feature descriptions for both tabs, sidebar parameter table, and clarify that files stay local. Made-with: Cursor	2026-03-02 16:43:21 +01:00
herzogflorian	9e1e0c0751	Add Streamlit chat app, update container to vLLM nightly - Add app.py: Streamlit UI with chat and file editor tabs - Add requirements.txt: streamlit + openai dependencies - Update vllm_qwen.def: use nightly image for Qwen3.5 support - Update README.md: reflect 35B-A3B model, correct script names - Update STUDENT_GUIDE.md: add app usage and thinking mode docs - Update .gitignore: exclude .venv/ and workspace/ Made-with: Cursor	2026-03-02 16:30:04 +01:00
herzogflorian	076001b07f	Add vLLM inference setup for Qwen3.5-35B-A3B on Apptainer Scripts to build container, download model, and serve Qwen3.5-35B-A3B via vLLM with OpenAI-compatible API on port 7080. Configured for 2x NVIDIA L40S GPUs with tensor parallelism, supporting ~15 concurrent students. Made-with: Cursor	2026-03-02 14:43:39 +01:00

5 Commits