6 Commits

herzogflorian
f4fdaab732 Add Open WebUI integration and enhance Streamlit app
- Add Open WebUI scripts (06-09) for server-hosted ChatGPT-like interface
  connected to the vLLM backend on port 7081
- Add context window management to chat (auto-trim, token counter, progress bar)
- Add terminal output panel to file editor for running Python/LaTeX files
- Update README with Open WebUI setup, architecture diagram, and troubleshooting
- Update STUDENT_GUIDE with step-by-step Open WebUI login instructions

Made-with: Cursor
2026-03-02 18:48:51 +01:00
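The context-window management this commit describes (auto-trim plus a token counter) could look roughly like the sketch below. This is a hypothetical reconstruction, not the repo's actual code: the ~4-characters-per-token estimate and the drop-oldest-first policy are assumptions; the real app may use a proper tokenizer.

```python
# Hypothetical sketch of the auto-trim step: keep system messages,
# drop the oldest chat turns until the history fits a token budget.

def estimate_tokens(text: str) -> int:
    """Rough token count: ~4 characters per token (an assumption)."""
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Drop oldest non-system messages until the history fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs: list[dict]) -> int:
        return sum(estimate_tokens(m["content"]) for m in msgs)

    while rest and total(system + rest) > budget:
        rest.pop(0)  # discard the oldest turn first
    return system + rest
```

The same `estimate_tokens` helper would also drive the token counter and progress bar mentioned in the commit.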
herzogflorian
d59285fe69 Update student guide with full app.py documentation
Add clone/venv setup instructions, feature descriptions for both tabs,
sidebar parameter table, and clarify that files stay local.

Made-with: Cursor
2026-03-02 16:43:21 +01:00
herzogflorian
deee5038d1 Update README to reflect current project state
Add Streamlit app section with setup, usage, and sidebar controls.
Document nightly Docker image requirement, scp workflow for server
sync, and practical troubleshooting tips from setup experience.

Made-with: Cursor
2026-03-02 16:42:33 +01:00
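The scp workflow this commit documents would be along these lines; the host, user, and remote path are placeholders, not values from the repo.

```shell
# Hypothetical sync step: copy the app files to the GPU server.
# Replace user, host, and target directory with your own.
scp app.py requirements.txt user@gpu-server:~/llm-app/
```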
herzogflorian
12f9e3ac9b Add LLM parameter controls to sidebar
Thinking mode toggle, temperature, max tokens, top_p, and presence
penalty sliders in the Streamlit sidebar. Parameters apply to both
chat and file editor generation.

Made-with: Cursor
2026-03-02 16:41:05 +01:00
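The sidebar parameters listed above have to be mapped onto the generation request somewhere. A hypothetical version of that mapping is sketched below; the function name and the thinking-mode mechanism (a Qwen-style `enable_thinking` chat-template flag passed via `extra_body`) are assumptions, not confirmed repo details.

```python
# Hypothetical mapping of sidebar values onto chat-completion parameters,
# shared by the chat tab and the file editor as the commit describes.

def build_request_params(thinking: bool, temperature: float,
                         max_tokens: int, top_p: float,
                         presence_penalty: float) -> dict:
    """Turn sidebar slider/toggle values into request kwargs."""
    params = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "presence_penalty": presence_penalty,
    }
    if not thinking:
        # Assumption: Qwen-style models accept a chat-template flag to
        # disable the thinking phase; the exact mechanism may differ.
        params["extra_body"] = {
            "chat_template_kwargs": {"enable_thinking": False}
        }
    return params
```

In Streamlit, each value would come from a widget such as `st.sidebar.slider("Temperature", 0.0, 2.0, 0.7)`.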
herzogflorian
9e1e0c0751 Add Streamlit chat app, update container to vLLM nightly
- Add app.py: Streamlit UI with chat and file editor tabs
- Add requirements.txt: streamlit + openai dependencies
- Update vllm_qwen.def: use nightly image for Qwen3.5 support
- Update README.md: reflect 35B-A3B model, correct script names
- Update STUDENT_GUIDE.md: add app usage and thinking mode docs
- Update .gitignore: exclude .venv/ and workspace/

Made-with: Cursor
2026-03-02 16:30:04 +01:00
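The app presumably talks to vLLM's OpenAI-compatible API (the repo uses the `openai` package per requirements.txt). As a dependency-free illustration of what such a request looks like, here is a stdlib-only sketch; the URL, model name wire format, and helper names are assumptions.

```python
# Stdlib sketch of a chat request against vLLM's OpenAI-compatible API.
# Port 7080 is taken from the inference-setup commit below.
import json
import urllib.request

VLLM_URL = "http://localhost:7080/v1/chat/completions"

def build_chat_payload(model: str, messages: list[dict], **sampling) -> bytes:
    """Assemble an OpenAI-compatible chat request body."""
    body = {"model": model, "messages": messages, **sampling}
    return json.dumps(body).encode("utf-8")

def chat(messages: list[dict], model: str = "Qwen3.5-35B-A3B",
         **sampling) -> str:
    """POST to the vLLM server and return the assistant reply.
    Requires the server from the setup scripts to be running."""
    req = urllib.request.Request(
        VLLM_URL,
        data=build_chat_payload(model, messages, **sampling),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

The real app would instead construct an `openai.OpenAI` client with `base_url` pointing at the same endpoint.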
herzogflorian
076001b07f Add vLLM inference setup for Qwen3.5-35B-A3B on Apptainer
Scripts to build container, download model, and serve Qwen3.5-35B-A3B
via vLLM with OpenAI-compatible API on port 7080. Configured for 2x
NVIDIA L40S GPUs with tensor parallelism, supporting ~15 concurrent
students.

Made-with: Cursor
2026-03-02 14:43:39 +01:00
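The serving setup this commit describes (two L40S GPUs, tensor parallelism, OpenAI-compatible API on port 7080) corresponds to a vLLM invocation roughly like the one below. This is one plausible command line, not the repo's actual script; the model path and additional flags may differ.

```shell
# Hypothetical vLLM launch matching the commit's description:
# 2-way tensor parallelism across the L40S GPUs, API on port 7080.
vllm serve Qwen/Qwen3.5-35B-A3B \
    --tensor-parallel-size 2 \
    --port 7080
```

Inside Apptainer, this would run within the container built from `vllm_qwen.def`.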