LLM_Inferenz_Server_1

3 Commits 1 Branch 0 Tags

Author	SHA1	Message	Date
herzogflorian	12f9e3ac9b	Add LLM parameter controls to sidebar Thinking mode toggle, temperature, max tokens, top_p, and presence penalty sliders in the Streamlit sidebar. Parameters apply to both chat and file editor generation. Made-with: Cursor	2026-03-02 16:41:05 +01:00
herzogflorian	9e1e0c0751	Add Streamlit chat app, update container to vLLM nightly - Add app.py: Streamlit UI with chat and file editor tabs - Add requirements.txt: streamlit + openai dependencies - Update vllm_qwen.def: use nightly image for Qwen3.5 support - Update README.md: reflect 35B-A3B model, correct script names - Update STUDENT_GUIDE.md: add app usage and thinking mode docs - Update .gitignore: exclude .venv/ and workspace/ Made-with: Cursor	2026-03-02 16:30:04 +01:00
herzogflorian	076001b07f	Add vLLM inference setup for Qwen3.5-35B-A3B on Apptainer Scripts to build container, download model, and serve Qwen3.5-35B-A3B via vLLM with OpenAI-compatible API on port 7080. Configured for 2x NVIDIA L40S GPUs with tensor parallelism, supporting ~15 concurrent students. Made-with: Cursor	2026-03-02 14:43:39 +01:00

Author

SHA1

Message

Date

herzogflorian

12f9e3ac9b

Add LLM parameter controls to sidebar

Thinking mode toggle, temperature, max tokens, top_p, and presence
penalty sliders in the Streamlit sidebar. Parameters apply to both
chat and file editor generation.

Made-with: Cursor

2026-03-02 16:41:05 +01:00

herzogflorian

9e1e0c0751

Add Streamlit chat app, update container to vLLM nightly

- Add app.py: Streamlit UI with chat and file editor tabs
- Add requirements.txt: streamlit + openai dependencies
- Update vllm_qwen.def: use nightly image for Qwen3.5 support
- Update README.md: reflect 35B-A3B model, correct script names
- Update STUDENT_GUIDE.md: add app usage and thinking mode docs
- Update .gitignore: exclude .venv/ and workspace/

Made-with: Cursor

2026-03-02 16:30:04 +01:00

herzogflorian

076001b07f

Add vLLM inference setup for Qwen3.5-35B-A3B on Apptainer

Scripts to build container, download model, and serve Qwen3.5-35B-A3B
via vLLM with OpenAI-compatible API on port 7080. Configured for 2x
NVIDIA L40S GPUs with tensor parallelism, supporting ~15 concurrent
students.

Made-with: Cursor

2026-03-02 14:43:39 +01:00