Auto-detect available models from the vLLM API instead of hardcoding.
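vLLM's OpenAI-compatible server exposes the served models at `GET /v1/models`; a minimal sketch of picking a model id from that response (the payload below is a hand-written example shaped like vLLM's response, and `pick_default_model` is a hypothetical helper name):

```python
import json

def pick_default_model(models_payload: dict) -> str:
    """Return the first model id from an OpenAI-compatible /v1/models response."""
    ids = [m["id"] for m in models_payload.get("data", [])]
    if not ids:
        raise RuntimeError("vLLM returned no models")
    return ids[0]

# Example payload shaped like vLLM's /v1/models response
sample = json.loads(
    '{"object": "list", "data": [{"id": "Qwen3.5-35B-A3B", "object": "model"}]}'
)
print(pick_default_model(sample))  # Qwen3.5-35B-A3B
```

In the app this payload would come from an HTTP GET against the vLLM base URL rather than a literal string.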
Extract code blocks by matching on language tag and picking the largest
block, avoiding false matches on short pip/run commands.
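The largest-block heuristic can be sketched with a regex over fenced blocks (function name and the default `lang` are illustrative):

```python
import re
from typing import Optional

def extract_largest_code_block(text: str, lang: str = "python") -> Optional[str]:
    """Return the largest fenced block tagged with `lang`, or None.

    Picking the largest block skips short install/run snippets that
    happen to carry the same language tag.
    """
    blocks = re.findall(rf"```{re.escape(lang)}\n(.*?)```", text, flags=re.DOTALL)
    return max(blocks, key=len) if blocks else None
```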
Made-with: Cursor
- Add download script (10), start script (11), and background launcher (12)
for the 122B FP8 model using all 4 GPUs with TP=4
- Both models share port 7080; only one runs at a time
- Update README with dual-model hardware table, switching workflow, and
updated file overview
- Update STUDENT_GUIDE with both model names and discovery instructions
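Since both models share port 7080 and only one may run at a time, a launcher can refuse to start while the other model is still up; a minimal sketch of such a guard (`port_in_use` is a hypothetical helper, not part of the actual scripts):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

# A launcher would bail out here instead of starting a second server.
print("7080 busy:", port_in_use(7080))
```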
Made-with: Cursor
- Add Open WebUI scripts (06-09) for server-hosted ChatGPT-like interface
connected to the vLLM backend on port 7081
- Add context window management to chat (auto-trim, token counter, progress bar)
- Add terminal output panel to file editor for running Python/LaTeX files
- Update README with Open WebUI setup, architecture diagram, and troubleshooting
- Update STUDENT_GUIDE with step-by-step Open WebUI login instructions
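The auto-trim step can be sketched as dropping the oldest non-system messages until the history fits a budget. The 4-characters-per-token estimate below is a rough placeholder heuristic; the real counter would use the model's tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 chars/token); a real counter would use the tokenizer.
    return max(1, len(text) // 4)

def trim_history(messages: list, max_tokens: int) -> list:
    """Drop the oldest non-system messages until the history fits."""
    kept = list(messages)
    total = lambda msgs: sum(estimate_tokens(m["content"]) for m in msgs)
    while len(kept) > 1 and total(kept) > max_tokens:
        # Preserve a leading system prompt if present.
        idx = 1 if kept[0]["role"] == "system" else 0
        if idx >= len(kept):
            break
        kept.pop(idx)
    return kept
```

The same token total drives the counter and progress bar in the sidebar.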
Made-with: Cursor
Add Streamlit app section with setup, usage, and sidebar controls.
Document nightly Docker image requirement, scp workflow for server
sync, and practical troubleshooting tips from setup experience.
Made-with: Cursor
Add thinking mode toggle, temperature, max tokens, top_p, and presence
penalty sliders to the Streamlit sidebar. Parameters apply to both
chat and file editor generation.
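The slider values map directly onto an OpenAI-style chat.completions payload; a sketch of that assembly. The `chat_template_kwargs`/`enable_thinking` key is one convention used with Qwen-style models on vLLM and is an assumption here, as is the helper name:

```python
def build_request(prompt: str, *, thinking: bool, temperature: float,
                  max_tokens: int, top_p: float, presence_penalty: float) -> dict:
    """Assemble an OpenAI-style chat.completions payload from sidebar values."""
    return {
        "model": "Qwen3.5-35B-A3B",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "presence_penalty": presence_penalty,
        # Assumed convention: extra kwargs forwarded to the chat template
        # toggle the model's reasoning output.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }
```

Chat and the file editor would both call this with the current sidebar state.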
Made-with: Cursor
Add scripts to build the container, download the model, and serve
Qwen3.5-35B-A3B via vLLM with an OpenAI-compatible API on port 7080.
Configured for 2x NVIDIA L40S GPUs with tensor parallelism, supporting
~15 concurrent students.
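The start script reduces to a `vllm serve` invocation; a sketch of how that command line is assembled (`--port` and `--tensor-parallel-size` are real vLLM serve flags; the helper name is illustrative):

```python
import shlex

def serve_command(model: str, port: int = 7080, tp: int = 2) -> str:
    """Build the `vllm serve` command used by the start script (sketch)."""
    cmd = [
        "vllm", "serve", model,
        "--port", str(port),
        "--tensor-parallel-size", str(tp),  # split weights across both L40S GPUs
    ]
    return shlex.join(cmd)

print(serve_command("Qwen3.5-35B-A3B"))
# vllm serve Qwen3.5-35B-A3B --port 7080 --tensor-parallel-size 2
```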
Made-with: Cursor