- Add download script (10), start script (11), and background launcher (12)
for the 122B FP8 model using all 4 GPUs with TP=4
- Both models share port 7080; only one runs at a time
- Update README with dual-model hardware table, switching workflow, and
updated file overview
- Update STUDENT_GUIDE with both model names and discovery instructions
Made-with: Cursor
- Add Open WebUI scripts (06-09) for server-hosted ChatGPT-like interface
connected to the vLLM backend on port 7081
- Add context window management to chat (auto-trim, token counter, progress bar)
- Add terminal output panel to file editor for running Python/LaTeX files
- Update README with Open WebUI setup, architecture diagram, and troubleshooting
- Update STUDENT_GUIDE with step-by-step Open WebUI login instructions
Made-with: Cursor
Scripts to build container, download model, and serve Qwen3.5-35B-A3B
via vLLM with OpenAI-compatible API on port 7080. Configured for 2x
NVIDIA L40S GPUs with tensor parallelism, supporting ~15 concurrent
students.
Made-with: Cursor