herzogflorian 076001b07f Add vLLM inference setup for Qwen3.5-35B-A3B on Apptainer
Scripts to build container, download model, and serve Qwen3.5-35B-A3B
via vLLM with OpenAI-compatible API on port 7080. Configured for 2x
NVIDIA L40S GPUs with tensor parallelism, supporting ~15 concurrent
students.

Made-with: Cursor
2026-03-02 14:43:39 +01:00
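
The workflow the commit describes can be sketched roughly as below; this is an ops/config sketch, not the repo's actual scripts. The port (7080) and GPU count (2) come from the commit message, while the container image tag, the Hugging Face repo id, and the local paths are assumptions:

```shell
# Build the Apptainer image from the official vLLM OpenAI-server
# Docker image (tag is an assumption).
apptainer build vllm.sif docker://vllm/vllm-openai:latest

# Download the model weights into models/ (repo id is an assumption
# based on the model name in the commit subject).
huggingface-cli download Qwen/Qwen3.5-35B-A3B \
    --local-dir models/Qwen3.5-35B-A3B

# Serve with an OpenAI-compatible API on port 7080, sharding the
# model across the two L40S GPUs via tensor parallelism.
apptainer exec --nv vllm.sif \
    vllm serve models/Qwen3.5-35B-A3B \
    --tensor-parallel-size 2 \
    --port 7080
```

Once the server is up, clients can use the standard OpenAI endpoints, e.g. `curl http://localhost:7080/v1/models` to list the served model.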

.gitignore

# Apptainer container image (large binary)
*.sif
# Logs
logs/
# Model weights (downloaded separately)
models/
# HuggingFace cache
.cache/
# macOS
.DS_Store