Scripts to build a container, download the model, and serve Qwen3.5-35B-A3B via vLLM with an OpenAI-compatible API on port 7080. Configured for 2x NVIDIA L40S GPUs with tensor parallelism, supporting ~15 concurrent students. Made with Cursor.
15 lines
160 B
Plaintext
# Apptainer container image (large binary)
*.sif

# Logs
logs/

# Model weights (downloaded separately)
models/

# HuggingFace cache
.cache/

# macOS
.DS_Store
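The serving setup described at the top (vLLM, OpenAI-compatible API on port 7080, tensor parallelism across 2 GPUs) could be launched with something like the sketch below. This is an assumption based on the repo description, not the repo's actual serve script: the model path, concurrency cap, and flags are illustrative, using vLLM's standard `vllm serve` CLI.

```shell
# Hedged sketch of the serve command implied by the description above.
# Flags assume a standard vLLM install; adjust paths to the downloaded
# weights under models/ as appropriate.
vllm serve Qwen/Qwen3.5-35B-A3B \
  --tensor-parallel-size 2 \  # split across the 2x L40S GPUs
  --port 7080 \               # OpenAI-compatible API endpoint
  --max-num-seqs 16           # headroom for ~15 concurrent students
```

Clients then point any OpenAI-compatible SDK at `http://<host>:7080/v1`.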