LLM_Inferenz_Server_1/vllm_qwen.def
herzogflorian 076001b07f Add vLLM inference setup for Qwen3.5-35B-A3B on Apptainer
Scripts to build container, download model, and serve Qwen3.5-35B-A3B
via vLLM with OpenAI-compatible API on port 7080. Configured for 2x
NVIDIA L40S GPUs with tensor parallelism, supporting ~15 concurrent
students.

Made-with: Cursor
2026-03-02 14:43:39 +01:00

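Before the definition file itself, a minimal sketch of the intended build/download/serve workflow described above. The Hugging Face repository id `Qwen/Qwen3.5-35B-A3B` and the cache path are assumptions for illustration, not taken from the file; `--tensor-parallel-size 2` matches the 2x L40S setup and `--port 7080` the documented API port:

```shell
# Build the container image from the definition file
# (needs root, or use --fakeroot on shared systems)
apptainer build vllm_qwen.sif vllm_qwen.def

# Pre-download model weights into the HF cache used by the container
# (repo id is assumed; replace with the actual Qwen3.5-35B-A3B repository)
apptainer exec vllm_qwen.sif \
    huggingface-cli download Qwen/Qwen3.5-35B-A3B --cache-dir /tmp/hf_cache

# Serve the model across both GPUs; extra arguments are passed
# through the %runscript to vllm.entrypoints.openai.api_server
apptainer run --nv vllm_qwen.sif \
    --model Qwen/Qwen3.5-35B-A3B \
    --tensor-parallel-size 2 \
    --host 0.0.0.0 \
    --port 7080
```

The `--nv` flag binds the host NVIDIA driver stack into the container, which is required for vLLM to see the GPUs.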

Bootstrap: docker
From: vllm/vllm-openai:latest

%labels
    Author herzogflorian
    Description vLLM nightly inference server for Qwen3.5-35B-A3B
    Version 2.0

%environment
    export HF_HOME=/tmp/hf_cache
    export VLLM_USAGE_SOURCE=production

%post
    apt-get update && apt-get install -y --no-install-recommends git && rm -rf /var/lib/apt/lists/*
    pip install --no-cache-dir vllm --extra-index-url https://wheels.vllm.ai/nightly
    pip install --no-cache-dir "transformers @ git+https://github.com/huggingface/transformers.git@main"
    pip install --no-cache-dir "huggingface_hub[cli]"

%runscript
    exec python3 -m vllm.entrypoints.openai.api_server "$@"

%help
    Apptainer container for serving Qwen3.5-35B-A3B via vLLM (nightly).
    Run with --nv and pass vLLM server arguments (model, port, tensor
    parallelism) to the runscript.
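
Once the server is running, clients can use any OpenAI-compatible SDK or plain HTTP. A sketch of a request against the documented port 7080 (the model name is an assumption and must match whatever `--model` the server was started with):

```shell
# Chat completion against the vLLM OpenAI-compatible endpoint
curl http://localhost:7080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "Qwen/Qwen3.5-35B-A3B",
          "messages": [{"role": "user", "content": "Hello!"}],
          "max_tokens": 64
        }'
```

Because the API mirrors OpenAI's, the ~15 students can point standard OpenAI client libraries at `http://<host>:7080/v1` with a dummy API key.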