LLM_Inferenz_Server_1

herzogfloria/LLM_Inferenz_Server_1

Fork 0

Commit Graph

Author	SHA1	Message	Date
herzogflorian	076001b07f	Add vLLM inference setup for Qwen3.5-35B-A3B on Apptainer Scripts to build container, download model, and serve Qwen3.5-35B-A3B via vLLM with OpenAI-compatible API on port 7080. Configured for 2x NVIDIA L40S GPUs with tensor parallelism, supporting ~15 concurrent students. Made-with: Cursor	2026-03-02 14:43:39 +01:00

Author

SHA1

Message

Date

herzogflorian

076001b07f

Add vLLM inference setup for Qwen3.5-35B-A3B on Apptainer

Scripts to build container, download model, and serve Qwen3.5-35B-A3B
via vLLM with OpenAI-compatible API on port 7080. Configured for 2x
NVIDIA L40S GPUs with tensor parallelism, supporting ~15 concurrent
students.

Made-with: Cursor

2026-03-02 14:43:39 +01:00

1 Commits