diff --git a/README.md b/README.md
index 75d452d..7c6dd8d 100644
--- a/README.md
+++ b/README.md
@@ -2,12 +2,13 @@
 Self-hosted LLM inference for ~15 concurrent students using **Qwen3.5-35B-A3B**
 (MoE, 35B total / 3B active per token), served via **vLLM** inside an
-**Apptainer** container on a GPU server.
+**Apptainer** container on a GPU server. Includes a **Streamlit web app** for
+chat and file editing.
 
 ## Architecture
 
 ```
-Students (OpenAI SDK / curl)
+Students (Streamlit App / OpenAI SDK / curl)
        │
        ▼
 ┌──────────────────────────────┐
@@ -67,6 +68,10 @@ cd ~/LLM_local
 chmod +x *.sh
 ```
 
+> **Note**: `git` is not installed on the host. Use the container:
+> `apptainer exec vllm_qwen.sif git clone ...`
+> Or copy files via `scp` from your local machine.
+
 ### Step 2: Check GPU and Environment
 
 ```bash
@@ -81,9 +86,9 @@ df -h ~
 bash 01_build_container.sh
 ```
 
-Pulls the `vllm/vllm-openai:latest` Docker image, upgrades vLLM to nightly
-(required for Qwen3.5 support), installs latest `transformers` from source,
-and packages everything into `vllm_qwen.sif` (~8 GB). Takes 15-20 minutes.
+Pulls the `vllm/vllm-openai:nightly` Docker image (required for Qwen3.5
+support), installs latest `transformers` from source, and packages everything
+into `vllm_qwen.sif` (~8 GB). Takes 15-20 minutes.
 ### Step 4: Download the Model (~67 GB)
 
@@ -122,9 +127,11 @@ From another terminal on the server:
 curl http://localhost:7080/v1/models
 ```
 
-Or run the full test (uses `openai` SDK inside the container):
+Quick chat test:
 
 ```bash
-apptainer exec --writable-tmpfs vllm_qwen.sif python3 test_server.py
+curl http://localhost:7080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model":"qwen3.5-35b-a3b","messages":[{"role":"user","content":"Hello!"}],"max_tokens":128}'
 ```
 
 ### Step 7: Share with Students
 
@@ -135,7 +142,51 @@ Distribute `STUDENT_GUIDE.md` with connection details:
 
 ---
 
-## Configuration
+## Streamlit App
+
+A web-based chat and file editor that connects to the inference server.
+Students run it on their own machines.
+
+### Setup
+
+```bash
+pip install -r requirements.txt
+```
+
+Or with a virtual environment:
+
+```bash
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
+
+### Run
+
+```bash
+streamlit run app.py
+```
+
+Opens at `http://localhost:8501` with two tabs:
+
+- **Chat** — Conversational interface with streaming responses. Save the
+  model's last response directly into a workspace file (code auto-extracted).
+- **File Editor** — Create/edit `.py`, `.tex`, `.html`, or any text file.
+  Use "Generate with LLM" to modify files via natural language instructions.
+
+### Sidebar Controls
+
+| Parameter | Default | Range | Purpose |
+|-----------|---------|-------|---------|
+| Thinking Mode | Off | Toggle | Chain-of-thought reasoning (slower, better for complex tasks) |
+| Temperature | 0.7 | 0.0 – 2.0 | Creativity vs determinism |
+| Max Tokens | 4096 | 256 – 16384 | Maximum response length |
+| Top P | 0.95 | 0.0 – 1.0 | Nucleus sampling threshold |
+| Presence Penalty | 0.0 | 0.0 – 2.0 | Penalize repeated topics |
+
+---
+
+## Server Configuration
 
 All configuration is via environment variables passed to `03_start_server.sh`:
 
@@ -197,7 +248,9 @@ tmux attach -t llm
 | `03_start_server.sh` | Starts vLLM server (foreground) |
 | `04_start_server_background.sh` | Starts server in background with logging |
 | `05_stop_server.sh` | Stops the background server |
-| `test_server.py` | Tests the running server |
+| `app.py` | Streamlit chat & file editor web app |
+| `requirements.txt` | Python dependencies for the Streamlit app |
+| `test_server.py` | Tests the running server via CLI |
 | `STUDENT_GUIDE.md` | Instructions for students |
 
 ---
 
@@ -210,7 +263,7 @@ tmux attach -t llm
 ### Container build fails
 - Ensure internet access and sufficient disk space (~20 GB for build cache)
-- Try pulling manually first: `apptainer pull docker://vllm/vllm-openai:latest`
+- Try pulling manually first: `apptainer pull docker://vllm/vllm-openai:nightly`
 
 ### "No NVIDIA GPU detected"
 - Verify `nvidia-smi` works on the host
@@ -218,7 +271,7 @@ tmux attach -t llm
 - Test: `apptainer exec --nv vllm_qwen.sif nvidia-smi`
 
 ### "Model type qwen3_5_moe not recognized"
-- The container needs vLLM nightly and latest transformers
+- The container needs `vllm/vllm-openai:nightly` (not `:latest`)
 - Rebuild the container: `rm vllm_qwen.sif && bash 01_build_container.sh`
 
 ### Students can't connect
@@ -229,4 +282,11 @@ tmux attach -t llm
 ### Slow generation with many users
 - Expected — vLLM batches requests but throughput is finite
 - The MoE architecture (3B active) helps with per-token speed
+- Disable thinking mode for faster simple responses
 - Monitor: `curl http://localhost:7080/metrics`
+
+### Syncing files to the server
+- No `git` or `pip` on the host — use `scp` from your local machine:
+```bash
+scp app.py 03_start_server.sh herzogfloria@silicon.fhgr.ch:~/LLM_local/
+```
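
The "Quick chat test" `curl` call in this diff can also be scripted. Below is a minimal stdlib-only Python sketch that sends the same request; the port `7080`, model name `qwen3.5-35b-a3b`, and payload fields come from the diff above, while the function names and the host default are illustrative assumptions (error handling omitted):

```python
import json
import urllib.request

BASE_URL = "http://localhost:7080/v1"  # assumption: adjust host/port to your server


def build_chat_request(prompt: str, max_tokens: int = 128) -> dict:
    """Build the same payload as the 'Quick chat test' curl example."""
    return {
        "model": "qwen3.5-35b-a3b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str) -> str:
    """POST to /v1/chat/completions and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Requires a running server (Step 5); prints the model's reply.
    print(chat("Hello!"))
```

Because the payload construction is separated from the transport, the same request dict can be reused with the `openai` SDK by pointing its `base_url` at the server, as `STUDENT_GUIDE.md` connections do.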