OK, after installing vLLM, here is what I did:

1. Download the model:

    hf download cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit --local-dir models/gemma-4-26B-A4B-it-AWQ-4bit

2. Check that the model was actually downloaded into the directory specified.

3. Start the server. This is the tuned setting for me:

    vllm serve models/gemma-4-26B-A4B-it-AWQ-4bit \
      --served-model-name gemma-4-26B-A4B-it-AWQ-4bit \
      --max-model-len 20480 \
      --gpu-memory-utilization 0.9 \
      --enforce-eager \
      --enable-auto-tool-choice \
      --tool-call-parser gemma4 \
      --default-chat-template-kwargs '{"enable_thinking": true}'

Why this command works:

- --default-chat-template-kwargs: the global server flag that tells vLLM to pass enable_thinking=True to the Gemma 4 tokenizer's chat template every time it prepares a prompt, so thinking is on without each client having to request it.
- --enforce-eager: critical on an RTX 3090 (24 GB). It disables CUDA graph capture, which can save up to 2 GB of VRAM, at the cost of some decoding speed.
- --max-model-len 20480: the safe upper limit for context length on this card with these settings.

In openclaw, I need to make the following changes:

Setting         Value   Purpose
contextWindow   20480   T...
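For anyone who wants to script steps 2 and 3, here is a minimal sketch. It assumes the same directory and served name as in the post; `model_dir_looks_complete` and `build_serve_argv` are hypothetical helpers, and the "config.json plus at least one .safetensors file" rule is my own rough heuristic for a finished Hugging Face download, not an official check. Building the command from a list and letting `json.dumps` produce the kwargs avoids the usual quoting mistakes with the JSON argument.

```python
import json
import shlex
from pathlib import Path

# Values copied from the post; adjust to your own setup.
MODEL_DIR = "models/gemma-4-26B-A4B-it-AWQ-4bit"
SERVED_NAME = "gemma-4-26B-A4B-it-AWQ-4bit"


def model_dir_looks_complete(model_dir: str) -> bool:
    """Hypothetical heuristic: a finished download usually contains
    config.json plus at least one *.safetensors weight file."""
    path = Path(model_dir)
    if not path.is_dir():
        return False
    has_config = (path / "config.json").is_file()
    has_weights = any(path.glob("*.safetensors"))
    return has_config and has_weights


def build_serve_argv(model_dir: str, served_name: str, max_len: int = 20480,
                     gpu_util: float = 0.9,
                     enable_thinking: bool = True) -> list[str]:
    """Rebuild the `vllm serve` command from the post as an argv list."""
    # json.dumps always emits valid JSON, e.g. lowercase `true`.
    template_kwargs = json.dumps({"enable_thinking": enable_thinking})
    return [
        "vllm", "serve", model_dir,
        "--served-model-name", served_name,
        "--max-model-len", str(max_len),
        "--gpu-memory-utilization", str(gpu_util),
        "--enforce-eager",
        "--enable-auto-tool-choice",
        "--tool-call-parser", "gemma4",
        "--default-chat-template-kwargs", template_kwargs,
    ]


if __name__ == "__main__":
    if not model_dir_looks_complete(MODEL_DIR):
        print(f"warning: {MODEL_DIR} does not look like a complete download")
    # shlex.join single-quotes the JSON argument so the line can be pasted into a shell.
    print(shlex.join(build_serve_argv(MODEL_DIR, SERVED_NAME)))
```

With the defaults, the printed line matches the serve command above, written on a single line. Passing the list straight to `subprocess.run` would also work and skips shell quoting entirely.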