vLLM install

Create a virtual environment and install vLLM:

    cd ~/vLLM
    python3 -m venv venv
    source venv/bin/activate
    pip install torch --index-url https://download.pytorch.org/whl/cu118
    pip install vllm
    pip install huggingface_hub
    huggingface-cli login

Start the OpenAI-compatible API server:

    python -m vllm.entrypoints.openai.api_server \
      --model Qwen/Qwen2.5-9B-Instruct-AWQ \
      --quantization awq \
      --gpu-memory-utilization 0.9 \
      --max-model-len 4096

With ComfyUI

Step 1: Start vLLM with a memory limit

    python -m vllm.entrypoints.openai.api_server \
      --model Qwen/Qwen2.5-9B-Instruct-AWQ \
      --quantization awq \
      --gpu-memory-utilization 0.6 \
      --max-model-len 4096

This reserves ~60% of VRAM (~14GB) for vLLM.

ComfyUI will:
- Use the remaining VRAM (~10GB)
- Work fine for most workflows

If needed, set this before launching ComfyUI:

    export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

start-vllm.sh

    #!/bin/bash
    cd ~/ai-stack/vllm
    source venv/bin/activate
    python -m vllm.entrypoints.openai.api_server \
      --model Qwen/Qwen2.5-9B-Instruct-AWQ \
      --quantization awq \
      --gpu-memory-utilization 0.6

start-comfyui.sh

    #!/bin/bash
    cd ~/ai-stack/comfyui/ComfyUI
    source ../venv/bin/activate
    python main.py
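
The ~14GB / ~10GB split quoted for the ComfyUI setup is just the total VRAM multiplied by the --gpu-memory-utilization value. Here is a minimal sketch of that arithmetic. The function name vram_split is hypothetical, and the 24 GB total is an assumption inferred from the "~60% is ~14GB" figure, not something the post states.

```shell
#!/bin/bash
# vram_split TOTAL_GB UTILIZATION
# Prints two numbers: the GB reserved by vLLM and the GB left over for ComfyUI.
# Hypothetical helper for illustration; it only does arithmetic and does not
# query the GPU.
vram_split() {
  awk -v total="$1" -v util="$2" \
    'BEGIN { printf "%.1f %.1f\n", total * util, total * (1 - util) }'
}

# Assumed 24 GB card with --gpu-memory-utilization 0.6
vram_split 24 0.6
```

To try other values, pass a different total or fraction, for example vram_split 24 0.9 for the standalone vLLM launch. The total for a given card can be read with nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits, which reports MiB rather than GB.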
