Hugging Face Transformers crawling on WSL2 with CUDA
Ran a simple AutoModelForCausalLM generate on a T4 but got one token every five seconds:

```
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch, time

# On a CPU-only wheel torch.cuda.is_available() is False, so everything
# silently falls back to CPU; that is exactly what happened here.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)  # move the weights, not just the inputs
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("hello", return_tensors="pt").to(device)
t0 = time.time()
model.generate(**inputs)
print(time.time() - t0)  # ~5 seconds for a single token
```
GPU showed zero utilisation.
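A quick way to confirm why: check where the model's weights actually live. A minimal sketch, assuming nothing beyond the snippet above:

```
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
# from_pretrained loads weights onto the CPU by default; the GPU stays
# idle unless the model is explicitly moved there.
print(next(model.parameters()).device)  # cpu
```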
Ubuntu 22.04 on WSL2. NVIDIA driver 531.79 installed on Windows, and nvidia-smi inside WSL worked. The culprit: torch was 1.13.1+cpu, a build without CUDA support, installed by accident.
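The CPU build shows up directly in torch's own version metadata; a quick check:

```
import torch
print(torch.__version__)          # "1.13.1+cpu", the "+cpu" suffix is the tell
print(torch.version.cuda)         # None on a CPU-only build
print(torch.cuda.is_available())  # False, even though nvidia-smi works
```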
Fixed it by:
- installing the WSL-specific CUDA package from NVIDIA (the actual GPU driver stays on the Windows side; NVIDIA's CUDA-on-WSL docs warn against installing a Linux display driver inside WSL)
- reinstalling PyTorch from the cu117 wheel index:

```
pip3 install --upgrade --index-url https://download.pytorch.org/whl/cu117 torch==2.0.1+cu117 torchvision torchaudio
```

- verifying with `python -c "import torch; print(torch.cuda.is_available())"`, which now prints `True`
- re-running: generation time dropped to ~0.03 seconds per token (timing sketch below)
If it still crawls, check that the model itself is pinned to the GPU with model.to("cuda"); moving only the inputs isn't enough. The CPU-only wheel is easy to pull in by accident with a plain pip install, and the fallback is silent.
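A cheap guard at startup keeps the silent fallback from sneaking back in; a sketch, with the assertion messages my own:

```
import torch
from transformers import AutoModelForCausalLM

assert torch.cuda.is_available(), "CUDA unavailable: CPU-only torch build or driver problem"
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")
assert next(model.parameters()).device.type == "cuda", "model weights are still on the CPU"
```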
