Hugging Face Transformers crawling on WSL2 with CUDA
Ran a simple AutoModelForCausalLM generate on a T4 but got one token every five seconds.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch, time

# falls back to cpu silently when torch is a CPU-only build
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

t0 = time.time()
model.generate(**tokenizer("hello", return_tensors="pt").to(device))
print(time.time() - t0)  # ~5 seconds for a single token
GPU showed zero utilisation.
Setup: Ubuntu 22.04 on WSL2, NVIDIA driver 531.79 installed on Windows, and nvidia-smi inside WSL worked. The culprit: torch was 1.13.1+cpu, a CPU-only wheel installed by accident.
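A CPU-only build announces itself in the version string, so a one-liner catches it early (torch.version.cuda comes back None on CPU wheels):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

On the broken install this prints 1.13.1+cpu None False.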
Fixed it by:
- installing the WSL‑specific CUDA driver from NVIDIA (it lives on the Windows side; the WSL guest gets no separate driver)
- replacing the accidental CPU wheel with a CUDA build:
  pip3 install --upgrade --index-url https://download.pytorch.org/whl/cu117 torch==2.0.1+cu117 torchvision torchaudio
- verifying with
  python -c "import torch; print(torch.cuda.is_available())"
  which now prints True
- re‑running: generation time dropped to ~0.03 seconds per token (timing sketch below)
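For the before/after numbers, a rough timing sketch; the torch.cuda.synchronize() calls matter because CUDA launches are asynchronous and the wall clock would otherwise stop too early. Model and prompt are just the ones from the repro:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch, time

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")
inputs = tokenizer("hello", return_tensors="pt").to("cuda")

n_new = 32
torch.cuda.synchronize()  # flush pending kernels before starting the clock
t0 = time.time()
model.generate(**inputs, max_new_tokens=n_new)
torch.cuda.synchronize()  # wait for generation to actually finish on the GPU
print(f"{(time.time() - t0) / n_new:.3f} s/token")  # ~0.03 on the T4 after the fix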
If it still crawls, check that the model itself is pinned to the GPU with model.to("cuda"); moving only the inputs over isn't enough. And watch the version string: pip hands out CPU-only wheels easily, and the only giveaway is the +cpu suffix.
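A quick way to confirm where the weights actually live (assuming the model object from the repro above):

print(next(model.parameters()).device)  # cuda:0 when pinned correctly, cpu otherwise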