Hugging Face Transformers crawling on WSL2 with CUDA

Ran a simple AutoModelForCausalLM.generate() call on a T4 and got roughly one token every five seconds.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch, time

model = AutoModelForCausalLM.from_pretrained("gpt2")  # note: never moved to the GPU
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# On the accidental +cpu wheel, is_available() is False, so everything
# silently runs on the CPU instead of erroring out on .to("cuda").
device = "cuda" if torch.cuda.is_available() else "cpu"

t0 = time.time()
model.generate(**tokenizer("hello", return_tensors="pt").to(device), max_new_tokens=1)
print(time.time() - t0)  # ~5 seconds for a single token

GPU showed zero utilisation.

Setup: Ubuntu 22.04 on WSL2, NVIDIA driver 531.79 installed on Windows, and nvidia-smi working inside WSL. The culprit: torch was 1.13.1+cpu, a build without CUDA support, installed by accident.
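
A quick way to confirm which build you have (the commented outputs are what my broken setup printed; yours will differ):

import torch

print(torch.__version__)          # "1.13.1+cpu" - the +cpu suffix means no CUDA support
print(torch.version.cuda)         # None on a CPU-only build, "11.7" on a cu117 build
print(torch.cuda.is_available())  # False until both the driver and a CUDA wheel are in place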

Fixed it by:

  • installing the WSL‑specific CUDA driver from NVIDIA
  • pip3 install --upgrade --index-url https://download.pytorch.org/whl/cu117 torch==2.0.1+cu117 torchvision torchaudio
  • verifying with python -c "import torch; print(torch.cuda.is_available())" which now prints True
  • re‑running the script (see the sketch below): generation time dropped to 0.03 seconds per token
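
The re-run looked roughly like this (a sketch of the fixed script; max_new_tokens=1 and the synchronize call are my additions, there to make the per-token timing honest):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch, time

model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")  # model pinned to the GPU
tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("hello", return_tensors="pt").to("cuda")      # inputs on the same device

t0 = time.time()
model.generate(**inputs, max_new_tokens=1)
torch.cuda.synchronize()  # CUDA calls are async; wait for the GPU before reading the clock
print(time.time() - t0)   # ~0.03 seconds on the T4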

If it still crawls, check that the model itself is pinned to the GPU with model.to("cuda"); moving only the inputs isn't enough. And watch which wheel you install: a CPU-only torch build is easy to pull in from pip without noticing.
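
A quick sanity check, assuming the model and inputs variables from the sketch above:

print(next(model.parameters()).device)  # should print cuda:0, not cpu
print(inputs["input_ids"].device)       # must match the model's device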
