Mistral · 15 Jan 2024
Mistral inference on local GPU hits OOM with 13B model
Last week I tried the 13B Mistral model on a single RTX 3060 (12 GB). Running python main.py crashed instantly with:

RuntimeError: CUDA out of memory. Tried to allocate 10.2 GiB

torch.cuda.mem_get_info() showed only 11.7 GB free.
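For a rough sanity check, here is a back-of-the-envelope estimate of how much memory the weights alone would need at different precisions. This is only a sketch: the function names are made up for illustration, the bytes-per-parameter figures are the usual nominal values (fp16 = 2, int8 = 1, 4-bit = 0.5), the 11.7 GB reading is treated as 11.7e9 bytes, and activations, KV cache, and CUDA context overhead are ignored, so real usage will be higher.

```python
# Rough estimate of weight memory for a model, weights only.
# Assumptions (not from the original post): nominal bytes per parameter,
# and "11.7 GB free" interpreted as 11.7e9 bytes.

GIB = 1024 ** 3  # bytes in one GiB


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


def weights_fit(num_params: float, bytes_per_param: float, free_bytes: float) -> bool:
    """Return True if the weights alone fit in free_bytes (no headroom counted)."""
    return num_params * bytes_per_param <= free_bytes


if __name__ == "__main__":
    num_params = 13e9   # 13B parameters, as stated in the post
    free_bytes = 11.7e9  # free memory reported in the post, assumed decimal GB

    # Hypothetical precision options for illustration.
    for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
        need = weight_memory_gib(num_params, bytes_per_param)
        ok = weights_fit(num_params, bytes_per_param, free_bytes)
        print(f"{name}: ~{need:.1f} GiB for weights, fits in free memory: {ok}")
```

Under these assumptions, fp16 weights for 13B parameters come to roughly 24 GiB, about twice what a 12 GB card has, so the OOM is expected unless the model is quantized or partly offloaded to CPU.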