GPU
- Mistral inference on local GPU hits OOM with 13B model
Last week I tried running the 13B Mistral model on a single RTX 3060 (12 GB). python main.py crashed instantly with:

    RuntimeError: CUDA out of memory. Tried to allocate 10.2 GiB

torch.cuda.mem_get_info() showed only 11.7 GB free.
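For a rough sense of why this runs out of memory, here is a back-of-the-envelope sketch that estimates the space the weights alone would need at different precisions. The helper names (weight_memory_gib, fits) and the 20% overhead for activations and KV cache are my own assumptions for illustration, not something taken from main.py. Real usage depends on context length and the loader.

```python
GIB = 1024 ** 3

def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bytes_per_param / GIB

def fits(n_params: float, bytes_per_param: float, free_bytes: float,
         overhead_fraction: float = 0.2) -> bool:
    """Rough check: weights plus an ASSUMED overhead margin vs. free memory.

    overhead_fraction is a guess covering activations, KV cache and
    allocator slack; it is not a measured value.
    """
    needed = n_params * bytes_per_param * (1 + overhead_fraction)
    return needed <= free_bytes

N_PARAMS = 13e9      # 13B parameters, as in the post
FREE_BYTES = 11.7e9  # free memory reported in the post; at runtime this
                     # could come from torch.cuda.mem_get_info()[0],
                     # which returns (free, total) in bytes

# (label, bytes per parameter)
PRECISIONS = [("fp32", 4), ("fp16", 2), ("int8", 1), ("4-bit", 0.5)]

for label, bpp in PRECISIONS:
    size = weight_memory_gib(N_PARAMS, bpp)
    verdict = "may fit" if fits(N_PARAMS, bpp, FREE_BYTES, 0.2) else "will not fit"
    print(f"{label:>5}: ~{size:.1f} GiB of weights, {verdict}")
```

Under these assumptions, fp16 weights for a 13B model come to roughly 24 GiB, about twice what the card has, so a 12 GB GPU would need something like 4-bit quantization or CPU offloading to load the model at all.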