How to Load LLaMA 13B for Inference on a Single RTX 4090

May 15, 2023 by John Graff

These large language models are huge. Until fairly recently, running versions of them with a billion or more parameters on consumer hardware was unthinkable. Earlier this year, Facebook (or Meta, I guess) released its own LLM, called LLaMA. It comes in a number of different sizes, with the smallest at 7 billion parameters and the largest at a whopping …