ChatGPT can give some impressive results, and also sometimes some very poor advice. But while it's free to talk with ChatGPT in theory, often you end up with messages about the system being at capacity, or hitting your maximum number of chats for the day, with a prompt to subscribe to ChatGPT Plus. Also, all of your queries are taking place on ChatGPT's servers, which means that you need an Internet connection and that OpenAI can see what you're doing. Fortunately, there are ways to run a ChatGPT-like LLM (Large Language Model) on your local PC, using the power of your GPU.

The oobabooga text generation webui might be just what you're after, so we ran some tests to find out what it could - and couldn't! - do, which means we also have some benchmarks.

Getting the webui running wasn't quite as simple as we had hoped, in part due to how fast everything is moving within the LLM space. There are the basic instructions in the readme, the one-click installers, and then multiple guides for how to build and run the LLaMa 4-bit models. We encountered varying degrees of success/failure, but with some help from Nvidia and others, we finally got things working. And then the repository was updated and our instructions broke, but a workaround/fix was posted today.

It's like running Linux and only Linux, and then wondering how to play the latest games. Sometimes you can get it working, other times you're presented with error messages and compiler warnings that you have no idea how to solve. We'll provide our version of instructions below for those who want to give this a shot on their own PCs. You may also find some helpful people in the LMSys Discord, who were good about helping me with some of my questions.

It might seem obvious, but let's also just get this out of the way: You'll need a GPU with a lot of memory, and probably a lot of system memory as well, should you want to run a large language model on your own hardware - it's right there in the name. A lot of the work to get things running on a single GPU (or a CPU) has focused on reducing the memory requirements.

Using the base models with 16-bit data, for example, the best you can do with an RTX 4090, RTX 3090 Ti, RTX 3090, or Titan RTX - cards that all have 24GB of VRAM - is to run the model with seven billion parameters (LLaMa-7b). That's a start, but very few home users are likely to have such a graphics card, and it runs quite poorly. Loading the model with 8-bit precision cuts the VRAM requirements in half, meaning you could run LLaMa-7b with many of the best graphics cards - anything with at least 10GB VRAM could potentially suffice. Even better, loading the model with 4-bit precision halves the VRAM requirements yet again, allowing LLaMa-13b to work on 10GB of VRAM. (You'll also need a decent amount of system memory, 32GB or more most likely - that's what we used, at least.)

Getting the models isn't too difficult, at least, but they can be very large. LLaMa-13b, for example, consists of a 36.3 GiB download for the main data, and then another 6.5 GiB for the pre-quantized 4-bit model. Do you have a graphics card with 24GB of VRAM and 64GB of system memory? Then the 30 billion parameter model is only a 75.7 GiB download, and another 15.7 GiB for the 4-bit stuff. There's even a 65 billion parameter model, in case you have an Nvidia A100 40GB PCIe card handy, along with 128GB of system memory (well, 128GB of memory plus swap space). Hopefully the people downloading these models don't have a data cap on their internet connection.

Testing Text Generation Web UI Performance

In theory, you can get the text generation web UI running on Nvidia's GPUs via CUDA, or AMD's graphics cards via ROCm. The latter requires running Linux, and after fighting with that stuff to do Stable Diffusion benchmarks earlier this year, I just gave it a pass for now - so I'm sticking with Nvidia GPUs. If you have working instructions on how to get it running (under Windows 11, though using WSL2 is allowed) and you want me to try them, hit me up and I'll give it a shot.
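The memory figures behind those VRAM requirements are simple arithmetic: each parameter takes 2 bytes at 16-bit precision, 1 byte at 8-bit, and half a byte at 4-bit. Here's a rough, weights-only sketch of that math - real usage runs higher, since activations, context cache, and framework overhead all sit on top of the weights, so treat these as lower bounds rather than exact requirements.

```python
# Back-of-the-envelope VRAM estimate for holding model weights only.
# Actual memory use is higher (activations, context cache, overhead).

GIB = 1024 ** 3  # bytes per GiB

def weight_vram_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate GiB needed just to hold the weights in memory."""
    return n_params * bits_per_weight / 8 / GIB

for n_params, name in [(7e9, "LLaMa-7b"), (13e9, "LLaMa-13b"),
                       (30e9, "LLaMa-30b"), (65e9, "LLaMa-65b")]:
    print(f"{name}: "
          f"{weight_vram_gib(n_params, 16):5.1f} GiB at 16-bit, "
          f"{weight_vram_gib(n_params, 8):5.1f} GiB at 8-bit, "
          f"{weight_vram_gib(n_params, 4):5.1f} GiB at 4-bit")
```

This lines up with the sizes discussed in the article: LLaMa-7b at 16-bit needs about 13 GiB of weights (hence a 24GB card can just manage it), while LLaMa-13b at 4-bit shrinks to roughly 6 GiB, close to the size of the pre-quantized 4-bit download.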