diff --git a/docs/faq.md b/docs/faq.md
index ae865ee7..f43af868 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -131,3 +131,37 @@
 This can impact both installing Ollama, as well as downloading models.
 
 Open `Control Panel > Networking and Internet > View network status and tasks` and click on `Change adapter settings` on the left panel. Find the `vEthernel (WSL)` adapter, right click and select `Properties`. Click on `Configure` and open the `Advanced` tab. Search through each of the properties until you find `Large Send Offload Version 2 (IPv4)` and `Large Send Offload Version 2 (IPv6)`. *Disable* both of these properties.
+
+## How can I pre-load a model to get faster response times?
+
+If you are using the API, you can preload a model by sending the Ollama server an empty request. This works with both the `/api/generate` and `/api/chat` API endpoints.
+
+To preload the mistral model using the generate endpoint, use:
+```shell
+curl http://localhost:11434/api/generate -d '{"model": "mistral"}'
+```
+
+To use the chat completions endpoint, use:
+```shell
+curl http://localhost:11434/api/chat -d '{"model": "mistral"}'
+```
+
+## How do I keep a model loaded in memory or make it unload immediately?
+
+By default, models are kept in memory for 5 minutes before being unloaded. This allows for quicker response times if you are making numerous requests to the LLM. You may, however, want to free up the memory before the 5 minutes have elapsed, or to keep the model loaded indefinitely. Use the `keep_alive` parameter with either the `/api/generate` or `/api/chat` API endpoint to control how long the model is left in memory.
+
+The `keep_alive` parameter can be set to:
+* a duration string (such as "10m" or "24h")
+* a number in seconds (such as 3600)
+* any negative number, which will keep the model loaded in memory (e.g. -1 or "-1m")
+* '0', which will unload the model immediately after generating a response
+
+For example, to preload a model and leave it in memory, use:
+```shell
+curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": -1}'
+```
+
+To unload the model and free up memory, use:
+```shell
+curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
+```
\ No newline at end of file
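
The two new answers compose: `keep_alive` is an ordinary request parameter, so a single empty request should be able to both preload a model and set how long it stays resident. A minimal sketch of that combination, reusing the server address, models, and duration-string form from the examples above (that the two options can be combined in one call is an assumption, not something the added FAQ text states directly):

```shell
# Assumed combination: an empty request preloads mistral, and the
# duration string pins it in memory for 10 minutes.
curl http://localhost:11434/api/generate -d '{"model": "mistral", "keep_alive": "10m"}'

# The chat endpoint takes the same parameter.
curl http://localhost:11434/api/chat -d '{"model": "mistral", "keep_alive": "24h"}'
```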
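
Similarly, `keep_alive: 0` need not be a separate cleanup call: since the FAQ says '0' unloads the model immediately after generating a response, it should work alongside a real prompt, so the model answers and then frees its memory in one round trip. A sketch under that assumption (the prompt text is purely illustrative):

```shell
# One-shot generation: answer the prompt, then unload llama2 right away.
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "keep_alive": 0
}'
```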