ollama
# Ollama [![Discord](https://dcbadge.vercel.app/api/server/ollama?style=flat&compact=true)](https://discord.gg/ollama) Get up and running with large language models. ### Linux with rx580 Radeon GPU This branch is has had changes for building on amd64 architecture(arm has been commented out in the Docker file) so as Ollama works with rx590 Redeon GPU. It should be considered experimental. I've only been testing using the docker build. Make sure docker is installed and running ok, and the docker host machine has rocm 5.7.1 libraries installed. Follow this documentation for rocm installation, just substitute the 5.7.0 references to 5.7.1 in the documentation. --https://rocm.docs.amd.com/en/docs-5.7.0/deploy/linux/os-native/install.html To build ``` ./scripts/build_docker.sh ``` After that has compiled successfully Then to start a container using the image ``` docker run -e HIP_PATH=/opt/rocm/lib/ -e LD_LIBRARY_PATH=/opt/rocm/lib --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama_gpu ollama/release:0.3.10-rc1-2-g56318fb-dirty-rocm ``` But make sure to change the tag "0.3.10-rc1-2-g56318fb-dirty-rocm" to what gets built from your build process. This is shown in the last phase of the build where it exports the images. The debug info that gets output should look something like: ``` docker run -e HIP_PATH=/opt/rocm/lib/ -e LD_LIBRARY_PATH=/opt/rocm/lib --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama_gpu_2 ollama/release:3449201-rocm 2024/09/15 14:56:41 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]" time=2024-09-15T14:56:41.304Z level=INFO source=images.go:753 msg="total blobs: 18" time=2024-09-15T14:56:41.307Z level=INFO source=images.go:760 msg="total unused blobs removed: 0" time=2024-09-15T14:56:41.307Z level=INFO source=routes.go:1172 msg="Listening on [::]:11434 (version 3449201)" time=2024-09-15T14:56:41.308Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2706594826/runners time=2024-09-15T14:56:51.283Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v0 cpu]" time=2024-09-15T14:56:51.283Z level=INFO source=gpu.go:200 msg="looking for compatible GPUs" time=2024-09-15T14:56:51.296Z level=WARN source=amd_linux.go:59 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-09-15T14:56:51.308Z level=INFO source=amd_linux.go:345 msg="amdgpu is supported" gpu=0 gpu_type=gfx803 time=2024-09-15T14:56:51.308Z level=INFO source=types.go:107 msg="inference compute" id=0 library=rocm variant="" compute=gfx803 driver=0.0 name=1002:67df total="8.0 GiB" available="8.0 GiB" [GIN] 2024/09/15 - 14:57:20 | 200 | 46.11µs | 127.0.0.1 | HEAD "/" [GIN] 2024/09/15 - 14:57:20 | 200 | 24.189203ms | 127.0.0.1 | POST "/api/show" ``` Once running, in another terminal window, test it out: ``` docker exec -it ollama_gpu ollama run llama3.1 ``` Checkout the debug log again, should look something like: ``` time=2024-09-15T14:57:20.500Z level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe gpu=0 parallel=4 available=8584495104 required="6.2 GiB" time=2024-09-15T14:57:20.500Z level=INFO source=server.go:101 msg="system memory" total="15.6 GiB" free="14.6 GiB" free_swap="46.5 GiB" time=2024-09-15T14:57:20.500Z level=INFO source=memory.go:326 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[8.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="6.2 GiB" memory.required.partial="6.2 GiB" memory.required.kv="1.0 GiB" memory.required.allocations="[6.2 GiB]" memory.weights.total="4.7 GiB" memory.weights.repeating="4.3 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="560.0 MiB" memory.graph.partial="677.5 MiB" time=2024-09-15T14:57:20.503Z level=INFO source=server.go:391 msg="starting llama server" cmd="/tmp/ollama2706594826/runners/rocm_v0/ollama_llama_server --model /root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --parallel 4 --port 43843" time=2024-09-15T14:57:20.503Z level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2024-09-15T14:57:20.503Z level=INFO source=server.go:590 msg="waiting for llama runner to start responding" time=2024-09-15T14:57:20.503Z level=INFO source=server.go:624 msg="waiting for server to become available" status="llm server error" INFO [main] build info | build=3661 commit="8962422b" tid="126494289312832" timestamp=1726412240 INFO [main] system info | n_threads=4 n_threads_batch=4 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="126494289312832" timestamp=1726412240 total_threads=8 INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="7" port="43843" tid="126494289312832" timestamp=1726412240 llama_model_loader: loaded meta data with 29 key-value pairs and 292 tensors from /root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 8B Instruct llama_model_loader: - kv 3: general.finetune str = Instruct llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1 llama_model_loader: - kv 5: general.size_label str = 8B llama_model_loader: - kv 6: general.license str = llama3.1 llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam... llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ... llama_model_loader: - kv 9: llama.block_count u32 = 32 llama_model_loader: - kv 10: llama.context_length u32 = 131072 llama_model_loader: - kv 11: llama.embedding_length u32 = 4096 llama_model_loader: - kv 12: llama.feed_forward_length u32 = 14336 llama_model_loader: - kv 13: llama.attention.head_count u32 = 32 llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 17: general.file_type u32 = 2 llama_model_loader: - kv 18: llama.vocab_size u32 = 128256 llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... time=2024-09-15T14:57:21.006Z level=INFO source=server.go:624 msg="waiting for server to become available" status="llm server loading model" llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009 llama_model_loader: - kv 27: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ... llama_model_loader: - kv 28: general.quantization_version u32 = 2 llama_model_loader: - type f32: 66 tensors llama_model_loader: - type q4_0: 225 tensors llama_model_loader: - type q6_K: 1 tensors llm_load_vocab: special tokens cache size = 256 llm_load_vocab: token to piece cache size = 0.7999 MB llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = llama llm_load_print_meta: vocab type = BPE llm_load_print_meta: n_vocab = 128256 llm_load_print_meta: n_merges = 280147 llm_load_print_meta: vocab_only = 0 llm_load_print_meta: n_ctx_train = 131072 llm_load_print_meta: n_embd = 4096 llm_load_print_meta: n_layer = 32 llm_load_print_meta: n_head = 32 llm_load_print_meta: n_head_kv = 8 llm_load_print_meta: n_rot = 128 llm_load_print_meta: n_swa = 0 llm_load_print_meta: n_embd_head_k = 128 llm_load_print_meta: n_embd_head_v = 128 llm_load_print_meta: n_gqa = 4 llm_load_print_meta: n_embd_k_gqa = 1024 llm_load_print_meta: n_embd_v_gqa = 1024 llm_load_print_meta: f_norm_eps = 0.0e+00 llm_load_print_meta: f_norm_rms_eps = 1.0e-05 llm_load_print_meta: f_clamp_kqv = 0.0e+00 llm_load_print_meta: f_max_alibi_bias = 0.0e+00 llm_load_print_meta: f_logit_scale = 0.0e+00 llm_load_print_meta: n_ff = 14336 llm_load_print_meta: n_expert = 0 llm_load_print_meta: n_expert_used = 0 llm_load_print_meta: causal attn = 1 llm_load_print_meta: pooling type = 0 llm_load_print_meta: rope type = 0 llm_load_print_meta: rope scaling = linear llm_load_print_meta: freq_base_train = 500000.0 llm_load_print_meta: freq_scale_train = 1 llm_load_print_meta: n_ctx_orig_yarn = 131072 llm_load_print_meta: rope_finetuned = unknown llm_load_print_meta: ssm_d_conv = 0 llm_load_print_meta: ssm_d_inner = 0 llm_load_print_meta: ssm_d_state = 0 llm_load_print_meta: ssm_dt_rank = 0 llm_load_print_meta: ssm_dt_b_c_rms = 0 llm_load_print_meta: model type = 8B llm_load_print_meta: model ftype = Q4_0 llm_load_print_meta: model params = 8.03 B llm_load_print_meta: model size = 4.33 GiB (4.64 BPW) llm_load_print_meta: general.name = Meta Llama 3.1 8B Instruct llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>' llm_load_print_meta: EOS token = 128009 '<|eot_id|>' llm_load_print_meta: LF token = 128 'Ä' llm_load_print_meta: EOT token = 128009 '<|eot_id|>' llm_load_print_meta: max token length = 256 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon RX 580 Series, compute capability 8.0, VMM: no llm_load_tensors: ggml ctx size = 0.27 MiB llm_load_tensors: offloading 32 repeating layers to GPU llm_load_tensors: offloading non-repeating layers to GPU llm_load_tensors: offloaded 33/33 layers to GPU llm_load_tensors: ROCm0 buffer size = 4156.00 MiB llm_load_tensors: CPU buffer size = 281.81 MiB llama_new_context_with_model: n_ctx = 8192 llama_new_context_with_model: n_batch = 512 llama_new_context_with_model: n_ubatch = 512 llama_new_context_with_model: flash_attn = 0 llama_new_context_with_model: freq_base = 500000.0 llama_new_context_with_model: freq_scale = 1 llama_kv_cache_init: ROCm0 KV buffer size = 1024.00 MiB llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB llama_new_context_with_model: ROCm_Host output buffer size = 2.02 MiB llama_new_context_with_model: ROCm0 compute buffer size = 560.00 MiB llama_new_context_with_model: ROCm_Host compute buffer size = 24.01 MiB llama_new_context_with_model: graph nodes = 1030 llama_new_context_with_model: graph splits = 2 INFO [main] model loaded | tid="126494289312832" timestamp=1726412253 time=2024-09-15T14:57:33.297Z level=INFO source=server.go:629 msg="llama runner started in 12.79 seconds" [GIN] 2024/09/15 - 14:57:33 | 200 | 12.853561919s | 127.0.0.1 | POST "/api/chat" [GIN] 2024/09/15 - 14:57:43 | 200 | 1.091025241s | 127.0.0.1 | POST "/api/chat" ``` Goog luck! ### macOS [Download](https://ollama.com/download/Ollama-darwin.zip) ### Windows preview [Download](https://ollama.com/download/OllamaSetup.exe) ### Linux ``` curl -fsSL https://ollama.com/install.sh | sh ``` [Manual install instructions](https://github.com/ollama/ollama/blob/main/docs/linux.md) ### Docker The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama) `ollama/ollama` is available on Docker Hub. ### Libraries - [ollama-python](https://github.com/ollama/ollama-python) - [ollama-js](https://github.com/ollama/ollama-js) ## Quickstart To run and chat with [Llama 3.1](https://ollama.com/library/llama3.1): ``` ollama run llama3.1 ``` ## Model library Ollama supports a list of models available on [ollama.com/library](https://ollama.com/library 'ollama model library') Here are some example models that can be downloaded: | Model | Parameters | Size | Download | | ------------------ | ---------- | ----- | ------------------------------ | | Llama 3.1 | 8B | 4.7GB | `ollama run llama3.1` | | Llama 3.1 | 70B | 40GB | `ollama run llama3.1:70b` | | Llama 3.1 | 405B | 231GB | `ollama run llama3.1:405b` | | Phi 3 Mini | 3.8B | 2.3GB | `ollama run phi3` | | Phi 3 Medium | 14B | 7.9GB | `ollama run phi3:medium` | | Gemma 2 | 2B | 1.6GB | `ollama run gemma2:2b` | | Gemma 2 | 9B | 5.5GB | `ollama run gemma2` | | Gemma 2 | 27B | 16GB | `ollama run gemma2:27b` | | Mistral | 7B | 4.1GB | `ollama run mistral` | | Moondream 2 | 1.4B | 829MB | `ollama run moondream` | | Neural Chat | 7B | 4.1GB | `ollama run neural-chat` | | Starling | 7B | 4.1GB | `ollama run starling-lm` | | Code Llama | 7B | 3.8GB | `ollama run codellama` | | Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` | | LLaVA | 7B | 4.5GB | `ollama run llava` | | Solar | 10.7B | 6.1GB | `ollama run solar` | > [!NOTE] > You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models. ## Customize a model ### Import from GGUF Ollama supports importing GGUF models in the Modelfile: 1. Create a file named `Modelfile`, with a `FROM` instruction with the local filepath to the model you want to import. ``` FROM ./vicuna-33b.Q4_0.gguf ``` 2. Create the model in Ollama ``` ollama create example -f Modelfile ``` 3. Run the model ``` ollama run example ``` ### Import from PyTorch or Safetensors See the [guide](docs/import.md) on importing models for more information. ### Customize a prompt Models from the Ollama library can be customized with a prompt. For example, to customize the `llama3.1` model: ``` ollama pull llama3.1 ``` Create a `Modelfile`: ``` FROM llama3.1 # set the temperature to 1 [higher is more creative, lower is more coherent] PARAMETER temperature 1 # set the system message SYSTEM """ You are Mario from Super Mario Bros. Answer as Mario, the assistant, only. """ ``` Next, create and run the model: ``` ollama create mario -f ./Modelfile ollama run mario >>> hi Hello! It's your friend Mario. ``` For more examples, see the [examples](examples) directory. For more information on working with a Modelfile, see the [Modelfile](docs/modelfile.md) documentation. ## CLI Reference ### Create a model `ollama create` is used to create a model from a Modelfile. ``` ollama create mymodel -f ./Modelfile ``` ### Pull a model ``` ollama pull llama3.1 ``` > This command can also be used to update a local model. Only the diff will be pulled. ### Remove a model ``` ollama rm llama3.1 ``` ### Copy a model ``` ollama cp llama3.1 my-model ``` ### Multiline input For multiline input, you can wrap text with `"""`: ``` >>> """Hello, ... world! ... """ I'm a basic program that prints the famous "Hello, world!" message to the console. ``` ### Multimodal models ``` ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png" The image features a yellow smiley face, which is likely the central focus of the picture. ``` ### Pass the prompt as an argument ``` $ ollama run llama3.1 "Summarize this file: $(cat README.md)" Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications. ``` ### Show model information ``` ollama show llama3.1 ``` ### List models on your computer ``` ollama list ``` ### Start Ollama `ollama serve` is used when you want to start ollama without running the desktop application. ## Building See the [developer guide](https://github.com/ollama/ollama/blob/main/docs/development.md) ### Running local builds Next, start the server: ``` ./ollama serve ``` Finally, in a separate shell, run a model: ``` ./ollama run llama3.1 ``` ## REST API Ollama has a REST API for running and managing models. ### Generate a response ``` curl http://localhost:11434/api/generate -d '{ "model": "llama3.1", "prompt":"Why is the sky blue?" }' ``` ### Chat with a model ``` curl http://localhost:11434/api/chat -d '{ "model": "llama3.1", "messages": [ { "role": "user", "content": "why is the sky blue?" } ] }' ``` See the [API documentation](./docs/api.md) for all endpoints. ## Community Integrations ### Web & Desktop - [Open WebUI](https://github.com/open-webui/open-webui) - [Enchanted (macOS native)](https://github.com/AugustDev/enchanted) - [Hollama](https://github.com/fmaclen/hollama) - [Lollms-Webui](https://github.com/ParisNeo/lollms-webui) - [LibreChat](https://github.com/danny-avila/LibreChat) - [Bionic GPT](https://github.com/bionic-gpt/bionic-gpt) - [HTML UI](https://github.com/rtcfirefly/ollama-ui) - [Saddle](https://github.com/jikkuatwork/saddle) - [Chatbot UI](https://github.com/ivanfioravanti/chatbot-ollama) - [Chatbot UI v2](https://github.com/mckaywrigley/chatbot-ui) - [Typescript UI](https://github.com/ollama-interface/Ollama-Gui?tab=readme-ov-file) - [Minimalistic React UI for Ollama Models](https://github.com/richawo/minimal-llm-ui) - [Ollamac](https://github.com/kevinhermawan/Ollamac) - [big-AGI](https://github.com/enricoros/big-AGI/blob/main/docs/config-local-ollama.md) - [Cheshire Cat assistant framework](https://github.com/cheshire-cat-ai/core) - [Amica](https://github.com/semperai/amica) - [chatd](https://github.com/BruceMacD/chatd) - [Ollama-SwiftUI](https://github.com/kghandour/Ollama-SwiftUI) - [Dify.AI](https://github.com/langgenius/dify) - [MindMac](https://mindmac.app) - [NextJS Web Interface for Ollama](https://github.com/jakobhoeg/nextjs-ollama-llm-ui) - [Msty](https://msty.app) - [Chatbox](https://github.com/Bin-Huang/Chatbox) - [WinForm Ollama Copilot](https://github.com/tgraupmann/WinForm_Ollama_Copilot) - [NextChat](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web) with [Get Started Doc](https://docs.nextchat.dev/models/ollama) - [Alpaca WebUI](https://github.com/mmo80/alpaca-webui) - [OllamaGUI](https://github.com/enoch1118/ollamaGUI) - [OpenAOE](https://github.com/InternLM/OpenAOE) - [Odin Runes](https://github.com/leonid20000/OdinRunes) - [LLM-X](https://github.com/mrdjohnson/llm-x) (Progressive Web App) - [AnythingLLM (Docker + MacOs/Windows/Linux native app)](https://github.com/Mintplex-Labs/anything-llm) - [Ollama Basic Chat: Uses HyperDiv Reactive UI](https://github.com/rapidarchitect/ollama_basic_chat) - [Ollama-chats RPG](https://github.com/drazdra/ollama-chats) - [QA-Pilot](https://github.com/reid41/QA-Pilot) (Chat with Code Repository) - [ChatOllama](https://github.com/sugarforever/chat-ollama) (Open Source Chatbot based on Ollama with Knowledge Bases) - [CRAG Ollama Chat](https://github.com/Nagi-ovo/CRAG-Ollama-Chat) (Simple Web Search with Corrective RAG) - [RAGFlow](https://github.com/infiniflow/ragflow) (Open-source Retrieval-Augmented Generation engine based on deep document understanding) - [StreamDeploy](https://github.com/StreamDeploy-DevRel/streamdeploy-llm-app-scaffold) (LLM Application Scaffold) - [chat](https://github.com/swuecho/chat) (chat web app for teams) - [Lobe Chat](https://github.com/lobehub/lobe-chat) with [Integrating Doc](https://lobehub.com/docs/self-hosting/examples/ollama) - [Ollama RAG Chatbot](https://github.com/datvodinh/rag-chatbot.git) (Local Chat with multiple PDFs using Ollama and RAG) - [BrainSoup](https://www.nurgo-software.com/products/brainsoup) (Flexible native client with RAG & multi-agent automation) - [macai](https://github.com/Renset/macai) (macOS client for Ollama, ChatGPT, and other compatible API back-ends) - [Olpaka](https://github.com/Otacon/olpaka) (User-friendly Flutter Web App for Ollama) - [OllamaSpring](https://github.com/CrazyNeil/OllamaSpring) (Ollama Client for macOS) - [LLocal.in](https://github.com/kartikm7/llocal) (Easy to use Electron Desktop Client for Ollama) - [AiLama](https://github.com/zeyoyt/ailama) (A Discord User App that allows you to interact with Ollama anywhere in discord ) - [Ollama with Google Mesop](https://github.com/rapidarchitect/ollama_mesop/) (Mesop Chat Client implementation with Ollama) - [Painting Droid](https://github.com/mateuszmigas/painting-droid) (Painting app with AI integrations) - [Kerlig AI](https://www.kerlig.com/) (AI writing assistant for macOS) - [AI Studio](https://github.com/MindWorkAI/AI-Studio) - [Sidellama](https://github.com/gyopak/sidellama) (browser-based LLM client) - [LLMStack](https://github.com/trypromptly/LLMStack) (No-code multi-agent framework to build LLM agents and workflows) - [BoltAI for Mac](https://boltai.com) (AI Chat Client for Mac) - [Harbor](https://github.com/av/harbor) (Containerized LLM Toolkit with Ollama as default backend) - [Go-CREW](https://www.jonathanhecl.com/go-crew/) (Powerful Offline RAG in Golang) - [PartCAD](https://github.com/openvmp/partcad/) (CAD model generation with OpenSCAD and CadQuery) - [Ollama4j Web UI](https://github.com/ollama4j/ollama4j-web-ui) - Java-based Web UI for Ollama built with Vaadin, Spring Boot and Ollama4j - [PyOllaMx](https://github.com/kspviswa/pyOllaMx) - macOS application capable of chatting with both Ollama and Apple MLX models. - [Claude Dev](https://github.com/saoudrizwan/claude-dev) - VSCode extension for multi-file/whole-repo coding - [Cherry Studio](https://github.com/kangfenmao/cherry-studio) (Desktop client with Ollama support) - [ConfiChat](https://github.com/1runeberg/confichat) (Lightweight, standalone, multi-platform, and privacy focused LLM chat interface with optional encryption) ### Terminal - [oterm](https://github.com/ggozad/oterm) - [Ellama Emacs client](https://github.com/s-kostyaev/ellama) - [Emacs client](https://github.com/zweifisch/ollama) - [gen.nvim](https://github.com/David-Kunz/gen.nvim) - [ollama.nvim](https://github.com/nomnivore/ollama.nvim) - [ollero.nvim](https://github.com/marco-souza/ollero.nvim) - [ollama-chat.nvim](https://github.com/gerazov/ollama-chat.nvim) - [ogpt.nvim](https://github.com/huynle/ogpt.nvim) - [gptel Emacs client](https://github.com/karthink/gptel) - [Oatmeal](https://github.com/dustinblackman/oatmeal) - [cmdh](https://github.com/pgibler/cmdh) - [ooo](https://github.com/npahlfer/ooo) - [shell-pilot](https://github.com/reid41/shell-pilot) - [tenere](https://github.com/pythops/tenere) - [llm-ollama](https://github.com/taketwo/llm-ollama) for [Datasette's LLM CLI](https://llm.datasette.io/en/stable/). - [typechat-cli](https://github.com/anaisbetts/typechat-cli) - [ShellOracle](https://github.com/djcopley/ShellOracle) - [tlm](https://github.com/yusufcanb/tlm) - [podman-ollama](https://github.com/ericcurtin/podman-ollama) - [gollama](https://github.com/sammcj/gollama) - [Ollama eBook Summary](https://github.com/cognitivetech/ollama-ebook-summary/) ### Apple Vision Pro - [Enchanted](https://github.com/AugustDev/enchanted) ### Database - [MindsDB](https://github.com/mindsdb/mindsdb/blob/staging/mindsdb/integrations/handlers/ollama_handler/README.md) (Connects Ollama models with nearly 200 data platforms and apps) - [chromem-go](https://github.com/philippgille/chromem-go/blob/v0.5.0/embed_ollama.go) with [example](https://github.com/philippgille/chromem-go/tree/v0.5.0/examples/rag-wikipedia-ollama) ### Package managers - [Pacman](https://archlinux.org/packages/extra/x86_64/ollama/) - [Gentoo](https://github.com/gentoo/guru/tree/master/app-misc/ollama) - [Helm Chart](https://artifacthub.io/packages/helm/ollama-helm/ollama) - [Guix channel](https://codeberg.org/tusharhero/ollama-guix) - [Nix package](https://search.nixos.org/packages?channel=24.05&show=ollama&from=0&size=50&sort=relevance&type=packages&query=ollama) - [Flox](https://flox.dev/blog/ollama-part-one) ### Libraries - [LangChain](https://python.langchain.com/docs/integrations/llms/ollama) and [LangChain.js](https://js.langchain.com/docs/modules/model_io/models/llms/integrations/ollama) with [example](https://js.langchain.com/docs/use_cases/question_answering/local_retrieval_qa) - [Firebase Genkit](https://firebase.google.com/docs/genkit/plugins/ollama) - [LangChainGo](https://github.com/tmc/langchaingo/) with [example](https://github.com/tmc/langchaingo/tree/main/examples/ollama-completion-example) - [LangChain4j](https://github.com/langchain4j/langchain4j) with [example](https://github.com/langchain4j/langchain4j-examples/tree/main/ollama-examples/src/main/java) - [LangChainRust](https://github.com/Abraxas-365/langchain-rust) with [example](https://github.com/Abraxas-365/langchain-rust/blob/main/examples/llm_ollama.rs) - [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/ollama.html) - [LiteLLM](https://github.com/BerriAI/litellm) - [OllamaFarm for Go](https://github.com/presbrey/ollamafarm) - [OllamaSharp for .NET](https://github.com/awaescher/OllamaSharp) - [Ollama for Ruby](https://github.com/gbaptista/ollama-ai) - [Ollama-rs for Rust](https://github.com/pepperoni21/ollama-rs) - [Ollama-hpp for C++](https://github.com/jmont-dev/ollama-hpp) - [Ollama4j for Java](https://github.com/ollama4j/ollama4j) - [ModelFusion Typescript Library](https://modelfusion.dev/integration/model-provider/ollama) - [OllamaKit for Swift](https://github.com/kevinhermawan/OllamaKit) - [Ollama for Dart](https://github.com/breitburg/dart-ollama) - [Ollama for Laravel](https://github.com/cloudstudio/ollama-laravel) - [LangChainDart](https://github.com/davidmigloz/langchain_dart) - [Semantic Kernel - Python](https://github.com/microsoft/semantic-kernel/tree/main/python/semantic_kernel/connectors/ai/ollama) - [Haystack](https://github.com/deepset-ai/haystack-integrations/blob/main/integrations/ollama.md) - [Elixir LangChain](https://github.com/brainlid/langchain) - [Ollama for R - rollama](https://github.com/JBGruber/rollama) - [Ollama for R - ollama-r](https://github.com/hauselin/ollama-r) - [Ollama-ex for Elixir](https://github.com/lebrunel/ollama-ex) - [Ollama Connector for SAP ABAP](https://github.com/b-tocs/abap_btocs_ollama) - [Testcontainers](https://testcontainers.com/modules/ollama/) - [Portkey](https://portkey.ai/docs/welcome/integration-guides/ollama) - [PromptingTools.jl](https://github.com/svilupp/PromptingTools.jl) with an [example](https://svilupp.github.io/PromptingTools.jl/dev/examples/working_with_ollama) - [LlamaScript](https://github.com/Project-Llama/llamascript) - [Gollm](https://docs.gollm.co/examples/ollama-example) - [Ollamaclient for Golang](https://github.com/xyproto/ollamaclient) - [High-level function abstraction in Go](https://gitlab.com/tozd/go/fun) - [Ollama PHP](https://github.com/ArdaGnsrn/ollama-php) ### Mobile - [Enchanted](https://github.com/AugustDev/enchanted) - [Maid](https://github.com/Mobile-Artificial-Intelligence/maid) - [ConfiChat](https://github.com/1runeberg/confichat) (Lightweight, standalone, multi-platform, and privacy focused LLM chat interface with optional encryption) ### Extensions & Plugins - [Raycast extension](https://github.com/MassimilianoPasquini97/raycast_ollama) - [Discollama](https://github.com/mxyng/discollama) (Discord bot inside the Ollama discord channel) - [Continue](https://github.com/continuedev/continue) - [Obsidian Ollama plugin](https://github.com/hinterdupfinger/obsidian-ollama) - [Logseq Ollama plugin](https://github.com/omagdy7/ollama-logseq) - [NotesOllama](https://github.com/andersrex/notesollama) (Apple Notes Ollama plugin) - [Dagger Chatbot](https://github.com/samalba/dagger-chatbot) - [Discord AI Bot](https://github.com/mekb-turtle/discord-ai-bot) - [Ollama Telegram Bot](https://github.com/ruecat/ollama-telegram) - [Hass Ollama Conversation](https://github.com/ej52/hass-ollama-conversation) - [Rivet plugin](https://github.com/abrenneke/rivet-plugin-ollama) - [Obsidian BMO Chatbot plugin](https://github.com/longy2k/obsidian-bmo-chatbot) - [Cliobot](https://github.com/herval/cliobot) (Telegram bot with Ollama support) - [Copilot for Obsidian plugin](https://github.com/logancyang/obsidian-copilot) - [Obsidian Local GPT plugin](https://github.com/pfrankov/obsidian-local-gpt) - [Open Interpreter](https://docs.openinterpreter.com/language-model-setup/local-models/ollama) - [Llama Coder](https://github.com/ex3ndr/llama-coder) (Copilot alternative using Ollama) - [Ollama Copilot](https://github.com/bernardo-bruning/ollama-copilot) (Proxy that allows you to use ollama as a copilot like Github copilot) - [twinny](https://github.com/rjmacarthy/twinny) (Copilot and Copilot chat alternative using Ollama) - [Wingman-AI](https://github.com/RussellCanfield/wingman-ai) (Copilot code and chat alternative using Ollama and Hugging Face) - [Page Assist](https://github.com/n4ze3m/page-assist) (Chrome Extension) - [AI Telegram Bot](https://github.com/tusharhero/aitelegrambot) (Telegram bot using Ollama in backend) - [AI ST Completion](https://github.com/yaroslavyaroslav/OpenAI-sublime-text) (Sublime Text 4 AI assistant plugin with Ollama support) - [Discord-Ollama Chat Bot](https://github.com/kevinthedang/discord-ollama) (Generalized TypeScript Discord Bot w/ Tuning Documentation) - [Discord AI chat/moderation bot](https://github.com/rapmd73/Companion) Chat/moderation bot written in python. Uses Ollama to create personalities. - [Headless Ollama](https://github.com/nischalj10/headless-ollama) (Scripts to automatically install ollama client & models on any OS for apps that depends on ollama server) - [vnc-lm](https://github.com/jk011ru/vnc-lm) (A containerized Discord bot with support for attachments and web links) - [LSP-AI](https://github.com/SilasMarvin/lsp-ai) (Open-source language server for AI-powered functionality) ### Supported backends - [llama.cpp](https://github.com/ggerganov/llama.cpp) project founded by Georgi Gerganov.