ollama/llm
Quinn Slack f4432e1dba
treat stop as stop sequences, not exact tokens (#442)
The `stop` option to the generate API is a list of sequences that should cause generation to stop. Although these are commonly called "stop tokens", they do not necessarily correspond to LLM tokens (per the LLM's tokenizer). For example, if the caller sends a generate request with `"stop": ["\n"]`, then generation should stop on any token containing `\n` (with `\n` trimmed from the output), not only when a token exactly matches `\n`. If `stop` were interpreted strictly as LLM tokens, callers of the generate API would need to know the LLM's tokenizer and enumerate many tokens in the `stop` list.

Fixes https://github.com/jmorganca/ollama/issues/295.
2023-08-30 11:53:42 -04:00
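The behavior described in the commit message can be sketched as a substring match against the accumulated output rather than an exact-token comparison. This is an illustrative sketch only (the function name `findStopSequence` is hypothetical, not ollama's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// findStopSequence reports whether any stop sequence occurs anywhere in
// the accumulated output. If one does, it returns the output truncated
// just before the first match, so the stop sequence itself is trimmed.
// Matching is done on generated text, not on individual LLM tokens.
func findStopSequence(output string, stops []string) (string, bool) {
	for _, stop := range stops {
		if i := strings.Index(output, stop); i >= 0 {
			return output[:i], true
		}
	}
	return output, false
}

func main() {
	// A token such as "world\n" contains the stop sequence "\n" even
	// though the token is not exactly "\n", so generation stops here.
	out, stopped := findStopSequence("hello world\n", []string{"\n"})
	fmt.Printf("%q %v\n", out, stopped)
}
```

With exact-token matching, the same request would only stop if the model happened to emit a standalone `"\n"` token, which depends entirely on the tokenizer.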
ggml-alloc.c update llama.cpp 2023-08-14 15:47:00 -07:00
ggml-alloc.h update llama.cpp 2023-08-14 15:47:00 -07:00
ggml-cuda.cu update llama.cpp 2023-08-14 15:47:00 -07:00
ggml-cuda.h update llama.cpp 2023-08-14 15:47:00 -07:00
ggml-metal.h update llama.cpp 2023-08-14 15:47:00 -07:00
ggml-metal.m update llama.cpp 2023-08-14 15:47:00 -07:00
ggml-metal.metal update llama.cpp 2023-08-14 15:47:00 -07:00
ggml-mpi.c update llama.cpp 2023-08-14 15:47:00 -07:00
ggml-mpi.h update llama.cpp 2023-08-14 15:47:00 -07:00
ggml-opencl.cpp update llama.cpp 2023-08-14 15:47:00 -07:00
ggml-opencl.h update llama.cpp 2023-08-14 15:47:00 -07:00
ggml.c update llama.cpp 2023-08-14 15:47:00 -07:00
ggml.go add 34b model type 2023-08-24 10:35:44 -07:00
ggml.h update llama.cpp 2023-08-14 15:47:00 -07:00
k_quants.c update llama.cpp 2023-08-14 15:47:00 -07:00
k_quants.h update llama.cpp 2023-08-14 15:47:00 -07:00
llama-util.h update llama.cpp 2023-08-14 15:47:00 -07:00
llama.cpp add missing entries for 34B 2023-08-25 18:35:35 -07:00
llama.go treat stop as stop sequences, not exact tokens (#442) 2023-08-30 11:53:42 -04:00
llama.h update llama.cpp 2023-08-14 15:47:00 -07:00
llama_darwin.go close open files 2023-08-14 16:08:06 -07:00
llama_test.go treat stop as stop sequences, not exact tokens (#442) 2023-08-30 11:53:42 -04:00
llm.go allow F16 to use metal 2023-08-26 08:38:48 -07:00
update-llama-cpp.sh partial decode ggml bin for more info 2023-08-10 09:23:10 -07:00
utils.go partial decode ggml bin for more info 2023-08-10 09:23:10 -07:00