Commit graph

58 commits

Author SHA1 Message Date
Michael Yang b25dd1795d allow F16 to use metal (warning F16 uses significantly more memory than quantized model so the standard requires don't apply.) 2023-08-26 08:38:48 -07:00
Michael Yang 304f2b6c96 add 34b to mem check 2023-08-26 08:29:21 -07:00
Michael Yang a894cc792d model and file type as strings 2023-08-17 12:08:04 -07:00
Michael Yang e26085b921 close open files 2023-08-14 16:08:06 -07:00
Michael Yang 6de5d032e1 implement loading ggml lora adapters through the modelfile 2023-08-10 09:23:39 -07:00
Michael Yang d791df75dd check memory requirements before loading 2023-08-10 09:23:11 -07:00
Michael Yang 020a3b3530 disable gpu for q5_0, q5_1, q8_0 quants 2023-08-10 09:23:11 -07:00
Michael Yang fccf8d179f partial decode ggml bin for more info 2023-08-10 09:23:10 -07:00