What is better: higher quantization or higher parameter count?
Wander@yiffit.net to LocalLLaMA@sh.itjust.works · English · 1 year ago
For example, does a 13B parameter model at 2_K quantization perform worse than a 7B parameter model at 8-bit or 16-bit?
rufus@discuss.tchncs.de · edited · 1 year ago

https://github.com/ggerganov/llama.cpp#quantization

https://github.com/ggerganov/llama.cpp/pull/1684

Regarding your question: 13B at 2_K quantization seems to be on par with 7B at 8-bit and 16-bit. There is not much of a difference between any of those (look at the perplexity values; lower is better). The second link has a nice graph.
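For anyone unfamiliar with the metric the comment above relies on: perplexity is the exponentiated average negative log-likelihood a model assigns to held-out text, so lower means the model is less "surprised" by the reference data. A minimal sketch of the calculation (the per-token log-probabilities below are made-up placeholders, not real benchmark numbers):

```python
import math

def perplexity(token_logprobs):
    # Perplexity = exp(mean negative log-likelihood) over the evaluated tokens.
    # Lower is better: the model assigned higher probability to the reference text.
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token log-probabilities (natural log) from two models
# scored on the same text; in practice these come from an eval run such as
# llama.cpp's perplexity tool over a test file.
model_a = [-1.9, -2.1, -1.7, -2.0]
model_b = [-2.0, -2.0, -1.8, -2.1]

print(perplexity(model_a))  # ~6.9
print(perplexity(model_b))  # ~7.2
```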
noneabove1182@sh.itjust.works (mod) · 1 year ago

These are good sources. To add one more, the GPTQ paper discusses perplexity at several quantization levels and model sizes: https://arxiv.org/abs/2210.17323
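A side note on why 13B at 2_K versus 7B at 8-bit is the interesting comparison in the first place: the two land in roughly the same memory budget. A back-of-the-envelope sketch, where the bits-per-weight figures are rough assumptions rather than exact values for any particular file format:

```python
def model_size_gb(params_billion, bits_per_weight):
    # Approximate weight storage: parameters * bits per weight, converted to gigabytes.
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Assumed bits per weight (approximate; real quant formats add per-block overhead).
print(model_size_gb(13, 2.6))  # ~4.2 GB -- 13B with a ~2-bit k-quant
print(model_size_gb(7, 8.5))   # ~7.4 GB -- 7B at 8-bit
print(model_size_gb(7, 16.0))  # ~14 GB  -- 7B at fp16
```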