What is better: higher quantization or higher parameter count?
Wander@yiffit.net to LocalLLaMA@sh.itjust.works · English · 1 year ago
For example, does a 13B parameter model at 2_K quantization perform worse than a 7B parameter model at 8-bit or 16-bit?
rufus@discuss.tchncs.de · edited · 1 year ago

https://github.com/ggerganov/llama.cpp#quantization

https://github.com/ggerganov/llama.cpp/pull/1684

Regarding your question: 13B at 2_K quantization seems to be on par with 7B at 8-bit and 16-bit. There is not much of a difference between any of those (look at the perplexity values; lower is better). The second link has a nice graph.
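For anyone unfamiliar with the metric the comment above relies on: perplexity is the exponentiated average negative log-likelihood a model assigns to held-out text, so lower means the model is less "surprised" by the reference data. A minimal sketch of the calculation (the per-token log-probabilities below are made-up placeholders, not real benchmark numbers):

```python
import math

def perplexity(token_logprobs):
    # Perplexity = exp(mean negative log-likelihood) over the evaluated tokens.
    # Lower is better: the model assigned higher probability to the reference text.
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token log-probabilities (natural log) from two models
# scored on the same text; in practice these come from an eval run such as
# llama.cpp's perplexity tool over a test file.
model_a = [-1.9, -2.1, -1.7, -2.0]
model_b = [-2.0, -2.0, -1.8, -2.1]

print(perplexity(model_a))  # ~6.9
print(perplexity(model_b))  # ~7.2
```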
noneabove1182@sh.itjust.works (mod) · 1 year ago

These are good sources. To add one more, the GPTQ paper discusses perplexity at several quantization levels and model sizes: https://arxiv.org/abs/2210.17323
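A side note on why 13B at 2_K versus 7B at 8-bit is the interesting comparison in the first place: the two land in roughly the same memory budget. A back-of-the-envelope sketch, where the bits-per-weight figures are rough assumptions rather than exact values for any particular file format:

```python
def model_size_gb(params_billion, bits_per_weight):
    # Approximate weight storage: parameters * bits per weight, converted to gigabytes.
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Assumed bits per weight (approximate; real quant formats add per-block overhead).
print(model_size_gb(13, 2.6))  # ~4.2 GB -- 13B with a ~2-bit k-quant
print(model_size_gb(7, 8.5))   # ~7.4 GB -- 7B at 8-bit
print(model_size_gb(7, 16.0))  # ~14 GB  -- 7B at fp16
```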