
Quantization

Description

The weights of an LLM are numeric values stored at a given precision, which is determined by the number of bits used to represent each value, for example 64-bit (float64) or 32-bit (float32) floating point. Lowering the number of bits per value gives a less accurate representation of each weight, but it also lowers the memory required to store the model.
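As a rough illustration (not part of the original text), the NumPy sketch below stores a stand-in weight matrix at three precisions and prints its memory footprint; the matrix shape is chosen arbitrarily for the example.

```python
# Minimal sketch: lowering precision lowers the memory footprint of the weights.
import numpy as np

# A stand-in for one weight matrix of an LLM: 4096 x 4096 values.
weights_fp64 = np.random.randn(4096, 4096)        # float64 by default
weights_fp32 = weights_fp64.astype(np.float32)    # half the bits
weights_fp16 = weights_fp64.astype(np.float16)    # half again

for name, w in [("float64", weights_fp64),
                ("float32", weights_fp32),
                ("float16", weights_fp16)]:
    print(f"{name}: {w.nbytes / 1024**2:.1f} MiB")

# float64: 128.0 MiB
# float32:  64.0 MiB
# float16:  32.0 MiB
```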

Info

Notice the lowered accuracy each time we halve the number of bits used to represent a value, as in the sketch below.
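A minimal sketch (again an illustration, not from the original text) of this effect: the same value of π is stored in float64, float32, and float16, and the representation error grows as the bit width is halved.

```python
# Minimal sketch: halving the number of bits lowers the accuracy of the stored value.
import numpy as np

pi = 3.14159265358979323846

for dtype in (np.float64, np.float32, np.float16):
    value = dtype(pi)
    error = abs(float(value) - pi)
    print(f"{np.dtype(dtype).name}: {float(value):.10f} (error ~ {error:.2e})")

# float64: 3.1415926536 (error ~ 0.00e+00)  <- below printed precision
# float32: 3.1415927410 (error ~ 8.74e-08)
# float16: 3.1406250000 (error ~ 9.68e-04)
```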