Quantization

A technique that reduces model size and speeds up inference by representing weights with lower-precision numbers (e.g., 4-bit instead of 16-bit). Quantization makes it feasible to run large models on consumer hardware with modest accuracy trade-offs.
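To make the idea concrete, here is a minimal sketch of symmetric per-tensor int8 quantization using NumPy (the function names and the toy weight values are illustrative, not from any particular library):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 by scaling the largest magnitude to 127."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 codes."""
    return q.astype(np.float32) * scale

# Toy example: 4 weights stored in 1 byte each instead of 4 bytes each.
w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error per weight is at most about scale / 2.
```

Each weight now occupies 1 byte instead of 4, at the cost of a small rounding error; 4-bit schemes push this further by packing two weights per byte, usually with per-group scales to limit the error.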

Related terms

Inference · QLoRA · Parameter