Thoughts on code, artificial intelligence, and the future of tech.
How to reduce latency and cost when deploying LLMs using quantization and efficient inference servers.
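To make the topic concrete, here is a minimal sketch of the kind of setup the post describes: serving a quantized model behind an efficient inference engine, using vLLM with an AWQ-quantized checkpoint. The model name, memory fraction, and sampling settings are illustrative assumptions, not recommendations from the post.

```python
from vllm import LLM, SamplingParams

# Load a pre-quantized (AWQ, 4-bit weight) checkpoint into the vLLM engine.
# The checkpoint name below is only an example of an AWQ-quantized model.
llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    quantization="awq",           # tell vLLM the weights use AWQ quantization
    gpu_memory_utilization=0.90,  # fraction of GPU memory for weights + KV cache
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)

# vLLM batches requests continuously, which is where most of the
# throughput (and therefore cost) improvement over naive decoding comes from.
outputs = llm.generate(
    ["Summarize the benefits of quantization in one sentence."],
    sampling,
)

for out in outputs:
    print(out.outputs[0].text)
```

Quantizing weights to 4 bits shrinks memory footprint and speeds up memory-bound decoding, while the inference server's batching and paged KV cache keep the GPU busy across many concurrent requests.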